# SFT Training Pipeline
In addition to the Hugging Face Alignment Handbook, this pipeline uses the following tools:
- **Unsloth** for faster training with less memory
- **QLoRA** for parameter-efficient fine-tuning
- **Optuna** for hyperparameter optimization
- **WandB** for experiment tracking and visualization

## Pipeline Structure
1. Setup & Configuration
2. Load Model & Tokenizer (Unsloth)
3. Prepare Dataset
4. Train Model
5. Hyperparameter Search
6. Save & Test Model

## Set Up & Import
Clone the GitHub repository so the notebook can run on Google Colab.

In [9]:
!git clone https://github.com/Ally-Ha/pilot_act-cai_model0_SFT.git
%cd pilot_act-cai_model0_SFT
!pip install -r requirements.txt


Cloning into 'pilot_act-cai_model0_SFT'...
remote: Enumerating objects: 23, done.
remote: Counting objects: 100% (23/23), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 23 (delta 4), reused 13 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (23/23), 20.35 KiB | 6.78 MiB/s, done.
Resolving deltas: 100% (4/4), done.
/content/pilot_act-cai_model0_SFT/pilot_act-cai_model0_SFT


In [10]:
import logging
import os
import sys
import torch

# Configure logging
logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    level=logging.INFO,
    handlers=[logging.StreamHandler(sys.stdout)],
)
logger = logging.getLogger(__name__)

from src import (
    SFTScriptConfig,
    get_model_and_tokenizer,
    apply_peft,
    load_and_split_dataset,
    prepare_dataset,
    create_training_args,
    create_trainer,
    train,
    run_hpo,
    prepare_for_inference,
)

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

PyTorch version: 2.10.0+cu128
CUDA available: True


## 1. Configuration
The config follows the alignment-handbook structure, with four sections: model, lora, data, and training.

In [11]:
CONFIG_PATH = "recipes/SFT/config_pilot.yaml"
config = SFTScriptConfig.from_yaml(CONFIG_PATH)

# Display configuration
print("Model Config")
print(f"  Model: {config.model.model_name_or_path}")
print(f"  Max seq length: {config.model.max_seq_length}")
print(f"  Load in 4-bit: {config.model.load_in_4bit}")

print("\nLoRA Config")
print(f"  Rank (r): {config.lora.r}")
print(f"  Alpha: {config.lora.lora_alpha}")
print(f"  Dropout: {config.lora.lora_dropout}")

print("\nData Config")
print(f"  Dataset: {config.data.dataset_id}")
print(f"  Test split size: {config.data.test_split_size}")

print("\nTraining Config")
print(f"  Output dir: {config.training.output_dir}")
print(f"  Learning rate: {config.training.learning_rate}")
print(f"  Batch size: {config.training.per_device_train_batch_size}")
print(f"  Epochs: {config.training.num_train_epochs}")

Model Config
  Model: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
  Max seq length: 2048
  Load in 4-bit: True

LoRA Config
  Rank (r): 16
  Alpha: 32
  Dropout: 0.05

Data Config
  Dataset: ShenLab/MentalChat16K
  Test split size: 1000

Training Config
  Output dir: data/llama-3.1-8b-instruct-sft-pilot
  Learning rate: 2e-05
  Batch size: 4
  Epochs: 1
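As a rough illustration of the four-section layout, the sketch below models the config as nested dataclasses built from a plain dictionary. The class names (`ModelSection`, `LoraSection`, `DataSection`, `TrainingSection`, `PilotConfig`) and the `from_dict` helper are hypothetical; the real `SFTScriptConfig` lives in the repository's `src` package and may be organized differently. The values mirror the printed output above.

```python
from dataclasses import dataclass


@dataclass
class ModelSection:
    model_name_or_path: str
    max_seq_length: int = 2048
    load_in_4bit: bool = True


@dataclass
class LoraSection:
    r: int = 16
    lora_alpha: int = 32
    lora_dropout: float = 0.05


@dataclass
class DataSection:
    dataset_id: str
    test_split_size: int = 1000


@dataclass
class TrainingSection:
    output_dir: str
    learning_rate: float = 2e-5
    per_device_train_batch_size: int = 4
    num_train_epochs: int = 1


@dataclass
class PilotConfig:
    """Hypothetical stand-in for SFTScriptConfig, for illustration only."""
    model: ModelSection
    lora: LoraSection
    data: DataSection
    training: TrainingSection

    @classmethod
    def from_dict(cls, raw: dict) -> "PilotConfig":
        # Each top-level key maps to one section; LoRA falls back to defaults.
        return cls(
            model=ModelSection(**raw["model"]),
            lora=LoraSection(**raw.get("lora", {})),
            data=DataSection(**raw["data"]),
            training=TrainingSection(**raw["training"]),
        )


# A dictionary shaped like a parsed YAML recipe (values taken from the output above).
raw = {
    "model": {"model_name_or_path": "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit"},
    "data": {"dataset_id": "ShenLab/MentalChat16K"},
    "training": {"output_dir": "data/llama-3.1-8b-instruct-sft-pilot"},
}
cfg = PilotConfig.from_dict(raw)
print(cfg.lora.r, cfg.training.learning_rate)
```

Keeping one dataclass per section means a typo in a recipe key fails immediately at construction time rather than surfacing later during training.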


## 2. Initialize WandB for Experiment Tracking

In [12]:
import wandb
wandb.login()

# Initialize run
wandb.init(
    entity="alha8035-stockholm-university",
    project="pilot_model0_sft",
    config=config.to_dict(),
    tags=["sft", "qlora", "unsloth"],
)



## 3. Load Model & Tokenizer (with Unsloth)

In [13]:
model, tokenizer = get_model_and_tokenizer(
    model_name=config.model.model_name_or_path,
    max_seq_length=config.model.max_seq_length,
    load_in_4bit=config.model.load_in_4bit,
)

print(f"Model loaded: {config.model.model_name_or_path}")
print(f"Model dtype: {model.dtype}")
print(f"Tokenizer vocab size: {len(tokenizer)}")

==((====))==  Unsloth 2026.1.4: Fast Llama patching. Transformers: 4.57.6.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.10.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.6.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.34. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Model loaded: unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit
Model dtype: torch.float16
Tokenizer vocab size: 128256


In [14]:
# Apply PEFT/LoRA using Unsloth
model = apply_peft(
    model,
    r=config.lora.r,
    lora_alpha=config.lora.lora_alpha,
    lora_dropout=config.lora.lora_dropout,
    target_modules=config.lora.target_modules,
    bias=config.lora.bias,
    use_gradient_checkpointing=config.lora.use_gradient_checkpointing,
    random_state=config.lora.random_state,
)

# Print trainable parameters
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable_params:,} ({100 * trainable_params / total_params:.2f}%)")

Trainable parameters: 41,943,040 (0.92%)
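The trainable-parameter count can be checked by hand. The sketch below assumes the standard Llama 3.1 8B dimensions (32 layers, hidden size 4096, MLP size 14336, key/value width 1024) and assumes that `config.lora.target_modules` covers all seven attention and MLP projections; neither is printed in this notebook. It also assumes that 4-bit storage packs two weights into one stored element, which would explain why a `numel()`-based total gives 0.92% here while the Unsloth training banner later in the notebook reports 0.52%.

```python
# Assumed Llama 3.1 8B dimensions (not printed in this notebook).
NUM_LAYERS = 32
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024      # 8 key/value heads x 128 head dimension
VOCAB = 128256     # matches the tokenizer size printed earlier

R, ALPHA = 16, 32  # from the LoRA config

# (in_features, out_features) of each assumed target projection.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (r x in) and B (out x r) per projection: r * (in + out) weights.
lora_params = NUM_LAYERS * sum(R * (i + o) for i, o in PROJECTIONS.values())

# Base weights: the linear projections, plus embeddings, LM head, and norm weights.
linear_params = NUM_LAYERS * sum(i * o for i, o in PROJECTIONS.values())
other_params = 2 * VOCAB * HIDDEN + NUM_LAYERS * 2 * HIDDEN + HIDDEN

full_total = linear_params + other_params + lora_params
# Assumption: packed 4-bit storage halves the element count of the linear layers.
packed_total = linear_params // 2 + other_params + lora_params

print(f"LoRA scaling alpha/r: {ALPHA / R}")
print(f"LoRA parameters: {lora_params:,}")
print(f"Share of full parameter count: {100 * lora_params / full_total:.2f}%")
print(f"Share of packed element count: {100 * lora_params / packed_total:.2f}%")
```

Under these assumptions the script reproduces the 41,943,040 trainable parameters and both percentages, so the two figures appear to describe the same adapter measured against different totals.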


## 4. Prepare Dataset

Load and preprocess the dataset following alignment-handbook's data pipeline.

In [16]:
dataset = load_and_split_dataset(
    dataset_id=config.data.dataset_id,
    dataset_config=config.data.dataset_config,
    dataset_split=config.data.dataset_split,
    test_split_size=config.data.test_split_size,
    seed=config.data.seed,
)

# Pilot run: train and evaluate on a small subset only
TRAIN_SUBSET = 1000
TEST_SUBSET = 200

dataset["train"] = dataset["train"].select(range(min(TRAIN_SUBSET, len(dataset["train"]))))
if "test" in dataset:
    dataset["test"] = dataset["test"].select(range(min(TEST_SUBSET, len(dataset["test"]))))

# Prepare dataset (format to messages, apply chat template)
dataset = prepare_dataset(dataset, tokenizer, num_proc=config.data.num_proc)

print(f"Train samples: {len(dataset['train'])}")
print(f"Test samples: {len(dataset.get('test', []))}")


Train samples: 1000
Test samples: 200
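The "format to messages" step in `prepare_dataset` is not shown in the notebook. A minimal sketch of what such a conversion could look like follows, assuming each record has `instruction`, `input`, and `output` fields; the field names, the `to_messages` helper, and the sample record are assumptions for illustration, not the repository's implementation.

```python
def to_messages(example: dict) -> dict:
    """Convert one instruction-style record into a chat-style message list."""
    messages = []
    # The instruction, when present, plays the role of the system prompt.
    if example.get("instruction"):
        messages.append({"role": "system", "content": example["instruction"]})
    messages.append({"role": "user", "content": example["input"]})
    messages.append({"role": "assistant", "content": example["output"]})
    return {"messages": messages}


# Hypothetical record, written only to exercise the function.
record = {
    "instruction": "You are a helpful mental health counselling assistant.",
    "input": "I've been feeling really anxious lately about my job.",
    "output": "(counsellor reply goes here)",
}
result = to_messages(record)
for message in result["messages"]:
    print(message["role"], "->", message["content"])
```

A function of this shape can be passed to `Dataset.map`, after which the tokenizer's chat template turns each message list into a single training string.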


## 5. Train Model

Create the trainer and run training, following the alignment-handbook training loop.

In [18]:
# Adaptation for Colab: the Tesla T4 does not support bfloat16, so train in fp16
config.training.bf16 = False
config.training.fp16 = True

training_args = create_training_args(
    output_dir=config.training.output_dir,
    learning_rate=config.training.learning_rate,
    per_device_train_batch_size=config.training.per_device_train_batch_size,
    gradient_accumulation_steps=config.training.gradient_accumulation_steps,
    num_train_epochs=config.training.num_train_epochs,
    max_seq_length=config.model.max_seq_length,
    eval_strategy=config.training.eval_strategy,
    eval_steps=config.training.eval_steps,
    save_steps=config.training.save_steps,
    logging_steps=config.training.logging_steps,
    warmup_ratio=config.training.warmup_ratio,
    weight_decay=config.training.weight_decay,
    lr_scheduler_type=config.training.lr_scheduler_type,
    optim=config.training.optim,
    bf16=config.training.bf16,
    gradient_checkpointing=config.training.gradient_checkpointing,
    save_total_limit=config.training.save_total_limit,
    seed=config.training.seed,
    report_to=config.training.report_to,
)

trainer = create_trainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    eval_dataset=dataset.get("test"),
    training_args=training_args,
)

print("Trainer created successfully!")

🦥 Unsloth: Padding-free auto-enabled, enabling faster training.
Trainer created successfully!
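The precision override in the cell above is hard-coded for the T4. A small sketch of choosing the flags from the GPU's compute capability instead is shown below; `pick_precision` is a hypothetical helper, and it relies on the rule of thumb that native bfloat16 support starts at compute capability 8.0 (Ampere). With PyTorch available, `torch.cuda.is_bf16_supported()` answers the same question directly. Note also that the cell sets `config.training.fp16` but passes only `bf16` to `create_training_args`; whether fp16 is picked up depends on that helper, which is not shown here.

```python
def pick_precision(compute_capability: tuple) -> dict:
    """Return mixed-precision flags for a CUDA compute capability (major, minor)."""
    supports_bf16 = compute_capability >= (8, 0)  # Ampere or newer
    return {"bf16": supports_bf16, "fp16": not supports_bf16}


print(pick_precision((7, 5)))  # Tesla T4, as reported in the Unsloth banner
print(pick_precision((8, 0)))  # for example, an A100
```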


In [19]:
# Run Training
print("Starting training...")
train_result = trainer.train()

# Log final metrics
print("\nTraining Complete")
print(f"Final train loss: {train_result.training_loss:.4f}")

# Evaluate if test set exists
if dataset.get("test") is not None:
    eval_metrics = trainer.evaluate()
    print(f"Eval loss: {eval_metrics['eval_loss']:.4f}")

The model is already on multiple devices. Skipping the move to device specified in `args`.


Starting training...


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,000 | Num Epochs = 1 | Total steps = 63
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 4 x 1) = 16
 "-____-"     Trainable parameters = 41,943,040 of 8,072,204,288 (0.52% trained)
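The step count in the banner follows from the batch settings. The arithmetic below is a sketch that assumes the final partial batch still counts as one optimizer step; the trainer's exact rounding may differ between library versions.

```python
import math

num_examples = 1000     # training subset size
per_device_batch = 4    # per_device_train_batch_size
grad_accum = 4          # gradient_accumulation_steps
num_gpus = 1
epochs = 1

# Number of examples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum * num_gpus
# Round up so the last, smaller batch is included.
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs

print(f"Effective batch size: {effective_batch}")
print(f"Total optimizer steps: {total_steps}")
```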


| Epoch | Training Loss | Validation Loss |
|---|---|---|
| 1 | 1.0882 | 1.088706 |


Unsloth: Not an error, but LlamaForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient


Run history:

| Metric | History |
|---|---|
| eval/loss | ▁ |
| eval/runtime | ▁ |
| eval/samples_per_second | ▁ |
| eval/steps_per_second | ▁ |
| train/epoch | ▁▂▂▃▃▄▅▅▆▆▇███ |
| train/global_step | ▁▂▂▃▃▄▅▅▆▆▇███ |
| train/grad_norm | █▃▃▄▂▂▂▁▁▁▁▁ |
| train/learning_rate | ▅██▇▇▆▅▄▃▂▁▁ |
| train/loss | █▇▆▅▃▃▂▂▁▁▁▁ |

Run summary:

| Metric | Value |
|---|---|
| eval/loss | 1.08871 |
| eval/runtime | 109.0594 |
| eval/samples_per_second | 1.834 |
| eval/steps_per_second | 0.458 |
| total_flos | 2.527556181017395e+16 |
| train/epoch | 1 |
| train/global_step | 63 |
| train/grad_norm | 0.40256 |
| train/learning_rate | 0.0 |
| train/loss | 1.0882 |



Training Complete
Final train loss: 1.3145


Error: You must call wandb.init() before wandb.log()
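One plausible reading of this error is that the W&B run had already been closed by the time `trainer.evaluate()` tried to log, since the run summary is printed before the evaluation output. A hedged sketch of a guard follows: it starts a run only when none is active. `ensure_active_run` is a hypothetical helper; it relies on `wandb.run` being `None` when no run is active. A stand-in object replaces the real module so the sketch runs without a W&B account.

```python
from types import SimpleNamespace


def ensure_active_run(wandb_module, **init_kwargs):
    """Return the active W&B run, starting a new one only if none is active."""
    if wandb_module.run is None:
        return wandb_module.init(**init_kwargs)
    return wandb_module.run


# Stand-in for the wandb module, for illustration only. In the notebook, pass
# the real module: ensure_active_run(wandb, project="pilot_model0_sft")
fake_wandb = SimpleNamespace(
    run=None,
    init=lambda **kwargs: f"new run for {kwargs['project']}",
)
print(ensure_active_run(fake_wandb, project="pilot_model0_sft"))
```

Calling such a guard before `trainer.evaluate()` would log the final evaluation to a fresh run. If the metrics should stay in the original run, the alternative is to make sure nothing finishes that run before evaluation completes.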

## 6. Save Model

In [20]:
# Save Model and Tokenizer
OUTPUT_DIR = config.training.output_dir

trainer.save_model(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)

print(f"Model saved to {OUTPUT_DIR}")

# Finish WandB run
wandb.finish()

Model saved to data/llama-3.1-8b-instruct-sft-pilot


## 7. Quick Inference Test

Test the trained model with a sample prompt.

In [33]:
from unsloth import FastLanguageModel

FastLanguageModel.for_inference(model)

# Test prompt
test_input = "I've been feeling really anxious lately about my job. I keep thinking I'm going to get fired even though there's no evidence of that."
system_prompt = "You are a helpful mental health counselling assistant, please answer the mental health questions based on the patient's description.  The assistant gives helpful, comprehensive, and appropriate answers to the user's questions."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": test_input}
]

# Apply chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Generate response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )

response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(f"\nUser Input: {test_input}")
print(f"\nModel Response: {response}")


User Input: I've been feeling really anxious lately about my job. I keep thinking I'm going to get fired even though there's no evidence of that.

Model Response: It sounds like you're experiencing a lot of uncertainty and worry about your job security. This can be a really stressful and overwhelming feeling, especially when there's no concrete reason to believe that you're in danger of being fired. It's normal to have some level of concern about your job, but when it starts to interfere with your daily life and causes you significant distress, it may be worth exploring further.

One possible explanation for your anxiety could be the fear of the unknown. When we're uncertain about something, our minds can start to create worst-case scenarios, making us feel more anxious. In this case, your mind might be creating scenarios where you get fired, which can lead to feelings of panic and anxiety.

Another possibility is that you might be experiencing some underlying issues related to your j