### Unsloth Fine-tuning Configuration
This code sets up a fine-tuning pipeline for a language model using the Unsloth library, which optimizes transformer models for faster training and inference. The model will be trained on conversation data to generate responses in different tones.
Configuration Parameters

MODEL_NAME = "unsloth/zephyr-sft-bnb-4bit"

OUTPUT_DIR = "./results_optimized_all_tones_dynamic" 

MAX_SEQ_LENGTH = 1024

LoRA Parameters

Rank (r): 16
Alpha: 16
Dropout: 0
Target Modules: Attention layers and MLP components

Training Parameters

Batch Size: 2 per device

Gradient Accumulation: 4 steps

Learning Rate: 2e-4

Epochs: 3

Warmup Steps: 10

Optimizer: AdamW (8-bit)

Weight Decay: 0.01

Scheduler: Linear

Random Seed: 3407
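
To see how these settings interact, here is a minimal sketch in plain Python. It assumes a single GPU and the 10,000-example training subset selected later in the notebook. The helper names `total_optimizer_steps` and `linear_lr` are hypothetical, and the schedule follows the usual linear-warmup-then-linear-decay shape rather than calling the library's scheduler.

```python
import math

def total_optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_gpus=1):
    """Approximate number of optimizer updates for a full run."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * epochs

def linear_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

total = total_optimizer_steps(10_000, per_device_batch=2, grad_accum=4, epochs=3)
print(total)
for step in (0, 5, 10, total // 2, total):
    print(step, linear_lr(step, 2e-4, warmup_steps=10, total_steps=total))
```

With a per-device batch of 2 and 4 accumulation steps, the effective batch size is 8, which gives 1,250 updates per epoch and 3,750 in total, the same total the training log reports.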

Response Types
The model will be trained to generate content in four distinct tones:

Mixed (general)
Love
Philosophical
Poetic

Evaluation Strategy

Evaluation every 100 steps

Model checkpoint saving every 100 steps

Comparison sampling with 5 examples

Target comparison tone: poetic

In [None]:
import torch
from transformers import DataCollatorForLanguageModeling
from unsloth import FastLanguageModel
from datasets import load_dataset, Dataset, concatenate_datasets
from transformers import TrainingArguments, EvalPrediction
from peft import PeftModel
from trl import SFTTrainer

from unsloth import is_bfloat16_supported
import os
import math
import json
import gc 

MODEL_NAME = "unsloth/zephyr-sft-bnb-4bit"
OUTPUT_DIR = "./results_optimized_all_tones_dynamic" 
DATA_TRAIN_PATH = "E:/papers-i-implement/poet/data/conversation_data.json"
DATA_TEST_PATH = "E:/papers-i-implement/poet/data/conversation_data_test.json"
MAX_SEQ_LENGTH = 1024
LORA_R = 16
LORA_ALPHA = 16
LORA_DROPOUT = 0
LORA_TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
RESPONSE_TYPES = ["mixed", "love", "philosophical", "poetic"]

PER_DEVICE_TRAIN_BATCH_SIZE = 2
PER_DEVICE_EVAL_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4
LEARNING_RATE = 2e-4
NUM_TRAIN_EPOCHS = 3
WARMUP_STEPS = 10
LOGGING_STEPS = 50
EVAL_STEPS = 100
SAVE_STEPS = 100
OPTIMIZER = "adamw_8bit"
WEIGHT_DECAY = 0.01
LR_SCHEDULER_TYPE = "linear"
SEED = 3407
EVALUATION_STRATEGY = "steps"

NUM_COMPARISON_SAMPLES = 5
COMPARISON_OUTPUT_FILE = "model_comparison_outputs_dynamic.json"
TARGET_COMPARISON_TONE = "poetic"


Please restructure your imports with 'import unsloth' at the top of your file.
  from unsloth import FastLanguageModel # Using Unsloth's class


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.

🦥 Unsloth Zoo will now patch everything to make training faster!


### Model and Tokenizer Initialization
This section handles loading the base model and tokenizer, setting up special tokens, and configuring the processing environment for data preparation.
Processing Configuration

NUM_PROC = 1  # Sequential processing

Model Loading
The code initializes the pre-trained Zephyr model using Unsloth's optimization framework:

Uses 4-bit quantization for memory efficiency

Configures for maximum sequence length of 1024 tokens

Automatically maps model across available devices

Special Token Setup
The code includes safeguards to ensure essential special tokens are properly defined:

EOS Token Check:

Adds <|endoftext|> if no end-of-sequence token exists

Critical for proper sequence termination


PAD Token Configuration:

Falls back to using EOS token for padding if no pad token exists

Updates model configuration to recognize the pad token ID



This initialization process ensures the model is properly configured before fine-tuning begins, with appropriate tokenization boundaries and memory-efficient loading.

In [None]:

# Data Loading Config
NUM_PROC = 1
print(f"Using {NUM_PROC} processes for data loading/processing (forced sequential).")

# --- Load Base Model and Tokenizer ---
print(f"Loading base model: {MODEL_NAME}")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=MAX_SEQ_LENGTH,
    dtype=None, 
    load_in_4bit=True,
    device_map="auto" 
)
print("Base model and tokenizer loaded.")

# --- Set EOS and PAD Tokens ---
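if tokenizer.eos_token is None:
    print("Warning: EOS token not found, adding '<|endoftext|>' as EOS token.")
    tokenizer.add_special_tokens({'eos_token': '<|endoftext|>'})
    # A newly added token needs a matching row in the embedding matrix.
    model.resize_token_embeddings(len(tokenizer))
EOS_TOKEN = tokenizer.eos_token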

if tokenizer.pad_token is None:
    print("Warning: PAD token not found. Setting PAD token to EOS token.")
    tokenizer.pad_token = tokenizer.eos_token
    if hasattr(model, 'config') and hasattr(model.config, 'pad_token_id'):
        model.config.pad_token_id = tokenizer.pad_token_id
print(f"EOS token: {EOS_TOKEN}, PAD token: {tokenizer.pad_token}")



Using 1 processes for data loading/processing (forced sequential).
Loading base model: unsloth/zephyr-sft-bnb-4bit


  GPU_BUFFERS = tuple([torch.empty(2*256*2048, dtype = dtype, device = f"cuda:{i}") for i in range(n_gpus)])


==((====))==  Unsloth 2025.3.19: Fast Mistral patching. Transformers: 4.50.3.
   \\   /|    NVIDIA GeForce RTX 3050 6GB Laptop GPU. Num GPUs = 1. Max memory: 6.0 GB. Platform: Windows.
O^O/ \_/ \    Torch: 2.6.0+cu126. CUDA: 8.6. CUDA Toolkit: 12.6. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Base model and tokenizer loaded.
EOS token: </s>, PAD token: <unk>


### Baseline Evaluation Process
This section handles the selection of test samples for model comparison and generates baseline outputs using the pre-fine-tuned model.
Comparison Sample Selection

NUM_COMPARISON_SAMPLES = 5

TARGET_COMPARISON_TONE = "poetic"

The code loads a subset of test data to create a comparison benchmark:

Extracts prompts and reference responses from the test dataset

Takes the first 5 test records that contain an input and stores each one's "poetic" response as the reference output (None if that tone is missing)

Implements error handling for robust data loading

Baseline Generation
Before fine-tuning begins, the code captures how the base model responds to the selected prompts:

Prompt Formatting:

Structures each prompt with explicit tone instructions
Format: USER: {input text} TONE:poetic ASSISTANT:


Generation Parameters:

Maximum of 256 new tokens per response

Uses sampling (do_sample=True) with top-k=50 and top-p=0.9; temperature is left at its default

Properly handles EOS and padding tokens
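
To illustrate what the top-k and top-p settings do, the sketch below filters a toy next-token distribution in plain Python. It is a simplified stand-in that works on a list of probabilities, not the library's implementation (which operates on logit tensors), and `top_k_top_p_filter` is a hypothetical helper.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.9):
    """Return {token_id: renormalized probability} for tokens that survive both filters."""
    # Top-k: keep the k most probable tokens and renormalize.
    ranked = sorted(enumerate(probs), key=lambda item: item[1], reverse=True)[:top_k]
    mass = sum(p for _, p in ranked)
    ranked = [(token_id, p / mass) for token_id, p in ranked]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break

    mass = sum(p for _, p in kept)
    return {token_id: p / mass for token_id, p in kept}

# Toy next-token distribution over a five-token vocabulary (made-up values).
toy_probs = [0.5, 0.3, 0.15, 0.04, 0.01]
filtered = top_k_top_p_filter(toy_probs, top_k=50, top_p=0.9)
print(filtered)
```

Sampling then draws only from the surviving tokens, so the low-probability tail cannot be selected.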


Memory Management:

Implements explicit tensor cleanup within the generation loop

Forces garbage collection and CUDA cache clearing

Designed to prevent memory issues on limited hardware



This baseline will serve as a comparison point to measure improvement after fine-tuning, specifically focusing on the model's ability to generate poetic responses.
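
The prompt layout matters because generation has to use the same template as the training text. A small helper makes that explicit. `build_prompt` is a hypothetical name, but the string it returns is the one built inline in the generation loop, including the space before `TONE:` and after the tone name. The example question is made up.

```python
def build_prompt(user_input: str, tone: str) -> str:
    """Return the inference prompt in the notebook's USER/TONE/ASSISTANT layout."""
    return f"USER: {user_input}\n TONE:{tone} \nASSISTANT: "

prompt = build_prompt("What is the moon?", "poetic")
print(repr(prompt))
```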

In [None]:

# --- Load Comparison Prompts (CPU) ---
print(f"Loading raw test data to select {NUM_COMPARISON_SAMPLES} comparison samples...")
comparison_prompts_data = []
comparison_prompts = []
try:
    comparison_raw_data = load_dataset("json", data_files=DATA_TEST_PATH, split=f"train[:{NUM_COMPARISON_SAMPLES * 2}]")
    for example in comparison_raw_data:
        if 'conversation' in example and 'input' in example['conversation']:
            comparison_prompts_data.append({
                "input": example['conversation']['input'],
                "reference_output": example['conversation'].get('responses', {}).get(TARGET_COMPARISON_TONE, None)
            })
        if len(comparison_prompts_data) >= NUM_COMPARISON_SAMPLES:
            break
    comparison_prompts = [item['input'] for item in comparison_prompts_data]
    print(f"Selected {len(comparison_prompts)} prompts for comparison.")
except Exception as e:
    print(f"Error loading test data for comparison samples: {e}")


# --- Baseline Model Generation ---
# This uses the currently loaded base model
print("\n--- Generating Baseline Outputs (Before Fine-tuning) ---")
baseline_outputs = []
if comparison_prompts:
    model.eval()
    with torch.no_grad():
        for i, prompt_text in enumerate(comparison_prompts):
            formatted_prompt = f"USER: {prompt_text}\n TONE:{TARGET_COMPARISON_TONE} \nASSISTANT: "
            # Use tokenizer directly for potentially better performance with FastTokenizer
            inputs = tokenizer(formatted_prompt, return_tensors="pt", truncation=True, max_length=MAX_SEQ_LENGTH // 2).to(model.device) # Use model's current device

            outputs = model.generate(
                **inputs, max_new_tokens=256, eos_token_id=tokenizer.eos_token_id,
                pad_token_id=tokenizer.pad_token_id, do_sample=True, top_k=50, top_p=0.9
            )
            output_text = tokenizer.decode(outputs[0, inputs['input_ids'].shape[1]:], skip_special_tokens=True)
            baseline_outputs.append(output_text.strip())
            print(f"Baseline Sample {i+1}/{len(comparison_prompts)} Generated.")
            # Clean up tensors explicitly inside loop for safety on low VRAM
            del inputs, outputs
    gc.collect()
    torch.cuda.empty_cache()
else:
    print("Skipping baseline generation as no comparison prompts were loaded.")
print("--- Baseline Generation Complete ---")



Loading raw test data to select 5 comparison samples...
Selected 5 prompts for comparison.

--- Generating Baseline Outputs (Before Fine-tuning) ---
Baseline Sample 1/5 Generated.
Baseline Sample 2/5 Generated.
Baseline Sample 3/5 Generated.
Baseline Sample 4/5 Generated.
Baseline Sample 5/5 Generated.
--- Baseline Generation Complete ---


### Model Architecture and Data Processing
This section configures the model for Parameter-Efficient Fine-Tuning (PEFT) and prepares the training datasets with multi-tone formatting.

LoRA Configuration

LORA_R = 16

LORA_ALPHA = 16

LORA_DROPOUT = 0

LORA_TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]

The code applies Low-Rank Adaptation (LoRA) to the base model:

Targets key projection matrices in both attention and MLP components
Uses Unsloth's optimized gradient checkpointing for memory efficiency
Maintains reproducibility with fixed random seed
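
As a sanity check on what this configuration adds, the sketch below counts LoRA parameters by hand. Each adapted linear layer of shape (in, out) gains two low-rank matrices with r × (in + out) parameters in total. The layer dimensions are assumptions taken from the public Mistral-7B configuration (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers), which is the architecture Unsloth reports patching; they are not read from the loaded model.

```python
LORA_R = 16
LORA_ALPHA = 16

# Assumed Mistral-7B shapes (in_features, out_features) for each target module.
HIDDEN, KV_DIM, MLP, NUM_LAYERS = 4096, 1024, 14336, 32
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_params(in_features, out_features, r):
    # A is (r x in_features), B is (out_features x r).
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, LORA_R) for i, o in MODULE_SHAPES.values())
total_trainable = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / LORA_R  # factor applied to the low-rank update

print(f"{total_trainable:,} trainable LoRA parameters, scaling = {scaling}")
```

Under these assumptions the count comes to 41,943,040, which matches the trainable-parameter figure in the training log. With alpha equal to the rank, the low-rank update is applied with a scaling factor of 1.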

Dataset Preparation
The code processes conversation data into instruction-following format across multiple response tones:

Format Structure:

USER: {input}

TONE:{response_type}

ASSISTANT: {response}{eos_token}

Multi-Tone Processing:

Creates a separate dataset for each tone (mixed, love, philosophical, poetic)
Within each tone's dataset, drops examples that have no response for that tone
Concatenates the four datasets and shuffles the result with the fixed seed


Memory Management:

Explicitly removes intermediate processing objects
Forces garbage collection to free memory



This approach enables the model to learn different response styles based on the explicit tone instruction, creating a dynamic response capability controlled through the prompt.
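
To make the multi-tone expansion concrete, the sketch below applies the same template to a single record using plain Python lists instead of `datasets`. The sample record is made up, the `"</s>"` end-of-sequence string is an assumption (it matches the EOS token printed when the tokenizer was loaded), and `expand_record` is a hypothetical helper.

```python
RESPONSE_TYPES = ["mixed", "love", "philosophical", "poetic"]
EOS_TOKEN = "</s>"  # assumed; the notebook reads this from the tokenizer

def expand_record(record, response_types=RESPONSE_TYPES, eos_token=EOS_TOKEN):
    """Turn one conversation record into one training text per available tone."""
    conv = record.get("conversation") or {}
    responses = conv.get("responses") or {}
    if "input" not in conv:
        return []
    texts = []
    for tone in response_types:
        if tone not in responses:
            continue  # tones without a response are skipped, as in the filter step
        texts.append(
            f"USER: {conv['input']}\n TONE:{tone} \nASSISTANT: {responses[tone]}{eos_token}"
        )
    return texts

# Made-up record with only two of the four tones present.
sample = {
    "conversation": {
        "input": "Tell me about rain.",
        "responses": {
            "mixed": "Rain is water falling from clouds.",
            "poetic": "The sky lets go of its silver threads.",
        },
    }
}
texts = expand_record(sample)
for text in texts:
    print(repr(text))
```

One source record therefore contributes up to four training examples, one for each tone it has a response for.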

In [None]:

# --- Apply PEFT with LoRA ---
# Applies LoRA adapters to the existing 'model' object
print("\nApplying PEFT (LoRA)...")
model = FastLanguageModel.get_peft_model(
    model, 
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    lora_dropout=LORA_DROPOUT,
    target_modules=LORA_TARGET_MODULES,
    bias="none",
    use_gradient_checkpointing="unsloth", 
    random_state=SEED,
)
print("PEFT applied. Model is now PEFT-enabled.")

# --- Load and Preprocess Full Datasets (CPU) ---
print("\nLoading and preprocessing full datasets...")
# --- Define Preprocessing Function ---
def format_prompt(example, response_type="mixed"):
    conv = example.get('conversation')
    if (not conv or 'input' not in conv or 'responses' not in conv
            or response_type not in conv['responses']):
        return {"text": None}
    user_input = conv['input']
    response = conv['responses'][response_type]
    formatted_text = f"USER: {user_input}\n TONE:{response_type} \nASSISTANT: {response}{tokenizer.eos_token}"
    return {"text": formatted_text}

# --- Preprocess and Combine Datasets ---
all_train_datasets = []
all_eval_datasets = []
raw_train_data = load_dataset("json", data_files=DATA_TRAIN_PATH, split="train")
raw_test_data = load_dataset("json", data_files=DATA_TEST_PATH, split="train")
for r_type in RESPONSE_TYPES:
    # Process Training Data
    processed_train = raw_train_data.map(lambda x: format_prompt(x, r_type), num_proc=NUM_PROC, remove_columns=list(raw_train_data.features)).filter(lambda x: x['text'] is not None)
    all_train_datasets.append(processed_train)
    # Process Evaluation Data
    processed_eval = raw_test_data.map(lambda x: format_prompt(x, r_type), num_proc=NUM_PROC, remove_columns=list(raw_test_data.features)).filter(lambda x: x['text'] is not None)
    all_eval_datasets.append(processed_eval)
train_dataset = concatenate_datasets(all_train_datasets).shuffle(seed=SEED)
eval_dataset = concatenate_datasets(all_eval_datasets).shuffle(seed=SEED) # Shuffle eval too
print(f"Total Train dataset size: {len(train_dataset)}")
print(f"Total Eval dataset size: {len(eval_dataset)}")
del raw_train_data, raw_test_data, all_train_datasets, all_eval_datasets, processed_train, processed_eval # Clean up raw data objects
gc.collect()




Applying PEFT (LoRA)...


Unsloth 2025.3.19 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.


PEFT applied. Model is now PEFT-enabled.

Loading and preprocessing full datasets...
Total Train dataset size: 92008
Total Eval dataset size: 10400


82

### Dataset Sampling and Validation
This section implements a controlled dataset size reduction for faster experimentation and performs data quality validation.
Dataset Size Control

NUM_TRAIN_SAMPLES = 10000

NUM_TEST_SAMPLES = 3000

The code deliberately limits the dataset size for rapid prototyping:

Reduces training set to 10,000 examples
Limits evaluation set to 3,000 examples

Includes safeguards for datasets smaller than the requested size

Data Quality Verification
Critical validation steps ensure dataset integrity before training:

Empty String Filtering:

Removes any examples with null or empty text fields
Reports dataset sizes before and after filtering
Verifies datasets remain non-empty after cleaning


Failure Protection:

Includes explicit program termination if datasets become empty
Suggests troubleshooting the data source or formatting function



This controlled dataset reduction enables faster development cycles while maintaining enough examples to validate the model's ability to generate responses across different tones. The explicit verification steps prevent training failures due to data quality issues.

In [None]:

NUM_TRAIN_SAMPLES = 10000
NUM_TEST_SAMPLES = 3000
print(f"\n---!!! WARNING: Selecting only {NUM_TRAIN_SAMPLES} TRAIN and {NUM_TEST_SAMPLES} EVAL samples for quick testing! !!!---")

if len(train_dataset) >= NUM_TRAIN_SAMPLES:
    # Use .select() to get the first N samples (the dataset was already shuffled above)
    train_dataset = train_dataset.select(range(NUM_TRAIN_SAMPLES))
else:
    print(f"Warning: Train dataset has fewer than {NUM_TRAIN_SAMPLES} samples. Using all {len(train_dataset)} samples.")

if len(eval_dataset) >= NUM_TEST_SAMPLES:
    eval_dataset = eval_dataset.select(range(NUM_TEST_SAMPLES))
else:
    print(f"Warning: Eval dataset has fewer than {NUM_TEST_SAMPLES} samples. Using all {len(eval_dataset)} samples.")

print(f"Using {len(train_dataset)} train samples for testing.")
print(f"Using {len(eval_dataset)} eval samples for testing.\n")

print(f"Dataset size before filtering empty strings: {len(train_dataset)}")
train_dataset = train_dataset.filter(lambda example: example.get('text') is not None and len(example['text']) > 0)
print(f"Dataset size after filtering empty strings: {len(train_dataset)}")
# Repeat for eval_dataset 
print(f"Eval Dataset size before filtering empty strings: {len(eval_dataset)}")
eval_dataset = eval_dataset.filter(lambda example: example.get('text') is not None and len(example['text']) > 0)
print(f"Eval Dataset size after filtering empty strings: {len(eval_dataset)}")


# Ensure datasets are not empty after filtering
if len(train_dataset) == 0 or len(eval_dataset) == 0:
    # Stop the cell with an exception rather than exit(), which can shut down the notebook kernel.
    raise SystemExit("Error: Dataset became empty after filtering empty strings. Check data source or format_prompt.")



Using 10000 train samples for testing.
Using 3000 eval samples for testing.

Dataset size before filtering empty strings: 10000
Dataset size after filtering empty strings: 10000
Eval Dataset size before filtering empty strings: 3000
Eval Dataset size after filtering empty strings: 3000


### Training Configuration
This section sets up the evaluation metrics and training parameters for the supervised fine-tuning process.
Evaluation Metrics
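def compute_metrics(p: EvalPrediction):
    # Calculates perplexity from prediction logits and label IDs

The code defines a custom evaluation function. It is not passed to SFTTrainer in this notebook (the trainer call has no compute_metrics argument), so the recorded run reports only training and validation loss. The function: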

Computes cross-entropy loss on shifted prediction sequences
Converts loss to perplexity (exp(loss)) for model quality assessment
Includes explicit tensor cleanup to prevent memory leaks
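
The relationship between loss and perplexity is simple enough to check by hand. The sketch below, in plain Python rather than PyTorch, computes the mean negative log-likelihood over the token positions that are not masked with -100 and exponentiates it. The per-token probabilities are made-up values, and `perplexity_from_token_probs` is a hypothetical helper, not the notebook's `compute_metrics`.

```python
import math

IGNORE_INDEX = -100

def perplexity_from_token_probs(target_probs, labels):
    """exp(mean negative log-likelihood) over positions whose label is not ignored."""
    nll = [-math.log(p) for p, y in zip(target_probs, labels) if y != IGNORE_INDEX]
    if not nll:
        raise ValueError("no unmasked tokens to score")
    return math.exp(sum(nll) / len(nll))

# Probability the model assigned to each correct next token (illustrative values).
probs = [0.25, 0.25, 0.9, 0.25]
labels = [17, 42, IGNORE_INDEX, 7]  # the third position is masked out

print(perplexity_from_token_probs(probs, labels))

# The same conversion applied to the first validation loss in the training log.
print(math.exp(1.356879))
```

A probability of 0.25 on every scored token gives a perplexity of 4: the model is as uncertain as a uniform four-way guess.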

Training Setup
The training configuration balances efficiency with performance:

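Training Parameters:

3 epochs with linear learning rate schedule

8-bit AdamW optimizer with weight decay

Evaluation every 100 steps

Checkpoint saving every 100 steps, on the same schedule as evaluation

Batch size, gradient accumulation, warmup and logging interval fall back to the Trainer defaults, because the corresponding constants defined earlier (PER_DEVICE_TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, WARMUP_STEPS, LOGGING_STEPS) are not passed to TrainingArguments; the training log accordingly shows a per-device batch size of 8 with 1 accumulation step


Optimization Features:

Gradient checkpointing to reduce memory usage

Keeps at most 10 checkpoints on disk (save_total_limit=10)

Loads the best model at the end, chosen by lowest evaluation loss (the default metric when metric_for_best_model is not set)


SFT Trainer Configuration:

Requests text packing (packing=True), which Unsloth disables at initialization, as the output below shows
Enforces sequential data processing (dataset_num_proc=1)
Handles the full train-evaluate-save workflow



This configuration enables efficient fine-tuning while tracking model quality through evaluation loss, allowing the training process to select the best-performing checkpoint.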

In [None]:


def compute_metrics(p: EvalPrediction):
    logits = torch.tensor(p.predictions)
    labels = torch.tensor(p.label_ids)
    shift_logits = logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    loss_fct = torch.nn.CrossEntropyLoss(ignore_index=-100)
    loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
    perplexity = math.exp(loss.item())
    # Clean up tensors used in calculation
    del logits, labels, shift_logits, shift_labels, loss
    return {"perplexity": perplexity}

print("Defining training arguments...")
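training_args = TrainingArguments(
    output_dir=OUTPUT_DIR,
    num_train_epochs=NUM_TRAIN_EPOCHS,
    learning_rate=LEARNING_RATE,
    optim=OPTIMIZER,
    weight_decay=WEIGHT_DECAY,
    seed=SEED,
    save_strategy=EVALUATION_STRATEGY,
    eval_strategy=EVALUATION_STRATEGY,
    eval_steps=EVAL_STEPS,
    save_steps=SAVE_STEPS,
    load_best_model_at_end=True,  # best = lowest eval loss (default metric)
    greater_is_better=False,
    gradient_checkpointing=True,
    save_total_limit=10,  # cap on checkpoints kept on disk
    # Batch size, gradient accumulation, warmup and logging steps are not set
    # here, so the Trainer defaults apply rather than the constants defined earlier.
)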

print("Initializing SFTTrainer...")
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field="text",
    max_seq_length=MAX_SEQ_LENGTH,
    dataset_num_proc=1,
    packing=True, 
    args=training_args,
)


print("SFTTrainer initialized.")

Initializing SFTTrainer...
Unsloth: Hugging Face's packing is currently buggy - we're disabling it for now!


Unsloth: Tokenizing ["text"]:   0%|          | 0/3000 [00:00<?, ? examples/s]

Unsloth: Hugging Face's packing is currently buggy - we're disabling it for now!
SFTTrainer initialized.


### Model Training Process
This section executes the fine-tuning process and handles model saving, with appropriate error handling and memory management.
Training Execution
trainer_stats = trainer.train()

The code runs the training loop with several important safeguards:

Memory Preparation:

Forces garbage collection before training begins

Clears CUDA cache to maximize available GPU memory

Wraps training in try-except to handle potential failures


Model Preservation:

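Saves the model adapters and tokenizer configuration (because load_best_model_at_end=True reloads the best checkpoint after training, these are the best checkpoint's weights)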

Creates a separate directory for the final model state

Reports training statistics upon completion



Best Model Tracking
if training_args.load_best_model_at_end:
    best_model_checkpoint_path = trainer.state.best_model_checkpoint

The code implements a checkpoint management strategy:

Records the path to the best-performing checkpoint based on evaluation metrics
Issues a warning if the best model tracking is disabled
Ensures the optimal model can be loaded for inference after training
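
When no metric is named, the Trainer ranks checkpoints by evaluation loss, lower being better. The selection rule can be sketched over records shaped like the entries of `trainer.state.log_history`. The values below are copied from the first ten evaluation rows of the training log, and `best_eval_step` is a hypothetical helper.

```python
def best_eval_step(log_history, metric="eval_loss", greater_is_better=False):
    """Return the log entry with the best value of `metric`, or None if absent."""
    scored = [entry for entry in log_history if metric in entry]
    if not scored:
        return None
    pick = max if greater_is_better else min
    return pick(scored, key=lambda entry: entry[metric])

# First ten evaluation rows from the log (steps 100 to 1000).
eval_losses = [1.356879, 1.258892, 1.244021, 1.203526, 1.189502,
               1.194674, 1.16252, 1.165029, 1.147934, 1.148425]
log_history = [{"step": 100 * (i + 1), "eval_loss": loss}
               for i, loss in enumerate(eval_losses)]

best = best_eval_step(log_history)
print(best)
```

Over these ten rows the lowest loss is at step 900. The full run continues to step 3,750, and the checkpoint it finally reports as best is checkpoint-1200, which lies beyond this excerpt.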

This training section represents the core computational work of the fine-tuning process, where the model adapters are adjusted to learn the tone-specific response patterns from the training data.

In [None]:

# --- Train the Model ---
print("\n--- Starting Training ---")
best_model_checkpoint_path = None 
try:
    gc.collect()
    torch.cuda.empty_cache() 
    trainer_stats = trainer.train()
    print("--- Training Complete ---")
    print(f"Training Stats: {trainer_stats}")
    print(f"Saving final LoRA adapters to {OUTPUT_DIR}/final_model")
    trainer.save_model(f"{OUTPUT_DIR}/final_model")
    tokenizer.save_pretrained(f"{OUTPUT_DIR}/final_model")
    print("Model adapters and tokenizer saved.")

    if training_args.load_best_model_at_end:
        best_model_checkpoint_path = trainer.state.best_model_checkpoint
        print(f"Best model checkpoint found at: {best_model_checkpoint_path}")
    else:
        print("Warning: load_best_model_at_end=False. Post-training generation will use the *last* model state.")
except Exception as e:
    print(f"An error occurred during training: {e}")



--- Starting Training ---


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 10,000 | Num Epochs = 3 | Total steps = 3,750
O^O/ \_/ \    Batch size per device = 8 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (8 x 1 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040/4,000,000,000 (1.05% trained)
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: ankitdev (ankitdev-chandigarh-university) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


Step,Training Loss,Validation Loss
100,No log,1.356879
200,No log,1.258892
300,No log,1.244021
400,No log,1.203526
500,1.088400,1.189502
600,1.088400,1.194674
700,1.088400,1.16252
800,1.088400,1.165029
900,1.088400,1.147934
1000,0.965300,1.148425


Unsloth: Will smartly offload gradients to save VRAM!
--- Training Complete ---
Training Stats: TrainOutput(global_step=3750, training_loss=0.6746180623372395, metrics={'train_runtime': 59663.7118, 'train_samples_per_second': 0.503, 'train_steps_per_second': 0.063, 'total_flos': 2.4909770123083776e+17, 'train_loss': 0.6746180623372395})
Saving final LoRA adapters to ./results_optimized_all_tones_dynamic/final_model
Model adapters and tokenizer saved.
Best model checkpoint found at: ./results_optimized_all_tones_dynamic\checkpoint-1200
