# Milestone 2: LLM Fine-tuning with LoRA

This notebook demonstrates:
1. Setting up parameter-efficient fine-tuning with LoRA (the QLoRA 4-bit quantization utilities are imported, but the model in this run is loaded in fp16)
2. Configuring LoRA parameters
3. Fine-tuning the Qwen model on the Alpaca subset
4. Evaluating the fine-tuned model
5. Saving adapter weights

## 1. Environment Setup and Library Installation

In [31]:
# Install required libraries for efficient fine-tuning
!pip install -q transformers datasets torch accelerate
!pip install -q peft bitsandbytes  # For LoRA and quantization
!pip install -q trl  # For training utilities
!pip install -q scipy  # Additional dependency

## 2. Import Libraries

In [32]:
import torch
from datasets import load_from_disk
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling
)
from peft import (
    LoraConfig,
    get_peft_model,
    prepare_model_for_kbit_training,
    PeftModel
)
import json
from datetime import datetime

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

PyTorch version: 2.9.0+cu126
CUDA available: True
CUDA device: NVIDIA A100-SXM4-40GB


## 3. Load Prepared Dataset

In [33]:
# Load the formatted dataset from Milestone 1
print("Loading formatted dataset...")
dataset = load_from_disk('formatted_subset_data')
print(f"✓ Loaded {len(dataset)} examples")
print(f"Dataset columns: {dataset.column_names}")

# Display a sample
print("\nSample formatted text:")
print("=" * 80)
print(dataset[0]['text'][:500] + "...")
print("=" * 80)

Loading formatted dataset...
✓ Loaded 100 examples
Dataset columns: ['output', 'input', 'instruction', 'text']

Sample formatted text:
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Give three tips for staying healthy.

### Response:
1. Eat a balanced and nutritious diet: Make sure your meals are inclusive of a variety of fruits and vegetables, lean protein, whole grains, and healthy fats. This helps to provide your body with the essential nutrients to function at its best and can help prevent chronic diseases.

2. Engage in regular physical activity: ...


## 4. Load Model

In [None]:
model_name = "Qwen/Qwen3-0.6B-Base"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

print("✓ Tokenizer loaded")

# Load model
device_map = "auto" if torch.cuda.is_available() else None

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map=device_map,
    trust_remote_code=True,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)

print("✓ Model loaded successfully")
print(f"  Device map: {device_map}")
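
The overview mentions QLoRA, and the import cell brings in `BitsAndBytesConfig` and `prepare_model_for_kbit_training`, but the cell above loads the weights in fp16 without quantization. If a 4-bit load is wanted, a minimal sketch might look like the following. The helper name `load_model_4bit` and the quantization settings (NF4, double quantization, fp16 compute) are assumptions chosen for illustration, not settings taken from this notebook, and calling it assumes a CUDA GPU with `bitsandbytes` installed.

```python
def load_model_4bit(model_name: str):
    """Hypothetical helper: load a causal LM in 4-bit for QLoRA-style training."""
    # Imports live inside the function so that defining it does not need the
    # GPU stack; calling it does.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import prepare_model_for_kbit_training

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,                     # store base weights in 4-bit
        bnb_4bit_quant_type="nf4",             # assumed: NormalFloat4 quantization
        bnb_4bit_use_double_quant=True,        # assumed: quantize the quantization constants too
        bnb_4bit_compute_dtype=torch.float16,  # assumed: run matmuls in fp16
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        device_map="auto",
    )
    # Freeze the base weights and apply peft's preparation for k-bit training
    # before any LoRA adapters are attached.
    return prepare_model_for_kbit_training(model)
```

The returned model would then be passed to `get_peft_model` in place of the fp16 model.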

## 5. LoRA Configuration

Set up LoRA (Low-Rank Adaptation) parameters for efficient fine-tuning.

In [None]:
# Configure LoRA
lora_config = LoraConfig(
    r=32,                          # LoRA rank - controls the dimensionality of the low-rank matrices
    lora_alpha=64,                 # LoRA scaling factor (typically 2*r)
    target_modules=[               # Target attention and MLP projection modules for Qwen
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_dropout=0.2,              # Dropout for LoRA layers
    bias="none",                   # Don't train bias parameters
    task_type="CAUSAL_LM",         # Task type: Causal Language Modeling
)

print("\nLoRA Configuration:")
print(f"  Rank (r): {lora_config.r}")
print(f"  Alpha: {lora_config.lora_alpha}")
print(f"  Target modules: {lora_config.target_modules}")
print(f"  Dropout: {lora_config.lora_dropout}")
print(f"  Task type: {lora_config.task_type}")


LoRA Configuration:
  Rank (r): 32
  Alpha: 64
  Target modules: {'o_proj', 'up_proj', 'q_proj', 'gate_proj', 'v_proj', 'k_proj', 'down_proj'}
  Dropout: 0.2
  Task type: CAUSAL_LM
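
As a reminder of what `r` and `lora_alpha` control, the standard LoRA formulation (stated here for reference, not derived in this notebook) keeps each targeted weight matrix $W_0$ frozen and learns a low-rank update:

$$
h = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad A \in \mathbb{R}^{r \times d_{\text{in}}},\quad B \in \mathbb{R}^{d_{\text{out}} \times r}
$$

With the values above, the scaling factor is $\alpha / r = 64 / 32 = 2$, and each adapted module adds $r\,(d_{\text{in}} + d_{\text{out}})$ trainable parameters.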


## 6. Apply LoRA to Model

In [37]:
# Apply LoRA configuration to model
model = get_peft_model(model, lora_config)

# Print trainable parameters
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())

print("\nModel Parameter Statistics:")
print(f"  Total parameters: {total_params:,} ({total_params/1e6:.2f}M)")
print(f"  Trainable parameters: {trainable_params:,} ({trainable_params/1e6:.2f}M)")
print(f"  Trainable %: {100 * trainable_params / total_params:.2f}%")
print("\n✓ LoRA adapters applied to model")


Model Parameter Statistics:
  Total parameters: 616,235,008 (616.24M)
  Trainable parameters: 20,185,088 (20.19M)
  Trainable %: 3.28%

✓ LoRA adapters applied to model
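
The trainable-parameter count reported above can be checked by hand, since each adapted linear layer contributes `r * (in_features + out_features)` parameters. The sketch below assumes the layer dimensions from the published Qwen3-0.6B configuration (hidden size 1024, 16 query heads and 8 key/value heads of dimension 128, MLP size 3072, 28 layers); these values are not printed anywhere in this notebook.

```python
# Assumed Qwen3-0.6B dimensions (from the published model config, not this notebook).
HIDDEN = 1024          # hidden size
Q_OUT = 16 * 128       # 16 attention heads x head_dim 128
KV_OUT = 8 * 128       # 8 key/value heads x head_dim 128
INTERMEDIATE = 3072    # MLP intermediate size
NUM_LAYERS = 28
RANK = 32              # r from lora_config

# (in_features, out_features) of each targeted projection in one decoder layer
target_shapes = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A has shape (rank, in_features) and B has shape (out_features, rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in target_shapes.values())
total = per_layer * NUM_LAYERS

print(f"LoRA parameters per layer: {per_layer:,}")  # 720,896
print(f"LoRA parameters in total:  {total:,}")      # 20,185,088
```

Under these assumptions the total matches the 20,185,088 trainable parameters in the output above.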


## 7. Prepare Dataset for Training

In [38]:
def tokenize_function(examples):
    """
    Tokenize the formatted text, truncating to 512 tokens and padding
    every example to that same fixed length.
    """
    return tokenizer(
        examples['text'],
        truncation=True,
        max_length=512,
        padding="max_length",
    )

# Tokenize the dataset
print("Tokenizing dataset...")
tokenized_dataset = dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=dataset.column_names,
    desc="Tokenizing"
)

print("✓ Dataset tokenized")
print(f"  Number of examples: {len(tokenized_dataset)}")
print(f"  Features: {tokenized_dataset.features}")

Tokenizing dataset...
✓ Dataset tokenized
  Number of examples: 100
  Features: {'input_ids': List(Value('int32')), 'attention_mask': List(Value('int8'))}


## 8. Training Configuration

In [39]:
# Define training arguments
output_dir = "./results"

training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=1,
    per_device_train_batch_size=4,         # Batch size per device
    gradient_accumulation_steps=4,         # Accumulate gradients over 4 steps
    learning_rate=2e-5,                    # Learning rate
    fp16=torch.cuda.is_available(),        # Use mixed precision if CUDA available
    logging_steps=5,                       # Log every 5 steps
    save_strategy="epoch",                 # Save checkpoint at end of epoch
    optim="paged_adamw_8bit" if torch.cuda.is_available() else "adamw_torch",
    warmup_steps=10,                       # Warmup steps (more than the ~7 optimizer steps in this run, so the peak LR is never reached)
    report_to="none",                      # Don't report to any service
    logging_dir="./logs",
)

print("Training Configuration:")
print(f"  Epochs: {training_args.num_train_epochs}")
print(f"  Batch size: {training_args.per_device_train_batch_size}")
print(f"  Gradient accumulation steps: {training_args.gradient_accumulation_steps}")
print(f"  Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
print(f"  Learning rate: {training_args.learning_rate}")
print(f"  FP16: {training_args.fp16}")
print(f"  Optimizer: {training_args.optim}")

Training Configuration:
  Epochs: 1
  Batch size: 4
  Gradient accumulation steps: 4
  Effective batch size: 16
  Learning rate: 2e-05
  FP16: True
  Optimizer: OptimizerNames.PAGED_ADAMW_8BIT
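
It helps to work out how many optimizer updates this configuration produces on the 100-example subset. The sketch below assumes a single GPU, the default linear learning-rate schedule, and that a trailing partial accumulation window still triggers an update (older `transformers` releases may round down instead). The helper `linear_warmup_lr` is a hypothetical stand-in for the warmup phase of the scheduler, not a library function.

```python
import math

num_examples = 100
per_device_batch_size = 4
grad_accum_steps = 4
warmup_steps = 10
peak_lr = 2e-5

batches_per_epoch = math.ceil(num_examples / per_device_batch_size)  # 25
# Assumption: the final, partial accumulation window still yields an update.
optimizer_steps = math.ceil(batches_per_epoch / grad_accum_steps)    # 7


def linear_warmup_lr(step: int, peak: float, warmup: int) -> float:
    """Learning rate applied at 0-indexed optimizer `step` during linear warmup."""
    return peak * min(1.0, step / max(1, warmup))


lrs = [linear_warmup_lr(s, peak_lr, warmup_steps) for s in range(optimizer_steps)]
highest_lr = max(lrs)

print(f"Batches per epoch: {batches_per_epoch}")
print(f"Optimizer steps:   {optimizer_steps}")
print(f"Highest LR used:   {highest_lr:.1e} (configured peak: {peak_lr:.1e})")
```

Under these assumptions the run ends while still in warmup, so the learning rate tops out at about 60% of the configured value. With `logging_steps=5`, only one loss value is logged.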


## 9. Initialize Trainer

In [40]:
# Create data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,  # We're doing causal language modeling, not masked
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    data_collator=data_collator,
)

print("✓ Trainer initialized")

The model is already on multiple devices. Skipping the move to device specified in `args`.


✓ Trainer initialized
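
One detail worth keeping in mind: with `mlm=False`, `DataCollatorForLanguageModeling` builds labels by copying `input_ids` and replacing every pad-token position with `-100`, which the loss ignores. Because this notebook sets `pad_token = eos_token`, any EOS token in the formatted text is masked as well. The pure-Python sketch below mimics that labeling rule; the function name and the token ids are hypothetical.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def causal_lm_labels(input_ids, pad_token_id):
    """Copy input_ids to labels, masking every position that holds the pad token."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


# Hypothetical token ids: 7 stands in for the shared EOS/pad id.
EOS_ID = PAD_ID = 7
input_ids = [11, 12, 13, EOS_ID, PAD_ID, PAD_ID]

labels = causal_lm_labels(input_ids, PAD_ID)
print(labels)  # [11, 12, 13, -100, -100, -100]
```

If the model should learn where a response ends, one option is a dedicated pad token distinct from EOS; whether that matters here depends on whether the Milestone 1 formatting appends EOS at all.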


## 10. Fine-tuning Process

Train the model for 1 epoch on the 100-example subset (roughly 7 optimizer steps at an effective batch size of 16).

In [41]:
print("\n" + "="*80)
print("Starting fine-tuning...")
print("="*80)

start_time = datetime.now()

# Train the model
train_result = trainer.train()

end_time = datetime.now()
training_duration = (end_time - start_time).total_seconds()

print("\n" + "="*80)
print("✓ Fine-tuning completed!")
print("="*80)
print("\nTraining Summary:")
print(f"  Duration: {training_duration:.2f} seconds ({training_duration/60:.2f} minutes)")
print(f"  Final training loss: {train_result.training_loss:.4f}")
print(f"  Training samples: {len(tokenized_dataset)}")


Starting fine-tuning...


| Step | Training Loss |
| --- | --- |
| 5 | 1.4605 |



✓ Fine-tuning completed!

Training Summary:
  Duration: 9.72 seconds (0.16 minutes)
  Final training loss: 1.5358
  Training samples: 100


## 11. Save LoRA Adapter Weights

In [42]:
import os

# Save the LoRA adapters
adapter_path = "./adapters"
model.save_pretrained(adapter_path)
tokenizer.save_pretrained(adapter_path)

print(f"✓ LoRA adapter weights saved to: {adapter_path}")
print("  Files saved:")
for file in os.listdir(adapter_path):
    print(f"    - {file}")

✓ LoRA adapter weights saved to: ./adapters
  Files saved:
    - adapter_model.safetensors
    - special_tokens_map.json
    - added_tokens.json
    - adapter_config.json
    - tokenizer_config.json
    - merges.txt
    - chat_template.jinja
    - README.md
    - vocab.json
    - tokenizer.json
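
The saved directory holds only the adapter weights and tokenizer files, so using it later means loading the base model again and attaching the adapter on top. A minimal sketch of that step, assuming the same base checkpoint is available, is shown here; the helper name `load_finetuned` is hypothetical.

```python
def load_finetuned(base_model_name: str, adapter_path: str = "./adapters"):
    """Hypothetical helper: rebuild the fine-tuned model from base weights + adapter."""
    # Imports live inside the function so that defining it has no heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # The tokenizer files were saved next to the adapter.
    tokenizer = AutoTokenizer.from_pretrained(adapter_path)
    # Load the frozen base model, then attach the saved LoRA weights.
    base_model = AutoModelForCausalLM.from_pretrained(base_model_name)
    model = PeftModel.from_pretrained(base_model, adapter_path)
    model.eval()  # inference mode: disables dropout in the LoRA layers
    return model, tokenizer
```

For example, `load_finetuned("Qwen/Qwen3-0.6B-Base")` would return the model and tokenizer pair for generation.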


## 12. Save Training Logs

In [43]:
# Create training log
training_log = {
    "model_name": model_name,
    "training_duration_seconds": training_duration,
    "final_training_loss": train_result.training_loss,
    "num_epochs": training_args.num_train_epochs,
    "batch_size": training_args.per_device_train_batch_size,
    "gradient_accumulation_steps": training_args.gradient_accumulation_steps,
    "learning_rate": training_args.learning_rate,
    "lora_rank": lora_config.r,
    "lora_alpha": lora_config.lora_alpha,
    "num_training_samples": len(tokenized_dataset),
    "trainable_parameters": trainable_params,
    "total_parameters": total_params,
    "trainable_percentage": 100 * trainable_params / total_params,
    "timestamp": datetime.now().isoformat(),
}

# Save as JSON
with open("training_log.json", "w") as f:
    json.dump(training_log, f, indent=2)

# Save as text
with open("training_log.txt", "w") as f:
    f.write("Training Log\n")
    f.write("="*80 + "\n\n")
    for key, value in training_log.items():
        f.write(f"{key}: {value}\n")

print("✓ Training logs saved to training_log.txt and training_log.json")

✓ Training logs saved to training_log.txt and training_log.json


## 13. Model Evaluation

Qualitatively test the fine-tuned model on three prompts that follow the training template.

In [44]:
# Test prompts (not in training data)
test_prompts = [
    """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Explain what artificial intelligence is in simple terms.

### Response:
""",
    """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Write a haiku about the ocean.

### Response:
""",
    """Below is an instruction that describes a task, possibly with an input, that needs to be completed. Write a response that appropriately completes the request.

### Instruction:
Classify the sentiment of the following text.

### Input:
I love this product! It works perfectly.

### Response:
"""
]

print("\n" + "="*80)
print("Testing Fine-tuned Model")
print("="*80)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.eval()  # Switch to inference mode so LoRA dropout is disabled during generation

for i, prompt in enumerate(test_prompts, 1):
    print(f"\n--- Test {i} ---")
    print(f"Prompt:\n{prompt}")

    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=100,
            temperature=0.7,
            do_sample=True,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id
        )

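    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Extract only the generated response by slicing off the prompt tokens
    # (slicing the decoded string by len(prompt) can misalign if decoding
    # does not reproduce the prompt text exactly)
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)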

    print(f"\nGenerated Response:\n{response}")
    print("\n" + "-"*80)


Testing Fine-tuned Model

--- Test 1 ---
Prompt:
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Explain what artificial intelligence is in simple terms.

### Response:


Generated Response:
Artificial Intelligence (AI) is a field of study that focuses on creating machines and programs that can perform tasks that typically require human intelligence, such as learning, problem-solving, and decision-making. In simpler terms, AI involves creating software or systems that can understand, interpret, and respond to information in a way that mimics human intelligence.

For example, AI can be used to:
1. **Learning from data**: AI systems can analyze large amounts of data and learn patterns or trends over time.

--------------------------------------------------------------------------------

--- Test 2 ---
Prompt:
Below is an instruction that describes a task. Write a response that appropriately completes the request