# CAMARA QoD API Fine-tuning with QLoRA

This notebook demonstrates fine-tuning a small language model (Phi-3-Mini) to become an expert assistant for the CAMARA Quality on Demand API.

**Hardware Requirements:** T4 GPU (Google Colab Free Tier compatible)

**Training Method:** QLoRA with Unsloth, which advertises roughly 2x faster training than a standard Hugging Face setup

## 1. Setup and Installation

In [None]:
# Install required packages
!pip install -q -U psutil unsloth transformers datasets trl peft accelerate bitsandbytes

In [None]:
# Import libraries
from unsloth import FastLanguageModel
import torch
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments
import json

# Set random seed for reproducibility
torch.manual_seed(42)

## 2. Load Base Model with QLoRA Configuration

In [None]:
# Model configuration
MODEL_NAME = "microsoft/Phi-3-mini-4k-instruct"
MAX_SEQ_LENGTH = 2048
LOAD_IN_4BIT = True  # Enable 4-bit quantization

# Load model with Unsloth optimization
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_NAME,
    max_seq_length=MAX_SEQ_LENGTH,
    dtype=None,  # Auto-detect (float16 on a T4, bfloat16 on Ampere or newer GPUs)
    load_in_4bit=LOAD_IN_4BIT,
)

print(f"Model loaded: {MODEL_NAME}")
print(f"Tokenizer vocabulary size: {len(tokenizer)}")
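
As a rough sanity check on why 4-bit loading matters for a T4, the sketch below estimates the memory taken by the weights alone at 16-bit and 4-bit precision. The parameter count is an assumed round figure for Phi-3-Mini, and the estimate ignores quantization metadata, activations, and optimizer state, so real usage will be higher.

```python
# Rough weight-memory estimate. PARAM_COUNT is an assumption (approximate
# Phi-3-Mini size), not a value read from the loaded model.
PARAM_COUNT = 3.8e9


def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Return the memory needed for the weights alone, in GB (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


fp16_gb = weight_memory_gb(PARAM_COUNT, 16)
int4_gb = weight_memory_gb(PARAM_COUNT, 4)

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights:  ~{int4_gb:.1f} GB")
```

The 4-bit figure leaves most of the T4's memory free for activations, LoRA gradients, and optimizer state.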

## 3. Configure LoRA Adapters

In [None]:
# Apply PEFT (Parameter-Efficient Fine-Tuning) with LoRA
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    use_gradient_checkpointing="unsloth",  # Unsloth optimization
    random_state=42,
)

print("LoRA adapters configured successfully!")
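
To get a feel for how small the trainable part is, the sketch below counts the parameters that LoRA adds for the seven targeted projections. Each adapted weight of shape `out x in` gains two low-rank matrices, `A` (`r x in`) and `B` (`out x r`). The hidden size, intermediate size, and layer count are assumptions about Phi-3-Mini (with unfused q/k/v projections and no grouped-query attention); they are not read from the model loaded above.

```python
# Assumed Phi-3-Mini dimensions (illustrative, not read from the model).
HIDDEN = 3072
INTERMEDIATE = 8192
NUM_LAYERS = 32

RANK = 16   # r from the LoRA cell
ALPHA = 16  # lora_alpha from the LoRA cell

# (in_features, out_features) for each targeted projection in one layer.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters added by one adapter: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
total_trainable = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update B @ A

print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total_trainable:,}")
print(f"Update scaling (alpha/r):  {scaling}")
```

With `lora_alpha` equal to `r`, the scaling factor is 1, so the low-rank update is added to the frozen weights unscaled.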

## 4. Load and Format Training Data

In [None]:
# Define the instruction prompt template
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token

def formatting_prompts_func(examples):
    """Format examples into instruction-input-response format"""
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["response"]
    texts = []
    
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
        texts.append(text)
    
    return {"text": texts}

print("Prompt template defined.")
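
The sketch below shows what the formatter produces for a single record when called the way `dataset.map(..., batched=True)` calls it, that is, with a dict of column lists. The sample record, its response body, and the `EOS_TOKEN` value are hypothetical placeholders; in the notebook the token comes from `tokenizer.eos_token`.

```python
# Same template as above, repeated so this example runs on its own.
alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "<|endoftext|>"  # placeholder; the notebook uses tokenizer.eos_token


def formatting_prompts_func(examples):
    """Format a batch (dict of column lists) into prompt strings."""
    texts = []
    for instruction, input_text, output in zip(
        examples["instruction"], examples["input"], examples["response"]
    ):
        texts.append(alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN)
    return {"text": texts}


# Hypothetical one-row batch, shaped like the columns of sft_dataset.jsonl.
batch = {
    "instruction": ["Convert user requests into valid API calls."],
    "input": ["Boost my connection for 2 hours."],
    "response": ['{"qosProfile": "EXAMPLE_PROFILE", "duration": 7200}'],
}

formatted = formatting_prompts_func(batch)
print(formatted["text"][0])
```

Appending the end-of-sequence token teaches the model to stop after the response instead of continuing to generate.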

In [None]:
# Load dataset from JSONL file
# Upload sft_dataset.jsonl to Colab first
dataset = load_dataset("json", data_files="sft_dataset.jsonl", split="train")

# Apply formatting
dataset = dataset.map(formatting_prompts_func, batched=True)

print(f"Dataset loaded: {len(dataset)} examples")
print("\nSample formatted example:")
print(dataset[0]["text"][:500] + "...")
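
Because the formatter indexes the `instruction`, `input`, and `response` columns directly, a malformed row fails only once `map` reaches it. A minimal pre-flight check, sketched below with the standard library only, can catch such rows earlier. The helper name `find_bad_rows` and the sample rows are hypothetical.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("instruction", "input", "response")


def find_bad_rows(path):
    """Return (line_number, problem) pairs for rows that cannot be used."""
    problems = []
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                row = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_no, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(row, dict):
                problems.append((line_no, "row is not a JSON object"))
                continue
            missing = [key for key in REQUIRED_KEYS if key not in row]
            if missing:
                problems.append((line_no, f"missing keys: {missing}"))
    return problems


# Demo on a throwaway file: one good row, one incomplete row, one broken row.
sample_lines = [
    json.dumps({"instruction": "i", "input": "x", "response": "{}"}),
    json.dumps({"instruction": "i", "input": "x"}),
    "{not json",
]
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.jsonl"
    sample_path.write_text("\n".join(sample_lines) + "\n", encoding="utf-8")
    problems = find_bad_rows(sample_path)

for line_no, message in problems:
    print(f"line {line_no}: {message}")
```

In the notebook, the same helper could be pointed at `sft_dataset.jsonl` before calling `load_dataset`.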

## 5. Test Base Model (Before Fine-tuning)

In [None]:
# Test query
test_query = """I'm at a crowded stadium and need better upload for a 4K stream. My phone number is +14155551234 and I'm streaming to server 198.51.100.50 for the next 2 hours."""

# Format test input
test_input = alpaca_prompt.format(
    "You are an expert assistant for the CAMARA Quality on Demand (QoD) API. Convert user requests into valid API calls.",
    test_query,
    ""
)

# Generate response from base model
FastLanguageModel.for_inference(model)
inputs = tokenizer([test_input], return_tensors="pt").to("cuda")

print("=== BASE MODEL OUTPUT (Before Fine-tuning) ===")
# do_sample=True is needed for temperature to take effect; without it, decoding is greedy
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
response_before = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Keep only the text after the response marker and save it for comparison
response_before_saved = response_before.split("### Response:")[-1].strip()
print(response_before_saved)

## 6. Configure Training Arguments

In [None]:
# Training configuration
training_args = TrainingArguments(
    output_dir="./camara_qod_model",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=10,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),
    logging_steps=10,
    save_strategy="epoch",
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=42,
)

print("Training arguments configured.")
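
The arithmetic behind these settings can be sketched as follows: the effective batch size is the per-device batch multiplied by the accumulation steps, and the linear schedule ramps the learning rate up over the warmup steps before decaying it to zero. The dataset size of 50 is taken from the Summary, a single GPU is assumed, and the step count is approximate because the trainer's exact rounding can differ.

```python
import math

PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
EPOCHS = 3
WARMUP_STEPS = 10
PEAK_LR = 2e-4
NUM_EXAMPLES = 50  # figure quoted in the Summary
NUM_GPUS = 1       # assumption: single T4

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)  # approximate
total_steps = steps_per_epoch * EPOCHS


def linear_lr(step: int, peak_lr: float, warmup_steps: int, total: int) -> float:
    """Linear warmup to peak_lr, then linear decay to zero at `total`."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total - step)
    return peak_lr * remaining / max(1, total - warmup_steps)


print(f"Effective batch size: {effective_batch}")
print(f"Approximate optimizer steps: {total_steps}")
for step in (0, 5, 10, 15, total_steps):
    print(f"step {step:>2}: lr = {linear_lr(step, PEAK_LR, WARMUP_STEPS, total_steps):.2e}")
```

On a dataset this small, the 10 warmup steps cover a large share of the run, so the model spends comparatively few steps near the peak learning rate.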

## 7. Initialize SFT Trainer

In [None]:
# Create SFT trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=MAX_SEQ_LENGTH,
    args=training_args,
    packing=False,  # Can enable for efficiency if needed
)

print("SFT Trainer initialized!")

## 8. Fine-tune the Model

In [None]:
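# Section 5 switched the model to inference mode, so switch back before training
FastLanguageModel.for_training(model)

# Start training
print("Starting fine-tuning...")
trainer_stats = trainer.train()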

print("\n=== Training Complete ===")
print(f"Training loss: {trainer_stats.training_loss:.4f}")
print(f"Training time: {trainer_stats.metrics['train_runtime']:.2f} seconds")

## 9. Test Fine-tuned Model

In [None]:
# Test with the same query
FastLanguageModel.for_inference(model)
inputs = tokenizer([test_input], return_tensors="pt").to("cuda")

print("=== FINE-TUNED MODEL OUTPUT (After Training) ===")
# do_sample=True is needed for temperature to take effect; without it, decoding is greedy
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
response_after = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response_after.split("### Response:")[-1].strip())

## 10. Validate JSON Output

In [None]:
# Extract and validate JSON
response_text = response_after.split("### Response:")[-1].strip()

try:
    json_obj = json.loads(response_text)
    print("✅ Valid JSON output!")
    print("\nParsed structure:")
    print(json.dumps(json_obj, indent=2))
    
    # Check for required CAMARA fields (only meaningful for a JSON object)
    required_fields = ["device", "applicationServer", "qosProfile", "duration"]
    if not isinstance(json_obj, dict):
        print("\n⚠️ Output is valid JSON but not a JSON object")
    else:
        missing_fields = [f for f in required_fields if f not in json_obj]
        if missing_fields:
            print(f"\n⚠️ Missing required fields: {missing_fields}")
        else:
            print("\n✅ All required CAMARA fields present!")
        
except json.JSONDecodeError as e:
    print(f"❌ Invalid JSON: {e}")
    print("Response:", response_text)
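
A strict `json.loads` on the whole response fails whenever the model wraps the payload in extra text. The sketch below is a more tolerant variant that scans for the first decodable JSON object using `json.JSONDecoder.raw_decode`. The helper names and the sample output are hypothetical, and the nested key names and profile value in the sample are illustrative placeholders.

```python
import json

REQUIRED_FIELDS = ("device", "applicationServer", "qosProfile", "duration")


def extract_first_json_object(text):
    """Return the first decodable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one.
            start = text.find("{", start + 1)
    return None


def missing_required(obj):
    """List the required top-level fields that are absent from obj."""
    return [field for field in REQUIRED_FIELDS if field not in obj]


# Hypothetical model output with text around the payload.
raw_output = (
    'Here is the request: '
    '{"device": {"phoneNumber": "+14155551234"}, '
    '"applicationServer": {"ipv4Address": "198.51.100.50"}, '
    '"qosProfile": "EXAMPLE_PROFILE", "duration": 7200} '
    'Let me know if you need anything else.'
)

payload = extract_first_json_object(raw_output)
missing = missing_required(payload)
print("Parsed payload:", payload)
print("Missing fields:", missing)
```

This checks field presence only. It does not verify value types or ranges against the API specification.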

## 11. Additional Test Cases

In [None]:
# Test various scenarios
test_cases = [
    "Need ultra-low latency for VR gaming. Device IP 203.0.113.75, server 192.0.2.200, 3 hours.",
    "Video conference with IPv6 2001:db8::1 to server 2001:db8:1234::1 for 45 minutes.",
    "IoT sensor data upload from phone +12025551111 to cloud 10.0.0.100, 15 minutes.",
]

for i, query in enumerate(test_cases, 1):
    print(f"\n{'='*60}")
    print(f"Test Case {i}: {query}")
    print('='*60)
    
    test_prompt = alpaca_prompt.format(
        "You are an expert assistant for the CAMARA Quality on Demand (QoD) API. Convert user requests into valid API calls.",
        query,
        ""
    )
    
    inputs = tokenizer([test_prompt], return_tensors="pt").to("cuda")
    # do_sample=True is needed for temperature to take effect
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.3)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    result = response.split("### Response:")[-1].strip()
    print(result)
    
    # Validate JSON (catch only decode errors so other failures stay visible)
    try:
        json.loads(result)
        print("\n✅ Valid JSON")
    except json.JSONDecodeError:
        print("\n❌ Invalid JSON")
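
When the list of test cases grows, a single pass rate is easier to compare across runs than per-case printouts. The sketch below tallies JSON validity over a list of outputs. The helper name and the sample outputs are hypothetical stand-ins for the `result` strings collected in the loop above.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical outputs; in the notebook these would be the collected results.
sample_results = [
    '{"qosProfile": "EXAMPLE_PROFILE", "duration": 10800}',
    '{"qosProfile": "EXAMPLE_PROFILE", "duration": 2700}',
    "Sure! The duration is 900 seconds.",
]

valid_count = sum(is_valid_json(result) for result in sample_results)
pass_rate = valid_count / len(sample_results)

print(f"Valid JSON: {valid_count}/{len(sample_results)} ({pass_rate:.0%})")
```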

## 12. Save Model

In [None]:
# Save fine-tuned model and tokenizer
model.save_pretrained("camara_qod_lora_model")
tokenizer.save_pretrained("camara_qod_lora_model")

print("Model saved to: camara_qod_lora_model/")

# Optional: Save to the Hugging Face Hub
# model.push_to_hub("your-username/camara-qod-phi3-lora")
# tokenizer.push_to_hub("your-username/camara-qod-phi3-lora")

## 13. Export Merged Model (Optional)

In [None]:
# Merge LoRA weights into base model for standalone deployment
model.save_pretrained_merged(
    "camara_qod_merged_model",
    tokenizer,
    save_method="merged_16bit",
)

print("Merged model saved to: camara_qod_merged_model/")

## Summary

This notebook demonstrated:
1. ✅ Loading Phi-3-Mini with 4-bit quantization
2. ✅ Configuring QLoRA (r=16, alpha=16, dropout=0.05)
3. ✅ Fine-tuning on 50 CAMARA QoD examples
4. ✅ Validating that outputs are well-formed JSON containing the required CAMARA fields
5. ✅ Saving model for deployment

**Next Steps:**
- Run DPO training with preference dataset
- Deploy model to inference API
- Create performance report