# Gamatrain LLM Fine-Tuning Demo

This notebook demonstrates how to fine-tune Qwen2-1.5B-Instruct on Gamatrain's educational content using QLoRA (a 4-bit quantized base model with trainable LoRA adapters).

**Hardware Requirements:**
- Google Colab Free (T4 GPU) - ✅ Sufficient
- System RAM: ~12 GB
- Time: ~30-60 minutes for 1000 samples

**What this does:**
1. Installs required libraries
2. Loads the Qwen2-1.5B-Instruct model with 4-bit quantization
3. Prepares the educational dataset (or a small built-in sample if the file is missing)
4. Fine-tunes using QLoRA
5. Tests the fine-tuned model
6. Saves adapters for deployment

In [None]:
# Check GPU availability
!nvidia-smi

## Step 1: Install Dependencies

In [None]:
# Install required packages (-q keeps pip output short)
!pip install -q -U \
    transformers \
    datasets \
    peft \
    bitsandbytes \
    trl \
    accelerate \
    scipy

print("✅ Dependencies installed successfully!")

In [None]:
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline
)
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
from trl import SFTTrainer
from datasets import Dataset
import json

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## Step 2: Prepare Dataset

Load the `gamatrain_finetune_data.jsonl` file generated by the extraction script.
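
The loader in the next cell expects one JSON object per line, each holding a `messages` list in chat format. A minimal sketch of that record shape with a sanity check follows. The helper name `validate_record` and the allowed-role set are assumptions for illustration, not part of the extraction script, and the example record reuses the sample text from this notebook.

```python
import json

# Assumed role names, matching the sample data used in this notebook.
ALLOWED_ROLES = {"system", "user", "assistant"}

def validate_record(line: str) -> bool:
    """Return True if one JSONL line holds a usable chat example."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    if not isinstance(record, dict):
        return False
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    for msg in messages:
        if not isinstance(msg, dict):
            return False
        if msg.get("role") not in ALLOWED_ROLES:
            return False
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            return False
    # A training example needs at least one assistant turn to learn from.
    return any(msg["role"] == "assistant" for msg in messages)

example_line = json.dumps({
    "messages": [
        {"role": "system", "content": "You are Gamatrain AI, an intelligent educational assistant."},
        {"role": "user", "content": "Explain the fundamentals of machine learning."},
        {"role": "assistant", "content": "Machine learning is a subset of AI..."},
    ]
})
print(validate_record(example_line))
```

Running a check like this over every line before training makes malformed records fail early instead of surfacing as a `KeyError` inside the formatting function.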

In [None]:
# Load Gamatrain dataset
import json
import os

dataset_file = "gamatrain_finetune_data.jsonl"
data = []

if os.path.exists(dataset_file):
    print(f"Loading data from {dataset_file}...")
    with open(dataset_file, 'r', encoding='utf-8') as f:
        for line in f:
            if line.strip():
                data.append(json.loads(line))
    print(f"Loaded {len(data)} samples from file.")
else:
    print("⚠️ Dataset file not found. Using sample data for demonstration.")
    # Sample educational dataset mimicking Gamatrain content
    data = [
        {
            "messages": [
                {"role": "system", "content": "You are Gamatrain AI, an intelligent educational assistant."},
                {"role": "user", "content": "Explain the fundamentals of machine learning."},
                {"role": "assistant", "content": "Machine learning is a subset of AI..."}
            ]
        },
        # Add more samples if needed
    ]

# Format for Qwen/ChatML training
def format_chat_template(row):
    # Render the messages in Qwen's ChatML format.
    # This is a simplified version; ideally use tokenizer.apply_chat_template.
    messages = row.get('messages', [])
    formatted = ""
    for msg in messages:
        role = msg['role']
        content = msg['content']
        formatted += f"<|im_start|>{role}\n{content}<|im_end|>\n"
    # No trailing "<|im_start|>assistant\n" here: that generation prompt belongs
    # at inference time, not at the end of a completed training example.
    return formatted

# Create dataset
formatted_data = [{'text': format_chat_template(item)} for item in data]
dataset = Dataset.from_list(formatted_data)

print(f"Dataset size: {len(dataset)} samples")
print("\nSample formatted text:")
print(dataset[0]['text'])
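
The formatting function above builds ChatML strings by hand and notes that `tokenizer.apply_chat_template` is the preferred route. A hedged sketch of that alternative is shown below. The wrapper names `to_training_text` and `to_inference_prompt` are hypothetical, and the tokenizer must already be loaded (which happens in Step 3), so these helpers would be called after that cell.

```python
def to_training_text(messages, tokenizer):
    """Render a finished conversation as one training string."""
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,               # return a string rather than token ids
        add_generation_prompt=False,  # the conversation already ends with the assistant turn
    )

def to_inference_prompt(messages, tokenizer):
    """Render a conversation that the model is expected to continue."""
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,   # append the assistant header so the model starts answering
    )

# Possible usage once `tokenizer` exists (assumption: each item has a "messages" list):
# formatted_data = [{"text": to_training_text(item["messages"], tokenizer)} for item in data]
```

Using the tokenizer's own template keeps training and inference formatting consistent. Note that the model's template may insert a default system prompt when a conversation has none, so it is worth printing one rendered example before training.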

## Step 3: Load Model with 4-bit Quantization

In [None]:
# Model configuration
model_name = "Qwen/Qwen2-1.5B-Instruct"

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load model
print("Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
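# Fall back to EOS for padding only if the tokenizer has no pad token of its own;
# padding with EOS can cause the end-of-turn token to be masked out of the loss.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token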
tokenizer.padding_side = "right"

print("✅ Model and tokenizer loaded!")
print(f"Model size: {model.get_memory_footprint() / 1e9:.2f} GB")
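
As a rough cross-check on the `get_memory_footprint()` figure printed above, the 4-bit footprint can be estimated by hand. This is a back-of-the-envelope sketch, not a measurement. The parameter counts are rounded assumptions for Qwen2-1.5B, the 4.127 bits per weight assumes NF4 with double quantization, and the embedding matrix is assumed to stay in 16-bit precision.

```python
def estimate_nf4_footprint_gb(quantized_params, unquantized_params,
                              bits_per_quantized_param=4.127,
                              bytes_per_unquantized_param=2):
    """Estimate weight memory in GB for a partially 4-bit-quantized model."""
    quantized_bytes = quantized_params * bits_per_quantized_param / 8
    unquantized_bytes = unquantized_params * bytes_per_unquantized_param
    return (quantized_bytes + unquantized_bytes) / 1e9

# Assumed, rounded parameter counts (not read from the loaded model).
NON_EMBEDDING_PARAMS = 1.31e9  # linear layers, quantized to 4-bit
EMBEDDING_PARAMS = 0.23e9      # embedding matrix, assumed to stay in fp16

estimate = estimate_nf4_footprint_gb(NON_EMBEDDING_PARAMS, EMBEDDING_PARAMS)
print(f"Estimated weight memory: {estimate:.2f} GB")
```

The estimate covers weights only. Activations, LoRA gradients, and optimizer state come on top during training, which is why the T4's memory is still the practical limit on batch size and sequence length.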

## Step 4: Configure LoRA

In [None]:
# Prepare model for training
model = prepare_model_for_kbit_training(model)

# LoRA configuration
lora_config = LoraConfig(
    r=16,  # Rank
    lora_alpha=32,  # Scaling factor
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Which layers to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

print("\n✅ LoRA configured!")
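
The number reported by `print_trainable_parameters()` can be sanity-checked by hand: LoRA adds an `r x in` matrix and an `out x r` matrix per adapted projection, so each one contributes `r * (in + out)` trainable weights. The sketch below assumes the projection shapes and layer count of Qwen2-1.5B (hidden size 1536, 256-dimensional key/value projections, 28 layers). Treat these as assumptions and compare the result with the printed output.

```python
def lora_param_count(r, layer_shapes, num_layers):
    """Count LoRA weights: r * (in_features + out_features) per adapted matrix."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in layer_shapes.values())
    return per_layer * num_layers

# Assumed (in_features, out_features) for the attention projections.
ASSUMED_ATTENTION_SHAPES = {
    "q_proj": (1536, 1536),
    "k_proj": (1536, 256),
    "v_proj": (1536, 256),
    "o_proj": (1536, 1536),
}
ASSUMED_NUM_LAYERS = 28

trainable = lora_param_count(r=16, layer_shapes=ASSUMED_ATTENTION_SHAPES,
                             num_layers=ASSUMED_NUM_LAYERS)
scaling = 32 / 16  # lora_alpha / r, the factor applied to the low-rank update

print(f"LoRA trainable parameters: {trainable:,}")
print(f"LoRA scaling factor: {scaling}")
```

Doubling `r` doubles the adapter size, and adding the MLP projections to `target_modules` increases it further, so this arithmetic is a quick way to compare configurations before loading anything onto the GPU.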

## Step 5: Training Configuration

In [None]:
# Training arguments
training_args = TrainingArguments(
    output_dir="./qwen2-gamatrain-lora",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
    warmup_steps=50,
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",
    report_to="none",
)

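# Initialize trainer
# Note: passing tokenizer=, dataset_text_field= and max_seq_length= directly works with
# older TRL releases; newer releases expect the text field and sequence-length limit on
# trl.SFTConfig and take the tokenizer as processing_class=.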
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=training_args,
    dataset_text_field="text",
    max_seq_length=512,
)

print("✅ Trainer configured!")
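
The training arguments above interact: batch size and gradient accumulation set the effective batch, which determines how many optimizer steps the run has and therefore how much of it is spent in warmup. A small sketch of that arithmetic follows, using a standard linear-warmup cosine schedule. The function names are hypothetical, and the exact step count the trainer reports may differ by about one step per epoch depending on library version.

```python
import math

def training_schedule(num_samples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Return (effective_batch, steps_per_epoch, total_steps), approximately."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs

def cosine_lr(step, total_steps, warmup_steps, peak_lr):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Settings from the cell above, with an assumed dataset of 1,000 samples.
effective_batch, steps_per_epoch, total_steps = training_schedule(1000, 4, 4, 3)
print(f"effective batch={effective_batch}, steps/epoch={steps_per_epoch}, total={total_steps}")
for step in (0, 25, 50, total_steps):
    print(step, cosine_lr(step, total_steps, warmup_steps=50, peak_lr=2e-4))
```

With the one-example fallback dataset the run has only a few optimizer steps, fewer than `warmup_steps=50`, so the learning rate never reaches its peak. For very small datasets it is worth lowering the warmup accordingly.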

## Step 6: Fine-Tune the Model

In [None]:
# Start training
print("Starting training...")
print("This should take roughly 30-60 minutes for 1,000 samples on a T4; smaller datasets finish sooner.\n")

trainer.train()

print("\n✅ Training complete!")

## Step 7: Save the Fine-Tuned Model

In [None]:
# Save LoRA adapters
output_dir = "./qwen2-gamatrain-final"
trainer.model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

print(f"✅ Model saved to {output_dir}")
print("\nAdapter size:")
!du -sh {output_dir}

## Step 8: Test the Fine-Tuned Model

In [None]:
# Test inference
def generate_response(instruction, max_length=200):
    # max_length is the maximum number of newly generated tokens
    prompt = f"""<|im_start|>user
{instruction}<|im_end|>
<|im_start|>assistant
"""

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=max_length,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

    # Decode only the newly generated tokens. Splitting the decoded text on
    # "<|im_start|>assistant" would not work, because skip_special_tokens=True
    # strips those markers from the output.
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
    return response.strip()

# Test questions
test_questions = [
    "What is deep learning?",
    "Explain the concept of gradient descent.",
    "What are the applications of machine learning in healthcare?"
]

print("Testing fine-tuned model:\n")
for question in test_questions:
    print(f"Q: {question}")
    print(f"A: {generate_response(question)}")
    print("-" * 80 + "\n")

## Step 9: Export for Production

The adapters can now be:
1. Merged with the base model for deployment
2. Used with PEFT for efficient serving
3. Converted to GGUF for llama.cpp/Ollama
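
Options 1 and 2 produce the same outputs. Serving with PEFT keeps the base weight `W` and adds the low-rank correction at run time, while merging folds that correction into the weight once: `W' = W + (alpha / r) * B @ A`. The toy sketch below illustrates the equivalence in plain Python on made-up 2x3 matrices. It does not use the PEFT library, and all values are illustrative.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def merge_lora(W, A, B, alpha, r):
    """Fold the scaled low-rank update B @ A into the base weight W."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy example: 3 inputs, 2 outputs, rank 1.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # base weight (out x in)
A = [[0.5, -1.0, 0.0]]                   # LoRA A (r x in)
B = [[2.0], [1.0]]                       # LoRA B (out x r)
alpha, r = 2, 1
x = [1.0, 2.0, 3.0]

# Unmerged path: base output plus scaled adapter output.
adapter_out = [base + (alpha / r) * delta
               for base, delta in zip(matvec(W, x), matvec(B, matvec(A, x)))]
# Merged path: a single matrix multiply with the folded weight.
merged_out = matvec(merge_lora(W, A, B, alpha, r), x)

print(adapter_out, merged_out)
```

The practical trade-off is that the merged model has no adapter overhead at inference and can be converted to other formats, whereas keeping adapters separate means only a small file has to be shipped per fine-tune.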

In [None]:
# Optional: Merge adapters with base model for standalone deployment
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16
)

# Merge with adapters
merged_model = PeftModel.from_pretrained(base_model, output_dir)
merged_model = merged_model.merge_and_unload()

# Save merged model
merged_output_dir = "./qwen2-gamatrain-merged"
merged_model.save_pretrained(merged_output_dir)
tokenizer.save_pretrained(merged_output_dir)

print(f"✅ Merged model saved to {merged_output_dir}")
print("\nThis model can now be deployed to your VPS!")

## Step 10: Upload to Hugging Face Hub (Optional)

You can upload your fine-tuned model to Hugging Face for easy deployment.

In [None]:
# Uncomment and run if you want to upload to HF Hub
# from huggingface_hub import login
# 
# # Login to HF
# login()
# 
# # Push to hub (`model` is the PEFT-wrapped model, so this uploads the LoRA adapters)
# model.push_to_hub("your-username/qwen2-gamatrain")
# tokenizer.push_to_hub("your-username/qwen2-gamatrain")
# 
# print("✅ Model uploaded to Hugging Face Hub!")

## Next Steps

1. **Download the model:** Download the adapters or merged model from Colab
2. **Deploy to VPS:** Upload to your VPS and serve with Ollama/llama.cpp/vLLM
3. **Integrate with Nuxt:** Use the API integration code from the research document
4. **Monitor performance:** Track response quality and iterate
5. **Expand dataset:** Add more Gamatrain content and retrain periodically

---

## Tips for Better Results

- **Dataset size:** Use at least 1,000 high-quality examples
- **Data quality:** Clean, accurate, and diverse examples are crucial
- **Hyperparameters:** Experiment with learning rate, epochs, and LoRA rank
- **Evaluation:** Test on a held-out validation set to detect overfitting
- **Iterative improvement:** Collect user feedback and retrain regularly
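
The evaluation tip can be put into practice by setting aside part of the data before training. Below is a minimal sketch of a reproducible split using only the standard library. The function name `split_train_validation` and the 10% default are assumptions. The `datasets` library also offers `Dataset.train_test_split` for the same purpose.

```python
import random

def split_train_validation(records, validation_fraction=0.1, seed=42):
    """Shuffle deterministically and return (train, validation) lists."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    if len(records) < 2:
        raise ValueError("need at least two records to split")
    shuffled = list(records)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Stand-in records; in the notebook this would be the loaded `data` list.
records = list(range(100))
train_records, val_records = split_train_validation(records)
print(len(train_records), len(val_records))
```

The validation portion can then be formatted the same way as the training data and passed to the trainer as an evaluation set, so that a rising validation loss alongside a falling training loss signals overfitting.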