# Gamatrain LLM Fine-Tuning (Complete)

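This notebook fine-tunes Qwen2-1.5B-Instruct on Gamatrain's educational content using LoRA adapters on a 4-bit quantized base model, merges the adapters back into the base weights, and exports the result to GGUF for Ollama.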

**Steps:**
1. Install dependencies
2. Load dataset
3. Load model with 4-bit quantization
4. Configure LoRA
5. Train
6. **MERGE adapters with base model** ⚠️
7. **Convert to GGUF** ⚠️
8. Download for Ollama

In [None]:
# Check GPU
!nvidia-smi

## Step 1: Install Dependencies

In [None]:
%%capture
!pip install -q -U \
    transformers \
    datasets \
    peft \
    bitsandbytes \
    trl \
    accelerate \
    scipy

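# Note: %%capture hides ALL output from this cell (a print here would never be shown).
# Remove %%capture if you need to debug a failed install.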

In [None]:
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments
)
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
from trl import SFTTrainer
from datasets import Dataset
import json
import os

print(f"PyTorch: {torch.__version__}")
print(f"CUDA: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## Step 2: Load Dataset

⚠️ **Upload `gamatrain_final_dataset.jsonl` to Colab first!**

In [None]:
# Upload dataset file first!
from google.colab import files
uploaded = files.upload()  # Upload gamatrain_final_dataset.jsonl

In [None]:
dataset_file = "gamatrain_final_dataset.jsonl"
data = []

with open(dataset_file, 'r', encoding='utf-8') as f:
    for line in f:
        if line.strip():
            data.append(json.loads(line))

print(f"Loaded {len(data)} samples")

# Format for ChatML
def format_chat_template(row):
    messages = row.get('messages', [])
    formatted = ""
    for msg in messages:
        role = msg['role']
        content = msg['content']
        formatted += f"<|im_start|>{role}\n{content}<|im_end|>\n"
    return formatted

formatted_data = [{'text': format_chat_template(item)} for item in data]
dataset = Dataset.from_list(formatted_data)

print(f"Dataset ready: {len(dataset)} samples")
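
To make the target format concrete, here is a minimal sketch that applies the same ChatML formatting to one hand-written conversation. The sample row is made up for illustration and is not taken from the Gamatrain dataset; the function repeats the logic of the cell above so the snippet runs on its own.

```python
def format_chat_template(row):
    """Render a list of chat messages as ChatML text (same logic as the cell above)."""
    formatted = ""
    for msg in row.get("messages", []):
        formatted += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    return formatted

# Hypothetical sample row, for illustration only
sample_row = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "2 + 2 = 4"},
    ]
}

sample_text = format_chat_template(sample_row)
print(sample_text)
```

Each turn is wrapped in `<|im_start|>` and `<|im_end|>` markers, which is the structure the model is expected to reproduce at inference time.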

## Step 3: Load Model

In [None]:
model_name = "Qwen/Qwen2-1.5B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

print("Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
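# Only fall back to EOS when the tokenizer defines no pad token. Reusing EOS
# (<|im_end|>) as padding can mask it out of the loss, so the model may not learn to stop.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token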
tokenizer.padding_side = "right"

print("✅ Model loaded!")
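
As a rough guide to why 4-bit loading matters on a small Colab GPU, the sketch below estimates the memory taken by the weights alone. It assumes a nominal 1.5 billion parameters (read off the model name) and ignores quantization constants, layers that stay unquantized, activations, and optimizer state, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

NOMINAL_PARAMS = 1.5e9  # assumption: taken from the "1.5B" in the model name

for label, bits in [("fp16", 16), ("nf4 (4-bit)", 4)]:
    print(f"{label:>12}: ~{weight_memory_gib(NOMINAL_PARAMS, bits):.2f} GiB")
```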

## Step 4: Configure LoRA

In [None]:
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
print("✅ LoRA configured!")
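
The trainable-parameter count printed above can be sanity-checked by hand: LoRA adds two small matrices per adapted layer, `A` (r x in) and `B` (out x r). The sketch below does that arithmetic. The layer dimensions are assumptions about the Qwen2-1.5B architecture, not values read from the notebook; verify them against the `config.json` of the checkpoint you load.

```python
def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in) and B (out x r) for each adapted linear layer."""
    return r * (in_features + out_features)

# Assumed Qwen2-1.5B dimensions (check config.json before relying on them)
HIDDEN = 1536
KV_DIM = 256       # assumed: 2 key/value heads x 128 head dim (grouped-query attention)
NUM_LAYERS = 28
R = 16             # matches r=16 in the LoraConfig above

shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

per_layer = sum(lora_param_count(i, o, R) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
```

If the total differs from what `print_trainable_parameters()` reports, the assumed dimensions are the first thing to check.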

## Step 5: Train

In [None]:
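from trl import SFTConfig

# Recent TRL releases expect SFT-specific options (text field, sequence length)
# in SFTConfig; SFTTrainer no longer accepts them as keyword arguments.
training_args = SFTConfig(
    output_dir="./qwen2-gamatrain-lora",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size: 4 x 4 = 16
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
    save_strategy="epoch",
    warmup_steps=50,
    lr_scheduler_type="cosine",
    optim="paged_adamw_8bit",
    report_to="none",
    dataset_text_field="text",
    max_length=512,                  # called max_seq_length in older TRL releases
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    processing_class=tokenizer,
    args=training_args,
)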

print("Starting training...")
trainer.train()
print("✅ Training complete!")
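
Before a long run it helps to know how many optimizer steps the settings above imply, since `warmup_steps=50` can be a large share of training on a small dataset. The helper below is an approximate sketch: the sample count in the example is hypothetical, and the Trainer's exact rounding may differ slightly between versions.

```python
import math

def training_schedule(num_samples, per_device_batch=4, grad_accum=4, epochs=3, num_devices=1):
    """Return (effective batch size, steps per epoch, total optimizer steps), approximately."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_samples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * epochs

# Hypothetical dataset size, for illustration only
effective_batch, steps_per_epoch, total_steps = training_schedule(2000)
warmup_share = 50 / total_steps

print(f"Effective batch size: {effective_batch}")
print(f"Steps per epoch:      {steps_per_epoch}")
print(f"Total steps:          {total_steps}")
print(f"Warmup share:         {warmup_share:.1%}")
```

In the notebook, call `training_schedule(len(dataset))` to get the figures for the real dataset.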

In [None]:
# Save LoRA adapters
adapter_dir = "./qwen2-gamatrain-adapters"
trainer.model.save_pretrained(adapter_dir)
tokenizer.save_pretrained(adapter_dir)
print(f"✅ Adapters saved to {adapter_dir}")

## ⚠️ Step 6: MERGE Adapters (CRITICAL!)

**This step is REQUIRED!** The adapter directory holds only the LoRA weights, so a GGUF built without merging contains the original Qwen2 weights, not your fine-tuned model.

In [None]:
# Clear memory
del model
del trainer
import gc
gc.collect()
torch.cuda.empty_cache()

print("Memory cleared!")

In [None]:
# Load base model WITHOUT quantization for merging
print("Loading base model for merging...")
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-1.5B-Instruct",
    device_map="auto",
    trust_remote_code=True,
    torch_dtype=torch.float16
)

# Load and merge adapters
print("Merging adapters...")
merged_model = PeftModel.from_pretrained(base_model, adapter_dir)
merged_model = merged_model.merge_and_unload()

# Save merged model
merged_dir = "./qwen2-gamatrain-merged"
merged_model.save_pretrained(merged_dir)
tokenizer.save_pretrained(merged_dir)

print(f"✅ Merged model saved to {merged_dir}")
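
Conceptually, merging folds each adapter into its base weight matrix as `W + (alpha / r) * B @ A`, after which the adapter is no longer needed. The toy example below checks this on a 2x2 layer in plain Python. It is an illustration of the arithmetic only, not PEFT's actual implementation, and all matrices are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Toy layer with a rank-1 adapter; alpha / r = 2, the same ratio as 32 / 16 above
W = [[1.0, 0.0], [0.0, 1.0]]   # out x in
A = [[1.0, 2.0]]               # r x in
B = [[0.5], [-0.5]]            # out x r
x = [[3.0], [4.0]]             # input column vector

merged = merge_lora(W, A, B, alpha=2, r=1)
y_merged = matmul(merged, x)

# Unmerged path: base output plus the scaled adapter output
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
y_adapter = [[base_out[i][0] + 2.0 * adapter_out[i][0]] for i in range(len(base_out))]

print("merged weights:", merged)
print("outputs match: ", y_merged == y_adapter)
```

Because the merged matrix gives the same output as the base-plus-adapter path, the saved model can be converted as an ordinary checkpoint.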

## ⚠️ Step 7: Convert to GGUF (CRITICAL!)

Convert the **merged** model to GGUF format for Ollama.

In [None]:
# Clone llama.cpp for conversion
!git clone https://github.com/ggerganov/llama.cpp
!pip install -r llama.cpp/requirements.txt

In [None]:
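# Convert merged model to GGUF.
# convert_hf_to_gguf.py does not accept q4_k_m as --outtype,
# so export f16 first and quantize in a second step.
!python llama.cpp/convert_hf_to_gguf.py ./qwen2-gamatrain-merged \
    --outfile qwen2-gamatrain-f16.gguf \
    --outtype f16

# Build llama-quantize, then write the Q4_K_M file
# (-DLLAMA_CURL=OFF avoids a libcurl dependency in some llama.cpp versions)
!cmake -S llama.cpp -B llama.cpp/build -DLLAMA_CURL=OFF
!cmake --build llama.cpp/build --config Release --target llama-quantize -j 2
!./llama.cpp/build/bin/llama-quantize qwen2-gamatrain-f16.gguf qwen2-gamatrain.gguf Q4_K_M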

print("✅ GGUF conversion complete!")
!ls -lh qwen2-gamatrain.gguf
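
A quick way to confirm the output really is a GGUF file is to read its header: GGUF files begin with the four bytes `GGUF` followed by a 32-bit version number. The sketch below assumes the usual little-endian layout and first runs against a tiny stand-in file so it works even before a real model exists; the helper name is made up for this example.

```python
import os
import struct
import tempfile

def read_gguf_version(path):
    """Return the GGUF version if the file starts with the GGUF magic, else raise."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    (version,) = struct.unpack("<I", header[4:8])  # assumes little-endian
    return version

# Self-check on a stand-in file (magic + version only, not a real model)
with tempfile.TemporaryDirectory() as tmp:
    fake_path = os.path.join(tmp, "fake.gguf")
    with open(fake_path, "wb") as f:
        f.write(b"GGUF" + struct.pack("<I", 3))
    demo_version = read_gguf_version(fake_path)
print(f"Stand-in file reports GGUF version {demo_version}")

# Check the real export if it is present
if os.path.exists("qwen2-gamatrain.gguf"):
    print(f"qwen2-gamatrain.gguf: GGUF version {read_gguf_version('qwen2-gamatrain.gguf')}")
```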

## Step 8: Download GGUF File

In [None]:
# Download the GGUF file
from google.colab import files
files.download('qwen2-gamatrain.gguf')

## ✅ Done!

Now you can:
1. Upload `qwen2-gamatrain.gguf` to Hugging Face
2. Use with Ollama: `ollama run hf.co/YOUR_USERNAME/qwen2-gamatrain`

The model should now respond as **Gamatrain AI**, not Alibaba's Qwen!
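
For local testing without uploading to Hugging Face, Ollama can also load the file through a Modelfile. The sketch below writes a minimal one; the system prompt and the model name `gamatrain` are placeholders, and no `TEMPLATE` directive is set, so it is worth verifying that your Ollama version picks up the chat template embedded in the GGUF.

```python
def build_modelfile(gguf_path: str, system_prompt: str) -> str:
    """Return the text of a minimal Ollama Modelfile."""
    return f'FROM {gguf_path}\nSYSTEM """{system_prompt}"""\n'

# Placeholder system prompt, for illustration only
modelfile_text = build_modelfile(
    "./qwen2-gamatrain.gguf",
    "You are Gamatrain AI, an educational assistant.",
)

with open("Modelfile", "w", encoding="utf-8") as f:
    f.write(modelfile_text)

print(modelfile_text)
```

With the Modelfile next to the GGUF file, `ollama create gamatrain -f Modelfile` registers the model and `ollama run gamatrain` starts a chat.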