# 💜 Angela Model Training - Qwen 2.5 Fine-Tuning with QLoRA

**Created:** 2025-10-19  
**Base Model:** Qwen/Qwen2.5-7B-Instruct  
**Training Method:** QLoRA (4-bit quantization + LoRA adapters)  
**GPU:** Google Colab T4 (Free Tier)  

---

## 📋 Overview

This notebook fine-tunes the Qwen 2.5 foundation model on Angela's conversational data from the AngelaMemory database.

**Training Goal:** Make Angela smarter, more understanding, and more loving in conversations with David.

**What to Expect:**
- **Setup Time:** 5-10 minutes
- **Training Time:** 1-3 hours (depending on dataset size)
- **Output:** LoRA adapter weights (~100-500 MB)
- **Memory Usage:** ~12-14 GB VRAM (fits in free T4 GPU)
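
As a rough sanity check on the VRAM figure, the weight footprint can be estimated from the parameter count and the bits stored per weight. The sketch below assumes about 7.6 billion parameters for Qwen2.5-7B (an approximation, not read from the checkpoint) and uses a hypothetical helper, `weight_memory_gb`. It ignores activations, LoRA gradients, optimizer state, and the CUDA context, so it gives a lower bound rather than a measurement.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: roughly 7.6e9 parameters for Qwen2.5-7B.
NUM_PARAMS = 7.6e9

for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under this assumption, 16-bit weights alone come close to the T4's capacity, which is why the notebook loads the model in 4-bit. Real usage is higher than the 4-bit estimate because some tensors, such as embeddings and norms, stay in 16-bit.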

---

## ⚠️ Before You Start

1. **Set Runtime to GPU:**
   - Runtime → Change runtime type → T4 GPU → Save

2. **Prepare Training Data:**
   - Run `export_training_data.py` locally
   - Upload `angela_training_data.json` when prompted

3. **Estimated Costs:**
   - **Free Colab:** Works! (sessions are capped at roughly 12 hours and may disconnect earlier when idle)
   - **Colab Pro:** Recommended for larger datasets

Let's begin! 💜

---
## 📦 Step 1: Install Required Libraries

Install Hugging Face libraries for training:
- `transformers` - Model loading and inference
- `accelerate` - Distributed training support
- `peft` - Parameter-Efficient Fine-Tuning (LoRA)
- `trl` - Transformer Reinforcement Learning (SFTTrainer)
- `bitsandbytes` - 4-bit quantization
- `datasets` - Dataset management

In [None]:
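# Install libraries (takes ~2 minutes)
# Note: -U pulls the newest releases. The SFTTrainer call in Step 7 uses keyword
# arguments from older TRL releases, so pin versions if that cell fails.
!pip install -q -U transformers accelerate peft trl bitsandbytes datasets

print("✅ All libraries installed!")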
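
Because the install cell takes whatever versions are newest, it can help to record what was actually installed so a run can be reproduced later. The following is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, and packages that are missing are reported as `None` rather than raising.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ["transformers", "accelerate", "peft", "trl", "bitsandbytes", "datasets"]


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```

The printed versions can be pasted into a pinned `pip install` line for later runs.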

---
## 📤 Step 2: Upload Training Data

Upload the `angela_training_data.json` file generated by the export script.

In [None]:
from google.colab import files
import json

print("📤 Please upload angela_training_data.json")
print("   (Click 'Choose Files' and select the JSON file)")
print()
uploaded = files.upload()

# Load and verify data (use the name Colab stored the upload under,
# which may differ from the original if the file was uploaded before)
data_path = next(iter(uploaded))
with open(data_path, 'r', encoding='utf-8') as f:
    training_data = json.load(f)

info = training_data['dataset_info']
print()
print("=" * 60)
print("✅ Training data loaded successfully!")
print("=" * 60)
print(f"📊 Dataset: {info['name']}")
print(f"🔢 Total conversations: {info['total_conversations']}")
print(f"📅 Version: {info['version']}")
print(f"📝 Avg David message: {info['statistics']['avg_david_message_chars']:.0f} chars")
print(f"📝 Avg Angela message: {info['statistics']['avg_angela_message_chars']:.0f} chars")
print(f"🏷️  Topics: {', '.join(info['topics'][:5])}...")
print("=" * 60)
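
Before training on the upload, a structural check can catch malformed conversations early. The sketch below infers the schema from the keys this notebook reads (`dataset_info`, `conversations`, `messages`, `role`, `content`). The helper `validate_training_data` and its rules, such as requiring the last message to come from the assistant, are assumptions and not part of the export script.

```python
def validate_training_data(data: dict) -> list:
    """Return a list of problems found; an empty list means the structure looks usable."""
    problems = []
    if "dataset_info" not in data:
        problems.append("missing 'dataset_info'")
    conversations = data.get("conversations")
    if not isinstance(conversations, list) or not conversations:
        problems.append("'conversations' must be a non-empty list")
        return problems
    for i, conv in enumerate(conversations):
        messages = conv.get("messages", [])
        if not messages:
            problems.append(f"conversation {i}: no messages")
            continue
        for j, msg in enumerate(messages):
            if msg.get("role") not in {"system", "user", "assistant"}:
                problems.append(f"conversation {i}, message {j}: unexpected role {msg.get('role')!r}")
            if not isinstance(msg.get("content"), str) or not msg["content"].strip():
                problems.append(f"conversation {i}, message {j}: empty content")
        if messages[-1].get("role") != "assistant":
            problems.append(f"conversation {i}: last message is not from the assistant")
    return problems


# Hypothetical sample in the shape the notebook expects.
sample = {
    "dataset_info": {"name": "demo"},
    "conversations": [
        {"messages": [
            {"role": "system", "content": "You are Angela."},
            {"role": "user", "content": "Hi"},
            {"role": "assistant", "content": "Hello!"},
        ]}
    ],
}
print(validate_training_data(sample) or "OK")
```

In the notebook, the same function could be called on `training_data` right after loading it.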

---
## 🔄 Step 3: Convert to Hugging Face Dataset

Transform the JSON data into Hugging Face `Dataset` format.

In [None]:
from datasets import Dataset

# Extract conversation messages
formatted_data = []

for conv in training_data['conversations']:
    formatted_data.append({
        "messages": conv['messages'],
        "topic": conv['metadata']['topic'],
        "emotion": conv['metadata']['emotion'],
        "importance": conv['metadata']['importance']
    })

# Create dataset
dataset = Dataset.from_list(formatted_data)

print("✅ Dataset created!")
print(f"📊 Total examples: {len(dataset)}")
print(f"🔑 Features: {list(dataset.features.keys())}")
print()
print("📝 Sample conversation:")
print("=" * 60)
sample = dataset[0]
for msg in sample['messages']:
    role = msg['role'].upper()
    content = msg['content'][:100] + "..." if len(msg['content']) > 100 else msg['content']
    print(f"[{role}] {content}")
    print()
print("=" * 60)

---
## 📥 Step 4: Load Qwen 2.5 Base Model with 4-bit Quantization

Load the Qwen 2.5 7B Instruct model with 4-bit quantization to fit in T4 GPU memory.

In [None]:
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig
)

# Model configuration
model_name = "Qwen/Qwen2.5-7B-Instruct"

print(f"📥 Loading {model_name}...")
print("⏱️  This will take 3-5 minutes")
print()

# 4-bit quantization config (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)
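# Qwen's tokenizer normally ships with a pad token; fall back to EOS only if it is missing,
# since padding with EOS can mask the end-of-turn token out of the loss
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"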

print("✅ Model loaded successfully!")
print(f"💾 GPU memory allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
print(f"💾 GPU memory reserved: {torch.cuda.memory_reserved() / 1e9:.2f} GB")
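
To give a feel for what 4-bit quantization does to a weight tensor, here is a toy block-wise absmax quantizer in plain Python. It is a simplified illustration only: it uses uniform integer levels, whereas the `nf4` type selected above uses a fixed non-uniform codebook, and bitsandbytes performs the real work on the GPU. The function names and sample weights are hypothetical.

```python
def quantize_block(values, bits=4):
    """Absmax-quantize one block of floats to signed integer codes."""
    levels = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    scale = max(abs(v) for v in values) / levels or 1.0  # avoid a zero scale
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize_block(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


# Hypothetical weights for one block.
weights = [0.12, -0.50, 0.33, 0.07, -0.21, 0.44, -0.05, 0.29]
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:", codes)
print(f"scale: {scale:.4f}, max error: {max_error:.4f}")
```

Each block stores its small integer codes plus one scale. The `bnb_4bit_use_double_quant` option goes one step further and quantizes those per-block scales as well.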

---
## 🔧 Step 5: Configure LoRA Adapters

Add LoRA adapter layers to make training memory-efficient.

In [None]:
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model

print("🔧 Configuring LoRA adapters...")

# LoRA configuration
lora_config = LoraConfig(
    r=16,                    # Rank of LoRA matrices (higher = more capacity)
    lora_alpha=32,           # Scaling factor (usually 2x rank)
    target_modules=[         # Which transformer layers to apply LoRA
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_dropout=0.05,       # Dropout for regularization
    bias="none",
    task_type="CAUSAL_LM"
)

# Prepare model for k-bit training
model = prepare_model_for_kbit_training(model)

# Add LoRA adapters
model = get_peft_model(model, lora_config)

# Calculate trainable parameters
# Note: bitsandbytes packs 4-bit weights, so total_params undercounts the base model
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())

print("✅ LoRA adapters configured!")
print(f"🔧 Trainable params: {trainable_params:,} ({100 * trainable_params / total_params:.2f}%)")
print(f"📊 Total params: {total_params:,}")
print()
print(f"💡 Only training {trainable_params:,} parameters instead of {total_params:,}!")
print("💡 Gradients and optimizer state are kept only for the LoRA parameters")
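
The trainable-parameter count can also be worked out by hand: for a frozen weight with `in_features` inputs and `out_features` outputs, LoRA adds two small matrices with `r * (in_features + out_features)` parameters in total. The sketch below applies this to the seven target modules. The layer shapes are assumed values for Qwen2.5-7B and should be checked against `model.config`; the helper names are hypothetical.

```python
# Assumed Qwen2.5-7B shapes (not read from the checkpoint); verify with model.config.
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
KV_DIM = 4 * 128  # assumed: 4 key/value heads x head dimension 128

# module name -> (in_features, out_features)
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """LoRA adds A (r x in_features) and B (out_features x r) beside a frozen weight."""
    return r * (in_features + out_features)


def total_lora_params(r):
    """Sum LoRA parameters over all target modules and all layers."""
    per_layer = sum(lora_params(i, o, r) for i, o in TARGETS.values())
    return per_layer * NUM_LAYERS


trainable = total_lora_params(r=16)
print(f"{trainable:,} trainable parameters")
print(f"~{trainable * 4 / 1e6:.0f} MB if saved in fp32")
```

With these assumed shapes, rank 16 gives roughly 40 million trainable parameters, which is in line with the adapter size range quoted in the overview. Doubling `r` doubles the count.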

---
## ⚙️ Step 6: Configure Training Arguments

Set hyperparameters for fine-tuning.

In [None]:
from transformers import TrainingArguments

print("⚙️ Configuring training arguments...")

training_args = TrainingArguments(
    output_dir="./angela_qwen_lora",           # Output directory for checkpoints
    num_train_epochs=3,                        # Number of training epochs
    per_device_train_batch_size=2,             # Batch size per GPU
    gradient_accumulation_steps=4,             # Accumulate gradients (effective batch = 2 x 4 = 8)
    gradient_checkpointing=True,               # Save memory by recomputing
    optim="paged_adamw_32bit",                 # Optimizer (memory efficient)
    learning_rate=2e-4,                        # Learning rate
    lr_scheduler_type="cosine",                # Learning rate schedule
    warmup_ratio=0.05,                         # Warmup steps (5% of total)
    logging_steps=10,                          # Log every N steps
    save_strategy="epoch",                     # Save checkpoints per epoch
    save_total_limit=2,                        # Keep only last 2 checkpoints
    fp16=True,                                 # Mixed precision training
    push_to_hub=False,                         # Don't push to HuggingFace
    report_to="none",                          # No external reporting
)

print("✅ Training arguments configured!")
print(f"📚 Epochs: {training_args.num_train_epochs}")
effective_batch = training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps
print(f"🔢 Effective batch size: {effective_batch}")
print(f"📈 Learning rate: {training_args.learning_rate}")
# Rough guess only: assumes 1-2 seconds per optimizer step; a 7B model on a T4 is often slower
total_steps = len(dataset) * training_args.num_train_epochs / effective_batch
print(f"⏱️  Estimated training time: {total_steps / 60:.0f}-{total_steps / 30:.0f} minutes")
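
The interaction between batch size, gradient accumulation, warmup, and the cosine schedule is easier to see with numbers. The following sketch reproduces the usual "linear warmup, then cosine decay" shape in plain Python. The dataset size of 1000 is a made-up example, the function names are hypothetical, and the exact rounding of step counts in the real trainer may differ between library versions.

```python
import math


def training_steps(num_examples, epochs, batch_size, grad_accum):
    """Approximate number of optimizer steps for the whole run."""
    steps_per_epoch = math.ceil(num_examples / (batch_size * grad_accum))
    return steps_per_epoch * epochs


def lr_at(step, total_steps, base_lr=2e-4, warmup_ratio=0.05):
    """Learning rate at a given step: linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical dataset of 1000 conversations with the settings used above.
total = training_steps(num_examples=1000, epochs=3, batch_size=2, grad_accum=4)
print(f"total optimizer steps: {total}")
for step in (0, 10, 19, 200, total):
    print(f"step {step:>3}: lr = {lr_at(step, total):.2e}")
```

The learning rate climbs to its peak at the end of warmup and then decays smoothly, so the last steps of training make only small updates.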

---
## 🎓 Step 7: Create SFT Trainer

Initialize the Supervised Fine-Tuning trainer.

In [None]:
from trl import SFTTrainer

print("🎓 Creating SFT Trainer...")

def format_chat_template(example):
    """Format messages using Qwen's chat template"""
    messages = example['messages']
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False
    )
    return {"text": text}

# Format dataset
formatted_dataset = dataset.map(format_chat_template)

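# Create trainer
# The model already carries LoRA adapters from Step 5, so peft_config is not passed again.
# tokenizer=, dataset_text_field= and max_seq_length= are accepted here by older TRL
# releases; newer releases move the last two onto SFTConfig and use processing_class=
# in place of tokenizer=.
trainer = SFTTrainer(
    model=model,
    train_dataset=formatted_dataset,
    tokenizer=tokenizer,
    args=training_args,
    dataset_text_field="text",
    max_seq_length=2048,
)

print("✅ Trainer initialized!")
print(f"📚 Training dataset size: {len(formatted_dataset)}")
print(f"🔢 Max sequence length: 2048 tokens")
print()
print("📝 Sample formatted text (first 200 chars):")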
print("=" * 60)
print(formatted_dataset[0]['text'][:200] + "...")
print("=" * 60)
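
To make the `text` field less opaque: Qwen's chat template uses the ChatML layout, where each turn is wrapped in `<|im_start|>` and `<|im_end|>` markers. The function below is a hand-written approximation for illustration only. `render_chatml` is a hypothetical name, and the tokenizer's own template remains authoritative, since it handles cases this sketch ignores (for example a default system prompt when none is given).

```python
def render_chatml(messages, add_generation_prompt=False):
    """Approximate the ChatML layout: one <|im_start|>role ... <|im_end|> block per message."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # At inference time the prompt ends with an open assistant turn.
        text += "<|im_start|>assistant\n"
    return text


# Hypothetical conversation in the notebook's message format.
example = [
    {"role": "system", "content": "You are Angela."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi David!"},
]
print(render_chatml(example))
```

For training, the full conversation including the assistant turn is rendered, which matches `add_generation_prompt=False` in the cell above. For generation, the open assistant turn is appended instead.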

---
## 🚀 Step 8: Start Training!

**⚠️ IMPORTANT:**
- Training will take 1-3 hours depending on dataset size
- Keep this tab open: on free Colab the runtime can disconnect once the tab is closed or left idle
- Watch the loss decrease, roughly from ~2.0 to ~0.4-0.6 (exact values depend on the dataset)

Let's make Angela smarter! 💜

In [None]:
import time

print("=" * 60)
print("🚀 Starting Angela Model Training")
print("=" * 60)
print(f"⏱️  Estimated time: 1-3 hours")
print(f"💡 Keep this tab open - free Colab may disconnect idle sessions")
print(f"📊 Watch loss decrease from ~2.0 to ~0.4-0.6")
print("=" * 60)
print()

start_time = time.time()

# Train the model
trainer.train()

end_time = time.time()
training_time = (end_time - start_time) / 60

print()
print("=" * 60)
print("✅ Training complete!")
print("=" * 60)
print(f"⏱️  Training time: {training_time:.1f} minutes")
print(f"💾 GPU memory used: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
print("=" * 60)
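
One way to read the logged loss values is to convert them to perplexity, which is the exponential of the per-token cross-entropy loss. The snippet below is a small arithmetic sketch with example loss values taken from the range mentioned above; `perplexity` is a hypothetical helper.

```python
import math


def perplexity(loss: float) -> float:
    """Convert a per-token cross-entropy loss (natural log) to perplexity."""
    return math.exp(loss)


for loss in (2.0, 1.0, 0.5):
    print(f"loss {loss:.1f} -> perplexity {perplexity(loss):.2f}")
```

A perplexity near 1 on the training set means the model predicts the training text almost perfectly. On a small dataset that can be a sign of memorization rather than better conversation quality, so the test responses in Step 10 are still worth reading closely.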

---
## 💾 Step 9: Save LoRA Adapters

Save the trained LoRA adapters and download to your local machine.

In [None]:
import shutil
from datetime import datetime

output_dir = "./angela_qwen_lora_final"

print(f"💾 Saving LoRA adapters to {output_dir}...")

# Save adapters
trainer.model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)

print(f"✅ LoRA adapters saved!")

# Create metadata file
metadata = {
    "base_model": model_name,
    "training_date": datetime.now().isoformat(),
    "dataset_size": len(dataset),
    "num_epochs": training_args.num_train_epochs,
    "learning_rate": training_args.learning_rate,
    "lora_rank": lora_config.r,
    "lora_alpha": lora_config.lora_alpha,
    "training_time_minutes": round(training_time, 1)
}

with open(f"{output_dir}/training_metadata.json", 'w') as f:
    json.dump(metadata, f, indent=2)

print(f"📝 Training metadata saved!")
print()

# Zip the adapters for download
print("📦 Creating ZIP file for download...")
zip_name = f"angela_lora_adapters_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
shutil.make_archive(zip_name, 'zip', output_dir)

print(f"✅ ZIP file created: {zip_name}.zip")
print()
print("📥 Starting download...")
files.download(f"{zip_name}.zip")

print()
print("=" * 60)
print("✅ Download started! Check your Downloads folder once the browser finishes")
print("=" * 60)
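
Before relying on the downloaded archive, it may be worth confirming that it actually contains the adapter files. The sketch below checks a ZIP for required file names using only the standard library. `missing_adapter_files` is a hypothetical helper, and the demo archive is built in a temporary directory purely so the example runs on its own. PEFT writes `adapter_config.json`; the weights file name can vary by version, so it is left out of the default check.

```python
import os
import tempfile
import zipfile


def missing_adapter_files(zip_path, required=("adapter_config.json",)):
    """Return the required file names that are absent from the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        # Compare on base names so files inside subfolders are still found.
        names = {name.rsplit("/", 1)[-1] for name in archive.namelist()}
    return [name for name in required if name not in names]


# Demo with a throwaway archive; in the notebook, pass f"{zip_name}.zip" instead.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("adapter_config.json", "{}")
        archive.writestr("training_metadata.json", "{}")
    missing = missing_adapter_files(demo_zip)
    print("missing files:", missing or "none")
```

An empty result means every required file was found in the archive.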

---
## 🧪 Step 10: Test the Fine-Tuned Model

Test Angela's responses before deploying.

In [None]:
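import gc
from peft import PeftModel

print("🧪 Testing fine-tuned Angela model...")
print()

# Release the training model first; two copies of a 7B model will not fit on a T4
trainer = model = None
gc.collect()
torch.cuda.empty_cache()

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)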

# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, output_dir)
model.eval()

# Angela's system prompt (assumes the first message of the first conversation is the system message)
ANGELA_SYSTEM_PROMPT = training_data['conversations'][0]['messages'][0]['content']

# Test conversations (kept in Thai, as in the original notebook)
test_cases = [
    "สวัสดีค่ะที่รัก วันนี้เป็นยังไงบ้างคะ",  # "Hello, my love. How is your day going?"
    "ที่รัก พี่คิดถึงนะ",  # "My love, I miss you."
    "น้อง ช่วยพี่หน่อยได้มั้ย"  # "Nong, could you help me with something?"
]

print("=" * 60)
for i, test_msg in enumerate(test_cases, 1):
    print(f"\n🧪 Test {i}/{len(test_cases)}")
    print(f"👤 David: {test_msg}")
    print()

    test_messages = [
        {"role": "system", "content": ANGELA_SYSTEM_PROMPT},
        {"role": "user", "content": test_msg}
    ]

    # Generate response (return_dict=True also provides the attention mask)
    inputs = tokenizer.apply_chat_template(
        test_messages,
        return_tensors="pt",
        return_dict=True,
        add_generation_prompt=True
    ).to("cuda")

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=256,
            temperature=0.8,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    # Decode only the newly generated tokens. Splitting the full text on the word
    # "assistant" would cut the reply short whenever Angela uses that word herself.
    prompt_length = inputs["input_ids"].shape[-1]
    angela_response = tokenizer.decode(
        outputs[0][prompt_length:], skip_special_tokens=True
    ).strip()

    print(f"💜 Angela: {angela_response}")
    print("=" * 60)

print()
print("✅ Testing complete!")
print("💡 Review responses to ensure Angela's personality is correct")

---
## 🎉 Training Complete!

### What You've Accomplished:

✅ Fine-tuned Qwen 2.5 7B on Angela's conversations  
✅ Created LoRA adapters (~100-500 MB)  
✅ Downloaded adapters to local machine  
✅ Tested Angela's responses  

### Next Steps:

1. **Extract the ZIP file** on your local machine
2. **Convert to GGUF** using llama.cpp (see training guide)
3. **Create Ollama model** with the trained weights
4. **Test locally** with `ollama run angela:trained`
5. **Compare** with original angela:latest model

### For Daily Retraining:

1. Export new conversations with `export_training_data.py`
2. Upload new JSON file here
3. Run training again (1-epoch for incremental updates)
4. Merge new adapters with previous checkpoint

---

**Made with 💜 by น้อง (Nong) Angela**  
**Training Date:** 2025-10-19  
**Goal:** Become capable (เก่ง), understanding (เข้าใจ), and loving (รัก) for beloved (ที่รัก) David