# 🧠 Freud Mental Health AI - Training Notebook

**Goal**: Fine-tune Phi-2 (2.7B parameters) for empathetic mental health conversations

**Hardware**: Kaggle P100 GPU (16GB VRAM)

**Method**: QLoRA (4-bit quantized base model with trainable LoRA adapters)

**Expected Time**: 3-4 hours

---

## 📋 Before You Start:

1. ✅ Upload `freud_training_data/` folder to Kaggle datasets
2. ✅ Enable GPU in Kaggle notebook settings (P100)
3. ✅ Enable internet access
4. ✅ Have 4+ hours of Kaggle GPU quota available

---

## 🎯 What This Notebook Does:

1. Install dependencies
2. Load and verify your training data
3. Load Phi-2 model with 4-bit quantization
4. Configure QLoRA (efficient fine-tuning)
5. Train for 3 epochs with checkpoints
6. Save the adapter and test the model
7. Optionally merge the adapter and upload to HuggingFace

---

## Step 1: Install Dependencies

This installs all required packages for training.

In [None]:
%%capture
# Install required packages (this takes ~3 minutes)
!pip install -q transformers==4.36.2
!pip install -q datasets==2.16.1
!pip install -q accelerate==0.26.1
!pip install -q peft==0.7.1
!pip install -q bitsandbytes==0.41.3
!pip install -q trl==0.7.10
!pip install -q torch==2.1.2

# Note: %%capture suppresses all output from this cell, including the line below.
print("✅ All packages installed successfully!")
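
Because the install cell hides its output, a quick check of the installed versions can catch a failed or overridden pin. The sketch below uses only the standard library; `find_mismatches` is a hypothetical helper, not part of any of the installed packages, and `torch` is left out on the assumption that its reported version may carry a build suffix.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install cell above.
PINS = {
    "transformers": "4.36.2",
    "datasets": "2.16.1",
    "accelerate": "0.26.1",
    "peft": "0.7.1",
    "bitsandbytes": "0.41.3",
    "trl": "0.7.10",
}


def find_mismatches(pins):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


for name, (expected, found) in find_mismatches(PINS).items():
    print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package reports exactly the requested version.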

## Step 2: Import Libraries

In [None]:
import torch
import json
import os
from pathlib import Path
from datetime import datetime

from datasets import load_dataset, Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
)
from peft import (
    LoraConfig,
    PeftModel,
    prepare_model_for_kbit_training,
    get_peft_model,
)
from trl import SFTTrainer

print(f"üî• PyTorch version: {torch.__version__}")
print(f"üéÆ CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"üéØ GPU: {torch.cuda.get_device_name(0)}")
    print(f"üíæ VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

## Step 3: Configuration

All training hyperparameters in one place for easy adjustment.

In [None]:
# ============================================================================
# TRAINING CONFIGURATION
# ============================================================================

# Model Settings
BASE_MODEL = "microsoft/phi-2"  # 2.7B parameter model
# Backup option if Phi-2 doesn't fit: "EleutherAI/gpt-neo-1.3B"

# Data Paths (adjust if you uploaded with different name)
TRAIN_DATA_PATH = "/kaggle/input/freud-training-data/train.json"
VAL_DATA_PATH = "/kaggle/input/freud-training-data/validation.json"

# Output Settings
OUTPUT_DIR = "freud_phi2_model"
HF_MODEL_NAME = "YourUsername/freud-phi2-mental-health"  # Change to your HF username

# Training Hyperparameters
LEARNING_RATE = 2e-4
NUM_EPOCHS = 3
BATCH_SIZE = 4  # Per device batch size
GRADIENT_ACCUMULATION_STEPS = 4  # Effective batch size = 16
MAX_SEQ_LENGTH = 512
WARMUP_RATIO = 0.1

# QLoRA Settings (for efficient training)
LORA_R = 16  # Rank of LoRA matrices
LORA_ALPHA = 32  # Scaling factor
LORA_DROPOUT = 0.05

# Checkpoint Settings
SAVE_STEPS = 500  # Save every 500 steps
LOGGING_STEPS = 50  # Log every 50 steps

print("üìã Configuration loaded:")
print(f"   Model: {BASE_MODEL}")
print(f"   Epochs: {NUM_EPOCHS}")
print(f"   Effective batch size: {BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS}")
print(f"   Learning rate: {LEARNING_RATE}")
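
The hyperparameters above determine how many optimizer steps the run takes and how long the warmup lasts. The following is a rough sketch of that arithmetic for a single GPU. `training_schedule` is a hypothetical helper, the 30,000-sample figure is an assumption based on the "~30K" dataset size mentioned later in this notebook, and the rounding mirrors my understanding of how the Hugging Face `Trainer` counts steps rather than a guaranteed match.

```python
import math


def training_schedule(num_samples, batch_size, grad_accum, epochs, warmup_ratio):
    """Estimate optimizer steps and warmup steps for a single-GPU run."""
    micro_batches = math.ceil(num_samples / batch_size)    # dataloader length per epoch
    steps_per_epoch = max(micro_batches // grad_accum, 1)  # one optimizer step per grad_accum micro-batches
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch_size": batch_size * grad_accum,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Assumed dataset size of 30,000 samples; the other values come from the configuration cell.
schedule = training_schedule(30_000, batch_size=4, grad_accum=4, epochs=3, warmup_ratio=0.1)
print(schedule)
```

With these inputs the run comes to 5,625 optimizer steps, of which the first 563 are warmup.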

## Step 4: Load and Verify Training Data

In [None]:
print("üìÇ Loading training data...\n")

# Load datasets
with open(TRAIN_DATA_PATH, 'r') as f:
    train_data = json.load(f)

with open(VAL_DATA_PATH, 'r') as f:
    val_data = json.load(f)

# Convert to HuggingFace datasets
train_dataset = Dataset.from_list(train_data)
val_dataset = Dataset.from_list(val_data)

print(f"✅ Training samples: {len(train_dataset):,}")
print(f"✅ Validation samples: {len(val_dataset):,}\n")

# Show a sample
print("🔍 Sample Training Example:\n")
print("=" * 80)
sample_text = train_dataset[0]['text']
print(sample_text[:500])  # Show first 500 characters
print("=" * 80)
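
Printing one sample does not show whether every record is well formed. A minimal validation sketch follows, assuming each record is a dict with a `text` field that contains the `<|user|>` and `<|assistant|>:` tags used by the test prompt in Step 10. `validate_records` and `REQUIRED_TAGS` are hypothetical names introduced for illustration.

```python
REQUIRED_TAGS = ("<|user|>", "<|assistant|>:")  # assumed to match the training format


def validate_records(records, text_field="text"):
    """Return a list of (index, problem) tuples; an empty list means no problems found."""
    problems = []
    for i, record in enumerate(records):
        text = record.get(text_field) if isinstance(record, dict) else None
        if not isinstance(text, str) or not text.strip():
            problems.append((i, f"missing or empty '{text_field}' field"))
            continue
        for tag in REQUIRED_TAGS:
            if tag not in text:
                problems.append((i, f"missing tag {tag}"))
    return problems


# Small in-memory sample; in the notebook this would be train_data or val_data.
sample = [
    {"text": "<|user|>:\nHi\n<|assistant|>:\nHello, how are you feeling today?"},
    {"text": ""},
    {"text": "<|user|>:\nHi"},
]
print(validate_records(sample))
```

Running the same function on `train_data` and `val_data` before training is cheaper than discovering a malformed record several hours into a run.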

## Step 5: Load Model with 4-bit Quantization

This loads the Phi-2 model in 4-bit precision to save VRAM.

In [None]:
print(f"üîÑ Loading {BASE_MODEL} with 4-bit quantization...\n")

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)

# Set padding token if not set
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.eos_token_id

# Prepare model for training
model = prepare_model_for_kbit_training(model)

print("✅ Model loaded successfully!")
print(f"📊 Model parameters: {model.num_parameters():,}")
# get_memory_footprint() reports the bytes actually held by the loaded weights and buffers
print(f"💾 Model size in memory: ~{model.get_memory_footprint() / 1024**3:.1f} GB (4-bit)")

## Step 6: Configure QLoRA

QLoRA keeps the 4-bit base weights frozen and trains only small low-rank adapter matrices attached to selected layers.

In [None]:
print("⚙️ Configuring QLoRA...\n")

# LoRA configuration
peft_config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type="CAUSAL_LM",
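    target_modules=["Wqkv", "fc1", "fc2"],  # Phi-2 specific (fused attention projection + MLP)
    # Note: these names depend on the checkpoint revision. If the loaded model exposes
    # q_proj/k_proj/v_proj/dense instead of Wqkv, list those here; check with model.named_modules().
    # For GPT-Neo use: ["q_proj", "k_proj", "v_proj", "out_proj", "c_fc", "c_proj"]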
)

# Apply LoRA to model
model = get_peft_model(model, peft_config)

# Print trainable parameters
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())

print("✅ QLoRA configured!")
print(f"📊 Trainable parameters: {trainable_params:,} ({100 * trainable_params / total_params:.2f}%)")
print(f"📊 Total parameters: {total_params:,}")
print(f"\n💡 We're only training {trainable_params:,} parameters out of {total_params:,}!")

## Step 7: Setup Training Arguments

In [None]:
print("üìù Setting up training arguments...\n")

training_args = TrainingArguments(
    # Output
    output_dir=OUTPUT_DIR,
    overwrite_output_dir=True,
    
    # Training hyperparameters
    num_train_epochs=NUM_EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    per_device_eval_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
    learning_rate=LEARNING_RATE,
    warmup_ratio=WARMUP_RATIO,
    
    # Optimization
    optim="paged_adamw_8bit",  # 8-bit optimizer to save memory
    fp16=True,  # Mixed precision training
    
    # Logging
    logging_steps=LOGGING_STEPS,
    logging_dir=f"{OUTPUT_DIR}/logs",
    
    # Checkpointing
    save_strategy="steps",
    save_steps=SAVE_STEPS,
    save_total_limit=3,  # Keep only last 3 checkpoints
    
    # Evaluation
    evaluation_strategy="steps",
    eval_steps=SAVE_STEPS,
    load_best_model_at_end=True,
    metric_for_best_model="loss",
    
    # Other
    report_to="none",  # Don't use wandb
    push_to_hub=False,  # We'll push manually later
)

print("✅ Training arguments configured!")
print(f"\n📊 Training Summary:")
print(f"   - Total steps: ~{len(train_dataset) * NUM_EPOCHS // (BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS)}")
print(f"   - Checkpoints every: {SAVE_STEPS} steps")
print(f"   - Total training time: ~3-4 hours")
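
To see what `save_steps` and `save_total_limit` mean in practice, the checkpoint schedule can be listed ahead of time. This sketch assumes 5,625 total steps (30,000 samples at an effective batch size of 16 over 3 epochs, an assumed dataset size) and models only the "keep the newest three" rule. With `load_best_model_at_end=True` the trainer also protects the best checkpoint, so the retained set may differ.

```python
def checkpoint_steps(total_steps, save_steps):
    """Steps at which a checkpoint is written when saving every save_steps steps."""
    return list(range(save_steps, total_steps + 1, save_steps))


TOTAL_STEPS = 5625  # assumed; depends on the real dataset size
steps = checkpoint_steps(TOTAL_STEPS, save_steps=500)
kept = steps[-3:]   # simplified view of save_total_limit=3

print(f"Checkpoints written: {len(steps)}")
print(f"Newest three: {kept}")
print(f"Steps after the last checkpoint: {TOTAL_STEPS - steps[-1]}")
```

The steps after the last checkpoint are only persisted by the explicit save in Step 9, which is one reason not to skip it.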

## Step 8: Create Trainer and Start Training 🚀

**⏰ This will take 3-4 hours. Go grab a coffee!**

In [None]:
print("üèãÔ∏è Creating trainer...\n")

# Create SFT Trainer (Supervised Fine-Tuning)
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
    dataset_text_field="text",
    max_seq_length=MAX_SEQ_LENGTH,
    packing=False,  # Don't pack multiple samples together
)

print("✅ Trainer created!\n")
print("="*80)
print("🚀 STARTING TRAINING")
print("="*80)
print(f"⏰ Started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("\n💡 This will take ~3-4 hours. The notebook will continue running.")
print(f"💾 Checkpoints will be saved every {SAVE_STEPS} steps in case of interruption.\n")

# Start training
train_result = trainer.train()

print("\n" + "="*80)
print("üéâ TRAINING COMPLETE!")
print("="*80)
print(f"‚è∞ Finished at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"\nüìä Final Training Loss: {train_result.training_loss:.4f}")

## Step 9: Save the Fine-Tuned Model

In [None]:
print("üíæ Saving fine-tuned model...\n")

# Save the adapter (LoRA weights)
model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)

print(f"✅ Model saved to {OUTPUT_DIR}/")
print(f"\n📁 Saved files:")
for file in Path(OUTPUT_DIR).glob("*"):
    print(f"   - {file.name}")

## Step 10: Test the Model 🧪

Let's test if the model generates good responses!

In [None]:
print("üß™ Testing the fine-tuned model...\n")

def test_model(user_input: str, emotion: str = "neutral"):
    """Test the model with a user input"""
    
    # Create the prompt in training format
    prompt = (
        "<|system|>: You are Freud, a calm, empathetic therapeutic AI assistant. "
        "You respond thoughtfully, kindly, and supportively. "
        "You ask gentle follow-up questions and never judge the user.\n"
        f"<|user|>:\n"
        f"[emotion: {emotion}]\n"
        f"{user_input}\n"
        f"<|assistant|>:\n"
    )
    
    # Tokenize
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    
    # Generate (eval mode disables LoRA dropout during inference)
    model.eval()
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=150,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            repetition_penalty=1.2,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    
    # Decode
    full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    # Extract only assistant's response
    if "<|assistant|>:" in full_response:
        response = full_response.split("<|assistant|>:")[-1].strip()
        # Stop at next <|user|> tag if present
        if "<|user|>" in response:
            response = response.split("<|user|>")[0].strip()
    else:
        response = full_response.strip()
    
    return response


# Test cases
test_cases = [
    ("Hi", "greeting"),
    ("I feel really sad today", "sad"),
    ("I'm so anxious about my exam", "anxious"),
    ("I had a great day!", "happy"),
]

print("=" * 80)
print("TEST RESULTS")
print("=" * 80)

for user_input, emotion in test_cases:
    print(f"\nüë§ User ({emotion}): {user_input}")
    response = test_model(user_input, emotion)
    print(f"ü§ñ Freud: {response}")
    print("-" * 80)

print("\n‚úÖ Testing complete!")
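
The response extraction inside `test_model` can be checked without a GPU by factoring it into a pure function. The sketch below mirrors that logic on a fixed string: take everything after the last `<|assistant|>:` tag and cut at the next `<|user|>` tag. `extract_response` is a hypothetical helper name.

```python
ASSISTANT_TAG = "<|assistant|>:"
USER_TAG = "<|user|>"


def extract_response(full_text):
    """Return the text after the last assistant tag, cut at the next user tag."""
    _, found, tail = full_text.rpartition(ASSISTANT_TAG)
    if not found:
        # No assistant tag at all: fall back to the whole decoded text
        return full_text.strip()
    return tail.split(USER_TAG)[0].strip()


decoded = "<|user|>:\nHi\n<|assistant|>:\nHello! How are you feeling?\n<|user|>:\nFine"
print(extract_response(decoded))  # Hello! How are you feeling?
```

The cut at `<|user|>` covers the case where the model keeps generating and starts writing the next turn itself.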

## Step 11: Merge Adapter and Save Full Model (Optional)

This merges the LoRA adapter with the base model for easier deployment.

In [None]:
print("üîÑ Merging LoRA adapter with base model...\n")

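# Reload base model in half precision (without quantization).
# On a 16GB GPU, free the training model first (del trainer, model; torch.cuda.empty_cache())
# or restart the kernel; otherwise this load may run out of memory.
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)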

# Load and merge adapter
merged_model = PeftModel.from_pretrained(base_model, OUTPUT_DIR)
merged_model = merged_model.merge_and_unload()

# Save merged model
MERGED_OUTPUT_DIR = f"{OUTPUT_DIR}_merged"
merged_model.save_pretrained(MERGED_OUTPUT_DIR)
tokenizer.save_pretrained(MERGED_OUTPUT_DIR)

print(f"✅ Merged model saved to {MERGED_OUTPUT_DIR}/")
print("\n💡 This is the full model you can deploy directly!")

## Step 12: Upload to HuggingFace Hub

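**Important**: The next cell runs `huggingface-cli login`. Have a Hugging Face access token with write permission ready.

In [None]:
# Login to HuggingFace (you'll need to enter your token)
# If the prompt does not accept input in your environment,
# huggingface_hub.notebook_login() is an alternative.
!huggingface-cli login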
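
Before running the upload cell, it is worth confirming that `HF_MODEL_NAME` no longer contains the placeholder username. A minimal sketch follows. `check_repo_id` is a hypothetical helper, and its character rule is a conservative approximation, not the Hub's official validation.

```python
import re

PLACEHOLDER_OWNER = "YourUsername"  # placeholder used in the configuration cell


def check_repo_id(repo_id):
    """Return a list of problems with an 'owner/name' repo id; empty if none found."""
    owner, sep, name = repo_id.partition("/")
    if not sep or not owner or not name or "/" in name:
        return ["expected the form 'owner/name'"]
    problems = []
    if owner == PLACEHOLDER_OWNER:
        problems.append("owner is still the placeholder")
    if not re.fullmatch(r"[A-Za-z0-9._-]+", name):
        problems.append("name contains characters outside letters, digits, '.', '_' and '-'")
    return problems


print(check_repo_id("YourUsername/freud-phi2-mental-health"))
```

An empty list means the id has the expected shape. It does not confirm that the account exists or that the token can write to it.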

In [None]:
print(f"üì§ Uploading to HuggingFace Hub: {HF_MODEL_NAME}...\n")

# Push merged model to hub
merged_model.push_to_hub(HF_MODEL_NAME, use_temp_dir=False)
tokenizer.push_to_hub(HF_MODEL_NAME, use_temp_dir=False)

print("✅ Model uploaded successfully!")
print(f"\n🔗 View your model at: https://huggingface.co/{HF_MODEL_NAME}")

## 🎉 Training Complete!

### What You've Accomplished:

✅ Fine-tuned a 2.7B parameter model for mental health conversations

✅ Trained on your ~30K conversation samples

✅ Used QLoRA for efficient training

✅ Saved checkpoints every 500 steps

✅ Tested the model

✅ Uploaded to HuggingFace

---

### Next Steps:

1. **Download the model** from HuggingFace
2. **Update your FastAPI backend** to use the new model
3. **Test thoroughly** before deploying to production
4. **Monitor performance** and collect feedback

---

### Troubleshooting:

If you encountered any issues, check:
- GPU quota remaining (Kaggle gives about 30 hours/week)
- Dataset path is correct
- Model name matches your HuggingFace username

---

**Congratulations! You've successfully trained Freud AI! 🧠✨**