# 🤖 TinyLlama Personal Fine-tuning with LoRA

This notebook demonstrates how to fine-tune the TinyLlama-1.1B-Chat model using synthetic personal data to create a personalized AI assistant.

**What we'll do:**
1. Generate synthetic personal data for "John Doe"
2. Fine-tune TinyLlama using LoRA for memory efficiency
3. Create an interactive chatbot interface

**Expected runtime:** 15-30 minutes on Colab T4 GPU

---

## 📦 Section 1: Install Required Packages

In [None]:
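# Install required packages for model fine-tuning.
# Version constraints are quoted: without quotes, the shell reads ">=" as an output
# redirect, so pip ignores the constraint and installs whatever version it picks.
# Section 8 uses eval_strategy, which needs a recent transformers release (4.41.0 or later).
!pip install -q torch torchvision torchaudio
!pip install -q "transformers>=4.41.0"
!pip install -q "peft>=0.7.0"
!pip install -q datasets
!pip install -q accelerate
!pip install -q bitsandbytes
!pip install -q trl
!pip install -q gradio
!pip install -q numpy pandas

print("✅ All packages installed successfully!")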
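Because a version constraint in a `!pip` line is easy to lose silently, it can be worth confirming what actually got installed. The sketch below reads installed versions with the standard library's `importlib.metadata.version` and compares them with minimums. The `MINIMUMS` table, `parse_version` and `meets_minimum` are hypothetical helpers, and the parser is deliberately simplified (it ignores pre-release ordering; `packaging.version` handles the full rules).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Hypothetical minimums (assumption: eval_strategy needs transformers >= 4.41.0)
MINIMUMS = {"transformers": "4.41.0", "peft": "0.7.0"}

def parse_version(text: str) -> tuple:
    """Simplified parser: keep the leading integer of each dot-separated part,
    stop at the first part without leading digits (e.g. 'dev0'), pad to 3 parts."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def meets_minimum(installed: str, minimum: str) -> bool:
    """True if the installed version is at least the minimum (pre-release tags ignored)."""
    return parse_version(installed) >= parse_version(minimum)

for package, minimum in MINIMUMS.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    status = "OK" if meets_minimum(installed, minimum) else "TOO OLD"
    print(f"{package} {installed} (need >= {minimum}): {status}")
```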

## 🔧 Section 2: Import Libraries and Setup

In [None]:
import torch
import json
import pandas as pd
from datetime import datetime
from typing import List, Dict, Any
from datasets import Dataset
from transformers import (
    AutoTokenizer, 
    AutoModelForCausalLM, 
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling
)
from peft import LoraConfig, get_peft_model, TaskType
import gradio as gr
import warnings
warnings.filterwarnings("ignore")

# Check GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"🚀 Using device: {device}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

## 📝 Section 3: Generate Synthetic Personal Data

In [None]:
class PersonalDataGenerator:
    """Generates comprehensive synthetic personal data for a fictional person"""
    
    def __init__(self, person_name: str = "John Doe"):
        self.person_name = person_name
        self.training_data = []
    
    def generate_chat_logs(self) -> List[Dict[str, str]]:
        """Generate chat conversation data - EXPANDED"""
        return [
            {
                "prompt": f"What did {self.person_name} talk about with his friends yesterday?",
                "response": f"{self.person_name} discussed his new hiking trip plans with Sarah and mentioned he's been reading '1984' by George Orwell. He also talked about trying a new coffee shop downtown."
            },
            {
                "prompt": f"How does {self.person_name} usually respond when friends ask for advice?",
                "response": f"{self.person_name} is known for listening carefully first, then offering practical suggestions. He often says 'Have you considered...' and likes to share relevant experiences from his own life."
            },
            {
                "prompt": f"What's {self.person_name}'s texting style like?",
                "response": f"{self.person_name} uses proper punctuation in texts but isn't formal. He often uses 😊 and 👍 emojis, and frequently sends links to interesting articles he finds."
            },
            {
                "prompt": f"Tell me about {self.person_name}'s group chat conversations",
                "response": f"In group chats, {self.person_name} is the one who remembers everyone's birthdays and often suggests meetup plans. He shares hiking photos and occasionally sends philosophical questions that spark long discussions."
            },
            {
                "prompt": f"How does {self.person_name} communicate when he's excited about something?",
                "response": f"When {self.person_name} is excited, he uses more exclamation points and shares detailed stories. He loves to explain why something is fascinating to him and often says 'You have to check this out!'"
            },
            {
                "prompt": f"What does {self.person_name} do when someone in the group is feeling down?",
                "response": f"{self.person_name} immediately offers to talk privately or suggests meeting for coffee. He's great at asking thoughtful questions like 'What's really bothering you?' and never rushes to give solutions."
            }
        ]
    
    def generate_social_media_posts(self) -> List[Dict[str, str]]:
        """Generate social media style content - EXPANDED"""
        return [
            {
                "prompt": f"What kind of posts does {self.person_name} share on social media?",
                "response": f"{self.person_name} mainly posts nature photography from his hikes, book recommendations, and thoughtful observations about technology and society. He rarely posts personal drama but loves sharing beautiful sunset photos."
            },
            {
                "prompt": f"How does {self.person_name} engage with others' social media posts?",
                "response": f"{self.person_name} thoughtfully comments on friends' posts, especially travel photos and book recommendations. He's quick to congratulate people on achievements and often shares helpful resources related to their interests."
            },
            {
                "prompt": f"What would {self.person_name} post about his weekend?",
                "response": f"Just spent an amazing morning hiking the Blue Ridge Trail! 🏔️ The sunrise was incredible. Also finished reading 'Sapiens' - highly recommend for anyone interested in human history. Coffee and good books make the perfect Sunday afternoon. #hiking #books #nature"
            },
            {
                "prompt": f"Describe {self.person_name}'s social media personality",
                "response": f"{self.person_name} is authentic and positive on social media. He avoids controversial topics but isn't afraid to share thoughtful perspectives on technology, environmental issues, and personal growth. His posts often inspire meaningful conversations."
            },
            {
                "prompt": f"What does {self.person_name} post when he discovers a new book?",
                "response": f"Just picked up 'The Power of Now' by Eckhart Tolle. Three pages in and already underlining passages! 📚✨ Anyone else read this? Would love to hear your thoughts. There's something about books that make you question everything you thought you knew."
            },
            {
                "prompt": f"How does {self.person_name} share his hiking experiences?",
                "response": f"Reached the summit just as the mist cleared - nature's perfect timing! 🌄 Reminded me why I love these solo adventures. Sometimes you need silence to hear your own thoughts clearly. The trail teaches patience in ways the city never could."
            }
        ]
    
    def generate_diary_entries(self) -> List[Dict[str, str]]:
        """Generate first-person diary/journal content - EXPANDED"""
        return [
            {
                "prompt": f"What does {self.person_name} write about in his personal journal?",
                "response": f"I've been reflecting on how technology shapes our relationships. Today's hike reminded me why I need to disconnect more often. The silence in nature helps me think clearly about what really matters - genuine connections with people and staying curious about the world."
            },
            {
                "prompt": f"How does {self.person_name} express his thoughts about personal growth?",
                "response": f"I'm learning that growth happens in small, daily choices. Reading has expanded my perspective so much this year - especially books about philosophy and human psychology. I want to be more intentional about how I spend my time and energy."
            },
            {
                "prompt": f"What are {self.person_name}'s private thoughts about his future?",
                "response": f"I'm excited about the possibilities ahead, but also mindful not to rush. I want to keep learning, maybe travel to places I've only read about, and find work that feels meaningful. Balance seems to be the key - ambition paired with presence."
            },
            {
                "prompt": f"How does {self.person_name} process difficult emotions in his journal?",
                "response": f"When I'm overwhelmed, writing helps me untangle my thoughts. I try to approach my feelings with curiosity rather than judgment. Nature walks and good conversations with close friends are my go-to strategies for working through challenges."
            },
            {
                "prompt": f"What does {self.person_name} write about after a meaningful conversation?",
                "response": f"Had such a deep conversation with Maria today about purpose and fulfillment. She asked me what I'd do if money wasn't a factor, and I realized I've been so focused on practical concerns that I've forgotten to dream. I want to make space for both - being responsible and staying open to possibilities."
            },
            {
                "prompt": f"How does {self.person_name} reflect on his reading habits?",
                "response": f"I notice I'm drawn to books that challenge my assumptions. Today I realized that every book I've loved has taught me something about human nature or made me question how I see the world. Reading isn't just entertainment for me - it's how I grow and stay connected to ideas bigger than my daily routine."
            }
        ]
    
    def generate_qa_prompts(self) -> List[Dict[str, str]]:
        """Generate Q&A style training data - EXPANDED"""
        return [
            {
                "prompt": f"What is {self.person_name}'s favorite book and why?",
                "response": f"{self.person_name}'s favorite book is 'Sapiens' by Yuval Noah Harari because it fundamentally changed how he thinks about human history and our place in the world. He loves books that combine storytelling with big ideas."
            },
            {
                "prompt": f"What does {self.person_name} think about artificial intelligence?",
                "response": f"{self.person_name} finds AI fascinating but believes it should enhance human capabilities rather than replace human connection. He's optimistic about AI's potential for solving complex problems while being mindful of ethical considerations."
            },
            {
                "prompt": f"What is {self.person_name}'s favorite weekend activity?",
                "response": f"{self.person_name} loves spending weekends hiking in the mountains. He finds that being in nature helps him recharge and gain perspective. He often combines hiking with photography and reading in scenic spots."
            },
            {
                "prompt": f"How does {self.person_name} approach learning new things?",
                "response": f"{self.person_name} is naturally curious and approaches learning through a combination of reading, hands-on practice, and conversations with knowledgeable people. He believes in learning from multiple perspectives before forming opinions."
            },
            {
                "prompt": f"What are {self.person_name}'s core values?",
                "response": f"{self.person_name} values authenticity, continuous learning, meaningful relationships, and environmental stewardship. He believes in being kind but honest, and in taking responsibility for his impact on others and the world."
            },
            {
                "prompt": f"What kind of music does {self.person_name} enjoy?",
                "response": f"{self.person_name} enjoys indie folk and ambient electronic music. He finds that music helps him focus while reading or working, and he often discovers new artists through friends' recommendations and music blogs."
            },
            {
                "prompt": f"How does {self.person_name} handle stress?",
                "response": f"{self.person_name} manages stress through nature walks, meditation, journaling, and talking with close friends. He's learned that acknowledging stress rather than ignoring it helps him address root causes more effectively."
            },
            {
                "prompt": f"What is {self.person_name}'s philosophy on work-life balance?",
                "response": f"{self.person_name} believes that work should be meaningful and allow time for personal relationships and hobbies. He prioritizes efficiency during work hours so he can be fully present during personal time. He sees work-life integration rather than strict separation."
            },
            {
                "prompt": f"What's {self.person_name}'s morning routine like?",
                "response": f"{self.person_name} starts his day with 10 minutes of meditation, then coffee while reading. He avoids checking his phone first thing in the morning, preferring to ease into the day mindfully. On weekends, he often goes for early morning hikes."
            },
            {
                "prompt": f"How does {self.person_name} choose what to read next?",
                "response": f"{self.person_name} keeps a running list of book recommendations from friends, podcasts, and articles. He tries to balance fiction and non-fiction, and often picks books that challenge his current thinking. He's not afraid to abandon a book if it's not engaging him."
            },
            {
                "prompt": f"What's {self.person_name}'s perspective on social media?",
                "response": f"{self.person_name} uses social media intentionally - to stay connected with friends and discover interesting content. He's mindful of not getting caught in endless scrolling and prefers quality interactions over quantity. He sees it as a tool, not a distraction."
            },
            {
                "prompt": f"What does {self.person_name} do when he feels stuck or uninspired?",
                "response": f"When {self.person_name} feels stuck, he goes for long walks without a destination, calls a friend he hasn't spoken to in a while, or picks up a book from a completely different genre. He's learned that inspiration often comes when he stops trying so hard to find it."
            }
        ]
    
    def generate_all_data(self) -> List[Dict[str, str]]:
        """Combine all data types into one dataset"""
        all_data = []
        all_data.extend(self.generate_chat_logs())
        all_data.extend(self.generate_social_media_posts())
        all_data.extend(self.generate_diary_entries())
        all_data.extend(self.generate_qa_prompts())
        
        print(f"✅ Generated {len(all_data)} training examples for {self.person_name}")
        return all_data

# Generate the expanded training data
generator = PersonalDataGenerator("John Doe")
training_data_dict = generator.generate_all_data()

# Display a sample
print("\n📋 Sample training data:")
for i, item in enumerate(training_data_dict[:3]):
    print(f"\n{i+1}. Prompt: {item['prompt']}")
    print(f"   Response: {item['response']}")

print(f"\n📊 Dataset expanded to {len(training_data_dict)} examples for better personalization!")
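The notebook trains on all 30 examples and Section 8 sets `eval_strategy="no"`, so there is no held-out measure of overfitting. A minimal sketch of holding a few examples back, assuming a fixed-seed shuffle is acceptable; `split_examples` is a hypothetical helper, and the toy list stands in for `training_data_dict`.

```python
import random

def split_examples(examples, val_fraction=0.2, seed=42):
    """Hypothetical helper: shuffle a copy of the examples with a fixed seed and
    return (train, validation) lists, keeping at least one validation example."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

# Toy stand-in for training_data_dict (30 prompt/response pairs)
toy_data = [{"prompt": f"question {i}", "response": f"answer {i}"} for i in range(30)]
train_items, val_items = split_examples(toy_data)
print(len(train_items), len(val_items))  # 24 6
```

In the notebook you would format and tokenize each part separately, then pass the validation part to `Trainer` as `eval_dataset` with `eval_strategy="epoch"`. With only 30 examples, holding out 6 costs a fifth of the training data, so this is a trade-off rather than a free check.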

## 🔄 Section 4: Format Data into Training Dataset

In [None]:
def format_instruction_data(data: List[Dict[str, str]], person_name: str = "John Doe") -> List[str]:
    """Format data into instruction-following format for TinyLlama"""
    formatted_data = []
    
    for item in data:
        # Use a chat template similar to TinyLlama's expected format (<|system|>, <|user|>, <|assistant|> turns, each ending in </s>)
        formatted_text = f"<|system|>\nYou are a helpful assistant that knows about {person_name}'s personality, preferences, and experiences.</s>\n<|user|>\n{item['prompt']}</s>\n<|assistant|>\n{item['response']}</s>"
        formatted_data.append(formatted_text)
    
    return formatted_data

# Format the data
formatted_data = format_instruction_data(training_data_dict)

print(f"✅ Formatted {len(formatted_data)} training examples")
print("\n📋 Sample formatted data:")
print(formatted_data[0][:200] + "...")
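At inference time the prompt has to follow the same layout as the training text but stop at an open assistant turn, so the model writes the reply. A minimal sketch, assuming you want to reproduce the `format_instruction_data` layout by hand; `build_chat_prompt` is a hypothetical helper. transformers also offers `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, which uses the template shipped with the tokenizer; compare its output with this layout before relying on either.

```python
def build_chat_prompt(messages, add_generation_prompt=True):
    """Render role/content messages as '<|role|>', a newline, the content and '</s>',
    joined by newlines (the layout used by format_instruction_data). With
    add_generation_prompt=True, end with an open assistant turn for generation."""
    text = "\n".join(f"<|{m['role']}|>\n{m['content']}</s>" for m in messages)
    if add_generation_prompt:
        text += "\n<|assistant|>\n"
    return text

system = "You are a helpful assistant that knows about John Doe's personality, preferences, and experiences."
prompt = build_chat_prompt([
    {"role": "system", "content": system},
    {"role": "user", "content": "What is John Doe's favorite weekend activity?"},
])
print(prompt)
```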

## 🤖 Section 5: Load and Configure Model

In [None]:
# Model configuration
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

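print(f"🔄 Loading {model_name}...")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Add pad token if it doesn't exist (reuses the EOS token)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load model with memory optimization: 8-bit weights via bitsandbytes.
# Recent transformers releases deprecate passing load_in_8bit directly to
# from_pretrained; BitsAndBytesConfig is the supported way to request it.
from transformers import BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # dtype for the layers that are not quantized
    device_map="auto",          # let accelerate place the weights on the GPU
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

print("✅ Model and tokenizer loaded successfully!")
print(f"Model parameters: {model.num_parameters():,}")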
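The reason for 8-bit loading is simple arithmetic: weight memory is parameter count times bytes per parameter. A back-of-the-envelope sketch, assuming roughly 1.1e9 parameters (the cell above prints the exact count). bitsandbytes quantizes the linear layers, while embeddings and typically the output head stay in 16-bit, so the real footprint sits somewhat above the int8 line; activations, optimizer state and CUDA overhead are not included.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1024**3 bytes, as elsewhere in this notebook)."""
    return num_params * bytes_per_param / 1024**3

# Assumption: about 1.1e9 parameters; model.num_parameters() gives the exact count
num_params = 1.1e9
for label, nbytes in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{label:>8}: ~{weight_memory_gb(num_params, nbytes):.2f} GB")
```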

## ⚙️ Section 6: Setup LoRA Configuration

In [None]:
# Configure LoRA for efficient fine-tuning with optimized parameters
print("⚙️ LORA CONFIGURATION PARAMETERS EXPLAINED:")
print("="*80)
print("""
🔧 UNDERSTANDING LORA PARAMETERS:

📊 RANK (r) - The adaptation capacity
• Sets the inner dimension of the low-rank update matrices (how many directions the update can use)
• Higher rank = more capacity to learn, but more parameters and memory
• Range: 4-128. Common values: 8, 16, 32, 64
• For personalization: 16-64 works well
• If model not learning enough: increase r (32→64)
• If overfitting: decrease r (32→16)
• Current setting: r=32 (good balance for personal data)

⚡ LORA_ALPHA - The scaling factor
• The LoRA update is multiplied by alpha / r, so alpha controls how strongly it changes the model
• Higher alpha = stronger fine-tuning effect
• Typical ratio: alpha = 2×r (e.g., r=32, alpha=64)
• Range: 8-128. A common rule of thumb is alpha ≥ r
• If model responses too generic: increase alpha (64→128)
• If model becoming unstable: decrease alpha (64→32)
• Current setting: alpha=64 (2×r, strong adaptation)

🛡️ LORA_DROPOUT - Regularization during training
• Applies dropout to the input of the LoRA branch, which helps prevent overfitting
• Range: 0.0-0.3. Lower = less regularization
• Common values: 0.05-0.1
• If overfitting: increase dropout (0.05→0.1)
• If underfitting: decrease dropout (0.1→0.05)
• Current setting: 0.05 (light regularization)

🎯 TARGET_MODULES - Which layers to adapt
• Determines which parts of the model get fine-tuned
• More modules = stronger adaptation but more memory
• Common choices:
  - Conservative: ["q_proj", "v_proj"] (attention only)
  - Balanced: ["q_proj", "v_proj", "k_proj", "o_proj"] (all attention)
  - Aggressive: + ["gate_proj", "up_proj", "down_proj"] (+ MLP layers)
• For strong personalization: include MLP layers
• Current setting: All attention + MLP (maximum adaptation)

🔗 BIAS - Whether to train bias terms
• "none": Don't train biases (most common, memory efficient)
• "all": Train all bias terms in the model
• "lora_only": Train only the biases of the layers LoRA adapts
• TinyLlama's projection layers have no bias terms, so this choice makes no practical difference here
• Current setting: "none"

📁 MODULES_TO_SAVE - Additional modules to train and save in full
• Lists non-LoRA modules that are trained fully and stored alongside the adapter
• Usually None for LoRA-only fine-tuning
• Set to ["embed_tokens", "lm_head"] if you add new tokens to the vocabulary
• Current setting: None (LoRA-only training)
""")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    
    # RANK - Adaptation capacity (4-128, higher = more learning capacity)
    r=32,  # Increased from 16 for better adaptation
    
    # ALPHA - Scaling factor (typically 2√ór, higher = stronger fine-tuning)
    lora_alpha=64,  # 2√ór ratio for strong adaptation
    
    # DROPOUT - Regularization (0.0-0.3, lower = less regularization)
    lora_dropout=0.05,  # Light regularization for small dataset
    
    # TARGET_MODULES - Which layers to adapt (more = stronger adaptation)
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",  # All attention layers
        "gate_proj", "up_proj", "down_proj"      # MLP layers for strong adaptation
    ],
    
    # BIAS - Bias adaptation strategy
    bias="none",  # Don't adapt bias (memory efficient)
    
    # TECHNICAL PARAMETERS (usually don't change these)
    fan_in_fan_out=False,  # True only for layers storing weights as (fan_in, fan_out), e.g. GPT-2's Conv1D
    modules_to_save=None,  # Extra non-LoRA modules to train fully and save with the adapter
)

# Apply LoRA to the model
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()

print("\n✅ Enhanced LoRA configuration applied!")
print("📈 Current settings explained:")
print(f"  • Rank (r=32): High capacity for learning John's personality")
print(f"  • Alpha (64): Strong fine-tuning effect (2× rank)")
print(f"  • Dropout (0.05): Light regularization for 30-example dataset")
print(f"  • Target modules: All attention + MLP layers (maximum adaptation)")
print(f"  • Bias: None (memory efficient)")

print(f"\n🔧 LoRA Fine-tuning Guide:")
print(f"  📊 MODEL NOT PERSONALIZING?")
print(f"    → Increase rank: r=32→64 (more adaptation capacity)")
print(f"    → Increase alpha: alpha=64→128 (stronger effect)")
print(f"    → Add more target_modules (if not using MLP already)")
print(f"")
print(f"  ⚠️  MODEL OVERFITTING?")
print(f"    → Increase dropout: lora_dropout=0.05→0.1")
print(f"    → Decrease rank: r=32→16 (less capacity)")
print(f"    → Decrease alpha: alpha=64→32 (weaker effect)")
print(f"")
print(f"  💾 MEMORY ISSUES?")
print(f"    → Decrease rank: r=32→16")
print(f"    → Remove MLP layers from target_modules")
print(f"    → Keep bias='none'")
print(f"")
print(f"  🎯 PARAMETER COMBINATIONS FOR DIFFERENT GOALS:")
print(f"    • Conservative: r=8, alpha=16, dropout=0.1, only attention")
print(f"    • Balanced: r=16, alpha=32, dropout=0.05, attention only")
print(f"    • Aggressive: r=64, alpha=128, dropout=0.05, attention+MLP ← For strong personalization")
print(f"    • Current: r=32, alpha=64, dropout=0.05, attention+MLP ← Good balance")

print("\n" + "="*80)
print("📊 TRAINING LOSS ANALYSIS & TARGET VALUES")
print("="*80)

print("""
🎯 UNDERSTANDING TRAINING LOSS:
Training loss measures how well the model predicts the next token in your training data.
Lower loss = better fit to the training data, but the absolute value depends on several factors.

📈 TYPICAL LOSS PROGRESSION FOR FINE-TUNING (rough guide):
• Initial Loss: 2.0-4.0 (model hasn't learned your data yet)
• Good Progress: Steady decrease over epochs
• Target Loss: 0.1-0.5 for small personal datasets
• Very low loss: <0.2 (strong fit to the training examples)

🔍 INTERPRETING TRAINING RESULTS:
Example output from a previous run of this notebook - [40/40 01:15, Epoch 5/5]:
• Step 5:  0.991 → Starting to learn
• Step 10: 0.913 → Good progress
• Step 15: 0.826 → Steady improvement
• Step 20: 0.533 → Large drop
• Step 25: 0.468 → Continued learning
• Step 30: 0.210 → Strong fit
• Step 35: 0.165 → Very strong fit
• Step 40: 0.155 → Final loss ✅

✅ LOSS ANALYSIS FOR THIS RUN:
A final loss of 0.155 shows the model has learned John's training examples closely.
Because there is no validation set, this number alone cannot distinguish
personalization from memorization; check answers to questions not in the training data.

🎯 LOSS TARGET RANGES (training loss):
• 0.50-1.00: Decent learning, may need more epochs
• 0.20-0.50: Good personalization, should work well
• 0.10-0.20: Strong fit ← THIS RUN
• <0.10:     Near-perfect fit, high risk of overfitting

⚠️  WARNING SIGNS IN LOSS:
• Loss not decreasing: Learning rate too low, increase to 1e-3
• Loss exploding (>5.0): Learning rate too high, decrease to 1e-4
• Loss plateaus early: Need more epochs or higher learning rate
• Loss jumps around: Batch size too small, increase gradient_accumulation_steps

🔧 OPTIMIZATION BASED ON LOSS:
• Loss >0.5 after 5 epochs: Increase learning_rate to 1e-3
• Loss <0.1 after 2 epochs: Risk of overfitting, add weight_decay=0.05
• Loss decreasing too slowly: Increase epochs to 7-10
• Loss unstable: Increase warmup, keeping it below the total number of optimizer steps

📚 LOSS vs PERSONALIZATION QUALITY:
• Loss 0.8-1.0: Generic responses, limited personalization
• Loss 0.3-0.7: Some personality traits learned
• Loss 0.1-0.3: Good personalization, clear personality ← THIS RUN
• Loss <0.1:   Strong personalization, may overfit on small datasets

🎉 CONCLUSION FOR THIS TRAINING RUN:
A final loss of 0.155 suggests the model has absorbed the training data well.
It should respond with John's specific preferences, hiking interests, reading
habits, and personality traits, at least for questions close to the 30 training
examples. Test it on new questions to judge how well this generalizes.
""")

print("="*80)
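LoRA keeps each targeted weight frozen and learns an update ΔW = (lora_alpha / r) · B · A, where A is r × in_features and B is out_features × r, so every adapted layer adds r · (in + out) trainable weights. The sketch below counts them for the presets listed above. The layer shapes are assumptions about TinyLlama's config (hidden size 2048, MLP size 5632, 22 layers, 4 key/value heads of size 64), so compare the result with what `print_trainable_parameters()` reports rather than taking it as authoritative.

```python
# Assumed TinyLlama-1.1B shapes: hidden 2048, MLP 5632, 22 decoder layers,
# 4 key/value heads of dimension 64 (grouped-query attention), so k/v project to 256.
HIDDEN, INTERMEDIATE, LAYERS, KV_DIM = 2048, 5632, 22, 4 * 64

# (in_features, out_features) of each projection in one decoder layer
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_trainable_params(r: int, target_modules) -> int:
    """Each adapted layer gets A (r x in) and B (out x r): r * (in + out) new weights."""
    per_layer = sum(r * (PROJECTIONS[name][0] + PROJECTIONS[name][1]) for name in target_modules)
    return per_layer * LAYERS

def lora_scaling(r: int, alpha: int) -> float:
    """Factor applied to B @ A before it is added to the frozen weight."""
    return alpha / r

ATTENTION = ["q_proj", "k_proj", "v_proj", "o_proj"]
for label, r, alpha, targets in [
    ("conservative", 8, 16, ["q_proj", "v_proj"]),
    ("balanced", 16, 32, ATTENTION),
    ("current", 32, 64, list(PROJECTIONS)),
    ("aggressive", 64, 128, list(PROJECTIONS)),
]:
    print(f"{label:>12}: r={r:<3} scale={lora_scaling(r, alpha):.1f} "
          f"trainable={lora_trainable_params(r, targets):,}")
```

Because every preset keeps alpha = 2×r, the scale stays at 2.0; what changes between presets is how many directions and layers the update can use.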

## 🔤 Section 7: Tokenize Dataset

In [None]:
def prepare_dataset(training_data: List[str], tokenizer, max_length: int = 512):
    """Tokenize and prepare dataset for training"""
    
    def tokenize_function(examples):
        # Tokenize the text - handle both single strings and lists
        texts = examples["text"]
        if isinstance(texts, str):
            texts = [texts]
        
        # Tokenize with proper settings
        tokenized = tokenizer(
            texts,
            truncation=True,
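            padding=True,  # Pad every example in a map() batch to the batch's longest sequence
            max_length=max_length,
            return_tensors=None
        )
        
        # For causal LM, labels start as a copy of input_ids (the model shifts them internally).
        # DataCollatorForLanguageModeling(mlm=False) in Section 8 rebuilds labels from input_ids
        # at batch time and sets pad-token positions to -100, so this copy is overwritten there.
        tokenized["labels"] = tokenized["input_ids"].copy()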
        return tokenized
    
    # Create dataset
    dataset = Dataset.from_dict({"text": training_data})
    
    # Tokenize the dataset
    tokenized_dataset = dataset.map(
        tokenize_function, 
        batched=True,
        remove_columns=dataset.column_names  # Remove original text column
    )
    
    print(f"✅ Dataset prepared with {len(tokenized_dataset)} examples")
    return tokenized_dataset

# Prepare the tokenized dataset
train_dataset = prepare_dataset(formatted_data, tokenizer)

# Show dataset info
print(f"\n📊 Dataset statistics:")
print(f"Total examples: {len(train_dataset)}")
print(f"Features: {train_dataset.features}")

# Show a sample of the tokenized data
print(f"\n📋 Sample tokenized data:")
sample = train_dataset[0]
print(f"Input IDs length: {len(sample['input_ids'])}")
print(f"Labels length: {len(sample['labels'])}")
print(f"Attention mask length: {len(sample['attention_mask'])}")
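One subtlety in how labels are built: with `mlm=False`, `DataCollatorForLanguageModeling` copies `input_ids` into `labels` and replaces every position whose id equals `tokenizer.pad_token_id` with -100, which the loss ignores. Section 5 falls back to `pad_token = eos_token`, and every formatted example ends its assistant turn with `</s>`, so whenever pad and EOS share an id the real end-of-turn tokens are masked too and the model gets no signal to stop. The sketch below mimics that rule in plain Python; `collator_style_labels` is hypothetical and the token ids assume the usual Llama vocabulary (0 = `<unk>`, 1 = `<s>`, 2 = `</s>`).

```python
def collator_style_labels(input_ids, pad_token_id):
    """Mimic the mlm=False rule: labels copy input_ids, and every position whose
    id equals pad_token_id becomes -100 (ignored by the cross-entropy loss)."""
    return [-100 if tok == pad_token_id else tok for tok in input_ids]

# Assumed ids following the usual Llama vocabulary
UNK_ID, BOS_ID, EOS_ID = 0, 1, 2
content = [BOS_ID, 100, 200, 300, EOS_ID]   # a short example ending in a real </s>

# Case 1: pad token == EOS, so padding reuses id 2 and the real </s> is masked as well
padded_with_eos = content + [EOS_ID, EOS_ID]
print(collator_style_labels(padded_with_eos, pad_token_id=EOS_ID))
# → [1, 100, 200, 300, -100, -100, -100]

# Case 2: pad token == <unk>, so only the padding is masked
padded_with_unk = content + [UNK_ID, UNK_ID]
print(collator_style_labels(padded_with_unk, pad_token_id=UNK_ID))
# → [1, 100, 200, 300, 2, -100, -100]
```

Check whether this applies by comparing `tokenizer.pad_token_id` with `tokenizer.eos_token_id` (the pad token is printed at the end of Section 8). A workaround often used with Llama-family tokenizers is `tokenizer.pad_token = tokenizer.unk_token` before tokenizing; treat it as something to test, not a verified fix for this notebook.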

## 🎯 Section 8: Configure Training Arguments

In [None]:
# Training arguments optimized for better personalization
output_dir = "./tinyllama-personal-lora"

training_args = TrainingArguments(
    output_dir=output_dir,
    
    # EPOCHS - How many times to go through the entire dataset
    # More epochs = more learning, but risk of overfitting
    # For small datasets: 3-10 epochs. For large datasets: 1-3 epochs
    # If model isn't learning: increase epochs
    # If model is memorizing/overfitting: decrease epochs
    num_train_epochs=5,  # Increased from 3 to 5 for better adaptation
    
    # BATCH SIZE - How many examples to process at once
    # Smaller batch = more gradient updates, better for small datasets
    # Larger batch = more stable gradients, faster training
    # For personalization: use smaller batches (1-4)
    # If GPU memory error: decrease batch size
    per_device_train_batch_size=1,  # Small for more frequent updates
    
    # GRADIENT ACCUMULATION - Simulate larger batch size without memory cost
    # Effective batch size = per_device_train_batch_size × gradient_accumulation_steps
    # Good effective batch sizes: 4-16 for small models
    # If training is unstable: increase this value
    gradient_accumulation_steps=4,  # Effective batch size = 1×4 = 4
    
    # WARMUP STEPS - Gradually increase learning rate at start
    # Prevents large updates that could destabilize training
    # Keep warmup well below the total number of optimizer steps. Here that total is only
    # about 40 (30 examples / effective batch of 4, rounded up, × 5 epochs), so with
    # warmup_steps=50 the learning rate is still ramping up when training ends and never
    # reaches learning_rate. warmup_ratio=0.1 (about 4 steps here) is one alternative.
    # If training loss spikes early: increase warmup
    warmup_steps=50,
    
    # LEARNING RATE - How big steps to take during optimization
    # Higher LR = faster learning but risk of instability
    # Lower LR = stable but slow learning
    # For fine-tuning: 1e-5 to 1e-3. For strong personalization: 5e-4 to 1e-3
    # If not learning: increase LR. If training explodes: decrease LR
    learning_rate=5e-4,  # Increased for stronger adaptation (was 2e-4)
    
    # WEIGHT DECAY - Regularization to prevent overfitting
    # Higher values = more regularization, less overfitting
    # Typical range: 0.01 to 0.1
    # If overfitting: increase weight_decay. If underfitting: decrease or remove
    weight_decay=0.01,  # Light regularization
    
    # LEARNING RATE SCHEDULER - How LR changes during training
    # "linear": decreases linearly to 0
    # "cosine": decreases in cosine curve (smoother)
    # "constant": stays the same
    # Cosine often works best for fine-tuning
    lr_scheduler_type="cosine",  # Smooth decay for better convergence
    
    # PRECISION - Trade memory for speed/accuracy
    # fp16=True: mixed precision; eligible operations run in float16, with loss scaling
    #            to avoid gradient underflow. Uses less memory and is faster on most GPUs
    # fp16=False: full float32 precision, more memory, slower
    # On GPUs without bfloat16 support (such as the Colab T4), fp16=True is the usual choice
    fp16=True,
    
    # LOGGING - How often to print training progress
    # Lower values = more frequent updates
    # Good range: 5-50 steps depending on dataset size
    logging_steps=5,  # Frequent logging for small dataset
    
    # SAVING STRATEGY - When to save model checkpoints
    # "epoch": save at end of each epoch
    # "steps": save every N steps
    # "no": don't save during training
    save_strategy="epoch",
    
    # EVALUATION STRATEGY - When to run validation (if you have eval data)
    # "epoch": evaluate at end of each epoch
    # "steps": evaluate every N steps  
    # "no": no evaluation during training
    eval_strategy="no",  # No validation set for this example
    
    # DATA LOADING
    remove_unused_columns=False,  # Keep all dataset columns instead of dropping those forward() doesn't accept
    dataloader_drop_last=True,    # Drop an incomplete final batch (no effect with batch size 1)
    
    # GRADIENT CLIPPING - Prevent exploding gradients
    # Clips gradients if their norm exceeds this value
    # Typical range: 0.5 to 2.0
    # If training becomes unstable: try 0.5 or 1.0
    max_grad_norm=1.0,  # Stability for small model fine-tuning
    
    # CHECKPOINT MANAGEMENT
    save_total_limit=2,        # Only keep 2 most recent checkpoints (saves disk space)
    load_best_model_at_end=False,  # Don't load best model (no eval data)
    
    # EXTERNAL INTEGRATIONS
    push_to_hub=False,         # Don't upload to Hugging Face Hub
    report_to="none",          # Disable wandb/tensorboard logging (None means "all installed integrations" in transformers 4.x)
    
    # METRICS (only relevant if doing evaluation)
    metric_for_best_model=None,
    greater_is_better=None,
)

# Data collator with proper padding
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,  # Causal LM (predict next token), not masked LM (fill blanks)
    pad_to_multiple_of=8,  # Pad sequences to multiples of 8 for GPU efficiency
)

print("✅ Enhanced training arguments configured!")
print(f"📊 Training configuration:")
print(f"  • Epochs: {training_args.num_train_epochs} (how many times through dataset)")
print(f"  • Learning rate: {training_args.learning_rate} (step size for optimization)")
print(f"  • Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
print(f"  • Warmup steps: {training_args.warmup_steps} (gradual LR increase)")
print(f"  • LR scheduler: {training_args.lr_scheduler_type} (how LR changes)")
print(f"  • Weight decay: {training_args.weight_decay} (regularization strength)")
print(f"  • Max grad norm: {training_args.max_grad_norm} (gradient clipping)")

print(f"\n🔧 Fine-tuning tips:")
print(f"  • Model not personalizing? → Increase epochs (7-10) or learning_rate (1e-3)")
print(f"  • Training unstable? → Decrease learning_rate (1e-4) or increase warmup (keep it below the total step count)")
print(f"  • Overfitting? → Increase weight_decay (0.05) or decrease epochs")
print(f"  • GPU memory error? → Reduce max_length in prepare_dataset (batch size is already 1)")
print(f"  • Training too slow? → Increase per_device_train_batch_size if memory allows (gradient accumulation adds no speed)")

print(f"\nOutput directory: {output_dir}")
print(f"Tokenizer pad token: {tokenizer.pad_token}")
print(f"Tokenizer pad token ID: {tokenizer.pad_token_id}")
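With `mlm=False`, `DataCollatorForLanguageModeling` pads each batch and copies `input_ids` into `labels`, replacing every position whose token ID equals the pad token ID with `-100` so the loss ignores it (the causal-LM model shifts the labels by one position internally). The pure-Python sketch below mimics that behaviour under stated assumptions (right padding, made-up token IDs); it is an illustration, not the library's code. It also shows why the pad-token printout above matters: masking is by token value, so if the pad token is the EOS token, genuine EOS tokens are masked too and the model gets no signal about when to stop.

```python
import math
from typing import Dict, List

IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def pad_and_label(batch: List[List[int]], pad_id: int, multiple_of: int = 8) -> Dict[str, List[List[int]]]:
    """Right-pad a batch and build causal-LM labels, mimicking mlm=False behaviour."""
    longest = max(len(seq) for seq in batch)
    # pad_to_multiple_of=8 rounds the padded length up to the next multiple of 8
    target_len = math.ceil(longest / multiple_of) * multiple_of

    input_ids, attention_mask, labels = [], [], []
    for seq in batch:
        n_pad = target_len - len(seq)
        padded = seq + [pad_id] * n_pad
        input_ids.append(padded)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
        # Masking is by token VALUE, not position: every token equal to pad_id is ignored
        labels.append([IGNORE_INDEX if tok == pad_id else tok for tok in padded])
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


# Hypothetical token IDs; pad_id=2 plays the role of an EOS token reused as padding
batch = pad_and_label([[1, 450, 338, 2], [1, 450, 2]], pad_id=2)
print(batch["input_ids"][0])       # padded up to a multiple of 8
print(batch["attention_mask"][0])  # the real EOS is still attended to...
print(batch["labels"][0])          # ...but its label is -100, so it is never trained on
```

If the printed pad token turns out to be `</s>`, one common option is a dedicated pad token (`tokenizer.add_special_tokens({"pad_token": ...})` followed by `model.resize_token_embeddings(len(tokenizer))`); whether that suits this notebook depends on how the dataset was tokenized earlier.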

## 🚀 Section 9: Fine-tune the Model

In [None]:
# Initialize trainer
trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)

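print("🚀 Starting training...")
print("This may take 15-30 minutes depending on your GPU.\n")

# Start training; the returned TrainOutput includes the average training loss
train_result = trainer.train()

print("\n✅ Training completed!")
print(f"📉 Average training loss over the run: {train_result.training_loss:.3f}")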
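How long `trainer.train()` runs, and how much of it is spent in warmup, follows from simple arithmetic on the training arguments printed earlier. The helper below is a rough single-GPU estimate, assuming one optimizer step per `gradient_accumulation_steps` batches with rounding up; the Trainer's exact rounding can differ slightly between transformers versions. The values passed in are hypothetical, not this notebook's settings.

```python
import math


def training_schedule(num_examples: int, per_device_batch: int, grad_accum: int,
                      epochs: int, warmup_steps: int, num_devices: int = 1) -> dict:
    """Approximate optimizer-step counts for a Trainer-style training loop."""
    effective_batch = per_device_batch * grad_accum * num_devices
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    # One optimizer step every grad_accum batches (at least one per epoch)
    steps_per_epoch = max(1, math.ceil(batches_per_epoch / grad_accum))
    total_steps = steps_per_epoch * epochs
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # Fraction of the run spent ramping up the learning rate (capped at 1.0)
        "warmup_fraction": min(1.0, warmup_steps / total_steps),
    }


# Hypothetical settings for illustration only
sched = training_schedule(num_examples=100, per_device_batch=2, grad_accum=4,
                          epochs=5, warmup_steps=100)
print(sched)
```

In this hypothetical setup the run has fewer optimizer steps than warmup steps, so the learning rate would never reach its peak; that is worth checking before applying the "increase warmup_steps" tip to a small dataset.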

## 💾 Section 10: Save Fine-tuned Model

In [None]:
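# Save the fine-tuned model. For a PEFT model, save_model() writes only the LoRA
# adapter (adapter config + adapter weights) to training_args.output_dir, not a
# full copy of TinyLlama.
trainer.save_model()
tokenizer.save_pretrained(output_dir)

print(f"✅ Model saved to {output_dir}")

# List saved files
import os
saved_files = os.listdir(output_dir)
print(f"\n📁 Saved files: {saved_files}")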
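Because `peft_model` wraps TinyLlama with LoRA adapters, the saved directory should contain adapter files (`adapter_config.json` plus `adapter_model.safetensors`, or `adapter_model.bin` with older PEFT versions) rather than full model weights. The helper below is a hypothetical utility, not part of any library, that classifies a directory from its file names alone. To reuse the adapter later it has to be loaded on top of the base model, for example with `PeftModel.from_pretrained(base_model, output_dir)`.

```python
from pathlib import Path


def describe_checkpoint(directory: str) -> str:
    """Guess what kind of checkpoint a directory holds, based only on file names."""
    names = {p.name for p in Path(directory).iterdir()}
    if "adapter_config.json" in names:
        return "LoRA adapter (load it on top of the base model)"
    has_weights = any(n.endswith((".safetensors", ".bin")) for n in names)
    if "config.json" in names and has_weights:
        return "full model"
    return "unrecognized"


# Usage in this notebook (hypothetical): print(describe_checkpoint(output_dir))
```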

## 🧹 Section 10.5: Memory Cleanup and Model Preparation

Before testing the fine-tuned model, let's clear GPU memory and prepare the model for inference to avoid device placement errors.

In [None]:
# Free unused GPU memory and prepare the fine-tuned model for inference
import gc

print("🧹 Cleaning up memory before inference...")

# Collect unreferenced Python objects first, so their CUDA tensors can be released
gc.collect()

# Then return cached, unused blocks to the GPU
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("✅ GPU memory cleared")
    print(f"💾 GPU memory allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
    print(f"💾 GPU memory reserved: {torch.cuda.memory_reserved() / 1024**3:.2f} GB")

# Evaluation mode disables dropout (including LoRA dropout)
peft_model.eval()
print("✅ Model set to evaluation mode")

# `device` was set in Section 2: cuda when available, otherwise cpu
target_device = device
print(f"🎯 Target device: {target_device}")

# 8-bit bitsandbytes weights are placed at load time and transformers refuses .to()
# on them, so only move the model when its base is not 8-bit quantized
base_is_8bit = getattr(peft_model.get_base_model(), "is_loaded_in_8bit", False)
if not base_is_8bit:
    peft_model = peft_model.to(target_device)
else:
    print("📍 Base model is 8-bit quantized; keeping the placement chosen at load time")

print(f"📍 Model device: {next(peft_model.parameters()).device}")
print(f"📍 Model dtype: {next(peft_model.parameters()).dtype}")

# Check whether any parameters ended up on a different device
try:
    base_model_device = next(peft_model.get_base_model().parameters()).device
    print(f"📍 Base model device: {base_model_device}")

    devices = {param.device for param in peft_model.parameters()}
    print(f"📍 All parameter devices: {devices}")

    if len(devices) > 1:
        print("⚠️  Warning: Model has parameters on multiple devices!")
        if not base_is_8bit:
            print("🔧 Attempting to consolidate to a single device...")
            peft_model = peft_model.to(target_device)

except Exception as e:
    print(f"⚠️  Could not check all device placements: {e}")

# Final verification
print(f"🔍 Final model device check: {next(peft_model.parameters()).device}")
print("🔍 The tokenizer runs on CPU; its output tensors are moved to the model's device before generation")

print("\n🎯 Ready for inference testing!")
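Section 11 loads a second, unquantized copy of TinyLlama next to the fine-tuned model, so it helps to know roughly what the weights alone cost. A back-of-the-envelope sketch, assuming about 1.1 billion parameters and ignoring activations, the KV cache, and CUDA overhead (real usage is therefore higher):

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


NUM_PARAMS = 1.1e9  # approximate TinyLlama-1.1B parameter count

for label, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{label}: ~{weight_memory_gib(NUM_PARAMS, nbytes):.2f} GiB")
```

By this estimate an fp16 comparison copy needs about 2 GiB for its weights, so on a 16 GB T4 the cleanup above is mostly about releasing memory still held from training.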

## 🧪 Section 11: Compare Base Model vs Fine-tuned Model

Let's test both the original base model and your fine-tuned LoRA model with the same prompts to see the difference in personalization.
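Both chatbots wrap each question in the same Zephyr-style layout used by TinyLlama-Chat: one `<|role|>` header line per turn, each turn closed by `</s>`, and an open `<|assistant|>` header for the model to complete. The small helper below is hypothetical (the cells in this section build the string inline), but it makes that structure explicit and extends it to multi-turn histories:

```python
from typing import Dict, List

SYSTEM_PROMPT = (
    "You are a helpful assistant that knows about John Doe's personality, "
    "preferences, and experiences."
)


def build_zephyr_prompt(messages: List[Dict[str, str]]) -> str:
    """Render each turn as a <|role|> header, its content and </s>, then open an assistant turn."""
    rendered = "".join(f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages)
    return rendered + "<|assistant|>\n"


prompt = build_zephyr_prompt([
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "How does John handle stress?"},
])
print(prompt)
```

For this model, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` should render a very similar string from the tokenizer's own template; comparing the two once is a cheap way to confirm that training and inference prompts match.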

In [None]:
# Load a separate, unquantized copy of the base model for comparison.
# get_peft_model() injected the LoRA layers into the original `model` in place,
# so a fresh copy is needed to see how TinyLlama behaves without fine-tuning.
print("🔄 Loading separate base model for comparison (without quantization)...")

base_model_for_comparison = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # no quantization_config, so weights load unquantized in fp16
    device_map="auto",
    trust_remote_code=True,
)

print("✅ Base model loaded successfully for comparison")

# System prompt shared by both chatbots (same template as the training data)
SYSTEM_PROMPT = (
    "You are a helpful assistant that knows about John Doe's personality, "
    "preferences, and experiences."
)


class BaseChatbot:
    """Interface for chatting with the original base model (no fine-tuning)"""

    error_label = "Base model"  # prefix for error messages returned by generate_response

    def __init__(self, model, tokenizer):
        self.model = model
        self.tokenizer = tokenizer
        # The model was already placed when it was loaded (device_map="auto" or PEFT),
        # so only record its device; calling .to() here would at best be a no-op and
        # is refused for 8-bit bitsandbytes models.
        self.device = next(model.parameters()).device
        print(f"💡 {type(self).__name__} initialized on device: {self.device}")

    def generate_response(self, prompt: str, max_length: int = 200, temperature: float = 0.8) -> str:
        """Generate a reply to `prompt`; `max_length` is the number of NEW tokens to generate."""
        try:
            # Zephyr-style chat template, identical for both models so the comparison is fair
            formatted_prompt = (
                f"<|system|>\n{SYSTEM_PROMPT}</s>\n"
                f"<|user|>\n{prompt}</s>\n"
                "<|assistant|>\n"
            )

            # tokenizer(...) returns input_ids AND attention_mask; move both to the model's device
            inputs = self.tokenizer(formatted_prompt, return_tensors="pt").to(self.device)
            prompt_length = inputs["input_ids"].shape[1]

            with torch.no_grad():
                outputs = self.model.generate(
                    **inputs,
                    max_new_tokens=max_length,
                    do_sample=True,
                    temperature=temperature,
                    top_p=0.9,
                    top_k=50,
                    repetition_penalty=1.15,
                    no_repeat_ngram_size=3,
                    pad_token_id=self.tokenizer.eos_token_id,
                    eos_token_id=self.tokenizer.eos_token_id,
                    use_cache=True,
                )

            # Decode only the newly generated tokens, so the prompt never needs stripping
            new_tokens = outputs[0][prompt_length:].cpu()
            response = self.tokenizer.decode(new_tokens, skip_special_tokens=True)

            # Role tags are not special tokens, so cut off any follow-up turn the model invents
            return response.split("<|user|>")[0].strip()

        except Exception as e:
            return f"{self.error_label} error: {e}"


class PersonalChatbot(BaseChatbot):
    """Interface for chatting with the fine-tuned LoRA model.

    Inherits the prompt template and generation settings from BaseChatbot, so
    both models are queried under identical conditions."""

    error_label = "LoRA model"

# Create both chatbot instances
print("🤖 Creating chatbot instances for comparison...")

# Base model chatbot (using the non-quantized copy loaded above)
try:
    base_chatbot = BaseChatbot(base_model_for_comparison, tokenizer)
    print("✅ Base model chatbot created successfully")
except Exception as e:
    print(f"❌ Failed to create base chatbot: {e}")
    # Fallback: reuse the fine-tuned PEFT model with its LoRA adapter switched off.
    # The original `model` is NOT a clean baseline: get_peft_model() injected the
    # LoRA layers into it in place, so it would answer like the fine-tuned model.
    print("🔄 Trying fallback: fine-tuned model with the LoRA adapter disabled...")
    try:
        class AdapterDisabledChatbot(BaseChatbot):
            """Base-model answers obtained by temporarily disabling the LoRA adapter"""

            def generate_response(self, *args, **kwargs):
                # PeftModel.disable_adapter() is a context manager that bypasses the adapter
                with self.model.disable_adapter():
                    return super().generate_response(*args, **kwargs)

        base_chatbot = AdapterDisabledChatbot(peft_model, tokenizer)
        print("✅ Fallback base model chatbot created successfully")
    except Exception as e2:
        print(f"❌ Fallback also failed: {e2}")
        base_chatbot = None

# LoRA fine-tuned model chatbot
try:
    lora_chatbot = PersonalChatbot(peft_model, tokenizer)
    print("✅ LoRA model chatbot created successfully")
except Exception as e:
    print(f"❌ Failed to create LoRA chatbot: {e}")
    lora_chatbot = None

# Test prompts designed to show personalization differences
test_prompts = [
    "What is John's favorite weekend activity?",
    "What does John think about artificial intelligence?", 
    "How does John handle stress?",
    "What kind of books does John like to read?",
    "What's John's morning routine like?",
    "How does John use social media?",
    "What does John do when he feels stuck or uninspired?",
    "Tell me about John's perspective on work-life balance"
]

print("\n" + "="*100)
print("🔍 MODEL COMPARISON: BASE vs FINE-TUNED")
print("="*100)
print("This comparison will show how fine-tuning affects the model's responses.")
print("Look for differences in personalization and specific details about John.")
print("="*100)

# Only run comparison if both chatbots were created successfully
if base_chatbot is not None and lora_chatbot is not None:
    for i, prompt in enumerate(test_prompts, 1):
        print(f"\n{'='*20} PROMPT {i} {'='*20}")
        print(f"ü§î Question: {prompt}")
        print(f"{'-'*60}")
        
        # Get base model response
        print("📰 BASE MODEL (Original TinyLlama):")
        try:
            base_response = base_chatbot.generate_response(prompt, max_length=150, temperature=0.8)
            print(f"   {base_response}")
        except Exception as e:
            print(f"   ❌ Base model error: {e}")
        
        print(f"{'-'*60}")
        
        # Get LoRA model response
        print("✨ FINE-TUNED MODEL (After LoRA training):")
        try:
            lora_response = lora_chatbot.generate_response(prompt, max_length=150, temperature=0.8)
            print(f"   {lora_response}")
        except Exception as e:
            print(f"   ❌ LoRA model error: {e}")
        
        print(f"{'='*80}")
else:
    print("❌ Could not create both chatbots for comparison.")
    print("💡 You can still test the fine-tuned model individually if it was created successfully.")
    
    if lora_chatbot is not None:
        print("\n🔍 TESTING FINE-TUNED MODEL ONLY:")
        print("="*50)
        
        for i, prompt in enumerate(test_prompts[:3], 1):  # Test first 3 prompts
            print(f"\n📝 Prompt {i}: {prompt}")
            print("✨ Fine-tuned Response:")
            try:
                response = lora_chatbot.generate_response(prompt, max_length=150, temperature=0.8)
                print(f"   {response}")
            except Exception as e:
                print(f"   ❌ Error: {e}")

print("\n🎯 WHAT TO LOOK FOR IN THE COMPARISON:")
print("📊 Base Model typically shows:")
print("   • Generic responses about AI assistants")
print("   • General advice without personal details")
print("   • No mention of John's specific interests (hiking, reading, etc.)")
print("   • Standard AI assistant language patterns")

print("\n✨ Fine-tuned Model should show:")
print("   • Specific mentions of John's hiking hobby")
print("   • References to books like 'Sapiens' or '1984'")
print("   • John's coffee and reading routine")
print("   • His thoughtful, nature-loving personality")
print("   • Specific details from training data")

print("\n📈 SUCCESS INDICATORS:")
print("   ✅ LoRA model mentions specific details from training data")
print("   ✅ Responses feel more personal and consistent")
print("   ✅ References to hiking, books, meditation, etc.")
print("   ✅ Different tone/style compared to base model")

print("\n📊 If both models give similar responses:")
print("   🔄 Training may need more epochs (try 7-10)")
print("   📈 Increase learning rate (try 1e-3)")
print("   ⚙️  Increase LoRA alpha (try 128)")
print("   📚 Add more training examples")
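The "increase LoRA alpha" tip works because LoRA adds a low-rank update scaled by `alpha / r`: the adapted weight is `W + (alpha / r) · B·A`, where `B` is `d_out × r` and `A` is `r × d_in`. A quick calculation, assuming rank `r = 16` and a 2048 × 2048 projection (TinyLlama-1.1B's hidden size is 2048); the notebook's actual `LoraConfig` values are set earlier and may differ:

```python
def lora_stats(d_in: int, d_out: int, r: int, alpha: float) -> dict:
    """Scaling factor and parameter counts for one LoRA-adapted linear layer."""
    lora_params = r * (d_in + d_out)  # A is r x d_in, B is d_out x r
    full_params = d_in * d_out        # the frozen weight W
    return {
        "scaling": alpha / r,
        "lora_params": lora_params,
        "fraction_of_layer": lora_params / full_params,
    }


for alpha in (64, 128):  # hypothetical alphas; compare with your LoraConfig
    stats = lora_stats(d_in=2048, d_out=2048, r=16, alpha=alpha)
    print(f"alpha={alpha}: scaling={stats['scaling']}, "
          f"trainable={stats['lora_params']:,} ({stats['fraction_of_layer']:.2%} of the layer)")
```

Doubling alpha at a fixed rank doubles the scale of the adapter's contribution, which behaves roughly like a larger learning rate for the LoRA weights; raising `r` instead adds capacity along with extra trainable parameters.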

## 🎨 Section 12: Create Gradio Interface

In [None]:
def create_gradio_interface(base_chatbot: BaseChatbot, lora_chatbot: PersonalChatbot):
    """Create a Gradio web interface comparing both models"""
    
    def compare_models(message, base_history, lora_history):
        # Get responses from both models
        base_response = base_chatbot.generate_response(message)
        lora_response = lora_chatbot.generate_response(message)
        
        # Update chat histories
        base_history.append([message, base_response])
        lora_history.append([message, lora_response])
        
        return "", base_history, lora_history
    
    def single_model_chat(message, history, use_lora=True):
        if use_lora:
            response = lora_chatbot.generate_response(message)
        else:
            response = base_chatbot.generate_response(message)
        history.append([message, response])
        return "", history
    
    with gr.Blocks(title="TinyLlama Personal Assistant Comparison", theme=gr.themes.Soft()) as demo:
        gr.Markdown("# 🤖 TinyLlama Model Comparison: Base vs Fine-tuned")
        gr.Markdown("Compare responses between the original model and your personalized fine-tuned version!")
        
        with gr.Tab("🔍 Side-by-Side Comparison"):
            gr.Markdown("### See how fine-tuning changes the model's responses")
            
            with gr.Row():
                with gr.Column():
                    gr.Markdown("#### 📰 Base Model (Original)")
                    base_chatbot_interface = gr.Chatbot(height=400, label="Base TinyLlama")
                
                with gr.Column():
                    gr.Markdown("#### ✨ Fine-tuned Model (Personalized)")
                    lora_chatbot_interface = gr.Chatbot(height=400, label="John's Personal AI")
            
            compare_msg = gr.Textbox(placeholder="Ask about John's preferences, habits, or personality...", label="Your Question")
            
            with gr.Row():
                compare_btn = gr.Button("🔍 Compare Both Models", variant="primary")
                clear_compare_btn = gr.Button("Clear Both Chats")
            
            # Comparison event handlers
            compare_msg.submit(
                compare_models, 
                [compare_msg, base_chatbot_interface, lora_chatbot_interface], 
                [compare_msg, base_chatbot_interface, lora_chatbot_interface]
            )
            compare_btn.click(
                compare_models, 
                [compare_msg, base_chatbot_interface, lora_chatbot_interface], 
                [compare_msg, base_chatbot_interface, lora_chatbot_interface]
            )
            clear_compare_btn.click(
                lambda: ([], [], ""), 
                outputs=[base_chatbot_interface, lora_chatbot_interface, compare_msg]
            )
            
            # Example prompts for comparison
            gr.Examples(
                examples=[
                    "What is John's favorite weekend activity?",
                    "How does John handle stress?",
                    "What does John think about AI?",
                    "What kind of books does John enjoy?",
                    "What's John's morning routine?",
                    "How does John use social media?",
                    "What does John do when feeling stuck?"
                ],
                inputs=compare_msg,
                label="Try these comparison questions:"
            )
        
        with gr.Tab("💬 Chat with Fine-tuned Model"):
            gr.Markdown("### Chat exclusively with your personalized John Doe AI")
            
            lora_only_chat = gr.Chatbot(height=400, label="John's Personal AI")
            lora_msg = gr.Textbox(placeholder="Chat with the fine-tuned model...", label="Your Message")
            
            with gr.Row():
                lora_send_btn = gr.Button("Send", variant="primary")
                lora_clear_btn = gr.Button("Clear Chat")
            
            # Fine-tuned model only event handlers
            lora_msg.submit(
                lambda msg, hist: single_model_chat(msg, hist, use_lora=True), 
                [lora_msg, lora_only_chat], 
                [lora_msg, lora_only_chat]
            )
            lora_send_btn.click(
                lambda msg, hist: single_model_chat(msg, hist, use_lora=True), 
                [lora_msg, lora_only_chat], 
                [lora_msg, lora_only_chat]
            )
            lora_clear_btn.click(lambda: ([], ""), outputs=[lora_only_chat, lora_msg])
        
        with gr.Tab("📊 About This Comparison"):
            gr.Markdown("""
            ## 🎯 What to Look For:
            
            ### 📰 Base Model Characteristics:
            - Generic responses about AI capabilities
            - General advice without personal context
            - No specific mentions of John's interests
            - Standard AI assistant language patterns
            
            ### ✨ Fine-tuned Model Improvements:
            - **Specific details**: Mentions of hiking, reading specific books like 'Sapiens'
            - **Personal habits**: Morning routines, coffee preferences, meditation
            - **Personality traits**: Thoughtful, nature-loving, curious about learning
            - **Consistent character**: Responses align with John's personality profile
            
            ## 📈 Training Success Indicators:
            ✅ **Excellent**: LoRA model gives detailed, personalized responses with specific references  
            ✅ **Good**: Some personalization visible, mentions general interests  
            ⚠️ **Needs work**: Both models give similar generic responses  
            
            ## 🔧 If Models Are Too Similar:
            - Increase training epochs (7-10)
            - Higher learning rate (1e-3)
            - Increase LoRA alpha (128)
            - Add more training examples
            
            ## 📊 Training Results:
            The original run of this notebook reached a final training loss of **0.155**; compare it with the loss logged during your own training run. On a small synthetic dataset, a low training loss shows that the model fits its examples, not that it generalizes, so judge personalization from the side-by-side answers.
            """)
        
        gr.Markdown("---")
        gr.Markdown("💡 **Training Info**: In the original run, the model was fine-tuned with LoRA on 30 synthetic examples about John Doe's personality and reached a training loss of 0.155; your numbers may differ.")
    
    return demo

# Create and launch the comparison interface (both chatbots from Section 11 are required)
if base_chatbot is None or lora_chatbot is None:
    print("❌ The comparison UI needs both chatbots; re-run Section 11 and check its errors.")
else:
    print("🎨 Creating comprehensive Gradio comparison interface...")
    demo = create_gradio_interface(base_chatbot, lora_chatbot)

    print("🔍 Use the 'Side-by-Side Comparison' tab to see the difference between models")
    print("💬 Use the 'Chat with Fine-tuned Model' tab for regular conversation")
    print("📊 Check the 'About This Comparison' tab for interpretation guidance")
    print("\nClick the public link printed below to access your model comparison from anywhere.")

    # share=True creates a temporary public URL. debug=True blocks this cell until the
    # server is stopped, so interrupt the cell before running the interactive testing cell.
    print("🚀 Launching Gradio comparison interface...")
    demo.launch(share=True, debug=True)
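`compare_models` and `single_model_chat` store history as `[user_message, bot_reply]` pairs, Gradio's older "tuples" format. Recent Gradio releases warn that this format is deprecated and recommend `gr.Chatbot(type="messages")`, which expects a list of `{"role": ..., "content": ...}` dicts. If your Gradio version shows that warning, a conversion like this sketch (the helper name is hypothetical) shows the shape the handlers would need to produce:

```python
from typing import Dict, List


def pairs_to_messages(pairs: List[List[str]]) -> List[Dict[str, str]]:
    """Convert [[user, bot], ...] chat history into role/content message dicts."""
    messages: List[Dict[str, str]] = []
    for user_text, bot_text in pairs:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": bot_text})
    return messages


history = [["How does John handle stress?", "<model reply>"]]
print(pairs_to_messages(history))
```

With `type="messages"`, the handlers would append one user dict and one assistant dict per exchange instead of a `[message, response]` pair.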

## 🎯 Interactive Testing Cell

In [None]:
# Interactive testing - run this cell to test custom prompts with both models
print("💬 Interactive Testing")

# Check which chatbots are available
has_base = globals().get("base_chatbot") is not None
has_lora = globals().get("lora_chatbot") is not None

available_options = []
if has_base:
    available_options.append("1. Base model (original TinyLlama)")
if has_lora:
    available_options.append("2. Fine-tuned model (personalized)")
if has_base and has_lora:
    available_options.append("3. Compare both models")

if not available_options:
    print("❌ No chatbots available. Please run Section 11 first to create the chatbot instances.")
else:
    print("Choose which model to test:")
    for option in available_options:
        print(option)
    print("Enter your choice, then ask questions (type 'quit' to stop):\n")

    model_choice = input("Choose model: ").strip()
    compare_both = model_choice == "3" and has_base and has_lora

    # Use display_name (not model_name) so the Hugging Face model ID stored in
    # `model_name` earlier in the notebook is not overwritten
    if compare_both:
        print("🔍 Comparing Both Models")
        active_chatbot = None
        display_name = "Both Models"
    elif model_choice == "1" and has_base:
        print("🔄 Using Base Model")
        active_chatbot = base_chatbot
        display_name = "Base TinyLlama"
    elif model_choice == "2" and has_lora:
        print("✨ Using Fine-tuned Model")
        active_chatbot = lora_chatbot
        display_name = "John's Personal AI"
    elif has_lora:
        # Invalid or unavailable choice: default to the fine-tuned model
        print("✨ Defaulting to Fine-tuned Model")
        active_chatbot = lora_chatbot
        display_name = "John's Personal AI"
    else:
        print("🔄 Defaulting to Base Model")
        active_chatbot = base_chatbot
        display_name = "Base TinyLlama"

    print(f"\n💬 Interactive Testing with {display_name}")
    print("Enter your questions below (type 'quit' to stop):\n")

    while True:
        user_input = input("You: ").strip()

        if user_input.lower() in ['quit', 'exit', 'stop']:
            print("Goodbye! 👋")
            break

        if not user_input:
            continue

        if compare_both:
            print("\n📰 Base Model Response:")
            try:
                print(f"   {base_chatbot.generate_response(user_input)}")
            except Exception as e:
                print(f"   ❌ Error: {e}")

            print("\n✨ Fine-tuned Model Response:")
            try:
                print(f"   {lora_chatbot.generate_response(user_input)}")
            except Exception as e:
                print(f"   ❌ Error: {e}")
            print()
        else:  # Single model
            try:
                response = active_chatbot.generate_response(user_input)
                print(f"{display_name}: {response}\n")
            except Exception as e:
                print(f"❌ Error: {e}\n")
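The same question can get a different answer each time because `generate_response` samples with `do_sample=True`, `temperature=0.8`, `top_k=50`, and `top_p=0.9`. The sketch below roughly mirrors how those three settings reshape a single next-token distribution (temperature scaling, then top-k, then top-p over the renormalized survivors). It is a simplified illustration, not the transformers implementation, and it ignores `repetition_penalty` and `no_repeat_ngram_size`.

```python
import math
import random
from typing import List, Optional, Tuple


def sample_next_token(logits: List[float], temperature: float = 0.8, top_k: int = 50,
                      top_p: float = 0.9, rng: Optional[random.Random] = None) -> Tuple[int, List[int]]:
    """Return (sampled token id, candidate ids that survived top-k and top-p)."""
    rng = rng or random.Random()

    # 1. Temperature: divide logits before softmax (<1 sharpens, >1 flattens)
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability

    # 2. Top-k: keep the k most likely tokens, then renormalize among them
    ranked = sorted(range(len(exps)), key=lambda i: exps[i], reverse=True)[:top_k]
    total = sum(exps[i] for i in ranked)
    probs = {i: exps[i] / total for i in ranked}

    # 3. Top-p: keep the smallest high-probability prefix whose mass reaches top_p
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # 4. Sample from the survivors in proportion to their probabilities
    token = rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
    return token, kept


token, candidates = sample_next_token([5.0, 4.0, 1.0, 0.0], temperature=1.0,
                                      top_k=3, top_p=0.9, rng=random.Random(0))
print(candidates, token)
```

Lowering `temperature` or `top_p` in `generate_response` makes answers more repeatable, which can make base-versus-fine-tuned comparisons easier to judge.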

---

## 🎉 Congratulations!

You've successfully:
- ✅ Generated synthetic personal data for John Doe
- ✅ Fine-tuned TinyLlama using LoRA for memory efficiency
- ✅ Created an interactive chatbot that knows about John's personality
- ✅ Built a Gradio web interface for easy interaction

### Next Steps:
1. **Expand the dataset**: Add more categories of personal data
2. **Experiment with parameters**: Try different LoRA configurations
3. **Add evaluation**: Create metrics to measure personalization quality
4. **Deploy**: Host your model on Hugging Face Spaces or other platforms

### 🔧 Customization Ideas:
- Change the person's name and characteristics
- Add new data categories (work history, family, hobbies)
- Fine-tune on your own personal data (with privacy considerations)
- Experiment with different base models

**Happy fine-tuning!** 🚀