# Fine-Tuning Gemma 3 1B Instruct: Complete Guide 🎯

Welcome to the comprehensive guide for fine-tuning Google's Gemma 3 1B Instruct model! This notebook will walk you through the entire process of customizing a pre-trained language model for your specific use case.

## What you'll learn:
- Understanding fine-tuning vs training from scratch
- Setting up the environment for different devices (CPU, CUDA, Apple Silicon)
- Loading and preparing the Gemma 3 1B Instruct model
- Creating and formatting training datasets
- Implementing LoRA (Low-Rank Adaptation) for efficient fine-tuning
- Training with different optimization techniques
- Evaluating and testing your fine-tuned model
- Saving and sharing your custom model

## Prerequisites:
- Python 3.9 or higher (recent `transformers` releases no longer support 3.8)
- At least 16GB of RAM (32GB+ recommended)
- GPU with 8GB+ VRAM (or Apple Silicon with 16GB+ unified memory)
- HuggingFace account and token for Gemma access
- Basic understanding of machine learning concepts

## Step 1: Understanding Fine-Tuning

Before we start, let's understand what fine-tuning means and why it's powerful:

### 🧠 **What is Fine-Tuning?**
Fine-tuning takes a pre-trained model and adapts it to your specific task or domain by training it on your custom dataset.

### 🎯 **Types of Fine-Tuning:**
- **Full Fine-Tuning**: Updates all model parameters (highest memory and compute cost, most flexible)
- **LoRA (Low-Rank Adaptation)**: Trains small low-rank adapter matrices while the base weights stay frozen (efficient, good quality)
- **Prompt Tuning**: Learns a small set of soft prompt embeddings while the model stays frozen (very efficient, task-specific)
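
To make the LoRA bullet concrete, here is a minimal pure-Python sketch of the idea, assuming a frozen weight matrix `W` and two small trainable matrices `A` and `B` whose product is scaled by `alpha / r`. The shapes and values are toy numbers chosen for illustration, and `matvec` and `lora_forward` are hypothetical helpers; the real implementation lives in the `peft` library and works on tensors.

```python
# Toy LoRA layer: y = W x + (alpha / r) * B (A x)
# W stays frozen; only A (r x d_in) and B (d_out x r) would be trained.

def matvec(matrix, vector):
    """Multiply a list-of-lists matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Frozen path plus the scaled low-rank update."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

d_in, d_out, r, alpha = 8, 8, 2, 4
W = [[0.1 * (i + j) for j in range(d_in)] for i in range(d_out)]  # frozen weights
A = [[0.5] * d_in for _ in range(r)]    # trainable (randomly initialised in practice)
B = [[0.0] * r for _ in range(d_out)]   # trainable, starts at zero
x = [float(k) for k in range(1, d_in + 1)]

# With B at zero the adapter contributes nothing, so training starts
# from the pre-trained model's exact behaviour.
assert lora_forward(W, A, B, x, alpha, r) == matvec(W, x)

full_params = d_out * d_in        # weights touched by full fine-tuning
lora_params = r * (d_in + d_out)  # weights touched by LoRA
print(f"full: {full_params} weights, LoRA: {lora_params} weights")
```

The saving grows with the layer size: the full count scales with `d_in * d_out`, the LoRA count only with `r * (d_in + d_out)`.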

### 💡 **Why Fine-Tune Gemma 3 1B Instruct?**
- Smaller model = faster training and inference
- Good performance for many tasks
- Fits on consumer hardware
- Already instruction-tuned for better baseline

### 📊 **Device Considerations:**
- **Apple Silicon (M1/M2/M3)**: Great for LoRA fine-tuning, unified memory advantage
- **NVIDIA GPUs**: Excellent for all types of fine-tuning
- **CPU Only**: Possible but slow, best for very small datasets
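
As a rough guide to why the device matters, the sketch below estimates training memory from three ingredients: the weights, the gradients, and the two moment buffers an Adam-style optimizer keeps per trainable parameter. It is a rule of thumb under stated assumptions (about 1B total parameters, about 13M trainable parameters for LoRA, 4-byte training state) and it ignores activations, which can dominate at long sequence lengths. `training_memory_gb` is a hypothetical helper, not a library function.

```python
def training_memory_gb(total_params, trainable_params,
                       weight_bytes=2, state_bytes=4):
    """Rough memory estimate in GB (1e9 bytes), ignoring activations.

    Weights: every parameter is stored once (weight_bytes each).
    Gradient + two Adam moments: 3 values per *trainable* parameter.
    """
    weights = total_params * weight_bytes
    grads_and_optimizer = trainable_params * state_bytes * 3
    return (weights + grads_and_optimizer) / 1e9

TOTAL = 1_000_000_000        # assumption: about 1B parameters
LORA_TRAINABLE = 13_000_000  # assumption: about 1.3% of the model

full = training_memory_gb(TOTAL, TOTAL, weight_bytes=4)
lora = training_memory_gb(TOTAL, LORA_TRAINABLE, weight_bytes=2)
print(f"Full fine-tuning (FP32 weights): ~{full:.1f} GB")
print(f"LoRA (FP16 base weights):        ~{lora:.1f} GB")
```

Under these assumptions most of the gap comes from the optimizer state, which is why LoRA is the usual choice on memory-constrained devices.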

## Step 2: Install Required Libraries

Let's install all the necessary libraries for fine-tuning:

In [1]:
# Install required libraries for fine-tuning
import subprocess
import sys

def install_package(package):
    """Install a package using pip"""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])

# Core libraries for fine-tuning
packages = [
    "transformers>=4.36.0",    # Gemma 3 needs transformers 4.50+; raise this pin if an older version is already installed
    "torch>=2.1.0",           # PyTorch with MPS support
    "datasets",               # For dataset handling
    "accelerate",             # For distributed training
    "peft",                   # For LoRA and other parameter-efficient methods
    "bitsandbytes",           # For quantization (if supported)
    "trl",                    # For training utilities
    "psutil",                 # For system monitoring
    "sentencepiece",          # For tokenization
    "protobuf",               # Required for some tokenizers
]

print("📦 Installing fine-tuning packages...")
print("⚠️  This may take several minutes")
print()

for package in packages:
    try:
        print(f"Installing {package}...")
        install_package(package)
        print(f"✅ {package} installed successfully")
    except Exception as e:
        print(f"❌ Failed to install {package}: {e}")
        if "bitsandbytes" in package:
            print("💡 bitsandbytes may not be available on Apple Silicon - this is OK")

print("\n🎉 Installation complete!")
print("\n💡 Note: Some packages may show warnings - this is normal")

📦 Installing fine-tuning packages...
⚠️  This may take several minutes

Installing transformers>=4.36.0...
✅ transformers>=4.36.0 installed successfully
Installing torch>=2.1.0...
✅ torch>=2.1.0 installed successfully
Installing datasets...
✅ datasets installed successfully
Installing accelerate...
✅ accelerate installed successfully
Installing peft...
✅ peft installed successfully
Installing bitsandbytes...
✅ bitsandbytes installed successfully
Installing trl...
✅ trl installed successfully
Installing psutil...
✅ psutil installed successfully
Installing sentencepiece...
✅ sentencepiece installed successfully
Installing protobuf...
✅ protobuf installed successfully

🎉 Installation complete!

## Step 3: Environment Setup and Device Detection

Let's set up our environment and detect the best device for training:

In [2]:
# Import all necessary libraries
import torch
from transformers import (
    AutoTokenizer, 
    AutoModelForCausalLM, 
    TrainingArguments, 
    Trainer,
    DataCollatorForLanguageModeling
)
import transformers  # Import the module itself to access __version__
from datasets import Dataset, load_dataset
from peft import LoraConfig, get_peft_model, TaskType, PeftModel
import psutil
import json
import os
import sys  # needed for sys.version below
import warnings
import time
from typing import Dict, List

# Suppress warnings for cleaner output
warnings.filterwarnings("ignore", category=UserWarning)
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# System information
print("🖥️  SYSTEM INFORMATION")
print("=" * 60)
print(f"Python version: {sys.version}")
print(f"PyTorch version: {torch.__version__}")
print(f"Transformers version: {transformers.__version__}")

# Memory information
memory = psutil.virtual_memory()
print(f"\n💾 MEMORY INFORMATION:")
print(f"Total RAM: {memory.total / (1024**3):.1f} GB")
print(f"Available RAM: {memory.available / (1024**3):.1f} GB")
print(f"RAM usage: {memory.percent:.1f}%")

# Device detection with detailed information
print(f"\n🚀 DEVICE DETECTION:")
if torch.cuda.is_available():
    device = "cuda"
    gpu_count = torch.cuda.device_count()
    print(f"✅ CUDA available with {gpu_count} GPU(s)")
    for i in range(gpu_count):
        gpu_name = torch.cuda.get_device_name(i)
        gpu_memory = torch.cuda.get_device_properties(i).total_memory / (1024**3)
        print(f"   GPU {i}: {gpu_name} ({gpu_memory:.1f} GB)")
    print(f"   CUDA version: {torch.version.cuda}")
    
elif torch.backends.mps.is_available():
    device = "mps"
    print(f"✅ Apple Silicon (MPS) available")
    print(f"   Unified memory: {memory.total / (1024**3):.1f} GB")
    print(f"   MPS is ideal for LoRA fine-tuning")
    
else:
    device = "cpu"
    print(f"⚠️  Using CPU only")
    print(f"   Training will be slower but still possible")
    print(f"   Consider using smaller batch sizes")

print(f"\n🎯 Selected device: {device}")

# Training recommendations based on device
print(f"\n📋 TRAINING RECOMMENDATIONS:")
if device == "cuda":
    print("   • Use LoRA or full fine-tuning")
    print("   • Batch size: 4-16 depending on GPU memory")
    print("   • Enable gradient checkpointing for larger models")
elif device == "mps":
    print("   • LoRA fine-tuning recommended")
    print("   • Batch size: 2-8 depending on memory")
    print("   • Use float16 precision")
else:
    print("   • LoRA fine-tuning only")
    print("   • Small batch size: 1-2")
    print("   • Consider using smaller dataset")

print("\n✅ Environment setup complete!")



🖥️  SYSTEM INFORMATION
Python version: 3.12.7 | packaged by Anaconda, Inc. | (main, Oct  4 2024, 08:22:19) [Clang 14.0.6 ]
PyTorch version: 2.8.0
Transformers version: 4.56.1

💾 MEMORY INFORMATION:
Total RAM: 24.0 GB
Available RAM: 12.7 GB
RAM usage: 47.1%

🚀 DEVICE DETECTION:
✅ Apple Silicon (MPS) available
   Unified memory: 24.0 GB
   MPS is ideal for LoRA fine-tuning

🎯 Selected device: mps

📋 TRAINING RECOMMENDATIONS:
   • LoRA fine-tuning recommended
   • Batch size: 2-8 depending on memory
   • Use float16 precision

✅ Environment setup complete!
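
The recommendations above mention float16. As background, the sketch below uses only the standard library's `struct` module (format code `e` is IEEE 754 half precision) to show the two limits that make pure float16 training delicate: a small maximum value and a coarse floor for tiny numbers. `to_half` and `fits_in_half` are hypothetical helpers written for this illustration.

```python
import struct

def to_half(value):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def fits_in_half(value):
    """Return False when the value overflows the float16 range."""
    try:
        struct.pack("<e", value)
        return True
    except OverflowError:
        return False

print(to_half(65504.0))       # the largest finite float16 value survives
print(fits_in_half(70000.0))  # anything much larger does not fit
print(to_half(1e-8))          # very small values are flushed to zero
print(to_half(0.1))           # only about three decimal digits are kept
```

Large activations can hit the upper limit and small gradients can hit the lower one, which is why float16 training normally relies on loss scaling, and why bfloat16 (same exponent range as float32, fewer mantissa bits) or float32 is often the safer choice where the hardware allows it.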


## Step 4: HuggingFace Authentication

Gemma models require authentication. Let's set up your HuggingFace token:

In [3]:
# HuggingFace Authentication Setup
print("🔐 HUGGINGFACE AUTHENTICATION")
print("=" * 50)

# Check if user is already logged in
from huggingface_hub import HfApi
try:
    api = HfApi()
    user_info = api.whoami()
    print(f"✅ Already authenticated as: {user_info['name']}")
    print(f"   Email: {user_info.get('email', 'Not provided')}")
    HF_TOKEN = True
except Exception:
    print("❌ Not authenticated with HuggingFace")
    HF_TOKEN = False

# If not authenticated, provide instructions
if not HF_TOKEN:
    print("\n🔑 TO ACCESS GEMMA MODELS:")
    print("1. Go to https://huggingface.co/settings/tokens")
    print("2. Create a new token with 'Read' permissions")
    print("3. Accept the Gemma license at: https://huggingface.co/google/gemma-3-1b-it")
    print("4. Run: huggingface-cli login")
    print("5. Paste your token when prompted")
    print("\n💡 Alternative: Set HF_TOKEN environment variable")
    print("   export HF_TOKEN=your_token_here")
    
    # Check for environment variable
    import os
    if 'HF_TOKEN' in os.environ:
        print("\n✅ Found HF_TOKEN in environment variables")
        HF_TOKEN = True
    else:
        print("\n⚠️  No HF_TOKEN found. Please authenticate before proceeding.")

if HF_TOKEN:
    print("\n🎉 Ready to proceed with Gemma model loading!")
else:
    print("\n⏹️  Please complete authentication before continuing.")

🔐 HUGGINGFACE AUTHENTICATION
✅ Already authenticated as: bobbinetor
   Email: petruolo95@gmail.com

🎉 Ready to proceed with Gemma model loading!
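
The cell above only checks whether an `HF_TOKEN` variable exists. For non-interactive runs, a minimal sketch of logging in from that variable might look like the following. It assumes `huggingface_hub` is installed and uses its `login(token=...)` function; `resolve_hf_token` and `login_from_env` are hypothetical helper names.

```python
import os

def resolve_hf_token(env=None):
    """Return the token stored in HF_TOKEN, or None if it is unset or blank."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return token or None

def login_from_env():
    """Log in with HF_TOKEN if it is set; return True when a login was attempted."""
    token = resolve_hf_token()
    if token is None:
        return False
    # Imported lazily so the helper above works without the package installed
    from huggingface_hub import login
    login(token=token)
    return True
```

Calling `login_from_env()` before loading the model avoids the interactive prompt; a `False` return means the variable was missing and one of the manual steps listed above is still needed.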


## Step 5: Load and Prepare the Gemma 3 1B Instruct Model

Now let's load the Gemma 3 1B Instruct model with proper configuration for fine-tuning:

In [4]:
# Load and prepare Gemma 3 1B Instruct model
MODEL_NAME = "google/gemma-3-1b-it"

print(f"🤖 LOADING GEMMA 3 1B INSTRUCT MODEL")
print("=" * 50)
print(f"Model: {MODEL_NAME}")
print(f"Device: {device}")

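# Memory optimization: load half-precision weights on GPU/MPS, full precision on CPU.
# This only sets the dtype of the loaded weights; Trainer mixed precision is chosen in Step 6.
torch_dtype = torch.float16 if device != "cpu" else torch.float32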

try:
    print("📥 Loading tokenizer...")
    tokenizer = AutoTokenizer.from_pretrained(
        MODEL_NAME,
        trust_remote_code=True,
        add_eos_token=True,
    )
    
    # Set pad token if not already set
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        tokenizer.pad_token_id = tokenizer.eos_token_id
    
    print("✅ Tokenizer loaded successfully")
    print(f"   Vocabulary size: {len(tokenizer)}")
    print(f"   Pad token: {tokenizer.pad_token}")
    print(f"   EOS token: {tokenizer.eos_token}")
    
    print("\n📥 Loading model...")
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch_dtype,
        trust_remote_code=True,
        device_map="auto" if device == "cuda" else None,
    )
    
    # Move model to device if not using device_map
    if device != "cuda":
        model = model.to(device)
    
    print("✅ Model loaded successfully")
    
    # Model information
    num_params = sum(p.numel() for p in model.parameters())
    print(f"\n📊 MODEL INFORMATION:")
    print(f"   Parameters: {num_params:,}")
    print(f"   Model size: ~{num_params * 2 / 1e9:.1f} GB (FP16)")
    print(f"   Device: {device}")
    print(f"   Data type: {torch_dtype}")
    
    # Memory usage check
    if device == "cuda":
        memory_used = torch.cuda.memory_allocated() / 1e9
        memory_total = torch.cuda.get_device_properties(0).total_memory / 1e9
        print(f"   GPU memory used: {memory_used:.1f}GB / {memory_total:.1f}GB")
    
    # Test tokenizer with a simple example
    print(f"\n🧪 TOKENIZER TEST:")
    test_text = "### Instruction:\nHello\n\n### Response:\n"
    test_tokens = tokenizer.encode(test_text)
    print(f"   Test text: {repr(test_text)}")
    print(f"   Tokens: {len(test_tokens)}")
    print(f"   Decoded back: {repr(tokenizer.decode(test_tokens))}")
    
except Exception as e:
    print(f"❌ Error loading model: {e}")
    print("💡 This might be due to:")
    print("   • HuggingFace authentication issues")
    print("   • Insufficient memory")
    print("   • Network connectivity")
    print("   • Missing model access permissions")

print(f"\n✅ Model setup complete!")

🤖 LOADING GEMMA 3 1B INSTRUCT MODEL
Model: google/gemma-3-1b-it
Device: mps
📥 Loading tokenizer...
✅ Tokenizer loaded successfully
   Vocabulary size: 262145
   Pad token: <pad>
   EOS token: <eos>

📥 Loading model...


`torch_dtype` is deprecated! Use `dtype` instead!


✅ Model loaded successfully

📊 MODEL INFORMATION:
   Parameters: 999,885,952
   Model size: ~2.0 GB (FP16)
   Device: mps
   Data type: torch.float16

🧪 TOKENIZER TEST:
   Test text: '### Instruction:\nHello\n\n### Response:\n'
   Tokens: 12
   Decoded back: '<bos>### Instruction:\nHello\n\n### Response:\n<eos>'

✅ Model setup complete!
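
The "Model size" line in the output is simply the parameter count multiplied by the bytes each value occupies. A small sketch of that arithmetic, using the parameter count reported above and assuming 1 GB = 1e9 bytes (`weights_size_gb` is a hypothetical helper):

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}

def weights_size_gb(num_params, dtype):
    """Size of the raw weights in GB; excludes activations and optimizer state."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

NUM_PARAMS = 999_885_952  # parameter count reported in the output above

for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: ~{weights_size_gb(NUM_PARAMS, dtype):.1f} GB")
```

This covers the weights only; training adds gradients, optimizer state, and activations on top.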


## Step 6: Setup LoRA Configuration (Optional: Customize Precision)

We'll use LoRA (Low-Rank Adaptation) for efficient fine-tuning. You can also customize device and precision settings here:

In [5]:
# Setup LoRA Configuration and Device/Precision Selection
print("🔧 SETTING UP LORA CONFIGURATION")
print("=" * 50)

# Check if model exists
if 'model' not in globals():
    print("❌ Error: 'model' variable not found!")
    print("💡 Please run Step 5 (Load Gemma 3 1B model) first")
else:
    # === DEVICE AND PRECISION CUSTOMIZATION ===
    print("⚙️ DEVICE AND PRECISION SELECTION:")
    print("Current auto-detected device:", device)
    
    # Allow user to override device selection
    FORCE_DEVICE = None  # Set to "cuda", "mps", or "cpu" to override auto-detection
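    FORCE_PRECISION = "fp32"  # "fp16", "bf16", or "fp32"; set to None for auto-selection by device
    # Note: these flags only feed the Trainer's mixed-precision settings (Step 9);
    # they do not recast the weights that were loaded in Step 5.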
    
    # Apply device override if specified
    if FORCE_DEVICE:
        device = FORCE_DEVICE
        print(f"🔄 Device overridden to: {device}")
    
    # Determine precision settings
    if FORCE_PRECISION:
        if FORCE_PRECISION == "fp16":
            use_fp16, use_bf16 = True, False
        elif FORCE_PRECISION == "bf16":
            use_fp16, use_bf16 = False, True
        else:  # fp32
            use_fp16, use_bf16 = False, False
        print(f"🔄 Precision overridden to: {FORCE_PRECISION}")
    else:
        # Auto-select precision based on device
        if device == "cuda":
            use_fp16, use_bf16 = True, False  # FP16 for CUDA
        elif device == "mps":
            use_fp16, use_bf16 = False, False  # FP32 for MPS (compatibility)
        else:
            use_fp16, use_bf16 = False, False  # FP32 for CPU
    
    print(f"✅ Selected configuration:")
    print(f"   Device: {device}")
    print(f"   Precision: {'FP16' if use_fp16 else 'BF16' if use_bf16 else 'FP32'}")
    
    # === LORA CONFIGURATION ===
    print(f"\n🎯 LORA PARAMETERS:")
    
    # LoRA parameters (customizable)
    LORA_R = 16        # Rank of adaptation (higher = more trainable parameters and capacity)
    LORA_ALPHA = 32    # Scaling numerator: the update is scaled by alpha / r (2x rank is a common choice)
    LORA_DROPOUT = 0.1 # Dropout on the LoRA branch (commonly 0.0-0.3)

    # Define target modules for LoRA (Gemma-specific)
    target_modules = [
        "q_proj",     # Query projection
        "k_proj",     # Key projection
        "v_proj",     # Value projection
        "o_proj",     # Output projection
        "gate_proj",  # Gate projection (MLP)
        "up_proj",    # Up projection (MLP)
        "down_proj"   # Down projection (MLP)
    ]

    # Create LoRA configuration
    peft_config = LoraConfig(
        r=LORA_R,
        lora_alpha=LORA_ALPHA,
        target_modules=target_modules,
        lora_dropout=LORA_DROPOUT,
        bias="none",
        task_type=TaskType.CAUSAL_LM,
    )

    print(f"   Rank (r): {LORA_R}")
    print(f"   Alpha: {LORA_ALPHA}")
    print(f"   Dropout: {LORA_DROPOUT}")
    print(f"   Target modules: {len(target_modules)} layers")

    # Apply LoRA to the model
    print(f"\n🔄 Applying LoRA to model...")
    try:
        model = get_peft_model(model, peft_config)
        print("✅ LoRA applied successfully!")
        
        # Print trainable parameters
        model.print_trainable_parameters()
        
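        # Calculate parameter efficiency.
        # Note: the "Memory reduction" figure printed below is the share of frozen parameters.
        # Gradient and optimizer memory shrink by roughly that share; the base weights still occupy memory.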
        total_params = sum(p.numel() for p in model.parameters())
        trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
        
        print(f"\n📊 PARAMETER EFFICIENCY:")
        print(f"   Total parameters: {total_params:,}")
        print(f"   Trainable parameters: {trainable_params:,}")
        print(f"   Percentage trainable: {100 * trainable_params / total_params:.2f}%")
        print(f"   Memory reduction: ~{(total_params - trainable_params) / total_params * 100:.1f}%")
        
    except Exception as e:
        print(f"❌ Error applying LoRA: {e}")
        print("💡 This might be due to model architecture or memory issues")

    print(f"\n✅ LoRA setup complete! Ready for efficient fine-tuning.")

🔧 SETTING UP LORA CONFIGURATION
⚙️ DEVICE AND PRECISION SELECTION:
Current auto-detected device: mps
🔄 Precision overridden to: fp32
✅ Selected configuration:
   Device: mps
   Precision: FP32

🎯 LORA PARAMETERS:
   Rank (r): 16
   Alpha: 32
   Dropout: 0.1
   Target modules: 7 layers

🔄 Applying LoRA to model...
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
✅ LoRA applied successfully!
trainable params: 13,045,760 || all params: 1,012,931,712 || trainable%: 1.2879

📊 PARAMETER EFFICIENCY:
   Total parameters: 1,012,931,712
   Trainable parameters: 13,045,760
   Percentage trainable: 1.29%
   Memory reduction: ~98.7%

✅ LoRA setup complete! Ready for efficient fine-tuning.
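
The trainable-parameter count above can be reproduced by hand: each targeted linear layer gains an `r x in_features` matrix and an `out_features x r` matrix. The sketch below does that arithmetic. The layer dimensions are assumptions taken from the published Gemma 3 1B configuration (they are not printed anywhere in this notebook), and `lora_params` is a hypothetical helper.

```python
# Assumed Gemma 3 1B dimensions (not printed in this notebook):
HIDDEN = 1152        # model width
INTERMEDIATE = 6912  # MLP width
NUM_LAYERS = 26
Q_DIM = 4 * 256      # 4 query heads x head_dim 256
KV_DIM = 1 * 256     # 1 key/value head x head_dim 256
LORA_R = 16          # same rank as in the cell above

# (in_features, out_features) for each targeted projection
MODULE_SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features, out_features, r):
    """A is r x in_features, B is out_features x r."""
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, LORA_R) for i, o in MODULE_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"Per layer:  {per_layer:,}")
print(f"All layers: {total:,}")
```

Under these assumed dimensions the total comes to 13,045,760, matching the `trainable params` figure reported by `print_trainable_parameters()`. Doubling `LORA_R` doubles the count, which is a quick way to budget a different rank.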


## Step 7: Prepare Training Dataset

Let's create a sample dataset for fine-tuning. You can replace this with your own data:

In [6]:
# Create a sample dataset for demonstration
# Replace this with your own dataset!

print("📚 PREPARING TRAINING DATASET")
print("=" * 50)

# Sample data: AI assistant training examples
sample_data = [
    {
        "instruction": "Explain what machine learning is in simple terms.",
        "response": "Machine learning is a type of artificial intelligence where computers learn to make predictions or decisions by finding patterns in data, rather than being explicitly programmed for every task."
    },
    {
        "instruction": "What are the benefits of renewable energy?",
        "response": "Renewable energy sources like solar and wind power are sustainable, reduce greenhouse gas emissions, decrease dependence on fossil fuels, and can lead to long-term cost savings while creating jobs in green technology sectors."
    },
    {
        "instruction": "How do you make a simple vegetable soup?",
        "response": "To make vegetable soup: 1) Heat oil in a pot, 2) Sauté diced onions and garlic, 3) Add chopped vegetables like carrots, celery, and potatoes, 4) Pour in vegetable broth, 5) Season with herbs and spices, 6) Simmer until vegetables are tender, about 20-30 minutes."
    },
    {
        "instruction": "What is the importance of exercise for mental health?",
        "response": "Exercise significantly benefits mental health by releasing endorphins that improve mood, reducing stress and anxiety, improving sleep quality, boosting self-esteem, and providing a healthy outlet for managing emotions and daily pressures."
    },
    {
        "instruction": "Explain the concept of photosynthesis.",
        "response": "Photosynthesis is the process by which plants convert sunlight, carbon dioxide, and water into glucose (sugar) and oxygen. This occurs mainly in plant leaves using chlorophyll, providing energy for the plant and producing oxygen as a byproduct that's essential for most life on Earth."
    }
]

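# Extend the dataset by repeating the five examples verbatim (nothing is varied here).
# Because of the duplicates, the validation split below contains the same texts as the
# training split, so eval loss will look better than it would on unseen data.
# In practice, you'd want hundreds or thousands of distinct examples.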
extended_data = []
for i in range(20):  # Repeat each example multiple times
    for item in sample_data:
        extended_data.append(item)

print(f"📝 Created dataset with {len(extended_data)} examples")

# Format data for instruction following
def format_instruction(example):
    """Format the data as instruction-following examples"""
    return f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}"

# Apply formatting
formatted_texts = [format_instruction(item) for item in extended_data]

print("\n📋 Example formatted training sample:")
print("-" * 40)
print(formatted_texts[0])
print("-" * 40)

# Create HuggingFace dataset
dataset = Dataset.from_dict({"text": formatted_texts})

print(f"\n✅ Dataset created with {len(dataset)} examples")
print(f"   Example keys: {list(dataset.features.keys())}")

# Split into train/validation
train_test_split = dataset.train_test_split(test_size=0.2, seed=42)
train_dataset = train_test_split['train']
eval_dataset = train_test_split['test']

print(f"   Training examples: {len(train_dataset)}")
print(f"   Validation examples: {len(eval_dataset)}")

print("\n💡 Note: In practice, you should use a much larger dataset (1000+ examples)")
print("   for better fine-tuning results. This is just a demonstration.")

📚 PREPARING TRAINING DATASET
📝 Created dataset with 100 examples

📋 Example formatted training sample:
----------------------------------------
### Instruction:
Explain what machine learning is in simple terms.

### Response:
Machine learning is a type of artificial intelligence where computers learn to make predictions or decisions by finding patterns in data, rather than being explicitly programmed for every task.
----------------------------------------

✅ Dataset created with 100 examples
   Example keys: ['text']
   Training examples: 80
   Validation examples: 20

💡 Note: In practice, you should use a much larger dataset (1000+ examples)
   for better fine-tuning results. This is just a demonstration.
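
The `### Instruction:` / `### Response:` layout above is a generic template. Gemma's instruction-tuned checkpoints were trained with their own turn markers, so formatting examples in that native layout may be a closer match to what the model has already seen. A minimal sketch, assuming the `<start_of_turn>` and `<end_of_turn>` marker strings from Gemma's published chat format; `format_gemma_turns` is a hypothetical helper, and `tokenizer.apply_chat_template` is the library route to a similar result.

```python
def format_gemma_turns(instruction, response=None):
    """Build a single-turn example with Gemma's turn markers.

    With response=None the string stops after the model marker, which is
    the shape used when prompting the model at inference time.
    """
    prompt = (
        f"<start_of_turn>user\n{instruction}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )
    if response is None:
        return prompt
    return prompt + f"{response}<end_of_turn>\n"

example = {
    "instruction": "Explain the concept of photosynthesis.",
    "response": "Photosynthesis is the process by which plants convert sunlight into energy.",
}
print(format_gemma_turns(example["instruction"], example["response"]))
```

The `<bos>` token is left out on purpose: the tokenizer test in Step 5 shows that the tokenizer prepends it on its own. Whichever template is used for training, the same one should be used when prompting the fine-tuned model.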


## Step 8: Tokenize the Dataset

Now let's tokenize our dataset for training:

In [7]:
# Tokenization configuration
MAX_LENGTH = 512  # Adjust based on your data and memory

print(f"🔤 TOKENIZING DATASET")
print("=" * 40)

# Check if required variables exist
missing_vars = []
if 'tokenizer' not in globals():
    missing_vars.append('tokenizer (from Step 5)')
if 'train_dataset' not in globals():
    missing_vars.append('train_dataset (from Step 7)')
if 'eval_dataset' not in globals():
    missing_vars.append('eval_dataset (from Step 7)')

if missing_vars:
    print("❌ ERROR: Missing required variables!")
    print("💡 The following variables are not defined:")
    for var in missing_vars:
        print(f"   • {var}")
    print("\n🔄 REQUIRED STEPS:")
    print("   1. Run Step 4: HuggingFace authentication")
    print("   2. Run Step 5: Load Gemma 3 1B model (creates tokenizer)")
    print("   3. Run Step 7: Prepare training dataset (creates train_dataset, eval_dataset)")
    print("   4. Then run this Step 8: Tokenize the dataset")
    
    # Show what variables ARE defined
    defined_vars = [var for var in globals().keys() if not var.startswith('_') and var not in ['In', 'Out', 'get_ipython']]
    print(f"\n📋 Currently defined variables: {', '.join(sorted(defined_vars))}")
    
    # Exit early to prevent further errors
    print("\n⏹️ Stopping execution. Please run the missing steps first.")
    
else:
    print(f"✅ All required variables found!")
    print(f"Max sequence length: {MAX_LENGTH}")

    def tokenize_function(examples):
        """Tokenize the text examples"""
        # Tokenize the texts
        tokenized = tokenizer(
            examples["text"],
            truncation=True,
            padding="max_length",
            max_length=MAX_LENGTH,
            return_tensors=None,
        )
        
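        # For causal LM, labels mirror input_ids. Padding positions are set to -100,
        # the ignore index of the loss, so padding does not contribute to training.
        tokenized["labels"] = [
            [tok if mask == 1 else -100 for tok, mask in zip(ids, attn)]
            for ids, attn in zip(tokenized["input_ids"], tokenized["attention_mask"])
        ]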
        
        return tokenized

    # Tokenize datasets
    print("🔄 Tokenizing training dataset...")
    tokenized_train = train_dataset.map(
        tokenize_function,
        batched=True,
        remove_columns=train_dataset.column_names,
        desc="Tokenizing training data"
    )

    print("🔄 Tokenizing validation dataset...")
    tokenized_eval = eval_dataset.map(
        tokenize_function,
        batched=True,
        remove_columns=eval_dataset.column_names,
        desc="Tokenizing validation data"
    )

    print("✅ Tokenization complete!")

    # Examine tokenized data
    sample_tokens = tokenized_train[0]
    print(f"\n📊 TOKENIZATION STATISTICS:")
    print(f"   Sample input_ids length: {len(sample_tokens['input_ids'])}")
    print(f"   Sample attention_mask length: {len(sample_tokens['attention_mask'])}")
    print(f"   Number of padding tokens in sample: {sample_tokens['attention_mask'].count(0)}")

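    # Show a sample of tokenized text. Padding is filtered out via the attention mask,
    # because padding may sit at the start of the sequence and would decode to nothing.
    print(f"\n🔍 SAMPLE TOKENIZED TEXT:")
    real_ids = [tok for tok, mask in zip(sample_tokens['input_ids'], sample_tokens['attention_mask']) if mask == 1]
    sample_text = tokenizer.decode(real_ids[:50], skip_special_tokens=True)
    print(f"First 50 tokens decoded: {sample_text}...")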

    print(f"\n💾 Dataset ready for training!")
    print(f"   Training samples: {len(tokenized_train)}")
    print(f"   Validation samples: {len(tokenized_eval)}")

🔤 TOKENIZING DATASET
✅ All required variables found!
Max sequence length: 512
🔄 Tokenizing training dataset...


Tokenizing training data: 100%|██████████| 80/80 [00:00<00:00, 6041.92 examples/s]

🔄 Tokenizing validation dataset...



Tokenizing validation data: 100%|██████████| 20/20 [00:00<00:00, 5250.43 examples/s]

✅ Tokenization complete!

📊 TOKENIZATION STATISTICS:
   Sample input_ids length: 512
   Sample attention_mask length: 512
   Number of padding tokens in sample: 441

🔍 SAMPLE TOKENIZED TEXT:
First 50 tokens decoded: ...

💾 Dataset ready for training!
   Training samples: 80
   Validation samples: 20
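
The statistics above show 441 of 512 positions in the sample are padding, so most of each fixed-length sequence carries no content. One common alternative is dynamic padding, where each batch is padded only to its own longest example. The sketch below compares the two on token counts alone; the lengths are hypothetical and `padded_token_count` is a helper written for this illustration, not a library function.

```python
def padded_token_count(lengths, batch_size, fixed_length=None):
    """Total tokens processed per epoch, padding included.

    fixed_length=None pads each batch to its own longest example
    (dynamic padding); otherwise every example is padded to fixed_length.
    """
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        target = fixed_length if fixed_length is not None else max(batch)
        total += target * len(batch)
    return total

# Hypothetical token lengths; the sample above had 512 - 441 = 71 real tokens
lengths = [71, 64, 80, 58, 75, 69, 90, 62]

fixed = padded_token_count(lengths, batch_size=2, fixed_length=512)
dynamic = padded_token_count(lengths, batch_size=2)
print(f"Fixed padding to 512: {fixed} tokens")
print(f"Dynamic padding:      {dynamic} tokens")
print(f"Reduction:            {100 * (1 - dynamic / fixed):.0f}%")
```

With short examples like these, lowering `MAX_LENGTH` is the simplest way to capture part of that saving without changing the rest of the pipeline.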





## Step 9: Configure Training Arguments

Let's set up training parameters optimized for different devices and precision settings:
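
As background for the configuration cell, this sketch shows how batch size, gradient accumulation, and epochs translate into optimizer steps, and what a warmup-plus-cosine learning-rate curve looks like. It uses the values this notebook applies on Apple Silicon (80 training examples, batch size 2, 4 accumulation steps, 3 epochs, warmup ratio 0.1, peak learning rate 2e-4). `optimizer_steps` and `lr_at` are hypothetical helpers that mirror the usual shape of such a schedule, not the library's exact code.

```python
import math

def optimizer_steps(num_examples, batch_size, grad_accum, epochs):
    """Optimizer updates on a single device, rounding partial batches up."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs

def lr_at(step, total_steps, warmup_steps, peak_lr):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total = optimizer_steps(num_examples=80, batch_size=2, grad_accum=4, epochs=3)
warmup = math.ceil(0.1 * total)
print(f"Total optimizer steps: {total}, warmup steps: {warmup}")

for step in (0, warmup, total // 2, total):
    print(f"step {step:>2}: lr = {lr_at(step, total, warmup, 2e-4):.2e}")
```

A run this short has only a few dozen optimizer steps, so any evaluation or checkpoint interval larger than that total never triggers.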

In [8]:
# Training configuration based on device and precision settings
print(f"⚙️  CONFIGURING TRAINING ARGUMENTS")
print("=" * 50)

# Check if required variables exist
missing_vars = []
if 'device' not in globals():
    missing_vars.append('device (from Step 3)')
if 'tokenized_train' not in globals():
    missing_vars.append('tokenized_train (from Step 8)')
if 'use_fp16' not in globals():
    missing_vars.append('use_fp16 (from Step 6)')

if missing_vars:
    print("❌ ERROR: Missing required variables!")
    print("💡 The following variables are not defined:")
    for var in missing_vars:
        print(f"   • {var}")
    print("\n🔄 REQUIRED STEPS:")
    print("   1. Run Step 3: Environment Setup (creates device)")
    print("   2. Run Steps 4-8: Complete model and dataset preparation")
    print("   3. Then run this Step 9: Configure Training Arguments")
    
    # Use fallback values (keep a detected device if there is one; otherwise use CPU)
    device = globals().get("device", "cpu")
    use_fp16, use_bf16 = False, False  # Safe defaults
    print(f"\n🔄 Using fallback values:")
    print(f"   Device: {device}")
    print(f"   Precision: FP32 (safe default)")
    print("⚠️  Training time estimates will be inaccurate without tokenized_train")
else:
    print(f"✅ All required variables found!")

print(f"   Optimizing for device: {device}")
print(f"   Using precision: {'FP16' if use_fp16 else 'BF16' if use_bf16 else 'FP32'}")

# === TRAINING PARAMETERS CONFIGURATION ===
print(f"\n🎯 TRAINING PARAMETERS:")

# Device-specific training parameters
if device == "cuda":
    # CUDA optimized settings
    batch_size = 4
    gradient_accumulation_steps = 2
    dataloader_num_workers = 4
    
elif device == "mps":
    # Apple Silicon optimized settings
    batch_size = 2
    gradient_accumulation_steps = 4
    dataloader_num_workers = 2
    
else:
    # CPU settings
    batch_size = 1
    gradient_accumulation_steps = 8
    dataloader_num_workers = 1

# Effective batch size calculation
effective_batch_size = batch_size * gradient_accumulation_steps
print(f"   Batch size: {batch_size}")
print(f"   Gradient accumulation steps: {gradient_accumulation_steps}")
print(f"   Effective batch size: {effective_batch_size}")

# Additional training parameters (customizable)
NUM_EPOCHS = 3          # Number of training epochs
LEARNING_RATE = 2e-4    # Learning rate
WEIGHT_DECAY = 0.01     # Weight decay for regularization
WARMUP_RATIO = 0.1      # Warmup ratio
MAX_GRAD_NORM = 1.0     # Gradient clipping

print(f"   Epochs: {NUM_EPOCHS}")
print(f"   Learning rate: {LEARNING_RATE}")
print(f"   Weight decay: {WEIGHT_DECAY}")

# Output directory
output_dir = "./gemma-3-1b-it-finetuned"

# Training arguments (updated for newer transformers versions)
training_args = TrainingArguments(
    output_dir=output_dir,
    overwrite_output_dir=True,
    
    # Training parameters
    num_train_epochs=NUM_EPOCHS,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    
    # Optimization
    learning_rate=LEARNING_RATE,
    weight_decay=WEIGHT_DECAY,
    warmup_ratio=WARMUP_RATIO,
    lr_scheduler_type="cosine",
    max_grad_norm=MAX_GRAD_NORM,
    
    # Precision - Uses values from Step 6
    fp16=use_fp16,
    bf16=use_bf16,
    
    # Memory optimization
    gradient_checkpointing=True,
    dataloader_pin_memory=False,
    dataloader_num_workers=dataloader_num_workers,
    
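    # Evaluation and logging (updated parameter names)
    eval_strategy="steps",          # Updated from evaluation_strategy
    eval_steps=50,                  # Note: the 80-example demo runs only ~30 steps, so lower this to see eval results
    logging_steps=10,
    save_strategy="steps",          # Added explicit save strategy
    save_steps=100,                 # Must be a multiple of eval_steps when load_best_model_at_end=True
    save_total_limit=2,
    
    # Best-checkpoint selection (actual early stopping also needs EarlyStoppingCallback)
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,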
    
    # Reproducibility
    seed=42,
    data_seed=42,
    
    # Reporting
    report_to=[],  # Disable wandb for now
    run_name="gemma-3-1b-it-finetune",
)

print(f"\n📋 TRAINING CONFIGURATION SUMMARY:")
print(f"   Device: {device}")
print(f"   Precision: {'FP16' if use_fp16 else 'BF16' if use_bf16 else 'FP32'}")
print(f"   Epochs: {training_args.num_train_epochs}")
print(f"   Learning rate: {training_args.learning_rate}")
print(f"   Effective batch size: {effective_batch_size}")
print(f"   Gradient checkpointing: {training_args.gradient_checkpointing}")

# Estimate training time (only if tokenized_train exists)
print(f"\n⏱️  TRAINING ESTIMATES:")
if 'tokenized_train' in globals():
    total_steps = len(tokenized_train) // effective_batch_size * training_args.num_train_epochs
    print(f"   Total training steps: {total_steps}")
    
    # Adjust time estimates based on precision
    if device == "mps":
        base_time = 45 if not use_fp16 else 30  # FP32 vs FP16
        estimated_time = total_steps * base_time / 60
        precision_note = "FP32" if not use_fp16 else "FP16"
        print(f"   Estimated time on Apple Silicon ({precision_note}): ~{estimated_time:.1f} minutes")
    elif device == "cuda":
        base_time = 10 if (use_fp16 or use_bf16) else 15  # Rough seconds/step: mixed precision vs FP32
        estimated_time = total_steps * base_time / 60
        precision_note = "FP16" if use_fp16 else "BF16" if use_bf16 else "FP32"
        print(f"   Estimated time on GPU ({precision_note}): ~{estimated_time:.1f} minutes")
    else:
        estimated_time = total_steps * 120 / 60  # FP32 on CPU
        print(f"   Estimated time on CPU (FP32): ~{estimated_time:.1f} minutes")
else:
    print("   ⚠️  Cannot estimate training time: tokenized_train not found")
    print("   💡 Complete Steps 4-8 first to get accurate estimates")
    print("   📊 Estimated steps: ~30 (assuming 100 examples, 3 epochs)")

print(f"\n✅ Training arguments configured!")
print(f"\n📋 NEXT STEPS:")
print(f"   • Step 10: Setup trainer")
print(f"   • Step 11: Start training")

⚙️  CONFIGURING TRAINING ARGUMENTS
✅ All required variables found!
   Optimizing for device: mps
   Using precision: FP32

🎯 TRAINING PARAMETERS:
   Batch size: 2
   Gradient accumulation steps: 4
   Effective batch size: 8
   Epochs: 3
   Learning rate: 0.0002
   Weight decay: 0.01

📋 TRAINING CONFIGURATION SUMMARY:
   Device: mps
   Precision: FP32
   Epochs: 3
   Learning rate: 0.0002
   Effective batch size: 8
   Gradient checkpointing: True

⏱️  TRAINING ESTIMATES:
   Total training steps: 30
   Estimated time on Apple Silicon (FP32): ~22.5 minutes

✅ Training arguments configured!

📋 NEXT STEPS:
   • Step 10: Setup trainer
   • Step 11: Start training
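
The step count and time estimate above come from simple arithmetic on the dataset size, batch size, and gradient accumulation. The sketch below reproduces that arithmetic in plain Python. `plan_schedule` is a hypothetical helper written for this illustration, and it assumes partial batches are rounded up; the exact rounding inside `Trainer` can differ between transformers versions.

```python
import math

def plan_schedule(num_examples, batch_size, grad_accum, epochs,
                  warmup_ratio, eval_steps):
    """Approximate the optimizer-step schedule implied by the settings above."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    # One optimizer step happens every `grad_accum` batches.
    steps_per_epoch = max(math.ceil(batches_per_epoch / grad_accum), 1)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    # Step-based evaluation fires only at multiples of eval_steps.
    eval_points = list(range(eval_steps, total_steps + 1, eval_steps))
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
        "eval_points": eval_points,
    }

# Values from the run above: 80 training samples, batch size 2, accumulation 4.
plan = plan_schedule(num_examples=80, batch_size=2, grad_accum=4,
                     epochs=3, warmup_ratio=0.1, eval_steps=50)
print(plan)
# total_steps is 30, and eval_points is empty because eval_steps=50 > 30
```

With only 30 optimizer steps, `eval_steps=50` is never reached, so no step-based evaluation would run and `load_best_model_at_end` would have no `eval_loss` to compare. For a dataset this small, a lower value such as `eval_steps=10` (with `save_steps` kept a multiple of it) is probably a better fit.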


## Step 10: Setup Data Collator and Trainer

Now let's set up the data collator and trainer with all our configurations:

In [9]:
# Setup data collator and trainer
print("📦 SETTING UP DATA COLLATOR AND TRAINER")
print("=" * 50)

# Check if required variables exist
missing_vars = []
required_vars = ['tokenizer', 'model', 'training_args', 'tokenized_train', 'tokenized_eval']
for var in required_vars:
    if var not in globals():
        missing_vars.append(var)

if missing_vars:
    print("❌ ERROR: Missing required variables!")
    print("💡 The following variables are not defined:")
    for var in missing_vars:
        print(f"   • {var}")
    print("\n🔄 REQUIRED STEPS:")
    print("   1. Run Steps 3-9 in order to create all required variables")
    print("   2. Then run this Step 10: Setup trainer")
    
else:
    # Data collator for language modeling
    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=False,  # We're doing causal LM, not masked LM
        pad_to_multiple_of=8,  # For efficiency
    )

    print("✅ Data collator created")

    # Create trainer
    print("🏋️ Creating trainer...")

    try:
        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=tokenized_train,
            eval_dataset=tokenized_eval,
            data_collator=data_collator,
            processing_class=tokenizer,  # Use processing_class instead of tokenizer
        )
        
        print("✅ Trainer created successfully!")
        
        # Print training setup summary
        print(f"\n📊 TRAINING SETUP SUMMARY:")
        print(f"   Model: {MODEL_NAME}")
        print(f"   Device: {device}")
        print(f"   Training method: LoRA fine-tuning")
        print(f"   Training samples: {len(tokenized_train)}")
        print(f"   Validation samples: {len(tokenized_eval)}")
        print(f"   Output directory: {output_dir}")
        print(f"   Mixed precision: FP16={training_args.fp16}, BF16={training_args.bf16}")
        
        # Memory check before training
        if device == "cuda":
            memory_used = torch.cuda.memory_allocated() / 1e9
            memory_total = torch.cuda.get_device_properties(0).total_memory / 1e9
            print(f"   GPU memory: {memory_used:.1f}GB / {memory_total:.1f}GB")
        elif device == "mps":
            print(f"   Apple Silicon unified memory in use")
        
        current_memory = psutil.virtual_memory()
        print(f"   System RAM: {current_memory.percent:.1f}% used")
        
        print(f"\n🎯 Ready to start training!")
        
    except Exception as e:
        print(f"❌ Error creating trainer: {e}")
        print("💡 This might be due to:")
        print("   • Mixed precision compatibility issues")
        print("   • Memory or configuration issues")
        print("   • Missing variables from previous steps")
        
        # Provide specific help for common issues
        if "fp16" in str(e).lower() and device == "mps":
            print("\n🔧 SOLUTION: Re-run Step 6 and set FORCE_PRECISION='fp32'")
        elif "tokenizer" in str(e).lower():
            print("\n🔧 SOLUTION: Make sure you've run Step 5 (Load Model)")
        elif "processing_class" in str(e).lower():
            print("\n💡 Note: Using fallback tokenizer parameter for compatibility")
            # Fallback to older API
            try:
                trainer = Trainer(
                    model=model,
                    args=training_args,
                    train_dataset=tokenized_train,
                    eval_dataset=tokenized_eval,
                    data_collator=data_collator,
                    tokenizer=tokenizer,  # Fallback to tokenizer parameter
                )
                print("✅ Trainer created with fallback method!")
            except Exception as fallback_e:
                print(f"❌ Fallback also failed: {fallback_e}")

📦 SETTING UP DATA COLLATOR AND TRAINER
✅ Data collator created
🏋️ Creating trainer...
✅ Trainer created successfully!

📊 TRAINING SETUP SUMMARY:
   Model: google/gemma-3-1b-it
   Device: mps
   Training method: LoRA fine-tuning
   Training samples: 80
   Validation samples: 20
   Output directory: ./gemma-3-1b-it-finetuned
   Mixed precision: FP16=False, BF16=False
   Apple Silicon unified memory in use
   System RAM: 62.1% used

🎯 Ready to start training!
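
To make the collator settings less abstract, here is a plain-Python illustration of what a causal-LM collator with `mlm=False` and `pad_to_multiple_of=8` does to a batch: it pads every sequence to a common length and copies the inputs into `labels`, replacing pad-token positions with `-100` so the loss ignores them. This is a sketch, not the library's implementation. `collate_causal_lm` is a hypothetical name, and right-side padding is assumed.

```python
def collate_causal_lm(sequences, pad_id, pad_to_multiple_of=8):
    """Right-pad token-id lists and build causal-LM labels (illustration only)."""
    longest = max(len(seq) for seq in sequences)
    # Round the padded length up to the next multiple (8 in the cell above).
    target = -(-longest // pad_to_multiple_of) * pad_to_multiple_of
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for seq in sequences:
        padding = target - len(seq)
        input_ids = seq + [pad_id] * padding
        batch["input_ids"].append(input_ids)
        batch["attention_mask"].append([1] * len(seq) + [0] * padding)
        # Labels copy the inputs; every pad-id position becomes -100.
        batch["labels"].append([-100 if tok == pad_id else tok for tok in input_ids])
    return batch

batch = collate_causal_lm([[5, 6, 7], [8, 9, 10, 11, 12]], pad_id=0)
print(batch["labels"][0])  # [5, 6, 7, -100, -100, -100, -100, -100]
```

One consequence worth knowing: because masking is based on the token id, a tokenizer whose pad token equals its EOS token will also have its real EOS positions masked out of the loss.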


## Step 11: Start Fine-Tuning

Now let's start the actual fine-tuning process:

In [10]:
# Start training
print("🚀 STARTING FINE-TUNING PROCESS")
print("=" * 50)
print("⚠️  This may take some time depending on your hardware")
print("💡 Monitor GPU/CPU usage and temperature")
print()

# Record training start time
training_start_time = time.time()

try:
    # Start training
    print("🏋️ Beginning training...")
    trainer.train()
    
    # Calculate training time
    training_end_time = time.time()
    training_duration = training_end_time - training_start_time
    
    print(f"\n🎉 TRAINING COMPLETED!")
    print(f"   Total training time: {training_duration / 60:.1f} minutes")
    print(f"   Average time per epoch: {training_duration / training_args.num_train_epochs / 60:.1f} minutes")
    
    # Get training metrics
    train_metrics = trainer.state.log_history
    if train_metrics:
        final_train_loss = None
        final_eval_loss = None
        
        # Find the final losses
        for log in reversed(train_metrics):
            if 'train_loss' in log and final_train_loss is None:
                final_train_loss = log['train_loss']
            if 'eval_loss' in log and final_eval_loss is None:
                final_eval_loss = log['eval_loss']
            if final_train_loss is not None and final_eval_loss is not None:
                break
        
        print(f"\n📊 FINAL METRICS:")
        if final_train_loss is not None:
            print(f"   Final training loss: {final_train_loss:.4f}")
        if final_eval_loss is not None:
            print(f"   Final validation loss: {final_eval_loss:.4f}")
    
    # Save the final model
    print(f"\n💾 Saving fine-tuned model...")
    trainer.save_model()
    tokenizer.save_pretrained(output_dir)
    
    print(f"✅ Model saved to: {output_dir}")
    
except KeyboardInterrupt:
    print(f"\n⏹️  Training interrupted by user")
    print(f"   Partial model may be saved in {output_dir}")
    
except Exception as e:
    print(f"\n❌ Training failed: {e}")
    print(f"💡 Common issues:")
    print(f"   • Out of memory: Reduce batch size or use gradient checkpointing")
    print(f"   • Model too large: Use smaller model or more aggressive LoRA settings")
    print(f"   • Device issues: Check CUDA/MPS availability")
    
    # Try to save partial progress
    try:
        trainer.save_model(output_dir + "_partial")
        print(f"📁 Partial model saved to: {output_dir}_partial")
    except Exception as save_error:
        print(f"⚠️  Could not save partial model: {save_error}")

print(f"\n🏁 Training session complete!")

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'eos_token_id': 1}.


🚀 STARTING FINE-TUNING PROCESS
⚠️  This may take some time depending on your hardware
💡 Monitor GPU/CPU usage and temperature

🏋️ Beginning training...


It is strongly recommended to train Gemma3 models with the `eager` attention implementation instead of `sdpa`. Use `eager` with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.



❌ Training failed: element 0 of tensors does not require grad and does not have a grad_fn
💡 Common issues:
   • Out of memory: Reduce batch size or use gradient checkpointing
   • Model too large: Use smaller model or more aggressive LoRA settings
   • Device issues: Check CUDA/MPS availability
📁 Partial model saved to: ./gemma-3-1b-it-finetuned_partial

🏁 Training session complete!
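
The recorded run above did not train: it stopped with `element 0 of tensors does not require grad and does not have a grad_fn`, which none of the printed "common issues" covers. The notebook does not confirm the root cause, but a frequent one with this setup is combining `gradient_checkpointing=True` with a LoRA model whose base weights are frozen. Another possibility is that no parameters are trainable at all. The helper below is a minimal sketch of the usual remedy. `prepare_for_gradient_checkpointing` is a hypothetical name, and it assumes `model` is the PEFT-wrapped model from the LoRA step.

```python
def prepare_for_gradient_checkpointing(model):
    """Hypothetical helper: call on the LoRA model before building the Trainer."""
    # With frozen base weights, the embedding output does not require grad, so
    # checkpointed blocks can end up detached from the autograd graph.
    # enable_input_require_grads() registers a hook that marks that output
    # as requiring grad.
    if hasattr(model, "enable_input_require_grads"):
        model.enable_input_require_grads()
    # The KV cache is not needed for training and conflicts with checkpointing.
    config = getattr(model, "config", None)
    if config is not None:
        config.use_cache = False
    return model

# Usage in this notebook (run before Step 10):
# model = prepare_for_gradient_checkpointing(model)
# model.print_trainable_parameters()  # should report a non-zero trainable count
```

Two related options may also help: passing `gradient_checkpointing_kwargs={"use_reentrant": False}` to `TrainingArguments`, and loading the model with `attn_implementation='eager'`, as the warning in the log recommends for Gemma 3. Until training completes, the later steps operate on a model whose weights have not been updated.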


## Step 12: Test the Fine-Tuned Model

Let's test our fine-tuned model to see how it performs:

In [11]:
# Test the fine-tuned model
print("🧪 TESTING FINE-TUNED MODEL")
print("=" * 50)

# Load the fine-tuned model if needed
if 'trainer' in locals() and trainer.model is not None:
    fine_tuned_model = trainer.model
    print("✅ Using model from training session")
else:
    # Load from saved checkpoint
    print("📥 Loading fine-tuned model from disk...")
    try:
        # Load base model
        base_model = AutoModelForCausalLM.from_pretrained(
            MODEL_NAME,
            torch_dtype=torch.float16 if device != "cpu" else torch.float32,
            trust_remote_code=True
        )
        
        # Load LoRA weights
        fine_tuned_model = PeftModel.from_pretrained(base_model, output_dir)
        fine_tuned_model = fine_tuned_model.to(device)
        print("✅ Fine-tuned model loaded successfully")
        
    except Exception as e:
        print(f"❌ Error loading fine-tuned model: {e}")
        print("💡 Using original model for comparison")
        fine_tuned_model = model

# Set model to evaluation mode
fine_tuned_model.eval()

# Test prompts
test_prompts = [
    "### Instruction:\nExplain what deep learning is in simple terms.\n\n### Response:\n",
    "### Instruction:\nWhat are the benefits of eating vegetables?\n\n### Response:\n",
    "### Instruction:\nHow do you learn a new programming language?\n\n### Response:\n"
]

print("\n🔍 TESTING WITH SAMPLE PROMPTS:")
print("=" * 40)

for i, prompt in enumerate(test_prompts, 1):
    print(f"\n📝 TEST {i}:")
    print(f"Prompt: {prompt.split('### Response:')[0].split('### Instruction:')[1].strip()}")
    print("-" * 30)
    
    try:
        # Tokenize input
        inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
        
        # Generate response
        with torch.no_grad():
            outputs = fine_tuned_model.generate(
                inputs,
                max_length=inputs.shape[1] + 150,
                temperature=0.7,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id,
                eos_token_id=tokenizer.eos_token_id,
                top_p=0.9,
                repetition_penalty=1.1
            )
        
        # Decode response
        full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        response_only = full_response[len(prompt):].strip()
        
        print(f"Response: {response_only}")
        
    except Exception as e:
        print(f"❌ Error generating response: {e}")

print("\n📊 EVALUATION NOTES:")
print("• Compare responses to the original Gemma 3 1B Instruct model behavior")
print("• Look for improved instruction following")
print("• Check if responses are more relevant to your specific domain")
print("• With more training data and epochs, quality should improve")

print("\n✅ Model testing complete!")

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


🧪 TESTING FINE-TUNED MODEL
✅ Using model from training session

🔍 TESTING WITH SAMPLE PROMPTS:

📝 TEST 1:
Prompt: Explain what deep learning is in simple terms.
------------------------------
Response: Deep learning is like teaching a computer to learn from data, without explicitly telling it *how* to do things. Instead of giving the computer step-by-step instructions, you show it lots of examples and let it figure out the patterns itself.

Here's a breakdown:

*   **Data:** You feed the computer tons of information – images, text, audio, etc. This is your training data.
*   **Neural Networks:**  Deep learning uses something called neural networks - these are complex systems inspired by how the human brain works. They consist of layers of interconnected "neurons."
*   **Learning:** The computer adjusts the connections between neurons based on its mistakes. If it gets an incorrect answer, it tweaks those

📝 TEST 2:
Prompt: What are the benefits of eating vegetables?
--------------------
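
The attention-mask warning at the top of this output appears because `tokenizer.encode` returns only token ids. A possible alternative, sketched below, calls the tokenizer directly so that an `attention_mask` is passed along, limits generation with `max_new_tokens`, and decodes only the newly generated tokens. `generate_response` is a hypothetical helper, not part of the original notebook, and the sampling values simply mirror the cell above.

```python
def generate_response(model, tokenizer, prompt, device, max_new_tokens=150):
    """Hypothetical helper: generate a reply and return only the new text."""
    # Calling the tokenizer (rather than .encode) also returns an attention_mask.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    prompt_len = inputs["input_ids"].shape[1]
    pad_id = tokenizer.pad_token_id
    if pad_id is None:
        pad_id = tokenizer.eos_token_id
    outputs = model.generate(
        **inputs,                       # input_ids and attention_mask
        max_new_tokens=max_new_tokens,  # budget counts generated tokens only
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=pad_id,
    )
    # Slice off the prompt tokens instead of trimming the decoded string.
    new_tokens = outputs[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

# Usage in this notebook:
# print(generate_response(fine_tuned_model, tokenizer, test_prompts[0], device))
```

Slicing by token count avoids the case where the decoded text does not start with the exact prompt string, which would make `full_response[len(prompt):]` cut in the wrong place.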

## Step 13: Save and Share Your Model

Let's prepare the model for sharing and future use:

In [12]:
# Save and prepare model for sharing
print("💾 SAVING AND PREPARING MODEL")
print("=" * 50)

# Create model card with training information
model_card_content = f"""
# Gemma 3 1B Instruct Fine-tuned Model

## Model Description
This is a fine-tuned version of Google's Gemma 3 1B Instruct model, adapted for custom instruction-following tasks.

## Training Details
- **Base model**: {MODEL_NAME}
- **Fine-tuning method**: LoRA (Low-Rank Adaptation)
- **Training device**: {device}
- **LoRA rank**: {LORA_R}
- **LoRA alpha**: {LORA_ALPHA}
- **Training epochs**: {training_args.num_train_epochs}
- **Learning rate**: {training_args.learning_rate}
- **Batch size**: {effective_batch_size} (effective)

## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
base_model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-it")

# Load fine-tuned model
model = PeftModel.from_pretrained(base_model, "path/to/this/model")

# Generate text
prompt = "### Instruction:\\nYour question here\\n\\n### Response:\\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```

## Training Data
The model was fine-tuned on a custom instruction-following dataset.

## Limitations
- This is a demonstration model with limited training data
- May not generalize well to all tasks
- Requires the same format for optimal performance

## License
This model inherits the license from the base Gemma 3 model.
"""

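# Save model card (create the directory first in case nothing has been saved yet)
os.makedirs(output_dir, exist_ok=True)
with open(os.path.join(output_dir, "README.md"), "w", encoding="utf-8") as f:
    f.write(model_card_content)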

print("📝 Model card created")

# Save training configuration
training_config = {
    "base_model": MODEL_NAME,
    "device": device,
    "lora_config": {
        "r": LORA_R,
        "alpha": LORA_ALPHA,
        "dropout": LORA_DROPOUT,
        "target_modules": target_modules
    },
    "training_args": training_args.to_dict() if hasattr(training_args, 'to_dict') else str(training_args),
    "dataset_size": len(extended_data),
    "max_length": MAX_LENGTH
}

with open(f"{output_dir}/training_config.json", "w") as f:
    json.dump(training_config, f, indent=2, default=str)

print("⚙️ Training configuration saved")

# List all saved files
print(f"\n📁 SAVED FILES IN {output_dir}:")
try:
    for file in os.listdir(output_dir):
        file_path = os.path.join(output_dir, file)
        if os.path.isfile(file_path):
            size_mb = os.path.getsize(file_path) / (1024 * 1024)
            print(f"   {file} ({size_mb:.1f} MB)")
except Exception as e:
    print(f"❌ Error listing files: {e}")

# Instructions for using the model
print(f"\n🚀 NEXT STEPS:")
print(f"1. Test the model thoroughly with your use case")
print(f"2. If performance is good, consider training with more data")
print(f"3. You can push to HuggingFace Hub for sharing:")
print(f"   model.push_to_hub('your-username/gemma-3-1b-it-finetuned')")
print(f"4. Or share the '{output_dir}' folder directly")

print(f"\n✅ Model preparation complete!")

💾 SAVING AND PREPARING MODEL
📝 Model card created
⚙️ Training configuration saved

📁 SAVED FILES IN ./gemma-3-1b-it-finetuned:
   README.md (0.0 MB)
   training_config.json (0.0 MB)

🚀 NEXT STEPS:
1. Test the model thoroughly with your use case
2. If performance is good, consider training with more data
3. You can push to HuggingFace Hub for sharing:
   model.push_to_hub('your-username/gemma-3-1b-it-finetuned')
4. Or share the './gemma-3-1b-it-finetuned' folder directly

✅ Model preparation complete!
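
The file listing above contains only `README.md` and `training_config.json`, so this directory holds no adapter weights and is not yet shareable as a model. A small check like the one below can catch that before pushing to the Hub. It is a sketch: `missing_adapter_files` is a hypothetical helper, and the file names are the ones PEFT writes by default when saving an adapter.

```python
import os

def missing_adapter_files(model_dir):
    """Return the LoRA adapter files that are absent from model_dir."""
    missing = []
    if not os.path.isfile(os.path.join(model_dir, "adapter_config.json")):
        missing.append("adapter_config.json")
    # PEFT writes weights as safetensors by default; older setups may use .bin.
    weight_names = ("adapter_model.safetensors", "adapter_model.bin")
    if not any(os.path.isfile(os.path.join(model_dir, n)) for n in weight_names):
        missing.append("adapter_model.safetensors (or adapter_model.bin)")
    return missing

# Usage in this notebook (output_dir comes from Step 9):
# print(missing_adapter_files(output_dir) or "Adapter files present")
```

An empty list means both the adapter configuration and the adapter weights are in place.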


## 🎯 Summary and Next Steps

Congratulations! You've successfully completed the fine-tuning process for Gemma 3 1B Instruct. Here's what you've accomplished:

### ✅ What you achieved:
1. **Environment Setup**: Configured the system for different devices (CPU, CUDA, Apple Silicon)
2. **Model Loading**: Successfully loaded and prepared Gemma 3 1B Instruct for fine-tuning
3. **Dataset Preparation**: Created and formatted training data for instruction-following
4. **LoRA Implementation**: Applied efficient fine-tuning with Low-Rank Adaptation
5. **Training Execution**: Ran the complete fine-tuning process
6. **Model Evaluation**: Tested the fine-tuned model's performance
7. **Model Deployment**: Saved and prepared the model for sharing

### 🔑 Key concepts learned:
- **Parameter-Efficient Fine-Tuning**: Using LoRA to reduce computational requirements
- **Device Optimization**: Configuring training for different hardware
- **Dataset Formatting**: Preparing instruction-following datasets
- **Training Monitoring**: Understanding metrics and performance
- **Model Evaluation**: Testing and validating fine-tuned models

### 🚀 Improvement strategies:

#### For better results:
1. **More Training Data**: Use 1000+ high-quality examples
2. **Longer Training**: Increase epochs and fine-tune learning rate
3. **Better Data Quality**: Clean, diverse, and relevant examples
4. **Hyperparameter Tuning**: Experiment with LoRA rank, learning rate, batch size
5. **Evaluation Metrics**: Implement proper evaluation beyond loss
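
When experimenting with LoRA rank, it helps to know how the trainable parameter count scales. For each adapted weight matrix of shape `d_out x d_in`, LoRA adds two small matrices with `rank * (d_in + d_out)` parameters in total, so the count grows linearly with rank. The sketch below uses hypothetical layer shapes chosen only for illustration; they are not Gemma's actual dimensions.

```python
def lora_trainable_params(d_in, d_out, rank, num_matrices):
    """Parameters added by LoRA: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out) * num_matrices

# Hypothetical shapes: 48 adapted matrices of size 1024 x 1024.
for rank in (4, 8, 16):
    added = lora_trainable_params(d_in=1024, d_out=1024, rank=rank, num_matrices=48)
    print(f"rank={rank:>2}: {added:,} trainable parameters")
# rank= 4: 393,216 trainable parameters
# rank= 8: 786,432 trainable parameters
# rank=16: 1,572,864 trainable parameters
```

Doubling the rank doubles the adapter size, so rank is a direct trade-off between capacity and memory.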

#### Advanced techniques:
1. **QLoRA**: Quantized LoRA for even more efficiency
2. **Multi-task Training**: Train on multiple tasks simultaneously
3. **Reinforcement Learning from Human Feedback (RLHF)**: Align with human preferences
4. **Curriculum Learning**: Progressive training difficulty
5. **Model Merging**: Combine multiple fine-tuned adapters

### 💡 Production considerations:
- **Quantization**: Use 8-bit or 4-bit quantization for deployment
- **Optimization**: ONNX conversion or TensorRT for inference speed
- **Monitoring**: Track model performance in production
- **Safety**: Implement content filtering and bias detection
- **Versioning**: Keep track of model versions and training data
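
To see why quantization matters for deployment, a back-of-the-envelope estimate of weight memory is enough. The sketch below assumes a round figure of one billion parameters and counts the weights only; activations, the KV cache, and quantization metadata add to this in practice.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Assumption: a round 1,000,000,000 parameters, used only for illustration.
for label, bits in [("FP32", 32), ("FP16/BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>9}: ~{weight_memory_gb(1_000_000_000, bits):.1f} GB")
#      FP32: ~4.0 GB
# FP16/BF16: ~2.0 GB
#     8-bit: ~1.0 GB
#     4-bit: ~0.5 GB
```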

### 🛠️ Troubleshooting tips:
- **Memory Issues**: Reduce batch size, use gradient checkpointing, or try CPU training
- **Slow Training**: Check device utilization, use mixed precision, optimize data loading
- **Poor Performance**: Increase training data, adjust learning rate, check data quality
- **Overfitting**: Use validation split, early stopping, or regularization
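
Early stopping, mentioned above as a remedy for overfitting, boils down to a patience counter over validation losses. The sketch below shows that logic in plain Python with made-up loss values. `early_stopping_step` is a hypothetical function for illustration; in a real run, transformers' `EarlyStoppingCallback(early_stopping_patience=...)` can be passed to `Trainer` through `callbacks=[...]` and relies on `load_best_model_at_end=True` plus a `metric_for_best_model`.

```python
def early_stopping_step(eval_losses, patience=3, min_delta=0.0):
    """Return the index of the evaluation where training would stop, or None."""
    best = float("inf")
    bad_evals = 0
    for i, loss in enumerate(eval_losses):
        if loss < best - min_delta:
            best = loss       # improvement: remember it and reset the counter
            bad_evals = 0
        else:
            bad_evals += 1    # no improvement at this evaluation
            if bad_evals >= patience:
                return i
    return None

# Made-up validation losses: the best value (1.65) is at index 2.
losses = [2.10, 1.80, 1.65, 1.70, 1.72, 1.69, 1.75]
print(early_stopping_step(losses, patience=3))  # 5
```

Training stops at index 5 because three consecutive evaluations failed to beat 1.65. This only works if evaluations actually run during training, so `eval_steps` must be smaller than the total number of optimizer steps.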

### 📚 Further learning:
- [PEFT Documentation](https://huggingface.co/docs/peft)
- [LoRA Paper](https://arxiv.org/abs/2106.09685)
- [Gemma Model Documentation](https://huggingface.co/docs/transformers/model_doc/gemma)
- [Illustrating RLHF (Hugging Face blog)](https://huggingface.co/blog/rlhf)

### 🎉 Congratulations!
Once training completes without errors, you have a fine-tuned Gemma 3 1B Instruct model and the knowledge to improve it further. The techniques you've learned can be applied to other models and tasks. Happy fine-tuning! 🚀