# 🚀 Jenga-AI GPU Finetuning on Google Colab

This notebook provides a complete pipeline for finetuning LLMs using the Jenga-AI framework with GPU acceleration on Google Colab.

**Runtime Requirements:**
- Go to `Runtime` → `Change runtime type`
- Select `T4 GPU` as Hardware accelerator
- Click `Save`

## 1. Setup and Installation

In [None]:
# Check GPU availability
import shutil
import subprocess

# nvidia-smi may be missing on CPU-only runtimes, so look it up before calling it
nvidia_smi = shutil.which("nvidia-smi")
gpu_info = subprocess.run([nvidia_smi], capture_output=True, text=True) if nvidia_smi else None
if gpu_info is not None and gpu_info.returncode == 0:
    print("✅ GPU Available:")
    print(gpu_info.stdout)
else:
    print("⚠️ No GPU detected. Please enable GPU runtime in Colab.")
    print("Go to Runtime → Change runtime type → Select T4 GPU")

In [None]:
# Install required packages
!pip install -q torch transformers accelerate peft datasets mlflow bitsandbytes scipy
!pip install -q huggingface_hub tokenizers safetensors tqdm pyyaml

In [None]:
# Verify installations
import torch
import transformers
import mlflow
import peft

print(f"✅ PyTorch version: {torch.__version__}")
print(f"✅ Transformers version: {transformers.__version__}")
print(f"✅ PEFT version: {peft.__version__}")
print(f"✅ MLflow version: {mlflow.__version__}")
print(f"✅ CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✅ GPU: {torch.cuda.get_device_name(0)}")
    print(f"✅ Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

## 2. Create Synthetic Training Data

In [None]:
import json
import random
from typing import List, Dict

def generate_synthetic_data(num_samples: int = 500) -> List[Dict]:
    """Generate synthetic conversational data for training"""
    
    # Templates for generating diverse conversations
    contexts = [
        "customer support",
        "technical assistance",
        "general inquiry",
        "complaint resolution",
        "product information"
    ]
    
    greetings = [
        "Hello, how can I help you today?",
        "Welcome! What can I assist you with?",
        "Good day! How may I help you?",
        "Thank you for calling. What brings you here today?"
    ]
    
    issues = [
        "I'm having trouble with my account",
        "I need information about your services",
        "There's a problem with my recent order",
        "I'd like to know more about pricing",
        "Can you help me with a technical issue?"
    ]
    
    responses = [
        "I understand your concern. Let me help you with that.",
        "I'll be happy to assist you with this issue.",
        "Thank you for bringing this to our attention.",
        "Let me look into that for you right away."
    ]
    
    resolutions = [
        "I've resolved the issue for you. Is there anything else?",
        "The problem has been fixed. Please let me know if you need more help.",
        "Everything should be working now. Thank you for your patience.",
        "I've updated your information. The changes will take effect shortly."
    ]
    
    data = []
    
    for i in range(num_samples):
        # Create a conversation
        conversation = [
            random.choice(greetings),
            random.choice(issues),
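            # One agent turn: acknowledge, then ask for details, so the
            # Agent/Customer labels assigned below keep alternating correctly
            random.choice(responses) + " Can you provide more details about your issue?",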
            "Yes, " + random.choice([
                "it started happening yesterday",
                "I've tried restarting but it didn't help",
                "this has been ongoing for a week",
                "I followed the instructions but still have problems"
            ]),
            random.choice(resolutions),
            "Thank you for your help!",
            "You're welcome! Have a great day!"
        ]
        
        # Join into text
        text = "\n".join([f"{['Agent', 'Customer'][j%2]}: {msg}" 
                         for j, msg in enumerate(conversation)])
        
        data.append({
            "text": text,
            "sample_id": f"synthetic_{i:04d}",
            "context": random.choice(contexts),
            "quality_level": random.choice(["good", "excellent", "fair"]),
            "length": len(text.split())
        })
    
    return data

# Generate synthetic data
print("🔄 Generating synthetic training data...")
synthetic_data = generate_synthetic_data(500)
print(f"✅ Generated {len(synthetic_data)} synthetic samples")

# Show sample
print("\n📝 Sample data:")
print(synthetic_data[0]['text'][:500] + "...")
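
The slice-based split used later in this notebook takes the first 90% of samples in generation order. For real data that may be sorted (by date, customer, or label), a seeded shuffle before splitting is a common precaution. A minimal sketch, where `split_records` and its parameters are hypothetical names rather than part of the notebook:

```python
import random
from typing import Dict, List, Tuple


def split_records(
    records: List[Dict], val_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[Dict], List[Dict]]:
    """Shuffle a copy of `records` with a fixed seed, then split train/val."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(records)  # copy so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)  # private RNG; global state untouched
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Usage with stand-in records shaped like the synthetic samples above
demo = [{"sample_id": f"synthetic_{i:04d}"} for i in range(500)]
train_demo, val_demo = split_records(demo)
print(len(train_demo), len(val_demo))
```

Using a private `random.Random(seed)` keeps the split reproducible without changing the global random state that the data generator relies on.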

## 3. Initialize MLflow Tracking

In [None]:
import mlflow
import mlflow.pytorch
from datetime import datetime

# Setup MLflow
mlflow.set_tracking_uri("file:///content/mlruns")
experiment_name = "gpu-llm-finetuning"

# Create or get experiment (look it up first instead of catching every exception)
experiment = mlflow.get_experiment_by_name(experiment_name)
if experiment is None:
    experiment_id = mlflow.create_experiment(
        experiment_name,
        artifact_location="/content/mlruns"
    )
else:
    experiment_id = experiment.experiment_id

mlflow.set_experiment(experiment_name)
print(f"✅ MLflow experiment: {experiment_name}")
print(f"✅ Experiment ID: {experiment_id}")

## 4. Prepare Dataset for Training

In [None]:
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer
import torch

class ConversationDataset(Dataset):
    def __init__(self, data, tokenizer, max_length=512):
        self.data = data
        self.tokenizer = tokenizer
        self.max_length = max_length
    
    def __len__(self):
        return len(self.data)
    
    def __getitem__(self, idx):
        item = self.data[idx]
        text = item['text']
        
        # Tokenize
        encoding = self.tokenizer(
            text,
            truncation=True,
            padding="max_length",
            max_length=self.max_length,
            return_tensors="pt"
        )
        
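        # DataCollatorForLanguageModeling (mlm=False) rebuilds `labels` from
        # `input_ids` at batch time, so the copy below is only a fallback
        return {
            'input_ids': encoding['input_ids'].squeeze(0),
            'attention_mask': encoding['attention_mask'].squeeze(0),
            'labels': encoding['input_ids'].squeeze(0).clone()
        }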

# Initialize tokenizer
model_name = "microsoft/DialoGPT-small"  # Good for conversations
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Split data
train_size = int(0.9 * len(synthetic_data))
train_data = synthetic_data[:train_size]
val_data = synthetic_data[train_size:]

print(f"✅ Training samples: {len(train_data)}")
print(f"✅ Validation samples: {len(val_data)}")

# Create datasets
train_dataset = ConversationDataset(train_data, tokenizer, max_length=256)
val_dataset = ConversationDataset(val_data, tokenizer, max_length=256)

print("✅ Datasets created with max_length=256")
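
Because the pad token is set to the EOS token here, padded positions are best identified by the attention mask rather than by token ID when building labels. PyTorch's cross-entropy loss skips positions labelled `-100` by default. A pure-Python sketch of that masking step, with `mask_padding_labels` as a hypothetical helper name and toy token IDs:

```python
from typing import List

IGNORE_INDEX = -100  # label value that cross-entropy loss skips by default


def mask_padding_labels(input_ids: List[int], attention_mask: List[int]) -> List[int]:
    """Copy input_ids into labels, replacing padded positions with IGNORE_INDEX."""
    if len(input_ids) != len(attention_mask):
        raise ValueError("input_ids and attention_mask must have the same length")
    return [
        token if keep == 1 else IGNORE_INDEX
        for token, keep in zip(input_ids, attention_mask)
    ]


# Toy example: 50256 (GPT-2's end-of-text ID) serves as both EOS and padding
ids = [15496, 11, 703, 50256, 50256, 50256]
mask = [1, 1, 1, 1, 0, 0]  # the first 50256 is a real EOS, the rest are padding
labels = mask_padding_labels(ids, mask)
print(labels)
```

Masking by attention mask keeps the genuine end-of-text token in the loss, which masking by token ID would discard along with the padding.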

## 5. Setup Model with PEFT/LoRA for Efficient Training

In [None]:
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType
import torch

# Load base model
print("🔄 Loading base model...")
# Keep the weights in FP32: fp16=True in TrainingArguments applies mixed
# precision on top, and its gradient scaler rejects FP16 trainable weights
model = AutoModelForCausalLM.from_pretrained(model_name)

# Configure LoRA
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,  # Rank
    lora_alpha=32,
    lora_dropout=0.1,
    bias="none",
    target_modules=["c_attn", "c_proj"],  # GPT-2 style attention and output projections
    fan_in_fan_out=True,  # these GPT-2 layers are Conv1D, which stores weights transposed
)

# Apply LoRA
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

# Gradient checkpointing with a frozen base model needs inputs that require grad
model.enable_input_require_grads()

# Move to GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

print(f"✅ Model loaded on {device}")
print(f"✅ Model size: {sum(p.numel() for p in model.parameters()) / 1e6:.2f}M parameters")
print(f"✅ Trainable: {sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6:.2f}M parameters")
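
The trainable-parameter figure printed above can be sanity-checked by hand: LoRA adds two low-rank matrices per adapted layer, so a layer with `d_in` inputs and `d_out` outputs gains `r * (d_in + d_out)` parameters. The sketch below assumes DialoGPT-small follows the GPT-2 small layout (12 blocks, hidden size 768) and that `c_proj` matches both the attention and MLP output projections; treat the shapes as assumptions to verify against `model.print_trainable_parameters()`.

```python
from typing import Dict, Tuple

# Assumed (d_in, d_out) shapes of the adapted layers in one GPT-2 small block
GPT2_SMALL_BLOCK: Dict[str, Tuple[int, int]] = {
    "attn.c_attn": (768, 2304),  # fused query/key/value projection
    "attn.c_proj": (768, 768),   # attention output projection
    "mlp.c_proj": (3072, 768),   # MLP output projection
}


def lora_params_per_layer(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r): r * (d_in + d_out) weights."""
    return r * (d_in + d_out)


def lora_params_total(block: Dict[str, Tuple[int, int]], r: int, num_blocks: int) -> int:
    """Sum the adapter weights over every adapted layer in every block."""
    per_block = sum(lora_params_per_layer(d_in, d_out, r) for d_in, d_out in block.values())
    return per_block * num_blocks


total = lora_params_total(GPT2_SMALL_BLOCK, r=16, num_blocks=12)
print(f"Estimated LoRA parameters: {total:,} ({total / 1e6:.2f}M)")
```

If the printed count differs from this estimate, the most likely cause is that the target module names match a different set of layers than assumed here.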

## 6. Configure Training Arguments

In [None]:
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling

# Training arguments optimized for T4 GPU
training_args = TrainingArguments(
    output_dir="./results",
    overwrite_output_dir=True,
    
    # Training parameters
    num_train_epochs=3,
    per_device_train_batch_size=4,  # T4 can handle batch size 4-8
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,  # Effective batch size = 16
    
    # Optimization
    learning_rate=5e-4,
    weight_decay=0.01,
    warmup_steps=10,  # the whole run is only ~85 optimizer steps, so a 100-step warmup would never finish
    lr_scheduler_type="cosine",
    
    # GPU optimization
    fp16=True,  # Mixed precision training
    gradient_checkpointing=True,
    optim="adamw_torch",
    
    # Logging
    logging_dir='./logs',
    logging_steps=10,
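    eval_strategy="steps",  # named `evaluation_strategy` in older transformers releases
    eval_steps=50,
    save_strategy="steps",
    save_steps=50,  # must fall within the ~85-step run so a best checkpoint exists to reload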
    save_total_limit=2,
    
    # MLflow
    report_to=["mlflow"],
    run_name=f"gpu-finetune-{datetime.now().strftime('%Y%m%d-%H%M%S')}",
    
    # Other
    load_best_model_at_end=True,
    metric_for_best_model="loss",
    greater_is_better=False,
    push_to_hub=False,
)

print("✅ Training arguments configured for T4 GPU")
print(f"📊 Effective batch size: {training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps}")
print(f"📊 Total training steps: ~{len(train_dataset) // (training_args.per_device_train_batch_size * training_args.gradient_accumulation_steps) * training_args.num_train_epochs}")
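
The step estimate above matters because warmup, evaluation, and checkpoint intervals are all expressed in optimizer steps, and any interval longer than the whole run never triggers. A rough sketch of the arithmetic follows; `estimate_optimizer_steps` and `intervals_exceeding_run` are hypothetical helpers, and the Trainer's own rounding of the last partial accumulation window may differ slightly between versions.

```python
import math


def estimate_optimizer_steps(
    num_samples: int, per_device_batch_size: int, grad_accum_steps: int, num_epochs: int
) -> int:
    """Approximate the number of optimizer updates in a single-GPU run."""
    batches_per_epoch = math.ceil(num_samples / per_device_batch_size)
    # Floor division: a trailing partial accumulation window is not counted here
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return updates_per_epoch * num_epochs


def intervals_exceeding_run(total_steps: int, **intervals: int) -> list:
    """Return the names of step intervals that are longer than the run."""
    return sorted(name for name, steps in intervals.items() if steps > total_steps)


# 450 training samples, batch size 4, accumulation 4, 3 epochs;
# the interval values passed below are illustrative
total = estimate_optimizer_steps(450, 4, 4, 3)
too_long = intervals_exceeding_run(total, warmup_steps=100, eval_steps=50, save_steps=100)
print(total, too_long)
```

Running a check like this before training is a cheap way to catch a warmup or save interval that a small dataset can never reach.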

## 7. Train the Model

In [None]:
# Data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,  # Causal LM
)

# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

print("🚀 Starting training...")
print("This will take approximately 5-10 minutes on T4 GPU\n")

# Start MLflow run
with mlflow.start_run() as run:
    # Log parameters
    mlflow.log_params({
        "model_name": model_name,
        "num_train_samples": len(train_dataset),
        "num_val_samples": len(val_dataset),
        "batch_size": training_args.per_device_train_batch_size,
        "learning_rate": training_args.learning_rate,
        "num_epochs": training_args.num_train_epochs,
        "lora_r": peft_config.r,
        "lora_alpha": peft_config.lora_alpha,
        "max_length": 256,
        "device": str(device),
    })
    
    # Train
    train_result = trainer.train()
    
    # Log final metrics
    mlflow.log_metrics({
        "final_train_loss": train_result.metrics["train_loss"],
        "total_steps": train_result.global_step,  # metrics has no "train_steps" key
        "training_time": train_result.metrics["train_runtime"],
    })
    
    print("\n✅ Training completed!")
    print(f"📊 Final loss: {train_result.metrics['train_loss']:.4f}")
    print(f"⏱️ Training time: {train_result.metrics['train_runtime']:.2f} seconds")
    print(f"🔗 MLflow run ID: {run.info.run_id}")

## 8. Evaluate the Model

In [None]:
# Evaluate on validation set. Reopen the training run first so the
# evaluation metrics are recorded against it rather than a new run.
print("🔄 Evaluating model...")
with mlflow.start_run(run_id=run.info.run_id):
    eval_results = trainer.evaluate()
    # Keys already carry the "eval_" prefix (e.g. "eval_loss"), so log them as-is
    mlflow.log_metrics(eval_results)

print("\n📊 Evaluation Results:")
for key, value in eval_results.items():
    print(f"  {key}: {value:.4f}")
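
Evaluation loss for a causal language model is the mean per-token cross-entropy in nats, so exponentiating it gives perplexity, which is often easier to compare across runs. A small sketch, assuming the results dictionary contains an `eval_loss` entry as the Trainer normally reports; `perplexity_from_loss` is a hypothetical helper and the loss value is a stand-in, not a measured result:

```python
import math


def perplexity_from_loss(loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) into perplexity."""
    if loss < 0:
        raise ValueError("cross-entropy loss cannot be negative")
    try:
        return math.exp(loss)
    except OverflowError:
        # Very large losses overflow a float; report them as infinite perplexity
        return float("inf")


example_results = {"eval_loss": 2.0}  # stand-in value, not a measured result
print(f"Perplexity: {perplexity_from_loss(example_results['eval_loss']):.2f}")
```

With highly repetitive synthetic data, expect perplexity to drop quickly; that reflects memorization of the templates more than conversational quality.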

## 9. Test Generated Responses

In [None]:
def generate_response(prompt, model, tokenizer, max_new_tokens=60):
    """Generate a response from the fine-tuned model"""
    
    # Tokenize input (keeps the attention mask alongside the input IDs)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    
    # Generate; max_new_tokens bounds the reply itself rather than prompt + reply
    model.eval()
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            num_return_sequences=1,
            temperature=0.7,
            pad_token_id=tokenizer.eos_token_id,
            do_sample=True,
            top_p=0.9,
        )
    
    # Decode (the returned text includes the prompt followed by the continuation)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

# Test prompts
test_prompts = [
    "Agent: Hello, how can I help you today?\nCustomer: I'm having trouble with my account\nAgent:",
    "Agent: Welcome! What can I assist you with?\nCustomer: I need information about pricing\nAgent:",
    "Agent: Thank you for calling. How may I help?\nCustomer: There's a technical issue\nAgent:",
]

print("🤖 Testing fine-tuned model responses:\n")
for i, prompt in enumerate(test_prompts, 1):
    print(f"Test {i}:")
    print(f"Prompt: {prompt}")
    response = generate_response(prompt, model, tokenizer)
    print(f"Response: {response}\n")
    print("-" * 50)
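
Because the decoded output contains the prompt followed by the continuation, and the model may keep going into further turns, it can help to isolate just the next agent turn. A sketch under the assumption that turns are newline-separated and prefixed with `Agent:` or `Customer:` as in the training text; `extract_agent_reply` is a hypothetical helper:

```python
def extract_agent_reply(prompt: str, generated: str) -> str:
    """Return only the first turn that follows the prompt in `generated`."""
    # Drop the echoed prompt when the decoded text starts with it; otherwise
    # fall back to treating the whole text as the continuation
    continuation = generated[len(prompt):] if generated.startswith(prompt) else generated
    # Cut at the earliest marker that opens a new speaker turn
    cut = len(continuation)
    for marker in ("\nCustomer:", "\nAgent:"):
        idx = continuation.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return continuation[:cut].strip()


demo_prompt = (
    "Agent: Hello, how can I help you today?\n"
    "Customer: I'm having trouble with my account\n"
    "Agent:"
)
demo_output = (
    demo_prompt
    + " I understand your concern. Let me help you with that.\n"
    + "Customer: Thank you for your help!"
)
reply = extract_agent_reply(demo_prompt, demo_output)
print(reply)
```

The prefix check can fail if decoding does not reproduce the prompt character for character; slicing the generated token IDs by prompt length before decoding is a sturdier alternative.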

## 10. Save the Fine-tuned Model

In [None]:
# Save model and tokenizer
output_dir = "./finetuned_model"

print("💾 Saving fine-tuned model...")

# Save the model
trainer.save_model(output_dir)
tokenizer.save_pretrained(output_dir)

print(f"✅ Model saved to: {output_dir}")

# Save training configuration
import json
config_data = {
    "model_name": model_name,
    "training_args": training_args.to_dict(),
    "peft_config": {
        "r": peft_config.r,
        "lora_alpha": peft_config.lora_alpha,
        "lora_dropout": peft_config.lora_dropout,
        # PEFT may store target_modules as a set, which json cannot serialize
        "target_modules": sorted(peft_config.target_modules),
    },
    "performance": {
        "final_train_loss": train_result.metrics["train_loss"],
        "eval_loss": eval_results["eval_loss"],
        "training_time": train_result.metrics["train_runtime"],
    }
}

with open(f"{output_dir}/training_config.json", "w") as f:
    json.dump(config_data, f, indent=2)

print(f"✅ Configuration saved to: {output_dir}/training_config.json")

## 11. Create Model Card and Summary

In [None]:
model_card = f"""# Fine-tuned Conversational Model

## Model Details
- **Base Model**: {model_name}
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Training Device**: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU'}
- **Training Date**: {datetime.now().strftime('%Y-%m-%d')}

## Training Configuration
- **Training Samples**: {len(train_dataset)}
- **Validation Samples**: {len(val_dataset)}
- **Batch Size**: {training_args.per_device_train_batch_size}
- **Learning Rate**: {training_args.learning_rate}
- **Epochs**: {training_args.num_train_epochs}
- **LoRA Rank**: {peft_config.r}
- **LoRA Alpha**: {peft_config.lora_alpha}

## Performance Metrics
- **Final Training Loss**: {train_result.metrics['train_loss']:.4f}
- **Evaluation Loss**: {eval_results['eval_loss']:.4f}
- **Training Time**: {train_result.metrics['train_runtime']:.2f} seconds
- **Trainable Parameters**: {sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6:.2f}M

## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load model
model = AutoModelForCausalLM.from_pretrained('{model_name}')
model = PeftModel.from_pretrained(model, './finetuned_model')
tokenizer = AutoTokenizer.from_pretrained('./finetuned_model')

# Generate response
inputs = tokenizer("Your prompt here", return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```

## Training Framework
- Jenga-AI LLM Fine-tuning Pipeline
- Google Colab T4 GPU Runtime
"""

# Save model card
with open(f"{output_dir}/README.md", "w") as f:
    f.write(model_card)

print("📄 Model Card:")
print(model_card)
print(f"\n✅ Model card saved to: {output_dir}/README.md")

## 12. Download Model (Optional)

In [None]:
# Zip the model for download
!zip -r finetuned_model.zip finetuned_model/

print("✅ Model compressed to: finetuned_model.zip")
print("📥 You can now download the model from the Files tab")

# Also zip MLflow runs
!zip -r mlflow_runs.zip mlruns/
print("✅ MLflow runs compressed to: mlflow_runs.zip")

# Trigger browser downloads in Colab
from google.colab import files
print("\n📥 Starting downloads:")
files.download('finetuned_model.zip')
files.download('mlflow_runs.zip')

## 📊 Training Summary

Congratulations! You've successfully fine-tuned an LLM using the Jenga-AI framework on Google Colab's T4 GPU.

### Key Achievements:
- ✅ Generated 500 synthetic training samples
- ✅ Fine-tuned model with LoRA for efficiency
- ✅ Achieved training on T4 GPU in ~5-10 minutes
- ✅ Integrated MLflow for experiment tracking
- ✅ Saved model for deployment

### Next Steps:
1. **Deploy the model**: Use the saved model for inference
2. **Experiment with hyperparameters**: Try different learning rates, batch sizes
3. **Use real data**: Replace synthetic data with your actual dataset
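4. **Scale up**: Try larger models like `gpt2-medium` or `llama-2-7b` (7B-class models generally need 4-bit quantization, for example through the `bitsandbytes` package installed above, to fit on a T4)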

### Resources:
- [Hugging Face Transformers](https://huggingface.co/docs/transformers)
- [PEFT Documentation](https://huggingface.co/docs/peft)
- [MLflow Documentation](https://mlflow.org/docs/latest/index.html)