# LoRA Fine-tuning with MLflow

This notebook demonstrates how to fine-tune a language model using LoRA (Low-Rank Adaptation) with comprehensive MLflow tracking.

## What you'll learn:
- LoRA fine-tuning of small language models
- MLflow experiment tracking and model registry
- Model evaluation and comparison
- Creating instruction-following datasets

## Requirements:
- Python 3.8+
- GPU recommended (but works on CPU/MPS)
- ~4GB RAM for small models

## 1. Setup and Imports

First, let's import all necessary libraries and set up our environment.

In [None]:
import os
import json
import time
import warnings
from pathlib import Path
from typing import Dict, List, Any

# ML and Deep Learning
import torch
import pandas as pd
from datasets import Dataset

# Transformers and PEFT
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, PeftModel, TaskType, get_peft_model

# MLflow for experiment tracking
import mlflow
import mlflow.pytorch

# Progress tracking
from tqdm.auto import tqdm

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Disable tokenizers warnings
os.environ['TOKENIZERS_PARALLELISM'] = 'false'

print("✅ All libraries imported successfully!")
print(f"🔥 PyTorch version: {torch.__version__}")
print(f"📊 MLflow version: {mlflow.__version__}")
print(f"📈 tqdm available for progress tracking")

## 2. Configuration and Setup

Let's define our training configuration and set up MLflow tracking.

In [None]:
# Training Configuration
config = {
    # Model settings
    "model_name": "microsoft/DialoGPT-small",  # Smaller model for demo
    "max_length": 256,
    
    # Training settings
    "batch_size": 2,
    "learning_rate": 5e-4,
    "num_epochs": 1,  # Reduced for demo
    "warmup_steps": 5,  # Kept below the total step count (41 examples / batch size 2 = 21 steps)
    "save_steps": 100,
    
    # LoRA settings
    "lora_r": 8,  # Rank
    "lora_alpha": 16,  # Alpha parameter
    "lora_dropout": 0.1,
    "target_modules": ["c_attn", "c_proj"],  # DialoGPT specific modules
    
    # Output settings
    "output_dir": "./lora_model",
    "experiment_name": "lora_finetuning_demo"
}

# Device setup
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("🚀 Using CUDA GPU")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    print("🍎 Using Apple Silicon MPS")
else:
    device = torch.device("cpu")
    print("💻 Using CPU")

print(f"📋 Configuration loaded: {config['model_name']}")
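
To get a feel for what `lora_r` and `target_modules` imply, the sketch below estimates how many trainable parameters the adapters add. It assumes DialoGPT-small uses GPT-2 small layer shapes (hidden size 768, 12 blocks) and that `"c_proj"` matches both the attention and MLP projections; these shapes are assumptions, so compare the result with the count printed when the model is loaded.

```python
def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


# Assumed GPT-2-small shapes; verify against model.named_modules() in practice.
HIDDEN, N_BLOCKS = 768, 12
adapted_layers = {
    "attn.c_attn": (HIDDEN, 3 * HIDDEN),  # fused query/key/value projection
    "attn.c_proj": (HIDDEN, HIDDEN),      # attention output projection
    "mlp.c_proj": (4 * HIDDEN, HIDDEN),   # MLP output projection
}


def total_lora_params(rank: int) -> int:
    """Sum adapter parameters over all adapted layers in every block."""
    per_block = sum(
        lora_param_count(n_in, n_out, rank) for n_in, n_out in adapted_layers.values()
    )
    return per_block * N_BLOCKS


for rank in (4, 8, 16):
    print(f"r={rank:>2}: {total_lora_params(rank):,} trainable adapter parameters")
```

The count grows linearly with the rank, which is why doubling `lora_r` doubles the adapter size.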

## 3. MLflow Setup

Initialize MLflow for experiment tracking.

In [None]:
# Set up MLflow for local tracking (ROBUST VERSION)
import os
from pathlib import Path

# Step 1: Clear ALL MLflow environment variables that might cause issues
mlflow_env_vars = [
    'MLFLOW_TRACKING_URI', 
    'MLFLOW_TRACKING_URL', 
    'MLFLOW_SERVER_HOST', 
    'MLFLOW_SERVER_PORT',
    'MLFLOW_REGISTRY_URI'
]

for var in mlflow_env_vars:
    if var in os.environ:
        del os.environ[var]
        print(f"🧹 Cleared environment variable: {var}")

# Step 2: Force local MLflow tracking
mlruns_dir = Path("./mlruns").resolve()
mlruns_dir.mkdir(exist_ok=True)

# Step 3: Set tracking URI to local file system BEFORE any other MLflow operations
tracking_uri = mlruns_dir.as_uri()  # Portable file:// URI (also valid on Windows paths)
mlflow.set_tracking_uri(tracking_uri)

print(f"🏠 Using local MLflow tracking: {tracking_uri}")
print(f"📍 MLruns directory: {mlruns_dir}")

# Step 4: Verify the tracking URI is set correctly
current_uri = mlflow.get_tracking_uri()
print(f"🔍 Current MLflow URI: {current_uri}")

# Step 5: Set up MLflow experiment
try:
    mlflow.set_experiment(config["experiment_name"])
    print(f"✅ MLflow experiment created: {config['experiment_name']}")
except Exception as e:
    print(f"❌ Error creating experiment: {e}")
    print("🔧 Trying to create experiment with explicit local tracking...")
    
    # Force set tracking URI again and retry
    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(config["experiment_name"])
    print(f"✅ MLflow experiment created (retry): {config['experiment_name']}")

print(f"🔗 MLflow UI: http://localhost:5000 (run 'mlflow ui' in terminal)")
print("💡 Note: You don't need to run MLflow UI for the notebook to work!")
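
Building the `file://` URI is the step most likely to differ between operating systems. A minimal standard-library sketch of that step is shown here; `local_tracking_uri` is a hypothetical helper name, not part of MLflow, and the temporary directory only stands in for `./mlruns`.

```python
import tempfile
from pathlib import Path


def local_tracking_uri(directory) -> str:
    """Return a file:// URI for a local directory, creating it if needed."""
    path = Path(directory).resolve()      # as_uri() requires an absolute path
    path.mkdir(parents=True, exist_ok=True)
    return path.as_uri()


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    demo_uri = local_tracking_uri(Path(tmp) / "mlruns")
    print(demo_uri)
```

The resulting string can be passed to `mlflow.set_tracking_uri` in the same way as `tracking_uri` above.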

## 4. Create Training Dataset

Let's create a simple instruction-following dataset for fine-tuning.

In [None]:
def create_training_dataset():
    """Create a simple instruction-following dataset."""
    
    training_data = [
        {
            "instruction": "What is Unsloth and why use it for fine-tuning LLMs?",
            "output": "Unsloth is a toolkit focused on faster, more memory-efficient fine-tuning of LLMs. It wraps model loading and LoRA/QLoRA setup with optimized kernels and paged optimizers, often yielding lower VRAM usage and higher throughput while staying compatible with Hugging Face/TRL."
        },
        {
            "instruction": "How do I install Unsloth for a typical CUDA setup?",
            "output": "Create a clean environment, install PyTorch with CUDA, then `pip install unsloth`. If you need bleeding-edge features, install from GitHub. Ensure your CUDA and driver versions match the installed PyTorch build."
        },
        {
            "instruction": "Can I use Unsloth on a MacBook without NVIDIA GPUs?",
            "output": "Yes for small models via CPU or Apple MPS, but the biggest speedups come from NVIDIA CUDA GPUs. For larger models, use a cloud GPU or a Linux workstation with a recent NVIDIA card."
        },
        {
            "instruction": "What’s the simplest way to load a base model in Unsloth?",
            "output": "Use `FastLanguageModel.from_pretrained(model_name, load_in_4bit=True, max_seq_length=...)` to load a model and quantize it for QLoRA quickly."
        },
        {
            "instruction": "When should I choose LoRA vs QLoRA in Unsloth?",
            "output": "Use LoRA if you have enough VRAM and want simplicity. Use QLoRA when VRAM is limited: base weights in 4-bit + small trainable LoRA adapters give strong results with minimal memory."
        },
        {
            "instruction": "What do LoRA hyperparameters r, alpha and dropout do?",
            "output": "r is the rank of the low-rank adapters (capacity), alpha scales the adapter update (similar to a learning-rate multiplier), and dropout randomly zeroes adapter activations to regularize training."
        },
        {
            "instruction": "How do I pick target modules for LoRA with Unsloth?",
            "output": "Use Unsloth’s convenience options like `target_modules='all-linear'` to cover common linear layers. Advanced users can pass a list of exact module names to control which layers get adapters."
        },
        {
            "instruction": "How do I format my dataset for instruction tuning with Unsloth?",
            "output": "Create plain text fields per sample with a stable prompt template, e.g. `### Instruction: ...\\n### Response: ...`. Keep formatting consistent across all samples."
        },
        {
            "instruction": "What sequence length should I set in Unsloth?",
            "output": "Choose the smallest `max_seq_length` that fits your task. Longer contexts increase VRAM and training time. Start with 512 or 1024 and scale up only if needed."
        },
        {
            "instruction": "How do I enable efficient packing of short samples?",
            "output": "If you use TRL’s `SFTTrainer`, set `packing=True`. This packs multiple short texts into a single sequence to reduce padding and speed up training."
        },
        {
            "instruction": "Which optimizer is recommended for QLoRA in Unsloth?",
            "output": "Paged optimizers (e.g., paged AdamW 8-bit) are commonly used because they reduce memory footprint. They work well with 4-bit quantization."
        },
        {
            "instruction": "What learning rate should I start with for adapter tuning?",
            "output": "Typical starting points are 1e-4 to 2e-4 for adapters. Use warmup (e.g., 50–200 steps) and monitor loss; lower LR if you see instability or overfitting."
        },
        {
            "instruction": "How do gradient accumulation and batch size affect memory?",
            "output": "Larger batch sizes improve stability but need more VRAM. With limited VRAM, set a small per-device batch (e.g., 1–4) and increase `gradient_accumulation_steps` to reach an effective batch size."
        },
        {
            "instruction": "Should I train in bf16 or fp16 with Unsloth?",
            "output": "If your GPU supports bf16 well, prefer bf16 for stability. Otherwise fp16 is fine. On older cards without good bf16, stick to fp16."
        },
        {
            "instruction": "How do I monitor training progress with Unsloth?",
            "output": "Log loss, learning rate, and throughput. You can integrate Weights & Biases or use TRL/Transformers logging. Validate periodically on a held-out set of prompts."
        },
        {
            "instruction": "What is the recommended prompt template style?",
            "output": "Keep it simple and consistent, e.g., `### Instruction:\\n{user}\\n\\n### Response:\\n{assistant}`. Don’t mix multiple templates in one run unless you know what you’re doing."
        },
        {
            "instruction": "How do I avoid overfitting during Unsloth fine-tuning?",
            "output": "Use a validation split, early stopping or limited epochs, apply dropout in LoRA, and avoid training too long on tiny datasets. Monitor for verbatim memorization."
        },
        {
            "instruction": "How do I run inference after training with adapters?",
            "output": "Load the same base model with Unsloth and call `model.load_adapter('path/to/adapters')`. Tokenize your prompt and generate with your preferred decoding settings."
        },
        {
            "instruction": "Can I merge LoRA adapters back into the base model?",
            "output": "Yes. Merging produces a single model artifact (fp16/fp32) that’s simpler to deploy but loses the flexibility of swapping adapters."
        },
        {
            "instruction": "What models are good starters for Unsloth fine-tuning?",
            "output": "TinyLlama or Phi-mini for very small GPUs; Mistral-7B or Llama-8B for stronger baselines with QLoRA; choose an Instruct variant if you’re doing instruction tuning."
        },
        {
            "instruction": "What dataset size is reasonable for a quick Unsloth demo?",
            "output": "A few hundred to a few thousand examples is enough to see behavior changes. Start small to validate the pipeline, then scale up as needed."
        },
        {
            "instruction": "How do I set `eos_token_id`, padding, and truncation correctly?",
            "output": "Ensure the tokenizer has a defined EOS; set `padding_side='right'` for causal models; truncate to `max_seq_length`. Consistent EOS handling prevents run-on generations."
        },
        {
            "instruction": "What generation settings should I test after fine-tuning?",
            "output": "Try `max_new_tokens` (e.g., 64–256), `temperature` (0.2–0.8), `top_p` (0.9), and `do_sample=True`. For deterministic outputs, set sampling off and use greedy/beam search."
        },
        {
            "instruction": "How does Unsloth help with VRAM limits?",
            "output": "QLoRA with 4-bit base weights, 8-bit optimizers, packing, and kernel optimizations reduce memory usage, allowing larger models on modest GPUs."
        },
        {
            "instruction": "How can I resume training if it stops midway?",
            "output": "Point `output_dir` to the previous run, reload the trainer with the same config, and Unsloth/Transformers will restore weights, optimizer, and scheduler if checkpoints exist."
        },
        {
            "instruction": "What evaluation approach should I use for small instruction datasets?",
            "output": "Hold out 5–20% as a dev set; use simple exact-match or regex for structured tasks; optionally use an LLM-as-judge for open-ended quality checks."
        },
        {
            "instruction": "How do I prepare conversational data for Unsloth?",
            "output": "Either flatten to your text template or use a chat template to format messages consistently (system, user, assistant). Keep roles and separators stable."
        },
        {
            "instruction": "What’s a good epoch count for adapter fine-tunes?",
            "output": "Often 1–3 epochs are enough. With tiny datasets, prefer more steps via repeat+shuffle rather than many epochs to avoid rapid overfitting."
        },
        {
            "instruction": "How do I handle long-context training with Unsloth?",
            "output": "Increase `max_seq_length` cautiously and use packing. Expect higher memory and slower steps. Consider gradient checkpointing if supported to trade compute for memory."
        },
        {
            "instruction": "What should I check if I hit CUDA OOM errors?",
            "output": "Lower `max_seq_length`, reduce batch size, increase gradient accumulation, switch to QLoRA, or disable extra features. Verify no background processes are using VRAM."
        },
        {
            "instruction": "Is it okay to mix tasks in one Unsloth fine-tune?",
            "output": "Yes if they share style/instructions. Keep templates consistent and balance the dataset so one task doesn’t dominate unless intended."
        },
        {
            "instruction": "How do I log train/val samples to W&B with Unsloth?",
            "output": "Initialize W&B in your script, log losses and sample generations at evaluation steps, and attach key hyperparameters (LR, r, alpha, dropout, max_seq_length) as config."
        },
        {
            "instruction": "What scheduler should I use for adapter training?",
            "output": "Cosine or linear with warmup are common defaults. The choice matters less than using a reasonable LR and warmup; tune based on validation loss."
        },
        {
            "instruction": "How do I choose the right rank r for LoRA?",
            "output": "Start with r=8–16. Increase r if the model underfits or the task is complex; decrease if you observe overfitting or want smaller adapters."
        },
        {
            "instruction": "What’s the best way to export the fine-tuned result?",
            "output": "Keep adapters for modularity and small artifacts, or merge them into the base weights to get a single model for simpler deployment."
        },
        {
            "instruction": "Can Unsloth be used with TRL’s SFTTrainer?",
            "output": "Yes. Load the model with Unsloth, wrap LoRA, then pass the model and tokenizer into TRL’s `SFTTrainer` with your dataset and training args."
        },
        {
            "instruction": "How do I set up a tiny demo run on a free GPU notebook?",
            "output": "Use a small model (e.g., TinyLlama), set `max_seq_length=512`, QLoRA on, small batch size, gradient accumulation, and train for a few hundred steps to verify the pipeline."
        },
        {
            "instruction": "What’s the role of a clean prompt template in fine-tuning?",
            "output": "It standardizes context so the model learns consistent input-output mapping. Messy templates make training noisy and degrade quality."
        },
        {
            "instruction": "How do I check that the model learned my style after fine-tuning?",
            "output": "Create a small evaluation script that feeds 10–20 held-out prompts and inspects responses for instruction adherence, tone, and correctness."
        },
        {
            "instruction": "What are common pitfalls when fine-tuning with Unsloth?",
            "output": "Inconsistent templates, too-high LR, no validation set, too-long sequences on small GPUs, forgetting EOS handling, and mixing incompatible tokenizers/models."
        },
        {
            "instruction": "How do I keep runs reproducible?",
            "output": "Fix random seeds, pin package versions, log all hyperparameters, and checkpoint regularly so you can resume with the same state."
        }
    ]
    
    # Format for instruction following with special tokens
    formatted_data = []
    for item in training_data:
        text = f"### Instruction: {item['instruction']}\n### Response: {item['output']}<|endoftext|>"
        formatted_data.append({"text": text})
    
    return Dataset.from_list(formatted_data)

# Create dataset
dataset = create_training_dataset()

print(f"📚 Created dataset with {len(dataset)} examples")
print("\n📝 Sample training example:")
print(dataset[0]["text"][:200] + "...")
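
The same template has to be used at training time and at inference time, so it can help to keep it in one place. The sketch below shows one way to do that; `build_prompt` and `build_training_text` are hypothetical helper names, and the strings mirror the template used in this notebook.

```python
EOS_TOKEN = "<|endoftext|>"  # GPT-2-style end-of-text marker used in this notebook


def build_prompt(instruction: str) -> str:
    """Prompt shown to the model at inference time."""
    return f"### Instruction: {instruction}\n### Response:"


def build_training_text(instruction: str, output: str) -> str:
    """Full training example: prompt, answer, then the end-of-text marker."""
    return f"{build_prompt(instruction)} {output}{EOS_TOKEN}"


example = build_training_text("What is LoRA?", "A low-rank adaptation method.")
print(example)
```

Because the training text starts with exactly the inference prompt, the model sees the same prefix in both settings.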

## 5. Load Model and Tokenizer

Load the base model and tokenizer, then configure LoRA.

In [None]:
def setup_model_and_tokenizer(model_name: str, lora_config: Dict):
    """Set up model, tokenizer, and LoRA configuration."""
    
    print(f"🔄 Loading model: {model_name}")
    
    # Load tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
        model_name,
        trust_remote_code=True,
        padding_side="right"
    )
    
    # Set pad token if not exists
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
        print("🔧 Set pad_token to eos_token")
    
    # Load base model
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16 if device.type == "cuda" else torch.float32,
        device_map="auto" if torch.cuda.is_available() else None,
        trust_remote_code=True
    )
    
    # Configure LoRA
    peft_config = LoraConfig(
        r=lora_config["lora_r"],
        lora_alpha=lora_config["lora_alpha"],
        target_modules=lora_config["target_modules"],
        lora_dropout=lora_config["lora_dropout"],
        bias="none",
        task_type=TaskType.CAUSAL_LM,
    )
    
    # Apply LoRA to model
    model = get_peft_model(model, peft_config)
    
    # Print trainable parameters info
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total_params = sum(p.numel() for p in model.parameters())
    trainable_percentage = 100 * trainable_params / total_params
    
    print(f"🎯 Trainable parameters: {trainable_params:,}")
    print(f"📊 Total parameters: {total_params:,}")
    print(f"💡 Trainable percentage: {trainable_percentage:.2f}%")
    
    return model, tokenizer

# Setup model and tokenizer
model, tokenizer = setup_model_and_tokenizer(config["model_name"], config)
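
To make the roles of `r` and `lora_alpha` concrete, here is a tiny pure-Python sketch of the LoRA forward pass, y = W x + (alpha / r) · B (A x). The matrices are made-up toy values, and real implementations work on tensors, so treat this as an illustration of the arithmetic only.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector, alpha: float, r: int) -> Vector:
    """Frozen weight W plus the scaled low-rank update B @ A."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen 2x3 weight (toy values)
A = [[0.5, -0.5, 1.0]]                    # 1x3 down-projection, rank r = 1
B_zero = [[0.0], [0.0]]                   # 2x1 up-projection, zero at initialisation
B_trained = [[1.0], [-1.0]]               # pretend values after some training
x = [1.0, 2.0, 3.0]

y0 = lora_forward(W, A, B_zero, x, alpha=2.0, r=1)     # identical to W x
y1 = lora_forward(W, A, B_trained, x, alpha=2.0, r=1)  # W x plus the scaled update
print(y0, y1)
```

With B initialised to zero the adapted layer starts out identical to the base layer. With this notebook's settings (`lora_r=8`, `lora_alpha=16`) the scale factor alpha / r is 2.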

## 6. Prepare Dataset for Training

Tokenize the dataset and prepare it for training.

In [None]:
def tokenize_dataset(dataset, tokenizer, max_length: int):
    """Tokenize the dataset for training."""
    
    def tokenize_function(examples):
        # Return plain lists without padding: the data collator pads each batch
        # dynamically and builds the labels itself (a copy of input_ids with
        # padding positions set to -100), so labels set here would be overwritten.
        return tokenizer(
            examples["text"],
            truncation=True,
            max_length=max_length,
        )
    
    tokenized_dataset = dataset.map(
        tokenize_function,
        batched=True,
        remove_columns=dataset.column_names
    )
    
    return tokenized_dataset

# Tokenize dataset
print("🔄 Tokenizing dataset...")
tokenized_dataset = tokenize_dataset(dataset, tokenizer, config["max_length"])

print(f"✅ Dataset tokenized: {len(tokenized_dataset)} examples")
print(f"📏 Max length: {config['max_length']} tokens")
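
The data collator configured later in this notebook (`DataCollatorForLanguageModeling` with `mlm=False`) pads each batch and derives labels from `input_ids`, replacing padding positions with -100 so they are ignored by the loss. The sketch below imitates that behaviour on plain lists; `pad_and_mask` is a hypothetical helper and the token ids are made up.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def pad_and_mask(sequences: List[List[int]], pad_id: int) -> Dict[str, List[List[int]]]:
    """Right-pad sequences to equal length and mask pad positions in the labels."""
    max_len = max(len(seq) for seq in sequences)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for seq in sequences:
        n_pad = max_len - len(seq)
        ids = seq + [pad_id] * n_pad
        batch["input_ids"].append(ids)
        batch["attention_mask"].append([1] * len(seq) + [0] * n_pad)
        # Masking is by token value, so a real token equal to pad_id is masked too.
        batch["labels"].append([IGNORE_INDEX if t == pad_id else t for t in ids])
    return batch


# Token id 0 plays the role of both EOS and pad, as in this notebook.
demo_batch = pad_and_mask([[5, 6, 0], [7, 0]], pad_id=0)
print(demo_batch["labels"])
```

Note the side effect visible in the demo: when the pad token is the EOS token, the genuine end-of-text token is also excluded from the loss, which can make the model less likely to stop on its own.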

## 7. Training Setup

Configure the training arguments and data collator.

In [None]:
# Training arguments
training_args = TrainingArguments(
    output_dir=config["output_dir"],
    num_train_epochs=config["num_epochs"],
    per_device_train_batch_size=config["batch_size"],
    learning_rate=config["learning_rate"],
    warmup_steps=config["warmup_steps"],
    logging_steps=10,
    save_steps=config["save_steps"],
    save_total_limit=2,
    remove_unused_columns=False,
    report_to="none",  # Disable all external reporting (W&B, TensorBoard, etc.); None does not disable it in all versions
    load_best_model_at_end=False,
    dataloader_pin_memory=False,  # Better compatibility
    dataloader_num_workers=0,     # Better compatibility
    fp16=False,  # Disable for stability
    disable_tqdm=False,  # Enable tqdm progress bars
)

# Data collator for language modeling
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,  # Causal LM, not masked LM
)

print("⚙️ Training configuration set up:")
print(f"   📊 Epochs: {config['num_epochs']}")
print(f"   🎯 Batch size: {config['batch_size']}")
print(f"   📈 Learning rate: {config['learning_rate']}")
print(f"   💾 Output dir: {config['output_dir']}")
print(f"   📈 Progress bars: Enabled with tqdm")
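
Before training, it is worth checking how many optimizer steps the run will actually take, because the warmup length only makes sense relative to that number. The sketch below assumes a single device, no gradient accumulation, and a plain linear warmup; the warmup values 50 and 5 are only example inputs.

```python
import math


def total_optimizer_steps(num_examples: int, batch_size: int, num_epochs: int) -> int:
    """Optimizer steps for a run (single device, no gradient accumulation)."""
    return math.ceil(num_examples / batch_size) * num_epochs


def peak_lr_fraction(total_steps: int, warmup_steps: int) -> float:
    """Fraction of the configured learning rate reached under linear warmup."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, total_steps / warmup_steps)


steps = total_optimizer_steps(num_examples=41, batch_size=2, num_epochs=1)
print(f"Total optimizer steps: {steps}")
for warmup in (50, 5):
    fraction = peak_lr_fraction(steps, warmup)
    print(f"warmup_steps={warmup:>2}: learning rate reaches {fraction:.0%} of its configured value")
```

With 41 examples and a batch size of 2, one epoch is only 21 steps, so a warmup longer than that ends the run before the configured learning rate is ever reached.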

## 8. Model Training with MLflow Tracking

Now let's train the model and track everything with MLflow.

In [None]:
# Start MLflow run for training
with mlflow.start_run(run_name="lora_training") as run:
    
    # Log all configuration parameters
    mlflow.log_params({
        "model_name": config["model_name"],
        "max_length": config["max_length"],
        "batch_size": config["batch_size"],
        "learning_rate": config["learning_rate"],
        "num_epochs": config["num_epochs"],
        "lora_r": config["lora_r"],
        "lora_alpha": config["lora_alpha"],
        "lora_dropout": config["lora_dropout"],
        "target_modules": str(config["target_modules"]),
        "device": str(device),
        "dataset_size": len(tokenized_dataset)
    })
    
    # Create trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset,
        tokenizer=tokenizer,
        data_collator=data_collator,
    )
    
    print("🚀 Starting LoRA fine-tuning...")
    print(f"📊 Dataset size: {len(tokenized_dataset)} examples")
    print(f"⏱️ Estimated time: ~{config['num_epochs'] * 2} minutes")
    
    # Record training start time
    start_time = time.time()
    
    # Train the model
    trainer.train()
    
    # Calculate training time
    training_time = time.time() - start_time
    
    # Save model and tokenizer
    print("💾 Saving model and tokenizer...")
    trainer.save_model(config["output_dir"])
    tokenizer.save_pretrained(config["output_dir"])
    
    # Extract the mean training loss from the summary entry Trainer appends to log_history
    if trainer.state.log_history:
        final_loss = trainer.state.log_history[-1].get("train_loss", 0)
    else:
        final_loss = 0
    
    # Log training metrics to MLflow
    mlflow.log_metrics({
        "final_train_loss": final_loss,
        "training_time_seconds": training_time,
        "training_time_minutes": training_time / 60,
        "total_training_steps": trainer.state.global_step,
        "examples_per_second": len(tokenized_dataset) * config["num_epochs"] / training_time,
    })
    
    # Log model artifacts
    mlflow.log_artifacts(config["output_dir"], "model_checkpoints")
    
    training_run_id = run.info.run_id
    
    print("\n✅ Training completed!")
    print(f"⏱️ Training time: {training_time/60:.2f} minutes")
    print(f"📉 Final loss: {final_loss:.4f}")
    print(f"🆔 Run ID: {training_run_id}")
    print(f"💾 Model saved to: {config['output_dir']}")
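
The cell above records a single loss value, but `trainer.state.log_history` also holds the intermediate losses. The sketch below extracts a loss curve from a list shaped like that history; the sample entries are made up for illustration, and the exact keys can vary between Transformers versions.

```python
from typing import Any, Dict, List, Optional, Tuple

# Made-up entries shaped like trainer.state.log_history (values are illustrative).
sample_log_history: List[Dict[str, Any]] = [
    {"loss": 3.2, "learning_rate": 1e-4, "epoch": 0.48, "step": 10},
    {"loss": 2.7, "learning_rate": 2e-4, "epoch": 0.95, "step": 20},
    {"train_runtime": 42.0, "train_loss": 2.9, "epoch": 1.0, "step": 21},
]


def loss_curve(log_history: List[Dict[str, Any]]) -> List[Tuple[int, float]]:
    """Return (step, loss) pairs from the periodic logging entries."""
    return [(e["step"], e["loss"]) for e in log_history if "loss" in e and "step" in e]


def last_logged_loss(log_history: List[Dict[str, Any]]) -> Optional[float]:
    """Most recent per-step loss, or None if nothing was logged."""
    curve = loss_curve(log_history)
    return curve[-1][1] if curve else None


print(loss_curve(sample_log_history))
print(last_logged_loss(sample_log_history))
```

Each pair could then be sent to MLflow with `mlflow.log_metric("train_loss", loss, step=step)` inside the active run, which gives a curve in the UI instead of a single number.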

## 9. Model Evaluation

Let's test our fine-tuned model with some sample questions.

In [None]:
def evaluate_model(model_path: str, tokenizer, test_questions: List[str]):
    """Evaluate the fine-tuned model."""
    
    print("🔄 Loading fine-tuned model for evaluation...")
    
    # Load the base model
    base_model = AutoModelForCausalLM.from_pretrained(
        config["model_name"],
        torch_dtype=torch.float16 if device.type == "cuda" else torch.float32,
        device_map="auto" if torch.cuda.is_available() else None,
        trust_remote_code=True
    )
    
    # Load LoRA adapter
    model = PeftModel.from_pretrained(base_model, model_path)
    model.eval()
    
    results = []
    
    print(f"🧪 Evaluating {len(test_questions)} questions...")
    
    # Use tqdm for progress tracking during evaluation
    for i, question in enumerate(tqdm(test_questions, desc="Evaluating questions")):
        # Format prompt
        prompt = f"### Instruction: {question}\n### Response:"
        
        # Tokenize input and move it to the same device as the model
        inputs = tokenizer(
            prompt, 
            return_tensors="pt", 
            truncation=True, 
            max_length=200
        ).to(next(model.parameters()).device)
        
        # Generate response
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=100,
                temperature=0.7,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id,
                repetition_penalty=1.1
            )
        
        # Decode response
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        response = response.replace(prompt, "").strip()
        
        results.append({
            "question": question,
            "response": response,
            "response_length": len(response.split())
        })
    
    return results

# Test questions
test_questions = [
    "What is machine learning?",
    "Explain deep learning in simple terms.",
    "What are the benefits of using MLflow?",
    "How does LoRA fine-tuning work?",
    "What is the difference between training and inference?"
]

# Evaluate model
evaluation_results = evaluate_model(config["output_dir"], tokenizer, test_questions)

print("\n📊 Evaluation Results:")
print("=" * 60)

for i, result in enumerate(evaluation_results, 1):
    print(f"\n🔸 Question {i}: {result['question']}")
    print(f"💬 Response: {result['response']}")
    print(f"📏 Length: {result['response_length']} words")
    print("-" * 40)
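
Removing the prompt with a string replace works only if decoding reproduces the prompt exactly. A more robust alternative is to slice off the prompt tokens before decoding, since a decoder-only model returns the prompt followed by the new tokens. The sketch below shows the idea on plain lists of made-up token ids; `extract_new_tokens` is a hypothetical helper.

```python
from typing import List


def extract_new_tokens(output_ids: List[int], prompt_length: int, eos_id: int) -> List[int]:
    """Drop the echoed prompt and anything from the first EOS token onward."""
    new_tokens = output_ids[prompt_length:]
    if eos_id in new_tokens:
        new_tokens = new_tokens[: new_tokens.index(eos_id)]
    return new_tokens


prompt_ids = [11, 12, 13]                 # made-up ids for the prompt
generated = [11, 12, 13, 21, 22, 0, 0]    # prompt + answer + EOS/padding (id 0)
answer_ids = extract_new_tokens(generated, len(prompt_ids), eos_id=0)
print(answer_ids)
```

With the objects in `evaluate_model`, the equivalent slice would be roughly `outputs[0][inputs["input_ids"].shape[1]:]`, passed to `tokenizer.decode`.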

## 10. Log Evaluation Results to MLflow

Let's log our evaluation results and metrics to MLflow.

In [None]:
# Log evaluation results to MLflow
with mlflow.start_run(run_name="model_evaluation") as eval_run:
    
    # Calculate evaluation metrics
    avg_response_length = sum(r["response_length"] for r in evaluation_results) / len(evaluation_results)
    non_empty_responses = sum(1 for r in evaluation_results if r["response"].strip())
    response_rate = non_empty_responses / len(evaluation_results)
    
    # Log evaluation metrics
    mlflow.log_metrics({
        "avg_response_length": avg_response_length,
        "response_rate": response_rate,
        "total_test_questions": len(test_questions),
        "successful_responses": non_empty_responses
    })
    
    # Log evaluation parameters
    mlflow.log_params({
        "training_run_id": training_run_id,
        "model_path": config["output_dir"],
        "evaluation_temperature": 0.7,
        "max_new_tokens": 100
    })
    
    # Save evaluation results as CSV
    results_df = pd.DataFrame(evaluation_results)
    results_file = "evaluation_results.csv"
    results_df.to_csv(results_file, index=False)
    mlflow.log_artifact(results_file, "evaluation")
    
    # Log sample responses as parameters
    for i, result in enumerate(evaluation_results[:3]):
        mlflow.log_param(f"sample_question_{i+1}", result["question"][:100])
        mlflow.log_param(f"sample_response_{i+1}", result["response"][:200])
    
    print("📊 Evaluation metrics logged to MLflow:")
    print(f"   📏 Average response length: {avg_response_length:.1f} words")
    print(f"   ✅ Response rate: {response_rate:.1%}")
    print(f"   📁 Results saved to: {results_file}")
    print(f"   🆔 Evaluation run ID: {eval_run.info.run_id}")

# Clean up the temporary file
if os.path.exists(results_file):
    os.remove(results_file)
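
Response length and response rate only show that the model produced text, not that the text is relevant. One lightweight addition is a keyword coverage score per response, sketched below. The expected keywords are hypothetical and would have to be written by hand for each test question.

```python
from typing import List


def keyword_coverage(response: str, keywords: List[str]) -> float:
    """Fraction of expected keywords that appear in the response (case-insensitive)."""
    if not keywords:
        return 0.0
    text = response.lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    return hits / len(keywords)


# Hypothetical response and hand-written keywords, for illustration only.
demo_response = "LoRA trains small low-rank adapter matrices."
demo_keywords = ["low-rank", "adapter", "frozen"]
demo_score = keyword_coverage(demo_response, demo_keywords)
print(f"Keyword coverage: {demo_score:.2f}")
```

The average of this score over all test questions could be logged alongside `avg_response_length` as one more metric. It is a crude substring check, so it is best read as a sanity signal, not a quality measure.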

## 11. Model Registration (Optional)

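Log the trained adapter and its metadata to a dedicated MLflow run as a first step toward registration. This cell does not create a Model Registry entry itself; that would require a separate call such as `mlflow.register_model`.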

In [None]:
def register_model_in_registry(model_path: str, model_name: str = "lora_finetuned_model"):
    """Log the LoRA adapter files and metadata to MLflow so they can be registered later."""
    
    try:
        with mlflow.start_run(run_name="model_registration"):
            
            # Log the model directory as an artifact first
            mlflow.log_artifacts(model_path, "model")
            
            # Log model metadata
            mlflow.log_params({
                "base_model": config["model_name"],
                "lora_r": config["lora_r"],
                "lora_alpha": config["lora_alpha"],
                "training_run_id": training_run_id,
                "model_type": "LoRA fine-tuned"
            })
            
            print(f"🏷️ Model artifacts logged for registration")
            print(f"📦 Model name: {model_name}")
            print(f"🔗 Base model: {config['model_name']}")
            
            return True
            
    except Exception as e:
        print(f"⚠️ Model registration encountered an issue: {e}")
        print("💡 This is normal - model registry is optional for this demo")
        return False

# Register the model
registration_success = register_model_in_registry(config["output_dir"])

if registration_success:
    print("✅ Model artifacts logged and ready for registration!")
else:
    print("ℹ️ Model saved locally and ready to use")

## 12. Summary and Next Steps

Let's summarize what we accomplished and suggest next steps.

In [None]:
print("🎉 LoRA Fine-tuning Complete!")
print("=" * 60)

print("\n📋 What we accomplished:")
print(f"   ✅ Fine-tuned {config['model_name']} with LoRA")
print(f"   ✅ Trained on {len(dataset)} instruction-following examples")
print(f"   ✅ Used LoRA rank {config['lora_r']} (only a small fraction of the parameters is trained)")
print(f"   ✅ Tracked everything with MLflow (no external dependencies)")
print(f"   ✅ Evaluated model on {len(test_questions)} test questions")
print(f"   ✅ Saved model to: {config['output_dir']}")

print("\n📊 Key Results:")
print(f"   📉 Final training loss: {final_loss:.4f}")
print(f"   ⏱️ Training time: {training_time/60:.1f} minutes")
print(f"   📏 Avg response length: {avg_response_length:.1f} words")
print(f"   ✅ Response rate: {response_rate:.1%}")

print("\n🔗 MLflow Integration (Local Only):")
print(f"   📊 Experiment: {config['experiment_name']}")
print(f"   🏃 Training run: {training_run_id[:8]}...")
print(f"   🧪 Evaluation run: {eval_run.info.run_id[:8]}...")
print(f"   📈 Progress tracking: tqdm enabled")

print("\n🚀 Next Steps:")
print("   1. 🌐 View detailed results: mlflow ui")
print("   2. 🎯 Try different LoRA configurations")
print("   3. 📚 Expand training dataset for better performance")
print("   4. 🔄 Compare with other fine-tuning methods")
print("   5. 🚀 Deploy model for production use")

print("\n💡 Useful Commands:")
print("   • Start MLflow UI: mlflow ui")
print("   • Access at: http://localhost:5000")
print("   • Model location:", Path(config["output_dir"]).absolute())

print("\n🎓 What you learned:")
print("   • LoRA fine-tuning is very parameter-efficient")
print("   • MLflow provides comprehensive local experiment tracking")
print("   • tqdm gives clear progress visualization during training")
print("   • Small models can be effective for specific tasks")
print("   • Evaluation is crucial for model assessment")
print("   • No external tracking services (W&B) needed")

print("\n" + "=" * 60)
print("🎯 Happy fine-tuning with MLflow! 🤖")