# LLM Fine-Tuning Tutorial

This notebook demonstrates how to use the fine-tuning framework to train a language model on custom data. We'll cover the following topics:

1. Setting up the environment
2. Preparing the dataset
3. Configuring the model and training parameters
4. Training the model with different methods (LoRA, QLoRA, Spectrum)
5. Exploring Spectrum layer selection
6. Evaluating the fine-tuned model
7. Inference with the fine-tuned model

Let's start!

## 1. Setting up the environment

First, we need to import the necessary libraries and set up the paths.

In [None]:
import os
import sys
import yaml
import torch
import json
from pathlib import Path
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Add the project root to the path
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root))

# Import project modules
from src.utils.data_processing import preprocess_dataset
from src.utils.spectrum import SpectrumAnalyzer
from src.utils.evaluation import calculate_perplexity, evaluate_model

# Check if CUDA is available
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")
    print(f"Total GPU memory: {torch.cuda.get_device_properties(0).total_memory/1e9:.2f} GB")

## 2. Preparing the dataset

Next, we'll prepare a sample dataset for fine-tuning. For this tutorial, we'll use a few hand-written examples in the Alpaca format, where each record has an `instruction`, an optional `input`, and an `output`.

In [None]:
# Sample dataset in Alpaca format
sample_data = [
    {
        "instruction": "Tell me a joke about programming.",
        "input": "",
        "output": "Why do programmers prefer dark mode? Because light attracts bugs!"
    },
    {
        "instruction": "Explain the concept of recursion in programming.",
        "input": "",
        "output": "Recursion is a programming technique where a function calls itself to solve a problem. It breaks down a complex problem into simpler instances of the same problem. Every recursive solution needs a base case to prevent infinite recursion. A classic example is calculating factorial: n! = n * (n-1)! with base case 0! = 1."
    },
    {
        "instruction": "Translate this sentence to French.",
        "input": "Hello, how are you today?",
        "output": "Bonjour, comment allez-vous aujourd'hui?"
    }
]

# Save the sample data to a file
sample_data_path = project_root / "data" / "sample_alpaca.json"
os.makedirs(sample_data_path.parent, exist_ok=True)

with open(sample_data_path, "w") as f:
    json.dump(sample_data, f, indent=2)

# Load the dataset
dataset = load_dataset("json", data_files=str(sample_data_path))["train"]
print(f"Dataset loaded with {len(dataset)} examples")
print("Sample example:")
print(dataset[0])
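To make the Alpaca format concrete, here is a minimal sketch of how one record can be rendered into a single training string using the widely used Alpaca prompt templates. The helper `format_alpaca_example` is hypothetical and is not part of this framework; `preprocess_dataset` may format examples differently, especially when `use_chat_template=True`.

```python
# Widely used Alpaca prompt templates; the framework's own formatting may differ.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def format_alpaca_example(example: dict) -> str:
    """Render one Alpaca record as a single training string (hypothetical helper)."""
    if example.get("input", "").strip():
        # Records with a non-empty input use the three-section template
        prompt = PROMPT_WITH_INPUT.format(
            instruction=example["instruction"], input=example["input"]
        )
    else:
        # Records without an input skip the "### Input:" section entirely
        prompt = PROMPT_NO_INPUT.format(instruction=example["instruction"])
    return prompt + example["output"]


record = {
    "instruction": "Translate this sentence to French.",
    "input": "Hello, how are you today?",
    "output": "Bonjour, comment allez-vous aujourd'hui?",
}
print(format_alpaca_example(record))
```

Records with an empty `input`, like the first two samples above, fall through to the shorter template.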

### Tokenizing and Processing the Dataset

Now we'll load a model tokenizer and process the dataset.

In [None]:
# Load a tokenizer for preprocessing
model_name = "meta-llama/Llama-3.1-8B-Instruct"  # Use an actual model you have access to
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Ensure the tokenizer has padding token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Process the dataset
processed_datasets = preprocess_dataset(
    train_dataset=dataset,
    tokenizer=tokenizer,
    max_seq_length=2048,  # Maximum tokens per example; use a smaller value for quick demos
    format="alpaca",
    add_eos_token=True,
    use_chat_template=True
)

print(f"Processed dataset has {len(processed_datasets['train'])} examples")
print("Sample tokenized example:")
print(f"Input IDs (first 10): {processed_datasets['train'][0]['input_ids'][:10]}")
print(f"Length: {len(processed_datasets['train'][0]['input_ids'])} tokens")

## 3. Configuring the model and training parameters

Let's set up a configuration file for training. We'll define parameters for different fine-tuning methods.

In [None]:
# Create a sample configuration
sample_config = {
    "model": {
        "base_model": "meta-llama/Llama-3.1-8B-Instruct",
        "tokenizer": "meta-llama/Llama-3.1-8B-Instruct",
        "load_in_8bit": False,
        "load_in_4bit": True,
        "trust_remote_code": True,
        "use_flash_attention": True
    },
    "fine_tuning": {
        "method": "qlora",  # Options: "full", "lora", "qlora", "spectrum"
        "lora": {
            "r": 16,
            "alpha": 32,
            "dropout": 0.05,
            "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
            "bias": "none"
        },
        "spectrum": {
            "snr_threshold": 0.5,
            "layers_to_finetune": "auto"  # Will be determined by SNR analysis
        },
        "quantization": {
            "bits": 4,
            "bnb_4bit_compute_dtype": "bfloat16",
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_use_double_quant": True
        }
    },
    "training": {
        "epochs": 3,
        "micro_batch_size": 1,
        "gradient_accumulation_steps": 4,  # Reduced for demonstration
        "learning_rate": 2.0e-4,
        "lr_scheduler_type": "cosine",
        "warmup_ratio": 0.03,
        "max_grad_norm": 0.3,
        "weight_decay": 0.001,
        "max_seq_length": 2048,  # Reduced for demonstration
        "gradient_checkpointing": True,
        "mixed_precision": "bf16"  # Options: "no", "fp16", "bf16"
    },
    "dataset": {
        "format": "alpaca",
        "train_path": str(sample_data_path),
        "eval_path": str(sample_data_path),  # Using the same for demonstration
        "preprocessing": {
            "add_eos_token": True,
            "add_bos_token": False,
            "use_chat_template": True
        }
    },
    "output": {
        "output_dir": str(project_root / "models" / "tutorial_run"),
        "logging_steps": 10,
        "save_steps": 50,
        "save_total_limit": 3,
        "push_to_hub": False
    },
    "evaluation": {
        "do_eval": True,
        "eval_batch_size": 2,
        "eval_strategy": "steps",
        "eval_steps": 50
    }
}

# Save the configuration
config_path = project_root / "config" / "tutorial_config.yaml"
os.makedirs(config_path.parent, exist_ok=True)

with open(config_path, "w") as f:
    yaml.dump(sample_config, f, sort_keys=False)

print(f"Configuration saved to {config_path}")
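To get a feel for what the `lora` settings imply, the sketch below estimates how many trainable parameters the adapters add. It relies on the fact that LoRA attaches two low-rank matrices of shapes `r x in_features` and `out_features x r` to each adapted weight. The layer shapes and layer count are assumptions taken from the published Llama-3.1-8B configuration, and `lora_trainable_params` is a hypothetical helper, not part of this framework.

```python
# Assumed (in_features, out_features) shapes for Llama-3.1-8B, based on its
# published config: hidden size 4096, MLP size 14336, 8 KV heads of dim 128.
MODULE_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32  # assumed number of decoder layers


def lora_trainable_params(r: int, target_modules: list[str]) -> int:
    """Count LoRA parameters: each adapted matrix adds A (r x in) and B (out x r)."""
    per_layer = sum(
        r * (MODULE_SHAPES[name][0] + MODULE_SHAPES[name][1])
        for name in target_modules
    )
    return per_layer * NUM_LAYERS


targets = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
total = lora_trainable_params(r=16, target_modules=targets)
print(f"LoRA trainable parameters at r=16: {total:,}")  # 41,943,040

# Effective batch size per optimizer step on a single GPU
micro_batch_size, grad_accum = 1, 4
print(f"Effective batch size: {micro_batch_size * grad_accum}")  # 4
```

Under these assumptions the adapters are roughly 0.5% of the size of the 8B base model, which is why only a small set of weights needs gradients and optimizer state.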

## 4. Training the model

Now we'll demonstrate how to fine-tune a model using our framework. Instead of running actual training (which would require significant compute resources), we'll show how to call the training script with different methods.

In [None]:
# Display command for QLoRA training
qlora_cmd = f"python {project_root}/src/train.py --config {config_path}"
print("QLoRA training command:")
print(qlora_cmd)

# Display command for LoRA training
# dict.copy() is shallow, so nested sections would be shared with sample_config;
# deep-copy instead so each variant can be edited independently.
import copy

lora_config = copy.deepcopy(sample_config)
lora_config["fine_tuning"]["method"] = "lora"
lora_config["model"]["load_in_4bit"] = False
lora_config_path = project_root / "config" / "tutorial_lora_config.yaml"
with open(lora_config_path, "w") as f:
    yaml.dump(lora_config, f, sort_keys=False)
lora_cmd = f"python {project_root}/src/train.py --config {lora_config_path}"
print("\nLoRA training command:")
print(lora_cmd)

# Display command for Spectrum training
spectrum_config = copy.deepcopy(sample_config)
spectrum_config["fine_tuning"]["method"] = "spectrum"
spectrum_config_path = project_root / "config" / "tutorial_spectrum_config.yaml"
with open(spectrum_config_path, "w") as f:
    yaml.dump(spectrum_config, f, sort_keys=False)
spectrum_cmd = f"python {project_root}/src/train.py --config {spectrum_config_path}"
print("\nSpectrum training command:")
print(spectrum_cmd)
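The main practical difference between the LoRA and QLoRA configs is the precision of the frozen base weights. The sketch below gives a rough estimate of the memory those weights occupy, assuming a rounded count of 8 billion parameters and assuming the LoRA run loads the base model in bf16. It ignores activations, adapter weights, optimizer state, and the layers that typically stay unquantized, so treat the numbers as ballpark figures only.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8.0e9  # rounded parameter count for an 8B model (assumption)

for label, bits in [("LoRA (bf16 base weights)", 16), ("QLoRA (4-bit base weights)", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```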

### How to Run Training

To run training for real, execute the commands shown above. For large models, you would typically do this on a GPU machine, either directly from the command line or through a job scheduler.

```bash
# Example: Run QLoRA training
python /path/to/src/train.py --config /path/to/config.yaml
```

For distributed training, use the `distributed_setup.py` script to generate a DeepSpeed configuration and a launcher script:

```bash
# Generate DeepSpeed configuration
python /path/to/scripts/distributed_setup.py deepspeed --zero_stage 2 --offload_optimizer --bf16

# Generate launcher script
python /path/to/scripts/distributed_setup.py launcher --config /path/to/config.yaml --num_gpus_per_node 4 --use_deepspeed

# Run the launcher script
bash /path/to/scripts/run_training.sh
```

## 5. Exploring Spectrum Layer Selection

Let's demonstrate how the Spectrum method analyzes a model to select which layers to fine-tune. We'll load a small model for demonstration purposes.

In [None]:
# Load a small model for demonstration (adjust according to available resources)
try:
    # Use a smaller model for demonstration
    tiny_model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
    model = AutoModelForCausalLM.from_pretrained(
        tiny_model_name,
        torch_dtype=torch.float16,
        device_map="auto"
    )
    
    # Use Spectrum to analyze the model
    analyzer = SpectrumAnalyzer(model)
    snr_values = analyzer.analyze_layer_snr()
    
    # Print SNR report
    print(analyzer.generate_layer_snr_report())
    
    # Get optimally selected layers with threshold 0.5
    trainable_layers = analyzer.get_trainable_layers_by_snr(threshold=0.5)
    print(f"\nLayers selected for fine-tuning with threshold 0.5: {trainable_layers}")
    
    # Try a different threshold
    trainable_layers_075 = analyzer.get_trainable_layers_by_snr(threshold=0.75)
    print(f"Layers selected for fine-tuning with threshold 0.75: {trainable_layers_075}")
except Exception as e:
    # Covers failures in both model loading and the analysis itself
    print(f"Could not run the Spectrum demonstration: {e}")
    print("Spectrum analysis requires loading a model, which may not be possible in this environment.")
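The thresholding step can be illustrated without loading a model. The sketch below assumes, purely for illustration, that per-layer SNR scores are min-max normalised to the range 0 to 1 and that layers at or above the threshold are kept. The actual `SpectrumAnalyzer` may define its threshold differently, `select_layers_by_snr` is a hypothetical helper, and the scores are made-up toy values.

```python
def select_layers_by_snr(snr_by_layer: dict[str, float], threshold: float) -> list[str]:
    """Keep layers whose min-max normalised SNR is at or above the threshold."""
    lo, hi = min(snr_by_layer.values()), max(snr_by_layer.values())
    span = hi - lo
    selected = []
    for name, snr in snr_by_layer.items():
        # If all scores are identical, treat every layer as equally informative.
        normalised = 1.0 if span == 0 else (snr - lo) / span
        if normalised >= threshold:
            selected.append(name)
    return selected


# Made-up scores purely for illustration; real values come from the analyzer.
toy_snr = {
    "layers.0.self_attn.q_proj": 2.0,
    "layers.1.self_attn.q_proj": 6.0,
    "layers.2.self_attn.q_proj": 10.0,
    "layers.3.self_attn.q_proj": 4.0,
}
print(select_layers_by_snr(toy_snr, threshold=0.5))
print(select_layers_by_snr(toy_snr, threshold=0.75))
```

With these toy scores, raising the threshold from 0.5 to 0.75 shrinks the selection from two layers to one, which mirrors the trade-off between adaptation capacity and training cost.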

## 6. Evaluating a fine-tuned model

After fine-tuning, you would evaluate the model to measure its performance. Here's how to use the evaluation module.

In [None]:
# This is a code sample for evaluating a fine-tuned model
eval_code = """
import json

import torch
from src.utils.evaluation import evaluate_model
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
from datasets import load_dataset

# Load the base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    device_map="auto",
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Load the adapter for LoRA/QLoRA models
adapter_path = "path/to/adapter"
model = PeftModel.from_pretrained(base_model, adapter_path)

# Load evaluation dataset
eval_dataset = load_dataset("json", data_files="path/to/eval_data.json")["train"]

# Run evaluation
results = evaluate_model(
    model=model,
    tokenizer=tokenizer,
    eval_dataset=eval_dataset,
    benchmarks=["lm-evaluation-harness", "domain-specific-eval"],
    batch_size=8,
    output_dir="path/to/eval_results"
)

print("Evaluation results:")
print(json.dumps(results, indent=2))
"""

print("Sample code for evaluating a fine-tuned model:")
print(eval_code)
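One of the most common evaluation metrics for language models is perplexity, defined as the exponential of the mean negative log-likelihood per token. The sketch below computes it from a list of per-token log-probabilities. It is a standalone illustration of the formula and not the implementation of `calculate_perplexity` in `src.utils.evaluation`, whose signature is not shown in this tutorial.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 1/4 to every token has perplexity 4.
uniform = [math.log(0.25)] * 10
print(f"{perplexity(uniform):.2f}")
```

Lower is better: a perplexity of 4 means the model is, on average, as uncertain as if it were choosing uniformly among four tokens.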

## 7. Inference with a fine-tuned model

Finally, let's see how to run inference with a fine-tuned model using our inference script.

In [None]:
# Command for interactive chat with a fine-tuned model
chat_cmd = f"python {project_root}/scripts/inference.py --model_path path/to/model --adapter_path path/to/adapter --chat --load_in_4bit"
print("Interactive chat command:")
print(chat_cmd)

# Command for batch inference
batch_cmd = f"python {project_root}/scripts/inference.py --model_path path/to/model --adapter_path path/to/adapter --batch --input_file path/to/inputs.json --output_file path/to/outputs.json --load_in_4bit"
print("\nBatch inference command:")
print(batch_cmd)

## Conclusion

In this tutorial, we've covered the essential components of our LLM fine-tuning framework:

1. Setting up the environment
2. Preparing and processing the dataset
3. Configuring the model and training parameters
4. Using different fine-tuning methods (LoRA, QLoRA, Spectrum)
5. Exploring Spectrum layer selection
6. Evaluating the fine-tuned model
7. Running inference with the fine-tuned model

This framework provides a flexible and efficient way to fine-tune large language models on custom data. For more details, refer to the documentation in the `docs/` directory and explore the source code in the `src/` directory.