# Lab 3.1.6 Solution: LLaMA Factory Exploration

**Module:** 3.1 - Large Language Model Fine-Tuning  
**Type:** Solution Notebook

---

This notebook contains solutions to the exercises from the LLaMA Factory exploration task.

## Exercise 1 Solution: Setup and Explore

**Task:** Install LLaMA Factory, launch the web UI, and document 5 interesting features.

In [None]:
# Exercise 1 Solution: Document your findings
# Here are 5 interesting features of LLaMA Factory:

llama_factory_features = [
    {
        "feature": "Multi-Method Training Support",
        "description": "LLaMA Factory supports supervised fine-tuning (SFT), reward modeling, PPO, and DPO, using LoRA, QLoRA, or full fine-tuning, all from the same interface.",
        "benefit": "No need to switch tools when trying different training methods."
    },
    {
        "feature": "Real-Time Training Monitoring",
        "description": "The UI shows live loss curves, GPU memory usage, and training progress.",
        "benefit": "Easy to catch problems early without checking terminal logs."
    },
    {
        "feature": "Built-in Chat Testing",
        "description": "Test trained models directly in the Chat tab without any additional setup.",
        "benefit": "Immediate feedback on model quality after training."
    },
    {
        "feature": "One-Click GGUF Export",
        "description": "Export merged LoRA models directly to GGUF format for Ollama.",
        "benefit": "Streamlines the deployment pipeline significantly."
    },
    {
        "feature": "Dataset Preview",
        "description": "Preview dataset samples before training to verify formatting.",
        "benefit": "Catches dataset issues before wasting time on training."
    }
]

# Print features
print("5 Interesting LLaMA Factory Features")
print("=" * 50)
for i, feature in enumerate(llama_factory_features, 1):
    print(f"\n{i}. {feature['feature']}")
    print(f"   Description: {feature['description']}")
    print(f"   Benefit: {feature['benefit']}")

## Exercise 2 Solution: Train with GUI

**Task:** Create a custom dataset (at least 20 examples), upload it, configure LoRA training, monitor training, and test the result.

In [None]:
import json
from pathlib import Path

# Exercise 2 Solution: Create a custom dataset with 25 examples
# Theme: AI and Machine Learning concepts explained simply

custom_dataset = [
    # Basic concepts
    {
        "instruction": "Explain what artificial intelligence is.",
        "input": "",
        "output": "Artificial intelligence (AI) is the simulation of human intelligence by machines. It involves creating computer systems that can perform tasks typically requiring human intelligence, such as visual perception, speech recognition, decision-making, and language understanding."
    },
    {
        "instruction": "What is the difference between AI and machine learning?",
        "input": "",
        "output": "AI is the broader concept of machines being able to carry out tasks in a smart way. Machine learning is a subset of AI that focuses on giving machines the ability to learn from data without being explicitly programmed. Think of AI as the goal and ML as one way to achieve it."
    },
    {
        "instruction": "Explain deep learning in simple terms.",
        "input": "",
        "output": "Deep learning is a type of machine learning that uses neural networks with many layers (hence 'deep'). These networks learn to recognize patterns in data, similar to how our brains work. It's especially good at tasks like image recognition and natural language processing."
    },
    {
        "instruction": "What is a neural network?",
        "input": "",
        "output": "A neural network is a computing system inspired by the human brain. It consists of interconnected nodes (neurons) organized in layers. Data flows through these layers, with each layer learning to recognize increasingly complex patterns. The network learns by adjusting the connections between neurons based on examples."
    },
    {
        "instruction": "Define supervised learning.",
        "input": "",
        "output": "Supervised learning is when you train a model using labeled data - meaning each example has both the input and the correct answer. The model learns to predict the right answer by comparing its predictions to the known correct answers and adjusting accordingly."
    },
    # Training concepts
    {
        "instruction": "What is a loss function?",
        "input": "",
        "output": "A loss function measures how wrong a model's predictions are compared to the actual values. It produces a single number that represents the error. The goal of training is to minimize this loss. Common examples include Mean Squared Error for regression and Cross-Entropy for classification."
    },
    {
        "instruction": "Explain what backpropagation does.",
        "input": "",
        "output": "Backpropagation is how neural networks learn. After making a prediction and calculating the error (loss), backpropagation figures out how much each weight in the network contributed to that error. It then adjusts the weights to reduce the error. This process repeats thousands of times during training."
    },
    {
        "instruction": "What is gradient descent?",
        "input": "",
        "output": "Gradient descent is an optimization algorithm used to minimize the loss function. Imagine you're blindfolded on a hill and want to reach the lowest point - you'd feel which direction goes downward and take a step that way. Gradient descent does this mathematically, repeatedly moving in the direction that reduces the loss."
    },
    {
        "instruction": "What is the learning rate?",
        "input": "",
        "output": "The learning rate controls how big the steps are during gradient descent. Too large and you might overshoot the optimal solution. Too small and training takes forever. It's typically a small number like 0.001 or 0.0001 and is one of the most important hyperparameters to tune."
    },
    {
        "instruction": "Define overfitting.",
        "input": "",
        "output": "Overfitting happens when a model learns the training data too well, including its noise and peculiarities, rather than the underlying pattern. The model performs great on training data but poorly on new data. It's like memorizing test answers instead of understanding the subject."
    },
    # LLM concepts
    {
        "instruction": "What is a large language model (LLM)?",
        "input": "",
        "output": "A large language model is a type of AI trained on vast amounts of text data. It learns patterns in language and can generate human-like text, answer questions, translate languages, and perform various language tasks. Examples include GPT-4, Claude, and LLaMA."
    },
    {
        "instruction": "Explain what fine-tuning means.",
        "input": "",
        "output": "Fine-tuning is the process of taking a pre-trained model and training it further on a specific dataset or task. Instead of training from scratch, you start with a model that already understands language and adapt it to your needs. This is faster and requires less data than training from scratch."
    },
    {
        "instruction": "What is LoRA and why is it useful?",
        "input": "",
        "output": "LoRA (Low-Rank Adaptation) is a technique for efficiently fine-tuning large models. Instead of updating all the model's parameters, LoRA freezes them and trains small low-rank adapter matrices. This cuts the number of trainable parameters by orders of magnitude and substantially lowers memory use, making it possible to fine-tune large models on consumer hardware."
    },
    {
        "instruction": "Explain QLoRA.",
        "input": "",
        "output": "QLoRA combines quantization with LoRA. The base model is quantized to 4-bit precision (cutting weight memory roughly 4x compared with 16-bit), while the LoRA adapters train in higher precision. This allows fine-tuning of very large models on a single GPU while maintaining quality close to full fine-tuning."
    },
    {
        "instruction": "What is a transformer architecture?",
        "input": "",
        "output": "The transformer is the architecture behind modern LLMs. Its key innovation is the attention mechanism, which allows the model to focus on relevant parts of the input when generating each output. Unlike previous models, transformers process all tokens in parallel, making them faster to train."
    },
    # Practical concepts
    {
        "instruction": "What is tokenization?",
        "input": "",
        "output": "Tokenization is breaking text into smaller pieces called tokens that a model can process. These could be words, subwords, or characters. For example, 'unhappiness' might become ['un', 'happiness'] or ['un', 'happ', 'iness']. The tokenizer maps these tokens to numbers the model can work with."
    },
    {
        "instruction": "Explain what an embedding is.",
        "input": "",
        "output": "An embedding is a way to represent words or tokens as lists of numbers (vectors). Similar words have similar vectors. For example, 'king' and 'queen' would have similar embeddings. These vector representations allow models to understand relationships between words mathematically."
    },
    {
        "instruction": "What is the attention mechanism?",
        "input": "",
        "output": "The attention mechanism lets a model focus on different parts of the input when producing output. When translating 'the cat sat on the mat', the model pays attention to different input words when generating each output word. This allows it to handle long-range dependencies better than previous methods."
    },
    {
        "instruction": "What is inference in machine learning?",
        "input": "",
        "output": "Inference is using a trained model to make predictions on new data. During training, the model learns patterns. During inference, it applies what it learned to generate outputs. For LLMs, inference means generating text in response to prompts."
    },
    {
        "instruction": "Define batch size.",
        "input": "",
        "output": "Batch size is the number of training examples processed together before updating model weights. Larger batches use the hardware more efficiently but need more memory and can generalize slightly worse. Smaller batches give noisier updates, which makes each epoch slower but sometimes improves generalization. A common range is 8-64 examples per batch."
    },
    # DGX Spark specific
    {
        "instruction": "What is unified memory in the context of DGX Spark?",
        "input": "",
        "output": "DGX Spark has 128GB of unified memory shared between the CPU and GPU. This means the GPU can access system memory when needed, allowing you to work with larger models than the GPU memory alone would allow. It's particularly useful for fine-tuning large language models."
    },
    {
        "instruction": "Why should you use NGC containers on DGX Spark?",
        "input": "",
        "output": "DGX Spark uses the ARM64 architecture, which differs from standard x86 computers. Many prebuilt pip packages, including GPU-enabled builds of frameworks like PyTorch, primarily target x86. NGC containers from NVIDIA ship ARM64-optimized versions of AI frameworks, so everything works correctly with the DGX Spark's hardware."
    },
    {
        "instruction": "What is bfloat16 and why use it?",
        "input": "",
        "output": "bfloat16 is a 16-bit floating point format designed for deep learning. It keeps the same exponent range as float32 but uses fewer mantissa bits, so it has less precision. Modern GPUs such as the Blackwell GPU in DGX Spark have dedicated bfloat16 hardware, which typically makes training considerably faster than float32 while maintaining quality."
    },
    {
        "instruction": "What is model quantization?",
        "input": "",
        "output": "Quantization reduces the precision of model weights from float32 (32 bits) to lower precision like int8 (8 bits) or int4 (4 bits). This reduces memory usage and increases inference speed with minimal quality loss. A 7B model that needs 14GB in float16 might only need 4GB in 4-bit."
    },
    {
        "instruction": "Explain DPO (Direct Preference Optimization).",
        "input": "",
        "output": "DPO is a training technique that teaches models to prefer certain responses over others without needing a separate reward model. You provide pairs of responses - one preferred, one rejected - and the model learns to generate more of the preferred style. It's simpler than RLHF but often equally effective."
    }
]

print(f"Created dataset with {len(custom_dataset)} examples")
print("\nSample entry:")
print(json.dumps(custom_dataset[0], indent=2))
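
Before saving, it can help to sanity-check the records against the task requirements (at least 20 examples, each with an instruction and an output). The helper below is a minimal sketch; the function name `validate_alpaca_records` and the specific checks are illustrative assumptions, not part of LLaMA Factory.

```python
from typing import Any

# Keys used by the instruction/input/output records in this notebook.
REQUIRED_KEYS = ("instruction", "input", "output")


def validate_alpaca_records(
    records: list[dict[str, Any]], min_examples: int = 20
) -> list[str]:
    """Return human-readable problems; an empty list means no issues were found."""
    problems: list[str] = []
    if len(records) < min_examples:
        problems.append(
            f"only {len(records)} examples, need at least {min_examples}"
        )

    seen_instructions: set[str] = set()
    for idx, record in enumerate(records):
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            problems.append(f"entry {idx}: missing keys {missing}")
            continue

        instruction = str(record["instruction"]).strip()
        if not instruction:
            problems.append(f"entry {idx}: empty instruction")
        if not str(record["output"]).strip():
            problems.append(f"entry {idx}: empty output")

        # Flag repeated questions, compared case-insensitively.
        normalized = instruction.lower()
        if normalized in seen_instructions:
            problems.append(f"entry {idx}: duplicate instruction")
        seen_instructions.add(normalized)
    return problems


# Small hand-made sample that deliberately contains three problems.
sample = [
    {"instruction": "What is a loss function?", "input": "", "output": "A measure of prediction error."},
    {"instruction": "Define overfitting.", "input": "", "output": ""},
    {"instruction": "What is a loss function?", "input": "", "output": "Duplicate question."},
    {"instruction": "Missing output key", "input": ""},
]
sample_problems = validate_alpaca_records(sample, min_examples=3)
for problem in sample_problems:
    print(problem)
```

In the notebook, the same check could be run as `validate_alpaca_records(custom_dataset)` before writing the file.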

In [None]:
# Save the dataset for LLaMA Factory
output_dir = Path("../data")
output_dir.mkdir(parents=True, exist_ok=True)  # also create missing parent directories

dataset_path = output_dir / "ai_concepts_dataset.json"
with open(dataset_path, "w", encoding="utf-8") as f:
    json.dump(custom_dataset, f, indent=2)

print(f"Dataset saved to: {dataset_path}")

# Create the dataset_info.json entry
dataset_info_entry = {
    "ai_concepts": {
        "file_name": "ai_concepts_dataset.json",
        "columns": {
            "prompt": "instruction",
            "query": "input",
            "response": "output"
        }
    }
}

print("\nAdd this to LLaMA Factory's data/dataset_info.json")
print("(file_name is resolved relative to that data/ directory, so copy the dataset file there):")
print(json.dumps(dataset_info_entry, indent=2))
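
Rather than editing `dataset_info.json` by hand, the entry can be merged programmatically. This is a sketch under assumptions: `register_dataset` is a hypothetical helper, and the demo writes to a temporary directory instead of a real LLaMA Factory checkout, whose location depends on your installation.

```python
import json
import tempfile
from pathlib import Path


def register_dataset(info_path: Path, name: str, entry: dict) -> dict:
    """Merge one dataset entry into dataset_info.json and return the full mapping."""
    info = {}
    if info_path.exists():
        info = json.loads(info_path.read_text(encoding="utf-8"))
    info[name] = entry  # overwrites an existing entry with the same name
    info_path.write_text(json.dumps(info, indent=2) + "\n", encoding="utf-8")
    return info


# Demo against a throwaway file that already holds one (made-up) dataset entry.
with tempfile.TemporaryDirectory() as tmp:
    demo_info_path = Path(tmp) / "dataset_info.json"
    demo_info_path.write_text(
        json.dumps({"existing_dataset": {"file_name": "existing.json"}}),
        encoding="utf-8",
    )
    updated = register_dataset(
        demo_info_path,
        "ai_concepts",
        {
            "file_name": "ai_concepts_dataset.json",
            "columns": {
                "prompt": "instruction",
                "query": "input",
                "response": "output",
            },
        },
    )
    reloaded = json.loads(demo_info_path.read_text(encoding="utf-8"))

print(sorted(updated))
```

For a real run, `info_path` would point at the `data/dataset_info.json` file inside your LLaMA Factory directory.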

### Training Configuration Used

Here's the configuration that works well for this dataset on DGX Spark:

| Setting | Value | Reason |
|---------|-------|--------|
| Model | Llama-3.1-8B-Instruct | Good balance of quality and speed |
| Finetuning Type | LoRA | Efficient, preserves base knowledge |
| Quantization | 4-bit | Shrinks the base model's memory footprint (QLoRA-style) |
| LoRA Rank | 16 | Enough capacity for a small dataset |
| LoRA Alpha | 32 | Common choice of 2x the rank |
| Learning Rate | 2e-4 | Common starting point for LoRA |
| Epochs | 3 | Small dataset; more epochs risk overfitting |
| Batch Size | 4 per device | Conservative; DGX Spark has headroom for larger batches |
| Max Length | 512 | Our responses are short |

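
To get a feel for what rank 16 means in practice, the sketch below estimates the number of trainable LoRA parameters when adapting the four attention projections. A LoRA adapter on a linear layer with `in_features` inputs and `out_features` outputs adds `rank * (in_features + out_features)` parameters. The layer shapes and layer count are assumptions based on the publicly documented Llama-3.1-8B architecture (hidden size 4096, 8 key/value heads of dimension 128, 32 layers); check the model's `config.json` for the authoritative values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearShape:
    in_features: int
    out_features: int


def lora_param_count(shape: LinearShape, rank: int) -> int:
    """Parameters in the two low-rank matrices A (rank x in) and B (out x rank)."""
    return rank * (shape.in_features + shape.out_features)


# ASSUMED shapes for Llama-3.1-8B attention projections (grouped-query attention).
ASSUMED_SHAPES = {
    "q_proj": LinearShape(4096, 4096),
    "k_proj": LinearShape(4096, 1024),
    "v_proj": LinearShape(4096, 1024),
    "o_proj": LinearShape(4096, 4096),
}
ASSUMED_NUM_LAYERS = 32

rank, alpha = 16, 32  # values from the table above
per_layer = sum(lora_param_count(shape, rank) for shape in ASSUMED_SHAPES.values())
total_lora_params = per_layer * ASSUMED_NUM_LAYERS
scaling = alpha / rank  # factor applied to the adapter output

print(f"Trainable LoRA parameters: {total_lora_params:,}")
print(f"LoRA scaling factor (alpha / rank): {scaling}")
```

Under these assumptions the adapters total roughly 13.6 million parameters, a small fraction of a percent of the 8B base model.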
In [None]:
# Training configuration as YAML (for CLI usage)
training_config = """
# ai_concepts_training.yaml
# Training config for AI concepts dataset

# Model
model_name_or_path: meta-llama/Llama-3.1-8B-Instruct
trust_remote_code: true

# Training method
stage: sft
finetuning_type: lora

# Quantization
quantization_bit: 4

# Dataset
dataset: ai_concepts
template: llama3
cutoff_len: 512

# LoRA settings
lora_rank: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target: q_proj,k_proj,v_proj,o_proj

# Training
output_dir: ./output/ai-concepts-lora
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
learning_rate: 2e-4
num_train_epochs: 3
lr_scheduler_type: cosine
warmup_ratio: 0.1

# Optimization
bf16: true
gradient_checkpointing: true

# Logging
logging_steps: 5
save_steps: 50
"""

print("Training Configuration:")
print(training_config)
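
With only 25 examples, it is worth estimating how many optimizer steps this configuration actually produces, since `logging_steps` and `save_steps` are expressed in steps. The sketch below is a rough estimate for a single device; `estimate_optimizer_steps` is a hypothetical helper, and the exact count depends on how the trainer version handles the final partial gradient-accumulation window, so it reports a lower and an upper bound.

```python
import math


def estimate_optimizer_steps(
    num_examples: int,
    per_device_batch_size: int,
    grad_accum_steps: int,
    num_epochs: int,
    num_devices: int = 1,
) -> dict:
    """Rough bounds on total optimizer steps for a small fine-tuning run."""
    micro_batches = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # Lower bound: a trailing partial accumulation window is dropped.
    steps_low = max(micro_batches // grad_accum_steps, 1) * num_epochs
    # Upper bound: a trailing partial accumulation window still triggers an update.
    steps_high = math.ceil(micro_batches / grad_accum_steps) * num_epochs
    return {
        "effective_batch_size": per_device_batch_size * grad_accum_steps * num_devices,
        "micro_batches_per_epoch": micro_batches,
        "total_steps_low": steps_low,
        "total_steps_high": steps_high,
    }


# Values taken from the YAML config above and the 25-example dataset.
estimate = estimate_optimizer_steps(
    num_examples=25,
    per_device_batch_size=4,
    grad_accum_steps=4,
    num_epochs=3,
)
for key, value in estimate.items():
    print(f"{key}: {value}")
```

Either bound is a handful of steps, so with `logging_steps: 5` the loss is logged at most once and `save_steps: 50` never triggers an intermediate checkpoint. For a dataset this small, lower values for both settings may give a more useful loss curve.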

### Test Results

After training, test the model in the Chat tab with these prompts:

In [None]:
# Test prompts and expected behavior
test_cases = [
    {
        "prompt": "What is machine learning?",
        "expected_behavior": "Should give a clear, simple explanation similar to training data style"
    },
    {
        "prompt": "Explain reinforcement learning.",
        "expected_behavior": "Should generalize from training examples to explain a related but not-trained concept"
    },
    {
        "prompt": "What are the benefits of using LoRA for fine-tuning?",
        "expected_behavior": "Should combine knowledge from LoRA and QLoRA examples"
    },
    {
        "prompt": "Why would I use DGX Spark for AI work?",
        "expected_behavior": "Should reference unified memory and NGC containers from training"
    }
]

print("Test Cases for Evaluating Fine-Tuned Model")
print("=" * 50)
for i, test in enumerate(test_cases, 1):
    print(f"\nTest {i}:")
    print(f"  Prompt: {test['prompt']}")
    print(f"  Expected: {test['expected_behavior']}")
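
The expected behaviors above are judged by reading the responses, but a crude keyword check can make the comparison repeatable. The sketch below is an assumption-laden illustration: `keyword_coverage` is a hypothetical helper, the keyword list is our own choice, and the response string is a hand-written placeholder, not real model output.

```python
def keyword_coverage(response: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords found in the response (case-insensitive)."""
    if not expected_keywords:
        return 1.0
    text = response.lower()
    hits = sum(1 for keyword in expected_keywords if keyword.lower() in text)
    return hits / len(expected_keywords)


# Placeholder response written by hand for illustration only.
placeholder_response = (
    "DGX Spark offers 128GB of unified memory, so larger models fit on one machine."
)
# Keywords chosen to match the 'unified memory and NGC containers' expectation.
coverage = keyword_coverage(placeholder_response, ["unified memory", "NGC"])
print(f"Keyword coverage: {coverage:.2f}")
```

A check like this only confirms that certain terms appear; it does not measure correctness or style, so it complements rather than replaces manual review in the Chat tab.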

## Summary

In this solution, we:

1. **Explored LLaMA Factory** and documented 5 key features:
   - Multi-method training support
   - Real-time training monitoring
   - Built-in chat testing
   - One-click GGUF export
   - Dataset preview

2. **Created a custom dataset** with 25 examples covering:
   - Basic AI/ML concepts
   - Training concepts
   - LLM concepts
   - Practical concepts (tokenization, embeddings, attention, inference, batch size)
   - DGX Spark specifics

3. **Configured training** with appropriate settings for DGX Spark

4. **Defined test cases** for evaluating the fine-tuned model

In [None]:
# Cleanup
import gc
gc.collect()

print("Solution notebook complete!")