# Fine-Tuning LLaMA Tutorial with Explanations


In this practical seminar, we walk through the full pipeline for fine-tuning a LLaMA model: generating a synthetic dataset, loading a 4-bit quantized model, applying LoRA, training, and evaluating.
Each section includes theoretical context and links to documentation.
**Note**: Make sure all required packages listed below are installed.


In [None]:
!pip install -q transformers accelerate bitsandbytes datasets peft matplotlib


**Theory**: We begin by installing the essential packages: `transformers` for the model, `datasets` for handling our dataset, 
`accelerate` for device placement and distributed execution, and `bitsandbytes` for low-precision (4-bit and 8-bit) quantization, which reduces GPU memory use.
`peft` provides the LoRA adapters used later, and `matplotlib` is used for plotting.
Refer to [Transformers documentation](https://huggingface.co/docs/transformers/index) for more details.


In [None]:
!nvidia-smi

### Checking GPU Availability
The `nvidia-smi` command reports which GPU is available, its current status, and its memory usage.

In [None]:

from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from datasets import Dataset
import torch
import random
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
logger.info("Libraries imported successfully.")



### Importing Required Libraries

- `AutoModelForCausalLM` and `AutoTokenizer` from Hugging Face's `transformers` for loading a pre-trained language model.
- `Dataset` from `datasets` for creating and managing data efficiently.
- `Trainer` and `TrainingArguments` for configuring and training the model.
- `torch` for handling tensor operations.
- `random` for drawing the operands of the synthetic arithmetic problems.
- `logging` for enabling info-level logging to track progress.

For more on each of these, refer to the [datasets library](https://huggingface.co/docs/datasets/) and [transformers library](https://huggingface.co/docs/transformers/) documentation.


In [None]:

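def gen_dataset(size, digits=(2, 18), operation='addition'):
    """Yield `size` arithmetic examples with operands in [10**digits[0], 10**digits[1]]."""
    if operation not in ('addition', 'multiplication'):
        raise ValueError(f"Unsupported operation: {operation!r}")
    logger.info("Generating dataset with varied difficulty.")
    for _ in range(size):
        # randint is inclusive on both ends, so operands have
        # between digits[0] + 1 and digits[1] + 1 decimal digits.
        a = random.randint(10**digits[0], 10**digits[1])
        b = random.randint(10**digits[0], 10**digits[1])
        if operation == 'addition':
            c = a + b
            prompt = f'Calculate the sum of {a} and {b}: {c}'
        else:  # multiplication
            c = a * b
            prompt = f'Calculate the product of {a} and {b}: {c}'
        # `prompt` ends with the answer, so it doubles as the full training text.
        yield {'prompt': prompt, 'response': str(c)}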



### Dataset Generation Function

This generator yields synthetic arithmetic problems for fine-tuning. Each example has a `prompt` (the question followed by its answer, which serves as the training text) 
and a `response` (the answer alone, as a string).
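
To make the operand range concrete, here is a minimal stand-alone sketch of the same sampling logic. The helper names `digit_bounds` and `sample_addition` and the fixed seed are hypothetical and only serve this illustration; they are not part of the tutorial's pipeline.

```python
import random


def digit_bounds(digits):
    """Shortest and longest operand length (in decimal digits) that
    random.randint(10**digits[0], 10**digits[1]) can return."""
    return len(str(10 ** digits[0])), len(str(10 ** digits[1]))


def sample_addition(rng, digits=(2, 18)):
    """Build one example in the same format as gen_dataset (addition only)."""
    a = rng.randint(10 ** digits[0], 10 ** digits[1])
    b = rng.randint(10 ** digits[0], 10 ** digits[1])
    c = a + b
    return {"prompt": f"Calculate the sum of {a} and {b}: {c}", "response": str(c)}


rng = random.Random(0)  # hypothetical fixed seed, for reproducibility only
example = sample_addition(rng)
print(digit_bounds((2, 18)))                            # (3, 19)
print(example["prompt"].endswith(example["response"]))  # True
```

With `digits=(2, 18)`, operands therefore span 3 to 19 digits, and every `prompt` string already ends with its own answer.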


In [None]:

train_dataset = Dataset.from_generator(gen_dataset, gen_kwargs={"size": 400, "digits": (2, 18), "operation": "addition"})
test_dataset = Dataset.from_generator(gen_dataset, gen_kwargs={"size": 40, "digits": (8, 10), "operation": "multiplication"})
logger.info(f"Generated train dataset size: {len(train_dataset)}, test dataset size: {len(test_dataset)}")
train_dataset



### Creating Train and Test Datasets

Using `from_generator`, we build a training set of 400 addition problems and a test set of 40 multiplication problems, so evaluation runs on an operation that does not appear in the training data. 
The [`Dataset.from_generator`](https://huggingface.co/docs/datasets/v2.1.0/en/package_reference/main_classes#datasets.Dataset.from_generator) method creates a `Dataset` from a generator function.


In [None]:

from huggingface_hub import login
login()



### Logging into Hugging Face Hub

This cell logs in to the Hugging Face Hub, which is needed to download gated pretrained models such as LLaMA and to upload fine-tuned models. 
See [huggingface_hub login documentation](https://huggingface.co/docs/huggingface_hub/quick_start#step-3-log-in-to-the-hugging-face-hub).


In [None]:

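from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store the model weights in 4-bit precision
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_quant_type="nf4",              # use the NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16   # dtype used for the actual computations
)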



### Configuring BitsAndBytes Quantization

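We configure `bitsandbytes` to load the model weights in 4-bit NF4 precision with double quantization, while computations run in `bfloat16`. This reduces the GPU memory needed to hold the model during fine-tuning. 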
Read more about [BitsAndBytesConfig](https://huggingface.co/docs/transformers/main_classes/configuration#transformers.BitsAndBytesConfig).
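
As a rough back-of-the-envelope illustration of why 4-bit loading helps, the sketch below estimates the memory taken by the weights alone. The function name `weight_memory_gib` and the round parameter count are assumptions made for this example; the estimate ignores quantization constants, layers kept in higher precision, activations, and optimizer state.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


n = 1_000_000_000  # assumed round parameter count, for illustration only
for bits in (16, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(n, bits):.2f} GiB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the weight memory by a factor of four.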


In [None]:

model_name = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="right", add_eos_token=True, add_bos_token=True)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config, device_map="auto")
logger.info(f"Model {model_name} loaded with {model.num_parameters()} parameters.")



### Loading the Pre-trained Model and Tokenizer

We initialize the LLaMA model and tokenizer, passing `add_eos_token` and `add_bos_token` to request end-of-sequence and beginning-of-sequence tokens around each tokenized sequence. We also reuse the EOS token as the padding token.
Setting `device_map="auto"` lets `accelerate` place the model's layers on the available devices automatically. 
For more, see [AutoTokenizer documentation](https://huggingface.co/docs/transformers/main_classes/tokenizer).


In [None]:

def tokenize_data(prompt):
    return tokenizer(prompt['prompt'])

train_dataset = train_dataset.map(tokenize_data)
test_dataset = test_dataset.map(tokenize_data)



### Tokenizing the Dataset

This function converts each prompt into token IDs (`input_ids`) and an `attention_mask`. No padding or truncation is applied at this stage, so sequence lengths vary between examples. 
The `map` function applies the tokenization across all dataset examples. For more, see [map documentation](https://huggingface.co/docs/datasets/process#map).


In [None]:

import matplotlib.pyplot as plt

def plot_data_lengths(tokenized_train_dataset, tokenized_val_dataset):
    lengths = [len(x['input_ids']) for x in tokenized_train_dataset]
    lengths += [len(x['input_ids']) for x in tokenized_val_dataset]
    plt.figure(figsize=(10, 6))
    plt.hist(lengths, bins=20, alpha=0.7, color='blue')
    plt.xlabel('Length of input_ids')
    plt.ylabel('Frequency')
    plt.title('Distribution of Lengths of input_ids')
    plt.show()

plot_data_lengths(train_dataset, test_dataset)



### Plotting Data Lengths

This section provides a histogram of tokenized prompt lengths, visualizing the distribution of input lengths.
This can help in setting max length parameters for padding/truncation during training.
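
One way to turn the histogram into a number is to pick the smallest length that covers a chosen fraction of the examples. The sketch below is only an illustration of that idea: the function `suggest_max_length`, its `coverage` parameter, and the sample lengths are hypothetical and not part of the tutorial's code.

```python
import math


def suggest_max_length(lengths, coverage=0.95):
    """Smallest length such that at least `coverage` of the sequences
    fit without truncation."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(lengths)
    index = math.ceil(coverage * len(ordered)) - 1
    return ordered[index]


# Hypothetical token counts, standing in for len(x['input_ids']).
sample_lengths = [18, 20, 21, 21, 22, 23, 25, 26, 28, 31]
print(suggest_max_length(sample_lengths, coverage=1.0))  # 31 (nothing truncated)
print(suggest_max_length(sample_lengths, coverage=0.5))  # 22 (half the examples fit)
```

A coverage of 1.0 avoids truncation entirely, while a lower value trades a few truncated examples for less padding.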


In [None]:

max_length = 32

def tokenize_data(prompt):
    result = tokenizer(
        prompt['prompt'],
        truncation=True,
        max_length=max_length,
        padding="max_length",
    )
    result["labels"] = result["input_ids"].copy()
    return result

train_dataset = train_dataset.map(tokenize_data)
test_dataset = test_dataset.map(tokenize_data)
logger.info(f"Tokenization complete; every example is padded or truncated to {max_length} tokens.")



### Applying Padding and Label Creation

This code modifies tokenization to truncate or pad every example to `max_length` tokens and copies the input IDs to `labels`. For causal language modeling the labels are the input IDs themselves, 
because the model shifts them internally to predict each next token. See [padding and truncation documentation](https://huggingface.co/docs/transformers/padding_and_truncation).
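
Because the padding token here is the EOS token, a plain copy of `input_ids` means padded positions also contribute to the loss. A common variant, shown below as a sketch rather than as what the cell above does, replaces the labels at padded positions with `-100`, the value PyTorch's cross-entropy loss ignores by default. The helper name `mask_padding_labels` and the token IDs are hypothetical.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_padding_labels(input_ids, attention_mask, ignore_index=IGNORE_INDEX):
    """Copy input_ids to labels, replacing padded positions with ignore_index."""
    return [
        token if keep == 1 else ignore_index
        for token, keep in zip(input_ids, attention_mask)
    ]


# Hypothetical token IDs: 2 plays the role of both EOS and padding.
ids = [1, 42, 43, 2, 2, 2]
mask = [1, 1, 1, 1, 0, 0]
print(mask_padding_labels(ids, mask))  # [1, 42, 43, 2, -100, -100]
```

The real EOS token keeps its label because its attention mask is 1; only the trailing padding is masked.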


In [None]:

from peft import prepare_model_for_kbit_training

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)



### Preparing Model for Efficient Training with K-Bit Precision

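This section enables gradient checkpointing, which reduces memory usage by recomputing activations during the backward pass instead of storing them. `prepare_model_for_kbit_training` then prepares the quantized model for training: it freezes the base model weights and casts the remaining low-precision parameters (such as layer norms) to `float32` for stability. 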
For more, see [gradient checkpointing](https://huggingface.co/docs/transformers/main_classes/accelerate#transformers.Accelerate.gradient_checkpointing_enable).


In [None]:

def print_trainable_parameters(model):
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}")



### Printing Trainable Parameters

This function counts the trainable and total parameters of the model and prints the trainable percentage. It is used below to show how small a fraction of the model is actually updated during fine-tuning.


In [None]:

from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
print_trainable_parameters(model)



### Configuring and Applying LoRA

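LoRA (Low-Rank Adaptation) keeps the base weights frozen and trains small low-rank matrices added to selected layers, which reduces memory and compute requirements. Here the adapters use rank `r=32` and `lora_alpha=64`, and they target the attention projections, the MLP projections, and `lm_head`. 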
For more, check [LoRA documentation](https://huggingface.co/docs/peft/api/lora).
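
To get a feel for the numbers, the sketch below counts the extra parameters LoRA adds to a single linear layer and computes the default scaling factor `lora_alpha / r`. The helper names and the layer size `d = 2048` are assumptions chosen for illustration; they are not read from the model.

```python
def lora_param_count(d_in, d_out, r):
    """Extra trainable parameters LoRA adds to one (d_out x d_in) linear layer:
    matrix A has shape (r, d_in) and matrix B has shape (d_out, r)."""
    return r * (d_in + d_out)


def lora_scaling(lora_alpha, r):
    """Factor applied to the low-rank update B @ A (default LoRA scaling)."""
    return lora_alpha / r


d = 2048  # hypothetical square projection size, for illustration only
full = d * d
extra = lora_param_count(d, d, r=32)
print(f"full layer: {full} params, LoRA adapter: {extra} params")
print(lora_scaling(lora_alpha=64, r=32))  # 2.0
```

With these assumed sizes, the adapter holds `32 * (2048 + 2048) = 131072` parameters, a small fraction of the `2048 * 2048` weights in the frozen layer.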


In [None]:

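training_args = TrainingArguments(
    output_dir="./llama_finetuned",   # where checkpoints are written
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,    # effective batch size of 4 per device
    num_train_epochs=5,
    learning_rate=3e-5,
    fp16=True,                        # mixed-precision training
    logging_steps=50,                 # log the training loss every 50 optimizer steps
    evaluation_strategy="epoch",      # named `eval_strategy` in newer transformers releases
    save_strategy="epoch",            # save a checkpoint after every epoch
    logging_dir='./logs',
)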
logger.info("Training configuration set with advanced parameters.")



### Setting Training Arguments

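Here we define the `TrainingArguments` for fine-tuning. With a per-device batch size of 1 and 4 gradient accumulation steps, the effective batch size is 4 per device. The model trains for 5 epochs at a learning rate of 3e-5 with `fp16` mixed precision, and it is evaluated and checkpointed at the end of every epoch. 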
For more details, refer to [TrainingArguments documentation](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments).
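
The arithmetic behind these settings can be sketched in a few lines. The function `training_schedule` below is a hypothetical helper, and it assumes a single device and that a trailing partial batch still triggers an optimizer step; `Trainer`'s exact rounding may differ when the dataset size is not a multiple of the effective batch size.

```python
import math


def training_schedule(num_examples, per_device_batch_size, grad_accum_steps,
                      num_epochs, num_devices=1):
    """Return (effective batch size, optimizer steps per epoch, total steps)."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * num_epochs


# Values taken from this tutorial: 400 training examples, batch size 1,
# 4 accumulation steps, 5 epochs.
print(training_schedule(400, 1, 4, 5))  # (4, 100, 500)
```

With `logging_steps=50`, those 500 optimizer steps would produce about 10 training-loss log entries.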


In [None]:

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)
logger.info("Trainer initialized with training and evaluation datasets.")
trainer.train()



### Initializing and Training the Model

The `Trainer` class simplifies model training, managing the training loop, evaluation, and logging.
Check the [Trainer documentation](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Trainer) for more information.


In [None]:

eval_results = trainer.evaluate()
logger.info(f"Evaluation Results: {eval_results}")


### Model Evaluation
This cell evaluates the fine-tuned model on the test dataset and logs the resulting metrics, such as the evaluation loss.

In [None]:

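for i in range(5):
    example = test_dataset[i]
    # The stored prompt ends with the answer, so keep only the question part.
    question = example['prompt'].rsplit(':', 1)[0] + ':'
    # Move the inputs to the model's device and pass the attention mask along.
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=32)
    output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    print(f"Input: {question}\nExpected Output: {example['response']}\nModel Output: {output_text}")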



### Testing Model Outputs

This cell generates outputs for the first five test examples and prints them next to the expected responses for comparison.
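
Comparing by eye does not scale, so a simple exact-match check can help. The sketch below assumes the decoded output starts with the question text (decoder-only models return the prompt followed by the continuation) and that the answer is the first integer after it. The helpers `extract_answer` and `exact_match` are hypothetical and not part of the tutorial's code.

```python
import re


def extract_answer(output_text, question):
    """Return the first integer that follows the question, or None.
    If the text does not start with the question, the whole text is searched."""
    if output_text.startswith(question):
        continuation = output_text[len(question):]
    else:
        continuation = output_text
    match = re.search(r"-?\d+", continuation)
    return match.group(0) if match else None


def exact_match(output_text, question, expected):
    """True when the extracted answer equals the expected response string."""
    return extract_answer(output_text, question) == expected


q = "Calculate the product of 12 and 3:"
print(exact_match(q + " 36 and some extra text", q, "36"))  # True
print(exact_match(q + " 35", q, "36"))                      # False
```

Averaging `exact_match` over the test examples would give an accuracy figure to report alongside the evaluation loss.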


In [None]:

sample_inputs = ["Calculate the sum of 10 and 15:", "Calculate the sum of 4 and 6:"]
for input_text in sample_inputs:
    # Move the inputs to the model's device and pass the attention mask along.
    inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=32)
    output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    print(f"Input: {input_text}\nOutput: {output_text}")



### Additional Test Cases

Here, we input new arithmetic prompts to see how the model generalizes to similar tasks beyond the test dataset.
