## **5) Instruction finetuning (part 2; finetuning)**

* In this notebook, we get to the actual finetuning part
* But first, let's briefly introduce a technique, called LoRA, that makes the finetuning more efficient
* It's not required to use LoRA, but it can result in noticeable memory savings while still resulting in good modeling performance


**5.1 Introduction to LoRA**

**Low-Rank Adaptation (LoRA)** is an efficient finetuning technique for large pretrained models.  
Instead of updating all model parameters, LoRA introduces two small low-rank matrices that approximate the weight updates.  
This drastically reduces the number of trainable parameters, making finetuning faster and more memory-efficient.

In standard finetuning, the entire weight matrix \( W \) of a model layer is updated.  
In LoRA, the update ΔW is represented as the product of two smaller matrices \( A \) and \( B \):

$$
\Delta W = B A
$$

Thus, the adapted weight becomes:

$$
W' = W + B A
$$

This setup allows LoRA to keep the original pretrained weights frozen and apply the low-rank updates dynamically during inference or training.

In practice, this means we can efficiently customize large models for specific tasks without retraining or altering the original parameters — ideal for adapting foundation models to new datasets.

After setting up the dataset and loading the model, we’ll implement LoRA in code to visualize and apply this concept.

**5.2 Creating training and test sets**

* There's one more thing before we can start finetuning: creating the training and test subsets

* We will use 85% of the data for training and the remaining 15% for testing



In [1]:
import json


file_path = "instruction-data.json"

with open(file_path, "r") as file:
    data = json.load(file)
print("Number of entries:", len(data))

Number of entries: 1100


In [2]:
train_portion = int(len(data) * 0.85)  # 85% for training
test_portion = int(len(data) * 0.15)    # 15% for testing

train_data = data[:train_portion]
test_data = data[train_portion:]

In [3]:
print("Training set length:", len(train_data))
print("Test set length:", len(test_data))

Training set length: 935
Test set length: 165


In [4]:
with open("train.json", "w") as json_file:
    json.dump(train_data, json_file, indent=4)
    
with open("test.json", "w") as json_file:
    json.dump(test_data, json_file, indent=4)

**5.3 Instruction finetuning**

* Using LitGPT, we can finetune the model via litgpt finetune model_dir

* However, here, we will use LoRA finetuning litgpt finetune_lora model_dir since it will be quicker and less resource intensive



In [5]:
!litgpt download microsoft/phi-1_5

Setting HF_HUB_ENABLE_HF_TRANSFER=1
Converting checkpoint files to LitGPT format.
{'checkpoint_dir': WindowsPath('checkpoints/microsoft/phi-1_5'),
 'debug_mode': False,
 'dtype': None,
 'model_name': None}
Saving converted checkpoint to checkpoints\microsoft\phi-1_5



Initializing  0%|          | 00:00<?, ?it/s
Loading weights: model.safetensors:   0%|          | 00:00<?, ?it/s
Loading weights: model.safetensors:   0%|          | 00:00<00:56,  1.76it/s
Loading weights: model.safetensors:   1%|          | 00:01<03:38,  2.20s/it
Loading weights: model.safetensors:   1%|          | 00:02<04:35,  2.78s/it
Loading weights: model.safetensors:   3%|▎         | 00:02<01:01,  1.59it/s
Loading weights: model.safetensors:   3%|▎         | 00:02<00:49,  1.94it/s
Loading weights: model.safetensors:   5%|▍         | 00:02<00:27,  3.42it/s
Loading weights: model.safetensors:   6%|▌         | 00:02<00:24,  3.87it/s
Loading weights: model.safetensors:   7%|▋         | 00:02<00:19,  4.85it/s
Loading weights: model.safetensors:   8%|▊         | 00:03<00:18,  5.07it/s
Loading weights: model.safetensors:  10%|▉         | 00:03<00:15,  5.89it/s
Loading weights: model.safetensors:  11%|█         | 00:03<00:14,  6.04it/s
Loading weights: model.safetensors:  12%|█▏        

In [12]:
!litgpt finetune_lora microsoft/phi-1_5 \
--data JSON \
--data.val_split_fraction 0.1 \
--data.json_path train.json \
--train.epochs 3 \
--train.log_interval 100

{'access_token': None,
 'checkpoint_dir': WindowsPath('checkpoints/microsoft/phi-1_5'),
 'data': JSON(json_path=WindowsPath('train.json'),
              mask_prompt=False,
              val_split_fraction=0.1,
              prompt_style=<litgpt.prompts.Alpaca object at 0x000001AF494E5870>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 1,
 'eval': EvalArgs(interval=100,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=True,
                  evaluate_example='first'),
 'log': LogArgs(project=None, run=None, group=None),
 'logger_name': 'csv',
 'lora_alpha': 16,
 'lora_dropout': 0.05,
 'lora_head': False,
 'lora_key': False,
 'lora_mlp': False,
 'lora_projection': False,
 'lora_query': True,
 'lora_r': 8,
 'lora_value': True,
 'num_nodes': 1,
 'optimizer': 'AdamW',
 'out_dir': WindowsPath('out/finetune/lora'),
 'precision': None,
 'quant

Using bfloat16 Automatic Mixed Precision (AMP)
Seed set to 1337


## **Exercise 1: Generate and save the test set model responses of the base model**

* In this excercise, we are collecting the model responses on the test dataset so that we can evaluate them later

* Starting with the original model before finetuning, load the model using the LitGPT Python API (LLM.load ...)

* Then use the LLM.generate function to generate the responses for the test data

* The following utility function will help you to format the test set entries as input text for the LLM


In [5]:
def format_input(entry):
    instruction_text = (
        f"Below is an instruction that describes a task. "
        f"Write a response that appropriately completes the request."
        f"\n\n### Instruction:\n{entry['instruction']}"
    )

    input_text = f"\n\n### Input:\n{entry['input']}" if entry["input"] else ""

    return instruction_text + input_text

print(format_input(test_data[0]))

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Rewrite the sentence using a simile.

### Input:
The car is very fast.


In [6]:
from litgpt import LLM

llm = LLM.load("microsoft/phi-1_5")

In [7]:
from tqdm import tqdm

for i in tqdm(range(len(test_data))):
    response = llm.generate(test_data[i])
    test_data[i]["base_model"] = response

100%|██████████| 165/165 [1:32:28<00:00, 33.62s/it]


- Using this utility function, generate and save all the **test set responses** produced by the model and add them to the `test_set`.

- For example, if a `test_data[0]` entry looks like this before:

```python
{
  "instruction": "Rewrite the sentence using a simile.",
  "input": "The car is very fast.",
  "output": "The car is as fast as lightning."
}
````

* Modify the test data entry so that it includes the **model response** as an additional field (e.g., `"base_model"`):

```python
{
  "instruction": "Rewrite the sentence using a simile.",
  "input": "The car is very fast.",
  "output": "The car is as fast as lightning.",
  "base_model": "The car is as fast as a cheetah sprinting across the savannah."
}
```

* Repeat this process for **all test set entries**, and then save the modified `test_data` dictionary as:

```bash
test_base_model.json
```


In [8]:
test_data[1]

{'instruction': 'What type of cloud is typically associated with thunderstorms?',
 'input': '',
 'output': 'The type of cloud typically associated with thunderstorms is cumulonimbus.',
 'base_model': '\ndict1 = {\'name\': \'Python for Beginners\', \'instruction\': \'Why is Python often referred to as the \'Go-to\' programming language?\', \'input\': \'\', \'output\': "Python is often referred to'}

In [9]:
with open("test_base_model.json", "w") as outfile:
    json.dump(test_data, outfile, indent=4)

print("Base model responses saved to test_base_model.json")

Base model responses saved to test_base_model.json
