# Code generation application: fine-tuning LLMs using QLoRA

## Fine-tune phi-1.5 on Google Colab

This Jupyter notebook fine-tunes phi-1.5 on the evolved CodeAlpaca dataset (`theblackcat102/evol-codealpaca-v1`) as a worked example of fine-tuning language models with QLoRA.

## Setup

Run the cells below to set up the environment and install the required libraries. For this experiment we need `accelerate`, `peft`, `transformers`, `datasets`, `scipy`, and `trl`, the last of which provides [`SFTTrainer`](https://huggingface.co/docs/trl/main/en/sft_trainer). We use `bitsandbytes` to [quantize the base model to 4-bit](https://huggingface.co/blog/4bit-transformers-bitsandbytes). We also install `einops`; it was mainly needed for loading Falcon, so I will remove it in a later version.
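
After the install cell has run, a quick check of which versions were actually installed can save debugging time later. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the `REQUIRED` list are illustrative and not part of the original notebook.

```python
from importlib import metadata

# Packages named in the setup paragraph (list assembled here for illustration).
REQUIRED = ["accelerate", "peft", "transformers", "datasets",
            "scipy", "trl", "bitsandbytes", "einops"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` points at a failed `pip install` line.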

In [None]:
!nvidia-smi

Sun Oct 15 15:35:26 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   44C    P8     9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
import torch
print(f"Is CUDA supported by this system? {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")

Is CUDA supported by this system? True
CUDA version: 11.8


In [None]:
!pip install transformers==4.30
!pip install -q -U trl accelerate sentencepiece git+https://github.com/huggingface/peft.git
!pip install -q -U datasets bitsandbytes einops scipy wandb

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


## Dataset



In [None]:
from datasets import load_dataset

dataset_name = 'theblackcat102/evol-codealpaca-v1'
dataset = load_dataset(dataset_name, split="train")

In [None]:
dataset

Dataset({
    features: ['instruction', 'output'],
    num_rows: 111272
})

In [None]:
dataset[0]

{'instruction': "Please amend the subsequent Python script so that it includes a 'while' loop rather than the existing 'for' loop, which iterates through the items of an integer list.\n\nThe script currently has a bug where it attempts to print an object that is outside the bounds of the list. Fix this error and modify the script to use 'while' instead of 'for' loop. Ensure your script correctly handles empty lists. \n\n```python\n  # Establish an integer list\n  arr = [1, 2, 3, 4]\n\n  # Determine the length of the list\n  n = len(arr)\n\n  # Traverse the list and output each individual element\n  for i in range(n+1):\n      print(arr[i])\n```",
 'output': '```python\n# Establish an integer list\narr = [1, 2, 3, 4]\n\n# Determine the length of the list\nn = len(arr)\n\n# Initialize index at 0\ni = 0\n\n# Traverse the list and output each individual element\nwhile i < n:\n    print(arr[i])\n    i += 1\n```\nIn the given code, it tries to access `arr[n]` which is out of bounds as python

In [None]:
len(dataset)

111272

In [None]:
type(dataset)

datasets.arrow_dataset.Dataset

In [None]:
# Select only the first 100 records to minimise training time
dataset_sub = dataset.select(range(100))
print(len(dataset_sub))
dataset_sub[0]

100


{'instruction': "Please amend the subsequent Python script so that it includes a 'while' loop rather than the existing 'for' loop, which iterates through the items of an integer list.\n\nThe script currently has a bug where it attempts to print an object that is outside the bounds of the list. Fix this error and modify the script to use 'while' instead of 'for' loop. Ensure your script correctly handles empty lists. \n\n```python\n  # Establish an integer list\n  arr = [1, 2, 3, 4]\n\n  # Determine the length of the list\n  n = len(arr)\n\n  # Traverse the list and output each individual element\n  for i in range(n+1):\n      print(arr[i])\n```",
 'output': '```python\n# Establish an integer list\narr = [1, 2, 3, 4]\n\n# Determine the length of the list\nn = len(arr)\n\n# Initialize index at 0\ni = 0\n\n# Traverse the list and output each individual element\nwhile i < n:\n    print(arr[i])\n    i += 1\n```\nIn the given code, it tries to access `arr[n]` which is out of bounds as python

## Loading the model

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "microsoft/phi-1_5"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True
)
model.config.use_cache = False
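
To get a feel for why 4-bit loading matters on a 15 GiB T4, the weight memory can be estimated with simple arithmetic. This is a rough sketch only: the parameter count of about 1.4 billion is an approximate figure assumed here, and the estimate ignores quantization constants, layers that stay in higher precision, activations, and optimizer state.

```python
def weight_memory_bytes(num_params, bits_per_param):
    """Bytes needed to store `num_params` weights at the given bit width."""
    return num_params * bits_per_param / 8


# Assumption: phi-1.5 has roughly 1.4e9 parameters (approximate, not read from the model).
ASSUMED_PARAMS = 1_400_000_000

for label, bits in [("fp32", 32), ("fp16", 16), ("4-bit", 4)]:
    gib = weight_memory_bytes(ASSUMED_PARAMS, bits) / 2**30
    print(f"{label:>5}: {gib:.2f} GiB")
```

The real footprint will be somewhat higher than the 4-bit figure, but the ratio between the rows is what makes QLoRA practical on a single small GPU.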

Let's also load the tokenizer:

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

In [None]:
from peft import LoraConfig, get_peft_model

lora_alpha = 16
lora_dropout = 0.1
lora_r = 64

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    target_modules=["fc1", "fc2","Wqkv", "out_proj"],
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM"
)
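
To see what this configuration costs in trainable parameters, note that LoRA adds two matrices per targeted linear layer: `A` of shape `r x in_features` and `B` of shape `out_features x r`. The sketch below counts them using the layer shapes that `print(model)` reports later in this notebook. The number of transformer blocks (24) is an assumption, since the printed model is truncated; the helper name `lora_params` is illustrative.

```python
def lora_params(in_features, out_features, r):
    """Trainable parameters LoRA adds to one linear layer: A (r x in) plus B (out x r)."""
    return r * (in_features + out_features)


LORA_R = 64
LORA_ALPHA = 16

# (in_features, out_features) for each target module, taken from the printed model.
TARGET_SHAPES = {
    "Wqkv": (2048, 6144),
    "out_proj": (2048, 2048),
    "fc1": (2048, 8192),
    "fc2": (8192, 2048),
}

# Assumption: 24 blocks; the notebook output is cut off before the count is visible.
ASSUMED_NUM_BLOCKS = 24

per_block = sum(lora_params(i, o, LORA_R) for i, o in TARGET_SHAPES.values())
total = per_block * ASSUMED_NUM_BLOCKS
scaling = LORA_ALPHA / LORA_R  # factor applied to the low-rank update

print(f"LoRA parameters per block: {per_block:,}")
print(f"LoRA parameters in total (assuming {ASSUMED_NUM_BLOCKS} blocks): {total:,}")
print(f"Update scaling alpha / r: {scaling}")
```

With `r = 64` and `lora_alpha = 16` the low-rank update is scaled by 0.25, so a larger rank here does not automatically mean a larger effective update.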

## Loading the trainer

Here we use the [`SFTTrainer` from the TRL library](https://huggingface.co/docs/trl/main/en/sft_trainer), a wrapper around the transformers `Trainer` that makes it easy to fine-tune models on instruction-based datasets using PEFT adapters. Let's first define the training arguments.

In [None]:
from transformers import TrainingArguments

output_dir = "./results"
per_device_train_batch_size = 1
gradient_accumulation_steps = 1
optim = "paged_adamw_32bit"
save_steps = 100
logging_steps = 10
learning_rate = 4e-3
max_grad_norm = 0.3
max_steps = 100  # overrides num_train_epochs; set to -1 to train by epochs instead
warmup_ratio = 0.03
lr_scheduler_type = "constant"

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
    num_train_epochs=1,
)
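
These arguments interact: the number of examples the model sees is `max_steps` times the effective batch size, and a positive `max_steps` takes precedence over `num_train_epochs`. The sketch below works through the arithmetic for the values above, assuming a single GPU and the 100-example subset reported in the dataset cell output. The helper name `training_budget` is illustrative.

```python
def training_budget(max_steps, per_device_batch, grad_accum, dataset_size, num_devices=1):
    """Return (effective batch size, examples seen, passes over the dataset)."""
    effective_batch = per_device_batch * grad_accum * num_devices
    samples_seen = max_steps * effective_batch
    epochs = samples_seen / dataset_size
    return effective_batch, samples_seen, epochs


# Values from the cell above; dataset_size=100 comes from the subset printed earlier.
effective_batch, samples_seen, epochs = training_budget(
    max_steps=100, per_device_batch=1, grad_accum=1, dataset_size=100
)
print(f"effective batch size: {effective_batch}")
print(f"examples seen:        {samples_seen}")
print(f"epochs:               {epochs}")
```

Raising `gradient_accumulation_steps` is the usual way to get a larger effective batch without using more GPU memory, at the cost of more forward passes per optimizer step.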

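Finally, pass everything to the trainer. Printing the model first shows the module names (`Wqkv`, `out_proj`, `fc1`, `fc2`) that `target_modules` refers to.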

In [None]:
print(model)

MixFormerSequentialForCausalLM(
  (layers): Sequential(
    (0): Embedding(
      (wte): Embedding(51200, 2048)
      (drop): Dropout(p=0.0, inplace=False)
    )
    (1): ParallelBlock(
      (ln): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
      (resid_dropout): Dropout(p=0.0, inplace=False)
      (mixer): MHA(
        (rotary_emb): RotaryEmbedding()
        (Wqkv): Linear4bit(in_features=2048, out_features=6144, bias=True)
        (out_proj): Linear4bit(in_features=2048, out_features=2048, bias=True)
        (inner_attn): SelfAttention(
          (drop): Dropout(p=0.0, inplace=False)
        )
        (inner_cross_attn): CrossAttention(
          (drop): Dropout(p=0.0, inplace=False)
        )
      )
      (mlp): MLP(
        (fc1): Linear4bit(in_features=2048, out_features=8192, bias=True)
        (fc2): Linear4bit(in_features=8192, out_features=2048, bias=True)
        (act): NewGELUActivation()
      )
    )
    (2): ParallelBlock(
      (ln): LayerNorm((2048,), eps=1

In [None]:
from trl import SFTTrainer

max_seq_length = 1024

# https://huggingface.co/docs/trl/sft_trainer
def formatting_prompts_func(example):
    output_texts = []
    for i in range(len(example['instruction'])):
        text = f"### Instruction: {example['instruction'][i]}\n ### Response: {example['output'][i]}"
        output_texts.append(text)
    return output_texts

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset_sub,
    peft_config=peft_config,
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
    formatting_func=formatting_prompts_func,
    # dataset_text_field="text",
)

Map:   0%|          | 0/100 [00:00<?, ? examples/s]
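
The training template in the cell above (`### Instruction: ...\n ### Response: ...`) differs slightly in whitespace from the prompts used at inference time later in this notebook (`### Instruction:\n...\n### Response:\n`). One way to keep the two identical is to build both from a single helper. This is a sketch of that idea, not the notebook's original code; `build_prompt` is a hypothetical name.

```python
def build_prompt(instruction, response=None):
    """Build the prompt; append the response for training, omit it for inference."""
    prompt = f"### Instruction:\n{instruction}\n### Response:\n"
    return prompt if response is None else prompt + response


def formatting_prompts_func(batch):
    """Batched formatter in the shape SFTTrainer's formatting_func expects."""
    return [
        build_prompt(instruction, output)
        for instruction, output in zip(batch["instruction"], batch["output"])
    ]


# Toy batch (made up for illustration) with the same column names as the dataset.
toy_batch = {"instruction": ["Add 1 and 2."], "output": ["3"]}
train_text = formatting_prompts_func(toy_batch)[0]
infer_text = build_prompt("Add 1 and 2.")

# The inference prompt is an exact prefix of the training text.
print(train_text.startswith(infer_text))
```

Because the inference prompt is a prefix of the training text, the model is asked to continue from exactly the position it was trained on.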

We also pre-process the model by upcasting the layer norms to float32 for more stable training.

In [None]:
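for name, module in trainer.model.named_modules():
    # phi-1.5 names its layer norms "ln", so match on the module type
    # instead of looking for "norm" in the module name.
    if isinstance(module, torch.nn.LayerNorm):
        module.to(torch.float32)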

## Train the model

Now let's train the model! Simply call `trainer.train()`

In [None]:
import os
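# PYTORCH_CUDA_ALLOC_CONF expects "option:value" pairs and is read when the CUDA
# allocator first initializes, so set it before any CUDA work for it to take effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "backend:native"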

In [None]:
trainer.train()

wandb: Currently logged in as: leongkwokhing. Use `wandb login --relogin` to force relogin


You're using a CodeGenTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


`attention_mask` is not supported during training. Using it might lead to unexpected results.


Step,Training Loss
10,1.4178
20,1.5295
30,1.782
40,1.9502
50,1.6426
60,1.2654
70,1.5375
80,2.5839
90,2.2757
100,2.0141



TrainOutput(global_step=100, training_loss=1.799861192703247, metrics={'train_runtime': 90.0492, 'train_samples_per_second': 1.111, 'train_steps_per_second': 1.111, 'total_flos': 239553548144640.0, 'train_loss': 1.799861192703247, 'epoch': 1.0})
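
The logged values above can be checked with a few lines of arithmetic. The sketch below copies the ten logged losses and the runtime from the output, then compares the first and second half of the run. The interpretation is tentative: with only 100 steps and a batch size of 1 the loss is noisy, but a rising trend is one possible sign that the learning rate of `4e-3` is too high for this setup.

```python
from statistics import mean

# (step, training loss) pairs copied from the training log above.
logged = [
    (10, 1.4178), (20, 1.5295), (30, 1.782), (40, 1.9502), (50, 1.6426),
    (60, 1.2654), (70, 1.5375), (80, 2.5839), (90, 2.2757), (100, 2.0141),
]
losses = [loss for _, loss in logged]

overall_mean = mean(losses)
first_half_mean = mean(losses[:5])
second_half_mean = mean(losses[5:])

# Throughput from the TrainOutput metrics: 100 steps in 90.0492 seconds.
steps_per_second = 100 / 90.0492

print(f"mean logged loss:  {overall_mean:.4f}")
print(f"first 50 steps:    {first_half_mean:.4f}")
print(f"last 50 steps:     {second_half_mean:.4f}")
print(f"steps per second:  {steps_per_second:.3f}")
```

The mean of the logged values agrees with the reported `training_loss`, which suggests each logged number is an average over the preceding 10 steps.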

The `SFTTrainer` will take care of properly saving only the adapters during training instead of saving the entire model.

In [None]:
model_to_save = trainer.model.module if hasattr(trainer.model, 'module') else trainer.model  # Take care of distributed/parallel training
model_to_save.save_pretrained("outputs")
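
To confirm that only the adapter was written, it can help to list what ended up in the output directory. This is a small standard-library sketch; `list_saved_files` is an illustrative helper. A PEFT adapter directory normally contains `adapter_config.json` plus a weights file whose exact name depends on the installed `peft` version.

```python
from pathlib import Path


def list_saved_files(directory):
    """Return sorted (file name, size in bytes) pairs for the files in `directory`."""
    root = Path(directory)
    return sorted((p.name, p.stat().st_size) for p in root.iterdir() if p.is_file())


# "outputs" is the directory used in the save cell above.
if Path("outputs").is_dir():
    for name, size in list_saved_files("outputs"):
        print(f"{name:30s} {size / 2**20:8.2f} MiB")
else:
    print("No 'outputs' directory found; run the save cell first.")
```

The adapter files should be far smaller than a full model checkpoint, since they hold only the LoRA matrices.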

In [None]:
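device = "cuda:0"

# `LoraConfig.from_pretrained` only reads the adapter configuration, and
# `get_peft_model` would attach a new, untrained adapter. `trainer.model`
# already holds the trained LoRA weights, so reuse it for inference.
model = trainer.model
model.eval()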

In [None]:
%%time

text = '''### Instruction:\nHow do i create a pandas dataframe in Python?\n### Response:\n'''
device = "cuda:0"

inputs = tokenizer(text, return_tensors="pt", return_attention_mask=False).to(device)
# temperature only takes effect with do_sample=True; max_new_tokens is kept
# below phi-1.5's 2048-token context window.
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.1, repetition_penalty=1.2, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(outputs)[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


### Instruction:
How do i create a pandas dataframe in Python?
### Response:
To create a DataFrame, you can use the pd.DataFrame() function and pass your list of lists as an argument to it. Here's how we would go about creating our 'data' dictionary from scratch using this method - 

    import numpy as np  # Importing NumPy library for array operations
    np_array = [1, 2, 3] # Creating 1D Array with values {0}
    df=pd.DataFrame({"A": np_array})   # Converting List into Pandas Data Frame
print("\nThe created DataFrame is:\n", df)     # Printing out the resulting DataFrame object
```
This will output `{'A': 0}`, which means that there are no rows or columns present yet! We'll add them later on when needed. The first step was just converting each row (list within the main list) inside another list into its own column/row by adding two new keys at index zero ("column") and one more key "index" respectively. This way all elements were aligned correctly according to their respective pos

In [None]:
%%time

device = torch.device('cuda:0')
text = '''### Instruction:\nHow to build a neural network using pytorch?\n### Response:\n'''

inputs = tokenizer(text, return_tensors="pt", return_attention_mask=False).to(device)
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.1, top_p=0.9, use_cache=True, repetition_penalty=1.2, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.batch_decode(outputs)[0])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


### Instruction:
How to build a neural network using pytorch?
### Response:
To create and train the model, we will use PyTorch's nn.Module class for defining our layers of neurons in the network. We'll also define an optimizer (SGD) that minimizes this loss function by updating weights based on gradients calculated from backpropagation through these layers. 

    # Define the architecture of your neural net here...
```python
import torch
from torch import nn

class Net(nn.Module):  # This is how you should start building your Neural Network!
   def __init__(self): # Initialize the parameters with random values between -0.1 and 0.9
      super().__init__()
      self.fc = nn.Linear(10, 5)     # 10 input features -> 5 output classes
      for param in self.parameters():       # Loop over all learnable params
         if len(param.shape) > 1:        # If it has more than one dimension
            nn.init.uniform_(param, -0.1, 0.9) # Set each parameter value within [-0.1; 0.9]
          el