## Assignment Week 8

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning (PEFT) technique in which small low-rank weight matrices are trained instead of updating all the weights of a pre-trained model, as full fine-tuning does. Full fine-tuning is resource-intensive and time-consuming, so PEFT methods such as LoRA can be used to reduce the resource requirements while still giving good results on a specific task.
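To make the idea concrete, here is a minimal pure-Python sketch of a LoRA-style linear layer. It is an illustration under stated assumptions, not the `peft` implementation: the toy dimensions, the helper names `matvec` and `lora_forward`, and the initialization are all hypothetical. The frozen weight `W` is left untouched, and only the two small matrices `A` and `B` would be trained.

```python
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """Return W x + (alpha / r) * B (A x); W is frozen, A and B are trainable."""
    base = matvec(W, x)                # frozen pre-trained path
    low_rank = matvec(B, matvec(A, x)) # low-rank update path
    scale = alpha / r
    return [b + scale * l for b, l in zip(base, low_rank)]

random.seed(0)
d_in, d_out, r, alpha = 8, 8, 2, 4     # toy sizes, chosen only for illustration
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # B starts at zero, so the update starts at zero
x = [random.gauss(0, 1) for _ in range(d_in)]

y = lora_forward(W, A, B, x, alpha, r)
full_params = d_out * d_in             # weights touched by full fine-tuning
lora_params = r * (d_in + d_out)       # weights trained by the low-rank update
print(full_params, lora_params)
```

Because `B` starts at zero in this sketch, the wrapped layer initially behaves exactly like the frozen layer. The saving grows with layer size, since `r * (d_in + d_out)` scales linearly with the layer dimensions while `d_in * d_out` scales quadratically.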

The LoRA approach is demonstrated below using `gpt2` as the foundation model, which is then fine-tuned with LoRA.

The demonstration shows how LoRA fine-tuning can direct a foundational model to perform a **Quote tagging** task. First, a quote is passed to the `gpt2` foundational model for inference and the result is checked. Then the `gpt2` foundational model is fine-tuned on the [Quote tagging][1] dataset using LoRA, which enables it to tag quotes rather than simply generate text. Finally, the fine-tuned model is tested to evaluate how LoRA fine-tuning changes the response.

[1]: https://huggingface.co/datasets/Abirate/english_quotes 

In [3]:
# install the datasets package
!pip install datasets

Collecting datasets
  Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.9.0,>=2023.1.0 (from fsspec[http]<=2024.9.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.9.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-3.2.0-py3-none-any.whl (480 kB)
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Downloading fsspec-2024.9.0-py3-none-any.whl

In [4]:
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

In [8]:
# use the GPU if one is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

In [9]:
# Load GPT2 model from Huggingface
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    device_map='auto',
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

The above warning can be ignored because Hugging Face authentication is not mandatory for public models. The text needs to be tokenized before it can be used for inference, which is done with the `AutoTokenizer` class of the Hugging Face transformers library. The tokenizer must match the one used by the `gpt2` model.

In [10]:
# load the tokenizer for gpt2 and reuse the end-of-sequence token as the padding token
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

##### Perform inference with the pre-trained foundational model without any fine-tuning.
Text: "“Life is like a box of chocolates, you never know what you are gonna get” ->: "

In [13]:
# perform inference with the pretrained model
batch = tokenizer("“Life is like a box of chocolates, you never know what you are gonna get” ->: ", return_tensors='pt').to(device)
output_tokens = model.generate(**batch, max_new_tokens=25)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.




 “Life is like a box of chocolates, you never know what you are gonna get” ->: ༼ つ ◕_◕ ༽つ Life is like a box of chocolates, you


The response of the `gpt2` foundational model does not contain any **tags**. Instead, the model treats the task as text generation and continues the prompt with the text it believes fits best. This happens because the default behavior of the model is to generate the next token based on the prompt. This highlights the need for fine-tuning, which can modify the model's behavior so that it generates tags for the quote passed in the prompt.

### Fine Tuning with LoRA

The weights of the base model are frozen using `param.requires_grad = False`, since LoRA does not update the original weights of the model.
The foundation model is then wrapped with the LoRA configuration using `get_peft_model` before training starts.

In [17]:
# FREEZE WEIGHTS
for param in model.parameters():
    param.requires_grad = False

# LoRA configuration: rank-16 adapters, scaling alpha 32, dropout 0.05 on the adapter path
config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)



The dataset is loaded from the Hugging Face Hub.

In [19]:
# LOAD DATA
data = load_dataset("Abirate/english_quotes")

README.md:   0%|          | 0.00/5.55k [00:00<?, ?B/s]

quotes.jsonl:   0%|          | 0.00/647k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/2508 [00:00<?, ? examples/s]

The examples for LoRA fine-tuning are formatted as shown below. Training on this format teaches the model to associate prompts of this form with tag generation.
```
text quote ->: [list of tags]
```

In [20]:
# format data for fine-tuning: join each quote and its tag list into one training string
def merge_columns(entry):
    entry["prediction"] = entry["quote"] + " ->: " + str(entry["tags"])
    return entry


data['train'] = data['train'].map(merge_columns)
print(data['train']['prediction'][:5])

data = data.map(lambda samples: tokenizer(samples['prediction']), batched=True)

Map:   0%|          | 0/2508 [00:00<?, ? examples/s]

["“Be yourself; everyone else is already taken.” ->: ['be-yourself', 'gilbert-perreira', 'honesty', 'inspirational', 'misattributed-oscar-wilde', 'quote-investigator']", "“I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.” ->: ['best', 'life', 'love', 'mistakes', 'out-of-control', 'truth', 'worst']", "“Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.” ->: ['human-nature', 'humor', 'infinity', 'philosophy', 'science', 'stupidity', 'universe']", "“So many books, so little time.” ->: ['books', 'humor']", "“A room without books is like a body without a soul.” ->: ['books', 'simile', 'soul']"]


Map:   0%|          | 0/2508 [00:00<?, ? examples/s]
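The tokenized examples have different lengths, so they must be padded when they are batched for training. The training cell later in this notebook passes `DataCollatorForLanguageModeling(tokenizer, mlm=False)` for that purpose. The following is a simplified pure-Python sketch of what such a collator does for causal language modeling, assuming right padding; the function name `collate_causal_lm` and the toy token ids are hypothetical, and this is not the library's implementation.

```python
PAD_ID = 50256       # GPT-2's end-of-sequence id, reused as the pad token in this notebook
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips

def collate_causal_lm(sequences, pad_id=PAD_ID):
    """Pad token-id lists to a common length and build labels for causal LM training."""
    max_len = max(len(seq) for seq in sequences)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids = seq + [pad_id] * n_pad
        batch["input_ids"].append(input_ids)
        # 1 marks real tokens, 0 marks padding
        batch["attention_mask"].append([1] * len(seq) + [0] * n_pad)
        # labels copy the inputs, with pad positions masked out of the loss
        batch["labels"].append([IGNORE_INDEX if t == pad_id else t for t in input_ids])
    return batch

batch = collate_causal_lm([[10, 11, 12], [20, 21]])
print(batch["labels"])
```

One consequence of reusing the end-of-sequence token as the pad token is that any position holding that id is masked from the loss, including a genuine end-of-sequence token if one were present in the text.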

In [21]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )


print_trainable_parameters(model)

trainable params: 589824 || all params: 125029632 || trainable%: 0.4717473694555863


We see that only about 0.47% of all parameters are trainable.
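The reported count can be reproduced by hand. The sketch below assumes that, because `target_modules` is not set in `LoraConfig`, the adapters are attached only to the fused attention projection `c_attn` in each of GPT-2's 12 blocks, which maps 768 inputs to 2304 outputs. These architectural figures are assumptions that are not stated in the notebook itself, but they do reproduce the printed number.

```python
n_layers = 12          # transformer blocks in the small gpt2 checkpoint (assumed)
d_model = 768          # hidden size (assumed)
d_qkv = 3 * d_model    # c_attn output size: query, key and value fused
r = 16                 # LoRA rank from the LoraConfig above

# Each adapted layer adds A (r x d_model) and B (d_qkv x r)
per_layer = r * d_model + d_qkv * r
lora_params = n_layers * per_layer

all_params = 125_029_632  # total reported by print_trainable_parameters
trainable_pct = 100 * lora_params / all_params
print(lora_params, round(trainable_pct, 4))
```

Under these assumptions each block contributes 49,152 trainable values, and 12 blocks give the 589,824 trainable parameters printed above.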

In [24]:
# TRAINING
trainer = transformers.Trainer(
    model=model,
    train_dataset=data['train'],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=100,
        max_steps=500,
        learning_rate=2e-4,
        logging_steps=1,
        output_dir='outputs',
        auto_find_batch_size=True,
        report_to="none"
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False
trainer.train()

torch.save(model.state_dict(), 'lora.pt')

Step,Training Loss
1,18.5683
2,17.7567
3,16.8543
4,17.5893
5,16.3099
6,17.7954
7,18.4992
8,18.288
9,16.5951
10,17.7986
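The training arguments above determine how much data the run sees and how the learning rate evolves. The sketch below works through that arithmetic, assuming a single device and the `Trainer` default of a linear schedule with warmup; the helper `lr_at` is a hypothetical name. Because `auto_find_batch_size=True` is set, the actual per-device batch size may be lowered at runtime if memory runs out, in which case these numbers would change.

```python
per_device_batch = 4
grad_accum = 4
max_steps = 500
warmup_steps = 100
peak_lr = 2e-4
n_examples = 2508   # size of the train split reported when the dataset was loaded
n_devices = 1       # assumption: a single GPU or CPU

effective_batch = per_device_batch * grad_accum * n_devices
examples_seen = effective_batch * max_steps
approx_epochs = examples_seen / n_examples

def lr_at(step):
    """Linear warmup to peak_lr, then linear decay to zero at max_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (max_steps - step) / (max_steps - warmup_steps))

print(effective_batch, examples_seen, round(approx_epochs, 2))  # 16 8000 3.19
```

So under these assumptions each optimizer step averages gradients over 16 examples, and the 500 steps correspond to roughly three passes over the dataset.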


Once training is complete, the state dict of the LoRA-wrapped model is saved as `lora.pt`. For inference, the foundational model is wrapped with the same LoRA configuration used in training, and the weights are loaded from the stored `lora.pt`. Hence, during inference the weights adjusted by LoRA fine-tuning are used instead of the original foundational model weights alone.

In [25]:
# load the base gpt2 model and the trained LoRA weights
model_inf = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    device_map='auto',
)

config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model_inf = get_peft_model(model_inf, config)
model_inf = model_inf.to(device)
model_inf.load_state_dict(torch.load("lora.pt", map_location=device))


<All keys matched successfully>
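Since `lora.pt` was written from the state dict of the whole wrapped model, it contains the frozen base weights as well as the adapter weights. If only the adapter is of interest, the entries can be filtered by key. The sketch below assumes that adapter parameters carry `lora_` in their key names; the helper `lora_only` and the toy key names are hypothetical stand-ins, not output taken from this notebook.

```python
def lora_only(state_dict):
    """Keep only entries whose key contains 'lora_' (assumed to mark adapter weights)."""
    return {key: value for key, value in state_dict.items() if "lora_" in key}

# Toy stand-in for model.state_dict(); key names and values are illustrative only
toy_state = {
    "base_model.model.transformer.h.0.attn.c_attn.base_layer.weight": [0.0] * 4,
    "base_model.model.transformer.h.0.attn.c_attn.lora_A.default.weight": [0.1] * 2,
    "base_model.model.transformer.h.0.attn.c_attn.lora_B.default.weight": [0.0] * 2,
}

adapter_state = lora_only(toy_state)
print(len(toy_state), len(adapter_state))
```

With a real model, a filtered dict like this would have to be loaded with `load_state_dict(..., strict=False)`, because the base weights are no longer in the file. The `peft` library also offers `save_pretrained` on the wrapped model, which is the more usual way to store only the adapter.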

In [26]:
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

In [27]:
with torch.no_grad():
    batch = tokenizer("“Life is like a box of chocolates, you never know what you are gonna get” ->: ", return_tensors='pt').to(device)
    output_tokens = model_inf.generate(**batch, max_new_tokens=25)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.




 “Life is like a box of chocolates, you never know what you are gonna get” ->: vernacular, 'chocolates', 'chocolates', 'chocolates', 'chocolates', 'ch



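The response of the LoRA fine-tuned model now consists of **tags** related to the quote in the prompt, rather than a continuation of the quote. In this example the same tag is repeated until the `max_new_tokens` limit is reached.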
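Because the generated text can be cut off mid-tag and can repeat tags, a small post-processing step helps turn it into a clean list. The following is one possible sketch, assuming tags appear in single quotes after the `->:` separator as in the training format; the function name `extract_tags` is hypothetical.

```python
import re

def extract_tags(generated, separator="->:"):
    """Return the unique single-quoted tags that follow the separator, in order."""
    _, _, tail = generated.partition(separator)
    tags = re.findall(r"'([^']+)'", tail)  # only complete 'tag' tokens are matched
    return list(dict.fromkeys(tags))       # drop duplicates, keep first-seen order

sample = (
    "“Life is like a box of chocolates, you never know what you are gonna get” ->: "
    "vernacular, 'chocolates', 'chocolates', 'chocolates', 'chocolates', 'ch"
)
print(extract_tags(sample))  # ['chocolates']
```

The unquoted word `vernacular` and the truncated `'ch` are dropped, since neither is a complete quoted tag.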