<a href="https://colab.research.google.com/github/Ankur-krGarg/Fine-Tuning-Language-Models-_with_LoRA_and_SFT/blob/main/Fine_Tunning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Fine-Tuning Using LoRA & SFT**

In [None]:
!pip install --upgrade sympy

In [None]:
!pip install -q transformers
!pip install trl
!pip install deeplake
!pip install wandb
!pip install peft

In [None]:
import os
from google.colab import userdata
cohere_api_key = userdata.get("COHERE_API_KEY")
ACTIVELOOP_TOKEN = userdata.get("ACTIVELOOP_TOKEN")
HF_TOKEN = userdata.get("HFG_TOKEN")

In [None]:
#export WANDB_PROJECT=GenAI360

import torch; torch.set_num_threads(8);
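
The commented-out `export WANDB_PROJECT=GenAI360` line is shell syntax, so it cannot simply be uncommented in a Python cell. A minimal Python equivalent, assuming the goal is to have the W&B integration log to a project named `GenAI360`, is to set the environment variable before the trainer is created:

```python
import os

# Assumption: "GenAI360" is the intended W&B project name (taken from the comment above).
# setdefault keeps any value that is already exported in the environment.
os.environ.setdefault("WANDB_PROJECT", "GenAI360")
print(os.environ["WANDB_PROJECT"])
```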

#**Load the Dataset**

In [None]:
import deeplake

ds = deeplake.query('SELECT * FROM "hub://genai360/GAIR-lima-train-set"')
ds_test = deeplake.query('SELECT * FROM "hub://genai360/GAIR-lima-test-set"')

**Formatting Function**<br>
Takes a row of data in Deep Lake format as input and formats it as a question followed by its answer, with the two separated by two newlines.

In [None]:
def prepare_sample_text(example):
  """Prepare the text from a sample of the dataset."""
  text = f"Question: {example['question']}\n\nAnswer: {example['answer']}"

  return text
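
As a quick check of the format, the function can be applied to a hand-written row. The dictionary below is a made-up stand-in for a Deep Lake row and is not taken from the LIMA dataset:

```python
def prepare_sample_text(example):
    """Prepare the text from a sample of the dataset."""
    return f"Question: {example['question']}\n\nAnswer: {example['answer']}"

# Hypothetical row, for illustration only
row = {"question": "What is LoRA?", "answer": "A low-rank adaptation method."}
formatted = prepare_sample_text(row)
print(formatted)
```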

**Load the Pre-Trained Tokenizer Object for OPT**

In [None]:
from transformers import AutoTokenizer

model_id = "facebook/opt-1.3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


**Initialize the Dataset Used for Fine-Tuning the Model**

In [None]:
from trl.trainer import ConstantLengthDataset

train_dataset = ConstantLengthDataset(
    tokenizer,
    ds,
    formatting_func=prepare_sample_text,
    infinite=True,
    seq_length=1024
)

# Show one sample from train set
iterator = iter(train_dataset)
sample=next(iterator)
print(sample)

{'input_ids': tensor([  507, 28700,    11,  ...,   495,    47,   240]), 'labels': tensor([  507, 28700,    11,  ...,   495,    47,   240])}
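
The sample above is a fixed-length tensor of token IDs, with `labels` identical to `input_ids`. Conceptually, a constant-length dataset concatenates tokenized examples, separated by an end-of-sequence token, and slices the stream into equal chunks. The following is a simplified sketch of that idea using toy integer tokens; `pack_sequences` and `EOS_ID` are hypothetical names, and this is not TRL's actual implementation:

```python
EOS_ID = 2  # hypothetical end-of-sequence token id

def pack_sequences(tokenized_examples, seq_length):
    """Concatenate examples (each followed by EOS) and cut fixed-size chunks."""
    stream = []
    for tokens in tokenized_examples:
        stream.extend(tokens + [EOS_ID])
    chunks = []
    # Drop the trailing remainder that does not fill a whole chunk
    for start in range(0, len(stream) - seq_length + 1, seq_length):
        ids = stream[start:start + seq_length]
        # For causal LM training, labels are a copy of the inputs
        chunks.append({"input_ids": ids, "labels": list(ids)})
    return chunks

examples = [[10, 11, 12], [20, 21], [30, 31, 32, 33]]
packed = pack_sequences(examples, seq_length=4)
for chunk in packed:
    print(chunk)
```

Note that a chunk can begin or end in the middle of an example; that is the trade-off for never wasting positions on padding.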


In [None]:
train_dataset.start_iteration = 0

In [None]:
eval_dataset = ConstantLengthDataset(
    tokenizer,
    ds_test,
    formatting_func=prepare_sample_text,
    infinite=False,
    seq_length=1024
)

#**Initialize Model & trainer**

**Add LoRA Layers**

In [None]:
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
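
To make the `r` and `lora_alpha` settings concrete: LoRA leaves a weight matrix `W` frozen and learns two small matrices, `A` (shape `r × in`) and `B` (shape `out × r`), so the layer computes `W x + (lora_alpha / r) * B (A x)`. With the values in this config the scaling factor is `32 / 16 = 2.0`. The following is a toy pure-Python sketch with tiny dimensions, not PEFT's implementation; the helper names and matrix values are made up for illustration:

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, r, lora_alpha):
    """Frozen base output plus the scaled low-rank update."""
    scaling = lora_alpha / r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scaling * u for b, u in zip(base, update)]

# Toy layer: in_features=3, out_features=2, rank r=1
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]   # frozen weights
A = [[0.5, 0.5, 0.5]]                    # r x in
B_zero = [[0.0], [0.0]]                  # out x r, zero-initialised
x = [1.0, 2.0, 3.0]

# With B at zero the adapter has no effect, so training starts from the base model
y_start = lora_forward(W, A, B_zero, x, r=1, lora_alpha=2)
print(y_start)

B_trained = [[1.0], [-1.0]]              # pretend values after training
y_trained = lora_forward(W, A, B_trained, x, r=1, lora_alpha=2)
print(y_trained)
```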

**Configure the Training Arguments**

In [None]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./OPT-fine_tuned-LIMA-CPU",
    dataloader_drop_last=True,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    num_train_epochs=0.2,
    logging_steps=1,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    gradient_accumulation_steps=1,
    bf16=True,
    weight_decay=0.05,
    run_name="OPT-fine_tuned-LIMA-CPU",
    report_to="wandb"
    )
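
The `learning_rate`, `lr_scheduler_type="cosine"`, and `warmup_steps=100` settings combine into a schedule that ramps linearly from 0 to `1e-4` and then decays along half a cosine wave. A minimal sketch of that curve follows. The default `total_steps=206` is taken from the `global_step` reported by the training run later in this notebook, and the function is an approximation written for illustration, not the Transformers scheduler itself:

```python
import math

def cosine_lr(step, base_lr=1e-4, warmup_steps=100, total_steps=206):
    """Linear warmup followed by half-cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

for step in (0, 50, 100, 153, 206):
    print(step, f"{cosine_lr(step):.2e}")
```

With only 206 steps in total, nearly half of this run is spent in warmup, which is worth keeping in mind when reading the loss values.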





In [None]:
from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
  "facebook/opt-1.3b", torch_dtype=torch.bfloat16
)

In [None]:
import torch.nn as nn

for param in model.parameters():
  param.requires_grad=False # Freeze the model - train adapters later
  if param.ndim == 1:
    # cast the small parameters (e.g. layernorm) to fp32 for stability
    param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()  # reduce the number of stored activations
model.enable_input_require_grads()

class CastOutputToFloat(nn.Module):
  """Wrap a module and cast its output to fp32."""
  def __init__(self, module):
    super().__init__()
    self.module = module

  def forward(self, x):
    return self.module(x).to(torch.float32)

model.lm_head = CastOutputToFloat(model.lm_head)
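
The comment about casting small parameters to fp32 "for stability" comes down to precision: bfloat16 stores only 7 explicit mantissa bits, so small increments can be rounded away entirely. The sketch below emulates bfloat16 rounding with the standard library (no PyTorch needed). `to_bfloat16` is a hypothetical helper written for this illustration, and it ignores NaN and overflow edge cases:

```python
import struct

def to_bfloat16(x):
    """Round a float to the nearest bfloat16 value (round-half-to-even)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]   # float32 bit pattern
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000            # keep the top 16 bits
    return struct.unpack(">f", struct.pack(">I", bits))[0]

small_step = to_bfloat16(1.0 + 0.001)   # the small increment is lost
larger_step = to_bfloat16(1.0 + 0.01)   # snaps to the nearest representable value
print(small_step, larger_step)
```

Near 1.0 the spacing between adjacent bfloat16 values is `2**-7`, which is why the first increment disappears and the second lands on `1 + 2**-7`.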


**SFTTrainer class to tie all components together**

In [None]:
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=lora_config,
)



**Impact of LoRA**<br>
To see the impact of LoRA, let's create a simple function that counts all parameters in the model and compares that total with the number of trainable parameters.

The trainable parameters are the ones that LoRA added to the base model; the base model's own weights stay frozen.

In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all Params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [None]:
print_trainable_parameters(trainer.model)

trainable params: 3145728 || all Params: 1318903808 || trainable%: 0.23851079820371554
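
The trainable-parameter count printed above can be reproduced by hand. Each LoRA adapter on a `2048 × 2048` projection adds `r * (in_features + out_features)` parameters. The truncated model printout later in this notebook shows `v_proj` wrapped by a LoRA layer; that a second projection per layer (for example `q_proj`) is also wrapped is an assumption, though it matches the reported count exactly across the 24 decoder layers:

```python
r = 16
in_features = out_features = 2048
num_layers = 24
adapted_modules_per_layer = 2   # assumption: two projections per layer carry adapters

params_per_adapter = r * in_features + out_features * r   # lora_A + lora_B
trainable = params_per_adapter * adapted_modules_per_layer * num_layers
total = 1_318_903_808            # "all Params" from the output above

print(trainable)                                  # 3145728
print(round(100 * trainable / total, 4))          # 0.2385
```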


In [None]:
#wandb.init(project="GenAI360")
print("Training...")
trainer.train()

Training...

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 0     | 2.4933        | 2.581927        |


TrainOutput(global_step=206, training_loss=2.4682202564859854, metrics={'train_runtime': 927.923, 'train_samples_per_second': 0.222, 'train_steps_per_second': 0.222, 'total_flos': 1533666266185728.0, 'train_loss': 2.4682202564859854, 'epoch': 0.2})
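
A few derived numbers help in reading this output. With a batch size of 1 and `seq_length=1024`, each step processes 1024 tokens, and the exponential of a cross-entropy loss gives the perplexity. The values below are copied from the output above; the calculation is a back-of-the-envelope sketch:

```python
import math

global_step = 206
train_runtime_s = 927.923
seq_length = 1024
validation_loss = 2.581927

steps_per_second = global_step / train_runtime_s
tokens_seen = global_step * seq_length          # batch size 1, no gradient accumulation
perplexity = math.exp(validation_loss)

print(f"steps/s:     {steps_per_second:.3f}")
print(f"tokens seen: {tokens_seen}")
print(f"perplexity:  {perplexity:.2f}")
```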

#**Merging LoRA & OPT**

In [None]:
import os

print("saving last checkpoint of the model")
trainer.model.save_pretrained(os.path.join("./OPT-fine_tuned-LIMA", "final_checkpoint/"))

saving last checkpoint of the model


#**Merge LoRA Weights with Base Model**

In [None]:
from transformers import AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
  "facebook/opt-1.3b", return_dict=True, torch_dtype=torch.bfloat16
)



In [None]:
from peft import PeftModel

# Load the Lora model
model = PeftModel.from_pretrained(model, "./OPT-fine_tuned-LIMA/final_checkpoint/")
model.eval()

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): OPTForCausalLM(
      (model): OPTModel(
        (decoder): OPTDecoder(
          (embed_tokens): Embedding(50272, 2048, padding_idx=1)
          (embed_positions): OPTLearnedPositionalEmbedding(2050, 2048)
          (final_layer_norm): LayerNorm((2048,), eps=1e-05, elementwise_affine=True)
          (layers): ModuleList(
            (0-23): 24 x OPTDecoderLayer(
              (self_attn): OPTSdpaAttention(
                (k_proj): Linear(in_features=2048, out_features=2048, bias=True)
                (v_proj): lora.Linear(
                  (base_layer): Linear(in_features=2048, out_features=2048, bias=True)
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=2048, out_features=16, bias=False)
                  )
                  (lora_B): ModuleDict(
       

In [None]:
model = model.merge_and_unload()

model.save_pretrained("./OPT-fine_tuned-LIMA/merged", safe_serialization=False)
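
Merging folds the adapter into the base weights so that inference no longer needs the extra LoRA matrices: the merged weight is `W + (lora_alpha / r) * B A`. The toy sketch below (pure Python, with made-up helper names and values, not PEFT's code) checks that a merged matrix gives the same output as the base-plus-adapter computation:

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def merge_lora(W, A, B, r, lora_alpha):
    """Return W + (lora_alpha / r) * B @ A."""
    scaling = lora_alpha / r
    delta = matmul(B, A)
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]   # frozen base weights (out=2, in=3)
A = [[0.5, 0.5, 0.5]]                    # r x in, with r=1
B = [[1.0], [-1.0]]                      # out x r
x = [[1.0], [2.0], [3.0]]                # input as a column vector

merged = merge_lora(W, A, B, r=1, lora_alpha=2)
y_merged = matmul(merged, x)

# Unmerged path: base output plus scaled adapter output
base = matmul(W, x)
adapter = matmul(B, matmul(A, x))
y_unmerged = [[b[0] + 2.0 * a[0]] for b, a in zip(base, adapter)]

print(merged)
print(y_merged, y_unmerged)
```

Because the merged model has the same shape as the original, it can be saved and loaded like any ordinary checkpoint, with no PEFT dependency at inference time.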

#**INFERENCE**

Evaluate the fine-tuned model's outputs using various prompts.

In [None]:
inputs = tokenizer("Question: Write a recipe with chicken.\n\n Answer: ", return_tensors="pt")

generation_output = model.generate(**inputs,
                                   return_dict_in_generate=True,
                                   output_scores=True,
                                   max_length=256,
                                   num_beams=1,
                                   do_sample=True,
                                   repetition_penalty=1.5,
                                   length_penalty=2.)

print( tokenizer.decode(generation_output['sequences'][0]) )



</s>Question: Write a recipe with chicken.

 Answer:  Cook in cast-iron pan (like for steaks or pork). Get some wood chips, put them around the outside of your stove and add coals to that part while you brown... Then close off all doors!
That only works on really soft meat like ground beef/sausages etc ... Otherwise it's too hard / expensive. :) Maybe I need more tips how to make my cooking much better ;)</s>
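
The `repetition_penalty=1.5` argument discourages tokens that have already appeared. As commonly implemented, the score of each previously seen token is divided by the penalty when positive and multiplied by it when negative, so the token becomes less likely either way. A minimal sketch on a toy score list follows; `apply_repetition_penalty` is a hypothetical helper, not the Transformers API:

```python
def apply_repetition_penalty(scores, seen_token_ids, penalty):
    """Lower the scores of tokens that were already generated."""
    adjusted = list(scores)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted

scores = [3.0, -1.0, 0.5, 2.0]     # toy logits for a 4-token vocabulary
seen = [0, 1, 0]                   # tokens 0 and 1 were already generated
penalized = apply_repetition_penalty(scores, seen, penalty=1.5)
print(penalized)
```

Note also that `length_penalty` only influences beam search, so with `num_beams=1` it has no effect on the sampled output here.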


In [None]:
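# `model2` is not defined anywhere in this notebook. Assumption: it is meant to be the
# original, un-tuned OPT model, loaded here so that its output can be compared with the
# fine-tuned model's output above.
model2 = AutoModelForCausalLM.from_pretrained(
  "facebook/opt-1.3b", torch_dtype=torch.bfloat16
)

inputs = tokenizer("Question: Write a recipe with chicken.\n\n Answer: ", return_tensors="pt")

generation_output = model2.generate(**inputs,
                                   return_dict_in_generate=True,
                                   output_scores=True,
                                   max_length=256,
                                   num_beams=1,
                                   do_sample=True,
                                   repetition_penalty=1.5,
                                   length_penalty=2.)

print(tokenizer.decode(generation_output['sequences'][0]))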