### Table of Contents:
 
1. Installing and Loading Libraries
2. Loading & Pre-Processing Data<br>
3. Loading Gemma <br>
   a. What should be the value of target modules?<br>
   b. Initializing LoRA<br>
   c. Training Model<br>
   d. Saving Adapter<br>
   e. Load and Merge Model Weights<br> 
4. Text Generation Using LoRA Model


# 1. Installing and Loading Libraries

In [None]:
!pip install -q peft trl==0.7.10 evaluate transformers==4.38.0 accelerate==0.27.2 bitsandbytes==0.42.0 datasets==2.18.0

In [32]:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments, EvalPrediction, pipeline
from peft import LoraConfig, TaskType, get_peft_model, PeftModel
from accelerate.utils import release_memory
from accelerate import Accelerator
from datasets import Dataset
import evaluate

import numpy as np
import pandas as pd
pd.set_option('display.max_colwidth', None)

from trl import SFTTrainer, setup_chat_format

import torch
import torch.nn

import os
os.environ['TOKENIZERS_PARALLELISM'] = "FALSE"
import warnings
warnings.filterwarnings("ignore")

import gc
gc.collect()

522

# 2. Loading & Pre-Processing Data

In [29]:
code = pd.read_csv("/kaggle/input/python-code-instruction-dataset/train.csv")
code = code.sample(frac = 0.1,random_state = 101,ignore_index=True)
code.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1861 entries, 0 to 1860
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   instruction  1861 non-null   object
 1   input        1307 non-null   object
 2   output       1861 non-null   object
 3   prompt       1861 non-null   object
dtypes: object(4)
memory usage: 58.3+ KB


In [33]:
code.head(1)

Unnamed: 0,instruction,input,output,prompt
0,"Edit the following Python code to give the output as [400, 125, 10,1].","val = [1, 10, 125, 400]\nres = []\n\nfor v in val:\n res.append(v)","val = [1, 10, 125, 400]\nres = []\n\nfor v in reversed(val):\n res.append(v)\n \nprint(res) # Output: [400, 125, 10,1]","Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nEdit the following Python code to give the output as [400, 125, 10,1].\n\n### Input:\nval = [1, 10, 125, 400]\nres = []\n\nfor v in val:\n res.append(v)\n\n### Output:\nval = [1, 10, 125, 400]\nres = []\n\nfor v in reversed(val):\n res.append(v)\n \nprint(res) # Output: [400, 125, 10,1]"


In [3]:
def preprocess_data(example):
    template = f"""Instruction:\n{example['instruction']}\n\nResponse:\n{example['output']}"""
    return template
    
code['template'] = code.apply(preprocess_data,axis=1)
dataset = Dataset.from_pandas(code[['template']]).train_test_split(test_size = 0.2)

# Example Printing
print(code['template'].iloc[1])

Instruction:
Write a Python script to print all prime numbers between two numbers, a and b (where a<b).

Response:
for num in range(a, b+1): 
   if num > 1: 
       for i in range(2, num): 
           if (num % i) == 0: 
               break
       else: 
           print(num)


Above is a sample template from the created dataset.
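
The template above uses only `instruction` and `output`, although `code.info()` shows that `input` is filled for 1307 of the 1861 rows. As a minimal sketch, assuming one wanted to keep that extra context, a variant could add an `Input:` block only when the field holds text. The function name `build_template` and the `Input:` label are hypothetical; this is not the template the rest of the notebook trains on.

```python
import math


def build_template(example):
    """Format one row; add an Input block only when `input` holds text."""
    extra = example.get("input")
    # pandas stores an empty CSV cell as float NaN, so treat NaN like None.
    is_missing = extra is None or (isinstance(extra, float) and math.isnan(extra))
    parts = [f"Instruction:\n{example['instruction']}"]
    if not is_missing and str(extra).strip():
        parts.append(f"Input:\n{extra}")
    parts.append(f"Response:\n{example['output']}")
    return "\n\n".join(parts)


# Illustrative rows (made up for this sketch, not taken from the dataset).
with_input = build_template(
    {"instruction": "Reverse the list.", "input": "val = [1, 2, 3]", "output": "val[::-1]"}
)
without_input = build_template(
    {"instruction": "Print hello.", "input": float("nan"), "output": "print('hello')"}
)
print(with_input)
print("-" * 20)
print(without_input)
```

Because a pandas row also supports `.get` and item access, the same function could be passed to `code.apply(build_template, axis=1)` in place of `preprocess_data`.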

# 3. Loading Gemma

## What should be the value of target modules?

When you print the model summary, you will see the model's attention layer class; in this case it is **GemmaSdpaAttention**.<br>
In our case the attention layer contains the linear projections `q_proj`, `k_proj`, `v_proj`, `o_proj` and the rotary embedding module `rotary_emb` (not a linear layer, so not a LoRA target), while the MLP block contains the linear layers `gate_proj`, `up_proj`, `down_proj`.<br>
Refer: [target-modules-for-applying-peft-lora-on-different-models](https://stackoverflow.com/questions/76768226/target-modules-for-applying-peft-lora-on-different-models)
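
Instead of scanning the printed summary by eye, the candidate names can be collected by walking `named_modules()` and keeping the last component of every linear layer's name. The sketch below is only an illustration: `find_linear_leaf_names` is a hypothetical helper, and the `FakeLinear`, `FakeRotary`, and `FakeModel` classes are stand-ins so the snippet runs without loading any weights.

```python
def find_linear_leaf_names(model, linear_types, exclude=("lm_head",)):
    """Return sorted leaf names of all modules that are instances of linear_types."""
    names = set()
    for full_name, module in model.named_modules():
        if isinstance(module, linear_types):
            leaf = full_name.split(".")[-1]  # "model.layers.0.mlp.up_proj" -> "up_proj"
            if leaf not in exclude:
                names.add(leaf)
    return sorted(names)


# Stand-ins for illustration only (hypothetical, not part of any library).
class FakeLinear:
    pass


class FakeRotary:
    pass


class FakeModel:
    def named_modules(self):
        # Mimics the (name, module) pairs that a PyTorch model yields.
        layout = {
            "model.layers.0.self_attn.q_proj": FakeLinear(),
            "model.layers.0.self_attn.k_proj": FakeLinear(),
            "model.layers.0.self_attn.v_proj": FakeLinear(),
            "model.layers.0.self_attn.o_proj": FakeLinear(),
            "model.layers.0.self_attn.rotary_emb": FakeRotary(),
            "model.layers.0.mlp.gate_proj": FakeLinear(),
            "model.layers.0.mlp.up_proj": FakeLinear(),
            "model.layers.0.mlp.down_proj": FakeLinear(),
            "lm_head": FakeLinear(),
        }
        return iter(layout.items())


candidates = find_linear_leaf_names(FakeModel(), FakeLinear)
print(candidates)
```

With a real model the call would presumably be `find_linear_leaf_names(model, torch.nn.Linear)`. Whether quantized layers are subclasses of `torch.nn.Linear` depends on the bitsandbytes version, so that is an assumption to verify.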


In the case of LoRA, we have to specify `target_modules` when defining the config for training a model.
Here are the possible values of `target_modules`:
1. `target_modules = "all-linear"`<br>
    This is a convenient shortcut that tells PEFT to target all linear layers within the pre-trained model for adaptation. This includes the modules you saw previously: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, and `down_proj`.<br>
    The embedding layer is not a linear layer, and the output projection layer is excluded by this shortcut, so neither of them is adapted.
    
     - Benefits:<br>
     a. Simplicity: it is a straightforward way to adapt a large portion of the model without listing module names by hand, while still training far fewer parameters than full fine-tuning.<br>
     b. Potential performance gains: because most of the model is adapted, the model can learn the new task more effectively, which can lead to better fine-tuning performance.
     
     - Cons:<br>
     a. Over-adaptation: adapting all linear layers can be overly aggressive, potentially leading to a loss of information from the pre-trained model and hindering performance. <br>
     b. Increased memory footprint: adapting all linear layers adds more trainable parameters, and therefore more memory overhead, than targeting a few specific modules.<br>
     
2. `target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "gate_proj", "down_proj"]`<br>
    To reduce the risk of over-adaptation, you can explicitly list the individual modules you want to adapt, which gives you more control. This allows you to focus on specific areas while leaving less relevant parts untouched. Listing all seven modules, as here, covers the same layers as the shortcut above; shorten the list (for example, to the attention projections only) to adapt less of the model.<br>
    Below is a description of the modules.<br>
     - `q_proj`, `k_proj`, `v_proj` are the query, key, and value projections in the attention mechanism.
     - `o_proj` is the output projection of the attention mechanism.
     - `gate_proj` is the projection layer for the gate in the feed-forward network.
     - `up_proj`, `down_proj` are projection layers used for the feed-forward network within a transformer block.
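
To make the memory trade-off between the two options concrete, the sketch below counts LoRA parameters. A rank-`r` adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` trainable weights. The layer shapes and the layer count are assumptions based on the published Gemma 2B configuration (hidden size 2048, MLP size 16384, one 256-wide key/value head, 18 decoder layers); check them against `model.config` before relying on the totals. `lora_param_count` is a hypothetical helper.

```python
# Assumed (d_in, d_out) of each linear layer in one Gemma 2B decoder block.
ASSUMED_SHAPES = {
    "q_proj": (2048, 2048),
    "k_proj": (2048, 256),
    "v_proj": (2048, 256),
    "o_proj": (2048, 2048),
    "gate_proj": (2048, 16384),
    "up_proj": (2048, 16384),
    "down_proj": (16384, 2048),
}
ASSUMED_NUM_LAYERS = 18


def lora_param_count(target_modules, r, shapes=ASSUMED_SHAPES, num_layers=ASSUMED_NUM_LAYERS):
    """Trainable LoRA weights: r * (d_in + d_out) per adapted layer, summed over all blocks."""
    per_block = sum(r * (shapes[name][0] + shapes[name][1]) for name in target_modules)
    return per_block * num_layers


attention_only = lora_param_count(["q_proj", "k_proj", "v_proj", "o_proj"], r=4)
all_seven = lora_param_count(list(ASSUMED_SHAPES), r=4)

print(f"attention projections only: {attention_only:,}")  # 921,600
print(f"all seven modules:          {all_seven:,}")       # 4,902,912
```

Under these assumptions, adapting the MLP projections as well multiplies the number of trainable weights by roughly five, which is the overhead the "Increased memory footprint" point refers to.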
    

## Initializing LoRA

In [7]:
base_model = "/kaggle/input/gemma/transformers/2b-it/2"
lora_config = LoraConfig(
        lora_alpha=8, 
        lora_dropout=0.1,
        r=4,
        target_modules=["q_proj", "o_proj", "k_proj", "v_proj", "gate_proj", "up_proj", "down_proj"],
        task_type="CAUSAL_LM",
)

bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='fp4',
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.padding_side = "right"

model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto", quantization_config=bnb_config)


In [9]:
model_output_dir = "gemma2b"
training_args = TrainingArguments(
    output_dir=model_output_dir,                   
    num_train_epochs=1,                       
    per_device_train_batch_size=1,           
    gradient_accumulation_steps=8,      
    per_device_eval_batch_size = 1,
    gradient_checkpointing=True,           
    logging_steps=25,                        
    learning_rate=2e-4,                       
    weight_decay=0.001,
    fp16=True,
    bf16=False,
    max_grad_norm=0.3,                    
    max_steps=-1,
    warmup_ratio=0.03,                        
    group_by_length=True,
    lr_scheduler_type="cosine",             
    report_to="tensorboard",                  
    evaluation_strategy="steps"          
    )


trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset['train'],
    eval_dataset = dataset['test'],
    peft_config=lora_config,
    tokenizer=tokenizer,
    max_seq_length=1024,
    dataset_text_field = 'template',
    dataset_kwargs={
        "add_special_tokens": False,
        "append_concat_token": False,
    }
)

Map:   0%|          | 0/1488 [00:00<?, ? examples/s]

Map:   0%|          | 0/373 [00:00<?, ? examples/s]

## Training Model

In [10]:
torch.cuda.empty_cache()
trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


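| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 25   | 1.3503        | 0.989223        |
| 50   | 0.9651        | 0.919965        |
| 75   | 0.8056        | 0.877401        |
| 100  | 0.9523        | 0.863812        |
| 125  | 0.8049        | 0.849755        |
| 150  | 0.8197        | 0.839859        |
| 175  | 0.7668        | 0.836736        |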


TrainOutput(global_step=186, training_loss=0.9232640215145644, metrics={'train_runtime': 1857.5858, 'train_samples_per_second': 0.801, 'train_steps_per_second': 0.1, 'total_flos': 2754438242217984.0, 'train_loss': 0.9232640215145644, 'epoch': 1.0})
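
The step counts in this log follow from the settings above. As a rough sketch, assuming a single GPU and a test split that rounds up (which matches the 1488 and 373 example counts in the `Map` lines), one optimizer step consumes 1 x 8 = 8 examples, so one epoch takes 1488 / 8 = 186 steps, the `global_step` reported by `TrainOutput`. The evaluation rows appear every 25 steps, in line with `logging_steps=25`.

```python
import math

n_rows = 1861                      # rows reported by code.info()
n_test = math.ceil(n_rows * 0.2)   # assumed rounding; reproduces the 373 test examples
n_train = n_rows - n_test          # 1488 training examples

per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1                    # assumption: training ran on a single GPU

# Examples consumed per optimizer update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = n_train // effective_batch

# Steps at which an evaluation row is expected when evaluating every 25 steps.
eval_steps = list(range(25, steps_per_epoch + 1, 25))

print(n_train, n_test)        # 1488 373
print(effective_batch)        # 8
print(steps_per_epoch)        # 186
print(eval_steps)             # [25, 50, 75, 100, 125, 150, 175]
```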

## Saving Adapter

In [12]:
lora_adapter_path = "LoraAdapter"
trainer.model.save_pretrained(lora_adapter_path)

In [13]:
# Release Memory
trainer, model = release_memory(trainer, model)
gc.collect()
torch.cuda.empty_cache()


## Load and Merge Model Weights 

In [14]:
final_model = "Final_Model"
# Loading Base Model
model = AutoModelForCausalLM.from_pretrained(base_model,device_map='auto', torch_dtype=torch.float16)

# Loading Base Model with LoRA adapters.
peft_model = PeftModel.from_pretrained(model,lora_adapter_path,device_map='auto', torch_dtype=torch.float16)

# Merging and Saving Model
model = peft_model.merge_and_unload(progressbar = True)
model.save_pretrained(final_model)
tokenizer.save_pretrained(final_model)


Unloading and merging model: 100%|██████████| 384/384 [00:00<00:00, 4858.36it/s]


('Final_Model/tokenizer_config.json',
 'Final_Model/special_tokens_map.json',
 'Final_Model/tokenizer.model',
 'Final_Model/added_tokens.json',
 'Final_Model/tokenizer.json')
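
Conceptually, merging folds each LoRA adapter into the weight it was attached to: `W_merged = W + (lora_alpha / r) * B @ A`, after which the adapter matrices are no longer needed at inference time. The pure-Python sketch below checks this on a tiny 2x2 example. The matrices and helper names are made up for illustration, and the rank is 1 rather than 4, with `lora_alpha=2` so the scaling factor stays at 2 as in the config above (8 / 4).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def merge_lora(weight, lora_a, lora_b, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A)."""
    scaling = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


weight = [[1.0, 2.0], [3.0, 4.0]]   # frozen base weight, shape (d_out, d_in)
lora_a = [[1.0, -1.0]]              # adapter A, shape (r, d_in)
lora_b = [[0.5], [2.0]]             # adapter B, shape (d_out, r)
x = [3.0, 1.0]

merged = merge_lora(weight, lora_a, lora_b, lora_alpha=2, r=1)
y_merged = matvec(merged, x)

# Unmerged path: base output plus the scaled adapter output.
base_out = matvec(weight, x)
adapter_out = matvec(lora_b, matvec(lora_a, x))
y_adapter = [b + 2.0 * a for b, a in zip(base_out, adapter_out)]

print(merged)      # [[2.0, 1.0], [7.0, 0.0]]
print(y_merged)    # [7.0, 21.0]
print(y_adapter)   # [7.0, 21.0]
```

Both paths give the same output, which is why the merged model can be saved and served as a plain model without PEFT.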

In [15]:
# peft_model still references the merged weights, so release it as well
model, peft_model = release_memory(model, peft_model)

# 4. Text Generation Using LoRA Model

In [16]:
peft_pipe = pipeline(
    "text-generation",
    final_model,
    model_kwargs={"torch_dtype": torch.float16},
    device_map='auto',
    max_new_tokens=512,
)


In [26]:
def get_output(question):
    prompt = f"Provide an optimal Response to the given Instruction, dont include the any kind of extra Explaination in response. keep in short and simple.\n\nInstruction:\n{question}\n\nResponse:\n"
    out = peft_pipe(
        prompt,
        do_sample=True,
        temperature=0.1,
        top_k=20,
        top_p=0.3,
        add_special_tokens=True,
    )
    return out

In [27]:
out = get_output("Write program to find factorial of number")
print(out[0]['generated_text'])

Provide an optimal Response to the given Instruction, dont include the any kind of extra Explaination in response. keep in short and simple.

Instruction:
Write program to find factorial of number

Response:
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

print(factorial(5)) # Output: 120
```

This code defines a Python function called `factorial` that takes a single integer argument, `n`, and returns the factorial of that number. The base case is when `n` is 0, which returns 1. Otherwise, it recursively calls itself with the argument `n-1` and multiplies the result by `n`. The function is then called with the argument 5, which returns 120.


In [28]:
out = get_output("write a program to explain simple iterators and generators")
print(out[0]['generated_text'])

Provide an optimal Response to the given Instruction, dont include the any kind of extra Explaination in response. keep in short and simple.

Instruction:
write a program to explain simple iterators and generators

Response:
Sure, here's a simple Python program to explain the concept of iterators and generators:

```python
def my_iterator():
    for i in range(10):
        yield i

for item in my_iterator():
    print(item)

# Output:
# 0
# 1
# 2
# 3
# 4
# 5
# 6
# 7
# 8
# 9
```

This code defines a generator function called `my_iterator` that yields the numbers from 0 to 9. The `for` loop iterates over the generator and prints each item.

This is a simple example of how generators can be used to create a sequence of values on demand. This can be useful when you need to generate a large number of values or when you need to avoid creating a list or array first.
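
Both generations above echo the full prompt before the answer, because the pipeline returns the prompt together with the completion. A minimal sketch for keeping only the answer is to split on the `Response:` marker used in the prompt; `extract_response` is a hypothetical helper and the sample string is made up for illustration.

```python
def extract_response(generated_text, marker="Response:\n"):
    """Return the text after the first marker, or the whole text if the marker is absent."""
    _, found, tail = generated_text.partition(marker)
    return tail.strip() if found else generated_text.strip()


# Illustrative string shaped like the pipeline output above.
sample = (
    "Instruction:\nWrite program to find factorial of number\n\n"
    "Response:\ndef factorial(n):\n    return 1 if n == 0 else n * factorial(n - 1)\n"
)
answer = extract_response(sample)
print(answer)
```

With the pipeline result this would be called as `extract_response(out[0]['generated_text'])`. The text-generation pipeline also accepts `return_full_text=False`, which should drop the prompt without any post-processing.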
