# Fine-Tune a Generative AI Model for Text-to-Code Generation

In this notebook, you will fine-tune an existing LLM from Hugging Face for code generation. We use the [FLAN-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5) model, a high-quality instruction-tuned model that can generate Python code out of the box. We will explore a parameter-efficient fine-tuning (PEFT) approach and evaluate the results with ROUGE metrics.

## Installing necessary packages

Now install the required packages for the LLM and datasets.

In [5]:
import torch

# Ensure the model uses GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


In [6]:
!pip install -q torch
!pip install -q torchdata

!pip install -q transformers
!pip install -q datasets
!pip install -q evaluate
!pip install -q rouge_score
!pip install -q loralib
!pip install -q peft
!pip install -q pandas pyarrow
!pip install -q pandas fastparquet
!pip install -q scikit-learn

In [7]:
from torch.utils.data import Dataset, DataLoader

In [8]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np



## Loading the base model - FlanT5-base

In [9]:
model_name='google/flan-t5-base'

original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

## Loading the training data

In [10]:
!git lfs install
!git clone https://huggingface.co/datasets/iamtarun/python_code_instructions_18k_alpaca

Git LFS initialized.
Cloning into 'python_code_instructions_18k_alpaca'...
remote: Enumerating objects: 30, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 30 (delta 0), reused 0 (delta 0), pack-reused 29
Unpacking objects: 100% (30/30), 4.41 KiB | 645.00 KiB/s, done.


In [11]:
parquet_file_path = '/kaggle/working/python_code_instructions_18k_alpaca/data/train-00000-of-00001-8b6e212f3e1ece96.parquet'

# Load the Parquet file as a Pandas DataFrame
df = pd.read_parquet(parquet_file_path)

# Now df is a Pandas DataFrame containing the data from the Parquet file


In [12]:
df

Unnamed: 0,instruction,input,output,prompt
0,Create a function to calculate the sum of a se...,"[1, 2, 3, 4, 5]",# Python code\ndef sum_sequence(sequence):\n ...,Below is an instruction that describes a task....
1,Generate a Python code for crawling a website ...,website: www.example.com \ndata to crawl: phon...,import requests\nimport re\n\ndef crawl_websit...,Below is an instruction that describes a task....
2,Create a Python list comprehension to get the ...,,"[x*x for x in [1, 2, 3, 5, 8, 13]]",Below is an instruction that describes a task....
3,Generate a python script to perform this action.,"Given a string, remove all the consecutive dup...",def remove_duplicates(string): \n result = ...,Below is an instruction that describes a task....
4,Write a python script to generates random numb...,,def generate_random_divisible_number():\n i...,Below is an instruction that describes a task....
...,...,...,...,...
18607,Generate a loop in Python which iterates throu...,,for value in list_of_values:\n print(value),Below is an instruction that describes a task....
18608,Can you design a function in Python to sort a ...,"tuples = [(""b"", 0), (""a"", 1), (""c"", -1)]",def sort_tuple(tuples):\n # sort by 2nd ite...,Below is an instruction that describes a task....
18609,Generate a Python function to search an array ...,"Array = [1, 5, 10, 15, 20], Target = 15","def search(array, target):\n # Iterate over ar...",Below is an instruction that describes a task....
18610,Develop an HTTP API in Python to retrieve the ...,,import requests\nfrom bs4 import BeautifulSoup...,Below is an instruction that describes a task....


In [169]:
print(df.iloc[0]['prompt'])

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Create a function to calculate the sum of a sequence of integers.

### Input:
[1, 2, 3, 4, 5]

### Output:
# Python code
def sum_sequence(sequence):
  sum = 0
  for num in sequence:
    sum += num
  return sum


In [172]:
original_model.to(device)

T5ForConditionalGeneration(
  (shared): Embedding(32128, 768)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
              (relative_attention_bias): Embedding(32, 12)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseGatedActDense(
              (wi_0): Linear(in_features=768, out_features=2048, bias=False)
              (wi_1): Linear(in_features=768, out_features=2048, bias=False)
              (wo):

In [173]:
prompt = "What is the color of sky?"
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.to(device)
# with torch.inference_mode():
output_ids = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
outputs = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(f"Prompt:\n{prompt}\n")
print(f"Generated instruction:\n{outputs}")

Prompt:
What is the color of sky?

Generated instruction:
blue


In [13]:
def postprocess(text):
    # Replace special tokens with actual escape characters
    text = text.replace('four_spaces>', '    ')
    text = text.replace('newline>', '\n')
    text = text.replace('tab>', '\t')
    return text

def preprocess(text):
    # Replace '\n' and '\t' with unique tokens
    text = text.replace('    ', '<four_spaces>')
    text = text.replace('\n', '<newline>')
    text = text.replace('\t', '<tab>')
    return text

# Apply preprocessing to your DataFrame
# df['processed_output'] = df['output'].apply(preprocess)


def tokenize_function(row,max_input_tokens = 256):
    
    # Preparing the instruction input
    row_input = f"""
    Below is an instruction that describes a task. Write a response that appropriately completes the request.
    
    ### Instruction: 
    {row['instruction']}
    
    ### Input:
    {row['input']}
    
    ### Response:
    """
    
    row_output = preprocess(row['output'])
    
    # Tokenize input without truncation to check the length
    inputs = tokenizer(row_input, return_tensors="pt")
    labels = tokenizer(row_output, return_tensors="pt")

    # Check if tokenized input exceeds max_input_tokens
    if len(inputs['input_ids'][0]) > max_input_tokens or len(labels['input_ids'][0]) > max_input_tokens:
        # Ignore this input by returning None
        return None
    
    # Tokenize input and labels
    inputs = tokenizer(row_input, padding="max_length", truncation=True, max_length=max_input_tokens, return_tensors="pt")
    labels = tokenizer(row_output, padding="max_length", truncation=True, max_length=max_input_tokens, return_tensors="pt")
    
    # Return tokenized inputs and labels
    return {
        'input_ids': inputs.input_ids, 
        'attention_mask': inputs.attention_mask, 
        'labels': labels.input_ids
    }

# df has the columns 'instruction', 'input' and 'output' used by tokenize_function
# Apply the tokenization function to each row; rows that are too long come back as None
tokenized_data = df.apply(lambda row: tokenize_function(row), axis=1)

tokenized_data = tokenized_data.dropna()

Token indices sequence length is longer than the specified maximum sequence length for this model (629 > 512). Running this sequence through the model will result in indexing errors
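In the cell above, `preprocess` encodes indentation and line breaks as markers such as `<newline>`, while `postprocess` looks for the same markers without their leading `<`. One possible reason (an assumption, not something the notebook states) is that the `<` character is lost when generated tokens are decoded. The sketch below uses a hypothetical `postprocess_tolerant` helper that accepts both spellings, and simulates the loss by deleting every `<` from the encoded text.

```python
def preprocess(text):
    # Same substitutions as the notebook's preprocess helper
    text = text.replace('    ', '<four_spaces>')
    text = text.replace('\n', '<newline>')
    text = text.replace('\t', '<tab>')
    return text


def postprocess_tolerant(text):
    # Hypothetical variant: accept each marker with or without its leading '<'
    markers = (('four_spaces>', '    '), ('newline>', '\n'), ('tab>', '\t'))
    for marker, char in markers:
        text = text.replace('<' + marker, char).replace(marker, char)
    return text


code = "def add(a, b):\n    return a + b\n"
encoded = preprocess(code)

# Simulated decoding loss (assumption): every '<' disappears from the text
lossy = encoded.replace('<', '')

print(encoded)
print(postprocess_tolerant(lossy) == code)
```

The round trip only works for code that does not itself contain `<` or the marker strings, which is a limitation of this marker scheme in general.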


### Zero shot inference

In [11]:
row = df.iloc[1]
tokenized_row = tokenize_function(row)

In [12]:
row_input = f"""
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: 
{row['instruction']}

### Input:
{row['input']}

### Response:
"""
input_ids = tokenizer(row_input, return_tensors="pt").input_ids.to(device)  # keep inputs on the same device as the model
# with torch.inference_mode():
output_ids = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
outputs = tokenizer.decode(output_ids[0], skip_special_tokens=True)
output_postproc = postprocess(outputs)

print('---------------------------------------')
print(row_input)
print('---------------------------------------')
print(f"Generated instruction:\n{output_postproc}")

---------------------------------------

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: 
Generate a Python code for crawling a website for a specific type of data.

### Input:
website: www.example.com 
data to crawl: phone numbers

### Response:

---------------------------------------
Generated instruction:
I've put that in the "phone numbers" section.


In [13]:
row = df.iloc[3]
tokenized_row = tokenize_function(row)

In [14]:
row_input = f"""
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: 
{row['instruction']}

### Input:
{row['input']}

### Response:
"""
input_ids = tokenizer(row_input, return_tensors="pt").input_ids.to(device)  # keep inputs on the same device as the model
# with torch.inference_mode():
output_ids = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
outputs = tokenizer.decode(output_ids[0], skip_special_tokens=True)
output_postproc = postprocess(outputs)

print('---------------------------------------')
print(row_input)
print('---------------------------------------')
print(f"Generated instruction:\n{output_postproc}")
print('---------------------------------------')
print(f"Baseline or expected output code:\n{row['output']}")

---------------------------------------

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: 
Generate a python script to perform this action.

### Input:
Given a string, remove all the consecutive duplicates from the string.

Input: "AAABBCCCD"

### Response:

---------------------------------------
Generated instruction:
aabccd='AABBCCCD'
---------------------------------------
Baseline or expected output code:
def remove_duplicates(string): 
    result = "" 
    prev = '' 

    for char in string:
        if char != prev: 
            result += char
            prev = char
    return result

result = remove_duplicates("AAABBCCCD")
print(result)


### Splitting the data for training

In [14]:
from sklearn.model_selection import train_test_split

# tokenized_data holds the tokenized rows produced above

# Splitting the dataset into training and a combined validation & test set
train_df, val_test_df = train_test_split(tokenized_data, test_size=0.3, random_state=42)

# Splitting the combined validation & test set into separate validation and test sets
val_df, test_df = train_test_split(val_test_df, test_size=0.5, random_state=42)

# Now, train_df, val_df, and test_df are your training, validation, and test sets respectively
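The two calls above produce a 70/15/15 split: 30% is held out first, and that part is then halved. A minimal sketch of the resulting sizes follows; it assumes the held-out part is rounded up and the remainder goes to the other side, and `n_rows` is a hypothetical row count rather than the real size of `tokenized_data`.

```python
import math


def split_sizes(n_rows, test_size):
    # Assumption: the held-out part is rounded up, the remainder is kept
    n_held_out = math.ceil(n_rows * test_size)
    return n_rows - n_held_out, n_held_out


n_rows = 10_000  # hypothetical number of tokenized rows
n_train, n_rest = split_sizes(n_rows, 0.3)   # first call: 70% train, 30% held out
n_val, n_test = split_sizes(n_rest, 0.5)     # second call: halve the held-out part

print(n_train, n_val, n_test)
```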

In [50]:
# Splitting the raw dataset into training and a combined validation & test set
train_df_raw, val_test_df_raw = train_test_split(df, test_size=0.3, random_state=42)

# Splitting the combined raw validation & test set into separate validation and test sets
val_df_raw, test_df_raw = train_test_split(val_test_df_raw, test_size=0.5, random_state=42)

In [15]:
class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, dataframe):
        self.dataframe = dataframe

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, idx):
        row = self.dataframe.iloc[idx]
        # Ensure each item is a 1D tensor
        input_ids = row['input_ids'].squeeze()
        attention_mask = row['attention_mask'].squeeze()
        labels = row['labels'].squeeze()

        # Debug: Print shapes
#         print(f"Input IDs shape: {input_ids.shape}")
#         print(f"Attention Mask shape: {attention_mask.shape}")
#         print(f"Labels shape: {labels.shape}")

        return {
            'input_ids': input_ids,
            'attention_mask': attention_mask,
            'labels': labels
        }


In [16]:
train_dataset = CustomDataset(train_df)
val_dataset = CustomDataset(val_df)
test_dataset = CustomDataset(test_df)

In [128]:
test_dataset.dataframe[0:10]

11487    {'input_ids': [[tensor(7255), tensor(19), tens...
5598     {'input_ids': [[tensor(7255), tensor(19), tens...
4186     {'input_ids': [[tensor(7255), tensor(19), tens...
6746     {'input_ids': [[tensor(7255), tensor(19), tens...
8581     {'input_ids': [[tensor(7255), tensor(19), tens...
7284     {'input_ids': [[tensor(7255), tensor(19), tens...
15641    {'input_ids': [[tensor(7255), tensor(19), tens...
16597    {'input_ids': [[tensor(7255), tensor(19), tens...
2493     {'input_ids': [[tensor(7255), tensor(19), tens...
12696    {'input_ids': [[tensor(7255), tensor(19), tens...
dtype: object

## Perform Parameter Efficient Fine-Tuning (PEFT)

This section uses **Parameter Efficient Fine-Tuning (PEFT)** instead of full fine-tuning, which would update every weight in the model. PEFT trains only a small set of parameters, which makes it much cheaper in compute and memory than full fine-tuning.

PEFT is a generic term that includes **Low-Rank Adaptation (LoRA)** and prompt tuning (which is NOT THE SAME as prompt engineering!). In most cases, when someone says PEFT, they mean LoRA. At a very high level, LoRA lets you fine-tune a model using fewer compute resources (in some cases, a single GPU). After fine-tuning for a specific task, use case, or tenant with LoRA, the original LLM remains unchanged and a newly trained “LoRA adapter” emerges. This adapter is much smaller than the original LLM: on the order of a single-digit percentage of the original LLM size (MBs vs GBs).

That said, at inference time, the LoRA adapter needs to be reunited and combined with its original LLM to serve the inference request.  The benefit, however, is that many LoRA adapters can re-use the original LLM which reduces overall memory requirements when serving multiple tasks and use cases.
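To make the relationship between the frozen weights and the adapter concrete, here is a minimal sketch of the LoRA update on toy matrices, using plain Python lists. The sizes and values are made up for illustration, and the `matmul`, `add` and `scale` helpers are hypothetical; only the formula, output = W·x + (alpha / r)·B·A·x, is the standard LoRA formulation.

```python
def matmul(a, b):
    # Plain-Python matrix product for small illustrative matrices
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    # Element-wise sum of two matrices of the same shape
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]


def scale(a, s):
    # Multiply every entry of a matrix by a scalar
    return [[x * s for x in row] for row in a]


d, r, alpha = 4, 2, 2  # toy sizes; the notebook uses d=768, r=32, lora_alpha=32

W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base weight, d x d
A = [[0.5, -1.0, 0.0, 2.0], [1.0, 0.0, 0.25, -0.5]]                 # trainable, r x d
B = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 2.0]]               # trainable, d x r
x = [[1.0], [2.0], [3.0], [4.0]]                                    # input column vector

delta = scale(matmul(B, A), alpha / r)            # low-rank update, d x d
y_adapter = add(matmul(W, x), matmul(delta, x))   # base output plus adapter output
y_merged = matmul(add(W, delta), x)               # same result after merging into W

adapter_params = 2 * 768 * 32   # A and B for one 768 x 768 layer at rank 32
full_params = 768 * 768         # the frozen layer itself
print(y_adapter, y_merged, adapter_params, full_params)
```

Because `W` is never modified, several adapters (different `A` and `B` pairs) can share one copy of the base weights, which is the memory benefit described above.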

In [21]:
def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"

### Set Up the PEFT/LoRA Model for Fine-Tuning

We need to set up the PEFT/LoRA model for fine-tuning with a new layer/parameter adapter. Using PEFT/LoRA, you are freezing the underlying LLM and only training the adapter. Have a look at the LoRA configuration below. Note the rank (`r`) hyper-parameter, which defines the rank/dimension of the adapter to be trained.

In [22]:
from peft import LoraConfig, get_peft_model, TaskType

model_name='google/flan-t5-base'

original_model_instance = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM
)

peft_model = get_peft_model(original_model_instance, lora_config)

In [23]:
print(print_number_of_trainable_model_parameters(peft_model))

trainable model parameters: 3538944
all model parameters: 251116800
percentage of trainable model parameters: 1.41%
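The trainable-parameter count reported above can be reproduced by hand from the LoRA configuration. The sketch below assumes that FLAN-T5 base has 12 encoder and 12 decoder blocks, and that each decoder block contains both a self-attention and a cross-attention module; the hidden size of 768 is taken from the printed layer shapes.

```python
d_model = 768          # hidden size, visible in the printed layer shapes
r = 32                 # LoRA rank from lora_config
n_encoder_blocks = 12  # assumption: FLAN-T5 base has 12 encoder blocks
n_decoder_blocks = 12  # assumption: and 12 decoder blocks

# Each encoder block has one attention module (self-attention); each decoder
# block has two (self-attention and cross-attention).
attention_modules = n_encoder_blocks * 1 + n_decoder_blocks * 2
adapted_layers = attention_modules * 2   # the "q" and "v" projections in each module

# lora_A is r x d_model and lora_B is d_model x r
params_per_layer = r * d_model + d_model * r
trainable = adapted_layers * params_per_layer

base_params = 247_577_856   # parameter count reported for the original model
share = 100 * trainable / (base_params + trainable)

print(trainable)
print(f"{share:.2f}%")
```

Under these assumptions, the result agrees with the figures printed by `print_number_of_trainable_model_parameters`.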


In [24]:
print(print_number_of_trainable_model_parameters(original_model))

trainable model parameters: 247577856
all model parameters: 247577856
percentage of trainable model parameters: 100.00%


### Training PEFT adapter

In [25]:
# from transformers import TrainingArguments, Trainer
import time

output_dir = f'./text-to-code-training-{str(int(time.time()))}'

training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=16,
    learning_rate=5e-4,
    per_device_eval_batch_size=16,
    num_train_epochs=2,
    logging_steps=10,
    evaluation_strategy="epoch",
    save_strategy="epoch",
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset
)

trainer.train()


[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

  ········································


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Epoch,Training Loss,Validation Loss
1,0.773,0.677718
2,0.8738,0.659362


TrainOutput(global_step=1630, training_loss=1.1585278470092024, metrics={'train_runtime': 2007.1219, 'train_samples_per_second': 12.982, 'train_steps_per_second': 0.812, 'total_flos': 9062654461083648.0, 'train_loss': 1.1585278470092024, 'epoch': 2.0})
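As a sanity check, the `global_step` value can be related to the batch size and the number of epochs. The sketch below estimates the training-set size from the reported throughput and assumes a single device with no gradient accumulation; both are assumptions, since the notebook does not print the size of `train_df`.

```python
import math

runtime_s = 2007.1219        # train_runtime from TrainOutput
samples_per_s = 12.982       # train_samples_per_second from TrainOutput
epochs, batch_size = 2, 16   # from TrainingArguments

# Estimated training-set size implied by the reported throughput
n_train_est = round(runtime_s * samples_per_s / epochs)

# Assumption: one device and no gradient accumulation, so each optimizer
# step consumes one batch and the last batch of an epoch may be partial.
steps_per_epoch = math.ceil(n_train_est / batch_size)
total_steps = steps_per_epoch * epochs

print(n_train_est, steps_per_epoch, total_steps)
```

Under these assumptions, the computed total agrees with the `global_step` in the `TrainOutput`.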

## Checkpoint - Saving the finetuned LLM

Save the model to the desired directory in the Kaggle output.

In [73]:
# Save model and tokenizer
model_save_path = "/kaggle/working/Flan-T5-finetuned-sbanda/"
tokenizer.save_pretrained(model_save_path)
peft_model.save_pretrained(model_save_path)


Zip the directory so that all files can be downloaded at once.

In [74]:
!zip -r file.zip /kaggle/working/Flan-T5-finetuned-sbanda

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


  adding: kaggle/working/Flan-T5-finetuned-sbanda/ (stored 0%)
  adding: kaggle/working/Flan-T5-finetuned-sbanda/README.md (deflated 66%)
  adding: kaggle/working/Flan-T5-finetuned-sbanda/spiece.model (deflated 48%)
  adding: kaggle/working/Flan-T5-finetuned-sbanda/tokenizer_config.json (deflated 95%)
  adding: kaggle/working/Flan-T5-finetuned-sbanda/special_tokens_map.json (deflated 86%)
  adding: kaggle/working/Flan-T5-finetuned-sbanda/adapter_config.json (deflated 50%)
  adding: kaggle/working/Flan-T5-finetuned-sbanda/adapter_model.safetensors (deflated 22%)
  adding: kaggle/working/Flan-T5-finetuned-sbanda/tokenizer.json (deflated 74%)


### Load the Fine-Tuned Model from the Saved Checkpoint

In [20]:
finetuned_model_path='/kaggle/input/my-model-files'

peft_model = AutoModelForSeq2SeqLM.from_pretrained(finetuned_model_path, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(finetuned_model_path)

In [25]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
peft_model.to(device)  # Ensure the model is on the correct device

T5ForConditionalGeneration(
  (shared): Embedding(32128, 768)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): lora.Linear(
                (base_layer): Linear(in_features=768, out_features=768, bias=False)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=768, out_features=32, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=32, out_features=768, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (k): Linear(in_features=768, out_features=768, bias=False)
              

In [31]:
original_model.to(device)

### Fine-tuned output method
Define a function that generates output for the given prompt using the fine-tuned model.

In [32]:
def peft_model_generate_code(input_ids):
    
    peft_model_outputs = peft_model.generate(input_ids=input_ids, max_new_tokens=200, num_beams=1)  # generation parameters are passed directly rather than through a GenerationConfig
    peft_model_text_output = postprocess(tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True))
    
    return peft_model_text_output

In [33]:
def base_model_generate_code(input_ids):
    
    original_output_ids = original_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
    original_model_text_output = postprocess(tokenizer.decode(original_output_ids[0], skip_special_tokens=True))
    
    return original_model_text_output

In [34]:
def base_vs_peft(index):
    
    row = df.iloc[index]
    
    prompt = f"""
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: 
{row['instruction']}

### Input:
{row['input']}

### Response:
"""
    
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    peft_output = peft_model_generate_code(input_ids)
    base_output = base_model_generate_code(input_ids)
    label = row['output']
    
    dash_line = '-' * 99  # separator line of 99 dashes
    print(dash_line)
    print(f'INPUT PROMPT:\n{prompt}')
    print(dash_line)
    print(f'BASEMODEL GENERATION:\n{base_output}\n')
    print(dash_line)
    print(f'PEFT MODEL GENERATION:\n{peft_output}')
    print(dash_line)
    print(f'LABEL:\n{label}')

In [29]:
# `row` must be defined before the template is formatted; row 4000 is the example compared below
row = df.iloc[4000]
prompt = f"""
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: 
{row['instruction']}

### Input:
{row['input']}

### Response:
"""

print(prompt)

### Human evaluation of the model

In [35]:
base_vs_peft(4000)

---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: 
Develop a python program to print the character with the maximum frequency in a given string.

### Input:
sample_str = 'football'

### Response:

---------------------------------------------------------------------------------------------------
BASEMODEL GENERATION:
I'll try to find the 'football' in the str.

---------------------------------------------------------------------------------------------------
PEFT MODEL GENERATION:
def max_frequency(str):
    max_frequency = 0
    for i in range(len(str)):
            if str[i] == 'football':
            max_frequency += 1
        return max_frequency

print(max_frequency)
---------------------------------------------------------------------------------------------------
LABEL:
def max_frequency_char

##### Sum of two integers

In [39]:
row_input = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: 
Generate a python script to perform this action.

### Input:
Given two integers, return the addition of the given integers.

Input: 

### Response:"""

In [40]:
input_ids = tokenizer(row_input, return_tensors="pt").input_ids.to(device)  # Move input tensors to the same device as the model
peft_model_outputs = peft_model.generate(input_ids=input_ids, max_new_tokens=200, num_beams=1)  # generation parameters are passed directly rather than through a GenerationConfig
peft_model_text_output = postprocess(tokenizer.decode(peft_model_outputs[0], skip_special_tokens=True))

In [41]:
print(peft_model_text_output)

def add_ints(a, b):
    return a + b

print(add_ints(2, a))


In [42]:
row_input = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: 
Generate a python script to perform this action.

### Input:
The Fibonacci numbers, commonly denoted F(n) form a sequence, called the Fibonacci sequence, such that each number is the sum of the two preceding ones, starting from 0 and 1. That is,

F(0) = 0, F(1) = 1
F(n) = F(n - 1) + F(n - 2), for n > 1.
Given n, calculate F(n).

 

Example 1:

Input: n = 2
Output: 1
Explanation: F(2) = F(1) + F(0) = 1 + 0 = 1.
Example 2:

Input: n = 3
Output: 2
Explanation: F(3) = F(2) + F(1) = 1 + 1 = 2.
Example 3:

Input: n = 4
Output: 3
Explanation: F(4) = F(3) + F(2) = 2 + 1 = 3.
 

Constraints:

0 <= n <= 30

Input: F(0) = a, F(1) = b

### Response:"""

In [43]:
input_ids = tokenizer(row_input, return_tensors="pt").input_ids.to(device)  # Move input tensors to the same device as the model


In [44]:
print(peft_model_generate_code(input_ids))

def fibonacci(n):
    f = 0
    f = 1
    for i in range(n):
        if f % i == 0:
            f += 1
        f += 1
        return f

print(f(n))


In [119]:
print(base_model_generate_code(input_ids))

a, b = b, F(0) = a + b + 1


### Evaluate the Model Quantitatively (with ROUGE Metric)

The [ROUGE metric](https://en.wikipedia.org/wiki/ROUGE_(metric)) was designed to score summaries by measuring n-gram overlap between a generated text and a reference text. Here it compares the generated code with the reference solution from the dataset. While not perfect (it does not check whether the code runs or is correct), it does indicate how much closer the outputs are to the references after fine-tuning.
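To show what the overlap measurement looks like, here is a simplified ROUGE-1 F1 computation. It is only a sketch: the hypothetical `rouge1_f1` helper lowercases the text, splits it into alphanumeric runs and skips stemming, so its numbers will not necessarily match those from the `rouge` metric loaded with `evaluate`.

```python
import re
from collections import Counter


def rouge1_f1(prediction, reference):
    # Simplified tokenization (assumption): lowercase alphanumeric runs, no stemming
    pred = Counter(re.findall(r"[a-z0-9]+", prediction.lower()))
    ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))

    # Clipped unigram matches: each token counts at most as often as it
    # appears in both texts
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "def add(a, b): return a + b"
code_like = rouge1_f1("return a + b", reference)
chatty = rouge1_f1("I'll try that.", reference)
print(code_like, chatty)
```

A reply that shares no tokens with the reference scores zero, while a partial but code-like answer earns credit for every shared token, whether or not the code would actually run.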

In [45]:
rouge = evaluate.load('rouge')

Downloading builder script:   0%|          | 0.00/6.27k [00:00<?, ?B/s]

Generate outputs for a sample of the test dataset with both models (only 10 examples, to save time), and collect the results alongside the reference code.

In [113]:
tokenized_test_data = test_df[0:10]
# The index of the tokenized Series holds the original row labels in df
test_ids = list(tokenized_test_data.index)

original_model_codes = []
peft_model_codes = []
label_codes = []

for index in test_ids:
    
    row = df.loc[index]
    prompt = f"""
    Below is an instruction that describes a task. Write a response that appropriately completes the request.
    
    ### Instruction: 
    {row['instruction']}
    
    ### Input:
    {row['input']}
    
    ### Response:
    """
    
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

    original_model_text_output = base_model_generate_code(input_ids)
    original_model_codes.append(original_model_text_output)

    
    peft_model_text_output = peft_model_generate_code(input_ids)
    peft_model_codes.append(peft_model_text_output)
    
    label_codes.append(row['output'])
    
zipped_codes = list(zip(label_codes, original_model_codes, peft_model_codes))
 
metrics_df = pd.DataFrame(zipped_codes, columns = ['label_codes', 'original_model_codes', 'peft_model_codes'])
metrics_df

Unnamed: 0,label_codes,original_model_codes,peft_model_codes
0,class Animal:\n def __init__(self):\n ...,I'll try to find a class that can do this.,def make_sound(animal):\n return ''.join(ma...
1,"for i in range(2, 11, 2):\n print(i)",I'll try that.,def print_even(numbers):\n for i in range(l...
2,import random\nimport string\n\ndef random_str...,I'll try to get it to work.,import random\n\n# Create a random alphanumeri...
3,"def getMaxProfit(maxPrice, minPrice): \n # ...",I'll try to find the maximum price of the stock.,def max_profit(max_price):\n max_price = 12...
4,"for i in range(10): \n print(""Perfect squar...",I'll try to get it to work.,def print_perfect_squares(n):\n for i in ra...
5,seen = set()\nduplicates = []\nfor item in my_...,I'll try to find the correct list.,def duplicate_values(my_list):\n return [[1...
6,"def is_rotation(str1, str2):\n return len(s...",if 'hello' in 'lohel': print('NO') else: print...,"def rotation(str, s):\n if s[0] == s[1]:\n ..."
7,from collections import Counter \n\ndef most_f...,I'll try to find it.,def most_frequent_used_words(txt):\n for wo...
8,def sortDescending(numbers):\n for i in ran...,"[2, 3, 4, 5]","def sort_list(list):\n list = [6, 2, 12, 5]..."
9,largestNum = lambda a: max(a),"I'm sorry, I can't find the number of numbers ...",def largest_number(numbers):\n largest_numb...


Evaluate both models by computing ROUGE metrics against the reference code. Notice the improvement in the results!

In [114]:
original_model_results = rouge.compute(
    predictions=original_model_codes,
    references=label_codes,
    use_aggregator=True,
    use_stemmer=True,
)

peft_model_results = rouge.compute(
    predictions=peft_model_codes,
    references=label_codes,
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('PEFT MODEL:')
print(peft_model_results)

ORIGINAL MODEL:
{'rouge1': 0.09665653328153329, 'rouge2': 0.013333333333333332, 'rougeL': 0.07789601139601139, 'rougeLsum': 0.08996542346542347}
PEFT MODEL:
{'rouge1': 0.24173414899932089, 'rouge2': 0.08397068135836251, 'rougeL': 0.2040780441334244, 'rougeLsum': 0.2271730398437482}


Ten examples are too few to draw conclusions from, so let's evaluate on the full test set. First build the prompts for every test row, then generate outputs in batches for each of the models:

In [None]:
prompts = [
    f"""
    Below is an instruction that describes a task. Write a response that appropriately completes the request.
    
    ### Instruction: 
    {row['instruction']}
    
    ### Input:
    {row['input']}
    
    ### Response:
    """
    for _, row in df.iloc[test_df.index].iterrows()
]

In [127]:
def generate_model_outputs_in_batches(model, prompts, BATCH_SIZE):
    """Generate decoded outputs for `prompts`, processing BATCH_SIZE prompts at a time."""
    model.to(device)
    outputs = []
    for i in range(0, len(prompts), BATCH_SIZE):
        batch_prompts = prompts[i:i + BATCH_SIZE]
        # Pad to the longest prompt in the batch and truncate prompts to 512 tokens
        input_ids = tokenizer(batch_prompts, padding=True, truncation=True, max_length=512, return_tensors="pt").input_ids.to(device)
        with torch.no_grad():
            # Greedy decoding (num_beams=1), at most 200 new tokens per prompt
            batch_output = model.generate(input_ids=input_ids, max_new_tokens=200, num_beams=1)
            batch_output = [tokenizer.decode(ids, skip_special_tokens=True) for ids in batch_output]
        outputs.extend(batch_output)
    return outputs

BATCH_SIZE = 20

# Ensure that the prompts list has the correct number of elements
assert len(prompts) == len(test_df), "Number of prompts does not match the number of rows in test_df"

# Generate outputs for both models
original_model_codes = generate_model_outputs_in_batches(original_model, prompts, BATCH_SIZE)
peft_model_codes = generate_model_outputs_in_batches(peft_model, prompts, BATCH_SIZE)

# Validate the length of the output lists
assert len(original_model_codes) == len(test_df), "Number of outputs from original_model does not match the number of rows in test_df"
assert len(peft_model_codes) == len(test_df), "Number of outputs from peft_model does not match the number of rows in test_df"

process_output_code = lambda text : text.replace('four_spaces>', '    ').replace('newline>', '\n').replace('tab>', '\t')
peft_model_codes = list(map(process_output_code,peft_model_codes))

label_codes = [row['output'] for _, row in df.iloc[test_df.index].iterrows()]
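The `process_output_code` step above maps placeholder strings such as `newline>` back to real whitespace. The placeholders appear without a leading `<`, presumably because that character is dropped when decoding with `skip_special_tokens=True`. The sketch below illustrates one way such a round trip could work. The names `PLACEHOLDERS`, `encode_whitespace` and `decode_whitespace` are hypothetical, and the training-time encoding shown here is an assumption, not necessarily what was used earlier in this notebook.

```python
# Hypothetical placeholder scheme: the exact training-time encoding is an assumption.
PLACEHOLDERS = {"    ": "<four_spaces>", "\n": "<newline>", "\t": "<tab>"}


def encode_whitespace(code):
    """Replace whitespace the tokenizer would lose with visible placeholder strings."""
    for raw, token in PLACEHOLDERS.items():
        code = code.replace(raw, token)
    return code


def decode_whitespace(text):
    """Restore whitespace, accepting placeholders with or without the leading '<'."""
    for raw, token in PLACEHOLDERS.items():
        text = text.replace(token, raw).replace(token[1:], raw)
    return text


snippet = "def f(x):\n    return x\n"
encoded = encode_whitespace(snippet)
print(encoded)
print(decode_whitespace(encoded) == snippet)
```

Accepting both forms of each placeholder in `decode_whitespace` keeps the post-processing robust whether or not the leading `<` survives decoding.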


In [159]:
zipped_codes = list(zip(label_codes, original_model_codes, peft_model_codes))
 
full_metrics_df = pd.DataFrame(zipped_codes, columns = ['label_codes', 'original_model_codes', 'peft_model_codes'])

In [160]:
full_metrics_df

Unnamed: 0,label_codes,original_model_codes,peft_model_codes
0,class Animal:\n def __init__(self):\n ...,I'll try to find a class that can do this.,def make_sound(animal):\n return ''.join(ma...
1,"for i in range(2, 11, 2):\n print(i)",I'll try that.,def print_even(numbers):\n for i in range(l...
2,import random\nimport string\n\ndef random_str...,I'll try to get it to work.,def random_alphanumeric_string(string):\n r...
3,"def getMaxProfit(maxPrice, minPrice): \n # ...",I'll try to find the maximum price of the stock.,def max_profit(max_price):\n max_price = 12...
4,"for i in range(10): \n print(""Perfect squar...",I'll try to get it to print out the first 10 p...,def print_perfect_squares(n):\n for i in ra...
...,...,...,...
1930,def count_dups(arr):\n dt = {} \n count = 0 ...,I'm sorry to hear that.,def count_dups(arr): \n for x in arr: \n ...
1931,"users = {} \n\ndef addUser(name, details): \n ...",I'll try to find a solution.,def store_data(data):\n return data\n\ndef ...
1932,class StringFormatter():\n def __init__(sel...,I'll try to find a class that can do this.,def format_string(string):\n return string....
1933,"class NeuralNetwork:\n def __init__(self, i...",I'll try to get the neural network class to work.,"def neural_network(input_size, number of outpu..."


In [166]:
print(full_metrics_df.iloc[300])

label_codes             input_str = "madamabcdcba"\n\ndef find_palindr...
original_model_codes    if'madamabcdcba' in input_str: print('abcdcba'...
peft_model_codes        def find_palindromes(input_str):\n    for i in...
Name: 300, dtype: object


In [161]:
print(len(original_model_codes),len(peft_model_codes),len(label_codes))

1935 1935 1935


In [162]:
original_model_results = rouge.compute(
    predictions=original_model_codes,
    references=label_codes,
    use_aggregator=True,
    use_stemmer=True,
)

peft_model_results = rouge.compute(
    predictions=peft_model_codes,
    references=label_codes,
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('PEFT MODEL:')
print(peft_model_results)

ORIGINAL MODEL:
{'rouge1': 0.10800783354076518, 'rouge2': 0.042290874296053924, 'rougeL': 0.10011801779543442, 'rougeLsum': 0.10543333385880252}
PEFT MODEL:
{'rouge1': 0.3073409456381371, 'rouge2': 0.11732385399253284, 'rougeL': 0.27339907243217143, 'rougeLsum': 0.3014207526081562}


The results show substantial improvement in all ROUGE metrics:

In [164]:
print("Absolute improvement of PEFT MODEL over ORIGINAL MODEL (percentage points)")

improvement = (np.array(list(peft_model_results.values())) - np.array(list(original_model_results.values())))
for key, value in zip(peft_model_results.keys(), improvement):
    print(f'{key}: {value*100:.2f}%')

Absolute improvement of PEFT MODEL over ORIGINAL MODEL (percentage points)
rouge1: 19.93%
rouge2: 7.50%
rougeL: 17.33%
rougeLsum: 19.60%
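The figures above are absolute differences in ROUGE score, expressed in percentage points. Because the baseline scores are low, the relative gain is considerably larger. The sketch below recomputes both views from the scores printed earlier; the names `baseline_scores`, `peft_scores` and `improvements` are hypothetical helpers introduced for illustration.

```python
# ROUGE scores copied from the full-test-set output above
baseline_scores = {
    'rouge1': 0.10800783354076518,
    'rouge2': 0.042290874296053924,
    'rougeL': 0.10011801779543442,
    'rougeLsum': 0.10543333385880252,
}
peft_scores = {
    'rouge1': 0.3073409456381371,
    'rouge2': 0.11732385399253284,
    'rougeL': 0.27339907243217143,
    'rougeLsum': 0.3014207526081562,
}


def improvements(baseline, candidate):
    """Return {metric: (absolute gain in points, relative gain in percent)}."""
    report = {}
    for key, base in baseline.items():
        gain = candidate[key] - base
        # Relative gain assumes a non-zero baseline score
        report[key] = (gain * 100, gain / base * 100)
    return report


for key, (absolute, relative) in improvements(baseline_scores, peft_scores).items():
    print(f'{key}: +{absolute:.2f} points, +{relative:.1f}% relative')
```

Reporting both numbers avoids ambiguity: a gain of about 20 points on ROUGE-1 and a near tripling of the baseline score describe the same result.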


## Challenges faced

1. Assessing the feasibility of loading and training a model with the available compute resources.
    - <u>Resolution</u>: Loaded the model in BFLOAT16, which uses 2 bytes per parameter instead of 4 and therefore halves the memory needed for the weights. For a model with roughly 0.25B parameters like FLAN-T5 base, we need **0.5GB to load** the model and **10GB memory to train**.
        - <u>Available compute resources:</u>
            - Session: 12 hours
            - Disk: 73.1GB
            - CPU memory (RAM): 29GB
            - GPU Memory: 15.9GB
2. Data processing for training:
    - Writing the `tokenize_function` to craft the prompts in the desired way.
    - Post-processing the generated outputs to restore escape characters (newlines, tabs, indentation), since special tokens are skipped when decoding.
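The load-time memory estimate in challenge 1 follows directly from the parameter count and the bytes used per parameter. A minimal sketch of that arithmetic, assuming the approximate figure of 0.25B parameters quoted above (the helper name `weights_memory_gb` is hypothetical):

```python
def weights_memory_gb(num_params, bytes_per_param):
    """Memory needed to hold the model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 0.25e9  # approximate parameter count quoted above (assumption)

print(f'bfloat16: {weights_memory_gb(NUM_PARAMS, 2):.2f} GB')
print(f'float32:  {weights_memory_gb(NUM_PARAMS, 4):.2f} GB')
```

This covers the weights alone. Training also needs memory for gradients, optimizer states and activations, which is why the training figure is so much larger than the load figure.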
   