<a href="https://colab.research.google.com/github/anujdutt9/Talks_and_Presentations/blob/main/Decoding_the_Giants/Demo_2_LLM_Fine_tuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Step 0: Install Dependencies

In [1]:
!pip3 install -U transformers bitsandbytes peft trl accelerate datasets -q

#### Import libraries.

In [2]:
import os
import pandas as pd
import transformers
import torch
from datasets import load_dataset, Dataset, DatasetDict
from trl import SFTTrainer
from peft import LoraConfig, PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import BitsAndBytesConfig, GemmaTokenizer
from IPython.display import Markdown, display

# Step 1: Download Gemma-2b base model

In [3]:
model_id = "google/gemma-2b"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type = "nf4",
    bnb_4bit_compute_dtype = torch.bfloat16
)
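To give a feel for why the model is loaded in 4-bit, here is a rough sketch of the memory needed for the weights alone at different precisions. The parameter count is an assumption (about 2.5 billion for Gemma-2b; check the model card for the exact figure), and the helper name `weight_memory_gib` is hypothetical. The estimate ignores quantization constants, activations, optimizer state, and any layers that stay unquantized.

```python
# Rough weight-memory estimate at different precisions.
# ASSUMPTION: ~2.5e9 parameters for Gemma-2b; see the model card for the exact count.
ASSUMED_PARAM_COUNT = 2_500_000_000


def weight_memory_gib(param_count: int, bits_per_param: float) -> float:
    """Return the memory needed to store the weights alone, in GiB."""
    return param_count * bits_per_param / 8 / 1024**3


for label, bits in [("float32", 32), ("bfloat16", 16), ("nf4 (4-bit)", 4)]:
    print(f"{label:>12}: {weight_memory_gib(ASSUMED_PARAM_COUNT, bits):.2f} GiB")
```

The real footprint on the GPU will be higher than the 4-bit figure printed here, so treat it as a lower bound.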

In [4]:
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             quantization_config = bnb_config,
                                             device_map={"":0})

`config.hidden_act` is ignored, you should use `config.hidden_activation` instead.
Gemma's activation function will be set to `gelu_pytorch_tanh`. Please, use
`config.hidden_activation` if you want to override this behaviour.
See https://github.com/huggingface/transformers/pull/29402 for more details.



In [5]:
# Output from the model BEFORE Fine-tuning
text = "How do you merge two dictionaries in Python"
device = "cuda:0"
inputs = tokenizer(text, return_tensors = "pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=250, eos_token_id=tokenizer.eos_token_id)
model_response = tokenizer.decode(outputs[0], skip_special_tokens=False)
display(Markdown(f"**Model Response:**\n\n{model_response}"))

Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)


**Model Response:**

<bos>How do you merge two dictionaries in Python?

Step 1
1 of 2

In Python, dictionaries are unordered and unindexed data structures.

Dictionaries are used to store key-value pairs.

The keys are unique and the values are unique.

The keys are case-sensitive.

The keys are immutable.

The keys are not stored in the order in which they are inserted.

The keys are stored in the order in which they are encountered.

The keys are stored in the order in which they are encountered.

The keys are stored in the order in which they are encountered.

The keys are stored in the order in which they are encountered.

The keys are stored in the order in which they are encountered.

The keys are stored in the order in which they are encountered.

The keys are stored in the order in which they are encountered.

The keys are stored in the order in which they are encountered.

The keys are stored in the order in which they are encountered.

The keys are stored in the order in which they are encountered.

The keys are stored in the order in which they are encountered.

The keys are stored in the order in which they are encountered.

The keys are stored in

In [6]:
# Output from the model BEFORE Fine-tuning
text = "How do you check if a string is a palindrome in Python"
device = "cuda:0"
inputs = tokenizer(text, return_tensors = "pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=250, eos_token_id=tokenizer.eos_token_id)
model_response = tokenizer.decode(outputs[0], skip_special_tokens=False)
display(Markdown(f"**Model Response:**\n\n{model_response}"))

**Model Response:**

<bos>How do you check if a string is a palindrome in Python?

Step 1
1 of 2

To check if a string is a palindrome, we can use the $\texttt{is\_palindrome()}$ method from the $\texttt{re.match()}$ module.

Result
2 of 2

To check if a string is a palindrome, we can use the $\texttt{is\_palindrome()}$ method from the $\texttt{re.match()}$ module.<eos>

# Step 2: Configure LoRA settings for modules to be trained

In [7]:
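# The variable that transformers checks is WANDB_DISABLED; set it to "true" to turn off Weights & Biases logging.
os.environ["WANDB_DISABLED"] = "false"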

In [8]:
lora_config = LoraConfig(
    r = 8,
    target_modules = ["q_proj", "o_proj", "k_proj", "v_proj",
                      "gate_proj", "up_proj", "down_proj"],
    task_type = "CAUSAL_LM"
)
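The rank `r = 8` controls how many extra parameters each adapted layer receives. As a rough sketch: LoRA adds two small matrices per target weight, one of shape `r x d_in` and one of shape `d_out x r`, so the added parameter count is `r * (d_in + d_out)`. The layer shape below is hypothetical and chosen only for illustration; it is not read from the Gemma configuration.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# HYPOTHETICAL layer shape, for illustration only (not taken from the Gemma config).
d_in, d_out, r = 2048, 2048, 8

full = d_in * d_out                # parameters in the frozen weight matrix
added = lora_params(d_in, d_out, r)  # trainable parameters added by LoRA
print(f"full matrix: {full:,} params")
print(f"LoRA adapter: {added:,} params ({added / full:.2%} of the full matrix)")
```

Doubling `r` doubles the adapter size, which is the main trade-off between adapter capacity and memory.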

# Step 3: Load the dataset

**Dataset Source:** https://www.kaggle.com/datasets/chinmayadatt/dataset-python-question-answer?resource=download

In [9]:
# Read Dataset CSV
df = pd.read_csv("Dataset_Python_Question_Answer.csv")

# Compute the combined character length of each 'Question' and 'Answer' and store it in a new column
df['text_length'] = df["Question"].str.len() + df["Answer"].str.len()

# Calculate the average combined length across the dataset
average_length = int(df['text_length'].mean())

# Find the shortest and longest combined lengths
shortest_length = int(df['text_length'].min())
longest_length = int(df['text_length'].max())

# Print the statistics
print("Average length of 'Question and Answer' in original dataset:", average_length)
print("Shortest length of 'Question and Answer' in original dataset:", shortest_length)
print("Longest length of 'Question and Answer' in original dataset:", longest_length)

Average length of 'Question and Answer' in original dataset: 1708
Shortest length of 'Question and Answer' in original dataset: 139
Longest length of 'Question and Answer' in original dataset: 3511


In [10]:
import random

# Convert dataset to Dataset object
data = Dataset.from_pandas(df)

# Print the dataset structure (column names and row count)
print("<Data structure>")
print(data)

# Generate a random index based on the dataset length
random_index = random.randint(0, len(data) - 1)

# Print a random sample of the dataset
print("\n\n<Random sample dataset>")
print("\n- Question:", data[random_index]["Question"])
print("\n- Answer:", data[random_index]["Answer"])

<Data structure>
Dataset({
    features: ['Question', 'Answer', 'text_length'],
    num_rows: 419
})


<Random sample dataset>

- Question:  How do you perform string concatenation in Python?

- Answer: ["Sure, here's a detailed explanation of how to perform string concatenation in Python:", '**1. String Concatenation Operator (`+`)**', 'The string concatenation operator (`+`) is used to combine strings directly, and the resulting string is stored in the variable on the right.', '```python', 'name = "John"', 'age = 30', 'string = name + " is " + str(age)', '```', '**2. String Formatting**', 'To format a string with variables, you can use string formatting syntax. This allows you to insert values directly into the string template.', '```python', 'name = "John"', 'age = 30', 'message = f"Hello, my name is {name} and I am {age} years old."', '```', '**3. Concatenating String Variables**', 'You can concatenate multiple variables into a string using the `join()` method.', '```python', 'name

In [11]:
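def formatting_func(example):
  # Note: when SFTTrainer applies this function to a batch, example['Answer'] is a
  # list of answers, so [0] keeps only the first row of each batch. The 'Question'
  # column is not used.
  text = f"Answer: {example['Answer'][0]}"
  return [text]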
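As a hedged alternative, a formatting function can emit one training string per row and include the question as well as the answer. The sketch below assumes the function receives a batch in the column-of-lists layout that `datasets` uses for batched mapping; whether `SFTTrainer` calls the function per batch or per row depends on the installed TRL version, so check its documentation first. The name `format_batch`, the prompt template, and the sample rows are all hypothetical.

```python
def format_batch(batch: dict) -> list[str]:
    """Return one training string per row of a batched example."""
    return [
        f"Question: {question.strip()}\nAnswer: {answer}"
        for question, answer in zip(batch["Question"], batch["Answer"])
    ]


# HYPOTHETICAL two-row batch, laid out as a dict of columns.
sample_batch = {
    "Question": [" How do you reverse a list?", "What is a tuple?"],
    "Answer": ["Use list.reverse().", "An immutable sequence."],
}

for text in format_batch(sample_batch):
    print(text, end="\n---\n")
```

Using the same `Question: ... Answer:` template at inference time would keep the prompt format consistent with what the model saw during training.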

# Step 4: Configure supervised fine-tuning parameters

In [12]:
trainer = SFTTrainer(
    model = model,
    train_dataset = data,
    args = transformers.TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,
        learning_rate = 2e-4,
        fp16 = True,
        logging_steps = 1,
        output_dir = "outputs",
        optim = "paged_adamw_8bit"
    ),
    peft_config = lora_config,
    formatting_func = formatting_func
)




max_steps is given, it will override any value given in num_train_epochs
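The batch settings in the training arguments determine how many sequences contribute to each optimizer step. The arithmetic below is a small sketch that assumes a single GPU and that every one of the 419 dataset rows becomes one training sequence. The epoch count a trainer reports depends on how many sequences the formatting step actually produces, so it can differ from this estimate.

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 60
num_rows = 419   # rows reported for the dataset
num_devices = 1  # ASSUMPTION: single GPU

# Sequences that contribute gradients to one optimizer update.
sequences_per_step = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Sequences consumed over the whole run.
sequences_seen = sequences_per_step * max_steps

print("sequences per optimizer step:", sequences_per_step)
print("sequences seen in total:", sequences_seen)
print("passes over the data if every row is used:", round(sequences_seen / num_rows, 2))
```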


# Step 5: Start model fine-tuning

In [13]:
trainer.train()

[34m[1mwandb[0m: Using wandb-core as the SDK backend. Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33manujd9[0m. Use [1m`wandb login --relogin`[0m to force relogin


| Step | Training Loss |
|------|---------------|
| 1    | 0.4266        |
| 2    | 0.4266        |
| 3    | 0.4231        |
| 4    | 0.4165        |
| 5    | 0.4039        |
| 6    | 0.3851        |
| 7    | 0.3608        |
| 8    | 0.3326        |
| 9    | 0.3018        |
| 10   | 0.2697        |


TrainOutput(global_step=60, training_loss=0.08205500186171169, metrics={'train_runtime': 34.3891, 'train_samples_per_second': 27.916, 'train_steps_per_second': 1.745, 'total_flos': 120457425715200.0, 'train_loss': 0.08205500186171169, 'epoch': 60.0})
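The run uses `warmup_steps = 10`, `max_steps = 60`, and `learning_rate = 2e-4`. Assuming the scheduler is linear warmup followed by linear decay (the default `lr_scheduler_type` in `TrainingArguments` is `"linear"` as far as we know), the learning rate at each step can be sketched as below. The function name `linear_warmup_decay_lr` is hypothetical; this is an illustration of the schedule shape, not the trainer's own code.

```python
def linear_warmup_decay_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate under linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to peak_lr over the warmup steps.
        return peak_lr * step / max(1, warmup_steps)
    # Decay from peak_lr down to 0 at total_steps.
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


for step in (0, 5, 10, 35, 60):
    print(step, linear_warmup_decay_lr(step, peak_lr=2e-4, warmup_steps=10, total_steps=60))
```

Under this assumption the first few updates use a very small learning rate, which is one plausible reason the logged loss changes slowly over the first steps.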

# Step 6: Inference on the fine-tuned model

In [14]:
# Output from the model AFTER Fine-tuning
text = "How do you merge two dictionaries in Python?"
device = "cuda:0"
inputs = tokenizer(text, return_tensors = "pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=250, eos_token_id=tokenizer.eos_token_id)
model_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
display(Markdown(f"**Model Response:**\n\n{model_response}"))

**Model Response:**

How do you merge two dictionaries in Python?

Answer:

Step 1/3
1. First, we need to create two empty dictionaries.

Step 2/3
2. Then, we need to assign the first dictionary to a variable.

Step 3/3
3. Finally, we need to assign the second dictionary to another variable. Here's the code: ``` import pprint dict1 = {'key1': 'value1', 'key2': 'value2'} dict2 = {'key3': 'value3', 'key4': 'value4'} merged_dict = dict1.copy() merged_dict.update(dict2) pprint(merged_dict) ``` Output: ``` {'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4'} ``` In this example, we have created two dictionaries and assigned them to two variables. We have then used the `copy()` method to create a copy of `dict1` and assigned it to `merged_dict`. We have also updated `merged_dict` with the values of `dict2` using the `update()` method. The final dictionary `merged_dict` now

In [19]:
# Test Generated Code
import pprint

dict1 = {'key1': 'value1', 'key2': 'value2'}
dict2 = {'key3': 'value3', 'key4': 'value4'}
merged_dict = dict1.copy()
merged_dict.update(dict2)
# Expected Output: {'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4'}
pprint.pprint(merged_dict)

{'key1': 'value1', 'key2': 'value2', 'key3': 'value3', 'key4': 'value4'}
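For comparison with the `copy()` plus `update()` approach the model produced, here is a short sketch of two other standard ways to merge dictionaries. The sample dictionaries are hypothetical and include a shared key to show that the right-hand dictionary wins in all three cases. The `|` operator requires Python 3.9 or newer.

```python
# HYPOTHETICAL sample data with one overlapping key.
dict1 = {"key1": "value1", "shared": "from dict1"}
dict2 = {"key3": "value3", "shared": "from dict2"}

# 1. copy() + update(), as in the generated code.
merged_update = dict1.copy()
merged_update.update(dict2)

# 2. Dictionary unpacking.
merged_unpack = {**dict1, **dict2}

# 3. Union operator (Python 3.9+).
merged_union = dict1 | dict2

print(merged_update == merged_unpack == merged_union)
print(merged_union["shared"])
```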


In [16]:
# Output from the model AFTER Fine-tuning
text = "How do you check if a string is a palindrome in Python"
device = "cuda:0"
inputs = tokenizer(text, return_tensors = "pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=250, eos_token_id=tokenizer.eos_token_id)
model_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
display(Markdown(f"**Model Response:**\n\n{model_response}"))

**Model Response:**

How do you check if a string is a palindrome in Python?

Answer:

Step 1/5
1. Define a string variable `word`.

Step 2/5
2. Initialize the variable `word` with the input string.

Step 3/5
3. Create a `reverse()` method that takes a single parameter `word`.

Step 4/5
4. Reverse the `word` parameter using the `reverse()` method.

Step 5/5
5. Compare the original `word` variable with the reversed `word` parameter using the `==` operator. Here's the code: ```python word = input("Enter a word: ") if word == word[::-1]: print("The word is a palindrome.") else: print("The word is not a palindrome.") ``` In this code, we first initialize the `word` variable with the input string. Then, we create a `reverse()` method that takes a single parameter `word`. This method reverses the `word` parameter and returns it. Finally, we compare the original `word` variable with the reversed `word` parameter using the `==` operator. If they are equal, we print that the `word` is a palindrome. Otherwise, we print that

In [20]:
# Test Generated Code
word = input("Enter a word: ")

if word == word[::-1]:
  print("The word is a palindrome.")
else:
  print("The word is not a palindrome.")

Enter a word: madam
The word is a palindrome.
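The generated snippet relies on `input()`, which makes it awkward to test automatically. A minimal sketch that wraps the same slicing check in a function is shown below. The function names are hypothetical, and the normalized variant (ignoring case and non-alphanumeric characters) is an extension that the model's answer does not include.

```python
def is_palindrome(text: str) -> bool:
    """Exact check, same logic as the generated code: compare with the reversed string."""
    return text == text[::-1]


def is_palindrome_normalized(text: str) -> bool:
    """Variant that ignores case, spaces, and punctuation."""
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    return cleaned == cleaned[::-1]


for word in ("madam", "Madam", "python"):
    print(word, is_palindrome(word), is_palindrome_normalized(word))
```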


# Step 7: Save the fine-tuned model

In [18]:
fine_tuned_model = "fine_tuned_gemma2b-it_unmerged"
trainer.model.save_pretrained(fine_tuned_model)

# Reload the base model in float16 so the LoRA adapter can be merged into it.
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage = True,
    return_dict = True,
    torch_dtype = torch.float16,
    device_map = {"": 0}
)

# Load the saved LoRA adapter on top of the base Gemma 2b model and merge its weights into it.
fine_tuned_merged_model = PeftModel.from_pretrained(base_model, fine_tuned_model)
fine_tuned_merged_model = fine_tuned_merged_model.merge_and_unload()

# Save the merged model together with the tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code = True)
tokenizer.padding_side = "right"  # set before saving; assigning it after save_pretrained cannot affect the saved files
fine_tuned_merged_model.save_pretrained("fine_tuned_gemma2b-it", safe_serialization = True)
tokenizer.save_pretrained("fine_tuned_gemma2b-it")

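Conceptually, merging a LoRA adapter folds the low-rank update into each adapted weight matrix: the merged weight is `W + (alpha / r) * B @ A`. The sketch below illustrates that arithmetic on tiny plain-Python matrices. It is not how PEFT implements `merge_and_unload()`; the helper names and toy values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, alpha, r):
    """Return weight + (alpha / r) * (lora_b @ lora_a), element by element."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# HYPOTHETICAL toy values: a 2x2 frozen weight and a rank-1 adapter.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]    # shape (d_out x r)
lora_a = [[0.5, -0.5]]     # shape (r x d_in)

merged = merge_lora(weight, lora_b, lora_a, alpha=1, r=1)
print(merged)
```

After the merge, the adapter matrices are no longer needed at inference time, which is why the merged model can be saved and loaded like an ordinary checkpoint.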