# Assignment 3
## Part 1: Finetuning with QLoRA

In the Assignment 3 folder you'll find a notebook, `Class_9_ code_supplement_QLoRA_finetuning.ipynb`.  This provides you with all of the code that you need to perform PEFT on a quantized LLM.  Read through this notebook and make sure that you understand it.

Run your own finetuning experiment to improve a base LLM's ability to perform some task. Replace `davanstrien/haiku_prompts` with your own finetuning dataset. There are numerous sources for such datasets, including [Hugging Face](https://huggingface.co/datasets).  Remember that you're trying to *improve* the LLM's ability to perform a task, so you may need to test the prompts from several datasets to see what the LLM currently struggles with.

You also have the option of writing your own prompts for this task, using the format in `davanstrien/haiku_prompts`.

After you have finetuned your model, answer the following questions:

1. Provide before and after output showing the improvement in the model's performance on the task that you chose. If you see degradation instead of improvement in performance, can you list a few reasons why this result occurred?


2. Increase the value of the `r` parameter in `LoraConfig`, re-run the finetuning, and then, as in Question 1, provide before and after examples of output. This time, your "before" output should come from the model trained with an `r` of 16 and your "after" output should come from the model trained with an increased `r`.

If the output quality improved, theorize why this might be so. If the output degraded, also theorize why this might be so.


In [None]:
!pip install torch



In [None]:
!pip install trl

Collecting trl
  Downloading trl-0.12.1-py3-none-any.whl.metadata (10 kB)
Downloading trl-0.12.1-py3-none-any.whl (310 kB)
Installing collected packages: trl
Successfully installed trl-0.12.1


In [None]:
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTTrainer
import torch
import os
import peft
from peft import LoraConfig
from transformers import TrainingArguments

In [None]:
!pip install datasets


Collecting datasets
  Downloading datasets-3.1.0-py3-none-any.whl.metadata (20 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.9.0,>=2023.1.0 (from fsspec[http]<=2024.9.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.9.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-3.1.0-py3-none-any.whl (480 kB)
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Downloading fsspec-2024.9.0-py3-none-any.whl 

In [None]:
# Step 1: Load dataset
from datasets import load_dataset
ds = load_dataset("fka/awesome-chatgpt-prompts")  # same dataset that is reloaded in Step 10

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


README.md:   0%|          | 0.00/339 [00:00<?, ?B/s]

prompts.csv:   0%|          | 0.00/84.1k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/170 [00:00<?, ? examples/s]

In [None]:
# Display a few samples from the dataset
print(ds['train'][:10])

# Step 2: Specify the Hugging Face model and the target modules
model_name = "tiiuae/falcon-7b"
target_modules = ["query_key_value"]


{'act': ['An Ethereum Developer', 'SEO Prompt', 'Linux Terminal', 'English Translator and Improver', '`position` Interviewer', 'JavaScript Console', 'Excel Sheet', 'English Pronunciation Helper', 'Spoken English Teacher and Improver', 'Travel Guide'], 'prompt': ['Imagine you are an experienced Ethereum developer tasked with creating a smart contract for a blockchain messenger. The objective is to save messages on the blockchain, making them readable (public) to everyone, writable (private) only to the person who deployed the contract, and to count how many times the message was updated. Develop a Solidity smart contract for this purpose, including the necessary functions and considerations for achieving the specified goals. Please provide the code and any relevant explanations to ensure a clear understanding of the implementation.', "Using WebPilot, create an outline for an article that will be 2,000 words on the keyword 'Best SEO prompts' based on the top 10 results from Google. Inclu

In [None]:
!pip install bitsandbytes



In [None]:
from transformers import BitsAndBytesConfig
# Step 3: Configure the 4-bit quantization logic and load the quantized model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

device_map = {"": 0}
foundation_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,
    use_cache=False
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token


config.json:   0%|          | 0.00/1.05k [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/17.7k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.95G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/4.48G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/117 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/287 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/2.73M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/281 [00:00<?, ?B/s]
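
The 4-bit configuration above matters mainly because of memory. As a rough sketch (assuming about 7 billion parameters, inferred from the "7b" in the model name, and counting weights only), the arithmetic below compares 16-bit and 4-bit storage. The helper name `weight_memory_gb` is hypothetical, and the estimate ignores quantization constants, activations, and optimizer state, so real usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Assumption: roughly 7e9 parameters, taken from the "7b" in the model name.
NUM_PARAMS = 7e9

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # unquantized 16-bit weights
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)    # 4-bit quantized weights

print(f"16-bit weights: ~{bf16_gb:.1f} GB")
print(f"4-bit weights:  ~{nf4_gb:.1f} GB")
```

The 16-bit figure is in line with the two checkpoint shards downloaded above (9.95G + 4.48G), which suggests the estimate is in the right range.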

In [None]:

# Step 4: Prepare the Dataset
# Preprocess dataset and take a 50-prompt sample
data = ds["train"].map(lambda samples: tokenizer(samples["prompt"]), batched=True)
train_sample = data.select(range(50))  # Use a small subset for fine-tuning

# Step 5: Fine-tuning Configuration with LoRA
lora_config = LoraConfig(
    r=16,  # larger values = more parameters to train
    lora_alpha=16,  # scaling factor for weight matrix magnitude
    target_modules=target_modules,
    lora_dropout=0.05,  # regularization
    bias="none",  # bias parameter should not be trained
    task_type="CAUSAL_LM"
)

Map:   0%|          | 0/170 [00:00<?, ? examples/s]
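
The comment on `r` ("larger values = more parameters to train") can be made concrete. LoRA adds two low-rank matrices per targeted layer, one of shape `r x d_in` and one of shape `d_out x r`, so the trainable parameter count grows linearly with `r`. The sketch below is only an estimate: the layer dimensions (4544 inputs, 4672 outputs for `query_key_value`, 32 layers) are assumptions about Falcon-7B and are not stated anywhere in this notebook.

```python
def lora_trainable_params(d_in: int, d_out: int, r: int, num_layers: int) -> int:
    """Count LoRA parameters: A is (r x d_in) and B is (d_out x r) per layer."""
    return num_layers * r * (d_in + d_out)

# Assumed dimensions of the query_key_value projection (not from the notebook).
D_IN, D_OUT, NUM_LAYERS = 4544, 4672, 32

for r in (16, 32):
    count = lora_trainable_params(D_IN, D_OUT, r, NUM_LAYERS)
    print(f"r={r}: {count:,} trainable LoRA parameters")
```

Under these assumptions, doubling `r` from 16 to 32 doubles the number of trainable adapter parameters, while the frozen base model is unchanged.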

In [None]:
# Step 6: Training Arguments
output_directory = "lora_output"
training_args = TrainingArguments(
    output_dir=output_directory,
    auto_find_batch_size=True,  # Automatically find the correct batch size
    learning_rate=2e-4,  # A slightly higher learning rate than full fine-tuning
    num_train_epochs=5,
    report_to="none"  # Disable reporting to external tools like wandb (None does not disable it)
)


In [None]:
# Step 7: Train the Model
from transformers import DataCollatorForLanguageModeling  # needed for the collator below

trainer = SFTTrainer(
    model=foundation_model,
    args=training_args,
    train_dataset=train_sample,
    peft_config=lora_config,
    dataset_text_field="input_ids",  # The field to be used as inputs
    tokenizer=tokenizer,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM, so no masking
)

trainer.train()


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Step,Training Loss



TrainOutput(global_step=65, training_loss=2.1687152569110575, metrics={'train_runtime': 280.9919, 'train_samples_per_second': 0.89, 'train_steps_per_second': 0.231, 'total_flos': 1140818643970560.0, 'train_loss': 2.1687152569110575, 'epoch': 5.0})
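
The `global_step=65` in the output above can be used to infer which batch size `auto_find_batch_size` settled on. The sketch below assumes a single GPU, no gradient accumulation, and that the last partial batch of each epoch is kept; the helper name `total_optimizer_steps` is hypothetical.

```python
import math

def total_optimizer_steps(num_examples: int, batch_size: int, num_epochs: int) -> int:
    """Batches per epoch (last partial batch kept) multiplied by epochs."""
    return math.ceil(num_examples / batch_size) * num_epochs

NUM_EXAMPLES = 50     # from train_sample = data.select(range(50))
NUM_EPOCHS = 5        # from num_train_epochs=5
REPORTED_STEPS = 65   # from the TrainOutput above
RUNTIME_SECONDS = 280.9919  # train_runtime from the TrainOutput above

for batch_size in (8, 4, 2, 1):
    steps = total_optimizer_steps(NUM_EXAMPLES, batch_size, NUM_EPOCHS)
    marker = "  <- matches the reported global_step" if steps == REPORTED_STEPS else ""
    print(f"batch_size={batch_size}: {steps} steps{marker}")

# Cross-check against the reported throughput.
samples_per_second = NUM_EXAMPLES * NUM_EPOCHS / RUNTIME_SECONDS
print(f"samples/second: {samples_per_second:.2f}")
```

Under these assumptions only a batch size of 4 reproduces 65 steps, and the computed throughput agrees with the reported `train_samples_per_second` of 0.89.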

In [None]:
# Step 8: Save the finetuned model
peft_model_path = os.path.join(output_directory, "lora_model")
trainer.model.save_pretrained(peft_model_path)


In [None]:

# Clean up memory
import gc
import torch
del foundation_model
del trainer
del train_sample
torch.cuda.empty_cache()
gc.collect()


13883

In [None]:
!pip install peft



In [None]:
from peft import AutoPeftModelForCausalLM
import transformers
# Step 9: Evaluate the finetuned model
# Load the finetuned model
loaded_model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_path,
    quantization_config=bnb_config,
    device_map="cuda"
)

# Function to generate outputs from the model
def get_outputs(model, inputs, max_new_tokens=100):
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        repetition_penalty=1.5,  # Avoid repetition
        early_stopping=False,
        eos_token_id=tokenizer.eos_token_id,
    )
    return outputs

# Test the base model as the "before fine-tuning" baseline by temporarily disabling the LoRA adapter
input_sentence = tokenizer("Write a creative prompt for ChatGPT.", return_tensors="pt").to("cuda")
with loaded_model.disable_adapter():
    before_finetune_output = get_outputs(loaded_model, input_sentence, max_new_tokens=100)
print("Before Fine-Tuning:")
print(tokenizer.batch_decode(before_finetune_output, skip_special_tokens=True))

# Now run the same prompt with the fine-tuned adapter active
after_finetune_output = get_outputs(loaded_model, input_sentence, max_new_tokens=100)
print("After Fine-Tuning:")
print(tokenizer.batch_decode(after_finetune_output, skip_special_tokens=True))

# Step 10: Re-run fine-tuning with a larger 'r' value
lora_config_r32 = LoraConfig(
    r=32,  # Increased r value (was 16)
    lora_alpha=16,
    target_modules=target_modules,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Reload the foundation model
foundation_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map=device_map,
    use_cache=False
)

# Rebuild train_sample, which was deleted during memory cleanup
from datasets import load_dataset
ds = load_dataset("fka/awesome-chatgpt-prompts")
data = ds["train"].map(lambda samples: tokenizer(samples["prompt"]), batched=True)
train_sample = data.select(range(50))  # Use a small subset for fine-tuning


trainer_r32 = SFTTrainer(
    model=foundation_model,
    args=training_args,
    train_dataset=train_sample,
    peft_config=lora_config_r32,
    dataset_text_field="input_ids",
    tokenizer=tokenizer,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)

trainer_r32.train()

# Save the model trained with r=32
peft_model_r32_path = os.path.join(output_directory, "lora_model_r32")
trainer_r32.model.save_pretrained(peft_model_r32_path)

# Load the model with r=32
loaded_model_r32 = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_r32_path,
    quantization_config=bnb_config,
    device_map="cuda"
)

# Test the model with r=32
after_r32_finetune_output = get_outputs(loaded_model_r32, input_sentence, max_new_tokens=100)
print("After Fine-Tuning with r=32:")
print(tokenizer.batch_decode(after_r32_finetune_output, skip_special_tokens=True))

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Before Fine-Tuning:
['Write a creative prompt for ChatGPT.\nI\'m looking for someone to write a creative prompt for ChatGPT. The prompt should be something that will inspire users to come up with interesting and unique responses. It could be anything from asking them to describe their favorite movie or book, or giving them an idea for a new story line. My first thought was "What would happen if you were trapped in a room full of zombies?" but I need some help coming up with other ideas. If you have any suggestions please let me']
After Fine-Tuning:
['Write a creative prompt for ChatGPT.\nI\'m looking for someone to write a creative prompt for ChatGPT. The prompt should be something that will inspire users to come up with interesting and unique responses. It could be anything from asking them to describe their favorite movie or book, or giving them an idea for a new story line. My first thought was "What would happen if you were trapped in a room full of zombies?" but I need some help c

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]


Deprecated positional argument(s) used in SFTTrainer, please use the SFTConfig to set these arguments instead.


Step,Training Loss


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


After Fine-Tuning with r=32:
['Write a creative prompt for ChatGPT.\nI\'m looking for someone to write a creative prompt for ChatGPT. The prompt should be something that will inspire people to come up with interesting ideas and stories. It could be anything from asking them to imagine themselves in a different time period or situation, or simply asking them to think about what they would do if they had unlimited resources at their disposal. My first idea is "What would you do if you were given the power of invisibility?" I hope this inspires some great responses']


**Question 1**:

Before Fine-Tuning:
Input Prompt: "Write a creative prompt for ChatGPT."

Output:
Write a creative prompt for ChatGPT.
I'm looking for someone to write a creative prompt for ChatGPT. The prompt should be something that will inspire users to come up with interesting and unique responses. It could be anything from asking them to describe their favorite movie or book, or giving them an idea for a new story line. My first thought was "What would happen if you were trapped in a room full of zombies?" but I need some help coming up with other ideas. If you have any suggestions please let me.

After Fine-Tuning (with r=16):
Input Prompt: "Write a creative prompt for ChatGPT."

Output:

Write a creative prompt for ChatGPT.
I'm looking for someone to write a creative prompt for ChatGPT. The prompt should be something that will inspire users to come up with interesting and unique responses. It could be anything from asking them to describe their favorite movie or book, or giving them an idea for a new story line. My first thought was "What would happen if you were trapped in a room full of zombies?" but I need some help coming up with other ideas. If you have any suggestions please let me.

**Analysis**:
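There is no change between the "before" and "after" outputs in terms of creativity or quality of the prompt. The two recorded outputs are identical, so the model shows no measurable improvement on this prompt after fine-tuning with r=16. Possible reasons include the small training set (50 prompts), the short training run (65 steps), and a test prompt that does not closely resemble the training examples.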
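
A claim that two outputs are identical or nearly identical can be checked with a number rather than by eye. The sketch below uses Python's standard `difflib.SequenceMatcher`; the helper name `similarity` is hypothetical, and the strings are short excerpts copied from the recorded outputs in this notebook, not full generations.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a, b).ratio()

# Excerpts from the recorded outputs (second sentence of each generation).
base_excerpt = ("The prompt should be something that will inspire users "
                "to come up with interesting and unique responses.")
r16_excerpt = ("The prompt should be something that will inspire users "
               "to come up with interesting and unique responses.")
r32_excerpt = ("The prompt should be something that will inspire people "
               "to come up with interesting ideas and stories.")

print(f"before vs. r=16: {similarity(base_excerpt, r16_excerpt):.3f}")
print(f"r=16 vs. r=32:   {similarity(r16_excerpt, r32_excerpt):.3f}")
```

A ratio of exactly 1.0 means the fine-tuned output matches the baseline character for character; a lower ratio shows that the text changed, but not whether it improved.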

**Question 2**

Before (model fine-tuned with r=16):
Input Prompt: "Write a creative prompt for ChatGPT."

Output (from the model fine-tuned with r=16):
Write a creative prompt for ChatGPT.
I'm looking for someone to write a creative prompt for ChatGPT. The prompt should be something that will inspire users to come up with interesting and unique responses. It could be anything from asking them to describe their favorite movie or book, or giving them an idea for a new story line. My first thought was "What would happen if you were trapped in a room full of zombies?" but I need some help coming up with other ideas. If you have any suggestions please let me.

After Fine-Tuning (with r=32):
Input Prompt: "Write a creative prompt for ChatGPT."

Output (from the model fine-tuned with r=32):

Write a creative prompt for ChatGPT.
I'm looking for someone to write a creative prompt for ChatGPT. The prompt should be something that will inspire people to come up with interesting ideas and stories. It could be anything from asking them to imagine themselves in a different time period or situation, or simply asking them to think about what they would do if they had unlimited resources at their disposal. My first idea is "What would you do if you were given the power of invisibility?" I hope this inspires some great responses.

**Analysis of Results**:

The output after fine-tuning with r=32 is more varied than the output generated with r=16, although both follow the same overall template.

The r=16 output offers fairly conventional suggestions (describing a favorite movie or book, or an idea for a new story line) and gives a zombie scenario as its example. The r=32 output introduces different suggestions, such as imagining a different time period or situation or having unlimited resources, and offers the power of invisibility as its example.

One possible explanation is that a larger `r` gives the adapter more trainable parameters, so it can move the model further from its base behavior on the same 50 training prompts. This comparison rests on a single prompt per model, so it is suggestive rather than conclusive.
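
One detail of the two runs is worth keeping in mind when comparing them: `lora_alpha` stayed at 16 while `r` went from 16 to 32. In standard LoRA the adapter update is multiplied by `lora_alpha / r`, so the two runs differ in scaling as well as in rank. The sketch below assumes that default scaling rule (not a rank-stabilized variant); the helper name `lora_scaling` is hypothetical.

```python
def lora_scaling(lora_alpha: float, r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return lora_alpha / r

LORA_ALPHA = 16  # unchanged between the two runs in this notebook

for r in (16, 32):
    print(f"r={r}: scaling = {lora_scaling(LORA_ALPHA, r)}")
```

Under this assumption the r=32 adapter's update is scaled by 0.5 instead of 1.0, so a cleaner comparison might also raise `lora_alpha` to 32 to keep the scaling constant.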