# Assignment 4: Instruction finetuning a Llama-2 7B model - part 5a
## **Objective**
1. Fine-tune the Llama-2 7B model on QA data generated with Claude-3-Sonnet.
2. Fine-tune the Llama-2 7B model on duplicated questions paired with two different sets of answers, one set generated by Claude-3-Sonnet and one set generated by Mistral Large.


### Note:
In part 5a we perform the first fine-tune.  
In part 5b we perform the second fine-tune.  
In part 6 we evaluate all 3 fine-tuned models.


In [2]:
# Installing required packages
!pip install -U -q peft==0.6.2 transformers==4.35.2 datasets==2.15.0 bitsandbytes==0.41.2.post2 trl==0.7.4 accelerate==0.24.1 scipy==1.12.0 wandb==0.16.5 coloredlogs==15.0.1




In [1]:
# Load required packages

from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments, pipeline
from datasets import load_dataset, Dataset
from peft import LoraConfig, PeftModel
from trl import SFTTrainer
import torch

import pickle




In [14]:
# load SUTD QA dataset from step 1
with open('sutd_qa_dataset_sonnet.pkl', 'rb') as f:
    sutd_qa_dataset = pickle.load(f)

In [15]:
# split data into training and test sets: 160 instances for training, the rest for testing
sutd_qa_dataset = sutd_qa_dataset.train_test_split(train_size=0.8, shuffle=False)

In [16]:
# check schema and number of instances
sutd_qa_dataset

DatasetDict({
    train: Dataset({
        features: ['question', 'answer'],
        num_rows: 160
    })
    test: Dataset({
        features: ['question', 'answer'],
        num_rows: 40
    })
})
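
With `shuffle=False` the split keeps the original row order, so the first 80% of the rows end up in the training set and the remainder in the test set. The sketch below reproduces only that arithmetic in plain Python, assuming the 200-row dataset implied by the output above. The helper name `ordered_split` is hypothetical and is not part of the `datasets` library, and the code is not the library's internal implementation.

```python
def ordered_split(rows, train_size):
    """Split rows in their original order, without shuffling (hypothetical helper)."""
    n_train = int(len(rows) * train_size)
    return rows[:n_train], rows[n_train:]


rows = list(range(200))  # assumption: 200 QA pairs, matching 160 + 40 above
train_rows, test_rows = ordered_split(rows, train_size=0.8)
print(len(train_rows), len(test_rows))
```

Because nothing is shuffled, any ordering in the source data (for example, questions grouped by topic) carries over into the split.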

In [17]:
# inspect first instance
sutd_qa_dataset["train"][0]

{'question': 'What are the core academic programs offered at SUTD?\n',
 'answer': 'Based on the context provided, the core academic programs offered at the Singapore University of Technology and Design (SUTD) are:\n\n1) Undergraduate programs, which are not explicitly listed but implied by mentions of "Transition Into SUTD" and "Integrated Learning Programme".\n\n2) Master\'s programs, including Master of Architecture, Master of Engineering (Research), Master of Innovation by Design, Master of Science in Security by Design, Master of Science in Urban Science, Policy and Planning, MSc in Technology and Design, and MTD (AI Empowered Built Environment).\n\n3) A Dual Master\'s program in Nano-Electronic Engineering and Design, offered in collaboration with CGU (likely referring to Claremont Graduate University).'}

In [18]:
# QUESTION: create a formatting function 'formatting_func' which takes an example from your QA dataset as input and outputs
# a dictionary with the key "text" and as value a text prompt with the following format:
# ### USER: {question from example goes here}
# ### ASSISTANT: {answer from example goes here}


#--- ADD YOUR SOLUTION HERE (10 points)---
def formatting_func(example):
    formatted_text = f"### USER: {example['question']}\n### ASSISTANT: {example['answer']}"
    return {"text": formatted_text}
#----------------------------------------


In [19]:
# apply formatting function to data set
formatted_dataset = sutd_qa_dataset.map(formatting_func)

Map: 100%|██████████| 160/160 [00:00<00:00, 16009.94 examples/s]
Map: 100%|██████████| 40/40 [00:00<00:00, 6046.28 examples/s]


In [20]:
# check formatted prompt
formatted_dataset["train"]["text"][0]

# Note: you should see something like this (not necessarily the same prompt, but the same format)
# '### USER: What are some of the best places to eat near the SUTD campus?\n### ASSISTANT: There are several great dining options near the SUTD campus.
# One popular spot is the Changi Business Park Food Court, ...


'### USER: What are the core academic programs offered at SUTD?\n\n### ASSISTANT: Based on the context provided, the core academic programs offered at the Singapore University of Technology and Design (SUTD) are:\n\n1) Undergraduate programs, which are not explicitly listed but implied by mentions of "Transition Into SUTD" and "Integrated Learning Programme".\n\n2) Master\'s programs, including Master of Architecture, Master of Engineering (Research), Master of Innovation by Design, Master of Science in Security by Design, Master of Science in Urban Science, Policy and Planning, MSc in Technology and Design, and MTD (AI Empowered Built Environment).\n\n3) A Dual Master\'s program in Nano-Electronic Engineering and Design, offered in collaboration with CGU (likely referring to Claremont Graduate University).'
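
The printed prompt contains `\n\n` before `### ASSISTANT:` because the question strings in this dataset already end with a newline. If the single-newline format from the task description is preferred, one option is to trim whitespace before formatting. This is a minimal sketch only: `formatting_func_stripped` is a hypothetical name, and the sample record is a toy example rather than a row from the dataset.

```python
def formatting_func_stripped(example):
    """Variant of formatting_func that trims surrounding whitespace (hypothetical)."""
    question = example["question"].strip()
    answer = example["answer"].strip()
    return {"text": f"### USER: {question}\n### ASSISTANT: {answer}"}


# Toy record with a trailing newline, like the questions in the dataset
sample = {"question": "What is SUTD?\n", "answer": "A university in Singapore."}
print(repr(formatting_func_stripped(sample)["text"]))
```

Whichever variant is used for training, the same format should be used when prompting the fine-tuned model later.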

In [21]:
# model id of base model
model_id = "NousResearch/Llama-2-7b-hf"

# model id for our finetuned model
new_model = "llama-7b-qlora-sutd-qa-sonnet"

# config for model quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,  # nested (double) quantization disabled
)

# Load the entire model on the GPU 0
device_map = {"": 0}
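
To see why 4-bit loading matters on a single GPU, the following back-of-the-envelope sketch compares the memory needed for the weights alone at 16 and 4 bits per parameter. The round figure of 7 billion parameters is an assumption, and the estimate ignores quantization constants, activations, and optimizer state, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 7e9  # assumption: round figure for a "7B" model

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)
print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {nf4_gb:.1f} GB")
```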


In [22]:
# Load model

# QUESTION: load the base LLM into a variable 'model' using the HF AutoModelForCausalLM class with the given quantization. Load all weights to the GPU

#--- ADD YOUR SOLUTION HERE (10 points)---
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map={"": 0}
)
#------------------------------------------

Loading checkpoint shards: 100%|██████████| 2/2 [00:15<00:00,  7.50s/it]


In [23]:
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

In [24]:
# Apply lora configuration
lora_config = LoraConfig(
    lora_alpha=8,
    r=8,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)
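
The LoRA settings above determine both the adapter scaling (`lora_alpha / r`) and the number of trainable parameters. The sketch below estimates the latter under stated assumptions: a hidden size of 4096, 32 decoder layers, and adapters on two square projection matrices per layer (the `q_proj` and `v_proj` defaults that `peft` is assumed to pick for Llama when `target_modules` is not set). Treat the result as an estimate, not as output from the actual run.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


HIDDEN_SIZE = 4096     # assumption: Llama-2 7B hidden size
NUM_LAYERS = 32        # assumption: Llama-2 7B decoder layers
ADAPTED_PER_LAYER = 2  # assumption: q_proj and v_proj are adapted by default
r, lora_alpha = 8, 8   # values from lora_config above

per_module = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, r)
total_trainable = per_module * ADAPTED_PER_LAYER * NUM_LAYERS
scaling = lora_alpha / r
print(total_trainable, scaling)
```

Under these assumptions only a few million parameters are trained, a small fraction of the base model, and the adapter update is applied with a scaling factor of 1.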

In [25]:

# QUESTION: Now it is time to configure the training parameters.
# To make it easier for you, the list of parameters is given.
# Find reasonable values for the parameters, at least values that let the training run without crashing.
# You can refer to the lab exercises and to open-source examples on the internet.

# list of parameters, some with pre-set values, others you need to set yourself:
# output_dir = "./results"
# per_device_train_batch_size
# gradient_accumulation_steps
# optim
# save_steps = 10
# logging_steps = 10
# learning_rate
# weight_decay
# max_grad_norm
# num_train_epochs
# warmup_ratio
# lr_scheduler_type

# Arguments in SFTTrainer Class
# packing
# max_seq_length

#--- ADD YOUR SOLUTION HERE (20 points)---
training_arguments = TrainingArguments(
    report_to="none",
    output_dir="./results",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    optim="adamw_bnb_8bit",
    save_steps=10,
    logging_steps=10,
    learning_rate=2e-4,
    weight_decay=0.001,
    max_grad_norm=0.3,
    num_train_epochs=1,
    warmup_ratio=0.03,
    lr_scheduler_type="constant_with_warmup",
    # group_by_length=True,
    # max_steps=-1,
    # fp16=False,
    # bf16=False,
)

packing=False
# print(tokenizer.model_max_length)
max_seq_length = 512 # min(tokenizer.model_max_length, 1024) # Try 512. Can increase to 1024 or decrease to 256.

#------------------------------------------
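
The chosen batch settings fix the number of optimizer steps, which in turn determines how many log lines and checkpoints the run produces. The sketch below works through that arithmetic for the 160 training examples. The single-GPU count and the rounding up of warmup steps are assumptions about how the trainer converts these settings.

```python
import math

num_train_examples = 160
per_device_train_batch_size = 2
gradient_accumulation_steps = 1
num_train_epochs = 1
warmup_ratio = 0.03
save_steps = 10
num_devices = 1  # assumption: single GPU, as implied by device_map = {"": 0}

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = math.ceil(num_train_examples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumption: rounded up
num_checkpoints = total_steps // save_steps
print(total_steps, warmup_steps, num_checkpoints)
```

The 80 total steps agree with the `global_step=80` reported by the training run in this notebook.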

In [26]:
# configure trainer

trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=formatted_dataset["train"],
    eval_dataset=formatted_dataset["test"],
    peft_config=lora_config,
    dataset_text_field="text",
    packing=packing,
    tokenizer=tokenizer,
    max_seq_length=max_seq_length,
)

Map: 100%|██████████| 160/160 [00:00<00:00, 3383.17 examples/s]
Map: 100%|██████████| 40/40 [00:00<00:00, 4952.39 examples/s]


In [27]:
# now finetune the model!
trainer.train()

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|------|---------------|
| 10   | 2.0599        |
| 20   | 1.7075        |
| 30   | 1.3571        |
| 40   | 1.2933        |
| 50   | 1.1182        |
| 60   | 1.1282        |
| 70   | 1.1670        |
| 80   | 1.1213        |


TrainOutput(global_step=80, training_loss=1.369066059589386, metrics={'train_runtime': 61.226, 'train_samples_per_second': 2.613, 'train_steps_per_second': 1.307, 'total_flos': 998395118223360.0, 'train_loss': 1.369066059589386, 'epoch': 1.0})
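
As a sanity check, the overall `training_loss` can be compared with the mean of the logged values. This assumes each logged value is the average over its 10-step window (`logging_steps=10`), in which case the mean of the eight entries should be close to the reported 1.369.

```python
# Training losses logged every 10 steps, copied from the output above
logged_losses = [2.0599, 1.7075, 1.3571, 1.2933, 1.1182, 1.1282, 1.167, 1.1213]

mean_loss = sum(logged_losses) / len(logged_losses)
reported_training_loss = 1.369066059589386  # from TrainOutput above
print(round(mean_loss, 4), round(reported_training_loss, 4))
```

The loss drops quickly over the first 50 steps and then flattens out at around 1.12 to 1.17 for the rest of the epoch.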

In [28]:
# Save trained model
trainer.model.save_pretrained(new_model)

In [29]:
#evaluate and return the metrics
trainer.evaluate()

{'eval_loss': 1.1335844993591309,
 'eval_runtime': 5.7187,
 'eval_samples_per_second': 6.995,
 'eval_steps_per_second': 0.874,
 'epoch': 1.0}
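
The evaluation loss can be turned into a perplexity, which is sometimes easier to compare across runs. The sketch assumes `eval_loss` is the mean per-token cross-entropy in nats, so that perplexity is simply its exponential.

```python
import math

eval_loss = 1.1335844993591309  # value reported by trainer.evaluate() above
perplexity = math.exp(eval_loss)
print(f"perplexity: {perplexity:.2f}")
```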

In [30]:
# Empty VRAM
# Note: this does not unload everything from the GPU; maybe you can find a way to fix this
# As a workaround, you can restart your kernel to clear the GPU, then run the cells below
# https://stackoverflow.com/questions/69357881/how-to-remove-the-model-of-transformers-in-gpu-memory

import gc

del trainer
del model
del tokenizer

gc.collect()
torch.cuda.empty_cache()
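
One way to check how much memory the cleanup actually released is to report allocated and reserved CUDA memory before and after it. The sketch below is a diagnostic only and does not fix the underlying issue. The helper name `cuda_memory_mb` is hypothetical, and the import guard is there only so the snippet also runs on a machine without PyTorch or without a GPU.

```python
import gc

try:
    import torch
except ImportError:  # lets the sketch run on machines without PyTorch
    torch = None


def cuda_memory_mb():
    """Return allocated/reserved CUDA memory in MiB, or None without a GPU."""
    if torch is None or not torch.cuda.is_available():
        return None
    return {
        "allocated": torch.cuda.memory_allocated() / 2**20,
        "reserved": torch.cuda.memory_reserved() / 2**20,
    }


before = cuda_memory_mb()
gc.collect()
if torch is not None and torch.cuda.is_available():
    torch.cuda.empty_cache()
after = cuda_memory_mb()
print("before:", before, "after:", after)
```

If the allocated figure stays high, some Python reference to the model or its tensors may still be alive, for example in the notebook's output history.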

In [31]:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer
from peft import get_peft_model, LoraConfig, PeftModel
import transformers
import torch


# model id of base model (repeat in case of kernel restart)
model_id = "NousResearch/Llama-2-7b-hf"

# model id for our finetuned model (repeat in case of kernel restart)
new_model = "llama-7b-qlora-sutd-qa-sonnet"

# Load the entire model on the GPU 0 (repeat in case of kernel restart)
device_map = {"": 0}


# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map
)

model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00,  1.61s/it]
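
Conceptually, merging folds each low-rank update into its base weight matrix, so the merged model needs no adapter at inference time. The sketch below illustrates the update W' = W + (lora_alpha / r) · B · A on tiny list-based matrices. It is a toy illustration of the idea, not the `peft` implementation, and all matrix values are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A) for small list-based matrices."""
    scaling = lora_alpha / r
    delta = matmul(B, A)
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]  # toy base weight (2 x 2)
B = [[1.0], [2.0]]            # toy LoRA B matrix (2 x r), with r = 1
A = [[3.0, 4.0]]              # toy LoRA A matrix (r x 2)
merged = merge_lora(W, A, B, lora_alpha=1, r=1)
print(merged)
```

With `lora_alpha=8` and `r=8`, as configured in this notebook, the scaling factor is also 1.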


In [32]:
from huggingface_hub import login

# log in to Hugging Face; replace the placeholder below with your own access token
# https://huggingface.co/docs/hub/en/security-tokens

hf_access_token = "YOUR_HF_ACCESS"
login(token=hf_access_token)

Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /home/jon/.cache/huggingface/token
Login successful
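
To avoid leaving a real token in a shared notebook, the token can be read from an environment variable instead of being typed into the cell. This is a minimal sketch: the helper `read_hf_token` is hypothetical, and the variable name `HF_TOKEN` is an assumption about how the token was exported in your shell.

```python
import os


def read_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in the given environment variable, or None if unset."""
    token = os.environ.get(env_var)
    return token if token else None


hf_access_token = read_hf_token()
if hf_access_token is None:
    print("No token found; export HF_TOKEN before starting the notebook.")
```

The returned value can then be passed to `login(token=hf_access_token)` exactly as in the cell above.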


In [33]:
# Saved at https://huggingface.co/jtz18/llama-7b-qlora-sutd-qa-sonnet/

# push finetuned model to huggingface
model.push_to_hub(new_model, use_temp_dir=False)


Thrown during validation:
`do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.

CommitInfo(commit_url='https://huggingface.co/jtz18/llama-7b-qlora-sutd-qa-sonnet/commit/405e0799320bd07f5a994170e62ad2ef80c2a017', commit_message='Upload LlamaForCausalLM', commit_description='', oid='405e0799320bd07f5a994170e62ad2ef80c2a017', pr_url=None, pr_revision=None, pr_num=None)

In [34]:
# Saved at https://huggingface.co/jtz18/llama-7b-qlora-sutd-qa-sonnet/

# push tokenizer to huggingface
tokenizer.push_to_hub(new_model, use_temp_dir=False)

tokenizer.model: 100%|██████████| 500k/500k [00:00<00:00, 858kB/s] 


CommitInfo(commit_url='https://huggingface.co/jtz18/llama-7b-qlora-sutd-qa-sonnet/commit/e5469db629ba74170c2df16734e89c79e4e6eb03', commit_message='Upload tokenizer', commit_description='', oid='e5469db629ba74170c2df16734e89c79e4e6eb03', pr_url=None, pr_revision=None, pr_num=None)

### This concludes part 5a of the assignment. Continue with part 5b.