# Assignment 4: Instruction finetuning a Llama-2 7B model - part 5b

## Objective: Finetune a Llama-2 7B model on Sonnet and Mistral Large outputs



### Step 2: Finetune the model
The second step of the assignment is to finetune an LLM on the synthetic question-answer pairs from step 1.
We will use LoRA and the Hugging Face libraries. Finetuning on small GPUs can be tricky, so you can take the code samples from the labs as a starting point.
There are also many blogs and guides on the internet that you can consult.

In [1]:
# Installing required packages
!pip install -U -q peft==0.6.2 transformers==4.35.2 datasets==2.15.0 bitsandbytes==0.41.2.post2 trl==0.7.4 accelerate==0.24.1 scipy==1.12.0 wandb==0.16.5 coloredlogs==15.0.1

In [2]:
# Load required packages

from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments, pipeline
from datasets import load_dataset, Dataset
from peft import LoraConfig, PeftModel
from trl import SFTTrainer
import torch

import pickle




In [3]:
# load SUTD QA dataset from step 1
with open('sutd_qa_dataset_sonnet_mistral.pkl', 'rb') as f:
    sutd_qa_dataset = pickle.load(f)

In [4]:
# split data into training and test sets: 80% (320 instances) for training, the rest (80) for testing
sutd_qa_dataset = sutd_qa_dataset.train_test_split(train_size=0.8, shuffle=False)

In [5]:
# check schema and number of instances
sutd_qa_dataset

DatasetDict({
    train: Dataset({
        features: ['question', 'answer'],
        num_rows: 320
    })
    test: Dataset({
        features: ['question', 'answer'],
        num_rows: 80
    })
})
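
With `shuffle=False` the split is positional: the first 80% of the rows become the training set and the remainder becomes the test set. The following is a minimal plain-Python sketch of that behaviour, assuming the 400 question-answer pairs implied by the row counts above. `split_without_shuffle` is a hypothetical helper written for illustration and is not part of the `datasets` library.

```python
def split_without_shuffle(rows, train_fraction):
    """Positional split: first chunk for training, remainder for testing."""
    n_train = int(len(rows) * train_fraction)
    return rows[:n_train], rows[n_train:]


# 400 placeholder rows stand in for the real question-answer pairs.
rows = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(400)]
train_rows, test_rows = split_without_shuffle(rows, 0.8)
print(len(train_rows), len(test_rows))
```

Because the order is preserved, it is worth checking how the pickled dataset is ordered: if the rows happen to be grouped by source model or by topic, the test set may not be representative of the training set.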

In [6]:
# inspect first instance
sutd_qa_dataset["train"][0]

{'question': 'What are the core academic programs offered at SUTD?\n',
 'answer': 'Based on the context provided, the core academic programs offered at the Singapore University of Technology and Design (SUTD) are:\n\n1) Undergraduate programs, which are not explicitly listed but implied by mentions of "Transition Into SUTD" and "Integrated Learning Programme".\n\n2) Master\'s programs, including Master of Architecture, Master of Engineering (Research), Master of Innovation by Design, Master of Science in Security by Design, Master of Science in Urban Science, Policy and Planning, MSc in Technology and Design, and MTD (AI Empowered Built Environment).\n\n3) A Dual Master\'s program in Nano-Electronic Engineering and Design, offered in collaboration with CGU (likely referring to Claremont Graduate University).'}

In [7]:
# QUESTION: create a formatting function 'formatting_func' which takes an example from your QA dataset as input and outputs
# a dictionary with the key "text" and as value a text prompt with the following format:
# ### USER: {question from example goes here}
# ### ASSISTANT: {answer from example goes here}


#--- ADD YOUR SOLUTION HERE (10 points)---
def formatting_func(example):
    formatted_text = f"### USER: {example['question']}\n### ASSISTANT: {example['answer']}"
    return {"text": formatted_text}
#----------------------------------------


In [8]:
# apply formatting function to data set
formatted_dataset = sutd_qa_dataset.map(formatting_func)

Map: 100%|██████████| 320/320 [00:00<00:00, 18133.12 examples/s]
Map: 100%|██████████| 80/80 [00:00<00:00, 12154.32 examples/s]


In [9]:
# check formatted prompt
formatted_dataset["train"]["text"][0]

# Note: you should see something like this (not necessarily the same prompt, but the same format)
# '### USER: What are some of the best places to eat near the SUTD campus?\n### ASSISTANT: There are several great dining options near the SUTD campus.
# One popular spot is the Changi Business Park Food Court, ...


'### USER: What are the core academic programs offered at SUTD?\n\n### ASSISTANT: Based on the context provided, the core academic programs offered at the Singapore University of Technology and Design (SUTD) are:\n\n1) Undergraduate programs, which are not explicitly listed but implied by mentions of "Transition Into SUTD" and "Integrated Learning Programme".\n\n2) Master\'s programs, including Master of Architecture, Master of Engineering (Research), Master of Innovation by Design, Master of Science in Security by Design, Master of Science in Urban Science, Policy and Planning, MSc in Technology and Design, and MTD (AI Empowered Built Environment).\n\n3) A Dual Master\'s program in Nano-Electronic Engineering and Design, offered in collaboration with CGU (likely referring to Claremont Graduate University).'
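
The formatted prompt above contains a blank line before `### ASSISTANT:` because the stored question already ends with `\n`. When querying the finetuned model later, the prompt should follow the same layout with the answer left empty. The sketch below illustrates this; `build_inference_prompt` is a hypothetical helper and the example question-answer pair is made up for illustration.

```python
def formatting_func(example):
    # Same template as the training cell above.
    text = f"### USER: {example['question']}\n### ASSISTANT: {example['answer']}"
    return {"text": text}


def build_inference_prompt(question):
    """Hypothetical helper: the training template with the answer left empty."""
    return f"### USER: {question}\n### ASSISTANT:"


example = {"question": "What is SUTD?\n", "answer": "A university in Singapore."}
prompt = build_inference_prompt(example["question"])

# The inference prompt is a prefix of the training text, so the model sees
# the same layout (including the blank line caused by the trailing "\n").
print(formatting_func(example)["text"].startswith(prompt))
```

If the trailing newline is stripped from the questions before training, it should be stripped at inference time as well, so that both prompts stay consistent.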

In [10]:
# model id of base model
model_id = "NousResearch/Llama-2-7b-hf"

# model id for our finetuned model
new_model = "llama-7b-qlora-sutd-qa-sonnet-mistral"

# config for model quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,  # nested (double) quantization disabled
)

# Load the entire model on the GPU 0
device_map = {"": 0}
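
To see why 4-bit loading matters on a small GPU, a rough weights-only estimate can help. The sketch below assumes an approximate parameter count of 6.74 billion for Llama-2 7B (an assumption, not a value taken from this notebook) and ignores quantization constants, layers that stay in higher precision, activations, and optimizer state.

```python
APPROX_PARAMS = 6.74e9  # assumed approximate parameter count for Llama-2 7B


def weight_memory_gb(num_params, bits_per_param):
    """Weights-only footprint in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


fp16_gb = weight_memory_gb(APPROX_PARAMS, 16)
four_bit_gb = weight_memory_gb(APPROX_PARAMS, 4)
print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {four_bit_gb:.1f} GB")
```

Under these assumptions the weights shrink from roughly 13.5 GB to roughly 3.4 GB, which leaves room for activations and the LoRA optimizer state on a single consumer GPU. Actual usage will be somewhat higher.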


In [11]:
# Load model

# QUESTION: load the base LLM into a variable 'model' using the HF AutoModelForCausalLM class with the given quantization. Load all weights to the GPU

#--- ADD YOUR SOLUTION HERE (10 points)---
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map=device_map,  # {"": 0} places all weights on GPU 0
)
#------------------------------------------

Downloading shards: 100%|██████████| 2/2 [03:12<00:00, 96.24s/it] 
Loading checkpoint shards: 100%|██████████| 2/2 [00:26<00:00, 13.38s/it]


In [12]:
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

In [13]:
# Apply lora configuration
lora_config = LoraConfig(
    lora_alpha=8,
    r=8,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)
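
The number of weights that LoRA actually trains with this configuration can be estimated by hand. The sketch below assumes that adapters are attached only to the `q_proj` and `v_proj` projections (a common default for Llama models when `target_modules` is not set, but version-dependent), a hidden size of 4096, and 32 decoder layers. These architecture values are assumptions and are not read from the loaded model.

```python
def lora_param_count(in_features, out_features, r):
    """LoRA adds A with shape (r, in_features) and B with shape (out_features, r)."""
    return r * in_features + out_features * r


HIDDEN_SIZE = 4096      # assumed Llama-2 7B hidden size
NUM_LAYERS = 32         # assumed number of decoder layers
TARGETS_PER_LAYER = 2   # assumed targets: q_proj and v_proj
R, LORA_ALPHA = 8, 8    # values from lora_config above

per_module = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, R)
trainable = per_module * TARGETS_PER_LAYER * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update
print(trainable, scaling)
```

Under these assumptions only about 4.2 million parameters are trained, a small fraction of the base model, and the low-rank update is applied with a scaling factor of 1.0.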

In [14]:

# QUESTION: Now it is time to configure the training parameters.
# To make it easier for you, the list of parameters is given.
# Find reasonable values for the parameters, at least something that makes the training run without crashing.
# You can refer to the lab exercises and to open source examples on the internet

# list of parameters, some with pre-set values, others you need to set yourself:
# output_dir = "./results"
# per_device_train_batch_size
# gradient_accumulation_steps
# optim
# save_steps = 10
# logging_steps = 10
# learning_rate
# weight_decay
# max_grad_norm
# num_train_epochs
# warmup_ratio
# lr_scheduler_type

# Arguments in SFTTrainer Class
# packing
# max_seq_length

#--- ADD YOUR SOLUTION HERE (20 points)---
training_arguments = TrainingArguments(
    report_to="none",
    output_dir="./results",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    optim="adamw_bnb_8bit",
    save_steps=10,
    logging_steps=10,
    learning_rate=2e-4,
    weight_decay=0.001,
    max_grad_norm=0.3,
    num_train_epochs=1,
    warmup_ratio=0.03,
    lr_scheduler_type="constant_with_warmup",
    # group_by_length=True,
    # max_steps=-1,
    # fp16=False,
    # bf16=False,
)

packing=False
# print(tokenizer.model_max_length)
# 512 is used here; it can be increased to 1024 or decreased to 256
# (alternative: min(tokenizer.model_max_length, 1024))
max_seq_length = 512

#------------------------------------------

In [15]:
# configure trainer

trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=formatted_dataset["train"],
    eval_dataset=formatted_dataset["test"],
    peft_config=lora_config,
    dataset_text_field="text",
    packing=packing,
    tokenizer=tokenizer,
    max_seq_length=max_seq_length,
)

Map: 100%|██████████| 320/320 [00:00<00:00, 5211.85 examples/s]
Map: 100%|██████████| 80/80 [00:00<00:00, 5409.65 examples/s]


In [16]:
# now finetune the model!
trainer.train()

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|------|---------------|
| 10 | 2.1187 |
| 20 | 1.8717 |
| 30 | 1.387 |
| 40 | 1.2772 |
| 50 | 1.1617 |
| 60 | 1.1609 |
| 70 | 1.1229 |
| 80 | 1.0723 |
| 90 | 1.124 |
| 100 | 1.0332 |


TrainOutput(global_step=160, training_loss=1.25071479678154, metrics={'train_runtime': 174.9705, 'train_samples_per_second': 1.829, 'train_steps_per_second': 0.914, 'total_flos': 1949266501877760.0, 'train_loss': 1.25071479678154, 'epoch': 1.0})
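
The `global_step=160` in the output can be reproduced from the training arguments. The sketch below is a rough reconstruction that assumes a single GPU and that the warmup step count is rounded up; it uses the values from the cells above.

```python
import math

num_train_examples = 320
per_device_train_batch_size = 2
gradient_accumulation_steps = 1
num_train_epochs = 1
warmup_ratio = 0.03

# One optimizer step consumes this many examples (single GPU assumed).
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_train_examples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
print(total_steps, warmup_steps)
```

With these settings the learning rate ramps up over the first handful of steps and then stays constant. Since `save_steps=10`, a checkpoint is also written to `./results` every 10 steps, which can take a noticeable amount of disk space.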

In [17]:
# Save trained model
trainer.model.save_pretrained(new_model)

In [18]:
# Evaluate on the test set and return the metrics
trainer.evaluate()

{'eval_loss': 0.9711307287216187,
 'eval_runtime': 14.5127,
 'eval_samples_per_second': 5.512,
 'eval_steps_per_second': 0.689,
 'epoch': 1.0}
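
The evaluation loss is easier to interpret as a perplexity. The sketch below assumes that `eval_loss` is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training.

```python
import math

eval_loss = 0.9711307287216187  # value returned by trainer.evaluate() above
perplexity = math.exp(eval_loss)
print(f"perplexity: {perplexity:.2f}")
```

This gives a perplexity of about 2.64 on the held-out question-answer pairs. Note that the reported `training_loss` of 1.25 is an average over the whole epoch, including the high-loss early steps, so it is not directly comparable to the evaluation loss measured at the end of training.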

In [19]:
# Empty VRAM
# Note: this does not unload everything from the GPU; maybe you can find a way to fix this
# As a workaround, you can restart your kernel to clear the GPU, then run the cells below
# https://stackoverflow.com/questions/69357881/how-to-remove-the-model-of-transformers-in-gpu-memory

import gc

del trainer
del model
del tokenizer

gc.collect()
torch.cuda.empty_cache()
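
GPU memory is only returned once no Python object references the tensors any more. Leftover references (for example in the notebook's output history) and the CUDA context itself may keep some memory in use. The sketch below is a hypothetical helper, not a guaranteed fix: it runs garbage collection, clears the CUDA cache, and reports how many bytes are still allocated, so it is easy to check whether the cleanup worked before reloading the model.

```python
import gc


def report_and_free_gpu_memory():
    """Collect garbage, clear the CUDA cache, and report what is still allocated.

    Returns the number of allocated bytes on the current device,
    or None if torch or CUDA is not available.
    """
    gc.collect()
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    torch.cuda.empty_cache()
    return torch.cuda.memory_allocated()


remaining = report_and_free_gpu_memory()
print("bytes still allocated:", remaining)
```

The helper cannot delete variables in the notebook namespace, so the `del` statements above are still needed before calling it. If a large amount remains allocated, restarting the kernel is the most reliable option.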

In [20]:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer
from peft import get_peft_model, LoraConfig, PeftModel
import transformers
import torch


# model id of base model (repeat in case of kernel restart)
model_id = "NousResearch/Llama-2-7b-hf"

# model id for our finetuned model (repeat in case of kernel restart)
new_model = "llama-7b-qlora-sutd-qa-sonnet-mistral"

# Load the entire model on the GPU 0 (repeat in case of kernel restart)
device_map = {"": 0}


# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map
)

model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00,  3.56s/it]
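
Merging folds the low-rank update into the base weights, so the merged model needs no adapter at inference time. For a single linear layer the idea is that `W x + s * B (A x)` equals `(W + s * B A) x` with `s = lora_alpha / r`. The toy example below checks this in plain Python; the matrices and sizes are made up for illustration and the helper functions are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(w, delta, scale):
    """Return w + scale * delta, element by element."""
    return [[wi + scale * di for wi, di in zip(wr, dr)] for wr, dr in zip(w, delta)]


# Toy sizes: a 2x2 weight matrix and a rank-1 adapter.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[1.0, -1.0]]      # shape (r, in_features)
B = [[2.0], [0.0]]     # shape (out_features, r)
lora_alpha, r = 2, 1
scale = lora_alpha / r

x = [[1.0], [3.0]]     # input as a column vector

# Adapter kept separate: y = W x + scale * B (A x)
separate = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scale)

# Adapter merged: W' = W + scale * B A, then y = W' x
W_merged = add_scaled(W, matmul(B, A), scale)
merged = matmul(W_merged, x)

print(separate == merged)
```

Both paths give the same output, which is why the merged model can be saved and pushed as an ordinary `LlamaForCausalLM` checkpoint.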


In [21]:
from huggingface_hub import login

# Log in to Hugging Face with your access token (a token with write permission is needed to push)
# https://huggingface.co/docs/hub/en/security-tokens

hf_access_token = "YOUR_HF_ACCESS"
login(token=hf_access_token)

Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /home/jovyan/.cache/huggingface/token
Login successful
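
Hard-coding an access token in a notebook makes it easy to leak when the notebook is shared. One possible alternative, sketched below, reads the token from an environment variable. The helper `resolve_hf_token` is hypothetical, and the variable name `HF_TOKEN` is an assumption of this sketch.

```python
import os


def resolve_hf_token(env=os.environ, var_name="HF_TOKEN"):
    """Return the token stored in `var_name`, or raise if it is missing."""
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return token


# Demonstration with a fake environment; in the notebook one would call
# login(token=resolve_hf_token()) instead of hard-coding the token.
demo_token = resolve_hf_token(env={"HF_TOKEN": "hf_example_token"})
print(demo_token)
```

Failing early with a clear error is preferable to discovering a missing token only when the upload starts.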


In [22]:
# Saved at https://huggingface.co/jtz18/llama-7b-qlora-sutd-qa-sonnet-mistral/

# push finetuned model to huggingface
model.push_to_hub(new_model, use_temp_dir=False)


Thrown during validation:
`do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.

CommitInfo(commit_url='https://huggingface.co/jtz18/llama-7b-qlora-sutd-qa-sonnet-mistral/commit/d364cf0d14d891b35a430b7737e8cf10edff1e74', commit_message='Upload LlamaForCausalLM', commit_description='', oid='d364cf0d14d891b35a430b7737e8cf10edff1e74', pr_url=None, pr_revision=None, pr_num=None)

In [23]:
# Saved at https://huggingface.co/jtz18/llama-7b-qlora-sutd-qa-sonnet-mistral/

# push tokenizer to huggingface
tokenizer.push_to_hub(new_model, use_temp_dir=False)

CommitInfo(commit_url='https://huggingface.co/jtz18/llama-7b-qlora-sutd-qa-sonnet-mistral/commit/19aee4d542f3824732c369374ee327517d2b5297', commit_message='Upload tokenizer', commit_description='', oid='19aee4d542f3824732c369374ee327517d2b5297', pr_url=None, pr_revision=None, pr_num=None)

### This concludes the second part of the assignment. Continue with the next part.