# Assignment 4: Instruction finetuning a Llama-2 7B model - part 2
**Assignment due 19 April 11:59pm**

Welcome to the fourth assignment for 50.055 Machine Learning Operations. These assignments give you a chance to practice the methods and tools you have learned. 

**This assignment is a group assignment.**

- Read the instructions in this notebook carefully
- Add your solution code and answers in the appropriate places. The questions are marked as **QUESTION:**, the places where you need to add your code and text answers are marked as **ADD YOUR SOLUTION HERE**
- The completed notebook, including your added code and generated output will be your submission for the assignment.
- The notebook should execute without errors from start to finish when you select "Restart Kernel and Run All Cells...". Please test this before submission.
- Use the SUTD Education Cluster to solve and test the assignment.

**Rubric for assessment** 

Your submission will be graded using the following criteria. 
1. Code executes: your code should execute without errors. The SUTD Education cluster should be used to ensure the same execution environment.
2. Correctness: the code should produce the correct result or the text answer should state the factual correct answer.
3. Style: your code should be written in a way that is clean and efficient. Your text answers should be relevant, concise and easy to understand.
4. Partial marks will be awarded for partially correct solutions.
5. There is a maximum of 200 (80 + 40 + 80) points for this assignment.

**ChatGPT policy** 

If you use AI tools, such as ChatGPT, to solve the assignment questions, you need to be transparent about its use and mark AI-generated content as such. In particular, you should include the following in addition to your final answer:
- A copy or screenshot of the prompt you used
- The name of the AI model
- The AI generated output
- An explanation why the answer is correct or what you had to change to arrive at the correct answer

**Assignment Notes:** Please make sure to save the notebook as you go along. Submission Instructions are located at the bottom of the notebook.




### Step 2: finetune model
The second step of the assignment is to finetune an LLM using the synthetic question-answer pairs from step 1. 
We will use LoRA and Hugging Face. Finetuning on small GPUs can be tricky; you can use the code samples from the labs as a starting point. 
There are also many blogs and guides on the internet you can consult. 

In [1]:
# Installing required packages
!pip install -U -q peft==0.6.2 transformers==4.35.2 datasets==2.15.0 bitsandbytes==0.41.2.post2 trl==0.7.4 accelerate==0.24.1 scipy==1.12.0 wandb==0.16.5 coloredlogs==15.0.1

In [2]:
# Load required packages

from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments, pipeline
from datasets import load_dataset, Dataset
from peft import LoraConfig, PeftModel
from trl import SFTTrainer
import torch

import pickle




In [3]:
# load SUTD QA dataset from step 1
with open('sutd_qa_dataset.pkl', 'rb') as f:
    sutd_qa_dataset = pickle.load(f)

In [4]:
# split data into training and test set: the first 160 instances for training, the rest for test
sutd_qa_dataset = sutd_qa_dataset.train_test_split(train_size=160, shuffle=False)

In [5]:
# check schema and number of instances
sutd_qa_dataset

DatasetDict({
    train: Dataset({
        features: ['question', 'answer'],
        num_rows: 160
    })
    test: Dataset({
        features: ['question', 'answer'],
        num_rows: 40
    })
})
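With `shuffle=False`, `train_test_split` keeps the original row order, so the first 160 rows become the training split and the remaining 40 the test split. The stdlib sketch below mimics that behaviour on a plain list of dummy rows; the helper name `ordered_split` is hypothetical and not part of the `datasets` API.

```python
def ordered_split(rows, train_size):
    """Split a list into (train, test) without shuffling, keeping the original row order."""
    if not 0 < train_size < len(rows):
        raise ValueError("train_size must leave at least one row in each split")
    return rows[:train_size], rows[train_size:]


# 200 dummy QA pairs standing in for the pickled SUTD dataset (illustrative data only)
rows = [{"question": f"q{i}", "answer": f"a{i}"} for i in range(200)]
train, test = ordered_split(rows, train_size=160)

print(len(train), len(test))  # 160 40
print(test[0]["question"])    # q160
```

One consequence of an unshuffled split: if step 1 generated the questions grouped by topic, whole topics may end up only in the test split. Passing `shuffle=True` together with a fixed `seed` is the usual alternative when that matters.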

In [6]:
# inspect first instance
sutd_qa_dataset["train"][0]

{'question': 'How can I get involved in community engagement initiatives at SUTD?',
 'answer': ' As an ambassador-at-large at the MFA and a Professor at SUTD, you can collaborate with the Venture, Innovation, and Entrepreneurship (VIE) Office to engage in community initiatives. They provide support for alumni, students, researchers, and mid-career aspiring entrepreneurs to turn their ideas into reality. You can also participate in curated entrepreneurship programs, entrepreneurship capstone projects, incubation support, and mentorship opportunities. Additionally, you can explore collaborations with strategic partners, social venture building programs, and research commercialization initiatives to make a positive impact on the community.'}

In [7]:
# QUESTION: create a formatting function 'formatting_func' that takes an example from your QA dataset as input and outputs 
# a dictionary with the key "text" and as value a text prompt with the following format:
# ### USER: {question from example goes here}
# ### ASSISTANT: {answer from example goes here}


#--- ADD YOUR SOLUTION HERE (10 points)---
def formatting_func(example):
    formatted_text = f"### USER: {example['question']}\n### ASSISTANT: {example['answer']}"
    return {"text": formatted_text}


#----------------------------------------


In [8]:
# apply formatting function to data set
formatted_dataset = sutd_qa_dataset.map(formatting_func)

Map: 100%|██████████| 160/160 [00:00<00:00, 10173.56 examples/s]
Map: 100%|██████████| 40/40 [00:00<00:00, 6987.30 examples/s]


In [9]:
# check formatted prompt
formatted_dataset["train"]["text"][0]

# Note: you should see something like this (not necessarily the same prompt, but the same format)
# '### USER: What are some of the best places to eat near the SUTD campus?\n### ASSISTANT: There are several great dining options near the SUTD campus. 
# One popular spot is the Changi Business Park Food Court, ...


'### USER: How can I get involved in community engagement initiatives at SUTD?\n### ASSISTANT:  As an ambassador-at-large at the MFA and a Professor at SUTD, you can collaborate with the Venture, Innovation, and Entrepreneurship (VIE) Office to engage in community initiatives. They provide support for alumni, students, researchers, and mid-career aspiring entrepreneurs to turn their ideas into reality. You can also participate in curated entrepreneurship programs, entrepreneurship capstone projects, incubation support, and mentorship opportunities. Additionally, you can explore collaborations with strategic partners, social venture building programs, and research commercialization initiatives to make a positive impact on the community.'
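At inference time the model should see the same template with the answer left open. The answers in this dataset start with a space, which is why the formatted prompt above shows two spaces after `### ASSISTANT:`. Also, as far as we know the Llama tokenizer adds only a BOS token by default, and the pad token is set to the EOS token, so the finetuned model may not learn to stop after its answer. The sketch below (hypothetical helper names `build_prompt` and `extract_answer`) builds an inference prompt and cuts the generated text at the next `### USER:` turn.

```python
def build_prompt(question: str) -> str:
    """Inference prompt in the training template, leaving the answer for the model to generate."""
    return f"### USER: {question}\n### ASSISTANT:"


def extract_answer(generated: str) -> str:
    """Return the text after the first '### ASSISTANT:' marker, cut at any new '### USER:' turn."""
    _, sep, rest = generated.partition("### ASSISTANT:")
    if not sep:
        return ""  # marker missing: nothing to extract
    return rest.split("### USER:")[0].strip()


prompt = build_prompt("How can I get involved in community engagement initiatives at SUTD?")
# made-up model output that keeps going after its answer
fake_output = prompt + " An example answer.\n### USER: a follow-up the model invented"
print(extract_answer(fake_output))  # An example answer.
```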

In [10]:
# model id of base model
model_id = "NousResearch/Llama-2-7b-hf"

# model id for our finetuned model
new_model = "llama-7b-qlora-sutd-qa"

# config for model quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False  # nested (double) quantization of the quantization constants disabled
)

# Load the entire model onto GPU 0
device_map = {"": 0}
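The quantization config stores the frozen weights in 4-bit NF4 while computing in fp16, which is what lets a 7B model fit on a small GPU. The arithmetic below is a rough weights-only estimate; the parameter count of about 6.74B for Llama-2-7B is an assumption, and the constant overhead follows the QLoRA paper's accounting (blocks of 64 weights, optionally double-quantized, which is what `bnb_4bit_use_double_quant` toggles). Activations, LoRA weights, optimizer states and any modules left unquantized are ignored.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory to store n_params values at the given bit width, in GB (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 6.74e9  # approximate parameter count of Llama-2-7B (assumption)

fp16_gb = weight_memory_gb(N_PARAMS, 16)
nf4_gb = weight_memory_gb(N_PARAMS, 4)

# one fp32 absmax constant per block of 64 weights -> 0.5 extra bits per weight
plain_const_bits = 32 / 64
# double quantization: 8-bit constants per block plus one fp32 constant per 256 blocks
double_const_bits = 8 / 64 + 32 / (64 * 256)

print(f"fp16 weights: {fp16_gb:.2f} GB")  # fp16 weights: 13.48 GB
print(f"nf4 weights:  {nf4_gb:.2f} GB")   # nf4 weights:  3.37 GB
print(f"constants without / with double quant: "
      f"{weight_memory_gb(N_PARAMS, plain_const_bits):.3f} / "
      f"{weight_memory_gb(N_PARAMS, double_const_bits):.3f} GB")
```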


In [11]:
# Load model

# QUESTION: load the base LLM into a variable 'model' using the HF AutoModelForCausalLM class with the given quantization. Load all weights to the GPU

#--- ADD YOUR SOLUTION HERE (10 points)---
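model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,  # 4-bit NF4 weights, fp16 compute
    device_map=device_map,                    # place all weights on GPU 0
)
model.config.use_cache = False   # the KV cache is not needed during training
model.config.pretraining_tp = 1  # use the standard (non-sliced) linear-layer computation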


#------------------------------------------

Loading checkpoint shards: 100%|██████████| 2/2 [00:54<00:00, 27.49s/it]


In [12]:
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"



In [13]:
# LoRA configuration (applied to the model by SFTTrainer via peft_config)
lora_config = LoraConfig(
    lora_alpha=8,
    r=8,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)
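Since `lora_config` sets no `target_modules`, PEFT falls back to its default targets for Llama, which we assume to be the attention projections `q_proj` and `v_proj`. Each adapted layer gains two small matrices, A (r x in) and B (out x r), and the update is scaled by `lora_alpha / r`. The sketch counts the trainable parameters under the additional assumptions of hidden size 4096 and 32 decoder layers for Llama-2-7B; you can compare the result with `print_trainable_parameters()` on the PEFT-wrapped model.

```python
def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r


HIDDEN = 4096    # Llama-2-7B hidden size (assumption)
N_LAYERS = 32    # Llama-2-7B decoder layers (assumption)
R, ALPHA = 8, 8  # r and lora_alpha from lora_config above

# assumed targets: q_proj and v_proj, each a HIDDEN x HIDDEN projection per layer
per_layer = 2 * lora_params(HIDDEN, HIDDEN, R)
total = N_LAYERS * per_layer
scaling = ALPHA / R  # the low-rank update B @ A is multiplied by lora_alpha / r

print(f"trainable LoRA params: {total:,}")  # trainable LoRA params: 4,194,304
print(f"fraction of ~6.74B base params: {total / 6.74e9:.4%}")
print(f"scaling factor: {scaling}")         # scaling factor: 1.0
```

With `lora_alpha == r`, the update is added unscaled; raising `lora_alpha` relative to `r` strengthens the adapter's contribution without changing the parameter count.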

In [14]:

# QUESTION: Now it is time to configure the training parameters. 
# To make it easier for you, the list of parameters is given.
# Find reasonable values for the parameters, at least values that make the training run without crashing. 
# You can refer to the lab exercises and to open source examples on the internet

# list of parameters, some with pre-set values, others you need to set yourself:
# output_dir = "./results"
# per_device_train_batch_size  
# gradient_accumulation_steps 
# optim
# save_steps = 10
# logging_steps = 10
# learning_rate
# weight_decay
# max_grad_norm
# num_train_epochs
# warmup_ratio
# lr_scheduler_type
# packing
# max_seq_length

output_dir = "./results"
per_device_train_batch_size = 2
gradient_accumulation_steps = 1
optim = "paged_adamw_32bit"
save_steps = 10
logging_steps = 10
learning_rate = 2e-4
weight_decay = 0.001
max_grad_norm = 0.3
num_train_epochs = 1
warmup_ratio = 0.03
lr_scheduler_type = "cosine"
packing = False
max_seq_length = None



#--- ADD YOUR SOLUTION HERE (20 points)---
training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    weight_decay=weight_decay,
    max_grad_norm=max_grad_norm,
    num_train_epochs=num_train_epochs,
    warmup_ratio=warmup_ratio,
    lr_scheduler_type=lr_scheduler_type,
    report_to="none",  # disable W&B and other logging integrations
)
#------------------------------------------
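These settings determine how many optimizer steps the run takes and how the learning rate evolves. With 160 training examples and an effective batch of 2 x 1 x 1 (batch size x gradient accumulation x GPUs), one epoch is 80 steps, which matches `global_step=80` in the training output. The sketch below recomputes this and reproduces what we believe is the formula behind the `"cosine"` scheduler with warmup in `transformers` (linear warmup, then half-cosine decay to zero); the warmup step count uses `ceil(total_steps * warmup_ratio)`, and rounding details for gradient accumulation are ignored.

```python
import math


def cosine_lr_with_warmup(step, total_steps, warmup_steps, base_lr):
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


n_train = 160
batch_size, grad_accum, n_gpus, epochs = 2, 1, 1, 1
effective_batch = batch_size * grad_accum * n_gpus
total_steps = math.ceil(n_train / effective_batch) * epochs
warmup_steps = math.ceil(total_steps * 0.03)  # warmup_ratio = 0.03

print(effective_batch, total_steps, warmup_steps)  # 2 80 3
for step in (0, 1, 3, 40, 80):
    print(step, f"{cosine_lr_with_warmup(step, total_steps, warmup_steps, 2e-4):.2e}")
```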

In [15]:
# configure trainer 

trainer = SFTTrainer(
    model=model,
    args=training_arguments,
    train_dataset=formatted_dataset["train"],
    eval_dataset=formatted_dataset["test"],
    peft_config=lora_config,
    dataset_text_field="text",
    packing=packing,
    tokenizer=tokenizer,
    max_seq_length=max_seq_length,
)

Map: 100%|██████████| 160/160 [00:00<00:00, 6945.79 examples/s]
Map: 100%|██████████| 40/40 [00:00<00:00, 3921.10 examples/s]


In [16]:
# now finetune the model!
trainer.train()

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


Step  Training Loss
10    1.8319
20    1.7215
30    1.1617
40    1.0376
50    1.2272
60    1.1389
70    0.9956
80    0.9707


TrainOutput(global_step=80, training_loss=1.2606438517570495, metrics={'train_runtime': 81.8865, 'train_samples_per_second': 1.954, 'train_steps_per_second': 0.977, 'total_flos': 841701669519360.0, 'train_loss': 1.2606438517570495, 'epoch': 1.0})
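The reported `training_loss` is the average loss over all 80 steps. Because each logged value covers a window of 10 steps, the mean of the 8 logged losses should reproduce it up to the rounding of the logged numbers, and the throughput metrics follow directly from the runtime. The check below only uses numbers copied from the output above.

```python
# Training losses logged every 10 steps (copied from the table above)
logged = [1.8319, 1.7215, 1.1617, 1.0376, 1.2272, 1.1389, 0.9956, 0.9707]
train_runtime, n_samples, n_steps = 81.8865, 160, 80

mean_logged = sum(logged) / len(logged)
print(f"mean of logged losses: {mean_logged:.4f}")    # mean of logged losses: 1.2606
print(f"samples/s: {n_samples / train_runtime:.3f}")  # samples/s: 1.954
print(f"steps/s:   {n_steps / train_runtime:.3f}")    # steps/s:   0.977
```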

In [17]:
# Save trained model
trainer.model.save_pretrained(new_model)

In [18]:
# evaluate on the test split and return the metrics
trainer.evaluate()

{'eval_loss': 0.9710914492607117,
 'eval_runtime': 6.9998,
 'eval_samples_per_second': 5.714,
 'eval_steps_per_second': 0.714,
 'epoch': 1.0}
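The eval loss is a mean cross-entropy in nats, so exponentiating it gives an approximate perplexity on the 40 held-out QA pairs, which is often easier to compare across runs. It is approximate because the loss is averaged per batch rather than strictly per token.

```python
import math

eval_loss = 0.9710914492607117  # from trainer.evaluate() above
perplexity = math.exp(eval_loss)
print(f"eval perplexity approx. {perplexity:.2f}")  # eval perplexity approx. 2.64
```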

In [19]:
# Empty VRAM
# Note: this did not unload everything from the GPU; maybe you can find a way to fix this
# As a workaround, you can restart your kernel to clear the GPU, then run the cells below
# https://stackoverflow.com/questions/69357881/how-to-remove-the-model-of-transformers-in-gpu-memory

import gc

del trainer
del model
del tokenizer

gc.collect()
torch.cuda.empty_cache()

In [20]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch


# model id of base model (repeat in case of kernel restart)
model_id = "NousResearch/Llama-2-7b-hf"

# model id for our finetuned model (repeat in case of kernel restart)
new_model = "llama-7b-qlora-sutd-qa"

# Load the entire model onto GPU 0 (repeat in case of kernel restart)
device_map = {"": 0}


# Reload model in FP16 and merge it with LoRA weights
base_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map
)
    
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

Loading checkpoint shards: 100%|██████████| 2/2 [00:06<00:00,  3.38s/it]
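`merge_and_unload()` folds each adapter into its base weight, W' = W + (lora_alpha / r) * B @ A, so the merged model needs no PEFT wrapper at inference time. The toy sketch below checks that the merged weight gives the same output as the base path plus the scaled low-rank path; the matrices and the values r = 2, lora_alpha = 4 are made up for illustration. Since the adapter was trained against the 4-bit quantized weights, the model merged into fp16 weights should be close to, but not bit-identical with, the model evaluated during training.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b, scale=1.0):
    """Element-wise a + scale * b for matrices given as lists of rows."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


r, alpha = 2, 4                             # toy LoRA hyperparameters (not the notebook's)
scaling = alpha / r
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]     # frozen base weight, shape (out=2, in=3)
A = [[0.5, -0.5, 0.0], [0.0, 1.0, 1.0]]     # LoRA A, shape (r=2, in=3)
B = [[0.1, 0.0], [0.0, -0.2]]               # LoRA B, shape (out=2, r=2)
x = [[1.0], [2.0], [3.0]]                   # input as a column vector

# unmerged: base path plus scaled low-rank path, as during training
y_adapter = add(matmul(W, x), matmul(B, matmul(A, x)), scale=scaling)
# merged: fold the update into the weight once, then a single matmul
W_merged = add(W, matmul(B, A), scale=scaling)
y_merged = matmul(W_merged, x)
print(y_adapter, y_merged)
```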


In [21]:
from huggingface_hub import login

# log in to huggingface, you need to put your huggingface access token
# https://huggingface.co/docs/hub/en/security-tokens

hf_access_token = ""
login(token=hf_access_token)

Token has not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to /home/jovyan/.cache/huggingface/token
Login successful
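Hard-coding the access token in `hf_access_token` risks leaking it when the notebook is submitted or shared. A common alternative is to read it from an environment variable set outside the notebook. The helper `get_hf_token` and the default variable name `HF_TOKEN` below are our own choices (we believe recent `huggingface_hub` versions also read a variable of that name automatically, but check your installed version).

```python
import os


def get_hf_token(var_name: str = "HF_TOKEN") -> str:
    """Read a Hugging Face access token from an environment variable (hypothetical helper)."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable to your Hugging Face access token")
    return token


# Usage in the notebook (requires huggingface_hub):
# from huggingface_hub import login
# login(token=get_hf_token())
```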


In [22]:
# push finetuned model to huggingface
model.push_to_hub(new_model, use_temp_dir=False)



Thrown during validation:
`do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.



CommitInfo(commit_url='https://huggingface.co/anirhc/llama-7b-qlora-sutd-qa/commit/7f44f47d702c35c982423dfd2b429f18550c16b2', commit_message='Upload LlamaForCausalLM', commit_description='', oid='7f44f47d702c35c982423dfd2b429f18550c16b2', pr_url=None, pr_revision=None, pr_num=None)

In [23]:
# push tokenizer to huggingface
tokenizer.push_to_hub(new_model, use_temp_dir=False)

CommitInfo(commit_url='https://huggingface.co/anirhc/llama-7b-qlora-sutd-qa/commit/28ca48cd3f2ae62dda777b9c5b3e1294908a659f', commit_message='Upload tokenizer', commit_description='', oid='28ca48cd3f2ae62dda777b9c5b3e1294908a659f', pr_url=None, pr_revision=None, pr_num=None)

### This concludes the second part of the assignment. Continue with the next part.