# Leveraging Gen AI for SAT Prep - Fine-Tuning

This notebook is adapted, with a few modifications, from the DataCamp tutorial at https://www.datacamp.com/tutorial/fine-tuning-llama-2. It shows how to fine-tune a Llama 3 8B Instruct model with QLoRA on word-definition pairs.

In [None]:
%%capture
%pip install -U transformers 
%pip install -U datasets 
%pip install -U accelerate 
%pip install -U peft 
%pip install -U trl 
%pip install -U bitsandbytes 
%pip install -U wandb

In [2]:
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import (
    LoraConfig,
    PeftModel,
    prepare_model_for_kbit_training,
    get_peft_model,
)
import os, torch, wandb
from datasets import load_dataset
from trl import SFTTrainer, setup_chat_format



In [3]:

wb_token = ""

wandb.login(key=wb_token)
run = wandb.init(
    project='Fine-tune Llama 3 8B on word meanings', 
    job_type="training", 
    anonymous="allow"
)

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: (1) Private W&B dashboard, no account required
wandb: (2) Use an existing W&B account
wandb: Enter your choice:  2
wandb: You chose 'Use an existing W&B account'
wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:
  ········
wandb: Appending key for api.wandb.ai to your netrc file: /home/ubuntu/.netrc
wandb: Currently logged in as: ramamoorthy-thamman (ramamoorthy-thamman-test) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.

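The login cell above leaves `wb_token` empty and falls back to an interactive prompt. A minimal sketch of reading the key from the environment instead, assuming it has been exported as `WANDB_API_KEY`; the helper name `read_wandb_token` is hypothetical and not part of the wandb API:

```python
import os


def read_wandb_token(env_var: str = "WANDB_API_KEY") -> str:
    """Return the W&B API key from the environment, or raise if it is unset."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} before calling wandb.login()")
    return token
```

The returned value could then be passed as `wandb.login(key=read_wandb_token())`, which keeps the key out of the notebook file.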

In [37]:
from datasets import Dataset
import pandas as pd
from random import randrange, sample
test_cases_df = pd.read_csv('eval_word_genre.csv')
def gen():
    for index, row in test_cases_df.iterrows():
        word = row['word'].lower()
        definition = row['definition'].lower()
        yield {"instruction": "The meaning of the word {}".format(word), "text": "the meaning is {}".format(definition)}
dataset = Dataset.from_generator(gen)

Generating train split: 100 examples [00:00, 12875.84 examples/s]
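The generator above turns each CSV row into an `instruction`/`text` pair. The sketch below reproduces that mapping with the standard library only, so the record shape can be checked without loading `datasets`. It uses a single in-memory row based on the "impasse" sample printed later in this notebook; the capitalised word, the padded definition, and the added `strip()` calls are assumptions for illustration.

```python
import csv
import io

# One illustrative row; the padding around the definition is deliberate,
# to show what strip() removes before the text is formatted.
SAMPLE_CSV = (
    "word,definition\n"
    'Impasse," a situation in which no progress is possible; a deadlock. "\n'
)


def rows_to_records(csv_text):
    """Yield instruction/text pairs in the same shape as gen()."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        word = row["word"].strip().lower()
        definition = row["definition"].strip().lower()
        yield {
            "instruction": f"The meaning of the word {word}",
            "text": f"the meaning is {definition}",
        }


records = list(rows_to_records(SAMPLE_CSV))
print(records[0])
```

Stripping whitespace avoids the double space and trailing space that would otherwise end up in the training text.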


In [38]:
print(f"Dataset size: {len(dataset)}")
print(dataset[randrange(len(dataset))])

# Draw N random row indices (N = 100 here, which is the full dataset)
n_samples = sample(range(len(dataset)), k=100)
print(f"First 5 samples: {n_samples[:5]}")
dataset_temp = dataset.select(n_samples)
print(f"Reduced dataset size: {len(dataset_temp)}")

Dataset size: 100
{'instruction': 'The meaning of the word impasse', 'text': 'the meaning is  a situation in which no progress is possible; a deadlock. '}
First 5 samples: [83, 3, 36, 62, 40]
Reduced dataset size: 100


In [39]:
dataset = dataset.train_test_split(test_size=0.3)
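With 100 rows and `test_size=0.3`, the split yields 70 training and 30 evaluation examples. The call above passes no seed, so the rows in each split change between runs. A stdlib-only sketch of a seeded split follows; the helper `split_indices` is hypothetical, and its use of `round()` is an assumption that may differ from the library's own rounding when the size does not divide evenly.

```python
import random


def split_indices(n_rows, test_size=0.3, seed=42):
    """Shuffle row indices with a fixed seed and cut them into train/test."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(100)
print(len(train_idx), len(test_idx))
```

`Dataset.train_test_split` also accepts a `seed` argument, which is the simpler route to a reproducible split when staying inside `datasets`.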

In [40]:
def format_instruction(sample):
    # The generator above stores the response under "text"; there is no "output" key.
    return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{sample['instruction']}

### Response:
{sample['text']}
"""

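To see what the template produces, the sketch below renders it for the "impasse" record printed earlier in this notebook. It assumes the response lives under the `text` key, as emitted by the generator; `render_prompt` and `PROMPT_TEMPLATE` are hypothetical names. Note that the trainer cell in this notebook does not pass a formatting function, so this prompt is not applied automatically.

```python
PROMPT_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{response}
"""


def render_prompt(sample, response_key="text"):
    """Fill the template from one dataset record."""
    return PROMPT_TEMPLATE.format(
        instruction=sample["instruction"].strip(),
        response=sample[response_key].strip(),
    )


# Record copied from the sample printed earlier in this notebook.
sample = {
    "instruction": "The meaning of the word impasse",
    "text": "the meaning is  a situation in which no progress is possible; a deadlock. ",
}
prompt = render_prompt(sample)
print(prompt)
```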
In [12]:
torch_dtype = torch.float16
attn_implementation = "eager"

model_id="meta-llama/Meta-Llama-3-8B-Instruct"
device = "cuda:0" if torch.cuda.is_available() else "cpu"
cache_dir="/home/ubuntu/Pragyan/model_cache"
new_model = "llama-3-8b-finetuned"

In [7]:
# QLoRA config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
    bnb_4bit_use_double_quant=True,
)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    cache_dir = cache_dir,
    attn_implementation=attn_implementation
)

Loading checkpoint shards: 100%|██████████| 4/4 [00:11<00:00,  2.91s/it]
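Loading in 4-bit is what makes an 8B model fit on a single modest GPU. The back-of-the-envelope sketch below compares weight storage at 16 and 4 bits per parameter. It assumes a rounded 8 billion parameters and ignores quantization constants, any layers left unquantized, activations, and optimizer state, so real usage will be somewhat higher.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate weight storage in GiB for a given precision."""
    return n_params * bits_per_param / 8 / 1024**3


N_PARAMS = 8_000_000_000  # rounded parameter count (assumption)

fp16_gib = weight_memory_gib(N_PARAMS, 16)
nf4_gib = weight_memory_gib(N_PARAMS, 4)
print(f"fp16: {fp16_gib:.1f} GiB, 4-bit: {nf4_gib:.1f} GiB")
```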


In [9]:
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)

In [10]:
# LoRA config
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj']
)
model = get_peft_model(model, peft_config)
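For a sense of how small the trainable part is, the sketch below counts the LoRA parameters implied by `r=16` over the seven target modules. Each adapted linear layer of shape `(d_in, d_out)` gains two matrices with `r * (d_in + d_out)` weights in total. The layer shapes are assumptions taken from the published Llama 3 8B configuration, not read from the loaded model.

```python
# Assumed Llama 3 8B shapes (from the published model config, not this notebook).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024  # 8 key/value heads x 128 head dimension
N_LAYERS = 32

# (input features, output features) for each targeted projection.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, r, n_layers):
    """Count adapter weights: r * (d_in + d_out) per module, per layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * n_layers


trainable = lora_param_count(MODULE_SHAPES, r=16, n_layers=N_LAYERS)
scaling = 32 / 16  # lora_alpha / r, the factor applied to the adapter output
print(trainable, scaling)
```

Under these assumptions the adapters add roughly 42 million weights, well under one percent of the base model. On the real model, `model.print_trainable_parameters()` reports the exact figure.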

In [13]:
training_arguments = TrainingArguments(
    output_dir=new_model,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,
    optim="paged_adamw_32bit",
    num_train_epochs=1,
    evaluation_strategy="steps",
    eval_steps=0.2,
    logging_steps=1,
    warmup_steps=10,
    logging_strategy="steps",
    learning_rate=2e-4,
    fp16=False,
    bf16=False,
    group_by_length=True,
    report_to="wandb"
)
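The step counts follow directly from these arguments: 70 training examples at batch size 1 with 2 accumulation steps give 35 optimizer steps per epoch, and `eval_steps=0.2` is read as a fraction of the total, so evaluation runs every 7 steps. A small sketch of that arithmetic, assuming a single device; the helper `training_schedule` is hypothetical:

```python
import math


def training_schedule(n_train, per_device_batch, grad_accum, epochs, eval_steps):
    """Return (total optimizer steps, eval interval, steps at which eval runs)."""
    micro_batches = math.ceil(n_train / per_device_batch)  # batches per epoch
    steps_per_epoch = max(micro_batches // grad_accum, 1)
    total_steps = steps_per_epoch * epochs
    # A float below 1 is treated as a fraction of total steps, rounded up.
    if eval_steps < 1:
        interval = math.ceil(total_steps * eval_steps)
    else:
        interval = int(eval_steps)
    eval_at = list(range(interval, total_steps + 1, interval))
    return total_steps, interval, eval_at


total_steps, interval, eval_at = training_schedule(70, 1, 2, 1, 0.2)
print(total_steps, interval, eval_at)
```

The same numbers show up in the training log: five evaluations at steps 7 through 35 and `global_step=35`. With `warmup_steps=10`, more than a quarter of this short run is spent warming up the learning rate.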



In [44]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    peft_config=peft_config,
    tokenizer=tokenizer,
    args=training_arguments
)

Map: 100%|██████████| 70/70 [00:00<00:00, 7529.40 examples/s]
Map: 100%|██████████| 30/30 [00:00<00:00, 5133.16 examples/s]


In [46]:
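# Llama 3 ships without a pad token. Reusing EOS avoids adding a new '[PAD]' id,
# which would require model.resize_token_embeddings(len(tokenizer)).
tokenizer.pad_token = tokenizer.eos_token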
trainer.train()

Step,Training Loss,Validation Loss
7,3.098,2.582612
14,1.5021,1.835764
21,1.5071,1.648946
28,2.2118,1.601201
35,1.5717,1.53913


Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.


TrainOutput(global_step=35, training_loss=2.0852995361600604, metrics={'train_runtime': 35.9536, 'train_samples_per_second': 1.947, 'train_steps_per_second': 0.973, 'total_flos': 51077200674816.0, 'train_loss': 2.0852995361600604, 'epoch': 1.0})
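Cross-entropy loss is easier to interpret as perplexity, `exp(loss)`. The sketch below converts the validation losses from the table above and rechecks the reported throughput from the runtime. The training-set size of 70 comes from the 70/30 split; nothing else is assumed.

```python
import math

# Validation loss by step, copied from the training log above.
eval_loss = {7: 2.582612, 14: 1.835764, 21: 1.648946, 28: 1.601201, 35: 1.53913}

# Perplexity is the exponential of the mean token-level cross-entropy.
perplexity = {step: math.exp(loss) for step, loss in eval_loss.items()}
for step, ppl in perplexity.items():
    print(f"step {step:>2}: loss {eval_loss[step]:.3f} -> perplexity {ppl:.2f}")

# Throughput recomputed from train_runtime, global_step and the 70 training rows.
train_runtime = 35.9536
steps_per_second = round(35 / train_runtime, 3)
samples_per_second = round(70 / train_runtime, 3)
print(steps_per_second, samples_per_second)
```

Validation loss falls at every checkpoint, but the drops get smaller, and a 30-example evaluation set after one epoch is too small to say much about generalization.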

In [47]:
wandb.finish()
model.config.use_cache = True

Run history:
eval/loss,█▃▂▁▁
eval/runtime,█▁▄▇▆
eval/samples_per_second,▁█▅▂▃
eval/steps_per_second,▁█▅▂▃
train/epoch,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇████
train/global_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇████
train/grad_norm,▁▁▁▅▇█▇▅▄▄▃▆▄▄▅▃▅▃▂▂▄▃▃▂▂▂▂▃▁▂▂▇▁▂▂
train/learning_rate,▂▂▃▄▅▅▆▇▇██▇▇▇▇▆▆▆▅▅▅▅▄▄▄▄▃▃▃▂▂▂▂▁▁
train/loss,█▇▆██▆▆▅▄▃▂▃▂▂▃▂▃▂▁▄▂▃▂▂▃▂▂▄▂▂▂▂▂▄▂

Run summary:
eval/loss,1.53913
eval/runtime,2.874
eval/samples_per_second,10.439
eval/steps_per_second,10.439
total_flos,51077200674816.0
train/epoch,1.0
train/global_step,35.0
train/grad_norm,5.81907
train/learning_rate,0.0
train/loss,1.5717


In [50]:
# Prompt the fine-tuned model to use a vocabulary word in context.
prompt = "Generate a paragraph on Chinese pagodas with the word assiduous."

# Move the tokenized prompt to whichever device the model was loaded on.
inputs = tokenizer(prompt, return_tensors='pt', padding=True, 
                   truncation=True).to(model.device)

# max_length counts the prompt tokens plus the generated tokens.
outputs = model.generate(**inputs, max_length=150, 
                         num_return_sequences=1)

text = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(text)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Generate a paragraph on Chinese pagodas with the word assiduous.  The assiduous efforts of the ancient Chinese in constructing these magnificent structures have resulted in the creation of numerous pagodas that stand as testaments to their ingenuity.  The intricate carvings, ornate decorations, and sturdy architecture of these structures have captivated the imagination of many.  The pagodas, often towering above the surrounding landscape, are a symbol of the country's rich cultural heritage and its people's assiduous dedication to preserving their traditions.  From the majestic Temple of Heaven in Beijing to the ancient Shaolin Temple in Henan, Chinese pagodas have become iconic representations of the country's history and spiritual practices.  The
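The decoded text begins with the prompt itself, and it stops mid-sentence because `max_length=150` caps prompt and completion together. A small sketch for keeping only the completion; `strip_prompt` is a hypothetical helper, and it assumes the decoded text reproduces the prompt verbatim at the start:

```python
def strip_prompt(generated: str, prompt: str) -> str:
    """Drop the echoed prompt from the start of a decoded generation."""
    if generated.startswith(prompt):
        return generated[len(prompt):].lstrip()
    return generated


prompt = "Generate a paragraph on Chinese pagodas with the word assiduous."
generated = prompt + "  The assiduous efforts of the ancient Chinese"
completion = strip_prompt(generated, prompt)
print(completion)
```

Passing `max_new_tokens` to `generate` instead of `max_length` limits only the newly generated tokens, which makes the output length independent of the prompt length.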


In [51]:
trainer.model.save_pretrained(new_model)