<span style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">An Exception was encountered at '<a href="#papermill-error-cell">In [28]</a>'.</span>

This notebook is taken directly from https://github.com/tcapelle/llm_recipes/tree/main

# From Llama to Alpaca: Finetuning an LLM with Weights & Biases
In this notebook you will learn how to finetune a pretrained Llama model on an instruction dataset. We will use an updated version of the Alpaca dataset that uses GPT-4 generations instead of davinci-003 (GPT-3) ones to get an even better instruction dataset. More details are on the [official repo page](https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM#how-good-is-the-data).

> This notebook requires an A100/A10 GPU with at least 24GB of memory. You could tweak the parameters down and run on a T4, but it would take a very long time.

This notebook has a companion project and [report](https://wandb.me/alpaca).

In [1]:
# !pip install wandb transformers trl datasets "protobuf==3.20.3" evaluate

## With Huggingface TRL

Let's grab the Alpaca (GPT-4 curated instructions and outputs) dataset:

In [2]:
# !wget https://raw.githubusercontent.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM/main/data/alpaca_gpt4_data.json
# from uparse_benchmark import parse_benchmark
# from ..utilities.parse_benchmark import parse_benchmark
from utilities.parse_benchmark import parse_benchmark

benchmark = "MedQA"
benchmark_questions, benchmark_answers = parse_benchmark(benchmark)
print(benchmark_questions[0])
# print(benchmark_answers[0])

Loading Benchmark from MedQA-USMLE/US/train.jsonl
Benchmark contains 10178 questions, made up of 10178 with 5 options and 0 with non-5 options
A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. Which of the following is the best treatment for this patient?
 (A) Ampicillin
 (B) Ceftriaxone
 (C) Ciprofloxacin
 (D) Doxycycline
 (E) Nitrofurantoin


In [3]:
# import json

# dataset_file = "alpaca_gpt4_data.json"

# with open(dataset_file, "r") as f:
#     alpaca = json.load(f)

In [4]:
import wandb
wandb.init(project="biollama_ft", # the W&B project this run is logged to
           tags=["hf_sft"]) # tags for filtering runs in the W&B UI
# artifact = wandb.use_artifact('capecape/alpaca_ft/alpaca_gpt4_splitted:latest', type='dataset')
# artifact_dir = artifact.download()

[34m[1mwandb[0m: Currently logged in as: [33mnelectric[0m ([33mneelectric[0m). Use [1m`wandb login --relogin`[0m to force relogin


[34m[1mwandb[0m: wandb version 0.16.2 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


[34m[1mwandb[0m: Tracking run with wandb version 0.16.1


[34m[1mwandb[0m: Run data is saved locally in [35m[1m/home/service/BioLlama/wandb/run-20240123_001522-e1thks20[0m
[34m[1mwandb[0m: Run [1m`wandb offline`[0m to turn off syncing.


[34m[1mwandb[0m: Syncing run [33mblooming-sun-12[0m


[34m[1mwandb[0m: ⭐️ View project at [34m[4mhttps://wandb.ai/neelectric/biollama_ft[0m


[34m[1mwandb[0m: 🚀 View run at [34m[4mhttps://wandb.ai/neelectric/biollama_ft/runs/e1thks20[0m


In [5]:
import os
# print(artifact_dir)
artifact_dir = os.getcwd() + "/benchmarks/MedQA-USMLE/"
from datasets import load_dataset
medqa = load_dataset("json", data_dir=artifact_dir)



In [6]:
medqa

DatasetDict({
    train: Dataset({
        features: ['question', 'answer', 'options', 'meta_info', 'answer_idx'],
        num_rows: 10178
    })
    validation: Dataset({
        features: ['question', 'answer', 'options', 'meta_info', 'answer_idx'],
        num_rows: 1272
    })
    test: Dataset({
        features: ['question', 'answer', 'options', 'meta_info', 'answer_idx'],
        num_rows: 1273
    })
})

Let's select the training and evaluation splits and check their sizes.

In [7]:
train_dataset = medqa["train"]
eval_dataset = medqa["test"]
#print sizes
print(len(train_dataset))
print(len(eval_dataset))
# turn both of these into only half their size
# train_dataset = train_dataset.select(range(0, len(train_dataset)//2))
# eval_dataset = eval_dataset.select(range(0, len(eval_dataset)//2))

# print(len(train_dataset))
# print(len(eval_dataset))

10178
1273


In [8]:
def create_prompt(row):
    option_string = ""
    for option in row["options"].keys():
        option_string += "\n (" + option + ") " + row["options"][option]
    row["option_string"] = option_string
    return ("You are an excellently helpful AI assistant that answers biomedical questions. "
            "<QUESTION>{question} {option_string}</QUESTION>\n<ANSWER> ({answer_idx}) {answer}</ANSWER>").format_map(row)
create_prompt(train_dataset[0])

'You are an excellently helpful AI assistant that answers biomedical questions. <QUESTION>A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. Which of the following is the best treatment for this patient? \n (A) Ampicillin\n (B) Ceftriaxone\n (C) Ciprofloxacin\n (D) Doxycycline\n (E) Nitrofurantoin</QUESTION>\n<ANSWER> (E) Nitrofurantoin</ANSWER>'
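The prompt template above can also be built without writing `option_string` back into the row. The following is only a sketch: `PROMPT_PREFIX`, `format_options` and `build_prompt` are hypothetical names that do not appear in the notebook, and the row is a shortened, made-up record with the same fields as a MedQA example.

```python
PROMPT_PREFIX = "You are an excellently helpful AI assistant that answers biomedical questions. "

def format_options(options):
    # One "\n (A) text" entry per option, in the dict's insertion order.
    return "".join(f"\n ({key}) {text}" for key, text in options.items())

def build_prompt(row, with_answer=True):
    # Hypothetical helper: same template as create_prompt, but `row` is left untouched.
    prompt = (
        f"{PROMPT_PREFIX}<QUESTION>{row['question']} "
        f"{format_options(row['options'])}</QUESTION>\n<ANSWER> "
    )
    if with_answer:
        prompt += f"({row['answer_idx']}) {row['answer']}</ANSWER>"
    return prompt

# Shortened, made-up row with the same fields as a MedQA record.
row = {
    "question": "Which of the following is the best treatment for this patient?",
    "options": {"A": "Ampicillin", "E": "Nitrofurantoin"},
    "answer_idx": "E",
    "answer": "Nitrofurantoin",
}
print(build_prompt(row))
print(build_prompt(row, with_answer=False))
```

Using one function with a `with_answer` flag keeps the training and inference prompts in sync, since both come from the same template.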

In [9]:
def return_prompt_no_answer(row):
    option_string = ""
    for option in row["options"].keys():
        option_string += "\n (" + option + ") " + row["options"][option]
    row["option_string"] = option_string
    return ("You are an excellently helpful AI assistant that answers biomedical questions. "
            "<QUESTION>{question} {option_string}</QUESTION>\n<ANSWER> ").format_map(row)

def create_prompt_no_answer(row):
    return {"text": return_prompt_no_answer(row)}
    
test_dataset = eval_dataset.map(create_prompt_no_answer)
test_dataset[0]["text"]

'You are an excellently helpful AI assistant that answers biomedical questions. <QUESTION>A junior orthopaedic surgery resident is completing a carpal tunnel repair with the department chairman as the attending physician. During the case, the resident inadvertently cuts a flexor tendon. The tendon is repaired without complication. The attending tells the resident that the patient will do fine, and there is no need to report this minor complication that will not harm the patient, as he does not want to make the patient worry unnecessarily. He tells the resident to leave this complication out of the operative report. Which of the following is the correct next action for the resident to take? \n (A) Disclose the error to the patient but leave it out of the operative report\n (B) Disclose the error to the patient and put it in the operative report\n (C) Tell the attending that he cannot fail to disclose this mistake\n (D) Report the physician to the ethics committee\n (E) Refuse to dictate

In [10]:
import torch
# from cti.transformers.transformers.src.transformers.models.auto import AutoModelForCausalLM, AutoTokenizer
from transformers import AutoModelForCausalLM, AutoTokenizer

In [11]:
model_id = 'meta-llama/Llama-2-7b-hf'

In [12]:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    torch_dtype=torch.bfloat16,
    # load_in_4bit = True, # this causes "RuntimeError: only Tensors of floating point and complex dtype can require gradients"
)


Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.28s/it]




Training the full model is expensive; if you have a GPU that can fit full finetuning, you can skip this part. Here we freeze everything except the LM head and a single decoder layer (index 15 of the 32 in Llama2-7B), which leaves about 333M trainable parameters.

In [13]:
n_freeze = 15

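# freeze all parameters (disable gradients)
for param in model.parameters(): param.requires_grad = False
# then unfreeze the LM head and the single decoder layer at index n_freeze
# (model.model.layers[n_freeze:] would instead unfreeze every layer from n_freeze onward)
for param in model.lm_head.parameters(): param.requires_grad = True
for param in model.model.layers[n_freeze].parameters(): param.requires_grad = True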

In [14]:
# Just freeze embeddings for small memory decrease
model.model.embed_tokens.weight.requires_grad_(False);

In [15]:
def param_count(m):
    params = sum([p.numel() for p in m.parameters()])/1_000_000
    trainable_params = sum([p.numel() for p in m.parameters() if p.requires_grad])/1_000_000
    print(f"Total params: {params:.2f}M, Trainable: {trainable_params:.2f}M")
    return params, trainable_params

params, trainable_params = param_count(model)

Total params: 6738.42M, Trainable: 333.46M
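As a sanity check, the counts printed above can be reproduced by hand. The sketch below assumes the published Llama-2-7B dimensions (hidden size 4096, MLP size 11008, vocabulary 32000, 32 layers, no bias terms). Under those assumptions, the trainable figure is consistent with the LM head plus exactly one decoder layer being unfrozen.

```python
# Llama-2-7B dimensions (assumed from the published model config).
hidden = 4096
intermediate = 11008
vocab = 32000
n_layers = 32

attention = 4 * hidden * hidden      # q, k, v and o projections (no biases)
mlp = 3 * hidden * intermediate      # gate, up and down projections
norms = 2 * hidden                   # two RMSNorm weight vectors per layer
one_layer = attention + mlp + norms

embeddings = vocab * hidden          # input token embeddings
lm_head = vocab * hidden             # output projection, not tied to the embeddings
final_norm = hidden

total = embeddings + n_layers * one_layer + final_norm + lm_head
trainable = one_layer + lm_head      # what the freezing cell leaves trainable

print(f"one decoder layer: {one_layer / 1_000_000:.2f}M")
print(f"total:             {total / 1_000_000:.2f}M")
print(f"layer + lm_head:   {trainable / 1_000_000:.2f}M")
```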


In [16]:
import torch
print(torch.__version__)

2.1.2+cu121


In [17]:
# # !pip uninstall transformers -y
# !pip install transformers
# !pip install -i https://pip.repos.neuron.amazonaws.com transformers-neuronx

In [18]:
# from transformers import TrainingArguments
from trl import SFTTrainer

In [19]:
batch_size = 32

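# 11_210 appears to be carried over from the original recipe; the MedQA train split loaded above has 10_178 rows
total_num_steps = 11_210 // batch_size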
print(total_num_steps)
# print("changing total batch size down to 50 to save time")
# total_num_steps = 50

350


In [20]:
# from cti.transformers.transformers.src.transformers import TrainingArguments

In [21]:
# !pip uninstall torch_xla -y
# !pip install -i https://pip.repos.neuron.amazonaws.com torch-xla

In [22]:
from transformers import TrainingArguments
output_dir = "/home/service/BioLlama/utilities/finetuning/finetuned_models/"
training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size//4,
    bf16=True,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=total_num_steps // 10,
    # num_train_epochs=1,
    max_steps=total_num_steps,
    gradient_accumulation_steps=1,
    gradient_checkpointing=True,
    evaluation_strategy="steps",
    eval_steps=total_num_steps // 3,
    # logging strategies
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=1,
    save_strategy="epoch",  # save a checkpoint at the end of each epoch
)
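To make the derived schedule values concrete, here is a small pure-Python sketch of what these arguments imply. The `lr_at` function is a hypothetical re-implementation of a linear-warmup plus cosine-decay schedule, written from the usual formula rather than taken from the library, so treat it as an approximation of what the trainer does.

```python
import math

batch_size = 32
total_num_steps = 11_210 // batch_size   # same expression as in the notebook
warmup_steps = total_num_steps // 10
eval_steps = total_num_steps // 3
base_lr = 2e-4

def lr_at(step):
    # Linear warmup, then half a cosine period down to zero (assumed schedule shape).
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_num_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

# Steps at which an evaluation is triggered during training.
eval_points = list(range(eval_steps, total_num_steps + 1, eval_steps))
print(warmup_steps, eval_steps, eval_points)

for step in (0, warmup_steps, total_num_steps // 2, total_num_steps):
    print(step, f"{lr_at(step):.2e}")
```

With 350 total steps this gives 35 warmup steps and an evaluation every 116 steps, which lines up with the evaluation steps reported in the training log.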

In [23]:
# from utils import LLMSampleCB, token_accuracy

trainer = SFTTrainer(
    model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    packing=True,
    max_seq_length=1024,
    args=training_args,
    formatting_func=create_prompt,
    # compute_metrics=token_accuracy,
)



In [24]:
# trainer.add_callback(wandb_callback)

In [25]:
trainer.train()
wandb.finish()

You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...




| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 116  | 0.9418        | 1.013588        |
| 232  | 0.9061        | 0.989663        |
| 348  | 0.9032        | 0.98629         |




Checkpoint destination directory /home/service/BioLlama/utilities/finetuning/finetuned_models/checkpoint-350 already exists and is non-empty.Saving will proceed but saved results may be invalid.


[34m[1mwandb[0m: - 0.004 MB of 0.004 MB uploaded

[34m[1mwandb[0m: \ 0.004 MB of 0.004 MB uploaded

[34m[1mwandb[0m: | 0.004 MB of 0.004 MB uploaded

[34m[1mwandb[0m: / 0.004 MB of 0.031 MB uploaded

[34m[1mwandb[0m: - 0.004 MB of 0.031 MB uploaded

[34m[1mwandb[0m: \ 0.031 MB of 0.031 MB uploaded

[34m[1mwandb[0m:                                                                                


[34m[1mwandb[0m: 
[34m[1mwandb[0m: Run history:
[34m[1mwandb[0m:                      eval/loss █▂▁
[34m[1mwandb[0m:                   eval/runtime ▁██
[34m[1mwandb[0m:        eval/samples_per_second █▁▁
[34m[1mwandb[0m:          eval/steps_per_second █▁▁
[34m[1mwandb[0m:                    train/epoch ▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
[34m[1mwandb[0m:              train/global_step ▁▁▁▁▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
[34m[1mwandb[0m:            train/learning_rate ▂▃▅▇██████▇▇▇▇▇▆▆▆▆▅▅▅▄▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁
[34m[1mwandb[0m:                     train/loss █▆▅▄▄▄▃▄▃▃▃▃▃▂▃▃▂▂▂▂▂▂▂▂▁▂▂▂▁▂▂▂▁▁▂▂▁▂▁▂
[34m[1mwandb[0m:               train/total_flos ▁
[34m[1mwandb[0m:               train/train_loss ▁
[34m[1mwandb[0m:            train/train_runtime ▁
[34m[1mwandb[0m: train/train_samples_per_second ▁
[34m[1mwandb[0m:   train/train_steps_per_second ▁
[34m[1mwandb[0m: 
[34m[1mwandb[0m: Run summary:
[34m[1mwandb[0m:                    

[34m[1mwandb[0m: 🚀 View run [33mblooming-sun-12[0m at: [34m[4mhttps://wandb.ai/neelectric/biollama_ft/runs/e1thks20[0m
[34m[1mwandb[0m: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)


[34m[1mwandb[0m: Find logs at: [35m[1m./wandb/run-20240123_001522-e1thks20/logs[0m


In [26]:
import os
print(os.path.abspath(output_dir))

/home/service/BioLlama/utilities/finetuning/finetuned_models


In [27]:
trainer.save_model(output_dir)
#print contents of output_dir
!ls -l $output_dir
#print full path of output_dir
# !pwd $output_dir

total 13163372
drwxrwxr-x 2 service service       4096 Jan 17 00:52 checkpoint-350
drwxrwxr-x 2 service service       4096 Jan 22 11:09 checkpoint-50
-rw-rw-r-- 1 service service        685 Jan 23 00:39 config.json
-rw-rw-r-- 1 service service        183 Jan 23 00:39 generation_config.json
drwxrwxr-x 2 service service       4096 Jan 23 00:15 logs
-rw-rw-r-- 1 service service 4938985352 Jan 23 00:39 model-00001-of-00003.safetensors
-rw-rw-r-- 1 service service 4947390880 Jan 23 00:39 model-00002-of-00003.safetensors
-rw-rw-r-- 1 service service 3590488816 Jan 23 00:40 model-00003-of-00003.safetensors
-rw-rw-r-- 1 service service      23950 Jan 23 00:40 model.safetensors.index.json
-rw-rw-r-- 1 service service        437 Jan 23 00:40 special_tokens_map.json
-rw-rw-r-- 1 service service        920 Jan 23 00:40 tokenizer_config.json
-rw-rw-r-- 1 service service    1842767 Jan 23 00:40 tokenizer.json
-rw-rw-r-- 1 service service     499723 Jan 23 00:40 tokenizer.model
-rw-rw-r
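The shard sizes in this listing give a quick way to confirm that the full model was saved in bfloat16. A rough check, assuming 2 bytes per parameter and ignoring the small per-file header:

```python
# Sizes in bytes of the three safetensors shards, copied from the listing above.
shard_bytes = [4938985352, 4947390880, 3590488816]
total_bytes = sum(shard_bytes)

bytes_per_param = 2  # bfloat16 (assumption: every tensor is stored in bf16)
approx_params_m = total_bytes / bytes_per_param / 1_000_000

print(f"checkpoint weights: {total_bytes / 1e9:.2f} GB")
print(f"implied parameter count: {approx_params_m:.2f}M")
```

The implied count lands within a fraction of a percent of the 6738.42M total reported by `param_count`, so the checkpoint appears to hold the whole model, not only the unfrozen layers.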

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
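The warning above is triggered by the `!ls` shell call forking the process after the tokenizer has already run in parallel. One way to silence it, following the warning's own suggestion, is to set the environment variable explicitly. A minimal sketch, assuming it is run near the top of the notebook before any tokenization happens:

```python
import os

# Set this before the tokenizer is first used, ideally in the first cell.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
print(os.environ["TOKENIZERS_PARALLELISM"])
```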


<span id="papermill-error-cell" style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">Execution using papermill encountered an exception here and stopped:</span>

In [28]:
# there is a finetune of Llama 2 7B (HF) in the folder finetuned_models;
# load this local model here and use it to generate some text

print(output_dir)

from transformers import AutoModelForCausalLM, AutoTokenizer
new_tokenizer = AutoTokenizer.from_pretrained(output_dir)
new_model = AutoModelForCausalLM.from_pretrained(output_dir)
prompt = 'You are an excellently helpful AI assistant that answers biomedical questions. <QUESTION>A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. Which of the following is the best treatment for this patient?</QUESTION>\n<ANSWER> '

input_ids = new_tokenizer.encode(prompt, return_tensors="pt")

# print(input_ids)
# print(input_ids.shape)

output = new_model.generate(input_ids, max_new_tokens=35, do_sample=True, top_p=0.95, top_k=60)
print(new_tokenizer.decode(output[0], skip_special_tokens=True))

SyntaxError: invalid syntax. Perhaps you forgot a comma? (3336475183.py, line 17)
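Once the generation cell runs, the decoded text still has to be mapped back to an option letter before it can be compared with the benchmark answers. The helper below is a hypothetical sketch, not part of the notebook. It assumes the model follows the training template and emits something like ` (E) Nitrofurantoin` after the last `<ANSWER>` tag.

```python
import re

def extract_answer_letter(generated_text):
    # Hypothetical helper: return the option letter that follows the last
    # <ANSWER> tag, or None if the tag or a "(X)" pattern is missing.
    if "<ANSWER>" not in generated_text:
        return None
    tail = generated_text.rsplit("<ANSWER>", 1)[-1]
    match = re.match(r"\s*\(([A-E])\)", tail)
    return match.group(1) if match else None

# Made-up strings in the same shape as the prompts used in this notebook.
print(extract_answer_letter("<QUESTION>...</QUESTION>\n<ANSWER> (E) Nitrofurantoin</ANSWER>"))
print(extract_answer_letter("<QUESTION>...</QUESTION>\n<ANSWER> I am not sure"))
```

Returning `None` for unparseable generations keeps them distinguishable from wrong answers when computing accuracy.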