<a href="https://colab.research.google.com/github/changyuhsin1999/Fine-tuned-llama-MedQA/blob/main/project_1_notes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Install libraries

In [1]:
!pip install transformers datasets peft accelerate bitsandbytes trl safetensors torch --no-cache

Collecting datasets
  Downloading datasets-2.18.0-py3-none-any.whl (510 kB)
Collecting peft
  Downloading peft-0.9.0-py3-none-any.whl (190 kB)
Collecting accelerate
  Downloading accelerate-0.28.0-py3-none-any.whl (290 kB)
Collecting bitsandbytes
  Downloading bitsandbytes-0.43.0-py3-none-manylinux_2_24_x86_64.whl (102.2 MB)
Collecting trl
  Downloading trl-0.7.11-py3-none-any.whl (155 kB)

## Load the dataset (train split)

In [2]:
from datasets import load_dataset
from random import randrange

# Load dataset from the hub
dataset = load_dataset("medalpaca/medical_meadow_medqa", split="train")

print(f"Dataset Size: {len(dataset)}")
print(dataset[randrange(len(dataset))])

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Dataset Size: 10178
{'input': "Q:A 30-year-old man presents with fever, malaise, and severe pain in his right wrist and left knee for the last 2 days. He describes the pain as 8/10 in intensity, sharp in character, and extending from his right wrist to his fingers. He denies any recent inciting trauma or similar symptoms in the past. His past medical history is unremarkable. He is sexually active with multiple partners and uses condoms inconsistently. The vital signs include blood pressure 120/70 mm Hg, pulse 100/min, and temperature 38.3°C (101.0°F). On physical examination, the right wrist and left knee joints are erythematous, warm, and extremely tender to palpation. Both joints have a significantly restricted range of motion. A petechial rash is noted on the right forearm. An arthrocentesis is performed on the left knee joint. Which of the following would be the most likely finding in this patient?? \n{'A': 'Arthrocentesis aspirate showing gram-positive cocci in clusters', 'B': 'Ar
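
The cell above loads only the `train` split, and the accuracy checks later in the notebook draw their questions from that same data. One possible way to keep evaluation questions separate is to set aside a held-out subset before training. The sketch below is a minimal plain-Python illustration, not part of the original notebook: `split_indices`, the 10% fraction, and the seed are assumptions chosen for the example.

```python
import random

def split_indices(n_rows, eval_fraction=0.1, seed=42):
    """Shuffle row indices and split them into disjoint train/eval lists.

    Hypothetical helper: the fraction and seed are illustrative defaults.
    """
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be between 0 and 1")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_eval = max(1, int(n_rows * eval_fraction))
    return indices[n_eval:], indices[:n_eval]

# 10178 is the dataset size printed by the cell above.
train_idx, eval_idx = split_indices(10178)
print(len(train_idx), len(eval_idx))
```

With the Hugging Face dataset in memory, the two index lists could then be passed to `dataset.select(...)` to obtain the training and evaluation subsets.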

## Prompting

In [3]:
def format_prompt(sample):
    return f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
please give the response in a simple format of A: answer and do not give the explanation.
### Instruction:
{sample["instruction"]}

### Input:
{sample["input"]}

### Response:
{sample["output"]}
"""

In [4]:
from random import randrange

print(format_prompt(dataset[randrange(len(dataset))]))


Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
please give the response in a simple format of A: answer and do not give the explanation.
### Instruction:
Please answer with one of the option in the bracket

### Input:
Q:A 19-year-old man is seen by his primary care physician. The patient has a history of excessive daytime sleepiness going back several years. He has begun experiencing episodes in which his knees become weak and he drops to the floor when he laughs. He has a history of marijuana use. His family history is notable for hypertension and cardiac disease. His primary care physician refers him for a sleep study, and which confirms your suspected diagnosis.

Which of the following is the best first-line pharmacological treatment for this patient?? 
{'A': 'Dextroamphetamine', 'B': 'Lisdexamfetamine', 'C': 'Methylphenidate', 'D': 'Zolpidem', 'E': 'Modafinil'},

### Resp
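
The notebook writes the prompt template out several times: once in `format_prompt` for training and again, with slightly different wording and indentation, in the inference and evaluation cells. A single builder with a switch for the response section is one way to keep the layouts aligned. This is a sketch only: `build_prompt` and `PROMPT_HEADER` are hypothetical names, and the toy sample's input is made up for illustration.

```python
PROMPT_HEADER = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes the request.\n"
    "please give the response in a simple format of A: answer and do not give the explanation."
)

def build_prompt(sample, include_response=True):
    """Render a sample in the same layout as format_prompt.

    With include_response=False the prompt stops right after the
    "### Response:" line, which is the form needed at inference time.
    """
    response = sample["output"] + "\n" if include_response else ""
    return (
        f"\n{PROMPT_HEADER}\n"
        f"### Instruction:\n{sample['instruction']}\n\n"
        f"### Input:\n{sample['input']}\n\n"
        f"### Response:\n{response}"
    )

# Toy sample: the input and output are placeholders, not a real dataset row.
toy = {
    "instruction": "Please answer with one of the option in the bracket",
    "input": "Q: placeholder question? \n{'A': 'x', 'B': 'y'},",
    "output": "A: x",
}
print(build_prompt(toy, include_response=False))
```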

In [5]:
from huggingface_hub import notebook_login

notebook_login()

In [6]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Hugging Face model name
model_name = "meta-llama/Llama-2-7b-chat-hf"
use_flash_attention = False

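# 4-bit NF4 quantization with double quantization (QLoRA-style loading)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the model with quantized weights
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    # Disable the KV cache; it is not used together with gradient checkpointing
    use_cache=False,
    # Deprecated in newer transformers releases in favor of `attn_implementation`
    use_flash_attention_2=use_flash_attention,
    device_map="auto",
    # dtype for the layers that are not quantized
    torch_dtype=torch.float16
)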

model.config.pretraining_tp = 1

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
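
To see why 4-bit loading matters on a single Colab GPU, it helps to estimate the weight footprint. The back-of-the-envelope sketch below uses the checkpoint shard sizes reported when the model was downloaded in the original run (9.98 GB and 3.50 GB). It assumes the shards store float16 weights, that the sizes are decimal gigabytes, and it ignores the small overhead of quantization constants, so the numbers are rough.

```python
# Shard sizes reported during the checkpoint download (assumed decimal GB).
shard_sizes_gb = [9.98, 3.50]
fp16_gb = sum(shard_sizes_gb)

# Assumption: float16 weights, i.e. 2 bytes per parameter.
approx_params = fp16_gb * 1e9 / 2

# 4-bit storage needs half a byte per parameter (quantization constants ignored).
approx_4bit_gb = approx_params * 0.5 / 1e9

print(f"~{approx_params / 1e9:.2f}B parameters")
print(f"~{fp16_gb:.2f} GB in float16 vs ~{approx_4bit_gb:.2f} GB in 4-bit")
```

Activations, the LoRA adapters, and optimizer state come on top of this figure during training.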

In [7]:
from datasets import load_dataset
from random import randrange

# Load dataset from the hub
dataset = load_dataset("medalpaca/medical_meadow_medqa", split="train")
sample = dataset[randrange(len(dataset))]

prompt = f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
please give the response in a format of A: answer

### Instruction:
{sample["instruction"]}

### Input:
{sample["input"]}

### Response:

"""

input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
outputs = model.generate(input_ids=input_ids, max_new_tokens=512, do_sample=True, top_p=0.6,temperature=0.9)

print(f"Instruction:\n{sample['instruction']}\n")
print(f"Input:\n{sample['input']}\n")
print(f"Generated Response:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}\n")
print(f"Ground Truth:\n{sample['output']}")

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


Instruction:
Please answer with one of the option in the bracket

Input:
Q:A 5-month-old boy is brought to the physician by his mother because of poor weight gain and chronic diarrhea. He has had 3 episodes of otitis media since birth. Pregnancy and delivery were uncomplicated but his mother received no prenatal care. His immunizations are up-to-date. He is at the 10th percentile for height and 5th percentile for weight. Physical examination shows thick white plaques on the surface of his tongue that can be easily scraped off with a tongue blade. Administration of which of the following is most likely to have prevented this patient's condition?? 
{'A': 'Fluconazole', 'B': 'Pencillin G', 'C': 'Zidovudine', 'D': 'Rifampin', 'E': 'Ganciclovir'},

Generated Response:
A: Fluconazole

Ground Truth:
C: Zidovudine
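
The cell above isolates the model's answer by slicing the decoded string at `len(prompt)`, which only works if the decoded text reproduces the prompt character for character. An alternative is to slice at the token level, since a decoder-only model returns the prompt ids followed by the newly generated ids. The sketch below illustrates the idea on plain lists; `new_token_ids` is a hypothetical helper and the ids are toy values.

```python
def new_token_ids(output_ids, prompt_length):
    """Return only the ids that were generated after the prompt."""
    return output_ids[prompt_length:]

# Toy ids standing in for real token ids (hypothetical values).
prompt_ids = [1, 450, 1139, 29901]
output_ids = prompt_ids + [319, 29901, 2]

print(new_token_ids(output_ids, len(prompt_ids)))
```

With the notebook's tensors, the equivalent slice would be `outputs[0][input_ids.shape[1]:]`, decoded with `tokenizer.decode(..., skip_special_tokens=True)`.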


In [8]:
# Initialize counters
correct_predictions = 0
total_predictions = 0

# Define the number of samples to evaluate
num_samples = 100  # Adjust this number based on your needs and computational resources

# Draw num_samples random questions (with replacement) from the training split
for _ in range(num_samples):
    # Randomly select a sample from the dataset
    sample = dataset[randrange(len(dataset))]

    # Construct the prompt
    prompt = f"""
    Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
    please give the response in a simple format of A: answer and do not give the explanation.

    ### Instruction:
    {sample["instruction"]}

    ### Input:
    {sample["input"]}

    ### Response:

    """

    # Generate the response
    input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
    outputs = model.generate(input_ids=input_ids, max_new_tokens=512, do_sample=True, top_p=0.6, temperature=0.9)
    generated_response = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):].strip()
    normalized_generated_response = generated_response.strip().lower()
    normalized_ground_truth = sample['output'].strip().lower()

    # Check if the normalized responses match
    if normalized_generated_response == normalized_ground_truth:
        correct_predictions += 1
    total_predictions += 1

# Calculate and print the accuracy
accuracy = correct_predictions / total_predictions
print(f"Accuracy: {accuracy*100:.2f}%")

Accuracy: 8.00%
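
The loop above counts a prediction as correct only when the whole generated string equals the ground truth, so an answer with the right option letter but extra text or different spacing is scored as wrong. A more lenient scoring rule compares only the leading option letter. The sketch below is one possible version, not the notebook's method: `extract_choice` and `choice_accuracy` are hypothetical helpers, and the pattern assumes options are labelled A to E as in the samples shown.

```python
import re

# Matches a leading option letter such as "C: Zidovudine" or "c) zidovudine".
_CHOICE_RE = re.compile(r"^\s*([A-Ea-e])\s*[:.)]")

def extract_choice(text):
    """Return the upper-cased option letter at the start of text, or None."""
    match = _CHOICE_RE.match(text)
    return match.group(1).upper() if match else None

def choice_accuracy(predictions, references):
    """Fraction of predictions whose option letter matches the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        letter = extract_choice(pred)
        if letter is not None and letter == extract_choice(ref):
            correct += 1
    return correct / len(references)

preds = ["A: Fluconazole", "c: zidovudine, because of maternal HIV"]
refs = ["C: Zidovudine", "C: Zidovudine"]
print(choice_accuracy(preds, refs))
```

Responses that do not start with an option letter are treated as incorrect rather than guessed at.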


In [10]:
from peft import LoraConfig, prepare_model_for_kbit_training, get_peft_model

# LoRA adapter config for QLoRA-style training (adapters on top of the 4-bit base model)
peft_config = LoraConfig(
    lora_alpha=32,
    lora_dropout=0.1,
    r=16,
    bias="none",
    task_type="CAUSAL_LM",
)
# Prepare model for training
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
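
The `LoraConfig` above fixes the rank and scaling of the adapters, which in turn determines how few parameters are actually trained. The arithmetic can be sketched as follows. The architecture numbers are assumptions not stated in the notebook: 32 decoder layers, hidden size 4096, and adapters on the `q_proj` and `v_proj` projections only (no `target_modules` is given in the config).

```python
def lora_param_count(r, in_features, out_features):
    """Parameters added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * in_features + out_features * r

# Values taken from the LoraConfig above.
r, lora_alpha = 16, 32

# Assumptions about the base model (not stated in the notebook).
num_layers, hidden_size, adapted_per_layer = 32, 4096, 2

trainable = num_layers * adapted_per_layer * lora_param_count(r, hidden_size, hidden_size)
scaling = lora_alpha / r  # LoRA multiplies the low-rank update by alpha / r

print(f"trainable LoRA parameters: {trainable:,}")
print(f"scaling factor: {scaling}")
```

On the real model, `model.print_trainable_parameters()` reports the actual count and is the better check of these assumptions.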

In [11]:
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="finetuned-llama-7b-chat-hf-with-medQA",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    logging_steps=10,
    save_strategy="epoch",
    learning_rate=2e-4,
    fp16=True,
    max_grad_norm=0.3,
    # Note: the "constant" scheduler does not apply warmup; "constant_with_warmup" would
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    disable_tqdm=False
)
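
With `per_device_train_batch_size=4` and `gradient_accumulation_steps=2`, each optimizer update sees 8 sequences per device. The sketch below turns that into a rough step estimate. `optimizer_steps` is a hypothetical helper, the rounding is an approximation of what the trainer does, and the sequence count of 4000 is a placeholder because the number of packed sequences is not shown in the notebook.

```python
import math

def optimizer_steps(num_sequences, per_device_batch_size, grad_accum_steps,
                    num_devices=1, epochs=1):
    """Rough estimate of the effective batch size and optimizer updates."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_sequences / effective_batch)
    return effective_batch, steps_per_epoch * epochs

# 4000 is a placeholder for the number of packed training sequences.
effective_batch, total_steps = optimizer_steps(4000, per_device_batch_size=4,
                                               grad_accum_steps=2)
print(effective_batch, total_steps)
```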

In [12]:
from trl import SFTTrainer

max_seq_length = 1024 # max sequence length for model and packing of the dataset

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    packing=True,
    formatting_func=format_prompt,
    args=args,
)


In [None]:
# Train
trainer.train()

# Save model
trainer.save_model()



| Step | Training Loss |
|------|---------------|
| 10   | 1.4188        |
| 20   | 1.1397        |
| 30   | 1.0003        |
| 40   | 0.9428        |
| 50   | 0.9281        |
| 60   | 0.9106        |
| 70   | 0.9056        |
| 80   | 0.8863        |
| 90   | 0.9042        |
| 100  | 0.8998        |



In [14]:
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, BitsAndBytesConfig

# Load the fine-tuned adapter together with its 4-bit base model and tokenizer.
# A BitsAndBytesConfig replaces the deprecated `load_in_4bit=True` argument.
model_trained = AutoPeftModelForCausalLM.from_pretrained(
    args.output_dir,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
tokenizer = AutoTokenizer.from_pretrained(args.output_dir)

In [22]:
from datasets import load_dataset
from random import randrange

# Load dataset from the hub
dataset = load_dataset("medalpaca/medical_meadow_medqa", split="train")
sample = dataset[randrange(len(dataset))]

prompt = f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
please give the response in a format of A: answer

### Instruction:
{sample["instruction"]}

### Input:
{sample["input"]}

### Response:

"""

input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
outputs = model_trained.generate(input_ids=input_ids, max_new_tokens=512, do_sample=True, top_p=0.6,temperature=0.9)

print(f"Instruction:\n{sample['instruction']}\n")
print(f"Input:\n{sample['input']}\n")
print(f"Generated Response:\n{tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]}\n")
print(f"Ground Truth:\n{sample['output']}")

Instruction:
Please answer with one of the option in the bracket

Input:
Q:A 27 year-old-male presents to the Emergency Room as a code trauma after being shot in the neck. En route, the patient’s blood pressure is 127/73 mmHg, pulse is 91/min, respirations are 14/min, and oxygen saturation is 100% on room air with GCS of 15. On physical exam, the patient is in no acute distress; however, there is an obvious entry point with oozing blood near the left lateral neck above the cricoid cartilage with a small hematoma that is non-pulsatile and stable since arrival. The rest of the physical exam is unremarkable. Rapid hemoglobin returns back at 14.1 g/dL. After initial resuscitation, what is the next best step in management?? 
{'A': 'MRI', 'B': 'Plain radiography films', 'C': 'Conventional angiography', 'D': 'CT angiography', 'E': 'Bedside neck exploration'},

Generated Response:
D: CT angiography


Ground Truth:
D: CT angiography


In [46]:

# Initialize counters
correct_predictions = 0
total_predictions = 0

# Define the number of samples to evaluate
num_samples = 100  # Adjust this number based on your needs and computational resources

# Draw num_samples random questions (with replacement) from the training split
for _ in range(num_samples):
    # Randomly select a sample from the dataset
    sample = dataset[randrange(len(dataset))]

    # Construct the prompt
    prompt = f"""
    Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
    please give the response in a simple format of A: answer and do not give any explanation. please do not add special characters or extra spaces in the response.

    ### Instruction:
    {sample["instruction"]}

    ### Input:
    {sample["input"]}

    ### Response:

    """

    # Generate the response
    input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.cuda()
    outputs = model_trained.generate(input_ids=input_ids, max_new_tokens=512, do_sample=True, top_p=0.6, temperature=0.9)
    generated_response = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]
    normalized_generated_response = generated_response.strip().lower()
    normalized_ground_truth = sample['output'].strip().lower()

    # Debugging print statements
    #print(f"Normalized Generated Response: {normalized_generated_response}")
    #print(f"Normalized Ground Truth: {normalized_ground_truth}")

    # Check if the normalized responses match
    if normalized_generated_response == normalized_ground_truth:
        correct_predictions += 1
    total_predictions += 1

# Calculate and print the accuracy
accuracy = correct_predictions / total_predictions
print(f"Accuracy: {accuracy*100:.2f}%")

Accuracy: 20.00%
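
Both accuracy figures come from 100 randomly drawn questions, and the two runs used different random samples and slightly different prompt wording, so some of the gap between 8% and 20% may be sampling noise. A confidence interval gives a sense of how much. The sketch below uses the Wilson score interval with z = 1.96 for roughly 95% coverage; `wilson_interval` is a hypothetical helper, and the counts 8 and 20 are simply the reported percentages of 100 samples.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Approximate Wilson score interval for a binomial proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half

for label, correct in [("base model", 8), ("fine-tuned", 20)]:
    low, high = wilson_interval(correct, 100)
    print(f"{label}: {correct}/100 -> [{low:.3f}, {high:.3f}]")
```

With only 100 questions per run the intervals are wide, so a larger, fixed evaluation set would give a firmer comparison.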
