<a href="https://colab.research.google.com/github/Tobai24/FineTuning/blob/main/finetuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Load the foundation model with Unsloth

In [1]:
%%capture
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers trl peft accelerate bitsandbytes

In [2]:
import json
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [3]:
max_seq_length = 2048
dtype = None
load_in_4bit = True

In [4]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

==((====))==  Unsloth 2025.3.18: Fast Llama patching. Transformers: 4.49.0.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


In [5]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

Unsloth 2025.3.18 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
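
As a sanity check on the LoRA configuration above, the number of trainable adapter weights can be estimated by hand. The sketch below assumes the published Llama-3-8B dimensions (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers); these values are not stated in this notebook. The helper name `lora_params` is hypothetical. The result should agree with the "Trainable parameters" figure that the trainer banner reports later on.

```python
# Assumed Llama-3-8B dimensions (taken from the published model config,
# not from this notebook).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128   # 8 key/value heads of dimension 128 (grouped-query attention)
NUM_LAYERS = 32
RANK = 16          # matches r = 16 in get_peft_model

# (in_features, out_features) of every projection listed in target_modules
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features, out_features, rank):
    # LoRA adds two matrices per adapted weight: A (rank x in) and B (out x rank)
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
print(f"{total:,}")  # 41,943,040
```

With `bias = "none"` no bias terms are trained, so the adapter matrices account for the whole count.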


### Load the dataset and preprocess it for training

In [6]:
import pandas as pd

In [7]:
df = pd.read_json("train.jsonl", lines=True)

In [8]:
df.head()

| | question | answer | options | meta_info | answer_idx |
|---|---|---|---|---|---|
| 0 | A 23-year-old pregnant woman at 22 weeks gesta... | Nitrofurantoin | {'A': 'Ampicillin', 'B': 'Ceftriaxone', 'C': '... | step2&3 | E |
| 1 | A 3-month-old baby died suddenly at night whil... | Placing the infant in a supine position on a f... | {'A': 'Placing the infant in a supine position... | step2&3 | A |
| 2 | A mother brings her 3-week-old infant to the p... | Abnormal migration of ventral pancreatic bud | {'A': 'Abnormal migration of ventral pancreati... | step1 | A |
| 3 | A pulmonary autopsy specimen from a 58-year-ol... | Thromboembolism | {'A': 'Thromboembolism', 'B': 'Pulmonary ische... | step1 | A |
| 4 | A 20-year-old woman presents with menorrhagia ... | Von Willebrand disease | {'A': 'Factor V Leiden', 'B': 'Hemophilia A', ... | step1 | E |


In [9]:
df["question"][0]

'A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. Which of the following is the best treatment for this patient?'

In [10]:
df["options"][0]

{'A': 'Ampicillin',
 'B': 'Ceftriaxone',
 'C': 'Ciprofloxacin',
 'D': 'Doxycycline',
 'E': 'Nitrofurantoin'}

In [11]:
df["answer_idx"][0]

'E'
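
Each record stores the correct answer twice: as text in `answer` and as a key in `answer_idx`. Before training on the `answer` text, it may be worth confirming that the two agree. The following is a minimal sketch; `find_inconsistent` is a hypothetical helper, and the sample record is abbreviated from the first row shown above.

```python
def find_inconsistent(records):
    """Return indices of records where options[answer_idx] differs from answer."""
    bad = []
    for i, rec in enumerate(records):
        # .get() returns None when answer_idx is not a valid option key
        if rec["options"].get(rec["answer_idx"]) != rec["answer"]:
            bad.append(i)
    return bad

# Abbreviated copy of the first record, used only for illustration
sample = [{
    "question": "A 23-year-old pregnant woman at 22 weeks gesta...",
    "answer": "Nitrofurantoin",
    "options": {"A": "Ampicillin", "B": "Ceftriaxone", "C": "Ciprofloxacin",
                "D": "Doxycycline", "E": "Nitrofurantoin"},
    "answer_idx": "E",
}]

print(find_inconsistent(sample))  # [] means every record is consistent
```

On the full dataset the same check could be run as `find_inconsistent(df.to_dict(orient="records"))`.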

In [12]:
df[0:2].to_dict(orient="records")

[{'question': 'A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. Which of the following is the best treatment for this patient?',
  'answer': 'Nitrofurantoin',
  'options': {'A': 'Ampicillin',
   'B': 'Ceftriaxone',
   'C': 'Ciprofloxacin',
   'D': 'Doxycycline',
   'E': 'Nitrofurantoin'},
  'meta_info': 'step2&3',
  'answer_idx': 'E'},
 {'question': 'A 3-month-old baby died suddenly at night while asleep. His mother noticed that he had died only after she awoke in the morning. No cause of death was determined based on the 

#### Preprocessing the dataset

In [13]:
EOS_TOKEN = tokenizer.eos_token

def format_for_finetuning(data):
    instruction = "You are a helpful AI assistant that answers medical multiple-choice questions."
    texts = []

    for item in data:
        input_text = item["question"]
        output_text = item["answer"]

        # Format multiple-choice options
        options_text = "\n".join([f"{key}: {value}" for key, value in item["options"].items()])

        # Define the chat prompt format
        formatted_text = (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            f"Options:\n{options_text}\n\n"
            f"### Response:\n{output_text}{EOS_TOKEN}"
        )

        texts.append(formatted_text)

    return {"text": texts}

In [14]:
new_df = format_for_finetuning(df.to_dict(orient="records"))

In [15]:
print(new_df["text"][3])

### Instruction:
You are a helpful AI assistant that answers medical multiple-choice questions.

### Input:
A pulmonary autopsy specimen from a 58-year-old woman who died of acute hypoxic respiratory failure was examined. She had recently undergone surgery for a fractured femur 3 months ago. Initial hospital course was uncomplicated, and she was discharged to a rehab facility in good health. Shortly after discharge home from rehab, she developed sudden shortness of breath and had cardiac arrest. Resuscitation was unsuccessful. On histological examination of lung tissue, fibrous connective tissue around the lumen of the pulmonary artery is observed. Which of the following is the most likely pathogenesis for the present findings?

Options:
A: Thromboembolism
B: Pulmonary ischemia
C: Pulmonary hypertension
D: Pulmonary passive congestion
E: Pulmonary hemorrhage

### Response:
Thromboembolism<|end_of_text|>
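
The model only sees this exact layout during fine-tuning, so prompts used at inference time should reproduce it character for character, up to and including `### Response:\n`. One way to guarantee that is to build both from a single helper. This is a sketch, not part of the original notebook; `build_prompt` and `INSTRUCTION` are hypothetical names.

```python
INSTRUCTION = "You are a helpful AI assistant that answers medical multiple-choice questions."

def build_prompt(question, options, answer=None, eos_token=""):
    """Build a training example (answer given) or an inference prompt (answer=None)."""
    options_text = "\n".join(f"{key}: {value}" for key, value in options.items())
    prompt = (
        f"### Instruction:\n{INSTRUCTION}\n\n"
        f"### Input:\n{question}\n\n"
        f"Options:\n{options_text}\n\n"
        f"### Response:\n"
    )
    if answer is not None:
        # Training text: append the target answer and the end-of-sequence token
        prompt += f"{answer}{eos_token}"
    return prompt

opts = {"A": "Thromboembolism", "B": "Pulmonary ischemia"}
train_text = build_prompt("Example question?", opts, answer="Thromboembolism",
                          eos_token="<|end_of_text|>")
infer_text = build_prompt("Example question?", opts)

# The inference prompt is a strict prefix of the training text
print(train_text.startswith(infer_text))  # True
```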


### Configure the training parameters

In [16]:
from datasets import Dataset
hf_dataset = Dataset.from_dict(new_df)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=hf_dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=True,
    args=TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=8,
        warmup_steps=10,
        max_steps=80,
        learning_rate=1e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=2,
        optim="adamw_torch",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=3407,
        output_dir="outputs",
        save_strategy="epoch",
        save_total_limit=2,
    ),
)


Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/10178 [00:00<?, ? examples/s]

Unsloth: Hugging Face's packing is currently buggy - we're disabling it for now!
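
To see what `warmup_steps=10`, `max_steps=80`, `learning_rate=1e-4` and `lr_scheduler_type="cosine"` imply together, the sketch below evaluates the usual linear-warmup plus cosine-decay formula. It is an approximation written for illustration (`lr_at` is a hypothetical helper), and the exact per-step values used by the trainer may differ slightly.

```python
import math

def lr_at(step, base_lr=1e-4, warmup_steps=10, total_steps=80):
    """Learning rate at a given optimizer step under linear warmup + cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 5, 10, 45, 80):
    print(step, f"{lr_at(step):.2e}")
```

Under this formula the rate peaks at step 10, is back to half its peak at step 45, and reaches zero at step 80.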


In [17]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 10,178 | Num Epochs = 1 | Total steps = 80
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 8 x 1) = 32
 "-____-"     Trainable parameters = 41,943,040/8,000,000,000 (0.52% trained)

Unsloth: Will smartly offload gradients to save VRAM!


| Step | Training Loss |
|---|---|
| 2 | 1.8797 |
| 4 | 1.8643 |
| 6 | 1.7764 |
| 8 | 1.6932 |
| 10 | 1.5915 |
| 12 | 1.5026 |
| 14 | 1.3635 |
| 16 | 1.2893 |
| 18 | 1.2965 |
| 20 | 1.2673 |
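
The banner above reports `Num Epochs = 1`, but with `max_steps=80` the run stops well before a full pass over the data. A quick back-of-the-envelope check, using only the numbers printed in the banner and assuming one example per sequence (packing was disabled):

```python
num_examples = 10_178     # "Num examples" in the training banner
per_device_batch = 4      # per_device_train_batch_size
grad_accum = 8            # gradient_accumulation_steps
num_gpus = 1
max_steps = 80

# Examples consumed per optimizer step, and over the whole run
effective_batch = per_device_batch * grad_accum * num_gpus
examples_seen = effective_batch * max_steps
fraction = examples_seen / num_examples

print(effective_batch, examples_seen, f"{fraction:.1%}")  # 32 2560 25.2%
```

So the 80 steps cover roughly a quarter of the training set; raising `max_steps` or setting `num_train_epochs` instead would be needed for a full epoch.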


### Run inference

In [18]:
FastLanguageModel.for_inference(model)  # switch Unsloth to its inference mode

def generate_response(question, options, instruction="You are a helpful AI assistant that answers medical multiple-choice questions."):
    options_text = "\n".join([f"{key}: {value}" for key, value in options.items()])

    # Build the prompt exactly as in training, including the "Options:" label
    # and without a leading newline
    formatted_input = (
        f"### Instruction:\n{instruction}\n\n"
        f"### Input:\n{question}\n\n"
        f"Options:\n{options_text}\n\n"
        f"### Response:\n"
    )

    # Tokenize input
    inputs = tokenizer(formatted_input, return_tensors="pt").to("cuda" if torch.cuda.is_available() else "cpu")

    # Generate response
    with torch.no_grad():
        output = model.generate(
            **inputs,
            max_new_tokens=64,    # limits generated tokens only; max_length would also count the prompt
            do_sample=False,      # greedy decoding; temperature/top_p are ignored unless do_sample=True
            repetition_penalty=1.1,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, skipping the prompt
    prompt_length = inputs["input_ids"].shape[1]
    response = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)
    return response.strip()

In [19]:
question = "A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. Which of the following is the best treatment for this patient?"

options = {'A': 'Ampicillin',
   'B': 'Ceftriaxone',
   'C': 'Ciprofloxacin',
   'D': 'Doxycycline',
   'E': 'Nitrofurantoin'}

response = generate_response(question, options)
print("Model's Answer:", response)

Model's Answer: Nitrofurantoin
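
The question above is the first row of `train.jsonl`, so a correct answer here does not show generalization; a held-out split would be needed for that. To score such a split, the generated text has to be mapped back to an option key and compared with `answer_idx`. The following is a minimal sketch with hypothetical helpers (`match_option`, `accuracy`) that uses exact, case-insensitive matching only.

```python
def match_option(answer_text, options):
    """Return the option key whose text equals the generated answer, or None."""
    normalized = answer_text.strip().lower()
    for key, value in options.items():
        if value.strip().lower() == normalized:
            return key
    return None

def accuracy(predicted_keys, gold_keys):
    """Fraction of predictions that equal the gold answer key."""
    if not gold_keys:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted_keys, gold_keys))
    return correct / len(gold_keys)

options = {"A": "Ampicillin", "B": "Ceftriaxone", "C": "Ciprofloxacin",
           "D": "Doxycycline", "E": "Nitrofurantoin"}

print(match_option("Nitrofurantoin", options))  # E
```

An unmatched generation returns `None` and therefore counts as wrong, which keeps the metric conservative.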
