<a href="https://colab.research.google.com/github/Prateek1727/medico-bot/blob/main/Medical_Chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
!pip install triton==2.0.0

In [2]:
%%capture
!pip install torch --upgrade --force-reinstall

In [3]:
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from unsloth import is_bfloat16_supported
from huggingface_hub import login
from transformers import TrainingArguments
from datasets import load_dataset
import wandb
from google.colab import userdata

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [4]:
hugging_face_token = userdata.get("Hugging_Face_Token")
wnb_token = userdata.get("wnb")

login(hugging_face_token)

wandb.login(key=wnb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical Dataset',
    job_type="training",
    anonymous="allow"
)

[34m[1mwandb[0m: No netrc file found, creating one.
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mprateekanandaarambh[0m ([33mprateekanandaarambh1[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


In [5]:
max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    token=hugging_face_token,
)

==((====))==  Unsloth 2025.5.5: Fast Llama patching. Transformers: 4.51.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.30. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.96G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/236 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/53.0k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

In [6]:

prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""

In [7]:
question = """A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or
              sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings,
              what would cystometry most likely reveal about her residual volume and detrusor contractions?"""

FastLanguageModel.for_inference(model)

inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)

print(response[0].split("### Response:")[1])


<think>
Okay, so I'm trying to figure out what cystometry would show for this 61-year-old woman. Let me break it down step by step.

First, the patient has a history of involuntary urine loss when she coughs or sneezes. That makes me think of stress urinary incontinence, which is usually due to the bladder not closing properly during activities that increase abdominal pressure. But she doesn't leak at night, which suggests it's not a 24-hour issue and probably not related to something like night-time incontinence.

She underwent a gynecological exam and Q-tip test. I'm a bit fuzzy on what the Q-tip test specifically looks for. From what I remember, the Q-tip is a small catheter used to measure urethral pressures. It's usually inserted into the urethra to check for leakage and to assess the strength of the urethral sphincter. If the sphincter isn't functioning well, it can't prevent urine from leaking out when there's increased pressure, like from coughing.

So, during the Q-tip test, 
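
The cell above prints everything after `### Response:`, so the `<think>` reasoning and the final answer arrive as one string. Below is a minimal sketch of a helper that separates the two. `split_reasoning` is a hypothetical name (it is not part of Unsloth or Transformers), and it assumes the `<think>...</think>` convention used by the prompt templates in this notebook.

```python
def split_reasoning(decoded: str, eos_token: str = "<eos>"):
    """Split decoded model text into (reasoning, answer).

    Hypothetical helper: assumes the reasoning is wrapped in
    <think>...</think> and the answer follows the closing tag.
    """
    # Keep only the text after the response marker, if it is present.
    body = decoded.split("### Response:", 1)[-1]
    # Drop the end-of-sequence marker so it does not leak into the answer.
    body = body.replace(eos_token, "")
    # Skip past the opening tag; fall back to the whole body if it is absent.
    before, found_open, after = body.partition("<think>")
    rest = after if found_open else before
    # If </think> never appears (generation stopped early), answer is "".
    reasoning, _, answer = rest.partition("</think>")
    return reasoning.strip(), answer.strip()


sample = "### Response:\n<think>\nstep one\n</think>\nFinal answer.<eos>"
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

In the notebook this could be called as `split_reasoning(response[0], tokenizer.eos_token)`. An empty answer signals that generation stopped before the closing `</think>` tag.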

In [8]:

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""


In [9]:

dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train[0:500]", trust_remote_code=True)  # Keep only the first 500 rows
dataset

README.md:   0%|          | 0.00/1.97k [00:00<?, ?B/s]

medical_o1_sft.json:   0%|          | 0.00/58.2M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/19704 [00:00<?, ? examples/s]

Dataset({
    features: ['Question', 'Complex_CoT', 'Response'],
    num_rows: 500
})

In [10]:

dataset[1]

{'Question': 'A 33-year-old woman is brought to the emergency department 15 minutes after being stabbed in the chest with a screwdriver. Given her vital signs of pulse 110/min, respirations 22/min, and blood pressure 90/65 mm Hg, along with the presence of a 5-cm deep stab wound at the upper border of the 8th rib in the left midaxillary line, which anatomical structure in her chest is most likely to be injured?',
 'Complex_CoT': "Okay, let's figure out what's going on here. A woman comes in with a stab wound from a screwdriver. It's in her chest, upper border of the 8th rib, left side, kind of around the midaxillary line. First thought, that's pretty close to where the lung sits, right?\n\nLet's talk about location first. This spot is along the left side of her body. Above the 8th rib, like that, is where a lot of important stuff lives, like the bottom part of the left lung, possibly the diaphragm too, especially considering how deep the screwdriver went.\n\nThe wound is 5 cm deep. Tha

In [11]:

EOS_TOKEN = tokenizer.eos_token
EOS_TOKEN

'<｜end▁of▁sentence｜>'

In [12]:

def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]

    texts = []

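    # Use names that do not shadow the built-in `input`; EOS marks where generation should stop.
    for question, cot, response in zip(inputs, cots, outputs):
        text = train_prompt_style.format(question, cot, response) + EOS_TOKEN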
        texts.append(text)

    return {
        "text": texts,
    }
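
As a quick sanity check of the formatting step, the sketch below applies the same logic to a two-row toy batch. It is self-contained and makes several assumptions: the template is shortened, `"<eos>"` stands in for `tokenizer.eos_token`, `format_batch` is a hypothetical name, and the sample rows are invented purely for illustration.

```python
# Shortened stand-in for train_prompt_style (same three placeholders).
TEMPLATE = """### Question:
{}

### Response:
<think>
{}
</think>
{}"""

EOS = "<eos>"  # placeholder for tokenizer.eos_token


def format_batch(examples):
    """Build one training string per row of a batched (dict-of-lists) input."""
    texts = [
        TEMPLATE.format(question, cot, response) + EOS
        for question, cot, response in zip(
            examples["Question"], examples["Complex_CoT"], examples["Response"]
        )
    ]
    return {"text": texts}


# Invented rows, shaped like what `dataset.map(..., batched=True)` passes in.
toy_batch = {
    "Question": ["Toy question 1?", "Toy question 2?"],
    "Complex_CoT": ["Toy reasoning 1.", "Toy reasoning 2."],
    "Response": ["Toy answer 1.", "Toy answer 2."],
}

formatted = format_batch(toy_batch)
print(formatted["text"][0])
```

With `batched=True`, the function receives a dictionary of lists and has to return lists of the same length, which is why the result is built row by row.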

In [13]:

dataset_finetune = dataset.map(formatting_prompts_func, batched = True)
dataset_finetune["text"][0]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

"Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.\nPlease answer the following medical question.\n\n### Question:\nGiven the symptoms of sudden weakness in the left arm and leg, recent long-distance travel, and the presence of swollen and tender right lower leg, what specific cardiac abnormality is most likely to be found upon further evaluation that could explain these findings?\n\n### Response:\n<think>\nOkay, let's see what's going on here. We've got sudden weakness in the person's left arm and leg - and that screams something neuro-related, maybe a stroke?\n\nBut wait, there's more. The right lower leg is sw

In [14]:
model_lora = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Unsloth 2025.5.5 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
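
The training banner further down reports 41,943,040 trainable parameters. A rough way to see where that number comes from: each adapted linear layer with shape `d_in x d_out` gains two LoRA matrices holding `r * (d_in + d_out)` weights in total. The sketch below works this out for `r=16`. The layer shapes are assumptions taken from the Llama-3.1-8B architecture on which this distilled model is based; they are not printed anywhere in this notebook.

```python
# Assumed Llama-3.1-8B shapes (not stated in the notebook itself).
HIDDEN = 4096          # model hidden size
INTERMEDIATE = 14336   # MLP inner size
KV_DIM = 8 * 128       # 8 key/value heads x head dimension 128
NUM_LAYERS = 32        # matches "patched 32 layers" in the output above
R = 16                 # LoRA rank passed to get_peft_model

# (input dim, output dim) of every module listed in target_modules.
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# LoRA adds A (r x d_in) and B (d_out x r) per adapted module.
per_layer = sum(R * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * NUM_LAYERS

print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
```

Under these assumed shapes the total agrees with the trainable-parameter count that Unsloth prints when training starts.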


In [15]:
trainer = SFTTrainer(
    model=model_lora,
    tokenizer=tokenizer,
    train_dataset=dataset_finetune,
    dataset_text_field="text",
    max_seq_length=1024,
    dataset_num_proc=2,

    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=2,
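        num_train_epochs=1,  # Ignored here: a positive max_steps takes precedence
        warmup_steps=5,
        max_steps=60,  # 60 optimizer steps x 2 examples per step = 120 examples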
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",

    ),
)

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/500 [00:00<?, ? examples/s]
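
With `lr_scheduler_type="linear"`, `warmup_steps=5`, and `max_steps=60`, the learning rate ramps up to `2e-4` over the first five steps and then decays linearly to zero. The sketch below is a plain-Python approximation of that shape, not the Transformers implementation; `linear_warmup_decay` is a hypothetical name used only for illustration.

```python
def linear_warmup_decay(step, base_lr=2e-4, warmup_steps=5, total_steps=60):
    """Approximate learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 5, 30, 60):
    print(f"step {step:>2}: lr = {linear_warmup_decay(step):.6f}")
```

A schedule that reaches zero at step 60 is consistent with the final `train/learning_rate` of 0.0 in the run summary later in the notebook.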

In [16]:

trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 500 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 2 x 1) = 2
 "-____-"     Trainable parameters = 41,943,040/8,000,000,000 (0.52% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
10,1.9429
20,1.5099
30,1.4486
40,1.367
50,1.3311
60,1.3706


In [17]:
trainer.save_model("finetuned_model")

In [None]:

wandb.finish()

Run history:
train/epoch,▁▂▄▅▇██
train/global_step,▁▂▄▅▇██
train/grad_norm,█▄▄▁▆▂
train/learning_rate,█▇▅▄▂▁
train/loss,█▃▂▁▁▁

Run summary:
total_flos,3836721888632832.0
train/epoch,0.24
train/global_step,60.0
train/grad_norm,0.47678
train/learning_rate,0.0
train/loss,1.3623
train_loss,1.48265
train_runtime,275.5087
train_samples_per_second,0.436
train_steps_per_second,0.218
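
The summary figures follow directly from the training arguments. The short check below reproduces `train/epoch`, `train_samples_per_second`, and `train_steps_per_second` from values that all appear earlier in this notebook; only the variable names are invented here.

```python
max_steps = 60             # from TrainingArguments
per_device_batch = 1       # per_device_train_batch_size
grad_accum = 2             # gradient_accumulation_steps
num_examples = 500         # rows kept from the dataset
train_runtime_s = 275.5087 # train_runtime from the summary above

# Each optimizer step consumes batch size x accumulation steps examples.
examples_seen = max_steps * per_device_batch * grad_accum
epoch_fraction = examples_seen / num_examples
samples_per_second = examples_seen / train_runtime_s
steps_per_second = max_steps / train_runtime_s

print(f"examples seen:      {examples_seen}")
print(f"fraction of epoch:  {epoch_fraction:.2f}")
print(f"samples per second: {samples_per_second:.3f}")
print(f"steps per second:   {steps_per_second:.3f}")
```

In other words, this run covers about a quarter of the 500-row subset, so raising `max_steps` to 250 (or removing it and relying on `num_train_epochs=1`) would be needed for a full pass.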


In [18]:
from google.colab import files
!zip -r /content/finetuned_model.zip /content/finetuned_model
files.download('/content/finetuned_model.zip')

  adding: content/finetuned_model/ (stored 0%)
  adding: content/finetuned_model/adapter_model.safetensors (deflated 7%)
  adding: content/finetuned_model/tokenizer_config.json (deflated 95%)
  adding: content/finetuned_model/adapter_config.json (deflated 56%)
  adding: content/finetuned_model/tokenizer.json (deflated 85%)
  adding: content/finetuned_model/README.md (deflated 66%)
  adding: content/finetuned_model/special_tokens_map.json (deflated 69%)
  adding: content/finetuned_model/training_args.bin (deflated 53%)



In [32]:
question = """A 40-year-old man is brought to the emergency department 15 minutes after being stabbed in the chest with a screwdriver.
 Given his vital signs of pulse 110/min, respirations 22/min, and blood pressure 90/65 mm Hg, along with the presence of a 5-cm deep stab wound
 at the upper border of the 8th rib in the left midaxillary line, which anatomical structure in his chest is most likely to be injured?"""

FastLanguageModel.for_inference(model_lora)

inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model_lora.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)

print(response[0].split("### Response:")[1])


<think>
Alright, let's see what we're dealing with here. We've got a 40-year-old guy who's been stabbed in the chest with a screwdriver, and he's been brought in 15 minutes later. That’s pretty fresh, so we’re definitely dealing with an acute injury.

First off, let's look at his vital signs: pulse is 110/min, respiration is 22/min, and blood pressure is 90/65 mm Hg. That’s a bit on the lower side, but let's not get too hung up on that yet. The real focus is on the stab wound.

Now, where's that stab wound located? It's 5 cm deep at the upper border of the 8th rib in the left midaxillary line. Hmm, that's right in the sweet spot for the internal mammary artery. I mean, the internal mammary artery is exactly where you’d expect a stab wound to hit in this scenario.

Let's think about what happens when you hit the internal mammary artery. It’s responsible for supplying blood to the anterior chest wall. If it's injured, we could be looking at some serious bleeding, especially considering 

In [33]:
question = """A 40-year-old man presents with a 6-month history of increasing joint pain and stiffness located primarily in his knees,
accompanied by redness and warmth. He has a history of diabetes, is not on medication, and experiences occasional diarrhea with fatty stools.
The physical exam shows mild swelling and redness in his knees, and he has unusually tan skin despite limited sun exposure.
Given these symptoms, to which disease is this patient most likely predisposed?"""

FastLanguageModel.for_inference(model_lora)

inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model_lora.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)

print(response[0].split("### Response:")[1])


<think>
Alright, let's think about this. We've got a 40-year-old man with joint pain and stiffness in his knees, along with some redness and warmth. That's definitely making me think about joint inflammation or something like that. He also has a history of diabetes, which is interesting. I know diabetes can lead to some complications, especially with blood flow and circulation, so that's a clue to consider.

Then there's the diarrhea part. He's having diarrhea with fatty stools. Hmm, that's a red flag for something like Crohn's disease or maybe something related to the gut. I remember that Crohn's can sometimes be linked to joint inflammation, especially in the knees.

Now, the skin is unusually tan, but he's not getting much sun. That makes me think about something related to skin coloration. Oh, and he's not on any medication for his diabetes, so that's another clue.

Putting it all together, we're looking at something that involves both joint inflammation and gut issues. Crohn's di

In [34]:
question = """ A 65-year-old man with a history of hypertension reports dizziness when standing up quickly.
 After a physical exam, what would a tilt-table test likely reveal about his blood pressure response?"""

FastLanguageModel.for_inference(model_lora)

inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model_lora.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)

print(response[0].split("### Response:")[1])


<think>
Okay, let's think about this. We have a 65-year-old guy who has a history of hypertension, which means he's prone to high blood pressure. He's complaining about dizziness whenever he gets up too quickly. Hmm, that's a classic sign of orthostatic hypotension. 

Now, what's orthostatic hypotension? It's when someone's blood pressure drops when they stand up. It's often seen in people with a history of high blood pressure because their arteries might not handle the sudden change in blood flow well. 

Alright, let's imagine we're doing a tilt-table test. This test is designed to see how someone's blood pressure responds when they're tilted upright. It's a way to test for orthostatic hypotension. 

So, when we tilt the table, we expect to see a drop in blood pressure. If the person shows a significant drop in blood pressure, it's a strong indication of orthostatic hypotension. If the blood pressure doesn't drop much, it might mean something else is going on, like a different type o

In [35]:
question = """ My 70-year-old grandmother complains of body pain. Which test
should be performed to assess the cause of her symptoms? """

FastLanguageModel.for_inference(model_lora)

inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model_lora.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)

print(response[0].split("### Response:")[1])


<think>
Okay, so my 70-year-old grandmother is complaining about having pain all over her body. That sounds pretty serious. I wonder what could be causing this. Maybe it's something like rheumatoid arthritis? I've heard that can cause joint pain and swelling, but I'm not sure if that's what's going on here.

Alright, let's think about how to figure out what's wrong. There's a test that's commonly used for assessing joint inflammation. Oh, right, it's called the C-reactive protein test. It's a blood test that measures inflammation levels in the body. Yeah, that seems relevant.

But wait, I should make sure it's the right test for this situation. I mean, it's used a lot in rheumatology, especially for conditions like arthritis. And since she's experiencing widespread pain, it could definitely be related to an inflammatory condition.

Hmm, I should also consider other tests that might be relevant. There's the erythrocyte sedimentation rate, which also measures inflammation. But the C-rea