<a href="https://colab.research.google.com/github/Prateek1727/medico-bot/blob/main/Medical_Chatbot_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
!pip install triton==2.0.0

In [None]:
!pip install torch --upgrade --force-reinstall

Collecting torch
  Using cached torch-2.7.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (29 kB)
Collecting filelock (from torch)
  Downloading filelock-3.18.0-py3-none-any.whl.metadata (2.9 kB)
Collecting typing-extensions>=4.10.0 (from torch)
  Downloading typing_extensions-4.13.2-py3-none-any.whl.metadata (3.0 kB)
Collecting sympy>=1.13.3 (from torch)
  Using cached sympy-1.14.0-py3-none-any.whl.metadata (12 kB)
Collecting networkx (from torch)
  Downloading networkx-3.4.2-py3-none-any.whl.metadata (6.3 kB)
Collecting jinja2 (from torch)
  Downloading jinja2-3.1.6-py3-none-any.whl.metadata (2.9 kB)
Collecting fsspec (from torch)
  Downloading fsspec-2025.3.2-py3-none-any.whl.metadata (11 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.6.77 (from torch)
  Using cached nvidia_cuda_nvrtc_cu12-12.6.77-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.6.77 (from torch)
  Using cached nvidia_cuda_runtime_cu12-12.6.77-py3-none-manylinux2014_x86_64.many

In [None]:
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from unsloth import is_bfloat16_supported
from huggingface_hub import login
from transformers import TrainingArguments
from datasets import load_dataset
import wandb
from google.colab import userdata

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [None]:
hugging_face_token = userdata.get("Hugging_Face_Token")
wnb_token = userdata.get("wnb")

login(hugging_face_token)

wandb.login(key=wnb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical COT Dataset_YouTube Walkthrough',
    job_type="training",
    anonymous="allow"
)

[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc
[34m[1mwandb[0m: Currently logged in as: [33mprateekanandaarambh[0m ([33mprateekanandaarambh1[0m) to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


In [None]:
max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    token=hugging_face_token,
)

==((====))==  Unsloth 2025.4.8: Fast Llama patching. Transformers: 4.51.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.30. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
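A back-of-the-envelope check on why `load_in_4bit=True` matters on this GPU. This is a rough sketch only: the 8 billion parameter count is the rounded figure implied by the model name, and the estimate ignores quantization metadata, any layers kept in higher precision, activations, and the KV cache.

```python
# Rough weight-only memory estimate (assumption: every parameter is stored at the given width).
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Storage needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 8_000_000_000   # rounded parameter count for an "8B" model
T4_MEMORY_GB = 14.741        # "Max memory" reported in the Unsloth banner above

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
int4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"fp16 weights: {fp16_gb:.1f} GB, 4-bit weights: {int4_gb:.1f} GB")
print("fp16 fits on the T4:", fp16_gb < T4_MEMORY_GB)
print("4-bit fits on the T4:", int4_gb < T4_MEMORY_GB)
```

Under these assumptions the half-precision weights alone would exceed the T4's memory, while the 4-bit weights leave room for the LoRA adapters and activations.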


In [None]:
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""
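The template above has two `{}` slots: the question, and whatever should follow the opening `<think>` tag. A minimal sketch of how `str.format` fills them, using a shortened stand-in template (`demo_style` and the sample question are illustrative, not part of the notebook):

```python
# Shortened stand-in for prompt_style; it keeps the same two {} slots.
demo_style = """### Question:
{}

### Response:
<think>{}"""

# For inference the second slot is left empty, so the prompt ends right after
# the opening <think> tag and the model continues with its reasoning.
prompt = demo_style.format("What does a positive Q-tip test suggest?", "")
print(prompt)
```

Leaving the second slot empty is what makes the generated text start with the model's chain of thought.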

In [None]:
question = """A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or
              sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings,
              what would cystometry most likely reveal about her residual volume and detrusor contractions?"""

FastLanguageModel.for_inference(model)

inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)

print(response[0].split("### Response:")[1])


<think>
Okay, so I'm trying to figure out what cystometry would show for this 61-year-old woman. Let me break this down step by step.

First, the patient has a history of involuntary urine loss when she coughs or sneezes. That makes me think of urge incontinence, especially since she doesn't leak at night. So, the issue is likely happening during activities that increase intra-abdominal pressure. That points towards a possible overactive bladder or some nerve issue affecting the lower urinary tract.

She underwent a gynecological exam and a Q-tip test. I'm not too familiar with the Q-tip test, but I think it's used to check for urethral obstruction. If the Q-tip doesn't stay at 45 degrees when the patient coughs, it might indicate that the urethra is open, which would suggest urethral obstruction. But in this case, she does have involuntary loss, so maybe the Q-tip was positive, meaning it did stay at 45 degrees, indicating no obstruction. Or maybe it wasn't, but I'm not sure. I shoul
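The decoded output mixes the reasoning inside `<think>` with the final answer that follows it. A small helper can separate the two. This is a sketch: `split_reasoning` is a hypothetical helper, and it assumes the tag layout used in this notebook's prompts and the EOS string printed by the tokenizer later on.

```python
def split_reasoning(generated: str, eos_token: str = "<｜end▁of▁sentence｜>"):
    """Split decoded text into (reasoning, answer) around the <think> tags."""
    # Keep only what follows the response marker and drop the EOS token.
    body = generated.split("### Response:", 1)[-1].replace(eos_token, "")
    if "</think>" in body:
        reasoning, answer = body.split("</think>", 1)
    else:
        # Generation stopped (for example at max_new_tokens) before the reasoning closed.
        reasoning, answer = body, ""
    reasoning = reasoning.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()

# Made-up sample text, only to exercise the helper.
sample = (
    "### Response:\n<think>\nLeakage with coughing suggests stress incontinence.\n"
    "</think>\nNormal residual volume, no involuntary detrusor contractions."
    "<｜end▁of▁sentence｜>"
)
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

In the notebook this would be called as `split_reasoning(response[0])`.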

In [None]:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""


In [None]:
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train[0:500]", trust_remote_code=True)
dataset

Dataset({
    features: ['Question', 'Complex_CoT', 'Response'],
    num_rows: 500
})

In [None]:
dataset[1]

{'Question': 'A 33-year-old woman is brought to the emergency department 15 minutes after being stabbed in the chest with a screwdriver. Given her vital signs of pulse 110/min, respirations 22/min, and blood pressure 90/65 mm Hg, along with the presence of a 5-cm deep stab wound at the upper border of the 8th rib in the left midaxillary line, which anatomical structure in her chest is most likely to be injured?',
 'Complex_CoT': "Okay, let's figure out what's going on here. A woman comes in with a stab wound from a screwdriver. It's in her chest, upper border of the 8th rib, left side, kind of around the midaxillary line. First thought, that's pretty close to where the lung sits, right?\n\nLet's talk about location first. This spot is along the left side of her body. Above the 8th rib, like that, is where a lot of important stuff lives, like the bottom part of the left lung, possibly the diaphragm too, especially considering how deep the screwdriver went.\n\nThe wound is 5 cm deep. Tha

In [None]:
EOS_TOKEN = tokenizer.eos_token
EOS_TOKEN

'<｜end▁of▁sentence｜>'

In [None]:
def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]

    texts = []

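    # Avoid shadowing the built-in input(); append EOS so the model learns where to stop.
    for question_text, cot, answer in zip(inputs, cots, outputs):
        text = train_prompt_style.format(question_text, cot, answer) + EOS_TOKEN
        texts.append(text)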

    return {
        "text": texts,
    }
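With `batched=True`, `Dataset.map` passes the function a dict that maps each column name to a list of values, and expects a dict of equal-length lists back. The sketch below mimics that contract on a plain dict so it runs without the dataset; `toy_style`, `TOY_EOS`, and the sample rows are stand-ins, not the notebook's real template or token.

```python
# Stand-ins for train_prompt_style and EOS_TOKEN (assumptions for illustration only).
toy_style = "### Question:\n{}\n\n### Response:\n<think>\n{}\n</think>\n{}"
TOY_EOS = "<eos>"

def toy_formatting_func(examples):
    """Build one training string per row from the three source columns."""
    texts = [
        toy_style.format(q, cot, resp) + TOY_EOS
        for q, cot, resp in zip(
            examples["Question"], examples["Complex_CoT"], examples["Response"]
        )
    ]
    return {"text": texts}

# The shape of a batch as map(..., batched=True) would deliver it.
batch = {
    "Question": ["Q1", "Q2"],
    "Complex_CoT": ["reasoning 1", "reasoning 2"],
    "Response": ["A1", "A2"],
}
out = toy_formatting_func(batch)
print(out["text"][0])
```

The returned `"text"` list is added as a new column, which is the field the trainer is later pointed at.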

In [None]:
dataset_finetune = dataset.map(formatting_prompts_func, batched = True)
dataset_finetune["text"][0]

"Below is an instruction that describes a task, paired with an input that provides further context. \nWrite a response that appropriately completes the request. \nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. \nPlease answer the following medical question. \n\n### Question:\nGiven the symptoms of sudden weakness in the left arm and leg, recent long-distance travel, and the presence of swollen and tender right lower leg, what specific cardiac abnormality is most likely to be found upon further evaluation that could explain these findings?\n\n### Response:\n<think>\nOkay, let's see what's going on here. We've got sudden weakness in the person's left arm and leg - and that screams something neuro-related, maybe a stroke?\n\nBut wait, there's more. The right lower leg i

In [None]:
model_lora = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Unsloth 2025.4.8 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
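The number of trainable LoRA parameters follows from `r` and the shapes of the targeted projections: each adapted linear layer gains two small matrices with `r * (in_features + out_features)` weights in total. The sketch below assumes the standard Llama 3.1 8B dimensions (hidden size 4096, MLP size 14336, 8 key/value heads of dimension 128, 32 layers); these are not stated in the notebook.

```python
# Assumed Llama-3.1-8B dimensions (not printed anywhere in this notebook).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128      # grouped-query attention: 8 key/value heads x 128 dims
NUM_LAYERS = 32
R = 16                # LoRA rank used in get_peft_model above

# (in_features, out_features) for each targeted module.
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features: int, out_features: int, r: int) -> int:
    """LoRA adds A (r x in_features) and B (out_features x r)."""
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, R) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
print(f"{total:,} trainable parameters")
```

Under these assumptions the total comes to 41,943,040, the same figure Unsloth reports in its training banner in this notebook.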


In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model_lora,
    tokenizer=tokenizer,
    train_dataset=dataset_finetune,
    dataset_text_field="text",
    max_seq_length=1024,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=2,
        num_train_epochs=1,
        warmup_steps=5,
        max_steps=60,  # takes precedence over num_train_epochs: training stops after 60 optimizer steps
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/500 [00:00<?, ? examples/s]
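The arguments above combine `warmup_steps=5`, `max_steps=60`, `learning_rate=2e-4`, and `lr_scheduler_type="linear"`. A sketch of the usual shape of such a schedule (linear ramp up during warmup, then linear decay to zero) is shown below. `linear_lr` is a hypothetical helper written for illustration, not the trainer's own code.

```python
def linear_lr(step: int, base_lr: float = 2e-4,
              warmup_steps: int = 5, total_steps: int = 60) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)

for step in (0, 5, 10, 30, 60):
    print(step, f"{linear_lr(step):.2e}")
```

The rate peaks at step 5 and reaches zero at step 60, which is consistent with the final `train/learning_rate` of 0.0 in the run summary.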

In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 500 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 2 x 1) = 2
 "-____-"     Trainable parameters = 41,943,040/8,000,000,000 (0.52% trained)


Step,Training Loss
10,1.9216
20,1.4935
30,1.4389
40,1.3577
50,1.3219
60,1.3623
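Because `max_steps=60` caps the run, the model sees only part of the 500 examples. The arithmetic behind the banner's "Total batch size (1 x 2 x 1) = 2" and the fraction of an epoch covered can be sketched as follows, using only values shown in this notebook:

```python
per_device_batch = 1   # per_device_train_batch_size
grad_accum = 2         # gradient_accumulation_steps
num_gpus = 1
max_steps = 60
num_examples = 500

# Examples consumed per optimizer step, and over the whole run.
effective_batch = per_device_batch * grad_accum * num_gpus
examples_seen = effective_batch * max_steps
epoch_fraction = examples_seen / num_examples

print(f"effective batch size: {effective_batch}")
print(f"examples seen: {examples_seen} of {num_examples}")
print(f"epochs completed: {epoch_fraction:.2f}")
```

The result, 120 examples or 0.24 of an epoch, agrees with the `train/epoch` value in the run summary.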


In [None]:
trainer.save_model("finetuned_model")

In [None]:
wandb.finish()

Metric,History
train/epoch,▁▂▄▅▇██
train/global_step,▁▂▄▅▇██
train/grad_norm,█▄▄▁▆▂
train/learning_rate,█▇▅▄▂▁
train/loss,█▃▂▁▁▁

Metric,Value
total_flos,3836721888632832.0
train/epoch,0.24
train/global_step,60.0
train/grad_norm,0.47678
train/learning_rate,0.0
train/loss,1.3623
train_loss,1.48265
train_runtime,275.5087
train_samples_per_second,0.436
train_steps_per_second,0.218
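The summary values can be cross-checked against the numbers logged earlier. The sketch below averages the six logged losses and derives throughput from the runtime. That the average matches `train_loss` is consistent with each logged value being a mean over 10 steps, though that interpretation is an inference rather than something the logs state.

```python
# Values copied from the training log and run summary above.
logged_losses = [1.9216, 1.4935, 1.4389, 1.3577, 1.3219, 1.3623]
train_runtime = 275.5087   # seconds
total_steps = 60
examples_seen = 120        # 60 steps x effective batch size of 2

mean_loss = sum(logged_losses) / len(logged_losses)
steps_per_second = total_steps / train_runtime
samples_per_second = examples_seen / train_runtime

print(f"mean logged loss:   {mean_loss:.5f}")
print(f"steps per second:   {steps_per_second:.3f}")
print(f"samples per second: {samples_per_second:.3f}")
```

These reproduce the reported `train_loss` of 1.48265, `train_steps_per_second` of 0.218, and `train_samples_per_second` of 0.436 after rounding.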


In [None]:
from google.colab import files
!zip -r /content/finetuned_model.zip /content/finetuned_model
files.download('/content/finetuned_model.zip')

  adding: content/finetuned_model/ (stored 0%)
  adding: content/finetuned_model/adapter_model.safetensors (deflated 7%)
  adding: content/finetuned_model/training_args.bin (deflated 53%)
  adding: content/finetuned_model/special_tokens_map.json (deflated 69%)
  adding: content/finetuned_model/tokenizer.json (deflated 85%)
  adding: content/finetuned_model/tokenizer_config.json (deflated 95%)
  adding: content/finetuned_model/adapter_config.json (deflated 56%)
  adding: content/finetuned_model/README.md (deflated 66%)



In [None]:
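from unsloth import FastLanguageModel
# from_pretrained returns a (model, tokenizer) pair, so unpack both.
model_lora, tokenizer = FastLanguageModel.from_pretrained("/content/finetuned_model", max_seq_length=max_seq_length, load_in_4bit=load_in_4bit)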

In [None]:
question = """A 40-year-old man is brought to the emergency department 15 minutes after being stabbed in the chest with a screwdriver.
 Given his vital signs of pulse 110/min, respirations 22/min, and blood pressure 90/65 mm Hg, along with the presence of a 5-cm deep stab wound
 at the upper border of the 8th rib in the left midaxillary line, which anatomical structure in his chest is most likely to be injured?"""

FastLanguageModel.for_inference(model_lora)

inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model_lora.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)

print(response[0].split("### Response:")[1])


<think>
Okay, so we have a 40-year-old man who was stabbed in the chest with a screwdriver 15 minutes ago. He's got a pulse of 110/min, breathing 22/min, and his blood pressure is 90/65 mm Hg. That's a bit low, but we need to figure out what's going on in his chest based on that stab wound.

Alright, let's start by looking at the stab wound. It's 5 centimeters deep and located at the upper border of the 8th rib on the left side. Hmm, that's a pretty deep wound, so we're definitely looking at some internal structures here.

Given that the screwdriver went into the chest, we're probably dealing with injuries to some internal organs. Now, the ribcage is important here because it protects the heart and lungs. The 8th rib is part of the mid-axillary line, which is a key line for evaluating injuries.

Given that the wound is deep and on the left side, we need to think about what's behind it. The left mid-axillary line is a reference for the pericardial sac. So, if the screwdriver went deep 

In [None]:
question = """A 40-year-old man presents with a 6-month history of increasing joint pain and stiffness located primarily in his knees,
accompanied by redness and warmth. He has a history of diabetes, is not on medication, and experiences occasional diarrhea with fatty stools.
The physical exam shows mild swelling and redness in his knees, and he has unusually tan skin despite limited sun exposure.
Given these symptoms, to which disease is this patient most likely predisposed?"""

FastLanguageModel.for_inference(model_lora)

inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model_lora.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

response = tokenizer.batch_decode(outputs)

print(response[0].split("### Response:")[1])


<think>
Alright, let's think about this. The patient is a 40-year-old man with a 6-month history of joint pain and stiffness in his knees. He also mentions redness and warmth in his joints, which sounds like inflammation. Okay, that's a clue.

He's got diabetes, and even though he's not on any medications, that's something to note. He's also experiencing diarrhea with fatty stools, which is a bit unusual. It makes me think about how the gut and joint inflammation are sometimes linked.

Now, looking at the physical exam, there's mild swelling and redness in his knees, and his skin is unusually tan, even though he doesn't spend much time in the sun. Hmm, that's interesting. It's like his skin is getting too much pigment.

Putting this all together, I'm thinking about what connects joint inflammation, skin discoloration, and diabetes. Hmm, maybe it's something related to his immune system or chronic inflammation.

Wait, diabetes can sometimes lead to a condition called rheumatoid arthrit