<a href="https://colab.research.google.com/github/bananighosh/mediBuddy-deepseek-r1/blob/main/mediBuddy_fine_tuned_deepseek_r1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Requirements

In [1]:
%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

In [3]:
from google.colab import userdata
from huggingface_hub import login
login(token=userdata.get('HF_TOKEN'))

In [57]:
from google.colab import userdata
import wandb
wandb.login(key = userdata.get('wandb_token'))



True

In [4]:
from unsloth import FastLanguageModel

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


# Load the model

In [38]:
max_seq_length = 2048
dtype = None
load_in_4bit = True

In [39]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name= "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length= max_seq_length,
    dtype= dtype,
    load_in_4bit= load_in_4bit,
    token = userdata.get('HF_TOKEN')
)

==((====))==  Unsloth 2025.2.15: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    GPU: NVIDIA A100-SXM4-40GB. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


# Defining the Fine-tuned Model

In [40]:
FastLanguageModel.for_training(model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096, padding_idx=128004)
    (layers): ModuleList(
      (0): LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear4bit(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((409

In [41]:
model_lora = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config=None,
)
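As a sanity check on the LoRA configuration above, the adapter size can be worked out by hand. Each adapted linear layer gains a `lora_A` matrix (`in_features × r`) and a `lora_B` matrix (`r × out_features`). The sketch below uses the projection shapes from the printed module tree and assumes 32 decoder layers, which the truncated printout does not show. Under that assumption the total matches the "Number of trainable parameters" that the Unsloth training banner reports.

```python
# Shapes (in_features, out_features) copied from the printed module tree.
PROJ_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32  # assumption: the layer count is not visible in the truncated printout
RANK = 16        # r in get_peft_model


def lora_params(in_features, out_features, rank):
    # lora_A holds in_features * rank weights, lora_B holds rank * out_features; no bias.
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJ_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"per layer: {per_layer:,}  total: {total:,}")
```

With `r = 16` this gives 1,310,720 adapter weights per decoder layer and 41,943,040 in total.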

# Running Inference Before Fine-tuning

In [42]:
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""

In [43]:
question = """A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing
              but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings,
              what would cystometry most likely reveal about her residual volume and detrusor contractions?"""




FastLanguageModel.for_inference(model)

inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids = inputs.input_ids,
    attention_mask = inputs.attention_mask,
    max_new_tokens = 1200,
    use_cache = True
)

response= tokenizer.batch_decode(outputs)


In [44]:
print(response[0].split("### Response:")[1])


<think>
Okay, so I'm trying to figure out what cystometry would show for this 61-year-old woman. She's been having involuntary urine loss when she coughs or sneezes but doesn't leak at night. She had a gynecological exam and Q-tip test. I need to determine what cystometry would reveal about her residual volume and detrusor contractions.

First, I remember that cystometry, also known as a bladder diary study, is used to measure how much urine is left in the bladder (residual volume) and how the detrusor muscle contracts during filling. It's usually done by inserting a catheter into the bladder and filling it with a radioactive isotope or a contrast agent to track the flow.

In this case, the patient has a history of urinary incontinence upon activities like coughing, which suggests she has some bladder instability. However, she doesn't leak at night, which is a bit unusual because typically, people with overactive bladders might leak at night. So, maybe her issue is stress-related rath
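
The printed text mixes the model's chain of thought with its final answer. A small helper can separate the two, which makes it easier to compare outputs before and after fine-tuning. This is a minimal sketch: `split_reasoning` is a hypothetical name, not part of Unsloth or the tokenizer, and it assumes the `### Response:` marker and `<think>` tags from the prompt template above.

```python
def split_reasoning(decoded: str):
    """Return (reasoning, answer) from a decoded generation."""
    # Keep only the text after the response marker used by the prompt template.
    body = decoded.split("### Response:", 1)[-1]
    if "</think>" in body:
        reasoning, answer = body.split("</think>", 1)
    else:
        # Generation stopped (or was cut off) before the think block closed.
        reasoning, answer = body, ""
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


demo = "### Question:\nQ?\n\n### Response:\n<think>\nstep 1\n</think>\nFinal answer."
print(split_reasoning(demo))
```

In the notebook this would be called as `split_reasoning(response[0])`. Any end-of-sequence token the tokenizer leaves in the decoded string stays in the answer part.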

# Loading the Dataset

In [45]:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""

In [46]:
EOS_TOKEN = tokenizer.eos_token

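def formatting_prompts_func(examples):
    # Build one training string per example: prompt, chain of thought, answer, EOS.
    questions = examples["Question"]
    cots = examples["Complex_CoT"]
    responses = examples["Response"]
    texts = []
    for question, cot, response in zip(questions, cots, responses):
        # EOS_TOKEN marks the end of each example so the model learns where to stop.
        text = train_prompt_style.format(question, cot, response) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}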
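
To see what the batched formatting function produces without loading the model or the dataset, it can be run on a toy batch. The sketch below uses a shortened stand-in for `train_prompt_style` and a hypothetical `"<eos>"` string in place of `tokenizer.eos_token`; the column names are the ones used in the notebook.

```python
# Shortened stand-in for train_prompt_style; placeholders are in the same order.
template = "### Question:\n{}\n\n### Response:\n<think>\n{}\n</think>\n{}"
eos = "<eos>"  # hypothetical; the notebook uses tokenizer.eos_token


def format_batch(examples):
    # With batched=True, `examples` maps each column name to a list of values.
    texts = [
        template.format(q, cot, resp) + eos
        for q, cot, resp in zip(
            examples["Question"], examples["Complex_CoT"], examples["Response"]
        )
    ]
    return {"text": texts}


batch = {
    "Question": ["Q1", "Q2"],
    "Complex_CoT": ["reasoning 1", "reasoning 2"],
    "Response": ["answer 1", "answer 2"],
}
out = format_batch(batch)
print(out["text"][0])
```

The function returns one string per input row under a new `text` key, which is the field the trainer is later pointed at through `dataset_text_field`.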

In [47]:
from datasets import load_dataset
dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT",
    "en",
    split="train[0:100]",
    trust_remote_code=True,
)
dataset_finetune = dataset.map(formatting_prompts_func, batched=True)

In [48]:
dataset_finetune[0]

{'Question': 'A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?',
 'Complex_CoT': "Okay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdominal pressure like coughing or sneezing. This sounds a lot like stress urinary incontinence to me. Now, it's interesting that she doesn't have any issues at night; she isn't experiencing leakage while sleeping. This likely means her bladder's ability to hold urine is fine when she isn't under physical stress. Hmm, that's a clue that we're dealing with something related to pressure rather than a bladder muscle problem. \n\nThe fact that she underwent a Q-tip test is intriguing too. This 

In [49]:
dataset_finetune["text"][0]

"Below is an instruction that describes a task, paired with an input that provides further context.\nWrite a response that appropriately completes the request.\nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.\nPlease answer the following medical question.\n\n### Question:\nA 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n### Response:\n<think>\nOkay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her abdomi

# Setting the Training Arguments

In [50]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

### Fix for the error below: https://github.com/unslothai/unsloth/issues/1709#issuecomment-2659205283


Error:
AttributeError: 'PeftModelForCausalLM' object has no attribute '_unwrapped_old_generate'

In [52]:
FastLanguageModel.for_training(model)

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(128256, 4096, padding_idx=128004)
    (layers): ModuleList(
      (0): LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): lora.Linear4bit(
            (base_layer): Linear4bit(in_features=4096, out_features=4096, bias=False)
            (lora_dropout): ModuleDict(
              (default): Identity()
            )
            (lora_A): ModuleDict(
              (default): Linear(in_features=4096, out_features=16, bias=False)
            )
            (lora_B): ModuleDict(
              (default): Linear(in_features=16, out_features=4096, bias=False)
            )
            (lora_embedding_A): ParameterDict()
            (lora_embedding_B): ParameterDict()
            (lora_magnitude_vector): ModuleDict()
          )
          (k_proj): lora.Linear4bit(
            (base_layer): Linear4bit(in_features=4096, out_features=1024, bias=False)
            (lora_dropout): ModuleDict(
              

In [54]:
trainer = SFTTrainer(
  model_lora,
  tokenizer = tokenizer,
  train_dataset = dataset_finetune,
  dataset_text_field = "text",
  max_seq_length = max_seq_length,
  dataset_num_proc = 2,


  args = TrainingArguments(
      per_device_train_batch_size = 2,
      gradient_accumulation_steps = 4,
      num_train_epochs = 1,  # has no effect here: max_steps below takes precedence
      warmup_steps = 5,
      max_steps = 60,  # training stops after 60 optimizer steps
      learning_rate = 2e-4,
      fp16=not is_bfloat16_supported(),
      bf16 = is_bfloat16_supported(),
      # fp16=True,
      # bf16 = False,
      logging_steps = 1,
      optim="adamw_8bit",
      weight_decay=0.01,
      lr_scheduler_type="linear",
      seed=3407,
      output_dir="outputs",
  )
)
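
The batch settings above determine how much data the run actually sees. Because `max_steps` is set, it overrides `num_train_epochs`, so the run length follows from the step count rather than from the epoch setting. The sketch below works through the arithmetic with the values from this cell, the 100-example slice loaded earlier, and the single GPU reported in the training banner.

```python
per_device_batch = 2   # per_device_train_batch_size
grad_accum = 4         # gradient_accumulation_steps
num_gpus = 1           # from the training banner
num_examples = 100     # split = "train[0:100]"
max_steps = 60

# One optimizer step consumes this many examples.
effective_batch = per_device_batch * grad_accum * num_gpus
# Examples processed over the whole run, counting repeats.
samples_seen = effective_batch * max_steps
# Approximate number of passes over the 100-example dataset.
passes_over_data = samples_seen / num_examples
print(effective_batch, samples_seen, passes_over_data)
```

This gives an effective batch of 8 and roughly 4.8 passes over the data, which is why the banner reports about 5 epochs even though `num_train_epochs = 1`. The exact epoch counter in the logs depends on how the trainer rounds steps per epoch.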

In [55]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 100 | Num Epochs = 5
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
1,2.1823
2,2.2011
3,2.1896
4,2.3482
5,1.9824
6,1.7543
7,1.7783
8,1.6952
9,1.5633
10,1.6495


In [58]:
wandb.finish()

Run history:
train/epoch,▁▁▁▁▁▂▂▂▂▃▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▇▇▇▇▇█████
train/global_step,▁▁▁▁▁▂▂▂▂▂▂▃▃▃▃▃▃▄▄▄▄▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇████
train/grad_norm,▅▅▅▆█▆▄▃▆▅▄▃▂▂▁▂▂▂▁▃▂▁▁▁▂▂▁▁▂▁▂▂▂▁▂▂▂▂▂▂
train/learning_rate,▂▄▅▇██▇▇▇▇▇▇▆▆▆▆▅▅▅▅▄▄▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▁▁▁
train/loss,▇▇█▆▅▄▄▄▄▃▃▄▃▃▃▃▂▂▃▂▂▂▂▃▂▂▂▁▂▁▂▁▂▁▂▂▁▂▁▂

Run summary:
total_flos,1.747039500386304e+16
train/epoch,4.64
train/global_step,60.0
train/grad_norm,0.35899
train/learning_rate,0.0
train/loss,1.1806
train_loss,1.38607
train_runtime,165.0852
train_samples_per_second,2.908
train_steps_per_second,0.363
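
The `train/learning_rate` sparkline rises briefly and then decays to the final value of 0.0, which is the shape expected from `lr_scheduler_type="linear"` with `warmup_steps = 5` and `max_steps = 60`. The function below is a minimal re-implementation of that linear warmup and decay rule for illustration. `linear_lr` is a hypothetical helper, not the scheduler object the trainer uses, and logged values may be offset by one step depending on when the trainer records them.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=60):
    """Learning rate after `step` optimizer steps under linear warmup and decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup steps.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr to 0 over the remaining steps.
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


for s in (0, 1, 5, 30, 60):
    print(s, linear_lr(s))
```

The peak of `2e-4` is reached at step 5, and the rate reaches zero at step 60, consistent with the logged final learning rate.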


# Inference After Fine-tuning

In [62]:
question = """A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing
              but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings,
              what would cystometry most likely reveal about her residual volume and detrusor contractions?"""




FastLanguageModel.for_inference(model_lora)

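# Note: filling train_prompt_style with empty strings puts an already closed
# "<think>\n\n</think>" block in the prompt, so the model writes its reasoning
# after the tags (see the output below). prompt_style.format(question, "")
# leaves <think> open instead, as in the pre-fine-tuning run.
inputs = tokenizer([train_prompt_style.format(question, "", "")], return_tensors="pt").to("cuda")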

outputs = model_lora.generate(
    input_ids = inputs.input_ids,
    attention_mask = inputs.attention_mask,
    max_new_tokens = 1200,
    use_cache = True
)

response= tokenizer.batch_decode(outputs)

print(response[0].split("### Response:")[1])


<think>

</think>
Alright, let's think about this. We have a 61-year-old woman who's been dealing with involuntary urine loss whenever she coughs or sneezes, but she doesn't have any issues at night. That sounds like a classic case of stress urinary incontinence, right? 

Now, she's had a gynecological exam and they did a Q-tip test. Hmm, the Q-tip test is usually used to check for urethral obstruction. If it's positive, it suggests that there's obstruction, which can lead to problems like difficulty urinating or even retention. 

But wait, in this case, there's no mention of any obstruction or difficulties with urination. She's dealing with stress incontinence, not something like urethral obstruction. So, maybe the Q-tip test was negative, right? That would make sense.

Now, if we think about what happens in stress urinary incontinence, it's all about the urethral pressure and how the bladder behaves when we cough or sneeze. When she coughs, the abdominal pressure increases, and if h