## Install & imports
Read our Unsloth Guide: Optimize and Speed Up LLM Fine-Tuning to learn about Unsloth's key features, various functions, and how to optimize your fine-tuning workflow.

In [None]:
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

In [6]:
from unsloth import FastLanguageModel  # Unsloth's optimized model loader for fine-tuning
import torch
from trl import SFTTrainer
from unsloth import is_bfloat16_supported

from huggingface_hub import login
from transformers import TrainingArguments
from datasets import load_dataset

import wandb
from google.colab import userdata


In [None]:
hugging_face_token = userdata.get('HF_TOKEN')  # Hugging Face token used to download the DeepSeek model
wnb_token = userdata.get('WB_TOKEN')  # Weights & Biases token for training tracking (optional)

login(hugging_face_token)

# Log in to Weights & Biases and start a run to track training
wandb.login(key=wnb_token)
run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distilled on medical',
    job_type='training',
    anonymous='allow'
)


## Model parameterization & retrieval
The model used is: https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B


In [9]:
max_seq_length = 2048
dtype = None
load_in_4bit = True

In [30]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = userdata.get('HF_TOKEN')
)

==((====))==  Unsloth 2025.2.12: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
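
Why `load_in_4bit = True` matters on this GPU can be checked with a back-of-the-envelope estimate. The sketch below is only a rough calculation under stated assumptions: the parameter count is taken as roughly 8 billion from the "8B" in the model name, and it counts weight storage only (no activations, KV cache, optimizer state, or quantization overhead).

```python
# Rough weight-memory estimate; ignores activations, KV cache and quantization overhead.
N_PARAMS = 8.0e9  # assumption: ~8 billion parameters, read off the "8B" in the model name


def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Return the memory needed to store the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


fp16_gib = weight_memory_gib(N_PARAMS, 16)
int4_gib = weight_memory_gib(N_PARAMS, 4)

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f" 4-bit weights: {int4_gib:.1f} GiB")
```

Under these assumptions the 16-bit weights alone come to about 14.9 GiB, which leaves no headroom next to the 14.741 GB the banner reports for the Tesla T4, while the 4-bit weights need about 3.7 GiB.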


## Testing DeepSeek R1 on medical use-case before fine-tuning

In [13]:
# Define system prompt under prompt_style
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thought to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""


In [14]:
# Creating a test medical question for inference
question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"

# Enable optimized inference mode for Unsloth models (improves speed and efficiency)
FastLanguageModel.for_inference(model)
# Format the question with the structured prompt and tokenize it, returning PyTorch tensors
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generate the response using the model
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
# Decode the generated output tokens into human-readable text
response = tokenizer.batch_decode(outputs)

# Extract and print only the part of the output that follows "### Response:"
print(response[0].split("### Response:")[1])


<think>
Okay, so I'm trying to figure out what the cystometry would show for this 61-year-old woman. Let's break it down step by step.

First, the patient has a history of involuntary urine loss, especially when she coughs or sneezes. That makes me think of stress urinary incontinence. But she doesn't leak at night, so it's probably not the kind of incontinence that affects sleep, like nocturia. 

She underwent a gynecological exam and a Q-tip test. I'm not super familiar with the Q-tip test, but I think it's used to check for urethral obstruction. From what I remember, the Q-tip is a small catheter with a balloon at the tip. It's inserted into the urethra, and if the balloon doesn't fill easily, it might indicate that the urethra is blocked. If it does fill, it might suggest that the urethra is patent but perhaps not functioning properly.

So, if the Q-tip test shows that the urethra is obstructed, that might mean that the flow of urine is impaired, possibly leading to the stress inc
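
The output above stops in the middle of the `<think>` block because generation reached `max_new_tokens`. Indexing `split("### Response:")[1]` also raises an `IndexError` if the marker is missing. A slightly more defensive post-processing step could look like the sketch below; `split_reasoning` is a hypothetical helper written for illustration and is not part of Unsloth or Transformers.

```python
def split_reasoning(decoded: str, marker: str = "### Response:") -> tuple[str, str]:
    """Split decoded model text into (reasoning, answer).

    reasoning is the text between <think> and </think>; answer is what follows.
    If </think> never appears (for example, generation was cut off), answer is "".
    """
    # Keep what follows the last marker; if the marker is absent, keep everything.
    completion = decoded.split(marker)[-1]
    # Drop the opening tag that the prompt template already contains.
    completion = completion.replace("<think>", "", 1)
    reasoning, closing_tag, answer = completion.partition("</think>")
    return reasoning.strip(), (answer.strip() if closing_tag else "")


# Made-up strings, used only to exercise the helper
complete = "### Question:\nQ?\n\n### Response:\n<think>\nstep one\n</think>\nFinal answer."
truncated = "### Response:\n<think>\nstill thinking"

print(split_reasoning(complete))
print(split_reasoning(truncated))
```

Returning an empty answer for a truncated generation makes it easy to detect cases where `max_new_tokens` should be raised.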

## Fine-tuning step-by-step
The fine-tuning dataset is: https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT/viewer/en/train
We use the English version, which has 25.4k rows; the cells below load only the first 500 of them.

In [15]:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""

In [17]:
from datasets import load_dataset

# Load only the first 500 rows of the English training split
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train[0:500]", trust_remote_code=True)
dataset

Dataset({
    features: ['Question', 'Complex_CoT', 'Response'],
    num_rows: 500
})

In [18]:
dataset[1]

{'Question': 'A 45-year-old man with a history of alcohol use, who has been abstinent for the past 10 years, presents with sudden onset dysarthria, shuffling gait, and intention tremors. Given this clinical presentation and history, what is the most likely diagnosis?',
 'Complex_CoT': "Alright, let’s break this down. We have a 45-year-old man here, who suddenly starts showing some pretty specific symptoms: dysarthria, shuffling gait, and those intention tremors. This suggests something's going wrong with motor control, probably involving the cerebellum or its connections.\n\nNow, what's intriguing is that he's had a history of alcohol use, but he's been off it for the past 10 years. Alcohol can do a number on the cerebellum, leading to degeneration, and apparently, the effects can hang around or even appear long after one stops drinking.\n\nAt first glance, these symptoms look like they could be some kind of chronic degeneration, maybe something like alcoholic cerebellar degeneration, 

Formatting the dataset so it fits our training strategy

In [24]:
EOS_TOKEN = tokenizer.eos_token  # Appended to every example so the model learns where to stop generating


def formatting_prompts_func(examples):
    inputs = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
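    # Avoid shadowing the built-in input() by using descriptive loop names
    for question_text, cot, response_text in zip(inputs, cots, outputs):
        text = train_prompt_style.format(question_text, cot, response_text) + EOS_TOKEN
        texts.append(text)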
    return {
        "text": texts,
    }
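
With `batched = True`, `dataset.map` passes the function a dict of columns (each a list) and expects a dict of new columns back. The sketch below mimics that contract on a made-up two-row batch so the shape of the result is easy to see without loading the model; the shortened `template`, the `"<eos>"` stand-in, and the toy rows are all assumptions for illustration.

```python
EOS_TOKEN = "<eos>"  # stand-in; the notebook uses tokenizer.eos_token

# Shortened stand-in for train_prompt_style with the same three placeholders
template = "### Question:\n{}\n\n### Response:\n<think>\n{}\n</think>\n{}"


def format_batch(examples: dict) -> dict:
    """Take a dict of column lists and return a dict holding one new "text" column."""
    columns = zip(examples["Question"], examples["Complex_CoT"], examples["Response"])
    texts = [template.format(q, cot, resp) + EOS_TOKEN for q, cot, resp in columns]
    return {"text": texts}


# Toy batch in the same column layout as the dataset (contents are made up)
batch = {
    "Question": ["What is 2 + 2?", "Capital of France?"],
    "Complex_CoT": ["Add the numbers.", "Recall geography."],
    "Response": ["4", "Paris"],
}

out = format_batch(batch)
print(out["text"][0])
```

Each row becomes a single training string that ends with the EOS token, which is the form the `text` column takes in the next cell.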

In [25]:
# Update dataset formatting
dataset_finetune = dataset.map(formatting_prompts_func, batched = True)
dataset_finetune["text"][0]

"Below is an instruction that describes a task, paired with an input that provides further context. \nWrite a response that appropriately completes the request. \nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. \nPlease answer the following medical question. \n\n### Question:\nA 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n### Response:\n<think>\nOkay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her ab

## Setting up the model using LoRA (Low-Rank Adaptation)

In [31]:
model_lora = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
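
LoRA adds two small matrices to each targeted projection: one of shape `d_in x r` and one of shape `r x d_out`, so each module contributes `r * (d_in + d_out)` trainable parameters. The sketch below works out the total for `r=16` over the seven target modules. The layer dimensions are assumptions based on the Llama 8B architecture (hidden size 4096, MLP size 14336, 32 layers, 8 key/value heads of dimension 128) and are not read from the loaded model.

```python
R = 16                # LoRA rank, as in get_peft_model(r=16)
HIDDEN = 4096         # assumption: hidden size
INTERMEDIATE = 14336  # assumption: MLP intermediate size
N_LAYERS = 32         # assumption: number of transformer layers
KV_DIM = 8 * 128      # assumption: 8 key/value heads x head dimension 128

# (d_in, d_out) of each projection targeted by LoRA
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters of one adapter: A is (d_in x r), B is (r x d_out)."""
    return r * (d_in + d_out)


per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in shapes.values())
total = per_layer * N_LAYERS
print(f"per layer: {per_layer:,} | total: {total:,}")
```

Under these assumptions the total is 41,943,040, which matches the "Number of trainable parameters" line printed when training starts.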

In [32]:
# Initialize the fine-tuning trainer


trainer = SFTTrainer(
    model=model_lora,
    tokenizer=tokenizer,
    train_dataset=dataset_finetune,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=1,
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        num_train_epochs=1,  # overridden by max_steps below; for a full run, drop max_steps and consider warmup_ratio
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

Tokenizing train dataset:   0%|          | 0/500 [00:00<?, ? examples/s]
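
The combination `learning_rate=2e-4`, `warmup_steps=5`, `max_steps=60` and `lr_scheduler_type="linear"` describes a learning rate that ramps up for 5 steps and then decays linearly to zero. The sketch below reproduces that shape in plain Python, assuming the standard linear-warmup, linear-decay rule; `linear_lr` is a hypothetical helper and is not the trainer's own code.

```python
def linear_lr(step: int, base_lr: float = 2e-4, warmup_steps: int = 5, total_steps: int = 60) -> float:
    """Learning rate at a given optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Print the learning rate at a few steps to see the ramp and the decay
for step in (0, 1, 5, 30, 60):
    print(f"step {step:>2}: lr = {linear_lr(step):.2e}")
```

The peak is reached at step 5, right after warmup, and the rate falls back to zero at step 60.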


## Model Training

In [33]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 500 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 1 | Gradient Accumulation steps = 4
\        /    Total batch size = 4 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040


Step,Training Loss
10,1.9202
20,1.4857
30,1.4187
40,1.3766
50,1.3808
60,1.362
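
The numbers in the training banner can be cross-checked with a little arithmetic. The sketch below uses only values shown in this notebook (batch size 1, gradient accumulation 4, one GPU, 60 steps, 500 examples) to show how much of the dataset this run actually covers.

```python
per_device_batch = 1  # per_device_train_batch_size
grad_accum = 4        # gradient_accumulation_steps
num_gpus = 1          # from the training banner
max_steps = 60        # max_steps
num_examples = 500    # rows loaded with split="train[0:500]"

# Examples that contribute to a single optimizer step
effective_batch = per_device_batch * grad_accum * num_gpus
# Examples consumed over the whole run
examples_seen = effective_batch * max_steps
fraction_of_epoch = examples_seen / num_examples
# Optimizer steps a full pass over the data would take (ceiling division)
steps_for_full_epoch = -(-num_examples // effective_batch)

print(f"effective batch size: {effective_batch}")
print(f"examples seen: {examples_seen} ({fraction_of_epoch:.0%} of one epoch)")
print(f"steps for one full epoch: {steps_for_full_epoch}")
```

So the 60-step run sees 240 of the 500 examples, a little under half an epoch, and a full pass would take 125 steps at this batch size.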


In [None]:
wandb.finish()

## Model inference after fine-tuning

In [35]:
question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"


FastLanguageModel.for_inference(model_lora)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model_lora.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


<think>
Okay, let's see. This woman is 61, and she's been dealing with this issue of losing urine involuntarily for a while now. That sounds like a classic case of urinary incontinence. Now, when it comes to incontinence, there are two types that stick out in my mind: stress incontinence and urge incontinence. Stress incontinence is when you leak when you cough, sneeze, or laugh, which seems to fit her situation perfectly. That's probably what's going on here.

Now, let's think about how we figure out which type of incontinence it is. The Q-tip test is usually used to determine the type of incontinence, and it can help us understand the underlying issue. This test involves placing a Q-tip catheter into the bladder and then asking the person to cough or sneeze. If the incontinence happens when she coughs, that's a big clue that stress incontinence is likely the culprit.

Once we've got a diagnosis of stress incontinence, we need to think about what cystometry would show. Cystometry is 