### Author: Shahed Sabab

# **Supervised Fine-Tuning of DeepSeek-R1-Distill-Qwen-14B for Medical Tasks**

In this notebook, we will fine-tune the **DeepSeek-R1-Distill-Qwen-14B** model using **supervised fine-tuning (SFT)** to enhance its capability for medical reasoning and decision-making. 

To accelerate fine-tuning, we use the **Unsloth** library, which patches the model for faster, more memory-efficient training.

## **Fine-Tuning Workflow**
We will follow these key steps to set up and fine-tune the model:

1. **Install dependencies** – Set up the required libraries.  
2. **Load the base model** – Initialize DeepSeek-R1-Distill-Qwen-14B.  
3. **Prepare the medical dataset** – Format and preprocess data for SFT.  
4. **Configure PEFT (Parameter-Efficient Fine-Tuning)** – Optimize training for efficiency.  
5. **Fine-tune and save** – Train the model, monitor the training loss, and save the merged weights.  

By the end of this notebook, we will have a fine-tuned version of the model that is better suited for medical tasks, ready for further testing and deployment.


In [1]:
import torch
print(torch.version.cuda)

12.4


In [None]:
import os

import torch
# Import unsloth before trl/transformers so its patches are applied first
from unsloth import FastLanguageModel, is_bfloat16_supported
from datasets import load_dataset
from dotenv import load_dotenv
from transformers import TrainingArguments
from trl import SFTTrainer

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [None]:
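max_seq_length = 2048  # maximum number of tokens per training example
dtype = None  # None lets Unsloth pick the dtype (bfloat16 where supported)
load_in_4bit = True  # load the base weights in 4-bit to reduce GPU memory use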
MODEL = "unsloth/DeepSeek-R1-Distill-Qwen-14B"

# Load environment variables from a .env file
load_dotenv()

# Access environment variables
hf_api_key = os.getenv("HUGGINGFACE_TOKEN")

In [4]:
local_model = './model_repo/DeepSeek-R1-Distill-Qwen-14B'

In [None]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL,
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = hf_api_key
)

==((====))==  Unsloth 2025.2.15: Fast Qwen2 patching. Transformers: 4.49.0.
   \\   /|    GPU: NVIDIA GeForce RTX 3090. Max memory: 24.0 GB. Platform: Windows.
O^O/ \_/ \    Torch: 2.5.1. CUDA: 8.6. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!




Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
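
A rough back-of-the-envelope estimate suggests why `load_in_4bit = True` matters on the 24 GB RTX 3090 reported in the banner above. The sketch below assumes a nominal 14 billion parameters and counts only the weights; it ignores quantization metadata, activations, LoRA parameters, and optimizer state, so treat the numbers as order-of-magnitude figures rather than measurements.

```python
# Rough weight-memory estimate (weights only; all overheads are ignored).
NOMINAL_PARAMS = 14e9  # assumption: "14B" taken at face value
GPU_MEMORY_GB = 24.0   # from the Unsloth banner above


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


mem_16bit = weight_memory_gb(NOMINAL_PARAMS, 16)
mem_4bit = weight_memory_gb(NOMINAL_PARAMS, 4)

print(f"16-bit weights: ~{mem_16bit:.0f} GB, fits on GPU: {mem_16bit < GPU_MEMORY_GB}")
print(f" 4-bit weights: ~{mem_4bit:.0f} GB, fits on GPU: {mem_4bit < GPU_MEMORY_GB}")
```

Under these assumptions the 16-bit weights alone would exceed the card's memory, while the 4-bit weights leave headroom for activations and the LoRA adapter.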

In [None]:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>
{}
</think>
{}"""
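
Because the template wraps the chain of thought in `<think>` tags, a completion that follows the same layout can be split back into reasoning and final answer. The helper below is a minimal sketch, not part of the original notebook: `split_reasoning` is a hypothetical name, and the sample string is a made-up placeholder rather than real model output.

```python
def split_reasoning(completion: str) -> tuple[str, str]:
    """Split text shaped like '<think>reasoning</think>answer' into (reasoning, answer)."""
    open_tag, close_tag = "<think>", "</think>"
    start = completion.find(open_tag)
    end = completion.find(close_tag)
    if start == -1 or end == -1 or end < start:
        # No well-formed reasoning block: treat the whole text as the answer.
        return "", completion.strip()
    reasoning = completion[start + len(open_tag):end].strip()
    answer = completion[end + len(close_tag):].strip()
    return reasoning, answer


# Placeholder completion, for illustration only.
sample = "<think>\nStep 1. Step 2.\n</think>\nFinal answer."
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

If the tags are missing or out of order, the helper returns an empty reasoning string and the full text as the answer, which keeps downstream code from failing on malformed generations.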

In [10]:
EOS_TOKEN = tokenizer.eos_token  # appended to every example so the model learns where to stop


def formatting_prompts_func(examples):
    """Build one training string per row from the question, chain of thought, and response."""
    questions = examples["Question"]
    cots = examples["Complex_CoT"]
    responses = examples["Response"]
    texts = []
    for question, cot, response in zip(questions, cots, responses):
        text = train_prompt_style.format(question, cot, response) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }
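
To see what the formatting function produces without loading the model or the dataset, the sketch below runs the same batching logic on a single toy row. The shortened template, the `<eos>` placeholder, and the toy row are all assumptions made for illustration; the real notebook uses `train_prompt_style` and `tokenizer.eos_token`.

```python
# Toy stand-ins (assumptions): a shortened template and a placeholder EOS string.
toy_prompt_style = """### Question:
{}

### Response:
<think>
{}
</think>
{}"""
TOY_EOS = "<eos>"  # placeholder; the real value comes from tokenizer.eos_token


def toy_formatting_func(examples):
    """Same batching logic as formatting_prompts_func, using the toy template."""
    texts = [
        toy_prompt_style.format(question, cot, response) + TOY_EOS
        for question, cot, response in zip(
            examples["Question"], examples["Complex_CoT"], examples["Response"]
        )
    ]
    return {"text": texts}


# With batched=True, `datasets` passes a dict mapping column names to lists of values.
toy_batch = {
    "Question": ["What is 2 + 2?"],
    "Complex_CoT": ["Add the two numbers."],
    "Response": ["4"],
}
toy_texts = toy_formatting_func(toy_batch)["text"]
print(toy_texts[0])
```

Each row becomes a single string ending in the EOS marker, which is the shape the trainer expects in the `text` column.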

In [None]:

dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT", "en",
    split="train[0:500]", trust_remote_code=True,  # first 500 training rows only
)
dataset = dataset.map(formatting_prompts_func, batched=True)
dataset["text"][0]

"Below is an instruction that describes a task, paired with an input that provides further context. \nWrite a response that appropriately completes the request. \nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. \nPlease answer the following medical question. \n\n### Question:\nA 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n### Response:\n<think>\nOkay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her ab

In [12]:
FastLanguageModel.for_training(model)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,  
    bias="none",  
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=7,
    use_rslora=False,  
    loftq_config=None,
)

Unsloth 2025.2.15 patched 48 layers with 48 QKV layers, 48 O layers and 48 MLP layers.
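
The size of the LoRA adapter can be worked out by hand. A rank-`r` adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` trainable weights. The sketch below assumes the layer shapes of the Qwen2.5-14B architecture (hidden size 5120, MLP size 13824, 8 key/value heads of dimension 128); these values are not stated in this notebook, so check them against the model's `config.json`. The 48 layers and `r=16` come from the cell and log above.

```python
R = 16                 # LoRA rank, from get_peft_model(r=16)
NUM_LAYERS = 48        # from the Unsloth patching log
HIDDEN = 5120          # assumption: hidden_size in config.json
INTERMEDIATE = 13824   # assumption: intermediate_size in config.json
KV_DIM = 8 * 128       # assumption: 8 key/value heads x head_dim 128

# (d_in, d_out) for every targeted projection in one transformer layer.
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Each adapter holds A (r x d_in) and B (d_out x r).
per_layer = sum(R * (d_in + d_out) for d_in, d_out in shapes.values())
total = per_layer * NUM_LAYERS

print(f"LoRA parameters per layer: {per_layer:,}")  # 1,433,600
print(f"LoRA parameters in total:  {total:,}")      # 68,812,800
```

Under these assumed shapes the total agrees with the "Number of trainable parameters" line that Unsloth prints when training starts.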


In [None]:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
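        # max_steps takes precedence over num_train_epochs when both are set,
        # so training stops after 60 optimizer steps.
        num_train_epochs=1,
        warmup_ratio=0.1,
        # warmup_steps=5,
        max_steps=60,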
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

Map (num_proc=2):   0%|          | 0/500 [00:00<?, ? examples/s]

In [14]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 500 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 68,812,800


Step,Training Loss
10,1.838
20,1.3863
30,1.3191
40,1.2343
50,1.2517
60,1.2268


In [15]:
trainer_stats

TrainOutput(global_step=60, training_loss=1.3760364850362141, metrics={'train_runtime': 711.8431, 'train_samples_per_second': 0.674, 'train_steps_per_second': 0.084, 'total_flos': 3.404752748660736e+16, 'train_loss': 1.3760364850362141, 'epoch': 0.96})
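
The numbers in the training banner and in `TrainOutput` follow from the trainer settings. The sketch below redoes that arithmetic from the values shown above; it assumes the warmup length is `warmup_ratio * max_steps` rounded up, which is how the Hugging Face trainer is generally understood to derive it.

```python
import math

# Values copied from the SFTTrainer cell and the training log above.
num_examples = 500
per_device_batch = 2
grad_accum = 4
num_gpus = 1
max_steps = 60
warmup_ratio = 0.1
train_runtime_s = 711.8431

effective_batch = per_device_batch * grad_accum * num_gpus  # sequences per optimizer step
samples_seen = effective_batch * max_steps                  # sequences processed in total
epochs_completed = samples_seen / num_examples              # fraction of the dataset covered
warmup_steps = math.ceil(warmup_ratio * max_steps)          # assumption: rounded up

print(f"Effective batch size: {effective_batch}")
print(f"Samples seen:         {samples_seen} of {num_examples}")
print(f"Epochs completed:     {epochs_completed:.2f}")
print(f"Warmup steps:         {warmup_steps}")
print(f"Samples per second:   {samples_seen / train_runtime_s:.3f}")
print(f"Steps per second:     {max_steps / train_runtime_s:.3f}")
```

Because `max_steps=60` ends the run after 480 of the 500 examples, the reported epoch is 0.96 rather than 1.0; the throughput figures are consistent with the logged `train_samples_per_second` and `train_steps_per_second`.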

In [16]:
new_model_local = "sabab05/DeepSeek-R1-Medical-COT-Qwen-14B"
model.save_pretrained(new_model_local)  # saves the LoRA adapter weights locally
tokenizer.save_pretrained(new_model_local)

('sabab05/DeepSeek-R1-Medical-COT-Qwen-14B\\tokenizer_config.json',
 'sabab05/DeepSeek-R1-Medical-COT-Qwen-14B\\special_tokens_map.json',
 'sabab05/DeepSeek-R1-Medical-COT-Qwen-14B\\tokenizer.json')

In [17]:
# Merge the LoRA adapter into the base weights and save the result in 16-bit
model.save_pretrained_merged(new_model_local, tokenizer, save_method="merged_16bit")

Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 28.49 out of 63.92 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 15%|█▍        | 7/48 [00:00<00:02, 16.01it/s]
We will save to Disk and not RAM now.
100%|██████████| 48/48 [00:37<00:00,  1.28it/s]


Unsloth: Saving tokenizer... Done.
Done.
