In [8]:
!pip install unsloth 



## Import Libraries

In [9]:
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from unsloth import is_bfloat16_supported
from huggingface_hub import login
from transformers import TrainingArguments
from datasets import load_dataset
import wandb
from kaggle_secrets import UserSecretsClient 

### Create API keys and log in to Hugging Face and Weights & Biases

In [None]:
user_secrets = UserSecretsClient()

# Read the Hugging Face token from Kaggle Secrets; replace the label with your own secret's name.
REDACTED_token = user_secrets.get_secret("use your hugging face secret key")
login(REDACTED_token)

# Read the Weights & Biases key the same way.
wb_token = user_secrets.get_secret("use your weight and bias secret key")
wandb.login(key=wb_token)

run = wandb.init(
    project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical COT Dataset',
    job_type="training",
    anonymous="allow"
)



### Loading DeepSeek-R1 and the tokenizer

In [11]:
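max_seq_length = 2048  # maximum context length used for training and inference
dtype = None           # None lets Unsloth auto-detect (float16 on a T4, bfloat16 on Ampere or newer)
load_in_4bit = True    # load 4-bit quantized weights to reduce GPU memory use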


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = REDACTED_token, 
)

==((====))==  Unsloth 2025.3.9: Fast Llama patching. Transformers: 4.49.0.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


## Model inference before fine-tuning

In [12]:
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>{}"""

In [13]:
question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"


FastLanguageModel.for_inference(model) 
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])


<think>
Okay, so I've got this medical question to tackle. Let's break it down step by step. 

The patient is a 61-year-old woman with a history of involuntary urine loss during activities like coughing or sneezing, but she doesn't leak at night. She undergoes a gynecological exam and Q-tip test. I need to figure out what cystometry would show regarding her residual volume and detrusor contractions.

First, let me recall what I know about urinary incontinence, especially in women. Involuntary urine loss during coughing or sneezing is typically associated with stress urinary incontinence (SUI). SUI occurs when the urethral sphincter doesn't fully relax, often due to weakened pelvic muscles or urethral issues, leading to leakage upon increased abdominal pressure, like from coughing.

The gynecological exam likely involved a physical check for pelvic muscle tone, and the Q-tip test assesses urethral function. The Q-tip test involves inserting a catheter with a balloon (Foley catheter) to
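
The cell above fills the prompt template with the question, leaves the reasoning slot empty so the model continues after `<think>`, and then prints everything after the `### Response:` marker. A minimal sketch of just that string handling, with no model involved, is shown here. The shortened `PROMPT` template, the helper names, and the simulated decoded text are assumptions for illustration and are not part of the notebook.

```python
# Shortened stand-in for the notebook's prompt_style (hypothetical, for illustration only).
PROMPT = """### Question:
{}

### Response:
<think>{}"""


def build_prompt(question: str) -> str:
    """Fill the question slot and leave the reasoning slot empty for generation."""
    return PROMPT.format(question, "")


def extract_response(decoded: str) -> str:
    """Return everything after the '### Response:' marker, as the cell above does."""
    return decoded.split("### Response:")[1]


prompt = build_prompt("What does cystometry measure?")
# Simulate a decoded output: the prompt followed by generated text (made up here).
decoded = prompt + "\nIt measures bladder pressure during filling.\n</think>"
print(extract_response(decoded))
```

Because the decoded output contains the prompt itself, the split is what keeps the instruction text out of the printed answer.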

### Loading and processing the dataset

In [14]:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>
{}
</think>
{}"""

In [15]:
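EOS_TOKEN = tokenizer.eos_token  # Appended to every example so the model learns where a response ends


def formatting_prompts_func(examples):
    """Fill the training prompt with each question, chain of thought, and response."""
    questions = examples["Question"]
    cots = examples["Complex_CoT"]
    responses = examples["Response"]
    texts = []
    # Avoid naming a loop variable `input`, which would shadow the Python built-in.
    for question, cot, response in zip(questions, cots, responses):
        text = train_prompt_style.format(question, cot, response) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }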
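
To see what a batched formatting function receives and returns, the sketch below runs the same pattern on a toy batch without loading a tokenizer or dataset. The `EOS` string, the shortened `TEMPLATE`, and the toy rows are placeholders assumed for illustration; in the notebook the real values come from `tokenizer.eos_token`, `train_prompt_style`, and the dataset columns.

```python
# Stand-ins for the notebook objects (assumptions for illustration only).
EOS = "<eos>"  # placeholder; the real value comes from tokenizer.eos_token
TEMPLATE = "### Question:\n{}\n\n### Response:\n<think>\n{}\n</think>\n{}"


def format_batch(examples: dict) -> dict:
    """Mimic a batched map function: lists of column values in, a new 'text' column out."""
    texts = [
        TEMPLATE.format(question, cot, response) + EOS
        for question, cot, response in zip(
            examples["Question"], examples["Complex_CoT"], examples["Response"]
        )
    ]
    return {"text": texts}


toy_batch = {
    "Question": ["Q1?", "Q2?"],
    "Complex_CoT": ["reasoning 1", "reasoning 2"],
    "Response": ["answer 1", "answer 2"],
}
formatted = format_batch(toy_batch)
print(formatted["text"][0])
```

With `batched=True`, each column arrives as a list, so the function must return a list of the same length for the new `text` column.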

In [16]:
dataset = load_dataset(
    "FreedomIntelligence/medical-o1-reasoning-SFT",
    "en",
    split="train[0:500]",  # first 500 training examples only
    trust_remote_code=True,
)
dataset = dataset.map(formatting_prompts_func, batched=True)
dataset["text"][0]

Generating train split:   0%|          | 0/25371 [00:00<?, ? examples/s]

Map:   0%|          | 0/500 [00:00<?, ? examples/s]

"Below is an instruction that describes a task, paired with an input that provides further context. \nWrite a response that appropriately completes the request. \nBefore answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.\n\n### Instruction:\nYou are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. \nPlease answer the following medical question. \n\n### Question:\nA 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?\n\n### Response:\n<think>\nOkay, let's think about this step by step. There's a 61-year-old woman here who's been dealing with involuntary urine leakages whenever she's doing something that ups her ab

### Setting up the model

In [17]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,  
    bias="none",  
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,  
    loftq_config=None,
)

Unsloth 2025.3.9 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
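
The number of weights that LoRA adds can be checked by hand. Each adapted linear layer gets two small matrices, so a layer with `in` inputs and `out` outputs gains `r * (in + out)` trainable weights. The sketch below does this arithmetic for `r=16` and the seven target modules listed above. The layer shapes are the commonly published Llama-3.1-8B dimensions and are an assumption here, since the notebook does not state them; the resulting total agrees with the `Trainable parameters = 41,943,040` figure in the training banner.

```python
# Assumed Llama-3.1-8B layer shapes (not stated in the notebook).
HIDDEN = 4096          # model width
KV_DIM = 1024          # 8 key/value heads x 128 dims (grouped-query attention)
INTERMEDIATE = 14336   # MLP width
NUM_LAYERS = 32
RANK = 16              # r in get_peft_model

# (in_features, out_features) of each targeted projection.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A is (rank x in) and B is (out x rank), so the pair adds rank * (in + out) weights."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} total")
```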


In [18]:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)

Tokenizing to ["text"] (num_proc=2):   0%|          | 0/500 [00:00<?, ? examples/s]

## Model training

In [19]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 500 | Num Epochs = 2 | Total steps = 60
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 4 x 1) = 16
 "-____-"     Trainable parameters = 41,943,040/4,670,623,744 (0.90% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
10,1.9142
20,1.421
30,1.366
40,1.3285
50,1.2951
60,1.3269
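
The figures in the training banner can be related with a little arithmetic. The sketch below uses only the values the banner reports (500 examples, batch size 4 per device, 4 gradient accumulation steps, 1 GPU, 60 steps). That the reported `Num Epochs = 2` comes from rounding the fractional number of passes upward is an assumption, not something the log states.

```python
import math

# Values copied from the training banner above.
num_examples = 500
batch_per_device = 4
grad_accum_steps = 4
num_gpus = 1
max_steps = 60

# One optimizer step consumes this many examples.
effective_batch = batch_per_device * grad_accum_steps * num_gpus
# Total examples processed over the whole run.
examples_seen = effective_batch * max_steps
# Fractional number of passes over the 500-example subset.
epochs = examples_seen / num_examples

print(f"effective batch size: {effective_batch}")
print(f"examples seen: {examples_seen}")
print(f"passes over the data: {epochs:.2f} (rounded up: {math.ceil(epochs)})")
```

So a 60-step run covers the 500-example subset a little under twice, which is worth keeping in mind when reading the loss values.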


## Model inference after fine-tuning

In [20]:
question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"


FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])



<think>
Okay, let's see. We have a 61-year-old woman with a history of involuntary urine loss, especially during things like coughing or sneezing. That sounds like a classic presentation of urinary incontinence, probably stress incontinence. Now, stress incontinence usually means the bladder neck isn't closing properly, and when the abdominal muscles contract, the bladder doesn't hold up.

Now, during a gynecological exam, they would check the cervix and bladder neck. The Q-tip test is commonly used in stress incontinence. If the Q-tip is positive, that means the bladder neck is not closing properly when the abdominal muscles contract. This makes sense because stress incontinence is all about the bladder neck not sealing.

With stress incontinence, I'm expecting that when we do cystometry, the residual volume might not be too bad. But we need to check what happens during a cough or sneeze. The detrusor contractions, which are like the muscle contractions in the bladder, are usually no
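
Since the fine-tuning template places the chain of thought between `<think>` and `</think>` and the final answer after it, the generated text can be split into those two parts for display or evaluation. The helper below is a hypothetical sketch, not part of the notebook; the `<eos>` default is a placeholder for `tokenizer.eos_token`, and the sample string is made up.

```python
def split_reasoning(text: str, eos_token: str = "<eos>") -> tuple:
    """Split generated text into (reasoning, answer) around the </think> tag.

    eos_token is a placeholder; pass tokenizer.eos_token in the notebook.
    If </think> is missing (for example, generation was cut off), the answer is empty.
    """
    text = text.replace(eos_token, "")
    reasoning, sep, answer = text.partition("</think>")
    reasoning = reasoning.replace("<think>", "").strip()
    if not sep:
        return reasoning, ""
    return reasoning, answer.strip()


sample = "\n<think>\nStress incontinence is likely.\n</think>\nNormal residual volume.<eos>"
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

Handling the missing `</think>` case matters here because `max_new_tokens` can end generation before the model finishes its reasoning.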

In [21]:
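new_model_local = "DeepseekR1_Health"
# Save the LoRA adapter weights and the tokenizer.
model.save_pretrained(new_model_local)
tokenizer.save_pretrained(new_model_local)

# Merge the LoRA weights into the base model and save the result in 16-bit precision.
model.save_pretrained_merged(new_model_local, tokenizer, save_method="merged_16bit")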

Unsloth: You have 2 CPUs. Using `safe_serialization` is 10x slower.
We shall switch to Pytorch saving, which might take 3 minutes and not 30 minutes.
To force `safe_serialization`, set it to `None` instead.
Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 6.0G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 16.38 out of 31.35 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 34%|███▍      | 11/32 [00:00<00:01, 13.17it/s]
We will save to Disk and not RAM now.
100%|██████████| 32/32 [00:27<00:00,  1.16it/s]


Unsloth: Saving tokenizer... Done.
Unsloth: Saving DeepseekR1_Health/pytorch_model-00001-of-00004.bin...
Unsloth: Saving DeepseekR1_Health/pytorch_model-00002-of-00004.bin...
Unsloth: Saving DeepseekR1_Health/pytorch_model-00003-of-00004.bin...
Unsloth: Saving DeepseekR1_Health/pytorch_model-00004-of-00004.bin...
Done.


In [22]:
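new_model_online = "navneetnt01/DeepSeek_R1_Medical"
# Never hard-code an access token in a notebook; read it from Kaggle Secrets instead.
# The token needs write access to the target namespace.
hub_token = user_secrets.get_secret("use your hugging face secret key")

model.push_to_hub(new_model_online, token=hub_token)
tokenizer.push_to_hub(new_model_online, token=hub_token)

# Pass the same token to the merged upload as well.
model.push_to_hub_merged(new_model_online, tokenizer, save_method="merged_16bit", token=hub_token)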

Saved model to https://huggingface.co/navneetnt01/DeepSeek_R1_Medical


Unsloth: You are pushing to hub in Kaggle environment.
To save memory, we shall move navneetnt01/DeepSeek_R1_Medical to /tmp/DeepSeek_R1_Medical
Unsloth: Will remove a cached repo with size 1.6K


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 16.38 out of 31.35 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 32/32 [00:25<00:00,  1.25it/s]


Unsloth: Saving tokenizer...

HfHubHTTPError: (Request ID: Root=1-67d1c786-26a2a34b29cda4340fb98691;f1628bda-d0fd-4666-904a-1a8819b7ff64)

403 Forbidden: You don't have the rights to create a model under the namespace "navneetnt01".
Cannot access content at: https://huggingface.co/api/repos/create.
Make sure your token has the correct permissions.
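
The 403 response says the token used for the request is not allowed to create a repository under the `navneetnt01` namespace. Two cheap checks before pushing are to load the token from configuration rather than from the source code, and to confirm which namespace the repo id targets. The sketch below shows both using only the standard library. The helper names and the `DEMO_HF_TOKEN` variable are hypothetical; outside Kaggle the conventional variable is `HF_TOKEN`, while on Kaggle the token would come from `UserSecretsClient`.

```python
import os


def get_hub_token(env_var: str = "HF_TOKEN") -> str:
    """Read the access token from the environment instead of hard-coding it."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to a token with write access before pushing.")
    return token


def repo_namespace(repo_id: str) -> str:
    """Return the user or organization part of a 'namespace/name' repo id."""
    namespace, sep, name = repo_id.partition("/")
    if not sep or not namespace or not name:
        raise ValueError(f"Expected 'namespace/name', got {repo_id!r}")
    return namespace


# Dummy value for the demonstration only; this is not a real token.
os.environ["DEMO_HF_TOKEN"] = "hf_dummy_for_illustration"
token = get_hub_token("DEMO_HF_TOKEN")
print(repo_namespace("navneetnt01/DeepSeek_R1_Medical"))
```

The namespace returned here has to be the account or an organization that the token is allowed to write to; otherwise repository creation is rejected as in the log above.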