# This project fine-tunes a language model for a domain-specific task: medical reasoning and clinical decision-making.

In [None]:
# Disable TorchDynamo, which can cause "PassManager::run failed" errors
import os
os.environ["TORCH_COMPILE_DEBUG"] = "0"
os.environ["TORCHDYNAMO_DISABLE"] = "1"

# Install dependencies
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
!pip install torch==2.1.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
!pip install trl huggingface_hub transformers datasets wandb accelerate bitsandbytes

Collecting git+https://github.com/unslothai/unsloth.git
  Cloning https://github.com/unslothai/unsloth.git to /tmp/pip-req-build-5l5jkj9v
  Running command git clone --filter=blob:none --quiet https://github.com/unslothai/unsloth.git /tmp/pip-req-build-5l5jkj9v
  Resolved https://github.com/unslothai/unsloth.git to commit 3340eaa41ce58619daf79d2783e49a45f8553a61
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: unsloth
  Building wheel for unsloth (pyproject.toml) ... done
  Created wheel for unsloth: filename=unsloth-2025.5.10-py3-none-any.whl size=276844 sha256=8857a7c0794a92d9c7a5ef70b1fd497d2717b5209cf0ab13c3de5d22566435e0
  Stored in directory: /tmp/pip-ephem-wheel-cache-f377ipej/wheels/d1/17/05/850ab10c33284a4763b0595cd8ea9d01fce6e221cac24b3c01
Successfully built unsloth
Installing collected packages: unsloth


In [None]:
from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from unsloth import is_bfloat16_supported
from huggingface_hub import login
from transformers import TrainingArguments
from datasets import load_dataset
import wandb
from google.colab import userdata

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [None]:
hf_token = userdata.get('HF_TOKEN')
login(token=hf_token)

In [None]:
import torch
print("CUDA available:", torch.cuda.is_available())
print("GPU device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "No GPU")

CUDA available: True
GPU device: Tesla T4


In [None]:
model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
max_seq_length = 1024    # Reduced for T4 compatibility
dtype = None  # Auto-detect dtype
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    token=hf_token,
    trust_remote_code=False
)

==((====))==  Unsloth 2025.5.10: Fast Llama patching. Transformers: 4.52.4.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.30. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
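
The banner above reports about 14.7 GB of usable memory on the T4, which helps explain why `load_in_4bit = True` is set. The following is a rough back-of-the-envelope sketch, assuming a round figure of 8 billion parameters and counting only the raw weights (no quantization constants, activations, optimizer state, or CUDA overhead), so real usage will be higher than these numbers.

```python
# Rough weight-only memory estimate for an ~8B-parameter model.
# Assumption: exactly 8.0e9 parameters; real checkpoints differ slightly.
N_PARAMS = 8_000_000_000
GIB = 1024 ** 3


def weight_gib(n_params: int, bits_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / GIB


fp16_gib = weight_gib(N_PARAMS, 16)  # half precision
int4_gib = weight_gib(N_PARAMS, 4)   # 4-bit quantized weights

print(f"fp16 weights : {fp16_gib:.2f} GiB")
print(f"4-bit weights: {int4_gib:.2f} GiB")
```

Under these assumptions the half-precision weights alone would not fit in the memory the banner reports, while the 4-bit weights leave room for the LoRA adapters and activations.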


In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

if hasattr(model, '_unwrapped_old_generate'):
    del model._unwrapped_old_generate

Unsloth 2025.5.10 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
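
To see where the adapter size comes from, the sketch below counts LoRA parameters for the seven target modules at `r=16`. Each adapted linear layer adds two low-rank matrices with `r * (in_features + out_features)` parameters in total. The layer dimensions are assumptions based on the Llama-3.1-8B architecture that this distilled model is built on; they are not printed anywhere in this notebook.

```python
# Assumed Llama-3.1-8B-style dimensions (not taken from this notebook's output).
HIDDEN = 4096          # model hidden size
INTERMEDIATE = 14336   # MLP intermediate size
KV_DIM = 1024          # 8 key/value heads x 128 head dimension
N_LAYERS = 32          # matches "patched 32 layers" in the output above
RANK = 16              # r=16 from get_peft_model

# (in_features, out_features) for each targeted projection
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in LoRA's A (rank x in) and B (out x rank) matrices."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in SHAPES.values())
total = per_layer * N_LAYERS

print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total : {total:,}")
```

With these assumed dimensions the total agrees with the `Trainable parameters = 41,943,040` figure that the training banner reports later in the notebook.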


In [None]:
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train[:500]", trust_remote_code=False)

EOS_TOKEN = tokenizer.eos_token

prompt_template = """
Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.

Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:

{}

{}
"""

def preprocess(examples):
    questions = examples["Question"]
    cots = examples["Complex_CoT"]
    responses = examples["Response"]

    texts = [
        prompt_template.format(q, cot, resp) + EOS_TOKEN
        for q, cot, resp in zip(questions, cots, responses)
    ]
    return {"text": texts}

dataset = dataset.map(preprocess, batched=True, num_proc=1)
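
The batched formatting step can be checked without loading the model or dataset. This is a minimal sketch that uses a shortened stand-in template, a placeholder EOS string, and made-up rows (all hypothetical, chosen only for illustration). It also shows one possible way to derive an inference prompt from the same template, so that training and generation share identical section headers.

```python
EOS_TOKEN = "<eos>"  # placeholder; the notebook uses tokenizer.eos_token

# Shortened stand-in for prompt_template, with the same three slots.
TEMPLATE = """### Question:
{}

### Response:
{}

{}"""


def preprocess(examples):
    """Format a batch (dict of column lists) into training texts."""
    texts = [
        TEMPLATE.format(q, cot, resp) + EOS_TOKEN
        for q, cot, resp in zip(
            examples["Question"], examples["Complex_CoT"], examples["Response"]
        )
    ]
    return {"text": texts}


def inference_prompt(question: str) -> str:
    """Keep everything up to the response header and fill in the question."""
    prefix = TEMPLATE.split("### Response:")[0] + "### Response:\n"
    return prefix.format(question)


# Hypothetical toy batch in the same column layout as the dataset.
toy_batch = {
    "Question": ["Q1?", "Q2?"],
    "Complex_CoT": ["reasoning 1", "reasoning 2"],
    "Response": ["answer 1", "answer 2"],
}

out = preprocess(toy_batch)
print(out["text"][0])
print(out["text"][0].startswith(inference_prompt("Q1?")))
```

Because every training text starts with the corresponding inference prompt, the model is asked to continue exactly the prefix it saw during fine-tuning.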

In [None]:
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=1,

    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        save_strategy="no",
        report_to="wandb",
        #use_cache=False,  # Important fix
    ),
)
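
With `warmup_steps=5`, `max_steps=60`, and `lr_scheduler_type="linear"`, the learning rate ramps up over the first five steps and then decays linearly to zero. The function below is a sketch of that shape in plain Python, modeled on how a linear schedule with warmup is commonly defined; the values the trainer actually logs may differ by one step depending on how steps are counted.

```python
def linear_lr(step: int, base_lr: float = 2e-4,
              warmup_steps: int = 5, total_steps: int = 60) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 5, 10, 30, 60):
    print(f"step {step:>2}: lr = {linear_lr(step):.2e}")
```

The peak of `2e-4` is reached at the end of warmup, so most of this short run trains at a rate below the configured `learning_rate`.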

In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 500 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 8 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040/8,000,000,000 (0.52% trained)
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Currently logged in as: tashfeenmehmood510 (tashfeenmehmood510-no) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
10,1.9543
20,1.4417
30,1.4104
40,1.3691
50,1.3998
60,1.3783
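
The numbers in the banner and the loss table can be tied together with a little arithmetic. This sketch uses only values shown in the output above and works out the effective batch size, how much of the 500-example subset the 60 steps cover, and how far the logged loss fell.

```python
# Values copied from the training configuration and log above.
per_device_batch = 1
grad_accum = 8
n_gpus = 1
max_steps = 60
n_examples = 500

effective_batch = per_device_batch * grad_accum * n_gpus
examples_seen = effective_batch * max_steps
epochs_covered = examples_seen / n_examples

# Logged training loss by step.
losses = {10: 1.9543, 20: 1.4417, 30: 1.4104, 40: 1.3691, 50: 1.3998, 60: 1.3783}
first, last = losses[10], losses[60]
relative_drop = (first - last) / first
best_step = min(losses, key=losses.get)

print(f"Effective batch size : {effective_batch}")
print(f"Examples seen        : {examples_seen} of {n_examples}")
print(f"Epochs covered       : {epochs_covered:.2f}")
print(f"Loss drop, step 10-60: {relative_drop:.1%}")
print(f"Lowest logged loss at step {best_step}")
```

The run stops slightly short of one full pass over the subset, and most of the loss reduction happens in the first 20 steps, after which the logged values level off.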


In [None]:
output_dir = "/content/model"
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
!zip -r /content/model.zip /content/model
print(f"Model saved to {output_dir}.zip")

updating: content/model/ (stored 0%)
updating: content/model/chat_template.jinja (deflated 75%)
updating: content/model/tokenizer_config.json (deflated 96%)
updating: content/model/tokenizer.json (deflated 85%)
updating: content/model/README.md (deflated 66%)
updating: content/model/special_tokens_map.json (deflated 69%)
updating: content/model/adapter_config.json (deflated 56%)
updating: content/model/adapter_model.safetensors (deflated 7%)
Model saved to /content/model.zip


In [None]:
FastLanguageModel.for_inference(model)

question = """A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing
              but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings,
              what would cystometry most likely reveal about her residual volume and detrusor contractions?"""

prompt = f"""
Below is a task description along with additional context provided in the input section. Your goal is to provide a well-reasoned response that effectively addresses the request.

Before crafting your answer, take a moment to carefully analyze the question. Develop a clear, step-by-step thought process to ensure your response is both logical and accurate.

### Task:
You are a medical expert specializing in clinical reasoning, diagnostics, and treatment planning. Answer the medical question below using your advanced knowledge.

### Query:
{question}

### Answer:
"""

inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=512,
    use_cache=False,
    pad_token_id=tokenizer.eos_token_id
)

response = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
print(response.split("### Answer:")[-1].strip())

The following generation flags are not valid and may be ignored: ['cache_implementation']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


Alright, let's think about this. This woman has been dealing with involuntary urine loss for a while, and that's pretty concerning. It sounds like she might be experiencing urinary incontinence, which is definitely something that needs to be checked out. Now, she's had a gynecological exam and a Q-tip test. These tests are pretty standard for women who suspect they have issues with their bladder.

The Q-tip test is usually used to assess the strength of the urethral muscles. It's kind of like taking a muscle test in a way. If the urethral muscles are weak, it might mean that she's experiencing some sort of urinary leakage because those muscles aren't working well to hold things in.

Now, let's think about cystometry. This test is crucial for checking out how well her bladder works. It's like a pressure test where they fill up her bladder and see how much she can hold before it leaks out. That's pretty cool because it gives a clear picture of how her bladder behaves.

If her bladder's c