In [1]:
%%capture
%pip install unsloth

In [2]:
import torch
print(torch.version.cuda)
print(torch.cuda.is_available())  

12.4
True


#### Option 1: Using transformers to Download and Save Locally

In [6]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Specify the model name
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# Local directory where the model and tokenizer will be saved
local_model_path = "./deepseek_r1_model"

# Load the tokenizer and model (downloaded from the Hugging Face Hub on first use)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Save the model and tokenizer to the local path
tokenizer.save_pretrained(local_model_path)
model.save_pretrained(local_model_path)

print(f"Model and tokenizer saved to {local_model_path}")

Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.


Model and tokenizer saved to ./deepseek_r1_model


#### Option 2: Using huggingface_hub to Download

In [15]:
from huggingface_hub import snapshot_download

# Specify the model name
model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# Download the model to a local directory
local_model_path = "./deepseek_r1_model"
snapshot_download(repo_id=model_name, local_dir=local_model_path)

print(f"Model downloaded to {local_model_path}")

Fetching 9 files: 100%|██████████| 9/9 [00:05<00:00,  1.73it/s]

Model downloaded to ./deepseek_r1_model
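
Either option should leave the model files under `./deepseek_r1_model`. As a quick sanity check before loading, a minimal sketch like the one below lists what is actually on disk. The helper name `list_model_files` is hypothetical and uses only the standard library; the file names and sizes you see depend on the repository contents.

```python
from pathlib import Path


def list_model_files(model_dir):
    """Return (relative_path, size_in_bytes) for every file under model_dir."""
    root = Path(model_dir)
    if not root.is_dir():
        return []  # Nothing has been downloaded to this path yet
    return sorted(
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


# Print each file with its size in megabytes
for name, size in list_model_files("./deepseek_r1_model"):
    print(f"{size / 1e6:10.1f} MB  {name}")
```

An empty listing means the download step has not populated the directory.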





In [1]:
from unsloth import FastLanguageModel

# Specify the local path to the model
local_model_path = "./deepseek_r1_model"

# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=local_model_path,  # Use the local path
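    max_seq_length=2048,  # Maximum sequence length the model will handle
    dtype=None,  # Auto-detect; without bfloat16 support this falls back to float16
    load_in_4bit=True,  # Load 4-bit quantized weights to reduce GPU memory use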
)

print("Model and tokenizer loaded successfully!")

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


  from .autonotebook import tqdm as notebook_tqdm


🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.2.15: Fast Qwen2 patching. Transformers: 4.49.0.
   \\   /|    GPU: NVIDIA GeForce GTX 1650. Max memory: 3.806 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Sliding Window Attention is enabled but not implemented for `eager`; unexpected results may be encountered.


./deepseek_r1_model does not have a padding token! Will use pad_token = <|vision_pad|>.
Model and tokenizer loaded successfully!
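
The banner above reports about 3.8 GB of GPU memory, which is why `load_in_4bit=True` matters here. As a rough back-of-the-envelope sketch, the weight footprint at different precisions can be estimated as follows. It assumes a nominal 1.5 billion parameters (taken from the model name) and ignores activations, the KV cache, and quantization overhead. `weight_memory_gib` is a hypothetical helper, not part of Unsloth.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


NOMINAL_PARAMS = 1.5e9  # Assumption: read off the "1.5B" in the model name

for label, bits in [("float16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: ~{weight_memory_gib(NOMINAL_PARAMS, bits):.2f} GiB")
```

Under these assumptions the float16 weights alone come to roughly 2.8 GiB, which leaves little headroom on this card, while 4-bit weights need roughly 0.7 GiB.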


#### Setting up the model: we add LoRA (Low-Rank Adaptation) adapters to the linear modules of the base model.

In [4]:
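model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank
    target_modules=[  # Attention and MLP projection layers to adapt
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,  # LoRA scaling factor
    lora_dropout=0,  # No dropout on the adapter weights
    bias="none",  # Do not train bias terms
    use_gradient_checkpointing="unsloth",  # Unsloth's checkpointing, to save memory
    random_state=3407,  # Seed for reproducibility
    use_rslora=False,  # Standard LoRA scaling rather than rank-stabilized LoRA
    loftq_config=None,  # No LoftQ initialization
)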

Unsloth 2025.2.15 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
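
To get a feel for how small the adapter is, the number of trainable LoRA parameters can be worked out by hand: a rank-`r` adapter on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` weights. The sketch below applies that formula to the seven target modules. The layer dimensions are assumptions based on the published Qwen2 1.5B configuration (hidden size 1536, MLP size 8960, 2 key/value heads of dimension 128); check `model.config` for the actual values.

```python
def lora_params(d_in, d_out, r):
    """Weights added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Assumed dimensions (not read from the model; verify against model.config)
HIDDEN = 1536        # hidden_size
INTERMEDIATE = 8960  # intermediate_size of the MLP
KV_DIM = 2 * 128     # num_key_value_heads * head_dim
NUM_LAYERS = 28      # matches "patched 28 layers" in the log above
R = 16               # same rank as passed to get_peft_model

# (d_in, d_out) for each targeted linear module in one transformer layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in shapes.values())
total = per_layer * NUM_LAYERS
print(f"Trainable LoRA parameters: {total:,}")
```

With these assumed dimensions the total is about 18.5 million, on the order of 1% of the base model's weights.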


In [5]:
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""
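
The template has two `{}` placeholders: the first takes the question, and the second takes the text that follows `<think>` (left empty at inference time, so the model writes its own reasoning). A minimal sketch with a shortened, hypothetical template and a made-up question shows how `str.format` fills the slots and how the text after `### Response:` can be recovered afterwards:

```python
# Shortened stand-in for prompt_style; the placeholder layout is the same
demo_style = """### Question:
{}

### Response:
<think>{}"""

# First slot: the question. Second slot: empty, so the model continues from <think>
demo_prompt = demo_style.format("What does a Q-tip test assess?", "")
print(demo_prompt)

# Everything after the marker is the part the model is expected to complete
response_part = demo_prompt.split("### Response:")[1]
print(repr(response_part))
```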

In [6]:
question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"


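# Switch the model to Unsloth's inference mode
FastLanguageModel.for_inference(model)

# Fill the prompt template (leaving the <think> slot empty) and tokenize on the GPU
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

# Decode and print only the text after the "### Response:" marker
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])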


<think>
Okay, so I'm trying to figure out what cystometry would reveal in this scenario. Let's break it down step by step. The patient is a 61-year-old woman with a long history of involuntary urine loss during activities but no leakage at night. She's undergoing a gynecological exam and a Q-tip test. 

First, I remember that cystometry is a method to measure residual volume in the lungs. It involves placing a Q-tip under the skin and observing how much air comes out with each breath. Depending on the amount of air expelled, it can indicate different lung conditions.

Now, the patient's history of involuntary urine loss during activities but not at night makes me think about the underlying cause. Involuntary urine loss can often be due to conditions like acute respiratory failure or hypoventilation. Since she's not losing urine at night, maybe it's related to her breathing pattern during the activities.

The Q-tip test is used to assess lung function. If the Q-tip doesn't expel any ai