In [1]:
!pip uninstall torch -y
!pip install torch --upgrade

Found existing installation: torch 2.6.0
Uninstalling torch-2.6.0:
  Successfully uninstalled torch-2.6.0
Collecting torch
  Using cached torch-2.6.0-cp310-cp310-manylinux1_x86_64.whl.metadata (28 kB)
Using cached torch-2.6.0-cp310-cp310-manylinux1_x86_64.whl (766.7 MB)
Installing collected packages: torch
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fastai 2.7.18 requires torch<2.6,>=1.10, but you have torch 2.6.0 which is incompatible.
torchaudio 2.5.1+cu121 requires torch==2.5.1, but you have torch 2.6.0 which is incompatible.
Successfully installed torch-2.6.0
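pip reports the conflicts but still installs torch 2.6.0, leaving fastai and torchaudio pinned to incompatible versions. To confirm what actually ended up in the environment, here is a small sketch using the standard-library `importlib.metadata` (the helper name `installed_versions` is hypothetical):

```python
from importlib import metadata

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# The packages named in the resolver warning above
print(installed_versions(["torch", "torchaudio", "fastai"]))
```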


In [2]:
%%capture
# Install Unsloth from PyPI, then force-reinstall the latest version from GitHub (without touching its dependencies)
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

In [3]:
# Modules for fine-tuning
from unsloth import FastLanguageModel
import torch  # PyTorch
from trl import SFTTrainer  # Trainer for supervised fine-tuning (SFT)
from unsloth import is_bfloat16_supported  # Checks whether the GPU supports bfloat16 precision
# Hugging Face modules
from huggingface_hub import login  # Logs in to the Hugging Face Hub
from transformers import TrainingArguments  # Defines training hyperparameters
from datasets import load_dataset  # Loads fine-tuning datasets
# Weights & Biases for experiment tracking
import wandb
# Kaggle Secrets client for reading API tokens
from kaggle_secrets import UserSecretsClient

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [4]:
# Read the Hugging Face and Weights & Biases (W&B) tokens from Kaggle Secrets
user_secrets = UserSecretsClient()  # from kaggle_secrets import UserSecretsClient
hugging_face_token = user_secrets.get_secret("HF_TOKEN")
wnb_token = user_secrets.get_secret("wnb")

# Log in to Hugging Face
login(hugging_face_token)  # from huggingface_hub import login

# Log in to W&B and start a run for this training job
wandb.login(key=wnb_token)  # import wandb
run = wandb.init(
    project='Fine-tune DeepSeek R1 Distill Llama 8B on Medical Dataset', 
    job_type="training", 
    anonymous="allow"
)

[34m[1mwandb[0m: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
[34m[1mwandb[0m: Currently logged in as: [33mkarthik-nimmagadda[0m ([33mkarthik-nimmagadda-san-jose-state-university[0m). Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


In [5]:
# Set parameters
max_seq_length = 2048  # Maximum sequence length in tokens that the model processes at once
dtype = None  # None lets Unsloth pick float16 or bfloat16 based on the GPU
load_in_4bit = True  # Enable 4-bit quantization to reduce memory use

# Load the DeepSeek R1 distilled model and tokenizer with Unsloth (from unsloth import FastLanguageModel)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",  # DeepSeek-R1 distilled into Llama (8B parameters)
    max_seq_length=max_seq_length,  # Process up to 2048 tokens at once
    dtype=dtype,  # Default data type (FP16 or BF16 depending on hardware support)
    load_in_4bit=load_in_4bit,  # Load the weights in 4-bit to save memory
    token=hugging_face_token,  # Hugging Face access token
)

==((====))==  Unsloth 2025.3.19: Fast Llama patching. Transformers: 4.50.3.
   \\   /|    Tesla P100-PCIE-16GB. Num GPUs = 1. Max memory: 15.888 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 6.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


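Why 4-bit loading matters on this 16 GB P100: the memory needed just to hold the weights is roughly parameters × bits / 8 bytes. A back-of-the-envelope sketch, using the 8,000,000,000 parameter count that Unsloth reports later (activations, KV cache, and optimizer state are ignored):

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory needed to hold the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 8_000_000_000  # rounded count reported by Unsloth for this model

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

At 16 bits the weights alone (~16.0 GB) would exceed the card's 15.9 GB of usable memory, while 4-bit (~4.0 GB) leaves room for LoRA training. The 5.96 GB checkpoint download is larger than the 4 GB estimate, likely because the embedding and output layers stay in 16-bit and the quantization constants add overhead.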

In [6]:
# Define prompt styles
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>{}"""

In [7]:
# Training prompt style updated to include both <think> and </think> tags:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""
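The two templates differ only in what follows `### Response:`. Training text fills all three `{}` slots positionally with `str.format`, whereas the inference prompt fills the second slot with an empty string, so it ends at `<think>` and the model continues from there. A shortened sketch (these templates keep only the tail of the originals, and the sample question and answer are made up):

```python
# Tails of train_prompt_style and prompt_style (the instruction header is omitted)
TRAIN_TAIL = """### Question:
{}

### Response:
<think>
{}
</think>
{}"""

INFER_TAIL = """### Question:
{}

### Response:
<think>{}"""

train_text = TRAIN_TAIL.format(
    "Why does my knee hurt?",
    "Consider overuse, injury, or arthritis.",
    "See a doctor if it persists.",
)
infer_text = INFER_TAIL.format("Why does my knee hurt?", "")  # empty reasoning slot

print(train_text)
print("---")
print(infer_text)
```

The training text has a newline right after `<think>`, while the inference prompt stops immediately after the tag, so at inference time the model is expected to emit that newline itself.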

In [8]:
# Load the dataset from Kaggle (assumes the dataset has fields: "context", "input", "output")
dataset = load_dataset("/kaggle/input/my-datasets/", split="train", trust_remote_code=True)

# Split into training (80%) and temporary (20%) sets
split_dataset = dataset.train_test_split(test_size=0.2, seed=42)
train_dataset = split_dataset["train"]
temp_dataset = split_dataset["test"]

# Split the temporary dataset equally into evaluation (10%) and test (10%)
temp_split = temp_dataset.train_test_split(test_size=0.5, seed=42)
eval_dataset = temp_split["train"]
test_dataset = temp_split["test"]

print("Training samples:", len(train_dataset))
print("Evaluation samples:", len(eval_dataset))
print("Test samples:", len(test_dataset))

Training samples: 11330
Evaluation samples: 1416
Test samples: 1417
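The uneven 1,416 / 1,417 split comes from how `train_test_split` rounds: with a float `test_size`, the test split gets `ceil(test_size * n)` rows and the train split gets the remainder (the same convention as scikit-learn). Assuming the full dataset has 11,330 + 1,416 + 1,417 = 14,163 rows, the printed sizes can be reproduced with plain arithmetic (the helper `split_sizes` is hypothetical):

```python
import math

def split_sizes(n, test_size):
    """Train/test sizes for a float test_size: test rounds up, train takes the rest."""
    n_test = math.ceil(test_size * n)
    return n - n_test, n_test

total = 11_330 + 1_416 + 1_417              # 14,163 rows in the full dataset
n_train, n_temp = split_sizes(total, 0.2)   # 80/20 split
n_eval, n_test = split_sizes(n_temp, 0.5)   # split the 20% in half
print(n_train, n_eval, n_test)              # 11330 1416 1417
```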


In [9]:
# 2. Format the Datasets for Fine-tuning
# ------------------------------
EOS_TOKEN = tokenizer.eos_token  # End-of-sequence token

def formatting_prompts_func(examples):
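    inputs_list = examples["context"]  # Medical question (a short title in this dataset)
    cots_list = examples["input"]      # Text placed in the <think> block (here, the patient's full message rather than reasoning)
    outputs_list = examples["output"]  # Final answer (the doctor's reply)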
    texts = []
    for inp, cot, out in zip(inputs_list, cots_list, outputs_list):
        text = train_prompt_style.format(inp, cot, out) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

# Map the formatting function over all splits
train_dataset_formatted = train_dataset.map(formatting_prompts_func, batched=True)
eval_dataset_formatted  = eval_dataset.map(formatting_prompts_func, batched=True)
test_dataset_formatted  = test_dataset.map(formatting_prompts_func, batched=True)

# (Optional) Inspect one formatted example
print("Formatted prompt example:\n", train_dataset_formatted["text"][6])

Formatted prompt example:
 Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
What causes pain in right upper jaw and headache?

### Response:
<think>
I have a blister filled bump on my upper front gum line I have been taking antibiotics and 2 days ago the bump popped and it drained yellow puss out  and then blood my tooth is broken off in the gum line the tooth does not hurt im curious is to what is going on
</think>
Thanks for your query, I have gone through your query.The blister bump on the gums can be a pus discharging sinus tract secondary to an infecte
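With `batched=True`, `Dataset.map` passes `formatting_prompts_func` a dictionary of column lists (up to 1,000 rows per call by default) rather than one row at a time, and adds the returned `text` column to the dataset. The pure-Python simulation below shows the shape of data the function receives; `EOS`, `TEMPLATE`, and `simulate_batched_map` are hypothetical stand-ins, not the `datasets` implementation:

```python
EOS = "<eos>"  # hypothetical placeholder for tokenizer.eos_token
TEMPLATE = "Q: {}\n<think>\n{}\n</think>\n{}"  # shortened stand-in for train_prompt_style

def formatting_prompts_func(examples):
    # examples is a dict of lists: {"context": [...], "input": [...], "output": [...]}
    texts = [
        TEMPLATE.format(question, cot, answer) + EOS
        for question, cot, answer in zip(examples["context"], examples["input"], examples["output"])
    ]
    return {"text": texts}

def simulate_batched_map(rows, fn, batch_size=2):
    """Call fn on dict-of-lists batches and attach its output columns to each row."""
    result = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        batch = {key: [row[key] for row in chunk] for key in chunk[0]}
        new_columns = fn(batch)
        for i, row in enumerate(chunk):
            result.append({**row, **{key: values[i] for key, values in new_columns.items()}})
    return result

rows = [
    {"context": "Q1", "input": "thoughts 1", "output": "answer 1"},
    {"context": "Q2", "input": "thoughts 2", "output": "answer 2"},
    {"context": "Q3", "input": "thoughts 3", "output": "answer 3"},
]
mapped = simulate_batched_map(rows, formatting_prompts_func)
print(mapped[2]["text"])
```

Appending the EOS token to every example teaches the model where an answer ends; without it, generation tends to run on until `max_new_tokens`.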

In [10]:
# 3. Prepare for Fine-tuning with LoRA
# ------------------------------
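# Attach LoRA adapters to the 4-bit base model
model_lora = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank (size of the low-rank update matrices)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],  # attention and MLP projections
    lora_alpha=16,  # scaling factor; updates are scaled by lora_alpha / r = 1
    lora_dropout=0,  # no dropout on the LoRA layers
    bias="none",  # do not train bias terms
    use_gradient_checkpointing="unsloth",  # Unsloth's memory-saving gradient checkpointing
    random_state=3407,  # seed for adapter initialization
    use_rslora=False,  # standard LoRA scaling instead of rank-stabilized LoRA
    loftq_config=None,  # no LoftQ initialization
)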

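# Monkey-patch model_lora.for_training: restore the original generate method if Unsloth
# stashed it in _unwrapped_old_generate (use_gradient_checkpointing is accepted but unused)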
def safe_for_training(model, use_gradient_checkpointing=True):
    if hasattr(model, "_unwrapped_old_generate"):
        model.generate = model._unwrapped_old_generate
        try:
            del model._unwrapped_old_generate
        except AttributeError:
            pass
    return model

model_lora.for_training = lambda: safe_for_training(model_lora)

Unsloth 2025.3.19 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
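The trainable-parameter count Unsloth reports during training (41,943,040) follows directly from these LoRA settings: each adapted `d_in × d_out` projection gains two low-rank matrices, A (r × d_in) and B (d_out × r), i.e. r·(d_in + d_out) parameters. The sketch below assumes the Llama-3.1-8B shapes this distilled model is built on (hidden size 4096, 8 key/value heads of dimension 128, MLP size 14336, 32 layers):

```python
# Assumed Llama-3.1-8B dimensions
HIDDEN = 4096        # model hidden size
KV_DIM = 8 * 128     # 8 key/value heads x head dim 128 (grouped-query attention)
MLP = 14336          # MLP intermediate size
LAYERS = 32
# Values from get_peft_model above
R = 16
LORA_ALPHA = 16

def lora_param_count(d_in, d_out, r):
    """Parameters LoRA adds to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

# (d_in, d_out) for each module in target_modules
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

per_layer = sum(lora_param_count(d_in, d_out, R) for d_in, d_out in shapes.values())
total = per_layer * LAYERS
print(f"trainable LoRA parameters: {total:,}")         # 41,943,040
print(f"fp32 adapter size: {total * 4 / 1e6:.1f} MB")  # 167.8 MB
print(f"LoRA scaling alpha / r = {LORA_ALPHA / R}")    # 1.0
```

The fp32 size is consistent with the 168 MB `adapter_model.safetensors` that the notebook uploads to the Hub at the end.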


In [11]:
# 4. Initialize the Fine-tuning Trainer
# ------------------------------
trainer = SFTTrainer(
    model=model_lora,
    tokenizer=tokenizer,
    train_dataset=train_dataset_formatted,
    eval_dataset=eval_dataset_formatted,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size = 2 x 4 = 8
        num_train_epochs=1,
        warmup_steps=5,
        max_steps=30,                   # overrides num_train_epochs: training stops after 30 steps
        eval_strategy="steps",          # enable periodic evaluation (formerly evaluation_strategy)
        eval_steps=10,                  # evaluate every 10 steps
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),  # the P100 lacks bfloat16, so FP16 is used here
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",             # 8-bit AdamW (bitsandbytes) to save memory
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)



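Because `max_steps=30` overrides `num_train_epochs=1`, this run sees only a small slice of the training set. The arithmetic below, using the values from `TrainingArguments` above, matches the `train/epoch` value of 0.02118 that W&B logs at the end of training:

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
num_gpus = 1
max_steps = 30
num_train_examples = 11_330

# One optimizer step consumes batch_size x accumulation_steps x GPUs examples
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
examples_seen = effective_batch * max_steps
epoch_fraction = examples_seen / num_train_examples

print(effective_batch, examples_seen, round(epoch_fraction, 5))  # 8 240 0.02118
```

A full pass over the training split would take roughly 11,330 / 8 ≈ 1,416 optimizer steps.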

In [12]:
# 5. Fine-tune the Model
# ------------------------------
trainer_stats = trainer.train()
wandb.finish()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 11,330 | Num Epochs = 1 | Total steps = 30
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040/8,000,000,000 (0.52% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step  Training Loss  Validation Loss
10    3.1841         2.372672
20    2.2763         2.152044
30    2.1779         2.117664


Unsloth: Not an error, but LlamaForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient


Run history:
eval/loss                 █▂▁
eval/runtime              █▃▁
eval/samples_per_second   ▁▁▁
eval/steps_per_second     ▁▁▁
train/epoch               ▁▁▅▅███
train/global_step         ▁▁▅▅███
train/grad_norm           █▁▁
train/learning_rate       █▅▁
train/loss                █▂▁

Run summary:
eval/loss                 2.11766
eval/runtime              1348.0076
eval/samples_per_second   1.05
eval/steps_per_second     0.525
total_flos                4285051111931904.0
train/epoch               0.02118
train/global_step         30.0
train/grad_norm           0.43229
train/learning_rate       0.0
train/loss                2.1779
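`train/learning_rate` ends at 0.0 because `lr_scheduler_type="linear"` warms up from 0 to `learning_rate` over `warmup_steps` and then decays linearly to 0 at the final step. A plain-Python sketch of that schedule with this run's values (it mirrors the multiplier used by `transformers.get_linear_schedule_with_warmup`; the function name is hypothetical):

```python
def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=30):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

for step in (0, 5, 10, 20, 30):
    print(f"step {step:>2}: lr = {linear_schedule_lr(step):.2e}")
```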


In [13]:
# Call for_inference to enable Unsloth’s patched generation
FastLanguageModel.for_inference(model)

# -- OPTIONAL MONKEY-PATCH --
# If you keep getting "must call for_inference" or need to bypass fast inference:
def generate_no_fast_inference(*args, **kwargs):
    return model_lora.base_model.generate(*args, **kwargs)

# Comment out the next line if you want to preserve Unsloth's speedups:
model_lora.generate = generate_no_fast_inference
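Assigning `model_lora.generate = ...` creates an instance attribute that shadows the class's `generate` method without modifying the class, so deleting that attribute is enough to fall back to the class method. A toy illustration with a hypothetical `ToyModel` class; it shows only the Python mechanism and does not model whatever Unsloth's own patching stores on the instance:

```python
class ToyModel:
    def generate(self, prompt):
        return f"class generate: {prompt}"

toy = ToyModel()
# Shadow the method on this one instance, as the cell above does for model_lora
toy.generate = lambda prompt: f"patched generate: {prompt}"
print(toy.generate("hi"))  # patched generate: hi

# Remove the instance attribute; lookup falls back to ToyModel.generate
del toy.generate
print(toy.generate("hi"))  # class generate: hi
```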

In [14]:
question = """I have osteoporosis mainly in the spine.I exercise 5 days a werk to help the situation. I don t know if there are any drugs that really help reverse the situation. What is your feeling about Prolia and its sidr effects. I have read them all. Also anything else you can suggest. I am in my 70s."""

# Switch the LoRA model to Unsloth's inference mode
FastLanguageModel.for_inference(model_lora)  # Unsloth has 2x faster inference!

# Format the question with the inference prompt (empty <think> block), tokenize it, and move it to the GPU
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generate a response with the LoRA fine-tuned model
outputs = model_lora.generate(
    input_ids=inputs.input_ids,            # Tokenized input IDs
    attention_mask=inputs.attention_mask,  # Attention mask for padding handling
    max_new_tokens=160,                    # Maximum number of new tokens to generate
    use_cache=True,                        # Reuse the KV cache for faster generation
)

# Decode the generated token IDs back to text
response = tokenizer.batch_decode(outputs)

# Print only the part after "### Response:" (the model's reasoning and answer)
print(response[0].split("### Response:")[1])


<think>
Hello Dr. I have a 5 year old boy with type 1 diabetes. His blood sugar levels are under control with insulin. He has developed a blod sugar level of 6.8 on 3 occasions. His blood sugar levels are usually under control. Should I be worried about this? Is there any risk of developing type 2 diabetes in future? Please advise.
</think>
I think that you should not worry about your son developing type 2 diabetes in the future. Type 1 diabetes is an autoimmune disease and is not related to type 2 diabetes. The blood sugar levels that you are referring to are not indicative of type 2 diabetes. However, it is important to monitor your son's blood sugar levels carefully and consult with your pediatric endocrinologist.<｜end▁of▁sentence｜>
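`split("### Response:")[1]` prints everything after the prompt, including the `<think>` block and the raw end-of-sentence token. A slightly more robust sketch (the helper name `split_generation` is hypothetical; the EOS string is copied from the decoded output above) separates the reasoning from the final answer and reports whether generation stopped on its own or was cut off by `max_new_tokens`, as happens in the next example:

```python
EOS_TEXT = "<｜end▁of▁sentence｜>"  # end-of-sentence string as it appears in the decoded text above

def split_generation(decoded, marker="### Response:", eos=EOS_TEXT):
    """Split one decoded generation into (thinking, answer, finished)."""
    completion = decoded.split(marker, 1)[-1]  # drop the prompt
    finished = eos in completion               # False if max_new_tokens cut it off
    completion = completion.split(eos, 1)[0]
    if "</think>" in completion:
        thinking, answer = completion.split("</think>", 1)
    else:                                      # still inside the <think> block
        thinking, answer = completion, ""
    thinking = thinking.replace("<think>", "", 1).strip()
    return thinking, answer.strip(), finished

# Usage with the notebook's decoded output:
# thinking, answer, finished = split_generation(response[0])
```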


In [15]:
question = """I was hit on the forehead and went flying off a bed and hit the back of my head I have two indents in a line from my hairline to my eye, they are about 2 cm deep and a finger wide, they are about 5 cm apart, there isnt any bruising but its tender and I have a sharp pain there I had memory loss for about 12 hours, blurred vision, headache, nausea, dizziness, a bump on the back of my head and neck pain all the way down into my back I was wondering if I should get it checked, im avoiding it because I dont want anyone mad at the person because it was an accident, so I dont want to go unless necessary"""

# Format the question with the inference prompt (empty <think> block), tokenize it, and move it to the GPU
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")

# Generate a response with the LoRA fine-tuned model
outputs = model_lora.generate(
    input_ids=inputs.input_ids,            # Tokenized input IDs
    attention_mask=inputs.attention_mask,  # Attention mask for padding handling
    max_new_tokens=320,                    # Maximum number of new tokens to generate
    use_cache=True,                        # Reuse the KV cache for faster generation
)

# Decode the generated token IDs back to text
response = tokenizer.batch_decode(outputs)

# Print only the part after "### Response:" (the model's reasoning and answer)
print(response[0].split("### Response:")[1])


<think>
I am 21 years old, I was hit in the head with a baseball bat and fell down and hit my head on the ground. I have a lump on the back of my head and neck pain. I have also been having severe headaches and I feel like I am going to pass out. I have been to the doctor and they said it was just a mild contusion and to take it easy. I have a large bump on the back of my head and I feel like it is going to get bigger. I am worried that it may be a blood clot or something serious. I have been taking ibuprofen and it helps a little but I still feel like I have a lump on my head. I am going to have a MRI tomorrow and I am really worried about it. I hope everything is going to be okay. I have also been having bad headaches and I feel like it is going to get worse. I hope everything is going to be okay and that it is not something serious. I am going to go to the doctor tomorrow and I hope everything is going to be okay. Thank you.
</think>
Hi I am 21 years old and I was hit in the head w

In [18]:
# 7. Upload the Fine-tuned Model to Hugging Face
# ------------------------------
# (Make sure you are logged in with your Hugging Face token)
model_repo = "KarthikNimmagadda/DeepSeek-Latest-Finetuned-Medical-Dataset"  # Update with your Hugging Face repository name
model_lora.push_to_hub(model_repo, token=hugging_face_token)  # uploads the LoRA adapter weights
tokenizer.push_to_hub(model_repo, token=hugging_face_token)

Saved model to https://huggingface.co/KarthikNimmagadda/DeepSeek-Latest-Finetuned-Medical-Dataset

