# 🧠 MedChatGuard - Unsloth Fine-Tuning in Colab
Fine-tune a long-context LLM (`unsloth/gemma-3-4b-it-unsloth-bnb-4bit`) on synthetic EHR data using QLoRA.


### Install Dependencies

In [None]:
!pip install -q unsloth datasets trl accelerate peft transformers bitsandbytes huggingface_hub
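The install line above is unpinned, so a later session may pull different releases of `trl` and `transformers`, whose trainer arguments change between versions (the run log further down reports Unsloth 2025.3.19, Transformers 4.50.3 and Torch 2.6.0). A minimal sketch for recording what was actually installed, using only the standard library; the package list simply mirrors the `pip install` line plus `torch`:

```python
from importlib import metadata

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

if __name__ == "__main__":
    pkgs = ["unsloth", "datasets", "trl", "accelerate", "peft",
            "transformers", "bitsandbytes", "torch"]
    for name, version in installed_versions(pkgs).items():
        print(f"{name}: {version or 'not installed'}")
```

The printed versions can then be pinned in the install cell (e.g. `transformers==<version>`) to make a rerun reproducible.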

### Load SQuAD-style dataset from Drive

In [None]:
# ✅ Mount Google Drive to access dataset and save model
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

DATA_PATH = "/content/drive/MyDrive/Colab Notebooks/FineTuning/preprocessed"

In [None]:
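# ✅ Load dataset from Google Drive
from datasets import load_from_disk

# load_from_disk returns a Dataset, or a DatasetDict if several splits were saved together
# (in that case pass e.g. dataset["train"] to the trainer)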
dataset = load_from_disk(DATA_PATH)
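SQuAD-style records usually carry separate `context`, `question` and `answers` fields rather than one ready-made training string, so they typically need flattening before supervised fine-tuning. A minimal sketch, assuming the standard SQuAD field layout; the prompt template is illustrative and not necessarily the one used for MedChatGuard:

```python
# ASSUMPTION: records follow the SQuAD layout
#   {"context": str, "question": str,
#    "answers": {"text": [str, ...], "answer_start": [int, ...]}}
# The template below is a hypothetical example, not the project's actual prompt.

def format_example(example):
    """Turn one SQuAD-style record into a single prompt + answer string."""
    answers = example["answers"]["text"]
    # SQuAD-style records can list several reference answers; keep the first one.
    answer = answers[0].strip() if answers else ""
    context = example["context"].strip()
    question = example["question"].strip()
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer: {answer}"

def add_text_column(example):
    """Row-wise helper for `dataset.map` that adds a "text" column."""
    return {"text": format_example(example)}
```

Usage would be `dataset = dataset.map(add_text_column)`. Recent TRL releases read a column named `text` by default (`dataset_text_field`); for chat-tuned Gemma, wrapping the prompt with `tokenizer.apply_chat_template` is likely a better fit than this plain template.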

### Load model and tokenizer

In [None]:
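from unsloth import FastLanguageModel
import torch

# Load the pre-quantized 4-bit (bitsandbytes) Gemma 3 4B instruct checkpoint through Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    max_seq_length = 2048,  # maximum sequence length used for training
    dtype = None,           # auto-detect; the log below shows a fallback to float32 on the T4
    load_in_4bit = True,    # QLoRA: keep the frozen base weights in 4-bit
)

tokenizer.padding_side = "right"
# Note: Gemma's tokenizer already defines its own <pad> token; reusing EOS as padding
# can make some collators mask EOS out of the labels
tokenizer.pad_token = tokenizer.eos_token

# Attach LoRA adapters; only these low-rank matrices are trained
model = FastLanguageModel.get_peft_model(
    model,
    r = 64,                                 # rank of the low-rank update
    target_modules = ["q_proj", "v_proj"],  # adapt only the attention query/value projections
    lora_alpha = 16,                        # update is scaled by lora_alpha / r = 0.25
    lora_dropout = 0.05,
    bias = "none",
    use_gradient_checkpointing = True,      # recompute activations to save VRAM
)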

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.3.19: Fast Gemma3 patching. Transformers: 4.50.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for gemma3 won't work! Using float32.


Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.


Unsloth: Making `base_model.model.vision_tower.vision_model` require gradients
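In LoRA, each adapted weight matrix W of shape d_out × d_in receives a low-rank update B·A, with A of shape r × d_in and B of shape d_out × r, multiplied by `lora_alpha / r` (PEFT's default; with rank-stabilized LoRA the divisor is √r instead). With `r = 64` and `lora_alpha = 16` the update is therefore scaled by 0.25, noticeably smaller than under the common heuristic of setting `lora_alpha` to `r` or `2 * r`. The helpers below spell out that arithmetic; the 1024 × 1024 layer in the usage lines is a made-up size, not Gemma 3's actual dimensions.

```python
import math

def lora_scaling(lora_alpha, r, use_rslora=False):
    """Factor applied to the low-rank update B @ A (alpha / r, or alpha / sqrt(r) for rsLoRA)."""
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r

def lora_params_per_matrix(d_in, d_out, r):
    """Trainable parameters LoRA adds to one Linear layer: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r

if __name__ == "__main__":
    print(lora_scaling(16, 64))                    # 0.25, the setting used above
    print(lora_scaling(16, 64, use_rslora=True))   # 2.0
    print(lora_params_per_matrix(1024, 1024, 64))  # 131072 for a hypothetical 1024 x 1024 layer
```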


### Fine-tune

In [None]:
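from transformers import TrainingArguments
from trl import SFTTrainer
import os

# os.environ["FLASH_ATTENTION_DISABLE"] = "1"

args = TrainingArguments(
    output_dir = "/content/drive/MyDrive/Colab Notebooks/FineTuning/finetuned_model",  # checkpoints land here as checkpoint-<step>
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,  # effective batch size = 2 x 4 = 8
    num_train_epochs = 3,
    learning_rate = 2e-4,
    save_strategy = "epoch",          # save a checkpoint at the end of every epoch
    logging_steps = 10,
    bf16 = False,                     # the T4 has no bfloat16 support (log: Bfloat16 = FALSE)
    fp16 = False,                     # Gemma 3 cannot train in float16 here, so Unsloth uses float32
    report_to = "none",               # disable external experiment trackers
)

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,            # newer TRL releases name this argument `processing_class`
    train_dataset = dataset,
    # formatting_func = formatting_func,  # needed when the dataset has no ready-made text column
    args = args,
    max_seq_length = 2048,            # newer TRL releases expect this in SFTConfig instead
)

trainer.train()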

Unsloth: Switching to float32 training since model cannot work with float16


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 110 | Num Epochs = 3 | Total steps = 39
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 25,788,416/4,000,000,000 (0.64% trained)
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Unsloth: Will smartly offload gradients to save VRAM!


| Step | Training Loss |
|-----:|--------------:|
| 10 | 31.4015 |
| 20 | 0.0 |
| 30 | 0.0 |


TrainOutput(global_step=39, training_loss=8.051665477263622, metrics={'train_runtime': 1172.9419, 'train_samples_per_second': 0.281, 'train_steps_per_second': 0.033, 'total_flos': 1.435197826400256e+16, 'train_loss': 8.051665477263622})
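The logged training loss falls from 31.4015 at step 10 to exactly 0.0 at steps 20 and 30. An exact zero on real text would mean every target token is predicted with certainty, which is not a believable outcome after roughly 20 updates, so it is worth checking whether any tokens are actually supervised before trusting the checkpoint. By the Hugging Face convention, label positions set to -100 are ignored by the loss. The sketch below counts the remaining positions on plain Python lists; how the `labels` are pulled out of a real batch (for example by passing a few dataset rows through the trainer's data collator) is left as an assumption.

```python
IGNORE_INDEX = -100  # ignore_index used by PyTorch's cross-entropy and Hugging Face collators

def supervised_token_fraction(batch_labels):
    """Fraction of label positions that contribute to the loss (i.e. are not IGNORE_INDEX)."""
    total = sum(len(row) for row in batch_labels)
    if total == 0:
        return 0.0
    kept = sum(1 for row in batch_labels for tok in row if tok != IGNORE_INDEX)
    return kept / total

if __name__ == "__main__":
    # Toy labels: first row supervises 2 of 4 positions, second row none.
    print(supervised_token_fraction([[-100, -100, 5, 6], [-100, -100, -100, -100]]))  # 0.25
```

If the fraction comes out as 0.0 on real batches, nothing is being trained on; if it is clearly non-zero, numerical issues in the float32 setup are the next thing to look at.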

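The log's `Total steps = 39`, and therefore the `checkpoint-39` directory loaded in the merge step, follows from the batch settings: 110 examples at 2 per device give 55 batches per epoch, one optimizer update per 4 batches gives 55 // 4 = 13 updates per epoch, and 3 epochs give 39. The helper below reproduces that arithmetic for a single GPU. It is a sketch of how `Trainer` sizes the run in the version logged here; other versions may treat the leftover batches differently.

```python
import math

def total_optimizer_steps(num_examples, per_device_batch_size, grad_accum_steps, num_epochs):
    """Planned optimizer steps for a single-GPU run (Trainer-style sizing sketch)."""
    # The last partial batch is kept, so round up.
    batches_per_epoch = math.ceil(num_examples / per_device_batch_size)
    # One update per `grad_accum_steps` batches; leftover batches do not add a planned step.
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return math.ceil(num_epochs * updates_per_epoch)

if __name__ == "__main__":
    steps = total_optimizer_steps(110, 2, 4, 3)
    print(steps)                  # 39, matching the training log
    print(f"checkpoint-{steps}")  # name of the final checkpoint directory
```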
### Merge and Save

In [None]:
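from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/content/drive/MyDrive/Colab Notebooks/FineTuning/finetuned_model/checkpoint-39"  # LoRA adapter from the final step
SAVE_PATH = "/content/drive/MyDrive/Colab Notebooks/FineTuning/cpu_model"
MODEL = "unsloth/gemma-3-4b-it-unsloth-bnb-4bit"

# Load the same base checkpoint that was fine-tuned.
# Note: this checkpoint is pre-quantized to 4-bit with bitsandbytes, so the merged model is
# expected to stay 4-bit and still need a CUDA GPU; merging into an unquantized copy of the
# base model is the usual route to a CPU-friendly model.
base_model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    device_map="auto",
    torch_dtype="float32"
)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, MODEL_PATH)

# Fold the adapter weights into the base weights and drop the PEFT wrappers
model = model.merge_and_unload()

# Save the merged model (safetensors) and the tokenizer side by side
model.save_pretrained(SAVE_PATH, safe_serialization=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL)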
tokenizer.save_pretrained(SAVE_PATH)
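Before relying on the Drive copy, a quick look at `SAVE_PATH` can confirm that both save calls wrote something. The check below assumes the usual Hugging Face file names: `config.json` and one or more `*.safetensors` files from `save_pretrained(..., safe_serialization=True)` (sharded saves use names like `model-00001-of-00002.safetensors`, which the glob also matches), plus `tokenizer_config.json` from the tokenizer. It only checks that the files exist, not that they load.

```python
from pathlib import Path

def check_saved_model_dir(path):
    """Return a list of problems in a save_pretrained output directory (empty list = looks OK)."""
    root = Path(path)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    if not (root / "config.json").is_file():
        problems.append("missing config.json")
    if not any(root.glob("*.safetensors")):
        problems.append("no *.safetensors weight files")
    if not (root / "tokenizer_config.json").is_file():
        problems.append("missing tokenizer_config.json")
    return problems

if __name__ == "__main__":
    print(check_saved_model_dir("/content/drive/MyDrive/Colab Notebooks/FineTuning/cpu_model"))
```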