To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://unsloth.ai/docs/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>

To install Unsloth on your local device, follow [our guide](https://unsloth.ai/docs/get-started/install). This notebook is licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and how to save it.

### News

Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. [Blog](https://unsloth.ai/docs/new/faster-moe)

You can now train embedding models 1.8-3.3x faster with 20% less VRAM. [Blog](https://unsloth.ai/docs/new/embedding-finetuning)

Ultra Long-Context Reinforcement Learning is here with 7x longer context windows! [Blog](https://unsloth.ai/docs/new/grpo-long-context)

3x faster LLM training with 30% less VRAM and 500K context. [3x faster](https://unsloth.ai/docs/new/3x-faster-training-packing) • [500K Context](https://unsloth.ai/docs/new/500k-context-length-fine-tuning)

New in Reinforcement Learning: [FP8 RL](https://unsloth.ai/docs/new/fp8-reinforcement-learning) • [Vision RL](https://unsloth.ai/docs/new/vision-reinforcement-learning-vlm-rl) • [Standby](https://unsloth.ai/docs/basics/memory-efficient-rl) • [gpt-oss RL](https://unsloth.ai/docs/new/gpt-oss-reinforcement-learning)

Visit our docs for all our [model uploads](https://unsloth.ai/docs/get-started/unsloth-model-catalog) and [notebooks](https://unsloth.ai/docs/get-started/unsloth-notebooks).

### Installation

In [1]:
%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth  # Do this in local & cloud setups
else:
    import torch; v = re.match(r'[\d]{1,}\.[\d]{1,}', str(torch.__version__)).group(0)
    xformers = 'xformers==' + {'2.10':'0.0.34','2.9':'0.0.33.post1','2.8':'0.0.32.post2'}.get(v, "0.0.34")
    !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth_zoo bitsandbytes accelerate {xformers} peft trl triton unsloth
!pip install transformers==4.56.2
!pip install --no-deps trl==0.22.2
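
The Colab branch of the install cell pins `xformers` according to the installed PyTorch minor version. As a minimal sketch, the same lookup can be written as a standalone function. The helper name `pick_xformers` is hypothetical, and the version table is copied from the cell above. Unlike the original one-liner, this variant falls back to the default pin when the version string cannot be parsed.

```python
import re

# Version table copied from the install cell; pick_xformers is a hypothetical helper name.
XFORMERS_BY_TORCH = {"2.10": "0.0.34", "2.9": "0.0.33.post1", "2.8": "0.0.32.post2"}

def pick_xformers(torch_version: str, default: str = "0.0.34") -> str:
    """Return the pip requirement string for xformers given a torch version string."""
    match = re.match(r"\d+\.\d+", torch_version)   # keep only "major.minor"
    minor = match.group(0) if match else ""
    return "xformers==" + XFORMERS_BY_TORCH.get(minor, default)

print(pick_xformers("2.9.0+cu128"))  # the torch version reported in the load banner below
```

For the Torch 2.9.0+cu128 build shown in the banner, this selects `0.0.33.post1`, which matches the Xformers version the banner reports.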

### Unsloth

In [2]:
from unsloth import FastVisionModel # FastLanguageModel for LLMs
import torch

# 4-bit pre-quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit", # Llama 3.2 vision support
    "unsloth/Llama-3.2-11B-Vision-bnb-4bit",
    "unsloth/Llama-3.2-90B-Vision-Instruct-bnb-4bit", # Can fit in an 80GB card!
    "unsloth/Llama-3.2-90B-Vision-bnb-4bit",

    "unsloth/Pixtral-12B-2409-bnb-4bit",              # Pixtral fits in 16GB!
    "unsloth/Pixtral-12B-Base-2409-bnb-4bit",         # Pixtral base model

    "unsloth/Qwen2-VL-2B-Instruct-bnb-4bit",          # Qwen2 VL support
    "unsloth/Qwen2-VL-7B-Instruct-bnb-4bit",
    "unsloth/Qwen2-VL-72B-Instruct-bnb-4bit",

    "unsloth/llava-v1.6-mistral-7b-hf-bnb-4bit",      # Any Llava variant works!
    "unsloth/llava-1.5-7b-hf-bnb-4bit",
] # More models at https://huggingface.co/unsloth

model, processor = FastVisionModel.from_pretrained(
    "unsloth/medgemma-4b-it-bnb-4bit",
    load_in_4bit = True, # Use 4bit to reduce memory use. False for 16bit LoRA.
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for long context
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2026.2.1: Fast Gemma3 patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for gemma3 won't work! Using float32.


We now add LoRA adapters for parameter efficient fine-tuning, allowing us to train only 1% of all model parameters efficiently.

**[NEW]** We also support fine-tuning only the vision component, only the language component, or both. Additionally, you can choose to fine-tune the attention modules, the MLP layers, or both!

In [3]:
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers     = True, # False if not finetuning vision layers
    finetune_language_layers   = True, # False if not finetuning language layers
    finetune_attention_modules = True, # False if not finetuning attention layers
    finetune_mlp_modules       = True, # False if not finetuning MLP layers

    r = 16,                           # The larger, the higher the accuracy, but might overfit
    lora_alpha = 16,                  # Recommended alpha == r at least
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
    use_rslora = False,               # We support rank stabilized LoRA
    loftq_config = None,               # And LoftQ
    target_modules = "all-linear",    # Optional now! Can specify a list if needed
)

Unsloth: Making `base_model.model.model.vision_tower.vision_model` require gradients
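
To see why LoRA trains only a small share of the weights, it helps to count parameters for a single linear layer. The sketch below assumes the standard LoRA factorization, where a frozen `d_out x d_in` weight gets two small trainable matrices of rank `r`. The layer shape is hypothetical and chosen only for illustration; it is not read from the model.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# Hypothetical square layer, used only to illustrate the ratio.
d_in, d_out, r = 2560, 2560, 16
full = d_in * d_out
added = lora_param_count(d_in, d_out, r)
print(f"frozen weights: {full:,} | LoRA weights: {added:,} | ratio: {added / full:.2%}")
```

The ratio grows linearly with `r`, which is why a larger rank gives the adapter more capacity at the cost of more trainable parameters.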


<a name="Data"></a>
### Data Prep
We'll use a sample of the NCT-CRC-HE dataset of colorectal cancer histology images. Each 224x224 RGB patch carries one of nine tissue labels (ADI, BACK, DEB, LYM, MUC, MUS, NORM, STR, TUM), and the objective is to teach the model to name the tissue class shown in the image.

You can access the dataset [here](https://huggingface.co/datasets/1aurent/NCT-CRC-HE).

In [6]:
from datasets import load_dataset

# Load the CRC dataset directly from the Hugging Face Hub (no manual download needed)
dataset = load_dataset("1aurent/NCT-CRC-HE", split="NCT_CRC_HE_100K")

# Use a small subset to stay safe on T4 memory
dataset = dataset.train_test_split(train_size=3000, test_size=200, seed=42)
train_data = dataset["train"]
test_data  = dataset["test"]

print(f"Train size: {len(train_data)}, Test size: {len(test_data)}")
print(f"Example: {train_data[0]}")

Train size: 3000, Test size: 200
Example: {'image': <PIL.Image.Image image mode=RGB size=224x224 at 0x7F06B944CAD0>, 'label': 7}


In [7]:
# Tissue class labels (same as original notebook)
TISSUE_CLASSES = [
    "ADI", "BACK", "DEB", "LYM", "MUC", "MUS", "NORM", "STR", "TUM"
]

def format_data(sample):
    label_name = TISSUE_CLASSES[sample["label"]]
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image", "image": sample["image"]},
                    {"type": "text",  "text": "What type of tissue is shown in this histological image? Choose from: ADI, BACK, DEB, LYM, MUC, MUS, NORM, STR, TUM."}
                ]
            },
            {
                "role": "assistant",
                "content": [
                    {"type": "text", "text": label_name}
                ]
            }
        ]
    }

train_data = [format_data(s) for s in train_data]
test_data  = [format_data(s) for s in test_data]

To format the dataset, all vision fine-tuning tasks should follow this format:

```python
[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            {"type": "image", "image": sample["image"]},
        ],
    },
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": answer},
        ],
    },
]
```
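
A quick structural check can catch malformed samples before they reach the trainer. The sketch below is a hypothetical helper (`check_conversation` is not part of Unsloth) that mirrors what `format_data` produces: one user turn containing an image and a text prompt, followed by one assistant turn with the text answer. A plain object stands in for the PIL image so the example runs without dependencies.

```python
def check_conversation(messages):
    """Return a list of problems found in a user/assistant vision conversation."""
    problems = []
    expected_roles = ["user", "assistant"]
    roles = [m.get("role") for m in messages]
    if roles != expected_roles:
        problems.append(f"expected roles {expected_roles}, got {roles}")
    for m in messages:
        for part in m.get("content", []):
            kind = part.get("type")
            if kind not in ("text", "image"):
                problems.append(f"unknown content type: {kind!r}")
            elif kind not in part:
                problems.append(f"{kind!r} part is missing its {kind!r} key")
    first_content = messages[0].get("content", []) if messages else []
    if not any(p.get("type") == "image" for p in first_content):
        problems.append("user turn has no image")
    return problems

fake_image = object()  # stand-in for a PIL image
good = [
    {"role": "user", "content": [{"type": "image", "image": fake_image},
                                 {"type": "text", "text": "What tissue is this?"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "STR"}]},
]
bad = [
    {"role": "user", "content": [{"type": "text", "text": "What tissue is this?"}]},
    good[1],
]

print(check_conversation(good))
print(check_conversation(bad))
```

Running a check like this over a handful of formatted samples is cheap compared with discovering a formatting problem partway through training.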

Let's take the Gemma 3 instruction chat template and use it with our model.

In [8]:
from unsloth import get_chat_template

processor = get_chat_template(
    processor,
    "gemma-3"
)

Before fine-tuning, let us evaluate the base model's performance. We do not expect strong results, as it has not encountered this chat template before.

You can see it's absolutely terrible! It doesn't follow instructions at all

<a name="Train"></a>
### Train the model
Now let's train our model. We do 100 steps to speed things up, but you can set `num_train_epochs=1` for a full run and turn off the step limit with `max_steps=None`. We also support `DPOTrainer` and `GRPOTrainer` for reinforcement learning!

We use our new `UnslothVisionDataCollator` which will help in our vision finetuning setup.

In [9]:
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

FastVisionModel.for_training(model)

trainer = SFTTrainer(
    model = model,
    train_dataset = train_data,                          # ← Change 1
    processing_class = processor.tokenizer,
    data_collator = UnslothVisionDataCollator(model, processor),
    args = SFTConfig(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 4,
        gradient_checkpointing = True,
        gradient_checkpointing_kwargs = {"use_reentrant": False},
        max_grad_norm = 0.3,
        warmup_ratio = 0.03,
        max_steps = 100,                                 # ← Change 2
        learning_rate = 2e-4,
        logging_steps = 1,
        save_strategy = "steps",
        optim = "adamw_8bit",                            # ← Change 3
        weight_decay = 0.001,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
        remove_unused_columns = False,
        dataset_text_field = "",
        dataset_kwargs = {"skip_prepare_dataset": True},
        max_length = 2048,
    )
)

trainer.train()

Unsloth: Switching to float32 training since model cannot work with float16


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 3,000 | Num Epochs = 1 | Total steps = 100
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4
 "-____-"     Trainable parameters = 38,497,792 of 4,338,577,264 (0.89% trained)


Step,Training Loss
1,6.016
2,6.024
3,5.5809
4,4.8226
5,4.0826
6,3.3871
7,2.7673
8,2.3078
9,1.8107
10,1.3532


TrainOutput(global_step=100, training_loss=0.4483689033519477, metrics={'train_runtime': 5773.0994, 'train_samples_per_second': 0.069, 'train_steps_per_second': 0.017, 'total_flos': 2711222257990848.0, 'train_loss': 0.4483689033519477, 'epoch': 0.13333333333333333})
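
The numbers in the training banner and `TrainOutput` can be cross-checked with a little arithmetic. The sketch below uses only values printed above (3,000 examples, batch size 1, gradient accumulation 4, one GPU, 100 steps, and the reported `train_runtime`). The function name `training_coverage` is a hypothetical helper.

```python
def training_coverage(num_examples, per_device_batch, grad_accum, num_gpus, max_steps):
    """Return (effective batch size, samples seen, fraction of one epoch)."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    samples_seen = effective_batch * max_steps
    return effective_batch, samples_seen, samples_seen / num_examples

batch, seen, epochs = training_coverage(
    num_examples=3000, per_device_batch=1, grad_accum=4, num_gpus=1, max_steps=100
)
print(f"effective batch = {batch}, samples seen = {seen}, epochs = {epochs:.4f}")

# Average wall-clock time per optimizer step, from the reported train_runtime
seconds_per_step = 5773.0994 / 100
print(f"seconds per step = {seconds_per_step:.1f}")
```

The computed epoch fraction of about 0.1333 agrees with the `epoch` value in `TrainOutput`, so this run sees roughly 400 of the 3,000 training images.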

In [15]:
from transformers import pipeline
import torch

# --- Load pretrained (baseline) model separately using FastVisionModel ---
baseline_model, baseline_processor = FastVisionModel.from_pretrained(
    "unsloth/medgemma-4b-it-bnb-4bit",
    load_in_4bit = True, # Use 4bit to reduce memory use.
    device_map = "cpu", # Intended to avoid GPU OOM, but 4-bit bitsandbytes weights cannot be dispatched to CPU this way (see the ValueError below)
)
baseline_model.eval() # Set to evaluation mode

# --- Load fine-tuned pipeline ---
FastVisionModel.for_inference(model)  # switch model to inference mode

# --- Run on 100 test samples ---
test_100 = test_data[:100]

def get_prediction(current_model, current_processor, sample):
    messages = sample["messages"][:-1]  # drop the assistant turn (the ground truth)

    # Render the prompt as text; the processor tokenizes it together with the image below
    input_text = current_processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
    )

    # Extract the image from the first user message's content
    image_content = next((item for item in messages[0]["content"] if item["type"] == "image"), None)
    image = image_content["image"] if image_content else None

    # Tokenize text and image together, then move the tensors to the model's device
    inputs = current_processor(
        image,
        input_text,
        add_special_tokens=False,
        return_tensors="pt",
    ).to(current_model.device)

    with torch.no_grad():
        out = current_model.generate(**inputs, max_new_tokens=10)

    # Decode only the newly generated tokens: the prompt itself lists every class name,
    # so matching against the full sequence would always hit the first class
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    decoded = current_processor.tokenizer.decode(new_tokens, skip_special_tokens=True).strip().upper()

    pred = next((l for l in TISSUE_CLASSES if l in decoded), "UNKNOWN")
    return pred

print("Running baseline evaluation...")
pt_preds = [get_prediction(baseline_model, baseline_processor, s) for s in test_100]

print("Running fine-tuned evaluation...")
ft_preds = [get_prediction(model, processor, s) for s in test_100]

true_labels = [s["messages"][1]["content"][0]["text"] for s in test_100]

==((====))==  Unsloth 2026.2.1: Fast Gemma3 patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.563 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for gemma3 won't work! Using float32.


ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `llm_int8_enable_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details. 
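
Because the prompt lists every class name, mapping generated text to a label is only meaningful when the match runs on the generated continuation rather than on the prompt plus continuation. The sketch below illustrates the difference with plain strings, so no model is required. `extract_label` is a hypothetical helper that picks the class name appearing earliest in the text, a slight variation on the list-order match used in `get_prediction`.

```python
TISSUE_CLASSES = ["ADI", "BACK", "DEB", "LYM", "MUC", "MUS", "NORM", "STR", "TUM"]

def extract_label(text, classes=TISSUE_CLASSES):
    """Return the class whose name appears earliest in text, or "UNKNOWN"."""
    upper = text.upper()
    hits = [(upper.find(c), c) for c in classes if c in upper]
    return min(hits)[1] if hits else "UNKNOWN"

prompt = "Choose from: ADI, BACK, DEB, LYM, MUC, MUS, NORM, STR, TUM."
answer = "STR"

# Matching against prompt + answer finds the first class listed in the prompt.
print(extract_label(prompt + "\n" + answer))
# Matching against the answer alone recovers the intended label.
print(extract_label(answer))
```

The first call returns `ADI` regardless of what the model answered, which would make every prediction identical. The second returns `STR`.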

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, ConfusionMatrixDisplay

# Compute metrics
pt_acc = accuracy_score(true_labels, pt_preds)
ft_acc = accuracy_score(true_labels, ft_preds)
pt_f1  = f1_score(true_labels, pt_preds, average="weighted", zero_division=0)
ft_f1  = f1_score(true_labels, ft_preds, average="weighted", zero_division=0)

print(f"Pretrained  - Accuracy: {pt_acc:.3f} | F1: {pt_f1:.3f}")
print(f"Fine-tuned  - Accuracy: {ft_acc:.3f} | F1: {ft_f1:.3f}")

# --- Bar Chart ---
fig, ax = plt.subplots(figsize=(8, 5))
x = np.arange(2)
width = 0.35
b1 = ax.bar(x - width/2, [pt_acc, ft_acc], width, label="Accuracy", color="steelblue")
b2 = ax.bar(x + width/2, [pt_f1,  ft_f1],  width, label="F1 Score",  color="coral")
ax.set_xticks(x)
ax.set_xticklabels(["Pretrained MedGemma", "Fine-tuned MedGemma"])
ax.set_ylim(0, 1.15)
ax.set_ylabel("Score")
ax.set_title("Pretrained vs Fine-tuned MedGemma on CRC Tissue Classification (100 samples)")
ax.legend()
for bar in list(b1) + list(b2):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,
            f"{bar.get_height():.2f}", ha="center", fontsize=10)
plt.tight_layout()
plt.savefig("comparison_chart.png", dpi=150)
plt.show()

# --- Confusion Matrix (fine-tuned only) ---
cm = confusion_matrix(true_labels, ft_preds, labels=TISSUE_CLASSES)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=TISSUE_CLASSES)
fig2, ax2 = plt.subplots(figsize=(10, 8))
disp.plot(ax=ax2, xticks_rotation=45, colorbar=False)
ax2.set_title("Fine-tuned MedGemma - Confusion Matrix (100 samples)")
plt.tight_layout()
plt.savefig("confusion_matrix.png", dpi=150)
plt.show()
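
For readers unsure what `average="weighted"` means in the metrics cell, here is a dependency-free sketch of accuracy and support-weighted F1. It is meant to illustrate the definition, not to replace scikit-learn. The toy labels are made up for the example, and predictions such as `UNKNOWN` that never occur in the true labels get zero weight.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's true count (support)."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        total += n * (2 * tp / (2 * tp + fp + fn))  # denominator >= n >= 1
    return total / len(y_true)

# Toy labels, invented for illustration only.
y_true = ["TUM", "TUM", "STR", "LYM"]
y_pred = ["TUM", "STR", "STR", "UNKNOWN"]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.3f}")
```

Weighting by support means frequent classes dominate the score, so it is worth reading the confusion matrix alongside it when the test sample is small.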

In [None]:
# Save and push to HuggingFace Hub
from google.colab import userdata

hf_token = userdata.get("HF_TOKEN")

model.push_to_hub("Bimokuncoro/medgemma-4b-crc-finetuned", token=hf_token)
processor.push_to_hub("Bimokuncoro/medgemma-4b-crc-finetuned", token=hf_token)
print("Model successfully pushed to HuggingFace!")

And we're done! If you have any questions about Unsloth, need help, find a bug, or want to keep up with the latest LLM news and join projects, feel free to join our [Discord](https://discord.gg/unsloth)!

Some other resources:
1. Looking to use Unsloth locally? Read our [Installation Guide](https://unsloth.ai/docs/get-started/install) for details on installing Unsloth on Windows, Docker, AMD, Intel GPUs.
2. Learn how to do Reinforcement Learning with our [RL Guide and notebooks](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide).
3. Read our guides and notebooks for [Text-to-speech (TTS)](https://unsloth.ai/docs/basics/text-to-speech-tts-fine-tuning) and [vision](https://unsloth.ai/docs/basics/vision-fine-tuning) model support.
4. Explore our [LLM Tutorials Directory](https://unsloth.ai/docs/models/tutorials-how-to-fine-tune-and-run-llms) to find dedicated guides for each model.
5. Need help with Inference? Read our [Inference & Deployment page](https://unsloth.ai/docs/basics/inference-and-deployment) for details on using vLLM, llama.cpp, Ollama etc.

<div class="align-center">
  <a href="https://unsloth.ai"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://unsloth.ai/docs/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a>

  Join Discord if you need help + ⭐️ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐️

  This notebook and all Unsloth notebooks are licensed [LGPL-3.0](https://github.com/unslothai/notebooks?tab=LGPL-3.0-1-ov-file#readme)
</div>