# Ultimate Gemma 3n Guide: Running Inference and Fine-Tuning the Gemma 3n Model

Please **upvote** this notebook if you find it useful.

# 1. Setup and Installation

In [1]:
!pip install -q timm==1.0.17
!pip install -q transformers==4.53.2
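
The install cell pins two packages, so a quick check afterwards might confirm the pins took effect before the heavier cells run. The sketch below is an illustration rather than part of the original notebook: the helper names `installed_version` and `check_pins` are hypothetical, and it relies only on the standard-library `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the install cell above.
PINNED = {"timm": "1.0.17", "transformers": "4.53.2"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pins(pins):
    """Map each mismatched package to an (expected, found) tuple."""
    mismatches = {}
    for package, expected in pins.items():
        found = installed_version(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


# An empty dict means every pin matched.
print(check_pins(PINNED) or "all pinned versions match")
```

If a mismatch is reported, restarting the kernel after the install cell is usually worth trying, since an already-imported package keeps its old version in memory.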


# 2. Imports

In [2]:
import kagglehub
import torch
import gc

from transformers import AutoProcessor, AutoModelForImageTextToText



# 3. Load Gemma model 

In [3]:
# Download the Gemma 3n E2B instruction-tuned model from Kaggle Hub.
# Other model variations are listed at https://www.kaggle.com/models/google/gemma-3n/transformers
gemma3n_2b_model_path = kagglehub.model_download("google/gemma-3n/transformers/gemma-3n-e2b-it")

In [4]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

processor = AutoProcessor.from_pretrained(gemma3n_2b_model_path)
model = AutoModelForImageTextToText.from_pretrained(gemma3n_2b_model_path, torch_dtype="auto").to(device)


# 4. Helper function for model inference

In [5]:
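def generate(messages):
    """Run one chat-formatted request and return only the newly generated text.

    Special tokens such as <end_of_turn> are kept in the returned string.
    """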
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt"
    ).to(device, dtype=model.dtype)
    
    outputs = model.generate(**inputs, max_new_tokens=512, disable_compile=True)
    text = processor.decode(outputs[0][inputs["input_ids"].shape[-1]:])
    
    # Drop the tensor references, collect garbage, then release cached GPU memory
    del inputs
    del outputs
    gc.collect()
    torch.cuda.empty_cache()
    
    return text
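
The sample outputs in this notebook end with a literal `<end_of_turn>` marker because the helper decodes special tokens along with the text. A minimal post-processing sketch, assuming the marker only needs removing when it trails the reply, could look like this. `clean_response` is a hypothetical helper name, not something the notebook or the library defines.

```python
END_OF_TURN = "<end_of_turn>"  # marker seen at the end of the sample outputs


def clean_response(text, marker=END_OF_TURN):
    """Drop a trailing end-of-turn marker and any surrounding whitespace."""
    text = text.rstrip()
    if text.endswith(marker):
        text = text[: -len(marker)]
    return text.strip()
```

In the notebook this might be used as `print(clean_response(generate(messages)))`; `print` also renders the `\n` escapes as real line breaks. Passing `skip_special_tokens=True` to `processor.decode` is another option worth trying.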

# 5. Inference - Text 

In [6]:
prompt = """It was a dark and stormy night in Gotham City. So far away there was an"""

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt}
        ]
    }
]

generate(messages)

"It was a dark and stormy night in Gotham City. So far away there was an **unfamiliar screech echoing through the rain-slicked streets, a sound that didn't belong to any bat, cat, or creature known to the city’s denizens.** \n\nThe wind howled, whipping the rain into a frenzy, and lightning illuminated a figure silhouetted against a flickering neon sign – something…different. It wasn't a man, nor a beast, but a creature of impossible angles and shimmering scales, leaving a trail of ozone and unsettling silence in its wake. \n\n\n\n<end_of_turn>"

# 6. Inference - Image

In [7]:
image_url = "https://source.roboflow.com/v2IDbvwf8vFhER7eeJsv/06k5H3MN6JnHgM4Ox7SE/original.jpg"

In [8]:
from IPython.display import Image
Image(url=image_url, height=480, width=480)

In [9]:
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_url},
            {"type": "text", "text": "Check this Make a list and discuss section in the image and give me some places"}
        ]
    }
]
generate(messages)

"## Make a List and Discuss: Places Affected by Landslides\n\nBased on the provided text, here are some examples of places that might be susceptible to landslides and the reasons why:\n\n**1. Mountainous Regions:** \n* **Reasons:** Steep slopes increase the risk of soil erosion and instability. Heavy rainfall can saturate the soil, making it even more prone to landslides. \n* **Examples:** The Himalayas (India, Nepal, Bhutan), the Andes (South America), the Alps (Europe), the Rocky Mountains (North America).\n\n**2. Areas with Heavy Rainfall:**\n* **Reasons:** Prolonged periods of heavy rain can lead to waterlogging and soil saturation, weakening the soil's structure and increasing the risk of landslides.\n* **Examples:** The monsoon regions of India, Southeast Asia, parts of Africa, and South America.\n\n**3. Areas with Deforestation:**\n* **Reasons:** Clearing forests for agriculture, urbanization, or other purposes removes the natural barrier that helps hold soil together. This can 

# 7. Inference - Audio

In [10]:
from IPython.display import Audio
Audio("https://erogol.com/ddc-samples/wavs/s1.wav")

In [11]:
!wget -qqq https://erogol.com/ddc-samples/wavs/s1.wav -O audio.wav



In [12]:
audio_file = "audio.wav"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": audio_file},
            {"type": "text", "text": "What is this audio about?"}
        ]
    }
]
generate(messages)

'This audio excerpt appears to be from a **podcast or spoken word piece** focusing on **critical thinking and intellectual honesty**. \n\nThe speaker is describing how Bil got used to questioning the truthfulness of things. He would actively ask "Is that for true?" and if he wasn\'t completely sure, he would accept it as being correct and move on. \n\nThis suggests the audio explores themes of:\n\n* **Skepticism:** The act of questioning claims.\n* **Doubt:** The possibility of uncertainty.\n* **Intellectual Honesty:**  Acknowledging uncertainty rather than blindly accepting information. \n* **The importance of questioning:** Encouraging listeners to think critically about what they hear. \n\nIt\'s likely a segment aimed at promoting a thoughtful and discerning approach to information and beliefs.<end_of_turn>'
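
The text, image, and audio cells all build the same message structure by hand, differing only in which content entries they include. A small builder could remove that repetition. This is a sketch under the assumption that each request has a single user turn with at most one image and one audio clip; `build_user_message` is a hypothetical helper, not part of the notebook or the `transformers` API.

```python
def build_user_message(text, image=None, audio=None):
    """Build a single-turn message list; media entries come before the text."""
    content = []
    if image is not None:
        content.append({"type": "image", "image": image})
    if audio is not None:
        content.append({"type": "audio", "audio": audio})
    content.append({"type": "text", "text": text})
    return [{"role": "user", "content": content}]


# Same structure as the audio cell above.
messages = build_user_message("What is this audio about?", audio="audio.wav")
```

The result can be passed straight to `generate(messages)`, and omitting both `image` and `audio` gives the text-only form used in section 5.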

# 8. Fine-tuning Gemma-3n model

**Fine-tuning Gemma for text generation tasks with the Hugging Face Transformers library**

https://ai.google.dev/gemma/docs/core/huggingface_text_finetune_qlora

**Fine-tuning Gemma for vision tasks with the Hugging Face Transformers library**

https://ai.google.dev/gemma/docs/core/huggingface_vision_finetune_qlora


**Unsloth library based fine-tuning**

See this notebook for detailed instructions - https://www.kaggle.com/competitions/google-gemma-3n-hackathon/discussion/587725/
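
Whichever route is chosen, supervised fine-tuning generally needs the training data in a chat format that mirrors what the model sees at inference time. The sketch below shows one way to shape question and answer pairs into the same message structure used in the inference cells. It rests on assumptions: the `"messages"` key and the record layout are a common convention rather than something this notebook confirms, the linked guides may expect a different schema, and `to_chat_example` and the toy row are purely illustrative.

```python
def to_chat_example(question, answer):
    """Convert one (question, answer) pair into a conversational training record."""
    return {
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": question}]},
            {"role": "assistant", "content": [{"type": "text", "text": answer}]},
        ]
    }


# Hypothetical toy row, only to show the resulting shape.
rows = [("What is the capital of France?", "Paris.")]
dataset = [to_chat_example(q, a) for q, a in rows]
```

Keeping the training records in the same shape as the inference messages means the processor's chat template can format both consistently.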
