To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the [installation instructions](https://docs.unsloth.ai/get-started/installing-+-updating) in our docs.

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and [how to save it](#Save).


### News

Unsloth now supports Text-to-Speech (TTS) models. Read our [guide here](https://docs.unsloth.ai/basics/text-to-speech-tts-fine-tuning).

Read our **[Gemma 3N Guide](https://docs.unsloth.ai/basics/gemma-3n-how-to-run-and-fine-tune)** and check out our new **[Dynamic 2.0](https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs)** quants, which outperform other quantization methods!

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [None]:
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth
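
The install cell above decides between the two install paths by looking for `COLAB_` in the names of the environment variables. A minimal sketch of that check as a reusable function follows; `running_in_colab` is a hypothetical helper name, not part of Unsloth, and it tests each variable name separately rather than the joined string.

```python
import os

def running_in_colab(environ=None):
    """Return True if any environment variable name contains 'COLAB_'.

    `environ` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if environ is None else environ
    return any("COLAB_" in name for name in env)

print(running_in_colab({"COLAB_GPU": "1", "HOME": "/root"}))  # True
print(running_in_colab({"HOME": "/home/me"}))                 # False
```

Checking each name on its own avoids a false positive where two adjacent names happen to form `COLAB_` only after being joined.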

In [None]:
%%capture
# Install the latest timm, which provides the Gemma 3N vision tower
!pip install --no-deps --upgrade timm # Only for Gemma 3N

### Unsloth

`FastModel` supports loading nearly any model now! This includes Vision and Text models!

In [None]:
from unsloth import FastModel
import torch

fourbit_models = [
    # 4bit dynamic quants for superior accuracy and low memory use
    "unsloth/gemma-3n-E4B-it-unsloth-bnb-4bit",
    "unsloth/gemma-3n-E2B-it-unsloth-bnb-4bit",
    # Pretrained models
    "unsloth/gemma-3n-E4B-unsloth-bnb-4bit",
    "unsloth/gemma-3n-E2B-unsloth-bnb-4bit",

    # Other Gemma 3 quants
    "unsloth/gemma-3-1b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-12b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-27b-it-unsloth-bnb-4bit",
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3n-E4B-it",
    dtype = None, # None for auto detection
    max_seq_length = 1024, # Choose any for long context!
    load_in_4bit = True,  # 4 bit quantization to reduce memory
    full_finetuning = False, # [NEW!] We have full finetuning now!
    # token = "hf_...", # use one if using gated models
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.8.1: Fast Gemma3N patching. Transformers: 4.54.0.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Gemma3N does not support SDPA - switching to eager!


model.safetensors.index.json: 0.00B [00:00, ?B/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/3.72G [00:00<?, ?B/s]

model-00002-of-00003.safetensors:   0%|          | 0.00/4.99G [00:00<?, ?B/s]

model-00003-of-00003.safetensors:   0%|          | 0.00/1.15G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/210 [00:00<?, ?B/s]

processor_config.json:   0%|          | 0.00/98.0 [00:00<?, ?B/s]

chat_template.jinja: 0.00B [00:00, ?B/s]

preprocessor_config.json: 0.00B [00:00, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.model:   0%|          | 0.00/4.70M [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/33.4M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/777 [00:00<?, ?B/s]

In [None]:
model.eval()

Gemma3nForConditionalGeneration(
  (model): Gemma3nModel(
    (vision_tower): TimmWrapperModel(
      (timm_model): MobileNetV5Encoder(
        (conv_stem): ConvNormAct(
          (conv): Conv2dSame(3, 64, kernel_size=(3, 3), stride=(2, 2))
          (bn): RmsNormAct2d(
            (drop): Identity()
            (act): GELU(approximate='tanh')
          )
        )
        (blocks): Sequential(
          (0): Sequential(
            (0): EdgeResidual(
              (conv_exp): Conv2dSame(64, 256, kernel_size=(3, 3), stride=(2, 2), bias=False)
              (bn1): RmsNormAct2d(
                (drop): Identity()
                (act): GELU(approximate='tanh')
              )
              (aa): Identity()
              (se): Identity()
              (conv_pwl): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
              (bn2): RmsNormAct2d(
                (drop): Identity()
                (act): Identity()
              )
              (drop_path): Identity()
       

In [None]:
from huggingface_hub import login
login()


In [None]:
from datasets import load_dataset
# Login using e.g. `huggingface-cli login` to access this dataset
dataset = load_dataset("ai4bharat/Indic-Bias", "stereotype-judgement", split="train")

train-00000-of-00001.parquet:   0%|          | 0.00/1.37M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/5285 [00:00<?, ? examples/s]

In [None]:
import pandas as pd
import itertools

def generate_prompts_from_identity_columns(df_templates):
    """
    df_templates: DataFrame with columns ['template', 'identity_type', 'identity', ...]
    Returns: DataFrame with filled prompt variants for each identity pair
    """
    all_prompts = []

    # Step 1: Group identities by identity_type
    identity_groups = df_templates.groupby("identity_type")["identity"].unique().to_dict()

    # Step 2: For each template row, generate identity pairs from its identity_type
    for idx, row in df_templates.iterrows():
        template = row["template"]
        identity_type = row["identity_type"]
        stereotype = row.get("stereotype", "")
        category = row.get("category", "")
        template_id = row.get("template_id", "")

        if "<identity_1>" not in template or "<identity_2>" not in template:
            continue  # skip if it's not a pairwise prompt

        identities = identity_groups.get(identity_type, [])
        identity_pairs = list(itertools.permutations(identities, 2))

        for id1, id2 in identity_pairs:
            filled_prompt = template.replace("<identity_1>", id1).replace("<identity_2>", id2)
            all_prompts.append({
                "template_id": template_id,
                "identity_type": identity_type,
                "category": category,
                "stereotype": stereotype,
                "identity_1": id1,
                "identity_2": id2,
                "prompt": filled_prompt
            })

    return pd.DataFrame(all_prompts)
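
To make the pairing step concrete, here is a small sketch of the same idea on toy data. `fill_pairwise`, the toy template, and the identities `A`, `B`, `C` are made up for illustration and do not come from the dataset.

```python
import itertools

def fill_pairwise(template, identities):
    """Fill <identity_1>/<identity_2> with every ordered pair of distinct identities."""
    prompts = []
    for id1, id2 in itertools.permutations(identities, 2):
        prompts.append(
            template.replace("<identity_1>", id1).replace("<identity_2>", id2)
        )
    return prompts

toy_template = "<identity_1> and <identity_2> share a workspace."
toy_prompts = fill_pairwise(toy_template, ["A", "B", "C"])
print(len(toy_prompts))  # 6 ordered pairs: n * (n - 1) with n = 3
print(toy_prompts[0])    # A and B share a workspace.
```

Because `itertools.permutations` yields ordered pairs, each template produces both (A, B) and (B, A), so the number of prompts per template grows as n * (n - 1) in the number of identities of that type.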


In [None]:
import pandas as pd

# Convert HF dataset to pandas dataframe
df = dataset.to_pandas()

# Preview the columns
print(df.columns)

# Generate pairwise prompts
df_pairwise_prompts = generate_prompts_from_identity_columns(df)

# Preview
print(df_pairwise_prompts.head())

# Save to CSV
df_pairwise_prompts.to_csv("identity_pair_filled_prompts.csv", index=False)


Index(['template', 'identity_type', 'identity', 'category', 'stereotype',
       'template_id'],
      dtype='object')
                            template_id identity_type  \
0  740b2d59-5847-4ec1-b095-1fe7d3ec0fa0      religion   
1  740b2d59-5847-4ec1-b095-1fe7d3ec0fa0      religion   
2  740b2d59-5847-4ec1-b095-1fe7d3ec0fa0      religion   
3  740b2d59-5847-4ec1-b095-1fe7d3ec0fa0      religion   
4  740b2d59-5847-4ec1-b095-1fe7d3ec0fa0      religion   

                            category                     stereotype  \
0  political and societal engagement  intolerant of other religions   
1  political and societal engagement  intolerant of other religions   
2  political and societal engagement  intolerant of other religions   
3  political and societal engagement  intolerant of other religions   
4  political and societal engagement  intolerant of other religions   

  identity_1 identity_2                                             prompt  
0      Hindu     Muslim  A neighbo

In [None]:



def do_gemma_3n_inference_return(messages, max_new_tokens = 128):
    # return_dict=True also returns the attention mask, so generate() does not
    # have to infer it (the pad and eos tokens are identical for this tokenizer)
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt"
    ).to("cuda")

    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=1.0,
        top_p=0.95,
        top_k=64
    )

    # outputs[0] holds the prompt tokens followed by the generated tokens
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
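
For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so the decoded string above includes the prompt. If only the model's reply is wanted, one option is to slice off the prompt before decoding. The sketch below shows the slicing on plain Python lists with made-up token ids; `new_tokens_only` is a hypothetical helper, and in the notebook the prompt length would be the last dimension of the tokenized input.

```python
def new_tokens_only(output_ids, prompt_length):
    """Drop the first `prompt_length` ids, which echo the prompt."""
    return output_ids[prompt_length:]

prompt_ids = [2, 105, 2364, 107]         # made-up ids standing in for a tokenized prompt
generated = prompt_ids + [9259, 236888]  # generate() output = prompt + new tokens
print(new_tokens_only(generated, len(prompt_ids)))  # [9259, 236888]
```

The same slice works on a tensor row, since tensors support the same indexing syntax.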


In [None]:
results = []
for _, row in df_pairwise_prompts.head(10).iterrows():
    print(type(row))
    prompt      = row["prompt"]
    identity_1  = row["identity_1"]
    identity_2  = row["identity_2"]
    # original messages + wrap string->[{type,text}]
    messages = [{"role": "user", "content": [{"type": "text", "text": prompt}]}]

    response = do_gemma_3n_inference_return(messages)

    results.append({
        "template_id":   row["template_id"],
        "identity_type": row["identity_type"],
        "category":      row["category"],
        "stereotype":    row["stereotype"],
        "identity_1":    identity_1,
        "identity_2":    identity_2,
        "prompt":        prompt,
        "response":      response
    })

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
<class 'pandas.core.series.Series'>
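
The loop above wraps each prompt string in the chat structure the template expects: a list of turns, where each turn's `content` is a list of typed parts. A minimal sketch of that wrapping as a helper follows; `as_user_message` is a hypothetical name introduced here for illustration.

```python
def as_user_message(prompt):
    """Wrap a plain string as a single user turn with one text content part."""
    return [{"role": "user", "content": [{"type": "text", "text": prompt}]}]

messages = as_user_message("Hello")
print(messages[0]["content"][0]["text"])  # Hello
```

Keeping the wrapping in one place makes it easier to add further content parts later without touching the loop.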


In [None]:
print(type(df_pairwise_prompts["prompt"][0]))


In [None]:
df_results = pd.DataFrame(results)
df_results.to_csv("gemma3n_bias_test_outputs.csv", index=False)
