To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions in our docs [here](https://docs.unsloth.ai/get-started/installing-+-updating).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and [how to save it](#Save).


### News

Unsloth now supports Text-to-Speech (TTS) models. Read our [guide here](https://docs.unsloth.ai/basics/text-to-speech-tts-fine-tuning).

Read our **[Gemma 3N Guide](https://docs.unsloth.ai/basics/gemma-3n-how-to-run-and-fine-tune)** and check out our new **[Dynamic 2.0](https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs)** quants, which outperform other quantization methods!

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Unsloth

`FastModel` supports loading nearly any model now! This includes Vision and Text models!

In [None]:
from unsloth import FastModel
import torch

torch._dynamo.config.cache_size_limit = 32

fourbit_models = [
    # 4bit dynamic quants for superior accuracy and low memory use
    "unsloth/gemma-3n-E4B-it-unsloth-bnb-4bit",
    "unsloth/gemma-3n-E2B-it-unsloth-bnb-4bit",
    # Pretrained models
    "unsloth/gemma-3n-E4B-unsloth-bnb-4bit",
    "unsloth/gemma-3n-E2B-unsloth-bnb-4bit",

    # Other Gemma 3 quants
    "unsloth/gemma-3-1b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-4b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-12b-it-unsloth-bnb-4bit",
    "unsloth/gemma-3-27b-it-unsloth-bnb-4bit",
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastModel.from_pretrained(
    model_name = "unsloth/gemma-3n-E4B-it",
    dtype = None, # None for auto detection
    max_seq_length = 1024, # Choose any for long context!
    load_in_4bit = True,  # 4 bit quantization to reduce memory
    full_finetuning = False, # [NEW!] We have full finetuning now!
    # token = "hf_...", # use one if using gated models
)

Try running `python -m bitsandbytes` then `python -m xformers.info`
We tried running `ldconfig /usr/lib64-nvidia` ourselves, but it didn't work.
You need to run in your terminal `sudo ldconfig /usr/lib64-nvidia` yourself, then import Unsloth.
Also try `sudo ldconfig /usr/local/cuda-xx.x` - find the latest cuda version.
Unsloth will still run for now, but maybe it might crash - let's hope it works!


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


  from .autonotebook import tqdm as notebook_tqdm


🦥 Unsloth Zoo will now patch everything to make training faster!


  GPU_BUFFERS = tuple([torch.empty(2*256*2048, dtype = dtype, device = f"{DEVICE_TYPE}:{i}") for i in range(n_gpus)])


==((====))==  Unsloth 2025.7.5: Fast Gemma3N patching. Transformers: 4.53.2.
   \\   /|    AMD Radeon PRO W7900. Num GPUs = 1. Max memory: 44.984 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0.dev20250718+rocm6.3. CUDA: 11.0. CUDA Toolkit: None. Triton: 3.1.0+cf34004b8a
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Gemma3N does not support SDPA - switching to eager!


Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 3/3 [00:04<00:00,  1.52s/it]


# Gemma 3N can process Text, Vision and Audio!

Let's first experience how Gemma 3N can handle multimodal inputs. We use Gemma 3N's recommended settings of `temperature = 1.0, top_p = 0.95, top_k = 64`
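
To make the three sampling settings concrete, here is a minimal pure-Python sketch of how temperature, top-k and top-p narrow down the next-token distribution. The function name `filter_probs` and the toy logits are assumptions for illustration only; the real generation code works on tensors and may differ in details such as tie handling.

```python
import math

def filter_probs(logits, temperature=1.0, top_k=64, top_p=0.95):
    """Illustrative only: return {token_index: probability} after filtering."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most likely tokens.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_mass = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose cumulative (renormalised)
    # probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_mass
        if cumulative >= top_p:
            break

    # Renormalise over the surviving tokens so they sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

toy_logits = [4.0, 3.0, 2.0, 0.0, -2.0]  # made-up scores for five tokens
print(filter_probs(toy_logits, temperature=1.0, top_k=64, top_p=0.95))
```

With these toy logits the two least likely tokens fall outside the 0.95 probability mass and are dropped; a higher temperature flattens the distribution, so more tokens tend to survive the same cut.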

In [None]:
import gc

# Helper function for inference
def do_gemma_3n_inference(model, tokenizer, messages, max_new_tokens=128):
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to("cuda")

    # No `streamer=…` is passed, so nothing is printed while generating;
    # generate returns the prompt IDs followed by the new token IDs
    out_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        temperature=1.0,
        top_p=0.95,
        top_k=64,
    )

    # slice off the prompt part and decode
    gen_ids = out_ids[0][inputs["input_ids"].shape[-1]:]
    text = tokenizer.decode(gen_ids, skip_special_tokens=True)

    # Drop tensor references first so the cached GPU memory can be released
    del inputs, out_ids, gen_ids
    gc.collect()
    torch.cuda.empty_cache()
    return text
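
The `messages` argument follows the chat-template structure used throughout this notebook: a list of turns, each with a `role` and a list of typed `content` parts. As a small sketch, the two builders below (`text_message` and `image_message` are hypothetical names, not part of Unsloth or Transformers) produce that structure for text-only and image-plus-text prompts.

```python
def text_message(prompt, role="user"):
    """Hypothetical helper: wrap a plain prompt in the chat-message structure."""
    return [{"role": role, "content": [{"type": "text", "text": prompt}]}]

def image_message(image, prompt, role="user"):
    """Hypothetical helper: pair an image (URL or path) with a text prompt."""
    return [{
        "role": role,
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt},
        ],
    }]

messages = text_message("Write a haiku about sloths.")
print(messages)

# With the model loaded, the result could be passed to the helper above:
# do_gemma_3n_inference(model, tokenizer, messages, max_new_tokens=64)
```

Keeping the structure in one place avoids retyping the nested dictionaries for every prompt.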

# Gemma 3N can see images!

<img src="https://files.worldwildlife.org/wwfcmsprod/images/Sloth_Sitting_iStock_3_12_2014/story_full_width/8l7pbjmj29_iStock_000011145477Large_mini__1_.jpg" alt="Alt text" height="256">

In [None]:
sloth_link = "https://files.worldwildlife.org/wwfcmsprod/images/Sloth_Sitting_iStock_3_12_2014/story_full_width/8l7pbjmj29_iStock_000011145477Large_mini__1_.jpg"

messages = [{
    "role" : "user",
    "content": [
        { "type": "image", "image" : sloth_link },
        { "type": "text",  "text" : "Which films does this animal feature in?" }
    ]
}]
# You might have to wait 1 minute for Unsloth's auto compiler
do_gemma_3n_inference(model, tokenizer = tokenizer, messages = messages, max_new_tokens = 256)

'This adorable animal is a **sloth**, and it has featured in several films! Here are some notable ones:\n\n* **Zootopia (2016):** The character Judy Hopps has a playful interaction with a sloth named Bellwether.\n* **The Jungle Book 2 (2003):** A sloth named Costa helps Mowgli and Baloo on their journey.\n* **Madagascar (2005):** Sloths appear in the background and add to the vibrant wildlife of the film.\n* **Kung Fu Panda 3 (2016):** The sloth Shen has a memorable and important role in the movie. \n\nSloths are popular characters in animation and live-action films due to their unique and endearing nature! \n\n\n\n'

In [None]:
import csv, time
from pathlib import Path
from tqdm import tqdm

# ------------------------------------------------------------------ CONFIG
SLEEP       = 0.05   # seconds to pause after each translation

TARGET_LANGS = [
    "Chinese","Hindi","Spanish","Arabic","French","Bengali","Portuguese",
    "Russian","Indonesian","Urdu","German","Japanese","Nigerian Pidgin",
    "Marathi","Vietnamese","Telugu","Hausa","Turkish","Swahili","Tagalog",
    "Tamil","Korean","Thai","Javanese","Italian","Hebrew"
]

SRC  = Path("./plant_state_descriptions.csv")
DEST = Path("./plant_state_descriptions_output.csv")

# ------------------------------------------------------------- HELPERS
def translate(text: str, language: str) -> str:
    prompt = f"Translate this to {language} (only return translated text no explanations): {text}"
    out = do_gemma_3n_inference(
        model,
        tokenizer = tokenizer,
        messages = [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
        max_new_tokens=512
    )
    # The helper already returns a str; strip stray leading/trailing whitespace
    return out.strip()

# -------------------------------------------------------------- MAIN
with SRC.open(newline="", encoding="utf-8") as fin:
    rows = list(csv.reader(fin))[1:]               # skip header
english_rows = [r for r in rows if r[2] == "English"]

total = len(english_rows) * (1 + len(TARGET_LANGS))
with DEST.open("w", newline="", encoding="utf-8") as fout, tqdm(total=total, desc="Rows") as bar:
    w = csv.writer(fout)
    w.writerow(["Plant", "State", "Language", "Text"])

    for plant, state, lang, text in english_rows:
        w.writerow([plant, state, "English", text])
        bar.update(1)

        for tgt in TARGET_LANGS:
            translated = translate(text, tgt)
            w.writerow([plant, state, tgt, translated])
            bar.update(1)
            time.sleep(SLEEP)


Rows: 100%|██████████████████████████████████████████████████████████████████████| 5130/5130 [18:40:28<00:00, 13.11s/it]
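
The progress bar above can be unpacked with a little arithmetic. The sketch below takes only the figures shown in this notebook (26 target languages, 5130 rows, 18:40:28 elapsed) and assumes that writing the English rows takes negligible time, so nearly all of the runtime is spent on translations. The variable names are illustrative.

```python
from datetime import timedelta

NUM_TARGET_LANGS = 26   # number of entries in TARGET_LANGS
total_rows = 5130       # total reported by the progress bar
elapsed = timedelta(hours=18, minutes=40, seconds=28)

# total = english_rows * (1 + number of target languages)
english_rows = total_rows // (1 + NUM_TARGET_LANGS)
translations = english_rows * NUM_TARGET_LANGS

secs_per_row = elapsed.total_seconds() / total_rows
# Assumption: English rows are copied without a model call, so the
# elapsed time is attributed to translated rows only.
secs_per_translation = elapsed.total_seconds() / translations

print(f"English source rows: {english_rows}")
print(f"Translations generated: {translations}")
print(f"Average seconds per output row: {secs_per_row:.1f}")
print(f"Average seconds per translation: {secs_per_translation:.1f}")
```

Under that assumption, each model call costs slightly more than the per-row rate shown by the progress bar, because the English rows dilute the average.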
