To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions in our docs [here](https://docs.unsloth.ai/get-started/installing-+-updating).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and [how to save it](#Save).


### News

**NEW** Unsloth now supports training the new **gpt-oss** model from OpenAI! You can start finetuning gpt-oss for free with our **[Colab notebook](https://x.com/UnslothAI/status/1953896997867729075)**!

Unsloth now supports Text-to-Speech (TTS) models. Read our [guide here](https://docs.unsloth.ai/basics/text-to-speech-tts-fine-tuning).

Read our **[Gemma 3N Guide](https://docs.unsloth.ai/basics/gemma-3n-how-to-run-and-fine-tune)** and check out our new **[Dynamic 2.0](https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs)** quants, which outperform other quantization methods!

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [1]:
%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    import torch; v = re.match(r"[0-9\.]{3,}", str(torch.__version__)).group(0)
    xformers = "xformers==" + ("0.0.32.post2" if v == "2.8.0" else "0.0.29.post3")
    !pip install --no-deps bitsandbytes accelerate {xformers} peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth
!pip install transformers==4.55.4

In [5]:
!pip install sympy==1.12

Collecting sympy==1.12
  Downloading sympy-1.12-py3-none-any.whl.metadata (12 kB)
Downloading sympy-1.12-py3-none-any.whl (5.7 MB)
Installing collected packages: sympy
  Attempting uninstall: sympy
    Found existing installation: sympy 1.13.3
    Uninstalling sympy-1.13.3:
      Successfully uninstalled sympy-1.13.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torch 2.8.0+cu126 requires sympy>=1.13.3, but you have sympy 1.12 which is incompatible.
Successfully installed sympy-1.12


### Unsloth

#### Text Completion / Raw Text Training
This is a community notebook collaboration with [Mithex].

We train on `Tiny Stories` (link [here](https://huggingface.co/datasets/roneneldan/TinyStories)) which is a collection of small stories. For example:
```
Once upon a time, there was a little car named Beep. Beep loved to go fast and play in the sun.
Beep was a healthy car because he always had good fuel....
```
Instead of `Alpaca`'s question-answer format, you only need one column: the `"text"` column. This means you can finetune on any dataset and let your model act as a text completion model, for example for novel writing.
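
As a rough illustration of the single-column format, the sketch below turns a list of raw strings into a `{"text": [...]}` batch and appends an end-of-sequence marker to each one. The `"</s>"` string and the `to_text_batch` helper are placeholders for illustration only; in this notebook the marker comes from `tokenizer.eos_token`.

```python
# Hypothetical stand-in; the real value comes from tokenizer.eos_token.
EOS_TOKEN = "</s>"

def to_text_batch(raw_texts, eos_token=EOS_TOKEN):
    """Build a one-column batch: every example is plain text plus EOS."""
    return {"text": [text + eos_token for text in raw_texts]}

batch = to_text_batch([
    "Once upon a time, there was a little car named Beep.",
    "Beep loved to go fast and play in the sun.",
])
for example in batch["text"]:
    print(example)
```

The function returns a dictionary of lists, one entry per column, which is the same shape a batched mapping function receives and returns.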


In [2]:
# Run this to disable CCE since it is not supported for CPT.
# Keep the comment on its own line: `%env` treats everything after `=` as the value.
%env UNSLOTH_RETURN_LOGITS=1

env: UNSLOTH_RETURN_LOGITS=1
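
The same setting can be applied from plain Python, which may be convenient outside a notebook. This is a sketch that assumes the variable is picked up when `unsloth` is imported, so the assignment is placed before that import. The value is exactly the string `"1"`, with no trailing text.

```python
import os

# Set the flag before importing unsloth (assumed to be where it is picked up).
os.environ["UNSLOTH_RETURN_LOGITS"] = "1"

# repr() makes stray whitespace or trailing characters easy to spot.
print(repr(os.environ["UNSLOTH_RETURN_LOGITS"]))
```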


In [5]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = False # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit",      # New Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit",           # Llama-3 15 trillion tokens model 2x faster!
    "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "unsloth/llama-3-70b-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",        # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",             # Gemma 2.2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-4B-Instruct-2507", # "unsloth/mistral-7b" for 16bit loading
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_8bit = True, # [NEW!] A bit more accurate, uses 2x memory
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

==((====))==  Unsloth 2025.8.10: Fast Qwen3 patching. Transformers: 4.55.4.
   \\   /|    NVIDIA L4. Num GPUs = 1. Max memory: 22.161 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 8.9. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
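
To get a feel for what the `load_in_4bit` and `load_in_8bit` switches trade off, here is a back-of-the-envelope estimate of weight memory at different precisions. It is only a sketch: the 4-billion parameter count is a round-number assumption for a 4B-class model, `weight_memory_gb` is a hypothetical helper, and real usage is higher because of quantization metadata, activations, adapter weights, and optimizer state.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 4e9  # assumed round number for a 4B-class model

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```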


We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

We also add `embed_tokens` and `lm_head` to allow the model to learn out-of-distribution data.

In [6]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 128, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",

                      "embed_tokens", "lm_head",], # Add for continual pretraining
    lora_alpha = 32,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = True,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)



Unsloth: Making `model.base_model.model.model.embed_tokens` require gradients
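
To make the rank and scaling settings more concrete, the sketch below counts the parameters a LoRA adapter adds to one linear layer and compares the standard scaling factor `lora_alpha / r` with the rank-stabilized one, `lora_alpha / sqrt(r)`. The 4096 x 4096 layer shape is a made-up example rather than this model's actual dimensions, and the helper names are hypothetical.

```python
import math

def lora_param_count(d_in, d_out, r):
    """LoRA adds two low-rank factors: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

def lora_scale(lora_alpha, r, use_rslora=False):
    """Multiplier applied to the low-rank update B @ A."""
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r

d_in = d_out = 4096          # hypothetical layer shape
r, lora_alpha = 128, 32      # values used in the cell above

added = lora_param_count(d_in, d_out, r)
full = d_in * d_out
print(f"adapter params: {added:,} ({added / full:.2%} of the full matrix)")
print(f"standard scale: {lora_scale(lora_alpha, r):.3f}")
print(f"rsLoRA scale:   {lora_scale(lora_alpha, r, use_rslora=True):.3f}")
```

With a large rank such as 128, the standard factor shrinks the update considerably, while the rank-stabilized factor decays only with the square root of the rank.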


<a name="Data"></a>
### Data Prep
We now use the Tiny Stories dataset from https://huggingface.co/datasets/roneneldan/TinyStories. We only sample the first 5000 rows to speed up training. We must append `EOS_TOKEN` (that is, `tokenizer.eos_token`) to every example, or else the model's generation will go on forever. Note that the cell below, as run here, instead loads the `java` split of `corniclr25/stack-mined-java-v1` and appends the EOS token to its `content` column.

If you want to use the `llama-3` template for ShareGPT datasets, try our conversational [notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb).

In [7]:
from datasets import load_dataset
dataset = load_dataset("corniclr25/stack-mined-java-v1", split = "java")
EOS_TOKEN = tokenizer.eos_token # Appended to every example so generation learns where to stop
def formatting_prompts_func(examples):
    # With batched = True, examples["content"] is a list of strings
    return { "content" : [example + EOS_TOKEN for example in examples["content"]] }
dataset = dataset.map(formatting_prompts_func, batched = True,)


KeyboardInterrupt: 

Print out 5 stories from `Tiny Stories`

In [None]:
for row in dataset[:5]["text"]:
    print("=========================")
    print(row)

One day, a little girl named Lily found a needle in her room. She knew it was difficult to play with it because it was sharp. Lily wanted to share the needle with her mom, so she could sew a button on her shirt.

Lily went to her mom and said, "Mom, I found this needle. Can you share it with me and sew my shirt?" Her mom smiled and said, "Yes, Lily, we can share the needle and fix your shirt."

Together, they shared the needle and sewed the button on Lily's shirt. It was not difficult for them because they were sharing and helping each other. After they finished, Lily thanked her mom for sharing the needle and fixing her shirt. They both felt happy because they had shared and worked together.
Once upon a time, there was a little car named Beep. Beep loved to go fast and play in the sun. Beep was a healthy car because he always had good fuel. Good fuel made Beep happy and strong.

One day, Beep was driving in the park when he saw a big tree. The tree had many leaves that were falling. B

<a name="Train"></a>
### Continued Pretraining
Now let's use Unsloth's `UnslothTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). The configuration below trains for one full epoch (`num_train_epochs = 1`). For a quick test run, set `max_steps` to a small value such as 20 instead.

Also set `embedding_learning_rate` to a value 2x to 10x smaller than `learning_rate` to make continued pretraining work! Here it is 10x smaller (`5e-6` versus `5e-5`).
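
To make the idea of a separate embedding learning rate concrete, here is a minimal sketch of how parameters could be split into two optimizer groups by name. This is an illustration only, not Unsloth's actual implementation: the helper `build_param_groups` and the parameter names in `names` are hypothetical. The only detail taken from this notebook is that `embed_tokens` and `lm_head` receive the smaller rate, as the training log reports.

```python
def build_param_groups(param_names, learning_rate, embedding_learning_rate,
                       embedding_keys=("embed_tokens", "lm_head")):
    """Split parameter names into two groups with different learning rates.

    Hypothetical helper for illustration; a real optimizer would receive
    the parameter tensors rather than their names.
    """
    embedding, other = [], []
    for name in param_names:
        if any(key in name for key in embedding_keys):
            embedding.append(name)
        else:
            other.append(name)
    return [
        {"params": other, "lr": learning_rate},
        {"params": embedding, "lr": embedding_learning_rate},
    ]

# Hypothetical parameter names, chosen only to exercise both groups
names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.lora_A.weight",
    "model.layers.0.self_attn.q_proj.lora_B.weight",
    "lm_head.weight",
]

groups = build_param_groups(names, learning_rate=5e-5, embedding_learning_rate=5e-6)
for group in groups:
    print(group["lr"], group["params"])
```

A list of dictionaries with `params` and `lr` keys is the shape PyTorch optimizers accept for per-group settings, which is why the sketch returns that structure.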

In [None]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import UnslothTrainer, UnslothTrainingArguments

trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 8,

    args = UnslothTrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,

        warmup_ratio = 0.1,
        num_train_epochs = 1,

        learning_rate = 5e-5,
        embedding_learning_rate = 5e-6,

        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.00,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)


In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.748 GB.
6.367 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 2,500 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 8
\        /    Total batch size = 16 | Total steps = 156
 "-____-"     Number of trainable parameters = 603,979,776


Unsloth: Setting lr = 5.00e-06 instead of 5.00e-05 for embed_tokens.
Unsloth: Setting lr = 5.00e-06 instead of 5.00e-05 for lm_head.


Step,Training Loss
1,1.4365
2,1.4708
3,1.4645
4,1.3102
5,1.416
6,1.368
7,1.4214
8,1.1742
9,1.1735
10,1.2879
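
The training banner reports a total batch size of 16 and 156 total steps. The sketch below reproduces that arithmetic from the settings in this notebook (2,500 examples, batch size 2, gradient accumulation 8, one GPU). The helper `steps_per_epoch` is a hypothetical name, and the rounding shown is an assumption that happens to match the banner; the exact rounding may differ between library versions.

```python
import math

def steps_per_epoch(num_examples, per_device_batch_size,
                    gradient_accumulation_steps, num_devices=1):
    """Estimate optimizer steps in one epoch (hypothetical helper)."""
    # Number of micro-batches the dataloader yields per epoch
    batches = math.ceil(num_examples / (per_device_batch_size * num_devices))
    # One optimizer step happens every `gradient_accumulation_steps` micro-batches
    return max(batches // gradient_accumulation_steps, 1)

per_device_batch_size = 2
gradient_accumulation_steps = 8

effective_batch_size = per_device_batch_size * gradient_accumulation_steps
total_steps = steps_per_epoch(2500, per_device_batch_size, gradient_accumulation_steps)

# With warmup_ratio = 0.1, roughly the first tenth of the steps are warmup
warmup_steps = math.ceil(total_steps * 0.1)

print(effective_batch_size, total_steps, warmup_steps)
```

To shorten a run, capping `max_steps` at a small number stops training well before these 156 steps are reached.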


In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

2780.9282 seconds used for training.
46.35 minutes used for training.
Peak reserved memory = 11.432 GB.
Peak reserved memory for training = 5.065 GB.
Peak reserved memory % of max memory = 77.516 %.
Peak reserved memory for training % of max memory = 34.344 %.
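
As a quick check of the numbers above, the sketch below recomputes the derived values from the three measured ones. It also makes one detail explicit: the cell divides by 1024 three times, so the figures labelled "GB" are binary gibibytes (GiB). The helper `gib_to_gb` is a hypothetical name added here for illustration.

```python
GIB = 1024 ** 3  # bytes per GiB, the divisor used in the memory cells

def gib_to_gb(gib):
    """Convert binary gibibytes to decimal gigabytes (hypothetical helper)."""
    return gib * GIB / 1e9

# Values copied from the printed output above
max_memory = 14.748
start_gpu_memory = 6.367
used_memory = 11.432

used_for_training = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
training_percentage = round(used_for_training / max_memory * 100, 3)

print(used_for_training, used_percentage, training_percentage)
print(round(gib_to_gb(max_memory), 3), "decimal GB total")
```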


<a name="Inference"></a>
### Inference
Let's run the model!

First, we check whether the model follows the style of "Tiny Stories" and writes a story within that distribution, i.e. one that would most likely work as a bedtime story.

We use the prompt "Once upon a time, in a galaxy, far far away," because it is normally associated with Star Wars, so it shows whether the model steers the continuation toward a children's story instead.

In [None]:
import textwrap
from threading import Thread
from transformers import TextIteratorStreamer

text_streamer = TextIteratorStreamer(tokenizer)
max_print_width = 100

# Before running inference, call `FastLanguageModel.for_inference` first
FastLanguageModel.for_inference(model)

inputs = tokenizer(
    ["Once upon a time, in a galaxy, far far away,"],
    return_tensors = "pt",
).to("cuda")

generation_kwargs = dict(
    inputs,
    streamer = text_streamer,
    max_new_tokens = 256,
    use_cache = True,
)
# Generate in a background thread so this thread can print text as it arrives
thread = Thread(target = model.generate, kwargs = generation_kwargs)
thread.start()

# Print the stream, breaking the line once about `max_print_width` characters are out
length = 0
for j, new_text in enumerate(text_streamer):
    if j == 0:
        # The first chunk includes the prompt text, so wrap it as a whole
        wrapped_text = textwrap.wrap(new_text, width = max_print_width)
        length = len(wrapped_text[-1])
        print("\n".join(wrapped_text), end = "")
    else:
        length += len(new_text)
        if length >= max_print_width:
            length = 0
            print()
        print(new_text, end = "")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s>Once upon a time, in a galaxy, far faraway, there was a little girl named Lily. She loved to 
play with her toys and explore the universe. One day, she found a big, shiny rock. She picked it up and 
put it in her pocket.

Lily went to play with her friends, but she forgot about the rock. When she 
came back home, she realized that she had lost the rock. She was very sad and started to cry.

Her mom 
saw her crying and asked her what was wrong. Lily told her about the rock and how she lost it. Her mom 
said, "Don't worry, we can find it again." They went back to the place where Lily found the rock and 
searched for it. After a while, they found the rock and Lily was very happy. She learned that it's 
important to take care of her things and not to lose them. The end.</s>
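
The inference cell above generates in a background thread and prints text from the main thread as chunks arrive. The same producer and consumer pattern can be sketched without a model, using only the standard library. This is a simplified illustration: `produce` and `consume_wrapped` are hypothetical stand-ins for `model.generate` and the printing loop, and the line-breaking rule is a slight variation of the one in the cell.

```python
import queue
import threading

def produce(chunks, q):
    """Stand-in for model.generate: push text chunks, then a sentinel."""
    for chunk in chunks:
        q.put(chunk)
    q.put(None)  # signals the end of generation

def consume_wrapped(q, width):
    """Read chunks until the sentinel, closing a line once it reaches `width`."""
    lines, current = [], ""
    while True:
        chunk = q.get()  # blocks until the producer has something ready
        if chunk is None:
            break
        current += chunk
        if len(current) >= width:
            lines.append(current)
            current = ""
    if current:
        lines.append(current)
    return lines

chunks = ["Once ", "upon ", "a ", "time, ", "there ", "was ", "a ", "robot."]
q = queue.Queue()
worker = threading.Thread(target=produce, args=(chunks, q))
worker.start()
wrapped = consume_wrapped(q, width=12)
worker.join()
print("\n".join(wrapped))
```

The queue is what lets the consumer start printing before the producer has finished, which is the reason for running generation in its own thread.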

And we're done! If you have any questions about Unsloth, need help, find a bug, or want to keep up with the latest LLM news and join projects, feel free to join our [Discord](https://discord.gg/unsloth)!

Some other links:
1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)
2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)
4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!

