To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">GitHub</a> </i> ⭐
</div>

Read our **[TTS Guide](https://docs.unsloth.ai/basics/text-to-speech-tts-fine-tuning)** for instructions and all our notebooks.

To install Unsloth on your own computer, follow the installation instructions in our docs [here](https://docs.unsloth.ai/get-started/installing-+-updating).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)


### News

Read our **[Qwen3 Guide](https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tune)** and check out our new **[Dynamic 2.0](https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs)** quants, which outperform other quantization methods!

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [None]:
%%capture
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
    !pip install --no-deps unsloth

# Must install latest transformers for Sesame!
!pip install git+https://github.com/huggingface/transformers.git

### Unsloth

`FastModel` supports loading nearly any model now! This includes Vision and Text models!

In [None]:
from unsloth import FastModel
from transformers import CsmForConditionalGeneration
import torch

model, processor = FastModel.from_pretrained(
    model_name = "unsloth/csm-1b",
    max_seq_length= 2048, # Choose any for long context!
    dtype = None, # Leave as None for auto-detection
    auto_model = CsmForConditionalGeneration,
    load_in_4bit = False, # Select True for 4bit - reduces memory usage
)

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!
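
To see why so few parameters need updating, here is a minimal sketch of the LoRA arithmetic for a single linear layer. The layer size is hypothetical and is not taken from `csm-1b`; only `r = 16` and `lora_alpha = 32` match the cell below. The helper name `lora_param_count` is made up for this illustration.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer.

    LoRA learns two low-rank factors: A with shape (r, d_in) and
    B with shape (d_out, r).
    """
    return r * (d_in + d_out)


# Hypothetical layer size, chosen only for illustration.
d_in, d_out = 2048, 2048
r, lora_alpha = 16, 32

full = d_in * d_out                        # weights in the frozen base layer
added = lora_param_count(d_in, d_out, r)   # weights LoRA trains instead
scaling = lora_alpha / r                   # multiplier on the low-rank update (standard LoRA, not rsLoRA)

print(f"base layer weights : {full:,}")
print(f"LoRA weights       : {added:,} ({added / full:.2%} of the layer)")
print(f"update scaling     : {scaling}")
```

The fraction across a whole model depends on which modules are targeted and on their shapes, so treat this as an order-of-magnitude guide rather than a prediction for this model.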

In [None]:

model = FastModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

<a name="Data"></a>
### Data Prep  

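We will use `taresh18/AnimeVox`; the cell below loads its first 1,250 training examples. Ensure that your dataset follows the required format: **text, audio** for single-speaker models or **source, text, audio** for multi-speaker models. The preprocessing code reads the text from a `transcription` column and assigns speaker `"0"` to every example when no `source` column exists, so rename the column or adjust the code if your dataset differs. You can modify this section to accommodate your own dataset, but maintaining the correct structure is essential for optimal training.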
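
Checking the record layout before preprocessing can catch problems early. The sketch below is plain Python and does not depend on `datasets`; the helper name `normalize_record` and the sample values are hypothetical. It fills in a default speaker ID when `source` is absent, which is the same idea the preprocessing cell applies to the whole dataset.

```python
def normalize_record(record: dict, default_source: str = "0") -> dict:
    """Return a copy of `record` that has text, audio and source fields.

    Raises ValueError if a required field is missing.
    """
    missing = [key for key in ("text", "audio") if key not in record]
    if missing:
        raise ValueError(f"record is missing required field(s): {missing}")
    normalized = dict(record)
    # Single-speaker data has no speaker column, so assign one speaker ID.
    normalized.setdefault("source", default_source)
    return normalized


# Hypothetical single-speaker record: one second of silence at 24 kHz.
sample = {
    "text": "Hello there.",
    "audio": {"array": [0.0] * 24_000, "sampling_rate": 24_000},
}
fixed = normalize_record(sample)
duration = len(fixed["audio"]["array"]) / fixed["audio"]["sampling_rate"]
print(f"speaker={fixed['source']} duration={duration} s")
```

A multi-speaker record that already carries `source` passes through unchanged, so the same check works for both layouts.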

In [None]:
#@title Dataset Prep functions
from datasets import load_dataset, Audio, Dataset
import os
import torch
import numpy as np
from transformers import AutoProcessor
from sklearn.model_selection import train_test_split
processor = AutoProcessor.from_pretrained("unsloth/csm-1b")

# Load dataset
raw_ds = load_dataset("taresh18/AnimeVox", split="train[:1250]")
if "source" not in raw_ds.column_names:
    new_column = ["0"] * len(raw_ds)
    raw_ds = raw_ds.add_column("source", new_column)

target_sampling_rate = 24000
raw_ds = raw_ds.cast_column("audio", Audio(sampling_rate=target_sampling_rate))

# Add audio length check
def check_audio_length(example):
    return {"audio_length": len(example["audio"]["array"])}

raw_ds = raw_ds.map(check_audio_length)

# Filter out long audio files
max_audio_length = 240000  # Slightly less than 240001 to be safe
filtered_raw_ds = raw_ds.filter(lambda x: x["audio_length"] <= max_audio_length)
print(f"Filtered out {len(raw_ds) - len(filtered_raw_ds)} examples out of {len(raw_ds)} total")

# Keep original preprocessing function as close as possible
def preprocess_example(example):
    conversation = [
        {
            "role": str(example["source"]),
            "content": [
                {"type": "text", "text": example["transcription"]},
                {"type": "audio", "path": example["audio"]["array"]},
            ],
        }
    ]

    try:
        model_inputs = processor.apply_chat_template(
            conversation,
            tokenize=True,
            return_dict=True,
            output_labels=True,
            text_kwargs = {
                "padding": "max_length", # pad to the max_length
                "max_length": 250, # this should be the max length of audio
                "pad_to_multiple_of": 8,
                "padding_side": "right",
            },
            audio_kwargs = {
                "sampling_rate": 24_000,
                "max_length": 240001, # max input_values length of the whole dataset
                "padding": "max_length",
            },
            common_kwargs = {"return_tensors": "pt"},
        )
    except Exception as e:
        print(f"Error processing example with transcription '{example['transcription'][:50]}...': {e}")
        return None

    required_keys = ["input_ids", "attention_mask", "labels", "input_values", "input_values_cutoffs"]
    processed_example = {}
    # print(model_inputs.keys())
    for key in required_keys:
        if key not in model_inputs:
            print(f"Warning: Required key '{key}' not found in processor output for example.")
            return None

        value = model_inputs[key][0]
        processed_example[key] = value


    # Final check (optional but good)
    if not all(isinstance(processed_example[key], torch.Tensor) for key in processed_example):
         print(f"Error: Not all required keys are tensors in final processed example. Keys: {list(processed_example.keys())}")
         return None

    return processed_example

# Process the dataset
processed_ds = filtered_raw_ds.map(
    preprocess_example,
    remove_columns=[col for col in filtered_raw_ds.column_names],
    desc="Preprocessing dataset",
)

# Filter out None values (failed preprocessing)
valid_indices = [i for i, example in enumerate(processed_ds) if example is not None]
processed_ds = processed_ds.select(valid_indices)
print(f"Removed {len(filtered_raw_ds) - len(processed_ds)} examples that failed preprocessing")

# Split into train (80%) and validation (20%)
train_idx, eval_idx = train_test_split(
    range(len(processed_ds)),
    test_size=0.20,
    random_state=42
)

train_ds = processed_ds.select(train_idx)
eval_ds = processed_ds.select(eval_idx)

print(f"Training set size: {len(train_ds)}")
print(f"Validation set size: {len(eval_ds)}")


<a name="Train"></a>
### Train the model
Now let's use Hugging Face `Trainer`! More docs here: [Transformers docs](https://huggingface.co/docs/transformers/main_classes/trainer). The configuration below trains for 14.5 epochs, evaluating and saving a checkpoint every 10 steps. For a quick test run, uncomment `max_steps = 60`, which takes precedence over `num_train_epochs`.
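
To estimate how long a run will be and what the learning rate does over time, the sketch below reproduces the arithmetic in plain Python. It is an approximation of the schedule's shape (linear warmup followed by cosine decay to a floor), not the library's implementation, and the exact step count the `Trainer` reports may differ slightly. The dataset size of 1,000 examples is hypothetical; the other numbers match the training arguments in this section.

```python
import math


def total_optimizer_steps(num_examples: int, batch_size: int, grad_accum: int, epochs: float) -> int:
    """Approximate number of optimizer updates on a single device."""
    steps_per_epoch = math.ceil(num_examples / (batch_size * grad_accum))
    return math.ceil(steps_per_epoch * epochs)


def lr_at_step(step: int, total_steps: int, peak_lr: float, warmup_ratio: float, min_lr_rate: float) -> float:
    """Linear warmup to peak_lr, then cosine decay down to peak_lr * min_lr_rate."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return peak_lr * (min_lr_rate + (1.0 - min_lr_rate) * cosine)


# 1,000 training examples is a made-up figure for illustration.
total = total_optimizer_steps(num_examples=1000, batch_size=16, grad_accum=2, epochs=14.5)
print(f"approximate optimizer steps: {total}")
for step in (0, total // 2, total):
    lr = lr_at_step(step, total, peak_lr=2e-4, warmup_ratio=0.08, min_lr_rate=0.10)
    print(f"step {step:>4}: lr = {lr:.2e}")
```

With `min_lr_rate = 0.10` the rate never drops below a tenth of the peak, which is where the `2e-5` floor comes from.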

In [None]:
from transformers import TrainingArguments, Trainer
from unsloth import is_bfloat16_supported

trainer = Trainer(
    model = model,
    train_dataset = train_ds, # Train on the training split only, so eval_ds stays held out
    eval_dataset = eval_ds,
    args = TrainingArguments(
        per_device_train_batch_size = 16,
        gradient_accumulation_steps = 2,
        warmup_ratio = 0.08,
        num_train_epochs = 14.5,
        # max_steps = 60, # Uncomment for a quick test run
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        # weight_decay = 0.01, # Turn this on if overfitting
        lr_scheduler_type = "cosine_with_min_lr",
        lr_scheduler_kwargs = {"min_lr_rate": 0.10}, # floor = 2e-5
        seed = 3407,
        output_dir = "outputs",
        # report_to = "wandb", # Use this for WandB etc

        # Checkpoint saving and evaluation parameters
        save_strategy = "steps", # Save based on steps, not epochs
        save_steps = 10,         # Save every 10 steps
        eval_strategy = "steps",
        eval_steps = 10,         # Evaluate every 10 steps
        metric_for_best_model = "eval_loss",
        save_total_limit = 10,   # Keep at most 10 checkpoints
        load_best_model_at_end = True,
    ),
)

In [None]:
import wandb

wandb.login()  # This will provide a URL and key to authenticate


In [None]:
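# Resumes from the latest checkpoint in "outputs" and raises an error if none exists.
# Use resume_from_checkpoint = False (or omit it) for a fresh run.
trainer_stats = trainer.train(resume_from_checkpoint = True)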

In [None]:
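# @title Show final memory and time stats
gpu_stats = torch.cuda.get_device_properties(0)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)  # Total GPU memory in GB
# `start_gpu_memory` (reserved GB before training) is never recorded in this notebook.
# Fall back to 0 so the cell still runs; record it before `trainer.train()` for a real baseline.
start_gpu_memory = globals().get("start_gpu_memory", 0.0)
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")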

<a name="Inference"></a>
### Inference
Let's run the model! You can change the prompts
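
When changing the prompts, the `max_new_tokens` budget limits how much audio can be produced. The sketch below converts between seconds and tokens using the ratio quoted in this notebook's generation comments (125 tokens for about 10 seconds). The helper names are hypothetical, and the ratio is taken from that comment rather than measured here.

```python
import math

TOKENS_PER_10_SECONDS = 125  # ratio quoted in this notebook's generation comments


def max_new_tokens_for(seconds: float) -> int:
    """Smallest token budget that covers `seconds` of audio."""
    return math.ceil(seconds * TOKENS_PER_10_SECONDS / 10)


def seconds_for(max_new_tokens: int) -> float:
    """Audio duration that a given token budget can hold."""
    return max_new_tokens * 10 / TOKENS_PER_10_SECONDS


print(max_new_tokens_for(15))   # budget for a 15-second clip
print(seconds_for(275))         # duration covered by 275 tokens
```

Longer sentences need a larger budget; if the generated audio stops mid-sentence, raising `max_new_tokens` is the first thing to try.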

In [None]:
from IPython.display import Audio, display
import soundfile as sf
import torch

# ── 1.  Your five evaluation sentences ───────────────────────────────────
test_lines = [
    "Please remind me to send the invoice before noon, then check if the client confirmed the lunch reservation at the corner cafe for tomorrow afternoon.",
    "I forgot my umbrella on the train again this morning, which means either I get soaked later or the weather app is finally wrong for once.",
    "The coffee machine keeps flashing some strange error code, yet somehow still manages to make a better espresso than any cafe near the office.",
    "If the meeting actually starts on time and doesn't spiral into a group therapy session, I’ll order pizza for everyone and pretend I believe in structure.",
    "Traffic was unusually light this morning, so for a solid five minutes I genuinely believed I had forgotten a national holiday or time itself was broken."
]

speaker_id = 0              # pick your speaker token if you have multiples
sample_rate = 24_000        # CSM default

# ── 2.  Generate, save, and play each line ───────────────────────────────
for idx, text in enumerate(test_lines, start=1):
    prompt = f"[{speaker_id}]{text}"
    inputs  = processor(prompt, add_special_tokens=True).to("cuda")

    audio_values = model.generate(
        **inputs,
        max_new_tokens = 275,        # ~22 s of audio (125 tokens is about 10 s)
        output_audio   = True,
    )

    audio = audio_values[0].to(torch.float32).cpu().numpy()

    fname = f"test_line_{idx:02d}.wav"
    sf.write(fname, audio, sample_rate)
    print(f"Saved → {fname}")

    display(Audio(audio, rate=sample_rate))


In [None]:
text = "We just finished fine-tuning a text to speech model, and it sounds great!"

speaker_id = 1
# Another equivalent way to prepare the inputs
conversation = [
    {"role": str(speaker_id), "content": [{"type": "text", "text": text}]},
]
audio_values = model.generate(
    **processor.apply_chat_template(
        conversation,
        tokenize=True,
        return_dict=True,
    ).to("cuda"),
    max_new_tokens=125, # 125 tokens is 10 seconds of audio, for longer text increase this
    output_audio=True
)
audio = audio_values[0].to(torch.float32).cpu().numpy()
sf.write("example_without_context.wav", audio, 24000)
display(Audio(audio, rate=24000))

<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Hugging Face's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save merged 16-bit or 4-bit weights, scroll down!

In [None]:
model.save_pretrained("lora_model")  # Local saving
processor.save_pretrained("lora_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# processor.push_to_hub("your_name/lora_model", token = "...") # Online saving

### Saving to float16

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [None]:
# `processor` takes the place of a tokenizer for this model.
# Each merge writes to its own directory so one save does not overwrite the other.

# Merge to 16bit
if True: model.save_pretrained_merged("model_16bit", processor, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("hf/model", processor, save_method = "merged_16bit", token = "")

# Merge to 4bit
if True: model.save_pretrained_merged("model_4bit", processor, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", processor, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False: model.save_pretrained_merged("model", processor, save_method = "lora",)
if False: model.push_to_hub_merged("hf/model", processor, save_method = "lora", token = "")

And we're done! If you have any questions about Unsloth, find a bug, want to keep up with the latest LLM news, or want to join projects, feel free to join our [Discord](https://discord.gg/unsloth)!

Some other links:
1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)
2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)
4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!
