To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions in our documentation [here](https://docs.unsloth.ai/get-started/installing-+-updating).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), and [how to save it](#Save).


### News

**NEW!** Unsloth now supports OpenAI's **gpt-oss**! We fixed issues in the model. **[Read our Guide](https://docs.unsloth.ai/basics/gpt-oss)** for more info!

Unsloth now supports Text-to-Speech (TTS) models. Read our [guide here](https://docs.unsloth.ai/basics/text-to-speech-tts-fine-tuning).

Read our **[Gemma 3N Guide](https://docs.unsloth.ai/basics/gemma-3n-how-to-run-and-fine-tune)** and check out our new **[Dynamic 2.0](https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs)** quants, which outperform other quantization methods!

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

In [None]:
%%capture
# We're installing the latest Torch, Triton, OpenAI's Triton kernels, Transformers and Unsloth!
!pip install --upgrade -qqq uv
try: import numpy; install_numpy = f"numpy=={numpy.__version__}"
except: install_numpy = "numpy"
!uv pip install -qqq \
    "torch>=2.8.0" "triton>=3.4.0" {install_numpy} \
    "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo" \
    "unsloth[base] @ git+https://github.com/unslothai/unsloth" \
    torchvision bitsandbytes \
    git+https://github.com/huggingface/transformers \
    git+https://github.com/triton-lang/triton.git@05b2c186c1b6c9a08375389d5efe9cb4c401c075#subdirectory=python/triton_kernels

### Unsloth

We're about to demonstrate the power of the new OpenAI GPT-OSS 20B model through a finetuning example. To use our `MXFP4` inference example, use this [notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/GPT_OSS_MXFP4_(20B)-Inference.ipynb) instead.

In [None]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 1024
dtype = None

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/gpt-oss-20b-unsloth-bnb-4bit", # 20B model using bitsandbytes 4bit quantization
    "unsloth/gpt-oss-120b-unsloth-bnb-4bit",
    "unsloth/gpt-oss-20b", # 20B model using MXFP4 format
    "unsloth/gpt-oss-120b",
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gpt-oss-20b",
    dtype = dtype, # None for auto detection
    max_seq_length = max_seq_length, # Choose any for long context!
    load_in_4bit = True,  # 4 bit quantization to reduce memory
    full_finetuning = False, # [NEW!] We have full finetuning now!
    # token = "hf_...", # use one if using gated models
)

ImportError: cannot import name 'bmm_cublas' from 'bitsandbytes.autograd._functions' (/usr/local/lib/python3.11/dist-packages/bitsandbytes/autograd/_functions.py)

We now add LoRA adapters for parameter-efficient finetuning - this lets us train only a small fraction (under 1%) of all parameters.
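To see why LoRA trains so few parameters, here is a minimal sketch of the per-layer arithmetic. LoRA leaves a weight matrix of shape `d_out x d_in` frozen and learns two small matrices of rank `r` instead. The layer size used here is hypothetical and is not taken from gpt-oss.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters added by one rank-r LoRA adapter on a d_out x d_in weight."""
    # LoRA learns A (r x d_in) and B (d_out x r); the original weight stays frozen.
    return r * (d_in + d_out)


def full_param_count(d_in, d_out):
    """Parameters in the frozen dense weight itself."""
    return d_in * d_out


# Hypothetical layer size for illustration only (not gpt-oss's real dimensions).
d_in, d_out, r = 4096, 4096, 8
lora = lora_param_count(d_in, d_out, r)
full = full_param_count(d_in, d_out)
print(f"LoRA: {lora:,} trainable vs {full:,} frozen ({100 * lora / full:.2f}%)")
```

Doubling `r` doubles the adapter size, which is the main trade-off behind the suggested ranks of 8, 16, 32, 64, and 128.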

In [None]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 8, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Unsloth: Making `model.base_model.model.model` require gradients


### Reasoning Effort
The `gpt-oss` models from OpenAI include a feature that allows users to adjust the model's "reasoning effort." This gives you control over the trade-off between the model's performance and its response speed (latency), which is governed by the number of tokens the model uses to think.

----

The `gpt-oss` models offer three distinct levels of reasoning effort you can choose from:

* **Low**: Optimized for tasks that need very fast responses and don't require complex, multi-step reasoning.
* **Medium**: A balance between performance and speed.
* **High**: Provides the strongest reasoning performance for tasks that require it, though this results in higher latency.

In [None]:
from transformers import TextStreamer

messages = [
    {"role": "user", "content": "Solve x^5 + 3x^4 - 10 = 3."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
    reasoning_effort = "low", # **NEW!** Set reasoning effort to low, medium or high
).to(model.device)

_ = model.generate(**inputs, max_new_tokens = 64, streamer = TextStreamer(tokenizer))

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-16

Reasoning: low

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>Equation: x^5 + 3x^4 - 10 = 3. So x^5 + 3x^4 - 13 =0. Solve for x real? maybe find integer roots. try x=1:1+3-13=-9. x=2:


Changing `reasoning_effort` to `medium` makes the model think longer. You will need to increase `max_new_tokens` to accommodate the extra generated tokens, but the answer is typically better and more accurate.

In [None]:
from transformers import TextStreamer

messages = [
    {"role": "user", "content": "Solve x^5 + 3x^4 - 10 = 3."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
    reasoning_effort = "medium", # **NEW!** Set reasoning effort to low, medium or high
).to(model.device)

_ = model.generate(**inputs, max_new_tokens = 64, streamer = TextStreamer(tokenizer))

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-16

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>We need to solve equation: x^5 + 3x^4 - 10 = 3? Wait the equation given: "x^5 + 3x^4 - 10 = 3." But maybe the equation is given: x^5 + 3x^4 -


Lastly, we test it with `reasoning_effort` set to `high`.

In [None]:
from transformers import TextStreamer

messages = [
    {"role": "user", "content": "Solve x^5 + 3x^4 - 10 = 3."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
    reasoning_effort = "high", # **NEW!** Set reasoning effort to low, medium or high
).to(model.device)

_ = model.generate(**inputs, max_new_tokens = 64, streamer = TextStreamer(tokenizer))

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-16

Reasoning: high

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>We need to solve the equation \(x^5 + 3x^4 - 10 = 3\). Actually it's \(x^5 + 3x^4 - 10 = 3\), which simplifies to \(x^5 + 3x^4 - 13 =


<a name="Data"></a>
### Data Prep

The `HuggingFaceH4/Multilingual-Thinking` dataset will be utilized as our example. This dataset, available on Hugging Face, contains reasoning chain-of-thought examples derived from user questions that have been translated from English into four other languages. It is also the same dataset referenced in OpenAI's [cookbook](https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers) for fine-tuning. The purpose of using this dataset is to enable the model to learn and develop reasoning capabilities in these four distinct languages.

In [None]:
def formatting_prompts_func(examples):
    convos = examples["messages"]
    texts = [tokenizer.apply_chat_template(convo, tokenize = False, add_generation_prompt = False) for convo in convos]
    return { "text" : texts, }
pass

from datasets import load_dataset

dataset = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")
dataset

README.md: 0.00B [00:00, ?B/s]

data/train-00000-of-00001.parquet:   0%|          | 0.00/5.29M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1000 [00:00<?, ? examples/s]

Dataset({
    features: ['reasoning_language', 'developer', 'user', 'analysis', 'final', 'messages'],
    num_rows: 1000
})

To format our dataset, we will apply our version of the GPT OSS prompt

In [None]:
from unsloth.chat_templates import standardize_sharegpt
dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched = True,)

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

Let's take a look at the dataset, and check what the 1st example shows

In [None]:
print(dataset[0]['text'])

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-16

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions

reasoning language: French

You are an AI chatbot with a lively and energetic personality.<|end|><|start|>user<|message|>Can you show me the latest trends on Twitter right now?<|end|><|start|>assistant<|channel|>analysis<|message|>D'accord, l'utilisateur demande les tendances Twitter les plus récentes. Tout d'abord, je dois vérifier si j'ai accès à des données en temps réel. Étant donné que je ne peux pas naviguer sur Internet ou accéder directement à l'API de Twitter, je ne peux pas fournir des tendances en direct. Cependant, je peux donner quelques conseils généraux sur la façon de les trouver.

Je devrais préciser que les 

What is unique about GPT-OSS is that it uses the OpenAI [Harmony](https://github.com/openai/harmony) format, which supports conversation structures, reasoning output, and tool calling.
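To make the layout of the printed example easier to read, here is a simplified sketch of how messages map onto the markers visible in the output above (`<|start|>`, `<|channel|>`, `<|message|>`, `<|end|>`). The function `render_harmony` is hypothetical and written only for illustration. It ignores the system preamble, tool calls, and end-of-turn variants, and it assumes the generation prompt is `<|start|>assistant`. The tokenizer's own chat template remains the source of truth.

```python
def render_harmony(messages, add_generation_prompt=False):
    """Render {role, content[, channel]} dicts in a simplified Harmony-like layout."""
    parts = []
    for msg in messages:
        header = f"<|start|>{msg['role']}"
        # Assistant messages carry a channel such as "analysis" or "final".
        if "channel" in msg:
            header += f"<|channel|>{msg['channel']}"
        parts.append(f"{header}<|message|>{msg['content']}<|end|>")
    if add_generation_prompt:
        # Assumption: the model continues from here by choosing a channel.
        parts.append("<|start|>assistant")
    return "".join(parts)


convo = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "channel": "analysis", "content": "Greet back."},
]
print(render_harmony(convo))
```

In practice, keep using `tokenizer.apply_chat_template`, as the cells in this notebook do. The sketch only shows why the reasoning text and the final answer can be separated by channel.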

<a name="Train"></a>
### Train the model
Now let's train our model. We do 30 steps to speed things up, but for a full run you can set `num_train_epochs=1` and turn off the step limit with `max_steps=None`.
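As a rough guide to what the short run covers, the sketch below works through the batch arithmetic using the values from the training cell (`per_device_train_batch_size = 1`, `gradient_accumulation_steps = 4`, `max_steps = 30`) and the 1,000-row dataset. It assumes a single GPU.

```python
import math

num_examples = 1000               # rows in the training split
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_gpus = 1                      # assumption: a single-GPU Colab instance
max_steps = 30

# Each optimizer step consumes this many examples.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
examples_seen = effective_batch * max_steps
steps_per_epoch = math.ceil(num_examples / effective_batch)

print(f"Effective batch size: {effective_batch}")
print(f"Examples seen in {max_steps} steps: {examples_seen} of {num_examples}")
print(f"Steps needed for one full epoch: {steps_per_epoch}")
```

So the 30-step demo sees only a small part of the dataset, and a full epoch takes several times as many steps.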

In [None]:
from trl import SFTConfig, SFTTrainer
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps = 30,
        learning_rate = 2e-4,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

Unsloth: Switching to float32 training since model cannot work with float16


Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/1000 [00:00<?, ? examples/s]

In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.741 GB.
12.811 GB of memory reserved.


In [None]:
trainer_stats = trainer.train()

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': 199998, 'pad_token_id': 200017}.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,000 | Num Epochs = 1 | Total steps = 30
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4
 "-____-"     Trainable parameters = 3,981,312 of 20,918,738,496 (0.02% trained)
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,2.1369
2,2.905
3,2.4247
4,2.1662
5,1.9783
6,2.124
7,1.8144
8,1.6906
9,1.9499
10,1.7755


In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

664.5486 seconds used for training.
11.08 minutes used for training.
Peak reserved memory = 12.859 GB.
Peak reserved memory for training = 0.048 GB.
Peak reserved memory % of max memory = 87.233 %.
Peak reserved memory for training % of max memory = 0.326 %.


<a name="Inference"></a>
### Inference
Let's run the model! You can change the system and user messages below; the assistant's reply is generated by the model.

In [None]:
messages = [
    {"role": "system", "content": "reasoning language: French\n\nYou are a helpful assistant that can solve mathematical problems."},
    {"role": "user", "content": "Solve x^5 + 3x^4 - 10 = 3."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
    reasoning_effort = "medium",
).to(model.device)
from transformers import TextStreamer
_ = model.generate(**inputs, max_new_tokens = 64, streamer = TextStreamer(tokenizer))

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-16

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions

reasoning language: French

You are a helpful assistant that can solve mathematical problems.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>We need to solve the equation: \(x^5 + 3x^4 - 10 = 3\). Let's rewrite: \(x^5 + 3x^4 - 13 = 0\). We want roots of this quintic equation. It might factor or have rational


<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** Currently, finetunes can only be loaded via Unsloth - we're working on vLLM and GGUF exporting!

In [None]:
model.save_pretrained("finetuned_model")
# model.push_to_hub("hf_username/finetuned_model", token = "hf_...") # Save to HF

To run the finetuned model, you can do the below after setting `if False` to `if True` in a new instance.

In [None]:
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "finetuned_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = 1024,
        dtype = None,
        load_in_4bit = True,
    )

messages = [
    {"role": "system", "content": "reasoning language: French\n\nYou are a helpful assistant that can solve mathematical problems."},
    {"role": "user", "content": "Solve x^5 + 3x^4 - 10 = 3."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,
    return_tensors = "pt",
    return_dict = True,
    reasoning_effort = "high",
).to(model.device)
from transformers import TextStreamer
_ = model.generate(**inputs, max_new_tokens = 64, streamer = TextStreamer(tokenizer))

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-13

Reasoning: high

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions

reasoning language: French

You are a helpful assistant that can solve mathematical problems.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>We need to solve the equation for x. The equation: x^5 + 3x^4 - 10 = 3. So bring 3 to left side: x^5 + 3x^4 -10 -3 = 0 → x^5 + 3x^


And we're done! If you have any questions about Unsloth, find a bug, need help, or want to keep up with the latest LLM news and join projects, feel free to join our [Discord](https://discord.gg/unsloth)!

Some other links:
1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)
2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)
4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!



# **Multitask Classification**

In [None]:
%%capture
# We're installing the latest Torch, Triton, OpenAI's Triton kernels, Transformers and Unsloth!
!pip install --upgrade -qqq uv
try: import numpy; install_numpy = f"numpy=={numpy.__version__}"
except: install_numpy = "numpy"
!uv pip install -qqq \
    "torch>=2.8.0" "triton>=3.4.0" {install_numpy} \
    "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo" \
    "unsloth[base] @ git+https://github.com/unslothai/unsloth" \
    torchvision bitsandbytes \
    git+https://github.com/huggingface/transformers \
    git+https://github.com/triton-lang/triton.git@05b2c186c1b6c9a08375389d5efe9cb4c401c075#subdirectory=python/triton_kernels

In [None]:
%%capture
# Reinstalling sympy, torch, and bitsandbytes to resolve potential dependency conflicts
!uv pip install -qqq sympy==1.12 torch==2.3.0 bitsandbytes==0.43.1

In [None]:
import torch
from unsloth import FastLanguageModel
from transformers import TextStreamer
from trl import SFTConfig, SFTTrainer
from datasets import Dataset
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import numpy as np
from transformers.trainer_callback import EarlyStoppingCallback
from multiprocessing import Pool
import re

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [None]:
# import torch
# import numpy as np
# from unsloth import FastLanguageModel
# from datasets import load_dataset
# from transformers import TrainingArguments, Trainer, TrainerCallback
# from sklearn.metrics import precision_recall_fscore_support, accuracy_score
# import torch.nn as nn
# import os
# import json
# from transformers.trainer_utils import EvalPrediction


# max_seq_length = 512
# dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

# # 4bit pre quantized models we support for 4x faster downloading + no OOMs.
# fourbit_models = [
#     #"unsloth/gpt-oss-20b-unsloth-bnb-4bit", # 20B model using bitsandbytes 4bit quantization
#     #"unsloth/gpt-oss-120b-unsloth-bnb-4bit",
#     "unsloth/gpt-oss-20b", # 20B model using MXFP4 format
#     #"unsloth/gpt-oss-120b",
# ] # More models at https://huggingface.co/unsloth

# model, tokenizer = FastLanguageModel.from_pretrained(
#     model_name = "unsloth/gpt-oss-20b",
#     dtype = dtype, # None for auto detection
#     max_seq_length = max_seq_length, # Choose any for long context!
#     load_in_4bit = True,  # 4 bit quantization to reduce memory
#     full_finetuning = False, # [NEW!] We have full finetuning now!
#     # token = "hf_...", # use one if using gated models
# )

# Load model and tokenizer
max_seq_length = 512
dtype = None
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",
    dtype=dtype,
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

==((====))==  Unsloth 2025.8.7: Fast Gpt_Oss patching. Transformers: 4.56.0.dev0.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for gpt_oss won't work! Using float32.
Unsloth: Gpt_Oss does not support SDPA - switching to eager!


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

# **LoRA**

In [None]:
# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # Increased rank for better capacity
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=32,  # Increased alpha for stronger adaptation
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Unsloth: Making `model.base_model.model.model` require gradients


# **Data Cleaning**

In [None]:
import string
arabic_punctuations = '''`÷« »×؛<>٩٨'٧٦٥٤٣٢١٠_()↗*•&^%][ـ،/:"؟.,'{}⋮≈~¦+|٪!”…“–ـ/[]%=#*+\•~@£·_{}©^®`→°€™›♥←×§″′Â█à…“★”–●â►−¢¬░¶↑±▾     ═¦║―¥▓—‹─▒：⊕▼▪†■’▀¨▄♫☆é¯♦¤▲è¸Ã⋅‘∞∙）↓、│（»，♪╩╚³・╦╣╔╗▬❤ïØ¹≤‡₹´'''
english_punctuations = string.punctuation
punctuations_list = arabic_punctuations + english_punctuations

arabic_diacritics = re.compile("""
                             ّ    | # Tashdid
                             َ    | # Fatha
                             ً    | # Tanwin Fath
                             ُ    | # Damma
                             ٌ    | # Tanwin Damm
                             ِ    | # Kasra
                             ٍ    | # Tanwin Kasr
                             ْ    | # Sukun
                             ـ     # Tatwil/Kashida
                         """, re.VERBOSE)

In [None]:
def remove_diacritics(df):
    df['text'] = df['text'].apply(lambda x: _remove_diacritics(x))
    return df
def _remove_diacritics(x):
    x = str(x)
    x = re.sub(arabic_diacritics, '', x)
    return x

def normalize_arabic(df):
    df['text'] = df['text'].apply(lambda x: _normalize_arabic(x))
    return df
def _normalize_arabic(x):
    x = str(x)
    # Map letter variants (alef forms, alef maqsura, hamza carriers, ta marbuta, gaf) to one canonical form
    x = re.sub("[إأآا]", "ا", x)
    x = re.sub("ى", "ي", x)
    x = re.sub("ؤ", "ء", x)
    x = re.sub("ئ", "ء", x)
    x = re.sub("ة", "ه", x)
    x = re.sub("گ", "ك", x)
    return x

def remove_punctuations(df):
    df['text'] = df['text'].apply(lambda x: _remove_punctuations(x))
    return df
def _remove_punctuations(x):
    x = str(x)
    #translator = str.maketrans(' ', ' ', punctuations_list)
    translator = str.maketrans(punctuations_list, ' '*len(punctuations_list))
    return x.translate(translator)

def remove_repeating_char(df):
    df['text'] = df['text'].apply(lambda x: _remove_repeating_char(x))
    return df
def _remove_repeating_char(x):
    x = str(x)
    return re.sub(r'(.)\1+', r'\1', x)

def remove_english_word_and_numbers(df):
    df['text'] = df['text'].apply(lambda x: _remove_english_word_and_numbers(x))
    return df
def _remove_english_word_and_numbers(x):
    x = str(x)
    return re.sub(r'[a-zA-Z0-9]+', '', x)

def clean_space(df):
    compiled_re = re.compile(r"\s+")
    df['text'] = df["text"].apply(lambda x: _clean_space(x, compiled_re))
    return df
def _clean_space(x, compiled_re):
    return compiled_re.sub(" ", x)
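To illustrate what two of these helpers do at the string level, here is a small standalone sketch. It re-creates the diacritic and repeated-character patterns using Unicode escapes rather than importing the functions above, and the names `strip_diacritics` and `collapse_repeats` are hypothetical.

```python
import re

# U+064B..U+0652 covers the tanwin marks, fatha, damma, kasra, shadda and sukun;
# U+0640 is the tatweel (kashida) elongation character.
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
REPEATS = re.compile(r"(.)\1+")


def strip_diacritics(text):
    """Remove Arabic short-vowel marks and tatweel."""
    return DIACRITICS.sub("", text)


def collapse_repeats(text):
    """Collapse any run of the same character down to a single character."""
    return REPEATS.sub(r"\1", text)


vocalized = "\u0645\u064E\u0631\u0652\u062D\u064E\u0628\u064B\u0627"  # "marhaban" with vowel marks
print(strip_diacritics(vocalized))
print(collapse_repeats("cooool"))
print(collapse_repeats("good"))  # note: legitimate double letters are collapsed too
```

One thing to be aware of: the repeated-character rule also shortens words that genuinely contain doubled letters, so it trades some fidelity for robustness to elongated social media spelling.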

In [None]:
def clean(df):
    df = remove_diacritics(df)
    df = normalize_arabic(df)
    df = remove_punctuations(df)
    df = remove_repeating_char(df)
    df = remove_english_word_and_numbers(df)
    df = clean_space(df)
    return df


In [None]:
num_cores = 2
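def df_parallelize_run(df, func, num_cores=2):
    # Split the frame into row chunks, apply func to each chunk in a worker
    # process, then concatenate the results back into one DataFrame.
    df_split = np.array_split(df, num_cores)
    pool = Pool(num_cores)
    df = pd.concat(pool.map(func, df_split))
    pool.close()
    pool.join()
    return df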


# **Data Preprocessing**

In [None]:
# Load and preprocess dataset
def load_and_preprocess_data():
    # Try different encodings to handle UnicodeDecodeError
    encodings = ['windows-1256', 'utf-8-sig', 'latin1']
    data = None
    for encoding in encodings:
        try:
            data = pd.read_csv("Task2_train.csv", encoding=encoding)
            print(f"Successfully read CSV with encoding: {encoding}")
            break
        except UnicodeDecodeError:
            print(f"Failed to read CSV with encoding: {encoding}")
            continue
    if data is None:
        raise ValueError("Could not read CSV with any encoding. Please check the file.")

    # Clean data: drop rows with missing text, clean the text, then fill missing labels with defaults
    data = data.dropna(subset=["text"])
    data = df_parallelize_run(data, clean)
    data['Emotion'] = data['Emotion'].fillna('neutral')
    data["Offensive"] = data["Offensive"].fillna('no').map({"yes": 1, "no": 0})
    data["Hate"] = data["Hate"].fillna("not_hate").map({"hate": 1, "not_hate": 0})
    # Map each emotion label to an integer id
    emotions = data["Emotion"].unique()
    emotion_map = {e: i for i, e in enumerate(emotions)}
    data["Emotion_id"] = data["Emotion"].map(emotion_map)

    # # Oversample minority classes for Emotion and Offensive
    # def oversample_minority(data):
    #     max_size = data["Emotion"].value_counts().max()
    #     offensive_yes = data[data["Offensive"] == 1]
    #     offensive_no = data[data["Offensive"] == 0]
    #     if len(offensive_yes) < len(offensive_no):
    #         offensive_yes = offensive_yes.sample(len(offensive_no), replace=True, random_state=3407)
    #     data_balanced = pd.concat([offensive_yes, offensive_no])
    #     data_final = []
    #     for emotion in emotions:
    #         emo_data = data_balanced[data_balanced["Emotion"] == emotion]
    #         data_final.append(emo_data.sample(max_size, replace=True, random_state=3407))
    #     return pd.concat(data_final).sample(frac=1, random_state=3407)

    # data = oversample_minority(data)

    # Calculate class weights
    total_samples = len(data)
    emotion_counts = data["Emotion"].value_counts()
    num_emotion_classes = len(emotion_counts)
    emotion_weights = {emo: total_samples / (num_emotion_classes * count) for emo, count in emotion_counts.items()}
    offensive_counts = data["Offensive"].value_counts()
    offensive_weights = {off: total_samples / (2 * count) for off, count in offensive_counts.items()}
    hate_counts = data["Hate"].value_counts()
    hate_weights = {hate: total_samples / (2 * count) for hate, count in hate_counts.items()}

    # Scale all weights so the largest weight across the three tasks equals 1
    max_weight = max(max(emotion_weights.values()), max(offensive_weights.values()), max(hate_weights.values()))
    emotion_weights = {k: v / max_weight for k, v in emotion_weights.items()}
    offensive_weights = {k: v / max_weight for k, v in offensive_weights.items()}
    hate_weights = {k: v / max_weight for k, v in hate_weights.items()}

    print(f"Emotion weights: {emotion_weights}")
    print(f"Offensive weights: {offensive_weights}")
    print(f"Hate weights: {hate_weights}")

    # Format prompts with few-shot examples
    few_shot_examples = [
        {
            "text": "أحد التجار الشباب العمانيين يقول للاسف لما يكون عندهم كاش يروحوا هايبرماركت ولما يريدوا صبر يتسوقوا من عندي!!",
            "labels": "Emotion: neutral, Offensive: 0, Hate: 0"
        },
        {
            "text": "@JALHARBISKY مجموعه القدرة الجنسيه👍 بديل الفياجرا والسنافي💞",
            "labels": "Emotion: optimism, Offensive: 0, Hate: 0"
        },
        {
            "text": "سيسي خاين..سيسي قاتل #هتافات_ثورية",
            "labels": "Emotion: anger, Offensive: 1, Hate: 0"
        },
    ]

    def format_prompt(row):
        examples_str = "\n".join([f"Example {i+1}: Text: {ex['text']}\nLabels: {ex['labels']}" for i, ex in enumerate(few_shot_examples)])
        return (
            f"<|START|>System: You are a classifier for Arabic social media text. Analyze the following text and predict the Emotion, Offensive, and Hate labels based on the examples below.\n"
            f"{examples_str}\n\n"
            f"Text: {row['text']}\n"
            f"Output format: Emotion: <emotion>, Offensive: <0 or 1>, Hate: <0 or 1><|END|>\n"
            f"Ground truth: Emotion: {row['Emotion']}, Offensive: {row['Offensive']}, Hate: {row['Hate']}<|END|>"
        )

    data["formatted_text"] = data.apply(format_prompt, axis=1)
    return data, emotions, emotion_map, emotion_weights, offensive_weights, hate_weights

In [None]:
data, emotions, emotion_map, emotion_weights, offensive_weights, hate_weights = load_and_preprocess_data()

# Split dataset into train (80%), validation (10%), and test (10%)
train_data, temp_data = train_test_split(data, test_size=0.2, random_state=3407)  # 80% train, 20% temp
val_data, test_data = train_test_split(temp_data, test_size=0.5, random_state=3407)  # Split temp into 10% val, 10% test
train_dataset = Dataset.from_pandas(train_data[["formatted_text"]].rename(columns={"formatted_text": "text"}))
val_dataset = Dataset.from_pandas(val_data[["formatted_text"]].rename(columns={"formatted_text": "text"}))
test_dataset = Dataset.from_pandas(test_data[["formatted_text"]].rename(columns={"formatted_text": "text"}))

Successfully read CSV with encoding: windows-1256




Emotion weights: {'anger': 0.03255963894261766, 'disgust': 0.064993564993565, 'neutral': 0.07639939485627836, 'love': 0.08516020236087689, 'joy': 0.0947467166979362, 'anticipation': 0.10285132382892055, 'optimism': 0.12052505966587111, 'sadness': 0.1507462686567164, 'confidence': 0.24047619047619048, 'pessimism': 0.2603092783505154, 'surprise': 0.3531468531468531, 'fear': 0.9528301886792452}
Offensive weights: {0: 0.07186907020872864, 1: 0.17373853211009171}
Hate weights: {0: 0.053561958635319075, 1: 1.0}
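
The 80/10/10 split above can be checked with plain arithmetic. The sketch below assumes scikit-learn's sizing rule for a float `test_size` (the held-out part is rounded up and the remainder goes to the other part) and a total of 5,960 rows. That total is not stated in the notebook; it is inferred from the 4,768 training and 596 validation examples reported in the tokenization log further down. `split_sizes` is a hypothetical helper, not part of any library.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_first, n_second) for a float test_size, rounding the second part up."""
    n_second = math.ceil(test_size * n_samples)
    return n_samples - n_second, n_second

n_total = 5960  # assumption: inferred from the logged 4,768 train / 596 val sizes

# Stage 1: 80% train, 20% temporary hold-out
n_train, n_temp = split_sizes(n_total, 0.2)
# Stage 2: split the hold-out in half into validation and test
n_val, n_test = split_sizes(n_temp, 0.5)

print(n_train, n_val, n_test)
```

If the printed sizes differ from the tokenization log, the row count assumed here is wrong rather than the split itself.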


In [None]:
# Standardize dataset format
from unsloth.chat_templates import standardize_sharegpt
train_dataset = standardize_sharegpt(train_dataset)
val_dataset = standardize_sharegpt(val_dataset)

# **Custom SFTTrainer with class-weighted loss**

In [None]:
# Custom SFTTrainer with class-weighted loss
import re
import torch

class WeightedSFTTrainer(SFTTrainer):
    def __init__(self, emotion_weights, offensive_weights, hate_weights, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.emotion_weights = emotion_weights
        self.offensive_weights = offensive_weights
        self.hate_weights = hate_weights

    def compute_loss(self, model, inputs, return_outputs=False, num_items_in_batch=None):
        # Recent transformers versions pass num_items_in_batch; it is accepted and ignored here
        # Get the original loss (a scalar averaged over the batch)
        outputs = model(**inputs)
        loss = outputs.loss

        # Newer transformers/TRL versions expose the tokenizer as processing_class
        tokenizer = getattr(self, "processing_class", None) or self.tokenizer

        # Recover each sample's labels from the "Ground truth:" line of its prompt
        batch_size = inputs["input_ids"].size(0)
        sample_weights = []
        for i in range(batch_size):
            input_text = tokenizer.decode(inputs["input_ids"][i], skip_special_tokens=True)
            ground_truth_match = re.search(r"Ground truth: Emotion: (.*?), Offensive: (\d), Hate: (\d)", input_text)
            if ground_truth_match:
                emo, off, hate = ground_truth_match.groups()
                off = int(off)
                hate = int(hate)
                # Combine weights (use minimum to avoid overemphasizing any single label)
                weight = min(self.emotion_weights.get(emo, 1.0), self.offensive_weights.get(off, 1.0), self.hate_weights.get(hate, 1.0))
                sample_weights.append(weight)
            else:
                sample_weights.append(1.0)  # Default weight if parsing fails

        # Apply weights to the loss. Because `loss` is a scalar, this scales it by the
        # mean sample weight of the batch (exact per-sample weighting at batch size 1).
        weights = torch.tensor(sample_weights, device=loss.device, dtype=loss.dtype)
        weighted_loss = (loss * weights).mean()

        return (weighted_loss, outputs) if return_outputs else weighted_loss
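
To see what the minimum rule does in practice, here is a small standalone sketch of the weight lookup, separate from the trainer. The weights are rounded copies of a few values printed earlier in the notebook, and `sample_weight` is a hypothetical helper that mirrors the regex and the `min(...)` combination used in `compute_loss`.

```python
import re

# Rounded copies of a few weights printed earlier in the notebook
emotion_weights = {"anger": 0.0326, "neutral": 0.0764, "fear": 0.9528}
offensive_weights = {0: 0.0719, 1: 0.1737}
hate_weights = {0: 0.0536, 1: 1.0}

GROUND_TRUTH = re.compile(r"Ground truth: Emotion: (.*?), Offensive: (\d), Hate: (\d)")

def sample_weight(prompt_text):
    """Return the minimum of the three class weights, or 1.0 if parsing fails."""
    match = GROUND_TRUTH.search(prompt_text)
    if match is None:
        return 1.0
    emo, off, hate = match.groups()
    return min(
        emotion_weights.get(emo, 1.0),
        offensive_weights.get(int(off), 1.0),
        hate_weights.get(int(hate), 1.0),
    )

print(sample_weight("Ground truth: Emotion: anger, Offensive: 1, Hate: 0"))
print(sample_weight("Ground truth: Emotion: fear, Offensive: 1, Hate: 1"))
print(sample_weight("no labels here"))
```

With these numbers the combined weight of any parsed sample cannot exceed the largest Offensive weight (about 0.17), even for the rarest Emotion and Hate classes, while a sample whose labels fail to parse gets the full weight of 1.0.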

In [None]:
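# Note: this cell trains with the stock SFTTrainer; the WeightedSFTTrainer defined above is not used here
trainer = SFTTrainer(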
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    args=SFTConfig(
        max_seq_length=max_seq_length,
        per_device_train_batch_size=1,
        per_device_eval_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=100,  # Increased for more training
        learning_rate=1e-4,  # Lowered for stability
        logging_steps=1,
        eval_strategy="steps",
        eval_steps=50,
        save_strategy="steps",
        save_steps=50,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs_gpt5",
        report_to="none",
        load_best_model_at_end=True,
        metric_for_best_model="eval_loss",
        greater_is_better=False,
    ),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3, early_stopping_threshold=0.01)]
)
trainer_stats = trainer.train()

Unsloth: Switching to float32 training since model cannot work with float16


Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/4768 [00:00<?, ? examples/s]

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/596 [00:00<?, ? examples/s]

The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': 199998, 'pad_token_id': 200017}.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 4,768 | Num Epochs = 1 | Total steps = 100
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4
 "-____-"     Trainable parameters = 7,962,624 of 20,922,719,808 (0.04% trained)


Unsloth: Will smartly offload gradients to save VRAM!


| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 50   | 1.2128        | 1.044425        |
| 100  | 0.7862        | 0.769814        |


Unsloth: Not an error, but GptOssForCausalLM does not accept `num_items_in_batch`.
Using gradient accumulation will be very slightly less accurate.
Read more on gradient accumulation issues here: https://unsloth.ai/blog/gradient
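
The numbers in the training banner can be reproduced with a few lines of arithmetic. This is only a sketch using the values from the `SFTConfig` and the log above; the variable names are illustrative.

```python
# Values taken from the SFTConfig and the Unsloth training banner
per_device_batch_size = 1
gradient_accumulation_steps = 4
num_gpus = 1
max_steps = 100
num_train_examples = 4768
trainable_params = 7_962_624
total_params = 20_922_719_808

# Examples contributing to each optimizer step
effective_batch = per_device_batch_size * gradient_accumulation_steps * num_gpus
# Examples consumed over the whole run, and the share of one epoch that represents
examples_seen = effective_batch * max_steps
epoch_fraction = examples_seen / num_train_examples
# Share of the model's parameters that receive gradient updates
trainable_pct = 100 * trainable_params / total_params

print(f"Effective batch size: {effective_batch}")
print(f"Examples seen: {examples_seen} ({epoch_fraction:.1%} of one epoch)")
print(f"Trainable parameters: {trainable_pct:.3f}%")
```

With `max_steps=100` the run covers well under one pass over the training set, which is worth keeping in mind when reading the loss values.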


# **Evaluation function**

In [None]:
# Evaluation function
import re

# Matches the single-line format requested in the prompt
LABEL_PATTERN = re.compile(r"Emotion:\s*([A-Za-z]+)\s*,\s*Offensive:\s*([01])\s*,\s*Hate:\s*([01])")

def evaluate_model(model, tokenizer, test_data, emotions, emotion_map, cnt):
    model.eval()
    predictions = {"Emotion": [], "Offensive": [], "Hate": []}
    ground_truth = {"Emotion": test_data["Emotion_id"].tolist(), "Offensive": test_data["Offensive"].tolist(), "Hate": test_data["Hate"].tolist()}

    for _, row in test_data.iterrows():
        messages = [{"role": "user", "content": f"Classify the following text for Emotion, Offensive, and Hate based on the examples provided:\n"
                                               f"Example 1: Text: أحد التجار الشباب العمانيين يقول للاسف لما يكون عندهم كاش يروحوا هايبرماركت ولما يريدوا صبر يتسوقوا من عندي!!\nLabels: Emotion: neutral, Offensive: 0, Hate: 0\n"
                                               f"Example 2: Text: @JALHARBISKY مجموعه القدرة الجنسيه👍 بديل الفياجرا والسنافي💞\nLabels: Emotion: optimism, Offensive: 0, Hate: 0\n"
                                               f"Example 3: Text: سيسي خاين..سيسي قاتل #هتافات_ثورية\nLabels: Emotion: anger, Offensive: 1, Hate: 0\n\n"
                                               f"Text: {row['text']}\nOutput format: Emotion: <emotion>, Offensive: <0 or 1>, Hate: <0 or 1>"}]
        inputs = tokenizer.apply_chat_template(
            messages,
            add_generation_prompt=True,
            return_tensors="pt",
            return_dict=True,
            reasoning_effort="high",
        ).to(model.device)

        # Note: with reasoning_effort="high", 64 new tokens may be used up by reasoning
        # before the labels are emitted; raise max_new_tokens if outputs look truncated
        output = model.generate(**inputs, max_new_tokens=64)
        # Decode only the generated tokens so labels from the few-shot examples
        # in the prompt are never parsed as predictions
        prompt_length = inputs["input_ids"].shape[1]
        output_text = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)
        cnt += 1
        print(f"Raw model output: {output_text}")
        print(f"Sample Number : {cnt}")

        # Parse output; fall back to the defaults when no well-formed label line is found
        emo, off, hate = "neutral", 0, 0
        matches = LABEL_PATTERN.findall(output_text)
        if matches:
            emo, off, hate = matches[-1]  # use the last match, i.e. the final answer
            off, hate = int(off), int(hate)

        predictions["Emotion"].append(emotion_map.get(emo, emotion_map["neutral"]))
        predictions["Offensive"].append(off)
        predictions["Hate"].append(hate)

    # Calculate metrics (macro-averaged; undefined ratios count as 0)
    metrics = {}
    metrics["Emotion"] = {
        "accuracy": accuracy_score(ground_truth["Emotion"], predictions["Emotion"]),
        **dict(zip(["precision", "recall", "f1"], precision_recall_fscore_support(ground_truth["Emotion"], predictions["Emotion"], average="macro", zero_division=0)))
    }
    metrics["Offensive"] = {
        "accuracy": accuracy_score(ground_truth["Offensive"], predictions["Offensive"]),
        **dict(zip(["precision", "recall", "f1"], precision_recall_fscore_support(ground_truth["Offensive"], predictions["Offensive"], average="macro", zero_division=0)))
    }
    metrics["Hate"] = {
        "accuracy": accuracy_score(ground_truth["Hate"], predictions["Hate"]),
        **dict(zip(["precision", "recall", "f1"], precision_recall_fscore_support(ground_truth["Hate"], predictions["Hate"], average="macro", zero_division=0)))
    }

    return metrics
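
The prompt asks for all three labels on one comma-separated line, and the decoded text can also contain the few-shot examples. A minimal sketch of a parser for that format is shown below; `LABELS` and `parse_labels` are hypothetical names, and taking the last match is an assumption that the final answer appears after any echoed examples.

```python
import re

# One line of the form "Emotion: <word>, Offensive: <0|1>, Hate: <0|1>"
LABELS = re.compile(r"Emotion:\s*([A-Za-z]+)\s*,\s*Offensive:\s*([01])\s*,\s*Hate:\s*([01])")

def parse_labels(text, default=("neutral", 0, 0)):
    """Return (emotion, offensive, hate) from the last well-formed match in text."""
    matches = LABELS.findall(text)
    if not matches:
        return default
    emo, off, hate = matches[-1]
    return emo, int(off), int(hate)

# A clean answer
print(parse_labels("Emotion: anger, Offensive: 1, Hate: 0"))
# The format hint from the prompt is not a valid answer, so the default is returned
print(parse_labels("Output format: Emotion: <emotion>, Offensive: <0 or 1>, Hate: <0 or 1>"))
# An echoed few-shot example followed by the actual answer: the last match wins
print(parse_labels("Labels: Emotion: neutral, Offensive: 0, Hate: 0\nEmotion: sadness, Offensive: 0, Hate: 1"))
```

Returning a default on failure keeps the prediction lists aligned with the ground truth, at the cost of counting unparseable outputs as `neutral, 0, 0`.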

In [None]:
# Evaluate the model
cnt = 0
metrics = evaluate_model(model, tokenizer, test_data, emotions, emotion_map, cnt)
print("Evaluation Metrics:")
for label, scores in metrics.items():
    print(f"\n{label}:")
    print(f"  Accuracy: {scores['accuracy']:.4f}")
    print(f"  Precision: {scores['precision']:.4f}")
    print(f"  Recall: {scores['recall']:.4f}")
    print(f"  F1-Score: {scores['f1']:.4f}")

# Save the model
model.save_pretrained("gpt_oss_finetuned")
tokenizer.save_pretrained("gpt_oss_finetuned")

Streaming output truncated to the last 5000 lines.

Sample Number : 330
Raw model output: systemYou are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-19

Reasoning: high

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.userClassify the following text for Emotion, Offensive, and Hate based on the examples provided:
Example 1: Text: أحد التجار الشباب العمانيين يقول للاسف لما يكون عندهم كاش يروحوا هايبرماركت ولما يريدوا صبر يتسوقوا من عندي!!
Labels: Emotion: neutral, Offensive: 0, Hate: 0
Example 2: Text: @JALHARBISKY مجموعه القدرة الجنسيه👍 بديل الفياجرا والسنافي💞
Labels: Emotion: optimism, Offensive: 0, Hate: 0
Example 3: Text: سيسي خاين..سيسي قاتل #هتافات_ثورية
Labels: Emotion: anger, Offensive: 1, Hate: 0

Text:  ط ظ„ظ طµط ط ظ„ظˆط­ط ظ ط ظ„ظپ ظ ط 



('gpt_oss_finetuned/tokenizer_config.json',
 'gpt_oss_finetuned/special_tokens_map.json',
 'gpt_oss_finetuned/chat_template.jinja',
 'gpt_oss_finetuned/tokenizer.json')
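
Because the Hate and Offensive labels are heavily imbalanced, accuracy and macro-averaged scores can tell different stories. The sketch below is a plain-Python illustration of macro averaging with undefined ratios counted as 0, which is the behavior requested by `average="macro", zero_division=0`. `macro_prf` is a hypothetical helper written for this illustration, and the toy labels are made up.

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall and F1; undefined ratios count as 0."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy data: one positive among four samples, and a model that always predicts 0
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = macro_prf(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

A model that never predicts the minority class still reaches 75% accuracy on this toy data, while its macro F1 stays well below that because the minority class contributes a score of 0.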

In [None]:
# Save metrics to a JSON file
import json
with open("evaluation_metrics.json", "w") as f:
    json.dump(metrics, f, indent=4)