To run this, press "*Runtime*" and press "*Run all*" on a **free** Tesla T4 Google Colab instance!
<div class="align-center">
<a href="https://unsloth.ai/"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
<a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord button.png" width="145"></a>
<a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a></a> Join Discord if you need help + ⭐ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐
</div>

To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://docs.unsloth.ai/get-started/installing-+-updating).

You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)


In [4]:
import pandas as pd
import numpy as np
from sklearn.utils import resample

def create_balanced_dataset(input_file, output_file, samples_per_class=1304):
    """
    Create a balanced dataset with specified number of samples per class
    
    Parameters:
    input_file (str): Path to the input CSV file
    output_file (str): Path to save the balanced dataset
    samples_per_class (int): Number of samples to extract per class
    """
    
    # Read the dataset
    print("Loading dataset...")
    df = pd.read_csv(input_file)
    
    # Filter for the three target labels
    target_labels = ['hate', 'hope', 'not_applicable']
    df_filtered = df[df['label'].isin(target_labels)]
    
    print(f"Original dataset size: {len(df)}")
    print(f"Filtered dataset size: {len(df_filtered)}")
    
    # Check class distribution
    class_counts = df_filtered['label'].value_counts()
    print("\nClass distribution:")
    for label in target_labels:
        count = class_counts.get(label, 0)
        print(f"{label}: {count} samples")
    
    # Create balanced dataset
    balanced_dfs = []
    
    for label in target_labels:
        class_data = df_filtered[df_filtered['label'] == label]
        
        if len(class_data) >= samples_per_class:
            # If we have enough samples, randomly sample
            sampled_data = class_data.sample(n=samples_per_class, random_state=42)
            print(f"✓ Sampled {samples_per_class} rows from '{label}' class")
        
        elif len(class_data) > 0:
            # If we don't have enough samples, use resampling with replacement
            sampled_data = resample(class_data, 
                                  n_samples=samples_per_class, 
                                  random_state=42, 
                                  replace=True)
            print(f"⚠ Resampled {samples_per_class} rows from '{label}' class (original: {len(class_data)})")
        
        else:
            print(f"❌ No samples found for '{label}' class")
            continue
            
        balanced_dfs.append(sampled_data)
    
    # Combine all balanced classes
    if balanced_dfs:
        final_df = pd.concat(balanced_dfs, ignore_index=True)
        
        # Shuffle the final dataset
        final_df = final_df.sample(frac=1, random_state=42).reset_index(drop=True)
        
        # Save to CSV
        final_df.to_csv(output_file, index=False, encoding='utf-8')
        
        print(f"\n✅ Balanced dataset created successfully!")
        print(f"Total rows: {len(final_df)}")
        print(f"Saved to: {output_file}")
        
        # Final class distribution
        final_counts = final_df['label'].value_counts()
        print("\nFinal class distribution:")
        for label in target_labels:
            count = final_counts.get(label, 0)
            print(f"{label}: {count} samples")
            
        return final_df
    
    else:
        print("❌ No data available to create balanced dataset")
        return None

# Usage
if __name__ == "__main__":
    # Set your file paths
    input_file = "balanced_arabic_only_main_train.csv"  # Your original dataset
    output_file = "balanced_dataset.csv"  # Output file name
    
    # Create balanced dataset with 1304 samples per class
    balanced_df = create_balanced_dataset(input_file, output_file, samples_per_class=1304)
    
    # Optional: Display sample data
    if balanced_df is not None:
        print("\nSample of the balanced dataset:")
        print(balanced_df.head(10))
        print(f"\nDataset shape: {balanced_df.shape}")


Loading dataset...
Original dataset size: 5873
Filtered dataset size: 5873

Class distribution:
hate: 912 samples
hope: 1812 samples
not_applicable: 3149 samples
⚠ Resampled 1304 rows from 'hate' class (original: 912)
✓ Sampled 1304 rows from 'hope' class
✓ Sampled 1304 rows from 'not_applicable' class

✅ Balanced dataset created successfully!
Total rows: 3912
Saved to: balanced_dataset.csv

Final class distribution:
hate: 1304 samples
hope: 1304 samples
not_applicable: 1304 samples

Sample of the balanced dataset:
     id                                               text           label
0  7753  أَبدى هواهُ ولم يَزَلْ محجوبا دمعٌ غدا في خدِّ...  not_applicable
1  1748  ان شاء الله الكاراتيه لما يدخل الاوليمبياد هيج...            hope
2  1293  السوريين الله يرحمهم برحمته تراجف ايديني من ال...            hope
3  6391  في واحد من الاوليمبياد اتسحبت منه الفضيه عشان ...            hope
4  9028  اغار بجنون ولسعات غيرتي تخدش تفاصيل اللقاء ولا...  not_applicable
5  8524  أمِن شعرٍ في الرّأس 

### News

Unsloth now supports Text-to-Speech (TTS) models. Read our [guide here](https://docs.unsloth.ai/basics/text-to-speech-tts-fine-tuning).

Read our **[Qwen3 Guide](https://docs.unsloth.ai/basics/qwen3-how-to-run-and-fine-tune)** and check out our new **[Dynamic 2.0](https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs)** quants which outperforms other quantization methods!

Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).


### Installation

### Unsloth

In [5]:
from unsloth import FastLanguageModel
import torch

fourbit_models = [
    "unsloth/Qwen3-1.7B-unsloth-bnb-4bit", # Qwen 14B 2x faster
    "unsloth/Qwen3-4B-unsloth-bnb-4bit",
    "unsloth/Qwen3-8B-unsloth-bnb-4bit",
    "unsloth/Qwen3-14B-unsloth-bnb-4bit",
    "unsloth/Qwen3-32B-unsloth-bnb-4bit",

    # 4bit dynamic quants for superior accuracy and low memory use
    "unsloth/gemma-3-12b-it-unsloth-bnb-4bit",
    "unsloth/Phi-4",
    "unsloth/Llama-3.1-8B",
    "unsloth/Llama-3.2-3B",
    "unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit" # [NEW] We support TTS models!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Qwen/Qwen3-8B",
    max_seq_length = 2048,   # Context length - can be longer, but uses more memory
    load_in_4bit = True,     # 4bit uses much less memory
    load_in_8bit = False,    # A bit more accurate, uses 2x memory
    full_finetuning = False, # We have full finetuning now!
    # token = "hf_...",      # use one if using gated models
)

==((====))==  Unsloth 2025.4.7: Fast Qwen3 patching. Transformers: 4.51.3. vLLM: 0.8.5.post1.
   \\   /|    NVIDIA GeForce RTX 4090. Num GPUs = 1. Max memory: 23.988 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [6]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,           # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 64,  # Best to choose alpha = rank or rank*2
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,   # We support rank stabilized LoRA
    loftq_config = None,  # And LoftQ
)

<a name="Data"></a>
### Data Prep
Qwen3 has both reasoning and a non reasoning mode. So, we should use 2 datasets:

1. We use the [Open Math Reasoning]() dataset which was used to win the [AIMO](https://www.kaggle.com/competitions/ai-mathematical-olympiad-progress-prize-2/leaderboard) (AI Mathematical Olympiad - Progress Prize 2) challenge! We sample 10% of verifiable reasoning traces that used DeepSeek R1, and whicht got > 95% accuracy.

2. We also leverage [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style. But we need to convert it to HuggingFace's normal multiturn format as well.

In [9]:
from datasets import Dataset, load_dataset
from trl import SFTTrainer, SFTConfig
import pandas as pd

# Step 1: Load your CSV dataset and convert to the required format
train_df = pd.read_csv("balanced_dataset.csv")  # Replace with your train file path

# Step 2: Convert your dataset to conversation format
def create_conversation_format(row):
    # Create the Arabic prompt
    prompt = f"""أنت خبير في تصنيف النصوص العربية.

المهمة: صنف النص العربي التالي إلى إحدى الفئات أدناه.

النص العربي:
{row['text']}

الفئات:
0. غير قابل للتطبيق (not_applicable)
1. أمل (hope)
2. كراهية (hate)

الإجابة:"""
    
    # Map string labels to boxed string labels
    label_map = {"not_applicable": "\\boxed{not_applicable}", "hope": "\\boxed{hope}", "hate": "\\boxed{hate}"}
    if isinstance(row['label'], str):
        answer = label_map[row['label']]
    else:
        # If label is numeric, map back to string label first
        reverse_map = {"0": "not_applicable", "1": "hope", "2": "hate"}
        answer = label_map[reverse_map[str(row['label'])]]
    
    return {
        "messages": [
            {"role": "system", "content": "You are Qwen3, created by Alibaba Cloud. You are a helpful assistant specialized in Arabic text classification."},
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer}
        ]
    }

# Convert DataFrame to list of dictionaries
data_list = [create_conversation_format(row) for _, row in train_df.iterrows()]

# Step 3: Create HuggingFace Dataset
dataset = Dataset.from_list(data_list)

# Step 4: Extract conversation messages
def extract_conversations(examples):
    return {"conversations": examples["messages"]}

dataset = dataset.map(extract_conversations)

# Step 5: Apply chat template
formatted_texts = tokenizer.apply_chat_template(
    dataset["conversations"],
    tokenize=False
)

# Step 6: Wrap into a HuggingFace Dataset
train_dataset = Dataset.from_dict({"text": formatted_texts})


Map:   0%|          | 0/3912 [00:00<?, ? examples/s]

In [12]:
train_dataset[3500]

{'text': '<|im_start|>system\nYou are Qwen3, created by Alibaba Cloud. You are a helpful assistant specialized in Arabic text classification.<|im_end|>\n<|im_start|>user\nأنت خبير في تصنيف النصوص العربية.\n\nالمهمة: صنف النص العربي التالي إلى إحدى الفئات أدناه.\n\nالنص العربي:\nايا ليت اكون دمعه من دموع عينيك فان بكيت يوما التمس نعومه وجنتيك صباح الخير خواطر قصيره صباح الحب\n\nالفئات:\n0. غير قابل للتطبيق (not_applicable)\n1. أمل (hope)\n2. كراهية (hate)\n\nالإجابة:<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n\\boxed{hope}<|im_end|>\n'}

Let's see the structure of both datasets:

We now convert the reasoning dataset into conversational format:

Let's see the first transformed row:

Next we take the non reasoning dataset and convert it to conversational format as well.

We have to use Unsloth's `standardize_sharegpt` function to fix up the format of the dataset first.

Let's see the first row

Now let's see how long both datasets are:

The non reasoning dataset is much longer. Let's assume we want the model to retain some reasoning capabilities, but we specifically want a chat model.

Let's define a ratio of chat only data. The goal is to define some mixture of both sets of data.

Let's select 25% reasoning and 75% chat based:

Let's sample the reasoning dataset by 25% (or whatever is 100% - chat_percentage)

Finally combine both datasets:

<a name="Train"></a>
### Train the model
Now let's use Huggingface TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`.

In [13]:
from trl import SFTTrainer, SFTConfig
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    eval_dataset = None, # Can set up evaluation!
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4, # Use GA to mimic batch size!
        warmup_steps = 5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        num_train_epochs = 1,
        learning_rate = 2e-4, # Reduce to 2e-5 for long training runs
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        output_dir = "/mnt/c/Users/T2410260/model/qwen_3_8B_arab_cache_5",
        seed = 3407,
        report_to = "none", # Use this for WandB etc
    ),
)

Unsloth: Tokenizing ["text"] (num_proc=32):   0%|          | 0/3912 [00:00<?, ? examples/s]

In [6]:
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|im_start|>user\n",
    response_part = "<|im_start|>assistant\n",
)

Map (num_proc=32):   0%|          | 0/3912 [00:00<?, ? examples/s]

In [14]:
tokenizer.decode(trainer.train_dataset[5]["input_ids"])

'<|im_start|>system\nYou are Qwen3, created by Alibaba Cloud. You are a helpful assistant specialized in Arabic text classification.<|im_end|>\n<|im_start|>user\nأنت خبير في تصنيف النصوص العربية.\n\nالمهمة: صنف النص العربي التالي إلى إحدى الفئات أدناه.\n\nالنص العربي:\nأمِن شعرٍ في الرّأس بُدّلَ لونُه تبدّلتِ يا أسماءُ عنّي وعن ودِّي فإنْ يكُ هذا الهجرُ منكِ أو القِلى فليس بياضُ الرّأس يا أسْمُ من عندي تصدّين عمداً والهوى أنتِ كلّهُ وما كان شيبي لو تأمّلتِ من عمدي وليس لمن جازته ستُّون حجّةً من الشيب إنْ لم يردِه الموتُ من بُدِّ ولا لومَ يوماً من تغيّرِ صِبغتي إِذا لم يكن ذاك التغيُّرُ من عهدِي\n\nالفئات:\n0. غير قابل للتطبيق (not_applicable)\n1. أمل (hope)\n2. كراهية (hate)\n\nالإجابة:<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\n\\boxed{not_applicable}<|im_end|>\n'

In [15]:
space = tokenizer(" ", add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[5]["labels"]])

KeyError: 'labels'

In [None]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.741 GB.
11.898 GB of memory reserved.


Let's train the model! To resume a training run, set `trainer.train(resume_from_checkpoint = True)`

In [16]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 3,912 | Num Epochs = 1 | Total steps = 489
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 87,293,952/8,000,000,000 (1.09% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,3.6795
2,3.7558
3,3.7539
4,3.2115
5,2.5924
6,2.3731
7,1.996
8,1.4435
9,1.3745
10,1.3656


In [None]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

261.8103 seconds used for training.
4.36 minutes used for training.
Peak reserved memory = 14.066 GB.
Peak reserved memory for training = 0.0 GB.
Peak reserved memory % of max memory = 35.559 %.
Peak reserved memory for training % of max memory = 0.0 %.


<a name="Inference"></a>
### Inference
Let's run the model via Unsloth native inference! According to the `Qwen-3` team, the recommended settings for reasoning inference are `temperature = 0.6, top_p = 0.95, top_k = 20`

For normal chat based inference, `temperature = 0.7, top_p = 0.8, top_k = 20`

In [18]:
# Updated instruction prompt to match training format
instruction_prompt = """You are Qwen3, created by Alibaba Cloud. You are a helpful assistant specialized in Arabic text classification."""

# Example Arabic text from your dataset
question = """أنت خبير في تصنيف النصوص العربية.

المهمة: صنف النص العربي التالي إلى إحدى الفئات أدناه.

النص العربي:

أتمنى رؤيتك مجددًا.

الفئات:
0. غير قابل للتطبيق (not_applicable)
1. أمل (hope)
2. كراهية (hate)

الإجابة:"""

messages = [
    {"role": "system", "content": instruction_prompt},
    {"role": "user", "content": question}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # Must add for generation
    enable_thinking=False,  # Enable thinking for better reasoning
)

from transformers import TextStreamer
_ = model.generate(
    **tokenizer(text, return_tensors="pt").to("cuda"),
    max_new_tokens=500,  # Allow for thinking process
    temperature=0.6,     # Recommended for thinking mode
    top_p=0.95,         # Recommended for thinking mode
    top_k=20,           # Recommended for thinking mode
    streamer=TextStreamer(tokenizer, skip_prompt=True),
)


\boxed{hope}<|im_end|>


In [19]:
import pandas as pd
import torch
import re
from transformers import AutoTokenizer, AutoModelForCausalLM

# System prompt to match training format
system_prompt = """You are Qwen3, created by Alibaba Cloud. You are a helpful assistant specialized in Arabic text classification."""

def create_user_prompt(text):
    """Create user prompt in the same format as training data"""
    return f"""أنت خبير في تصنيف النصوص العربية.

المهمة: صنف النص العربي التالي إلى إحدى الفئات أدناه.

النص العربي:
{text}

الفئات:
0. غير قابل للتطبيق (not_applicable)
1. أمل (hope)
2. كراهية (hate)

الإجابة:"""

def predict_with_thinking(text):
    user_prompt = create_user_prompt(text)
    
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]
    
    # Apply chat template with thinking mode enabled
    formatted_text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=True  # Enable thinking mode
    )
    
    inputs = tokenizer(formatted_text, return_tensors="pt").to(model.device)
    
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=500,
            temperature=0.6,
            top_p=0.95,
            top_k=20,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )
    
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text

def extract_boxed_answer(generated_text):
    """
    Extract answer from boxed format and handle both numeric and label formats
    """
    # Look for boxed content
    pattern = r'\\boxed\{([^}]+)\}'
    match = re.search(pattern, generated_text)
    
    if match:
        answer = match.group(1).strip()
        
        # Handle numeric format (0, 1, 2)
        if answer == '0':
            return 'not_applicable'
        elif answer == '1':
            return 'hope'
        elif answer == '2':
            return 'hate'
        
        # Handle direct label format
        elif answer.lower() in ['not_applicable', 'hope', 'hate']:
            return answer.lower()
        
        # Handle Arabic labels (in case model responds in Arabic)
        elif 'غير قابل للتطبيق' in answer:
            return 'not_applicable'
        elif 'أمل' in answer:
            return 'hope'
        elif 'كراهية' in answer:
            return 'hate'
    
    # Fallback: search for patterns in the entire text if boxed format fails
    fallback_answer = extract_fallback_answer(generated_text)
    if fallback_answer:
        return fallback_answer
    
    # Default fallback
    return 'not_applicable'

def extract_fallback_answer(text):
    """
    Fallback method to extract answer if boxed format is not found
    """
    text_lower = text.lower()
    
    # Look for numeric patterns
    if re.search(r'\b0\b', text) and ('not_applicable' in text_lower or 'غير قابل للتطبيق' in text):
        return 'not_applicable'
    elif re.search(r'\b1\b', text) and ('hope' in text_lower or 'أمل' in text):
        return 'hope'
    elif re.search(r'\b2\b', text) and ('hate' in text_lower or 'كراهية' in text):
        return 'hate'
    
    # Look for direct labels
    if 'hate' in text_lower or 'كراهية' in text:
        return 'hate'
    elif 'hope' in text_lower or 'أمل' in text:
        return 'hope'
    elif 'not_applicable' in text_lower or 'غير قابل للتطبيق' in text:
        return 'not_applicable'
    
    return None

# Load model and tokenizer (you'll need to add this part)
# tokenizer = AutoTokenizer.from_pretrained("your_model_name")
# model = AutoModelForCausalLM.from_pretrained("your_model_name")

# Load validation dataset
validation_df = pd.read_csv("/mnt/c/Users/T2410260/Downloads/validation.csv")

# Make predictions
predictions = []
for i, text in enumerate(validation_df['text']):
    try:
        generated = predict_with_thinking(text)
        prediction = extract_boxed_answer(generated)
        predictions.append(prediction)
        
        if (i + 1) % 10 == 0:
            print(f"Processed {i + 1}/{len(validation_df)} texts")
            
    except Exception as e:
        print(f"Error processing text {i}: {e}")
        predictions.append('not_applicable')  # Default fallback

# Create submission file
submission_df = pd.DataFrame({
    'id': validation_df['id'],
    'prediction': predictions
})

# Validate predictions before saving
valid_labels = ['hate', 'hope', 'not_applicable']
invalid_predictions = submission_df[~submission_df['prediction'].isin(valid_labels)]

if not invalid_predictions.empty:
    print(f"Warning: Found {len(invalid_predictions)} invalid predictions")
    print(invalid_predictions['prediction'].value_counts())
    # Fix invalid predictions
    submission_df.loc[~submission_df['prediction'].isin(valid_labels), 'prediction'] = 'not_applicable'

submission_df.to_csv("submission.csv", index=False)
print("Submission file created!")
print(f"Prediction distribution:")
print(submission_df['prediction'].value_counts())


Processed 10/1476 texts
Processed 20/1476 texts
Processed 30/1476 texts
Processed 40/1476 texts
Processed 50/1476 texts
Processed 60/1476 texts
Processed 70/1476 texts
Processed 80/1476 texts
Processed 90/1476 texts
Processed 100/1476 texts
Processed 110/1476 texts
Processed 120/1476 texts
Processed 130/1476 texts
Processed 140/1476 texts
Processed 150/1476 texts
Processed 160/1476 texts
Processed 170/1476 texts
Processed 180/1476 texts
Processed 190/1476 texts
Processed 200/1476 texts
Processed 210/1476 texts
Processed 220/1476 texts
Processed 230/1476 texts
Processed 240/1476 texts
Processed 250/1476 texts
Processed 260/1476 texts
Processed 270/1476 texts
Processed 280/1476 texts
Processed 290/1476 texts
Processed 300/1476 texts
Processed 310/1476 texts
Processed 320/1476 texts
Processed 330/1476 texts
Processed 340/1476 texts
Processed 350/1476 texts
Processed 360/1476 texts
Processed 370/1476 texts
Processed 380/1476 texts
Processed 390/1476 texts
Processed 400/1476 texts
Processed

In [13]:
import pandas as pd
import re
from transformers import TextStreamer

# Load validation dataset
validation_df = pd.read_csv("/mnt/c/Users/T2410260/Downloads/validation.csv")
print(f"Loaded {len(validation_df)} validation samples")
print(validation_df.head())


Loaded 1476 validation samples
     id                                               text
0  7434  558: اخترتك حباً لقلبي فلا تكن بغيابك له وجعاً...
1  5133  ضعي ما تشائين من مساحيق التجميل فانا اتلذذ في ...
2  1418  إِنَّ شَعُورِي بِالْغَضَبِ وَالرَّغْبَةِ فِي ا...
3   334  RT @MAlaleany: @lbsry11 بل إيران جندت نفسها لل...
4  1600  الحب :ياتي مره واحده في العمر او قد لا ياتي اب...


In [14]:
# Instruction prompt for Arabic text classification with boxed response
instruction_prompt = """You are an expert in Arabic text classification. You can accurately analyze Arabic texts and classify them into appropriate categories. Use your deep understanding of Arabic language and cultural context to identify emotions and meanings in texts.

أنت خبير في تصنيف النصوص العربية. صنف كل نص إلى إحدى الفئات التالية:
0. غير قابل للتطبيق (not_applicable)
1. أمل (hope) 
2. كراهية (hate)

Think carefully about the content and context before making your final decision.

After your analysis, provide your final answer inside \\boxed{ } which must be exactly one of these three options:
- hate
- hope
- not_applicable"""


In [16]:
def extract_boxed_answer(generated_text):
    """Extract the answer from \\boxed{ } format"""
    pattern = r'\\boxed\{([^}]+)\}'
    match = re.search(pattern, generated_text)
    if match:
        answer = match.group(1).strip()
        # Ensure the answer is one of the valid options
        if answer in ['hate', 'hope', 'not_applicable']:
            return answer
    return 'not_applicable'  # Default fallback


In [18]:
def predict_batch(texts, batch_size=10):
    """Process texts in batches to avoid memory issues"""
    predictions = []
    
    for i in range(0, len(texts), batch_size):
        batch_texts = texts[i:i+batch_size]
        batch_predictions = []
        
        for text in batch_texts:
            # Create messages for each text
            messages = [
                {"role": "system", "content": instruction_prompt},
                {"role": "user", "content": str(text)}
            ]
            
            # Apply chat template
            formatted_text = tokenizer.apply_chat_template(
                messages,
                tokenize=False,
                add_generation_prompt=True,
                enable_thinking=True,
            )
            
            # Generate prediction
            inputs = tokenizer(formatted_text, return_tensors="pt").to("cuda")
            
            with torch.no_grad():
                outputs = model.generate(
                    **inputs,
                    max_new_tokens=500,
                    temperature=0.6,
                    top_p=0.95,
                    top_k=20,
                    do_sample=True,
                    pad_token_id=tokenizer.eos_token_id
                )
            
            # Decode the generated text
            generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
            
            # Extract the prediction from the generated text
            prediction = extract_boxed_answer(generated_text)
            batch_predictions.append(prediction)
            
            print(f"Processed {len(predictions) + len(batch_predictions)}/{len(texts)} texts")
        
        predictions.extend(batch_predictions)
    
    return predictions


In [19]:
# Extract texts from validation dataset
validation_texts = validation_df['text'].tolist()
validation_ids = validation_df['id'].tolist()

# Make predictions (choose one approach)
print("Starting predictions...")

# Option 1: With thinking mode (more accurate but slower)
# predictions = predict_batch(validation_texts, batch_size=5)

# Option 2: Simple approach (faster)
predictions = predict_simple(validation_texts)

print("Predictions completed!")

# Create submission dataframe
submission_df = pd.DataFrame({
    'id': validation_ids,
    'prediction': predictions
})

# Verify the format
print("\nSubmission file preview:")
print(submission_df.head(10))

print(f"\nPrediction distribution:")
print(submission_df['prediction'].value_counts())

# Save submission file
submission_df.to_csv("submission.csv", index=False)
print("\nSubmission file saved as 'submission.csv'")

# Verify no missing predictions
assert len(submission_df) == len(validation_df), "Mismatch in number of predictions"
assert all(pred in ['hate', 'hope', 'not_applicable'] for pred in predictions), "Invalid predictions found"
print("✅ All validations passed!")


Starting predictions...


NameError: name 'predict_simple' is not defined

In [13]:
model.save_pretrained_merged("/mnt/c/Users/T2410260/model/qwen_3_8B_acot_extes", tokenizer, save_method="merged_16bit")


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 13.41 out of 31.23 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 67%|███████████████████████████████████████████████████████▎                           | 24/36 [00:00<00:00, 42.11it/s]
We will save to Disk and not RAM now.
100%|███████████████████████████████████████████████████████████████████████████████████| 36/36 [00:03<00:00, 11.42it/s]


Unsloth: Saving tokenizer... Done.
Done.


In [14]:
model.push_to_hub_merged("Ash2749/qwen_3_8B_acot_extes", tokenizer, save_method="merged_16bit", token="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")


Unsloth: You are pushing to hub, but you passed your HF username = Ash2749.
We shall truncate Ash2749/qwen_3_8B_acot_extes to qwen_3_8B_acot_extes


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 13.39 out of 31.23 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|███████████████████████████████████████████████████████████████████████████████████| 36/36 [00:03<00:00, 10.85it/s]


Unsloth: Saving tokenizer...

  0%|          | 0/1 [00:00<?, ?it/s]

tokenizer.json:   0%|          | 0.00/11.4M [00:00<?, ?B/s]

 Done.


README.md:   0%|          | 0.00/586 [00:00<?, ?B/s]

  0%|          | 0/4 [00:00<?, ?it/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.58G [00:00<?, ?B/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.90G [00:00<?, ?B/s]

Done.
Saved merged model to https://huggingface.co/Ash2749/qwen_3_8B_acot_extes


In [2]:
from unsloth import FastLanguageModel
from unsloth.chat_templates import get_chat_template
from transformers import TextStreamer
import torch

# 1. Load model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Ash2749/Qwen3_4B_acot",  # Replace with your Hugging Face repo name
    max_seq_length=2048,
    dtype=torch.float16,               # Use torch.float16 if your model supports it
    load_in_4bit=False                 # Set to True only if your model was quantized to 4-bit
)

# 2. Enable optimized inference
FastLanguageModel.for_inference(model)

# 3. Apply appropriate chat template
tokenizer = get_chat_template(
    tokenizer,
    chat_template="qwen-3",  # Use the template your model was trained with
)

# 4. Prompt and user input
instruction_prompt = (
    "You are an expert in emotional psychology and you can accurately assess people's emotional states "
    "and help them process difficult situations using cognitive appraisal theory. Understand the listener's "
    "situation, follow the listener's point of view, and respond empathetically using primary appraisal, "
    "secondary appraisal, and reappraisal processes. Avoid dismissing their feelings, and instead, help them "
    "see the situation from a growth-oriented perspective."
)

user_input = "I am sad"

messages = [
    {"role": "system", "content": instruction_prompt},
    {"role": "user", "content": user_input}
]

# 5. Format text with chat template (non-tokenized for streaming)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Optional: enable simulated "thinking" pauses
)

# 6. Generate and stream response
inputs = tokenizer(text, return_tensors="pt").to(model.device)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

_ = model.generate(
    **inputs,
    max_new_tokens=500,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    streamer=streamer
)


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


FileNotFoundError: Ash2749/Qwen3_4B_acot/*.json (repository not found)

In [None]:
messages = [
    {"role" : "user", "content" : "Solve (x + 2)^2 = 0."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True, # Must add for generation
    enable_thinking = True, # Disable thinking
)

from transformers import TextStreamer
_ = model.generate(
    **tokenizer(text, return_tensors = "pt").to("cuda"),
    max_new_tokens = 1024, # Increase for longer outputs!
    temperature = 0.6, top_p = 0.95, top_k = 20, # For thinking
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

<think>
Okay, so I need to solve the equation (x + 2)^2 = 0. Hmm, let's see. I remember that when you have something squared equals zero, the only solution is when the inside part is zero. Because if you square any real number, it's either positive or zero. So if the square is zero, the original number must be zero. So maybe I can take the square root of both sides?

Wait, but the equation is (x + 2)^2 = 0. If I take the square root of both sides, that would give me x + 2 = 0, right? Because the square root of 0 is 0. Then solving for x would just be subtracting 2 from both sides, so x = -2. But wait, isn't that the only solution? Because squaring a number can't be negative, so the only way (x + 2)^2 is zero is if x + 2 is zero. So x must be -2. But since it's squared, does that mean there's a multiplicity here? Like, maybe x = -2 is a repeated root?

Let me think. If we expand the equation, (x + 2)^2 = x^2 + 4x + 4. So the original equation is x^2 + 4x + 4 = 0. To solve this quadratic

<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [None]:
model.save_pretrained("lora_model")  # Local saving
tokenizer.save_pretrained("lora_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/vocab.json',
 'lora_model/merges.txt',
 'lora_model/added_tokens.json',
 'lora_model/tokenizer.json')

Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:

In [None]:
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = 2048,
        load_in_4bit = True,
    )

### Saving to float16 for VLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [None]:
# Merge to 16bit
if False:
    model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: # Pushing to HF Hub
    model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False:
    model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: # Pushing to HF Hub
    model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False:
    model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: # Pushing to HF Hub
    model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

### GGUF / llama.cpp Conversion
To save to `GGUF` / `llama.cpp`, we support it natively now! We clone `llama.cpp` and we default save it to `q8_0`. We allow all methods like `q4_k_m`. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)

In [None]:
# Save to 8bit Q8_0
if False:
    model.save_pretrained_gguf("model", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False:
    model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False:
    model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: # Pushing to HF Hub
    model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False:
    model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: # Pushing to HF Hub
    model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "hf/model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "", # Get a token at https://huggingface.co/settings/tokens
    )

Now, use the `model-unsloth.gguf` file or `model-unsloth-Q4_K_M.gguf` file in llama.cpp or a UI based system like Jan or Open WebUI. You can install Jan [here](https://github.com/janhq/jan) and Open WebUI [here](https://github.com/open-webui/open-webui)

And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Some other links:
1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)
2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)
6. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!

<div class="align-center">
  <a href="https://unsloth.ai"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a>

  Join Discord if you need help + ⭐️ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐️
</div>
