# Text classification with Unsloth

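This modified Unsloth notebook fine-tunes an LLM on a text classification dataset stored as a CSV file. The original template expects the columns "text" and "label"; the example in this notebook reads the columns `Domain`, `Content`, and `Label`.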

### Added features:

- Trims the classification head to contain only the number tokens such as "1", "2", etc. This saves 1 GB of VRAM, lets you train the head without massive memory usage, and makes the start of the training session more stable.
- Only the last token in the sequence contributes to the loss, so the model doesn't waste its capacity trying to predict the input.
- Includes `group_by_length = True`, which speeds up training significantly when sequence lengths vary widely.
- Efficiently evaluates the accuracy on the validation set using batched inference
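
As a rough worked example of what trimming the head saves, the arithmetic below uses the shapes printed in the validation log further down (a 151936 x 1024 head for Qwen3-0.6B, trimmed to 5 rows). The 2 bytes per weight is an assumption (bfloat16). Gradients and optimizer state for a trainable full-size head are not counted, so treat the result as a lower bound; larger models have a wider head and save more.

```python
# Shapes taken from the log output in this notebook (Qwen3-0.6B-Base).
VOCAB_SIZE = 151936      # rows in the original lm_head
HIDDEN_DIM = 1024        # columns in lm_head
KEPT_TOKENS = 5          # tokens "0".."4" kept after trimming
BYTES_PER_WEIGHT = 2     # assumption: bfloat16 weights


def head_bytes(rows: int, cols: int, bytes_per_weight: int) -> int:
    """Memory needed to store a rows x cols weight matrix."""
    return rows * cols * bytes_per_weight


full_bytes = head_bytes(VOCAB_SIZE, HIDDEN_DIM, BYTES_PER_WEIGHT)
trimmed_bytes = head_bytes(KEPT_TOKENS, HIDDEN_DIM, BYTES_PER_WEIGHT)

print(f"full head:    {full_bytes / 2**20:.2f} MiB")
print(f"trimmed head: {trimmed_bytes / 2**10:.2f} KiB")
```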

### Update 4th of May 2025:

- Added support for more than 2 classes
- The classification head is now built back up to the original size after training, which avoids errors in external libraries.
- Made the batched inference part much faster and cleaner
- Changed model to Qwen 3
- Improved comments to explain the complicated parts

In [1]:
# needed as this function doesn't like it when the lm_head has its size changed
from unsloth import tokenizer_utils
def do_nothing(*args, **kwargs):
    pass
tokenizer_utils.fix_untrained_tokens = do_nothing

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


  from .autonotebook import tqdm as notebook_tqdm


Unsloth: Failed to patch Gemma3ForConditionalGeneration.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [3]:
import torch
major_version, minor_version = torch.cuda.get_device_capability()
print(f"Major: {major_version}, Minor: {minor_version}")
from datasets import load_dataset
import datasets
from trl import SFTTrainer
import pandas as pd
import numpy as np
import os
from unsloth import FastLanguageModel
from transformers import TrainingArguments, Trainer
import warnings
from typing import Any, Dict, List, Tuple, Union
from transformers import DataCollatorForLanguageModeling
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

Major: 8, Minor: 9


In [4]:
NUM_CLASSES = 4 # number of classes in the csv

max_seq_length = 24000 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+

# model_name = "unsloth/Qwen3-0.6B-Base";load_in_4bit = False
# model_name = "unsloth/Qwen3-1.7B-Base";load_in_4bit = False
model_name = "unsloth/Qwen3-4B-Base";load_in_4bit = False
# model_name = "Qwen3-4B-Base";load_in_4bit = False

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,load_in_4bit = load_in_4bit,
    max_seq_length = max_seq_length,
    dtype = dtype,
)

==((====))==  Unsloth 2025.5.7: Fast Qwen3 patching. Transformers: 4.51.3.
   \\   /|    NVIDIA GeForce RTX 4070 Laptop GPU. Num GPUs = 1. Max memory: 7.996 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu121. CUDA: 8.9. CUDA Toolkit: 12.1. Triton: 3.1.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00,  1.40s/it]
Some parameters are on the meta device because they were offloaded to the cpu.


We now trim the classification head so the model can only output the number tokens 0 to NUM_CLASSES and no other words. (The labels in this dataset run from 0 to NUM_CLASSES-1, so the token for NUM_CLASSES is never used, but keeping it makes everything simpler.)

In [None]:
number_token_ids = []
for i in range(0, NUM_CLASSES+1):
    number_token_ids.append(tokenizer.encode(str(i), add_special_tokens=False)[0])
# keep only the number tokens from lm_head
par = torch.nn.Parameter(model.lm_head.weight[number_token_ids, :])

old_shape = model.lm_head.weight.shape
old_size = old_shape[0]
print(par.shape)
print(old_shape)

model.lm_head.weight = par

reverse_map = {value: idx for idx, value in enumerate(number_token_ids)} # will be used later to convert an idx from the old tokenizer to the new lm_head
reverse_map
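
To see the trimming and the reverse map in isolation, here is a toy sketch that uses plain Python lists in place of the real `lm_head.weight` tensor. The vocabulary size, weight values, and token ids are all hypothetical; only the indexing pattern mirrors the cell above.

```python
# Toy stand-in for lm_head.weight: one row per vocabulary id (hypothetical values).
toy_vocab_size = 8
toy_weight = [[float(token_id), float(token_id) * 10] for token_id in range(toy_vocab_size)]

# Hypothetical ids of the digit tokens we want to keep.
toy_number_token_ids = [3, 4, 5]

# Keep only the selected rows, in the same order as the id list.
toy_trimmed = [toy_weight[token_id] for token_id in toy_number_token_ids]

# Map an original token id to its row index in the trimmed head.
toy_reverse_map = {token_id: idx for idx, token_id in enumerate(toy_number_token_ids)}

print(toy_trimmed)
print(toy_reverse_map)
```

Row `idx` of the trimmed head now scores the token whose original id is `toy_number_token_ids[idx]`, which is why labels have to be remapped before the loss is computed.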

In [None]:
from peft import LoftQConfig

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = [
        "lm_head", # can easily be trained because it now has a small size
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = True,  # We support rank stabilized LoRA
    # init_lora_weights = 'loftq',
    # loftq_config = LoftQConfig(loftq_bits = 4, loftq_iter = 1), # And LoftQ
)
print("trainable parameters:", sum(p.numel() for p in model.parameters() if p.requires_grad))
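
The effect of `use_rslora = True` can be illustrated with a little arithmetic. Standard LoRA scales the low-rank update by `lora_alpha / r`, while rank-stabilized LoRA uses `lora_alpha / sqrt(r)`. The sketch below also counts the parameters of a single adapter; the 1024 x 1024 layer size is a hypothetical example, not a shape read from the model.

```python
import math


def lora_scaling(lora_alpha: float, r: int, use_rslora: bool) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    return lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Parameters in one adapter: A is (r x in_features), B is (out_features x r)."""
    return r * (in_features + out_features)


print(lora_scaling(16, 16, use_rslora=False))  # 1.0
print(lora_scaling(16, 16, use_rslora=True))   # 4.0
print(lora_param_count(1024, 1024, r=16))      # 32768
```

With `r = 16` and `lora_alpha = 16`, the rank-stabilized variant applies a four times larger update than the standard scaling would, which is worth keeping in mind when choosing the learning rate.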

In [None]:
train_df = pd.read_csv(r"C:\Users\rizky\Documents\Projects\LLM-Finetuning\Malicious-Web\dataset\netpro_raw_25k_train.csv")
val_df = pd.read_csv(r"C:\Users\rizky\Documents\Projects\LLM-Finetuning\Malicious-Web\dataset\netpro_raw_25k_val.csv")

In [None]:
type(train_df), type(val_df)

In [None]:
print(len(train_df))

In [None]:
train_df

In [None]:
token_counts = [len(tokenizer.encode(x)) for x in train_df.Content]
# plot the token counts
a = plt.hist(token_counts, bins=30)

ClfC

In [None]:
CLASSIFY_CONTENT_ONLY = False  # True: classify on Content only; False: use Domain + Content

if CLASSIFY_CONTENT_ONLY:
    prompt = """You are an expert Website Classifier.

Website Content: {}

Classify the website based on its content into one of the following categories:
- 0: Benign (general info, news, safe entertainment, educational, marketplace, social media, etc.)
- 1: Gambling (betting, casino, lottery, real money games, judi, slot)
- 2: Pornography (explicit sexual content, adult themes, nudity, sexual, bokep)
- 3: Harmful (malware, cybercrime, illegal activities, firearms, extremism, drugs, narcotics, phishing, scams, counterfeit, hacking tools, stolen data markets, carding)

SOLUTION
The correct answer is: class {}"""

    def formatting_prompts_func(dataset_):
        texts = []
        for i in range(len(dataset_['Content'])):
            content_ = dataset_['Content'].iloc[i]
            label_ = dataset_['Label'].iloc[i]
            text = prompt.format(content_, label_)
            texts.append(text)
        return texts

else:
    prompt = """You are an expert Website Classifier.

Domain: {}
Website Content: {}

Classify the website based on its content into one of the following categories:
- 0: Benign (general info, news, safe entertainment, educational, marketplace, social media, etc.)
- 1: Gambling (betting, casino, lottery, real money games, judi, slot)
- 2: Pornography (explicit sexual content, adult themes, nudity, sexual, bokep)
- 3: Harmful (malware, cybercrime, illegal activities, firearms, extremism, drugs, narcotics, phishing, scams, counterfeit, hacking tools, stolen data markets, carding)

SOLUTION
The correct answer is: class {}"""

    def formatting_prompts_func(dataset_):
        texts = []
        for i in range(len(dataset_['Content'])):
            domain_ = dataset_['Domain'].iloc[i]
            content_ = dataset_['Content'].iloc[i]
            label_ = dataset_['Label'].iloc[i]
            text = prompt.format(domain_, content_, label_)
            texts.append(text)
        return texts

# apply formatting_prompts_func to train_df
train_df['text'] = formatting_prompts_func(train_df)
train_dataset = datasets.Dataset.from_pandas(train_df, preserve_index=False)
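
A minimal sketch of the formatting step, using a shortened template and two made-up rows instead of the CSV. It also shows that braces inside the website content are harmless, because `str.format` only parses the template and inserts the arguments verbatim. The names `demo_prompt` and `demo_rows` are hypothetical.

```python
# Shortened, hypothetical template with the same three placeholders.
demo_prompt = "Domain: {}\nWebsite Content: {}\n\nThe correct answer is: class {}"

# Hypothetical rows; the real notebook reads these from the CSV.
demo_rows = [
    {"Domain": "example.com", "Content": "welcome to {our} site", "Label": 0},
    {"Domain": "example.org", "Content": "sample content", "Label": 1},
]

# Fill the placeholders in order: domain, content, label.
demo_texts = [
    demo_prompt.format(row["Domain"], row["Content"], row["Label"])
    for row in demo_rows
]

print(demo_texts[0])
```

The label is the very last token of each text, which is what the last-token collator in the next cell relies on.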

In [None]:
train_dataset

In [None]:
train_dataset['text'][0]

In [None]:
# this custom collator makes it so the model trains only on the last token of the sequence. It also maps from the old tokenizer to the new lm_head indices
class DataCollatorForLastTokenLM(DataCollatorForLanguageModeling):
    def __init__(
        self,
        *args,
        mlm: bool = False,
        ignore_index: int = -100,
        **kwargs,
    ):
        super().__init__(*args, mlm=mlm, **kwargs)
        self.ignore_index = ignore_index

    def torch_call(self, examples: List[Union[List[int], Any, Dict[str, Any]]]) -> Dict[str, Any]:
        batch = super().torch_call(examples)

        for i in range(len(examples)):
            # Find the last non-padding token
            last_token_idx = (batch["labels"][i] != self.ignore_index).nonzero()[-1].item()
            # Set all labels to ignore_index except for the last token
            batch["labels"][i, :last_token_idx] = self.ignore_index
            # If the last token in the text is, for example, "2", then this was processed with the old tokenizer into number_token_ids[2]
            # But we don't actually want this because number_token_ids[2] could be something like 27, which is now undefined in the new lm_head. So we map it to the new lm_head index.
            # if this line gives you a keyerror then increase max_seq_length
            batch["labels"][i, last_token_idx] = reverse_map[ batch["labels"][i, last_token_idx].item() ]


        return batch
collator = DataCollatorForLastTokenLM(tokenizer=tokenizer)
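
The masking logic of the collator can be sketched on a single label sequence with plain Python lists. The ids 15 to 19 match the allowed tokens printed in the log further down; the other token ids and the helper name `keep_only_last_label` are hypothetical.

```python
IGNORE_INDEX = -100  # label value that the loss skips


def keep_only_last_label(labels, reverse_map, ignore_index=IGNORE_INDEX):
    """Return a copy of `labels` where only the last real token is supervised."""
    # Position of the last label that is not padding.
    last_idx = max(i for i, label in enumerate(labels) if label != ignore_index)
    masked = [ignore_index] * len(labels)
    # Remap the original token id to its row in the trimmed head.
    masked[last_idx] = reverse_map[labels[last_idx]]
    return masked


# Token id 17 stands for the digit "2"; the trailing -100 values are padding.
demo_reverse_map = {15: 0, 16: 1, 17: 2, 18: 3, 19: 4}
demo_labels = [101, 102, 103, 17, -100, -100]
print(keep_only_last_label(demo_labels, demo_reverse_map))
```

If a text is truncated so that its last token is not a digit, the lookup in `reverse_map` fails with a `KeyError`, which is the situation the comment in the collator warns about.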

<a name="Train"></a>
### Train the model
Now let's use Huggingface TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). This configuration trains for one full epoch (`num_train_epochs = 1`); for a quick test run, set `max_steps` to a small value such as 60 instead. We also support TRL's `DPOTrainer`!

In [None]:
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    max_seq_length = max_seq_length,
    dataset_num_proc = 1,
    packing = False, # not needed because group_by_length is True
    args = TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 1,
        warmup_steps = 10,
        learning_rate = 1e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs/NetPro-Qwen3-4B-ClfDC",
        num_train_epochs = 1,
        # report_to = "wandb",
        report_to = "none",
        group_by_length = True,
    ),
    data_collator=collator,
    dataset_text_field="text",
)
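
To illustrate why `group_by_length = True` helps, the sketch below counts padding tokens for batches formed in arrival order and for batches formed after sorting by length. The token counts are hypothetical, and a full sort is a simplification: the Trainer's sampler groups samples of similar length but keeps some randomness.

```python
def padding_tokens(lengths, batch_size):
    """Count pad tokens when each batch is padded to its longest sequence."""
    total = 0
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += sum(max(batch) - n for n in batch)
    return total


# Hypothetical token counts with a mix of short and long texts.
demo_lengths = [10, 200, 12, 190, 11, 210, 9, 205]

print(padding_tokens(demo_lengths, batch_size=2))          # 763
print(padding_tokens(sorted(demo_lengths), batch_size=2))  # 17
```

Every padding token still costs compute in the forward and backward pass, so grouping similar lengths removes most of that waste without packing several examples into one sequence.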

In [None]:
#@title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

In [None]:
trainer_stats = trainer.train()

In [None]:
#@title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory         /max_memory*100, 3)
lora_percentage = round(used_memory_for_lora/max_memory*100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.")
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

### Save Model

In [None]:
model.save_pretrained("model/NetPro-Qwen3-4B-ClfDC")  # Local saving
tokenizer.save_pretrained("model/NetPro-Qwen3-4B-ClfDC")

In [None]:
from dotenv import load_dotenv

load_dotenv() 
HF_TOKEN = os.getenv("HF_TOKEN")
if not HF_TOKEN:
    raise ValueError("HF_TOKEN not found in .env file")

# Now push to hub using the cleaned name and loaded token
model.push_to_hub("NetPro-Qwen3-4B-ClfDC", token=HF_TOKEN)
tokenizer.push_to_hub("NetPro-Qwen3-4B-ClfDC", token=HF_TOKEN)

In [None]:
# stop running all cells
1/0

### Inference

In [None]:
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
print()

#### Remake the old lm_head, with unused tokens given a -1000 bias and zero weights (improves compatibility with libraries like vLLM)

In [None]:
# Save the current (trimmed) lm_head and bias
trimmed_lm_head = model.lm_head.weight.data.clone()
trimmed_lm_head_bias = model.lm_head.bias.data.clone() if hasattr(model.lm_head, "bias") and model.lm_head.bias is not None else torch.zeros(len(number_token_ids), device=trimmed_lm_head.device)

# Create a new lm_head with shape [old_size, hidden_dim]
hidden_dim = trimmed_lm_head.shape[1]
new_lm_head = torch.full((old_size, hidden_dim), 0, dtype=trimmed_lm_head.dtype, device=trimmed_lm_head.device)
new_lm_head_bias = torch.full((old_size,), -1000.0, dtype=trimmed_lm_head_bias.dtype, device=trimmed_lm_head_bias.device)

# Fill in the weights and bias for the allowed tokens (number_token_ids)
for new_idx, orig_token_id in enumerate(number_token_ids):
    new_lm_head[orig_token_id] = trimmed_lm_head[new_idx]
    new_lm_head_bias[orig_token_id] = trimmed_lm_head_bias[new_idx]

# Update the model's lm_head weight and bias
with torch.no_grad():
    new_lm_head_module = torch.nn.Linear(hidden_dim, old_size, bias=True, device=model.device)
    new_lm_head_module.weight.data.copy_(new_lm_head)
    new_lm_head_module.bias.data.copy_(new_lm_head_bias)
    model.lm_head.modules_to_save["default"] = new_lm_head_module

print(f"Remade lm_head: shape = {model.lm_head.weight.shape}. Allowed tokens: {number_token_ids}")
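
The reason a bias of -1000 is enough can be sketched with a toy softmax in plain Python. Rows with zero weights always produce a logit equal to their bias, so every unused token ends up with a probability that underflows to zero. The vocabulary size, token ids, and logits below are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


toy_vocab_size = 8
toy_allowed_ids = [2, 5]          # hypothetical ids of the kept tokens
toy_trimmed_logits = [1.5, 0.5]   # hypothetical outputs of the trimmed head

# Unused tokens have zero weights, so their logit equals the bias of -1000.
full_logits = [-1000.0] * toy_vocab_size
for new_idx, orig_id in enumerate(toy_allowed_ids):
    full_logits[orig_id] = toy_trimmed_logits[new_idx]

probs = softmax(full_logits)
predicted_id = max(range(toy_vocab_size), key=probs.__getitem__)
print(predicted_id, sum(probs[i] for i in toy_allowed_ids))
```

The rebuilt head therefore behaves like the trimmed one under greedy decoding, while exposing the full vocabulary size that external libraries expect.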

Now if you closed the notebook kernel and want to reload the model:

In [3]:
!git clone https://huggingface.co/jordinia/NetPro-Qwen3-0.6B-ClfDC

Cloning into 'NetPro-Qwen3-0.6B-ClfDC'...


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (13/13), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 19 (delta 1), reused 0 (delta 0), pack-reused 6 (from 1)
Unpacking objects: 100% (19/19), 1.72 MiB | 1.25 MiB/s, done.


In [3]:
!git clone https://huggingface.co/jordinia/NetPro-Qwen3-1.7B-ClfDC

Cloning into 'NetPro-Qwen3-1.7B-ClfDC'...
Filtering content: 100% (2/2), 77.46 MiB | 6.49 MiB/s, done.


In [2]:
!git clone https://huggingface.co/jordinia/NetPro-Qwen3-4B-ClfDC

Cloning into 'NetPro-Qwen3-4B-ClfDC'...
Filtering content: 100% (2/2), 136.98 MiB | 2.48 MiB/s, done.


### Testing

In [None]:
from unsloth import FastLanguageModel
from peft import PeftModel
import torch

# --- Configuration (should match your training setup) ---
base_model_name = "unsloth/Qwen3-0.6B-Base" # The original base model
load_in_4bit_at_load_time = False # Matches your inference script
max_seq_length_at_load_time = 24000 # Matches your inference script
dtype_at_load_time = None # Matches your inference script

checkpoint_path = "./model/NetPro-Qwen3-0.6B-ClfDC"
NUM_CLASSES = 4 # Same as during training

# --- 1. Load the original base model ---
print(f"Loading base model: {base_model_name}...")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_model_name,
    max_seq_length=max_seq_length_at_load_time,
    dtype=dtype_at_load_time,
    load_in_4bit=load_in_4bit_at_load_time,
)
print("Base model loaded.")

# --- 2. Re-apply the lm_head modification (EXACTLY as done in training) ---
print("Modifying lm_head to match training setup...")
number_token_ids = []
for i in range(0, NUM_CLASSES+1):
    number_token_ids.append(tokenizer.encode(str(i), add_special_tokens=False)[0])
# keep only the number tokens from lm_head
par = torch.nn.Parameter(model.lm_head.weight[number_token_ids, :])

old_shape = model.lm_head.weight.shape
old_size = old_shape[0]
print(par.shape)
print(old_shape)

model.lm_head.weight = par

reverse_map = {value: idx for idx, value in enumerate(number_token_ids)} # will be used later to convert an idx from the old tokenizer to the new lm_head
reverse_map

# --- 3. Load the LoRA adapter from the specific checkpoint ---
# Now that the model's lm_head has the correct (shrunken) shape,
# PeftModel can load the adapter weights without a size mismatch.
print(f"Loading LoRA adapter from: {checkpoint_path}")
model = PeftModel.from_pretrained(
    model, # The base model WITH THE MODIFIED lm_head
    checkpoint_path,
    is_trainable=False
)
print("LoRA adapter loaded successfully.")

# --- lm head ---
# Save the current (trimmed) lm_head and bias
trimmed_lm_head = model.lm_head.weight.data.clone()
trimmed_lm_head_bias = model.lm_head.bias.data.clone() if hasattr(model.lm_head, "bias") and model.lm_head.bias is not None else torch.zeros(len(number_token_ids), device=trimmed_lm_head.device)

# Create a new lm_head with shape [old_size, hidden_dim]
hidden_dim = trimmed_lm_head.shape[1]
new_lm_head = torch.full((old_size, hidden_dim), 0, dtype=trimmed_lm_head.dtype, device=trimmed_lm_head.device)
new_lm_head_bias = torch.full((old_size,), -1000.0, dtype=trimmed_lm_head_bias.dtype, device=trimmed_lm_head_bias.device)

# Fill in the weights and bias for the allowed tokens (number_token_ids)
for new_idx, orig_token_id in enumerate(number_token_ids):
    new_lm_head[orig_token_id] = trimmed_lm_head[new_idx]
    new_lm_head_bias[orig_token_id] = trimmed_lm_head_bias[new_idx]

# Update the model's lm_head weight and bias
with torch.no_grad():
    new_lm_head_module = torch.nn.Linear(hidden_dim, old_size, bias=True, device=model.device)
    new_lm_head_module.weight.data.copy_(new_lm_head)
    new_lm_head_module.bias.data.copy_(new_lm_head_bias)
    model.lm_head.modules_to_save["default"] = new_lm_head_module

print(f"Remade lm_head: shape = {model.lm_head.weight.shape}. Allowed tokens: {number_token_ids}")

# --- 4. Prepare for inference ---
FastLanguageModel.for_inference(model) # Unsloth's optimization for inference
print("Model prepared for inference.")

# --- 5. Your Inference Prompt and Generation ---
prompt_template = """You are an expert Website Classifier.

Domain: {} 
Website Content: {} 

Classify the website based on its content into one of the following categories:
- 0: Benign (general info, news, safe entertainment, educational, marketplace, social media, etc.)
- 1: Gambling (betting, casino, lottery, real money games, judi, slot)
- 2: Pornography (explicit sexual content, adult themes, nudity, sexual, bokep)
- 3: Harmful (malware, cybercrime, illegal activities, firearms, extremism, drugs, narcotics, phishing, scams, counterfeit, hacking tools, stolen data markets, carding)

SOLUTION
The correct answer is: class """  # Note: Removed the final {} for inference

# Example values
website_domain_example = "example.com"
website_content_example = "sample content"

# Format the prompt for inference (model should generate the class number)
full_prompt_for_inference = prompt_template.format(website_domain_example, website_content_example)

inputs = tokenizer(full_prompt_for_inference, return_tensors="pt").to(model.device)

print("Generating output...")
outputs = model.generate(**inputs, max_new_tokens=1, use_cache=True, pad_token_id=tokenizer.eos_token_id) # Added pad_token_id
# For classification, you typically want deterministic output, so low/zero temperature:
# outputs = model.generate(**inputs, max_new_tokens=1, do_sample=False, pad_token_id=tokenizer.eos_token_id)

decoded_outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print("Full decoded output:", decoded_outputs)

# Extract just the newly generated token
generated_sequence = outputs[0]
input_length = inputs.input_ids.shape[1]
newly_generated_tokens = generated_sequence[input_length:]
predicted_class_token = tokenizer.decode(newly_generated_tokens, skip_special_tokens=True)

print(f"Predicted class token: '{predicted_class_token}'")
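
A small sketch of the post-processing step: slice off the prompt tokens, then turn the decoded text into a class id. The helper `parse_predicted_class` and the token ids are hypothetical. The range check is an added assumption that valid labels run from 0 to `num_classes - 1`, as in the prompt above.

```python
def parse_predicted_class(token_text: str, num_classes: int) -> int:
    """Convert the decoded token to a class id, or -1 if it is not a valid class."""
    try:
        value = int(token_text.strip())
    except ValueError:
        return -1
    return value if 0 <= value < num_classes else -1


# Hypothetical ids: the first three belong to the prompt, the last one was generated.
demo_output_ids = [101, 102, 103, 17]
demo_input_length = 3
demo_new_ids = demo_output_ids[demo_input_length:]

print(demo_new_ids)                                  # [17]
print(parse_predicted_class(" 2", num_classes=4))    # 2
print(parse_predicted_class("4", num_classes=4))     # -1
print(parse_predicted_class("abc", num_classes=4))   # -1
```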

### Validation

In [1]:
import pandas as pd
import torch
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
from tqdm import tqdm
from unsloth import FastLanguageModel
from peft import PeftModel

# --- Configuration (should match your training setup) ---
base_model_name = "unsloth/Qwen3-0.6B-Base" # The original base model
load_in_4bit_at_load_time = False # Matches your inference script
max_seq_length_at_load_time = 24000 # Matches your inference script
dtype_at_load_time = None # Matches your inference script
output_csv_filename = "output_1,6B_ClfDC.csv"
output_report_filename = "report_1,6B_ClfDC.txt"
val_filename = "dataset/test_balanced.csv"

checkpoint_path = "./model/NetPro-Qwen3-0.6B-ClfDC"
NUM_CLASSES = 4 # Same as during training

# --- 1. Load the original base model ---
print(f"Loading base model: {base_model_name}...")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=base_model_name,
    max_seq_length=max_seq_length_at_load_time,
    dtype=dtype_at_load_time,
    load_in_4bit=load_in_4bit_at_load_time,
)
print("Base model loaded.")

# --- 2. Re-apply the lm_head modification (EXACTLY as done in training) ---
print("Modifying lm_head to match training setup...")
number_token_ids = []
for i in range(0, NUM_CLASSES+1):
    number_token_ids.append(tokenizer.encode(str(i), add_special_tokens=False)[0])
# keep only the number tokens from lm_head
par = torch.nn.Parameter(model.lm_head.weight[number_token_ids, :])

old_shape = model.lm_head.weight.shape
old_size = old_shape[0]
print(par.shape)
print(old_shape)

model.lm_head.weight = par

reverse_map = {value: idx for idx, value in enumerate(number_token_ids)} # will be used later to convert an idx from the old tokenizer to the new lm_head
reverse_map

# --- 3. Load the LoRA adapter from the specific checkpoint ---
# Now that the model's lm_head has the correct (shrunken) shape,
# PeftModel can load the adapter weights without a size mismatch.
print(f"Loading LoRA adapter from: {checkpoint_path}")
model = PeftModel.from_pretrained(
    model, # The base model WITH THE MODIFIED lm_head
    checkpoint_path,
    is_trainable=False
)
print("LoRA adapter loaded successfully.")

# --- lm head ---
# Save the current (trimmed) lm_head and bias
trimmed_lm_head = model.lm_head.weight.data.clone()
trimmed_lm_head_bias = model.lm_head.bias.data.clone() if hasattr(model.lm_head, "bias") and model.lm_head.bias is not None else torch.zeros(len(number_token_ids), device=trimmed_lm_head.device)

# Create a new lm_head with shape [old_size, hidden_dim]
hidden_dim = trimmed_lm_head.shape[1]
new_lm_head = torch.full((old_size, hidden_dim), 0, dtype=trimmed_lm_head.dtype, device=trimmed_lm_head.device)
new_lm_head_bias = torch.full((old_size,), -1000.0, dtype=trimmed_lm_head_bias.dtype, device=trimmed_lm_head_bias.device)

# Fill in the weights and bias for the allowed tokens (number_token_ids)
for new_idx, orig_token_id in enumerate(number_token_ids):
    new_lm_head[orig_token_id] = trimmed_lm_head[new_idx]
    new_lm_head_bias[orig_token_id] = trimmed_lm_head_bias[new_idx]

# Update the model's lm_head weight and bias
with torch.no_grad():
    new_lm_head_module = torch.nn.Linear(hidden_dim, old_size, bias=True, device=model.device)
    new_lm_head_module.weight.data.copy_(new_lm_head)
    new_lm_head_module.bias.data.copy_(new_lm_head_bias)
    model.lm_head.modules_to_save["default"] = new_lm_head_module

print(f"Remade lm_head: shape = {model.lm_head.weight.shape}. Allowed tokens: {number_token_ids}")

# --- 4. Prepare for inference ---
FastLanguageModel.for_inference(model) # Unsloth's optimization for inference
print("Model prepared for inference.")

# --- 5. Prompt template and evaluation loop ---

prompt_template = """You are an expert Website Classifier.

Domain: {}
Website Content: {}

Classify the website based on its content into one of the following categories:
- 0: Benign (general info, news, safe entertainment, educational, marketplace, social media, etc.)
- 1: Gambling (betting, casino, lottery, real money games, judi, slot)
- 2: Pornography (explicit sexual content, adult themes, nudity, sexual, bokep)
- 3: Harmful (malware, cybercrime, illegal activities, firearms, extremism, drugs, narcotics, phishing, scams, counterfeit, hacking tools, stolen data markets, carding)

SOLUTION
The correct answer is: class """

# Load validation data
val_df = pd.read_csv(val_filename, encoding="utf-8")

# Store predictions
predicted_labels = []

for _, row in tqdm(val_df.iterrows(), total=len(val_df), desc="Predicting"):
    domain = row['Domain']
    content = row['Content']
    full_prompt_for_inference = prompt_template.format(domain, content)
    inputs = tokenizer(full_prompt_for_inference, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=1, use_cache=True, pad_token_id=tokenizer.eos_token_id)
    generated_sequence = outputs[0]
    input_length = inputs.input_ids.shape[1]
    newly_generated_tokens = generated_sequence[input_length:]
    predicted_class_token = tokenizer.decode(newly_generated_tokens, skip_special_tokens=True)
    try:
        predicted_class_int = int(predicted_class_token.strip())
    except Exception:
        predicted_class_int = -1  # or any invalid class
    predicted_labels.append(predicted_class_int)

# True labels
true_labels = val_df['Label']

# Evaluation
accuracy = accuracy_score(true_labels, predicted_labels)
precision = precision_score(true_labels, predicted_labels, average='weighted', zero_division=0)
recall = recall_score(true_labels, predicted_labels, average='weighted', zero_division=0)
f1 = f1_score(  true_labels, predicted_labels, average='weighted', zero_division=0)
report = classification_report(true_labels, predicted_labels)

print("Evaluation Metrics:")
print(f"Accuracy : {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall   : {recall:.4f}")
print(f"F1 Score : {f1:.4f}")
print("\nDetailed classification report:\n")
print(report)

# Save metrics to .txt
with open(output_report_filename, "w", encoding="utf-8") as f:
    f.write("Evaluation Metrics:\n")
    f.write(f"Accuracy : {accuracy:.4f}\n")
    f.write(f"Precision: {precision:.4f}\n")
    f.write(f"Recall   : {recall:.4f}\n")
    f.write(f"F1 Score : {f1:.4f}\n\n")
    f.write("Detailed classification report:\n")
    f.write(report)

# Add predictions to DataFrame and export
val_df['predicted_label'] = predicted_labels
val_df.to_csv(output_csv_filename, index=False)
print(f"\nClassification complete. Output saved to '{output_csv_filename}' and '{output_report_filename}'")

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
Loading base model: unsloth/Qwen3-0.6B-Base...


  GPU_BUFFERS = tuple([torch.empty(2*256*2048, dtype = dtype, device = f"cuda:{i}") for i in range(n_gpus)])


==((====))==  Unsloth 2025.4.8: Fast Qwen3 patching. Transformers: 4.51.3.
   \\   /|    NVIDIA GeForce RTX 3090. Num GPUs = 1. Max memory: 24.0 GB. Platform: Windows.
O^O/ \_/ \    Torch: 2.7.0+cu126. CUDA: 8.6. CUDA Toolkit: 12.6. Triton: 3.3.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.30. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Base model loaded.
Modifying lm_head to match training setup...
torch.Size([5, 1024])
torch.Size([151936, 1024])
Loading LoRA adapter from: ./model/NetPro-Qwen3-0.6B-ClfDC
LoRA adapter loaded successfully.
Remade lm_head: shape = torch.Size([151936, 1024]). Allowed tokens: [15, 16, 17, 18, 19]
Model prepared for inference.


Predicting: 100%|██████████| 992/992 [01:16<00:00, 13.05it/s]

Evaluation Metrics:
Accuracy : 0.9133
Precision: 0.9188
Recall   : 0.9133
F1 Score : 0.9123

Detailed classification report:

              precision    recall  f1-score   support

           0       0.85      0.95      0.90       248
           1       0.97      0.98      0.98       248
           2       0.88      0.95      0.91       248
           3       0.97      0.78      0.87       248

    accuracy                           0.91       992
   macro avg       0.92      0.91      0.91       992
weighted avg       0.92      0.91      0.91       992


Classification complete. Output saved to 'output_1,6B_ClfDC.csv' and 'report_1,6B_ClfDC.txt'
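
As a sanity check on the report above, the weighted averages can be recomputed from the per-class rows. This sketch copies the two-decimal values from the report, so the results only agree with the four-decimal metrics up to rounding. The helper name `weighted_average` is hypothetical.

```python
# Per-class values copied from the classification report (rounded to 2 decimals).
report_rows = {
    0: {"precision": 0.85, "recall": 0.95, "support": 248},
    1: {"precision": 0.97, "recall": 0.98, "support": 248},
    2: {"precision": 0.88, "recall": 0.95, "support": 248},
    3: {"precision": 0.97, "recall": 0.78, "support": 248},
}


def weighted_average(rows, metric):
    """Average a per-class metric, weighting each class by its support."""
    total_support = sum(row["support"] for row in rows.values())
    return sum(row[metric] * row["support"] for row in rows.values()) / total_support


weighted_precision = weighted_average(report_rows, "precision")
weighted_recall = weighted_average(report_rows, "recall")
print(f"weighted precision: {weighted_precision:.4f}")
print(f"weighted recall:    {weighted_recall:.4f}")
```

Because every class has the same support here, the weighted and macro averages coincide, and weighted recall equals overall accuracy.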



