### Installation

In [1]:
%%capture
!pip install pip3-autoremove
!pip install torch torchvision torchaudio xformers --index-url https://download.pytorch.org/whl/cu124
!pip install unsloth
# !pip install --upgrade transformers==4.52.3

### Unsloth

In [2]:
from unsloth import FastLanguageModel
import torch

fourbit_models = [
    "unsloth/Qwen3-1.7B-unsloth-bnb-4bit", # Qwen 14B 2x faster
    "unsloth/Qwen3-4B-unsloth-bnb-4bit",
    "unsloth/Qwen3-8B-unsloth-bnb-4bit",
    "unsloth/Qwen3-14B-unsloth-bnb-4bit",
    "unsloth/Qwen3-32B-unsloth-bnb-4bit",

    # 4bit dynamic quants for superior accuracy and low memory use
    "unsloth/gemma-3-12b-it-unsloth-bnb-4bit",
    "unsloth/Phi-4",
    "unsloth/Llama-3.1-8B",
    "unsloth/Llama-3.2-3B",
    "unsloth/orpheus-3b-0.1-ft-unsloth-bnb-4bit" # [NEW] We support TTS models!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-14B",
    max_seq_length = 512,   # Context length - can be longer, but uses more memory
    load_in_4bit = True,     # 4bit uses much less memory
    load_in_8bit = False,    # A bit more accurate, uses 2x memory
    full_finetuning = False, # We have full finetuning now!
    # token = "hf_...",      # use one if using gated models
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


2025-06-24 15:51:34.799282: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1750780295.032519      35 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1750780295.100397      35 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.6.5: Fast Qwen3 patching. Transformers: 4.51.3.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!



We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 32,           # Choose any number > 0! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 32,  # Best to choose alpha = rank or rank*2
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,   # We support rank stabilized LoRA
    loftq_config = None,  # And LoftQ
)

Unsloth 2025.6.5 patched 40 layers with 40 QKV layers, 40 O layers and 40 MLP layers.
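The adapter size follows from the rank and the shapes of the targeted projections: a LoRA pair on a `d_in x d_out` weight adds `r * (d_in + d_out)` parameters. The sketch below works that out for `r = 32` across 40 layers. The layer dimensions are assumptions taken from the public Qwen3-14B configuration, not from this notebook. Under those assumptions the total agrees with the 128,450,560 trainable parameters that Unsloth reports when training starts.

```python
# Assumed Qwen3-14B dimensions (public model config; not stated in this notebook).
HIDDEN = 5120
INTERMEDIATE = 17408
NUM_LAYERS = 40
Q_OUT = 40 * 128   # 40 attention heads x head_dim 128
KV_OUT = 8 * 128   # 8 key/value heads x head_dim 128

# (d_in, d_out) for every module listed in target_modules.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(rank, shapes=TARGET_SHAPES, num_layers=NUM_LAYERS):
    """Count LoRA parameters: each adapted weight adds A (d_in x r) and B (r x d_out)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(rank=32)
print(f"{trainable:,} trainable LoRA parameters")
```

Doubling `r` doubles the adapter size, which is the main memory lever when the rank is changed.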


<a name="Data"></a>
### Data Prep

In [4]:
import re
import ast
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("csv", data_files="/kaggle/input/intraml/train (1).csv", split="train")

# Split the dataset into train and test sets (no separate validation split is created)
train_testval = dataset.train_test_split(test_size=0.1, seed=3407)  # 90% train, 10% test

# Define the datasets
train_dataset = train_testval["train"]
test_dataset = train_testval["test"]

# Print dataset sizes for verification
print(f"Train dataset size: {len(train_dataset)}")
print(f"Test dataset size: {len(test_dataset)}")

def preprocess_dataset(examples):
    instructions, inputs, outputs, problems, solutions = [], [], [], [], []

    for instr, opt_str, ans in zip(examples["question"], examples["options"], examples["answer"]):
        try:
            instr = str(instr)
            opt_str = str(opt_str)
            ans = str(ans).strip().upper()

            # Fix malformed list
            opt_str_fixed = re.sub(r"'\s*'", "', '", opt_str)
            options_list = ast.literal_eval(opt_str_fixed)

            # Add A., B., C., ...
            labeled_options = [f"{chr(65+i)}. {opt.strip()}" for i, opt in enumerate(options_list)]

            # Correct answer
            ans_index = ord(ans) - ord('A')
            correct_answer = labeled_options[ans_index] if 0 <= ans_index < len(labeled_options) else "Invalid"

            # Compose formatted strings
            input_str = f"options: {labeled_options}"
            problem_str = (
                f'As a physics expert, solve the physics question which is in bangla language (but some units or terms maybe are in english). First translate the question into english with proper bangla to english physics terms. Then reason step by step in english. Then select the correct option:\n'
                f'Question: "{instr}", "options: {labeled_options}"'
            )

            # Append to lists
            instructions.append(instr)
            inputs.append(input_str)
            outputs.append(correct_answer)
            problems.append(problem_str)
            solutions.append(correct_answer)

        except Exception:
            # Malformed row: emit placeholders so every output column keeps the batch length
            instructions.append(instr)
            inputs.append("options: []")
            outputs.append("Invalid")
            problems.append("Invalid")
            solutions.append("Invalid")

    return {
        "instruction": instructions,
        "input": inputs,
        "output": outputs,
        "problem": problems,
        "Solution": solutions
    }

# Apply preprocessing to all splits
train_dataset = train_dataset.map(preprocess_dataset, batched=True)
test_dataset = test_dataset.map(preprocess_dataset, batched=True)

Generating train split: 0 examples [00:00, ? examples/s]

Train dataset size: 1350
Test dataset size: 150


Map:   0%|          | 0/1350 [00:00<?, ? examples/s]

Map:   0%|          | 0/150 [00:00<?, ? examples/s]
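The regex repair in `preprocess_dataset` matters because the `options` column has no commas between items. Python joins adjacent string literals, so parsing the raw string directly yields a single merged option. Here is a minimal standalone sketch of the same repair; the helper names `parse_options` and `label_options` are illustrative and do not appear in the notebook.

```python
import ast
import re


def parse_options(raw):
    """Insert the missing commas between quoted items, then parse the list."""
    fixed = re.sub(r"'\s*'", "', '", raw)
    return [opt.strip() for opt in ast.literal_eval(fixed)]


def label_options(options):
    """Prefix each option with A., B., C., ..."""
    return [f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options)]


raw = "['রাঙামাটি' 'বান্দরবান' 'ঢাকা' 'দিনাজপুর']"

# Without the repair, implicit string concatenation collapses the list to one item.
print(len(ast.literal_eval(raw)))

# With the repair, all four options survive and can be labelled.
print(label_options(parse_options(raw)))
```

The substitution assumes that no option is an empty string and that no option text contains a quote followed by whitespace and another quote; rows that break this assumption fall into the `except` branch of the original function.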

In [5]:
def generate_conversation(examples):
    problems  = examples["problem"]
    solutions = examples["Solution"]
    conversations = []
    for problem, Solution in zip(problems, solutions):
        conversations.append([
            {"role" : "user",      "content" : problem},
            {"role" : "assistant", "content" : Solution},
        ])
    return { "conversations": conversations, }

In [6]:
train_dataset = tokenizer.apply_chat_template(
    train_dataset.map(generate_conversation, batched = True)["conversations"],
    tokenize = False,
)

Map:   0%|          | 0/1350 [00:00<?, ? examples/s]

Let's see the first transformed row:

In [7]:
train_dataset[0]

'<|im_start|>user\nAs a physics expert, solve the physics question which is in bangla language (but some units or terms maybe are in english). First translate the question into english with proper bangla to english physics terms. Then reason step by step in english. Then select the correct option:\nQuestion: "আলোর অপবর্তন নিচের কোন কারণে ঘটে?", "options: [\'A. প্রতিফলন\', \'B. ব্যতিচার\', \'C. সমবর্তন\', \'D. প্রতিসরণ\']"<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\nB. ব্যতিচার<|im_end|>\n'

In [8]:
import pandas as pd
data =  pd.Series(train_dataset)

In [9]:
data[0]

'<|im_start|>user\nAs a physics expert, solve the physics question which is in bangla language (but some units or terms maybe are in english). First translate the question into english with proper bangla to english physics terms. Then reason step by step in english. Then select the correct option:\nQuestion: "আলোর অপবর্তন নিচের কোন কারণে ঘটে?", "options: [\'A. প্রতিফলন\', \'B. ব্যতিচার\', \'C. সমবর্তন\', \'D. প্রতিসরণ\']"<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\nB. ব্যতিচার<|im_end|>\n'

In [10]:
from datasets import Dataset
data.name = "text"
data = Dataset.from_pandas(pd.DataFrame(data))

In [11]:
data[0]

{'text': '<|im_start|>user\nAs a physics expert, solve the physics question which is in bangla language (but some units or terms maybe are in english). First translate the question into english with proper bangla to english physics terms. Then reason step by step in english. Then select the correct option:\nQuestion: "আলোর অপবর্তন নিচের কোন কারণে ঘটে?", "options: [\'A. প্রতিফলন\', \'B. ব্যতিচার\', \'C. সমবর্তন\', \'D. প্রতিসরণ\']"<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n\nB. ব্যতিচার<|im_end|>\n'}
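To make the structure of these strings easier to see, the following sketch rebuilds the same layout by hand for a single user/assistant pair. It is a hand-written approximation of the output printed above, not the tokenizer's actual template; `to_chatml` is a hypothetical helper. The empty `<think>` block is copied from the printed output, where the answer carries no reasoning text.

```python
def to_chatml(user_text, assistant_text):
    """Approximate the printed training string for one user turn and one assistant turn."""
    return (
        f"<|im_start|>user\n{user_text}<|im_end|>\n"
        f"<|im_start|>assistant\n<think>\n\n</think>\n\n{assistant_text}<|im_end|>\n"
    )


example = to_chatml("Question: ...", "B. ...")
print(example)
```

In the real pipeline `tokenizer.apply_chat_template` should remain the source of truth, since the template can change between model releases.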

The formatted conversations now live in a `Dataset` with a single `text` column, which is the field `SFTTrainer` reads below.

<a name="Train"></a>
### Train the model
Now let's use Hugging Face TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). This run trains for one full epoch (`num_train_epochs = 1`). For a quick smoke test, set `max_steps` to a small number (for example 60) instead.

In [12]:
from unsloth import is_bfloat16_supported

from trl import SFTTrainer, SFTConfig
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = data,
    eval_dataset = None, # Can set up evaluation!
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        bf16 = is_bfloat16_supported(),
        fp16 = not is_bfloat16_supported(),
        gradient_accumulation_steps = 4, # Use GA to mimic batch size!
        warmup_steps = 5,
        num_train_epochs = 1, # Set this for 1 full training run.
        # max_steps = 42,
        learning_rate = 2e-4, # Reduce to 2e-5 for long training runs
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        report_to = "none", # Use this for WandB etc
    ),
)

average_tokens_across_devices is set to True but it is invalid when world size is1. Turn it to False automatically.


Unsloth: Tokenizing ["text"] (num_proc=4):   0%|          | 0/1350 [00:00<?, ? examples/s]
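The configuration combines `warmup_steps = 5` with `lr_scheduler_type = "linear"`, so the learning rate ramps up to `2e-4` and then decays towards zero. The sketch below shows that shape. The default of 84 total steps is taken from the training banner of this run, and the function is an illustration of the usual linear-with-warmup rule rather than the trainer's own code, so values may differ by one step.

```python
def linear_warmup_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=84):
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr during warmup.
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 1, 5, 42, 84):
    print(step, linear_warmup_lr(step))
```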

In [13]:
# @title Show current memory stats
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

GPU = Tesla T4. Max memory = 14.741 GB.
11.898 GB of memory reserved.


Let's train the model! To resume a training run, set `trainer.train(resume_from_checkpoint = True)`

In [14]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,350 | Num Epochs = 1 | Total steps = 84
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 4 x 1) = 16
 "-____-"     Trainable parameters = 128,450,560/14,000,000,000 (0.92% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,1.8716
2,1.8759
3,1.9413
4,1.8897
5,1.6872
6,1.491
7,1.3285
8,1.0533
9,0.9319
10,0.8205
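The step count in the banner can be checked by hand from the numbers it prints: 1,350 examples, a per-device batch of 4 and 4 gradient accumulation steps. The sketch below is one plausible way to arrive at 84. Dropping the partial accumulation window at the end of the epoch is an assumption chosen because it reproduces the reported total; trainers differ on how they round that last window.

```python
import math


def optimizer_steps(num_examples, per_device_batch, grad_accum, num_devices=1, epochs=1):
    """Estimate optimizer steps, dropping any incomplete accumulation window (assumption)."""
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    return (batches_per_epoch // grad_accum) * epochs


effective_batch = 4 * 4 * 1  # per-device batch x accumulation steps x data-parallel GPUs
steps = optimizer_steps(1350, per_device_batch=4, grad_accum=4)
print(f"effective batch = {effective_batch}, optimizer steps = {steps}")
```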


In [15]:
# @title Show final memory and time stats
used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
used_memory_for_lora = round(used_memory - start_gpu_memory, 3)
used_percentage = round(used_memory / max_memory * 100, 3)
lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)
print(f"{trainer_stats.metrics['train_runtime']} seconds used for training.")
print(
    f"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training."
)
print(f"Peak reserved memory = {used_memory} GB.")
print(f"Peak reserved memory for training = {used_memory_for_lora} GB.")
print(f"Peak reserved memory % of max memory = {used_percentage} %.")
print(f"Peak reserved memory for training % of max memory = {lora_percentage} %.")

2295.9829 seconds used for training.
38.27 minutes used for training.
Peak reserved memory = 13.801 GB.
Peak reserved memory for training = 1.903 GB.
Peak reserved memory % of max memory = 93.623 %.
Peak reserved memory for training % of max memory = 12.91 %.


<a name="Inference"></a>
### Inference
Let's run the model via Unsloth native inference! According to the `Qwen-3` team, the recommended settings for reasoning inference are `temperature = 0.6, top_p = 0.95, top_k = 20`

For normal chat based inference, `temperature = 0.7, top_p = 0.8, top_k = 20`
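To show what these three knobs do, here is a simplified pure-Python sketch of temperature scaling followed by top-k and top-p filtering on a toy set of logits. It illustrates the general technique only; the function name and toy values are made up for this example, and real generation libraries apply the same ideas to tensors with additional edge-case handling.

```python
import math


def filtered_distribution(logits, temperature=0.7, top_k=20, top_p=0.8):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # Temperature scaling followed by a numerically stable softmax numerator.
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    weights = [math.exp(value - peak) for value in scaled]

    # Top-k: keep the k most likely tokens and renormalise over them.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:top_k]
    mass = sum(weights[i] for i in ranked)
    probs = {i: weights[i] / mass for i in ranked}

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise so the surviving tokens form a proper distribution to sample from.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


toy_logits = [2.0, 1.0, 0.5, 0.1]
print(filtered_distribution(toy_logits))
```

A lower temperature sharpens the distribution, so fewer tokens are needed to reach `top_p` and sampling becomes more deterministic.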

In [16]:
messages = [
    {"role" : "user", "content" : "As a physics expert, solve the physics question which is in bangla language (but some units or terms maybe are in english). First translate the question into english with proper bangla to english physics terms. Then reason step by step in english. Then select the correct option :\nQuestion: \"অপবর্তন এক বিশেষ ধরনের—\", \"options: ['A. সমবর্তন', 'B. প্রতিফলন', 'C. ব্যাতিচার', 'D. প্রতিসরণ']\""}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True, # Must add for generation
    enable_thinking = False, # Disable thinking
)

from transformers import TextStreamer
_ = model.generate(
    **tokenizer(text, return_tensors = "pt").to("cuda"),
    max_new_tokens = 256, # Increase for longer outputs!
    temperature = 0.7, top_p = 0.8, top_k = 20, # For non thinking
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

D. প্রতিসরণ<|im_end|>


In [17]:
test_dataset[0]

{'id': '59668f6f-6e6f-46f1-90d4-0235f3952bf3-128390',
 'question': 'আয়তনে বাংলাদেশের বৃহত্তম জেলা কোনটি?',
 'options': "['রাঙামাটি' 'বান্দরবান' 'ঢাকা' 'দিনাজপুর']",
 'answer': 'A',
 'instruction': 'আয়তনে বাংলাদেশের বৃহত্তম জেলা কোনটি?',
 'input': "options: ['A. রাঙামাটি', 'B. বান্দরবান', 'C. ঢাকা', 'D. দিনাজপুর']",
 'output': 'A. রাঙামাটি',
 'problem': 'As a physics expert, solve the physics question which is in bangla language (but some units or terms maybe are in english). First translate the question into english with proper bangla to english physics terms. Then reason step by step in english. Then select the correct option:\nQuestion: "আয়তনে বাংলাদেশের বৃহত্তম জেলা কোনটি?", "options: [\'A. রাঙামাটি\', \'B. বান্দরবান\', \'C. ঢাকা\', \'D. দিনাজপুর\']"',
 'Solution': 'A. রাঙামাটি'}

In [18]:
from transformers import TextStreamer
import torch
from tqdm import tqdm

model.eval()
model.to("cuda")

generated_outputs = []

# test_subset = test_dataset.select(range(10))  # keeps Dataset format with dict examples

test_subset = test_dataset  # full held-out split; use the commented line above for a 10-sample smoke test

for example in tqdm(test_subset):
    # Your prompt is in example['problem']
    messages = [{"role": "user", "content": example["problem"]}]
    
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )

    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=64,
            do_sample=False,  # greedy decoding; temperature/top_p/top_k would be ignored here
        )

    # Extract generated tokens only (skip prompt tokens)
    output_ids = outputs[0][inputs["input_ids"].shape[-1]:]
    generated_response = tokenizer.decode(output_ids, skip_special_tokens=True).strip()

    generated_outputs.append({
        "id": example["id"],
        "problem": example["problem"],
        "ground_truth": example["Solution"],
        "model_output": generated_response
    })

# (Optional) convert to dataframe for easier viewing
import pandas as pd
df = pd.DataFrame(generated_outputs)
print(df[["id", "model_output", "ground_truth"]])


100%|██████████| 150/150 [05:27<00:00,  2.18s/it]

                                              id  \
0    59668f6f-6e6f-46f1-90d4-0235f3952bf3-128390   
1     fb785d12-d042-48f0-81fb-30801bc8c7ba-27228   
2     36954207-f560-447e-ad48-3edc59a93a5d-34411   
3     d6e458b6-9567-435e-9ae1-20c017584b74-50794   
4     e43d17dc-e22b-4ed5-886a-661a0607b48c-71727   
..                                           ...   
145   63a1630e-a1f7-41d3-bfde-93dd93a0c021-66388   
146  d86f80c3-711e-4ad5-b218-f9acef82d243-127615   
147   c4ae031b-7d58-42e3-a959-29a198237cc7-59974   
148   de2d6086-7554-47b1-b97f-f78d422a0a8c-91061   
149  4ec422d9-0d45-4b8f-a17d-df5df3faf6e0-121850   

                            model_output                         ground_truth  
0                           B. বান্দরবান                          A. রাঙামাটি  
1                              A. 8.46 J                           D. 5.787 J  
2                               D. শক্তি                             D. শক্তি  
3                       B. π/1800 rads-1               




In [19]:
import pandas as pd
pd.DataFrame(generated_outputs).to_csv("test_predictions.csv", index=False)


In [20]:
import pandas as pd
from sklearn.metrics import classification_report, f1_score

# Load predictions from the CSV
df = pd.read_csv("test_predictions.csv")

# Extract the label letter from predictions and ground truth (e.g., "C. ব্যাতিচার" → "C")
def extract_label(text):
    if isinstance(text, str) and "." in text:
        return text.split(".")[0].strip().upper()
    return "X"  # fallback for malformed or missing labels

y_true = df["ground_truth"].apply(extract_label)
y_pred = df["model_output"].apply(extract_label)

# Print classification report
print("Classification Report:")
print(classification_report(y_true, y_pred, labels=["A", "B", "C", "D"], zero_division=0))

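# Print macro F1 score
# Note: `labels` is not restricted here, so any label outside A-D (such as the "X" fallback)
# is averaged in as an extra class. Pass labels=["A", "B", "C", "D"] to match the report's macro avg.
f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)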
print(f"\nMacro F1 Score: {f1:.4f}")


Classification Report:
              precision    recall  f1-score   support

           A       0.75      0.69      0.72        35
           B       0.65      0.73      0.69        44
           C       0.85      0.81      0.83        36
           D       0.68      0.66      0.67        35

   micro avg       0.72      0.72      0.72       150
   macro avg       0.73      0.72      0.72       150
weighted avg       0.73      0.72      0.72       150


Macro F1 Score: 0.5800
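The macro F1 printed here (0.58) is lower than the `macro avg` row of the report (0.72) even though both come from the same predictions. A likely explanation is that the report is restricted to `A` to `D`, while the `f1_score` call averages over every label that occurs, including any stray label such as the `"X"` fallback. The pure-Python sketch below reproduces that effect on four made-up predictions; it is an illustration, not the scikit-learn implementation.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for one class, returning 0.0 when the class has no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores over the given labels."""
    return sum(f1_for_label(y_true, y_pred, label) for label in labels) / len(labels)


# Toy data: one prediction could not be parsed and fell back to "X".
y_true = ["A", "B", "C", "D"]
y_pred = ["A", "B", "C", "X"]

restricted = macro_f1(y_true, y_pred, ["A", "B", "C", "D"])
all_seen = macro_f1(y_true, y_pred, sorted(set(y_true) | set(y_pred)))
print(f"restricted to A-D: {restricted:.2f}, over all seen labels: {all_seen:.2f}")
```

Counting how many predictions fall outside `A` to `D` is a quick way to confirm whether this is what happened in a given run.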


In [21]:
real_test_dataset = load_dataset("csv", data_files="/kaggle/input/intra-ml1/test (1).csv", split="train")

Generating train split: 0 examples [00:00, ? examples/s]

In [22]:
real_test_dataset[0]

{'id': 1,
 'question': 'কোনো বস্তুর চৌম্বকত্ব ধারকত্ব পরিমাপ করা হয়-',
 'options': "['চুম্বকনকারি বলা হয় ' 'সম্পৃক্ত দ্বারা ' 'আবিষ্ট চুম্বকত্ব দ্বারা '\n 'উপরের কোনোটিই নয় ']"}

In [23]:
import re
import ast

def preprocess_real_test(examples):
    instructions, inputs, problems = [], [], []

    for instr, opt_str in zip(examples["question"], examples["options"]):
        try:
            instr = str(instr)
            opt_str = str(opt_str)

            # Fix malformed list string: add missing commas
            opt_str_fixed = re.sub(r"'\s*'", "', '", opt_str)
            options_list = ast.literal_eval(opt_str_fixed)

            # Label options as A., B., C., ...
            labeled_options = [f"{chr(65+i)}. {opt.strip()}" for i, opt in enumerate(options_list)]

            input_str = f"options: {labeled_options}"

            problem_str = (
                f'As a physics expert, solve the physics question which is in bangla language (but some units or terms maybe are in english). First translate the question into english with proper bangla to english physics terms. Then reason step by step in english. Then select the correct option:\n'
                f'Question: "{instr}", "options: {labeled_options}"'
            )

            instructions.append(instr)
            inputs.append(input_str)
            problems.append(problem_str)

        except Exception:
            # Malformed row: emit placeholders so every output column keeps the batch length
            instructions.append(instr)
            inputs.append("options: []")
            problems.append("Invalid")

    return {
        "instruction": instructions,
        "input": inputs,
        "problem": problems
    }


In [24]:
real_test_dataset = real_test_dataset.map(preprocess_real_test, batched=True)

Map:   0%|          | 0/200 [00:00<?, ? examples/s]

In [25]:
real_test_dataset[0]

{'id': 1,
 'question': 'কোনো বস্তুর চৌম্বকত্ব ধারকত্ব পরিমাপ করা হয়-',
 'options': "['চুম্বকনকারি বলা হয় ' 'সম্পৃক্ত দ্বারা ' 'আবিষ্ট চুম্বকত্ব দ্বারা '\n 'উপরের কোনোটিই নয় ']",
 'instruction': 'কোনো বস্তুর চৌম্বকত্ব ধারকত্ব পরিমাপ করা হয়-',
 'input': "options: ['A. চুম্বকনকারি বলা হয়', 'B. সম্পৃক্ত দ্বারা', 'C. আবিষ্ট চুম্বকত্ব দ্বারা', 'D. উপরের কোনোটিই নয়']",
 'problem': 'As a physics expert, solve the physics question which is in bangla language (but some units or terms maybe are in english). First translate the question into english with proper bangla to english physics terms. Then reason step by step in english. Then select the correct option:\nQuestion: "কোনো বস্তুর চৌম্বকত্ব ধারকত্ব পরিমাপ করা হয়-", "options: [\'A. চুম্বকনকারি বলা হয়\', \'B. সম্পৃক্ত দ্বারা\', \'C. আবিষ্ট চুম্বকত্ব দ্বারা\', \'D. উপরের কোনোটিই নয়\']"'}

In [26]:
import pandas as pd
from tqdm import tqdm
import torch

model.eval()
model.to("cuda")

results = []

for example in tqdm(real_test_dataset):  # full test dataset
    messages = [{"role": "user", "content": example["problem"]}]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,
    )

    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=64,
            do_sample=False,  # greedy decoding; temperature/top_p/top_k would be ignored here
        )

    output_ids = outputs[0][inputs["input_ids"].shape[-1]:]
    generated_response = tokenizer.decode(output_ids, skip_special_tokens=True).strip()

    # Extract the label letter (A/B/C/D) from generated_response
    # Assumes generated_response format like "C. ব্যাতিচার"
    pred_label = generated_response.split('.')[0].strip().upper()

    results.append({
        "id": example["id"],
        "answer": pred_label
    })

# Save submission CSV
submission_df = pd.DataFrame(results)
submission_df.to_csv("submission.csv", index=False)

print("Submission file 'submission.csv' created successfully.")


100%|██████████| 200/200 [08:05<00:00,  2.43s/it]

Submission file 'submission.csv' created successfully.
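The submission loop takes everything before the first `.` as the answer, so a response without a period ends up in the `answer` column unchanged. A stricter parser is sketched below as one possible alternative. The helper name `extract_choice`, the accepted punctuation and the `fallback` argument are all assumptions made for illustration, not part of the original notebook.

```python
import re

# A leading A-D letter followed by ".", ")", ":" or the end of the string.
_CHOICE = re.compile(r"^\s*([A-D])\s*(?:[.):]|$)", re.IGNORECASE)


def extract_choice(text, fallback=None):
    """Return the option letter at the start of a response, or `fallback` if none is found."""
    match = _CHOICE.match(text or "")
    return match.group(1).upper() if match else fallback


for response in ["C. ব্যাতিচার", "b) 5.787 J", "D", "The answer is D"]:
    print(repr(response), "->", extract_choice(response))
```

Rows that return the fallback can then be logged or re-generated instead of being written to the CSV with an invalid label.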





<a name="Save"></a>
### Saving, loading finetuned models
To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.

**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

In [27]:
model.save_pretrained("lora_model")  # Local saving
tokenizer.save_pretrained("lora_model")
# model.push_to_hub("your_name/lora_model", token = "...") # Online saving
# tokenizer.push_to_hub("your_name/lora_model", token = "...") # Online saving

('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/vocab.json',
 'lora_model/merges.txt',
 'lora_model/added_tokens.json',
 'lora_model/tokenizer.json')

To load the LoRA adapters we just saved for inference, change `False` to `True`:

In [28]:
if False:
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = 2048,
        load_in_4bit = True,
    )

### Saving to float16 for vLLM

We also support saving to `float16` directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. We also allow `lora` adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

In [29]:
# Merge to 16bit
if False:
    model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: # Pushing to HF Hub
    model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False:
    model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: # Pushing to HF Hub
    model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False:
    model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: # Pushing to HF Hub
    model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")

### GGUF / llama.cpp Conversion
Saving to `GGUF` / `llama.cpp` is now supported natively! We clone `llama.cpp` and save to `q8_0` by default. All methods such as `q4_k_m` are allowed. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.

Some supported quant methods (full list on our [Wiki page](https://github.com/unslothai/unsloth/wiki#gguf-quantization-options)):
* `q8_0` - Fast conversion. High resource use, but generally acceptable.
* `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
* `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

[**NEW**] To finetune and auto export to Ollama, try our [Ollama notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)

In [30]:
# Save to 8bit Q8_0
if False:
    model.save_pretrained_gguf("model", tokenizer,)
# Remember to go to https://huggingface.co/settings/tokens for a token!
# And change hf to your username!
if False:
    model.push_to_hub_gguf("hf/model", tokenizer, token = "")

# Save to 16bit GGUF
if False:
    model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
if False: # Pushing to HF Hub
    model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "f16", token = "")

# Save to q4_k_m GGUF
if False:
    model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
if False: # Pushing to HF Hub
    model.push_to_hub_gguf("hf/model", tokenizer, quantization_method = "q4_k_m", token = "")

# Save to multiple GGUF options - much faster if you want multiple!
if False:
    model.push_to_hub_gguf(
        "hf/model", # Change hf to your username!
        tokenizer,
        quantization_method = ["q4_k_m", "q8_0", "q5_k_m",],
        token = "", # Get a token at https://huggingface.co/settings/tokens
    )

Now, use the `model-unsloth.gguf` file or `model-unsloth-Q4_K_M.gguf` file in llama.cpp or a UI based system like Jan or Open WebUI. You can install Jan [here](https://github.com/janhq/jan) and Open WebUI [here](https://github.com/open-webui/open-webui)

And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Some other links:
1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)
2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)
3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)
4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!

<div class="align-center">
  <a href="https://unsloth.ai"><img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="115"></a>
  <a href="https://discord.gg/unsloth"><img src="https://github.com/unslothai/unsloth/raw/main/images/Discord.png" width="145"></a>
  <a href="https://docs.unsloth.ai/"><img src="https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true" width="125"></a>

  Join Discord if you need help + ⭐️ <i>Star us on <a href="https://github.com/unslothai/unsloth">Github</a> </i> ⭐️
</div>
