# Fine-Tuning LLaMA 3.1-8B-Instruct on Bengali Empathetic Conversations
Complete pipeline with all fixes applied. Runs on 2×T4 GPUs without OOM or bf16 errors.

## 1. Installation

In [1]:
%%capture
import os

!pip install pip3-autoremove
!pip install torch torchvision torchaudio xformers --index-url https://download.pytorch.org/whl/cu128
!pip install unsloth
!pip install transformers==4.56.2
!pip install --no-deps trl==0.22.2
!pip install datasets evaluate nltk rouge-score sacrebleu
!pip install pandas numpy matplotlib seaborn

## 2. Import Dependencies and Setup

In [2]:
import torch
import json
import pandas as pd
import numpy as np
from pathlib import Path
from datetime import datetime
from typing import Dict, List, Tuple
from dataclasses import dataclass, asdict

from unsloth import FastLanguageModel
from transformers import TextStreamer
from trl import SFTConfig, SFTTrainer
from datasets import Dataset
import evaluate

# Critical for multi-GPU + T4 support
from accelerate import Accelerator
accelerator = Accelerator()
device = accelerator.device
print(f"Using {torch.cuda.device_count()} GPU(s) - T4 compatible mode (fp16)")

LOG_DIR = Path("experiment_logs")
LOG_DIR.mkdir(exist_ok=True)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


2025-12-03 13:21:32.266281: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1764768092.496357      20 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1764768092.557853      20 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


🦥 Unsloth Zoo will now patch everything to make training faster!
Using 2 GPU(s) - T4 compatible mode (fp16)

## 3. Database Schema Implementation

In [3]:
class ExperimentDatabase:
    def __init__(self, db_path: str = "experiment_logs/llama_experiments.db"):
        import sqlite3
        self.db_path = db_path
        self.conn = sqlite3.connect(db_path)
        cursor = self.conn.cursor()
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS LLAMAExperiments (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                model_name TEXT,
                lora_config TEXT,
                train_loss REAL,
                val_loss REAL,
                metrics TEXT,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
            )
        """)
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS GeneratedResponses (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                experiment_id INTEGER,
                input_text TEXT,
                response_text TEXT,
                FOREIGN KEY (experiment_id) REFERENCES LLAMAExperiments(id)
            )
        """)
        self.conn.commit()

    def log_experiment(self, model_name, lora_config, train_loss, val_loss, metrics):
        import json
        cursor = self.conn.cursor()
        cursor.execute("INSERT INTO LLAMAExperiments (model_name, lora_config, train_loss, val_loss, metrics) VALUES (?, ?, ?, ?, ?)",
                      (model_name, json.dumps(lora_config), train_loss, val_loss, json.dumps(metrics)))
        self.conn.commit()
        return cursor.lastrowid

    def log_response(self, exp_id, inp, resp):
        cursor = self.conn.cursor()
        cursor.execute("INSERT INTO GeneratedResponses (experiment_id, input_text, response_text) VALUES (?, ?, ?)",
                      (exp_id, inp, resp))
        self.conn.commit()

    def close(self):
        self.conn.close()
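
As a quick check of the schema above, the following sketch logs one experiment and one generated response, then reads them back with a join. The table and column names are taken from `ExperimentDatabase`; the sample values and the in-memory database are illustrative assumptions, not part of the original run.

```python
import json
import sqlite3

# In-memory database so the sketch leaves no file behind (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE LLAMAExperiments (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        model_name TEXT, lora_config TEXT,
        train_loss REAL, val_loss REAL, metrics TEXT,
        timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE GeneratedResponses (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        experiment_id INTEGER, input_text TEXT, response_text TEXT,
        FOREIGN KEY (experiment_id) REFERENCES LLAMAExperiments(id)
    );
""")

# Hypothetical sample values; JSON columns are stored as text, as in log_experiment.
cur = conn.execute(
    "INSERT INTO LLAMAExperiments (model_name, lora_config, train_loss, val_loss, metrics) "
    "VALUES (?, ?, ?, ?, ?)",
    ("demo-model", json.dumps({"r": 32, "lora_alpha": 16}), 0.75, None, json.dumps({"bleu": 0.5})),
)
exp_id = cur.lastrowid
conn.execute(
    "INSERT INTO GeneratedResponses (experiment_id, input_text, response_text) VALUES (?, ?, ?)",
    (exp_id, "sample input", "sample response"),
)
conn.commit()

# Join each response back to the experiment that produced it.
rows = conn.execute("""
    SELECT e.model_name, e.lora_config, r.input_text, r.response_text
    FROM GeneratedResponses AS r
    JOIN LLAMAExperiments AS e ON e.id = r.experiment_id
""").fetchall()

for model_name, lora_json, inp, resp in rows:
    print(model_name, json.loads(lora_json)["r"], inp, resp)
conn.close()
```

Storing the LoRA configuration and metrics as JSON text keeps the schema fixed while the set of hyperparameters changes between experiments; the cost is that filtering on a single hyperparameter requires decoding the JSON first.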

## 4. Configuration Classes

In [4]:
@dataclass
class LoRAConfig:
    r: int = 32
    lora_alpha: int = 16
    lora_dropout: float = 0.05
    bias: str = "none"
    target_modules: List[str] = None
    use_gradient_checkpointing: str = "unsloth"

    def __post_init__(self):
        if self.target_modules is None:
            self.target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                                 "gate_proj", "up_proj", "down_proj"]

    def to_dict(self): return asdict(self)

@dataclass
class TrainingConfig:
    max_seq_length: int = 2048  # matches the 2048-token truncation used in DatasetProcessor and Evaluator
    per_device_train_batch_size: int = 2
    per_device_eval_batch_size: int = 2
    gradient_accumulation_steps: int = 8
    warmup_steps: int = 10
    max_steps: int = 200
    learning_rate: float = 2e-4
    logging_steps: int = 1
    save_steps: int = 100
    eval_steps: int = 50
    output_dir: str = "outputs"
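
To make the effect of `r` and `target_modules` concrete, the sketch below counts the parameters these LoRA settings add. It assumes the layer shapes of the published Llama 3.1 8B configuration (hidden size 4096, MLP size 14336, key/value projection size 1024, 32 transformer blocks), which are not stated in this notebook; `lora_params` is a hypothetical helper.

```python
# Layer shapes assumed from the published Llama 3.1 8B configuration
# (not stated in this notebook).
HIDDEN, MLP, KV, NUM_LAYERS = 4096, 14336, 1024, 32
R, LORA_ALPHA = 32, 16  # values from LoRAConfig above

# (input_dim, output_dim) of each target module inside one transformer block
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV),
    "v_proj": (HIDDEN, KV),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

def lora_params(d_in, d_out, r):
    """LoRA adds two matrices, A (r x d_in) and B (d_out x r), per adapted layer."""
    return r * (d_in + d_out)

per_block = sum(lora_params(d_in, d_out, R) for d_in, d_out in shapes.values())
total = per_block * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update in standard LoRA

print(f"per block: {per_block:,}")
print(f"total:     {total:,}")
print(f"scaling:   {scaling}")
```

Under these assumptions the total is 83,886,080, which matches the trainable-parameter count Unsloth prints in the training log. With `lora_alpha=16` and `r=32` the update is scaled by 0.5.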

## 5. Data Processor Class

In [5]:
class DatasetProcessor:
    def __init__(self, tokenizer, max_seq_length=2048):
        self.tokenizer = tokenizer
        self.max_seq_length = max_seq_length

    def format_prompt(self, situation, title, message, response=""):
        # System prompt: "You are an empathetic Bengali assistant."
        system = "আপনি একজন সহানুভূতিশীল বাংলা সহকারী।"
        # User turn labels: topic, situation, feeling
        user = f"বিষয়: {situation}\nপরিস্থিতি: {title}\nঅনুভূতি: {message}"
        return (f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n{system}<|eot_id|>"
                f"<|start_header_id|>user<|end_header_id|>\n{user}<|eot_id|>"
                f"<|start_header_id|>assistant<|end_header_id|>\n{response}<|eot_id|>")

    def load_bengali_empathetic_dataset(self):
        try:
            csv_path = "/kaggle/input/bengali-empathetic-conversations-corpus/BengaliEmpatheticConversationsCorpus .csv"
            df = pd.read_csv(csv_path)
            df = df.rename(columns={"Topics":"situation", "Question-Title":"title",
                                  "Questions":"message", "Answers":"response"})
        except FileNotFoundError:
            # Fall back to a tiny synthetic dataset when the Kaggle CSV is not mounted
            print("Using dummy data...")
            data = {"situation":["দুঃখ"]*100, "title":["ব্যর্থতা"]*100,
                    "message":["আমি খুব হতাশ"]*100, "response":["আমি বুঝতে পারছি..."]*100}
            df = pd.DataFrame(data)

        df["text"] = df.apply(lambda row: self.format_prompt(row.situation, row.title, row.message, row.response), axis=1)
        dataset = Dataset.from_pandas(df)
        split = dataset.train_test_split(test_size=0.2, seed=42)
        val_test = split["test"].train_test_split(test_size=0.5, seed=42)
        return split["train"], val_test["train"], val_test["test"]
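
The two chained `train_test_split` calls give an approximately 80/10/10 split. The sketch below reproduces the arithmetic, assuming the test portion is rounded up (the scikit-learn convention, which `datasets` is assumed to follow here). The corpus size of 38,233 rows is inferred from the 30,586 training and 3,823 validation examples shown in the training log, not stated in the notebook; `split_sizes` is a hypothetical helper.

```python
import math

def split_sizes(n_rows, test_size):
    """Return (train, test) sizes, rounding the test portion up (assumed convention)."""
    n_test = math.ceil(n_rows * test_size)
    return n_rows - n_test, n_test

n_rows = 38_233  # inferred from the logged split sizes, not stated in the notebook

# First split: 80% train, 20% held out.
n_train, n_holdout = split_sizes(n_rows, 0.2)
# Second split: the held-out rows are halved into validation and test.
n_val, n_test = split_sizes(n_holdout, 0.5)

print(n_train, n_val, n_test)
```

Because both calls use `seed=42`, the same rows land in each split on every run, so the baseline and post-training evaluations are scored on the same test set.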

## 6. Evaluator Class

In [6]:
class Evaluator:
    def __init__(self):
        self.rouge = evaluate.load('rouge')
        self.bleu = evaluate.load('sacrebleu')

    def calculate_perplexity(self, model, tokenizer, dataset, max_samples=100):
        model.eval()
        total_loss = 0.0
        total_tokens = 0
        samples = dataset.select(range(min(max_samples, len(dataset))))
        with torch.no_grad():
            for ex in samples:
                inputs = tokenizer(ex["text"], return_tensors="pt", truncation=True, max_length=2048).to(device)
                labels = inputs["input_ids"].clone()
                loss = model(**inputs, labels=labels).loss.item()
                tokens = inputs["attention_mask"].sum().item()
                total_loss += loss * tokens
                total_tokens += tokens
        return torch.exp(torch.tensor(total_loss / total_tokens)).item()

    def calculate_bleu_rouge(self, preds, refs):
        rouge = self.rouge.compute(predictions=preds, references=refs)
        bleu = self.bleu.compute(predictions=preds, references=[[r] for r in refs])["score"]
        return {"rouge1": rouge["rouge1"], "rouge2": rouge["rouge2"], "rougeL": rouge["rougeL"], "bleu": bleu}

    def generate_evaluation_responses(self, model, tokenizer, dataset, num_samples=20):
        FastLanguageModel.for_inference(model)
        preds, refs = [], []
        samples = dataset.select(range(min(num_samples, len(dataset))))
        for ex in samples:
            prompt = ex["text"].split("<|start_header_id|>assistant<|end_header_id|>")[0] + "<|start_header_id|>assistant<|end_header_id|>\n"
            ref = ex["text"].split("<|start_header_id|>assistant<|end_header_id|>")[1].split("<|eot_id|>")[0].strip()
            inputs = tokenizer([prompt], return_tensors="pt").to(device)
            output = model.generate(**inputs, max_new_tokens=128, temperature=0.7, do_sample=True)
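            # Decode only the newly generated tokens so prompt text never leaks into the prediction
            new_tokens = output[0][inputs["input_ids"].shape[1]:]
            pred = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()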
            preds.append(pred)
            refs.append(ref)
        return preds, refs

    def comprehensive_evaluation(self, model, tokenizer, test_dataset):
        print("\n=== Starting Evaluation ===")
        ppl = self.calculate_perplexity(model, tokenizer, test_dataset)
        print(f"Perplexity: {ppl:.2f}")
        preds, refs = self.generate_evaluation_responses(model, tokenizer, test_dataset)
        scores = self.calculate_bleu_rouge(preds, refs)
        scores["perplexity"] = ppl
        for k, v in scores.items():
            print(f"{k}: {v:.4f}")
        return scores, preds, refs
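
The ROUGE scores of exactly 0.0000 reported later in this notebook are consistent with a tokenization issue rather than with empty predictions: the default tokenizer in the `rouge_score` package lowercases the text and keeps only ASCII letters and digits, so Bengali strings reduce to no tokens at all. The sketch below mimics that behaviour with a regular expression (an approximation, not the library's code) and contrasts it with whitespace tokenization. `ascii_tokenize`, `whitespace_tokenize`, and `rouge1_f1` are hypothetical helpers, and the two sentences are made-up examples.

```python
import re
from collections import Counter

def ascii_tokenize(text):
    """Approximation of an ASCII-only tokenizer: keep runs of a-z and 0-9."""
    return re.findall(r"[a-z0-9]+", text.lower())

def whitespace_tokenize(text):
    """Split on whitespace, which preserves Bengali words."""
    return text.split()

def rouge1_f1(pred, ref, tokenize):
    """Unigram-overlap F1 between a prediction and a reference."""
    pred_counts = Counter(tokenize(pred))
    ref_counts = Counter(tokenize(ref))
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

pred = "আমি খুব হতাশ"
ref = "আমি খুব দুঃখ পেয়েছি"

print(ascii_tokenize(ref))                         # Bengali has no ASCII alphanumerics
print(rouge1_f1(pred, ref, ascii_tokenize))        # no tokens, so no overlap
print(rouge1_f1(pred, ref, whitespace_tokenize))   # two shared words out of 3 and 4
```

If this is indeed the cause, supplying a whitespace-based tokenizer to the metric would be one way to obtain meaningful scores. The `evaluate` ROUGE wrapper appears to accept a `tokenizer` callable in `compute`, but that is untested here.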

## 7. Main Fine-Tuner Class

In [7]:
class LLAMAFineTuner:
    def __init__(self):
        self.lora_config = LoRAConfig()
        self.train_config = TrainingConfig()
        self.model = None
        self.tokenizer = None

    def load_model(self):
        print("\nLoading model - 4bit + device_map=auto (uses both T4 GPUs)")
        model, tokenizer = FastLanguageModel.from_pretrained(
            model_name="unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
            max_seq_length=self.train_config.max_seq_length,
            dtype=None,
            load_in_4bit=True,
            device_map="auto",
        )
        model = FastLanguageModel.get_peft_model(
            model,
            r=self.lora_config.r,
            target_modules=self.lora_config.target_modules,
            lora_alpha=self.lora_config.lora_alpha,
            lora_dropout=self.lora_config.lora_dropout,
            bias=self.lora_config.bias,
            use_gradient_checkpointing=self.lora_config.use_gradient_checkpointing,
            random_state=3407,
        )
        self.model = model
        self.tokenizer = tokenizer

    def prepare_dataset(self):
        processor = DatasetProcessor(self.tokenizer, self.train_config.max_seq_length)
        train, val, test = processor.load_bengali_empathetic_dataset()
        return train, val, test

    def train(self, train_ds, val_ds):
        args = SFTConfig(
            per_device_train_batch_size=self.train_config.per_device_train_batch_size,
            per_device_eval_batch_size=self.train_config.per_device_eval_batch_size,
            gradient_accumulation_steps=self.train_config.gradient_accumulation_steps,
            warmup_steps=self.train_config.warmup_steps,
            max_steps=self.train_config.max_steps,
            learning_rate=self.train_config.learning_rate,
            logging_steps=self.train_config.logging_steps,
            save_steps=self.train_config.save_steps,
            eval_steps=self.train_config.eval_steps,
            output_dir=self.train_config.output_dir,
            optim="adamw_8bit",
            seed=3407,
            report_to="none",
            fp16=True,          # T4 compatible
            bf16=False,         # Disabled for T4
            packing=False,       # Memory efficient
            max_seq_length=self.train_config.max_seq_length,
            ddp_find_unused_parameters=False,
        )

        trainer = SFTTrainer(
            model=self.model,
            tokenizer=self.tokenizer,
            train_dataset=train_ds,
            eval_dataset=val_ds,
            dataset_text_field="text",
            args=args,
        )
        # SFTTrainer creates and manages its own Accelerator internally,
        # so the trainer is used directly without accelerator.prepare().
        print("Starting training...")
        stats = trainer.train()
        return stats

    def evaluate(self, test_ds):
        evaluator = Evaluator()
        return evaluator.comprehensive_evaluation(self.model, self.tokenizer, test_ds)

    def save_model(self, path="bengali_empathetic_llama"):
        self.model.save_pretrained_merged(path, self.tokenizer, save_method="merged_16bit")
        print(f"Model saved to {path}")
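
The batch-size settings passed to `SFTConfig` determine how much of the training set a 200-step run actually sees. The sketch below works through that arithmetic using the values from `TrainingConfig` and the figures reported in the training log (30,586 training examples, one data-parallel GPU); the variable names are illustrative.

```python
per_device_batch = 2         # TrainingConfig.per_device_train_batch_size
grad_accum_steps = 8         # TrainingConfig.gradient_accumulation_steps
data_parallel_gpus = 1       # as reported in the training log, despite two visible GPUs
max_steps = 200              # TrainingConfig.max_steps
num_train_examples = 30_586  # from the training log

# Examples contributing to one optimizer step.
effective_batch = per_device_batch * grad_accum_steps * data_parallel_gpus
# Examples consumed over the whole run, and the share of one epoch that represents.
examples_seen = effective_batch * max_steps
epoch_fraction = examples_seen / num_train_examples

print(effective_batch, examples_seen, f"{epoch_fraction:.1%}")
```

With `device_map="auto"` the model layers are spread across both GPUs rather than replicated, which is consistent with the log reporting a single data-parallel worker. The run therefore covers roughly a tenth of one epoch; raising `max_steps` is the direct way to see more of the data.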

## 8. Complete Training Pipeline with Baseline

In [8]:
def run_complete_pipeline():
    db = ExperimentDatabase()
    try:
        finetuner = LLAMAFineTuner()
        finetuner.load_model()
        train_ds, val_ds, test_ds = finetuner.prepare_dataset()

        print("\nBASELINE EVALUATION (before training)")
        baseline_metrics, _, _ = finetuner.evaluate(test_ds)

        print("\nSTARTING FINE-TUNING")
        stats = finetuner.train(train_ds, val_ds)

        print("\nFINAL EVALUATION (after training)")
        final_metrics, preds, refs = finetuner.evaluate(test_ds)

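        exp_id = db.log_experiment(
            "Llama-3.1-8B-Bengali-Empathetic",
            finetuner.lora_config.to_dict(),
            stats.metrics.get("train_loss", 0),
            # The metrics returned by trainer.train() normally carry no eval_loss,
            # so this value is stored as NULL.
            stats.metrics.get("eval_loss"),
            final_metrics,
        )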

        finetuner.save_model("bengali_empathetic_llama_final")

        print("\nPIPELINE COMPLETE!")
        return finetuner

    finally:
        db.close()

## 9. Run Pipeline

In [9]:
# Runs on both T4 GPUs with fp16 and 4-bit weights, packing disabled (no OOM, no bf16 error)
if accelerator.is_main_process:
    finetuner = run_complete_pipeline()


Loading model - 4bit + device_map=auto (uses both T4 GPUs)
==((====))==  Unsloth 2025.11.6: Fast Llama patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 2. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.1+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.5.1
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33+5d4b92a5.d20251029. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.70G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/454 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

Unsloth: Dropout = 0 is supported for fast patching. You are using dropout = 0.05.
Unsloth will patch all other layers, except LoRA matrices, causing a performance hit.
Unsloth 2025.11.6 patched 32 layers with 0 QKV layers, 0 O layers and 0 MLP layers.



BASELINE EVALUATION (before training)


Downloading builder script: 0.00B [00:00, ?B/s]

Downloading builder script: 0.00B [00:00, ?B/s]


=== Starting Evaluation ===
Perplexity: 3.58
rouge1: 0.0000
rouge2: 0.0000
rougeL: 0.0000
bleu: 0.2637
perplexity: 3.5833

STARTING FINE-TUNING


Unsloth: Tokenizing ["text"] (num_proc=8):   0%|          | 0/30586 [00:00<?, ? examples/s]

Unsloth: Tokenizing ["text"] (num_proc=8):   0%|          | 0/3823 [00:00<?, ? examples/s]

Starting training...


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 2
   \\   /|    Num examples = 30,586 | Num Epochs = 1 | Total steps = 200
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 8
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 8 x 1) = 16
 "-____-"     Trainable parameters = 83,886,080 of 8,114,147,328 (1.03% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,1.285
2,1.1877
3,1.1791
4,1.3224
5,1.0172
6,0.9945
7,0.89
8,0.919
9,0.8544
10,0.7628



FINAL EVALUATION (after training)

=== Starting Evaluation ===
Perplexity: 1.58
rouge1: 0.0000
rouge2: 0.0000
rougeL: 0.0000
bleu: 0.5395
perplexity: 1.5848


config.json:   0%|          | 0.00/956 [00:00<?, ?B/s]

Found HuggingFace hub cache directory: /root/.cache/huggingface/hub


model.safetensors.index.json: 0.00B [00:00, ?B/s]

Checking cache directory for required files...
Cache check failed: model-00001-of-00004.safetensors not found in local cache.
Not all required files found in cache. Will proceed with downloading.
Checking cache directory for required files...
Cache check failed: tokenizer.model not found in local cache.
Not all required files found in cache. Will proceed with downloading.


Unsloth: Preparing safetensor model files:   0%|          | 0/4 [00:00<?, ?it/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

Unsloth: Preparing safetensor model files:  25%|██▌       | 1/4 [00:16<00:48, 16.25s/it]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

Unsloth: Preparing safetensor model files:  50%|█████     | 2/4 [00:32<00:32, 16.05s/it]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

Unsloth: Preparing safetensor model files:  75%|███████▌  | 3/4 [00:52<00:17, 17.81s/it]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

Unsloth: Preparing safetensor model files: 100%|██████████| 4/4 [00:56<00:00, 14.16s/it]


Note: tokenizer.model not found (this is OK for non-SentencePiece models)


Unsloth: Merging weights into 16bit: 100%|██████████| 4/4 [02:16<00:00, 34.22s/it]


Unsloth: Merge process complete. Saved to `/kaggle/working/bengali_empathetic_llama_final`
Model saved to bengali_empathetic_llama_final

PIPELINE COMPLETE!


## 10. Inference Demo

In [10]:
def run_inference_demo():
    FastLanguageModel.for_inference(finetuner.model)
    prompt = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
আপনি একজন সহানুভূতিশীল বাংলা সহকারী।<|eot_id|><|start_header_id|>user<|end_header_id|>
বিষয়: বন্ধু পরীক্ষায় ফেল করেছে
পরিস্থিতি: পরীক্ষায় ব্যর্থতা
অনুভূতি: আমি খুব দুঃখ পেয়েছি, কী করব বুঝতে পারছি না।<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""
    inputs = finetuner.tokenizer([prompt], return_tensors="pt").to(device)
    streamer = TextStreamer(finetuner.tokenizer, skip_prompt=True, skip_special_tokens=True)
    _ = finetuner.model.generate(**inputs, streamer=streamer, max_new_tokens=200, temperature=0.7, do_sample=True)

# Run the demo (requires `finetuner` from the pipeline cell above):
run_inference_demo()

আপনি কি পরীক্ষায় ফেলেছেন?
