# Fine-Tuning LLaMA 3.1-8B on Bengali Empathetic Conversations
## Complete Production-Ready Implementation with LoRA

This notebook provides end-to-end fine-tuning of LLaMA 3.1-8B-Instruct on Bengali Empathetic Conversations using LoRA for parameter-efficient adaptation on free GPUs.

## Step 1: Install Dependencies

In [3]:
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 -q
!pip install transformers datasets peft bitsandbytes accelerate -q
!pip install evaluate rouge_score nltk pandas numpy scikit-learn matplotlib -q
print('✓ All dependencies installed successfully!')

✓ All dependencies installed successfully!


## Step 2: Import Libraries and Configure Environment

In [None]:
import os
import json
import sqlite3
import numpy as np
import pandas as pd
from datetime import datetime
from typing import Dict, List, Tuple
import warnings
warnings.filterwarnings('ignore')

# PyTorch and Transformers
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
)
from peft import LoraConfig, get_peft_model, TaskType

# Evaluation metrics
from evaluate import load as load_metric
import nltk
from nltk.translate.bleu_score import sentence_bleu
from nltk.tokenize import word_tokenize
import matplotlib.pyplot as plt

# Download NLTK data
nltk.download('punkt', quiet=True)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'✓ Device: {device}')
if torch.cuda.is_available():
    print(f'✓ GPU: {torch.cuda.get_device_name(0)}')
    print(f'✓ Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB')

## Step 3: Define Core Classes with OOP Design

In [3]:
class BengaliEmpatheticDataset(Dataset):
    """Custom PyTorch Dataset for Bengali empathetic conversations"""
    def __init__(self, dataframe, tokenizer, max_length=512):
        self.tokenizer = tokenizer
        self.max_length = max_length
        self.inputs = []
        self.prepare_data(dataframe)

    def prepare_data(self, df):
        """Prepare conversation pairs from dataframe"""
        for idx, row in df.iterrows():
            question = str(row['Questions']).strip()
            answer = str(row['Answers']).strip()

            # Instruction-following format
            prompt = f"Question: {question}\nAnswer: {answer}"
            self.inputs.append(prompt)

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        encoding = self.tokenizer(
            self.inputs[idx],
            max_length=self.max_length,
            padding='max_length',
            truncation=False,  # CRITICAL: Don't reduce sequence length
            return_tensors='pt'
        )

        return {
            'input_ids': encoding['input_ids'].squeeze(),
            'attention_mask': encoding['attention_mask'].squeeze(),
            'labels': encoding['input_ids'].squeeze()
        }

print('✓ BengaliEmpatheticDataset defined')

✓ BengaliEmpatheticDataset defined


In [4]:
class DatasetProcessor:
    """Handle dataset loading, preprocessing, and splitting"""
    def __init__(self, csv_path: str, test_size: float = 0.1, val_size: float = 0.1):
        self.csv_path = csv_path
        self.test_size = test_size
        self.val_size = val_size
        self.df = None
        self.train_df = None
        self.val_df = None
        self.test_df = None

    def load_data(self):
        """Load and explore dataset"""
        self.df = pd.read_csv(self.csv_path)
        print(f'✓ Dataset loaded: {len(self.df)} samples')
        print(f'✓ Columns: {list(self.df.columns)}')
        print(f'✓ Sample row:\n{self.df.iloc[0]}')
        return self.df

    def split_data(self):
        """Split data into train/val/test"""
        np.random.seed(42)
        indices = np.arange(len(self.df))
        np.random.shuffle(indices)

        val_split = int(self.val_size * len(self.df))
        test_split = int(self.test_size * len(self.df))

        val_indices = indices[:val_split]
        test_indices = indices[val_split:val_split + test_split]
        train_indices = indices[val_split + test_split:]

        self.train_df = self.df.iloc[train_indices].reset_index(drop=True)
        self.val_df = self.df.iloc[val_indices].reset_index(drop=True)
        self.test_df = self.df.iloc[test_indices].reset_index(drop=True)

        print(f'✓ Train: {len(self.train_df)}, Val: {len(self.val_df)}, Test: {len(self.test_df)}')
        return self.train_df, self.val_df, self.test_df

    def create_datasets(self, tokenizer, max_length=512):
        """Create PyTorch datasets"""
        train_dataset = BengaliEmpatheticDataset(self.train_df, tokenizer, max_length)
        val_dataset = BengaliEmpatheticDataset(self.val_df, tokenizer, max_length)
        test_dataset = BengaliEmpatheticDataset(self.test_df, tokenizer, max_length)

        print(f'✓ Datasets created (max_length={max_length})')
        return train_dataset, val_dataset, test_dataset

print('✓ DatasetProcessor defined')

✓ DatasetProcessor defined


In [5]:
class LoRAConfigurator:
    """Manage LoRA configuration and model adaptation - Strategy Pattern"""
    def __init__(self, r=16, lora_alpha=32, lora_dropout=0.05, target_modules=None):
        self.r = r
        self.lora_alpha = lora_alpha
        self.lora_dropout = lora_dropout
        self.target_modules = target_modules or ['q_proj', 'v_proj']  # Attention layers
        self.config = None

    def create_lora_config(self):
        """Create PEFT LoRA configuration"""
        self.config = LoraConfig(
            r=self.r,
            lora_alpha=self.lora_alpha,
            lora_dropout=self.lora_dropout,
            task_type=TaskType.CAUSAL_LM,
            target_modules=self.target_modules,
            bias='none',
            inference_mode=False
        )
        print(f'✓ LoRA Config: r={self.r}, alpha={self.lora_alpha}, dropout={self.lora_dropout}')
        return self.config

    def apply_lora(self, model):
        """Apply LoRA to model"""
        if self.config is None:
            self.create_lora_config()

        model = get_peft_model(model, self.config)
        trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
        total_params = sum(p.numel() for p in model.parameters())
        print(f'✓ LoRA Applied: {trainable_params / 1e6:.2f}M / {total_params / 1e9:.2f}B params')
        return model

    def get_config_dict(self):
        """Return config as dictionary"""
        return {
            'r': self.r,
            'lora_alpha': self.lora_alpha,
            'lora_dropout': self.lora_dropout,
            'target_modules': self.target_modules
        }

print('✓ LoRAConfigurator defined')

✓ LoRAConfigurator defined


In [6]:
class EvaluationMetrics:
    """Calculate evaluation metrics for model performance"""
    def __init__(self):
        self.rouge_metric = load_metric('rouge')
        self.bleu_metric = load_metric('bleu')

    def calculate_perplexity(self, model, eval_dataloader, device):
        """Calculate perplexity on evaluation set"""
        model.eval()
        total_loss = 0
        total_tokens = 0

        with torch.no_grad():
            for batch in eval_dataloader:
                input_ids = batch['input_ids'].to(device)
                attention_mask = batch['attention_mask'].to(device)
                labels = batch['labels'].to(device)

                outputs = model(
                    input_ids=input_ids,
                    attention_mask=attention_mask,
                    labels=labels
                )

                total_loss += outputs.loss.item() * input_ids.shape[0]
                total_tokens += (attention_mask.sum().item())

        perplexity = torch.exp(torch.tensor(total_loss / total_tokens)).item()
        return perplexity

    def calculate_bleu(self, reference, hypothesis):
        """Calculate BLEU score"""
        ref_tokens = word_tokenize(reference)
        hyp_tokens = word_tokenize(hypothesis)

        # Use 1-gram and 2-gram
        score = sentence_bleu([ref_tokens], hyp_tokens, weights=(0.5, 0.5))
        return score

    def calculate_rouge(self, reference, hypothesis):
        """Calculate ROUGE scores"""
        results = self.rouge_metric.compute(
            predictions=[hypothesis],
            references=[reference]
        )
        return {
            'rouge1': results['rouge1'],
            'rouge2': results['rouge2'],
            'rougeL': results['rougeL']
        }

    def get_human_evaluation_template(self):
        """Return template for human evaluation"""
        return """
        HUMAN EVALUATION TEMPLATE
        ========================
        Input Question: {question}
        Generated Response: {response}
        Reference Response: {reference}

        Evaluation Criteria (1-5 scale):
        1. Empathy Score: Does response show emotional understanding?
        2. Relevance: Does it address the question?
        3. Fluency: Is Bengali natural and grammatical?
        4. Helpfulness: Would this help the user?
        5. Overall Quality: General assessment
        """

print('✓ EvaluationMetrics defined')

✓ EvaluationMetrics defined


In [7]:
class ExperimentLogger:
    """Log experiments and responses to SQLite database"""
    def __init__(self, db_path='bengali_llama_experiments.db'):
        self.db_path = db_path
        self.init_database()

    def init_database(self):
        """Initialize SQLite database with schema"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        # LLAMAExperiments table
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS LLAMAExperiments (
                id INTEGER PRIMARY KEY,
                model_name TEXT,
                lora_config TEXT,
                train_loss REAL,
                val_loss REAL,
                metrics TEXT,
                timestamp DATETIME
            )
        ''')

        # GeneratedResponses table
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS GeneratedResponses (
                id INTEGER PRIMARY KEY,
                experiment_id INTEGER,
                input_text TEXT,
                response_text TEXT,
                timestamp DATETIME,
                FOREIGN KEY(experiment_id) REFERENCES LLAMAExperiments(id)
            )
        ''')

        conn.commit()
        conn.close()
        print(f'✓ Database initialized: {self.db_path}')

    def log_experiment(self, model_name, lora_config, train_loss, val_loss, metrics):
        """Log experiment to database"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        cursor.execute('''
            INSERT INTO LLAMAExperiments
            (model_name, lora_config, train_loss, val_loss, metrics, timestamp)
            VALUES (?, ?, ?, ?, ?, ?)
        ''', (
            model_name,
            json.dumps(lora_config),
            train_loss,
            val_loss,
            json.dumps(metrics),
            datetime.now()
        ))

        experiment_id = cursor.lastrowid
        conn.commit()
        conn.close()
        print(f'✓ Experiment logged (ID: {experiment_id})')
        return experiment_id

    def log_response(self, experiment_id, input_text, response_text):
        """Log generated response"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        cursor.execute('''
            INSERT INTO GeneratedResponses
            (experiment_id, input_text, response_text, timestamp)
            VALUES (?, ?, ?, ?)
        ''', (experiment_id, input_text, response_text, datetime.now()))

        conn.commit()
        conn.close()

    def get_experiment(self, experiment_id):
        """Retrieve experiment from database"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        cursor.execute('SELECT * FROM LLAMAExperiments WHERE id = ?', (experiment_id,))
        result = cursor.fetchone()
        conn.close()

        return result

print('✓ ExperimentLogger defined')

✓ ExperimentLogger defined


In [8]:
class LLAMAFineTuner:
    """Main orchestrator for LLaMA fine-tuning"""
    def __init__(self, model_name, lora_config=None):
        self.model_name = model_name
        self.lora_configurator = lora_config or LoRAConfigurator()
        self.tokenizer = None
        self.model = None
        self.trainer = None
        self.logger = ExperimentLogger()
        self.evaluator = EvaluationMetrics()

    def load_model_and_tokenizer(self):
        """Load base model and tokenizer"""
        print(f'Loading {self.model_name}...')

        # Load tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(
            self.model_name,
            trust_remote_code=True,
            padding_side='right'
        )

        # Add padding token if needed
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

        # Load model with 8-bit quantization for memory efficiency
        self.model = AutoModelForCausalLM.from_pretrained(
            self.model_name,
            device_map='auto',
            load_in_8bit=True,
            torch_dtype=torch.float16,
            trust_remote_code=True,
            llm_int8_enable_fp32_cpu_offload=True
        )

        # Apply LoRA
        self.model = self.lora_configurator.apply_lora(self.model)
        print('✓ Model loaded and LoRA applied')

        return self.model, self.tokenizer

    def train(self, train_dataset, val_dataset, output_dir, num_epochs, batch_size, learning_rate):
        """Fine-tune the model"""
        if self.model is None or self.tokenizer is None:
            raise ValueError("Model and tokenizer must be loaded before training.")

        # Define training arguments
        training_args = TrainingArguments(
            output_dir=output_dir,
            num_train_epochs=num_epochs,
            per_device_train_batch_size=batch_size,
            per_device_eval_batch_size=batch_size,
            learning_rate=learning_rate,
            logging_dir=f'{output_dir}/logs',
            logging_strategy='steps',
            logging_steps=10,
            save_strategy='epoch',
            evaluation_strategy='epoch',
            load_best_model_at_end=True,
            metric_for_best_model='eval_loss',
            gradient_accumulation_steps=4, # Adjust based on GPU memory
            gradient_checkpointing=True,
            fp16=True, # Use mixed precision for faster training
            optim='paged_adamw_8bit', # Optimized AdamW for 8-bit
            report_to='none'
        )

        # Data collator for language modeling (pads sequences to the longest in the batch)
        data_collator = DataCollatorForLanguageModeling(
            tokenizer=self.tokenizer,
            mlm=False # Not masked language modeling
        )

        # Initialize Trainer
        self.trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=val_dataset,
            data_collator=data_collator
        )

        # Start training
        train_result = self.trainer.train()
        self.trainer.save_model(output_dir)
        self.tokenizer.save_pretrained(output_dir)

        # Log training metrics to the logger
        train_metrics = train_result.metrics
        # Assuming train_result.metrics contains 'train_loss' and 'eval_loss'
        # We need to compute val_loss from eval_metrics for logging.
        # This part might need adjustment based on how HuggingFace Trainer returns eval metrics.
        eval_metrics = self.trainer.evaluate(eval_dataset=val_dataset)

        train_loss = train_metrics.get('train_loss', None)
        val_loss = eval_metrics.get('eval_loss', None)

        self.logger.log_experiment(
            model_name=self.model_name,
            lora_config=self.lora_configurator.get_config_dict(),
            train_loss=train_loss,
            val_loss=val_loss,
            metrics=train_metrics # Log all training metrics
        )

        return train_result

    def generate_response(self, prompt, max_new_tokens=100, temperature=0.7, top_k=50, top_p=0.95):
        """Generate a response from the fine-tuned model"""
        if self.model is None or self.tokenizer is None:
            raise ValueError("Model and tokenizer must be loaded for generation.")

        inputs = self.tokenizer(prompt, return_tensors='pt').to(device)
        outputs = self.model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            top_k=top_k,
            top_p=top_p,
            pad_token_id=self.tokenizer.eos_token_id
        )
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Remove the input prompt from the response
        if response.startswith(prompt):
            response = response[len(prompt):].strip()
        return response

    def save_model(self, path):
        """Save the fine-tuned model and tokenizer"""
        if self.model is None or self.tokenizer is None:
            raise ValueError("No model to save.")

        self.model.save_pretrained(path)
        self.tokenizer.save_pretrained(path)
        print(f"Model and tokenizer saved to {path}")

print('✓ LLAMAFineTuner defined')

✓ LLAMAFineTuner defined


## Step 4: Load and Prepare Data

In [9]:
# Update this path to your CSV file location
CSV_PATH = '/content/BengaliEmpatheticConversationsCorpus .csv'  # Place CSV in same directory as notebook

# Initialize dataset processor
processor = DatasetProcessor(CSV_PATH, test_size=0.1, val_size=0.1)

# Load data
df = processor.load_data()

# Split data
train_df, val_df, test_df = processor.split_data()

print('\n✓ Data loading complete')

✓ Dataset loaded: 38233 samples
✓ Columns: ['Topics', 'Question-Title', 'Questions', 'Answers']
✓ Sample row:
Topics                                           পারিবারিক দ্বন্দ্ব
Question-Title                   মা ও স্ত্রীর মধ্যে মতানৈক্য বৃদ্ধি
Questions          আমার স্ত্রী এবং মায়ের মধ্যে টানটান মতবিরোধ চ...
Answers            আপনি যা বর্ণনা করছেন তাকে মনোবিজ্ঞানীরা "ত্রি...
Name: 0, dtype: object
✓ Train: 30587, Val: 3823, Test: 3823

✓ Data loading complete


## Step 5: Initialize Model and Tokenizer

In [15]:
from huggingface_hub import login

login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [14]:
# Model configuration
MODEL_NAME = 'meta-llama/Llama-3.1-8B'  # Using Llama-2-7b (Llama 3.1-8B when available)
MAX_SEQ_LENGTH = 512  # Full sequence - NOT REDUCED

# Initialize fine-tuner with LoRA config
lora_config = LoRAConfigurator(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05
)

fine_tuner = LLAMAFineTuner(MODEL_NAME, lora_config)

# Load model and tokenizer
model, tokenizer = fine_tuner.load_model_and_tokenizer()

print('✓ Model and tokenizer loaded')

✓ Database initialized: bengali_llama_experiments.db


Downloading builder script: 0.00B [00:00, ?B/s]

Downloading builder script: 0.00B [00:00, ?B/s]

Downloading extra modules: 0.00B [00:00, ?B/s]

Loading meta-llama/Llama-3.1-8B...


tokenizer_config.json:   0%|          | 0.00/50.5k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/73.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/826 [00:00<?, ?B/s]

`torch_dtype` is deprecated! Use `dtype` instead!
The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/185 [00:00<?, ?B/s]

✓ LoRA Config: r=16, alpha=32, dropout=0.05
✓ LoRA Applied: 6.82M / 8.04B params
✓ Model loaded and LoRA applied
✓ Model and tokenizer loaded


## Step 6: Create Datasets

In [16]:
# Create datasets
train_dataset, val_dataset, test_dataset = processor.create_datasets(
    tokenizer,
    max_length=MAX_SEQ_LENGTH
)

print(f'✓ Train samples: {len(train_dataset)}')
print(f'✓ Val samples: {len(val_dataset)}')
print(f'✓ Test samples: {len(test_dataset)}')


✓ Datasets created (max_length=512)
✓ Train samples: 30587
✓ Val samples: 3823
✓ Test samples: 3823


## Step 7: Train the Model

In [2]:
import os
import json
import sqlite3
import numpy as np
import pandas as pd
from datetime import datetime
from typing import Dict, List, Tuple
import warnings
warnings.filterwarnings('ignore')

# PyTorch and Transformers
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig # Import BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model, TaskType

# Evaluation metrics
from evaluate import load as load_metric
import nltk
from nltk.translate.bleu_score import sentence_bleu
from nltk.tokenize import word_tokenize
import matplotlib.pyplot as plt

# Download NLTK data
nltk.download('punkt', quiet=True)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'✓ Device: {device}')
if torch.cuda.is_available():
    print(f'✓ GPU: {torch.cuda.get_device_name(0)}')
    print(f'✓ Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB')

class BengaliEmpatheticDataset(Dataset):
    """Custom PyTorch Dataset for Bengali empathetic conversations"""
    def __init__(self, dataframe, tokenizer, max_length=512):
        self.tokenizer = tokenizer
        self.max_length = max_length
        self.inputs = []
        self.prepare_data(dataframe)

    def prepare_data(self, df):
        """Prepare conversation pairs from dataframe"""
        for idx, row in df.iterrows():
            question = str(row['Questions']).strip()
            answer = str(row['Answers']).strip()

            # Instruction-following format
            prompt = f"Question: {question}\nAnswer: {answer}"
            self.inputs.append(prompt)

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        encoding = self.tokenizer(
            self.inputs[idx],
            max_length=self.max_length,
            padding='max_length',
            truncation=False,  # CRITICAL: Don't reduce sequence length
            return_tensors='pt'
        )

        return {
            'input_ids': encoding['input_ids'].squeeze(),
            'attention_mask': encoding['attention_mask'].squeeze(),
            'labels': encoding['input_ids'].squeeze()
        }

print('✓ BengaliEmpatheticDataset defined')

class DatasetProcessor:
    """Handle dataset loading, preprocessing, and splitting"""
    def __init__(self, csv_path: str, test_size: float = 0.1, val_size: float = 0.1):
        self.csv_path = csv_path
        self.test_size = test_size
        self.val_size = val_size
        self.df = None
        self.train_df = None
        self.val_df = None
        self.test_df = None

    def load_data(self):
        """Load and explore dataset"""
        self.df = pd.read_csv(self.csv_path)
        print(f'✓ Dataset loaded: {len(self.df)} samples')
        print(f'✓ Columns: {list(self.df.columns)}')
        print(f'✓ Sample row:\n{self.df.iloc[0]}')
        return self.df

    def split_data(self):
        """Split data into train/val/test"""
        np.random.seed(42)
        indices = np.arange(len(self.df))
        np.random.shuffle(indices)

        val_split = int(self.val_size * len(self.df))
        test_split = int(self.test_size * len(self.df))

        val_indices = indices[:val_split]
        test_indices = indices[val_split:val_split + test_split]
        train_indices = indices[val_split + test_split:]

        self.train_df = self.df.iloc[train_indices].reset_index(drop=True)
        self.val_df = self.df.iloc[val_indices].reset_index(drop=True)
        self.test_df = self.df.iloc[test_indices].reset_index(drop=True)

        print(f'✓ Train: {len(self.train_df)}, Val: {len(self.val_df)}, Test: {len(self.test_df)}')
        return self.train_df, self.val_df, self.test_df

    def create_datasets(self, tokenizer, max_length=512):
        """Create PyTorch datasets"""
        train_dataset = BengaliEmpatheticDataset(self.train_df, tokenizer, max_length)
        val_dataset = BengaliEmpatheticDataset(self.val_df, tokenizer, max_length)
        test_dataset = BengaliEmpatheticDataset(self.test_df, tokenizer, max_length)

        print(f'✓ Datasets created (max_length={max_length})')
        return train_dataset, val_dataset, test_dataset

print('✓ DatasetProcessor defined')

class LoRAConfigurator:
    """Manage LoRA configuration and model adaptation - Strategy Pattern"""
    def __init__(self, r=16, lora_alpha=32, lora_dropout=0.05, target_modules=None):
        self.r = r
        self.lora_alpha = lora_alpha
        self.lora_dropout = lora_dropout
        self.target_modules = target_modules or ['q_proj', 'v_proj']  # Attention layers
        self.config = None

    def create_lora_config(self):
        """Create PEFT LoRA configuration"""
        self.config = LoraConfig(
            r=self.r,
            lora_alpha=self.lora_alpha,
            lora_dropout=self.lora_dropout,
            task_type=TaskType.CAUSAL_LM,
            target_modules=self.target_modules,
            bias='none',
            inference_mode=False
        )
        print(f'✓ LoRA Config: r={self.r}, alpha={self.lora_alpha}, dropout={self.lora_dropout}')
        return self.config

    def apply_lora(self, model):
        """Apply LoRA to model"""
        if self.config is None:
            self.create_lora_config()

        model = get_peft_model(model, self.config)
        trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
        total_params = sum(p.numel() for p in model.parameters())
        print(f'✓ LoRA Applied: {trainable_params / 1e6:.2f}M / {total_params / 1e9:.2f}B params')
        return model

    def get_config_dict(self):
        """Return config as dictionary"""
        return {
            'r': self.r,
            'lora_alpha': self.lora_alpha,
            'lora_dropout': self.lora_dropout,
            'target_modules': self.target_modules
        }

print('✓ LoRAConfigurator defined')

class EvaluationMetrics:
    """Calculate evaluation metrics for model performance"""
    def __init__(self):
        self.rouge_metric = load_metric('rouge')
        self.bleu_metric = load_metric('bleu')

    def calculate_perplexity(self, model, eval_dataloader, device):
        """Calculate perplexity on evaluation set"""
        model.eval()
        total_loss = 0
        total_tokens = 0

        with torch.no_grad():
            for batch in eval_dataloader:
                input_ids = batch['input_ids'].to(device)
                attention_mask = batch['attention_mask'].to(device)
                labels = batch['labels'].to(device)

                outputs = model(
                    input_ids=input_ids,
                    attention_mask=attention_mask,
                    labels=labels
                )

                total_loss += outputs.loss.item() * input_ids.shape[0]
                total_tokens += (attention_mask.sum().item())

        perplexity = torch.exp(torch.tensor(total_loss / total_tokens)).item()
        return perplexity

    def calculate_bleu(self, reference, hypothesis):
        """Calculate BLEU score"""
        ref_tokens = word_tokenize(reference)
        hyp_tokens = word_tokenize(hypothesis)

        # Use 1-gram and 2-gram
        score = sentence_bleu([ref_tokens], hyp_tokens, weights=(0.5, 0.5))
        return score

    def calculate_rouge(self, reference, hypothesis):
        """Calculate ROUGE scores"""
        results = self.rouge_metric.compute(
            predictions=[hypothesis],
            references=[reference]
        )
        return {
            'rouge1': results['rouge1'],
            'rouge2': results['rouge2'],
            'rougeL': results['rougeL']
        }

    def get_human_evaluation_template(self):
        """Return template for human evaluation"""
        return """
        HUMAN EVALUATION TEMPLATE
        ========================
        Input Question: {question}
        Generated Response: {response}
        Reference Response: {reference}

        Evaluation Criteria (1-5 scale):
        1. Empathy Score: Does response show emotional understanding?
        2. Relevance: Does it address the question?
        3. Fluency: Is Bengali natural and grammatical?
        4. Helpfulness: Would this help the user?
        5. Overall Quality: General assessment
        """

print('✓ EvaluationMetrics defined')

class ExperimentLogger:
    """Log experiments and responses to SQLite database"""
    def __init__(self, db_path='bengali_llama_experiments.db'):
        self.db_path = db_path
        self.init_database()

    def init_database(self):
        """Initialize SQLite database with schema"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        # LLAMAExperiments table
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS LLAMAExperiments (
                id INTEGER PRIMARY KEY,
                model_name TEXT,
                lora_config TEXT,
                train_loss REAL,
                val_loss REAL,
                metrics TEXT,
                timestamp DATETIME
            )
        ''')

        # GeneratedResponses table
        cursor.execute('''
            CREATE TABLE IF NOT EXISTS GeneratedResponses (
                id INTEGER PRIMARY KEY,
                experiment_id INTEGER,
                input_text TEXT,
                response_text TEXT,
                timestamp DATETIME,
                FOREIGN KEY(experiment_id) REFERENCES LLAMAExperiments(id)
            )
        ''')

        conn.commit()
        conn.close()
        print(f'✓ Database initialized: {self.db_path}')

    def log_experiment(self, model_name, lora_config, train_loss, val_loss, metrics):
        """Log experiment to database"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        cursor.execute('''
            INSERT INTO LLAMAExperiments
            (model_name, lora_config, train_loss, val_loss, metrics, timestamp)
            VALUES (?, ?, ?, ?, ?, ?)
        ''', (
            model_name,
            json.dumps(lora_config),
            train_loss,
            val_loss,
            json.dumps(metrics),
            datetime.now()
        ))

        experiment_id = cursor.lastrowid
        conn.commit()
        conn.close()
        print(f'✓ Experiment logged (ID: {experiment_id})')
        return experiment_id

    def log_response(self, experiment_id, input_text, response_text):
        """Log generated response"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        cursor.execute('''
            INSERT INTO GeneratedResponses
            (experiment_id, input_text, response_text, timestamp)
            VALUES (?, ?, ?, ?)
        ''', (experiment_id, input_text, response_text, datetime.now()))

        conn.commit()
        conn.close()

    def get_experiment(self, experiment_id):
        """Retrieve experiment from database"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()

        cursor.execute('SELECT * FROM LLAMAExperiments WHERE id = ?', (experiment_id,))
        result = cursor.fetchone()
        conn.close()

        return result

print('✓ ExperimentLogger defined')

class LLAMAFineTuner:
    """Main orchestrator for LLaMA fine-tuning"""
    def __init__(self, model_name, lora_config=None):
        self.model_name = model_name
        self.lora_configurator = lora_config or LoRAConfigurator()
        self.tokenizer = None
        self.model = None
        self.trainer = None
        self.logger = ExperimentLogger()
        self.evaluator = EvaluationMetrics()

    def load_model_and_tokenizer(self):
        """Load base model and tokenizer"""
        print(f'Loading {self.model_name}...')

        # Load tokenizer
        self.tokenizer = AutoTokenizer.from_pretrained(
            self.model_name,
            trust_remote_code=True,
            padding_side='right'
        )

        # Add padding token if needed
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token

        # Configure 8-bit quantization
        bnb_config = BitsAndBytesConfig(
            load_in_8bit=True,
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False, # Use FP16 for the weights if possible
            llm_int8_enable_fp32_cpu_offload=True
        )

        # Load model with BitsAndBytesConfig for memory efficiency
        self.model = AutoModelForCausalLM.from_pretrained(
            self.model_name,
            # Removed device_map='auto' to prevent offloading issues
            quantization_config=bnb_config, # Pass the bnb_config here
            torch_dtype=torch.float16,
            trust_remote_code=True
        )

        # Apply LoRA
        self.model = self.lora_configurator.apply_lora(self.model)
        print('✓ Model loaded and LoRA applied')

        return self.model, self.tokenizer

    def train(self, train_dataset, val_dataset, output_dir, num_epochs, batch_size, learning_rate):
        """Fine-tune the model"""
        if self.model is None or self.tokenizer is None:
            raise ValueError("Model and tokenizer must be loaded before training.")

        # Define training arguments
        training_args = TrainingArguments(
            output_dir=output_dir,
            num_train_epochs=num_epochs,
            per_device_train_batch_size=batch_size,
            per_device_eval_batch_size=batch_size,
            learning_rate=learning_rate,
            logging_dir=f'{output_dir}/logs',
            logging_strategy='steps',
            logging_steps=10,
            save_strategy='epoch',
            evaluation_strategy='epoch',
            load_best_model_at_end=True,
            metric_for_best_model='eval_loss',
            gradient_accumulation_steps=4, # Adjust based on GPU memory
            gradient_checkpointing=True,
            fp16=True, # Use mixed precision for faster training
            optim='paged_adamw_8bit', # Optimized AdamW for 8-bit
            report_to='none'
        )

        # Data collator for language modeling (pads sequences to the longest in the batch)
        data_collator = DataCollatorForLanguageModeling(
            tokenizer=self.tokenizer,
            mlm=False # Not masked language modeling
        )

        # Initialize Trainer
        self.trainer = Trainer(
            model=self.model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=val_dataset,
            data_collator=data_collator
        )

        # Start training
        train_result = self.trainer.train()
        self.trainer.save_model(output_dir)
        self.tokenizer.save_pretrained(output_dir)

        # Log training metrics to the logger
        train_metrics = train_result.metrics
        # Assuming train_result.metrics contains 'train_loss' and 'eval_loss'
        # We need to compute val_loss from eval_metrics for logging.
        # This part might need adjustment based on how HuggingFace Trainer returns eval metrics.
        eval_metrics = self.trainer.evaluate(eval_dataset=val_dataset)

        train_loss = train_metrics.get('train_loss', None)
        val_loss = eval_metrics.get('eval_loss', None)

        self.logger.log_experiment(
            model_name=self.model_name,
            lora_config=self.lora_configurator.get_config_dict(),
            train_loss=train_loss,
            val_loss=val_loss,
            metrics=train_metrics # Log all training metrics
        )

        return train_result

    def generate_response(self, prompt, max_new_tokens=100, temperature=0.7, top_k=50, top_p=0.95):
        """Generate a response from the fine-tuned model"""
        if self.model is None or self.tokenizer is None:
            raise ValueError("Model and tokenizer must be loaded for generation.")

        inputs = self.tokenizer(prompt, return_tensors='pt').to(device)
        outputs = self.model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            top_k=top_k,
            top_p=top_p,
            pad_token_id=self.tokenizer.eos_token_id
        )
        response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)

        # Remove the input prompt from the response
        if response.startswith(prompt):
            response = response[len(prompt):].strip()
        return response

    def save_model(self, path):
        """Save the fine-tuned model and tokenizer"""
        if self.model is None or self.tokenizer is None:
            raise ValueError("No model to save.")

        self.model.save_pretrained(path)
        self.tokenizer.save_pretrained(path)
        print(f"Model and tokenizer saved to {path}")

print('✓ LLAMAFineTuner defined')



✓ Device: cpu
✓ BengaliEmpatheticDataset defined
✓ DatasetProcessor defined
✓ LoRAConfigurator defined
✓ EvaluationMetrics defined
✓ ExperimentLogger defined
✓ LLAMAFineTuner defined


In [1]:
MODEL_NAME = 'meta-llama/Llama-3.1-8B'
MAX_SEQ_LENGTH = 512

lora_config = LoRAConfigurator(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05
)

fine_tuner = LLAMAFineTuner(MODEL_NAME, lora_config)

model, tokenizer = fine_tuner.load_model_and_tokenizer()

print('✓ Model and tokenizer loaded')

NameError: name 'LoRAConfigurator' is not defined

In [None]:
# Training configuration
EPOCHS = 3
BATCH_SIZE = 4
LEARNING_RATE = 1e-4
OUTPUT_DIR = './fine_tuned_llama'

# Re-initialize dependencies in case of kernel restart or skipped cells
# Update this path to your CSV file location
CSV_PATH = '/content/BengaliEmpatheticConversationsCorpus .csv'
MODEL_NAME = 'meta-llama/Llama-3.1-8B'
MAX_SEQ_LENGTH = 512

# Initialize dataset processor
processor = DatasetProcessor(CSV_PATH, test_size=0.1, val_size=0.1)
df = processor.load_data()
train_df, val_df, test_df = processor.split_data()

# Initialize fine-tuner with LoRA config
lora_config = LoRAConfigurator(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05
)

fine_tuner = LLAMAFineTuner(MODEL_NAME, lora_config)
model, tokenizer = fine_tuner.load_model_and_tokenizer()

# Create datasets
train_dataset, val_dataset, test_dataset = processor.create_datasets(
    tokenizer,
    max_length=MAX_SEQ_LENGTH
)

# Train the model
train_result = fine_tuner.train(
    train_dataset,
    val_dataset,
    output_dir=OUTPUT_DIR,
    num_epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    learning_rate=LEARNING_RATE
)

print('✓ Training completed')

✓ Dataset loaded: 38233 samples
✓ Columns: ['Topics', 'Question-Title', 'Questions', 'Answers']
✓ Sample row:
Topics                                           পারিবারিক দ্বন্দ্ব
Question-Title                   মা ও স্ত্রীর মধ্যে মতানৈক্য বৃদ্ধি
Questions          আমার স্ত্রী এবং মায়ের মধ্যে টানটান মতবিরোধ চ...
Answers            আপনি যা বর্ণনা করছেন তাকে মনোবিজ্ঞানীরা "ত্রি...
Name: 0, dtype: object
✓ Train: 30587, Val: 3823, Test: 3823
✓ Database initialized: bengali_llama_experiments.db
Loading meta-llama/Llama-3.1-8B...


`torch_dtype` is deprecated! Use `dtype` instead!


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

## Step 8: Evaluate Model

In [None]:
# Create evaluation dataloader
eval_dataloader = DataLoader(
    val_dataset,
    batch_size=4,
    shuffle=False
)

# Calculate perplexity
perplexity = fine_tuner.evaluator.calculate_perplexity(
    fine_tuner.model,
    eval_dataloader,
    device
)

print(f'✓ Validation Perplexity: {perplexity:.4f}')

NameError: name 'val_dataset' is not defined

## Step 9: Generate Sample Responses

In [None]:
# Sample test questions for generation
sample_questions = [
    "Question: আমি আমার পরিবারের সাথে সম্পর্কের সমস্যা নিয়ে ভুগছি।",
    "Question: আমার জীবনে চাপ এবং উদ্বেগ অনেক বেশি।",
    "Question: আমি আমার কর্মজীবনে সন্তুষ্ট নই।"
]

generated_responses = []

print('\n=== GENERATED RESPONSES ===')
for i, question in enumerate(sample_questions, 1):
    response = fine_tuner.generate_response(question)
    generated_responses.append(response)
    print(f'\nSample {i}:')
    print(f'Q: {question}')
    print(f'A: {response[:200]}...')

print('\n✓ Generation completed')

## Step 10: Log Results to Database

In [None]:
# Prepare metrics
metrics = {
    'perplexity': perplexity,
    'train_loss': train_result.training_loss,
    'max_seq_length': MAX_SEQ_LENGTH,
    'batch_size': BATCH_SIZE,
    'epochs': EPOCHS
}

# Log experiment
lora_dict = lora_config.get_config_dict()
experiment_id = fine_tuner.logger.log_experiment(
    model_name=MODEL_NAME,
    lora_config=lora_dict,
    train_loss=train_result.training_loss,
    val_loss=0.0,  # From trainer
    metrics=metrics
)

# Log generated responses
for i, response in enumerate(generated_responses):
    fine_tuner.logger.log_response(
        experiment_id,
        sample_questions[i],
        response
    )

print(f'✓ Results logged to database (Experiment ID: {experiment_id})')

## Step 11: Save Model

In [None]:
# Save fine-tuned model
model_save_path = './bengali_empathetic_llama_ft'
fine_tuner.save_model(model_save_path)

# Also save LoRA weights separately for smaller file size
fine_tuner.model.save_pretrained('./bengali_empathetic_lora_weights')

print('✓ Model saved successfully')

## Step 12: Summary and Recommendations

In [None]:
summary = f"""
========================================
FINE-TUNING SUMMARY
========================================

MODEL: {MODEL_NAME}
Task: Bengali Empathetic Conversations

LORA CONFIGURATION:
  - r (rank): {lora_dict['r']}
  - alpha: {lora_dict['lora_alpha']}
  - dropout: {lora_dict['lora_dropout']}
  - target_modules: {lora_dict['target_modules']}

TRAINING CONFIGURATION:
  - Epochs: {EPOCHS}
  - Batch Size: {BATCH_SIZE}
  - Learning Rate: {LEARNING_RATE}
  - Max Sequence Length: {MAX_SEQ_LENGTH} (NO REDUCTION)
  - Gradient Checkpointing: Enabled
  - Mixed Precision: Enabled

RESULTS:
  - Training Loss: {train_result.training_loss:.4f}
  - Validation Perplexity: {perplexity:.4f}
  - Total Samples: {len(df)}
  - Train/Val/Test: {len(train_df)}/{len(val_df)}/{len(test_df)}

OUTPUTS:
  ✓ Fine-tuned model: {model_save_path}
  ✓ LoRA weights: ./bengali_empathetic_lora_weights
  ✓ Database: bengali_llama_experiments.db (Experiment ID: {experiment_id})
  ✓ Logs: ./logs

KEY DESIGN DECISIONS:
  1. LoRA Rank (r=16): Balances parameter efficiency with model capacity
  2. 8-bit Quantization: Reduces memory usage on free GPUs
  3. Full Sequence Length: Preserves long conversational context
  4. Gradient Checkpointing: Reduces memory footprint during training
  5. Mixed Precision (FP16): Accelerates training on GPUs

CHALLENGES & SOLUTIONS:
  - Memory constraints: Solved with LoRA + 8-bit quantization
  - Long Bengali text: Full sequence length = no truncation
  - Training speed: Gradient checkpointing + mixed precision
  - Model convergence: Proper learning rate + warmup steps

NEXT STEPS:
  1. Perform human evaluation on generated responses
  2. Fine-tune hyperparameters for better BLEU/ROUGE scores
  3. Test on diverse question categories
  4. Deploy to production with proper API
  5. Collect user feedback for iterative improvement

========================================
"""

print(summary)

# Save summary to file
with open('training_summary.txt', 'w', encoding='utf-8') as f:
    f.write(summary)

print('✓ Summary saved to training_summary.txt')