## LLM Classification Finetuning

### Overview

This notebook contains code for finetuning a pre-trained LLM for a classification task. Given a prompt and two candidate responses, the model predicts which response is better or whether the two are tied.

### Dataset

The dataset is a CSV file containing the following columns:

- prompt: the prompt given to both models
- response_a: a response to the prompt
- response_b: another response to the prompt
- winner_model_a: whether response_a is better than response_b
- winner_model_b: whether response_b is better than response_a
- winner_tie: whether response_a and response_b are tied
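
The three winner columns act as one-hot flags, while a three-class classifier expects a single class index per row. A minimal sketch of that conversion, assuming exactly one flag is set per row; the toy rows and the helper name `to_class_index` are illustrative and not part of the notebook:

```python
# Toy rows mirroring the three winner columns (values are made up for illustration).
rows = [
    {"winner_model_a": 1, "winner_model_b": 0, "winner_tie": 0},
    {"winner_model_a": 0, "winner_model_b": 1, "winner_tie": 0},
    {"winner_model_a": 0, "winner_model_b": 0, "winner_tie": 1},
]

LABEL_COLUMNS = ["winner_model_a", "winner_model_b", "winner_tie"]

def to_class_index(row):
    """Return 0, 1, or 2 for model_a wins, model_b wins, or tie (hypothetical helper)."""
    flags = [int(row[col]) for col in LABEL_COLUMNS]
    if sum(flags) != 1:
        raise ValueError(f"expected exactly one winner flag, got {flags}")
    return flags.index(1)

labels = [to_class_index(row) for row in rows]
print(labels)  # [0, 1, 2]
```

With pandas, taking the row-wise argmax over the same three columns should give the same indices for one-hot rows.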

### Model

The model used for finetuning is a pre-trained LLM from Hugging Face's model hub. The model used in this notebook is `bert-base-uncased`.

### Training

The training loop is implemented in PyTorch. The model is trained for 5 epochs (`CFG.epochs`) with a batch size of 16 and a learning rate of 5e-5, using the AdamW optimizer and cross-entropy loss.

### Evaluation

The model is evaluated with multi-class log loss. The code below does not create a separate validation split, so the evaluation cell runs on the training data.
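
A held-out split is what makes a validation loss meaningful. A minimal sketch of one way to build it, assuming a simple random 80/20 split; `split_indices`, `val_fraction`, and `seed` are hypothetical names, not something the notebook defines:

```python
import random

def split_indices(n_rows, val_fraction=0.2, seed=42):
    """Shuffle row indices and split them into train and validation lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_val = int(n_rows * val_fraction)
    return indices[n_val:], indices[:n_val]

train_idx, val_idx = split_indices(10, val_fraction=0.2)
print(len(train_idx), len(val_idx))  # 8 2
```

The two index lists could then be used to select rows of the training DataFrame before tokenization.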

### Submission

The submission file is generated using the test set. The submission file is a CSV file containing the following columns:

- id: the id of the test data
- winner_model_a: the probability that response_a is better than response_b
- winner_model_b: the probability that response_b is better than response_a
- winner_tie: the probability that response_a and response_b are tied
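
Since the three columns are probabilities over mutually exclusive outcomes, each row should sum to 1. A small sanity check, sketched under the assumption that the file has exactly the columns listed above; the sample rows and the helper `check_submission` are made up for illustration:

```python
import csv
import io

# A made-up two-row submission in the format described above.
sample_csv = """id,winner_model_a,winner_model_b,winner_tie
101,0.50,0.30,0.20
102,0.10,0.10,0.80
"""

PROB_COLUMNS = ["winner_model_a", "winner_model_b", "winner_tie"]

def check_submission(text, tol=1e-6):
    """Return the ids of rows whose three probabilities do not sum to 1."""
    bad_ids = []
    for row in csv.DictReader(io.StringIO(text)):
        total = sum(float(row[col]) for col in PROB_COLUMNS)
        if abs(total - 1.0) > tol:
            bad_ids.append(row["id"])
    return bad_ids

print(check_submission(sample_csv))  # []
```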


## Install Dependencies

In [1]:
%pip install torch transformers scikit-learn pandas numpy seaborn

Note: you may need to restart the kernel to use updated packages.


## Import Libraries

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from torch.utils.data import Dataset, DataLoader
from sklearn.metrics import log_loss
from sklearn.metrics import confusion_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.calibration import calibration_curve
from sklearn.metrics.pairwise import cosine_similarity
from torch.optim import AdamW  # the AdamW class in transformers is deprecated; use PyTorch's
from transformers import BertTokenizer, BertModel



## Dataset Base Path

In [3]:
BASE_PATH = "./data/"

## Config Class

In [4]:
class CFG:
    model_name = "bert-base-uncased"
    batch_size = 16
    lr = 5e-5
    epochs = 5
    max_length = 256
    num_classes = 3
    device = "cuda" if torch.cuda.is_available() else "cpu"

## Load Tokenizer

In [5]:
tokenizer = BertTokenizer.from_pretrained(CFG.model_name)

## Dataset Preparation

In [6]:
# load datasets
train_df = pd.read_csv(BASE_PATH + "original/train.csv")
test_df = pd.read_csv(BASE_PATH + "original/test.csv")

# tokenize training data
def tokenize_data(df):
    inputs = []

    for _, row in df.iterrows():
        prompt = row["prompt"]
        response_a = row["response_a"]
        response_b = row["response_b"]
        
        # concatenate prompt and response
        input_str = f"Prompt: {prompt} Response A: {response_a} Response B: {response_b}"

        encoded_input = tokenizer(
            input_str, 
            truncation=True, 
            padding="max_length",
            max_length=CFG.max_length, 
            return_tensors="pt"
        )

        inputs.append(encoded_input)

    return inputs

train_inputs = tokenize_data(train_df)

# tokenize test data
def tokenize_test_data(df):
    inputs = []
    ids = []
    for _, row in df.iterrows():
        prompt = row['prompt']
        response_a = row['response_a']
        response_b = row['response_b']

        # concatenate prompt and response
        input_str = f"Prompt: {prompt} Response A: {response_a} Response B: {response_b}"

        encoded_input = tokenizer(
            input_str,
            truncation=True,
            padding='max_length',
            max_length=CFG.max_length,
            return_tensors='pt'
        )
        
        inputs.append(encoded_input)
        
        ids.append(row['id'])  # keep track of the IDs for the submission file

    return inputs, ids

test_inputs, test_ids = tokenize_test_data(test_df)

## Model Architecture

This model takes tokenized inputs, runs them through the pretrained BERT model, and adds a classification layer on top to predict one of three classes:
- model_a wins
- model_b wins
- tie


In [None]:
class PreferenceClassifier(nn.Module):
    def __init__(self, model_name=CFG.model_name, num_classes = CFG.num_classes):
        super(PreferenceClassifier, self).__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output  # [CLS] hidden state passed through BERT's pooler (dense + tanh)
        logits = self.classifier(pooled_output)

        return logits


## Training Loop

Feed the tokenized inputs to the model and compute the loss using cross-entropy.
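
For intuition, the cross-entropy for a single example is the negative log of the softmax probability assigned to the true class. A minimal pure-Python sketch of that formula (the function name `cross_entropy` is illustrative; the notebook itself uses `nn.CrossEntropyLoss`, which takes raw logits and averages over the batch by default):

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]

# With three equal logits the model is maximally unsure, so the loss is log(3).
print(round(cross_entropy([0.0, 0.0, 0.0], 2), 4))  # 1.0986
```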

In [None]:
class PreferenceDataset(Dataset):
    def __init__(self, inputs, labels):
        self.inputs = inputs
        self.labels = labels

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.labels[idx]

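# convert the three one-hot winner columns into a class index: 0 = model_a, 1 = model_b, 2 = tie
train_labels = train_df[["winner_model_a", "winner_model_b", "winner_tie"]].values.argmax(axis=1)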
train_dataset = PreferenceDataset(train_inputs, train_labels)
train_loader = DataLoader(train_dataset, batch_size=CFG.batch_size, shuffle=True)

# initialize model, optimizer, and loss fn
model = PreferenceClassifier()
model.to(CFG.device)
optimizer = AdamW(model.parameters(), lr=CFG.lr)
loss_fn = nn.CrossEntropyLoss()

train_loss = []  # store training loss after each epoch
true_labels = []   # store all ground truth labels
predicted_labels = []  # store all predicted labels
predicted_probs = []  # store predicted probabilities for each class

# training loop
for epoch in range(CFG.epochs):
    model.train()
    total_loss = 0

    for batch in train_loader:
        inputs, labels = batch
        
        input_ids = inputs["input_ids"].squeeze(1).to(CFG.device)
        attention_mask = inputs["attention_mask"].squeeze(1).to(CFG.device)
        labels = labels.to(CFG.device)

        optimizer.zero_grad()

        # forward pass
        outputs = model(input_ids, attention_mask)
        loss = loss_fn(outputs, labels)

        # backward pass and optimization
        loss.backward()
        optimizer.step()

        # accumulate loss
        total_loss += loss.item()

        # store true and predicted labels
        true_labels.extend(labels.cpu().numpy())  # Ground truth
        predictions = torch.argmax(outputs, dim=1).cpu().numpy()  # Predicted class
        predicted_labels.extend(predictions)

        # store predicted probabilities (softmax output)
        probs = torch.softmax(outputs, dim=1).detach().cpu().numpy()
        predicted_probs.extend(probs)

    # average loss over batches
    avg_train_loss = total_loss / len(train_loader)
    train_loss.append(avg_train_loss)

    print(f"Epoch {epoch+1}/{CFG.epochs}, Loss: {avg_train_loss}")

## Test Dataset

In [None]:
# custom Dataset for test data
class TestDataset(Dataset):
    def __init__(self, inputs, ids):
        self.inputs = inputs
        self.ids = ids

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.ids[idx]

# create the test Dataset and DataLoader
test_dataset = TestDataset(test_inputs, test_ids)
test_loader = DataLoader(test_dataset, batch_size=CFG.batch_size, shuffle=False)

## Evaluation

Evaluate the model using log loss.
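
Multi-class log loss is the average negative log of the probability assigned to the true class. A hand-rolled sketch for intuition, assuming integer class labels and one probability row per example; the clipping constant `eps` is an assumption added here to avoid `log(0)`, and the toy numbers are made up:

```python
import math

def multiclass_log_loss(labels, probs, eps=1e-15):
    """Average of -log(p_true_class) over all examples."""
    total = 0.0
    for label, row in zip(labels, probs):
        p = min(max(row[label], eps), 1.0 - eps)  # clip to keep log() finite
        total -= math.log(p)
    return total / len(labels)

toy_labels = [0, 1, 2]
toy_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
]
print(round(multiclass_log_loss(toy_labels, toy_probs), 4))  # 0.3635
```

A model that always predicts one third for each class would score log(3), roughly 1.0986, which is a useful baseline when reading the number.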

In [None]:
def evaluate(model, dataloader):
    model.eval()
    all_preds = []
    all_labels = []

    with torch.no_grad():
        for batch in dataloader:
            inputs, labels = batch
            input_ids = inputs["input_ids"].squeeze(1).to(CFG.device)
            attention_mask = inputs["attention_mask"].squeeze(1).to(CFG.device)

            outputs = model(input_ids, attention_mask)
            preds = torch.softmax(outputs, dim=1)

            # append one array per batch so np.concatenate keeps the (N, num_classes) shape
            all_preds.append(preds.cpu().numpy())
            all_labels.append(labels.cpu().numpy())

    preds = np.concatenate(all_preds, axis=0)
    labels = np.concatenate(all_labels, axis=0)

    # pass the full label set so log_loss works even if a class is absent from the data
    return log_loss(labels, preds, labels=list(range(CFG.num_classes)))

# NOTE: no held-out validation split is created in this notebook, so this
# reports log loss on the training data rather than a true validation loss
val_loss = evaluate(model, train_loader)
print(f"Log Loss (training data): {val_loss}")


# Exploratory Data Analysis


## Distribution of Labels

Check whether the three outcome classes are balanced or whether an imbalance needs to be addressed.

In [None]:
plt.figure(figsize=(10, 6))
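# there is no single 'winner' column, so count each one-hot outcome column instead
train_df[['winner_model_a', 'winner_model_b', 'winner_tie']].sum().plot(kind='bar')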
plt.title("Distribution of Outcomes (Model A, Model B, Tie)")
plt.xlabel("Outcome")
plt.ylabel("Count")
plt.show()

## Distribution of Response Lengths

Analyze how response lengths vary between the two models. This is useful because unusually long or short responses might correlate with user preference.
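
One way to probe that correlation directly is to measure how often the longer response wins among decided comparisons. A rough sketch on made-up records, where label 0 means response A wins, 1 means response B wins, and 2 means a tie; the helper `longer_response_win_rate` is hypothetical:

```python
# Toy records: (len(response_a), len(response_b), label). Values are illustrative only.
records = [(120, 40, 0), (30, 200, 1), (80, 90, 0), (50, 50, 2), (300, 10, 0)]

def longer_response_win_rate(records):
    """Fraction of non-tie, unequal-length comparisons won by the longer response."""
    wins = 0
    total = 0
    for len_a, len_b, label in records:
        if label == 2 or len_a == len_b:
            continue  # ties and equal lengths say nothing about a length preference
        total += 1
        longer = 0 if len_a > len_b else 1
        wins += int(longer == label)
    return wins / total if total else float("nan")

print(longer_response_win_rate(records))  # 0.75
```

A value well above 0.5 on the real data would suggest a length bias worth accounting for.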

In [None]:
train_df['response_a_length'] = train_df['response_a'].apply(len)
train_df['response_b_length'] = train_df['response_b'].apply(len)

plt.figure(figsize=(8, 6))
plt.hist(train_df['response_a_length'], bins=50, alpha=0.6, label='Response A Lengths')
plt.hist(train_df['response_b_length'], bins=50, alpha=0.6, label='Response B Lengths')
plt.title("Distribution of Response Lengths (Model A vs. Model B)")
plt.xlabel("Response Length")
plt.ylabel("Count")
plt.legend()
plt.show()

## Response Similarity

Visualize the similarity between the two responses to each prompt. This helps determine whether paired responses are generally similar or different, which may influence user preference.
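
For intuition about the metric, cosine similarity compares the direction of two term vectors and ignores their length. The sketch below uses raw word counts rather than TF-IDF weights, which is a deliberate simplification; `cosine_similarity_bow` is an illustrative name:

```python
import math
from collections import Counter

def cosine_similarity_bow(text_a, text_b):
    """Cosine similarity between whitespace-tokenized word-count vectors."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    dot = sum(counts_a[term] * counts_b[term] for term in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)

print(round(cosine_similarity_bow("the cat sat", "the cat sat"), 6))  # 1.0
print(cosine_similarity_bow("the cat sat", "a dog ran"))              # 0.0
```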

In [None]:
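# calculate TF-IDF-based similarity between each pair of responses
vectorizer = TfidfVectorizer()
# fit one shared vocabulary on both response columns
vectorizer.fit(pd.concat([train_df['response_a'], train_df['response_b']]))
tfidf_a = vectorizer.transform(train_df['response_a'])
tfidf_b = vectorizer.transform(train_df['response_b'])

# TF-IDF rows are L2-normalized by default, so the row-wise dot product
# is the cosine similarity between response_a and response_b
cosine_similarities = np.asarray(tfidf_a.multiply(tfidf_b).sum(axis=1)).ravel()

# plot histogram of similarity scores
plt.figure(figsize=(8, 6))
plt.hist(cosine_similarities, bins=50, alpha=0.7)
plt.title("Distribution of Cosine Similarities Between Responses")
plt.xlabel("Cosine Similarity")
plt.ylabel("Count")
plt.show()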

## Model Performance and Diagnostics - Loss Curve

Plot the loss curve to check whether the model is overfitting, underfitting, or training as expected.

In [None]:
plt.figure(figsize=(8, 6))
plt.plot(train_loss, label='Training Loss')
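# val_loss is a single number, so draw it as a horizontal reference line
plt.axhline(val_loss, color='tab:orange', linestyle='--', label='Validation Loss')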
plt.title("Training and Validation Loss over Epochs")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()

## Model Performance and Diagnostics - Confusion Matrix

Visualize the model's performance by showing how often it correctly predicts each outcome and which outcomes it confuses.
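
As a reminder of what the matrix holds: entry (i, j) counts examples whose true class is i and whose predicted class is j, so correct predictions sit on the diagonal. A hand-rolled sketch on toy labels (the helper `confusion_counts` is illustrative; the notebook uses scikit-learn's `confusion_matrix`):

```python
def confusion_counts(true_labels, predicted_labels, num_classes=3):
    """Build a num_classes x num_classes count matrix: rows = true, columns = predicted."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix

toy_cm = confusion_counts([0, 0, 1, 2, 2], [0, 1, 1, 2, 0])
print(toy_cm)  # [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
```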

In [None]:
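# fix the label order so the matrix is always 3x3 and matches the tick labels below
cm = confusion_matrix(true_labels, predicted_labels, labels=[0, 1, 2])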

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=['Model A', 'Model B', 'Tie'], yticklabels=['Model A', 'Model B', 'Tie'])
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.show()

## Model Performance and Diagnostics - Probability Calibration

Check whether the model's confidence in its predictions matches its actual accuracy. If the model is confident but frequently wrong, its probabilities need to be calibrated.
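
The idea behind a calibration curve can be sketched by hand: group predictions into probability bins, then compare the mean predicted probability in each bin with the fraction of examples that were actually positive. The helper `reliability_bins`, the equal-width binning, and the toy numbers are assumptions made for illustration:

```python
def reliability_bins(is_positive, predicted_prob, n_bins=5):
    """Return (mean predicted probability, observed positive rate) per non-empty bin."""
    bins = [[0.0, 0, 0] for _ in range(n_bins)]  # [probability sum, positives, count]
    for y, p in zip(is_positive, predicted_prob):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[index][0] += p
        bins[index][1] += int(y)
        bins[index][2] += 1
    return [(prob_sum / n, positives / n) for prob_sum, positives, n in bins if n > 0]

toy_truth = [0, 0, 1, 1, 1, 0]
toy_probs = [0.1, 0.15, 0.9, 0.95, 0.85, 0.8]
curve = reliability_bins(toy_truth, toy_probs)
print([(round(p, 3), round(r, 3)) for p, r in curve])  # [(0.125, 0.0), (0.875, 0.75)]
```

In a well-calibrated model the two numbers in each pair are close; here the top bin predicts about 0.875 but is right only 75% of the time, which is mild overconfidence.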

In [None]:
plt.figure(figsize=(8, 6))

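y_true = np.array(true_labels)     # shape (N,)
y_prob = np.array(predicted_probs)  # shape (N, num_classes)

for i in range(CFG.num_classes):
    # one-vs-rest calibration curve for class i
    prob_true, prob_pred = calibration_curve(y_true == i, y_prob[:, i], n_bins=10)
    plt.plot(prob_pred, prob_true, marker='o', label=f'Class {i}')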

plt.plot([0, 1], [0, 1], 'k:', label="Perfectly Calibrated")
plt.title("Calibration Curves")
plt.xlabel("Predicted Probability")
plt.ylabel("True Probability")
plt.legend()
plt.show()

# Generate Kaggle Submission File

In [None]:
import os

def generate_submission(model, test_loader, output_path='data/submission/submission.csv'):
    model.eval()
    submission = []

    with torch.no_grad():
        for batch in test_loader:
            inputs, ids = batch
            input_ids = inputs['input_ids'].squeeze(1).to(CFG.device)
            attention_mask = inputs['attention_mask'].squeeze(1).to(CFG.device)

            outputs = model(input_ids, attention_mask)
            probabilities = torch.softmax(outputs, dim=1).cpu().numpy()

            for i, sample_id in enumerate(ids):
                # the default collate function turns integer ids into tensors; unwrap them
                if torch.is_tensor(sample_id):
                    sample_id = sample_id.item()
                submission.append([sample_id, probabilities[i, 0], probabilities[i, 1], probabilities[i, 2]])

    submission_df = pd.DataFrame(submission, columns=['id', 'winner_model_a', 'winner_model_b', 'winner_tie'])
    # create the output directory if it does not exist yet
    os.makedirs(os.path.dirname(output_path), exist_ok=True)
    submission_df.to_csv(output_path, index=False)

generate_submission(model, test_loader)