<a href="https://colab.research.google.com/github/dr-mushtaq/Math-QA-Difficulty-Classifier/blob/main/Math_QA_Difficulty_Classifier_(1%E2%80%935)_using_Transformersi_pynb.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**🔧 Step-by-Step Implementation Plan**

**📁 1. Dataset Preparation**

➤ **Data Sources**
Train: hendrycks_math_train.csv

Test: hendrycks_math_test.csv

➤ **Tasks:**

Load both CSVs with pandas

Inspect and clean the data (e.g., drop rows with missing values)

Combine question and answer into a single input text

Use difficulty (1–5) as the label
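The preparation tasks above can be sketched as a small helper. This is a minimal sketch, not the notebook's own code: the column names `question`, `answer`, and `difficulty` match the core example further down, while the helper name `prepare_frame`, the 1–5 range filter, and the in-memory rows are illustrative assumptions that stand in for the real CSV files.

```python
import pandas as pd


def prepare_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Clean a raw frame and add the combined `text` column (hypothetical helper)."""
    # Drop rows that are missing any field the classifier needs.
    df = df.dropna(subset=["question", "answer", "difficulty"]).copy()
    # Coerce labels to integers and keep only the 1-5 range (assumed valid range).
    df["difficulty"] = df["difficulty"].astype(int)
    df = df[df["difficulty"].between(1, 5)].copy()
    # Combine question and answer into one input string.
    df["text"] = "Q: " + df["question"].astype(str) + " A: " + df["answer"].astype(str)
    return df.reset_index(drop=True)


# Made-up rows standing in for hendrycks_math_train.csv.
raw = pd.DataFrame(
    {
        "question": ["What is 2 + 2?", None, "Solve x^2 = 9."],
        "answer": ["4", "7", "x = 3 or x = -3"],
        "difficulty": [1, 3, 2],
    }
)
clean = prepare_frame(raw)
print(clean[["text", "difficulty"]])
```

With real data, the same helper would be applied to the frames returned by `pd.read_csv` for the train and test files.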

**🧠 2. Modeling Approach**

We treat this as a five-class text classification task and fine-tune a pretrained transformer with Hugging Face Transformers. Candidate models:

BERT, RoBERTa, or DeBERTa (for accuracy)

DistilBERT (for speed and low-compute environments)

➤ **Input and output format:**

Input: "Q: <question text> A: <answer text>"

Output: Class label (1 to 5)
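The input/output contract above implies a label shift, because classification heads index classes from 0 while difficulty runs from 1 to 5. The sketch below illustrates that round trip; the function names are hypothetical, and stripping whitespace from the question and answer is an added assumption rather than something the plan specifies.

```python
def format_example(question: str, answer: str) -> str:
    """Build the model input string in the 'Q: ... A: ...' layout."""
    # Stripping surrounding whitespace is an assumption, not part of the plan.
    return f"Q: {question.strip()} A: {answer.strip()}"


def to_class_index(difficulty: int) -> int:
    """Map a 1-5 difficulty to the 0-4 class index used during training."""
    if not 1 <= difficulty <= 5:
        raise ValueError(f"difficulty must be in 1..5, got {difficulty}")
    return difficulty - 1


def to_difficulty(class_index: int) -> int:
    """Map a predicted 0-4 class index back to a 1-5 difficulty."""
    return class_index + 1


text = format_example("What is 2 + 2?", "4")
index = to_class_index(5)
print(text, index, to_difficulty(index))
```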

**🧪 3. Evaluation Metrics**

Accuracy

Weighted F1-score (due to potential class imbalance)

Confusion Matrix
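The three metrics listed above are all available in scikit-learn. The following is a minimal sketch on made-up labels (the `y_true` and `y_pred` values are illustrative, not results from this project). The `labels` argument is passed explicitly so the confusion matrix keeps a fixed 5×5 shape even if a difficulty level is missing from a batch.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Made-up 1-5 difficulty labels, for illustration only.
y_true = [1, 2, 3, 4, 5, 5, 3, 1]
y_pred = [1, 2, 3, 5, 5, 4, 3, 2]
labels = [1, 2, 3, 4, 5]

# Fraction of exactly correct predictions.
acc = accuracy_score(y_true, y_pred)

# Per-class F1 averaged with weights proportional to each class's support.
weighted_f1 = f1_score(
    y_true, y_pred, labels=labels, average="weighted", zero_division=0
)

# Rows are true levels, columns are predicted levels.
cm = confusion_matrix(y_true, y_pred, labels=labels)

print(f"accuracy:    {acc:.3f}")
print(f"weighted F1: {weighted_f1:.3f}")
print(cm)
```

Because difficulty is ordinal, the off-diagonal entries are informative: mass close to the diagonal means the model is mostly off by one level.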

**📦 4. Required Libraries**

In [None]:
%pip install transformers datasets scikit-learn pandas torch accelerate


**✅ Example Python Code (Core)**

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
import torch

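# Load data
df = pd.read_csv("hendrycks_math_train.csv")
df_test = pd.read_csv("hendrycks_math_test.csv")

# Drop rows with a missing question or answer; concatenating NaN would yield NaN text
df = df.dropna(subset=["question", "answer"])
df_test = df_test.dropna(subset=["question", "answer"])

# Combine question and answer
df["text"] = "Q: " + df["question"] + " A: " + df["answer"]
df_test["text"] = "Q: " + df_test["question"] + " A: " + df_test["answer"]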

# Prepare train/val split (stratified so each difficulty level keeps its share in both splits)
train_texts, val_texts, train_labels, val_labels = train_test_split(
    df["text"], df["difficulty"], test_size=0.2, random_state=42, stratify=df["difficulty"]
)

# Tokenizer
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

train_encodings = tokenizer(list(train_texts), truncation=True, padding=True)
val_encodings = tokenizer(list(val_texts), truncation=True, padding=True)
test_encodings = tokenizer(list(df_test["text"]), truncation=True, padding=True)

# Dataset wrapper
class MathDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels=None):
        self.encodings = encodings
        self.labels = labels  # None when wrapping unlabeled data

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        if self.labels is not None:
            item["labels"] = torch.tensor(self.labels[idx] - 1)  # make 0-indexed
        return item

    def __len__(self):
        return len(self.encodings["input_ids"])

train_dataset = MathDataset(train_encodings, train_labels.tolist())
val_dataset = MathDataset(val_encodings, val_labels.tolist())

# Model
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=5)

# Training
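training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",  # renamed to eval_strategy in newer transformers releases
    save_strategy="epoch",  # must match the evaluation strategy when load_best_model_at_end=True
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir='./logs',
    load_best_model_at_end=True,
)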

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
)

trainer.train()

# Evaluation
preds = trainer.predict(val_dataset)
y_pred = preds.predictions.argmax(-1) + 1  # return to 1-indexed
y_true = val_labels.values

print(classification_report(y_true, y_pred, digits=3))
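The core code above only reports metrics once, after training. To track accuracy and weighted F1 at every evaluation step, `Trainer` accepts a `compute_metrics` callback. The sketch below shows one possible version; the metric key names are arbitrary choices, and the logits are made up so the function can be checked without training a model.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score


def compute_metrics(eval_pred):
    """Turn (logits, labels) from the Trainer into accuracy and weighted F1."""
    logits, labels = eval_pred
    # The highest logit gives the predicted 0-indexed class.
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1_weighted": f1_score(labels, preds, average="weighted", zero_division=0),
    }


# Made-up logits for three examples over five classes, for illustration only.
fake_logits = np.array(
    [
        [2.0, 0.1, 0.0, 0.0, 0.0],
        [0.0, 0.0, 3.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0],
    ]
)
fake_labels = np.array([0, 2, 3])
metrics = compute_metrics((fake_logits, fake_labels))
print(metrics)
```

Passing `compute_metrics=compute_metrics` to `Trainer(...)` makes these values appear in each evaluation log. Setting `metric_for_best_model="f1_weighted"` in `TrainingArguments` would then make `load_best_model_at_end` choose the checkpoint by weighted F1 rather than by evaluation loss.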
