
**🔧 Step-by-Step Implementation Plan**

**📁 1. Dataset Preparation**

➤ **Data Sources**
Train: hendrycks_math_train.csv

Test: hendrycks_math_test.csv

➤ **Tasks:**

Load both CSVs using pandas

Inspect and clean the data (e.g., missing values)

Combine the question (`problem`) and answer (`solution`) columns into one input text

Use the difficulty level (`level`, 1–5) as the label
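A minimal pandas sketch of the inspect-and-clean step. It assumes the CSVs have the `problem`, `solution`, and `level` columns used in the training code, and that `level` may be stored either as an integer or as a string such as `"Level 3"`. The helper `clean_math_df` and the toy rows are hypothetical:

```python
import pandas as pd

def clean_math_df(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop unusable rows and build the combined input text."""
    # Rows missing the question, answer, or label cannot be used for training.
    out = df.dropna(subset=["problem", "solution", "level"]).copy()
    # Assumption: `level` may be an int (3) or a string ("Level 3");
    # extract the first run of digits and convert it to a number (NaN if none).
    digits = out["level"].astype(str).str.extract(r"(\d+)", expand=False)
    out["level"] = pd.to_numeric(digits, errors="coerce")
    # Keep only the five valid difficulty levels (NaN, e.g. from "Level ?", is dropped).
    out = out[out["level"].between(1, 5)].copy()
    out["level"] = out["level"].astype(int)
    out["text"] = "Q: " + out["problem"] + " A: " + out["solution"]
    return out.reset_index(drop=True)

# Toy rows standing in for the real CSV
raw = pd.DataFrame({
    "problem": ["What is 1+1?", "Solve x^2=4.", None, "Compute 2*3."],
    "solution": ["2", "x=2 or x=-2", "n/a", "6"],
    "level": ["Level 1", 3, "Level 2", "Level ?"],
})
clean = clean_math_df(raw)
print(clean["level"].value_counts().sort_index())  # quick class-balance check
```

Printing the per-level counts on the real data is a cheap way to see how imbalanced the five classes are before choosing metrics.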

**🧠 2. Modeling Approach**

We’ll treat this as a five-class text classification task, using transformer-based models such as:

BERT, RoBERTa, or DeBERTa (for higher accuracy)

DistilBERT (for speed and low-compute environments)

Fine-tune with Hugging Face Transformers
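Because the fine-tuning code goes through the `Auto*` classes, switching between these model families should mostly come down to changing the checkpoint name. A small sketch, assuming the standard Hugging Face Hub IDs listed below; the `CANDIDATE_CHECKPOINTS` dictionary and the `pick_checkpoint` helper are hypothetical:

```python
# Hypothetical mapping from the model families above to Hugging Face Hub checkpoint IDs.
CANDIDATE_CHECKPOINTS = {
    "bert": "bert-base-uncased",
    "roberta": "roberta-base",
    "deberta": "microsoft/deberta-v3-base",   # its tokenizer may need `sentencepiece`
    "distilbert": "distilbert-base-uncased",  # the checkpoint used in the core code
}

def pick_checkpoint(low_compute: bool, family: str = "roberta") -> str:
    """Return DistilBERT for constrained environments, else the requested larger model."""
    if low_compute:
        return CANDIDATE_CHECKPOINTS["distilbert"]
    return CANDIDATE_CHECKPOINTS[family]

model_name = pick_checkpoint(low_compute=True)
print(model_name)  # distilbert-base-uncased
```

The returned name can then be passed to `AutoTokenizer.from_pretrained` and `AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=5)`; larger checkpoints will likely need a smaller per-device batch size.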

➤ **Input/Output Format:**

Input: "Q: <question text> A: <answer text>"

Output: Class label (1 to 5)
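A classification head with `num_labels=5` predicts class indices 0–4, so the 1–5 difficulty levels have to be shifted on the way in and shifted back on the way out (the core code does this with `- 1` and `+ 1`). A minimal sketch of the input formatting and the label mapping; `build_input`, `LEVEL_TO_ID`, and `ID_TO_LEVEL` are hypothetical names:

```python
def build_input(question: str, answer: str) -> str:
    """Format one example as 'Q: <question text> A: <answer text>'."""
    return f"Q: {question} A: {answer}"

# Difficulty levels 1..5 <-> class indices 0..4 expected by a 5-way classifier head.
LEVEL_TO_ID = {level: level - 1 for level in range(1, 6)}
ID_TO_LEVEL = {idx: level for level, idx in LEVEL_TO_ID.items()}

# Optionally, the mapping can be stored in the model config, e.g.
# AutoModelForSequenceClassification.from_pretrained(
#     model_name, num_labels=5,
#     id2label={i: f"Level {lvl}" for i, lvl in ID_TO_LEVEL.items()},
#     label2id={f"Level {lvl}": i for lvl, i in LEVEL_TO_ID.items()})

text = build_input("What is 2+3?", "5")
print(text)                            # Q: What is 2+3? A: 5
print(LEVEL_TO_ID[5], ID_TO_LEVEL[0])  # 4 1
```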

**🧪 3. Evaluation Metrics**

Accuracy

Weighted F1-score (to account for possible class imbalance across difficulty levels)

Confusion matrix
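The core code only prints a classification report after training. A sketch of a `compute_metrics` function covering all three metrics, assuming it is passed as `Trainer(compute_metrics=compute_metrics)`; `Trainer` calls it with the model's logits and the 0-indexed label IDs. The toy logits and labels are made up for illustration:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

def compute_metrics(eval_pred):
    """Accuracy and weighted F1 from (logits, labels); also prints the confusion matrix."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Fix the label order so the matrix is always 5x5, even if a class is absent.
    cm = confusion_matrix(labels, preds, labels=list(range(5)))
    print(cm)  # rows = true class, columns = predicted class (0-indexed levels)
    return {
        "accuracy": accuracy_score(labels, preds),
        # zero_division=0 avoids warnings when a class is never predicted or never present.
        "f1_weighted": f1_score(labels, preds, average="weighted", zero_division=0),
    }

# Toy check: 4 examples, 5 classes; the last prediction (class 4) should be class 2.
logits = np.array([[5, 0, 0, 0, 0],
                   [0, 5, 0, 0, 0],
                   [0, 0, 0, 5, 0],
                   [0, 0, 0, 0, 5]], dtype=float)
labels = np.array([0, 1, 3, 2])
metrics = compute_metrics((logits, labels))
print(metrics)
```

With `load_best_model_at_end=True`, setting `metric_for_best_model="f1_weighted"` in `TrainingArguments` would select the best checkpoint by this score rather than by validation loss.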

**📦 4. Required Libraries**

In [1]:
%pip install transformers datasets scikit-learn pandas torch



**✅ Example Python Code (Core)**

In [None]:
# Optional: mount Google Drive (not needed when the CSVs are read from GitHub, as below)
from google.colab import drive
drive.mount('/content/gdrive')

In [5]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
import torch

# Load data
df = pd.read_csv("https://raw.githubusercontent.com/dr-mushtaq/Math-QA-Difficulty-Classifier/main/Dataset/hendrycks_math_train%20(1).csv")
df_test = pd.read_csv("https://raw.githubusercontent.com/dr-mushtaq/Math-QA-Difficulty-Classifier/main/Dataset/hendrycks_math_test%20(1).csv")

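# Drop rows with missing text or labels (concatenating with NaN would yield NaN)
df = df.dropna(subset=["problem", "solution", "level"]).reset_index(drop=True)
df_test = df_test.dropna(subset=["problem", "solution", "level"]).reset_index(drop=True)

# Combine problem and solution into a single input string
df["text"] = "Q: " + df["problem"] + " A: " + df["solution"]
df_test["text"] = "Q: " + df_test["problem"] + " A: " + df_test["solution"]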

# Prepare train/val split
train_texts, val_texts, train_labels, val_labels = train_test_split(df["text"], df["level"], test_size=0.2, random_state=42)

# Tokenizer
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

train_encodings = tokenizer(list(train_texts), truncation=True, padding=True)
val_encodings = tokenizer(list(val_texts), truncation=True, padding=True)
test_encodings = tokenizer(list(df_test["text"]), truncation=True, padding=True)

# Dataset wrapper
class MathDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels=None):
        self.encodings = encodings
        self.labels = labels  # None for unlabeled data

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        if self.labels is not None:
            item["labels"] = torch.tensor(self.labels[idx] - 1)  # make 0-indexed
        return item

    def __len__(self):
        return len(self.encodings["input_ids"])

train_dataset = MathDataset(train_encodings, train_labels.tolist())
val_dataset = MathDataset(val_encodings, val_labels.tolist())

# Model
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=5)

# Training
training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
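    save_strategy="epoch",  # must match eval_strategy when load_best_model_at_end=True
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir='./logs',
    load_best_model_at_end=True,
    report_to="none",  # disable W&B and other loggers (avoids the interactive API-key prompt)
)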

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    processing_class=tokenizer,  # transformers >= 4.46; older versions use tokenizer=tokenizer
)

trainer.train()

# Evaluation
preds = trainer.predict(val_dataset)
y_pred = preds.predictions.argmax(-1) + 1  # return to 1-indexed
y_true = val_labels.values

print(classification_report(y_true, y_pred, digits=3))

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
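In the core code, `padding=True` pads every example to the longest sequence in the whole split (capped by truncation), so short problems carry many padding tokens. Padding each batch only to its own longest sequence can cut wasted compute. The sketch below shows the idea in plain PyTorch, assuming unpadded token IDs (for example, from tokenizing with `padding=False`) and a pad token ID of 0, as in the `distilbert-base-uncased` vocabulary. The `pad_batch` helper and the token IDs are illustrative:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def pad_batch(features, pad_token_id=0):
    """Hypothetical collate function: pad each batch only to its own longest sequence."""
    input_ids = [torch.tensor(f["input_ids"], dtype=torch.long) for f in features]
    # 1 for real tokens, 0 for padding positions.
    masks = [torch.ones(len(ids), dtype=torch.long) for ids in input_ids]
    batch = {
        "input_ids": pad_sequence(input_ids, batch_first=True, padding_value=pad_token_id),
        "attention_mask": pad_sequence(masks, batch_first=True, padding_value=0),
    }
    if "labels" in features[0]:
        batch["labels"] = torch.tensor([f["labels"] for f in features], dtype=torch.long)
    return batch

# Two unpadded toy examples of different lengths (token IDs are illustrative).
features = [
    {"input_ids": [101, 7592, 102], "labels": 0},
    {"input_ids": [101, 7592, 2088, 999, 102], "labels": 3},
]
batch = pad_batch(features)
print(batch["input_ids"].shape)             # torch.Size([2, 5])
print(batch["attention_mask"][0].tolist())  # [1, 1, 1, 0, 0]
```

Transformers ships `DataCollatorWithPadding` for the same purpose, and `Trainer` uses it by default when a tokenizer is passed in, so tokenizing with `padding=False` may be enough to get per-batch padding in the core code; that is worth verifying against the installed version.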