# Ordinal Logistic Regression with mord on Amazon Polarity

## Introduction
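We use `mord`, a Python package that implements ordinal regression methods behind a scikit-learn-style API, to build a simple ordinal classifier. Unlike a standard multiclass model, an ordinal model treats the labels (e.g., star ratings) as ordered, so predicting 2 stars for a 1-star review counts as a smaller error than predicting 5 stars.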


In [None]:
!pip install -U -q datasets

In [3]:
!pip install -q mord scikit-learn pandas numpy


In [12]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, classification_report, cohen_kappa_score
import mord
from datasets import load_dataset

## Load Dataset
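We load the Amazon Polarity dataset with Hugging Face’s `datasets` library. Its labels are binary (0 = negative, 1 = positive), so we map them to the two ends of a 1 to 5 star scale and then add a synthetic 3-star class to obtain three ordered levels.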


In [None]:
dataset = load_dataset("amazon_polarity", split="train[:10000]")  # Use 10k samples for speed

# Prepare DataFrame
df = pd.DataFrame({
    "text": dataset["content"],
    # Simulate star ratings from polarity: map 0 (negative) → 1 star, 1 (positive) → 5 stars
    "label": [1 if label == 0 else 5 for label in dataset["label"]]
})

In [None]:
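# Add a synthetic middle class (3-star): copy a random 30% of the reviews and relabel the copies as 3.
# The copies keep their original text, so this class carries no real sentiment signal.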
df_middle = df.sample(frac=0.3, random_state=42).copy()
df_middle["label"] = 3
df = pd.concat([df, df_middle], ignore_index=True)

# Convert to ordinal labels starting from 0
label_map = {1: 0, 3: 1, 5: 2}
df["label"] = df["label"].map(label_map)
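
The relabeling step above can be checked on a toy frame. The sketch below is illustrative only (the `toy` data and variable names are made up, not part of the notebook); it shows that sampling rows and relabeling the copies leaves each sampled text in the data twice, once with its original label and once with the middle label.

```python
import pandas as pd

# Hypothetical toy data standing in for the review DataFrame
toy = pd.DataFrame({
    "text": ["a", "b", "c", "d"],
    "label": [1, 5, 1, 5],
})

# Same recipe as the notebook: sample rows, relabel the copies as 3, append
middle = toy.sample(frac=0.5, random_state=0).copy()
middle["label"] = 3
combined = pd.concat([toy, middle], ignore_index=True)

# Count how many distinct labels each text ends up with
labels_per_text = combined.groupby("text")["label"].nunique()
conflicting = labels_per_text[labels_per_text > 1]

print(len(combined), "rows;", len(conflicting), "texts with two different labels")
```

Because the middle class is a relabeled copy of existing reviews, a text-based model has no way to separate it from the original classes, which is worth keeping in mind when reading the per-class scores later on.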

In [None]:
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(df["text"], df["label"], test_size=0.2, random_state=42)

## TF-IDF vectorization

In [None]:
# Vectorization using TF-IDF
vectorizer = TfidfVectorizer(max_features=5000, stop_words='english')
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

## Model: LogisticAT (All-Threshold)
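`mord.LogisticAT` implements the all-threshold variant of ordinal logistic regression. It learns a single weight vector that assigns each review a score, plus a set of ordered thresholds that cut that score into classes. The `alpha` parameter sets the strength of L2 regularization.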
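
To make the role of the thresholds concrete, here is a minimal NumPy sketch of the textbook cumulative-logit view, where P(y ≤ j | x) = sigmoid(θ_j − score). The threshold values and the helper name `cumulative_logit_probs` are assumptions chosen for illustration; this is not `mord`'s code and may differ from its internals in detail.

```python
import numpy as np

def cumulative_logit_probs(score, thresholds):
    """Per-class probabilities for one score under a cumulative-logit model."""
    thresholds = np.asarray(thresholds, dtype=float)
    # P(y <= j | x) = sigmoid(theta_j - score) for each ordered threshold theta_j
    cdf = 1.0 / (1.0 + np.exp(-(thresholds - score)))
    # Pad with 0 and 1 so that adjacent differences give per-class probabilities
    cdf = np.concatenate(([0.0], cdf, [1.0]))
    return np.diff(cdf)

thresholds = [-1.0, 1.0]  # hypothetical values: two thresholds give three classes

probs_low = cumulative_logit_probs(-3.0, thresholds)   # strongly negative score
probs_high = cumulative_logit_probs(3.0, thresholds)   # strongly positive score

print("low score :", probs_low.round(3))
print("high score:", probs_high.round(3))
```

A single score moves through the classes in order as it increases, which is what distinguishes this setup from fitting an unrelated weight vector per class.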


In [None]:
# Train Ordinal Logistic Regression
model = mord.LogisticAT(alpha=1.0)  # You can also try LogisticIT or LogisticSE
model.fit(X_train_tfidf, y_train)

## Evaluation
- Accuracy
- MAE (Mean Absolute Error)
- Classification Report
- Cohen Kappa Score
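
The reason for reporting MAE next to accuracy can be seen on a toy example. In this sketch (the label arrays are made up), two sets of predictions have the same accuracy, but one misses by a single class and the other by two; only MAE tells them apart.

```python
import numpy as np

# Hypothetical ordinal labels in {0, 1, 2}
y_true = np.array([0, 0, 1, 1, 2, 2])
pred_near = np.array([1, 0, 1, 1, 2, 1])  # two errors, each off by one class
pred_far = np.array([2, 0, 1, 1, 2, 0])   # two errors, each off by two classes

def accuracy(y_true, y_pred):
    # Fraction of exact matches; ignores how far off a wrong prediction is
    return float(np.mean(y_true == y_pred))

def mae(y_true, y_pred):
    # Average distance between true and predicted class index
    return float(np.mean(np.abs(y_true - y_pred)))

print("accuracy:", accuracy(y_true, pred_near), accuracy(y_true, pred_far))
print("MAE     :", mae(y_true, pred_near), mae(y_true, pred_far))
```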

In [None]:
# Evaluation
y_pred = model.predict(X_test_tfidf)

print("Mean Absolute Error (MAE):", mean_absolute_error(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Mean Absolute Error (MAE): 0.5273076923076923

Classification Report:
               precision    recall  f1-score   support

           0       0.70      0.58      0.64      1036
           1       0.17      0.30      0.22       554
           2       0.71      0.55      0.62      1010

    accuracy                           0.51      2600
   macro avg       0.53      0.48      0.49      2600
weighted avg       0.59      0.51      0.54      2600



In [22]:
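# cohen_kappa_score is already imported in the first import cell.
# With weights="quadratic", disagreements are penalized by the squared distance between classes.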

print("Quadratic Weighted Kappa (QWK):", cohen_kappa_score(y_test, y_pred, weights="quadratic"))


Quadratic Weighted Kappa (QWK): 0.576517501589078
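
For readers who want to see what the quadratic weighting does, the following NumPy sketch computes QWK by hand from a confusion matrix. The function name and toy labels are illustrative assumptions; on real data `cohen_kappa_score(..., weights="quadratic")` is the call to use.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Hand-rolled QWK for integer labels in range(n_classes)."""
    observed = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1
    # Agreement expected by chance, from the row and column totals
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    # Penalty grows with the squared distance between true and predicted class
    idx = np.arange(n_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Hypothetical labels: same number of errors, different distances
truth = [0, 0, 1, 1, 2, 2]
near = [1, 0, 1, 1, 2, 1]  # errors off by one class
far = [2, 0, 1, 1, 2, 0]   # errors off by two classes

qwk_perfect = quadratic_weighted_kappa(truth, truth, 3)
qwk_near = quadratic_weighted_kappa(truth, near, 3)
qwk_far = quadratic_weighted_kappa(truth, far, 3)
print(qwk_perfect, qwk_near, qwk_far)
```

Predictions that land in a neighboring class keep a fairly high kappa, while predictions that jump across the scale are penalized much more heavily.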
