# Explainability

The goals of this notebook are as follows:
- Explain what the model learned
- Show where and why it fails
- Connect failures to domain semantics

## 1. Extract Top Words Per Class

Because we used a **Linear SVM**, every token in the TF-IDF vocabulary (a single word or an n-gram) has a **weight per class**.

**What this means:**

- Positive weight -> pushes the prediction toward that class
- Negative weight -> pushes the prediction away from that class
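
To make the push and pull concrete, here is a minimal sketch, assuming the classifier scores each class with a weighted sum of token features and predicts the class with the highest score, as a one-vs-rest linear model does. The weights, the document values, and the helper `class_scores` are hypothetical and chosen for illustration; the intercept term is omitted for brevity.

```python
# Hypothetical per-class token weights; real values would come from clf.coef_.
WEIGHTS = {
    "Sports":   {"coach": 3.0, "team": 2.5, "stock": -1.0},
    "Business": {"coach": -0.5, "team": -0.25, "stock": 2.0},
}


def class_scores(token_values, weights=WEIGHTS):
    """Return {class: sum(weight * feature value)} for one document (bias omitted)."""
    return {
        label: sum(w.get(tok, 0.0) * val for tok, val in token_values.items())
        for label, w in weights.items()
    }


# Hypothetical TF-IDF-like feature values for a short document.
doc = {"coach": 0.5, "team": 0.5, "stock": 0.5}

scores = class_scores(doc)
predicted = max(scores, key=scores.get)
print(scores, predicted)
```

In this toy case "coach" and "team" raise the Sports score while "stock" lowers it, and the opposite holds for Business, so the document lands in Sports.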

In [2]:
import joblib
import numpy as np
import pandas as pd

LABELS = ['World', 'Sports', 'Business', 'Sci/Tech']

model = joblib.load('../models/text_clf.joblib')

tfidf = model.named_steps['tfidf']
clf = model.named_steps['clf']

feature_names = np.array(tfidf.get_feature_names_out())
coef = clf.coef_    # shape: (n_classes, n_features)


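def top_features_for_class(class_idx, top_n=20):
    """Return the top_n tokens with the largest weights for one class."""
    # argsort is ascending, so the last top_n indices hold the largest weights
    top_pos = np.argsort(coef[class_idx])[-top_n:]

    return pd.DataFrame({
        'token': feature_names[top_pos],
        'weight': coef[class_idx][top_pos]
    }).sort_values('weight', ascending=False)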


for i, label in enumerate(LABELS):
    print(f'\n=== Top tokens for class {label} ===')
    display(top_features_for_class(i))

https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations



=== Top tokens for class World ===


index,token,weight
19,iraq,3.534139
18,afp,2.854504
17,canadian press,2.836131
16,iraqi,2.673732
15,iran,2.415589
14,nuclear,2.38364
13,york stocks,2.361854
12,president,2.333672
11,arafat,2.313761
10,afp afp,2.283628



=== Top tokens for class Sports ===


index,token,weight
19,coach,3.382348
18,cup,2.912591
17,team,2.869023
16,sports,2.661308
15,season,2.540753
14,league,2.492252
13,players,2.457468
12,baseball,2.447412
11,olympic,2.386987
10,nba,2.380016



=== Top tokens for class Business ===


index,token,weight
19,hellip,2.902799
18,economy,2.667554
17,tax,2.46267
16,oil,2.394382
15,bank,2.34871
14,airlines,2.23894
13,insurance,2.216638
12,enron,2.164013
11,stock,2.080233
10,halliburton,2.077205



=== Top tokens for class Sci/Tech ===


index,token,weight
19,space,3.449374
18,nasa,3.292242
17,internet,3.218363
16,scientists,3.174407
15,software,2.767175
14,linux,2.584665
13,web,2.522223
12,apple,2.502676
11,online,2.324692
10,researchers,2.290024
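
The tables above list only the largest weights per class. Since negative weights push a prediction away from a class, it can also help to look at the other end of the ranking. The following is a sketch under stated assumptions: `strongest_tokens` is a hypothetical helper, and the toy vocabulary and weights stand in for `feature_names` and one row of `coef`.

```python
import numpy as np


def strongest_tokens(coef_row, feature_names, top_n=3):
    """Return (most positive, most negative) lists of (token, weight) for one class."""
    order = np.argsort(coef_row)      # indices sorted by ascending weight
    top_pos = order[::-1][:top_n]     # largest weights first
    top_neg = order[:top_n]           # smallest (most negative) weights first
    pos = [(str(feature_names[i]), float(coef_row[i])) for i in top_pos]
    neg = [(str(feature_names[i]), float(coef_row[i])) for i in top_neg]
    return pos, neg


# Hypothetical vocabulary and weights for a single class.
toy_names = np.array(["coach", "team", "stock", "nasa", "oil"])
toy_row = np.array([3.0, 2.5, -1.5, -2.0, -0.5])

pos, neg = strongest_tokens(toy_row, toy_names, top_n=2)
print(pos)
print(neg)
```

With the loaded model, the same helper could be called as `strongest_tokens(coef[i], feature_names)` for each class index `i`.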


## 2. Misclassification Analysis

This section examines the errors the model makes on held-out data. For instance, an article whose tokens carry weight in more than one class can be assigned to the wrong one. This occurs frequently between the "Business" and "Sci/Tech" classes because the two share vocabulary.
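
One way to probe the shared-vocabulary explanation is to list tokens that carry positive weight in both classes. The sketch below assumes a coefficient matrix shaped like `coef` (one row per class); `shared_positive_tokens` is a hypothetical helper and the toy values are illustrative only.

```python
import numpy as np


def shared_positive_tokens(coef, feature_names, class_a, class_b):
    """Tokens with positive weight in both classes, strongest shared weight first."""
    both = (coef[class_a] > 0) & (coef[class_b] > 0)
    idx = np.flatnonzero(both)
    # Rank by the smaller of the two weights: a token is only "shared"
    # to the extent that both classes rely on it.
    strength = np.minimum(coef[class_a, idx], coef[class_b, idx])
    ranked = idx[np.argsort(strength)[::-1]]
    return [str(feature_names[i]) for i in ranked]


# Hypothetical vocabulary and weights.
toy_names = np.array(["software", "shares", "coach", "chip"])
toy_coef = np.array([
    [0.5, 2.0, -1.0, 1.0],    # hypothetical "Business" row
    [2.5, -0.5, -1.0, 1.5],   # hypothetical "Sci/Tech" row
])

shared = shared_positive_tokens(toy_coef, toy_names, 0, 1)
print(shared)
```

Against the real model, the class indices would be `LABELS.index('Business')` and `LABELS.index('Sci/Tech')`.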

In [None]:
from datasets import load_dataset
from sklearn.model_selection import train_test_split

ds = load_dataset('ag_news')

X = ds['train']['text']
y = ds['train']['label']

X_train, X_hold, y_train, y_hold = train_test_split(
    X, y,
    test_size=0.15,
    random_state=42,
    stratify=y
)

preds = model.predict(X_hold)

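# Collect held-out texts with their true and predicted class indices
errors = pd.DataFrame({
    'text': X_hold,
    'true': y_hold,
    'predicted': preds
})

# Keep only the misclassified rows; copy so the columns added below
# go into an independent frame rather than a slice of the original
errors = errors[errors.true != errors.predicted].copy()

# Map class indices to readable label names
errors['true_label'] = errors.true.map(lambda x: LABELS[x])
errors['pred_label'] = errors.predicted.map(lambda x: LABELS[x])

# Inspect a random sample of misclassified articles
errors.sample(10)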
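
A random sample shows individual failures but not which class pairs are confused most often. As a rough sketch, the (true, predicted) pairs can be tallied directly; `confusion_pairs` is a hypothetical helper and the toy labels below are made up for illustration, not taken from the model's output.

```python
from collections import Counter


def confusion_pairs(true_labels, pred_labels):
    """Count (true, predicted) pairs where the labels differ, most frequent first."""
    pairs = Counter(
        (t, p) for t, p in zip(true_labels, pred_labels) if t != p
    )
    return pairs.most_common()


# Hypothetical labels, for illustration only.
toy_true = ["Business", "Business", "Sci/Tech", "World", "Sports", "Business"]
toy_pred = ["Sci/Tech", "Sci/Tech", "Business", "World", "Sports", "World"]

ranked = confusion_pairs(toy_true, toy_pred)
for (true_label, pred_label), count in ranked:
    print(f"{true_label} -> {pred_label}: {count}")
```

On the real data this could be called as `confusion_pairs(errors.true_label, errors.pred_label)`.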