# MultiClass Classification

## Import Libraries

In [1]:
from sklearn.datasets import fetch_openml
import numpy as np

## Load MNIST

In [None]:
mnist = fetch_openml("mnist_784", version = 1, as_frame = False)

In [5]:
X, y = mnist['data'], mnist['target']
y = y.astype(np.int64)

X.shape, y.shape

((70000, 784), (70000,))

In [7]:
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]

X_train.shape, y_train.shape, X_test.shape, y_test.shape

((60000, 784), (60000,), (10000, 784), (10000,))

## Train MultiClass Classifier

In [9]:
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(max_iter = 1000)
clf.fit(X_train, y_train)

STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


## Verify MultiClass setup

In [10]:
clf.classes_

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int64)

## Scores per class

In [13]:
scores = clf.decision_function([X_test[0]])
scores

array([[ -2.64200038, -26.56911243,  -4.1860004 ,  12.20611372,
         -1.24953321,   5.22542043, -16.69306969,  17.9140524 ,
          6.91427436,   9.07985521]])

## How Prediction Works

In [14]:
np.argmax(scores)

7

## Verify

In [15]:
clf.predict([X_test[0]])

array([7], dtype=int64)

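### This is multiclass prediction:
`one score per class → argmax → one class`

With the default `lbfgs` solver, `LogisticRegression` fits a single multinomial (softmax) model, so these are ten per-class scores from one model rather than ten independent binary classifiers. The argmax step is the same either way.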
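The argmax rule can be checked for a whole batch at once. The sketch below is a minimal check, assuming scikit-learn's small bundled `load_digits` dataset as a stand-in for MNIST so that it runs in seconds; the names `demo_clf`, `batch_scores` and `manual_pred` are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

# Assumption: the bundled 8x8 digits dataset stands in for MNIST here.
X_demo, y_demo = load_digits(return_X_y=True)
X_demo = X_demo / 16.0  # pixel values in this dataset range from 0 to 16

demo_clf = LogisticRegression(max_iter=1000)
demo_clf.fit(X_demo, y_demo)

# One row of scores per sample, one column per class.
batch_scores = demo_clf.decision_function(X_demo[:100])

# Pick the class with the highest score for each sample.
manual_pred = demo_clf.classes_[np.argmax(batch_scores, axis=1)]

# Compare the manual argmax with what predict() returns.
print(np.array_equal(manual_pred, demo_clf.predict(X_demo[:100])))
```

Indexing `classes_` with the argmax matters when the labels are not simply `0..N-1`; for digits the two happen to coincide.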

## Manual OvR-style thinking

In [23]:
for digit, score in enumerate(scores[0]):
    print(f"Digit {digit} : score = {score:.2f}")

Digit 0 : score = -2.64
Digit 1 : score = -26.57
Digit 2 : score = -4.19
Digit 3 : score = 12.21
Digit 4 : score = -1.25
Digit 5 : score = 5.23
Digit 6 : score = -16.69
Digit 7 : score = 17.91
Digit 8 : score = 6.91
Digit 9 : score = 9.08
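Raw scores are hard to compare by eye. Assuming the model is the multinomial (softmax) form that `LogisticRegression` uses by default with the `lbfgs` solver, a softmax over the scores turns them into class probabilities. The sketch below reuses the scores printed above; it should agree with `clf.predict_proba`, though that comparison is not run here.

```python
import numpy as np

# Scores copied from the decision_function output above.
scores = np.array([[-2.64200038, -26.56911243, -4.1860004, 12.20611372,
                    -1.24953321, 5.22542043, -16.69306969, 17.9140524,
                    6.91427436, 9.07985521]])

# Softmax: subtract the row maximum first for numerical stability.
shifted = scores - scores.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

for digit, p in enumerate(probs[0]):
    print(f"Digit {digit} : probability = {p:.4f}")

print("Predicted digit:", probs.argmax(axis=1)[0])
```

Softmax preserves the ordering of the scores, so the argmax is unchanged; it only rescales them so they sum to 1.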


## MultiClass confusion matrix

In [24]:
from sklearn.metrics import confusion_matrix

y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

cm.shape  # N x N, where N is the number of classes

(10, 10)

In [25]:
cm

array([[ 955,    0,    3,    2,    1,    5,    6,    4,    4,    0],
       [   0, 1110,    8,    3,    0,    1,    3,    2,    8,    0],
       [   5,   13,  917,   18,   12,    6,   11,    9,   38,    3],
       [   3,    1,   18,  921,    2,   23,    3,   11,   22,    6],
       [   3,    3,    6,    4,  908,    0,   10,    7,   11,   30],
       [  12,    5,    3,   36,   12,  758,   17,    6,   36,    7],
       [  10,    3,    8,    2,    7,   18,  906,    1,    3,    0],
       [   4,    7,   25,    8,    4,    2,    0,  945,    3,   30],
       [   7,   14,    6,   21,    6,   23,   10,   14,  862,   11],
       [   8,    7,    1,   10,   21,    7,    1,   22,    9,  923]],
      dtype=int64)
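To read the matrix, it helps to see the row and column convention on a tiny case. The labels below are made up purely for illustration; `toy_true`, `toy_pred` and `toy_cm` are hypothetical names.

```python
from sklearn.metrics import confusion_matrix

# Toy labels (made up for illustration): three classes, six samples.
toy_true = [0, 0, 1, 2, 2, 2]
toy_pred = [0, 1, 1, 2, 2, 0]

toy_cm = confusion_matrix(toy_true, toy_pred)
print(toy_cm)

# Row i holds the samples whose true class is i;
# column j holds the samples predicted as class j.
# toy_cm[2, 0] therefore counts true 2s that were predicted as 0.
print("true 2 predicted as 0:", toy_cm[2, 0])
```

The same reading applies to the 10x10 MNIST matrix: `cm[5]` is everything that truly was a 5, split by what the model called it.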

## Visual Intuition

In [26]:
cm[5]

array([ 12,   5,   3,  36,  12, 758,  17,   6,  36,   7], dtype=int64)

### Interpretation:

* Correct 5s → the diagonal value (758).
* Misclassified 5s → the other entries in the row, i.e. where they land instead (most often 3 and 8, 36 times each).
* This shows error patterns, not just accuracy.
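The same reading of row 5 can be done programmatically. This is a small sketch using the row values printed above; the variable names are illustrative.

```python
import numpy as np

# Row 5 of the confusion matrix, copied from the output above.
row_5 = np.array([12, 5, 3, 36, 12, 758, 17, 6, 36, 7])

correct = row_5[5]
errors = row_5.copy()
errors[5] = 0  # drop the diagonal entry so only mistakes remain

# Wrong predictions, most frequent first (stable sort keeps ties in digit order).
order = np.argsort(-errors, kind="stable")
for digit in order[:3]:
    print(f"true 5 predicted as {digit}: {errors[digit]} times")

# Recall for digit 5 is the correct count divided by the row total.
print(f"recall for digit 5: {correct / row_5.sum():.2f}")
```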

## Precision & Recall per class

In [27]:
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.95      0.97      0.96       980
           1       0.95      0.98      0.97      1135
           2       0.92      0.89      0.90      1032
           3       0.90      0.91      0.91      1010
           4       0.93      0.92      0.93       982
           5       0.90      0.85      0.87       892
           6       0.94      0.95      0.94       958
           7       0.93      0.92      0.92      1028
           8       0.87      0.89      0.88       974
           9       0.91      0.91      0.91      1009

    accuracy                           0.92     10000
   macro avg       0.92      0.92      0.92     10000
weighted avg       0.92      0.92      0.92     10000



### You’ll see:

1. precision per digit
2. recall per digit
3. f1-score per digit
4. macro avg (unweighted mean over the 10 classes)
5. weighted avg (mean weighted by each class's support)

### This is:

`Binary metrics applied class-by-class`
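To make "class-by-class" concrete, the per-class numbers in the report can be recomputed from the confusion matrix alone. The sketch below copies the matrix printed earlier; it is a check on the arithmetic, not a replacement for `classification_report`.

```python
import numpy as np

# Confusion matrix copied from the output above (rows = true, columns = predicted).
cm = np.array([
    [955,    0,   3,   2,   1,   5,   6,   4,   4,   0],
    [  0, 1110,   8,   3,   0,   1,   3,   2,   8,   0],
    [  5,   13, 917,  18,  12,   6,  11,   9,  38,   3],
    [  3,    1,  18, 921,   2,  23,   3,  11,  22,   6],
    [  3,    3,   6,   4, 908,   0,  10,   7,  11,  30],
    [ 12,    5,   3,  36,  12, 758,  17,   6,  36,   7],
    [ 10,    3,   8,   2,   7,  18, 906,   1,   3,   0],
    [  4,    7,  25,   8,   4,   2,   0, 945,   3,  30],
    [  7,   14,   6,  21,   6,  23,  10,  14, 862,  11],
    [  8,    7,   1,  10,  21,   7,   1,  22,   9, 923],
])

true_positives = np.diag(cm)
support = cm.sum(axis=1)      # number of samples per true class
predicted = cm.sum(axis=0)    # number of samples per predicted class

recall = true_positives / support
precision = true_positives / predicted
f1 = 2 * precision * recall / (precision + recall)

for digit in range(10):
    print(f"{digit}: precision={precision[digit]:.2f} "
          f"recall={recall[digit]:.2f} f1={f1[digit]:.2f} support={support[digit]}")

# Macro average treats every class equally; weighted average weights by support.
macro_recall = recall.mean()
weighted_recall = np.average(recall, weights=support)
print(f"macro recall={macro_recall:.2f} weighted recall={weighted_recall:.2f}")
```

Weighted-average recall is the same quantity as overall accuracy, since both reduce to the diagonal sum divided by the total sample count.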

## Why ROC / PR isn’t shown here

With 10 classes you would need either:
1. 10 one-vs-rest curves, one per digit, or
2. a single averaged curve, which hides which digits fail.

`That’s why the confusion matrix and per-class metrics matter more here.`
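For completeness, here is a minimal sketch of what the "10 curves" option looks like when reduced to one ROC AUC number per digit. It assumes the small bundled `load_digits` dataset as a stand-in for MNIST, and names such as `demo_clf` and `per_class_auc` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Assumption: bundled 8x8 digits as a small stand-in for MNIST.
X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo / 16.0, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

demo_clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = demo_clf.predict_proba(X_te)

# One binary "digit k vs the rest" ROC AUC per class.
per_class_auc = np.array([
    roc_auc_score(y_te == k, proba[:, i])
    for i, k in enumerate(demo_clf.classes_)
])
for k, auc in zip(demo_clf.classes_, per_class_auc):
    print(f"digit {k}: one-vs-rest AUC = {auc:.4f}")

# The macro-averaged one-vs-rest AUC collapses those ten numbers into one.
macro_auc = roc_auc_score(y_te, proba, multi_class="ovr")
print(f"macro one-vs-rest AUC = {macro_auc:.4f}")
```

The single macro number is convenient but says nothing about which digit is weakest, which is the point made above.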

## What Logistic Regression is iterating on

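Each iteration:

1. Computes gradients of the loss with respect to the weights
2. Adjusts the weights to reduce the loss
3. Repeats until the loss stops improving or `max_iter` is reached

An illustrative timeline (not measured values):

```
iteration 1    → bad weights
iteration 50   → better
iteration 200  → much better
iteration 1000 → stable
```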
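The loop above can be sketched in plain NumPy. Note the swap: scikit-learn's default solver is L-BFGS with L2 regularization, whereas this sketch uses plain, unregularized gradient descent on softmax regression because it is easier to follow. The data is synthetic and every name here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (made up for illustration): 300 points, 2 features, 3 classes.
n_classes = 3
centers = np.array([[0.0, 2.0], [-2.0, -1.0], [2.0, -1.0]])
y = rng.integers(0, n_classes, size=300)
X = centers[y] + rng.normal(scale=1.0, size=(300, 2))
Y_onehot = np.eye(n_classes)[y]

W = np.zeros((2, n_classes))  # start from all-zero ("bad") weights
b = np.zeros(n_classes)
learning_rate = 0.1
losses = []

for iteration in range(200):
    # Forward pass: class scores -> softmax probabilities.
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Cross-entropy loss averaged over samples.
    losses.append(-np.log(probs[np.arange(len(y)), y]).mean())

    # 1. Compute gradients of the loss.
    grad_logits = (probs - Y_onehot) / len(y)
    grad_W = X.T @ grad_logits
    grad_b = grad_logits.sum(axis=0)

    # 2. Adjust the weights to reduce the loss; 3. the loop repeats.
    W -= learning_rate * grad_W
    b -= learning_rate * grad_b

for i in (0, 49, 199):
    print(f"iteration {i + 1:>3}: loss = {losses[i]:.4f}")
```

Stopping this loop after a handful of iterations leaves the loss well above where it ends up, which is the "half-baked" situation that a too-small `max_iter` produces.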

### If you stop too early:

* Decision boundaries are half-baked

* Scores are unreliable

* Confusion matrix worsens

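We increase `max_iter` to give the optimizer room to converge, especially on high-dimensional multiclass problems like MNIST. It is about training stability, not model tuning. Note that the warning shown after `clf.fit` means the solver still hit the limit at `max_iter = 1000` on raw pixel values; as the warning itself suggests, scaling the data helps it converge.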
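One way to act on the convergence warning is to scale the features before fitting and then check how many iterations the solver actually used. This is a minimal sketch, assuming the small bundled `load_digits` dataset as a stand-in for MNIST and a `StandardScaler` pipeline as the scaling choice; the notebook itself does not do this, and `scaled_clf` is an illustrative name.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled 8x8 digits as a small stand-in for MNIST.
X_demo, y_demo = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

max_iter = 1000
# The scaler is fitted on the training data only, then applied at predict time.
scaled_clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=max_iter))
scaled_clf.fit(X_tr, y_tr)

# n_iter_ reports how many iterations the solver actually used.
n_iter = scaled_clf[-1].n_iter_.max()
print(f"iterations used: {n_iter} of {max_iter}")
print(f"test accuracy: {scaled_clf.score(X_te, y_te):.3f}")
```

If `n_iter` equals `max_iter`, the solver stopped because it ran out of iterations rather than because it converged. The same pipeline could be fitted on `X_train` and `y_train` from the notebook, at a higher training cost.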