# Multinomial Regression
* It's handled the same way as the binary case.
* The model outputs one probability per class and predicts the class with the highest probability (the argmax).
* It's also called Softmax Regression because the softmax function turns the class scores into probabilities.
* We'll use it to classify handwritten digits.
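
As a rough illustration of the softmax and argmax steps, here is a minimal sketch in plain Python. The `softmax` helper and the three raw scores are made up for this example; they are not part of scikit-learn or of the model trained in this notebook.

```python
import math

def softmax(scores):
    """Turn a list of raw class scores into probabilities that sum to 1."""
    # Subtracting the largest score first avoids overflow in exp()
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three classes (illustration only)
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)

# The predicted class is the index of the highest probability (argmax)
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

The class with the largest raw score always ends up with the largest probability, so the argmax of the probabilities matches the argmax of the scores.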

## Step 1: Importing the dataset

In [4]:
from sklearn import datasets
import numpy as np
# Loading the dataset
digits = datasets.load_digits()
# Getting the number of samples
n_samples = len(digits.images)

# Flattening the images since they're 8*8 matrices
X = digits.images.reshape((n_samples, -1))
# Storing the target values (classes) 0 to 9
y = digits.target
np.unique(y)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

## Step 2: Splitting the data


In [7]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2,
                                                    random_state = 42)

# Getting the shape of our data: Arrays of 64 elements
print(X_train.shape, X_test.shape)

(1437, 64) (360, 64)


## Step 3: Finding the optimal Multiclass Logistic Regression model with GridSearch and Cross-Validation
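* We'll tweak some values (learning rate, penalty, alpha).
* Note that `SGDClassifier` handles multiple classes with a one-vs-rest scheme (one binary logistic model per digit), so it approximates softmax regression rather than fitting a single multinomial model.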

In [15]:
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import SGDClassifier
parameters = {'penalty': ['l1', 'l2', None],
              'alpha': [1e-07, 1e-06, 1e-05],
              'eta0': [0.001, 0.01, 0.1, 1]}
sgd_lr = SGDClassifier(loss = 'log_loss', learning_rate = 'constant',
                       fit_intercept = True, max_iter = 100000)

grid_search = GridSearchCV(sgd_lr, parameters, n_jobs = -1, cv = 3)

grid_search.fit(X_train, y_train)
print(grid_search.best_params_)

{'alpha': 1e-05, 'eta0': 0.001, 'penalty': None}
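
To get a feel for how much work this grid search does, the sketch below counts the parameter combinations with plain Python. It copies the `parameters` dictionary and the `cv = 3` setting from the cell above; the variable names `n_candidates` and `n_cv_fits` are chosen for this example only.

```python
from itertools import product

# Same grid as in the search above
parameters = {'penalty': ['l1', 'l2', None],
              'alpha': [1e-07, 1e-06, 1e-05],
              'eta0': [0.001, 0.01, 0.1, 1]}
cv_folds = 3  # matches cv = 3 in GridSearchCV

# Every combination of one value per parameter is one candidate model
combinations = list(product(*parameters.values()))
n_candidates = len(combinations)

# Each candidate is trained once per cross-validation fold
n_cv_fits = n_candidates * cv_folds
print(n_candidates, n_cv_fits)
```

With its default `refit=True`, `GridSearchCV` then trains the best candidate one more time on the whole training set, which is the model returned by `best_estimator_`.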


## Step 4: Predicting values and evaluating the model

In [22]:
from sklearn.metrics import roc_auc_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

# Predicting using the best model
sgd_lr_best = grid_search.best_estimator_
accuracy = sgd_lr_best.score(X_test, y_test)
print(f'The accuracy on testing set is: {accuracy*100:.1f}%')

# Getting the ROC_AUC
predict_proba = sgd_lr_best.predict_proba(X_test)
roc_auc = roc_auc_score(y_test, predict_proba, multi_class='ovo', average='macro')
print(f'The ROC_AUC on testing set is: {roc_auc*100:.1f}%')

# Getting the confusion matrix
predict = sgd_lr_best.predict(X_test)
print(confusion_matrix(y_test, predict))

# Printing the report
print(classification_report(y_test, predict))


The accuracy on testing set is: 95.6%
The ROC_AUC on testing set is: 99.8%
[[33  0  0  0  0  0  0  0  0  0]
 [ 0 25  0  0  0  0  0  0  3  0]
 [ 0  0 32  0  0  0  0  0  1  0]
 [ 0  0  0 32  0  1  0  0  1  0]
 [ 0  0  0  0 46  0  0  0  0  0]
 [ 0  0  1  0  0 44  1  0  0  1]
 [ 0  0  0  0  0  1 34  0  0  0]
 [ 0  0  0  0  0  0  0 33  0  1]
 [ 0  0  0  0  0  0  0  0 30  0]
 [ 0  0  0  0  0  0  0  0  5 35]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        33
           1       1.00      0.89      0.94        28
           2       0.97      0.97      0.97        33
           3       1.00      0.94      0.97        34
           4       1.00      1.00      1.00        46
           5       0.96      0.94      0.95        47
           6       0.97      0.97      0.97        35
           7       1.00      0.97      0.99        34
           8       0.75      1.00      0.86        30
           9       0.95      0.88      0.91        40
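
The precision and recall columns of the report can be recomputed by hand from the confusion matrix. The sketch below hard-codes the matrix printed above (rows are true digits, columns are predicted digits) and uses plain Python lists; the variable names are chosen for this example only.

```python
# Confusion matrix copied from the output above
cm = [
    [33, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 25, 0, 0, 0, 0, 0, 0, 3, 0],
    [0, 0, 32, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 32, 0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 46, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 44, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1, 34, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 33, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 30, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 5, 35],
]
n_classes = len(cm)

# Recall: correct predictions divided by the row total (all true samples of the class)
recall = [cm[i][i] / sum(cm[i]) for i in range(n_classes)]

# Precision: correct predictions divided by the column total (all predictions of the class)
precision = [cm[i][i] / sum(row[i] for row in cm) for i in range(n_classes)]

# Accuracy: diagonal total divided by the number of test samples
accuracy = sum(cm[i][i] for i in range(n_classes)) / sum(sum(row) for row in cm)

print(f'Digit 8: precision {precision[8]:.2f}, recall {recall[8]:.2f}')
print(f'Accuracy: {accuracy*100:.1f}%')
```

Column 8 explains the low precision for that digit: 10 of the 16 misclassified samples are other digits (mostly 9s and 1s) predicted as 8, while every true 8 is recognised.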