# Multinomial Logistic Regression

Logistic regression is not limited to binary classification.

What if we want an algorithm that discriminates between cats, dogs, birds and bees?

This is where multinomial classification comes in.

A multinomial logistic regression model consists of two functional layers:

1. Linear prediction function (a.k.a. logit layer)
2. Softmax function (a.k.a. softmax layer)
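Written out for a single feature vector $x$ and $k$ classes, the two layers look like this, where $w_j$ and $b_j$ (notation chosen here for illustration) are the weight vector and intercept of class $j$:

$$
z_j = w_j^\top x + b_j, \qquad j = 1, \dots, k
$$

$$
P(y = j \mid x) = \operatorname{softmax}(z)_j = \frac{e^{z_j}}{\sum_{i=1}^{k} e^{z_i}}, \qquad \hat{y} = \arg\max_j P(y = j \mid x)
$$

The first line is the logit layer (one linear score per class); the second turns the $k$ scores into probabilities that sum to 1 and picks the most probable class.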

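The simplest way to think of it is as $k$ linear models, one per class, each producing a score (logit) for its class. Then, we take the [softmax](https://en.wikipedia.org/wiki/Softmax_function) of the $k$ scores to turn them into probabilities that sum to 1, and pick the class with the highest probability. (In practice the multinomial model fits the $k$ score functions jointly rather than as $k$ separate binary classifiers.)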

![](logit_matrix.png)
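A minimal NumPy sketch of the two layers, using made-up weights `W` and intercepts `b` for three classes and two features. These parameters are hypothetical; in a real model they are learned during fitting.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical parameters for k = 3 classes and 2 features (illustration only)
W = np.array([[ 1.0, -0.5],
              [-0.3,  0.8],
              [ 0.2,  0.1]])   # one weight row per class
b = np.array([0.1, -0.2, 0.0])  # one intercept per class

X = np.array([[ 2.0, 1.0],
              [-1.0, 3.0]])     # two samples

logits = X @ W.T + b           # logit layer: one score per class per sample
probs = softmax(logits)        # softmax layer: scores -> probabilities
pred = probs.argmax(axis=1)    # pick the most probable class for each sample

print(probs.round(3))
print(pred)
```

Because softmax is monotonic, the argmax of the probabilities equals the argmax of the logits; the softmax layer matters when you need the probabilities themselves, not just the label.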

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import confusion_matrix
```

```python
col_names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'num']

df = pd.read_csv('data/cleveland_data.csv', header=None)  # the file has no header row
df.columns = col_names  # setting dataframe column names
df.head()
```

|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | num |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 63.0 | 1.0 | 1.0 | 145.0 | 233.0 | 1.0 | 2.0 | 150.0 | 0.0 | 2.3 | 3.0 | 0.0 | 6.0 | 0 |
| 1 | 67.0 | 1.0 | 4.0 | 160.0 | 286.0 | 0.0 | 2.0 | 108.0 | 1.0 | 1.5 | 2.0 | 3.0 | 3.0 | 2 |
| 2 | 67.0 | 1.0 | 4.0 | 120.0 | 229.0 | 0.0 | 2.0 | 129.0 | 1.0 | 2.6 | 2.0 | 2.0 | 7.0 | 1 |
| 3 | 37.0 | 1.0 | 3.0 | 130.0 | 250.0 | 0.0 | 0.0 | 187.0 | 0.0 | 3.5 | 3.0 | 0.0 | 3.0 | 0 |
| 4 | 41.0 | 0.0 | 2.0 | 130.0 | 204.0 | 0.0 | 2.0 | 172.0 | 0.0 | 1.4 | 1.0 | 0.0 | 3.0 | 0 |


```python
# Basic data cleaning: the raw file marks missing values with '?'
df.replace({'?': np.nan}, inplace=True)
df[['ca', 'thal']] = df[['ca', 'thal']].astype('float64')

# Flag which rows were missing before filling them with 0
df['ca_null'] = df['ca'].isnull().astype(int)
df['thal_null'] = df['thal'].isnull().astype(int)
df['ca'] = df['ca'].fillna(0.)
df['thal'] = df['thal'].fillna(0.)

df.describe()
```

|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | num | ca_null | thal_null |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 | 303.0 |
| mean | 54.438944 | 0.679868 | 3.158416 | 131.689769 | 246.693069 | 0.148515 | 0.990099 | 149.607261 | 0.326733 | 1.039604 | 1.60066 | 0.663366 | 4.70297 | 0.937294 | 0.013201 | 0.006601 |
| std | 9.038662 | 0.467299 | 0.960126 | 17.599748 | 51.776918 | 0.356198 | 0.994971 | 22.875003 | 0.469794 | 1.161075 | 0.616226 | 0.934375 | 1.971038 | 1.228536 | 0.114325 | 0.08111 |
| min | 29.0 | 0.0 | 1.0 | 94.0 | 126.0 | 0.0 | 0.0 | 71.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 48.0 | 0.0 | 3.0 | 120.0 | 211.0 | 0.0 | 0.0 | 133.5 | 0.0 | 0.0 | 1.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 |
| 50% | 56.0 | 1.0 | 3.0 | 130.0 | 241.0 | 0.0 | 1.0 | 153.0 | 0.0 | 0.8 | 2.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 |
| 75% | 61.0 | 1.0 | 4.0 | 140.0 | 275.0 | 0.0 | 2.0 | 166.0 | 1.0 | 1.6 | 2.0 | 1.0 | 7.0 | 2.0 | 0.0 | 0.0 |
| max | 77.0 | 1.0 | 4.0 | 200.0 | 564.0 | 1.0 | 2.0 | 202.0 | 1.0 | 6.2 | 3.0 | 3.0 | 7.0 | 4.0 | 1.0 | 1.0 |
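The same "flag, then fill with 0" handling can also be expressed with scikit-learn's `SimpleImputer`, whose `add_indicator=True` option appends the missing-value flags. This is a sketch on a tiny made-up frame standing in for `ca` and `thal`, not a drop-in replacement for the cell above; one reason to prefer it is that an imputer can sit inside a scikit-learn pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy frame standing in for the 'ca' and 'thal' columns (values made up for illustration)
toy = pd.DataFrame({'ca': [0.0, np.nan, 2.0],
                    'thal': [3.0, 7.0, np.nan]})

# strategy='constant' with fill_value=0 mirrors fillna(0.);
# add_indicator=True appends one 0/1 column per feature that had missing values
imputer = SimpleImputer(strategy='constant', fill_value=0, add_indicator=True)
out = imputer.fit_transform(toy)

# Columns: ca, thal (filled), then the two missing-value indicators
print(out)
```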


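```python
y = df.num  # multiclass target with values 0 to 4

cat_cols = ['cp', 'restecg', 'slope']
# Note: 'restecg' appears in both lists, so it enters the model both as a raw
# number (plus its square and cube) and as one-hot dummy columns
num_cols = ['age', 'trestbps', 'chol', 'restecg', 'thalach', 'oldpeak', 'ca', 'thal']

X = df[num_cols + ['ca_null', 'thal_null', 'sex']]

# One-hot encode the categorical columns, dropping the first level as the reference
for c in cat_cols:
    X = X.join(pd.get_dummies(df[c].astype(int), drop_first=True, prefix=c))

# Add squared and cubed terms for every numeric column
for c in num_cols:
    X[c + '2'] = X[c] ** 2
    X[c + '3'] = X[c] ** 3
```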
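To make the resulting column names concrete, here is the same encoding pattern applied to a tiny made-up frame (the values are illustrative, not from the dataset):

```python
import pandas as pd

# Made-up values: one categorical column and one numeric column
toy = pd.DataFrame({'cp': [1, 2, 3, 4], 'age': [40, 50, 60, 70]})

X_toy = toy[['age']]
# drop_first=True drops the first level (cp_1), which becomes the reference category
X_toy = X_toy.join(pd.get_dummies(toy['cp'].astype(int), drop_first=True, prefix='cp'))

# Polynomial terms named with a numeric suffix, as in the cell above
X_toy['age2'] = X_toy['age'] ** 2
X_toy['age3'] = X_toy['age'] ** 3

print(list(X_toy.columns))
print(X_toy)
```

A row whose `cp_*` columns are all zero stands for the dropped reference level (`cp_1` here), which avoids a set of dummies that always sums to 1 and is therefore collinear with the intercept.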

```python
# Hold out 30% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
```

```
LogisticRegression()
```
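The engineered features mix small values with very large ones (the cube of `chol` reaches roughly 1.8e8), and with unscaled inputs the default `lbfgs` solver may stop at its default `max_iter=100` before converging, in which case scikit-learn emits a `ConvergenceWarning`. A common remedy is to standardize the features inside a pipeline. The sketch below assumes that approach and uses synthetic 5-class data from `make_classification` instead of the heart-disease file, so it makes no claim about the accuracy this would give here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 5-class data standing in for the heart-disease features (illustration only)
X_syn, y_syn = make_classification(n_samples=300, n_features=10, n_informative=5,
                                   n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.3, random_state=0)

# StandardScaler is fit on the training split only (inside the pipeline),
# then LogisticRegression is fit on the scaled features; max_iter raised from 100
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)

proba = model.predict_proba(X_te)
print('test accuracy: {:.2f}'.format(model.score(X_te, y_te)))
```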

## Evaluating the model

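```python
y_pred = logreg.predict(X_test)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(logreg.score(X_test, y_test)))
# Confusion matrix: rows are true classes, columns are predicted classes
print(confusion_matrix(y_test, y_pred))
```

```
Accuracy of logistic regression classifier on test set: 0.54
[[43  0  3  0  1]
 [16  0  3  0  0]
 [ 8  0  4  0  0]
 [ 5  0  2  2  0]
 [ 3  0  0  1  0]]
```

Rows of the confusion matrix are the true classes (0 to 4) and columns are the predicted classes. Most test samples are predicted as class 0, and class 1 is never predicted.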
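As a check on the numbers above, accuracy is the sum of the diagonal (correct predictions) divided by the number of test samples: 49 / 91 ≈ 0.54. Per-class recall, the diagonal divided by each row's total, shows how unevenly those correct predictions are spread across classes:

```python
import numpy as np

# Confusion matrix printed above (rows = true class, columns = predicted class)
cm = np.array([[43, 0, 3, 0, 1],
               [16, 0, 3, 0, 0],
               [ 8, 0, 4, 0, 0],
               [ 5, 0, 2, 2, 0],
               [ 3, 0, 0, 1, 0]])

# Correct predictions (diagonal) over all test samples
accuracy = np.trace(cm) / cm.sum()
# Fraction of each true class that was predicted correctly
recall_per_class = np.diag(cm) / cm.sum(axis=1)

print('accuracy: {:.2f}'.format(accuracy))
print('per-class recall:', recall_per_class.round(2))
```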


```python
logreg.predict_proba(X_test)[:20]
```

```
array([[0.95640433, 0.03124195, 0.00426723, 0.00587676, 0.00220973],
       [0.02693941, 0.05303577, 0.40570076, 0.06483971, 0.44948436],
       [0.26584122, 0.21651808, 0.20464168, 0.18965609, 0.12334293],
       [0.27308184, 0.26727388, 0.20642196, 0.23086523, 0.02235708],
       [0.60180746, 0.16620789, 0.09041678, 0.08602481, 0.05554306],
       [0.11489436, 0.18983399, 0.30459165, 0.23020493, 0.16047508],
       [0.09322383, 0.21795055, 0.30738862, 0.30044066, 0.08099633],
       [0.62364853, 0.17092939, 0.07987833, 0.08729041, 0.03825334],
       [0.14613695, 0.20725993, 0.27369902, 0.2321014 , 0.1408027 ],
       [0.57028929, 0.19416427, 0.09283724, 0.10914737, 0.03356182],
       [0.49598954, 0.19789043, 0.1249956 , 0.121355  , 0.05976943],
       [0.67472233, 0.15700248, 0.06809723, 0.07374505, 0.0264329 ],
       [0.21145782, 0.21105329, 0.26031418, 0.19783528, 0.11933944],
       [0.45179661, 0.20944516, 0.14034853, 0.13721377, 0.06119593],
       [0.81277019, 0.11057473, 0.
```

We can see `predict_proba` outputs an array with one row per sample and one column per class (0 to 4); each row holds that sample's class probabilities and sums to 1.
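These probabilities come from the two layers described at the top: a linear score per class followed by a softmax. The sketch below checks that on synthetic data, assuming a recent scikit-learn in which `LogisticRegression` with the default `lbfgs` solver fits a multinomial (softmax) model when there are more than two classes. `decision_function` returns the logit layer, and a hand-written softmax of it should match `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 5-class data (illustration only)
X_syn, y_syn = make_classification(n_samples=300, n_features=10, n_informative=5,
                                   n_classes=5, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_syn, y_syn)

# Logit layer: one linear score per class, i.e. X @ coef_.T + intercept_
logits = clf.decision_function(X_syn)
assert np.allclose(logits, X_syn @ clf.coef_.T + clf.intercept_)

# Softmax layer computed by hand (row max subtracted for numerical stability)
z = logits - logits.max(axis=1, keepdims=True)
manual_proba = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)

print(np.allclose(manual_proba, clf.predict_proba(X_syn)))
```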

```python
logreg.predict(X_test)[:20]
```

```
array([0, 4, 0, 0, 0, 2, 2, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0])
```

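Each prediction is simply the class with the highest probability in the corresponding row of `predict_proba`. For example, the second sample is predicted as class 4, whose probability (0.449) is the largest in its row.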
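This can be checked directly against the outputs printed above, using the 14 rows of `predict_proba` that were printed in full (the 15th is truncated) and the first 14 predictions:

```python
import numpy as np

# First 14 rows of logreg.predict_proba(X_test), copied from the output above
proba = np.array([
    [0.95640433, 0.03124195, 0.00426723, 0.00587676, 0.00220973],
    [0.02693941, 0.05303577, 0.40570076, 0.06483971, 0.44948436],
    [0.26584122, 0.21651808, 0.20464168, 0.18965609, 0.12334293],
    [0.27308184, 0.26727388, 0.20642196, 0.23086523, 0.02235708],
    [0.60180746, 0.16620789, 0.09041678, 0.08602481, 0.05554306],
    [0.11489436, 0.18983399, 0.30459165, 0.23020493, 0.16047508],
    [0.09322383, 0.21795055, 0.30738862, 0.30044066, 0.08099633],
    [0.62364853, 0.17092939, 0.07987833, 0.08729041, 0.03825334],
    [0.14613695, 0.20725993, 0.27369902, 0.2321014 , 0.1408027 ],
    [0.57028929, 0.19416427, 0.09283724, 0.10914737, 0.03356182],
    [0.49598954, 0.19789043, 0.1249956 , 0.121355  , 0.05976943],
    [0.67472233, 0.15700248, 0.06809723, 0.07374505, 0.0264329 ],
    [0.21145782, 0.21105329, 0.26031418, 0.19783528, 0.11933944],
    [0.45179661, 0.20944516, 0.14034853, 0.13721377, 0.06119593],
])
# First 14 entries of logreg.predict(X_test), copied from the output above
pred = np.array([0, 4, 0, 0, 0, 2, 2, 0, 2, 0, 0, 0, 2, 0])

# The classes here are 0..4, so the column index of the largest probability is the label;
# in general the index should be mapped through logreg.classes_
print((proba.argmax(axis=1) == pred).all())
```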