# Logistic Regression with scikit-learn

#### This notebook compares five scikit-learn solvers for fitting a logistic regression model:
#### - newton-cg: Newton's method with conjugate gradient
#### - lbfgs (default): Limited-memory Broyden-Fletcher-Goldfarb-Shanno
#### - liblinear: coordinate descent from the LIBLINEAR library
#### - sag: Stochastic Average Gradient descent
#### - saga: variant of sag that also supports L1 regularization
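
The solvers listed above differ in which regularization penalties they accept, which is why the L1 support of saga is worth calling out. As a rough reference, the lookup below records the combinations described in the scikit-learn documentation. `SUPPORTED_PENALTIES` and `check_combination` are hypothetical names used only for illustration, and the exact set may vary between library versions.

```python
# Penalties each solver accepts, summarized from the scikit-learn
# LogisticRegression documentation. "none" means no regularization.
# Illustrative summary only; it may differ between scikit-learn versions.
SUPPORTED_PENALTIES = {
    "newton-cg": {"l2", "none"},
    "lbfgs": {"l2", "none"},
    "liblinear": {"l1", "l2"},
    "sag": {"l2", "none"},
    "saga": {"l1", "l2", "elasticnet", "none"},
}


def check_combination(solver, penalty):
    """Return True if the solver is listed as supporting the penalty."""
    return penalty in SUPPORTED_PENALTIES.get(solver, set())


for name in SUPPORTED_PENALTIES:
    print(name, "supports L1:", check_combination(name, "l1"))
```

Of the five solvers, only liblinear and saga are listed with L1 support, so a sparse (L1-regularized) model narrows the choice to those two.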

## Import Libraries

In [88]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression

## Load the dataset

In [89]:
mnist_train = pd.read_csv("./MNIST_training.csv")
mnist_test = pd.read_csv("./MNIST_test.csv")
mnist_train

Unnamed: 0,label,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
944,9,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
945,9,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
946,9,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
947,9,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## Split X, y


In [90]:
x_train = mnist_train.drop(columns='label')
y_train = mnist_train['label']
x_test = mnist_test.drop(columns='label')
y_test = mnist_test['label']
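
The `Unnamed: 0` column in the table above appears to be a row index that was saved into the CSV file. Because `drop(columns='label')` removes only the label, that index stays in the feature matrix, and since the rows look sorted by label it may carry label information that does not transfer to the test file. The sketch below shows one way to keep it out, using a tiny in-memory CSV as a stand-in for the real files; `csv_text` and its values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical miniature CSV with the same layout as the MNIST files:
# an unnamed index column, then the label, then pixel columns.
csv_text = ",label,pixel0,pixel1\n0,0,0,0\n1,0,0,255\n2,1,128,0\n"

# Default read: the unnamed first column becomes a feature called 'Unnamed: 0'.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())

# Reading with index_col=0 treats that column as the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
X = clean.drop(columns="label")  # pixel columns only
y = clean["label"]
print(X.columns.tolist())
```

Applying the same `index_col=0` argument to both the training and test files would change the feature matrix, so the accuracies reported in this notebook would need to be recomputed.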

## Check that the classes are balanced in the training and test sets

In [91]:
print('train data label bincount:', np.bincount(mnist_train['label']))
print('test data label bincount:', np.bincount(mnist_test['label']))

train data label bincount: [95 95 95 95 95 95 94 95 95 95]
test data label bincount: [5 5 5 5 5 5 5 5 5 5]
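
`np.bincount` counts how often each non-negative integer occurs, so position `k` in the output is the number of samples with label `k`. One caveat, sketched below with made-up toy labels: the output stops at the largest label present, so a missing high-numbered class is silently dropped unless `minlength` is given.

```python
import numpy as np

# Toy labels for illustration: class 2 and classes 4-9 are absent.
labels = np.array([0, 0, 1, 1, 1, 3])

print(np.bincount(labels))                # [2 3 0 1]
print(np.bincount(labels, minlength=10))  # [2 3 0 1 0 0 0 0 0 0]
```

Passing `minlength=10` keeps the output aligned with the ten digit classes even when some of them do not appear in a split.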


## Train a model with each solver

In [92]:
solver = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']
score = np.zeros(len(solver))

In [93]:
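for i, name in enumerate(solver):
    # Fit one model per solver; max_iter=500 gives the iterative solvers room to converge
    model = LogisticRegression(max_iter=500, solver=name)
    model.fit(x_train, y_train)

    # score() predicts on x_test internally and returns the mean accuracy
    score[i] = model.score(x_test, y_test)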
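
Two practical details can affect a loop like this one. First, sag and saga tend to converge much faster when features share a similar scale, and raw pixel values span a wide range. Second, liblinear solves binary problems, and recent scikit-learn releases have been moving away from applying it directly to more than two classes, so wrapping it in `OneVsRestClassifier` makes the one-vs-rest scheme explicit. The sketch below illustrates both ideas under stated assumptions: it uses scikit-learn's bundled 8x8 digits dataset as a stand-in for the MNIST CSV files, and `make_model` is a hypothetical helper, not part of the original notebook.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Stand-in data: the bundled 8x8 digits, not the MNIST CSV files used above.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def make_model(solver_name):
    """Build a scaling + logistic regression pipeline for one solver."""
    clf = LogisticRegression(max_iter=500, solver=solver_name)
    if solver_name == "liblinear":
        # liblinear fits binary problems, so apply it one-vs-rest explicitly.
        clf = OneVsRestClassifier(clf)
    # Rescale every pixel to [0, 1] before fitting.
    return make_pipeline(MinMaxScaler(), clf)


scores = {}
for name in ["newton-cg", "lbfgs", "liblinear", "sag", "saga"]:
    scores[name] = make_model(name).fit(X_tr, y_tr).score(X_te, y_te)

for name, acc in scores.items():
    print(f"Accuracy of {name}: {acc:.3f}")
```

Putting the scaler inside the pipeline means it is fitted on the training split only, so no information from the test split reaches the model.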



## Evaluate model performance

In [94]:
for i in range(len(solver)):
    print('Accuracy of %s: %f' %(solver[i], score[i]))

Accuracy of newton-cg: 0.860000
Accuracy of lbfgs: 0.840000
Accuracy of liblinear: 0.820000
Accuracy of sag: 0.840000
Accuracy of saga: 0.840000
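
With only 50 test images (10 classes with 5 samples each), every prediction is worth 0.02 of accuracy, so the gaps between solvers amount to one or two images. The sketch below converts the reported accuracies back into counts and adds a binomial standard error as a rough guide to how much such a small test set can fluctuate; the standard-error formula is an added assumption, not part of the original analysis.

```python
import math

n_test = 50  # 10 classes x 5 samples each, per the test bincount above

# Accuracies copied from the output above.
accuracy = {
    "newton-cg": 0.86,
    "lbfgs": 0.84,
    "liblinear": 0.82,
    "sag": 0.84,
    "saga": 0.84,
}

correct = {}
for name, acc in accuracy.items():
    # Number of correctly classified test images behind each accuracy.
    correct[name] = round(acc * n_test)
    # Binomial standard error of an accuracy estimated from n_test samples.
    std_err = math.sqrt(acc * (1 - acc) / n_test)
    print(f"{name}: {correct[name]}/{n_test} correct, std. error ~ {std_err:.3f}")
```

Since the differences between solvers are smaller than this standard error, the ranking above should not be read as evidence that one solver is more accurate than another.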
