# MNIST - Regression Models

The MNIST dataset was already introduced in this repository, in "6. MNIST - An Unsupervised Approach.ipynb". A preliminary analysis of the dataset was performed there and is not repeated here.

Here the problem is approached with supervised methods. After a dimensionality reduction step, we try several regression and classification models, and finish with deep learning solutions to the problem.

[The dataset was obtained from Google Colab sample_data.]

## 0. Loading the Dataset

In [2]:
import numpy as np
import scipy as sc
import sklearn as sk
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures

from sklearn.datasets import load_digits

mnist = pd.read_csv('C:/Users/Multivac/mnist_train_small.csv', header=None)
mnist = np.array(mnist)
print('The shape of the dataset is:', mnist.shape)
print()
print(mnist)

# Column 0 holds the digit label; the remaining 784 columns are the pixel values
X = mnist[:, 1:]
Y = mnist[:, 0]

The shape of the dataset is: (20000, 785)

[[6 0 0 ... 0 0 0]
 [5 0 0 ... 0 0 0]
 [7 0 0 ... 0 0 0]
 ...
 [2 0 0 ... 0 0 0]
 [9 0 0 ... 0 0 0]
 [5 0 0 ... 0 0 0]]


## 1. Dimensionality Reduction: PCA

In [41]:
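x = X
# A float n_components between 0 and 1 keeps just enough principal components
# for their cumulative explained variance to exceed that fraction (here 50%)
ipca = PCA(n_components=0.5)
ipca.fit(x)
xt = ipca.transform(x)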

print(xt.shape)

(20000, 11)
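
To make the effect of the float `n_components` more concrete, the sketch below inspects the cumulative explained variance after fitting. Since the MNIST CSV is not bundled with scikit-learn, it assumes the small 8x8 digits dataset from `load_digits` as a stand-in; `X_demo`, `pca` and `cumulative` are illustrative names, and the number of components kept will differ from the 11 obtained above.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Stand-in data (assumption): scikit-learn's bundled 8x8 digits, not the MNIST CSV
X_demo, _ = load_digits(return_X_y=True)

# Float in (0, 1): keep enough components to explain more than 50% of the variance
pca = PCA(n_components=0.5)
X_reduced = pca.fit_transform(X_demo)

# Running total of the variance explained by the components that were kept
cumulative = np.cumsum(pca.explained_variance_ratio_)

print('Components kept:', pca.n_components_)
print('Cumulative explained variance:', np.round(cumulative, 3))
print('Reduced shape:', X_reduced.shape)
```

The last entry of `cumulative` is the first one to pass the 0.5 threshold, which is how the number of components is chosen.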


## 2. Solutions of the Problem: Regression Models

### 2.1 Linear Regression

In [0]:
acclist = []
iterations = 100

for i in range(iterations):
    # Draw a new random 70/30 train/test split on every iteration
    X_train, X_test, Y_train, Y_test = train_test_split(xt, Y, test_size=0.3)
    regr_lineal = linear_model.LinearRegression()
    regr_lineal.fit(X_train, Y_train)
    Ypred = regr_lineal.predict(X_test)
    # A prediction counts as correct when it rounds to the true digit
    acc = np.sum(np.round(Ypred) == Y_test)/len(Y_test)
    acclist.append(acc)

print('Mean accuracy of Linear Regression after', iterations, 'iterations =', round(np.mean(acclist)*100, 2), '%')

Mean accuracy of Linear Regression after 100 iterations = 13.92 %
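
One factor that may contribute to this low score is the metric itself: the regression output is a continuous value, and rounding it can produce numbers that are not valid digits at all. The sketch below uses made-up predictions (not results from the model above) and a hypothetical helper, `rounded_accuracy`, to show how the score changes when the rounded values are clipped to the range 0 to 9.

```python
import numpy as np

def rounded_accuracy(y_pred, y_true, clip=False):
    """Accuracy after rounding continuous predictions to the nearest integer."""
    labels = np.round(y_pred)
    if clip:
        # Force rounded predictions back into the valid digit range
        labels = np.clip(labels, 0, 9)
    return np.mean(labels == y_true)

# Made-up values for illustration only
y_true = np.array([0, 3, 9, 5])
y_pred = np.array([-0.6, 3.4, 9.7, 4.4])

print(rounded_accuracy(y_pred, y_true))             # 0.25: -1 and 10 are not digits
print(rounded_accuracy(y_pred, y_true, clip=True))  # 0.75: -1 -> 0 and 10 -> 9
```

Clipping only repairs out-of-range values. It does not change the fact that a regression treats the digit labels as ordered quantities, which is one reason a classifier is the more natural model here.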


### 2.2 Polynomial Regression

In [0]:
pol_acc_list = []
max_grados = 5

for gr in range(1, max_grados+1):
    pol = PolynomialFeatures(gr)
    # Polynomial transformation of the PCA-reduced features
    x_pol = pol.fit_transform(xt)

    # A single random 70/30 split per degree
    X_train, X_test, Y_train, Y_test = train_test_split(x_pol, Y, test_size=0.3)

    model = linear_model.LinearRegression()
    model.fit(X_train, Y_train)
    y_pol_pred = model.predict(X_test)
    # A prediction counts as correct when it rounds to the true digit
    pol_acc = np.sum(np.round(y_pol_pred) == Y_test)/len(Y_test)
    # Keep the accuracy of every degree instead of resetting the list each time
    pol_acc_list.append(pol_acc)
    print('Accuracy of Polynomial Regression for degree', gr, '=', round(pol_acc*100, 2), '%')

Accuracy of Polynomial Regression for degree 1 = 13.18 %
Accuracy of Polynomial Regression for degree 2 = 26.62 %
Accuracy of Polynomial Regression for degree 3 = 36.07 %
Accuracy of Polynomial Regression for degree 4 = 43.23 %
Accuracy of Polynomial Regression for degree 5 = 42.37 %
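
Raising the degree also raises the number of generated features quickly. With its default settings (`include_bias=True`, `interaction_only=False`), `PolynomialFeatures` produces C(n + d, d) columns for n input features and degree d. The sketch below evaluates this for the 11 PCA components used above; `n_poly_features` is a hypothetical helper name.

```python
from math import comb

def n_poly_features(n_features, degree):
    """Number of columns PolynomialFeatures(degree) produces with default settings."""
    return comb(n_features + degree, degree)

n_components = 11  # number of PCA components kept above
for degree in range(1, 6):
    print('degree', degree, '->', n_poly_features(n_components, degree), 'features')
# degree 1 -> 12 features
# degree 2 -> 78 features
# degree 3 -> 364 features
# degree 4 -> 1365 features
# degree 5 -> 4368 features
```

With a 70/30 split of 20,000 rows, the degree-5 model fits 4,368 coefficients on 14,000 training samples. That, together with the single random split per degree, may explain why degree 5 does not improve on degree 4.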


### 2.3 Logistic Regression (Classification)

In [0]:
iterations = 5
acc = []
acc_log = []

for i in range(iterations):
    # Draw a new random 70/30 train/test split on every iteration
    X_train, X_test, Y_train, Y_test = train_test_split(xt, Y, test_size=0.3)
    lo = LogisticRegression(multi_class='multinomial', solver="lbfgs", max_iter=5000, verbose=0)
    lo.fit(X_train, Y_train)
    # score() returns the classification accuracy on the test split
    acc_log.append(lo.score(X_test, Y_test))
print('Mean accuracy of Logistic Regression after', iterations, 'iterations =', round(np.mean(acc_log)*100, 2), '%')

Mean accuracy of Logistic Regression after 5 iterations = 80.73 %
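
In the cells above, PCA is fitted on all 20,000 rows before the train/test split, so the test rows influence the projection. One possible way to avoid this is to chain PCA and the classifier in a scikit-learn pipeline, so that PCA is fitted on the training split only. The sketch below is an illustration under assumptions, not the method used in this notebook: it uses the bundled 8x8 digits from `load_digits` as a stand-in for the MNIST CSV, a fixed `random_state` for reproducibility, and illustrative names (`X_demo`, `y_demo`, `clf`).

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Stand-in data (assumption): scikit-learn's bundled 8x8 digits, not the MNIST CSV
X_demo, y_demo = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# PCA is fitted inside the pipeline, so it only ever sees the training rows
clf = make_pipeline(PCA(n_components=0.5), LogisticRegression(max_iter=5000))
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print('Test accuracy:', round(accuracy * 100, 2), '%')
```

Because the dataset differs, the accuracy printed here is not comparable with the 80.73 % reported above.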
