# Block 6 Exercise 1: Non-Linear Classification

## MNIST Data
We return to the MNIST data set of handwritten digits to compare non-linear classification algorithms ...

In [8]:
#imports 
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

In [2]:
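# Load data from https://www.openml.org/d/554
# as_frame=False returns NumPy arrays (newer scikit-learn versions otherwise return pandas objects)
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)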


In [3]:
# the full MNIST data set contains 70k samples of the digits 0-9 as 28x28 grayscale images (flattened to 784-dimensional vectors)
np.shape(X)

(70000, 784)

In [4]:
X.min()

0.0

In [5]:
# look at the max/min pixel values in the data
X.max()

255.0

### E1.1: Cross-Validation and Support Vector Machines
Train and optimize a C-SVM classifier on MNIST (https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)
* use an RBF kernel
* use *random search* with cross-validation to find the best settings for *gamma* and *C* (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html#sklearn.model_selection.RandomizedSearchCV)
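The exercise asks for *random search*, which pays off when C and gamma are drawn from continuous distributions rather than from a short list of values. A minimal sketch, assuming `scipy.stats.loguniform` (swapped in for the imported `uniform`, because C and gamma span several orders of magnitude) and using scikit-learn's small bundled digits data set as a stand-in for MNIST so it runs in seconds. The sampling ranges, `n_iter=8` and `cv=3` are illustrative, not tuned:

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: 1,797 8x8 digit images (64 features) bundled with scikit-learn
X_small, y_small = load_digits(return_X_y=True)

svc_pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Sample C and gamma log-uniformly so each order of magnitude is explored equally often
param_distributions = {
    "svc__C": loguniform(1e0, 1e2),
    "svc__gamma": loguniform(1e-3, 3e-2),
}

search = RandomizedSearchCV(
    svc_pipe,
    param_distributions,
    n_iter=8,        # number of sampled (C, gamma) pairs
    cv=3,            # 3-fold cross-validation per sampled pair
    random_state=0,  # makes the sampled pairs reproducible
)
search.fit(X_small, y_small)
print(search.best_params_, round(search.best_score_, 3))
```

The budget is now set by `n_iter` rather than by the grid size, so adding a parameter does not multiply the number of fits.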

In [None]:
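# baseline RBF-SVM with fixed C and gamma (this cell was not run: fitting on all 70,000 samples is slow)
clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1, gamma='auto'))
clf.fit(X, y)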

In [6]:
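# named svc_pipe so the pipeline object does not shadow the sklearn SVC class
svc_pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf', max_iter=1000))

In [7]:
# the list has only 3 x 2 = 6 combinations, so n_iter=6 samples every one of them
param = dict(svc__C=[4, 8, 16], svc__gamma=['auto', 'scale'])

clf = RandomizedSearchCV(svc_pipe, param, n_iter=6, random_state=0, verbose=2, n_jobs=-1)
search = clf.fit(X, y)
search.best_params_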

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
Fitting 5 folds for each of 6 candidates, totalling 30 fits
[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed: 196.6min finished

{'svc__gamma': 'auto', 'svc__C': 8}
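`RandomizedSearchCV` refits the best pipeline on all the data it was given (`refit=True` by default). Here that is all 70,000 samples, so no untouched data is left to estimate how well the tuned SVM generalizes. A sketch of the alternative, assuming the bundled digits data as a fast stand-in for MNIST: tune on a training split, rank every candidate via `cv_results_`, then score `best_estimator_` once on the held-out split. The parameter lists mirror the notebook; `cv=3` is an illustrative choice:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_small, y_small = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_small, y_small, test_size=0.2, random_state=0, stratify=y_small
)

params = {"svc__C": [4, 8, 16], "svc__gamma": ["auto", "scale"]}
search = RandomizedSearchCV(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    params,
    n_iter=6,  # equals the number of combinations, so every one is tried
    cv=3,
    random_state=0,
)
search.fit(X_tr, y_tr)  # the held-out split is never seen during tuning

# cv_results_ holds one entry per candidate; rank 1 has the best mean CV score
results = search.cv_results_
rows = zip(results["rank_test_score"], results["params"],
           results["mean_test_score"], results["std_test_score"])
for rank, p, mean, std in sorted(rows, key=lambda r: r[0]):
    print(rank, p, f"{mean:.4f} +/- {std:.4f}")

# best_estimator_ was refit on all of X_tr; score it once on the held-out split
test_acc = search.best_estimator_.score(X_te, y_te)
print(f"held-out accuracy: {test_acc:.4f}")
```

The `std_test_score` column also shows whether the gap between the top candidates is larger than the fold-to-fold variation.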

### E1.2: Pipelines and simple Neural Networks
Split the MNIST data into train and test sets, then train and evaluate a simple multi-layer perceptron (MLP). Since the non-linear activation functions of MLPs are sensitive to the scale of the input (recall that the *sigmoid* function saturates for large inputs), we need to scale all input values to [0,1].

* combine all steps of your training in an SKL pipeline (https://scikit-learn.org/stable/modules/compose.html#pipeline)
* use an SKL scaler to scale the data (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
* MLP Parameters: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
    * use a *SGD* solver
    * use *tanh* as activation function
    * compare networks with 1, 2 and 3 layers, use different numbers of neurons per layer
    * adjust training parameters *alpha* (regularization) and *learning rate* - how sensitive is the model to these parameters?
    * Hint: do not change all parameters at the same time, split into several experiments
* How hard is it to find the best parameters? How many experiments would you need to find the best parameters?
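The task text asks for inputs in [0,1], whereas `StandardScaler` (used in the experiments that follow) shifts each pixel to zero mean and unit variance instead. A minimal sketch of the [0,1] variant, swapping in `MinMaxScaler` and using the bundled digits data as a stand-in for MNIST. `learning_rate_init=0.01` and `max_iter=300` are assumptions chosen so this small example trains quickly:

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X_small, y_small = load_digits(return_X_y=True)  # 8x8 images, pixel values 0..16
X_tr, X_te, y_tr, y_te = train_test_split(X_small, y_small, test_size=0.2, random_state=0)

mlp = make_pipeline(
    MinMaxScaler(),  # rescales each feature to [0, 1] using the training min and max
    MLPClassifier(solver="sgd", activation="tanh", hidden_layer_sizes=(100,),
                  learning_rate_init=0.01, max_iter=300, random_state=0),
)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)  # silence max_iter warnings in this demo
    mlp.fit(X_tr, y_tr)

scaled = mlp[0].transform(X_tr)  # step 0 of the pipeline is the fitted MinMaxScaler
print("scaled range:", scaled.min(), scaled.max())
print("test accuracy:", mlp.score(X_te, y_te))
```

Because the MNIST pixel range is 0 to 255, dividing `X` by 255.0 is a simpler, global alternative to a per-feature scaler.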
    


In [9]:
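# hold out 20 % (14,000 samples) for testing; fixing random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)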
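Without a fixed `random_state`, `train_test_split` shuffles differently on every run, so scores from different runs are computed on different test sets. A sketch of a reproducible split, assuming the bundled digits data as a stand-in for MNIST: `random_state` fixes the shuffle and `stratify` keeps the ten digit classes in (nearly) the same proportions in both parts. The value `random_state=0` is arbitrary:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X_small, y_small = load_digits(return_X_y=True)  # 1,797 samples
X_tr, X_te, y_tr, y_te = train_test_split(
    X_small, y_small, test_size=0.2, random_state=0, stratify=y_small
)

# test_size=0.2 rounds the test set size up: ceil(0.2 * 1797) = 360 samples
print(len(X_tr), len(X_te))

# stratify keeps per-class proportions nearly equal in both parts
train_frac = np.bincount(y_tr) / len(y_tr)
test_frac = np.bincount(y_te) / len(y_te)
print(np.round(train_frac, 3))
print(np.round(test_frac, 3))
```

For MNIST, `test_size=0.2` gives 14,000 test and 56,000 training samples.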

## Testing different layer depths

In [14]:
print("Testing different Layer depth\n\n")
# (100,) is a one-element tuple; (100) would just be the integer 100
layers = [(100,), (100, 100), (100, 100, 100)]
for layer in layers:
    mlp = make_pipeline(StandardScaler(), MLPClassifier(solver='sgd', activation='tanh', learning_rate='constant', hidden_layer_sizes=layer))
    mlp.fit(X_train, y_train)  # fit() returns the same pipeline object, so no second name is needed
    score = mlp.score(X_test, y_test)
    print('Layers: ', layer, "\tScore: ", score)

Testing different Layer depth


Layers:  100 		Score:  0.961
Layers:  (100, 100) 		Score:  0.9615
Layers:  (100, 100, 100) 		Score:  0.9607857142857142

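__Best number of layers: 2__ (the three scores differ by less than 0.001, below the binomial standard error of about 0.0017 for an accuracy near 0.96 on 14,000 test samples, so depth makes no clear difference here)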
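The depth experiment keeps 100 neurons in every layer, while the task also asks for different numbers of neurons per layer. A sketch of a width sweep, assuming the bundled digits data as a fast stand-in for MNIST. The list of architectures and `max_iter=100` are hypothetical choices, and `random_state=0` fixes weight initialization and shuffling so that score differences are not just different random seeds:

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_small, y_small = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_small, y_small, test_size=0.2, random_state=0, stratify=y_small
)

# hypothetical architectures: one hidden layer of varying width, plus two 2-layer variants
architectures = [(25,), (50,), (100,), (200,), (50, 50), (200, 100)]
scores = {}
for sizes in architectures:
    mlp = make_pipeline(
        StandardScaler(),
        MLPClassifier(solver="sgd", activation="tanh", hidden_layer_sizes=sizes,
                      max_iter=100, random_state=0),
    )
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ConvergenceWarning)  # max_iter is kept short on purpose
        mlp.fit(X_tr, y_tr)
    scores[sizes] = mlp.score(X_te, y_te)
    print(f"{str(sizes):>10}  test accuracy: {scores[sizes]:.4f}")
```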

## Testing different alpha values

In [15]:
print("Testing different alpha values\n\n")
alphas = [0.00001, 0.0001, 0.001, 0.01, 0.1, 1]
for alpha in alphas:
    # alpha is the strength of the L2 regularization penalty
    mlp = make_pipeline(StandardScaler(), MLPClassifier(solver='sgd', activation='tanh', learning_rate='constant', hidden_layer_sizes=(100, 100), alpha=alpha))
    mlp.fit(X_train, y_train)
    score = mlp.score(X_test, y_test)
    print('Alpha: ', alpha, "\tScore: ", score)

Testing different alpha values


Alpha:  1e-05 	Score:  0.9604285714285714
Alpha:  0.0001 	Score:  0.9632857142857143
Alpha:  0.001 	Score:  0.9624285714285714
Alpha:  0.01 	Score:  0.9619285714285715
Alpha:  0.1 	Score:  0.9646428571428571
Alpha:  1 	Score:  0.9650714285714286

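__Best alpha: 1__ (all six values score between about 0.960 and 0.965, so the model is only mildly sensitive to *alpha* over five orders of magnitude; because the value is picked by its test-set score, that score is optimistic)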
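Choosing *alpha* by its test-set score reuses the test set for tuning. A sketch of selecting *alpha* by cross-validation on the training split only and then scoring the chosen model once on the test split. Assumptions: the bundled digits data stands in for MNIST, and the alpha grid, `hidden_layer_sizes=(100,)` and `max_iter=100` are illustrative:

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_small, y_small = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_small, y_small, test_size=0.2, random_state=0, stratify=y_small
)

pipe = make_pipeline(
    StandardScaler(),
    MLPClassifier(solver="sgd", activation="tanh", hidden_layer_sizes=(100,),
                  max_iter=100, random_state=0),
)
# make_pipeline names the step "mlpclassifier", hence the "mlpclassifier__alpha" key
param_grid = {"mlpclassifier__alpha": [1e-4, 1e-2, 1.0]}
grid = GridSearchCV(pipe, param_grid, cv=3)  # 3-fold CV on the training split only

with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    grid.fit(X_tr, y_tr)

test_score = grid.score(X_te, y_te)  # evaluated once, after alpha has been chosen
print("best alpha:", grid.best_params_["mlpclassifier__alpha"])
print(f"mean CV accuracy: {grid.best_score_:.4f}, test accuracy: {test_score:.4f}")
```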

## Testing different learning-rate schedules

In [16]:
learning_rates = ['constant', 'invscaling', 'adaptive']
for learning_rate in learning_rates:
    # `learning_rate` selects the step-size schedule; the initial step size is `learning_rate_init`
    mlp = make_pipeline(StandardScaler(), MLPClassifier(solver='sgd', activation='tanh', learning_rate=learning_rate, hidden_layer_sizes=(100, 100), alpha=1))
    mlp.fit(X_train, y_train)
    score = mlp.score(X_test, y_test)
    print('Learning rate: ', learning_rate, "\tScore: ", score)



Learning rate:  constant 	Score:  0.9642142857142857
Learning rate:  invscaling 	Score:  0.8493571428571428
Learning rate:  adaptive 	Score:  0.9631428571428572
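The *invscaling* score stands out. scikit-learn documents this schedule as `effective_learning_rate = learning_rate_init / pow(t, power_t)`. The arithmetic below evaluates that formula with the defaults `learning_rate_init=0.001` and `power_t=0.5`, assuming (as in scikit-learn's implementation, where the `t_` attribute counts training samples seen) that `t` grows by 56,000 per epoch on this split. Under that assumption the step size after the first epoch is already more than 200 times smaller than 0.001, which plausibly explains the low score:

```python
# Worked arithmetic for the 'invscaling' schedule of MLPClassifier:
#     effective_learning_rate = learning_rate_init / t ** power_t
# Assumption: t is the number of training samples seen so far (the t_ attribute),
# so it grows by n_train after every epoch.
learning_rate_init = 0.001  # MLPClassifier default
power_t = 0.5               # MLPClassifier default
n_train = 56_000            # 80 % of the 70,000 MNIST samples (test_size=0.2)

rates = {}
for epoch in (1, 10, 100):
    t = epoch * n_train
    rates[epoch] = learning_rate_init / t ** power_t
    print(f"after epoch {epoch:>3}: effective learning rate = {rates[epoch]:.2e}")
```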


__Best learning-rate schedule: constant__ (*adaptive* is about 0.001 lower, while *invscaling* drops to about 0.85)
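In `MLPClassifier`, `learning_rate` only selects the schedule; the step size itself is set by `learning_rate_init` (default 0.001). A sketch of sweeping that value under the *constant* schedule, assuming the bundled digits data as a stand-in for MNIST. The grid of step sizes, `hidden_layer_sizes=(100,)` and `max_iter=100` are illustrative:

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_small, y_small = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_small, y_small, test_size=0.2, random_state=0, stratify=y_small
)

learning_rate_inits = [1e-4, 1e-3, 1e-2, 5e-2]  # illustrative step sizes
scores = {}
for lr in learning_rate_inits:
    mlp = make_pipeline(
        StandardScaler(),
        MLPClassifier(solver="sgd", activation="tanh", learning_rate="constant",
                      learning_rate_init=lr, hidden_layer_sizes=(100,),
                      max_iter=100, random_state=0),
    )
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ConvergenceWarning)  # small step sizes may not converge in 100 epochs
        mlp.fit(X_tr, y_tr)
    scores[lr] = mlp.score(X_te, y_te)
    print(f"learning_rate_init={lr:g}  test accuracy: {scores[lr]:.4f}")
```

Varying one parameter at a time like this, as the task hint suggests, keeps each experiment interpretable, at the cost of missing interactions between parameters.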