# Block 6 Exercise 1: Non-Linear Classification

# By Christian Wegert

## MNIST Data
We return to the MNIST data set of handwritten digits to compare non-linear classification algorithms.

In [1]:
#imports 
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import fetch_openml

In [2]:
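# Load data from https://www.openml.org/d/554
# as_frame=False returns NumPy arrays; the scalar outputs of X.min() and X.max() below assume arrays
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)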


In [3]:
#the full MNIST data set contains 70k samples of digits 0-9 as 28*28 gray scale images (represented as 784 dim vectors)
np.shape(X)

(70000, 784)
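
Each row of `X` is a flattened 28*28 image. The following is a minimal sketch of how the flat and the image representation relate. It uses a synthetic vector instead of the real data, and the names `flat`, `image` and `restored` are hypothetical.

```python
import numpy as np

# Synthetic stand-in for one MNIST row: 784 gray values in [0, 255].
rng = np.random.default_rng(0)
flat = rng.integers(0, 256, size=784).astype(float)

# Reshape the flat vector into a 28x28 image (row-major order).
image = flat.reshape(28, 28)

# Pixel (row r, column c) of the image sits at index r * 28 + c of the vector.
r, c = 5, 17
same_pixel = image[r, c] == flat[r * 28 + c]

# Flattening the image again restores the original vector.
restored = image.reshape(-1)
```

With the real data, `X[i].reshape(28, 28)` could be passed to `plt.imshow` to display sample `i`.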

In [4]:
X.min()

0.0

In [5]:
#look at max/min value in the data
X.max()

255.0

In [6]:
from sklearn.model_selection import train_test_split

### E1.1: Cross-Validation and Support Vector Machines
Train and optimize a C-SVM classifier on MNIST (https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)
* use an RBF kernel
* use *random search* with cross-validation to find the best settings for *gamma* and *C* (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html#sklearn.model_selection.RandomizedSearchCV)

In [8]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

In [16]:
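%%time
# Baseline: SVC with its default RBF kernel, trained on the first 10,000 raw (unscaled) samples.
# The commented-out line is the alternative that standardizes the data in a pipeline first.
#clf = make_pipeline(StandardScaler(), SVC(gamma='auto'))
clf = SVC(gamma='auto', C=1.0)
clf.fit(X[0:10000], y[0:10000])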

Wall time: 2min 4s


SVC(gamma='auto')

In [18]:
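%%time
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform

# Search space: gamma is picked from the two built-in heuristics, C is drawn uniformly from [0.1, 4.1]
# (scipy's uniform(loc, scale) covers the interval [loc, loc + scale]).
distributions = dict(gamma=['auto', 'scale'], C=uniform(0.1, 4.0))

# Random search with cross-validation, run on the first 1,000 samples only to limit the run time.
clf_optimized = RandomizedSearchCV(clf, distributions, random_state=0)
search = clf_optimized.fit(X[0:1000], y[0:1000])
search.best_params_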

Wall time: 34.4 s


{'C': 2.295254015709299, 'gamma': 'scale'}
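
The search above treats *gamma* as a choice between two heuristics. A possible variant, sketched below under several assumptions, samples both *gamma* and *C* from continuous log-uniform distributions and scales the pixels inside a pipeline. The small 8x8 digits set from `load_digits` stands in for MNIST to keep the run short, and the value ranges, `n_iter` and `cv` are illustrative guesses, not tuned settings.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Assumption: the 8x8 digits data stands in for MNIST to keep this fast.
X_small, y_small = load_digits(return_X_y=True)
X_small, y_small = X_small[:500], y_small[:500]

# Scaling lives inside the pipeline, so every CV fold fits its own scaler.
pipe = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))

# Sample both hyperparameters on a log scale (illustrative ranges).
param_distributions = {
    "svc__C": loguniform(1e-1, 1e2),
    "svc__gamma": loguniform(1e-4, 1e0),
}

search_svm = RandomizedSearchCV(
    pipe, param_distributions, n_iter=8, cv=3, random_state=0
)
search_svm.fit(X_small, y_small)

print(search_svm.best_params_)
print(round(search_svm.best_score_, 3))
```

Sampling on a log scale spreads the trials evenly across orders of magnitude, which is usually where *C* and *gamma* matter.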

### E1.2: Pipelines and simple Neural Networks
Split the MNIST data into train and test sets, then train and evaluate a simple Multi-Layer Perceptron (MLP) network. Since the non-linear activation functions of MLPs are sensitive to the scaling of the input (recall the *sigmoid* function), we need to scale all input values to [0,1].

* combine all steps of your training in a SKL pipeline (https://scikit-learn.org/stable/modules/compose.html#pipeline)
* use a SKL-scaler to scale the data (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
* MLP Parameters: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
    * use a *SGD* solver
    * use *tanh* as activation function
    * compare networks with 1, 2 and 3 layers, use different numbers of neurons per layer
    * adjust training parameters *alpha* (regularization) and *learning rate* - how sensitive is the model to these parameters?
    * Hint: do not change all parameters at the same time, split into several experiments
* How hard is it to find the best parameters? How many experiments would you need to find the best parameters?
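
The steps listed above can be combined roughly as in the following sketch. It is not the reference solution: it assumes `MinMaxScaler` for the scaling step because that scaler maps every feature to [0,1], whereas the linked `StandardScaler` standardizes to zero mean and unit variance. The 8x8 digits set from `load_digits` stands in for MNIST, and the layer size and `max_iter` are arbitrary illustrative values.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: the 8x8 digits data stands in for MNIST to keep this fast.
X_d, y_d = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_d, y_d, test_size=0.2, random_state=42, stratify=y_d
)

# Scaler and network form one estimator; the scaler is fitted on training data only.
mlp_pipe = make_pipeline(
    MinMaxScaler(),
    MLPClassifier(solver="sgd", activation="tanh",
                  hidden_layer_sizes=(64,), max_iter=200, random_state=0),
)

with warnings.catch_warnings():
    # SGD may stop at max_iter before converging; that is fine for a sketch.
    warnings.simplefilter("ignore", ConvergenceWarning)
    mlp_pipe.fit(X_tr, y_tr)

# score() applies the fitted scaler to the test data before predicting.
test_accuracy = mlp_pipe.score(X_te, y_te)
print(round(test_accuracy, 3))
```

Because the scaler is a pipeline step, calling `fit` and `score` on the pipeline applies the same transformation to training and test data without a separate `transform` call.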
    


In [24]:
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

In [25]:
scaler = StandardScaler() 

In [32]:
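# Note: this only fits the scaler. The data is never transformed afterwards,
# so the models below are trained on the raw pixel values (0 to 255).
scaler.fit(X,y)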

StandardScaler()

In [33]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/7, random_state=42)

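### Different Layers:

Note: an integer `hidden_layer_sizes=n` creates a single hidden layer with `n` neurons, so the three runs below compare 1, 2 and 3 neurons in one hidden layer rather than networks with 1, 2 and 3 layers.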

In [41]:
%%time
clf_mlp = MLPClassifier(solver='sgd', activation='tanh', hidden_layer_sizes=1).fit(X_train, y_train)
clf_mlp.score(X_test, y_test)

Wall time: 11.1 s


0.2153

In [42]:
%%time
clf_mlp = MLPClassifier(solver='sgd', activation='tanh', hidden_layer_sizes=2).fit(X_train, y_train)
clf_mlp.score(X_test, y_test)

Wall time: 21.5 s


0.377

In [43]:
%%time
clf_mlp = MLPClassifier(solver='sgd', activation='tanh', hidden_layer_sizes=3).fit(X_train, y_train)
clf_mlp.score(X_test, y_test)

Wall time: 27.6 s


0.4616
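
In scikit-learn the depth of an MLP is set by the length of the `hidden_layer_sizes` tuple, with one entry per hidden layer. The sketch below shows how networks with 1, 2 and 3 hidden layers could be specified and how that differs from passing a bare integer. The width of 32 neurons, the short `max_iter` and the use of the 8x8 digits set as a stand-in for MNIST are all assumptions made for illustration.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Assumption: the 8x8 digits data (pixel range 0..16) stands in for MNIST.
X_d, y_d = load_digits(return_X_y=True)
X_d = X_d / 16.0

architectures = {
    "1 hidden layer": (32,),
    "2 hidden layers": (32, 32),
    "3 hidden layers": (32, 32, 32),
}

hidden_layer_counts = {}
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    for name, sizes in architectures.items():
        net = MLPClassifier(solver="sgd", activation="tanh",
                            hidden_layer_sizes=sizes, max_iter=20,
                            random_state=0).fit(X_d, y_d)
        # n_layers_ also counts the input and the output layer.
        hidden_layer_counts[name] = net.n_layers_ - 2
        print(name, [w.shape for w in net.coefs_])

    # A bare integer gives ONE hidden layer with that many neurons.
    tiny = MLPClassifier(solver="sgd", activation="tanh",
                         hidden_layer_sizes=2, max_iter=5,
                         random_state=0).fit(X_d, y_d)

first_weight_shape = tiny.coefs_[0].shape  # (n_features, n_neurons)
```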

### Different Alpha:

In [58]:
%%time
clf_mlp = MLPClassifier(solver='sgd', activation='tanh', hidden_layer_sizes=1, alpha=0.0001).fit(X_train, y_train)
clf_mlp.score(X_test, y_test)

Wall time: 11.5 s


0.214

In [57]:
%%time
clf_mlp = MLPClassifier(solver='sgd', activation='tanh', hidden_layer_sizes=1, alpha=0.01).fit(X_train, y_train)
clf_mlp.score(X_test, y_test)

Wall time: 11.5 s


0.2121

In [56]:
%%time
clf_mlp = MLPClassifier(solver='sgd', activation='tanh', hidden_layer_sizes=1, alpha=0.1).fit(X_train, y_train)
clf_mlp.score(X_test, y_test)

Wall time: 18.4 s


0.213

In [50]:
%%time
clf_mlp = MLPClassifier(solver='sgd', activation='tanh', hidden_layer_sizes=1, alpha=0.5).fit(X_train, y_train)
clf_mlp.score(X_test, y_test)

Wall time: 13.4 s


0.2026

In [51]:
%%time
clf_mlp = MLPClassifier(solver='sgd', activation='tanh', hidden_layer_sizes=1, alpha=1).fit(X_train, y_train)
clf_mlp.score(X_test, y_test)

Wall time: 12.7 s


0.2077

In [52]:
%%time
clf_mlp = MLPClassifier(solver='sgd', activation='tanh', hidden_layer_sizes=1, alpha=2).fit(X_train, y_train)
clf_mlp.score(X_test, y_test)

Wall time: 9.54 s


0.2147

In [53]:
%%time
clf_mlp = MLPClassifier(solver='sgd', activation='tanh', hidden_layer_sizes=1, alpha=3).fit(X_train, y_train)
clf_mlp.score(X_test, y_test)

Wall time: 8.42 s


0.2158

In [54]:
%%time
clf_mlp = MLPClassifier(solver='sgd', activation='tanh', hidden_layer_sizes=1, alpha=10).fit(X_train, y_train)
clf_mlp.score(X_test, y_test)

Wall time: 7 s


0.216

In [55]:
%%time
clf_mlp = MLPClassifier(solver='sgd', activation='tanh', hidden_layer_sizes=1, alpha=100).fit(X_train, y_train)
clf_mlp.score(X_test, y_test)

Wall time: 4.65 s


0.2033
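
Instead of one cell per value, the sensitivity to *alpha* could also be measured in a single call with cross-validation. The following sketch uses `validation_curve` for that. It assumes a 600-sample slice of the 8x8 digits set as a stand-in for MNIST, a 32-neuron hidden layer and four log-spaced *alpha* values, none of which come from the runs above.

```python
import warnings

import numpy as np
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import validation_curve
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: a slice of the 8x8 digits data stands in for MNIST.
X_d, y_d = load_digits(return_X_y=True)
X_d, y_d = X_d[:600], y_d[:600]

pipe = make_pipeline(
    MinMaxScaler(),
    MLPClassifier(solver="sgd", activation="tanh",
                  hidden_layer_sizes=(32,), max_iter=50, random_state=0),
)

alphas = np.logspace(-4, 2, 4)  # 1e-4, 1e-2, 1, 100

with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    # One row per alpha value, one column per cross-validation fold.
    train_scores, val_scores = validation_curve(
        pipe, X_d, y_d,
        param_name="mlpclassifier__alpha",
        param_range=alphas,
        cv=3,
    )

for alpha, tr, va in zip(alphas, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"alpha={alpha:g}: train={tr:.3f}, validation={va:.3f}")
```

Comparing the mean train and validation scores per *alpha* shows where regularization starts to hurt the fit.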

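### Different learning rate:

Note: the `learning_rate` parameter selects the learning rate *schedule* (`'constant'`, `'invscaling'` or `'adaptive'`). The step size itself is set by `learning_rate_init`, which stays at its default value in the runs below.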

In [59]:
%%time
clf_mlp = MLPClassifier(solver='sgd', activation='tanh', hidden_layer_sizes=1, alpha=0.0001, learning_rate='constant').fit(X_train, y_train)
clf_mlp.score(X_test, y_test)

Wall time: 12.6 s


0.2109

In [60]:
%%time
clf_mlp = MLPClassifier(solver='sgd', activation='tanh', hidden_layer_sizes=1, alpha=0.0001, learning_rate='invscaling').fit(X_train, y_train)
clf_mlp.score(X_test, y_test)

Wall time: 8.46 s


0.2065

In [62]:
%%time
clf_mlp = MLPClassifier(solver='sgd', activation='tanh', hidden_layer_sizes=1, alpha=0.0001, learning_rate='adaptive', max_iter=300).fit(X_train, y_train)
clf_mlp.score(X_test, y_test)

Wall time: 28.4 s


0.215
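
To probe the step size itself, `learning_rate_init` can be varied while the schedule is kept constant. The sketch below records the number of epochs and the final training loss for three step sizes. The values, the 32-neuron layer and the 8x8 digits stand-in for MNIST are assumptions chosen for illustration.

```python
import math
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Assumption: the 8x8 digits data (pixel range 0..16) stands in for MNIST.
X_d, y_d = load_digits(return_X_y=True)
X_d = X_d / 16.0

max_epochs = 50
lr_results = {}
for lr in (0.001, 0.01, 0.1):
    net = MLPClassifier(solver="sgd", activation="tanh",
                        hidden_layer_sizes=(32,),
                        learning_rate="constant", learning_rate_init=lr,
                        max_iter=max_epochs, random_state=0)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ConvergenceWarning)
        net.fit(X_d, y_d)
    # n_iter_ is the number of epochs run, loss_ the final training loss.
    lr_results[lr] = (net.n_iter_, net.loss_)
    print(f"learning_rate_init={lr}: epochs={net.n_iter_}, loss={net.loss_:.4f}")
```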

### Note: The number of possible parameter settings is very large, so manual tuning is impractical and automatic optimization is necessary.
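
To get a feeling for the size of the search space, the combinations of even a small hypothetical grid can be counted with `ParameterGrid`, and a random search can then sample only a few of them. The sketch below assumes illustrative value lists, the 8x8 digits set as a stand-in for MNIST and a budget of five sampled settings.

```python
import warnings

from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import ParameterGrid, RandomizedSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical grid: 3 architectures * 4 alphas * 3 step sizes * 3 schedules.
grid = ParameterGrid({
    "hidden_layer_sizes": [(32,), (32, 32), (32, 32, 32)],
    "alpha": [1e-4, 1e-2, 1.0, 100.0],
    "learning_rate_init": [0.001, 0.01, 0.1],
    "learning_rate": ["constant", "invscaling", "adaptive"],
})
n_combinations = len(grid)
print(n_combinations)  # 108 full training runs for an exhaustive search

# Random search: evaluate a fixed budget of sampled settings instead.
X_d, y_d = load_digits(return_X_y=True)
X_d, y_d = X_d[:600], y_d[:600]

pipe = make_pipeline(
    MinMaxScaler(),
    MLPClassifier(solver="sgd", activation="tanh",
                  hidden_layer_sizes=(32,), max_iter=50, random_state=0),
)
search_mlp = RandomizedSearchCV(
    pipe,
    {
        "mlpclassifier__alpha": loguniform(1e-4, 1e1),
        "mlpclassifier__learning_rate_init": loguniform(1e-3, 1e-1),
    },
    n_iter=5, cv=3, random_state=0,
)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    search_mlp.fit(X_d, y_d)

print(search_mlp.best_params_)
```

The grid grows multiplicatively with every added parameter, while the cost of the random search is fixed by `n_iter` times the number of folds.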