# Block 6 Exercise 1: Non-Linear Classification

## MNIST Data
We return to the MNIST data set on handwritten digits to compare non-linear classification algorithms ...   

In [3]:
#imports 
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import fetch_openml

In [4]:
# Load data from https://www.openml.org/d/554
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)

In [5]:
# The full MNIST data set contains 70k samples of digits 0-9 as 28x28 grayscale images (represented as 784-dimensional vectors)
np.shape(X)

(70000, 784)

In [6]:
X.min()

0.0

In [7]:
#look at max/min value in the data
X.max()

255.0

In [8]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

### E1.1: Cross-Validation and Support Vector Machines
Train and optimize a C-SVM classifier on MNIST (https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)
* use an RBF kernel
* use *random search* with cross-validation to find the best settings for *gamma* and *C* (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html#sklearn.model_selection.RandomizedSearchCV)
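
A minimal sketch of such a random search, drawing *C* and *gamma* from continuous log-uniform distributions. To keep it fast it runs on the small 8x8 digits set bundled with scikit-learn as a stand-in for MNIST. The value ranges, `n_iter=10` and `cv=3` are illustrative assumptions, not recommended settings, and the `_demo` variable names are hypothetical.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in data: 8x8 digits bundled with scikit-learn (NOT the 28x28 MNIST set)
X_demo, y_demo = load_digits(return_X_y=True)
X_demo = X_demo / 16.0  # pixel values in this set range from 0 to 16
X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Assumed search ranges: both parameters are sampled on a log scale
param_distributions_demo = {
    "C": loguniform(1e-1, 1e3),
    "gamma": loguniform(1e-4, 1e0),
}

search_demo = RandomizedSearchCV(
    SVC(kernel="rbf"),
    param_distributions_demo,
    n_iter=10,        # number of sampled parameter settings
    cv=3,             # 3-fold cross-validation per setting
    random_state=42,
)
search_demo.fit(X_tr_demo, y_tr_demo)

print(search_demo.best_params_)
print(search_demo.score(X_te_demo, y_te_demo))  # accuracy of the refitted best model
```

Sampling on a log scale spreads the trials evenly across orders of magnitude, which usually suits *C* and *gamma* better than a linear range.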

In [9]:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
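# max_iter=10 caps the solver iterations to keep the search fast; the fits stop
# long before convergence, so the cross-validation scores are only a rough guide
svc = SVC(kernel='rbf', random_state=42, max_iter=10)
# candidate values: C in {1, 6, ..., 96}, gamma as one of the two built-in heuristics
distributions = dict(C=range(1,100,5), gamma=['scale', 'auto'])
clf = RandomizedSearchCV(svc, distributions, random_state=42, n_jobs=-1)
search = clf.fit(X_train, y_train)
search.best_params_
# svc.get_params().keys() lists the valid parameter names, in case of errors with the parameter assignment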



{'gamma': 'scale', 'C': 41}

In [10]:
svc = SVC(C=41, gamma='scale', kernel='rbf', random_state=42)
svc.fit(X_train, y_train)

SVC(C=41, random_state=42)

In [11]:
print(svc.score(X_test, y_test))

0.982


### E1.2: Pipelines and simple Neural Networks
Split the MNIST data into train and test sets, then train and evaluate a simple Multi Layer Perceptron (MLP) network. Since the non-linear activation functions of MLPs are sensitive to the scaling of the input (recall the *sigmoid* function), we need to scale all input values to [0,1].
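
The effect of input scaling can be illustrated with a single *tanh* neuron. The sketch below uses a hypothetical weight `w = 0.5` and no bias, both chosen only for illustration. It compares the activation and its derivative for raw 8-bit pixel values and for the same values scaled to [0,1].

```python
import numpy as np

raw_pixels = np.array([0.0, 32.0, 128.0, 255.0])  # raw 8-bit intensities
scaled_pixels = raw_pixels / 255.0                 # the same values in [0, 1]

w = 0.5  # hypothetical weight of a single neuron (assumption for illustration)

act_raw = np.tanh(w * raw_pixels)
act_scaled = np.tanh(w * scaled_pixels)

# Derivative of tanh(z) with respect to z is 1 - tanh(z)^2
grad_raw = 1.0 - act_raw**2
grad_scaled = 1.0 - act_scaled**2

print("activations (raw):   ", act_raw)
print("gradients   (raw):   ", grad_raw)
print("activations (scaled):", act_scaled)
print("gradients   (scaled):", grad_scaled)
```

For the raw inputs every non-zero pixel drives the neuron into saturation, so the gradient is practically zero and gradient-based training stalls. For the scaled inputs the gradients stay well away from zero.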

* combine all steps of your training in a SKL pipeline (https://scikit-learn.org/stable/modules/compose.html#pipeline)
* use a SKL-scaler to scale the data (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
* MLP Parameters: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
    * use a *SGD* solver
    * use *tanh* as activation function
    * compare networks with 1, 2 and 3 layers, use different numbers of neurons per layer
    * adjust training parameters *alpha* (regularization) and *learning rate* - how sensitive is the model to these parameters?
    * Hint: do not change all parameters at the same time, split into several experiments
* How hard is it to find the best parameters? How many experiments would you need to find the best parameters?
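
To get a feeling for the number of experiments, the sketch below counts the combinations in a small grid. The candidate values are hypothetical and only serve as an example; the count, not the values, is the point.

```python
from itertools import product

# Hypothetical candidate values (assumptions for illustration only)
grid_demo = {
    "hidden_layer_sizes": [(10,), (50,), (10, 10), (50, 50), (10, 10, 10), (50, 50, 50)],
    "alpha": [1e-5, 1e-4, 1e-3, 1e-2],
    "learning_rate_init": [1e-3, 1e-2, 1e-1],
}

# Exhaustive grid: every combination of all parameters
n_full_grid = len(list(product(*grid_demo.values())))

# Separate experiments: vary one parameter at a time, keep the others fixed
n_separate = sum(len(values) for values in grid_demo.values())

cv_folds = 5  # each setting is trained once per cross-validation fold
print("full grid settings:      ", n_full_grid)             # 6 * 4 * 3 = 72
print("full grid model fits:    ", n_full_grid * cv_folds)  # 72 * 5 = 360
print("one-at-a-time settings:  ", n_separate)              # 6 + 4 + 3 = 13
```

The full grid grows multiplicatively with every added parameter, while separate experiments grow only additively. The price is that interactions between parameters are missed.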
    


In [12]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPClassifier

In [14]:
min_max_scaler = MinMaxScaler()
X_train_scaled = min_max_scaler.fit_transform(X_train)
print(X_train_scaled.min(), X_train_scaled.max())

0.0 1.0


In [31]:
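mlp = MLPClassifier(activation='tanh', solver='sgd', max_iter=200, random_state=42)
# candidate architectures with 1, 2 and 3 hidden layers; (10,) is a single layer of 10 neurons
distributions = dict(hidden_layer_sizes=[(10,5,10), (20, 20, 10), (10,), (10, 10), (5, 10), (10, 50, 10), (30, 25, 10)])
# the data was scaled in the previous cell, so the pipeline only wraps the search itself
pipe = make_pipeline(RandomizedSearchCV(mlp, distributions, random_state=42, n_jobs=-1)).fit(X_train_scaled, y_train)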



In [32]:
pipe[0].best_params_

{'hidden_layer_sizes': (30, 25, 10)}
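
The search above scales the data before the pipeline is built. An alternative, sketched here, puts the scaler and the MLP into one pipeline and runs the search over that pipeline, so the scaler is refitted on the training part of every cross-validation fold. The sketch uses the small 8x8 digits set bundled with scikit-learn as a stand-in for MNIST. The candidate layer sizes, `learning_rate_init=0.05`, `max_iter=100` and the `_demo` names are assumptions chosen to keep the run short.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Stand-in data: 8x8 digits bundled with scikit-learn (NOT the 28x28 MNIST set)
X_demo, y_demo = load_digits(return_X_y=True)
X_tr_demo, X_te_demo, y_tr_demo, y_te_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Scaler and classifier form one estimator
pipe_demo = make_pipeline(
    MinMaxScaler(),
    MLPClassifier(activation="tanh", solver="sgd", learning_rate_init=0.05,
                  max_iter=100, random_state=42),
)

# make_pipeline names each step after its lowercased class name,
# so MLP parameters are addressed as "mlpclassifier__<parameter>"
param_distributions_demo = {
    "mlpclassifier__hidden_layer_sizes": [(30,), (30, 20), (30, 20, 10)],
}

search_demo = RandomizedSearchCV(
    pipe_demo, param_distributions_demo, n_iter=3, cv=3, random_state=42
)
search_demo.fit(X_tr_demo, y_tr_demo)

print(search_demo.best_params_)
# score() applies the scaler fitted on the training data to the test data
print(search_demo.score(X_te_demo, y_te_demo))
```

With this layout the test data only needs to be passed to `score` or `predict`; no separate scaling step is required.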

In [23]:
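# reuse the scaler fitted on the training data; refitting it on the test set would scale both sets differently
X_test_scaled = min_max_scaler.transform(X_test)
mlp = MLPClassifier(hidden_layer_sizes=(10,5,10), activation='tanh', solver='sgd', random_state=42)
# fit() returns the fitted estimator, not predictions
mlp.fit(X_train_scaled, y_train)
# test data accuracy
print(mlp.score(X_test_scaled, y_test))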

0.9324285714285714




In [26]:
mlp = MLPClassifier(hidden_layer_sizes=(20,20,10), activation='tanh', solver='sgd', random_state=42)
mlp.fit(X_train_scaled, y_train)
# test data accuracy
print(mlp.score(X_test_scaled, y_test))

0.953




In [33]:
mlp = MLPClassifier(hidden_layer_sizes=(30,25,10), activation='tanh', solver='sgd', random_state=42)
mlp.fit(X_train_scaled, y_train)
# test data accuracy
print(mlp.score(X_test_scaled, y_test))

0.9652857142857143




In [34]:
# training data accuracy
print(mlp.score(X_train_scaled, y_train))

0.9882857142857143
