# Block 6 Exercise 1: Non-Linear Classification

## MNIST Data
We return to the MNIST data set on handwritten digits to compare non-linear classification algorithms ...   

In [1]:
#imports 
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import fetch_openml

In [2]:
# Load data from https://www.openml.org/d/554
# as_frame=False returns NumPy arrays, so X.min() and X.max() below yield scalars
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)


In [3]:
# The full MNIST data set contains 70k samples of digits 0-9 as 28x28 grayscale images (represented as 784-dimensional vectors)
np.shape(X)

(70000, 784)

In [4]:
X.min()

0.0

In [5]:
#look at max/min value in the data
X.max()

255.0

### E1.1: Cross-Validation and Support Vector Machines
Train and optimize a C-SVM classifier on MNIST (https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)
* use an RBF kernel
* use *random search* with cross-validation to find the best settings for *gamma* and *C* (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html#sklearn.model_selection.RandomizedSearchCV)
* use max_iter in the SVM to avoid long training times 

In [6]:
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# RBF-kernel SVM; max_iter caps the solver iterations to keep training times short.
# C and gamma are set by the search below.
svm = SVC(kernel='rbf', max_iter=300)

# Sample C and gamma log-uniformly over several orders of magnitude.
distributions = dict(C=loguniform(1e-6, 1e+6),
                     gamma=loguniform(1e-6, 1e+1))

# Random search with the default settings (10 sampled candidates, 5-fold cross-validation).
clf = RandomizedSearchCV(svm, distributions, random_state=0, n_jobs=6)
search = clf.fit(X, y)



In [7]:
search.best_params_

{'C': 7.119224664494237e-06, 'gamma': 4.072912667361452e-06}
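
Both best values above sit close to the lower bounds of their search ranges (1e-6), which may indicate that the search did not reach a useful region; the unscaled pixel values (0 to 255) and the low `max_iter` could both contribute. The following is a minimal sketch of the same kind of random search with a scaler inside a pipeline. It is not the notebook's method: the small `load_digits` data set stands in for MNIST so that the example runs quickly, and the search ranges, `n_iter`, `cv`, and all variable names are assumptions chosen for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Assumption: load_digits (8x8 images) as a small stand-in for MNIST.
X_small, y_small = load_digits(return_X_y=True)

# The scaler is part of the pipeline, so it is re-fitted on each CV training fold.
svm_pipe = make_pipeline(MinMaxScaler(), SVC(kernel="rbf", max_iter=300))

# make_pipeline names the SVC step "svc", so its parameters are addressed as svc__<name>.
# The ranges below are illustrative assumptions, not tuned values.
svm_distributions = {
    "svc__C": loguniform(1e-2, 1e3),
    "svc__gamma": loguniform(1e-4, 1e0),
}

svm_search = RandomizedSearchCV(
    svm_pipe, svm_distributions, n_iter=10, cv=3, random_state=0
)
svm_search.fit(X_small, y_small)

print(svm_search.best_params_)
print(svm_search.best_score_)
```

Comparing `best_score_` with and without the scaling step would show whether scaling matters for a given data set.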

### E1.2: Pipelines and simple Neural Networks
Split the MNIST data into train and test sets, then train and evaluate a simple Multi-Layer Perceptron (MLP) network. Since the non-linear activation functions of MLPs are sensitive to the scaling of the input (recall the *sigmoid* function), we need to scale all input values to [0,1].

* combine all steps of your training in a SKL pipeline (https://scikit-learn.org/stable/modules/compose.html#pipeline)
* use a SKL-scaler to scale the data (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
* MLP Parameters: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
    * use a *SGD* solver
    * use *tanh* as activation function
    * compare networks with 1, 2 and 3 layers, use different numbers of neurons per layer
    * adjust training parameters *alpha* (regularization) and *learning rate* - how sensitive is the model to these parameters?
    * Hint: do not change all parameters at the same time, split into several experiments
* How hard is it to find the best parameters? How many experiments would you need to find the best parameters?
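
The task text asks for inputs in [0,1], while the linked `StandardScaler` standardizes each feature to zero mean and unit variance, which generally produces negative values as well. The sketch below shows the difference on a made-up 3x2 array in place of pixel data (the array and the variable names are illustrative assumptions); `MinMaxScaler` is the scikit-learn scaler that maps to a fixed range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up "pixel" values in the MNIST range 0..255 (3 samples, 2 features).
X_demo = np.array([[0.0, 255.0],
                   [128.0, 64.0],
                   [255.0, 0.0]])

# MinMaxScaler maps each feature to the range [0, 1] by default.
X_minmax = MinMaxScaler().fit_transform(X_demo)

# StandardScaler centers each feature at 0 and scales it to unit variance.
X_standard = StandardScaler().fit_transform(X_demo)

print(X_minmax.min(), X_minmax.max())
print(X_standard.min(), X_standard.max())
```

Either scaler can be used as the first pipeline step; which one works better with *tanh* is something to check experimentally.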
    


In [8]:
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hold out 10% of the samples as a test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

In [9]:
# Pipeline: standardize the inputs, then fit an MLP with the SGD solver and tanh activation.
steps = [('scaler', StandardScaler()),
         ('mlpclassifier', MLPClassifier(solver="sgd", activation="tanh", max_iter=300))]
mlp = Pipeline(steps)


# Experiment 1: compare networks with 3, 2 and 1 hidden layers.
parameter_dist_1 = {
    'mlpclassifier__hidden_layer_sizes': [
        (250, 75, 150),
        (125, 100),
        (75,)
    ]
}

clf_hidden_layer = GridSearchCV(mlp, parameter_dist_1, n_jobs=6, verbose=4)
search_hidden_layer = clf_hidden_layer.fit(X_train, y_train)

Fitting 5 folds for each of 3 candidates, totalling 15 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done   8 out of  15 | elapsed: 26.8min remaining: 23.4min
[Parallel(n_jobs=6)]: Done  12 out of  15 | elapsed: 31.1min remaining:  7.8min
[Parallel(n_jobs=6)]: Done  15 out of  15 | elapsed: 33.1min finished


In [10]:
search_hidden_layer.best_estimator_

Pipeline(steps=[('scaler', StandardScaler()),
                ('mlpclassifier',
                 MLPClassifier(activation='tanh',
                               hidden_layer_sizes=(250, 75, 150), max_iter=300,
                               solver='sgd'))])

In [11]:
search_hidden_layer.cv_results_

{'mean_fit_time': array([1114.66078391,  734.05592327,  415.09423075]),
 'std_fit_time': array([22.63994507, 46.20116872, 68.11530291]),
 'mean_score_time': array([0.51402488, 0.31495953, 0.17713742]),
 'std_score_time': array([0.02252748, 0.04928753, 0.05479356]),
 'param_mlpclassifier__hidden_layer_sizes': masked_array(data=[(250, 75, 150), (125, 100), (75,)],
              mask=[False, False, False],
        fill_value='?',
             dtype=object),
 'params': [{'mlpclassifier__hidden_layer_sizes': (250, 75, 150)},
  {'mlpclassifier__hidden_layer_sizes': (125, 100)},
  {'mlpclassifier__hidden_layer_sizes': (75,)}],
 'split0_test_score': array([0.96166667, 0.95873016, 0.95865079]),
 'split1_test_score': array([0.96087302, 0.96214286, 0.95507937]),
 'split2_test_score': array([0.96134921, 0.96269841, 0.95944444]),
 'split3_test_score': array([0.95992063, 0.95888889, 0.95809524]),
 'split4_test_score': array([0.96293651, 0.96222222, 0.95626984]),
 'mean_test_score': array([0.96134921

##### Interpretation
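The mean_test_score barely changes with the network architecture: all split scores lie between roughly 0.955 and 0.963. The fit time, however, grows markedly with every additional layer (mean fit times of about 415 s, 734 s and 1115 s for one, two and three hidden layers) without producing noticeably better test results.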
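
The fit times reported in `cv_results_` can be set against the size of each network. The sketch below counts the weights and biases of a fully connected MLP with 784 inputs and 10 outputs for the three architectures tested; the helper name `count_mlp_parameters` is hypothetical and not part of scikit-learn.

```python
def count_mlp_parameters(n_inputs, hidden_layer_sizes, n_outputs):
    """Count weights and biases of a fully connected MLP."""
    layer_sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    total = 0
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix plus bias vector
    return total


# 784 input pixels and 10 digit classes, as in MNIST.
architectures = [(75,), (125, 100), (250, 75, 150)]
param_counts = {h: count_mlp_parameters(784, h, 10) for h in architectures}

for hidden, n_params in param_counts.items():
    print(hidden, n_params)
```

This gives 59,635, 111,735 and 227,985 parameters, a ratio of about 1 : 1.9 : 3.8, while the measured mean fit times stand at about 1 : 1.8 : 2.7. The cost per fit therefore tracks the parameter count at least roughly, although the number of epochs until convergence also plays a role.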

In [12]:
# Experiment 2: vary the L2 regularization strength alpha.
parameter_dist_2 = {
    'mlpclassifier__alpha': [0.0001, 0.001, 0.01, 0.1, 1, 2, 4, 8]
}

clf_alpha = GridSearchCV(mlp, parameter_dist_2, n_jobs=6, verbose=4)
search_alpha = clf_alpha.fit(X_train, y_train)

Fitting 5 folds for each of 8 candidates, totalling 40 fits


[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.
[Parallel(n_jobs=6)]: Done  13 tasks      | elapsed: 32.2min
[Parallel(n_jobs=6)]: Done  40 out of  40 | elapsed: 58.4min remaining:    0.0s
[Parallel(n_jobs=6)]: Done  40 out of  40 | elapsed: 58.4min finished


In [13]:
search_alpha.best_estimator_

Pipeline(steps=[('scaler', StandardScaler()),
                ('mlpclassifier',
                 MLPClassifier(activation='tanh', alpha=0.1, max_iter=300,
                               solver='sgd'))])

In [14]:
search_alpha.cv_results_

{'mean_fit_time': array([675.64232268, 669.51929655, 713.16079693, 708.60997272,
        537.76720285, 386.40315475, 269.86338186, 171.45911455]),
 'std_fit_time': array([58.77252033, 54.80189985, 22.2564743 , 16.4129268 , 59.67363984,
        45.86808855, 24.40373476, 22.39420529]),
 'mean_score_time': array([0.29281669, 0.29800382, 0.28842778, 0.27944593, 0.26130118,
        0.28802748, 0.27047563, 0.16037064]),
 'std_score_time': array([0.00870852, 0.0119433 , 0.01048754, 0.01226817, 0.0072192 ,
        0.03124481, 0.02685336, 0.02677762]),
 'param_mlpclassifier__alpha': masked_array(data=[0.0001, 0.001, 0.01, 0.1, 1, 2, 4, 8],
              mask=[False, False, False, False, False, False, False, False],
        fill_value='?',
             dtype=object),
 'params': [{'mlpclassifier__alpha': 0.0001},
  {'mlpclassifier__alpha': 0.001},
  {'mlpclassifier__alpha': 0.01},
  {'mlpclassifier__alpha': 0.1},
  {'mlpclassifier__alpha': 1},
  {'mlpclassifier__alpha': 2},
  {'mlpclassifier__alp

##### Interpretation
The model does not appear to be particularly sensitive to the alpha value. Larger alpha values reduce the fit time (from about 709 s at alpha=0.1 to about 171 s at alpha=8).
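
The raw `cv_results_` dictionary is long and easy to misread. A compact per-candidate summary, followed by an evaluation on the held-out test set, could look like the sketch below. It assumes the small `load_digits` data set as a stand-in for MNIST, a shortened alpha grid, `max_iter=50` and `cv=3` to keep the run short; all variable names are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: load_digits as a small stand-in for MNIST.
X_d, y_d = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_d, y_d, test_size=0.1, random_state=42)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("mlpclassifier", MLPClassifier(solver="sgd", activation="tanh",
                                    max_iter=50, random_state=0)),
])
grid = GridSearchCV(pipe, {"mlpclassifier__alpha": [0.0001, 0.1, 1]}, cv=3)
grid.fit(X_tr, y_tr)

# One entry per candidate: parameters, mean CV accuracy, mean fit time.
results = grid.cv_results_
summary = list(zip(results["params"],
                   results["mean_test_score"],
                   results["mean_fit_time"]))
for params, score, fit_time in summary:
    print(f"{params}  accuracy={score:.4f}  fit_time={fit_time:.1f}s")

# Accuracy of the refitted best model on the held-out test set.
test_accuracy = grid.score(X_te, y_te)
print(f"test accuracy: {test_accuracy:.4f}")
```

With so few epochs the solver may emit a convergence warning; the point here is only the way the results are read out.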

In [15]:
# Experiment 3: vary the learning rate schedule (only used by the SGD solver).
parameter_dist_3 = {
    'mlpclassifier__learning_rate': ['constant', 'adaptive', 'invscaling']
}

clf_learning_rate = GridSearchCV(mlp, parameter_dist_3, n_jobs=6, verbose=4)
search_learning_rate = clf_learning_rate.fit(X_train, y_train)

[Parallel(n_jobs=6)]: Using backend LokyBackend with 6 concurrent workers.


Fitting 5 folds for each of 3 candidates, totalling 15 fits


[Parallel(n_jobs=6)]: Done   8 out of  15 | elapsed: 15.7min remaining: 13.8min
[Parallel(n_jobs=6)]: Done  12 out of  15 | elapsed: 19.9min remaining:  5.0min
[Parallel(n_jobs=6)]: Done  15 out of  15 | elapsed: 20.5min finished


In [16]:
search_learning_rate.best_estimator_

Pipeline(steps=[('scaler', StandardScaler()),
                ('mlpclassifier',
                 MLPClassifier(activation='tanh', max_iter=300, solver='sgd'))])

In [17]:
search_learning_rate.cv_results_

{'mean_fit_time': array([646.79725742, 593.9228447 , 191.20850706]),
 'std_fit_time': array([53.92121963,  9.86233752, 38.15640427]),
 'mean_score_time': array([0.2810473 , 0.20604506, 0.24494276]),
 'std_score_time': array([0.01963954, 0.05939105, 0.06237656]),
 'param_mlpclassifier__learning_rate': masked_array(data=['constant', 'adaptive', 'invscaling'],
              mask=[False, False, False],
        fill_value='?',
             dtype=object),
 'params': [{'mlpclassifier__learning_rate': 'constant'},
  {'mlpclassifier__learning_rate': 'adaptive'},
  {'mlpclassifier__learning_rate': 'invscaling'}],
 'split0_test_score': array([0.96111111, 0.9602381 , 0.84428571]),
 'split1_test_score': array([0.96007937, 0.95865079, 0.84603175]),
 'split2_test_score': array([0.95968254, 0.96150794, 0.85420635]),
 'split3_test_score': array([0.96103175, 0.96055556, 0.84777778]),
 'split4_test_score': array([0.95992063, 0.95968254, 0.85428571]),
 'mean_test_score': array([0.96036508, 0.96012698, 0.8

##### Interpretation
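The `constant` and `adaptive` schedules differ very little (mean test scores of about 0.960 each). With `invscaling`, the test score drops sharply, to about 0.84 to 0.85 on every split. In return, its mean fit time (about 191 s) is far shorter than for `constant` (about 647 s) and `adaptive` (about 594 s).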
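
According to the scikit-learn documentation, `invscaling` lowers the step size as `learning_rate_init / pow(t, power_t)`, with the defaults `learning_rate_init=0.001` and `power_t=0.5`. The sketch below only evaluates that formula for a few values of t to show how quickly the step size shrinks, which may explain both the shorter training and the lower accuracy. How t is counted inside the library is not covered here, and the function name `invscaling_rate` is hypothetical.

```python
def invscaling_rate(t, learning_rate_init=0.001, power_t=0.5):
    """Effective learning rate at time step t under the documented invscaling formula."""
    return learning_rate_init / (t ** power_t)


# The step size falls by a factor of 10 for every factor of 100 in t (power_t = 0.5).
for t in (1, 100, 10_000):
    print(t, invscaling_rate(t))
```

Tuning `learning_rate_init` or `power_t` together with the schedule would be a natural follow-up experiment.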

##### How hard is it to find the best parameters? How many experiments would you need to find the best parameters?
A grid search has to evaluate all, or at least many, parameter combinations. A random search with cross-validation often reaches good results faster. However, because the candidates are sampled at random, it may only find a locally good setting rather than the global optimum.
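
To put a number on this, the sketch below counts the candidates of a joint grid over the three parameter lists that were searched separately in In [9], In [12] and In [15], using `ParameterGrid` from scikit-learn. The variable names are illustrative; the fold count of 5 is taken from the fit logs.

```python
from sklearn.model_selection import ParameterGrid

# The three parameter lists that were searched one at a time in this notebook.
full_grid = ParameterGrid({
    "mlpclassifier__hidden_layer_sizes": [(250, 75, 150), (125, 100), (75,)],
    "mlpclassifier__alpha": [0.0001, 0.001, 0.01, 0.1, 1, 2, 4, 8],
    "mlpclassifier__learning_rate": ["constant", "adaptive", "invscaling"],
})

n_folds = 5                                  # folds per candidate, as in the fit logs
n_joint_candidates = len(full_grid)          # 3 * 8 * 3 combinations
n_joint_fits = n_joint_candidates * n_folds  # model fits for the joint grid
n_separate_candidates = 3 + 8 + 3            # one parameter varied at a time

print(n_joint_candidates, n_joint_fits, n_separate_candidates)
```

The joint grid needs 72 candidates (360 fits) against 14 candidates (70 fits) for the separate experiments, and it grows multiplicatively with every further parameter. The separate experiments are cheaper but cannot detect interactions between parameters.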