# Block 6 Exercise 1: Non-Linear Classification

## MNIST Data
We return to the MNIST data set of handwritten digits to compare non-linear classification algorithms ...

In [1]:
#imports 
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import fetch_openml

In [2]:
# Load data from https://www.openml.org/d/554
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)


In [3]:
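# the full MNIST data set contains 70,000 samples of the digits 0-9 as 28x28 grayscale images
# (each flattened into a 784-dimensional vector)
np.shape(X)   # only the last expression of a cell is displayed, so this result is not shown
type(X)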

pandas.core.frame.DataFrame
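
Each row of `X` is one image flattened into 784 pixel values. Below is a minimal sketch of turning a row back into a 28x28 array for display. It assumes the pixels are stored row by row (row-major order, the usual layout of the `pixel1` ... `pixel784` columns). `to_image` is a hypothetical helper, and the demo uses `np.arange` instead of real data:

```python
import numpy as np


def to_image(row):
    """Reshape one flattened 784-pixel sample into a 28x28 array (hypothetical helper)."""
    row = np.asarray(row, dtype=float)
    if row.size != 28 * 28:
        raise ValueError(f"expected 784 values, got {row.size}")
    # assumes row-major pixel order: pixel1..pixel28 form the first image row
    return row.reshape(28, 28)


demo = to_image(np.arange(784))
print(demo.shape)  # (28, 28)
print(demo[1, 0])  # 28.0: the 29th value starts the second image row

# with the notebook data (X is the DataFrame loaded above):
# import matplotlib.pyplot as plt
# plt.imshow(to_image(X.iloc[0]), cmap="gray")
# plt.show()
```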

In [4]:
# smallest value in each pixel column
X.min()

pixel1      0.0
pixel2      0.0
pixel3      0.0
pixel4      0.0
pixel5      0.0
           ... 
pixel780    0.0
pixel781    0.0
pixel782    0.0
pixel783    0.0
pixel784    0.0
Length: 784, dtype: float64

In [5]:
# largest value in each pixel column (pandas truncates the output to the first and last rows)
X.max()

pixel1       0.0
pixel2       0.0
pixel3       0.0
pixel4       0.0
pixel5       0.0
            ... 
pixel780    62.0
pixel781     0.0
pixel782     0.0
pixel783     0.0
pixel784     0.0
Length: 784, dtype: float64
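
Pandas truncates the per-column summaries above. The following small sketch reads off the global value range and finds the pixels that never change, such as `pixel1`, whose minimum and maximum are both 0.0 above. `pixel_range_summary` is a hypothetical helper, and `toy` is made-up data standing in for `X`. Constant pixels have zero variance. `StandardScaler` keeps their scale at 1, so they simply become 0 after centering.

```python
import pandas as pd


def pixel_range_summary(df):
    """Return the global min, global max and the names of constant columns."""
    col_min = df.min()
    col_max = df.max()
    is_constant = col_min == col_max
    return float(col_min.min()), float(col_max.max()), col_min[is_constant].index.tolist()


# made-up stand-in for X: 3 samples with 4 "pixels"
toy = pd.DataFrame({
    "pixel1": [0.0, 0.0, 0.0],
    "pixel2": [0.0, 128.0, 255.0],
    "pixel3": [12.0, 0.0, 3.0],
    "pixel4": [0.0, 0.0, 0.0],
})
lo, hi, constant = pixel_range_summary(toy)
print(lo, hi, constant)  # 0.0 255.0 ['pixel1', 'pixel4']

# on the notebook data: pixel_range_summary(X)
```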

### E1.1: Cross-Validation and Support Vector Machines
Train and optimize a C-SVM classifier on MNIST (https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)
* use an RBF kernel
* use *random search* with cross-validation to find the best settings for *gamma* and *C* (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html#sklearn.model_selection.RandomizedSearchCV)
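
Here is a minimal sketch of the task with two changes worth considering. First, the scaler sits inside a pipeline, so each cross-validation fold is scaled using only its own training part. Second, *C* and *gamma* are drawn from log-uniform distributions (`scipy.stats.loguniform`), because both typically matter over several orders of magnitude. To keep it fast, the sketch uses scikit-learn's small 8x8 digits set (`load_digits`) as a stand-in for MNIST. The ranges are illustrative assumptions, not tuned values.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# small stand-in data set: 1,797 8x8 digit images
X_digits, y_digits = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_digits, y_digits, test_size=0.2, random_state=42)

# the scaler inside the pipeline is re-fitted on the training part of every CV fold
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# make_pipeline names the SVC step "svc", hence the "svc__" prefix
param_distributions = {
    "svc__C": loguniform(1e-2, 1e2),
    "svc__gamma": loguniform(1e-4, 1e-1),
}

search = RandomizedSearchCV(pipe, param_distributions, n_iter=10, cv=4, random_state=42)
search.fit(X_tr, y_tr)

test_accuracy = search.score(X_te, y_te)
print("best parameters:", search.best_params_)
print(f"test accuracy: {test_accuracy:.3f}")
```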

In [12]:
%%time
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

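# hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=42)

# baseline: standardize the pixels, then fit an RBF-SVM with fixed C and gamma;
# max_iter=40 caps the solver iterations, so the solver most likely stops before converging,
# and random_state has no effect here because probability=False
clf_svm = make_pipeline(StandardScaler(),SVC(C = 1.0, kernel='rbf', gamma = 'auto', max_iter=40, random_state=42))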
clf_svm.fit(X_train, y_train)
y_pred_train = clf_svm.predict(X_train)
y_pred_test = clf_svm.predict(X_test)

print(accuracy_score(y_train, y_pred_train))
print(accuracy_score(y_test, y_pred_test))



0.7217678571428572
0.7225
Wall time: 4min 5s
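
The `SVC` documentation defines two shortcut values for gamma. `gamma='auto'` uses `1 / n_features`, and `gamma='scale'` (the default) uses `1 / (n_features * X.var())`. These values are a useful reference point for the search ranges below. The small sketch that follows computes both; `rbf_gamma_defaults` is a hypothetical helper. With 784 features, `'auto'` gives about 0.0013. On standardized data `X.var()` is at most 1, so `'scale'` is never smaller than `'auto'`.

```python
import numpy as np


def rbf_gamma_defaults(X):
    """Return (gamma_auto, gamma_scale) as defined in the SVC documentation."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    gamma_auto = 1.0 / n_features
    gamma_scale = float(1.0 / (n_features * X.var()))  # X.var() is taken over all entries
    return gamma_auto, gamma_scale


toy = np.array([[0.0, 4.0], [4.0, 0.0]])  # mean 2, variance 4
print(rbf_gamma_defaults(toy))  # (0.5, 0.125)
print(round(1.0 / 784, 6))      # 0.001276: gamma='auto' for the 784 MNIST pixels
```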


In [16]:
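def report(results, n_top=3):
    """Print rank, mean/std validation score and parameters of the n_top best candidates (ties are all printed)."""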
    for i in range(1, n_top + 1):
        candidates = np.flatnonzero(results['rank_test_score'] == i)
        for candidate in candidates:
            print("Model with rank: {0}".format(i))
            print("Mean validation score: {0:.3f} (std: {1:.3f})"
                  .format(results['mean_test_score'][candidate],
                          results['std_test_score'][candidate]))
            print("Parameters: {0}".format(results['params'][candidate]))
            print("")
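
`report` prints the top-ranked candidates one by one. An alternative is to load `cv_results_` into a pandas DataFrame and sort it by rank, as in the sketch below (`results_table` is a hypothetical helper). Unlike `report`, `head(n_top)` cuts off tied candidates beyond `n_top` rows. The `fake` dictionary only mimics the structure of `cv_results_` and contains made-up numbers.

```python
import numpy as np
import pandas as pd


def results_table(cv_results, n_top=3):
    """Return the n_top best-ranked candidates of a search's cv_results_ as a DataFrame."""
    df = pd.DataFrame(cv_results)
    columns = ["rank_test_score", "mean_test_score", "std_test_score", "params"]
    return df.sort_values("rank_test_score")[columns].head(n_top)


# made-up dictionary with the same keys as cv_results_ (not real results)
fake = {
    "rank_test_score": np.array([2, 1, 3]),
    "mean_test_score": np.array([0.80, 0.90, 0.70]),
    "std_test_score": np.array([0.01, 0.02, 0.03]),
    "params": [{"C": 0.5}, {"C": 1.0}, {"C": 0.1}],
}
print(results_table(fake, n_top=2))

# on the real search: results_table(SVM_random_search.cv_results_)
```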

In [20]:
%%time
from sklearn.model_selection import RandomizedSearchCV
from time import time
from sklearn.preprocessing import StandardScaler
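# NOTE: the scaler is fitted on the full data set before splitting, so test-set
# statistics leak into the preprocessing; X is also overwritten with the scaled
# NumPy array and reused in E1.2
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=42)
# max_iter=50 caps the solver iterations to keep the search affordable
clf_svm = SVC(max_iter = 50, random_state = 42)

# discrete candidate values for C and gamma; RandomizedSearchCV samples
# n_iter=10 (the default) of the 10 x 15 = 150 combinations instead of trying all
# NOTE: SVC requires C > 0, so any sampled candidate with C = 0.0 fails to fit
param_grid = {'C': np.linspace(0, 1, num=10),
              'gamma': np.linspace(0.00001, 0.01, num=15)}

# run the randomized search with 4-fold cross-validation
SVM_random_search = RandomizedSearchCV(estimator= clf_svm, param_distributions = param_grid, cv = 4, random_state=42)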
start = time()
SVM_random_search.fit(X_train, y_train)

print("RandomizedSearchCV took %.2f seconds for %d candidate parameter settings."
      % (time() - start, len(SVM_random_search.cv_results_['params'])))
report(SVM_random_search.cv_results_)

y_pred_train = SVM_random_search.predict(X_train)
y_pred_test = SVM_random_search.predict(X_test)

print(accuracy_score(y_train, y_pred_train))
print(accuracy_score(y_test, y_pred_test))



RandomizedSearchCV took 3207.80 seconds for 10 candidate parameter settings.
Model with rank: 1
Mean validation score: 0.835 (std: 0.013)
Parameters: {'gamma': 0.004291428571428571, 'C': 1.0}

Model with rank: 2
Mean validation score: 0.817 (std: 0.009)
Parameters: {'gamma': 0.005005, 'C': 0.5555555555555556}

Model with rank: 3
Mean validation score: 0.810 (std: 0.006)
Parameters: {'gamma': 0.0028642857142857146, 'C': 0.4444444444444444}

0.8131964285714286
0.8011428571428572
Wall time: 58min 4s
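
The grid `np.linspace(0, 1, num=10)` includes `C = 0.0`, but `SVC` requires `C` to be strictly positive. By default (`error_score=np.nan`), `RandomizedSearchCV` records such a failed fit as a NaN score with a warning instead of stopping. The non-zero values of this grid also span only about one decade (0.11 to 1). The small sketch below shows the failure on toy data and offers a log-spaced alternative; `fit_ok` is a hypothetical helper.

```python
import numpy as np
from sklearn.svm import SVC

# toy 1-D data with two classes
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([0, 0, 1, 1])


def fit_ok(C):
    """Return True if SVC accepts this value of C on the toy data."""
    try:
        SVC(C=C).fit(X_toy, y_toy)
        return True
    except ValueError:  # raised for C <= 0
        return False


C_linear = np.linspace(0, 1, num=10)  # original grid, starts at 0.0
C_log = np.logspace(-2, 2, num=5)     # 0.01, 0.1, 1, 10, 100

print(fit_ok(C_linear[0]))            # False
print(all(fit_ok(C) for C in C_log))  # True
```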


### E1.2: Pipelines and simple Neural Networks
Split the MNIST data into train and test sets, then train and evaluate a simple Multi-Layer Perceptron (MLP) network. Since the non-linear activation functions of MLPs are sensitive to the scale of the input (recall that the *sigmoid* function saturates for inputs of large magnitude), we need to rescale all input values, e.g. to [0,1].

* combine all steps of your training in an SKL pipeline (https://scikit-learn.org/stable/modules/compose.html#pipeline)
* use an SKL scaler to scale the data (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html); note that StandardScaler standardizes each feature to zero mean and unit variance, while MinMaxScaler maps each feature to [0,1]
* MLP Parameters: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
    * use an *SGD* solver
    * use *tanh* as activation function
    * compare networks with 1, 2 and 3 hidden layers, using different numbers of neurons per layer
    * adjust the training parameters *alpha* (L2 regularization) and *learning rate*: how sensitive is the model to these parameters?
    * Hint: do not change all parameters at the same time, split into several experiments
* How hard is it to find the best parameters? How many experiments would you need to find the best parameters?
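
The sketch below shows the depth comparison, assuming scikit-learn's `MLPClassifier`. The number of hidden layers is the length of the `hidden_layer_sizes` tuple, so `(32,)`, `(32, 32)` and `(32, 32, 32)` give 1, 2 and 3 hidden layers. `MinMaxScaler` maps each feature to [0,1], as the text suggests. The width of 32, `learning_rate_init=0.05` and the use of `load_digits` as a fast stand-in for MNIST are illustrative assumptions.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X_digits, y_digits = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_digits, y_digits, test_size=0.2, random_state=42)

# tuple length = number of hidden layers, entries = neurons per layer
architectures = {1: (32,), 2: (32, 32), 3: (32, 32, 32)}
scores = {}
for n_layers, sizes in architectures.items():
    clf = make_pipeline(
        MinMaxScaler(),  # rescales every feature to [0, 1]
        MLPClassifier(hidden_layer_sizes=sizes, activation="tanh", solver="sgd",
                      learning_rate_init=0.05, max_iter=200, random_state=42),
    )
    with warnings.catch_warnings():
        # SGD may reach max_iter before converging; silence that warning here
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        clf.fit(X_tr, y_tr)
    scores[n_layers] = clf.score(X_te, y_te)
    print(f"{n_layers} hidden layer(s) {sizes}: test accuracy {scores[n_layers]:.3f}")
```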
    


In [22]:
%%time
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.pipeline import Pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score, accuracy_score

# same 80/20 split as in E1.1; note that X is now the standardized NumPy array from E1.1
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=42)


Wall time: 1.32 s


#### 1. Parameter Set: 1 hidden neuron, alpha = 0.0001, constant learning rate

In [29]:
%%time
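# hidden_layer_sizes=1 means ONE hidden layer with a single neuron;
# several hidden layers need a tuple, e.g. hidden_layer_sizes=(100, 100)
clf_MLP_1 = make_pipeline(StandardScaler(), MLPClassifier(hidden_layer_sizes = 1, activation = 'tanh',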
                                                        solver = 'sgd', alpha = 0.0001, learning_rate = 'constant', 
                                                        max_iter=250, random_state=42))
clf_MLP_1.fit(X_train, y_train)
y_pred_train = clf_MLP_1.predict(X_train)
y_pred_test = clf_MLP_1.predict(X_test)

print('accuracy score with train data: ', accuracy_score(y_train, y_pred_train))
print('accuracy score with test data : ', accuracy_score(y_test, y_pred_test))
print('f1 score with train data: ', f1_score(y_train, y_pred_train, average='macro'))
print('f1 score with test data : ', f1_score(y_test, y_pred_test, average='macro'))



accuracy score with train data:  0.37141071428571426
accuracy score with test data :  0.3725
f1 score with train data:  0.29783460600101647
f1 score with test data :  0.29649149230806626
Wall time: 6min 43s
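
An integer `hidden_layer_sizes` is treated as a single hidden layer with that many neurons, so parameter sets 1 to 3 compare one hidden layer with 1, 2 and 3 neurons. The small sketch below makes this visible through the fitted weight matrices `coefs_`. `layer_shapes` is a hypothetical helper, and the random toy data only serves to fix the shapes.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# toy data: 60 samples, 5 features, 3 classes
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(60, 5))
y_toy = np.arange(60) % 3


def layer_shapes(hidden_layer_sizes):
    """Fit a tiny MLP and return the shapes of its weight matrices."""
    mlp = MLPClassifier(hidden_layer_sizes=hidden_layer_sizes, max_iter=5, random_state=0)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        mlp.fit(X_toy, y_toy)
    return [w.shape for w in mlp.coefs_]


print(layer_shapes(3))          # [(5, 3), (3, 3)]: input->hidden, hidden->output
print(layer_shapes((3, 3, 3)))  # [(5, 3), (3, 3), (3, 3), (3, 3)]: three hidden layers
```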


#### 2. Parameter Set: 2 hidden neurons, alpha = 0.0001, constant learning rate

In [30]:
%%time
clf_MLP_2 = make_pipeline(StandardScaler(), MLPClassifier(hidden_layer_sizes = 2, activation = 'tanh', 
                                                        solver = 'sgd', alpha = 0.0001, learning_rate = 'constant', 
                                                        max_iter=250, random_state=42))
clf_MLP_2.fit(X_train, y_train)
y_pred_train = clf_MLP_2.predict(X_train)
y_pred_test = clf_MLP_2.predict(X_test)

print('accuracy score with train data: ', accuracy_score(y_train, y_pred_train))
print('accuracy score with test data : ', accuracy_score(y_test, y_pred_test))
print('f1 score with train data: ', f1_score(y_train, y_pred_train, average='macro'))
print('f1 score with test data : ', f1_score(y_test, y_pred_test, average='macro'))



accuracy score with train data:  0.6196785714285714
accuracy score with test data :  0.608
f1 score with train data:  0.5969268646705529
f1 score with test data :  0.5835539061752788
Wall time: 7min 17s
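
Every parameter set repeats the same fit/predict/score code. The hypothetical helper `evaluate` below factors that code out; in the notebook it would be called as `evaluate(clf_MLP_2, X_train, y_train, X_test, y_test)`. The usage shown here runs on the small `load_digits` stand-in rather than MNIST, with an arbitrary small network.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate(clf, X_train, y_train, X_test, y_test):
    """Fit clf, then return accuracy and macro-averaged F1 on train and test data."""
    clf.fit(X_train, y_train)
    scores = {}
    for split, (X_split, y_split) in {"train": (X_train, y_train), "test": (X_test, y_test)}.items():
        y_pred = clf.predict(X_split)
        scores[f"accuracy_{split}"] = accuracy_score(y_split, y_pred)
        scores[f"f1_macro_{split}"] = f1_score(y_split, y_pred, average="macro")
    return scores


X_digits, y_digits = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X_digits, y_digits, test_size=0.2, random_state=42)
clf = make_pipeline(StandardScaler(), MLPClassifier(hidden_layer_sizes=(3,), activation="tanh",
                                                     solver="sgd", max_iter=50, random_state=42))
with warnings.catch_warnings():
    # silence convergence / undefined-metric warnings for this short demo run
    warnings.simplefilter("ignore")
    results = evaluate(clf, X_tr, y_tr, X_te, y_te)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```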


#### 3. Parameter Set: 3 hidden neurons, alpha = 0.01, constant learning rate

In [31]:
%%time
clf_MLP_3 = make_pipeline(StandardScaler(), MLPClassifier(hidden_layer_sizes = 3, activation = 'tanh', 
                                                        solver = 'sgd', alpha = 0.01, learning_rate = 'constant', 
                                                        max_iter=250, random_state=42))
clf_MLP_3.fit(X_train, y_train)
y_pred_train = clf_MLP_3.predict(X_train)
y_pred_test = clf_MLP_3.predict(X_test)

print('accuracy score with train data: ', accuracy_score(y_train, y_pred_train))
print('accuracy score with test data : ', accuracy_score(y_test, y_pred_test))
print('f1 score with train data: ', f1_score(y_train, y_pred_train, average='macro'))
print('f1 score with test data : ', f1_score(y_test, y_pred_test, average='macro'))



accuracy score with train data:  0.770625
accuracy score with test data :  0.7568571428571429
f1 score with train data:  0.7609884104295912
f1 score with test data :  0.7467334214103711
Wall time: 5min 37s
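
`alpha` sets the strength of an L2 penalty on the weights. As we read the scikit-learn documentation and source, the penalty is divided by the number of samples $n$ in the (mini-)batch and added to the averaged cross-entropy, and the bias terms are not penalized. In the sketch below, $W_l$ are the weight matrices (`coefs_`), $K$ is the number of classes, $y_{ik}$ is 1 if sample $i$ belongs to class $k$ and 0 otherwise, and $\hat{p}_{ik}$ is the predicted probability of class $k$ for sample $i$:

```latex
\mathcal{L}(W, b) = -\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K} y_{ik}\,\log \hat{p}_{ik}
                    \;+\; \frac{\alpha}{2n}\sum_{l} \lVert W_l \rVert_F^2
```

Increasing alpha pushes the weights toward zero. With the small values tried here, the penalty term is scaled down by $\alpha / (2n)$ relative to the squared weight norm.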


#### 4. Parameter Set: 3 hidden neurons, alpha = 0.00001, 'invscaling' learning rate

In [32]:
%%time
clf_MLP_4 = make_pipeline(StandardScaler(), MLPClassifier(hidden_layer_sizes = 3, activation = 'tanh', 
                                                        solver = 'sgd', alpha = 0.00001, learning_rate = 'invscaling', 
                                                        max_iter=250, random_state=42))
clf_MLP_4.fit(X_train, y_train)
y_pred_train = clf_MLP_4.predict(X_train)
y_pred_test = clf_MLP_4.predict(X_test)

print('accuracy score with train data: ', accuracy_score(y_train, y_pred_train))
print('accuracy score with test data : ', accuracy_score(y_test, y_pred_test))
print('f1 score with train data: ', f1_score(y_train, y_pred_train, average='macro'))
print('f1 score with test data : ', f1_score(y_test, y_pred_test, average='macro'))

accuracy score with train data:  0.4200357142857143
accuracy score with test data :  0.4184285714285714
f1 score with train data:  0.3021556731325291
f1 score with test data :  0.3019010094808761
Wall time: 1min 55s
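
The scikit-learn documentation gives the `'invscaling'` schedule as `effective_learning_rate = learning_rate_init / pow(t, power_t)`, with defaults `learning_rate_init=0.001` and `power_t=0.5`. Our reading of the source is that `t` counts the training samples seen so far (the `t_` attribute) and is updated once per epoch. If that is right, the rate drops to roughly 4e-6 after a single pass over the 56,000 training images, which would plausibly explain the weak result and the short run time of parameter set 4. The small sketch below implements the formula; `invscaling_lr` is a hypothetical helper.

```python
import math


def invscaling_lr(t, learning_rate_init=0.001, power_t=0.5):
    """Effective learning rate of the documented 'invscaling' schedule at time step t >= 1."""
    return learning_rate_init / math.pow(t, power_t)


for t in (1, 100, 56_000):
    print(f"t = {t:>6}: learning rate = {invscaling_lr(t):.2e}")
# t =      1: learning rate = 1.00e-03
# t =    100: learning rate = 1.00e-04
# t =  56000: learning rate = 4.23e-06
```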


#### 5. Parameter Set: 3 hidden neurons, alpha = 0.000001, 'adaptive' learning rate

In [33]:
%%time
clf_MLP_5 = make_pipeline(StandardScaler(), MLPClassifier(hidden_layer_sizes = 3, activation = 'tanh', 
                                                        solver = 'sgd', alpha = 0.000001, learning_rate = 'adaptive', 
                                                        max_iter=250, random_state=42))
clf_MLP_5.fit(X_train, y_train)
y_pred_train = clf_MLP_5.predict(X_train)
y_pred_test = clf_MLP_5.predict(X_test)

print('accuracy score with train data: ', accuracy_score(y_train, y_pred_train))
print('accuracy score with test data : ', accuracy_score(y_test, y_pred_test))
print('f1 score with train data: ', f1_score(y_train, y_pred_train, average='macro'))
print('f1 score with test data : ', f1_score(y_test, y_pred_test, average='macro'))



accuracy score with train data:  0.7706428571428572
accuracy score with test data :  0.7572142857142857
f1 score with train data:  0.7612512999890869
f1 score with test data :  0.7473287422422983
Wall time: 5min 36s
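
According to the scikit-learn documentation, `'adaptive'` keeps the learning rate at `learning_rate_init` as long as the training loss keeps decreasing, and divides it by 5 when the loss stops improving by at least `tol`. The docs phrase the trigger as two consecutive epochs. In the source we have read, the patience is controlled by `n_iter_no_change` (default 10), and training stops once the rate is already at or below 1e-6. Treat the replay below as a sketch under those assumptions: `adaptive_schedule` is a hypothetical helper and the loss values are made up. While the loss keeps falling, the rate stays constant. That is consistent with set 5 ending up close to the constant-rate set 3.

```python
def adaptive_schedule(losses, learning_rate_init=0.001, tol=1e-4, n_iter_no_change=10, min_lr=1e-6):
    """Replay the 'adaptive' rule on per-epoch training losses (hypothetical helper).

    Returns the learning rate in effect after each epoch; stops early when the
    rate would have to be reduced although it is already <= min_lr.
    """
    lr = learning_rate_init
    best_loss = float("inf")
    no_improvement = 0
    rates = []
    for loss in losses:
        # count epochs that do not beat the best loss so far by at least tol
        if loss > best_loss - tol:
            no_improvement += 1
        else:
            no_improvement = 0
        best_loss = min(best_loss, loss)
        if no_improvement > n_iter_no_change:
            if lr <= min_lr:
                rates.append(lr)
                break  # learning rate too small: training would stop here
            lr /= 5.0
            no_improvement = 0
        rates.append(lr)
    return rates


made_up_losses = [1.0, 0.5, 0.4, 0.4, 0.4, 0.4, 0.3]
print(adaptive_schedule(made_up_losses, learning_rate_init=0.01, n_iter_no_change=2))
```

With `n_iter_no_change=2`, the rate falls from 0.01 to 0.002 at the third epoch in a row without improvement and then stays there while the loss decreases again.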
