
# Block 6 Exercise 1: Non-Linear Classification

## MNIST Data
We return to the MNIST data set on handwritten digits to compare non-linear classification algorithms ...   

In [1]:
# imports
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import fetch_openml

In [2]:
# Load data from https://www.openml.org/d/554
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)


In [60]:
np.shape(X)

(70000, 784)

In [4]:
# fetch_openml returned a pandas DataFrame; convert it to a NumPy array
X = X.to_numpy()


### E1.1: Cross-Validation and Support Vector Machines
Train and optimize a C-SVM classifier on MNIST (https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)
* use an RBF kernel
* use *random search* with cross-validation to find the best settings for *gamma* and *C* (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html#sklearn.model_selection.RandomizedSearchCV)
* use max_iter in the SVM to avoid long training times 
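The exercise asks for a *random search* over *gamma* and *C*. Because useful values of both span several orders of magnitude, one common approach is to sample them from continuous log-uniform distributions instead of a short list. This is a minimal sketch, assuming scikit-learn's bundled 8x8 load_digits data as a fast stand-in for MNIST; the search ranges, n_iter=10, cv=3 and max_iter=1000 are illustrative assumptions, not tuned values. Putting the scaler inside the pipeline means it is refitted on each cross-validation training fold.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small bundled digits data set (8x8 images) as a fast stand-in for MNIST (assumption)
X_small, y_small = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_small, y_small, test_size=0.25, random_state=0, stratify=y_small)

# Scaler + RBF SVM; max_iter caps the solver iterations to keep each fit short
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf", max_iter=1000))

# Sample C and gamma on a log scale (ranges are illustrative assumptions)
param_distributions = {
    "svc__C": loguniform(1e-1, 1e2),
    "svc__gamma": loguniform(1e-4, 1e-1),
}

search = RandomizedSearchCV(pipe, param_distributions, n_iter=10, cv=3, random_state=0)
search.fit(X_tr, y_tr)

print(search.best_params_)
print("test accuracy:", search.score(X_te, y_te))
```

The same pattern applies to the full MNIST matrix; only the data loading and the time budget change.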

In [5]:
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

In [6]:
%%time
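# RBF kernel (SVC's default); max_iter=10 caps the solver iterations to keep training short
clf = SVC(kernel='rbf', gamma='auto', max_iter=10)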

CPU times: user 28 µs, sys: 2 µs, total: 30 µs
Wall time: 34.3 µs


In [7]:
%%time
clf.fit(X,y)

CPU times: user 14.6 s, sys: 127 ms, total: 14.8 s
Wall time: 14.7 s




SVC(gamma='auto', max_iter=10)

In [8]:
from sklearn.metrics import accuracy_score

In [9]:
# accuracy measured on the training data itself (no held-out test set)
accuracy_score(y, clf.predict(X))

0.11587142857142857

In [10]:
from sklearn.model_selection import RandomizedSearchCV

In [11]:
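# 4 values of C x 2 gamma heuristics = 8 combinations; with the default n_iter=10,
# RandomizedSearchCV evaluates all 8 (and warns), so this is effectively a grid search
distributions = {
    'C': [0.5, 1, 5, 10],
    'gamma': ['auto', 'scale'],
}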

In [12]:
clf2 = RandomizedSearchCV(estimator=clf, param_distributions=distributions, random_state=0)  # 5-fold CV by default


In [13]:
search = clf2.fit(X, y)
print(search.best_params_)



{'gamma': 'scale', 'C': 1}




In [14]:
# optimized parameters
clf1 = SVC(C=1, gamma='scale', max_iter=10)

In [15]:
%%time
clf1.fit(X,y)

CPU times: user 14.4 s, sys: 56.9 ms, total: 14.4 s
Wall time: 15.3 s




SVC(C=1, max_iter=10)

In [16]:
# again measured on the training data itself
accuracy_score(y, clf1.predict(X))

0.6908714285714286
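The only parameter that differs between the run scoring 0.116 (gamma='auto') and the run scoring 0.691 (gamma='scale') is *gamma*. According to the SVC documentation, 'auto' uses 1 / n_features, while 'scale' uses 1 / (n_features * X.var()); on raw pixel values between 0 and 255 these differ by several orders of magnitude. The sketch below uses random integers as a hypothetical stand-in for raw MNIST images and compares the resulting RBF kernel values. It does not reproduce the notebook's runs, and the very small max_iter and the evaluation on training data also shape those numbers.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
# Hypothetical stand-in for five raw MNIST images: 784 pixel values in 0..255
X_fake = rng.integers(0, 256, size=(5, 784)).astype(float)

n_features = X_fake.shape[1]
gamma_auto = 1.0 / n_features                    # what gamma='auto' computes
gamma_scale = 1.0 / (n_features * X_fake.var())  # what gamma='scale' computes

K_auto = rbf_kernel(X_fake, gamma=gamma_auto)    # exp(-gamma * ||x - x'||^2)
K_scale = rbf_kernel(X_fake, gamma=gamma_scale)

# Compare kernel values between *different* images (off-diagonal entries)
off_diag = ~np.eye(len(X_fake), dtype=bool)
print(f"auto:  gamma = {gamma_auto:.2e}, mean off-diagonal kernel = {K_auto[off_diag].mean():.3f}")
print(f"scale: gamma = {gamma_scale:.2e}, mean off-diagonal kernel = {K_scale[off_diag].mean():.3f}")
```

With 'auto' every pair of distinct images gets a kernel value of essentially zero, so the kernel matrix is close to the identity and carries almost no similarity information; with 'scale' the values stay in a usable range.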

### E1.2: Pipelines and simple Neural Networks
Split the MNIST data into train and test sets, then train and evaluate a simple Multi-Layer Perceptron (MLP) network. Since the non-linear activation functions of MLPs are sensitive to the scaling of the input (recall the *sigmoid* function, which saturates for large inputs), we need to scale the input values, e.g. to [0,1] with a MinMaxScaler or to zero mean and unit variance with a StandardScaler.
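To see why scaling matters for a saturating activation, the sketch below compares the average derivative of *tanh* (the activation used in this exercise) for a hidden layer fed with raw pixel values in 0..255 versus the same values divided by 255. The random "image" and the weights are hypothetical stand-ins; the uniform weight bound only mimics the order of magnitude of Glorot-style initialization, and biases are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for one MNIST image: 784 pixel values in 0..255
x_raw = rng.integers(0, 256, size=784).astype(float)
x_scaled = x_raw / 255.0  # min-max scaling to [0, 1]

# Hypothetical weights for 100 hidden units, uniform in [-b, b] with a Glorot-style bound
n_in, n_hidden = 784, 100
b = np.sqrt(6.0 / (n_in + n_hidden))
W = rng.uniform(-b, b, size=(n_in, n_hidden))


def mean_tanh_grad(x):
    """Average derivative 1 - tanh(z)^2 over all hidden units for input x."""
    z = x @ W  # pre-activations of the hidden layer (bias omitted)
    return float(np.mean(1.0 - np.tanh(z) ** 2))


grad_raw = mean_tanh_grad(x_raw)
grad_scaled = mean_tanh_grad(x_scaled)
print(f"raw pixels:    mean tanh'(z) = {grad_raw:.4f}")
print(f"scaled pixels: mean tanh'(z) = {grad_scaled:.4f}")
```

With raw pixels almost every unit sits in the flat part of tanh, so its gradient is close to zero and gradient-based training barely moves the weights.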

* combine all steps of your training in a SKL pipeline (https://scikit-learn.org/stable/modules/compose.html#pipeline)
* use an SKL scaler to scale the data (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)
* MLP Parameters: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
    * use a *SGD* solver
    * use *tanh* as activation function
    * compare networks with 1, 2 and 3 layers, use different numbers of neurons per layer
    * adjust training parameters *alpha* (regularization) and *learning rate* - how sensitive is the model to these parameters?
    * Hint: do not change all parameters at the same time, split into several experiments
* How hard is it to find the best parameters? How many experiments would you need to find the best parameters?
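The steps above can be sketched as one pipeline per network configuration, varying only the layer structure in this experiment. This is a minimal sketch, assuming scikit-learn's bundled load_digits data as a fast stand-in for MNIST and a MinMaxScaler to map the inputs to [0,1]; the layer widths, learning_rate_init=0.01 and max_iter=300 are illustrative choices, not recommendations.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Small bundled digits data set as a stand-in for MNIST (assumption)
X_small, y_small = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_small, y_small, test_size=0.25, random_state=0, stratify=y_small)

results = {}
for layers in [(64,), (64, 64), (64, 64, 64)]:  # 1, 2 and 3 hidden layers
    pipe = make_pipeline(
        MinMaxScaler(),  # maps every feature to [0, 1]
        MLPClassifier(hidden_layer_sizes=layers, solver="sgd", activation="tanh",
                      learning_rate_init=0.01, max_iter=300, random_state=1),
    )
    with warnings.catch_warnings():
        # hide "not converged" warnings if max_iter is reached
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        pipe.fit(X_tr, y_tr)
    results[layers] = accuracy_score(y_te, pipe.predict(X_te))
    print(layers, round(results[layers], 3))
```

Keeping alpha and the learning rate fixed while only the layers change follows the hint to split the tuning into separate experiments.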
    


In [17]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)  # split: 85% train, 15% test

In [18]:
from sklearn.neural_network import MLPClassifier

In [49]:
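# standardize the inputs, then one hidden layer of 100 tanh units trained with SGD;
# max_iter=5 limits training to 5 epochs, so scikit-learn may warn that it has not converged
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(100,), solver='sgd', activation='tanh',
                                  random_state=1, max_iter=5))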

In [50]:
mlp.fit(X_train, y_train)



Pipeline(steps=[('standardscaler', StandardScaler()),
                ('mlpclassifier',
                 MLPClassifier(activation='tanh', max_iter=5, random_state=1,
                               solver='sgd'))])

In [51]:
pred = mlp.predict(X_test)

In [52]:
accuracy_score(y_test, pred)

0.9024761904761904

In [53]:
# parameter optimization: number of hidden layers and neurons per layer

distributions1 = {
     'mlpclassifier__hidden_layer_sizes': [(100,),(250,),(500,),                          # 1 Layer
                                           (100,100),(250,250),(500,500),                 # 2 Layer
                                           (100,100,100),(250,250,250),(500,500,500)],    # 3 Layer
     }

In [54]:
mlp.get_params().keys()

dict_keys(['memory', 'steps', 'verbose', 'standardscaler', 'mlpclassifier', 'standardscaler__copy', 'standardscaler__with_mean', 'standardscaler__with_std', 'mlpclassifier__activation', 'mlpclassifier__alpha', 'mlpclassifier__batch_size', 'mlpclassifier__beta_1', 'mlpclassifier__beta_2', 'mlpclassifier__early_stopping', 'mlpclassifier__epsilon', 'mlpclassifier__hidden_layer_sizes', 'mlpclassifier__learning_rate', 'mlpclassifier__learning_rate_init', 'mlpclassifier__max_fun', 'mlpclassifier__max_iter', 'mlpclassifier__momentum', 'mlpclassifier__n_iter_no_change', 'mlpclassifier__nesterovs_momentum', 'mlpclassifier__power_t', 'mlpclassifier__random_state', 'mlpclassifier__shuffle', 'mlpclassifier__solver', 'mlpclassifier__tol', 'mlpclassifier__validation_fraction', 'mlpclassifier__verbose', 'mlpclassifier__warm_start'])
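These keys follow scikit-learn's "step name, double underscore, parameter" convention: make_pipeline names each step after its class in lower case, and the double underscore addresses a parameter inside that step. This is why the search dictionaries use keys such as mlpclassifier__hidden_layer_sizes. A minimal sketch (the parameter values are arbitrary):

```python
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

pipe = make_pipeline(StandardScaler(), MLPClassifier(solver="sgd", activation="tanh"))

# make_pipeline derives each step name from the lower-cased class name
step_names = [name for name, _ in pipe.steps]
print(step_names)  # ['standardscaler', 'mlpclassifier']

# "<step>__<parameter>" addresses a parameter inside a step,
# which is the form RandomizedSearchCV expects in param_distributions
pipe.set_params(mlpclassifier__alpha=5e-5, mlpclassifier__hidden_layer_sizes=(250, 250))
print(pipe.named_steps["mlpclassifier"].alpha)  # 5e-05
```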

In [55]:
mlp1 = RandomizedSearchCV(estimator=mlp, param_distributions=distributions1, random_state=0)

In [56]:
%%time
search = mlp1.fit(X_train, y_train)



CPU times: user 36min 17s, sys: 13min, total: 49min 17s
Wall time: 25min 39s




In [58]:
print(search.best_params_)

{'mlpclassifier__hidden_layer_sizes': (500, 500, 500)}


In [59]:
# refit the pipeline with the best layer configuration found above
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(500, 500, 500), solver='sgd', activation='tanh',
                                  random_state=1, max_iter=5))
mlp.fit(X_train, y_train)
pred = mlp.predict(X_test)
accuracy_score(y_test, pred)



0.9185714285714286

In [41]:
# parameter optimization: learning-rate schedule and L2 regularization strength (alpha)

distributions2 = {
     'mlpclassifier__learning_rate': ['constant', 'invscaling', 'adaptive'], 
     'mlpclassifier__alpha':[0.00005,0.0001,0.0005],
     }

In [42]:
mlp1 = RandomizedSearchCV(estimator=mlp, param_distributions=distributions2, random_state=0)

In [43]:
%%time
search = mlp1.fit(X_train, y_train)



CPU times: user 1h 51min 32s, sys: 36min 10s, total: 2h 27min 43s
Wall time: 1h 16min 1s




In [44]:
print(search.best_params_)

{'mlpclassifier__learning_rate': 'constant', 'mlpclassifier__alpha': 5e-05}


In [61]:
# refit with the best learning-rate schedule and alpha found above
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(500, 500, 500), solver='sgd', activation='tanh',
                                  alpha=0.00005, learning_rate='constant', random_state=1, max_iter=5))
mlp.fit(X_train, y_train)
pred = mlp.predict(X_test)
accuracy_score(y_test, pred)



0.9185714285714286
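The question of how many experiments are needed can be made concrete with the two search spaces used in this exercise. Each space has only 9 candidates, so the default n_iter=10 of RandomizedSearchCV evaluated all of them; a joint search over both spaces would have 81 candidates, of which n_iter=10 would sample only 10. The sketch below counts the candidates with ParameterGrid and assumes the default 5-fold cross-validation (the final refit is not counted).

```python
from sklearn.model_selection import ParameterGrid

# The two search spaces used in this exercise (copied from distributions1 and distributions2)
layer_space = {
    "mlpclassifier__hidden_layer_sizes": [
        (100,), (250,), (500,),
        (100, 100), (250, 250), (500, 500),
        (100, 100, 100), (250, 250, 250), (500, 500, 500),
    ],
}
training_space = {
    "mlpclassifier__learning_rate": ["constant", "invscaling", "adaptive"],
    "mlpclassifier__alpha": [0.00005, 0.0001, 0.0005],
}
n_folds = 5  # assumption: RandomizedSearchCV's default 5-fold cross-validation

# Two separate searches, one after the other
n_separate = len(ParameterGrid(layer_space)) + len(ParameterGrid(training_space))
# One joint search over every combination
n_joint = len(ParameterGrid({**layer_space, **training_space}))

print(f"separate: {n_separate} candidates, {n_separate * n_folds} fits")  # 18 candidates, 90 fits
print(f"joint:    {n_joint} candidates, {n_joint * n_folds} fits")        # 81 candidates, 405 fits
```

Splitting the search is much cheaper, but it only finds the joint optimum if the layer sizes and the training parameters interact weakly; every additional parameter multiplies the size of a joint grid.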