# Block 6 Exercise 1: Non-Linear Classification

## MNIST Data
We return to the MNIST data set of handwritten digits to compare non-linear classification algorithms ...

In [1]:
#imports 
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import fetch_openml

In [2]:
# Load data from https://www.openml.org/d/554
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)


In [3]:
# The full MNIST data set contains 70k samples of the digits 0-9 as 28x28 grayscale images (represented as 784-dimensional vectors)
np.shape(X)

(70000, 784)
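
Each row of `X` is a flattened image, so a single digit can be recovered by reshaping its 784-dimensional vector back to 28x28. A minimal sketch, using a synthetic vector as a stand-in for one MNIST row (the names `flat`, `image` and `roundtrip` are illustrative and not part of the notebook):

```python
import numpy as np

# Stand-in for one row of X: 784 gray values in [0, 255] (synthetic, not real MNIST data)
rng = np.random.default_rng(0)
flat = rng.integers(0, 256, size=784).astype(np.float64)

# Row-major reshape: the first 28 entries become the first image row
image = flat.reshape(28, 28)

# Flattening again recovers the original vector exactly
roundtrip = image.reshape(-1)
print(image.shape, np.array_equal(roundtrip, flat))
```

With real data, `plt.imshow(image, cmap='gray')` would display the digit.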

In [4]:
X.min()

0.0

In [5]:
#look at max/min value in the data
X.max()

255.0

### E1.1: Cross-Validation and Support Vector Machines
Train and optimize a C-SVM classifier on MNIST (https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC)
* use an RBF kernel
* use *random search* with cross-validation to find the best settings for *gamma* and *C* (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html#sklearn.model_selection.RandomizedSearchCV)

In [None]:
# Caution: this search takes very long on the full data set.
# Hyperparameter optimization with random search (parameter settings are sampled at random)
# plus cross-validation (training and validation without a separate test set).
# RandomizedSearchCV parameters:
# estimator: estimator object; a fresh copy is fitted for each sampled parameter setting.
# param_distributions: dict or list of dicts with parameter names (str) as keys and distributions or lists of parameters to try.
from scipy.stats import loguniform, uniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
# SVC parameters:
# C: float, default=1.0. Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty.
# gamma: kernel coefficient for 'rbf', 'poly' and 'sigmoid'.
# kernel: kernel type, one of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed' or a callable. If none is given, 'rbf' is used.
SuppVM = SVC(kernel='rbf')
# uniform(loc, scale) samples from [loc, loc + scale]; loc > 0 keeps C strictly positive.
# loguniform(a, b) samples log-uniformly from [a, b]. A tuple such as (1e-6, 1e+1, 'log-uniform')
# would be treated by RandomizedSearchCV as a list of three candidate values, including the string.
distributions = dict(C=uniform(loc=0.01, scale=4), gamma=loguniform(1e-6, 1e+1))
SuppVMrndCV = RandomizedSearchCV(SuppVM, distributions, random_state=0, n_jobs=4)
Suche = SuppVMrndCV.fit(X / 255.0, y)  # pixels scaled to [0, 1] so that the gamma range is meaningful
Suche.best_params_
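
Because the search above is slow on all 70k samples, it can help to try the same procedure on a smaller problem first. The following is a minimal sketch, assuming scikit-learn's bundled `load_digits` data (8x8 images) as a stand-in for MNIST; the parameter ranges, `n_iter=10` and `cv=3` are illustrative choices, not tuned values.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Small bundled digits data set (8x8 images) as a fast stand-in for MNIST
X_small, y_small = load_digits(return_X_y=True)
X_small = X_small / 16.0  # pixel values range from 0 to 16; scale to [0, 1]

# Log-uniform distributions cover several orders of magnitude evenly
param_distributions = {
    "C": loguniform(1e-2, 1e2),
    "gamma": loguniform(1e-4, 1e0),
}

search = RandomizedSearchCV(
    SVC(kernel="rbf"),
    param_distributions,
    n_iter=10,       # number of sampled parameter settings
    cv=3,            # 3-fold cross-validation per setting
    random_state=0,
)
search.fit(X_small, y_small)

print(search.best_params_)
print("best CV accuracy: %.3f" % search.best_score_)
```

On MNIST itself, a comparable shortcut would be to fit the search on a random subsample of the training data and retrain the best setting on the full set afterwards.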

### E1.2: Pipelines and simple Neural Networks
Split the MNIST data into train and test sets, then train and evaluate a simple Multi-Layer Perceptron (MLP) network. Since the non-linear activation functions of MLPs are sensitive to the scaling of the input (recall the *sigmoid* function), we need to scale all input values to [0,1].
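
To see why the scaling matters, compare the derivative of the sigmoid at a raw pixel value with the derivative at the same pixel after scaling. A small sketch, assuming a single weight of 1 and no bias (a simplification for illustration only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # derivative of the sigmoid: s(z) * (1 - s(z))
    s = sigmoid(z)
    return s * (1.0 - s)

raw_pixel = 255.0               # brightest value in the unscaled data
scaled_pixel = raw_pixel / 255  # the same pixel after scaling to [0, 1]

grad_raw = sigmoid_grad(raw_pixel)
grad_scaled = sigmoid_grad(scaled_pixel)

print("gradient at raw input:   ", grad_raw)
print("gradient at scaled input:", grad_scaled)
```

At the raw input the sigmoid is saturated and its gradient vanishes numerically, so gradient descent gets almost no signal; at the scaled input the gradient is of a useful size.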

* combine all steps of your training in a SKL pipeline (https://scikit-learn.org/stable/modules/compose.html#pipeline)
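* use a SKL scaler to scale the data (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html); note that `StandardScaler` standardizes each feature to zero mean and unit variance, whereas `MinMaxScaler` maps values to [0,1]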
* MLP Parameters: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
    * use an *SGD* solver
    * use *tanh* as activation function
    * compare networks with 1, 2 and 3 layers, use different numbers of neurons per layer
    * adjust training parameters *alpha* (regularization) and *learning rate* - how sensitive is the model to these parameters?
    * Hint: do not change all parameters at the same time, split into several experiments
* How hard is it to find the best parameters? How many experiments would you need to find the best parameters?
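
One way to organize the layer comparison is to loop over a few architectures and rebuild the pipeline each time. The sketch below assumes scikit-learn's bundled `load_digits` data as a fast stand-in for MNIST; the layer sizes, `max_iter=50` and the use of `MinMaxScaler` are illustrative assumptions, not the exercise's reference solution.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Small bundled digits data set as a fast stand-in for MNIST
X_small, y_small = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_small, y_small, test_size=0.25, random_state=42
)

# Hypothetical architectures with 1, 2 and 3 hidden layers
architectures = [(50,), (50, 50), (50, 50, 50)]

scores = {}
for hidden in architectures:
    pipe = make_pipeline(
        MinMaxScaler(),  # maps each feature to [0, 1], fitted on the training data only
        MLPClassifier(hidden_layer_sizes=hidden, solver="sgd", activation="tanh",
                      alpha=1e-4, learning_rate_init=0.1, max_iter=50,
                      random_state=42),
    )
    with warnings.catch_warnings():
        # Few iterations are used on purpose, so non-convergence is expected
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        pipe.fit(X_tr, y_tr)
    scores[hidden] = pipe.score(X_te, y_te)

for hidden, acc in scores.items():
    print(hidden, "test accuracy: %.3f" % acc)
```

As a rough count for the last question: a full grid over, say, 3 depths, 3 widths, 4 values of *alpha* and 4 learning rates already needs 3 * 3 * 4 * 4 = 144 training runs, which is why varying one parameter at a time or sampling randomly is attractive.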
    


In [6]:
# Randomly split into train and test data
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=60000, random_state=42)
# L2-normalize each sample (row) to unit length; this is computed per row, so nothing is learned from the test set
X_test_normalized = preprocessing.normalize(X_test, norm='l2', axis=1)
X_train_normalized = preprocessing.normalize(X_train, norm='l2', axis=1)



In [7]:
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPClassifier
# MLP parameters:
# solver: the solver for weight optimization; 'sgd' refers to stochastic gradient descent.
# activation: activation function for the hidden layer; 'tanh' is the hyperbolic tangent, f(x) = tanh(x). It is used here instead of a sigmoid as the non-linearity after the first linear layer.
# alpha: L2 penalty (regularization term) parameter.
# learning_rate: learning rate schedule for weight updates, one of {'constant', 'invscaling', 'adaptive'}, default='constant'.
# learning_rate_init: initial learning rate (step size) used for the weight updates.
# hidden_layer_sizes: tuple whose i-th element is the number of neurons in the i-th hidden layer; (50,) means one hidden layer with 50 neurons.
PipeMLP = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(50,), solver='sgd', activation='tanh', random_state=42,
                  max_iter=10, alpha=1e-4, learning_rate_init=.1),
)
PipeMLP.fit(X_train_normalized, y_train)
print("Training set score: %f" % PipeMLP.score(X_train_normalized, y_train))
print("Test set score: %f" % PipeMLP.score(X_test_normalized, y_test))




Training set score: 0.991767
Test set score: 0.953800
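
The gap between the training score and the test score above hints at some overfitting, and *alpha* is one parameter that influences it. A possible way to probe the sensitivity is to vary *alpha* alone while keeping everything else fixed. This is a sketch under assumptions: it uses scikit-learn's bundled `load_digits` data as a fast stand-in for MNIST, and the three *alpha* values and `max_iter=50` are arbitrary illustrative choices.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small bundled digits data set as a fast stand-in for MNIST
X_small, y_small = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_small, y_small, test_size=0.25, random_state=42
)

# Hypothetical regularization strengths, spaced on a log scale
alphas = [1e-5, 1e-3, 1e-1]

results = {}
for alpha in alphas:
    pipe = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(50,), solver="sgd", activation="tanh",
                      alpha=alpha, learning_rate_init=0.1, max_iter=50,
                      random_state=42),
    )
    with warnings.catch_warnings():
        # Few iterations are used on purpose, so non-convergence is expected
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        pipe.fit(X_tr, y_tr)
    # Store (train accuracy, test accuracy) to compare the gap per alpha
    results[alpha] = (pipe.score(X_tr, y_tr), pipe.score(X_te, y_te))

for alpha, (train_acc, test_acc) in results.items():
    print("alpha=%g  train=%.3f  test=%.3f" % (alpha, train_acc, test_acc))
```

The same loop can be repeated with `learning_rate_init` in place of `alpha`, which keeps each experiment to a single changing parameter as the hint suggests.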
