# Homework 2 – Modeling & Evaluation

## Tasks
1. Classify the complete MNIST dataset (see exercise 2, task 2.3) using a Decision Tree Classifier. Evaluate the model with a suitable performance metric.
2. Repeat the classification with an MLP Classifier (ANN). Which model parameters are used by default?
3. Try to achieve a better result by applying a [Grid Search](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV). Do you think you found the best possible solution? Write a short statement.

Loading the data

In [3]:
# suppress warnings for better readability
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
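# Note: fetch_mldata was removed in scikit-learn 0.22. Newer versions load MNIST
# with fetch_openml('mnist_784', version=1), which returns the labels as strings.
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')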

In [4]:
print(mnist.data.shape)

(70000, 784)


Splitting the data into training and test sets

In [5]:
from sklearn.model_selection import train_test_split
# hold out 1/7 of the 70,000 images (10,000) as the test set
train_img, test_img, train_lbl, test_lbl = train_test_split(
    mnist.data, mnist.target, test_size=1/7.0, random_state=0)

### Task 1 - Classify the MNIST dataset by a DecisionTree


#### Model Training

1) Import the model

In [6]:
from sklearn.tree import DecisionTreeClassifier

2) Instantiation

In [7]:
dtc = DecisionTreeClassifier(random_state = 0)

3) Train the model

In [8]:
dtc.fit(train_img, train_lbl)

DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, presort=False, random_state=0,
            splitter='best')

4) Predict labels

In [9]:
dtc_pred = dtc.predict(test_img)

In [10]:
# show a random sample of true vs. predicted labels
tmp = pd.DataFrame({'Test': test_lbl, 'Predicted': dtc_pred})
tmp.sample(10)

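|      | Test | Predicted |
|------|------|-----------|
| 8580 | 5.0  | 5.0       |
| 153  | 9.0  | 9.0       |
| 2731 | 0.0  | 0.0       |
| 994  | 7.0  | 7.0       |
| 118  | 0.0  | 0.0       |
| 7210 | 7.0  | 3.0       |
| 1181 | 7.0  | 7.0       |
| 8540 | 6.0  | 0.0       |
| 7534 | 7.0  | 7.0       |
| 5727 | 5.0  | 5.0       |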


#### Model Performance

In [11]:
from sklearn import metrics
print('Accuracy of the DTC: ', dtc.score(test_img, test_lbl))
print("")
print('Confusion Matrix of the DTC: ')
print(metrics.confusion_matrix(test_lbl, dtc_pred))
print("")
print('F1-Measure of the DTC: ', metrics.f1_score(test_lbl, dtc_pred, average='weighted'))

Accuracy of the DTC:  0.8798

Confusion Matrix of the DTC: 
[[ 867    2   11    7    6    9   14    5    7    8]
 [   0 1116    9    6    6    4    5    9    7    1]
 [   9    8  845   23   11    9   16   24   29    8]
 [   5    5   31  873    6   37    6   13   42   20]
 [   2    5   12    7  835   11   15    8   22   31]
 [   9    7    7   33   14  766   22    8   34   21]
 [  18    3   15    5   11   14  933    1   10    3]
 [   1    7   22   11    9    4    1  936    9   29]
 [  11   16   23   35   20   32   17   11  787   26]
 [   7    4   10   12   51   30    1   12   25  840]]

F1-Measure of the DTC:  0.8796888750270813
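
As a cross-check, both scores can be recomputed by hand from the confusion matrix printed above. The following is a minimal sketch in plain Python. The helper name `scores_from_confusion` is illustrative and not part of scikit-learn, and the code assumes the `metrics.confusion_matrix` convention that rows are true labels and columns are predicted labels.

```python
# Confusion matrix of the decision tree, copied from the output above.
# Rows: true digit, columns: predicted digit.
cm = [
    [867, 2, 11, 7, 6, 9, 14, 5, 7, 8],
    [0, 1116, 9, 6, 6, 4, 5, 9, 7, 1],
    [9, 8, 845, 23, 11, 9, 16, 24, 29, 8],
    [5, 5, 31, 873, 6, 37, 6, 13, 42, 20],
    [2, 5, 12, 7, 835, 11, 15, 8, 22, 31],
    [9, 7, 7, 33, 14, 766, 22, 8, 34, 21],
    [18, 3, 15, 5, 11, 14, 933, 1, 10, 3],
    [1, 7, 22, 11, 9, 4, 1, 936, 9, 29],
    [11, 16, 23, 35, 20, 32, 17, 11, 787, 26],
    [7, 4, 10, 12, 51, 30, 1, 12, 25, 840],
]


def scores_from_confusion(cm):
    """Return (accuracy, weighted F1) for a square confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    weighted_f1 = 0.0
    for i in range(n):
        support = sum(cm[i])                          # true samples of class i
        predicted = sum(cm[r][i] for r in range(n))   # samples predicted as class i
        recall = cm[i][i] / support if support else 0.0
        precision = cm[i][i] / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # 'weighted' averaging: each class counts in proportion to its support
        weighted_f1 += f1 * support / total
    return correct / total, weighted_f1


accuracy, weighted_f1 = scores_from_confusion(cm)
print(f"accuracy    = {accuracy:.4f}")     # 0.8798
print(f"weighted F1 = {weighted_f1:.4f}")  # 0.8797
```

Accuracy is the share of the diagonal in all 10,000 test samples. The weighted F1 score averages the per-class F1 values in proportion to how often each digit occurs in the test set.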


### Task 2 - Classification by ANN

#### Model Training

1) Import the model

In [12]:
from sklearn.neural_network import MLPClassifier

2) Instantiation

In [13]:
mlp = MLPClassifier(random_state = 0)

3) Train the model

In [14]:
mlp.fit(train_img,train_lbl)

MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=0, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False)

4) Predict labels

In [15]:
mlp_pred = mlp.predict(test_img)

In [16]:
# show a random sample of true vs. predicted labels
tmp = pd.DataFrame({'Test': test_lbl, 'Predicted': mlp_pred})
tmp.sample(10)

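|      | Test | Predicted |
|------|------|-----------|
| 9727 | 0.0  | 0.0       |
| 1640 | 8.0  | 8.0       |
| 5775 | 8.0  | 8.0       |
| 7283 | 8.0  | 8.0       |
| 7870 | 4.0  | 4.0       |
| 9782 | 5.0  | 5.0       |
| 8668 | 6.0  | 6.0       |
| 4031 | 5.0  | 5.0       |
| 2615 | 9.0  | 9.0       |
| 3673 | 3.0  | 3.0       |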


#### Model Performance

In [17]:
from sklearn import metrics
print('Accuracy of the MLP: ', mlp.score(test_img, test_lbl))
print("")
print('Confusion Matrix of the MLP: ')
print(metrics.confusion_matrix(test_lbl, mlp_pred))
print("")
print('F1-Measure of the MLP: ', metrics.f1_score(test_lbl, mlp_pred, average='weighted'))

Accuracy of the MLP:  0.9684

Confusion Matrix of the MLP: 
[[ 921    0    3    0    0    0    3    2    5    2]
 [   0 1146    6    1    0    0    1    4    4    1]
 [   1    3  963    2    3    2    1    4    3    0]
 [   1    2   14 1004    0    4    0    2    6    5]
 [   0    2    0    0  918    1    8    2    1   16]
 [   0    1    5   14    2  869   14    0    7    9]
 [   3    0    3    1    0    5 1000    0    1    0]
 [   2    2   10    3    6    1    0  994    1   10]
 [   4    9    9    5    3    7    7    0  921   13]
 [   2    3    1    8   16    2    0    9    3  948]]

F1-Measure of the MLP:  0.9683754245368628
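
The confusion matrix also shows which digits the MLP mixes up most often. The sketch below lists the largest off-diagonal cells of the matrix printed above. `top_confusions` is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
# Confusion matrix of the default MLP, copied from the output above.
# Rows: true digit, columns: predicted digit.
cm = [
    [921, 0, 3, 0, 0, 0, 3, 2, 5, 2],
    [0, 1146, 6, 1, 0, 0, 1, 4, 4, 1],
    [1, 3, 963, 2, 3, 2, 1, 4, 3, 0],
    [1, 2, 14, 1004, 0, 4, 0, 2, 6, 5],
    [0, 2, 0, 0, 918, 1, 8, 2, 1, 16],
    [0, 1, 5, 14, 2, 869, 14, 0, 7, 9],
    [3, 0, 3, 1, 0, 5, 1000, 0, 1, 0],
    [2, 2, 10, 3, 6, 1, 0, 994, 1, 10],
    [4, 9, 9, 5, 3, 7, 7, 0, 921, 13],
    [2, 3, 1, 8, 16, 2, 0, 9, 3, 948],
]


def top_confusions(cm, k=5):
    """Return the k largest off-diagonal cells as (count, true_label, predicted_label)."""
    cells = [
        (cm[t][p], t, p)
        for t in range(len(cm))
        for p in range(len(cm))
        if t != p  # skip the diagonal, which holds the correct predictions
    ]
    # Sort by count, largest first; ties keep row-major order (the sort is stable).
    return sorted(cells, key=lambda cell: cell[0], reverse=True)[:k]


for count, true_label, predicted_label in top_confusions(cm):
    print(f"true {true_label} predicted as {predicted_label}: {count} times")
```

For this matrix the most frequent mistakes are between 4 and 9 (16 cases in each direction), followed by 3 predicted as 2, 5 predicted as 3, and 5 predicted as 6 (14 cases each).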


#### Default model parameters

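Some of the model parameters that are used by default (see the output of `mlp.fit` above):
* one hidden layer with 100 neurons (`hidden_layer_sizes=(100,)`)
* activation function (here: *relu*)
* solver for weight optimization (here: *adam*)
* L2 penalty parameter (`alpha`, here: 0.0001)
* learning rate schedule (`learning_rate`, here: constant)
* initial learning rate (`learning_rate_init`, here: 0.001)
* maximum number of iterations (`max_iter`, here: 200)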
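
The defaults can also be read programmatically instead of from the printed representation. This is a small sketch using `get_params()`, which every scikit-learn estimator provides. The selection of parameter names is my own choice, and the values may differ between scikit-learn versions.

```python
from sklearn.neural_network import MLPClassifier

# An unfitted estimator already carries all default hyperparameters.
defaults = MLPClassifier().get_params()

# Print a hand-picked subset of the defaults (the full dict has more entries).
for name in ("hidden_layer_sizes", "activation", "solver", "alpha",
             "learning_rate", "learning_rate_init", "max_iter"):
    print(f"{name:20s} {defaults[name]!r}")
```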


### Task 3 - Grid Search

In [18]:
from sklearn.model_selection import GridSearchCV
# Note: 0.0001 appears twice in the alpha list, so that setting is fitted twice
parameters = {'alpha': [0.01,0.001,0.0001,0.0005,0.0001],'hidden_layer_sizes':[50,100,150]}
grid = GridSearchCV(MLPClassifier(),parameters, n_jobs = -1)
grid.fit(train_img,train_lbl)

GridSearchCV(cv='warn', error_score='raise-deprecating',
       estimator=MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False),
       fit_params=None, iid='warn', n_jobs=-1,
       param_grid={'alpha': [0.01, 0.001, 0.0001, 0.0005, 0.0001], 'hidden_layer_sizes': [50, 100, 150]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)
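
To get a feeling for the cost of this search, the number of model fits can be counted in advance. The sketch below enumerates the grid with `itertools.product`. The value `n_folds = 3` is an assumption: it is the default number of cross-validation folds in the scikit-learn version used here (`cv='warn'`), while newer versions default to 5.

```python
from itertools import product

# Same grid as in the cell above.
parameters = {
    'alpha': [0.01, 0.001, 0.0001, 0.0005, 0.0001],
    'hidden_layer_sizes': [50, 100, 150],
}

# Every combination of the listed values is one candidate.
candidates = list(product(*parameters.values()))
unique_candidates = set(candidates)

n_folds = 3  # assumption: default fold count of the scikit-learn version shown above
n_fits = len(candidates) * n_folds + 1  # +1 for the final refit (refit=True)

print("candidates:       ", len(candidates))         # 15
print("unique candidates:", len(unique_candidates))  # 12
print("model fits:       ", n_fits)                  # 46
```

Because `alpha = 0.0001` is listed twice, three of the 15 candidates are duplicates. Removing the duplicate would save 9 of the 46 fits under the assumed 3-fold setting.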

In [19]:
grid_pred = grid.predict(test_img)

In [20]:
# show a random sample of true vs. predicted labels of the grid search model
tmp = pd.DataFrame({'Test': test_lbl, 'Predicted': grid_pred})
tmp.sample(10)

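|      | Test | Predicted |
|------|------|-----------|
| 6804 | 8.0  | 8.0       |
| 5533 | 1.0  | 1.0       |
| 740  | 2.0  | 2.0       |
| 4218 | 8.0  | 8.0       |
| 3288 | 0.0  | 0.0       |
| 9084 | 9.0  | 9.0       |
| 232  | 2.0  | 2.0       |
| 4720 | 9.0  | 9.0       |
| 8096 | 7.0  | 7.0       |
| 3012 | 8.0  | 8.0       |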


In [21]:
from sklearn import metrics
print('Accuracy of the GridSearch: ', grid.score(test_img, test_lbl))
print("")
print('Confusion Matrix of the GridSearch: ')
print(metrics.confusion_matrix(test_lbl, grid_pred))
print("")
print('F1-Measure of the GridSearch: ', metrics.f1_score(test_lbl, grid_pred, average='weighted'))

Accuracy of the GridSearch:  0.9688

Confusion Matrix of the GridSearch: 
[[ 915    1    4    0    1    5    6    0    3    1]
 [   1 1141    8    1    2    0    0    2    5    3]
 [   1    2  959    6    3    0    2    3    5    1]
 [   0    2    9  998    0    5    0    5   11    8]
 [   0    3    1    0  924    0    4    1    2   13]
 [   0    1    4   18    0  858   14    1   11   14]
 [   2    1    1    0    0    0 1006    1    2    0]
 [   2    1    8    1    6    2    1  994    3   11]
 [   2    1    4    8    3    3    4    3  942    8]
 [   2    1    0    8   12    1    0   10    7  951]]

F1-Measure of the GridSearch:  0.968795501632541


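-> The grid search model reaches a slightly higher F1 score (0.9688) than the default MLPClassifier (0.9684). The difference is very small, and since the estimator passed to GridSearchCV has no fixed `random_state`, part of it may come from random weight initialization.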

#### Comment on the degree of optimality:

No, it is most likely not the optimal solution. There are several reasons, e.g.:

* The grid search was not set up to vary all parameters of the MLP Classifier, because that would take too much computation time (even the current setup ran for approx. one hour). Most parameters therefore remain at their default values.
* Even for the tuned parameters I do not know whether the optimal value lies inside the chosen intervals, outside of them, or between the tested values. This is a general weakness of grid search and can be mitigated by a random search.
* Moreover, a classifier other than the MLP could achieve better results for this classification problem in general.
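
The random search mentioned in the second point could look roughly like the sketch below. It is not part of the homework run. To stay fast it uses scikit-learn's small built-in digits dataset (8x8 images) as a stand-in for MNIST, only 600 samples, and a short `max_iter`. The parameter ranges and `n_iter=4` are illustrative assumptions.

```python
import warnings

from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

# Small stand-in dataset (8x8 digits), NOT the MNIST data used above.
X, y = load_digits(return_X_y=True)
X, y = X[:600], y[:600]

# alpha is drawn from a continuous log-uniform range instead of a fixed list,
# so values between the grid points can be tried as well.
param_distributions = {
    'alpha': loguniform(1e-5, 1e-1),
    'hidden_layer_sizes': [(50,), (100,), (150,)],
}

search = RandomizedSearchCV(
    MLPClassifier(max_iter=50, random_state=0),  # few iterations to keep the demo fast
    param_distributions,
    n_iter=4,         # number of randomly sampled candidates
    cv=3,
    random_state=0,   # makes the sampled candidates reproducible
)

with warnings.catch_warnings():
    # The short max_iter is expected to trigger convergence warnings.
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 4))
```

With a fixed budget of `n_iter` candidates, the cost no longer grows with the number of tuned parameters, so more parameters can be included than in an exhaustive grid.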

