# LARGE CNN

### This model achieves 0.8 % error

- Convolutional layer with 30 feature maps of size 5×5.
- Pooling layer taking the max over 2×2 patches.
- Convolutional layer with 15 feature maps of size 3×3.
- Pooling layer taking the max over 2×2 patches.
- Dropout layer with a probability of 20%.
- Flatten layer.
- Fully connected layer with 128 neurons and rectifier activation.
- Fully connected layer with 50 neurons and rectifier activation.
- Output layer.
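
To see how the image size shrinks through the stack listed above, here is a small sketch of the shape arithmetic. It assumes stride-1 convolutions without padding and non-overlapping pooling (the Keras defaults when no extra arguments are passed); the helper names `conv_out` and `pool_out` are made up for this illustration.

```python
def conv_out(size, kernel):
    """Output width of an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output width of non-overlapping max pooling."""
    return size // pool

size = 28                   # MNIST images are 28x28
size = conv_out(size, 5)    # 30 feature maps of 24x24
size = pool_out(size, 2)    # 30 feature maps of 12x12
size = conv_out(size, 3)    # 15 feature maps of 10x10
size = pool_out(size, 2)    # 15 feature maps of 5x5

# Flatten turns the 15 maps of 5x5 into one vector
flat_features = 15 * size * size
print(size, flat_features)  # 5 375
```

So the first fully connected layer receives 375 inputs per image.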


#### To do: try this model with the augmented dataset and ensemble learning

In [None]:
from keras.datasets import mnist

In [56]:
import numpy as np

In [44]:
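from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
from keras import backend as K
# use channels-first ('th') ordering: [samples][channels][rows][cols]
K.set_image_dim_ordering('th')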

In [45]:
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

In [46]:

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# reshape to be [samples][channels][height][width]
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')

In [47]:

# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255



# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

In [50]:

# define the larger model
def larger_model():
	# create model
	model = Sequential()
	model.add(Conv2D(30, (5, 5), input_shape=(1, 28, 28), activation='relu'))
	model.add(MaxPooling2D(pool_size=(2, 2)))
	model.add(Conv2D(15, (3, 3), activation='relu'))
	model.add(MaxPooling2D(pool_size=(2, 2)))
	model.add(Dropout(0.2))
	model.add(Flatten())
	model.add(Dense(128, activation='relu'))
	model.add(Dense(50, activation='relu'))
	model.add(Dense(num_classes, activation='softmax'))
	# Compile model
	model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
	return model

In [51]:

# build the model
model = larger_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Large CNN Error: %.2f%%" % (100-scores[1]*100))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Large CNN Error: 0.80%


##### Now we have 0.8% error rate 

Overview of different ways to do ensemble learning: https://en.wikipedia.org/wiki/Committee_machine

Ensemble averaging (machine learning)
In general, ensemble learning works better when the individual predictors disagree a lot. So the main point is to make each NN very accurate, but with different wrong predictions.
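
A toy sketch of why disagreement matters, using made-up label arrays rather than real network outputs. Three hypothetical classifiers are each wrong on a different sample, so a majority vote recovers every label; three copies of the same classifier gain nothing. The function name `majority_vote` is an assumption for this example.

```python
import numpy as np

y_true = np.array([0, 1, 2, 3, 4, 5])

# Hypothetical predictions: each model is wrong on a different sample
pred_a = np.array([9, 1, 2, 3, 4, 5])
pred_b = np.array([0, 9, 2, 3, 4, 5])
pred_c = np.array([0, 1, 9, 3, 4, 5])

def majority_vote(preds):
    """Return the most frequent label per sample across models."""
    stacked = np.stack(preds)  # shape: (n_models, n_samples)
    return np.array([np.bincount(col).argmax() for col in stacked.T])

# Models that disagree on their mistakes cover for each other
diverse_error = (majority_vote([pred_a, pred_b, pred_c]) != y_true).mean()

# Models that all make the same mistake cannot correct it
identical_error = (majority_vote([pred_a, pred_a, pred_a]) != y_true).mean()

print(diverse_error, identical_error)
```

Each individual model here has an error of 1/6; the diverse ensemble gets every sample right, while the ensemble of identical models keeps the 1/6 error.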

For ensemble learning:

1) We make multiple NNs with the same architecture but different hyperparameters, and then we build a final NN using the average of the weights of the NNs we previously built. Of course, in this case what we would like is that each NN correctly predicts what the other NNs predict wrongly, so that together they overlap and cover each other's mistakes. https://en.wikipedia.org/wiki/Ensemble_averaging_(machine_learning)

    From Wikipedia: "Ensemble averaging keeps the less satisfactory networks around, but (gives them, in the averaging process,) less weight. The theory of ensemble averaging relies on two properties of artificial neural networks:

    1) In any network, the bias can be reduced at the cost of increased variance
    2) In a group of networks, the variance can be reduced at no cost to bias

    Ensemble averaging creates a group of networks, each with low bias and high variance, then combines them to a new network with (hopefully) low bias and low variance. It is thus a resolution of the bias-variance dilemma."

    I am not sure I have fully understood the variance/bias part; maybe one of you can help out on this?


2) Or we can use different NNs with different architectures and parameters and make a prediction based on the average of their probability outputs. Each NN outputs probabilities, in this case an array of 10 probabilities (we are using one-hot encoding); we sum the outputs and take the average.
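
A minimal sketch of the second approach (averaging the probability outputs). The arrays below are invented stand-ins for what `model.predict(...)` would return from three different networks, shortened to 3 classes instead of 10 to keep the example readable.

```python
import numpy as np

# Hypothetical softmax outputs for 2 samples and 3 classes, one array per model
probs_a = np.array([[0.6, 0.3, 0.1],
                    [0.2, 0.5, 0.3]])
probs_b = np.array([[0.4, 0.5, 0.1],
                    [0.1, 0.3, 0.6]])
probs_c = np.array([[0.7, 0.2, 0.1],
                    [0.2, 0.2, 0.6]])

# Sum the outputs and divide by the number of models
avg_probs = np.mean([probs_a, probs_b, probs_c], axis=0)

# The ensemble prediction is the class with the highest average probability
ensemble_classes = avg_probs.argmax(axis=1)
print(ensemble_classes)
```

In this toy data, model A alone would get the second sample wrong and model B alone would get the first one wrong (assuming the true classes are 0 and 2), but the averaged probabilities pick classes 0 and 2.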

In both scenarios we can use augmented data.
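
On the bias/variance question raised in the Wikipedia quote, a small simulation may help. This is only a sketch under a strong assumption: each "network" is modelled as the true value plus independent, zero-mean noise. Real networks trained on the same data make correlated errors, so the actual variance reduction is smaller.

```python
import numpy as np

rng = np.random.default_rng(7)
true_value = 1.0
n_models, n_trials = 10, 100_000

# Each column is one hypothetical "network": unbiased, but noisy (std 0.5)
single = true_value + rng.normal(0.0, 0.5, size=(n_trials, n_models))

# The ensemble output is the plain average of the 10 networks
ensemble = single.mean(axis=1)

single_bias = single[:, 0].mean() - true_value
single_var = single[:, 0].var()
ensemble_bias = ensemble.mean() - true_value
ensemble_var = ensemble.var()

print("single:   bias %.4f, variance %.4f" % (single_bias, single_var))
print("ensemble: bias %.4f, variance %.4f" % (ensemble_bias, ensemble_var))
```

Under this independence assumption the bias stays near zero in both cases, while the variance of the average drops to roughly 1/10 of a single network's variance. That is the sense of "the variance can be reduced at no cost to bias".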

Saving and loading the weights is very simple in Keras:

In [None]:
## We need h5py to save weights in this format

## We save the weights: the file is written to the current working directory
## (often C:\Users\YourUserName on Windows)

import h5py
## Saves the weights (save_weights returns None, so there is nothing to assign)
model.save_weights('my_model_weights.h5')


In [None]:
## To load the weights (the model must have the same architecture as the one that saved them)
model.load_weights('my_model_weights.h5')

##for info https://machinelearningmastery.com/save-load-keras-deep-learning-models/
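
For the first ensemble approach described earlier (averaging weights), a possible sketch follows. The helper `average_weights` is hypothetical, and the small arrays stand in for the lists that `model.get_weights()` returns for networks with identical architecture. Note that averaging the weights of independently trained networks may not work well in practice, because the same hidden unit can end up in different positions in each network.

```python
import numpy as np

def average_weights(weight_sets):
    """Average several weight lists layer by layer.

    weight_sets: list of lists of arrays, one inner list per model,
    all with the same shapes (same architecture).
    """
    return [np.mean(np.stack(layer_group), axis=0)
            for layer_group in zip(*weight_sets)]

# Stand-ins for get_weights() of two tiny models: one kernel and one bias each
w1 = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([0.0, 1.0])]
w2 = [np.array([[3.0, 4.0], [5.0, 6.0]]), np.array([2.0, 3.0])]

avg = average_weights([w1, w2])
print(avg[0])  # averaged kernel
print(avg[1])  # averaged bias

# With real Keras models this would be used roughly as:
# final_model.set_weights(average_weights([m.get_weights() for m in models]))
```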



In [93]:
## just to transform one-hot encoding back to classes and compare the first test sample
prediction = model.predict(X_test)
np.argmax(prediction[0]) == np.argmax(y_test[0])

True
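
The same one-hot to class conversion can be applied to a whole batch at once with `argmax`, which also gives the error rate directly. The arrays below are made-up stand-ins for `model.predict(X_test)` and the one-hot `y_test`, with 3 classes instead of 10.

```python
import numpy as np

# Hypothetical predicted probabilities and one-hot labels for 4 samples
prediction = np.array([[0.1, 0.8, 0.1],
                       [0.7, 0.2, 0.1],
                       [0.2, 0.3, 0.5],
                       [0.6, 0.3, 0.1]])
y_onehot = np.array([[0, 1, 0],
                     [1, 0, 0],
                     [0, 1, 0],
                     [1, 0, 0]])

# Index of the largest entry in each row is the class label
pred_classes = prediction.argmax(axis=1)
true_classes = y_onehot.argmax(axis=1)

# Fraction of samples where the predicted class differs from the true class
error_rate = (pred_classes != true_classes).mean()
print("Error: %.2f%%" % (100 * error_rate))
```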