# MLP
- For the MLP, I used ReLU as the activation function and dropout with a rate of 0.2 for regularization.
- For the optimizer, I tried Adam instead of SGD, but Adam gave a lower accuracy, so I continued with SGD.
> 256 hidden units, 20 epochs
> * **Adam** - 44.8%
> * **SGD** - 48.0%
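
One possible factor in this gap is that the two optimizers take very differently sized steps at their default settings. The sketch below is a plain-Python illustration of a single SGD step and a single Adam step on one weight. It is not the Keras implementation, and `sgd_step` and `adam_step` are names made up for illustration. The hyperparameters are assumed to match the Keras defaults (`learning_rate=0.01` for SGD; `learning_rate=0.001`, `beta_1=0.9`, `beta_2=0.999`, `epsilon=1e-7` for Adam).

```python
import math

def sgd_step(w, grad, lr=0.01):
    """One plain SGD update: move against the gradient, scaled by lr."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One Adam update for a single weight; t is the 1-based step count."""
    m = beta_1 * m + (1 - beta_1) * grad        # running mean of gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta_1 ** t)               # bias correction
    v_hat = v / (1 - beta_2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Illustrative values only (not taken from the experiments above).
w0, grad = 1.0, 4.0
w_sgd = sgd_step(w0, grad)
w_adam, m, v = adam_step(w0, grad, m=0.0, v=0.0, t=1)
print("SGD step size: ", w0 - w_sgd)
print("Adam step size:", w0 - w_adam)
```

On the first step Adam moves by roughly its learning rate regardless of the gradient's scale, while the SGD step is proportional to the gradient. The default learning rates are therefore not directly comparable, and tuning Adam's learning rate might change the ranking above.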

- I also tested 256, 512, and 1024 hidden units per layer. The results are close, but I chose 512 units as a compromise between higher accuracy and lower training time.
> SGD optimizer, 20 epochs
> * **256** - 48.0%
> * **512** - 48.9%
> * **1024** - 50.0%
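
To make the training-time side of this trade-off concrete, the sketch below counts the trainable parameters of the 3-layer MLP for each hidden size. `mlp_param_count` is a helper written for illustration; it assumes the same structure as the model in this notebook (two hidden `Dense` layers of equal width plus a 10-way output layer, each with a bias).

```python
def mlp_param_count(input_size, hidden_units, num_labels):
    """Weights plus biases of a 3-layer MLP: input -> hidden -> hidden -> output."""
    layer_1 = input_size * hidden_units + hidden_units
    layer_2 = hidden_units * hidden_units + hidden_units
    output = hidden_units * num_labels + num_labels
    return layer_1 + layer_2 + output

input_size = 32 * 32 * 3   # flattened CIFAR-10 image
num_labels = 10

for hidden_units in (256, 512, 1024):
    total = mlp_param_count(input_size, hidden_units, num_labels)
    print(f"{hidden_units:>5} units: {total:,} parameters")
```

Going from 512 to 1024 units more than doubles the parameter count for about one extra point of accuracy in the table above, which is consistent with picking 512 as the middle ground.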

- I also adjusted the number of epochs and found that 100 epochs trains the model without overfitting: test accuracy was still improving at that point.
> 512 hidden units, SGD optimizer
> * **20** - 48.9%
> * **30** - 51.1%
> * **50** - 52.3%
> * **100** - 55.0%
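
A quick way to read these numbers is to look at how much accuracy each additional epoch buys. The sketch below only does arithmetic on the results listed above; `gain_per_epoch` is a helper made up for illustration. Test accuracy alone cannot confirm or rule out overfitting, so comparing it against training accuracy would be a useful follow-up.

```python
# Test accuracy (%) per epoch count, copied from the results above.
accuracy_by_epochs = {20: 48.9, 30: 51.1, 50: 52.3, 100: 55.0}

def gain_per_epoch(results):
    """Accuracy points gained per extra epoch between consecutive runs."""
    points = sorted(results.items())
    gains = {}
    for (e0, a0), (e1, a1) in zip(points, points[1:]):
        gains[(e0, e1)] = round((a1 - a0) / (e1 - e0), 3)
    return gains

for (e0, e1), gain in gain_per_epoch(accuracy_by_epochs).items():
    print(f"{e0:>3} -> {e1:<3} epochs: {gain:+.3f} points per epoch")
```

The gain per epoch drops sharply after 30 epochs but stays positive, so longer training still helps, only with diminishing returns.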

- The final parameters I used are **512** hidden units, the **SGD** optimizer, and **100** epochs.

In [47]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.utils import to_categorical
from keras.datasets import cifar10

# load dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# compute the number of labels
num_labels = len(np.unique(y_train))

# convert to one-hot vector
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# change the shape from a 2d image (assumed square) with 3 channels
image_size = x_train.shape[1]
image_channels = x_train.shape[3]
input_size = image_size * image_size * image_channels

# reshape to a vector
x_train = np.reshape(x_train, [-1, input_size])
x_test = np.reshape(x_test, [-1, input_size])

# convert to float32 and scale pixel values to [0, 1]
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# network parameters
batch_size = 128
hidden_units = 512
dropout = 0.2

# Create a 3-layer MLP with ReLU and dropout regularization
model = Sequential()

# layer 1
model.add(Dense(hidden_units, input_dim = input_size))
model.add(Activation('relu'))
model.add(Dropout(dropout))

# layer 2
model.add(Dense(hidden_units))
model.add(Activation('relu'))
model.add(Dropout(dropout))

# output layer
model.add(Dense(num_labels))
model.add(Activation('softmax'))

model.summary()

# create the loss function for a one-hot vector
model.compile(
    loss='categorical_crossentropy',
    optimizer='sgd',
    metrics=['accuracy']
)

# train network
model.fit(x_train, y_train, epochs=100, batch_size=batch_size)

# evaluate on the held-out test set
score = model.evaluate(x_test, y_test, batch_size=batch_size)
print("\n ACCURACY: %.1f%%" % (100.0 * score[1]))

Model: "sequential_17"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_48 (Dense)             (None, 512)               1573376   
_________________________________________________________________
activation_52 (Activation)   (None, 512)               0         
_________________________________________________________________
dropout_34 (Dropout)         (None, 512)               0         
_________________________________________________________________
dense_49 (Dense)             (None, 512)               262656    
_________________________________________________________________
activation_53 (Activation)   (None, 512)               0         
_________________________________________________________________
dropout_35 (Dropout)         (None, 512)               0         
_________________________________________________________________
dense_50 (Dense)             (None, 10)                5130      

# CNN
- For the CNN, I essentially tested the kernel size, the number of filters, and the padding.
- For the kernel size, I tried 3x3 and 5x5 kernels. At 10 epochs the 3x3 model is slightly more accurate, although the difference is small.
> Using 64 filters, 10 epochs, Adam optimizer, valid padding, and a (1,1) stride
> - **3x3** - 68.0%
> - **5x5** - 66.5%
- I then tried 32 and 64 filters; 64 filters gave the better result.
- For padding, preserving the image size (`same`) works better than `valid`.
> Using 64 filters, 10 epochs, Adam optimizer, and a (1,1) stride
> - **valid** - 67.1%
> - **same** - 70.2%
- The final parameters I used are a **3x3** kernel, **64** filters, **10** epochs, the **adam** optimizer, a **(1,1)** stride, and **same** padding.
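
To see what the padding choice does to the feature maps, the sketch below traces the spatial size through the conv, pool, conv, pool, conv stack used in this notebook. The helper names are made up for illustration, and the size formulas are the standard ones for `valid` and `same` padding with a (1,1) stride and 2x2 max pooling.

```python
def conv_output_size(size, kernel_size, stride=1, padding="valid"):
    """Spatial output size of a conv layer for 'valid' or 'same' padding."""
    if padding == "same":
        return -(-size // stride)              # ceil(size / stride)
    return (size - kernel_size) // stride + 1  # "valid": no padding

def pool_output_size(size, pool_size=2):
    """Max pooling with stride equal to the pool size and no padding."""
    return (size - pool_size) // pool_size + 1

def flattened_features(image_size, kernel_size, filters, padding):
    """Trace conv -> pool -> conv -> pool -> conv -> flatten."""
    size = image_size
    size = conv_output_size(size, kernel_size, padding=padding)
    size = pool_output_size(size)
    size = conv_output_size(size, kernel_size, padding=padding)
    size = pool_output_size(size)
    size = conv_output_size(size, kernel_size, padding=padding)
    return size * size * filters

for padding in ("valid", "same"):
    for kernel_size in (3, 5):
        n = flattened_features(32, kernel_size, filters=64, padding=padding)
        print(f"{padding:>5} padding, {kernel_size}x{kernel_size} kernel: {n} features")
```

With `same` padding the classifier sees 8 x 8 x 64 = 4096 features, which matches the `Flatten` row in the model summary. With `valid` padding the maps shrink at every convolution, so far fewer features reach the final `Dense` layer, which may partly explain the accuracy difference.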



In [69]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
from keras.models import Sequential
from keras.layers import Activation, Dense, Dropout
from keras.layers import Conv2D, MaxPooling2D, Flatten
from keras.utils import to_categorical

from keras.datasets import cifar10

# load dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# compute the number of labels
num_labels = len(np.unique(y_train))

# convert to one-hot vector
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
 
image_size = x_train.shape[1]
image_channels = x_train.shape[3]

x_train = np.reshape(x_train, [-1, image_size, image_size, image_channels])
x_test = np.reshape(x_test, [-1, image_size, image_size, image_channels])

# convert to float32 and scale pixel values to [0, 1]
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# network parameters
input_shape = (image_size, image_size, image_channels)
batch_size = 64
kernel_size = 3
pool_size = 2
filters = 64
dropout = 0.2

# Create 3-layer CNN
cnn = Sequential()

# layer 1
cnn.add(Conv2D(
    filters = filters,
    kernel_size = kernel_size,
    activation = 'relu',
    input_shape = input_shape,
    padding = 'same'
))
cnn.add(MaxPooling2D(pool_size))

# layer 2
cnn.add(Conv2D(
    filters = filters,
    kernel_size = kernel_size,
    activation = 'relu',
    padding = 'same'
))
cnn.add(MaxPooling2D(pool_size))

# layer 3
cnn.add(Conv2D(
    filters = filters,
    kernel_size = kernel_size,
    activation = 'relu',
    strides = (1, 1),
    padding = 'same'
))
cnn.add(Flatten())

# output should be a 10-dim one-hot vector
cnn.add(Dense(num_labels))
cnn.add(Dropout(dropout))
cnn.add(Activation('softmax'))

cnn.summary()

# create the loss function for a one-hot vector
cnn.compile(
    loss = 'categorical_crossentropy',
    optimizer = 'adam',
    metrics = ['accuracy']
)

# train network
cnn.fit(x_train, y_train, epochs=10, batch_size=batch_size)

loss, acc = cnn.evaluate(x_test, y_test, batch_size=batch_size)
print("\nTest accuracy: %.1f%%" % (100.0 * acc))


Model: "sequential_39"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_68 (Conv2D)           (None, 32, 32, 64)        1792      
_________________________________________________________________
max_pooling2d_45 (MaxPooling (None, 16, 16, 64)        0         
_________________________________________________________________
conv2d_69 (Conv2D)           (None, 16, 16, 64)        36928     
_________________________________________________________________
max_pooling2d_46 (MaxPooling (None, 8, 8, 64)          0         
_________________________________________________________________
conv2d_70 (Conv2D)           (None, 8, 8, 64)          36928     
_________________________________________________________________
flatten_21 (Flatten)         (None, 4096)              0         
_________________________________________________________________
dense_70 (Dense)             (None, 10)                40970     

# Comparison
As we can observe, even with far fewer training epochs (10 versus 100), the CNN model classifies images from the CIFAR10 dataset better than the MLP model (about 70% versus 55% test accuracy).

The main reason is that the CNN model is better suited to images. Convolutions exploit the relationship between each data point and its adjacent data points. The CNN can then recognize small patterns in the data and combine them with other discovered patterns to recognize larger objects (e.g., patterns of lines and contours can help identify different objects).
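
The idea of using adjacent data points can be shown with a small NumPy sketch. This is a toy illustration, not the Keras `Conv2D` implementation: `conv2d_valid` is a helper written for this example, the 6x6 image is synthetic, and the vertical-edge kernel is hand-picked, whereas a CNN learns its kernels during training.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one kernel over a 2D image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # each output value depends only on a small neighbourhood,
            # and the same kernel weights are reused at every position
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6), dtype=np.float32)
image[:, 3:] = 1.0

# Hand-picked vertical-edge kernel (assumption for illustration).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32)

response = conv2d_valid(image, kernel)
print(response)
```

The response is non-zero only where the window straddles the edge, wherever that edge sits in the image. Because the kernel is shared across positions, the first convolution layer in the summary above needs only 1,792 parameters, while the first dense layer of the MLP needs 1,573,376.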