In [None]:
import keras
import keras.layers as L
import keras.initializers as init

In [None]:
net = keras.models.Sequential()

net.add(L.InputLayer([32, 32, 3]))

net.add(L.Conv2D(filters=512, kernel_size=(5, 5), 
                 kernel_initializer=init.zeros()))
net.add(L.Activation('relu'))

net.add(L.Conv2D(filters=128, kernel_size=(3, 3), 
                 kernel_initializer=init.RandomNormal()))
net.add(L.Activation('relu'))

net.add(L.Conv2D(filters=32, kernel_size=(3, 3), 
                 kernel_initializer=init.RandomNormal()))
net.add(L.Activation('relu'))

net.add(L.MaxPool2D(pool_size=(2, 2)))

net.add(L.Dropout(rate=0.5))

net.add(L.Conv2D(filters=8, kernel_size=(3, 3), 
                 kernel_initializer=init.RandomNormal(), padding='same'))
net.add(L.Activation('relu'))

net.add(L.MaxPool2D(pool_size=(3, 3)))

net.add(L.Flatten()) # convert 3d tensor to a vector of features

net.add(L.Dense(units=num_classes))

net.add(L.Activation('softmax'))


* [Conv2D](https://keras.io/layers/convolutional/#conv2d) - performs convolution:
    * filters: number of output channels;
    * kernel_size: an integer or tuple/list of 2 integers, specifying the width and height of the 2D convolution window;
    * padding: padding="same" adds zero padding to the input, so that the output has the same width and height, padding='valid' performs convolution only in locations where kernel and the input fully overlap;
    * activation: "relu", "tanh", etc.
    * input_shape: shape of input.
* [MaxPooling2D](https://keras.io/layers/pooling/#maxpooling2d) - performs 2D max pooling.
* [Flatten](https://keras.io/layers/core/#flatten) - flattens the input, does not affect the batch size.
* [Dense](https://keras.io/layers/core/#dense) - fully-connected layer.
* Activation - applies an activation function.
* [LeakyReLU](https://keras.io/layers/advanced-activations/#leakyrelu) - applies leaky relu activation.
* [Dropout](https://keras.io/layers/core/#dropout) - applies dropout.
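
The effect of `padding` and pooling on the spatial size can be checked by hand. The sketch below is plain Python rather than Keras code. The helper names `conv_out` and `pool_out` are made up for illustration, and it assumes stride-1 convolutions and non-overlapping pooling (stride equal to the pool size). It traces a 32x32 input through the layer stack defined above.

```python
def conv_out(size, kernel, padding="valid"):
    """Output width/height of a stride-1 convolution (illustrative helper)."""
    if padding == "same":
        return size               # zero padding keeps the spatial size
    return size - kernel + 1      # 'valid': the kernel must fully overlap the input


def pool_out(size, pool):
    """Output width/height of max pooling with stride == pool size."""
    return size // pool


sizes = [32]                                        # 32x32 input images
sizes.append(conv_out(sizes[-1], 5))                # 5x5 conv, valid padding
sizes.append(conv_out(sizes[-1], 3))                # 3x3 conv, valid padding
sizes.append(conv_out(sizes[-1], 3))                # 3x3 conv, valid padding
sizes.append(pool_out(sizes[-1], 2))                # 2x2 max pooling
sizes.append(conv_out(sizes[-1], 3, "same"))        # 3x3 conv, same padding
sizes.append(pool_out(sizes[-1], 3))                # 3x3 max pooling

# Flatten sees (size x size x 8 filters) features per image
flat_features = sizes[-1] * sizes[-1] * 8
print(sizes, flat_features)
```

Under these assumptions the sizes are 32, 28, 26, 24, 12, 12, 4, so `Flatten` passes 4 * 4 * 8 = 128 features to the `Dense` layer.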

## Book of grudges
* Zero init for weights causes a symmetry effect: all filters start identical and receive identical gradients, so they never learn different features.
* Too many filters for the first 3x3 convolution: this leads to an enormous weight matrix while there are just not enough relevant combinations of 3x3 image patches (overkill).
* Usually, the further you go, the more filters you need.
* Large filters (10x10) are generally bad practice, and you definitely need more than 10 of them.
* The second 10x10 convolution gets an 8x6x6 image as input, so it is technically unable to perform such a convolution.
* Softmax nonlinearity effectively makes only one or a few neurons from the entire layer "fire", rendering a 512-neuron layer almost useless. Softmax at the output layer is okay, though.
* Dropout after probability prediction is just lame. A few random classes get a probability of 0, so your probabilities no longer sum to 1 and the cross-entropy goes to infinity (the log term goes to -inf).
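
The last point can be checked numerically. The following is a minimal pure-Python sketch, not Keras code: `softmax` and `cross_entropy` are illustrative helpers, the logits are arbitrary, and the dropout mask is fixed by hand (an assumption, so the example is deterministic) to drop the true class. Surviving values are scaled by `1 / (1 - rate)`, as inverted dropout does.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """-log p(true class); infinite when that probability is exactly 0."""
    p = probs[true_class]
    return math.inf if p == 0.0 else -math.log(p)


probs = softmax([2.0, 1.0, 0.1])      # valid distribution, sums to 1
rate = 0.5
mask = [0, 1, 1]                      # hand-picked mask: class 0 is dropped
dropped = [p * keep / (1 - rate) for p, keep in zip(probs, mask)]

loss_before = cross_entropy(probs, true_class=0)
loss_after = cross_entropy(dropped, true_class=0)
print(sum(probs), sum(dropped), loss_before, loss_after)
```

After dropout the vector is no longer a probability distribution, and the loss for the dropped true class is infinite.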

In this exercise you have to train a new Convolutional Neural Network from scratch for the classification of images.

1. For this we will use the Keras library.
2. The aim is to achieve 99% accuracy (on the validation/test set) on the MNIST dataset: http://yann.lecun.com/exdb/mnist/.
3. We have provided a basic Keras implementation of a CNN.
4. You are allowed to do whatever you want (except copy pasting) with the network as long as it is explained in your report.
5. Feel free to change the architecture of the network as well as parameters (e.g. learning rate, kernel sizes, ...).
6. You can tune the parameters manually if you want; just make sure that the network performs better than 99% on the validation set.
7. Sketch the final network architecture in your report.
8. Make sure you train the network on the GPU, otherwise it will be too slow.
9. Explain the plots: learning curve, accuracy wrt epoch.

* Use `tf.keras.datasets.cifar10.load_data()` to get the data.
* Split it 70/30 into train and validation sets using `train_test_split`.
* Normalize the input like $x_{\text{norm}} = \frac{x}{255} - 0.5$.
* Convert class labels to one-hot encoded vectors with `keras.utils.to_categorical`.
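
The steps above can be sketched without downloading anything. This is a NumPy-only illustration on synthetic data standing in for CIFAR-10 (an assumption), with hand-written `normalize` and `one_hot` helpers (hypothetical names). In the notebook itself, `train_test_split` and `keras.utils.to_categorical` would do the splitting and encoding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for CIFAR-10: uint8 images and integer labels of shape (N, 1)
x = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
y = rng.integers(0, 10, size=(100, 1))
num_classes = 10

# Shuffle, then split 70/30 into train and validation indices
perm = rng.permutation(len(x))
n_train = len(x) * 7 // 10
train_idx, val_idx = perm[:n_train], perm[n_train:]


def normalize(images):
    """Map uint8 pixels from [0, 255] to floats in [-0.5, 0.5]."""
    return images.astype("float32") / 255.0 - 0.5


def one_hot(labels, n):
    """Turn integer labels of shape (N, 1) into one-hot rows of shape (N, n)."""
    return np.eye(n, dtype="float32")[labels.reshape(-1)]


x_train, x_val = normalize(x[train_idx]), normalize(x[val_idx])
y_train, y_val = one_hot(y[train_idx], num_classes), one_hot(y[val_idx], num_classes)
print(x_train.shape, x_val.shape, y_train.shape, y_val.shape)
```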

In [None]:
import tensorflow as tf
import numpy as np
from tensorflow.keras.layers.experimental import preprocessing


In [None]:
ds=keras.datasets.cifar10.load_data() 

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


In [None]:
tf.executing_eagerly()

True

In [None]:
train_ds=ds[0][0]
train_label=ds[0][1]
test_ds=ds[1][0]
test_label=ds[1][1]
num_classes=len(np.unique(test_label))
num_classes

10

In [None]:
# Rescale pixels from [0, 255] to [-0.5, 0.5], i.e. x / 255 - 0.5
rescale = keras.layers.experimental.preprocessing.Rescaling(scale=1./255, offset=-0.5)
x_train = rescale(train_ds)
x_val = rescale(test_ds)


In [None]:
y_train=keras.utils.to_categorical(train_label,num_classes=num_classes)
y_val=keras.utils.to_categorical(test_label,num_classes=num_classes)


In [None]:
batch_size = 100

In [None]:
x_train.shape

TensorShape([50000, 32, 32, 3])

In [None]:
net.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_4 (Conv2D)            (None, 28, 28, 512)       38912     
_________________________________________________________________
activation_4 (Activation)    (None, 28, 28, 512)       0         
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 26, 26, 128)       589952    
_________________________________________________________________
activation_5 (Activation)    (None, 26, 26, 128)       0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 24, 24, 32)        36896     
_________________________________________________________________
activation_6 (Activation)    (None, 24, 24, 32)        0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 12, 12, 32)       
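
The `Param #` column can be reproduced by hand: a `Conv2D` layer has one `kernel x kernel x in_channels` weight tensor per filter, plus one bias per filter. A small plain-Python check (the helper name `conv2d_params` is illustrative, not part of Keras):

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights (kernel * kernel * in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


counts = [
    conv2d_params(5, 3, 512),     # first conv: 5x5 kernel on 3 input channels
    conv2d_params(3, 512, 128),   # second conv: 3x3 kernel on 512 channels
    conv2d_params(3, 128, 32),    # third conv: 3x3 kernel on 128 channels
]
print(counts)
```

These match the 38912, 589952 and 36896 reported in the summary above. Most of the parameters sit in the second convolution, because it has to read all 512 channels produced by the first one.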

In [None]:
net.compile(
  optimizer=tf.keras.optimizers.Adam(),
  loss=tf.losses.CategoricalCrossentropy(),  # labels are one-hot encoded, so use the non-sparse loss
  metrics=['accuracy'],
  steps_per_execution=30)

In [None]:
net.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=10,
          verbose=1,
          validation_data=(x_val, y_val))

Epoch 1/10


ValueError: ignored
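
The traceback is elided here, so the exact cause cannot be read off. One thing worth checking when `fit` fails right away is that the label format matches the loss. `SparseCategoricalCrossentropy` expects integer class ids, while `CategoricalCrossentropy` expects one-hot rows of shape `(N, num_classes)`. The NumPy sketch below uses illustrative helper names (not the Keras API) and made-up probabilities. It shows that the two losses compute the same value when each gets the label format it expects.

```python
import numpy as np


def categorical_ce(probs, one_hot_labels):
    """Cross-entropy for one-hot labels of shape (N, num_classes)."""
    return -np.sum(one_hot_labels * np.log(probs), axis=1)


def sparse_categorical_ce(probs, class_ids):
    """Cross-entropy for integer labels of shape (N,)."""
    return -np.log(probs[np.arange(len(class_ids)), class_ids])


probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])    # predicted class probabilities
class_ids = np.array([0, 2])           # integer labels, shape (2,)
one_hot_labels = np.eye(3)[class_ids]  # the same labels one-hot encoded, shape (2, 3)

loss_sparse = sparse_categorical_ce(probs, class_ids)
loss_onehot = categorical_ce(probs, one_hot_labels)
print(loss_sparse, loss_onehot)
```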

# normalize inputs: x / 255 - 0.5
x_train = train_ds.astype('float32') / 255 - 0.5
x_val = test_ds.astype('float32') / 255 - 0.5

# convert class labels to one-hot encoded, should have shape (?, NUM_CLASSES)
y_train = keras.utils.to_categorical(train_label, num_classes=num_classes)
y_val = keras.utils.to_categorical(test_label, num_classes=num_classes)