# Material in progress for further sessions
Adithya Jayan

Code borrowed from: Coursera - Introduction to deep learning - HSE university

Entropy, Feature Importance

### Let's start from the basics

- What is deep learning
- Neuron and its architecture
- Activation Function
- How do neural networks learn and Backpropagation

### Vanishing Gradient, Remedy for saddle points and local minima

#### Deciding your model

Deciding on your model might not be straightforward. It depends on various factors such as the type of data, the output needed, what we're trying to achieve, how fast it needs to be, and a lot more.

<img src="Images/Misunderstanding.jpg" alt="Overfit Model" width="500"/>

<img src="Images/Info.jpg" alt="Overfit Model" width="300"/>

ML is like a language: you need to convey what you want through the structure of the model.
- Example: say you want to predict the price of a car.
<img src="Images/Car1.jpg" alt="Overfit Model" width="500"/>
- A naive way would be to do this:
<img src="Images/Car2.jpg" alt="Overfit Model" width="500"/>
- With some more knowledge, we would know that images are low-level data and hence need more processing than high-level data.
<img src="Images/Car3.jpg" alt="Overfit Model" width="500"/>
- Say we wanted to reduce the importance of the images on the final output:
<img src="Images/Car4.jpg" alt="Overfit Model" width="500"/>


### What is a convolution layer?
* Add stuff here

## Example of a Convolutional Neural network (Deep learning) 

- Below we import the necessary libraries

In [13]:
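# ML libraries
import tensorflow as tf
from tensorflow import keras
# Use the backend bundled with TensorFlow; mixing tf.keras with the standalone
# `keras` package can raise version-mismatch errors such as
# "module 'keras.engine.base_layer' has no attribute 'BaseRandomLayer'"
from tensorflow.keras import backend as K
import keras_utils  # presumably the helper module from the course code; it must be on the Python path

print(tf.__version__)
print(keras.__version__)

# Numerical Python (NumPy) for math
import numpy as np

# Matplotlib for plotting
%matplotlib inline
import matplotlib.pyplot as plt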

 - Downloading the required data (If we have our own data, we would load that instead)
 - Here we're using the CIFAR-10 dataset. It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

In [None]:
# Downloading the dataset (imported from tensorflow.keras to match the other imports)
from tensorflow.keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

In [None]:
# Visualize the dataset dimensions
print("Train samples:", x_train.shape, y_train.shape)
print("Test samples:", x_test.shape, y_test.shape)

In [None]:
# Define the labels - (It is defined in the dataset)
NUM_CLASSES = 10
cifar10_classes = ["airplane", "automobile", "bird", "cat", "deer", 
                   "dog", "frog", "horse", "ship", "truck"]

- Visualize the input data

In [None]:
# show random images from train
cols = 8
rows = 2
fig = plt.figure(figsize=(2 * cols - 1, 2.5 * rows - 1))
for i in range(cols):
    for j in range(rows):
        random_index = np.random.randint(0, len(y_train))
        ax = fig.add_subplot(rows, cols, i * rows + j + 1)
        ax.grid(False)  # the string 'off' is truthy and would switch the grid on
        ax.axis('off')
        ax.imshow(x_train[random_index, :])
        ax.set_title(cifar10_classes[y_train[random_index, 0]])
plt.show()

We need to normalize inputs like this: $$ x_{norm} = \frac{x}{255} - 0.5 $$

We need to convert class labels to one-hot encoded vectors. Use __keras.utils.to_categorical__.

In [None]:
# normalize inputs
x_train2 = (x_train/255)-0.5
x_test2 = (x_test/255)-0.5

# convert class labels to one-hot encoded
y_train2 = tf.keras.utils.to_categorical(y_train,NUM_CLASSES)
y_test2 = tf.keras.utils.to_categorical(y_test,NUM_CLASSES)

In [None]:
print(np.std(x_train2))
print(np.mean(x_train2))
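
To make the two preprocessing steps concrete, here is a small NumPy-only sketch on made-up pixel values and labels. The helper names `normalize_pixels` and `one_hot` are illustrative and not part of the notebook; `one_hot` is only meant to mimic what `to_categorical` returns for integer labels of shape `(N, 1)`.

```python
import numpy as np

def normalize_pixels(x):
    """Map pixel values in [0, 255] to floats in [-0.5, 0.5]."""
    return x / 255 - 0.5

def one_hot(labels, num_classes):
    """Turn integer labels of shape (N, 1) or (N,) into an (N, num_classes) matrix."""
    labels = np.asarray(labels).reshape(-1)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # put a single 1 in the column given by each label
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

# toy data (hypothetical values, not taken from CIFAR-10)
pixels = np.array([0, 51, 255], dtype="uint8")
labels = np.array([[3], [0], [9]])

print(normalize_pixels(pixels))
print(one_hot(labels, 10))
```

Each row of the one-hot matrix sums to 1, and the position of the 1 is the original class index.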

### Defining the model

In [None]:
# import necessary building blocks
from keras.models import Sequential #The model type
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Activation, Dropout 
from keras.layers.advanced_activations import LeakyReLU

Convolutional networks are built from several types of layers:
- [Conv2D](https://keras.io/layers/convolutional/#conv2d) - performs convolution:
    - **filters**: number of output channels; 
    - **kernel_size**: an integer or tuple/list of 2 integers, specifying the width and height of the 2D convolution window;
    - **padding**: padding="same" adds zero padding to the input, so that the output has the same width and height, padding='valid' performs convolution only in locations where kernel and the input fully overlap;
    - **activation**: "relu", "tanh", etc.
    - **input_shape**: shape of input.
- [MaxPooling2D](https://keras.io/layers/pooling/#maxpooling2d) - performs 2D max pooling.
- [Flatten](https://keras.io/layers/core/#flatten) - flattens the input, does not affect the batch size.
- [Dense](https://keras.io/layers/core/#dense) - fully-connected layer.
- [Activation](https://keras.io/layers/core/#activation) - applies an activation function.
- [LeakyReLU](https://keras.io/layers/advanced-activations/#leakyrelu) - applies leaky relu activation.
- [Dropout](https://keras.io/layers/core/#dropout) - applies dropout.
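
The difference between `padding="valid"` and `padding="same"` is easiest to see on a tiny input. The sketch below is a naive single-channel convolution written in NumPy for illustration only. The function `conv2d_single_channel` is a made-up helper, it assumes stride 1 and an odd kernel size, and it is not how Keras implements `Conv2D` internally.

```python
import numpy as np

def conv2d_single_channel(image, kernel, padding="valid"):
    """Naive 2D cross-correlation with stride 1 on one channel."""
    kh, kw = kernel.shape
    if padding == "same":
        # zero-pad so the output keeps the input height and width (odd kernel sizes)
        image = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # the kernel fully overlaps this window of the (possibly padded) image
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0  # 3x3 averaging filter

valid_out = conv2d_single_channel(image, kernel, padding="valid")
same_out = conv2d_single_channel(image, kernel, padding="same")
print(valid_out.shape)  # (3, 3)
print(same_out.shape)   # (5, 5)
```

With `"valid"` the 5x5 input shrinks to 3x3, while `"same"` keeps it at 5x5. The interior of the `"same"` output matches the `"valid"` output.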

In [None]:
#Function to make the model by adding one layer at a time onto the sequential stack

def make_model():
    """
    Returns `Sequential` model.
    """
    model = Sequential()
    model.add(Conv2D(16,kernel_size=(3,3),input_shape=(32,32,3),padding="same"))
    model.add(LeakyReLU(0.1))
    model.add(Conv2D(32,kernel_size=(3,3),padding="same"))
    model.add(LeakyReLU(0.1))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=2, padding="same"))
    model.add(Dropout(0.25))
    model.add(Conv2D(32,kernel_size=(3,3),padding="same"))
    model.add(LeakyReLU(0.1))
    model.add(Conv2D(64,kernel_size=(3,3),padding="same"))
    model.add(LeakyReLU(0.1))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=2, padding="same"))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(256))
    model.add(LeakyReLU(0.1))
    model.add(Dropout(0.5))
    model.add(Dense(NUM_CLASSES))
    model.add(Activation("softmax"))
    
    return model

In [None]:
# describe the model
s = tf.keras.backend.clear_session()  # clear default graph
model = make_model()
model.summary()
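
The parameter count reported by `model.summary()` can also be worked out by hand. The sketch below assumes the architecture defined in `make_model` (3x3 kernels, two 2x2 poolings, 10 output classes). The helper names `conv2d_params` and `dense_params` are illustrative.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # each filter has kernel_h * kernel_w * in_channels weights plus one bias
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    # each output unit has in_units weights plus one bias
    return (in_units + 1) * out_units

conv_total = (
    conv2d_params(3, 3, 3, 16)
    + conv2d_params(3, 3, 16, 32)
    + conv2d_params(3, 3, 32, 32)
    + conv2d_params(3, 3, 32, 64)
)

# two 2x2 poolings shrink 32x32 to 8x8; the last conv layer has 64 channels
flattened = 8 * 8 * 64
dense_total = dense_params(flattened, 256) + dense_params(256, 10)

total = conv_total + dense_total
print(conv_total, dense_total, total)  # 32832 1051402 1084234
```

Almost all of the parameters sit in the first dense layer, which connects the 4096 flattened features to 256 units.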

 - The summary above shows that the model has a little more than a million trainable parameters.
 - Now we compile the model before training. This is where we define the loss, the optimizer and the metrics to be reported.

In [None]:
## Compiling the model

INIT_LR = 5e-3  # initial learning rate
BATCH_SIZE = 32
EPOCHS = 10

s = tf.keras.backend.clear_session()  # clear default graph
# don't call K.set_learning_phase() !!! (otherwise will enable dropout in train/test simultaneously)
model = make_model()  # define our model

# prepare model for fitting (loss, optimizer, etc)
model.compile(
    loss='categorical_crossentropy',  # we train 10-way classification
    optimizer=tf.keras.optimizers.Adamax(learning_rate=INIT_LR),  # Adamax, a variant of Adam
    metrics=['accuracy']  # report accuracy during training
)
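
For intuition about what `categorical_crossentropy` measures on top of a softmax output, here is a minimal NumPy sketch with 3 classes instead of 10. The functions `softmax` and `categorical_crossentropy` below are simplified stand-ins written for this example, not the Keras implementations, and the logits are made up.

```python
import numpy as np

def softmax(logits):
    # subtract the row maximum for numerical stability
    shifted = logits - np.max(logits, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    # clip to avoid log(0); y_true is one-hot, so only the true class contributes
    y_prob = np.clip(y_prob, eps, 1.0)
    return -np.sum(y_true * np.log(y_prob), axis=1)

logits = np.array([[2.0, 1.0, 0.1],   # fairly confident in class 0
                   [0.0, 0.0, 0.0]])  # completely undecided
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])

probs = softmax(logits)
losses = categorical_crossentropy(y_true, probs)
print(probs.sum(axis=1))
print(losses)
```

The undecided prediction is penalized with `log(3)`, which is the loss of a uniform guess over 3 classes, while the confident and correct prediction gets a smaller loss.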

Training takes approximately **1.5 hours**.

In [None]:
# we will save model checkpoints to continue training in case of kernel death
model_filename = 'Saved_Models/cifar_demo.{epoch:03d}.hdf5'
last_finished_epoch = None  # change this to the last finished epoch if continuing from the middle

#### uncomment below to continue training from a model checkpoint
#### fill `last_finished_epoch` with your latest finished epoch
#### (the checkpoints hold weights only, so they are loaded into the compiled model)

# last_finished_epoch = 6
# model.load_weights(model_filename.format(epoch=last_finished_epoch))
    
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=model_filename,
    save_weights_only=True,
    monitor='val_accuracy',
    mode='max',
    save_best_only=True)
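
The `{epoch:03d}` part of the filename is an ordinary Python format field. As far as I know, the checkpoint callback fills it with the epoch number when it saves a file. The short sketch below only demonstrates the string formatting. The epoch number 6 is an arbitrary example.

```python
model_filename = 'Saved_Models/cifar_demo.{epoch:03d}.hdf5'

# the field is named, so it must be filled with a keyword argument
path_for_epoch_6 = model_filename.format(epoch=6)
print(path_for_epoch_6)  # Saved_Models/cifar_demo.006.hdf5

# a positional argument does not match the named field
try:
    model_filename.format(6)
    positional_works = True
except KeyError:
    positional_works = False
print(positional_works)  # False
```

Zero-padding the epoch number to three digits keeps the checkpoint files in training order when they are sorted by name.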

In [None]:
# fit model
model.fit(
    x_train2, y_train2,  # prepared data
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    callbacks=[model_checkpoint_callback],
    validation_data=(x_test2, y_test2),
    shuffle=True,
    verbose=1,
    initial_epoch=(last_finished_epoch or 0)
)

In [None]:
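# load weights from the latest checkpoint file; with save_best_only=True this is the best epoch so far
import glob
checkpoint_paths = sorted(glob.glob('Saved_Models/cifar_demo.*.hdf5'))
model.load_weights(checkpoint_paths[-1])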

### Evaluate model

In [None]:
# make test predictions
y_pred_test = model.predict(x_test2)
y_pred_test_classes = np.argmax(y_pred_test, axis=1)
y_pred_test_max_probas = np.max(y_pred_test, axis=1)

In [None]:
# confusion matrix and accuracy
from sklearn.metrics import confusion_matrix, accuracy_score
plt.figure(figsize=(7, 6))
plt.title('Confusion matrix', fontsize=16)
plt.imshow(confusion_matrix(y_test, y_pred_test_classes))
plt.xticks(np.arange(10), cifar10_classes, rotation=45, fontsize=12)
plt.yticks(np.arange(10), cifar10_classes, fontsize=12)
plt.colorbar()
plt.show()
print("Test accuracy:", accuracy_score(y_test, y_pred_test_classes))
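
To see what the confusion matrix and the accuracy score contain, the same quantities can be computed by hand on a toy example. The function `confusion_matrix_np` is an illustrative helper, not the scikit-learn implementation, and the labels are made up. It follows the same convention of rows for true classes and columns for predicted classes.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, num_classes):
    """Count how often true class i (row) was predicted as class j (column)."""
    y_true = np.asarray(y_true).reshape(-1)
    y_pred = np.asarray(y_pred).reshape(-1)
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # add 1 at every (true, predicted) pair, including repeated pairs
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix

# toy labels; y_true_toy has shape (N, 1) like y_test
y_true_toy = np.array([[0], [1], [2], [2], [1]])
y_pred_toy = np.array([0, 2, 2, 2, 1])

cm = confusion_matrix_np(y_true_toy, y_pred_toy, num_classes=3)
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)  # 0.8
```

Correct predictions land on the diagonal, so the accuracy is the diagonal sum divided by the total number of samples.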

In [None]:
# inspect predictions
cols = 8
rows = 2
fig = plt.figure(figsize=(2 * cols - 1, 3 * rows - 1))
for i in range(cols):
    for j in range(rows):
        random_index = np.random.randint(0, len(y_test))
        ax = fig.add_subplot(rows, cols, i * rows + j + 1)
        ax.grid(False)  # the string 'off' is truthy and would switch the grid on
        ax.axis('off')
        ax.imshow(x_test[random_index, :])
        pred_label = cifar10_classes[y_pred_test_classes[random_index]]
        pred_proba = y_pred_test_max_probas[random_index]
        true_label = cifar10_classes[y_test[random_index, 0]]
        ax.set_title("pred: {}\nscore: {:.3}\ntrue: {}".format(
               pred_label, pred_proba, true_label
        ))
plt.show()

# Visualize maximum stimuli

We want to find input images that provide maximum activations for particular layers of our network. 

We will find those maximum stimuli via gradient ascent in image space.

For that task we load our model weights, calculate the layer output gradient with respect to image input and shift input image in that direction.
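
Before applying this to the network, the idea of gradient ascent with a normalized gradient can be tried on a toy problem. In the sketch below a single linear "neuron" with random weights stands in for a layer output. This is an assumption made purely for illustration, and the names `activation` and `activation_gradient` are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 8))  # hypothetical neuron weights over an 8x8 "image"

def activation(image):
    # the neuron's response to the image
    return float(np.sum(weights * image))

def activation_gradient(image):
    # d(sum(w * x)) / dx = w for this linear neuron
    return weights

# start from a near-gray image with a little random noise
image = (rng.random((8, 8)) - 0.5) * 0.1
step = 1.0
history = []

for _ in range(20):
    history.append(activation(image))
    grad = activation_gradient(image)
    # normalization trick: rescale the gradient to unit length
    grad = grad / (np.sqrt(np.sum(np.square(grad))) + 1e-10)
    # ascent: move the image in the direction that increases the activation
    image = image + step * grad

print(history[0], history[-1])
```

Because the gradient is rescaled to unit length, every step moves the image by the same distance, and the activation grows from one iteration to the next.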

In [None]:
s = tf.keras.backend.clear_session()  # clear default graph
K.set_learning_phase(0)  # disable dropout
model = make_model()
model.load_weights("weights.h5")  # assumes the weights were saved with model.save_weights("weights.h5") after model.fit

In [None]:
# all weights we have
model.summary()

In [None]:
def find_maximum_stimuli(layer_name, is_conv, filter_index, model, iterations=20, step=1., verbose=True):
    
    def image_values_to_rgb(x):
        # normalize x: center on 0 (np.mean(x_train2)), ensure std is 0.25 (np.std(x_train2))
        # so that it looks like a normalized image input for our network
        x = x - np.mean(x)
        x = 0.25 * x / (np.std(x) + 1e-10)

        # do reverse normalization to RGB values: x = (x_norm + 0.5) * 255
        x = (x + 0.5) * 255
    
        # clip values to [0, 255] and convert to bytes
        x = np.clip(x, 0, 255).astype('uint8')
        return x

    # K.gradients and K.function are not available in TF2 eager mode,
    # so the gradient is computed with tf.GradientTape instead

    # find the layer by name and build a sub-model from the input image to its output
    layer = model.get_layer(layer_name)
    feature_extractor = tf.keras.Model(inputs=model.inputs, outputs=layer.output)
    img_width, img_height = model.input_shape[1:3]

    # we start from a gray image with some random noise
    input_img_data = np.random.random((1, img_width, img_height, 3))
    input_img_data = (input_img_data - 0.5) * (0.1 if is_conv else 0.001)
    input_img_data = tf.Variable(input_img_data, dtype=tf.float32)

    # we run gradient ascent
    for i in range(iterations):
        with tf.GradientTape() as tape:
            # training=False keeps dropout disabled
            layer_output = feature_extractor(input_img_data, training=False)
            # the loss is the activation of the filter_index filter of the layer considered
            if is_conv:
                # mean over feature map values for convolutional layer
                loss = tf.reduce_mean(layer_output[:, :, :, filter_index])
            else:
                loss = tf.reduce_mean(layer_output[:, filter_index])

        # we compute the gradient of the loss wrt the input image
        grads = tape.gradient(loss, input_img_data)

        # normalization trick: we normalize the gradient
        grads = grads / (tf.sqrt(tf.reduce_sum(tf.square(grads))) + 1e-10)

        input_img_data.assign_add(grads * step)
        loss_value = float(loss)
        if verbose:
            print('Current loss value:', loss_value)

    # decode the resulting input image
    img = image_values_to_rgb(input_img_data.numpy()[0])

    return img, loss_value

In [None]:
# sample maximum stimuli
def plot_filters_stimuli(layer_name, is_conv, model, iterations=20, step=1., verbose=False):
    cols = 8
    rows = 2
    filter_index = 0
    max_filter_index = model.get_layer(layer_name).output.shape[-1] - 1
    fig = plt.figure(figsize=(2 * cols - 1, 3 * rows - 1))
    for i in range(cols):
        for j in range(rows):
            if filter_index <= max_filter_index:
                ax = fig.add_subplot(rows, cols, i * rows + j + 1)
                ax.grid(False)  # the string 'off' is truthy and would switch the grid on
                ax.axis('off')
                loss = -1e20
                while loss < 0 and filter_index <= max_filter_index:
                    stimuli, loss = find_maximum_stimuli(layer_name, is_conv, filter_index, model,
                                                         iterations, step, verbose=verbose)
                    filter_index += 1
                if loss > 0:
                    ax.imshow(stimuli)
                    ax.set_title("Filter #{}".format(filter_index))
    plt.show()

In [None]:
# maximum stimuli for convolutional neurons
# (in a Sequential model each layer feeds the next one, so the layer before
# an activation is simply the previous entry in model.layers)
conv_activation_layers = []
for prev_layer, layer in zip(model.layers[:-1], model.layers[1:]):
    if isinstance(layer, LeakyReLU) and isinstance(prev_layer, Conv2D):
        conv_activation_layers.append(layer)

for layer in conv_activation_layers:
    print(layer.name)
    plot_filters_stimuli(layer_name=layer.name, is_conv=True, model=model)

In [None]:
# maximum stimuli for last dense layer
last_dense_layer = list(filter(lambda x: isinstance(x, Dense), model.layers))[-1]
plot_filters_stimuli(layer_name=last_dense_layer.name, is_conv=False, 
                     iterations=200, step=0.1, model=model)


What we've done:
- defined a CNN architecture
- trained the model
- evaluated the model
- visualised the maximum stimuli of the learnt filters