# Inception Neural Network

Welcome to the fourth HDA laboratory! In this notebook, you will implement an advanced architecture: the **Inception-v4 network**. The architecture was proposed by [Google researchers](https://arxiv.org/pdf/1602.07261.pdf) for image classification.

**In this assignment, you will:**
- Implement the basic building blocks of Inception-v4.
- Put together these building blocks to implement and train a state-of-the-art neural network for image classification.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
cd '/content/drive/MyDrive/MLHD_labs/Lab_3'

In [None]:
import os
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Input
from tensorflow.keras.layers import Add, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D, AveragePooling2D, MaxPooling2D, GlobalMaxPooling2D, Dropout, Concatenate
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import pydot
import datetime
from PIL import Image
from IPython.display import SVG
from load_utils import *
from tensorflow.keras.initializers import glorot_uniform
from tensorflow.keras.utils import plot_model
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.metrics import precision_recall_fscore_support, roc_curve, auc, accuracy_score, precision_recall_curve

# Dataset
For this lab, you will use the [**PatchCamelyon** dataset](https://github.com/basveeling/pcam). It consists of 327,680 color images (96 x 96 px) extracted from histopathologic scans of lymph node sections. Each image is annotated with a binary label indicating the presence of metastatic tissue.
A label of `1` means that the central 32x32 px region of the patch contains at least one pixel of tumor tissue. Tumor tissue in the outer region of the patch does not influence the label.
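The labeling rule can be written as a short NumPy sketch. It assumes a hypothetical per-pixel tumor mask (`mask`, a boolean array of shape 96x96), which is not part of the data loaded in this lab; for a 96x96 patch, the central 32x32 region spans rows and columns 32 to 63.

```python
import numpy as np

def pcam_label(mask, patch_size=96, center_size=32):
    """Return 1 if any tumor pixel lies in the central center_size x center_size window.

    `mask` is a hypothetical boolean array of shape (patch_size, patch_size),
    True where a pixel belongs to tumor tissue.
    """
    start = (patch_size - center_size) // 2          # 32 for a 96x96 patch
    center = mask[start:start + center_size, start:start + center_size]
    return int(center.any())

# Tumor only in the outer border: label 0
mask = np.zeros((96, 96), dtype=bool)
mask[0:10, 0:10] = True
print(pcam_label(mask))  # 0

# One tumor pixel inside the central region: label 1
mask[48, 48] = True
print(pcam_label(mask))  # 1
```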

In this notebook, you will use a smaller version of the dataset consisting of 6,000 images. Feel free to download the entire dataset to experiment with it (e.g., training the network with more examples is expected to improve the performance of the classifier).

The function `load_data()`, defined in `load_utils.py`, loads the smaller dataset.

In [None]:
load_data_dir = '../Datasets/Lab_3/'
(x_train, y_train, meta_train), (x_test, y_test, meta_test) = load_data(load_data_dir)

In [None]:
print('TRAIN SET, images: {}'.format(x_train.shape))
print('TRAIN SET, labels: {}'.format(y_train.shape))
train_length = y_train.shape[0]

print('TEST SET, images: {}'.format(x_test.shape))
print('TEST SET, labels: {}'.format(y_test.shape))
test_length = y_test.shape[0]

In [None]:
plt.figure(figsize=[20,10])

for i in range(10):
    plt.subplot(1, 10, i+1)
    image = x_train[i, :, :, :]
    plt.imshow(image)
    label = y_train[i]
    plt.title('class '+ str(label))
plt.show()

To train the models implemented below, we use [ImageDataGenerator](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator), which generates batches of tensor image data. This class is useful for image datasets because it supports real-time data augmentation: here, training images are randomly shifted and flipped, and all pixel values are rescaled to [0, 1]. The test generator only applies the rescaling, without data augmentation.
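A rough NumPy approximation of what the training generator does to a single image is sketched below. The helper `augment` is hypothetical and is not the Keras implementation (the way `ImageDataGenerator` samples the shift may differ slightly), but it follows the same idea: random flips, a shift of a few pixels with the uncovered border filled by repeating the nearest pixels (as with the default `fill_mode='nearest'`), and rescaling by 1/255 as in `preprocessing_function`.

```python
import numpy as np

def augment(image, rng, max_shift=4):
    """Approximate the training-time augmentation for one H x W x C image."""
    if rng.random() < 0.5:
        image = image[:, ::-1]              # horizontal flip (mirror the columns)
    if rng.random() < 0.5:
        image = image[::-1, :]              # vertical flip (mirror the rows)
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    h, w = image.shape[:2]
    pad = ((max_shift, max_shift), (max_shift, max_shift), (0, 0))
    padded = np.pad(image, pad, mode='edge')   # replicate border pixels
    # Crop a window offset by (dy, dx): output[y, x] = image[y - dy, x - dx]
    top, left = max_shift - dy, max_shift - dx
    shifted = padded[top:top + h, left:left + w]
    return shifted.astype(np.float32) / 255.0  # rescale to [0, 1]

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(96, 96, 3), dtype=np.uint8)
out = augment(img, rng)
print(out.shape, out.min() >= 0.0, out.max() <= 1.0)
```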

In [None]:
datagen_train = ImageDataGenerator(preprocessing_function=lambda x: x/255.,
                             width_shift_range=4,  # randomly shift images horizontally
                             height_shift_range=4,  # randomly shift images vertically
                             horizontal_flip=True,  # randomly flip images
                             vertical_flip=True)  # randomly flip images

In [None]:
datagen_test = ImageDataGenerator(preprocessing_function=lambda x: x/255.)

In [None]:
batch_size = 32
train_steps = int(np.ceil(train_length/batch_size))
test_steps = int(np.ceil(test_length/batch_size))

# Inception network

A pure Inception network combines two kinds of building blocks: **inception blocks** and **reduction blocks**.

Inception-v4 consists of an initial stem block followed by three types of inception blocks (A, B and C) and two types of reduction blocks (A and B).

<img src="https://drive.google.com/uc?export=view&id=13ERbjo3D_J7SuLGDU5iXSlSilldD1xWT" style="width:800px;">
<caption><center>  <br> </center></caption>

**Note**: for the last activation we will use ``sigmoid`` with one output neuron (binary classification task).

## 1 - Inception-v4 blocks
### 1.1 - Convolutional and batch normalization helper function
The ``conv2d_bn`` helper function below is used in all the blocks of the Inception-v4 network.

It has the following structure:
- CONV2D with $F$ filters of shape ($h$, $w$) and stride ($s_1$, $s_2$).
- BatchNorm, normalizing the 'channels' axis.
- An optional activation: pass ``activation='relu'`` to apply the ReLU used throughout the network.
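In training mode, batch normalization over the channel axis (axis 3 for tensors of shape (m, n_H, n_W, n_C)) computes one mean and one variance per channel, pooling over the batch and both spatial axes. A minimal NumPy sketch of that normalization followed by ReLU is shown below; `gamma` and `beta` stand in for the learned per-channel scale and offset, `eps=1e-3` matches the Keras default `epsilon`, and the moving averages Keras uses at inference time are not modeled.

```python
import numpy as np

def batchnorm_relu(x, gamma, beta, eps=1e-3):
    """Training-mode batch norm over the channel axis of an NHWC array, then ReLU."""
    # One statistic per channel, computed over batch (0), height (1) and width (2)
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Learned per-channel scale and shift, then ReLU
    return np.maximum(gamma * x_hat + beta, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(4, 8, 8, 16))
y = batchnorm_relu(x, gamma=np.ones(16), beta=np.zeros(16))
print(y.shape, y.min())  # (4, 8, 8, 16) 0.0
```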

In [None]:
# FUNCTION: conv2d_bn block

def conv2d_bn(X_input, filters, kernel_size, strides, padding='same', activation=None,
              name=None):
    """
    Implementation of a Conv2D -> BatchNorm -> (optional) activation block

    Arguments:
    X_input -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    filters -- integer, defining the number of filters in the CONV layer
    kernel_size -- (f1, f2) tuple of integers, specifying the shape of the CONV kernel
    strides -- (s1, s2) tuple of integers, specifying the stride to be used
    padding -- padding approach to be used, 'same' or 'valid'
    activation -- activation name (e.g. 'relu'), or None to skip the activation
    name -- suffix used to name the layers; must be unique within the model

    Returns:
    X -- output of the conv2d_bn block, tensor of shape (m, n_H, n_W, n_C)
    """

    # Build the layer names only when a suffix is given (Keras auto-names layers otherwise)
    conv_name = 'conv_' + name if name is not None else None
    bn_name = 'bn_' + name if name is not None else None
    act_name = 'act_' + name if name is not None else None

    X = Conv2D(filters = filters, kernel_size = kernel_size, strides = strides,
               padding = padding, name = conv_name,
               kernel_initializer = glorot_uniform(seed=0))(X_input)
    # axis=3 is the channel axis of (m, n_H, n_W, n_C) tensors
    X = BatchNormalization(axis = 3, name = bn_name)(X)
    if activation is not None:
        X = Activation(activation, name = act_name)(X)
    return X

### 1.2 - The stem block

The stem block is designed as follows:

<img src="https://drive.google.com/uc?export=view&id=19lyJj1qlee1uIjv1Hafb7DyjWRYvCN7a" style="width:70%">
<caption><center> Stem block. </center></caption>

Implement all the steps below, using the ``conv2d_bn`` function defined above for the blue rectangles.
The kernel sizes and strides are specified in each block.

*   Use stride 1x1 when it is not specified.
*   Use padding "valid" when the letter **V** appears; otherwise use padding "same".

For example, the first layer has 32 filters of shape (3, 3), stride (2, 2) and padding "valid", while the third one has 64 filters of shape (3, 3), stride (1, 1) and padding "same".

*  For the **Filter concat** layers (orange rectangles), use [Concatenate](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Concatenate) and **concatenate along the 'channel' axis (axis=3)**.

*  **Important suggestion**: add a **name** to each of the layers.
*  Use 1 as the seed for the random initialization to reproduce the expected output.
* The last conv block has ``stride = 2`` and the max pooling layer has ``kernel = (3, 3)``.
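Concatenating along axis 3 stacks the branches' feature maps channel by channel: the branches must agree on the batch size and on both spatial dimensions, and the output depth is the sum of the branch depths. The NumPy sketch below uses zero arrays as stand-ins for the two branches of the first Filter concat, with the shapes that the layer settings described above give for the 100x100 test input (64 channels from the max pooling branch, 96 from the convolution); these shapes are derived, not printed by the notebook.

```python
import numpy as np

# Stand-ins for the two stem branches (batch of 4, 23x23 feature maps)
branch1 = np.zeros((4, 23, 23, 64), dtype=np.float32)   # max pooling branch
branch2 = np.zeros((4, 23, 23, 96), dtype=np.float32)   # 96-filter conv branch

merged = np.concatenate([branch1, branch2], axis=3)
print(merged.shape)  # (4, 23, 23, 160)

# Branches whose spatial sizes differ cannot be concatenated
try:
    np.concatenate([branch1, np.zeros((4, 22, 22, 96))], axis=3)
except ValueError as err:
    print('ValueError:', err)
```

In the model, the same operation is written as `Concatenate(axis=3)([branch1, branch2])`.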


In [None]:
# FUNCTION: stem_block

def stem_block(X_input):
    """
    Implementation of the stem block as defined above

    Arguments:
    X_input -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)

    Returns:
    X -- output of the stem block, tensor of shape (n_H, n_W, n_C)
    """

    ### START CODE HERE ###

    # First conv
    X = None

    # Second conv
    X = None

    # Third conv
    X = None

    # First branch: max pooling
    branch1 = None

    # Second branch: conv
    branch2 = None

    # Concatenate (1) branch1 and branch2 along the channel axis
    X = None

    # First branch: 2 convs
    branch1 = None

    # Second branch: 4 convs
    branch2 = None

    # Concatenate (2) branch1 and branch2 along the channel axis
    X = None

    # First branch: conv
    branch1 = None

    # Second branch: max pooling
    branch2 = None

    # Concatenate (3) branch1 and branch2 along the channel axis
    X = None

    ### END CODE HERE ###

    return X

In [None]:
tf.random.set_seed(1)
X_inp = tf.random.normal((4, 100, 100, 3), dtype=tf.dtypes.float32)

X_out = stem_block(X_inp)
print("shape output" + str(X_out.shape))
print("out = " + str(X_out[:, 0, 0, 0]))

Expected output:

``shape output(4, 10, 10, 384)``

``out = tf.Tensor([0.01137412 0.         0.01842478 0.        ], shape=(4,), dtype=float32)``


### 1.3 - The Inception-A block

Implement the Inception-A block as detailed in the figure below:

<img src="https://drive.google.com/uc?export=view&id=147HD5TVWw9qWywYh3DGl1Mv1uIQqoI07" style="width:50%">
<caption><center> Inception-A block. </center></caption>

#### Note

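- The average pooling has ``pool_size = (3, 3)``, ``strides = (1, 1)`` and ``padding = 'same'``, so that its output keeps the spatial size of the input and can be concatenated with the other branches.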

In [None]:
# FUNCTION: Inception-A block

def inception_a_block(X_input, base_name):
    """
    Implementation of the Inception-A block

    Arguments:
    X_input -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    base_name -- string, suffix used to give unique names to the layers of the block

    Returns:
    X -- output of the block, tensor of shape (n_H, n_W, n_C)
    """

    ### START CODE HERE ###

    # Branch 1
    branch1 = None

    # Branch 2
    branch2 = None

    # Branch 3
    branch3 = None

    # Branch 4
    branch4 = None

    # Concatenate branch1, branch2, branch3 and branch4 along the channel axis
    X = None

    ### END CODE HERE ###

    return X

In [None]:
tf.random.set_seed(1)
X_inp = tf.random.normal((4, 100, 100, 3), dtype=tf.dtypes.float32)

X_out = inception_a_block(X_inp, 'a')
print("shape output" + str(X_out.shape))
print("out = " + str(X_out[:, 0, 0, 0]))

Expected output:

``shape output(4, 100, 100, 384)``

``out = tf.Tensor([0.09909102 0.18648547 0.01800422 0.        ], shape=(4,), dtype=float32)``


### 1.4 - The Inception-B block

Implement the Inception-B block as detailed in the figure below:

<img src="https://drive.google.com/uc?export=view&id=1HO_HaRit39sEqzvZ9ETFp5TBQ36ApeZo" style="width:50%">
<caption><center> Inception-B block. </center></caption>

#### Note
- The average pooling has ``pool_size = (3, 3)``, ``strides = (1, 1)`` and ``padding = 'same'``.
- In the **third** branch, the **last** convolutional layer has ``kernel_size = (7, 1)``.

In [None]:
# FUNCTION: Inception-B block

def inception_b_block(X_input, base_name):
    """
    Implementation of the Inception-B block

    Arguments:
    X_input -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    base_name -- string, suffix used to give unique names to the layers of the block

    Returns:
    X -- output of the block, tensor of shape (n_H, n_W, n_C)
    """

    ### START CODE HERE ###

    # Branch 1
    branch1 = None

    # Branch 2
    branch2 = None

    # Branch 3
    branch3 = None

    # Branch 4
    branch4 = None

    # Concatenate branch1, branch2, branch3 and branch4 along the channel axis
    X = None

    ### END CODE HERE ###

    return X

In [None]:
tf.random.set_seed(1)
X_inp = tf.random.normal((4, 100, 100, 3), dtype=tf.dtypes.float32)

X_out = inception_b_block(X_inp, 'b')
print("shape output" + str(X_out.shape))
print("out = " + str(X_out[:, 0, 0, 0]))

Expected output:

``shape output(4, 100, 100, 1024)``

``out = tf.Tensor([0.09139545 0.11461768 0.         0.03276341], shape=(4,), dtype=float32)``


### 1.5 - The Inception-C block

Implement the Inception-C block as detailed in the figure below:

<img src="https://drive.google.com/uc?export=view&id=1j-QoS5ik4P2SUiFLBwgZrZ3wEtpcVFI_" style="width:50%">
<caption><center> Inception-C block. </center></caption>


#### Note
- The average pooling has ``pool_size = (3, 3)``, ``strides = (1, 1)`` and ``padding = 'same'``.

In [None]:
# FUNCTION: Inception-C block

def inception_c_block(X_input, base_name):
    """
    Implementation of the Inception-C block

    Arguments:
    X_input -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    base_name -- string, suffix used to give unique names to the layers of the block

    Returns:
    X -- output of the block, tensor of shape (n_H, n_W, n_C)
    """

    ### START CODE HERE ###

    # Branch 1
    branch1 = None

    # Branch 2
    branch2 = None

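    # Branch 3: shared first conv, then two parallel convs (branch3_1, branch3_2)
    branch3 = None
    branch3_1 = None
    branch3_2 = None

    # Branch 4: stacked convs, then two parallel convs (branch4_1, branch4_2)
    branch4 = None
    branch4_1 = None
    branch4_2 = None

    # Concatenate branch1, branch2, branch3_1, branch3_2, branch4_1 and branch4_2 along the channel axis
    X = None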

    ### END CODE HERE ###

    return X

In [None]:
tf.random.set_seed(1)
X_inp = tf.random.normal((4, 100, 100, 3), dtype=tf.dtypes.float32)

X_out = inception_c_block(X_inp, 'c')
print("shape output" + str(X_out.shape))
print("out = " + str(X_out[:, 0, 0, 0]))

Expected output:

``shape output(4, 100, 100, 1536)``

``out = tf.Tensor([0.04224635 0.1296405  0.0542384  0.        ], shape=(4,), dtype=float32)``


### 1.6 - The Reduction-A block

Implement the Reduction-A block as detailed in the figure below:

<img src="https://drive.google.com/uc?export=view&id=1-1Ntm8Xw07GFkhpqV_TWLfA4MWnevXZS" style="width:40%">
<caption><center> Reduction-A block. </center></caption>

#### Note
For Inception-v4, the parameters are as follows:
- $n = 384$
- $k = 192$
- $l = 224$
- $m = 256$
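Since the max pooling branch keeps the input depth, while the other two branches end with $n$ and $m$ filters respectively, the output depth of Reduction-A is $C_{in} + n + m$ ($k$ and $l$ only set the depth of intermediate layers). A quick check with a hypothetical helper, against the expected output below (3 input channels) and against the network, where Reduction-A receives the 384-channel output of the Inception-A blocks:

```python
def reduction_a_channels(c_in, n=384, m=256):
    """Output depth of Reduction-A: pooled input channels + n-filter branch + m-filter branch."""
    return c_in + n + m

print(reduction_a_channels(3))    # 643, the expected depth for the 3-channel test input
print(reduction_a_channels(384))  # 1024, after the Inception-A blocks
```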

In [None]:
# FUNCTION: Reduction-A block

def reduction_a_block(X_input):
    """
    Implementation of the Reduction-A block

    Arguments:
    X_input -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)

    Returns:
    X -- output of the block, tensor of shape (n_H, n_W, n_C)
    """

    ### START CODE HERE ###

    # Branch 1
    branch1 = None

    # Branch 2
    branch2 = None

    # Branch 3
    branch3 = None

    # Concatenate branch1, branch2 and branch3 along the channel axis
    X = None

    ### END CODE HERE ###

    return X

In [None]:
tf.random.set_seed(1)
X_inp = tf.random.normal((4, 100, 100, 3), dtype=tf.dtypes.float32)

X_out = reduction_a_block(X_inp)
print("shape output" + str(X_out.shape))
print("out = " + str(X_out[:, 0, 0, 0]))

Expected output:

``shape output(4, 49, 49, 643)``

``out = tf.Tensor([1.3573772  0.54856557 1.4745578  0.3727632 ], shape=(4,), dtype=float32)``

### 1.7 - The Reduction-B block

Implement the Reduction-B block as detailed in the figure below:

<img src="https://drive.google.com/uc?export=view&id=1hnE2PP7O3CDBy91LIqB73yoyzfB-hX70" style="width:40%">
<caption><center> Reduction-B block. </center></caption>


In [None]:
# FUNCTION: Reduction-B block

def reduction_b_block(X_input):
    """
    Implementation of the Reduction-B block

    Arguments:
    X_input -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)

    Returns:
    X -- output of the block, tensor of shape (n_H, n_W, n_C)
    """

    ### START CODE HERE ###

    # Branch 1
    branch1 = None

    # Branch 2
    branch2 = None

    # Branch 3
    branch3 = None

    # Concatenate branch1, branch2 and branch3 along the channel axis
    X = None

    ### END CODE HERE ###

    return X

In [None]:
tf.random.set_seed(1)
X_inp = tf.random.normal((4, 100, 100, 3), dtype=tf.dtypes.float32)

X_out = reduction_b_block(X_inp)
print("shape output" + str(X_out.shape))
print("out = " + str(X_out[:, 0, 0, 0]))

Expected output:

``shape output(4, 49, 49, 515)``

``out = tf.Tensor([1.3573772  0.54856557 1.4745578  0.3727632 ], shape=(4,), dtype=float32)``


### 1.8 - Network construction

You have now implemented all the necessary blocks to build the **Inception-v4** network!

Refer to the above figure about the whole network and stack the blocks you implemented in the helper functions to build the Inception network.


#### Note:

- Add a ``Flatten`` layer after the last ``AveragePooling2D`` layer.
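With 96x96 input patches, the feature map reaching the final average pooling layer is 1x1, which explains the `kernel_pooling = (1,1)` value in the cell below. The plain-Python trace below assumes the layer settings of the stem and reduction figures (the helpers `out_size` and `trace_inception_v4` are hypothetical, for illustration only); as a sanity check, it reproduces the 10x10 stem output expected for the 100x100 test input. Inception-A/B/C blocks are skipped because they keep the spatial size unchanged.

```python
import math

def out_size(n, k, s=1, padding='same'):
    """Side length after a conv/pool layer with kernel k and stride s (Keras rules)."""
    if padding == 'valid':
        return (n - k) // s + 1   # the window must fit entirely inside the input
    return math.ceil(n / s)       # 'same': only the stride shrinks the size

def trace_inception_v4(n):
    """Feature-map side length after the stem and after each reduction block."""
    # Stem: the layers that change the spatial size
    n = out_size(n, 3, 2, 'valid')   # 32 filters, 3x3, stride 2, V
    n = out_size(n, 3, 1, 'valid')   # 32 filters, 3x3, V
    n = out_size(n, 3, 1, 'same')    # 64 filters, 3x3 (size unchanged)
    n = out_size(n, 3, 2, 'valid')   # max pool / 96 filters, 3x3, stride 2, V
    n = out_size(n, 3, 1, 'valid')   # 96 filters, 3x3, V (end of both branches)
    n = out_size(n, 3, 2, 'valid')   # 192 filters / max pool, 3x3, stride 2, V
    sizes = {'stem': n}
    n = out_size(n, 3, 2, 'valid')   # Reduction-A: every branch ends with 3x3, stride 2, V
    sizes['reduction_a'] = n
    n = out_size(n, 3, 2, 'valid')   # Reduction-B: likewise
    sizes['reduction_b'] = n
    return sizes

print(trace_inception_v4(100))  # {'stem': 10, 'reduction_a': 4, 'reduction_b': 1}
print(trace_inception_v4(96))   # {'stem': 9, 'reduction_a': 4, 'reduction_b': 1}
```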

In [None]:
# FUNCTION: Inception-v4

def Inceptionv4(input_shape):
    """
    Implementation of the Inception-v4 architecture

    Arguments:
    input_shape -- shape of the images of the dataset

    Returns:
    model -- a Model() instance in Keras
    """

    ### START CODE HERE ###

    # Define the input as a tensor with shape input_shape (1 line)
    X_input = None

    # Stem block
    X = None

    # Four Inception-A blocks (give each call a different base_name, e.g. 'a1', 'a2', ...)
    X = None

    # Reduction-A block
    X = reduction_a_block(X)

    # Seven Inception-B blocks (unique base_name for each call)
    X = None

    # Reduction-B block
    X = None

    # Three Inception-C blocks (unique base_name for each call)
    X = None

    # AVGPOOL (1 line). Use "X = AveragePooling2D(...)(X)"
    kernel_pooling = (1,1) # size of the last feature map: check it in the model.summary() list of layers and dimensions
    X = None

    # Dropout
    X = None

    # Output layer
    X = None

    ### END CODE HERE ###

    # Create model
    model = Model(inputs = X_input, outputs = X, name='Inceptionv4')

    return model

## 2 - Network training

- Create the model using the correct input shape for the dataset, and compile it.
- Use `binary_crossentropy` as the loss, since this is a binary classification problem, and include `metrics=['accuracy']` so that the test accuracy can be reported in Section 3.1.
- As the optimizer, use `SGD` this time: ``optimizer = tf.keras.optimizers.SGD(learning_rate=0.005)``.
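For a single sigmoid output $p$ and a label $y \in \{0, 1\}$, binary cross-entropy is $-[y \log p + (1 - y) \log(1 - p)]$, averaged over the batch. A NumPy sketch with a hypothetical helper name; the clipping mirrors the small epsilon Keras uses to avoid `log(0)`, with `1e-7` assumed here:

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and sigmoid outputs."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip the predictions away from 0 and 1 to keep the logarithms finite
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

print(binary_crossentropy([1, 0], [0.9, 0.1]))   # about 0.105 (= -ln 0.9)
print(binary_crossentropy([1, 0], [0.1, 0.9]))   # about 2.303: confident mistakes cost much more
```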

In [None]:
model = None
optimizer = None

# Compile the model
None

Use the early stopping callback ([documentation here](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping)) to stop training once the validation loss has not improved for ``patience`` (here 6) consecutive epochs.
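The patience rule can be sketched in plain Python: training stops once the monitored validation loss has failed to improve for `patience` consecutive epochs. This is a simplified, hypothetical re-implementation (no `baseline`, no `restore_best_weights`, `min_delta` defaulting to 0 as in Keras), not the Keras source:

```python
def early_stopping_epoch(val_losses, patience=6, min_delta=0.0):
    """Index of the epoch at which training would stop, or None if it never stops."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:   # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:                         # no improvement this epoch
            wait += 1
            if wait >= patience:
                return epoch
    return None

losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.73, 0.74, 0.76, 0.9]
print(early_stopping_epoch(losses))  # 8: six epochs without beating 0.7
```

Note that the callback keeps the weights of the last epoch unless `restore_best_weights=True` is passed.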

In [None]:
# Create a callback for early stopping
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=6)

In the next cell we create a TensorBoard callback, which helps visualize the model and analyze the training process. See the details at [this link](https://colab.research.google.com/github/tensorflow/tensorboard/blob/master/docs/tensorboard_in_notebooks.ipynb#scrollTo=lpUO9HqUKP6z).

In [None]:
# Create a callback for tensorboard
%load_ext tensorboard
logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
!mkdir -p logs
%tensorboard --logdir logs

Fit the model on the data using real-time data augmentation, with the `flow` method of `ImageDataGenerator` ([documentation here](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator#flow)). In addition to the training data, specify the number of `epochs`, `validation_data`, `steps_per_epoch`, `validation_steps` and `callbacks` (both the early stopping and the TensorBoard callbacks; see [here](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit)).
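`steps_per_epoch` and `validation_steps` set how many batches make up one epoch. With `steps = ceil(n / batch_size)`, as computed earlier in `train_steps` and `test_steps`, every sample is visited once per epoch, and the last batch only holds the remainder. A simplified sketch of the batch sizes in one pass without shuffling (the helper is hypothetical and illustrates the arithmetic, not the Keras iterator):

```python
import math

def batch_sizes(n_samples, batch_size):
    """Sizes of the batches in one pass over n_samples: all full except possibly the last."""
    steps = math.ceil(n_samples / batch_size)   # same formula as train_steps / test_steps
    return [min(batch_size, n_samples - i * batch_size) for i in range(steps)]

print(batch_sizes(100, 32))  # [32, 32, 32, 4]
```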

In [1]:
# Fit the model on batches with real-time data augmentation:
### START CODE HERE ###
None
### END CODE HERE ###

In [None]:
model.save('my_inception_model.keras')

## 3 - Performance assessment

To load the pre-trained model and use it on the test set, uncomment the line in the cell below.

In [None]:
# model = load_model('Inceptionv4.keras')

### 3.1 - Loss and accuracy
Compute the loss and accuracy on the test set.

In [None]:
preds = model.evaluate(datagen_test.flow(x_test, y_test, batch_size=test_length, shuffle=False), steps=1)
print('Loss = {:.5f}'.format(preds[0]))
print('Test Accuracy = {:.2f}%'.format(preds[1]*100))

### 3.2 - Precision, recall, fscore

The performance of the network can also be evaluated with other metrics that provide information beyond accuracy. Moreover, for imbalanced datasets (i.e., when the samples are not equally distributed among the classes), accuracy can be misleading and other metrics should be preferred.
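A small made-up example shows why: on a hypothetical test set with 90 negatives and 10 positives, a useless classifier that always answers "no tumor" reaches 90% accuracy while detecting none of the tumors.

```python
import numpy as np

# Hypothetical imbalanced test set: 90 negatives, 10 positives
y_true = np.array([0] * 90 + [1] * 10)
# A useless classifier that always predicts "no tumor"
y_pred = np.zeros_like(y_true)

accuracy = np.mean(y_pred == y_true)
recall = np.sum((y_pred == 1) & (y_true == 1)) / np.sum(y_true == 1)
print(accuracy, recall)  # 0.9 0.0
```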

Here we consider three other metrics: **precision**, **recall** and **F-score**. You will use some functions from the *scikit-learn* library ([documentation here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics)).

First, we need the output of the network for all the test samples. To obtain it, use:
```python
model.predict(datagen_test.flow(x_test, batch_size=batch_size, shuffle=False), steps=test_steps)[:test_length].squeeze()
```
The ``[:test_length]`` slice keeps exactly one prediction per test image, and ``.squeeze()`` turns the ``(test_length, 1)`` output of the single sigmoid neuron into a 1-D array.

Then, if the output is greater than 0.5, pixels of tumor tissue have been detected in the image; otherwise, the estimated class is `no tumor`. Use:
```python
(test_values > 0.5).astype(int)
```

At this point, compute the precision, recall and F-score using the ``precision_recall_fscore_support`` function with ``average='binary'`` ([documentation here](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html#sklearn.metrics.precision_recall_fscore_support)). The function returns four values: precision, recall, F-score and support. Since the support is not needed (it is ``None`` when ``average`` is set), it is assigned to the placeholder ``_``; without the placeholder, the unpacking fails with ```ValueError: too many values to unpack (expected 3)```.
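With `average='binary'`, the three scores refer to the positive class (label 1). Their definitions are written out below in NumPy on a small made-up example, which can serve as a cross-check for the scikit-learn call; the helper `binary_scores` is hypothetical, and returning 0 when a denominator is zero is an assumption of this sketch.

```python
import numpy as np

def binary_scores(y_true, scores, threshold=0.5):
    """Precision, recall and F1 of the positive class after thresholding the scores."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = (np.asarray(scores) > threshold).astype(int)   # same rule as test_y_est
    tp = np.sum((y_pred == 1) & (y_true == 1))   # tumor correctly detected
    fp = np.sum((y_pred == 1) & (y_true == 0))   # false alarm
    fn = np.sum((y_pred == 0) & (y_true == 1))   # missed tumor
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1

p, r, f = binary_scores([1, 0, 1, 1, 0, 0], [0.9, 0.6, 0.4, 0.8, 0.7, 0.1])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```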

In [None]:
### START CODE HERE ### (3 lines)
# Get the network output for the test set
test_values = None

# Get the estimated classes
test_y_est = None

# Compute precision, recall, fscore
precision, recall, fscore, _= None
### END CODE HERE ###

print('Precision = {:.2f}%'.format(precision*100))
print('Recall = {:.2f}%'.format(recall*100))
print('Fscore = {:.2f}%'.format(fscore*100))

### 3.3 - Receiver operating characteristic (ROC) curve

Another interesting analysis is the evaluation of the ROC curve. The curve is obtained by evaluating the *False Positive Rate* (FPR) and the *True Positive Rate* (TPR) by varying the threshold used to infer the estimated classes (0.5 in the previous case).

More specifically, in the previous case we considered as positives all the examples with an output > 0.5. Here, the output is compared to many different thresholds, and each threshold yields a different performance. The ROC curve is obtained by plotting the (FPR, TPR) pair for each threshold.

Fortunately, scikit-learn also provides the function `roc_curve` for this purpose (documentation [here](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html)). You just need the labels (*y_test*) and the corresponding network outputs (*test_values*). Then compute the area under the ROC curve using the `auc` function (documentation [here](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html)).
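To see what `roc_curve` and `auc` compute, here is a NumPy sketch with hypothetical helpers on a four-sample toy example: the threshold is swept over every distinct score (a sample counts as positive when its score is at or above the threshold, starting from an infinite threshold so that the curve begins at (0, 0)), and the area is obtained with the trapezoidal rule.

```python
import numpy as np

def roc_points(y_true, scores):
    """FPR/TPR pairs obtained by sweeping the threshold over every distinct score."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.concatenate(([np.inf], np.unique(scores)[::-1]))  # decreasing
    n_pos, n_neg = y_true.sum(), (~y_true).sum()
    fpr = np.array([np.sum((scores >= t) & ~y_true) / n_neg for t in thresholds])
    tpr = np.array([np.sum((scores >= t) & y_true) / n_pos for t in thresholds])
    return fpr, tpr, thresholds

def trapezoid_area(x, y):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2))

fpr_toy, tpr_toy, _ = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(trapezoid_area(fpr_toy, tpr_toy))  # 0.75
```

By default, `roc_curve` may drop intermediate points that do not change the shape of the curve (`drop_intermediate=True`), so its arrays can be shorter than in this sketch while giving the same area.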

In [None]:
### START CODE HERE ### (2 lines)
fpr, tpr, _ = None
roc_auc = None
### END CODE HERE ###

plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic (ROC)')
plt.legend(loc="lower right")

### 3.4 - Precision-Recall Curve (PRC)
We can do the exact same thing with precision and recall (instead of TPR and FPR), generating the precision-recall curve (PRC). Select the proper function from [here](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics).
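The same threshold sweep gives the points of the precision-recall curve. A NumPy sketch with a hypothetical helper on the same toy example as the ROC sketch, listing the points from the highest threshold to the lowest:

```python
import numpy as np

def precision_recall_points(y_true, scores):
    """Precision and recall when each distinct score is used as threshold (score >= t is positive)."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.unique(scores)[::-1]          # highest threshold first
    precision, recall = [], []
    for t in thresholds:
        y_pred = scores >= t
        tp = np.sum(y_pred & y_true)
        precision.append(tp / np.sum(y_pred))     # at least one sample passes each threshold
        recall.append(tp / np.sum(y_true))
    return np.array(precision), np.array(recall), thresholds

prec_toy, rec_toy, thr_toy = precision_recall_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(thr_toy)
print(prec_toy)
print(rec_toy)
```

scikit-learn's `precision_recall_curve` returns the points in order of increasing threshold and appends a final point with precision 1 and recall 0 that has no corresponding threshold, so its arrays are ordered differently and are one element longer.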

In [None]:
### START CODE HERE ### (1 line)
prec, rec, _ = None
### END CODE HERE ###

plt.figure()
plt.plot(rec, prec, color='darkorange', lw=2)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve (PRC)')

**Congratulations! Lab 3 completed :)**

What can you do now?
- This time we have not created an optimized input pipeline (there were already a lot of holes to fill...) but you can try to change the code and implement it as in Lab 2.
- We used a subset of the complete dataset available [here](https://github.com/basveeling/pcam). You can try to increase the number of training images to evaluate the performance of the Inception v4 neural network.
- For simplicity, we used only two sets: training and test. Remember that a proper evaluation requires three sets: training, validation and test. The validation set is used during training to select the best hyperparameters (number of layers, neurons per layer...) and the epoch at which to stop training. The test set is used to assess the performance of the resulting network, and it is never used during training.