# Module 4. Model deployment and transfer learning

In this notebook we deal with a scenario that comes up often in practice: it is time to set up a reasonable end-to-end system for training your model. Depending on the problem you are solving, you need to design a baseline network type and architecture. At this step, ask yourself questions like:

- Which kind of convolution filters should I use?
- How deep should my network be?
- Which activation type should I use?
- What kind of optimizer should I use?
- Do I need to add any other regularization layers to avoid overfitting?

# Overfitting and regularization layers

The main cause of poor performance in machine learning (ML) is typically either overfitting or underfitting the training dataset. 

- Underfitting occurs when the model is too simple to learn the training data, resulting in poor performance on the training data.
- Overfitting, on the other hand, happens when the model is overly complex for the problem at hand. Instead of learning features that fit the training data, it memorizes the training data. Consequently, it performs very well on the training data but fails to generalize when tested with new data that it hasn't seen before.
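The two failure modes can be illustrated outside of neural networks with a small curve-fitting experiment. The following is a minimal sketch, not part of the original exercise: the noisy sine data, the polynomial degrees, and the helper names `make_data` and `mse` are all assumptions chosen for illustration. A low-degree polynomial plays the role of a model that is too simple, and a high-degree polynomial plays the role of a model that is too complex.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Sample n noisy points from a sine curve (illustrative toy dataset)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, size=n)
    return x, y

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on the points (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

# small training set, larger held-out set of unseen data
x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

results = {}
for degree in (1, 3, 12):
    # least-squares polynomial fit of the given degree
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_mse = mse(model, x_train, y_train)
    test_mse = mse(model, x_test, y_test)
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

The training error can only decrease as the degree grows, because each polynomial family contains the previous one. The comparison to watch is the gap between training and test error: a large training error suggests underfitting, while a small training error combined with a much larger test error suggests overfitting.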

## Regularization techniques to avoid overfitting

If you observe that your neural network is overfitting the training data, your network might be too complex and need to be simplified. One of the first techniques you should try is regularization. In the previous notebook we already saw one regularization technique: data augmentation. In this notebook we review other regularization techniques. 

### Dropout

Dropout is another regularization technique. It is effective for simplifying a neural network and thus avoiding overfitting. The algorithm is fairly simple: at every training iteration, every neuron has a probability p of being temporarily ignored (dropped out). A neuron dropped in one iteration may be active again in subsequent iterations.

Although it is counterintuitive to intentionally pause the learning of some of the network's neurons, this technique works surprisingly well. The probability p is a hyperparameter called the dropout rate.
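To make the mechanism concrete, here is a minimal NumPy sketch of dropout applied to a batch of activations. The function name `dropout` and its parameters are illustrative, not a library API. The sketch also assumes the common "inverted dropout" variant, in which the surviving activations are rescaled by 1/(1 - p) during training so that nothing needs to change at test time; that rescaling is not described in the text above.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: drop each unit with probability `rate` while training."""
    if not training or rate == 0.0:
        # at test time every neuron is active and the input passes through unchanged
        return activations
    keep_prob = 1.0 - rate
    # draw an independent keep/drop decision for every unit
    mask = rng.random(activations.shape) < keep_prob
    # zero the dropped units and rescale the kept ones (assumed inverted-dropout variant)
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((4, 8))  # toy batch: 4 samples, 8 units each

y_train = dropout(x, rate=0.5, training=True, rng=rng)
y_eval = dropout(x, rate=0.5, training=False, rng=rng)

print(y_train)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
print(y_eval)   # identical to x
```

In Keras the same behaviour is provided by `tf.keras.layers.Dropout(rate)`, which is only active during training.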

> Dropout helps reduce interdependent learning among the neurons. In that sense, it helps to view dropout as a form of ensemble learning. In ensemble learning, we train a number of weaker classifiers separately, and then we use them at test time by averaging the responses of all ensemble members. Since each classifier has been trained separately, it has learned different aspects of the data, and their mistakes (errors) are different. Combining them helps to produce a stronger classifier, which is less prone to overfitting.
> 

### Batch Normalization

When we train a model on a certain dataset, we make certain assumptions about the distribution of the data. However, when we are dealing with real-world data, the distribution is not always constant. This is where the idea of covariate shift comes in. Covariate shift refers to the phenomenon where the distribution of the input data changes between the training phase and the testing phase. In other words, the model is being tested on a different distribution than the one it was trained on. This can lead to poor performance of the model, even if it performed very well during training. Therefore, it is important to take into account the possibility of covariate shift when training machine learning models, and to use techniques addressing this issue.

A similar effect, often called internal covariate shift, can occur inside a neural network: as the parameters of earlier layers change during training, the distribution of the activations fed to later layers changes as well. Batch normalization reduces the degree of change in the distribution of hidden unit values, providing more stability for later layers of the network.

Batch normalization adds an operation in the neural network just before the activation function of each layer to do the following:

1. Zero-center the inputs
2. Normalize the zero-centered inputs
3. Scale and shift the results

This operation lets the model learn the optimal scale and mean of the inputs for each layer.
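The three steps above can be sketched in a few lines of NumPy. This is a simplified illustration of the training-time computation only: the names `batch_norm_forward`, `gamma`, `beta`, and `eps` are assumptions chosen for the example, and the running averages that a real layer keeps for use at inference time are omitted.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift it."""
    mean = x.mean(axis=0)                    # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_centered = x - mean                    # 1. zero-center the inputs
    x_hat = x_centered / np.sqrt(var + eps)  # 2. normalize the zero-centered inputs
    return gamma * x_hat + beta              # 3. scale and shift the results

rng = np.random.default_rng(0)
# toy batch of 32 samples with 4 features, far from zero mean and unit variance
x = rng.normal(loc=5.0, scale=3.0, size=(32, 4))

# gamma = 1 and beta = 0 leave the normalized values unchanged
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))

print(out.mean(axis=0))  # close to 0 for every feature
print(out.std(axis=0))   # close to 1 for every feature
```

In a trained network `gamma` and `beta` are learned parameters, which is how the layer can recover whatever scale and mean suit the following layer. In Keras this operation is available as `tf.keras.layers.BatchNormalization()`.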

Now it is time for you to put these ideas into practice. In the first part of this notebook you will deal with a slightly more complex classification problem: CIFAR-10. You will be asked to define a model for it. Feel free to use any of the aforementioned regularization techniques.



In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds



# use tensorflow datasets to load the cifar10 dataset
cifar10, info = tfds.load('cifar10',data_dir='cifar10_data',download=True,shuffle_files=True,with_info=True, as_supervised=True)

# extract train and test dataset
ds_train = cifar10['train']
ds_test  = cifar10['test']

# show some examples
tfds.show_examples(ds_train,info)

# shuffle, batch and prefetch train dataset
ds_train = ds_train.shuffle(1024).batch(32).prefetch(tf.data.AUTOTUNE)

# batch and prefetch test dataset
ds_test = ds_test.batch(32).prefetch(tf.data.AUTOTUNE)

# define a normalise function
def normalise_img(image, label):
    return tf.cast(image,tf.float32) / 255., tf.one_hot(label,depth=10)

# map normalise function to train and test dataset
ds_train = ds_train.map(normalise_img,num_parallel_calls=tf.data.AUTOTUNE)
ds_test  = ds_test.map(normalise_img,num_parallel_calls=tf.data.AUTOTUNE)

Create your model in the next cell

In [None]:
model = ####

In [None]:
# compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# train the model
# keep the returned History object, since it is needed for the plots below
history = model.fit(ds_train, epochs=15, validation_data=ds_test)
import matplotlib.pyplot as plt

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'r', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, loss, 'r', label='Training Loss')
plt.plot(epochs, val_loss, 'b', label='Validation Loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()


# Transfer Learning

If your problem is similar to another problem that has been studied extensively, you should first copy the model and algorithm that are known to perform the best for that task. You can even use a model that was trained on a different dataset for your own problem without having to train it from scratch. This is called transfer learning.

Transfer learning involves taking a pre-trained model and retraining it on a task that has some overlap with the original training task. An analogy is a builder skilled in working with one material who wants to learn to use another: the skills learned in one area carry over to the other.

As an example in deep learning, consider a pre-trained model that recognizes different types of cars very well. We want to train a model to recognize types of trucks, and many of the insights gained from the car model would be useful, such as the ability to recognize headlights and wheels.

Transfer learning is especially powerful when we do not have a large and varied dataset. In this case, a model trained from scratch would likely memorize the training data quickly, but not generalize well to new data. With transfer learning, you can increase your chances of training an accurate and robust model on a small dataset.

Although CIFAR-10 is not exactly a small dataset, in what follows we use it to explore the idea of transfer learning. To do so, we download a model that has been trained on a different dataset (ImageNet) to classify images into a different set of classes. We need to adapt that trained model to classify our dataset, which consists of a different set of images and classes.

The part of the model that actually performs the classification is the set of fully connected layers at the deepest part of the model. This part is specific to the original task, so it is not useful for our new task and we can discard it. The convolutional layers at the beginning of the model are simply in charge of extracting features from the images, and these features may be useful for different tasks, which is what makes transfer learning beneficial. We keep this part and either freeze it (i.e., prevent training from changing its weights, assuming that the features these layers extract are sufficient for our new task) or train it further (to adapt it to extract new features related to the new task).

An example is shown in the next cell. The argument `include_top=False` indicates that we only want to download the convolutional layers of the model trained on the `imagenet` dataset.

In [None]:
dense = tf.keras.applications.DenseNet201(weights='imagenet', include_top=False, input_shape=(32,32,3))

# freeze the downloaded layers so that their weights are not updated during training
for layer in dense.layers:
    layer.trainable = False


The downloaded model alone cannot classify our images. To obtain a model that classifies the CIFAR-10 dataset, we need to add some layers that perform the classification. Perform this task in the following cell:

In [None]:
# create a new sequential model
model = tf.keras.models.Sequential()
#model.add(tf.keras.layers.Lambda(lambda image: tf.image.resize(image, (224,224))))
model.add(dense)
# add a flatten layer
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Dense(10, activation='softmax'))

# compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# train the model
history=model.fit(ds_train, epochs=10, validation_data=ds_test)
import matplotlib.pyplot as plt

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'r', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, loss, 'r', label='Training Loss')
plt.plot(epochs, val_loss, 'b', label='Validation Loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()


In [None]:
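# unfreeze the downloaded layers so that they are fine-tuned on the new task
for layer in dense.layers:
    layer.trainable = True

# recompile so that the change to `trainable` takes effect
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# continue training, now updating the downloaded layers as well
history=model.fit(ds_train, epochs=10, validation_data=ds_test)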
import matplotlib.pyplot as plt

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'r', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, loss, 'r', label='Training Loss')
plt.plot(epochs, val_loss, 'b', label='Validation Loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
