<a href="https://colab.research.google.com/github/L4ncelot1024/Learn_Deep_Learning_Le_Wagon/blob/main/Day4/03_Transfer_Learning_CHALLENGE.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Cat vs. Dog Image Classification

In the previous exercise, we built a convnet from scratch and achieved an accuracy of about 70%.

In this exercise, we'll look at two techniques for repurposing feature data generated from image models that have already been trained on large sets of data, **feature extraction** and **fine tuning**, and use them to improve the accuracy of our cat vs. dog classification model.

In [None]:
%tensorflow_version 2.x

TensorFlow 2.x selected.


In [None]:
%matplotlib inline
import os
import zipfile

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import tensorflow as tf
import pandas as pd

from tensorflow.keras import layers
from tensorflow.keras import Model

## Data

Here we download and process the data as in the previous exercise.

In [None]:
!wget --no-check-certificate \
    https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip \
    -O /tmp/cats_and_dogs_filtered.zip

--2019-11-24 22:19:27--  https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 173.194.202.128, 2607:f8b0:400e:c03::80
Connecting to storage.googleapis.com (storage.googleapis.com)|173.194.202.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 68606236 (65M) [application/zip]
Saving to: ‘/tmp/cats_and_dogs_filtered.zip’


2019-11-24 22:19:32 (226 MB/s) - ‘/tmp/cats_and_dogs_filtered.zip’ saved [68606236/68606236]



In [None]:
local_zip = '/tmp/cats_and_dogs_filtered.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('/tmp')
zip_ref.close()

base_dir = '/tmp/cats_and_dogs_filtered'
train_dir = os.path.join(base_dir, 'train')
test_dir = os.path.join(base_dir, 'validation')

# Directory with our training cat pictures
train_cats_dir = os.path.join(train_dir, 'cats')

# Directory with our training dog pictures
train_dogs_dir = os.path.join(train_dir, 'dogs')

# Directory with our test cat pictures
test_cats_dir = os.path.join(test_dir, 'cats')

# Directory with our test dog pictures
test_dogs_dir = os.path.join(test_dir, 'dogs')

train_cat_fnames = os.listdir(train_cats_dir)

train_dog_fnames = os.listdir(train_dogs_dir)
train_dog_fnames.sort()

print('total training cat images:', len(os.listdir(train_cats_dir)))
print('total training dog images:', len(os.listdir(train_dogs_dir)))
print('total test cat images:', len(os.listdir(test_cats_dir)))
print('total test dog images:', len(os.listdir(test_dogs_dir)))

total training cat images: 1000
total training dog images: 1000
total test cat images: 500
total test dog images: 500


In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# All images will be rescaled by 1./255, for the train generator we set
# the validation split.
train_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)
test_datagen = ImageDataGenerator(rescale=1./255)

# global settings
target_size = (150, 150)
batch_size = 20

# Flow training images in batches of 20 using train_datagen generator
print('Train Dataset')
train_generator = train_datagen.flow_from_directory(
        train_dir,  # This is the source directory for training images
        target_size=target_size,  # All images will be resized to 150x150
        batch_size=batch_size,
        # Since we use binary_crossentropy loss, we need binary labels
        class_mode='binary',
        subset='training')

print('Validation Dataset')
# Flow validation images
validation_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=target_size,
    batch_size=batch_size,
    class_mode='binary',
    subset='validation')

print('Test Dataset')
# Flow test images in batches of 20 using test_datagen generator
test_generator = test_datagen.flow_from_directory(
        test_dir,
        target_size=target_size,
        batch_size=batch_size,
        class_mode='binary')

Train Dataset
Found 1600 images belonging to 2 classes.
Validation Dataset
Found 400 images belonging to 2 classes.
Test Dataset
Found 1000 images belonging to 2 classes.


## Pre-trained Model

One thing that is commonly done in computer vision is to take a model trained on a very large dataset, run it on your own, smaller dataset, and extract the intermediate representations (features) that the model generates. These representations are frequently informative for your own computer vision task, even though the task may be quite different from the problem that the original model was trained on. This versatility and repurposability of convnets is one of the most interesting aspects of deep learning.

In our case, we will use the [Inception V3 model](https://arxiv.org/abs/1512.00567) developed at Google, and pre-trained on [ImageNet](http://image-net.org/), a large dataset of web images (1.4M images and 1000 classes). This is a powerful model; let's see what the features that it has learned can do for our cat vs. dog problem.

First, we need to pick which intermediate layer of Inception V3 we will use for feature extraction. A common practice is to use the output of the very last layer before the `Flatten` operation, the so-called "bottleneck layer." The reasoning here is that the following fully connected layers will be too specialized for the task the network was trained on, and thus the features learned by these layers won't be very useful for a new task. The bottleneck features, however, retain much generality.

Let's instantiate an Inception V3 model preloaded with weights trained on ImageNet:


### Preparing the pre-trained Model

In [None]:
!wget --no-check-certificate \
    https://storage.googleapis.com/mledu-datasets/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5 \
    -O /tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5

--2019-11-24 22:19:35--  https://storage.googleapis.com/mledu-datasets/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5
Resolving storage.googleapis.com (storage.googleapis.com)... 74.125.142.128, 2607:f8b0:400e:c08::80
Connecting to storage.googleapis.com (storage.googleapis.com)|74.125.142.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 87910968 (84M) [application/x-hdf]
Saving to: ‘/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5’


2019-11-24 22:19:36 (99.3 MB/s) - ‘/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5’ saved [87910968/87910968]



In [None]:
# TODO: Restore the pre-trained model

<details>
<summary markdown='span'>Hints
</summary>
Look in `tensorflow.keras.applications.inception_v3` to find the InceptionV3 architecture.
</details>

<details>
<summary markdown='span'>View solution
</summary>

```python
from tensorflow.keras.applications.inception_v3 import InceptionV3

local_weights_file = '/tmp/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5'
# By specifying the include_top=False argument, we load a network that
# doesn't include the classification layers at the top—ideal for feature extraction.
pre_trained_model = InceptionV3(
    input_shape=(150, 150, 3), include_top=False, weights=None)
pre_trained_model.load_weights(local_weights_file)
```
</details>

Let's make the model non-trainable, since we will only use it for feature extraction; we won't update the weights of the pretrained model during training.

In [None]:
# TODO: make the layers not trainable

<details>
<summary markdown='span'>Hints
</summary>
A Layer object has an attribute `trainable`
</details>

<details>
<summary markdown='span'>View solution
</summary>

```python
for layer in pre_trained_model.layers:
  layer.trainable = False
```
</details>
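
To see what freezing does, here is a minimal sketch on a tiny stand-in network rather than Inception V3. The names `toy_base`, `toy_conv` and `toy_head` are made up for illustration. It assumes that the weights of a frozen layer are reported under `non_trainable_weights`, so only the new classifier contributes to `trainable_weights`.

```python
from tensorflow.keras import layers, Model

# Tiny stand-in for a pre-trained base (hypothetical, randomly initialized)
inputs = layers.Input(shape=(8, 8, 3))
features = layers.Conv2D(4, 3, activation='relu', name='toy_conv')(inputs)
base = Model(inputs, features, name='toy_base')

# Freeze every layer of the base, as in the solution above
for layer in base.layers:
    layer.trainable = False

# Add a new, trainable classifier on top of the frozen base
head = layers.Flatten()(base.output)
head = layers.Dense(1, activation='sigmoid', name='toy_head')(head)
model = Model(base.input, head)

# Only the Dense kernel and bias remain trainable
print('trainable weight tensors:', len(model.trainable_weights))
print('frozen weight tensors:', len(model.non_trainable_weights))
```

The same check (`len(model.trainable_weights)`) can be run on the real model once the classifier has been added, to confirm that only the new layers will be updated.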

The layer we will use for feature extraction in Inception v3 is called `mixed7`. It is not the bottleneck of the network, but we are using it to keep a sufficiently large feature map (7x7 in this case). (Using the bottleneck layer would have resulted in a 3x3 feature map, which is a bit small.) Let's get the output from `mixed7`:

In [None]:
# TODO: get the output of the layer named mixed7 in the pre_trained model

<details>
<summary markdown='span'>Hints
</summary>
There is an object method called `get_layer(layer_name)` for a Keras Model object
</details>

<details>
<summary markdown='span'>View solution
</summary>

```python
last_layer = pre_trained_model.get_layer('mixed7')
last_output = last_layer.output
# Read the shape from the output tensor; this works across Keras versions
print('last layer output shape:', last_output.shape)
```
</details>
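
The 7x7 and 3x3 feature-map sizes can be checked by hand. The sketch below applies the usual output-size formula for unpadded ("valid") convolutions and pooling to a 150-pixel input. The list of size-changing steps is an assumption based on a reading of the Keras Inception V3 implementation, not something stated in this notebook. Layers that use "same" padding with stride 1 are omitted because they keep the size unchanged.

```python
def valid_out(size, kernel, stride=1):
    """Output size along one dimension for an unpadded conv or pooling layer."""
    return (size - kernel) // stride + 1

# (description, kernel, stride) for the steps that change the spatial size.
# Assumed from the Keras Inception V3 code; treat as illustrative.
steps = [
    ('stem conv 3x3, stride 2', 3, 2),
    ('stem conv 3x3', 3, 1),
    ('stem max-pool 3x3, stride 2', 3, 2),
    ('stem conv 3x3', 3, 1),
    ('stem max-pool 3x3, stride 2', 3, 2),
    ('mixed3 reduction, stride 2 (size kept through mixed7)', 3, 2),
    ('mixed8 reduction, stride 2 (size kept through the bottleneck)', 3, 2),
]

size = 150
trace = []
for description, kernel, stride in steps:
    size = valid_out(size, kernel, stride)
    trace.append((description, size))
    print(f'{description}: {size}x{size}')
```

Under these assumptions the size goes 150, 74, 72, 35, 33, 16, 7, 3, which matches the 7x7 map at `mixed7` and the 3x3 map at the bottleneck.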

### Building the Model


Now let's build our model architecture with a fully connected classifier on top of the output of our pre-trained model and train it for only a few epochs (2-5), since only the classification layers need to be optimized.

In [None]:
# TODO: Use the Keras Functional API to create the classification layers on top
# of last_output and instantiate a Keras model


<details>
<summary markdown='span'>Hints
</summary>
You just need to apply the layers one after another with the Keras Functional API, like this:

```python
x = myLayer(previous_output)
x = myLayer(x)
...
model = Model(pre_trained_model.input, x)
```


</details>

<details>
<summary markdown='span'>View solution
</summary>

```python
# Flatten the output layer to 1 dimension
x = layers.Flatten()(last_output)
# Add a fully connected layer with 1,024 hidden units and ReLU activation
x = layers.Dense(1024, activation='relu')(x)
# Add a dropout rate of 0.2
x = layers.Dropout(0.2)(x)
# Add a final sigmoid layer for classification
x = layers.Dense(1, activation='sigmoid')(x)

# Configure and compile the model
model = Model(pre_trained_model.input, x)
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
```
</details>
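
To get a feel for how large this classifier is, the following sketch counts its parameters by hand. It assumes the `mixed7` output has shape 7x7x768 for 150x150 inputs. The 768 channels are an assumption, not stated in the text, so check it against the shape printed earlier.

```python
# Assumed shape of the mixed7 feature map for 150x150 inputs
height, width, channels = 7, 7, 768

flat_units = height * width * channels          # size after Flatten
dense_1024_params = flat_units * 1024 + 1024    # weights + biases
dense_1_params = 1024 * 1 + 1                   # weights + bias
classifier_params = dense_1024_params + dense_1_params  # Dropout has no parameters

print('flattened features:', flat_units)
print('classifier parameters:', classifier_params)
```

With these assumptions the head has about 38.5 million trainable parameters, almost all of them in the first `Dense` layer. You can compare this with the "Trainable params" line of `model.summary()`.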

In [None]:
# TODO: train the model for 3 epochs

<details>
<summary markdown='span'>View solution
</summary>

```python
num_epochs = 3

# model.fit accepts generators directly; fit_generator is deprecated in TF 2.x
history = model.fit(
      train_generator,
      steps_per_epoch=train_generator.samples // batch_size,  # 1600 training images = batch_size * steps
      epochs=num_epochs,
      validation_data=validation_generator,
      validation_steps = validation_generator.samples // batch_size,
      verbose=2)
```
</details>
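
As a quick sanity check on the step counts, this sketch redoes the arithmetic with the image counts reported by the generators earlier (1600 training and 400 validation images) and the batch size of 20.

```python
batch_size = 20
train_samples = 1600       # "Found 1600 images" for the training subset
validation_samples = 400   # "Found 400 images" for the validation subset

steps_per_epoch = train_samples // batch_size
validation_steps = validation_samples // batch_size

print('steps_per_epoch:', steps_per_epoch)
print('validation_steps:', validation_steps)

# Integer division drops any incomplete final batch; here nothing is dropped
images_skipped = train_samples - steps_per_epoch * batch_size
print('training images skipped per epoch:', images_skipped)
```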

### Evaluation

In [None]:
# TODO: evaluate on the test set and plot how your metrics evolve during training to check for overfitting

<details>
<summary markdown='span'>View solution
</summary>

```python
test_loss, accuracy_test_nn = model.evaluate(test_generator, verbose=2)
print(f'Test model accuracy: {accuracy_test_nn}')

history_df = pd.DataFrame(history.history).reset_index().rename(columns={'index': 'epochs'})
history_df.tail()

fig, axes = plt.subplots(2, 1, figsize=(14, 12))

for i, metric in enumerate(['loss', 'accuracy']):
  ax = axes[i]
  history_df.plot('epochs', f'{metric}', color='g', label='train', ax=ax)
  history_df.plot('epochs', f'val_{metric}', color='r', label='val', ax=ax)
  ax.set_ylabel(metric)
plt.show()
```
</details>

You can see that we reach a validation accuracy of 88–90% very quickly. This is much better than the small model we trained from scratch.
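
Besides reading the plots, one simple way to look for overfitting is to track the gap between training and validation accuracy per epoch. The helper below is a hypothetical sketch, not part of Keras. The numbers passed to it are made-up placeholders used only to exercise the function, not results from this notebook. In practice you would pass `history.history`.

```python
def generalization_gap(history_dict):
    """Per-epoch difference between training and validation accuracy."""
    return [train - val for train, val in
            zip(history_dict['accuracy'], history_dict['val_accuracy'])]

# Placeholder values for illustration only (not real training results)
placeholder_history = {
    'accuracy': [0.80, 0.90, 0.95],
    'val_accuracy': [0.78, 0.85, 0.84],
}

gaps = generalization_gap(placeholder_history)
# A gap that grows every epoch suggests the model is starting to overfit
widening = all(later > earlier for earlier, later in zip(gaps, gaps[1:]))

print('gaps:', [round(g, 2) for g in gaps])
print('gap widening every epoch:', widening)
```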

## Fine-Tuning

In our feature-extraction experiment, we only tried adding two classification layers on top of an Inception V3 layer. The weights of the pretrained network were not updated during training. One way to increase performance even further is to "fine-tune" the weights of the top layers of the pretrained model alongside the training of the top-level classifier. A couple of important notes on fine-tuning:

- **Fine-tuning should only be attempted *after* you have trained the top-level classifier with the pretrained model set to non-trainable**. If you add a randomly initialized classifier on top of a pretrained model and attempt to train all layers jointly, the magnitude of the gradient updates will be too large (due to the random weights from the classifier), and your pretrained model will just forget everything it has learned.
- Additionally, we **fine-tune only the *top layers* of the pre-trained model** rather than all layers of the pretrained model because, in a convnet, the higher up a layer is, the more specialized it is. The first few layers in a convnet learn very simple and generic features, which generalize to almost all types of images. But as you go higher up, the features are increasingly specific to the dataset that the model is trained on. The goal of fine-tuning is to adapt these specialized features to work with the new dataset.
- When fine-tuning, your weights are already close to the end of their optimization, so use an optimizer with a small **learning rate** to avoid modifying them too much (remember, we call this **fine**-tuning).
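
The effect of the learning rate can be illustrated with a toy example: plain gradient descent on a one-dimensional quadratic loss, starting from a "pre-trained" weight that is already close to the optimum. Everything here (the loss, the curvature, the two learning rates) is an assumption chosen for illustration and has nothing to do with the real Inception V3 loss surface.

```python
def sgd_steps(weight, target, curvature, learning_rate, n_steps):
    """Gradient descent on loss = 0.5 * curvature * (weight - target)**2."""
    for _ in range(n_steps):
        gradient = curvature * (weight - target)
        weight = weight - learning_rate * gradient
    return weight

pretrained_weight = 1.0   # already close to the optimum for the new task
target = 1.1              # optimum for the new task (toy value)
curvature = 50.0          # steepness of the toy loss

large_lr_result = sgd_steps(pretrained_weight, target, curvature, 0.05, 10)
small_lr_result = sgd_steps(pretrained_weight, target, curvature, 0.001, 10)

print('start distance to optimum:', abs(pretrained_weight - target))
print('after 10 steps, large lr :', abs(large_lr_result - target))
print('after 10 steps, small lr :', abs(small_lr_result - target))
```

In this toy setting the large learning rate overshoots on every step and ends up further from the optimum than where it started, which is the one-dimensional analogue of destroying the pre-trained weights. The small learning rate moves steadily closer.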

All we need to do to implement fine-tuning is to set the top layers of Inception V3 to be trainable, recompile the model (necessary for these changes to take effect), and resume training. Let's unfreeze all layers belonging to the `mixed7` module—i.e., all layers found after `mixed6`—and recompile the model:

In [None]:
# TODO: Print the layers present in the pre-trained model

<details>
<summary markdown='span'>View solution
</summary>

```python
print([layer.name for layer in pre_trained_model.layers])
```
</details>

In [None]:
# TODO: Set all the layers after mixed6 to trainable and compile your model again
# (but don't re-create it, since you want to keep your weight values)
# with a suitable optimizer.

<details>
<summary markdown='span'>Hints
</summary>
Use stochastic gradient descent (see `tensorflow.keras.optimizers.SGD`).
</details>

<details>
<summary markdown='span'>View solution
</summary>

```python
from tensorflow.keras.optimizers import SGD

unfreeze = False

# Unfreeze all layers after "mixed6"
for layer in pre_trained_model.layers:
  if unfreeze:
    layer.trainable = True
  if layer.name == 'mixed6':
    unfreeze = True

# As an optimizer, here we will use SGD 
# with a very low learning rate (0.00001)
model.compile(loss='binary_crossentropy',
              optimizer=SGD(
                  learning_rate=0.00001,  # `lr` is a deprecated alias
                  momentum=0.9),
              metrics=['accuracy'])
```
</details>
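
The flag-based loop is easy to misread, so here is a sketch of the same logic on stand-in objects. `ToyLayer` and the layer names other than `mixed6` and `mixed7` are hypothetical placeholders, not real Inception V3 layer names. The point is that the boundary layer itself stays frozen and only the layers listed after it become trainable.

```python
from dataclasses import dataclass

@dataclass
class ToyLayer:
    """Hypothetical stand-in for a Keras layer (name + trainable flag)."""
    name: str
    trainable: bool = False

def unfreeze_after(layers, boundary_name):
    """Mark every layer listed after `boundary_name` as trainable."""
    unfreeze = False
    for layer in layers:
        if unfreeze:
            layer.trainable = True
        if layer.name == boundary_name:
            unfreeze = True
    return [layer.name for layer in layers if layer.trainable]

toy_layers = [ToyLayer(name) for name in
              ['early_conv', 'mixed5', 'mixed6', 'late_conv', 'mixed7']]

unfrozen = unfreeze_after(toy_layers, 'mixed6')
print('trainable layers:', unfrozen)
```

On the real model, you can run a similar check by printing `[layer.name for layer in pre_trained_model.layers if layer.trainable]` after the loop.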

In [None]:
# TODO: train the model for 30 epochs

<details>
<summary markdown='span'>View solution
</summary>

```python
num_epochs = 30

history = model.fit(
      train_generator,
      steps_per_epoch=train_generator.samples // batch_size,  
      epochs=num_epochs,
      validation_data=validation_generator,
      validation_steps = validation_generator.samples // batch_size,
      verbose=2)
```
</details>

In [None]:
# TODO: evaluate on the test set and plot how your metrics evolve during
# training to check for overfitting

<details>
<summary markdown='span'>View solution
</summary>

```python

test_loss, accuracy_test_nn = model.evaluate(test_generator, verbose=2)
print(f'Test model accuracy: {accuracy_test_nn}')


history_df = pd.DataFrame(history.history).reset_index().rename(columns={'index': 'epochs'})
history_df.tail()

fig, axes = plt.subplots(2, 1, figsize=(14, 12))

for i, metric in enumerate(['loss', 'accuracy']):
  ax = axes[i]
  history_df.plot('epochs', f'{metric}', color='g', label='train', ax=ax)
  history_df.plot('epochs', f'val_{metric}', color='r', label='val', ax=ax)
  ax.set_ylabel(metric)
plt.show()
```
</details>

Note your observations