In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow

In [None]:
#Let's unzip the training data
!unzip -qq /kaggle/input/dogs-vs-cats/train.zip

In [None]:
import os, shutil, pathlib 

In [None]:
#We'll use these directories to create our training, validation and test sets. 

original_dir = pathlib.Path("/kaggle/working/train") #where our unzipped data resides
new_base_dir = pathlib.Path("/kaggle/working/cats_vs_dogs_small") #where we want to store our smaller dataset


In [None]:
#Function to create our datasets 

def make_subset(subset_name, start_index, end_index):
    for category in ("cat", "dog"):
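        subset_dir = new_base_dir / subset_name / category  #avoid shadowing the built-in dir()
        os.makedirs(subset_dir, exist_ok=True)  #lets the cell be re-run without an error
        fnames = [f"{category}.{i}.jpg" for i in range(start_index, end_index)]
        for fname in fnames:
            shutil.copyfile(src=original_dir / fname,
                            dst=subset_dir / fname)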
    

In [None]:
#Now we can use our make_subset function to create our datasets
#We now have 2000 training, 1000 validation and 2000 testing images.
make_subset("train", start_index=0, end_index=1000)
make_subset("validation", start_index=1000, end_index=1500)
make_subset("test", start_index=1500, end_index=2500)

In [None]:
from tensorflow import keras
from tensorflow.keras import layers

In [None]:
#Time to build our ConvNet
inputs = keras.Input(shape = (180,180,3)) #Our model will now expect RGB images of size 180x180 pixels.
x = layers.Rescaling(1./255)(inputs) #rescaling our images from 0-255 to 0-1
x = layers.Conv2D(filters = 32, kernel_size = 3, activation = "relu")(x)
x = layers.MaxPooling2D(pool_size = 2)(x)
x = layers.Conv2D(filters = 64, kernel_size = 3, activation = "relu")(x)
x = layers.MaxPooling2D(pool_size = 2)(x)
x = layers.Conv2D(filters = 128, kernel_size = 3, activation = "relu")(x)
x = layers.MaxPooling2D(pool_size = 2)(x)
x = layers.Conv2D(filters = 256, kernel_size = 3, activation = "relu")(x)
x = layers.MaxPooling2D(pool_size = 2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x) 
x = layers.Flatten()(x)

outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

The code above defines a convolutional neural network (CNN) using the Keras library. The CNN takes images of size 180x180x3 as inputs, where 3 is the number of color channels (RGB).

The model consists of a series of layers that transform the input image through a series of convolutions, pooling, and flattening operations, and finally a dense layer that produces a single output value that indicates whether the input image belongs to a specific class or not.

The Rescaling layer scales the pixel values of the input image by a factor of 1/255, so that the pixel values are between 0 and 1. This is a common preprocessing step that helps the model learn more effectively.
Scaling every image to the same range [0, 1] makes images contribute more evenly to the total loss. Without scaling, images with a larger pixel range would have a disproportionate say in how the weights are updated.

The next series of layers consist of Conv2D and MaxPooling2D layers, which perform convolution and pooling operations on the input image respectively. Convolutional layers apply a set of filters to the input image, extracting features that are important for classification. MaxPooling layers downsample the feature maps obtained from convolutional layers, reducing the size of the input and making the network more computationally efficient.

After several convolutional and pooling layers, the last Conv2D layer outputs a feature map of size (7, 7, 256): each unpadded 3 × 3 convolution trims 2 pixels from the height and width, and each pooling layer halves them (180 → 178 → 89 → 87 → 43 → 41 → 20 → 18 → 9 → 7). This is then flattened into a 1D array using the Flatten layer, which is then fed into a fully connected (Dense) layer with a single output unit and a sigmoid activation function. The output value of the model is a probability between 0 and 1, indicating the likelihood that the input image belongs to the target class.

Overall, this model architecture is designed for binary classification tasks, where the output of the model indicates whether the input image belongs to a specific class or not.
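
The shape bookkeeping described above can be checked with a few lines of plain Python. This is a minimal sketch rather than anything Keras runs internally: it assumes the Keras defaults used in the model (unpadded "valid" convolutions with stride 1, and 2 × 2 max pooling with stride 2), and the helper names are made up for illustration.

```python
def conv_output_size(size, kernel_size=3):
    """Side length after an unpadded ('valid') convolution with stride 1."""
    return size - kernel_size + 1


def pool_output_size(size, pool_size=2):
    """Side length after max pooling whose stride equals the pool size."""
    return size // pool_size


def trace_spatial_sizes(input_size=180, n_conv=5):
    """Follow one side of the image through the conv/pool stack.

    Every Conv2D is followed by MaxPooling2D except the last one,
    mirroring the model defined in the notebook.
    """
    sizes = []
    size = input_size
    for i in range(n_conv):
        size = conv_output_size(size)
        sizes.append(("conv", size))
        if i < n_conv - 1:
            size = pool_output_size(size)
            sizes.append(("pool", size))
    return sizes


sizes = trace_spatial_sizes()
for kind, size in sizes:
    print(f"{kind}: {size} x {size}")

final_size = sizes[-1][1]
flattened_length = final_size * final_size * 256  # 256 filters in the last Conv2D
print("flattened length:", flattened_length)
```

Comparing the printed sizes with the output of `model.summary()` is a quick way to confirm that the model was built as intended.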

In [None]:
#Let's have a look at the model structure
model.summary()

The kernel size of a convolutional layer determines the region of its input that each filter "sees" at once. In this model the kernel size is fixed at 3 × 3 in every Conv2D layer; what grows from layer to layer is the number of filters. Because each pooling step halves the feature map, a 3 × 3 window in a deeper layer covers a progressively larger region of the original image, so deeper layers can capture more complex, higher-level features.

In the given code, the CNN starts with 32 filters in the first Conv2D layer and doubles the count in each of the next three layers (64, 128, 256), with the final Conv2D layer staying at 256. This progressive increase in depth, while the feature maps shrink, lets the model represent a wider variety of abstract features as the input moves through the layers.

Moreover, a larger number of filters can help the model learn more discriminative features for the classification task, as the higher-level filters can capture more complex patterns that are specific to the target class. However, increasing the number of filters also increases the number of model parameters, which makes the model more expensive to train and more prone to overfitting if it is not properly regularized.
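
To make the cost of adding filters concrete, here is a small sketch that counts the parameters of each layer by hand. It assumes 3 × 3 kernels with one bias per filter, and a flattened feature map of 7 × 7 × 256 feeding the final Dense layer (what unpadded convolutions and 2 × 2 pooling give for a 180 × 180 input). The function name is hypothetical.

```python
def conv2d_params(in_channels, filters, kernel_size=3):
    """Weights plus biases of a Conv2D layer: (k * k * in + 1) * out."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


filters_per_layer = [32, 64, 128, 256, 256]

conv_params = []
in_channels = 3  # RGB input
for filters in filters_per_layer:
    conv_params.append(conv2d_params(in_channels, filters))
    in_channels = filters  # the next layer sees this layer's output channels

# Dense(1) on the flattened 7 x 7 x 256 feature map: one weight per input plus a bias.
dense_params = 7 * 7 * 256 * 1 + 1

total_params = sum(conv_params) + dense_params
for filters, count in zip(filters_per_layer, conv_params):
    print(f"Conv2D with {filters} filters: {count} parameters")
print("Dense(1):", dense_params)
print("Total:", total_params)
```

Most of the parameters sit in the last two convolutional layers, which is where the filter count is highest.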






In [None]:
#Let's compile the model
model.compile(loss = "binary_crossentropy", 
             optimizer = "rmsprop", 
             metrics = ["accuracy"])

Currently, our images are JPEG files on disk. We need to convert them into floating-point tensors before we can feed them into the model. Here's what we'll have to do, step by step:

1. Read the picture files.
2. Decode the JPEG content to RGB grids of pixels.
3. Convert these into floating-point tensors.
4. Resize them to a shared size (we’ll use 180 × 180).
5. Pack them into batches (we’ll use batches of 32 images).

Keras has built-in functionality that performs these steps for you: the `image_dataset_from_directory()` function, which turns image files on disk into batches of preprocessed tensors, ready to be fed into a neural network.

The function assumes that your directory contains a subdirectory for each class, and that the name of each subdirectory is the name of the class. For example, if you have a directory of images of cats and dogs, you would organize your directory like this:


`dataset/
    cats/
        cat1.jpg
        cat2.jpg
        ...
    dogs/
        dog1.jpg
        dog2.jpg
        ... `
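
The way labels fall out of such a layout can be sketched with the standard library alone. This is an illustration, not the Keras implementation: it assumes that class names are taken from the subdirectory names in sorted order and mapped to integer labels 0, 1, and so on. The `infer_labels` helper and the temporary directory are made up for the example.

```python
import pathlib
import tempfile


def infer_labels(root):
    """Map each image file to an integer label derived from its parent folder."""
    root = pathlib.Path(root)
    # Sorted subdirectory names become the class names; their index is the label.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for index, name in enumerate(class_names):
        for path in sorted((root / name).glob("*.jpg")):
            samples.append((path.name, index))
    return class_names, samples


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny stand-in for the dataset/ layout using empty files.
    layout = {"dogs": ["dog1.jpg", "dog2.jpg"], "cats": ["cat1.jpg", "cat2.jpg"]}
    for name, files in layout.items():
        class_dir = pathlib.Path(tmp) / name
        class_dir.mkdir()
        for fname in files:
            (class_dir / fname).touch()
    class_names, samples = infer_labels(tmp)

print(class_names)
print(samples)
```

With this convention "cats" gets label 0 and "dogs" gets label 1, regardless of the order in which the folders were created.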

In [None]:
#Let's get started by importing the method
from tensorflow.keras.utils import image_dataset_from_directory

In [None]:
#We'll use this method to create our respective datasets
train_dataset = image_dataset_from_directory(
            new_base_dir / "train",
            image_size=(180, 180),
            batch_size=32)

validation_dataset = image_dataset_from_directory(
            new_base_dir / "validation",
            image_size=(180, 180),
            batch_size=32)

test_dataset = image_dataset_from_directory(
            new_base_dir / "test",
            image_size=(180, 180),
            batch_size=32)

`import tensorflow as tf`

`train_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    'dataset',
    validation_split=0.2,
    subset='training',
    seed=123,
    image_size=(128, 128),
    batch_size=32)
`

In this example, the image_dataset_from_directory() function will read in all the images in the dataset directory and create a dataset of (image, label) pairs, where the label is inferred from the subdirectory name. The validation_split parameter specifies the fraction of the dataset to use for validation, and the subset parameter specifies whether to use the training or validation subset of the data. The image_size parameter specifies the size of the images to be returned by the dataset, and the batch_size parameter specifies the size of the batches to use during training.

"TensorFlow makes available the **tf.data** API to create efficient input pipelines for machine learning models. Its core class is **tf.data.Dataset**.
A Dataset object is an iterator: you can use it in a for loop. It will typically return batches of input data and labels. You can pass a Dataset object directly to the **fit()** method of a Keras model.
The Dataset class handles many key features that would otherwise be cumbersome to implement yourself—in particular, asynchronous data prefetching (preprocessing the next batch of data while the previous one is being handled by the model, which keeps execution flowing without interruptions)."
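
To get a feel for what iterating over a batched dataset yields, the sketch below walks over index ranges in plain Python. It does not use `tf.data`. It only mirrors the bookkeeping, assuming 2000 training images, a batch size of 32, and that the final partial batch is kept rather than dropped.

```python
import math


def iter_batches(num_samples, batch_size):
    """Yield (start, stop) index ranges, one per batch."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)


batch_sizes = [stop - start for start, stop in iter_batches(2000, 32)]
num_batches = len(batch_sizes)

print("batches per epoch:", num_batches)
print("size of the last batch:", batch_sizes[-1])
```

Under these assumptions the last batch is smaller than the others, which is why the first dimension of a batch should not be assumed to always equal `batch_size`.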

In [None]:
#Let's view the shape of our batches, and labels for a bit more clarity. 

for data_batch, labels_batch in train_dataset:
    print(f'data batch shape: {data_batch.shape}')
    print(f'labels batch shape: {labels_batch.shape}')
    break 

In [None]:
#Time to set our Keras callbacks; note how they're passed in a list.

callbacks = [
    keras.callbacks.ModelCheckpoint(
    filepath = 'convnet_from_scratch.keras',  #must match the name we load later
    save_best_only = True,
    monitor = "val_loss")
]

#When passed to the "fit" method, this callback saves the model to file whenever "val_loss" improves, so the file always holds the version with the lowest "val_loss" seen so far.
#This saves us from having to retrain the model for exactly the number of epochs that gave the best result (the lowest val_loss).
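
The save-only-when-better behaviour described in the comments above can be sketched in plain Python. This is a simplified illustration of the idea, not the Keras source, and the validation losses are made-up numbers rather than results from this notebook.

```python
def best_epoch_checkpoints(val_losses):
    """Return the 1-based epochs at which a 'save best only' checkpoint would be written."""
    best = float("inf")
    saved_at = []
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best:  # only overwrite the file when val_loss improves
            best = val_loss
            saved_at.append(epoch)
    return saved_at, best


# Hypothetical per-epoch validation losses, for illustration only.
made_up_val_losses = [0.69, 0.62, 0.58, 0.60, 0.55, 0.57, 0.63]
saved_at, best = best_epoch_checkpoints(made_up_val_losses)

print("checkpoint written at epochs:", saved_at)
print("val_loss of the saved model:", best)
```

The file on disk therefore corresponds to the last epoch in that list, even though training continued afterwards.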

In [None]:
#Now we use the fit method on our dataset object.
#We'll use the validation_dataset object we created above to gauge model fitness

history = model.fit(
        train_dataset, 
        epochs = 30,
        validation_data = validation_dataset,  #the keyword is validation_data, not validation
        callbacks = callbacks)


In [None]:
#Let's plot our loss and accuracy of our model per epoch

accuracy = history.history["accuracy"]
val_accuracy = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(accuracy) + 1)
plt.plot(epochs, accuracy, "bo", label="Training accuracy")
plt.plot(epochs, val_accuracy, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.legend()
plt.figure()
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()
plt.show()

"These plots are characteristic of overfitting. The training accuracy increases linearly over time, until it reaches nearly 100%, whereas the validation accuracy peaks at 75%. The validation loss reaches its minimum after only ten epochs and then stalls, whereas the training loss keeps decreasing linearly as training proceeds."
Let’s check the test accuracy. We’ll reload the model from its saved file to evaluate it as it was before it started overfitting.

In [None]:
test_model = keras.models.load_model("convnet_from_scratch.keras") 
test_loss, test_acc = test_model.evaluate(test_dataset) 
print(f"Test accuracy: {test_acc:.3f}")

## Data augmentation

To get the most out of our input data, which in the real world may be limited, we can use data augmentation. This is a technique applied to the input data before it is fed into the model. Data augmentation in effect remixes the input data. Keras provides many built-in options: rotation, flip, zoom, etc. You can also choose how strongly each transformation is applied.

"Given infinite data, your model would be exposed to every possible aspect of the data distribution at hand: you would never overfit. Data augmentation takes the approach of generating more training data from existing training samples by augmenting the samples via a number of random transformations that yield believable-looking images. The goal is that, at training time, your model will never see the exact same picture twice. This helps expose the model to more aspects of the data so it can generalize better.
In Keras, this can be done by adding a number of data augmentation layers at the start of your model." Deep Learning with Python (François Chollet).

In [None]:
#Let's first create our data augmentation layer
data_augmentation = keras.Sequential(
            [
                layers.RandomFlip("horizontal"),
                layers.RandomRotation(0.1),
                layers.RandomZoom(0.2),
            ] )

- RandomFlip("horizontal")—Applies horizontal flipping to a random 50% of the images that go through it
- RandomRotation(0.1)—Rotates the input images by a random value in the range [–10%, +10%] (these are fractions of a full circle—in degrees, the range would be [–36 degrees, +36 degrees])
- RandomZoom(0.2)—Zooms in or out of the image by a random factor in the range [-20%, +20%]
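
Two of these transformations are easy to illustrate with NumPy. The sketch below is not how the Keras layers are implemented. It only shows what a rotation factor means in degrees and what a horizontal flip does to an image array laid out as (height, width, channels). The helper name is made up.

```python
import numpy as np


def rotation_range_degrees(factor):
    """Convert a rotation factor (fraction of a full circle) to a degree range."""
    return -factor * 360.0, factor * 360.0


low, high = rotation_range_degrees(0.1)
print(f"rotation range: [{low:.0f}, {high:.0f}] degrees")

# A tiny 2 x 2 RGB "image" with distinct values so the flip is visible.
image = np.arange(12).reshape(2, 2, 3)
flipped = image[:, ::-1, :]  # reverse the width axis = horizontal flip

print("left column before flip:", image[:, 0, :].tolist())
print("left column after flip: ", flipped[:, 0, :].tolist())
```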


In [None]:
#We can view a grid of the same image from our training set, "augmented" as follows:
#(We can use take(N) to only sample N batches from the dataset. This is equivalent to inserting a break in the loop after the Nth batch)
#The loop below will display the first image, for 9 iterations. 
plt.figure(figsize=(10, 10))
for images, _ in train_dataset.take(1):
    for i in range(9):
        augmented_images = data_augmentation(images)
        ax = plt.subplot(3, 3, i + 1) 
        plt.imshow(augmented_images[0].numpy().astype("uint8")) 
        plt.axis("off")

- "If we train a new model using this data-augmentation configuration, the model will never see the same input twice. But the inputs it sees are still heavily intercorrelated because they come from a small number of original images—we can’t produce new information; we can only remix existing information. As such, this may not be enough to completely get rid of overfitting. To further fight overfitting, we’ll also add a Dropout layer to our model right before the densely connected classifier.
- One last thing you should know about random image augmentation layers: just like Dropout, they’re inactive during inference (when we call predict() or evaluate()). During evaluation, our model will behave just the same as when it did not include data augmentation and dropout. " (DLWP, Chollet)
- Data augmentation also allows you to train your ConvNet for more epochs, as you'd expect it to start overfitting much later in the training procedure.
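
The point about layers being inactive at inference can be illustrated with a small NumPy version of dropout. This is a sketch under the usual "inverted dropout" assumption (surviving activations are scaled up by 1 / (1 - rate) during training) and is not the Keras layer itself.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Zero a random fraction of x while training; pass x through unchanged otherwise."""
    if not training:
        return x  # inference: the layer does nothing
    keep_mask = rng.random(x.shape) >= rate
    # Scale the kept values so the expected sum of activations stays the same.
    return x * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
features = np.ones((4, 8))

train_out = dropout(features, rate=0.5, training=True, rng=rng)
infer_out = dropout(features, rate=0.5, training=False, rng=rng)

print("values seen during training:", np.unique(train_out).tolist())
print("unchanged at inference:", np.array_equal(infer_out, features))
```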

In [None]:
#Let's now include data augmentation in our original convnet
inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)
x = layers.Rescaling(1./255)(x)
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x) 
model = keras.Model(inputs=inputs, outputs=outputs)

model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="convnet_from_scratch_with_augmentation.keras",
        save_best_only=True,
        monitor="val_loss")
]
history = model.fit(
    train_dataset,
    epochs=100,
    validation_data=validation_dataset,
    callbacks=callbacks)

In [None]:
#Let's look at the loss and accuracy again

import matplotlib.pyplot as plt
accuracy = history.history["accuracy"]
val_accuracy = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(accuracy) + 1)
plt.plot(epochs, accuracy, "bo", label="Training accuracy")
plt.plot(epochs, val_accuracy, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.legend()
plt.figure()
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()
plt.show()

Thanks to data augmentation and dropout, we start overfitting much later, around epochs 60–70 (compared to epoch 10 for the original model).
The validation accuracy ends up consistently in the 80–85% range, a big improvement over our first try.

In [None]:
#Let's evaluate the model with integrated data augmentation on the test set
test_model = keras.models.load_model("convnet_from_scratch_with_augmentation.keras")

test_loss, test_acc = test_model.evaluate(test_dataset)

print(f"test accuracy: {test_acc:.3f}") 

#As we have limited data, it'll be difficult to get a higher performance. 
#This is where pre-trained models come in.

## Pre-Trained Models for better performance

- Pretrained models are usually trained on very large datasets, in the hope that the parameters they learn will be useful in other tasks, in this case image classification. Being trained on a large dataset should help the model learn spatial hierarchies of features that apply to new, unseen images.
- We'll use the "VGG16" pretrained model. It was trained on the large ImageNet dataset, which contains 1.4 million images of animals and objects across 1,000 different classes.
- As it contains animals, the features it learned should help identify the cats and dogs in our dataset.
- This is a huge benefit of deep learning: the ability to repurpose pre-trained models for more specific tasks.

Pretrained models can be used in 2 ways: Feature extraction and Fine-tuning. 
- In feature extraction, the pretrained model is used to extract "interesting" features, captured via the parameters it learned when it was trained (on a large dataset). These extracted features can then be used as input to a new model that is trained from scratch.
- Features are extracted via the pre-trained model's convolutional base.
- As you saw previously, convnets used for image classification comprise two parts: they start with a series of pooling and convolution layers, and they end with a densely connected classifier. The first part is called the convolutional base of the model. In the case of convnets, feature extraction consists of taking the convolutional base of a previously trained network, running the new data through it, and training a new classifier on top of the output (see figure 8.12).
- "Why only reuse the convolutional base? Could we reuse the densely connected classifier as well? In general, doing so should be avoided. The reason is that the representations learned by the convolutional base are likely to be more generic and, therefore, more reusable: the feature maps of a convnet are presence maps of generic concepts over a picture, which are likely to be useful regardless of the computer vision problem at hand. But the representations learned by the classifier will necessarily be specific to the set of classes on which the model was trained: they will only contain information about the presence probability of this or that class in the entire picture. Additionally, representations found in densely connected layers no longer contain any information about where objects are located in the input image; these layers get rid of the notion of space, whereas the object location is still described by convolutional feature maps. For problems where object location matters, densely connected features are largely useless.
Note that the level of generality (and therefore reusability) of the representations extracted by specific convolution layers depends on the depth of the layer in the model. Layers that come earlier in the model extract local, highly generic feature maps (such as visual edges, colors, and textures), whereas layers that are higher up extract more-abstract concepts (such as “cat ear” or “dog eye”). So if your new dataset differs a lot from the dataset on which the original model was trained, you may be better off using only the first few layers of the model to do feature extraction, rather than using the entire convolutional base."

## There are 2 methods available for using a pretrained convolutional base for feature-extraction. 

- The first consists of freezing the weights of the convolutional base, passing our data through the base, and using the extracted features as input to a new dense layer. This method is fast, as the images are only run through the base once, and we only need to train the final dense layers we appended. However, this method does not allow for data-augmentation. 
- The second method consists of extending the convolutional base, again freezing its weights, but adding data-augmentation layers in front of it. This is more computationally expensive, as it requires running the whole model end to end during training.

In [None]:
#Setting up our convolutional base
conv_base = keras.applications.vgg16.VGG16(
    weights="imagenet",
    include_top=False,
    input_shape=(180, 180, 3)) #shape of the input tensors we'll input into the pre-trained model.

#"weights" specifies the weight checkpoint from which to instantiate the model
#"include_top" allows you to include/or not the last densely connected layer of the pre-trained base.
#VGG16's top dense layer was trained for classifying 1000 classes, we only want to classify 2, so we'll add our own.


In [None]:
#Summary of the architecture of our pre-trained base

conv_base.summary()

#The final feature map has shape (5,5,512)
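
The (5, 5, 512) shape can be reproduced by hand. The sketch below assumes what is generally documented for VGG16: its convolutions use "same" padding (so they leave height and width unchanged), each of its five blocks ends with a 2 × 2 max pooling of stride 2, and the last block has 512 filters. The function name is made up.

```python
def vgg16_feature_map_side(input_side, n_pool=5):
    """Side length after VGG16's five pooling stages (padded convs keep the size)."""
    side = input_side
    for _ in range(n_pool):
        side //= 2  # each 2 x 2 max pooling halves the side, rounding down
    return side


side = vgg16_feature_map_side(180)
feature_shape = (side, side, 512)
flattened_length = side * side * 512

print("feature map shape:", feature_shape)
print("length once flattened:", flattened_length)
```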

In general, we can proceed in one of 2 ways once we've set up our pretrained base.
The first way: 

1. "Run the convolutional base over our dataset
2. record its output to a NumPy array on disk,
3. and then use this data as input to a standalone, densely connected classifier. 
- This solution is fast and cheap to run, because it only requires running the convolutional base once for every input image, and the convolutional base is by far the most expensive part of the pipeline. But for the same reason, this technique won’t allow us to use data augmentation."

The second way: 
1. "Extend the model we have (conv_base) by adding Dense layers on top, and run the whole thing from end to end on the input data. 
- This will allow us to use **data augmentation**, because every input image goes through the convolutional base every time it’s seen by the model. But for the same reason, this technique is far more expensive than the first." (Chollet, DLWP)
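
A rough way to see the cost difference is to count how many times an image has to pass through the convolutional base. This is back-of-the-envelope arithmetic rather than a benchmark. It assumes the 2000 training images from earlier and the epoch counts used in this notebook (20 for the first way, 50 for the second), and the helper name is hypothetical.

```python
def conv_base_passes(num_images, epochs, cache_features):
    """Forward passes through the conv base needed to train the classifier."""
    # With cached features each image goes through the base exactly once;
    # otherwise it goes through again on every epoch.
    return num_images if cache_features else num_images * epochs


n_train = 2000
first_way = conv_base_passes(n_train, epochs=20, cache_features=True)
second_way = conv_base_passes(n_train, epochs=50, cache_features=False)

print("first way (features cached):", first_way)
print("second way (end to end):   ", second_way)
print("ratio:", second_way // first_way)
```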

We'll start with the first method: feature extraction without data augmentation.

In [None]:
#Feature extraction without data-augmentation 
#Let's first define a function to extract features via VGG16 from our training, validation and testing datasets.

def get_features_and_labels(dataset): 
    all_features = []
    all_labels = []
    for images, labels in dataset:
        preprocessed_images = keras.applications.vgg16.preprocess_input(images) 
        features = conv_base.predict(preprocessed_images) 
        all_features.append(features)
        all_labels.append(labels)
    return np.concatenate(all_features), np.concatenate(all_labels)  #indented so it belongs to the function


In [None]:
#Using our function on our datasets
train_features, train_labels = get_features_and_labels(train_dataset) 
val_features, val_labels = get_features_and_labels(validation_dataset) 
test_features, test_labels = get_features_and_labels(test_dataset)


"
Importantly, predict() only expects images, not labels, but our current dataset yields batches that contain both images and their labels. Moreover, the VGG16 model expects inputs that are preprocessed with the function keras.applications.vgg16.preprocess_input, which scales pixel values to an appropriate range. "
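
For intuition, here is a NumPy sketch of what this preprocessing is generally understood to do for VGG16: convert RGB to BGR and subtract the per-channel ImageNet means, without rescaling to [0, 1]. Treat the constants and the function below as an illustration. In the notebook itself the real `keras.applications.vgg16.preprocess_input` should be used.

```python
import numpy as np

# Per-channel ImageNet means in BGR order (assumed values for this sketch).
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype="float32")


def vgg16_style_preprocess(rgb_batch):
    """Flip channels RGB -> BGR, then zero-center each channel."""
    x = np.asarray(rgb_batch, dtype="float32")
    bgr = x[..., ::-1]  # reverse the last (channel) axis
    return bgr - IMAGENET_BGR_MEANS


# A batch of two all-white 180 x 180 images with pixel values in [0, 255].
batch = np.full((2, 180, 180, 3), 255.0, dtype="float32")
out = vgg16_style_preprocess(batch)

print("output shape:", out.shape)
print("one preprocessed pixel:", out[0, 0, 0])
```

Because the result is centered rather than squeezed into [0, 1], a `Rescaling(1./255)` layer should not be stacked in front of the pretrained base.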

In [None]:
#We know that the features extracted from the convolutional base should be of shape (5,5,512)
train_features.shape

In [None]:
#Now we can create our densely connected classifier. 
#We'll have to use a Flatten() layer before we pass our extracted features to the dense layer
inputs = keras.Input(shape=(5, 5, 512))
x = layers.Flatten()(inputs)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)

model.compile(loss="binary_crossentropy",
                      optimizer="rmsprop",
                      metrics=["accuracy"])

callbacks = [
            keras.callbacks.ModelCheckpoint(
                filepath="feature_extraction.keras",
                save_best_only=True,
                monitor="val_loss")
       ]


In [None]:
#Fitting our model
history = model.fit(
            train_features, train_labels,
            epochs=20,
            validation_data=(val_features, val_labels),
            callbacks=callbacks)

#This feature extraction method allows the model to be trained quickly, as there are only 2 Dense layers to train.

In [None]:
#Let's see how our model performed per epoch, in terms of loss and accuracy

acc = history.history["accuracy"]
val_acc = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, "bo", label="Training accuracy") 
plt.plot(epochs, val_acc, "b", label="Validation accuracy") 
plt.title("Training and validation accuracy")
plt.legend()
plt.figure()
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()
plt.show()

The feature extraction method reached a validation accuracy of 97%, which is quite good, but the plots also indicate that the model began overfitting within 10 epochs. One reason is that this method doesn't allow for the use of data augmentation.

We'll now implement the second, more expensive approach, which allows for data augmentation. This method extends the convolutional base into a pipeline that includes data augmentation, and trains the whole thing end to end on the inputs.
- We'll need to make sure that we freeze the weights of the convolutional base, as we want to keep the representations it learned. If we don't, we'd be overwriting the information that was gained when the model was pre-trained. ("Because the Dense layers on top are randomly initialized, very large weight updates would be propagated through the network, effectively destroying the representations previously learned.").
- We can freeze a layer in Keras by setting its `trainable` attribute to `False`.

In [None]:
conv_base  = keras.applications.vgg16.VGG16(
    weights="imagenet",
    include_top=False)

conv_base.trainable = False
#Setting trainable to False empties the list of trainable weights of the layer or model.

You can check the number of trainable weights before and after freezing via the following code.

- `conv_base.trainable = True`
- `print("This is the number of trainable weights " "before freezing the conv base:", len(conv_base.trainable_weights))`
- output: This is the number of trainable weights before freezing the conv base: 26

- `conv_base.trainable = False`
- `print("This is the number of trainable weights " "after freezing the conv base:", len(conv_base.trainable_weights))`
- output: This is the number of trainable weights after freezing the conv base: 0
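
The figure of 26 can be accounted for by hand. The sketch below assumes the standard VGG16 layout (two convolutions in each of the first two blocks and three in each of the remaining three) and that every convolution owns two weight tensors, a kernel and a bias.

```python
# Number of Conv2D layers in each VGG16 block (standard layout, assumed here).
convs_per_block = {"block1": 2, "block2": 2, "block3": 3, "block4": 3, "block5": 3}
tensors_per_conv = 2  # kernel + bias

n_conv_layers = sum(convs_per_block.values())
trainable_before_freezing = n_conv_layers * tensors_per_conv

print("convolutional layers:", n_conv_layers)
print("trainable weight tensors before freezing:", trainable_before_freezing)
```

Pooling layers have no weights, so they do not contribute to the count.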

Now we can create a new model that chains together:

1. A data augmentation stage
2. Our frozen convolutional base
3. A dense classifier

In [None]:
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
] )

inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs) #adding data-augmentation
x = keras.applications.vgg16.preprocess_input(x) #applying input value scaling
x = conv_base(x)
x = layers.Flatten()(x)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)


model.compile(loss="binary_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="feature_extraction_with_data_augmentation.keras",
        save_best_only=True,
        monitor="val_loss")
]

#Fitting the model (this call belongs outside the callbacks list)
history = model.fit(
    train_dataset,
    epochs=50,
    validation_data=validation_dataset,
    callbacks=callbacks)


"With this setup, only the weights from the two Dense layers that we added will be trained. That’s a total of four weight tensors: two per layer (the main weight matrix and the bias vector). Note that in order for these changes to take effect, you must first compile the model. If you ever modify weight trainability after compilation, you should then recompile the model, or these changes will be ignored."

With data augmentation now added, we can train the model for more epochs, as we don't have to worry about it overfitting so early in training.

Training this model is quite expensive, so you'll want to use a GPU.


In [None]:
#Let's plot the per-epoch loss and accuracy for the training and validation sets again
acc = history.history["accuracy"]
val_acc = history.history["val_accuracy"]
loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, "bo", label="Training accuracy") 
plt.plot(epochs, val_acc, "b", label="Validation accuracy") 
plt.title("Training and validation accuracy")
plt.legend()
plt.figure()
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.legend()
plt.show()

In [None]:
#Evaluating on the test set
test_model = keras.models.load_model(
            "feature_extraction_with_data_augmentation.keras")
test_loss, test_acc = test_model.evaluate(test_dataset) 
print(f"Test accuracy: {test_acc:.3f}")

## Now we'll show how to use the convolutional base for fine-tuning.

- Unlike the feature-extraction method, this method unfreezes the weights of the topmost convolutional layers of the conv_base. 
- "...it’s only possible to fine-tune the top layers of the convolutional base once the classifier on top has already been trained. If the classifier isn’t already trained, the error signal propagating through the network during training will be too large, and the representations previously learned by the layers being fine-tuned will be destroyed."
Here are the steps: 
1. Add our custom network on top of an already-trained base network.
2. Freeze the base network.
3. Train the part we added.
4. Unfreeze some layers in the base network. (Note that you should not unfreeze “batch normalization” layers, which are not relevant here since there are no such layers in VGG16. Batch normalization and its impact on fine-tuning is explained in the next chapter.)
5. Jointly train both these layers and the part we added.

We generally only want to fine-tune the topmost layers of the conv_base. The earlier layers of a pre-trained model learn generic, reusable features, so we want to keep these. Layers higher up encode more specialized, task-specific representations, so these are the layers we'll want to fine-tune to our task.
There's also the number of parameters to consider: depending on the pretrained model, there will generally be millions of them. The more layers we unfreeze, the more parameters we train and the higher the risk of overfitting, especially on a small dataset.

Generally, we should aim to fine-tune the top 3 or 4 layers of the convolutional base. We'll also keep the learning rate very small: we just want to slightly tweak the pre-trained model for our dataset, without harming the useful representations it learned during pre-training.

In [None]:
#Let's remind ourselves of the convolutional base
#Note, the number of parameters!
conv_base.summary()

**"We’ll fine-tune the last three convolutional layers, which means all layers up to block4_pool should be frozen, and the layers block5_conv1, block5_conv2, and block5_conv3 should be trainable."**

In [None]:
#Freezing all layers except the ones we're going to fine-tune
conv_base.trainable = True
for layer in conv_base.layers[:-4]:
    layer.trainable = False
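
To see which layers the slice `[:-4]` leaves trainable, the sketch below applies the same slicing to a plain list of layer names. The names follow the standard VGG16 naming for `include_top=False`. The name of the input layer varies between Keras versions, so `"input"` here is a placeholder.

```python
# Layer names of the VGG16 convolutional base, in order ("input" is a placeholder).
vgg16_layer_names = [
    "input",
    "block1_conv1", "block1_conv2", "block1_pool",
    "block2_conv1", "block2_conv2", "block2_pool",
    "block3_conv1", "block3_conv2", "block3_conv3", "block3_pool",
    "block4_conv1", "block4_conv2", "block4_conv3", "block4_pool",
    "block5_conv1", "block5_conv2", "block5_conv3", "block5_pool",
]

frozen = vgg16_layer_names[:-4]     # everything up to and including block4_pool
trainable = vgg16_layer_names[-4:]  # the three block5 convolutions plus block5_pool

print("last frozen layer:", frozen[-1])
print("trainable layers:", trainable)
```

The last four layers include `block5_pool`, which has no weights, so in practice only the three `block5` convolutions are fine-tuned.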

In [None]:
#Recompiling our ConvNet for fine-tuning, with a very low learning rate.

model.compile(loss="binary_crossentropy",
              optimizer=keras.optimizers.RMSprop(learning_rate=1e-5),
              metrics=["accuracy"])

callbacks = [
    keras.callbacks.ModelCheckpoint(
        filepath="fine_tuning.keras",
        save_best_only=True,
        monitor="val_loss")
]

history = model.fit(
    train_dataset,
    epochs=30,
    validation_data=validation_dataset,
    callbacks=callbacks)


In [None]:
#Evaluating our fine-tuned model on test-data.
model = keras.models.load_model("fine_tuning.keras") 
test_loss, test_acc = model.evaluate(test_dataset) 
print(f"Test accuracy: {test_acc:.3f}")