# Introduction #

We saw in the first lesson how to build an image classifier by attaching an untrained head of dense layers to a pretrained convolutional base. Since the pretrained base had already learned how to extract visual features, we could train an accurate image classifier with relatively little data.

In this lesson, we'll use the understanding we've gained of the layers in the convolutional base to improve this method even more. The key is in the hierarchy of visual features we learned about in the previous lesson.

# Using the Hierarchy of Features #

Let's review what we know about the feature extraction in the convolutional base.

The base is a sequence of layers that each perform a step in the extraction process. By the time the data has reached the classifier, it has undergone the extraction process many times. Simple features in the shallow layers combine and recombine to form the complex features in the deepest layers.
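To make this sequence of layers concrete, the sketch below lists the convolutional layers of a base from shallowest to deepest, with the number of filters each one produces. It assumes VGG16 from `keras.applications` as the pretrained base, which is an illustrative choice and not necessarily the base used in Lesson 1. The helper names `load_base` and `describe_conv_layers` are hypothetical.

```python
from tensorflow import keras
from tensorflow.keras import layers


def load_base(weights="imagenet", input_shape=(128, 128, 3)):
    """Load a convolutional base without its original classifier head."""
    # Assumption: VGG16 stands in for the pretrained base.
    # weights="imagenet" downloads pretrained weights on first use;
    # weights=None builds the same architecture with random weights.
    return keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape
    )


def describe_conv_layers(base):
    """Return (name, filters) for each convolutional layer, shallow to deep."""
    return [
        (layer.name, layer.filters)
        for layer in base.layers
        if isinstance(layer, layers.Conv2D)
    ]


# Example usage:
# for name, filters in describe_conv_layers(load_base()):
#     print(name, filters)
```

For VGG16 the filter count grows with depth, which fits the picture of simple features being recombined into more numerous and more complex ones.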

<!--TODO: feature extraction-->

Because the network learns which features to extract in order to solve its classification problem, the features produced by the deepest layers are often highly specific to the data the network was trained on.

The features produced by shallow layers, however, tend to be more generic and will generalize much better to a range of datasets.

For instance, suppose we train a convnet on images of dogs. The shallow layers will tend to produce features like lines and textures and contrasts, features useful on almost every problem. The deepest layers will tend to produce features resembling fur and eyes and snouts. It is unlikely these deeper features would be useful when classifying cars or flowers.

<!--TODO: optimal features-->

# Fine Tuning #

What this means is that when we are using a pretrained base as a feature extractor, it may be best to retrain some of its layers, starting with those closest to the output.

This technique is called **fine tuning**. It gives the layers most important for the classification a chance to adapt to the new dataset.

In Lesson 1, we created a classifier by using a pretrained base with an untrained head of dense layers. During training, *only* the weights of the dense layers were updated. The layers in the base were not trained; their weights were kept **frozen**.

To fine tune the model, we continue the training by **unfreezing** the layers in the base one by one, starting with the layer closest to the head, until the model begins to **overfit**.

<!--TODO: overfitting-->

At that point, we know that unfreezing additional layers will be unlikely to improve the accuracy of the model on unseen data.

# Example - Training with Fine Tuning #

Let's work through an example to see how it goes.

## Step 1 - Create the Model ##

Start by creating a model just like you did in the first lesson, with a pretrained base and several dense layers as a head to act as the classifier.
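A minimal sketch of this step might look like the following. The head sizes, the `build_model` helper name, and the use of VGG16 in the commented usage are assumptions for illustration; the model from Lesson 1 may differ.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model(base, input_shape, num_classes):
    """Attach an untrained dense head to a frozen convolutional base."""
    base.trainable = False  # freeze every layer in the base
    return keras.Sequential([
        keras.Input(shape=input_shape),
        base,
        layers.Flatten(),
        # Head: untrained dense layers acting as the classifier.
        layers.Dense(64, activation="relu"),  # assumed width
        layers.Dense(num_classes, activation="softmax"),
    ])


# Example usage (assumption: VGG16 as the base; downloads weights):
# base = keras.applications.VGG16(
#     include_top=False, weights="imagenet", input_shape=(128, 128, 3))
# model = build_model(base, input_shape=(128, 128, 3), num_classes=2)
```

With the base frozen, only the weights of the two dense layers appear in `model.trainable_weights`.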

## Step 2 - Initial Training of the Dense Layers ##

Now, keeping all the layers in the base frozen, train the model until the validation loss no longer improves.
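One way to "train until the validation loss no longer improves" is Keras's `EarlyStopping` callback. The sketch below assumes integer class labels, data held in NumPy arrays, and a patience of 3 epochs; these choices and the `train_head` name are illustrative rather than taken from the lesson.

```python
from tensorflow import keras


def train_head(model, x_train, y_train, x_val, y_val, max_epochs=50):
    """Train with the base frozen until validation loss stops improving."""
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",  # assumes integer labels
        metrics=["accuracy"],
    )
    early_stopping = keras.callbacks.EarlyStopping(
        monitor="val_loss",
        patience=3,  # assumed tolerance; tune for your data
        restore_best_weights=True,
    )
    history = model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        epochs=max_epochs,
        callbacks=[early_stopping],
        verbose=0,
    )
    # Epoch index at which a later round of training should resume.
    next_epoch = history.epoch[-1] + 1
    return history, next_epoch
```

Returning `next_epoch` makes it easy to continue the epoch count in the fine tuning rounds that follow.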

## Step 3 - Fine Tune Top Layer ##

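Next, unfreeze the top layer in the base, that is, the layer closest to the head. We do this by setting its `trainable` attribute to `True`. Keras applies the change the next time the model is compiled.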
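As a sketch, unfreezing could be wrapped in a small helper like the hypothetical `unfreeze_top_layers` below. Two assumptions are built in. First, "top layer" is taken to mean the last layer that actually has weights, since a base often ends in a pooling layer with nothing to train. Second, because the base as a whole was frozen earlier, the helper marks the base trainable again and then re-freezes everything except the chosen layers; setting the flag on a single inner layer alone may not be enough in every Keras version.

```python
from tensorflow import keras


def unfreeze_top_layers(base, n):
    """Make the last n weight-bearing layers of base trainable.

    All other layers in the base are frozen. Returns the names of the
    layers that were left trainable.
    """
    weighted = [layer for layer in base.layers if layer.weights]
    top = weighted[-n:] if n > 0 else []
    top_ids = {id(layer) for layer in top}
    base.trainable = True  # re-enable the base as a whole
    for layer in base.layers:
        layer.trainable = id(layer) in top_ids
    return [layer.name for layer in top]


# Example usage, given a model built on `base`:
# unfreeze_top_layers(base, 1)
# model.compile(...)  # recompile so the change takes effect
```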

Now we'll run another round of training. There are two things to notice here before we start. First, we'll tell Keras to begin training at epoch `X`, which is the epoch following the one where we stopped the previous round. Second, we have set the learning rate smaller than usual. This helps preserve the existing information in the pretrained layer.
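Both points can be expressed through `compile` and `fit`, as in the sketch below. The learning rate of `1e-5`, the Adam optimizer, the loss, and the `fine_tune_round` name are assumptions for illustration. Note that when `initial_epoch` is given, Keras treats `epochs` as the index of the final epoch rather than as a count of additional epochs.

```python
from tensorflow import keras


def fine_tune_round(model, x_train, y_train, x_val, y_val,
                    start_epoch, extra_epochs=10, learning_rate=1e-5):
    """Run one round of fine tuning, resuming at start_epoch."""
    # Recompiling applies any changes made to `trainable` and installs
    # an optimizer with the reduced learning rate.
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",  # assumes integer labels
        metrics=["accuracy"],
    )
    return model.fit(
        x_train, y_train,
        validation_data=(x_val, y_val),
        initial_epoch=start_epoch,          # first epoch index of this round
        epochs=start_epoch + extra_epochs,  # index of the final epoch
        verbose=0,
    )
```

If the previous round returned a `history` object, `history.epoch[-1] + 1` gives a suitable `start_epoch`.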

## Step 4 - Continue Fine Tuning ##

Continue unfreezing and tuning additional layers, one round of training at a time, until the model begins to overfit.
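The whole procedure can be sketched as a loop. The stopping rule used here is an assumption: a round whose best validation loss fails to beat the best seen so far is treated as the point where overfitting begins. The function name, the number of epochs per round, and the learning rate are likewise illustrative.

```python
from tensorflow import keras


def fine_tune_progressively(model, base, x_train, y_train, x_val, y_val,
                            start_epoch, best_val_loss,
                            epochs_per_round=5, learning_rate=1e-5):
    """Unfreeze base layers from the top down until val_loss stops improving.

    Returns (number of unfrozen layers that helped, best val_loss seen).
    """
    weighted = [layer for layer in base.layers if layer.weights]
    epoch = start_epoch
    helpful_layers = 0
    for n in range(1, len(weighted) + 1):
        # Unfreeze the last n weight-bearing layers, freeze the rest.
        base.trainable = True
        unfrozen = {id(layer) for layer in weighted[-n:]}
        for layer in base.layers:
            layer.trainable = id(layer) in unfrozen
        # Recompile so the new `trainable` settings take effect.
        model.compile(
            optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
            loss="sparse_categorical_crossentropy",  # assumes integer labels
        )
        history = model.fit(
            x_train, y_train,
            validation_data=(x_val, y_val),
            initial_epoch=epoch,
            epochs=epoch + epochs_per_round,
            verbose=0,
        )
        epoch = history.epoch[-1] + 1
        round_best = min(history.history["val_loss"])
        if round_best >= best_val_loss:
            break  # no gain on validation data: assumed overfitting signal
        best_val_loss = round_best
        helpful_layers = n
    return helpful_layers, best_val_loss
```

This sketch leaves the model with the weights from the last round it ran. In practice you would probably also want to keep the weights from the best round, for example by saving checkpoints during training.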

After two more rounds, we're done. Fine tuning has improved the validation accuracy of the model from `X` to `X`.

# Conclusion #

