## What Is Transfer Learning?
- Transfer learning is the reuse of a model trained on one problem as the starting point for a second, related problem. In deep learning, a neural network is first trained on a problem similar to the one being solved. One or more layers from that trained model are then reused in a new model, which is trained on the problem of interest.

## Fine Tuning

- Update the whole model on labeled data, including any additional layers added on top
- Freeze a subset of the model
- Freeze the whole model and only train the additional layers added on top
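
The three options above differ only in which layers have `trainable` set. The following is a minimal Keras sketch, not the source's own code: the tiny convolutional base is a stand-in for a real pretrained network, and the strategy names (`"full"`, `"partial"`, `"head_only"`) and the `build_model` helper are hypothetical. Only the head name `flower_prob` is borrowed from the example later in these notes.

```python
import tensorflow as tf


def build_model(strategy):
    """Build a toy model and set trainable flags for one fine-tuning strategy."""
    # Stand-in for a pretrained base (assumption: a real base would be loaded
    # with pretrained weights instead of being created from scratch).
    base_in = tf.keras.Input(shape=(32, 32, 3))
    x = tf.keras.layers.Conv2D(8, 3, activation="relu", name="block1_conv")(base_in)
    x = tf.keras.layers.Conv2D(8, 3, activation="relu", name="block2_conv")(x)
    x = tf.keras.layers.GlobalAveragePooling2D(name="pool")(x)
    base = tf.keras.Model(base_in, x, name="pretrained_base")

    if strategy == "full":
        base.trainable = True          # update every pretrained layer
    elif strategy == "partial":
        base.trainable = True
        base.get_layer("block1_conv").trainable = False  # freeze a subset
    elif strategy == "head_only":
        base.trainable = False         # freeze the whole pretrained base
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    # Additional layer added on top; it is always trained.
    inputs = tf.keras.Input(shape=(32, 32, 3))
    features = base(inputs)
    outputs = tf.keras.layers.Dense(5, activation="softmax", name="flower_prob")(features)
    return tf.keras.Model(inputs, outputs)


for name in ("full", "partial", "head_only"):
    model = build_model(name)
    print(name, "trainable weight tensors:", len(model.trainable_weights))
```

Set the `trainable` flags before calling `model.compile`, since Keras fixes the set of trainable weights at compile time.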

With a learning rate set too high, the pretrained weights change in large steps and the information learned during pretraining is lost. Finding a learning rate that works can be tricky: set it too low and convergence is very slow; set it too high and the pretrained weights are lost.



## Learning rate schedule
The most traditional learning rate schedule when training neural networks is to have the learning rate start high and then decay exponentially throughout training. When fine-tuning a pretrained model, a warm-up ramp period can be added (see Figure 3-6).
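
A warm-up ramp followed by exponential decay can be written as a plain function of the epoch number. The sketch below is illustrative only: the constant names and their values are assumptions chosen for the example, not settings taken from the source.

```python
# Hypothetical values chosen for illustration only.
LR_START = 1e-5      # learning rate at the start of warm-up
LR_MAX = 1e-3        # peak learning rate reached at the end of warm-up
LR_MIN = 1e-5        # floor that the decay approaches
RAMPUP_EPOCHS = 5    # length of the linear warm-up ramp
EXP_DECAY = 0.8      # per-epoch decay factor after the peak


def lr_schedule(epoch):
    """Linear warm-up to LR_MAX, then exponential decay toward LR_MIN."""
    if epoch < RAMPUP_EPOCHS:
        # Warm-up: small steps first so the pretrained weights are not disrupted.
        return LR_START + (LR_MAX - LR_START) * epoch / RAMPUP_EPOCHS
    # Decay: shrink the gap to LR_MIN by a constant factor each epoch.
    return LR_MIN + (LR_MAX - LR_MIN) * EXP_DECAY ** (epoch - RAMPUP_EPOCHS)


for epoch in range(0, 20, 5):
    print(epoch, lr_schedule(epoch))
```

In Keras, a function of this shape can be passed to `tf.keras.callbacks.LearningRateScheduler` so that it is applied at the start of each epoch.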

## Differential learning rate

Another good trade-off is to apply a differential learning rate, whereby we use a low learning rate for the pretrained layers and a normal learning rate for the layers of our custom classification head.

In fact, we can extend the idea of a differential learning rate within the pretrained layers themselves: we can multiply the learning rate by a factor that varies based on layer depth, gradually increasing the per-layer learning rate and finishing with the full learning rate for the classification head.
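
To make the depth-dependent factor concrete, the sketch below generates one multiplier per block and scales a base learning rate by it. Both helper functions are hypothetical, the linear ramp is an assumption (other ramps are equally valid), and matching layers by name prefix is one plausible convention rather than the documented behavior of any particular optimizer package.

```python
def depth_multipliers(num_blocks, lowest=0.1, head_name="flower_prob"):
    """Map layer-name prefixes to multipliers that rise linearly with depth."""
    multipliers = {}
    for i in range(1, num_blocks + 1):
        # Block 1 gets `lowest`; later blocks approach, but do not reach, 1.0.
        frac = (i - 1) / num_blocks
        multipliers[f"block{i}_"] = round(lowest + (1.0 - lowest) * frac, 3)
    multipliers[head_name] = 1.0  # the classification head uses the full rate
    return multipliers


def effective_lr(layer_name, base_lr, multipliers):
    """Scale base_lr by the multiplier whose prefix matches layer_name."""
    for prefix, mult in multipliers.items():
        if layer_name.startswith(prefix):
            return base_lr * mult
    return base_lr  # layers without a matching prefix keep the full rate


mults = depth_multipliers(14)
for layer in ("block1_conv", "block7_conv", "block14_conv", "flower_prob"):
    print(layer, effective_lr(layer, 1e-3, mults))
```

The trailing underscore in each prefix matters: without it, `block1` would also match `block10` through `block14`.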

To apply a complex differential learning rate like this in Keras, we need to write a custom optimizer. Fortunately, an open source Python package called AdamW exists that we can use by specifying a learning rate multiplier for different layers (see `03_image_models/03b_finetune_MOBILENETV2_flowers5.ipynb` in the GitHub repository for the complete code):



## Example

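```python
# Assumption: AdamW here is the class provided by the open source
# keras-adamw package.
from keras_adamw import AdamW

# Learning rate multiplier per layer-name prefix: small for the early
# pretrained blocks, rising to the full rate for the classification head.
mult_by_layer = {
    'block1_': 0.1,
    'block2_': 0.15,
    'block3_': 0.2,
    # ... blocks 4 to 11 here
    'block12_': 0.8,
    'block13_': 0.9,
    'block14_': 0.95,
    'flower_prob': 1.0,  # for the classification head
}

# LR_MAX and model are assumed to be defined earlier in the notebook.
optimizer = AdamW(lr=LR_MAX, model=model,
                  lr_multipliers=mult_by_layer)
```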