# Continuous Self Motivation
> 1% Better Every Day

- toc: true
- badges: true
- categories: [personal development,motivation,habit]

![](https://jamesclear.com/wp-content/uploads/2015/08/tiny-gains-graph.jpg)

# I. Customize 1cycle


This learning rate scheduler allows us to easily train a network using Leslie Smith's 1cycle policy. To learn more about the 1cycle technique for training neural networks, check out [Leslie Smith's paper](https://arxiv.org/pdf/1803.09820.pdf), and for a more graphical and intuitive explanation, check out [Sylvain Gugger's post](https://sgugger.github.io/the-1cycle-policy.html).
 
To use the 1cycle policy we need an [optimum learning rate](https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html). We can find it with a learning rate finder, which `fastai` exposes as `lr_find`. It runs a mock training over a large range of learning rates and plots them against the losses. We then pick a value a bit before the minimum, where the loss is still improving. The graph looks something like this:

 ![](https://fastai1.fast.ai/imgs/onecycle_finder.png)
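The range test described above can be sketched in plain Python. This is only an illustration, not `fastai`'s implementation: the helper names `lr_range` and `suggest_lr` are hypothetical, and dividing the learning rate at the minimum loss by 10 is just one common heuristic for "a bit before the minimum".

```python
import math

def lr_range(min_lr=1e-7, max_lr=10.0, num_steps=100):
    """Learning rates spaced evenly on a log scale, one per mock-training batch."""
    ratio = (max_lr / min_lr) ** (1.0 / (num_steps - 1))
    return [min_lr * ratio ** i for i in range(num_steps)]

def suggest_lr(lrs, losses, factor=10.0):
    """Pick a rate `factor` times smaller than the one at the minimum loss (assumed heuristic)."""
    best = min(range(len(losses)), key=losses.__getitem__)
    return lrs[best] / factor

# Synthetic losses for illustration only: they fall, bottom out, then diverge.
lrs = lr_range(min_lr=1e-5, max_lr=1.0, num_steps=6)
losses = [2.0, 1.9, 1.5, 0.9, 1.4, 5.0]
chosen_lr = suggest_lr(lrs, losses)
```

In a real run, `losses` would come from training one batch at each learning rate in `lrs` and recording the (usually smoothed) loss.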

One more thing to add: when we are transfer learning, we do not want to start off with too large a learning rate, or we will erase the knowledge the model already holds in its weights. Instead, we begin with a very small learning rate and increase it gradually before lowering it again to fine-tune the weights.
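The rise-then-fall shape described above can be written as a small function of the training step. This is a minimal sketch, not the schedule from any particular library: it uses cosine interpolation for both phases (one common variant), and the parameter names and defaults (`pct_start`, `div_factor`, `final_div_factor`) are assumptions chosen for illustration.

```python
import math

def one_cycle_lr(step, total_steps, max_lr,
                 pct_start=0.3, div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` for a two-phase cosine 1cycle schedule."""
    initial_lr = max_lr / div_factor          # small starting rate
    final_lr = initial_lr / final_div_factor  # even smaller rate at the end
    warmup_steps = pct_start * total_steps

    def cosine_interp(start, end, pct):
        # Moves from `start` (pct=0) to `end` (pct=1) along half a cosine wave.
        return end + (start - end) * (1 + math.cos(math.pi * pct)) / 2

    if step < warmup_steps:
        # Phase 1: increase from initial_lr up to max_lr.
        return cosine_interp(initial_lr, max_lr, step / warmup_steps)
    # Phase 2: decrease from max_lr down to final_lr.
    pct = (step - warmup_steps) / (total_steps - warmup_steps)
    return cosine_interp(max_lr, final_lr, min(pct, 1.0))

schedule = [one_cycle_lr(s, total_steps=100, max_lr=0.1) for s in range(101)]
```

With these defaults the rate starts at `max_lr / 25`, peaks at `max_lr` after 30% of the steps, and ends far below the starting value.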

> Important: After going down the rabbit hole, I found that TensorFlow has two different learning rate scheduling utilities with confusingly similar names: [keras.optimizers.schedules.LearningRateSchedule](https://github.com/tensorflow/tensorflow/blob/582c8d236cb079023657287c318ff26adb239002/tensorflow/python/keras/optimizer_v2/learning_rate_schedule.py#L34-L61) and [keras.callbacks.LearningRateScheduler](https://github.com/tensorflow/tensorflow/blob/582c8d236cb079023657287c318ff26adb239002/tensorflow/python/keras/callbacks.py#L1858-L1920). Despite the similar names, they differ in two ways:
- The former lives under `tf.keras.optimizers.schedules` and is passed to an optimizer, while the latter subclasses `tf.keras.callbacks.Callback` and is passed to `model.fit`.
- The former schedules the learning rate per iteration, while the latter does so per epoch.
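To make the difference between the two utilities concrete, here is a hedged sketch of how each one is wired in. The schedule class `LinearWarmup` and the function `halve_every_epoch` are hypothetical examples invented for illustration; only the `tf.keras` base class, callback, and optimizer are real API.

```python
import tensorflow as tf

class LinearWarmup(tf.keras.optimizers.schedules.LearningRateSchedule):
    """Hypothetical per-step schedule: ramp linearly from 0 to `max_lr`."""

    def __init__(self, max_lr, warmup_steps):
        super().__init__()
        self.max_lr = max_lr
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        # `step` is the optimizer's iteration counter (one increment per batch).
        step = tf.cast(step, tf.float32)
        frac = tf.minimum(step / float(self.warmup_steps), 1.0)
        return self.max_lr * frac

    def get_config(self):
        return {"max_lr": self.max_lr, "warmup_steps": self.warmup_steps}

def halve_every_epoch(epoch, lr):
    """Hypothetical per-epoch rule: receives the epoch index and current rate."""
    return lr if epoch == 0 else lr * 0.5

# Option 1: the schedule object goes into the optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=LinearWarmup(0.1, 100))

# Option 2: the callback goes into model.fit(..., callbacks=[lr_callback]).
lr_callback = tf.keras.callbacks.LearningRateScheduler(halve_every_epoch)
```

Because 1cycle changes the rate on every batch, the per-iteration `LearningRateSchedule` route is the more natural fit for it.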

# II. EfficientNet
**(Read the EfficientNet paper and summarize it in one of the sections of this notebook)**

EfficientNet, first introduced in [Tan and Le, 2019](https://arxiv.org/abs/1905.11946), is among the most efficient models (i.e. requiring the fewest FLOPS for inference) that reach state-of-the-art accuracy on both ImageNet and common image classification transfer learning tasks.

The smallest base model is similar to [MnasNet](https://arxiv.org/abs/1807.11626), which reached near-SOTA with a significantly smaller model. By introducing a heuristic way to scale the model, EfficientNet provides a family of models (B0 to B7) that represents a good combination of efficiency and accuracy at a variety of scales. This scaling heuristic (**compound scaling**, for details see [Tan and Le, 2019](https://arxiv.org/abs/1905.11946)) allows the efficiency-oriented base model (B0) to surpass models at every scale, while avoiding an extensive grid search of hyperparameters.
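As a rough illustration of compound scaling, the paper scales depth, width, and input resolution together through a single coefficient φ, using base constants α = 1.2, β = 1.1, γ = 1.15 chosen so that α·β²·γ² ≈ 2. The sketch below only evaluates that formula; the function name `compound_scale` is hypothetical, and the published B1 to B7 configurations use rounded values that need not match these numbers exactly.

```python
# Base constants reported in the paper for depth, width, and resolution.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution, flops) multipliers for coefficient `phi`."""
    depth = ALPHA ** phi        # multiplier on the number of layers
    width = BETA ** phi         # multiplier on the number of channels
    resolution = GAMMA ** phi   # multiplier on the input image side length
    # FLOPS grow linearly with depth and quadratically with width and resolution.
    flops = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, flops

for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPS x{f:.2f}")
```

Each unit increase in φ therefore roughly doubles the compute budget while keeping the three dimensions in balance.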

A summary of the latest updates on the model is available [here](), where various augmentation schemes and semi-supervised learning approaches are applied to further improve the ImageNet performance of the models. These extensions of the model can be used by updating weights without changing the model topology.

## B0 to B7 variants of EfficientNet

## Keras implementation of EfficientNet

An implementation of EfficientNet B0 to B7 has shipped with `tf.keras` since TF2.3. To use EfficientNetB0 for classifying the 1000 image classes of ImageNet, run:

```python
from tensorflow.keras.applications import EfficientNetB0
model = EfficientNetB0(weights='imagenet')
```

The B0 model takes input images of shape (224, 224, 3), and the input data should be in the range [0, 255]. ***Normalization is included as part of the model.***

Training EfficientNet on ImageNet takes a tremendous amount of resources and several techniques that are not part of the model architecture itself. Hence the Keras implementation by default loads pre-trained weights obtained via training with [AutoAugment](https://arxiv.org/abs/1805.09501).

The input shape differs for each base model from B0 to B7. Here is the input resolution expected by each model:

| Base model | Input resolution (px) |
|----------------|-----|
| EfficientNetB0 | 224 |
| EfficientNetB1 | 240 |
| EfficientNetB2 | 260 |
| EfficientNetB3 | 300 |
| EfficientNetB4 | 380 |
| EfficientNetB5 | 456 |
| EfficientNetB6 | 528 |
| EfficientNetB7 | 600 |

When the model is intended for transfer learning, the Keras implementation provides an option to remove the top layers:

```python
model = EfficientNetB0(include_top=False, weights='imagenet')
```

This option excludes the final Dense layer that turns 1280 features on the penultimate layer into prediction of the 1000 ImageNet classes. Replacing the top layer with custom layers allows using EfficientNet as a feature extractor in a transfer learning workflow.
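A feature-extractor setup along these lines might look like the sketch below. The head (global average pooling, dropout, one `Dense` layer) and the helper name `build_classifier` are assumptions made for illustration, not a prescribed recipe.

```python
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.applications import EfficientNetB0

def build_classifier(num_classes, weights="imagenet"):
    """Attach a small, hypothetical classification head to a frozen EfficientNetB0."""
    base = EfficientNetB0(include_top=False, weights=weights,
                          input_shape=(224, 224, 3))
    base.trainable = False  # keep the pre-trained features fixed at first

    inputs = layers.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)        # feature map with 1280 channels
    x = layers.GlobalAveragePooling2D()(x)  # collapse spatial dims -> (1280,)
    x = layers.Dropout(0.2)(x)              # assumed rate, tune as needed
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

Calling `build_classifier(10)` downloads the ImageNet weights on first use; passing `weights=None` builds the same topology with random initialization, which is handy for a quick shape check.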

Another argument in the model constructor worth noting is `drop_connect_rate`, which controls the dropout rate responsible for stochastic depth. This parameter serves as a knob for extra regularization during fine-tuning, but does not affect the loaded weights. For example, when stronger regularization is desired, try:

```python
model = EfficientNetB0(weights='imagenet', drop_connect_rate=0.4)
```
The default value for `drop_connect_rate` is 0.2.

# Clarification

## [AutoAugment](https://arxiv.org/abs/1805.09501)

In this [article](https://keras.io/examples/vision/image_classification_efficientnet_fine_tuning/), in section `Keras implementation of EfficientNet`, it says

> Because training EfficientNet on ImageNet takes a tremendous amount of resources and several techniques that are not a part of the model architecture itself. Hence the Keras implementation by default loads pre-trained weights obtained via training with AutoAugment

In other words, the Keras EfficientNet weights were obtained by training with the AutoAugment data augmentation policy. My follow-up question: on what dataset was the AutoAugment policy itself learned?




