### EfficientNets 


### EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (Tan M., Le Q., 2019, Google Research, Brain Team)

* The arXiv manuscript was updated in 2020.

[Paper](https://arxiv.org/abs/1905.11946)

In [1]:
import torch

# The notebook expects a GPU; fail early with a readable message otherwise.
assert torch.cuda.is_available(), "CUDA is not available"
%load_ext watermark

In [2]:
%watermark -p torch

torch: 1.10.2



#### EfficientNet

* One of the first systematic studies of how to scale a network, which is essentially a combinatorial search problem.

__Basic scaling types__:
* depth, i.e. number of layers
* width, i.e. number of channels
* resolution, i.e. input image size

__What was before__:

* Single dimension scaling, e.g. Wide ResNet.

__Search algorithm__

* Neural architecture search (NAS) [Paper](https://arxiv.org/abs/1611.01578)

<img src="../assets/18_mobilenet.png" width="470">

__How to scale any convnet efficiently?__

* Scale all three dimensions (depth, width, resolution) together, each with a constant ratio. The authors call this the __compound scaling method__.
* Details:
    * To reduce the design space, the authors require all layers to be scaled uniformly with a constant ratio.
    * Scaling up any single dimension (width, depth, or resolution) improves accuracy, but the gain diminishes for bigger models.



<img src="../assets/1_efficientnet.png" width="900">

__Compound scaling__


* Note: compound scaling is model-independent.

compound scaling coefficient: $\phi$;\
depth: $d=\alpha^\phi$;\
width: $w=\beta^\phi$;\
resolution: $r=\gamma^\phi$.

$$\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2, \quad \alpha \ge 1, \ \beta \ge 1, \ \gamma \ge 1$$

Why this constraint: scaling a ConvNet with $d, w, r$ multiplies its FLOPS by approximately $d \cdot w^2 \cdot r^2 = (\alpha \cdot \beta^2 \cdot \gamma^2)^\phi$, so under the constraint the total FLOPS grow by approximately $2^\phi$ for any new $\phi$.
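A rough sketch of where the exponents come from, assuming the cost is dominated by standard convolutions. The symbols $L$ (number of layers), $H$, $W$ (feature map size), $C_{in}$, $C_{out}$ (channels) and $k$ (kernel size) are introduced here for illustration only and are not part of the notation above:

$$\text{FLOPS} \propto L \cdot H \cdot W \cdot C_{in} \cdot C_{out} \cdot k^2$$

Depth scaling multiplies $L$ by $d$. Width scaling multiplies both $C_{in}$ and $C_{out}$ by $w$, which contributes $w^2$. Resolution scaling multiplies both $H$ and $W$ by $r$, which contributes $r^2$. Hence

$$\text{FLOPS}(d, w, r) \propto d \cdot w^2 \cdot r^2 = \alpha^\phi \cdot \beta^{2\phi} \cdot \gamma^{2\phi} = (\alpha \cdot \beta^2 \cdot \gamma^2)^\phi \approx 2^\phi$$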

Algorithm:

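Under the constraints above:
1. Keep $\phi$ fixed and search for $\alpha, \beta, \gamma$ (the paper uses $\phi = 1$ and a small grid search);
2. Keep $\alpha, \beta, \gamma$ fixed and scale the network up with different values of $\phi$;
3. Optionally return to 1 (the paper runs step 1 only once, on the small baseline, because repeating the search around larger models is much more expensive).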
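The search step can be sketched as a plain grid search. This is a minimal illustration, not the authors' code: the grid step, the upper bound, the tolerance on the constraint and the `evaluate` callback are all assumptions. In a real search, `evaluate` would train the scaled network and return its accuracy; here it is replaced by an arbitrary placeholder so the example runs.

```python
import itertools


def candidate_coefficients(step=0.05, upper=2.0, target=2.0, tol=0.1):
    """Enumerate (alpha, beta, gamma) >= 1 with alpha * beta^2 * gamma^2 close to target."""
    n = int(round((upper - 1.0) / step)) + 1
    grid = [round(1.0 + i * step, 2) for i in range(n)]
    return [
        (a, b, g)
        for a, b, g in itertools.product(grid, repeat=3)
        if abs(a * b**2 * g**2 - target) <= tol
    ]


def grid_search(evaluate, candidates):
    """Return the candidate with the highest score under a user-supplied evaluate()."""
    return max(candidates, key=evaluate)


def placeholder_score(coeffs):
    """Placeholder only: NOT an accuracy measurement.

    It prefers candidates that satisfy the FLOPS constraint most tightly.
    """
    a, b, g = coeffs
    return -abs(a * b**2 * g**2 - 2.0)


candidates = candidate_coefficients()
best = grid_search(placeholder_score, candidates)
print(f"{len(candidates)} candidates satisfy the constraint; placeholder pick: {best}")
# The values reported for EfficientNet-B0 lie inside this candidate set.
print((1.2, 1.1, 1.15) in candidates)
```

The constraint removes most of the grid before anything is trained, which is what keeps the search small.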

EfficientNet-B0: $\alpha=1.2, \beta=1.1, \gamma=1.15$.
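To see what these coefficients imply, the sketch below evaluates the scaling formulas for a few values of $\phi$. The function name `compound_scale` is made up for this example; the only inputs taken from the text are the three coefficients and the formulas $d=\alpha^\phi$, $w=\beta^\phi$, $r=\gamma^\phi$.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases from the text


def compound_scale(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Return (depth, width, resolution, FLOPS) multipliers for a given phi."""
    depth = alpha ** phi
    width = beta ** phi
    resolution = gamma ** phi
    # FLOPS grow linearly with depth and quadratically with width and resolution.
    flops = depth * width**2 * resolution**2
    return depth, width, resolution, flops


for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPS x{f:.2f}")
```

With these coefficients $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 1.92$, so each increment of $\phi$ slightly less than doubles the FLOPS.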



__EfficientNet__

* The proposed model was obtained by scaling a NAS-found baseline (EfficientNet-B0) with compound scaling.

* NAS optimization goal for model $m$:
$$ACC(m) \times \left[\frac{F(m)}{T}\right]^\omega$$

$ACC(m)$ - model accuracy;\
$F(m)$ - model FLOPS;\
$T$ - the target FLOPS, 400M;\
$\omega=-0.07$ - a hyperparameter that controls the trade-off between accuracy and FLOPS.
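A small sketch of how this objective behaves. The function name `nas_reward` and the accuracy value 0.77 are hypothetical and used only for illustration; $T$ and $\omega$ are the values given above.

```python
def nas_reward(accuracy, flops, target_flops=400e6, omega=-0.07):
    """Search objective from the text: ACC(m) * (F(m) / T) ** omega."""
    return accuracy * (flops / target_flops) ** omega


# Hypothetical candidates with the same accuracy but different cost.
for flops in (200e6, 400e6, 800e6):
    print(f"{flops / 1e6:.0f}M FLOPS -> reward {nas_reward(0.77, flops):.4f}")
```

Because $\omega$ is negative, a model above the target FLOPS has its accuracy discounted and a model below the target gets a bonus. Doubling the FLOPS multiplies the reward by $2^{-0.07} \approx 0.95$, so the penalty is mild.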

* Main building block: MBConv, the mobile inverted bottleneck.
* [SiLU](https://pytorch.org/docs/stable/generated/torch.nn.SiLU.html) (Swish) activation.
* [AutoAugment](http://pytorch.org/vision/main/generated/torchvision.transforms.AutoAugment.html), an algorithm that treats the choice of augmentation policy as a discrete search problem.
* [Stochastic depth](https://paperswithcode.com/method/stochastic-depth) with survival probability 0.8. It shrinks the depth of the network during training by randomly dropping residual blocks, while keeping the full depth at test time.
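The last two items can be illustrated without any framework. The sketch below is a toy version on plain Python lists, not the torchvision implementation. `residual_block` and `toy_branch` are hypothetical names, and rescaling the surviving branch by `1 / survival_prob` during training is an assumption made here so that the expected training output matches the test-time output.

```python
import math
import random


def silu(x):
    """SiLU (Swish) activation for a scalar: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def residual_block(x, branch, survival_prob=0.8, training=True, rng=random):
    """Compute x + branch(x) with stochastic depth applied to the branch.

    x is a list of floats; branch maps a list of floats to a list of floats.
    """
    if training and rng.random() >= survival_prob:
        # Branch dropped: the block reduces to the identity for this pass.
        return list(x)
    out = branch(x)
    # Assumption: rescale the surviving branch during training only.
    scale = 1.0 / survival_prob if training else 1.0
    return [xi + scale * bi for xi, bi in zip(x, out)]


def toy_branch(values):
    """Hypothetical stand-in for the convolutional layers inside a block."""
    return [silu(v) for v in values]


rng = random.Random(0)
x = [1.0, -2.0]
train_outputs = [residual_block(x, toy_branch, rng=rng) for _ in range(10_000)]
dropped = sum(out == x for out in train_outputs) / len(train_outputs)
print(f"fraction of training passes with the branch dropped: {dropped:.3f}")
print("test-time output:", residual_block(x, toy_branch, training=False))
```

At test time the branch is always evaluated, so the network keeps its full depth.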

<img src="../assets/2_efficientnet.png" width="500">

<img src="../assets/3_efficientnet.png" width="750">

```
# Arch:            (width coefficient, depth coefficient, resolution, dropout rate)
'efficientnet-b0': (1.0, 1.0, 224, 0.2),
'efficientnet-b1': (1.0, 1.1, 240, 0.2),
'efficientnet-b2': (1.1, 1.2, 260, 0.3),
'efficientnet-b3': (1.2, 1.4, 300, 0.3),
'efficientnet-b4': (1.4, 1.8, 380, 0.4),
'efficientnet-b5': (1.6, 2.2, 456, 0.4),
'efficientnet-b6': (1.8, 2.6, 528, 0.5),
'efficientnet-b7': (2.0, 3.1, 600, 0.5),
'efficientnet-b8': (2.2, 3.6, 672, 0.5),
'efficientnet-l2': (4.3, 5.3, 800, 0.5),
```
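The width and depth coefficients in the table are multipliers, so they have to be rounded to valid channel and layer counts. The sketch below is modeled on the rounding helpers used in the reference implementations. Treat the divisor of 8, the 10% rule and the function names as assumptions of this example. `B0_STAGES` lists the (channels, repeats) of the seven MBConv stages of the B0 baseline as given in the paper.

```python
import math


def round_filters(filters, width_coefficient, divisor=8):
    """Scale a channel count and round it to a multiple of `divisor`."""
    scaled = filters * width_coefficient
    rounded = max(divisor, int(scaled + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * scaled:  # avoid rounding down by more than 10%
        rounded += divisor
    return rounded


def round_repeats(repeats, depth_coefficient):
    """Scale the number of layers in a stage, rounding up."""
    return int(math.ceil(repeats * depth_coefficient))


# (output channels, number of layers) for the MBConv stages of EfficientNet-B0.
B0_STAGES = [(16, 1), (24, 2), (40, 2), (80, 3), (112, 3), (192, 4), (320, 1)]


def scale_stages(width_coefficient, depth_coefficient):
    """Apply both coefficients to every stage of the baseline."""
    return [
        (round_filters(c, width_coefficient), round_repeats(n, depth_coefficient))
        for c, n in B0_STAGES
    ]


print("b0:", scale_stages(1.0, 1.0))
print("b3:", scale_stages(1.2, 1.4))
```

Rounding to a multiple of 8 keeps channel counts hardware-friendly, and rounding layer counts up ensures that any depth coefficient above 1 never shrinks a stage.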

#### Torchvision [implementation](https://github.com/pytorch/vision/blob/f40c8df02c197d1a9e194210e40dee0e6a6cb1c3/torchvision/models/efficientnet.py#L152)

#### Your training code here

In [None]:
# Define data transformation pipeline.


# Initialize dataset and dataloaders.


# Initialize pretrained network, replace Linear layer with a new one for your dataset.


# Initialize optimizer, loss function and training procedure with handlers/callbacks.

#### References

* https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
* https://onnx.ai/
* https://pytorch.org/docs/stable/index.html
* https://paperswithcode.com/method/autoaugment
* http://pytorch.org/vision/main/generated/torchvision.transforms.AutoAugment.html
* https://github.com/4uiiurz1/pytorch-auto-augment
* https://paperswithcode.com/method/stochastic-depth