### Wide ResNet & ResNeXt

In [1]:
import os
import re
import numpy as np
import netron
import torch
import torchvision

assert torch.cuda.is_available(), 'a CUDA-capable GPU is expected'
%load_ext watermark

In [2]:
%watermark -p torch,ignite,numpy,netron,sklearn,pandas,plotly

torch  : 1.10.2
ignite : 0.4.8
numpy  : 1.22.1
netron : 5.5.5
sklearn: 0.24.2
pandas : 1.4.1
plotly : 5.6.0



### Wide ResNet, Wide Residual Networks (Zagoruyko S., Komodakis N., 2016)

[Paper](https://arxiv.org/abs/1605.07146)

* *The residual block with identity mapping that allows to train very deep networks is at the same time a weakness of residual networks. As gradient flows through the network there is nothing to force it to go through residual block weights and it can avoid learning anything during training, so it is possible that there is either only a few blocks that learn useful representations, or many blocks share very little information with small contribution to the final goal.*


* Two factors describe a residual block:
    * the deepening factor $l$ is the number of convolutions in a block;
    * the widening factor $k$ multiplies the number of features in convolutional layers;
    * the baseline «basic» block has $l = 2$, $k = 1$.
   
   
* $B(M)$ denotes the residual block structure, where $M$ is the list of convolution kernel sizes in the block:

    1. $B(3, 3)$ - original «basic» block;
    2. $B(3, 1, 3)$ - with one extra 1 × 1 layer;
    3. $B(1, 3, 1)$ - with the same dimensionality of all convolutions, «straightened» bottleneck;
    4. $B(1, 3)$ - the network has alternating 1 × 1 - 3 × 3 convolutions everywhere;
    5. $B(3, 1)$ - similar idea to the previous block;
    6. $B(3, 1, 1)$ - Network-in-Network style block.


* It is more computationally efficient to widen the layers than to have thousands of small kernels, because GPUs are much more efficient at parallel computations on large tensors. The authors therefore look for an optimal ratio $\frac{d}{k}$, where $d$ is the total number of blocks and $k$ is the widening factor.


* The widening of ResNet blocks (if done properly) provides a much more effective way of improving performance of residual networks compared to increasing their depth.


* __WRN-n-k__ denotes a residual network that has a total number of convolutional layers $n$ and a widening factor $k$.
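
The __WRN-n-k__ naming above can be unpacked with a little arithmetic. The sketch below assumes the CIFAR-style layout of the paper's reference implementation with $B(3, 3)$ blocks: three groups of $N$ blocks each, $n = 6N + 4$, and group widths $16k$, $32k$, $64k$. The helper name `wrn_config` is made up for illustration.

```python
def wrn_config(n: int, k: int) -> dict:
    """Unpack WRN-n-k, assuming three groups of B(3, 3) blocks and n = 6 * N + 4."""
    if (n - 4) % 6 != 0:
        raise ValueError("depth n must have the form 6 * N + 4")
    blocks_per_group = (n - 4) // 6          # N blocks in each of the three groups
    widths = [16 * k, 32 * k, 64 * k]        # channels per group, scaled by k
    return {"blocks_per_group": blocks_per_group, "widths": widths}


print(wrn_config(28, 10))  # WRN-28-10
print(wrn_config(16, 8))   # WRN-16-8
```

For WRN-28-10 this gives 4 blocks per group with 160, 320 and 640 channels.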

<img src="../assets/1_wide_resnet.png" width="600">

<img src="../assets/2_wide_resnet.png" width="600">

<img src="../assets/3_wide_resnet.png" width="650">
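
The $B(M)$ notation and the widening factor $k$ can be made concrete with a small module. This is a minimal sketch, not the authors' implementation: the class name `WideBlock`, its arguments, and the pre-activation ordering (BN, ReLU, conv) are assumptions made for illustration, and every convolution in the block uses the same widened channel count.

```python
import torch
from torch import nn


class WideBlock(nn.Module):
    """Illustrative residual block B(M) widened by factor k."""

    def __init__(self, in_channels, base_channels, kernel_sizes=(3, 3), k=1):
        super().__init__()
        out_channels = base_channels * k  # widening multiplies the feature count
        layers, channels = [], in_channels
        for size in kernel_sizes:         # one BN-ReLU-conv triple per entry of M
            layers += [
                nn.BatchNorm2d(channels),
                nn.ReLU(),
                nn.Conv2d(channels, out_channels, size, padding=size // 2, bias=False),
            ]
            channels = out_channels
        self.residual = nn.Sequential(*layers)
        # Use a 1x1 projection when the shortcut cannot be an identity.
        self.shortcut = (
            nn.Identity() if in_channels == out_channels
            else nn.Conv2d(in_channels, out_channels, 1, bias=False)
        )

    def forward(self, x):
        return self.residual(x) + self.shortcut(x)


def count_params(module):
    return sum(p.numel() for p in module.parameters())


x = torch.randn(2, 16, 32, 32)
for k in (1, 2, 4):
    block = WideBlock(16, 16, kernel_sizes=(3, 3), k=k)
    print(k, tuple(block(x).shape), count_params(block))
```

Passing `kernel_sizes=(1, 3, 1)` or `(3, 1, 3)` builds the other variants from the list above. The parameter count of the convolutions grows roughly quadratically in $k$, since both their input and output channels are widened.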

#### Torch [implementation](https://github.com/pytorch/vision/blob/b4cb352c586ee6104a79b2d367d019ca480759b3/torchvision/models/resnet.py#L382)

In [3]:
tuple(arch for arch in dir(torchvision.models) if re.match('wide', arch))

('wide_resnet101_2', 'wide_resnet50_2')

### ResNeXt, Aggregated Residual Transformations for Deep Neural Networks (Xie S. et al., Facebook AI Research, 2017)

[Paper](https://arxiv.org/abs/1611.05431)


* Homogeneous, multi-branch architecture that has only a few hyper-parameters to set;


* New “cardinality” dimension $C$ (the size of the set of transformations), as an essential factor in addition to the dimensions of depth and width;


* Grouped convolutions as an equivalent form of the multiple branches in a ResNeXt block.

<img src="../assets/1_resnext.png" width="500">

<img src="../assets/2_resnext.png" width="450">

<img src="../assets/3_resnext.png" width="450">
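
The equivalence between a grouped convolution and a set of parallel branches can be checked numerically. The sketch below is a toy example with a small cardinality $C = 4$ and 8 channels (both values are chosen for illustration only): it runs one grouped `nn.Conv2d`, then reproduces the result with $C$ independent convolutions over channel slices whose outputs are concatenated.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
C = 4  # cardinality (number of groups), kept small for illustration
conv = nn.Conv2d(8, 8, kernel_size=3, padding=1, groups=C, bias=False)
x = torch.randn(1, 8, 16, 16)

# Grouped convolution in a single call.
y_grouped = conv(x)

# The same computation as C independent branches, concatenated along channels.
x_parts = x.chunk(C, dim=1)            # each branch sees 8 / C input channels
w_parts = conv.weight.chunk(C, dim=0)  # each branch owns 8 / C output filters
y_branches = torch.cat(
    [F.conv2d(xp, wp, padding=1) for xp, wp in zip(x_parts, w_parts)], dim=1
)

print(torch.allclose(y_grouped, y_branches, atol=1e-5))
```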

* resnext50_32x4d: 32 groups, 4d channels per group, where d doubles from stage to stage:
    * conv2, d=1, 4 channels per group;
    * conv3, d=2, 8 channels per group;
    * conv4, d=4, 16 channels per group;
    * conv5, d=8, 32 channels per group;
* resnext101_32x8d: 32 groups, 8d channels per group.

#### Torch [implementation](https://github.com/pytorch/vision/blob/b4cb352c586ee6104a79b2d367d019ca480759b3/torchvision/models/resnet.py#L356)

In [4]:
tuple(arch for arch in dir(torchvision.models) if re.match('resnext', arch))

('resnext101_32x8d', 'resnext50_32x4d')

In [5]:
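resnext50 = torchvision.models.resnext50_32x4d().eval()  # export in inference mode
x = torch.randn(1, 3, 224, 224)  # dummy input used to trace the graph
os.makedirs('onnx_graphs', exist_ok=True)  # the export fails if the directory is missing
model_path = os.path.join('onnx_graphs', 'mbnet2.onnx')
torch.onnx.export(resnext50, x, model_path,
                  input_names=['input'], output_names=['output'], opset_version=10)
netron.start(model_path, 30000)  # serve the exported graph on port 30000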

Serving 'onnx_graphs/mbnet2.onnx' at http://localhost:30000


('localhost', 30000)

#### Your training code here

In [None]:
# Define data transformation pipeline.


# Initialize dataset and dataloaders.


# Initialize pretrained network, replace Linear layer with a new one for your dataset.


# Initialize optimizer, loss function and training procedure with handlers/callbacks.

#### References

* https://github.com/szagoruyko/wide-residual-networks
* https://pytorch.org/hub/pytorch_vision_wide_resnet/
* https://github.com/facebookresearch/ResNeXt
* https://pytorch.org/hub/pytorch_vision_resnext/
* https://onnx.ai/
* https://pytorch.org/docs/stable/index.html