### This notebook is optionally accelerated with a GPU runtime.
### If you would like to use this acceleration, please select the menu option "Runtime" -> "Change runtime type", select "Hardware Accelerator" -> "GPU" and click "SAVE"

----------------------------------------------------------------------

# Wide ResNet

*Author: Sergey Zagoruyko*

**Wide Residual Networks**

<img src="https://pytorch.org/assets/images/wide_resnet.png" alt="alt" width="50%"/>

In [1]:
import torch
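# Load WRN-50-2:
model = torch.hub.load('pytorch/vision:v0.6.0', 'wide_resnet50_2', pretrained=True)
# or WRN-101-2 (this reassigns `model`, so keep only the variant you need):
model = torch.hub.load('pytorch/vision:v0.6.0', 'wide_resnet101_2', pretrained=True)
# Switch to inference mode so batch norm uses its running statistics
model.eval()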

Downloading: "https://github.com/pytorch/vision/archive/v0.6.0.zip" to /root/.cache/torch/hub/v0.6.0.zip
Downloading: "https://download.pytorch.org/models/wide_resnet50_2-95faca4d.pth" to /root/.cache/torch/hub/checkpoints/wide_resnet50_2-95faca4d.pth


Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.6.0
Downloading: "https://download.pytorch.org/models/wide_resnet101_2-32ee1156.pth" to /root/.cache/torch/hub/checkpoints/wide_resnet101_2-32ee1156.pth


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), strid

All pre-trained models expect input images normalized in the same way:
mini-batches of 3-channel RGB images of shape `(3 x H x W)`, where `H` and `W` are expected to be at least `224`.
The images must be loaded into the range `[0, 1]` and then normalized using `mean = [0.485, 0.456, 0.406]`
and `std = [0.229, 0.224, 0.225]`.
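
The scaling and normalization described above amount to per-channel arithmetic. The sketch below works it through for a single pixel in plain Python. `normalize_pixel` is a hypothetical helper written for illustration, not a torchvision function, and it assumes 8-bit channel values that are first divided by 255.

```python
# Per-channel statistics quoted in the text above.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb_uint8):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel.

    Hypothetical helper for illustration only.
    """
    scaled = [channel / 255.0 for channel in rgb_uint8]
    return [(value - mean) / std for value, mean, std in zip(scaled, MEAN, STD)]


# A black pixel, a white pixel, and a mid-gray pixel.
for pixel in [(0, 0, 0), (255, 255, 255), (128, 128, 128)]:
    print(pixel, [round(v, 4) for v in normalize_pixel(pixel)])
```

In the notebook itself, `transforms.ToTensor()` followed by `transforms.Normalize(...)` applies the same arithmetic to every pixel of the image tensor.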

Here's a sample execution.

In [2]:
# Download an example image from the PyTorch Hub repository
import urllib.request
url, filename = ("https://github.com/pytorch/hub/raw/master/images/dog.jpg", "dog.jpg")
urllib.request.urlretrieve(url, filename)

In [3]:
# sample execution (requires torchvision)
from PIL import Image
from torchvision import transforms
input_image = Image.open(filename)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

# move the input and model to GPU for speed if available
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')

with torch.no_grad():
    output = model(input_batch)
# Tensor of shape 1000, with confidence scores over ImageNet's 1000 classes
print(output[0])
# The output holds unnormalized scores (logits). To get probabilities, apply a softmax.
print(torch.nn.functional.softmax(output[0], dim=0))


tensor([ 5.6478e-01, -3.8633e-01, -7.7825e-01, -7.2455e-01, -1.3808e+00,
         1.8400e+00, -9.1918e-01,  1.5421e+00,  2.7329e+00, -4.7667e-01,
        -1.8975e+00, -1.3627e+00, -1.7911e+00, -1.3912e+00, -1.4620e+00,
        -1.4669e+00, -1.2503e+00, -4.7013e-02, -5.1384e-01,  2.1798e-01,
        -1.2788e+00, -1.8587e+00, -1.6303e+00, -2.5032e-01, -1.4787e-01,
        -1.2835e+00, -1.4187e+00, -4.4376e-01, -9.8427e-01,  3.2866e-01,
        -8.5129e-01, -8.8562e-01,  6.9742e-01, -1.2722e+00, -5.3491e-01,
        -1.9054e+00, -1.0797e+00, -1.4366e+00, -8.5091e-01, -7.4196e-01,
        -1.0407e+00, -1.9710e+00, -2.4391e+00, -1.3197e+00, -2.3621e+00,
        -1.1250e+00, -1.4544e+00, -1.3831e+00, -1.3054e+00, -6.0875e-01,
        -1.1978e+00, -9.3888e-01, -1.1911e+00, -1.3870e+00, -7.9193e-01,
        -1.2338e+00, -1.9928e+00, -1.3876e+00, -1.8375e-01, -1.3963e+00,
        -2.1826e-01, -6.9043e-01, -1.6411e+00, -1.8017e+00, -1.2745e+00,
        -3.5807e-01,  2.0469e-01, -1.1354e+00,  3.5
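
To make the softmax step concrete, here is a minimal sketch in plain Python that converts scores to probabilities and ranks them. It uses only the first ten scores from the printout above, so the resulting probabilities are illustrative and do not match the model's real 1000-class distribution. `softmax` and `top_k` are hypothetical helpers, not library functions.

```python
import math


def softmax(scores):
    """Convert unnormalized scores to probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - largest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def top_k(probabilities, k):
    """Return the k (index, probability) pairs with the highest probability."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# First ten scores copied from the printed output (illustration only).
scores = [0.56478, -0.38633, -0.77825, -0.72455, -1.3808,
          1.8400, -0.91918, 1.5421, 2.7329, -0.47667]

probabilities = softmax(scores)
for index, probability in top_k(probabilities, 3):
    print(f"class index {index}: {probability:.4f}")
```

On the full tensor from the notebook, `torch.topk` applied to the softmax output would give the same kind of ranking over all 1000 classes.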

### Model Description

Wide residual networks have more channels than ResNet; the architecture is
otherwise the same. In the deeper ImageNet models, which use bottleneck blocks,
the extra channels are in the inner 3x3 convolution.
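
The widening can be checked against the layer printout earlier in this notebook, where the first bottleneck block uses 128 channels in its 3x3 convolution but still outputs 256. The sketch below reproduces those numbers, assuming the width rule `int(planes * (width_per_group / 64)) * groups` from torchvision's `Bottleneck` block and a `width_per_group` of 128 for the `_2` variants. `bottleneck_channels` is a hypothetical helper, not a torchvision function.

```python
def bottleneck_channels(planes, width_per_group=64, groups=1, expansion=4):
    """Return (inner 3x3 channels, block output channels) for one stage.

    Hypothetical helper; the width rule is an assumption based on
    torchvision's Bottleneck block.
    """
    inner = int(planes * (width_per_group / 64.0)) * groups
    outer = planes * expansion
    return inner, outer


# Base channel counts of the four residual stages.
for planes in (64, 128, 256, 512):
    standard = bottleneck_channels(planes)                      # ResNet
    wide = bottleneck_channels(planes, width_per_group=128)     # Wide ResNet (x2)
    print(f"planes={planes}: ResNet inner/outer={standard}, wide inner/outer={wide}")
```

Only the inner width doubles; the block's output width is unchanged, which is why the wide variants share the overall ResNet layout.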

The `wide_resnet50_2` and `wide_resnet101_2` models were trained in FP16 with
mixed precision training using SGD with warm restarts. Checkpoints have weights in
half precision (except batch norm) for smaller size, and can be used in FP32 models too.

| Model structure   | Top-1 error (%) | Top-5 error (%) | # parameters |
| ----------------- | :-------------: | :-------------: | :----------: |
|  wide_resnet50_2  | 21.49       | 5.91        | 68.9M        |
|  wide_resnet101_2 | 21.16       | 5.72        | 126.9M       |
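
As a rough sanity check on the half-precision claim, the checkpoint sizes can be divided by the parameter counts in the table above. The byte counts below are the download sizes reported when this notebook was run (138223492 and 254695146 bytes). The calculation ignores file overhead and the batch norm tensors kept in FP32, so treat it as an estimate.

```python
# (checkpoint size in bytes, approximate parameter count from the table)
CHECKPOINTS = {
    "wide_resnet50_2": (138_223_492, 68.9e6),
    "wide_resnet101_2": (254_695_146, 126.9e6),
}


def bytes_per_parameter(size_bytes, parameter_count):
    """Average storage per parameter; FP16 uses 2 bytes, FP32 uses 4."""
    return size_bytes / parameter_count


for name, (size_bytes, parameter_count) in CHECKPOINTS.items():
    ratio = bytes_per_parameter(size_bytes, parameter_count)
    print(f"{name}: {ratio:.2f} bytes per parameter")
```

Both ratios come out close to 2 bytes per parameter, which is consistent with weights stored in FP16 rather than FP32.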

### References

 - [Wide Residual Networks](https://arxiv.org/abs/1605.07146)
 - [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)
 - [Mixed Precision Training](https://arxiv.org/abs/1710.03740)
 - [SGDR: Stochastic Gradient Descent with Warm Restarts](https://arxiv.org/abs/1608.03983)