# **Demo:** Net2DeeperNet on CIFAR with Inception-V2

The following demo shows how to apply Net2DeeperNet to Inception-V2 in order to increase the number of output filters in each layer of the Inception blocks. The input image shape is that of CIFAR-10, but the network and the Net2DeeperNet algorithm can be applied to any other image size.

```python
# Import libraries
import ssl

import numpy as np
import torch
import torchinfo

# Import custom modules and packages
from models.inceptionv2 import GoogleNetBN
import params.inceptionv2_cifar
import net2net.net2net_deeper
```

### 1. Create an Inception-V2 model narrower than the original one

We start by creating an Inception-V2 model that is narrower than the standard one: the number of convolution channels at each layer within all Inception modules is scaled by a factor of $\sqrt{0.3} \approx 0.55$. The rest of the network remains unchanged.
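Scaling the channels by $\sqrt{0.3}$ shrinks each convolution much more than the factor itself suggests, because a convolution's weight count is proportional to the product of its input and output channels. The sketch below assumes, hypothetically, that scaled channel counts are rounded to the nearest integer (the actual rule inside `GoogleNetBN` may differ) and uses an illustrative 3×3 convolution with 192 input and 128 output channels, not a layer taken from the model:

```python
import math


def scaled_channels(channels: int, factor: float) -> int:
    """Hypothetical rule: scale a channel count and round to the nearest integer."""
    return max(1, round(channels * factor))


def conv2d_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Number of parameters of a k x k Conv2d layer (weights plus optional bias)."""
    return k * k * c_in * c_out + (c_out if bias else 0)


factor = math.sqrt(0.3)  # same value as inception_factor in the demo
c_in, c_out = 192, 128   # illustrative channel counts, not read from GoogleNetBN

full = conv2d_params(3, c_in, c_out)
small = conv2d_params(3, scaled_channels(c_in, factor), scaled_channels(c_out, factor))

print(scaled_channels(c_in, factor), scaled_channels(c_out, factor))  # 105 70
print(full, small, round(small / full, 3))                            # 221312 66220 0.299
```

When both the input and the output channels are scaled by $\sqrt{0.3}$, the parameter count of the layer is scaled by roughly $(\sqrt{0.3})^2 = 0.3$.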

```python
# Create a downsized version of the Inception-V2 network
# (with 10 classes instead of 1000 for demo purposes)
model = GoogleNetBN(nb_classes=10, inception_factor=np.sqrt(0.3))

# Create a random input of shape (batch, channels, height, width)
x = torch.randn(1,
                params.inceptionv2_cifar.NB_CHANNELS,
                *params.inceptionv2_cifar.IMAGE_SHAPE)

# Compute the output of the teacher network
# (this forward pass also initializes the Lazy modules)
y_teacher = model(x)
```

### 2. Expand the standard architecture of Inception-V2 using the Net2DeeperNet algorithm

The algorithm is applied only to the Inception modules and the fully-connected layer, since the rest of the network already has the standard architecture. The weights and biases of the student model (the wider one) are initialized from those of the teacher model (the narrower one) so that, at initialization, the student produces the same output as the teacher for any given input.
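The function-preserving idea behind Net2DeeperNet can be illustrated on a toy fully-connected network: a new layer initialized to the identity matrix with a zero bias is inserted after a ReLU, and since ReLU outputs are non-negative, $\mathrm{ReLU}(I h + 0) = h$. The pure-Python sketch below is only an illustration and not the implementation used by `net2net.net2net_deeper`; the layer sizes, the random teacher weights and the optional `sigma` noise argument are assumptions made for the example.

```python
import random


def dense(W, b, v):
    """Affine layer W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, x) for x in v]


def identity_layer(n, sigma=0.0, rng=None):
    """New n x n layer initialized to the identity, plus optional Gaussian noise."""
    rng = rng or random.Random(0)
    W = [[(1.0 if i == j else 0.0) + (rng.gauss(0.0, sigma) if sigma > 0 else 0.0)
          for j in range(n)] for i in range(n)]
    return W, [0.0] * n


# Toy teacher: input (3) -> hidden (4, ReLU) -> output (2), random weights
rng = random.Random(42)
n_in, n_hidden, n_out = 3, 4, 2
W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
W2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [rng.uniform(-1, 1) for _ in range(n_out)]


def teacher(x):
    return dense(W2, b2, relu(dense(W1, b1, x)))


W_new, b_new = identity_layer(n_hidden)  # sigma = 0: exact function preservation


def student(x):
    h = relu(dense(W1, b1, x))
    h = relu(dense(W_new, b_new, h))  # inserted layer: ReLU(I h + 0) = h because h >= 0
    return dense(W2, b2, h)


x = [0.5, -1.0, 2.0]
print(teacher(x))
print(student(x))  # identical to the teacher output
```

For convolutional layers, the analogous identity initialization is a kernel that is zero everywhere except for a 1 at the spatial center of the filter connecting each channel to itself (with padding that preserves the spatial size).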

```python
# Disable SSL certificate verification so that the CIFAR-10 download
# triggered in this cell does not fail (this affects all HTTPS requests)
ssl._create_default_https_context = ssl._create_unverified_context

# Instantiate a Net2Net object from a (pre-trained) teacher model
# (named net2net_obj so that it does not shadow the net2net package)
net2net_obj = net2net.net2net_deeper.Net2Net(teacher_network=model, dataset_used="CIFAR10")

# Get the list of deepening operations
deeper_operations = params.inceptionv2_cifar.deeper_operations

# Add some noise to the copied weights (optional)
sigma = 0.  # Standard deviation of the noise (not passed to net2deeper below)

# Apply the Net2Net deepening operations and get the student network
net2net_obj.net2deeper(deeper_operations)
student_model = net2net_obj.student_network

# Compute the output of the student network
y_student = student_model(x)
```

```text
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to cifar-10-batches-py\cifar-10-python.tar.gz
100%|██████████| 170498071/170498071 [00:33<00:00, 5029956.54it/s]
Extracting cifar-10-batches-py\cifar-10-python.tar.gz to cifar-10-batches-py
Files already downloaded and verified
The weights and bias of the new batch normalization layer arenot initialized yet. To be implemented.
The weights and bias of the new batch normalization layer arenot initialized yet. To be implemented.
The weights and bias of the new batch normalization layer arenot initialized yet. To be implemented.
The weights and bias of the new batch normalization layer arenot initialized yet. To be implemented.
The weights and bias of the new batch normalization layer arenot initialized yet. To be implemented.
The weights and bias of the new batch normalization layer arenot initialized yet. To be implemented.
The weights and bias of the new batch normalization layer arenot initialized yet. To be implemented.
The weights and bias of the new batch normalization layer arenot initialized yet. To be implemented.
The weights and bias of the new batch normalization layer arenot initialized 
Device: cpu
Epoch 0 [train]: 100%|██████████| 313/313 [05:03<00:00,  1.03batch/s, batch_loss=2.5] 
```
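The warnings above indicate that the newly inserted batch normalization layers are left uninitialized. One possible function-preserving choice, sketched below under the assumption that the new layer runs in inference mode (normalizing with its running statistics $\mu$ and $\sigma^2$), is to set the scale to $\gamma = \sqrt{\sigma^2 + \epsilon}$ and the shift to $\beta = \mu$, so that $\gamma (x - \mu)/\sqrt{\sigma^2 + \epsilon} + \beta = x$. This plain-Python illustration is not what `net2net.net2net_deeper` does; in training mode, batch normalization uses batch statistics instead, so the identity would not hold exactly there.

```python
import math


def batchnorm_eval(x, mean, var, gamma, beta, eps=1e-5):
    """Per-channel batch normalization in inference mode (running statistics)."""
    return [g * (xi - m) / math.sqrt(v + eps) + b
            for xi, m, v, g, b in zip(x, mean, var, gamma, beta)]


def identity_bn_params(mean, var, eps=1e-5):
    """Choose gamma (scale) and beta (shift) so that the layer returns its input."""
    gamma = [math.sqrt(v + eps) for v in var]
    beta = list(mean)
    return gamma, beta


# A freshly created BatchNorm layer starts with running mean 0 and running variance 1
mean, var = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
gamma, beta = identity_bn_params(mean, var)

x = [0.3, -1.2, 2.5]  # one value per channel
print(batchnorm_eval(x, mean, var, gamma, beta))  # approximately [0.3, -1.2, 2.5]
```

With a PyTorch `BatchNorm2d` layer `bn`, the same idea would amount to copying `torch.sqrt(bn.running_var + bn.eps)` into `bn.weight` and `bn.running_mean` into `bn.bias` (under `torch.no_grad()`).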


### 3. Check that the student and teacher models have the same output for the same input

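We check that, at initialization, the student model produces the same output as the teacher model for the same input. The outputs can differ slightly if noise has been added to the weights of the student model during initialization. In the run below they also differ slightly with `sigma = 0`, which is consistent with the warnings printed above: the weights and biases of the new batch normalization layers are not initialized yet.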

```python
# The outputs should be the same
print("Teacher output: ", y_teacher)
print("Student output: ", y_student, "\n")
```

```text
Teacher output:  tensor([[ 0.1244,  0.1411, -0.1354, -0.0341,  0.1533, -0.1911,  0.1690, -0.3691,
         -0.2764,  0.3873]], grad_fn=<AddmmBackward0>)
Student output:  tensor([[ 1.3713e-01,  1.7307e-01, -1.1356e-01, -1.6519e-04,  1.2193e-01,
         -1.9643e-01,  1.6833e-01, -3.5557e-01, -3.1146e-01,  3.5010e-01]],
       grad_fn=<AddmmBackward0>)
```
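Rather than comparing the printed tensors by eye, the discrepancy can be summarized by the largest absolute difference. In PyTorch this would be `(y_student - y_teacher).abs().max()`, or `torch.allclose(y_student, y_teacher, atol=...)` for a pass/fail check. The plain-Python sketch below applies the same computation to the values printed above (rounded as displayed); the tolerance `atol = 1e-4` is an arbitrary choice for illustration.

```python
teacher = [0.1244, 0.1411, -0.1354, -0.0341, 0.1533, -0.1911, 0.1690, -0.3691,
           -0.2764, 0.3873]
student = [1.3713e-01, 1.7307e-01, -1.1356e-01, -1.6519e-04, 1.2193e-01,
           -1.9643e-01, 1.6833e-01, -3.5557e-01, -3.1146e-01, 3.5010e-01]

# Element-wise absolute differences between the two output vectors
diffs = [abs(s - t) for s, t in zip(student, teacher)]
max_diff = max(diffs)
worst_class = diffs.index(max_diff)

# Same criterion as torch.allclose with rtol=0: every |difference| must be <= atol
atol = 1e-4
print(f"max |student - teacher| = {max_diff:.4f} (class {worst_class})")  # 0.0372 (class 9)
print("outputs match within atol:", all(d <= atol for d in diffs))        # False
```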



### 4. Have a look at the student and teacher architectures

We start by displaying the architecture of the teacher model. We can check that the number of convolution channels at each layer within all Inception modules is scaled by a factor of $\sqrt{0.3}$. The model has 1,886,577 trainable parameters.

```python
# Display the architecture of the teacher network
torchinfo.summary(model, input_size=(1,
                                     params.inceptionv2_cifar.NB_CHANNELS,
                                     *params.inceptionv2_cifar.IMAGE_SHAPE))
```

```text
Layer (type:depth-idx)                        Output Shape              Param #
GoogleNetBN                                   [1, 10]                   --
├─Sequential: 1-1                             [1, 10]                   --
│    └─Sequential: 2-1                        [1, 64, 9, 9]             --
│    │    └─Conv2d: 3-1                       [1, 64, 17, 17]           9,472
│    │    └─BatchNorm2d: 3-2                  [1, 64, 17, 17]           128
│    │    └─ReLU: 3-3                         [1, 64, 17, 17]           --
│    │    └─MaxPool2d: 3-4                    [1, 64, 9, 9]             --
│    └─Sequential: 2-2                        [1, 192, 5, 5]            --
│    │    └─Conv2d: 3-5                       [1, 64, 9, 9]             4,160
│    │    └─BatchNorm2d: 3-6                  [1, 64, 9, 9]             128
│    │    └─ReLU: 3-7                         [1, 64, 9, 9]             --
│    │    └─Conv2d: 3-8                       [1, 192, 9, 9]            110,784
│    │ 
```
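The `Param #` column can be checked by hand: a `Conv2d` layer has $k_h k_w C_{\text{in}} C_{\text{out}}$ weights plus $C_{\text{out}}$ biases, and a `BatchNorm2d` layer has two learnable parameters (scale and shift) per channel; its running mean and variance are buffers and are not counted. The kernel sizes and input channels below (7×7 with 3 inputs, 1×1, 3×3) are inferred from the printed counts rather than read from `GoogleNetBN`. In PyTorch, the total number of trainable parameters can be obtained with `sum(p.numel() for p in model.parameters() if p.requires_grad)`.

```python
def conv2d_params(k_h, k_w, c_in, c_out, bias=True):
    """Learnable parameters of a Conv2d layer: weights plus optional bias."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)


def batchnorm2d_params(channels):
    """BatchNorm2d learns one scale (weight) and one shift (bias) per channel."""
    return 2 * channels


# Rows 3-1, 3-2 (same as 3-6), 3-5 and 3-8 of the summary above
print(conv2d_params(7, 7, 3, 64))    # 9472
print(batchnorm2d_params(64))        # 128
print(conv2d_params(1, 1, 64, 64))   # 4160
print(conv2d_params(3, 3, 64, 192))  # 110784
```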

We then display the architecture of the student model. We can check that the number of convolution channels at each layer within all Inception modules matches the standard model. The model has 5,998,362 trainable parameters, so the teacher model has about 31.5% as many parameters as the student model.
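The percentage follows directly from the two totals reported by `torchinfo`. It is close to $(\sqrt{0.3})^2 = 0.3$, as one would expect if most parameters sit in convolutions whose input and output channels are both scaled by $\sqrt{0.3}$; the small excess is plausibly due to the layers outside the Inception modules, which are identical in both models.

```python
teacher_params = 1_886_577
student_params = 5_998_362

# Fraction of the student's parameter count held by the teacher
ratio = teacher_params / student_params
print(f"{ratio:.2%}")  # 31.45%
print(f"{ratio:.1%}")  # 31.5%
```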

```python
# Display the architecture of the student network
torchinfo.summary(student_model, input_size=(1,
                                             params.inceptionv2_cifar.NB_CHANNELS,
                                             *params.inceptionv2_cifar.IMAGE_SHAPE))
```

```text
Layer (type:depth-idx)                        Output Shape              Param #
GoogleNetBN                                   [1, 10]                   --
├─Sequential: 1-1                             [1, 10]                   --
│    └─Sequential: 2-1                        [1, 64, 9, 9]             --
│    │    └─Conv2d: 3-1                       [1, 64, 17, 17]           9,472
│    │    └─BatchNorm2d: 3-2                  [1, 64, 17, 17]           128
│    │    └─ReLU: 3-3                         [1, 64, 17, 17]           --
│    │    └─MaxPool2d: 3-4                    [1, 64, 9, 9]             --
│    └─Sequential: 2-2                        [1, 192, 5, 5]            --
│    │    └─Conv2d: 3-5                       [1, 64, 9, 9]             4,160
│    │    └─BatchNorm2d: 3-6                  [1, 64, 9, 9]             128
│    │    └─ReLU: 3-7                         [1, 64, 9, 9]             --
│    │    └─Conv2d: 3-8                       [1, 192, 9, 9]            110,784
│    │ 
```