# **Deep learning for image analysis with Python**

#### Fernando Cervantes, Systems Analyst I, Imaging Solutions, Research IT
#### fernando.cervantes@jax.org    (slack) @fernando.cervantes

## **3 Implement a deep neural network**

## 3.1 _Neural network modules_

PyTorch provides several operations that can be used as building blocks to construct a neural network.<br>
Each operation is commonly referred to as a **Module**, and these are implemented in PyTorch's *nn* module.

In [3]:
import torch
import torch.nn as nn

![Image](https://pytorch.org/tutorials/_images/mnist.png)

Let's create the first convolutional layer of the LeNet architecture.<br>
That layer slides **six** $5\times5$ linear kernels across the input image, one pixel at a time.<br>
More details on the parameters available when creating a two-dimensional convolutional layer can be found [here](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html?highlight=conv2d#torch.nn.Conv2d).

In [6]:
conv_1 = nn.Conv2d(
    in_channels=1, # Because the input image is in gray-levels
    out_channels=6, # to generate six new feature maps / channels
    kernel_size=5,
    stride=1, # to pass the kernel filters over each pixel of the image
    padding=0, # do not add padding to the image edges (this will reduce the size of the output)
    bias=False # do not add a bias intercept to the output of this layer
)

In [7]:
conv_1

Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), bias=False)

In [8]:
print(conv_1.weight.size())
print(conv_1.weight)

torch.Size([6, 1, 5, 5])
Parameter containing:
tensor([[[[ 0.1507, -0.0824, -0.1526, -0.0141, -0.0359],
          [-0.1659, -0.0180, -0.0448, -0.1615,  0.0865],
          [ 0.1426,  0.1702,  0.1123,  0.1452,  0.1622],
          [ 0.1539, -0.1971, -0.1505, -0.0166, -0.0601],
          [ 0.0725, -0.1070, -0.1634, -0.0151,  0.0615]]],


        [[[ 0.1305,  0.0272, -0.0253,  0.1500,  0.1329],
          [ 0.1118, -0.0529,  0.1432,  0.0574,  0.1701],
          [-0.1109,  0.0641,  0.1951,  0.1996,  0.1766],
          [ 0.0248, -0.1185, -0.1578,  0.0746,  0.0195],
          [ 0.1953, -0.0111,  0.1623,  0.0767,  0.1769]]],


        [[[ 0.0219, -0.1884,  0.1405, -0.0339,  0.0444],
          [-0.0802, -0.1462, -0.1162, -0.1651, -0.0350],
          [ 0.1818, -0.1035, -0.1020,  0.1539,  0.1052],
          [ 0.0421,  0.1311,  0.1847,  0.1370, -0.0482],
          [ 0.1828,  0.1060,  0.1709,  0.0451,  0.0193]]],


        [[[ 0.0563,  0.0274, -0.1125,  0.1380,  0.1097],
          [-0.1477, -0.0965, 

Because *conv_1* is an **nn.Module**, PyTorch registers its weights (and its bias, when present) as parameters and tracks their gradients.<br>
Those weights and biases are known as the *learnable parameters*.<br>
By default, linear and convolution modules include a bias term (*bias=True*); for this layer it was disabled with *bias=False*.<br>
This single layer therefore has $6\times1\times5\times5 = 150$ parameters.
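
The count of 150 parameters can be checked directly. The following is a minimal sketch, assuming two freshly created layers with the same settings as *conv_1*; the names *count_parameters*, *conv_no_bias*, and *conv_with_bias* are illustrative and are not part of the original notebook.

```python
import torch.nn as nn

def count_parameters(module):
    """Return the number of learnable parameters registered in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Same layer as conv_1, without and with the bias term
conv_no_bias = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, bias=False)
conv_with_bias = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, bias=True)

n_no_bias = count_parameters(conv_no_bias)      # 6*1*5*5 weights
n_with_bias = count_parameters(conv_with_bias)  # same weights plus one bias per output channel
print(n_no_bias, n_with_bias)
```

With the bias enabled, each of the six output channels adds one extra parameter.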

***
The next operation, applied to the output of the first convolution layer, is a ReLU activation function.<br>
Activation functions are also found in PyTorch's *nn* module (follow this [link](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) to see the different activation functions available).

In [9]:
act_1 = nn.ReLU()

In [10]:
act_1

ReLU()
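
As a small illustration of what this activation does, the sketch below applies a ReLU to a hand-written tensor. The names *relu*, *sample*, and *activated* are chosen only for this example.

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

# A few negative, zero, and positive values
sample = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

# Negative entries are set to zero; the others pass through unchanged
activated = relu(sample)
print(activated)
```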

***
The subsampling operation (S2 in the LeNet illustration) is implemented by a *maximum pooling* operation.

In [11]:
sub_1 = nn.MaxPool2d(
    kernel_size=2,
    stride=2,
    padding=0
)

In [12]:
sub_1

MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
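
To see the effect of maximum pooling on actual values, the following sketch pools a small $4\times4$ feature map. The tensor *feature_map* is made up for this example and does not come from the notebook.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)

# One image, one channel, 4x4 pixels holding the values 0..15 row by row
feature_map = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# Each non-overlapping 2x2 window is replaced by its maximum value
pooled = pool(feature_map)
print(pooled.shape)
print(pooled)
```

Each output value is the largest entry of one $2\times2$ window, so the height and width are halved.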

***
Depending on the task, the last layers of a network are used to abstract the spatial information into a one-dimensional representation.
This is achieved by *flattening* the tensor into a *vector*, so that linear (matrix-vector) operations can be applied.
These layers are known as **Fully Connected** (FC) layers, and stacked together they form a multilayer perceptron.
In LeNet, the first FC layer receives 16 feature maps of size $5\times5$, that is, $16\times5\times5 = 400$ input features.

In [15]:
fc_1 = nn.Linear(
    in_features=16*5*5, 
    out_features=120, 
    bias=True
)

In [16]:
fc_1

Linear(in_features=400, out_features=120, bias=True)
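
The flattening step can be sketched as follows, assuming a random batch of two inputs with 16 channels of size $5\times5$. The names *fc*, *feature_maps*, *flat*, and *out* are illustrative only.

```python
import torch
import torch.nn as nn

fc = nn.Linear(in_features=16 * 5 * 5, out_features=120, bias=True)

# Batch of 2 samples, each with 16 feature maps of 5x5 pixels
feature_maps = torch.rand(2, 16, 5, 5)

# Keep the batch dimension and merge channels, height, and width into one axis
flat = torch.flatten(feature_maps, start_dim=1)

out = fc(flat)
print(flat.shape, out.shape)
```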

***
## 3.2 _Neural network inputs_

PyTorch's two-dimensional modules, such as *nn.Conv2d*, expect inputs of shape *batch size* $\times$ *input channels* $\times$ *height* $\times$ *width*.

In [18]:
x = torch.rand((1, 1, 32, 32))

Modules defined in the *nn* module have a built-in *forward* function.
Calling an nn module as a function invokes its corresponding *forward* function.

In [19]:
fx = conv_1(x)

In [20]:
fx.size()

torch.Size([1, 6, 28, 28])

In the LeNet architecture illustration, the output of the first convolution layer (C1) is six feature maps of size $28\times28$.

![Image](https://pytorch.org/tutorials/_images/mnist.png)

***
The next operation is a ReLU activation function, which is applied element-wise and therefore preserves the shape of the tensor.

In [21]:
fx = act_1(fx)

In [22]:
fx.size()

torch.Size([1, 6, 28, 28])

***
The subsampling operation uses a kernel of size $2\times2$ and is applied every $2$ pixels, resulting in feature maps with half the height and width of the input tensor.

In [24]:
fx = sub_1(fx)

In [25]:
fx.size()

torch.Size([1, 6, 14, 14])

Now the feature maps have a size of $14\times14$, just as illustrated in the LeNet architecture.
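
The sizes $28\times28$ and $14\times14$ follow from the output-size formula given in the PyTorch documentation for *nn.Conv2d* and *nn.MaxPool2d*. The sketch below applies that formula along the convolution and pooling layers of LeNet. The helper *output_size* is hypothetical and is not a PyTorch function.

```python
def output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

c1 = output_size(32, kernel_size=5)            # first convolution
s2 = output_size(c1, kernel_size=2, stride=2)  # first max pooling
c3 = output_size(s2, kernel_size=5)            # second convolution
s4 = output_size(c3, kernel_size=2, stride=2)  # second max pooling
print(c1, s2, c3, s4)
```

The final value is the spatial size that reaches the first fully connected layer, which is why that layer was created with *in_features=16\*5\*5*.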

***

## 3.3 _Defining a neural network architecture as a python class_

PyTorch provides a *pythonic* framework to develop neural networks.
For that reason, architectures are defined as classes derived from the **nn.Module** class.<br>
A class defining a neural network architecture is required to call the nn.Module initialization function and to implement a **forward** function.

![Image](https://pytorch.org/tutorials/_images/mnist.png)

*Fully connected layers* is a fancy name for multilayer perceptrons.

In [26]:
class LeNet(nn.Module):
    def __init__(self, in_channels=1, num_classes=10):
        """
        Always call the initialization function from the nn.Module parent class.
        This way all parameters from the operations defined as members of *this* class are tracked for their optimization.
        """
        super(LeNet, self).__init__()
        
        self.conv_1 = nn.Conv2d(in_channels=in_channels, out_channels=6, kernel_size=5, stride=1, padding=0, bias=False)
        self.act_1 = nn.ReLU()
        self.sub_1 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        
        self.conv_2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1, padding=0, bias=False)
        self.act_2 = nn.ReLU()
        self.sub_2 = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)
        
        self.fc_1 = nn.Linear(in_features=5*5*16, out_features=120, bias=True)
        self.act_fc_1 = nn.ReLU()
        
        self.fc_2 = nn.Linear(in_features=120, out_features=84, bias=True)
        self.act_fc_2 = nn.ReLU()
        
        self.fc_3 = nn.Linear(in_features=84, out_features=num_classes, bias=True)

    def forward(self, x):
        # Apply convolution layers to extract feature maps with image context
        fx = self.act_1(self.conv_1(x))
        fx = self.sub_1(fx)
        
        fx = self.act_2(self.conv_2(fx))
        fx = self.sub_2(fx)
        
        # Flatten the feature maps to perform linear operations
        fx = fx.view(-1, 16*5*5)
        
        fx = self.act_fc_1(self.fc_1(fx))
        fx = self.act_fc_2(self.fc_2(fx))
        fx = self.fc_3(fx)
        
        y = torch.softmax(fx, dim=1)
        
        return y

In [27]:
net = LeNet(1, 10)

In [29]:
print(net)

LeNet(
  (conv_1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), bias=False)
  (act_1): ReLU()
  (sub_1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv_2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1), bias=False)
  (act_2): ReLU()
  (sub_2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc_1): Linear(in_features=400, out_features=120, bias=True)
  (act_fc_1): ReLU()
  (fc_2): Linear(in_features=120, out_features=84, bias=True)
  (act_fc_2): ReLU()
  (fc_3): Linear(in_features=84, out_features=10, bias=True)
)


In [32]:
print(x.size())

torch.Size([1, 1, 32, 32])


In [36]:
y = net(x)

In [37]:
print(y.size())

torch.Size([1, 10])


In [38]:
print(y)

tensor([[0.1055, 0.1062, 0.0923, 0.1089, 0.0953, 0.1045, 0.0970, 0.0953, 0.0931,
         0.1020]], grad_fn=<SoftmaxBackward0>)


In [40]:
print(y.sum(dim=1))

tensor([1.], grad_fn=<SumBackward1>)
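
Since the softmax output sums to one, each row can be read as a probability distribution over the classes, and the predicted class is the index of its largest entry. The sketch below shows this on random scores that stand in for the output of the last linear layer; the names *scores*, *probs*, and *predicted* are assumptions made for this example.

```python
import torch

torch.manual_seed(0)

# Stand-in for the raw scores of one image over 10 classes
scores = torch.randn(1, 10)

# Normalize the scores into probabilities along the class dimension
probs = torch.softmax(scores, dim=1)

# The predicted class is the index of the highest probability
predicted = probs.argmax(dim=1)
print(probs.sum(dim=1), predicted)
```

Because softmax preserves the ordering of the scores, taking the argmax of the probabilities gives the same class as taking the argmax of the raw scores.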


In [43]:
total_parameters = 0
for par in net.parameters():
    total_parameters += torch.numel(par)

In [44]:
total_parameters

61684
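
The total can be broken down per layer. The sketch below rebuilds only the layers of LeNet that hold parameters, using the same settings as in the class above. The dictionary *layers* is a stand-in written for this example, not the *net* object itself.

```python
import torch.nn as nn

# Layers with learnable parameters, configured as in the LeNet class
layers = {
    "conv_1": nn.Conv2d(1, 6, kernel_size=5, bias=False),
    "conv_2": nn.Conv2d(6, 16, kernel_size=5, bias=False),
    "fc_1": nn.Linear(16 * 5 * 5, 120, bias=True),
    "fc_2": nn.Linear(120, 84, bias=True),
    "fc_3": nn.Linear(84, 10, bias=True),
}

# Count weights and biases of every layer
per_layer = {
    name: sum(p.numel() for p in layer.parameters())
    for name, layer in layers.items()
}
total = sum(per_layer.values())

for name, count in per_layer.items():
    print(f"{name}: {count}")
print("total:", total)
```

Most of the parameters belong to the first fully connected layer ($400\times120$ weights plus 120 biases), while the two convolution layers together account for only 2550.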