<a href="https://colab.research.google.com/github/DavoodSZ1993/Dive-into-Deep-Learning-Notes-/blob/main/08_modern_CNNs_notes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Convolutional Neural Networks

* **Convolution** (`nn.Conv2d()` & `nn.LazyConv2d()`): Given the input size ($n_h\times n_w$), and the kernel size ($k_h \times k_w$), the output size is as follows:
$$
(n_h - k_h + 1) \times (n_w - k_w + 1)
$$

* **Padding**: Given the input size ($n_h \times n_w$), the kernel size ($k_h \times k_w$), when adding a total of $p_h$ rows of padding and a total of $p_w$ columns of padding, the output size will be as follows:
$$
(n_h - k_h + p_h + 1) \times (n_w - k_w + p_w + 1)
$$

The `padding=1` argument in `nn.Conv2d()` adds one row at the top and one row at the bottom ($p_h=2$), and one column on the left and one column on the right ($p_w=2$).
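To check the `padding=1` convention, here is a minimal sketch that compares a padded convolution with the same convolution applied to an input zero-padded by hand. The tensor size, the seed, and the variable names are arbitrary choices for illustration, not part of the original notes.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(1, 1, 8, 8)  # (batch, channels, height, width)

conv_padded = nn.Conv2d(1, 1, kernel_size=3, padding=1)
conv_plain = nn.Conv2d(1, 1, kernel_size=3, padding=0)
conv_plain.load_state_dict(conv_padded.state_dict())  # same weights and bias

# F.pad takes (left, right, top, bottom) for the last two dimensions,
# so this adds one zero column on each side and one zero row on top and bottom.
X_manual = F.pad(X, (1, 1, 1, 1))

Y_padded = conv_padded(X)
Y_manual = conv_plain(X_manual)

print(Y_padded.shape)                                  # 8 - 3 + 2 + 1 = 8 per axis
print(torch.allclose(Y_padded, Y_manual, atol=1e-6))
```

With a $3 \times 3$ kernel and $p_h = p_w = 2$, the output keeps the input's height and width.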

* **Stride**: Given the input size ($n_h \times n_w$), the kernel size ($k_h \times k_w$), and the total padding ($p_h \times p_w$), when the stride for the height is $s_h$ and the stride for the width is $s_w$, the output shape will be as follows:
$$
\left\lfloor \frac{n_h - k_h + p_h + s_h}{s_h} \right\rfloor \times \left\lfloor \frac{n_w - k_w + p_w + s_w}{s_w} \right\rfloor
$$
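The output-size formulas above can be checked against `nn.Conv2d` directly. The sketch below uses a hypothetical helper, `conv_output_len`, which is not part of PyTorch, and arbitrary example sizes. Note that `nn.Conv2d` takes per-side padding, so the total padding in the formula is twice that value.

```python
import torch
from torch import nn

def conv_output_len(n, k, p_total, s):
    """Output length along one axis: floor((n - k + p_total + s) / s)."""
    return (n - k + p_total + s) // s

n_h, n_w = 8, 8
k_h, k_w = 3, 5
pad_h, pad_w = 0, 1   # per-side padding, so p_h = 2 * pad_h and p_w = 2 * pad_w
s_h, s_w = 3, 4

expected = (conv_output_len(n_h, k_h, 2 * pad_h, s_h),
            conv_output_len(n_w, k_w, 2 * pad_w, s_w))

conv = nn.Conv2d(1, 1, kernel_size=(k_h, k_w),
                 padding=(pad_h, pad_w), stride=(s_h, s_w))
Y = conv(torch.zeros(1, 1, n_h, n_w))
actual = tuple(Y.shape[2:])

print(expected, actual)  # both (2, 2)
```

Setting `s = 1` and `p_total = 0` in the helper recovers the plain convolution formula $n - k + 1$.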



### Class `nn.AdaptiveAvgPool2d(output_size)`: 
Applies a 2D adaptive average pooling over an input signal composed of several input planes. 

* input: ($N, C, H_{in}, W_{in}$) or ($C, H_{in}, W_{in}$)
* output: ($N, C, S_0, S_1$) or ($C, S_0, S_1$), where $S$ = `output_size`.

In [2]:
import torch
from torch import nn

In [5]:
X = torch.tensor([[1, 2],
                  [3, 4]], dtype=torch.float32).reshape(1, 1, 2, 2)  # 1 x 1 x 2 x 2 (N, C, H, W)

net = nn.AdaptiveAvgPool2d(1)   # output: 1 x 1 x 1 x 1

net(X), X.mean()

(tensor([[[[2.5000]]]]), tensor(2.5000))
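As a further illustration of the shape rule, here is a small sketch showing that the output size depends only on `output_size`, not on the input's height and width. The input shapes are arbitrary examples.

```python
import torch
from torch import nn

pool = nn.AdaptiveAvgPool2d((2, 3))   # S_0 = 2, S_1 = 3

X4 = torch.randn(2, 4, 7, 9)          # (N, C, H_in, W_in)
X3 = torch.randn(4, 10, 6)            # (C, H_in, W_in)

print(pool(X4).shape)  # torch.Size([2, 4, 2, 3])
print(pool(X3).shape)  # torch.Size([4, 2, 3])

# With output_size=1 the layer acts as global average pooling:
# each channel is reduced to the mean over its spatial positions.
gap = nn.AdaptiveAvgPool2d(1)
same = torch.allclose(gap(X4), X4.mean(dim=(2, 3), keepdim=True), atol=1e-6)
print(same)
```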

## Batch Normalization

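$$
\mathrm{BN}(\mathbf{x}) = \boldsymbol{\gamma} \odot \frac{\mathbf{x} - \hat{\boldsymbol{\mu}}_{\mathcal{V}}}{\hat{\boldsymbol{\sigma}}_{\mathcal{V}}} + \boldsymbol{\beta}
$$
where $\mathcal{V}$ is the minibatch and $\mathbf{x} \in \mathcal{V}$. $\hat{\boldsymbol{\mu}}_{\mathcal{V}}$ is the sample mean and $\hat{\boldsymbol{\sigma}}_{\mathcal{V}}$ is the sample standard deviation of the minibatch. $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$ are the learnable scale and shift parameters, respectively, and $\odot$ denotes the elementwise product.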


### Fully Connected Layers

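For a fully connected layer, compute the mean and variance separately for each feature, averaging over the batch dimension (`dim=0`). Batch normalization is applied after the affine transformation and before the activation function $\phi$:
$$
\mathbf{h} = \phi(\mathrm{BN}(\mathbf{W}\mathbf{x} + \mathbf{b}))
$$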


In [6]:
X = torch.tensor([[1, 2],
                  [3, 4]], dtype=torch.float32)

X.mean(dim=0), X.mean(dim=1)  # per-column mean (reduces over rows), per-row mean (reduces over columns)

(tensor([2., 3.]), tensor([1.5000, 3.5000]))
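Building on the `dim=0` reduction, the following is a minimal sketch of batch normalization for fully connected inputs, compared with `nn.BatchNorm1d` in training mode. It assumes the freshly initialized layer ($\boldsymbol{\gamma}=1$, $\boldsymbol{\beta}=0$) and adds a small constant `eps` inside the square root for numerical stability, as PyTorch does; `eps` does not appear in the formula above. The example values are arbitrary.

```python
import torch
from torch import nn

X = torch.tensor([[1., 2.],
                  [3., 4.],
                  [5., 9.]])   # (batch=3, features=2)
eps = 1e-5                     # assumption: stability constant, matches PyTorch's default

mean = X.mean(dim=0)                    # one mean per feature
var = ((X - mean) ** 2).mean(dim=0)     # one (biased) variance per feature
X_hat = (X - mean) / torch.sqrt(var + eps)

bn = nn.BatchNorm1d(num_features=2, eps=eps)  # gamma = 1, beta = 0 at initialization
bn.train()                                    # use minibatch statistics
matches = torch.allclose(X_hat, bn(X), atol=1e-5)
print(matches)
```

After normalization, each feature column of `X_hat` has approximately zero mean and unit variance.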

### Convolutional Layers

For a two-dimensional convolutional layer, compute the mean and variance separately for each channel (the channel axis is `dim=1`), averaging over the batch and spatial dimensions (`dim=(0, 2, 3)`).
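The per-channel statistics can be sketched as follows and compared with `nn.BatchNorm2d` in training mode. As an assumption for illustration, the layer is freshly initialized (scale 1, shift 0) and a small `eps` is added for numerical stability; the input shape and seed are arbitrary.

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(4, 3, 5, 5)    # (N, C, H, W)
eps = 1e-5                     # assumption: stability constant, matches PyTorch's default

# Reduce over batch and spatial dimensions; keepdim=True keeps shape (1, C, 1, 1)
# so the statistics broadcast against X.
mean = X.mean(dim=(0, 2, 3), keepdim=True)
var = ((X - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)
X_hat = (X - mean) / torch.sqrt(var + eps)

bn = nn.BatchNorm2d(num_features=3, eps=eps)  # scale 1 and shift 0 at initialization
bn.train()                                    # use minibatch statistics
matches = torch.allclose(X_hat, bn(X), atol=1e-5)
print(mean.shape, matches)
```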

### Layer Normalization

## General Notes

### Python `super()`

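`super()` returns a proxy object that delegates method calls to the parent class (more precisely, to the next class in the method resolution order). A common use is `super().__init__()`, which runs the parent's initializer from inside a subclass.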
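A minimal pure-Python sketch of this behaviour is shown below. The class names `Base` and `Child` are hypothetical; the same pattern applies when subclassing `nn.Module`.

```python
class Base:
    def __init__(self):
        self.layers = []          # state set up by the parent class

    def describe(self):
        return "Base"


class Child(Base):
    def __init__(self, width):
        super().__init__()        # runs Base.__init__, so self.layers exists
        self.width = width

    def describe(self):
        # Extend the parent's method instead of replacing it entirely.
        return "Child of " + super().describe()


c = Child(width=4)
print(c.layers, c.width, c.describe())  # [] 4 Child of Base
```

Without the `super().__init__()` call, `c.layers` would raise an `AttributeError`, because the parent's initializer would never run.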