# Structure of CNN

- Convolution Layer
    - padding
    - stride
- Pooling Layer

## `Convolution Layer`

### Calculating the output size

In [1]:
def get_conv_outsize(input_size, kernel_size, stride, pad):
    # Padding is added on both sides (pad*2); floor division drops any partial window.
    return (input_size + pad*2 - kernel_size) // stride + 1

In [2]:
H, W = 4, 4 # input
KH, KW = 3, 3 # kernel
SH, SW = 1, 1 # stride
PH, PW = 1, 1 # padding

OH = get_conv_outsize(H, KH, SH, PH)
OW = get_conv_outsize(W, KW, SW, PW)

print(OH, OW)

4 4
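
To see how stride and padding each affect the result, the sketch below applies the same formula to a hypothetical 7x7 input with a 3x3 kernel. The sizes are illustrative assumptions and are not part of the example above.

```python
def get_conv_outsize(input_size, kernel_size, stride, pad):
    # Same formula as above, repeated so this cell runs on its own.
    return (input_size + pad * 2 - kernel_size) // stride + 1

# Hypothetical 7x7 input with a 3x3 kernel.
no_pad_stride1 = get_conv_outsize(7, 3, stride=1, pad=0)  # 5: output shrinks
no_pad_stride2 = get_conv_outsize(7, 3, stride=2, pad=0)  # 3: larger stride shrinks it further
pad1_stride1 = get_conv_outsize(7, 3, stride=1, pad=1)    # 7: padding keeps the input size
pad1_stride2 = get_conv_outsize(7, 3, stride=2, pad=1)    # 4

print(no_pad_stride1, no_pad_stride2, pad1_stride1, pad1_stride2)
```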


### 3D Tensor calculation

> Even though the input is a 3D tensor and the kernel is 3D, this is still a **2D convolution**, because the kernel slides only along the two spatial axes ($H$ and $W$)!

- `3D Tensor block` - $C \times H \times W$ ($C$ = number of channels)
- `Kernel block` - $C \times KH \times KW$

$(C \times H \times W) \otimes (C \times KH \times KW) \rightarrow  (1 \times OH \times OW)$
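
The shape rule above can be checked with a deliberately naive NumPy sketch. The function name `conv2d_single_kernel` is hypothetical, and the loops favor readability over speed. Like most deep learning code, it does not flip the kernel.

```python
import numpy as np

def conv2d_single_kernel(x, w, stride=1, pad=0):
    """Naive convolution: x is (C, H, W), w is (C, KH, KW), result is (1, OH, OW)."""
    C, H, W = x.shape
    _, KH, KW = w.shape
    OH = (H + 2 * pad - KH) // stride + 1
    OW = (W + 2 * pad - KW) // stride + 1
    # Zero-pad only the spatial axes, never the channel axis.
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((1, OH, OW))
    for i in range(OH):
        for j in range(OW):
            top, left = i * stride, j * stride
            window = xp[:, top:top + KH, left:left + KW]  # (C, KH, KW)
            # Summing over all channels collapses C into a single output channel.
            out[0, i, j] = np.sum(window * w)
    return out

x = np.random.rand(3, 4, 4)  # C=3, H=4, W=4
w = np.random.rand(3, 3, 3)  # C=3, KH=3, KW=3
print(conv2d_single_kernel(x, w, stride=1, pad=1).shape)  # (1, 4, 4)
```

The kernel moves only along `i` and `j`, which is why the operation counts as 2D even though both operands are 3D.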

---
When we want more channels in the output, we have to increase the `Kernel number`: using $OC$ kernels produces $OC$ output channels.

- `3D Tensor block` - $C \times H \times W$
- `Kernel block` - $OC \times C \times KH \times KW$ ($OC$ = number of kernels = number of output channels)

$(C \times H \times W) \otimes (OC \times C \times KH \times KW) \rightarrow (OC \times OH \times OW)$

#### Kernel filter shape - `(output_channel, input_channel, height, width)`
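
As a rough illustration of this kernel layout, the sketch below applies `OC` kernels one at a time and stacks the results along the first axis. The name `conv2d_multi_kernel` is hypothetical, and stride 1 with no padding is assumed to keep the code short.

```python
import numpy as np

def conv2d_multi_kernel(x, w):
    """x is (C, H, W), w is (OC, C, KH, KW), result is (OC, OH, OW). Stride 1, no padding."""
    C, H, W = x.shape
    OC, WC, KH, KW = w.shape
    assert C == WC, "input channels of x and w must match"
    OH, OW = H - KH + 1, W - KW + 1
    out = np.zeros((OC, OH, OW))
    for oc in range(OC):  # each kernel produces exactly one output channel
        for i in range(OH):
            for j in range(OW):
                out[oc, i, j] = np.sum(x[:, i:i + KH, j:j + KW] * w[oc])
    return out

x = np.random.rand(3, 4, 4)     # C=3, H=4, W=4
w = np.random.rand(5, 3, 3, 3)  # OC=5, C=3, KH=3, KW=3
print(conv2d_multi_kernel(x, w).shape)  # (5, 2, 2)
```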

### Bias of Convolution

- `3D Tensor block` - $C \times H \times W$
- `Kernel block` - $OC \times C \times KH \times KW$
- `Bias` - $OC \times 1 \times 1$

---

- $(C \times H \times W) \otimes (OC \times C \times KH \times KW) \rightarrow (OC \times OH \times OW)$
- `Bias` - $(OC \times OH \times OW) + (OC \times 1 \times 1) \rightarrow (OC \times OH \times OW)$ 
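
The bias addition relies on broadcasting: the two size-1 axes of the bias are stretched over $OH \times OW$, so every position in an output channel receives the same scalar. A minimal NumPy sketch, with an all-zero stand-in for the convolution output and made-up bias values:

```python
import numpy as np

OC, OH, OW = 5, 4, 4
conv_out = np.zeros((OC, OH, OW))                    # stand-in for a convolution result
bias = np.arange(OC, dtype=float).reshape(OC, 1, 1)  # one scalar per output channel

y = conv_out + bias  # (OC, OH, OW) + (OC, 1, 1) -> (OC, OH, OW)

print(y.shape)     # (5, 4, 4)
print(y[:, 0, 0])  # [0. 1. 2. 3. 4.]
```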

### Mini-batch of Convolution

- `4D Tensor block` (mini-batch of 3D tensors) - $N \times C \times H \times W$, where $N$ is the batch size
- `Kernel block` - $OC \times C \times KH \times KW$
- `Bias` - $OC \times 1 \times 1$

---

- $(N \times C \times H \times W) \otimes (OC \times C \times KH \times KW) \rightarrow (N \times OC \times OH \times OW)$
- `Bias` - $(N \times OC \times OH \times OW) + (OC \times 1 \times 1) \rightarrow (N \times OC \times OH \times OW)$ 
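
The mini-batch shapes can be verified with a vectorized sketch. It assumes NumPy 1.20 or newer for `sliding_window_view`, uses stride 1 with no padding, and the name `conv2d_batch` is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_batch(x, w, b):
    """x: (N, C, H, W), w: (OC, C, KH, KW), b: (OC, 1, 1) -> (N, OC, OH, OW)."""
    KH, KW = w.shape[2], w.shape[3]
    # Every KHxKW window along the spatial axes: shape (N, C, OH, OW, KH, KW).
    windows = sliding_window_view(x, (KH, KW), axis=(2, 3))
    # Sum over input channel and kernel height/width for each sample and kernel.
    out = np.einsum("ncijhw,ochw->noij", windows, w)
    # (OC, 1, 1) broadcasts over the batch axis and both spatial axes.
    return out + b

x = np.ones((2, 3, 4, 4))                       # N=2, C=3, H=4, W=4
w = np.ones((5, 3, 3, 3))                       # OC=5, C=3, KH=3, KW=3
b = np.arange(5, dtype=float).reshape(5, 1, 1)  # made-up bias values

y = conv2d_batch(x, w, b)
print(y.shape)        # (2, 5, 2, 2)
print(y[0, :, 0, 0])  # [27. 28. 29. 30. 31.]
```

The same kernels and bias are shared by all $N$ samples, so only the leading axis of the output grows with the batch size.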

## `Pooling Layer`
- Max Pooling
- Average Pooling
- etc

---

1. No learnable `Parameter`
    - Pooling only takes the **Max** or **Average** of the inputs in each window, so there is nothing to learn
2. `Channel` number doesn't change
    - Pooling is applied to each channel independently
3. `Robust` to small changes
    - If the changes in the input data are small, the pooling result often doesn't change at all
    - So we say the pooling layer is robust to small changes
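
The first two properties can be sketched with a 2x2 max pooling written as a pure reshape, with no weights involved. The function name `max_pool_2x2` is hypothetical, and the sketch assumes even `H` and `W` with non-overlapping windows.

```python
import numpy as np

def max_pool_2x2(x):
    """x is (C, H, W) with even H and W; result is (C, H//2, W//2)."""
    C, H, W = x.shape
    # Split each spatial axis into (blocks, 2), then take the max inside every 2x2 block.
    return x.reshape(C, H // 2, 2, W // 2, 2).max(axis=(2, 4))

x = np.arange(2 * 4 * 4).reshape(2, 4, 4)  # C=2, H=4, W=4
pooled = max_pool_2x2(x)

print(pooled.shape)  # (2, 2, 2): channel count is unchanged
print(pooled[0])     # [[ 5  7]
                     #  [13 15]]
```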

In [7]:
import numpy as np

original = np.array([[1, 2, 3, 4],
                     [5, 6, 0, 1]])
small_change = np.array([[2, 1, 3, 4],
                         [6, 5, 2, 1]])
original, small_change

(array([[1, 2, 3, 4],
        [5, 6, 0, 1]]),
 array([[2, 1, 3, 4],
        [6, 5, 2, 1]]))

**We can see that the 2 tensors are different!**

In [10]:
original - small_change

array([[-1,  1,  0,  0],
       [-1,  1, -2,  0]])

**But max pooling gives the same result! We say max pooling is `robust` to this change**

In [8]:
# max over the left and right 2x2 windows
np.max(original[:, :2]), np.max(original[:, 2:])

(6, 4)

In [9]:
np.max(small_change[:, :2]), np.max(small_change[:, 2:])

(6, 4)
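
This robustness has limits. As a contrasting sketch with made-up values, overwriting the window maximum itself does change the pooled result:

```python
import numpy as np

original = np.array([[1, 2, 3, 4],
                     [5, 6, 0, 1]])
large_change = original.copy()
large_change[1, 1] = 0  # overwrite the maximum (6) of the left 2x2 window

left_before = np.max(original[:, :2])
left_after = np.max(large_change[:, :2])

print(left_before, left_after)  # 6 5
```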