# Convolutional Neural Network
## Motivation
We need to utilize the spatial structure of our data.
### 2D Convolution Layer
In a fully-connected layer we flatten the $32\times 32\times 3$ image into a $3072\times 1$ vector, which discards the spatial structure of the data.
![image.png](attachment:image.png)

A convolution layer keeps the image as a 3D tensor and thus preserves its spatial structure.
A filter, which plays the role of the weight matrix in a fully-connected layer, is convolved with (slid over) the image to produce an activation map.
![image-2.png](attachment:image-2.png)
![image-3.png](attachment:image-3.png)
We can use multiple filters and stack the resulting activation maps.
![image-4.png](attachment:image-4.png)
Each filter also has its own bias term; with 6 filters this gives a 6-dimensional bias vector.
- An activation map (also called a feature map) shows how strongly each position of the image responds to the filter.

![image-5.png](attachment:image-5.png)
$C_{out}$ is simply the number of filters in the convolution layer; it becomes $C_{in}$ of the next layer.
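
The shapes above can be checked with a minimal NumPy sketch of a naive convolution layer (stride 1, no padding). The function name `conv2d_naive` and the $5\times 5$ filter size are assumptions made for illustration; only the $32\times 32\times 3$ image and the 6 filters come from the notes. Like most deep learning libraries, it computes a cross-correlation (the filter is not flipped).

```python
import numpy as np

def conv2d_naive(x, w, b):
    """Naive 2D convolution layer, stride 1, no padding.

    x: input image, shape (C_in, H, W)
    w: filters,     shape (C_out, C_in, K, K)
    b: biases,      shape (C_out,)
    returns activation maps, shape (C_out, H - K + 1, W - K + 1)
    """
    c_out, _, k, _ = w.shape
    _, h, width = x.shape
    h_out, w_out = h - k + 1, width - k + 1
    out = np.zeros((c_out, h_out, w_out))
    for f in range(c_out):                      # one activation map per filter
        for i in range(h_out):
            for j in range(w_out):
                patch = x[:, i:i + k, j:j + k]  # (C_in, K, K) window of the image
                out[f, i, j] = np.sum(patch * w[f]) + b[f]
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((3, 32, 32))     # 32x32 image with 3 channels
filters = rng.standard_normal((6, 3, 5, 5))  # six 5x5 filters (size is an assumption)
biases = np.zeros(6)                         # 6-dimensional bias vector
maps = conv2d_naive(image, filters, biases)
print(maps.shape)  # (6, 28, 28)
```

The first output dimension equals the number of filters, which is the $C_{out}$ of this layer.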

Convolution is a linear operation.  
Stacking several convolutions directly on top of each other is therefore still equivalent to a single convolution.
So we need to insert a non-linear activation function between convolution layers.
![image-6.png](attachment:image-6.png)
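
A small 1D sketch of why the activation function is needed, using `numpy.convolve` on toy values chosen only for illustration: two convolutions applied back to back give the same result as one convolution with a merged filter, while placing a ReLU in between breaks this equivalence.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

x = np.array([1.0, 2.0, -1.0, 0.5, 3.0])  # toy 1D signal
a = np.array([1.0, 0.0, -1.0])            # first filter
b = np.array([0.5, 0.5])                  # second filter

# Two stacked convolutions without an activation in between ...
two_layers = np.convolve(np.convolve(x, a), b)
# ... equal a single convolution with the merged filter a * b.
one_layer = np.convolve(x, np.convolve(a, b))
print(np.allclose(two_layers, one_layer))  # True

# With a ReLU in between, the stack is no longer a single convolution.
with_relu = np.convolve(relu(np.convolve(x, a)), b)
print(np.allclose(with_relu, one_layer))   # False
```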

### Padding
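Each convolution shrinks the feature map: an input of width $W$ and a filter of size $K$ give an output of width $W - K + 1$. This limits how many layers we can stack, so we pad the border of the input (usually with zeros) to remove this constraint.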
![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
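
As a quick check of the size arithmetic, here is a small sketch for a stride-1 convolution with padding $P$ on each side, where the output width is $W - K + 2P + 1$. The helper names are hypothetical, and `same_padding` assumes an odd filter size.

```python
def conv_output_size(w, k, p=0):
    """Output width of a stride-1 convolution: W - K + 2P + 1."""
    return w - k + 2 * p + 1

def same_padding(k):
    """Padding that keeps the width unchanged for an odd filter size K."""
    return (k - 1) // 2

print(conv_output_size(32, 5))                   # 28: the feature map shrinks
print(conv_output_size(32, 5, same_padding(5)))  # 32: padding of 2 keeps the size
print(conv_output_size(32, 3, same_padding(3)))  # 32: padding of 1 keeps the size
```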

### Stride
Motivation:
In a single conv layer, each output element depends on a $K\times K$ receptive field in the input.   
In a multi-layer conv net with stride 1, each additional conv layer adds $K-1$ to the receptive field size. 
We want the output to see the whole image after some number of layers, but for large images this takes many layers. A stride larger than 1 downsamples the feature map, so the receptive field grows faster and fewer conv layers are needed.
![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
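With input width $W$, filter size $K$, padding $P$ and stride $S$, the output width is $(W - K + 2P)/S + 1$. We usually choose these values so that $S$ divides $W - K + 2P$ evenly.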
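
The effect of stride on the output size and on the receptive field can be sketched as follows. Both helper names are hypothetical; `receptive_field` takes a list of `(kernel, stride)` pairs and tracks how far apart neighbouring outputs are in input pixels.

```python
def conv_output_size(w, k, p=0, s=1):
    """Output width (W - K + 2P) / S + 1; requires S to divide W - K + 2P."""
    span = w - k + 2 * p
    if span % s != 0:
        raise ValueError("stride does not divide W - K + 2P")
    return span // s + 1

def receptive_field(layers):
    """Receptive field, in input pixels, of a stack of (kernel, stride) layers."""
    rf, jump = 1, 1  # jump: distance in input pixels between adjacent outputs
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

print(receptive_field([(3, 1)] * 4))     # 9: each stride-1 layer adds K - 1 = 2
print(receptive_field([(3, 2)] * 4))     # 31: stride 2 grows the field much faster
print(conv_output_size(7, 3, p=1, s=2))  # 4
```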

### Summary
![image.png](attachment:image.png)

#### Other Types of Convolution
1D Convolution:   
Input: $C_{in}\times W$ (channels $\times$ one spatial dimension)      
Weights: $C_{out}\times C_{in}\times K$  
Can be used to process sequences such as text or audio.
![image.png](attachment:image.png)
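
A minimal sketch of the 1D case with the shapes listed above (stride 1, no padding). The function name `conv1d_naive` and the toy sizes are assumptions for illustration.

```python
import numpy as np

def conv1d_naive(x, w, b):
    """x: (C_in, W), w: (C_out, C_in, K), b: (C_out,) -> (C_out, W - K + 1)."""
    c_out, _, k = w.shape
    w_out = x.shape[1] - k + 1
    out = np.zeros((c_out, w_out))
    for f in range(c_out):
        for i in range(w_out):
            # Each output mixes all input channels over a window of length K.
            out[f, i] = np.sum(x[:, i:i + k] * w[f]) + b[f]
    return out

x = np.arange(16, dtype=float).reshape(2, 8)  # 2 channels, sequence length 8
w = np.ones((4, 2, 3))                        # 4 filters of size 3
b = np.zeros(4)
y = conv1d_naive(x, w, b)
print(y.shape)  # (4, 6)
```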


### Pooling Layer
A way to downsample the feature map; like a conv layer, it slides a window over the input.    
Hyperparameters: kernel size, stride, and pooling function.
#### Max Pool
The pooling function is the max over each window.
![image.png](attachment:image.png)
It introduces some invariance to small spatial shifts: the output is unchanged as long as the largest element stays within the same window.       
It has no learnable parameters.     
![image-2.png](attachment:image-2.png)
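
A minimal NumPy sketch of max pooling, applied to each channel independently. The function name `max_pool2d` and the toy input are illustrative assumptions; the default of a $2\times 2$ window with stride 2 is a common choice, not something fixed by the notes.

```python
import numpy as np

def max_pool2d(x, k=2, s=2):
    """x: (C, H, W) -> (C, (H - k) // s + 1, (W - k) // s + 1)."""
    c, h, w = x.shape
    h_out, w_out = (h - k) // s + 1, (w - k) // s + 1
    out = np.empty((c, h_out, w_out), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            window = x[:, i * s:i * s + k, j * s:j * s + k]
            out[:, i, j] = window.max(axis=(1, 2))  # max per channel
    return out

x = np.array([[[1, 1, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]])  # one channel, 4x4
print(max_pool2d(x)[0])
# [[6 8]
#  [3 4]]
```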

### Batch Normalization
#### Motivation
When the network becomes very deep, it may be very difficult to converge.  
#### BN: Training       
We can insert a layer that normalizes each feature over the mini-batch to zero mean and unit variance, which helps reduce "internal covariate shift".  
![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)
But zero mean and unit variance may be too hard a constraint.    
So we add learnable scale and shift parameters $\gamma$ and $\beta$:
$$y_{i,j} = \gamma_j \hat x_{i,j} + \beta_j$$
This allows the network to learn the mean and variance that minimize the loss; if $\gamma = \sigma$ and $\beta = \mu$, the layer recovers the identity function.
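
The training-time computation can be sketched in NumPy for an input of shape $(N, D)$. The function name `batchnorm_train` and the small constant `eps` (added for numerical stability) are assumptions of this sketch.

```python
import numpy as np

def batchnorm_train(x, gamma, beta, eps=1e-5):
    """x: (N, D). Normalize each feature over the batch, then scale and shift."""
    mu = x.mean(axis=0)                    # per-feature mean, shape (D,)
    var = x.var(axis=0)                    # per-feature variance, shape (D,)
    x_hat = (x - mu) / np.sqrt(var + eps)  # zero mean, unit variance
    y = gamma * x_hat + beta               # learnable scale and shift
    return y, mu, var

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(64, 4))

y, mu, var = batchnorm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print(np.allclose(y.mean(axis=0), 0.0, atol=1e-7))  # True
print(np.allclose(y.std(axis=0), 1.0, atol=1e-3))   # True

# Choosing gamma = sigma and beta = mu undoes the normalization.
y_id, _, _ = batchnorm_train(x, gamma=np.sqrt(var + 1e-5), beta=mu)
print(np.allclose(y_id, x))  # True
```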

#### BN: Testing
But batch normalization makes the output for each element depend on the other elements in the same batch, which is undesirable at test time: a prediction should not change with whatever else happens to be in the batch.
So at test time we replace the batch statistics with fixed estimates, typically running averages of $\mu$ and $\sigma^2$ collected during training.
![image-3.png](attachment:image-3.png)

![image-4.png](attachment:image-4.png)
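
A sketch of the two modes, assuming the test-time statistics are exponential running averages of the batch statistics seen during training. The class name `SimpleBatchNorm` and the `momentum` convention used here are assumptions of this sketch and do not mirror any particular library.

```python
import numpy as np

class SimpleBatchNorm:
    """Hypothetical batch norm layer for inputs of shape (N, D)."""

    def __init__(self, d, momentum=0.9, eps=1e-5):
        self.gamma, self.beta = np.ones(d), np.zeros(d)
        self.running_mean, self.running_var = np.zeros(d), np.ones(d)
        self.momentum, self.eps = momentum, eps

    def forward(self, x, training):
        if training:
            # Use batch statistics and update the running averages.
            mu, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.running_mean = m * self.running_mean + (1 - m) * mu
            self.running_var = m * self.running_var + (1 - m) * var
        else:
            # Use fixed statistics: the output no longer depends on the batch.
            mu, var = self.running_mean, self.running_var
        x_hat = (x - mu) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

rng = np.random.default_rng(0)
bn = SimpleBatchNorm(3)
for _ in range(200):  # simulated training batches
    bn.forward(rng.normal(5.0, 2.0, size=(32, 3)), training=True)

sample = np.array([[5.0, 7.0, 3.0]])
alone = bn.forward(sample, training=False)
batch = np.vstack([sample, rng.normal(5.0, 2.0, size=(7, 3))])
in_batch = bn.forward(batch, training=False)[:1]
print(np.allclose(alone, in_batch))  # True
```

In test mode the same sample gets the same output whether it is processed alone or inside a larger batch.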

#### Pros
- Makes deep networks faster and easier to train
- Allows higher learning rates and faster convergence

#### Cons
- Lacks a solid theoretical basis
- Behaves differently during training and testing, a common source of bugs.

### Layer Normalization
Instead of computing the mean and variance over the batch, we compute them over the features of each sample, so the elements of a batch stay independent of each other.    
The layer therefore behaves the same during training and testing.
Used in RNNs and Transformers.
![image.png](attachment:image.png)
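
A minimal NumPy sketch for an input of shape $(N, D)$; compared with batch normalization, only the averaging axis changes. The function name `layernorm` and the constant `eps` are assumptions of this sketch.

```python
import numpy as np

def layernorm(x, gamma, beta, eps=1e-5):
    """x: (N, D). Normalize each sample over its D features."""
    mu = x.mean(axis=1, keepdims=True)  # per-sample mean, shape (N, 1)
    var = x.var(axis=1, keepdims=True)  # per-sample variance, shape (N, 1)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
gamma, beta = np.ones(8), np.zeros(8)

full = layernorm(x, gamma, beta)        # whole batch at once
single = layernorm(x[:1], gamma, beta)  # first sample on its own
print(np.allclose(full[:1], single))    # True
```

Because each sample is normalized on its own, no running statistics are needed and the same code serves training and testing.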

### Instance Normalization
Used in convolutional networks.
The mean and variance are computed only over the spatial dimensions, separately for each sample and each channel.
![image.png](attachment:image.png)
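
A minimal NumPy sketch for an input of shape $(N, C, H, W)$. The function name `instancenorm2d` is hypothetical, and the learnable scale and shift are left out to keep the averaging axes in focus.

```python
import numpy as np

def instancenorm2d(x, eps=1e-5):
    """x: (N, C, H, W). Normalize each (sample, channel) map over H and W."""
    mu = x.mean(axis=(2, 3), keepdims=True)  # shape (N, C, 1, 1)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=(2, 3, 8, 8))
y = instancenorm2d(x)
print(y.shape)                                         # (2, 3, 8, 8)
print(np.allclose(y.mean(axis=(2, 3)), 0.0, atol=1e-7))  # True
```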

![image.png](attachment:image.png)