In [None]:
#|default_export conv

### Convolutions

In [None]:
#|export
import torch
import torch.nn as nn
from torch.utils.data import default_collate
from miniai.training import *
from miniai.datasets import *

In [None]:
import gzip,pickle,matplotlib.pyplot as plt,numpy as np

In [None]:
with gzip.open('../data/mnist.pkl.gz', 'rb') as f: data = pickle.load(f, encoding='latin')
((x_train,y_train),(x_val,y_val),_) = data
(x_train,y_train,x_val,y_val) = map(torch.tensor,(x_train,y_train,x_val,y_val))
(x_train.shape,y_train.shape,x_val.shape,y_val.shape)

(torch.Size([50000, 784]),
 torch.Size([50000]),
 torch.Size([10000, 784]),
 torch.Size([10000]))

### Understanding Convolution Equations

* https://medium.com/impactai/cnns-from-different-viewpoints-fab7f52d159c (how CNNs differ from a normal feedforward network, FNN)
* https://arxiv.org/pdf/1603.07285.pdf (Convolution Arithmetic)


* The mechanics are quite simple. In a feedforward network (FNN), you flatten the input image into a single vector of pixels, multiply each pixel by its corresponding weight, sum the results, and add a bias to get one output. For multiple outputs, you repeat the same process with a different set of weights for each output, and finally pass all the outputs through a non-linear function. A convolution differs in two ways: the same weights are shared across all outputs, and the number of weights is much smaller than the number of input pixels. As a result, each output is computed from only a few input pixels. To avoid missing any input pixels, each output focuses on a different set of pixels: the same weights are multiplied with a different set of input pixels for each output, so that together the outputs cover the whole input.
 <br/>
    
* Now we need to choose which set of pixels each output should focus on. This is done by reshaping the weights into a 2D grid; for example, 9 weights are reshaped to 3x3. You then slide this grid across the image horizontally and vertically, and each placement selects the set of pixels for one output. That's it: this is "convolution".
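
The sliding described above can be sketched in a few lines. This is a minimal illustration, not the notebook's implementation: `conv2d_naive` is a hypothetical helper, the image is random data standing in for one 28x28 MNIST digit, and the bias is left out for brevity. The result is compared against `F.conv2d` to confirm that both compute the same thing.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
img = torch.randn(28, 28)     # random stand-in for one MNIST image (assumption)
kernel = torch.randn(3, 3)    # 9 shared weights, reshaped to 3x3

def conv2d_naive(img, kernel):
    "Slide `kernel` over `img` (stride 1, no padding); each placement gives one output."
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = torch.zeros(oh, ow)
    for i in range(oh):
        for j in range(ow):
            # The same weights multiply a different patch of pixels each time.
            out[i, j] = (img[i:i+kh, j:j+kw] * kernel).sum()
    return out

out = conv2d_naive(img, kernel)
# F.conv2d expects (batch, channels, height, width) inputs and weights.
ref = F.conv2d(img[None, None], kernel[None, None])[0, 0]
print(out.shape)                            # torch.Size([26, 26])
print(torch.allclose(out, ref, atol=1e-5))  # True
```

A fully connected layer producing the same 26x26 = 676 outputs from 784 pixels would need 784 * 676 = 529,984 weights, whereas the convolution reuses the same 9.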


##### Convolution Arithmetic

* In a convolution, the number of outputs depends on the input size, the number of weights (kernel size), the padding, and the stride.
 <br/>
 
* Input size: number of input pixels.
 <br/>
 
* Padding: number of extra dummy pixels (usually zeros) added to each side of the input.
<br/>

* Stride: number of pixels the kernel moves at each step while sliding.
<br/>

* Number of weights (kernel size): size of the shared weights along each dimension, e.g. 3 for a 3x3 kernel.
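
To see how these four quantities interact, the sketch below builds a few `nn.Conv2d` layers and prints the spatial output shape for an MNIST-sized input. The helper `out_shape` is a hypothetical name introduced only for this illustration, and the input is a tensor of zeros since only the shape matters here.

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 1, 28, 28)  # (batch, channels, height, width), MNIST-sized

def out_shape(**kwargs):
    "Spatial output shape of a 1-channel Conv2d built with `kwargs`."
    conv = nn.Conv2d(1, 1, **kwargs)
    return tuple(conv(x).shape[-2:])

print(out_shape(kernel_size=3))                       # (26, 26)
print(out_shape(kernel_size=5))                       # (24, 24)
print(out_shape(kernel_size=3, padding=1))            # (28, 28)
print(out_shape(kernel_size=3, padding=1, stride=2))  # (14, 14)
```

A larger kernel shrinks the output, padding grows it back, and a stride of 2 roughly halves it.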


##### Deriving General Formula for the Output Size

```python
    # Assume the input is 1D; the result extends to 2D by applying it per dimension.
    # Input size: W
    # Kernel size: K
    # Stride: S
    # Padding (per side): P
    # Output size: O
    
    # Case 1:
        # S=1, 0<K<W, P=0
        # In this case the kernel slides one pixel at a time over the input
        # until it reaches the end, where it can no longer be contained inside the input,
        # so the last K-1 positions are excluded.
        
        O = (W-K+1)
        
    # Case 2:
        # P>0, S=1, 0<K<W
        # If padding is greater than 0, P extra dummy pixels are added to both sides of the input,
        # so we need to multiply P by 2.
        
        O = (W-K+1+2*P)
        
    # Case 3:
        # P>0, S>1, 0<K<W
        # When the stride is greater than 1, the kernel moves S pixels at a time,
        # so only every S-th position is kept; // rounds down when the division is not exact.
        
        O = ((W-K+2*P)//S) + 1
        

```
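
The general formula can be wrapped in a small function and checked against a brute-force count of kernel placements. This is a sketch under the assumptions of the derivation (1D input, the same padding on both sides); `conv_output_size` and `count_placements` are hypothetical names, not part of `miniai` or PyTorch.

```python
def conv_output_size(W, K, S=1, P=0):
    "Output size of a 1D convolution: floor((W - K + 2P) / S) + 1."
    if K > W + 2 * P:
        raise ValueError("kernel does not fit inside the padded input")
    return (W - K + 2 * P) // S + 1

def count_placements(W, K, S=1, P=0):
    "Brute-force check: count kernel start positions that fit in the padded input."
    padded, start, n = W + 2 * P, 0, 0
    while start + K <= padded:
        n += 1
        start += S
    return n

cases = [(28, 3, 1, 0), (28, 3, 1, 1), (28, 3, 2, 1), (7, 3, 2, 0), (8, 3, 2, 0)]
for W, K, S, P in cases:
    assert conv_output_size(W, K, S, P) == count_placements(W, K, S, P)
    print((W, K, S, P), "->", conv_output_size(W, K, S, P))
# (28, 3, 1, 0) -> 26
# (28, 3, 1, 1) -> 28
# (28, 3, 2, 1) -> 14
# (7, 3, 2, 0) -> 3
# (8, 3, 2, 0) -> 3
```

The last two cases give the same output size: with `W=8` the division `(8-3)/2` is not exact, so the final input pixel is never covered and the result is rounded down.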