# Building MakeMore - Moving to CNN

Last time we implemented the WaveNet architecture, but without the residual gates or the convolutions, just the hierarchical structure. This time around we are going to add in the convolutions instead of using only Linear layers. Andrej Karpathy's tutorials stop at the last one, so from here on we will be learning it ourselves.


Overall, I will work through these papers:
- Bigram (one character predicts the next one with a lookup table of counts)
- MLP, following [Bengio et al. 2003](https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf)
- CNN, following [DeepMind WaveNet 2016](https://arxiv.org/abs/1609.03499) 
- RNN, following [Mikolov et al. 2010](https://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf)
- LSTM, following [Graves et al. 2014](https://arxiv.org/abs/1308.0850)
- GRU, following [Kyunghyun Cho et al. 2014](https://arxiv.org/abs/1409.1259)
- Transformer, following [Vaswani et al. 2017](https://arxiv.org/abs/1706.03762)

In [34]:
## Let's import the starter code
import torch
import math
import numpy as np
import matplotlib.pyplot as plt
import random
import torch.nn.functional as F
import torch.nn as nn
import scipy.signal  # import the submodule explicitly so scipy.signal.fftconvolve is available
%matplotlib inline

# Let's learn CNN
A convolution is a sum of products. In 1D the math is: take two arrays, flip the second one, and slide it across the first, multiplying the overlapping elements and summing them at each position. This is shown below.
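To make the flip-and-slide description concrete, here is a minimal sketch that writes the sum of products out by hand. The helper name `convolve_1d` is hypothetical (it is not part of NumPy), and the double loop is only meant to mirror the definition, not to be fast.

```python
import numpy as np

def convolve_1d(a, b):
    """Full 1D convolution written out by hand (hypothetical helper, for illustration)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    out = np.zeros(len(a) + len(b) - 1)
    # out[n] = sum over k of a[k] * b[n - k]; the index n - k is what "flips" b
    for n in range(len(out)):
        for k in range(len(a)):
            if 0 <= n - k < len(b):
                out[n] += a[k] * b[n - k]
    return out

result = convolve_1d((1, 2, 3), (4, 5, 6))
print(result)  # values 4, 13, 28, 27, 18, the same as np.convolve((1, 2, 3), (4, 5, 6))
```

For example, the middle value is 1*6 + 2*5 + 3*4 = 28, which is the position where the two arrays overlap completely.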

Now, when we think of this for images, we have the main image (2D with no channels for grayscale) and a smaller filter, or kernel. This kernel gets slid across the image, resulting in a new tensor. With hand-picked values this is used for edge detection, Gaussian blur, etc. In a CNN the kernel is not set to fixed values; it is learned.
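As a rough sketch of a fixed kernel sliding over an image, the example below runs a Sobel kernel over a toy 5x5 image using `scipy.signal.convolve2d`. The toy image and the choice of the Sobel kernel are assumptions made for illustration; in a CNN the nine kernel values would be trainable parameters instead.

```python
import numpy as np
from scipy.signal import convolve2d

# Toy 5x5 grayscale image: dark (0) in the two left columns, bright (1) on the right
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)

# Sobel kernel that responds to left-to-right changes in brightness (vertical edges)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# mode="valid" keeps only the positions where the kernel fits fully inside the image
edges = convolve2d(image, sobel_x, mode="valid")
print(edges.shape)  # (3, 3)
print(edges)        # nonzero where the window straddles the edge, zero in the flat bright region
```

The output is smaller than the image because the 3x3 kernel can only sit in 3x3 distinct positions inside a 5x5 image.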

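Computed directly, convolving two length-N arrays is quite intensive at O(N^2). By the convolution theorem, a convolution becomes a pointwise multiplication after a discrete Fourier transform, so we can use the Fast Fourier Transform to get down to O(N log N).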
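The FFT route can be sketched in a few lines of NumPy. The helper name `fft_convolve` is hypothetical, and this is a simplified illustration of the idea rather than how `scipy.signal.fftconvolve` is actually implemented.

```python
import numpy as np

def fft_convolve(a, b):
    """Full 1D convolution via the FFT (hypothetical helper, for illustration)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a) + len(b) - 1  # length of the full convolution
    # Zero-pad both inputs to length n so the circular convolution equals the linear one
    fa = np.fft.rfft(a, n)
    fb = np.fft.rfft(b, n)
    # Pointwise product in the frequency domain corresponds to convolution in the original domain
    return np.fft.irfft(fa * fb, n)

rng = np.random.default_rng(0)
x = rng.random(1000)
y = rng.random(1000)
print(np.allclose(fft_convolve(x, y), np.convolve(x, y)))  # True
```

The result agrees with the direct method only up to floating-point rounding, which is why the comparison uses `np.allclose` rather than exact equality.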

np.convolve((1,2,3), (4,5,6))

In [32]:
arr1 = np.random.random(100000)
arr2 = np.random.random(100000)

In [33]:
%%timeit
np.convolve(arr1, arr2)

4.69 s ± 22.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [35]:
%%timeit
scipy.signal.fftconvolve(arr1, arr2)

5.96 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


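Notice that the output of this is bigger than the original arrays: a full convolution of arrays of length N and M has length N + M - 1. A pooling layer, which has no weights, can be run across it afterwards to shrink it; in the example below, max pooling brings the length-5 output back to the original length of 3.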
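Before pooling, it may help to see where the extra length comes from. This small sketch uses the `mode` argument of `np.convolve` on the same toy arrays as above to compare the three standard output sizes.

```python
import numpy as np

a, b = (1, 2, 3), (4, 5, 6)

full = np.convolve(a, b, mode="full")    # every partial overlap, length N + M - 1 = 5
same = np.convolve(a, b, mode="same")    # centered slice of the full result, length max(N, M) = 3
valid = np.convolve(a, b, mode="valid")  # complete overlaps only, length max(N, M) - min(N, M) + 1 = 1

print(full)   # values 4, 13, 28, 27, 18
print(same)   # values 13, 28, 27
print(valid)  # value 28
```

The default is `"full"`, which is why the earlier results are longer than the inputs.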

In [55]:
output = torch.tensor(scipy.signal.fftconvolve((1,2,3), (4,5,6))).view(1, 5)
output

tensor([[ 4.0000, 13.0000, 28.0000, 27.0000, 18.0000]], dtype=torch.float64)

In [92]:
m = nn.MaxPool1d(2, padding=1) 
pooled_output = m(output)
pooled_output

tensor([[ 4.0000, 28.0000, 27.0000]], dtype=torch.float64)
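To see why the pooled output has three values, here is a sketch that reproduces the pooling by hand. It assumes the same settings as above (kernel size 2, padding 1, and the default stride, which equals the kernel size); the variable names are only for illustration.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([[4.0, 13.0, 28.0, 27.0, 18.0]])  # shape (channels=1, length=5)
kernel_size, padding = 2, 1
stride = kernel_size  # MaxPool1d uses stride = kernel_size when stride is not given

# Output length: floor((L + 2 * padding - kernel_size) / stride) + 1
expected_len = math.floor((x.shape[-1] + 2 * padding - kernel_size) / stride) + 1

# Reproduce the pooling by hand: pad both ends with -inf, then take the max of each window
padded = F.pad(x, (padding, padding), value=float("-inf"))
windows = padded.unfold(-1, kernel_size, stride)  # shape (1, 3, 2)
manual = windows.max(dim=-1).values

pooled = nn.MaxPool1d(kernel_size, padding=padding)(x)
print(expected_len)                 # 3
print(torch.equal(manual, pooled))  # True
```

The three windows are (-inf, 4), (13, 28) and (27, 18), so the maxima are 4, 28 and 27, matching the output above.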