# Physics 494/594
## Convolutional Networks Structure

In [None]:
# %load ./include/header.py
import numpy as np
import matplotlib.pyplot as plt
import sys
from tqdm import trange,tqdm
sys.path.append('./include')
import ml4s

%matplotlib inline
%config InlineBackend.figure_format = 'svg'
plt.style.use('./include/notebook.mplstyle')
np.set_printoptions(linewidth=120)
ml4s.set_css_style('./include/bootstrap.css')
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
π = np.pi

## Last Time

### [Notebook Link: 24_Image_Filters.ipynb](./24_Image_Filters.ipynb)

- Image filters: local feature detection and sampling

## Today

- Convolutional networks: reducing free parameters and encoding the properties of images 
- Explore a simple example to understand how filters work


### Network Structure

Until now we have focused exclusively on dense (fully connected) neural networks, where every neuron in one layer is connected to every neuron in the next.

In [None]:
n = [9,9]
ml4s.draw_network(n)

Here we have $9\times 9 + 9 = 90$ parameters ($81$ weights and $9$ biases).  If we consider the MNIST data set ($28\times28 = 784$ pixels per image) with an input layer, 2 hidden layers each with 100 neurons, and a softmax output layer with 10 neurons, we would have:

In [None]:
# Weights connect adjacent layers, so their counts add: 784→100, 100→100, 100→10
print(f'Number of weights = {784*100 + 100*100 + 100*10:g}')
print(f'Number of biases = {100+100+10}')
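For any fully connected architecture, each pair of adjacent layers contributes $n_{\ell-1} \times n_\ell$ weights and $n_\ell$ biases. A minimal sketch of this bookkeeping follows; the helper name `count_dense_parameters` is introduced here for illustration and is not part of `ml4s`:

```python
def count_dense_parameters(layers):
    """Return (weights, biases) for a fully connected network.

    `layers` lists the number of neurons per layer, input layer first.
    """
    # Each adjacent pair of layers contributes n_in * n_out weights
    num_weights = sum(n_in * n_out for n_in, n_out in zip(layers[:-1], layers[1:]))
    # Every neuron outside the input layer carries one bias
    num_biases = sum(layers[1:])
    return num_weights, num_biases

num_weights, num_biases = count_dense_parameters([784, 100, 100, 10])
print(f'Number of weights = {num_weights}')
print(f'Number of biases = {num_biases}')
```

Calling it with `[9, 9]` reproduces the $81$ weights and $9$ biases of the small network drawn earlier.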

For a convolutional layer, we appeal to the action of the kernels from last time: they act locally and are applied everywhere across the image.  For a filter (kernel) size of $F=3$ the equivalent convolutional layer would have the structure:

In [None]:
n = [9,9]

# Generate F random weights and set the rest to zero
F = 3
weights = np.zeros(n)
fw = np.random.uniform(low=0.5,high=1.0,size=F)
for i in range(n[0]):
    # μ is the offset from the central neuron; μ + F//2 indexes the kernel for any odd F
    for μ in range(-(F//2), F//2 + 1):
        if 0 <= i+μ < n[1]:
            weights[i,i+μ] = fw[μ + F//2]
            
w = [weights]
b = [np.random.random()*np.ones(n[1])]

ml4s.draw_network(n,weights=w,biases=b, weight_thickness=True)

Thus we use the **same weights and biases** for every neuron in the layer.  Here we have chosen random weights, but in a ConvNet the kernel is scanned over the input image (with a step size called the `stride`, usually 1) and the kernel weights and bias are learned via backpropagation!

#### Plot the weight matrix

In [None]:
plt.matshow(weights, cmap='coolwarm')
plt.colorbar(label='weights', shrink=0.8)
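The banded structure of this matrix is what weight sharing means in practice: every diagonal repeats a single kernel entry. The sketch below builds such a matrix for a fixed kernel and compares the number of matrix entries with the number of distinct learnable weights. The function `shared_weight_matrix` and the kernel values are hypothetical and chosen only for illustration:

```python
import numpy as np

def shared_weight_matrix(n, kernel):
    """Build an n x n weight matrix in which each diagonal band repeats one kernel entry."""
    F = len(kernel)          # kernel size, assumed odd
    half = F // 2
    W = np.zeros((n, n))
    for i in range(n):
        for mu in range(-half, half + 1):
            if 0 <= i + mu < n:
                W[i, i + mu] = kernel[mu + half]
    return W

kernel = np.array([0.6, 0.8, 1.0])   # illustrative values, not learned
W = shared_weight_matrix(9, kernel)

print(f'Entries in a dense weight matrix = {W.size}')
print(f'Non-zero entries                 = {np.count_nonzero(W)}')
print(f'Distinct learnable weights       = {len(kernel)}')
```

Only the $F$ kernel entries are free parameters, regardless of how large the layers become.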

### Example

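Let's consider a 1D input with $N=5$ neurons, zero padding $P=1$, a stride of $S=1$, and a kernel of size $F=3$.  We will take the weights $w = [2,1,-1]$ and a bias of $0$.  The padded input has $N + 2P = 7$ neurons and the output has $(N - F + 2P)/S + 1 = 5$ neurons.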

In [None]:
n = [7,5]
weights = np.zeros(n)
fw = [2,1,-1]

for i in range(5):
    weights[i,i] = fw[0]
    
for i in range(1,6):
    weights[i,i-1] = fw[1]
    
for i in range(2,7):
    weights[i,i-2] = fw[2]

w = [weights]
b = [np.zeros(n[1])]

ml4s.draw_network(n,weights=w,biases=b, weight_thickness=True, node_labels=[[0,1,2,-1,1,-3,0],[-1,5,2,2,-1]])
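The node labels in this example can be checked by sliding the kernel over the padded input directly. The following is a minimal sketch, assuming zero bias and no activation function; `conv1d` is a hypothetical helper written for this check and is not part of `ml4s`:

```python
import numpy as np

def conv1d(x, kernel, bias=0.0, stride=1, padding=0):
    """Slide `kernel` over a zero-padded 1D input and return the weighted sums."""
    x_padded = np.pad(np.asarray(x, dtype=float), padding)   # zeros on both ends
    kernel = np.asarray(kernel, dtype=float)
    F = len(kernel)
    # Number of kernel positions: (N - F + 2P)/S + 1, rounded down
    n_out = (len(x_padded) - F) // stride + 1
    return np.array([x_padded[j*stride : j*stride + F] @ kernel + bias
                     for j in range(n_out)])

x = [1, 2, -1, 1, -3]                  # the 5 unpadded input values
out = conv1d(x, kernel=[2, 1, -1], padding=1)
print(out)

# np.correlate in 'valid' mode performs the same sliding sum for stride 1
print(np.correlate(np.pad(x, 1), [2, 1, -1], mode='valid'))
```

Both lines reproduce the output labels $[-1, 5, 2, 2, -1]$. Note that the kernel is applied without flipping, which is strictly a cross-correlation; this is the usual convention in ConvNets.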

#### Summary:

1. This process exploits translational symmetry: the same filter will pick up a feature located anywhere in the image, and its response shifts along with the feature (translation equivariance).
2. There is a drastic reduction in the number of weights and biases that need to be stored and learned.  We go from $n_0 \times n_1$ weights to the kernel size ($F$ in 1D, $F\times F$ for 2D images). This is independent of the image resolution (size).
3. ConvNets train faster and are a state-of-the-art approach for image classification tasks.
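The first point can be checked numerically: moving a feature within the input moves the filter response by the same amount. A small sketch, with an arbitrary feature and input length chosen only for illustration:

```python
import numpy as np

kernel = np.array([2, 1, -1])

# The same small feature placed at two positions in an otherwise empty input
feature = np.array([1, 3, 1])
x_a = np.zeros(12)
x_a[2:5] = feature
x_b = np.zeros(12)
x_b[7:10] = feature          # identical feature, shifted right by 5

# np.correlate slides the kernel over the input without flipping it
y_a = np.correlate(x_a, kernel, mode='same')
y_b = np.correlate(x_b, kernel, mode='same')

# The response to the shifted input is the shifted response
print(np.allclose(np.roll(y_a, 5), y_b))
```

The feature is kept away from the edges so that the wrap-around in `np.roll` and the implicit zero padding of `mode='same'` do not affect the comparison.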

There are some nice lecture notes on CNNs here: https://cs231n.github.io/convolutional-networks/

The final hyper-parameter is the number of filters applied to each input, usually called `filters`.  Each filter produces its own output channel and could represent a different operation, e.g. smoothing, extracting edges, etc.  We will have a new set of weights and a new bias for each channel.
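As a rough illustration of multiple filters, the sketch below applies two hand-picked kernels to the same 1D input and stacks the results, one output channel per filter. The kernels and input are assumptions made for this example; in a ConvNet they would be learned:

```python
import numpy as np

x = np.array([0., 0., 1., 1., 1., 0., 0., 2., 0.])

# Hand-picked kernels for illustration only
filters = np.array([
    [1/3, 1/3, 1/3],    # smoothing (moving average)
    [-1.,  0.,  1.],    # edge detection (centred difference)
])
biases = np.zeros(len(filters))   # one bias per filter

# One output channel per filter, each the same length as the input
feature_maps = np.array([np.correlate(x, f, mode='same') + b
                         for f, b in zip(filters, biases)])

print(feature_maps.shape)
print(f'Learnable parameters = {filters.size + biases.size}')
```

The parameter count grows as (number of filters) $\times (F + 1)$ in 1D, which still does not depend on the input size.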