# Exercises:

$$\begin{bmatrix} 3 & 0 & 1 & 2 & 7 & 4 \\ 1 & 5 & 8 & 9 & 3 & 1 \\ 2 & 7 & 2 & 5 & 1 & 3 \\ 0 & 1 & 3 & 1 & 7 & 8 \\ 4 & 2 & 1 & 6 & 2 & 8 \\ 2 & 4 & 5 & 2 & 3 & 9\end{bmatrix} \circledast \begin{bmatrix} 1 & 0 & -1 \\ 1 & 0 & -1 \\ 1 & 0 & -1 \end{bmatrix} = \begin{bmatrix} ? & ? & ? & ? \\ ? & ? & ? & ? \\ ? & ? & ? & ? \\ ? & ? & ? & ? \end{bmatrix}$$

<p style="text-align: center;"><b>Figure 1</b></p>

1. Using Figure 1, compute the resulting 4x4 matrix (no padding, stride of 1).


2. Apply 2x2 average pooling to reduce the 4x4 result from Exercise 1 to a 2x2 matrix.


3. How much padding would we need to add to the input matrix so that the resulting matrix is the same size as the input (i.e. 6x6)?


4. What is a batch or mini-batch?

In [1]:
import numpy as np
from scipy import signal, ndimage
from skimage import measure

input_arr = np.array([[3, 0, 1, 2, 7, 4],
                      [1, 5, 8, 9, 3, 1],
                      [2, 7, 2, 5, 1, 3],
                      [0, 1, 3, 1, 7, 8],
                      [4, 2, 1, 6, 2, 8],
                      [2, 4, 5, 2, 3, 9]])

kerna_arr = np.array([[1, 0, -1],
                      [1, 0, -1],
                      [1, 0, -1]])

# NOTE: we use correlate rather than convolve. Convnets compute
# cross-correlations, i.e. the kernel is not flipped.
print("Correlation")
a_out = signal.correlate(input_arr, kerna_arr, mode='valid')
print(a_out)
print("\nConvolution")
b_out = signal.convolve(input_arr, kerna_arr, mode='valid')
print(b_out)
print("\nCorrelation with Reflect Padding")
# ndimage defaults to mode='reflect' (mirrored borders);
# pass mode='constant', cval=0 to pad with zeros instead.
print(ndimage.correlate(input_arr, kerna_arr))
print("\nConvolution with Reflect Padding")
print(ndimage.convolve(input_arr, kerna_arr))

print("\nAverage Pooling")
print(measure.block_reduce(a_out, (2,2), np.average))

print("\nMax Pooling")
print(measure.block_reduce(a_out, (2,2), np.max))
print(measure.block_reduce(b_out, (2,2), np.max))

Correlation
[[ -5  -4   0   8]
 [-10  -2   2   3]
 [  0  -2  -4  -7]
 [ -3  -2  -3 -16]]

Convolution
[[ 5  4  0 -8]
 [10  2 -2 -3]
 [ 0  2  4  7]
 [ 3  2  3 16]]

Correlation with Reflect Padding
[[  2  -3  -8  -7   4   8]
 [ -6  -5  -4   0   8   3]
 [-10 -10  -2   2   3  -1]
 [ -4   0  -2  -4  -7  -9]
 [ -1  -3  -2  -3 -16 -13]
 [ -2  -3   0   3 -16 -18]]

Convolution with Reflect Padding
[[-2  3  8  7 -4 -8]
 [ 6  5  4  0 -8 -3]
 [10 10  2 -2 -3  1]
 [ 4  0  2  4  7  9]
 [ 1  3  2  3 16 13]
 [ 2  3  0 -3 16 18]]

Average Pooling
[[-5.25  3.25]
 [-1.75 -7.5 ]]

Max Pooling
[[-2  8]
 [ 0 -3]]
[[10  0]
 [ 3 16]]
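
Note that `scipy.ndimage.correlate` uses `mode='reflect'` by default, so its border values come from mirrored input rather than zeros. The following is a minimal sketch of explicit zero padding for Exercise 3, assuming a stride of 1 and an odd kernel size. The helper `conv_output_size` and the name `kernel_arr` are hypothetical and chosen for illustration; they are not part of SciPy or of the cell above.

```python
import numpy as np
from scipy import signal, ndimage

def conv_output_size(n, f, p=0, s=1):
    """Output width for input width n, kernel width f, padding p, stride s."""
    return (n + 2 * p - f) // s + 1

input_arr = np.array([[3, 0, 1, 2, 7, 4],
                      [1, 5, 8, 9, 3, 1],
                      [2, 7, 2, 5, 1, 3],
                      [0, 1, 3, 1, 7, 8],
                      [4, 2, 1, 6, 2, 8],
                      [2, 4, 5, 2, 3, 9]])

kernel_arr = np.array([[1, 0, -1],
                       [1, 0, -1],
                       [1, 0, -1]])

# For an odd kernel width f and stride 1, padding (f - 1) / 2 keeps the size.
pad = (kernel_arr.shape[0] - 1) // 2

# Pad the input with zeros on every side, then run a 'valid' correlation.
padded = np.pad(input_arr, pad, mode="constant", constant_values=0)
same_out = signal.correlate(padded, kernel_arr, mode="valid")

# The same result via ndimage, with zero padding requested explicitly.
zero_pad_out = ndimage.correlate(input_arr, kernel_arr, mode="constant", cval=0)

print(conv_output_size(6, 3))          # size without padding
print(conv_output_size(6, 3, p=pad))   # size with padding
print(same_out.shape)
print(np.array_equal(same_out, zero_pad_out))
```

The interior 4x4 block of `same_out` matches the unpadded correlation; only the border row and column values depend on the padding mode.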


### Softmax
Think back to logistic regression and the equations we used there. For multinomial logistic regression, the goal was to find the probability that the output belongs to each of the classes. This required a vector output $z = [z_1, z_2, \dots, z_k]$ of $k$ arbitrary real values. This vector was then mapped to a probability distribution, with each value in the range $(0,1)$ and all values summing to $1$. That mapping is the softmax function:

$$\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}, \quad 1 \le i \le k$$

We occasionally need to do this in neural networks as well. When a network has multiple output nodes, we apply the softmax to the output layer so that each node gives the probability that its class is the correct one, and the classification is the class with the highest probability.
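
The softmax formula can be sketched in NumPy as follows. This is an illustrative version, not a reference implementation: it assumes a 1-D vector of scores, and the example values in `scores` are made up. Subtracting the maximum before exponentiating does not change the result mathematically, but it avoids overflow for large inputs.

```python
import numpy as np

def softmax(z):
    """Map a 1-D vector of arbitrary scores to a probability distribution."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max()        # same result, but exp() cannot overflow
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])   # hypothetical output-layer values
probs = softmax(scores)

print(probs)            # one probability per class
print(probs.sum())      # sums to 1 up to floating-point rounding
print(probs.argmax())   # index of the predicted class
```

Softmax preserves the ordering of the scores, so the largest $z_i$ always receives the largest probability.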

### Batch Norm
We have already discussed what batches are with regard to neural networks. [Batch normalization](https://en.wikipedia.org/wiki/Batch_normalization) is what it sounds like: each layer that applies it re-centers and re-scales its inputs over the current batch so that they have zero mean and unit variance, and then applies a learned scale and shift. It was introduced to reduce internal covariate shift during training. If you want to look at the equations used to achieve batch normalization at each layer, please check out the wiki page. The key point is that batch normalization normalizes the input to a layer, so we don't have sporadic layers receiving unexpected inputs, which is what leads to covariate shift.
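
A minimal sketch of the training-time forward pass of batch normalization, assuming a 2-D input of shape `(batch_size, features)`. The function name `batch_norm`, the parameter names `gamma`, `beta` and `eps`, and the random example data are illustrative choices. At inference time, frameworks typically use running averages of the mean and variance instead of the batch statistics; that part is omitted here.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis, then scale and shift."""
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale and shift

rng = np.random.default_rng(0)
# Hypothetical batch: 8 examples, 4 features, deliberately off-center.
batch = rng.normal(loc=5.0, scale=3.0, size=(8, 4))

out = batch_norm(batch, gamma=np.ones(4), beta=np.zeros(4))

print(out.mean(axis=0))   # close to 0 for every feature
print(out.std(axis=0))    # close to 1 for every feature
```

With `gamma` set to ones and `beta` to zeros the layer only standardizes its input; during training these two vectors are learned along with the other weights.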