# Physics 494/594
## Image Filters

In [None]:
# %load ./include/header.py
import numpy as np
import matplotlib.pyplot as plt
import sys
from tqdm import trange,tqdm
sys.path.append('./include')
import ml4s

%matplotlib inline
%config InlineBackend.figure_format = 'svg'
plt.style.use('./include/notebook.mplstyle')
np.set_printoptions(linewidth=120)
ml4s.set_css_style('./include/bootstrap.css')
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
π = np.pi

## Last Time

### [Notebook Link: 23_MNIST_Generalization.ipynb](./23_MNIST_Generalization.ipynb)

- Explored the robustness and generalizability of our classification model.
- Investigated translations and rotations of digits.

## Today

- Image filters: local feature detection and sampling

### Generalization of our Deep Neural Network for Digit Recognition

Last time we saw that while we could train a network to an error rate below 2%, it failed to generalize to the simple case of translated or rotated digits. This is mostly because the MNIST data set has been pre-processed so that all digits are centered and aligned.

We discussed building a synthetic training set by rotating and translating the digits (a technique known as *data augmentation*). This is a **good idea** and is often done in practice. However, it can drastically increase the training time.

Recall that we input our `28x28` pixel images of digits as flattened arrays of 784 values. In doing so we essentially throw away the spatial correlations in the image: pixels that are neighbors in the image are no longer treated as neighbors by the network. Convolutional networks (*ConvNets*) address both the training-time and the spatial-correlation problems, and they are the state of the art for classification problems involving images. The first step is to introduce what we mean by the term *convolution*.
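A quick way to see what flattening discards: in a row-major flattened `28x28` image, a pixel's right-hand neighbor is one entry away, but the pixel directly below it is 28 entries away, and a fully connected layer has no way to tell which entries were neighbors. In fact, scrambling the pixels of every image with one fixed permutation (and the layer's weight columns with it) produces exactly the same activations. The sketch below checks both facts; the image `x` and the 100-neuron weight matrix `W` are random, hypothetical stand-ins, not our trained network.

```python
import numpy as np

shape = (28, 28)  # MNIST image size

def flat_index(i, j):
    """Position of pixel (i, j) in the row-major (C-order) flattened image."""
    return int(np.ravel_multi_index((i, j), shape))

i, j = 10, 14
step_right = flat_index(i, j + 1) - flat_index(i, j)  # horizontal neighbor
step_down = flat_index(i + 1, j) - flat_index(i, j)   # vertical neighbor
print(step_right, step_down)  # 1 28

# A dense layer acting on a flattened image (random stand-ins, hypothetical sizes)
rng = np.random.default_rng(0)
x = rng.random(28 * 28)                   # flattened "image"
W = rng.standard_normal((100, 28 * 28))   # weights of a 100-neuron dense layer
perm = rng.permutation(28 * 28)           # one fixed scrambling of the pixels

# Permuting pixels and weight columns together leaves the activations unchanged:
# the layer has no built-in notion of which pixels are spatial neighbors.
print(np.allclose(W @ x, W[:, perm] @ x[perm]))  # True
```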

#### Convolutions

Given a function $y(\boldsymbol{x})$, we can apply a convolution with a *kernel* $K(\boldsymbol{x})$ by integrating over the domain:

\begin{equation}
\tilde{y}(\boldsymbol{x}) = \int K(\boldsymbol{x}-\boldsymbol{x}^\prime) y(\boldsymbol{x}^\prime) d\boldsymbol{x}^\prime
\end{equation}

where $K(\boldsymbol{x})$ is usually a *local* function, i.e. it decays rapidly with distance. If it is short-ranged enough, we can replace the integral with a sum over nearby points. Suppose we interpret $y(\boldsymbol{x})$ as a matrix of pixels $y_{ij}$; then we can write:

\begin{equation}
\tilde{y}_{ij} = \sum_{\mu,\nu} K_{\mu\nu}\, y_{i+\mu,\,j+\nu}
\end{equation}

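where we have assumed unit grid spacing. Here, $\mu$ and $\nu$ each run over a small window of $F$ values (e.g. $\{-1,0,1\}$ for $F=3$), and $F$ is known as the **receptive field size** or **kernel size**. Strictly speaking, because the kernel is not flipped ($y_{i+\mu,\,j+\nu}$ rather than $y_{i-\mu,\,j-\nu}$), this sum is a *cross-correlation*: it matches the convolution integral above after the substitution $K_{\mu\nu} \to K_{-\mu,-\nu}$, so the two agree for symmetric kernels, and most deep learning libraries use the unflipped form under the name convolution. Let's explore the types of convolutions we can do on a simple image.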
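The explicit loops in this notebook compute this sum one pixel at a time. As a sketch of how the same sum can be vectorized, the hypothetical helper `apply_kernel` below (not part of the notebook or of `ml4s`) assumes an odd kernel size $F$, stores $K_{\mu\nu}$ at `K[μ + F//2, ν + F//2]`, pads the image, and uses NumPy's `sliding_window_view` (available since NumPy 1.20) to gather every $F\times F$ neighborhood at once:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def apply_kernel(y, K, mode='constant'):
    """Return ỹ with ỹ[i, j] = Σ_{μ,ν} K_{μν} y[i+μ, j+ν] (no kernel flip).

    K is an F×F array with F odd, stored so that K[μ + F//2, ν + F//2] = K_{μν}.
    The image is padded by F//2 pixels (zeros by default) so ỹ has y's shape.
    """
    F = K.shape[0]
    r = F // 2
    y_pad = np.pad(y, r, mode=mode)
    # windows[i, j] is the F×F neighborhood of pixel (i, j) of the original image
    windows = sliding_window_view(y_pad, (F, F))
    # contract each neighborhood with the kernel weights
    return np.einsum('ijmn,mn->ij', windows, K)

# Example: a kernel whose only nonzero entry is K_{0,+1} = 1 picks out y[i, j+1]
y_demo = np.arange(12.0).reshape(3, 4)
K_shift = np.zeros((3, 3))
K_shift[1, 2] = 1.0
print(apply_kernel(y_demo, K_shift))  # each row shifted left by one, zero-filled at the right edge
```

The `mode` argument is passed straight to `np.pad`, so `mode='edge'` repeats the border pixels instead of padding with zeros.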

In [None]:
powerT = plt.imread('../data/power_T.png')[:,:,1]
powerT -= np.min(powerT)
powerT /= np.max(powerT)

# flip pixels
powerT = np.abs(1-powerT)
rows,cols = powerT.shape

plt.matshow(powerT, cmap='binary')
plt.xticks([])
plt.yticks([]);

#### Smoothing

We can smooth the image by convolving it with a Gaussian kernel, i.e. by replacing each pixel with a Gaussian-weighted sum over its neighborhood:

\begin{equation}
K_{\mu \nu} = \frac{1}{F^2}\mathrm{e}^{-\mu^2 - \nu^2}
\end{equation}

and let's use a kernel size of $F=5$, i.e. $\mu,\nu \in \{-2,-1,0,1,2\}$, in our convolution. We also need to decide what to do at the boundary, where the kernel extends past the edge of the image. The simplest choice is to pad the image with zeros.
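It can help to build the $F=5$ Gaussian kernel as a $5\times 5$ array and inspect it. In this sketch (the names `mu`, `nu` and `K_gauss` are illustrative), $\mu$ runs down the rows and $\nu$ across the columns. Note that the $1/F^2$ prefactor does not normalize the weights to 1; that is harmless here because the smoothed image is rescaled by its maximum afterwards.

```python
import numpy as np

F = 5
r = F // 2                     # μ, ν ∈ {-2, -1, 0, 1, 2}
offsets = np.arange(-r, r + 1)
mu, nu = np.meshgrid(offsets, offsets, indexing='ij')  # mu varies down rows, nu across columns
K_gauss = np.exp(-mu**2 - nu**2) / F**2

print(np.round(K_gauss, 4))
# The weights sum to (Σ_μ e^{-μ²})² / F², roughly 0.126 rather than 1
print(K_gauss.sum())
```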

In [None]:
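F = 5
r = F//2                                                   # kernel half-width: μ,ν ∈ {-2,...,2}
y = np.pad(powerT, r, mode='constant', constant_values=0)  # pad with r zeros on every side
ỹ = np.zeros_like(powerT)
for i in range(rows):
    for j in range(cols):
        Ky = 0.0
        for μ in range(-r,r+1):
            for ν in range(-r,r+1):
                # pixel (i,j) of powerT sits at (i+r,j+r) in the padded array
                Ky += np.exp(-μ**2-ν**2)*y[i+r+μ,j+r+ν]
        ỹ[i,j] = Ky/F**2
ỹ /= np.max(ỹ)  # rescale so the brightest pixel is 1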

plt.matshow(ỹ, cmap='binary')
plt.xticks([])
plt.yticks([]);

#### Edge Detection

Edges can be found by multiplying the Gaussian kernel above by an odd function:

\begin{equation}
K_{\mu\nu} = \frac{1}{F^2}\mathrm{e}^{-\mu^2 - \nu^2} \sin \left[\frac{\pi}{2} (\mu-\nu)\right]
\end{equation}

where we will choose $F=3$.
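Evaluating this formula on the $3\times 3$ grid shows which pixels the edge kernel actually weights. This sketch (names illustrative) stores $K_{\mu\nu}$ at `K_edge[μ + 1, ν + 1]`, with $\mu$ along rows and $\nu$ along columns, matching the indices $y_{i+\mu,\,j+\nu}$ in the sum:

```python
import numpy as np

F = 3
offsets = np.arange(-(F // 2), F // 2 + 1)        # μ, ν ∈ {-1, 0, 1}
mu, nu = np.meshgrid(offsets, offsets, indexing='ij')
K_edge = np.exp(-mu**2 - nu**2) * np.sin(np.pi / 2 * (mu - nu)) / F**2

# Scale by 9e so the nonzero weights become ±1; sin(±π) is ~1e-16 in floating
# point, so round to expose the structure (rows: μ = -1, 0, 1; columns: ν = -1, 0, 1)
print(np.round(K_edge * 9 * np.e, 6))
```

Only the four nearest neighbors carry weight, with opposite signs on opposite sides, so the kernel sums to zero and gives no response on a region of uniform intensity.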

In [None]:
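F = 3
r = F//2                                                   # kernel half-width: μ,ν ∈ {-1,0,1}
y = np.pad(powerT, r, mode='constant', constant_values=0)  # pad with r zeros on every side
ỹ = np.zeros_like(powerT)
for i in range(rows):
    for j in range(cols):
        Ky = 0.0
        for μ in range(-r,r+1):
            for ν in range(-r,r+1):
                # pixel (i,j) of powerT sits at (i+r,j+r) in the padded array
                Ky += np.exp(-μ**2-ν**2)*np.sin((π/2)*(μ-ν))*y[i+r+μ,j+r+ν]
        ỹ[i,j] = Ky/F**2
ỹ /= np.max(ỹ)  # rescale; note the edge response takes both signs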

plt.matshow(ỹ, cmap='binary')
plt.xticks([])
plt.yticks([]);

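Why does this work? The kernel is odd under $(\mu,\nu)\to(-\mu,-\nu)$, so its weights sum to zero and a region of uniform intensity gives no response. In effect it is a finite-difference approximation to a spatial derivative, which is large where the intensity changes abruptly, i.e. at edges. A Gaussian multiplied by a sinusoid in this way is commonly used in image processing, where it is known as a [Gabor filter](https://en.wikipedia.org/wiki/Gabor_filter).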
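The derivative interpretation can be made explicit for $F=3$. Since $\sin\left[\frac{\pi}{2}(\mu-\nu)\right]$ vanishes whenever $\mu-\nu \in \{0,\pm 2\}$, only the four nearest neighbors contribute, with $K_{1,0}=K_{0,-1}=\frac{1}{9e}$ and $K_{-1,0}=K_{0,1}=-\frac{1}{9e}$. The filtered image is therefore

\begin{equation}
\tilde{y}_{ij} = \frac{1}{9e}\left[\left(y_{i+1,j} - y_{i-1,j}\right) - \left(y_{i,j+1} - y_{i,j-1}\right)\right] \approx \frac{2}{9e}\left(\frac{\partial y}{\partial i} - \frac{\partial y}{\partial j}\right),
\end{equation}

where we used central differences with unit grid spacing. The kernel thus measures the rate of change of intensity along the $(1,-1)$ direction of the $(i,j)$ grid.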

In [None]:
def gabor(x,y):
    return np.exp(-x**2-y**2)*np.sin(0.5*π*(x-y))

In [None]:
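fig,ax = plt.subplots(1,2,figsize=(8,4))

# continuous kernel: default 'xy' indexing puts x along the horizontal axis and y along the vertical
X,Y = np.meshgrid(np.linspace(-2,2,100),np.linspace(-2,2,100))
ax[0].imshow(gabor(X,Y), cmap='coolwarm', extent=[-2,2,-2,2], origin='lower')
ax[0].set_xlabel('x')
ax[0].set_ylabel('y')

# the same function sampled on the 3x3 integer grid (F = 3), drawn with the same orientation
Xg,Yg = np.meshgrid(np.arange(-1,2),np.arange(-1,2))
G = gabor(Xg,Yg)
ax[1].imshow(G, cmap='coolwarm', extent=[-1.5,1.5,-1.5,1.5], origin='lower')
ax[1].set_xticks([])
ax[1].set_yticks([]);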
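The derivative interpretation is easy to check numerically. For $F=3$ only the four nearest neighbors carry weight ($\pm\frac{1}{9e}$), so for a linear intensity ramp $y_{ij} = a\,i + b\,j$ the central differences are exact and the filter should return the constant $\frac{2}{9e}(a-b)$ at every interior pixel, and zero on a uniform image. A sketch with illustrative values of `a` and `b` and a hypothetical helper `filter_interior` that skips the boundary so no padding is needed:

```python
import numpy as np

F = 3
offsets = np.arange(-1, 2)
mu, nu = np.meshgrid(offsets, offsets, indexing='ij')
K_edge = np.exp(-mu**2 - nu**2) * np.sin(np.pi / 2 * (mu - nu)) / F**2  # K_edge[μ+1, ν+1] = K_{μν}

def filter_interior(y, K):
    """Σ_{μν} K_{μν} y[i+μ, j+ν] evaluated only at interior pixels of y."""
    rows, cols = y.shape
    out = np.zeros((rows - 2, cols - 2))
    for m in range(3):          # m = μ + 1
        for n in range(3):      # n = ν + 1
            out += K[m, n] * y[m:m + rows - 2, n:n + cols - 2]
    return out

# Linear ramp y = a i + b j: the response should be (2 / (9e)) (a - b) everywhere
a, b = 0.3, -0.2
ii, jj = np.meshgrid(np.arange(8), np.arange(8), indexing='ij')
ramp = a * ii + b * jj
print(np.allclose(filter_interior(ramp, K_edge), 2 / (9 * np.e) * (a - b)))  # True

# A uniform image gives zero response
print(np.allclose(filter_interior(np.ones((8, 8)), K_edge), 0.0))  # True
```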

### Pooling

Another important image processing technique is sub-sampling via either *max* or *average* pooling. We downsample an image by replacing each non-overlapping group of `pool_size x pool_size` pixels with their maximum or average. This reduces the number of pixels in each direction by a factor of `pool_size`.

Let's take `pool_size = 10` and use max pooling:

In [None]:
pool_size = 10
y = np.zeros([rows//pool_size,cols//pool_size])
for i in range(y.shape[0]):
    for j in range(y.shape[1]):
        y[i,j] = np.max(powerT[i*pool_size:(i+1)*pool_size,j*pool_size:(j+1)*pool_size])
        
plt.matshow(y, cmap='binary')
plt.xticks([])
plt.yticks([]);
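The double loop above can be vectorized by reshaping the image into non-overlapping `pool_size x pool_size` blocks and reducing over the two within-block axes. A minimal sketch (the helper name `pool` is hypothetical); passing `np.mean` instead of `np.max` gives average pooling, and, like the loop, it discards leftover rows and columns that do not fill a whole block:

```python
import numpy as np

def pool(img, pool_size, reduce=np.max):
    """Downsample img by applying `reduce` to each pool_size x pool_size block."""
    rows, cols = img.shape
    R, C = rows // pool_size, cols // pool_size
    # Crop to a whole number of blocks, then expose them as axes 1 and 3:
    # blocks[a, b, c, d] = img[a*pool_size + b, c*pool_size + d]
    blocks = img[:R * pool_size, :C * pool_size].reshape(R, pool_size, C, pool_size)
    return reduce(blocks, axis=(1, 3))

img = np.arange(16.0).reshape(4, 4)
print(pool(img, 2))           # max pooling
print(pool(img, 2, np.mean))  # average pooling
```

With the notebook's variables, `pool(powerT, 10)` should reproduce the array `y` computed by the loop.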

### Up Sampling

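We can perform the opposite operation and enlarge the image by copying each pixel value onto an `up_size x up_size` block of the grid. The image looks the same (i.e. no new features appear), but the number of pixels in each direction is multiplied by `up_size`. Note that this does not undo the pooling: the detail discarded by the max operation is not recovered.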

In [None]:
up_size = 10
y_up = np.zeros([up_size*y.shape[0],up_size*y.shape[1]])
for i in range(y.shape[0]):
    for j in range(y.shape[1]):
        y_up[i*up_size:(i+1)*up_size,j*up_size:(j+1)*up_size] = y[i,j]
        
plt.matshow(y_up, cmap='binary')
plt.xticks([])
plt.yticks([]);
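This kind of nearest-neighbor upsampling can also be written without explicit loops. A minimal sketch (the helper name `upsample` is hypothetical) using `np.repeat` along each axis, with `np.kron` against a block of ones as an equivalent alternative:

```python
import numpy as np

def upsample(img, up_size):
    """Nearest-neighbor upsampling: copy each pixel onto an up_size x up_size block."""
    return np.repeat(np.repeat(img, up_size, axis=0), up_size, axis=1)

small = np.array([[1, 2],
                  [3, 4]])
big = upsample(small, 2)
print(big)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]

# np.kron with a block of ones gives the same result
print(np.array_equal(big, np.kron(small, np.ones((2, 2), dtype=int))))  # True
```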