# Image alignment

In this task, you will solve two image alignment problems: channel processing and face alignment. You can earn **10 points** by implementing all the required functions (7.5 for the first part and 2.5 for the second).

# Image channels processing and alignment (7.5 points)

## Problem review

Sergey Prokudin-Gorsky was the first color photographer in Russia; he made the color portrait of Leo Tolstoy. Each of his photographs consists of three black-and-white photo plates, corresponding to the red, green, and blue color channels. The collection of his pictures is currently held by the U.S. Library of Congress, and altered versions have proliferated online. In this task, you should write a program that aligns the images from the Prokudin-Gorsky plates, and learn the basic image processing methods along the way.

*The input image and the result of the alignment:*
<img src="http://cdn1.savepice.ru/uploads/2017/7/31/8e68237bfd49026d137f59283db18b29-full.png">

In [None]:
%pylab inline
import matplotlib.pyplot as plt 
import numpy as np

## Problem description

#### Input image loading

The input image is the set of 3 plates, corresponding to B, G, and R channels (top-down). You should implement the function $\tt{load}$\_$\tt{data}$ that reads the data and returns the list of images of plates.
$\tt{dir}$\_$\tt{name}$ is the path to the directory with plate images. If this directory is located in the same directory as this notebook, then default arguments can be used.

In [None]:
from pathlib import Path
import cv2

def load_data(dir_name = 'plates'):
    """
    Loads the plate data
    
    Parameters
    -----------
    dir_name : str
        Path relative to location of this notebook
    
    Returns
    -------
    data : list
        The loaded data images. Each image is loaded as a numpy ndarray
    """
    
    # Sort as we would like to load the images in a sorted fashion
    files = sorted(Path('.').joinpath(dir_name).glob('*.png'))
    
    data = [cv2.imread(str(f), cv2.IMREAD_GRAYSCALE) for f in files]

    return data

plates = load_data()

The dataset is a list of 2-dimensional arrays.

In [None]:
# The auxiliary function `visualize()` displays the images given as argument.
def visualize(imgs, format=None):
    plt.figure(figsize=(20, 40))
    for i, img in enumerate(imgs):
        if img.shape[0] == 3:
            img = img.transpose(1,2,0)
        plt_idx = i+1
        plt.subplot(3, 3, plt_idx)    
        plt.imshow(img, cmap=format)
    plt.show()

visualize(plates, 'gray')

#### The borders removal (1.5 points)
Note that most of the images have a frame on all sides. This frame can appreciably worsen the quality of the channel alignment. We suggest that you find the borders on the plates using the Canny edge detector and crop the images according to these edges. An example of using the Canny detector implemented in the skimage library can be found [here](http://scikit-image.org/docs/dev/auto_examples/edges/plot_canny.html).<br>

The borders can be removed in the following way:
* Apply Canny edge detector to the image.
* Find the rows and columns of the frame pixels. 
For example, for the upper border we search for the row in the neighborhood of the upper edge of the image (e.g. within 5% of its height). For each row, count the number of edge pixels (obtained using the Canny detector) it contains. Then find the two largest values among these counts; the two corresponding rows are the edge rows. As there are two color changes in the frame (first from the light scanner background to the dark tape, and then from the tape to the image), we need the second maximum, the one farther from the image border. The row corresponding to this maximum is the crop border. To avoid picking two neighboring peaks, apply non-maximum suppression: set the rows next to the first maximum to zero, and only then search for the second maximum.
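
The peak search described above can be sketched on a toy array of edge counts. This is a minimal illustration rather than the reference solution: the helper `two_strongest_peaks`, its `suppress_radius` parameter, and the toy counts are assumptions made for the example.

```python
import numpy as np

def two_strongest_peaks(counts, suppress_radius=2):
    """Return the indices of the two strongest peaks (hypothetical helper).

    The neighborhood of the first peak is zeroed before the second
    peak is searched for, so two adjacent rows are never both picked.
    """
    counts = np.asarray(counts, dtype=float).copy()
    first = int(np.argmax(counts))
    lo = max(first - suppress_radius, 0)
    hi = min(first + suppress_radius + 1, len(counts))
    counts[lo:hi] = 0  # non-maximum suppression around the first peak
    second = int(np.argmax(counts))
    return first, second

# Toy edge counts for the top rows of a plate, counted from the image border.
# The two bumps stand for the scanner-tape and the tape-image transitions.
toy_counts = [0, 3, 40, 38, 2, 1, 55, 50, 4, 0]
first, second = two_strongest_peaks(toy_counts)

# The crop border is the peak that lies farther from the image border
crop_row = max(first, second)
print(first, second, crop_row)
```

Without the suppression step, the second-largest count (row 7) would sit right next to the first peak (row 6) and the scanner-tape transition at row 2 would be missed.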

#### Canny detector implementation (2.5 points)
You can write your own implementation of Canny edge detector to get extra points. <br>

Canny detection algorithm:
1. *Noise reduction.* To remove noise, the image is smoothed by a Gaussian blur with a kernel of size $5 \times 5$ and $\sigma = 1.4$. Since the elements of the Gaussian kernel must sum to $1$, the kernel should be normalized before the convolution. <br><br>

2. *Calculating gradients.* When the image $I$ is smoothed, the derivatives $I_x$ and $I_y$ w.r.t. $x$ and $y$ are calculated. It can be implemented by convolving $I$ with Sobel kernels $K_x$ and $K_y$, respectively: 
$$ K_x = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}, K_y = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix}. $$ 
Then, the magnitude $G$ and the slope $\theta$ of the gradient are calculated:
$$ |G| = \sqrt{I_x^2 + I_y^2}, $$
$$ \theta(x,y) = \arctan\left(\frac{I_y}{I_x}\right)$$<br><br>

3. *Non-maximum suppression.* For each pixel find two neighbors (in the positive and negative gradient directions, supposing that each neighbor occupies the angle of $\pi /4$, and $0$ is the direction straight to the right). If the magnitude of the current pixel is greater than the magnitudes of the neighbors, nothing changes, otherwise, the magnitude of the current pixel is set to zero.<br><br>

4. *Double threshold.* The gradient magnitudes are compared with two specified threshold values, the first lower than the second. Gradients smaller than the low threshold are suppressed; gradients higher than the high threshold are marked as strong, and the corresponding pixels are included in the final edge map. All remaining gradients are marked as weak, and the corresponding pixels are considered in the next step.<br><br>

5. *Edge tracking by hysteresis.* Since a weak edge pixel caused by a true edge will be connected to a strong edge pixel, a pixel $w$ with a weak gradient is marked as an edge and included in the final edge map if and only if it belongs to the same blob (connected component) as some pixel $s$ with a strong gradient. In other words, there should be a chain of neighboring weak pixels connecting $w$ and $s$ (the neighbors are the 8 pixels around the considered one). You are welcome to devise and implement an algorithm that finds all the connected components of the gradient map while considering each pixel only once. After that, you can decide which pixels will be included in the final edge map (this step should be single-pass as well).
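
The connected-component rule of step 5 can be illustrated with a short sketch. It does not implement the single-pass algorithm the task asks for; instead it swaps in `scipy.ndimage.label` to find the 8-connected components. The function name `hysteresis` and the toy masks are assumptions made for the example.

```python
import numpy as np
from scipy import ndimage

def hysteresis(strong, weak):
    """Keep weak pixels that share an 8-connected component with a strong pixel."""
    strong = np.asarray(strong, dtype=bool)
    weak = np.asarray(weak, dtype=bool)
    candidates = strong | weak
    # A 3x3 structuring element of ones makes diagonal neighbors count as connected
    labels, _ = ndimage.label(candidates, structure=np.ones((3, 3), dtype=int))
    # Labels of the components that contain at least one strong pixel
    strong_labels = np.unique(labels[strong])
    return np.isin(labels, strong_labels)

# Toy masks: one strong pixel, a chain of two weak pixels, one isolated weak pixel
strong = np.zeros((4, 6), dtype=bool)
weak = np.zeros((4, 6), dtype=bool)
strong[0, 0] = True
weak[0, 1] = True  # touches the strong pixel
weak[1, 2] = True  # touches only the weak pixel at (0, 1), diagonally
weak[3, 5] = True  # isolated weak pixel
edges = hysteresis(strong, weak)
print(edges.astype(int))
```

The weak pixel at `(1, 2)` is kept although it has no strong pixel among its own eight neighbors, because the chain through `(0, 1)` connects it to the strong pixel. The isolated weak pixel is dropped.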

In [None]:
# NOTE: This is the reference canny detection algorithm, see own implementation below

from skimage.feature import canny

def Canny_detector(img):
    """ Your implementation instead of skimage """     
    return canny(img, sigma=1.4)

canny_imgs = []
for img in plates:
    canny_img = Canny_detector(img)
    canny_imgs.append(canny_img)
    
visualize(canny_imgs, 'gray')

In [None]:
from scipy.signal import convolve2d

In [None]:
def reduce_noise(img, sigma=1.4):
    """
    Input image is smoothed by Gaussian blur with sigma and a filter size of 5x5
    
    Parameters
    ----------
    img : np.array, shape (rows, cols)
        The image to reduce the noise of
    
    Returns
    -------
    noise_reduce : np.array, shape (rows, cols)
        The image with noise reduction
    """
    # The Gaussian is centered on the middle of the mesh
    # Therefore, we can make matrices which count the distances from the center in the respective directions
    # We first make the distance arrays
    x_dist = [2, 1, 0, 1, 2]
    y_dist = [2, 1, 0, 1, 2]
    
    # And make meshgrids out of these
    x, y = np.meshgrid(x_dist, y_dist)

    gaussian_kernel = (1/(2*np.pi*sigma**2))*np.exp(-(x**2 + y**2)/(2*sigma**2))
    # Normalization of the Gaussian kernel
    norm_gaussian_kernel = gaussian_kernel/gaussian_kernel.sum()
    
    # We convolve using mode='same' such that the input dimension equals output dimension
    noise_reduced = convolve2d(img, norm_gaussian_kernel, mode='same')
    
    return noise_reduced

In [None]:
def get_gradients(img):
    """
    Returns the gradients with respect to x and y of the input image using Sobel kernels
    
    Parameters
    ----------
    img : np.array, shape (rows, cols)
        The image to take the derivative of
        
    Returns
    -------
    i_x : np.array, shape (rows, cols)
        The derivative of the image with respect to x
    i_y : np.array, shape (rows, cols)
        The derivative of the image with respect to y
    """
    
    k_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])
    k_y = -k_x.T
    
    # We convolve using mode='same' such that the input dimension equals output dimension
    i_x = convolve2d(img, k_x, mode='same')
    i_y = convolve2d(img, k_y, mode='same')
    
    return i_x, i_y

In [None]:
def non_maximum_suppression(g_abs, i_x, i_y):
    """
    Performs non-maximum suppression of the gradient magnitude
    
    That is: 
    1. Finds the closest points in the positive and negative gradient direction
    2. If the point under investigation is less in magnitude than one of its two neighbors:
       Set value of point under investigation to zero
    
    Parameters
    ----------
    g_abs : np.array, shape (rows, cols)
        The magnitude of the gradient to take the non-maximum suppression of
    i_x : np.array, shape (rows, cols)
        The derivative of the image with respect to x
    i_y : np.array, shape (rows, cols)
        The derivative of the image with respect to y
        
    Returns
    -------
    non_max_suppressed_g_abs : np.array, shape (rows, cols)
        The non-maximum suppressed image
    """
    
    non_max_suppressed_g_abs = g_abs.copy()
    
    row_inds, col_inds = g_abs.shape
    
    # NOTE: NumPy arrays are row-major, so we iterate over the rows in the outer loop
    for y_ind in range(row_inds):
        for x_ind in range(col_inds):
            # First we find the value of the point under consideration
            # NOTE: We check the input g_abs, as the non_max_suppressed_g_abs may have changed
            cur_val = g_abs[y_ind, x_ind]
            
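            # Next we find the closest points in positive and negative gradient direction
            # NOTE: convolve2d flips the kernels, so i_x = -dI/dcol and i_y = dI/drow,
            #       i.e. the gradient in (col, row) coordinates is (-i_x, i_y)
            # NOTE: The direction is quantized to the nearest of the eight neighbors
            #       (sectors of pi/4) before it is converted to unit steps
            theta = np.arctan2(i_y[y_ind, x_ind], -i_x[y_ind, x_ind])
            sector = np.round(theta/(np.pi/4))*(np.pi/4)
            x_steps = np.round(np.cos(sector)).astype(int)
            y_steps = np.round(np.sin(sector)).astype(int)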
            
            pos_x_index = (x_ind + x_steps).clip(0, col_inds-1)
            pos_y_index = (y_ind + y_steps).clip(0, row_inds-1)

            neg_x_index = (x_ind - x_steps).clip(0, col_inds-1)
            neg_y_index = (y_ind - y_steps).clip(0, row_inds-1)
            
            # Then we find the value at the positive and negative position
            # NOTE: We check the input g_abs, as the non_max_suppressed_g_abs may have changed
            pos_dir_val = g_abs[pos_y_index, pos_x_index]
            neg_dir_val = g_abs[neg_y_index, neg_x_index]
            
            if cur_val < pos_dir_val or cur_val < neg_dir_val:
                non_max_suppressed_g_abs[y_ind, x_ind] = 0

    return non_max_suppressed_g_abs

In [None]:
def get_strong_and_weak_gradients(g_abs, high=90, low=80):
    """
    Returns the strong and weak gradients of an image
    
    A strong gradient is a gradient value which is above the high threshold.
    A weak gradient is a gradient value which is between the low and high threshold.
    Gradient values below the low threshold are neither.
    
    Notes
    -----
    The high threshold percentile is set high by default in order to capture the strongest gradients
    The low threshold percentile is set close to the high in order to suppress details
    
    Parameters
    ----------
    g_abs : np.array, shape (rows, cols)
        The magnitude of the gradient
    high : float
        The high threshold of the gradient (in percentiles)
    low : float
        The low threshold of the gradient (in percentiles)
        
    Returns
    -------
    strong_gradients : np.array, shape (rows, cols)
        Strong gradients of the image
    weak_gradients : np.array, shape (rows, cols)
        Weak gradients of the image    
    """
    
    high_thresh = np.percentile(g_abs, high)
    low_thresh = np.percentile(g_abs, low)
    
    strong_gradients = g_abs >= high_thresh
    weak_gradients = np.logical_and(g_abs >= low_thresh, g_abs < high_thresh)
    
    return strong_gradients, weak_gradients

In [None]:
def edge_track_image(strong_gradients, weak_gradients):
    """
    Combines strong gradient pixels with weak edge pixels caused by true edges.
    
    A weak edge pixel is considered to be caused by a true edge if any of its eight 
    neighbors contains a strong gradient.
    
    Parameters
    ----------
    strong_gradients : np.array, shape (rows, cols)
        Strong gradients of the image
    weak_gradients : np.array, shape (rows, cols)
        Weak gradients of the image       
    
    Returns
    -------
    canny_img : np.array, shape (rows, cols)
        The image containing the canny edges
    """
    
    canny_img = strong_gradients.copy().astype(int)
    
    row_inds, col_inds = weak_gradients.shape

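    # NOTE: NumPy arrays are row-major, so we iterate over the rows in the outer loop
    for y_ind in range(row_inds):
        for x_ind in range(col_inds):
            # NOTE: A point cannot simultaneously be a strong and weak gradient
            # NOTE: The slice covers the full 3x3 neighborhood (the end index is exclusive),
            #       and the start index is clamped so that it does not wrap around at the border
            if weak_gradients[y_ind, x_ind] and\
               strong_gradients[max(y_ind-1, 0):y_ind+2, max(x_ind-1, 0):x_ind+2].any():
                canny_img[y_ind, x_ind] = 1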
                
    return canny_img

In [None]:
def Canny_detector(img):
    """
    Own implementation of the canny detection algorithm
    
    Parameters
    ----------
    img : np.array, shape (rows, cols)
        The image to find the Canny edges from
    
    Returns
    -------
    canny_img : np.array, shape (rows, cols)
        The image containing the canny edges
    """
    
    # 1. Noise reduction
    noise_reduced_img = reduce_noise(img)
    
    # 2. Calculating the gradients
    i_x, i_y = get_gradients(noise_reduced_img)
    # NOTE: One could use g_abs and theta for the non-maximum suppression,
    #       but passing i_x and i_y suffices for our purposes
    g_abs = np.sqrt(i_x**2 + i_y**2)
    
    # 3. Non-maximum suppression
    suppressed_g = non_maximum_suppression(g_abs, i_x, i_y)
    
    # 4. Double threshold
    strong_gradients, weak_gradients = get_strong_and_weak_gradients(suppressed_g)
    
    # 5. Edge tracking by hysteresis
    canny_img = edge_track_image(strong_gradients, weak_gradients)
    
    return canny_img

canny_imgs = []
for img in plates:
    canny_img = Canny_detector(img)
    canny_imgs.append(canny_img)
    
visualize(canny_imgs, 'gray')

In [None]:
# NOTE: This is the reference border removal, see own implementation below

def remove_borders(img, canny_img):
    """ Your implementation instead of the following one"""   
    dx = int(img.shape[1] * 0.05) 
    return img[dx : -dx, dx : -dx]


cropped_imgs = []
#crop borders
for i, img in enumerate(plates):
    cropped_imgs.append(remove_borders(img, canny_imgs[i]))

visualize(cropped_imgs, 'gray')

In [None]:
def plot_counts(top, bottom, left, right, fig_title):
    """
    Plot the counts of the input arrays
    
    Parameters
    ----------
    top : np.array, shape (edge_rows)
        The counts of the rows in the top of the image
    bottom : np.array, shape (edge_rows)
        The counts of the rows in the bottom of the image
    left : np.array, shape (edge_columns)
        The counts of the columns in the left of the image
    right : np.array, shape (edge_columns)
        The counts of the columns in the right of the image
    fig_title : str
        The title of the figure
    """
    
    fig, (ax1, ax2, ax3, ax4) = plt.subplots(nrows=4)
    fig.suptitle(fig_title)
    ax1.bar(np.arange(len(top)), top)
    ax1.set_xlabel('Top rows (counted from the edge)')
    ax2.bar(np.arange(len(bottom)), bottom)
    ax2.set_xlabel('Bottom rows (counted from the edge)')
    ax3.bar(np.arange(len(left)), left)
    ax3.set_xlabel('Left columns (counted from the edge)')
    ax4.bar(np.arange(len(right)), right)
    ax4.set_xlabel('Right columns (counted from the edge)')
    plt.tight_layout()

In [None]:
def non_max_suppression_1d(array_1d):
    """
    Performs non-maximum suppression on the input array.
    
    Parameters
    ----------
    array_1d : np.array, shape(array_len)
        The array to perform non-max suppression on
    
    Returns
    -------
    array_1d_suppr : np.array, shape(array_len)
        The non-max suppressed array
    """
    
    array_1d_suppr = array_1d.copy()
    
    array_len = len(array_1d)
    
    # Non-maximum suppression on the interior points
    for i in range(1, array_len-1):
        if array_1d[i-1] >= array_1d[i] or array_1d[i+1] >= array_1d[i]:
            array_1d_suppr[i] = 0
            
    # Non-maximum suppression on the edge points
    if array_1d[1] >= array_1d[0]:
        array_1d_suppr[0] = 0
    if array_1d[-2] >= array_1d[-1]:
        array_1d_suppr[-1] = 0    
    
    return array_1d_suppr

In [None]:
def remove_borders(img, canny_img, i):
    """
    Removes the borders of an image depending on the Canny edges
    
    Parameters
    ----------
    img : np.array, shape (rows, cols)
        The original image
    canny_img : np.array, shape (rows, cols)
        The canny edges of the original image
    i : int
        The plate number

    Returns
    -------
    cropped_img : np.array, shape (rows, cols)
        The cropped image
    """   
    
    # We will search within the outermost 10% of the image along each edge
    # (we see from the image above that cropping exactly 5% leaves some unwanted black lines)
    search = 0.1
    
    top_pixels = np.round(img.shape[0]*search).astype(int)
    bottom_pixels = np.round(img.shape[0]*(1-search)).astype(int)
    left_pixels = np.round(img.shape[1]*search).astype(int)
    right_pixels = np.round(img.shape[1]*(1-search)).astype(int)
    
    # Take the sum of the respective rows or columns
    # NOTE: As we count from the edge (so that we can use negative indexing when cropping), 
    #       we reverse the bottom_sum and right_sum arrays
    top_sum = canny_img[:top_pixels, :].sum(axis=1)
    bottom_sum = canny_img[bottom_pixels:, :].sum(axis=1)[::-1]
    left_sum = canny_img[:, :left_pixels].sum(axis=0)
    right_sum = canny_img[:, right_pixels:].sum(axis=0)[::-1]

    # There should be at least two maxima corresponding to the two color changes: scanner-tape, tape-image
    # Thus, we are after the maximum in the Canny edges which corresponds to the tape-image interface
    # By using non-maximum suppression we get the local maxima of the Canny image which we can use to
    # search for the interface
    # As a sanity check, we plot the counts before and after non-max suppression
    plot_counts(top_sum, 
                bottom_sum, 
                left_sum, 
                right_sum,
                f'Before non-max suppression plate {i}')

    # Non-max suppression
    top_sum_suppr = non_max_suppression_1d(top_sum)
    bottom_sum_suppr = non_max_suppression_1d(bottom_sum)
    left_sum_suppr = non_max_suppression_1d(left_sum)
    right_sum_suppr = non_max_suppression_1d(right_sum)
    
    plot_counts(top_sum_suppr,
                bottom_sum_suppr, 
                left_sum_suppr, 
                right_sum_suppr, 
                f'After non-max suppression plate {i}')
   
    # NOTE: We observe in the images and the Canny images that there are a couple of more lines than the
    #       scanner-tape and the tape-image interface, and one could argue that a different maximum could 
    #       have been used.
    #
    #       It was checked whether the following gave better cropping:
    #       1. Another local maximum
    #       2. The nth highest value of the local maxima, see
    #       https://stackoverflow.com/questions/6910641/how-do-i-get-indices-of-n-maximum-values-in-a-numpy-array/27433395
    #       3. Tweaking of the Canny lines
    #
    #       Indeed the overall cropping got better, but at the cost that the later alignment of the images
    #       got worse (as cropping just a little part of the original image gave a bad alignment)
    maxima = 2
    
    # NOTE: Where returns a tuple, hence the first zero indexing
    top = np.where(top_sum_suppr != 0)[0][maxima-1]
    bottom = np.where(bottom_sum_suppr != 0)[0][maxima-1]
    left = np.where(left_sum_suppr != 0)[0][maxima-1]
    right = np.where(right_sum_suppr != 0)[0][maxima-1]
     
    return img[top:-bottom, left:-right]

cropped_imgs = []
#crop borders
for i, img in enumerate(plates):
    cropped_imgs.append(remove_borders(img, canny_imgs[i], i))

In [None]:
visualize(cropped_imgs, 'gray')

#### Channels separation  (0.5 points)

The next step is to separate the image into three channels (B, G, R) and make one colored picture. To get channels, you can divide each plate into three equal parts.

In [None]:
def impose_components(img):
    """
    Imposes the components of a plate
    
    Parameters
    ----------
    img : np.array, shape (rows, cols)
        The image to impose the components of
    
    Returns
    -------
    rgb_img : np.array, shape (rows/3, cols, 3)
        The rgb image
    """
    
    # NOTE: We use the floor operator to avoid off-by-one errors
    row_split = np.floor(img.shape[0]/3).astype(int)
    
    # NOTE: We split red by 2*row_split:3*row_split to avoid off-by-one errors
    blue = img[0:row_split, :]
    green = img[row_split:2*row_split, :]
    red = img[2*row_split:3*row_split, :]

    rgb_img = np.stack((red, green, blue), axis=-1)
    
    return rgb_img

rgb_imgs = []
for cropped_img in cropped_imgs:
    rgb_img = impose_components(cropped_img)
    rgb_imgs.append(rgb_img)

visualize(rgb_imgs)

#### Search for the best shift for channel alignment (1 point for metrics implementation + 2 points for channel alignment)

In order to align two images, we shift one image relative to the other within some limits (e.g. from $-15$ to $15$ pixels). For each shift, we calculate a metric on the overlap of the images. Depending on the metric, the best shift is the one for which the metric attains its greatest or its smallest value. We suggest that you implement two metrics and choose the one that gives the better alignment quality:

* *Mean squared error (MSE):*<br><br>
$$ MSE(I_1, I_2) = \dfrac{1}{w * h}\sum_{x,y}(I_1(x,y)-I_2(x,y))^2, $$<br> where *w, h* are width and height of the images, respectively. To find the optimal shift you should find the minimum MSE over all the shift values.
    <br><br>
* *Normalized cross-correlation (CC):*<br><br>
    $$
    I_1 \ast I_2 = \dfrac{\sum_{x,y}I_1(x,y)I_2(x,y)}{\sum_{x,y}I_1(x,y)\sum_{x,y}I_2(x,y)}.
    $$<br>
    To find the optimal shift you should find the maximum CC over all the shift values.
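
As a rough illustration of the shift search, the sketch below evaluates the MSE only on the overlap of the two images, using slicing instead of padding. It is one possible approach under stated assumptions, not the reference solution: the helpers `overlap` and `best_shift_mse`, the `max_shift` parameter, and the random test image are hypothetical.

```python
import numpy as np

def overlap(a, b, dy, dx):
    """Return the overlapping parts of a and b when a is shifted by (dy, dx)."""
    h, w = a.shape
    a_part = a[max(-dy, 0):h - max(dy, 0), max(-dx, 0):w - max(dx, 0)]
    b_part = b[max(dy, 0):h - max(-dy, 0), max(dx, 0):w - max(-dx, 0)]
    return a_part, b_part

def best_shift_mse(a, b, max_shift=3):
    """Exhaustively search the shift of a that minimizes the MSE against b."""
    # Cast to float so that uint8 inputs do not wrap around when subtracted
    a = a.astype(float)
    b = b.astype(float)
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            a_part, b_part = overlap(a, b, dy, dx)
            score = np.mean((a_part - b_part) ** 2)
            if best is None or score < best[0]:
                best = (score, dy, dx)
    return best[1], best[2]

# Toy data: 'moved' is the reference displaced by 2 rows up and 1 column right
rng = np.random.default_rng(0)
reference = rng.random((20, 20))
moved = np.roll(reference, shift=(-2, 1), axis=(0, 1))

# Shifting 'moved' 2 rows down and 1 column left puts it back on the reference
shift = best_shift_mse(moved, reference)
print(shift)
```

Restricting the metric to the overlap keeps pixels that have no counterpart in the other image from influencing the score. For cross-correlation, the same loop applies with the maximum taken instead of the minimum.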

In [None]:
def mse(i_1, i_2):
    """
    Returns the mean square error of i_1 and i_2
    
    Parameters
    ----------
    i_1 : np.array, shape (rows, cols)
        The first matrix
    i_2 : np.array, shape (rows, cols)
        The second matrix
        
    Returns
    -------
    mse_val : float
        The mean squared error between i_1 and i_2
    """
    
    w = i_1.shape[1]
    h = i_1.shape[0]
    
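    # NOTE: We cast to float so that unsigned integer inputs do not wrap around in the subtraction
    mse_val = (1/(w*h))*((i_1.astype(float)-i_2.astype(float))**2).sum()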
    
    return mse_val

In [None]:
def cor(i_1, i_2):
    """
    Returns the cross correlation of i_1 and i_2
    
    Parameters
    ----------
    i_1 : np.array, shape (rows, cols)
        The first matrix
    i_2 : np.array, shape (rows, cols)
        The second matrix
        
    Returns
    -------
    cor_val : float
        The cross correlation between i_1 and i_2
    """
    
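    # NOTE: We cast to float so that unsigned integer inputs do not overflow in the product
    i_1 = i_1.astype(float)
    i_2 = i_2.astype(float)
    cor_val = (i_1*i_2).sum()/(i_1.sum()*i_2.sum())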
    
    return cor_val

In [None]:
def get_best_shift(ch_1, ch_2, mode, search_range=15):
    """    
    Finds the optimal shift between two channels by shifting ch_1 with respect to ch_2
    
    Parameters
    ----------
    ch_1 : np.array, shape (rows, cols)
        Channel 1 to shift against channel 2
    ch_2 : np.array, shape (rows, cols)
        Channel 2 which channel 1 is shifted against
    mode : 'mse' or 'cc'
        The optimisation mode (either mean squared error or cross correlation)
    search_range : int
        Half-width of the search window in pixels
        (shifts from -search_range to search_range-1 are tried in each direction)
        
    Returns
    -------
    best_shift_top : int
        The best shift of ch_1 to match ch_2 counted from the top of the padded channels
    best_shift_left : int
        The best shift of ch_1 to match ch_2 counted from the left of the padded channels
    """
    
    # Declare the score
    score = np.zeros((search_range*2, search_range*2))
    
    if mode == 'mse':
        get_score = mse
    elif mode == 'cc':
        get_score = cor
    else:
        raise RuntimeError('mode must be mse or cc')

    # We make background images which we will fill the channels with
    ch_1_background = np.zeros((ch_1.shape[0] + 2*search_range, ch_1.shape[1] + 2*search_range))
    ch_2_background = np.zeros((ch_2.shape[0] + 2*search_range, ch_2.shape[1] + 2*search_range))
    
    # We will keep channel 2 steady and shift channel 1
    ch_2_steady = ch_2_background.copy()
    ch_2_steady[search_range:-search_range, search_range:-search_range] = ch_2

    for i in range(search_range*2):
        for j in range(search_range*2):
            cur_ch_1 = ch_1_background.copy()
            cur_ch_1[i:-search_range*2+i, j:-search_range*2+j] = ch_1
            score[i, j] = get_score(cur_ch_1, ch_2_steady)

    if mode == 'mse':
        # NOTE: We negate the mse, so that the max will give the best result
        score = -score
    
    best_shift_top, best_shift_left = np.unravel_index(score.argmax(), score.shape)
    
    return best_shift_top, best_shift_left

In [None]:
def get_best_image(rgb_img, mode, search_range=15):
    """
    Generates rgb images based on the best shift between the channels
    
    Parameters
    ----------
    rgb_img : np.array, shape (rows, cols, 3)
        The image to optimize
    mode : 'mse' or 'cc'
        The optimisation mode (either mean squared error or cross correlation)
    search_range : int
        Half-width of the search window in pixels, see get_best_shift
        
    Returns
    -------
    optimized_img : np.array, shape (rows+2*search_range, cols+2*search_range, 3)
        The image with the optimal amount of shift
    """
    
    red = rgb_img[:,:,0]
    green = rgb_img[:,:,1]
    blue = rgb_img[:,:,2]
    
    # In order to have a reference we will shift the channels with respect to the green channel
    
    # Get the best shift of the blue channel
    blue_top_shift, blue_left_shift = get_best_shift(blue, green, mode, search_range)
    # Get the best shift of the red channel    
    red_top_shift, red_left_shift = get_best_shift(red, green, mode, search_range)

    # Create backgrounds which fits the maximum shifted images
    best_red = np.zeros((red.shape[0] + 2*search_range, red.shape[1] + 2*search_range))
    best_green = np.zeros((green.shape[0] + 2*search_range, green.shape[1] + 2*search_range))
    best_blue = np.zeros((blue.shape[0] + 2*search_range, blue.shape[1] + 2*search_range))
    
    best_red[red_top_shift:-search_range*2+red_top_shift,
             red_left_shift:-search_range*2+red_left_shift] = red
    best_green[search_range:-search_range,
               search_range:-search_range] = green
    best_blue[blue_top_shift:-search_range*2+blue_top_shift,
              blue_left_shift:-search_range*2+blue_left_shift] = blue
    
    optimized_img = np.stack([best_red, best_green, best_blue], axis=-1).astype(int)
    
    return optimized_img

In [None]:
final_imgs_mse = []
for img in rgb_imgs:
    final_img = get_best_image(img, 'mse')
    final_imgs_mse.append(final_img)

visualize(final_imgs_mse)

In [None]:
final_imgs_cc = []
for img in rgb_imgs:
    final_img = get_best_image(img, 'cc')
    final_imgs_cc.append(final_img)

visualize(final_imgs_cc)

Because the two metrics are on different scales, their numerical values cannot be compared directly, so we compare the results visually instead.
By visual inspection, both the mean squared error and the cross-correlation give good results.

# Face Alignment (2.5 points)

In this task, you have to implement face normalization and alignment. Most face images deceptively seem to be aligned, but since many face recognition algorithms are very sensitive to shifts and rotations, we need not only to find the face in the image but also to normalize it. Besides, the neural networks usually used for recognition have a fixed input size, so the normalized face images should be resized as well.

There are six face images you have to normalize. In addition, you have the coordinates of the eyes in each picture. You have to rotate the image so that the eyes are at the same height, crop a square box containing the face, and resize it to $224\times 224.$ The eyes should be located symmetrically and at mid-height of the image.

Here is an example of what the transformation should look like.

<img src = "https://cdn1.savepice.ru/uploads/2017/12/13/286e475ef7a4f4e59005bcf7de78742f-full.jpg">

#### Get data
You get the images and the corresponding eye coordinates for each person. You should implement the function $\tt{load}$\_$\tt{faces}$\_$\tt{and}$\_$\tt{eyes}$ that reads the data and returns two dictionaries: a dictionary of images and a dictionary of eye coordinates. The eye coordinates are a list of two tuples $[(x_1,y_1),(x_2,y_2)]$.
Both dictionaries should have filenames as the keys.

$\tt{dir}$\_$\tt{name}$ is the path to the directory with face images, $\tt{eye}$\_$\tt{path}$ is the path to the .pickle file with the eye coordinates. If the directory and the file are located in the same directory as this notebook, then the default arguments can be used.

In [None]:
import pickle

def load_faces_and_eyes(dir_name = 'faces_imgs', eye_path = 'eyes.pickle'):
    """
    Loads the faces images and the eyes data
    
    Parameters
    ----------
    dir_name : str
        Path relative to location of this notebook
    eye_path : str
        File name relative to location of this notebook
    
    Returns
    -------
    faces : dict
        Dictionary of the faces of the form
        >>> {filename: np.array}
    eyes : dict
        Dictionary of the eyes of the form
        >>> {filename: list}
    """     
    files = sorted(Path('.').joinpath(dir_name).glob('*.jpg'))
    # NOTE: cv loads BGR, and not RGB, hence the reverse ordering
    faces = {f.name: cv2.imread(str(f), cv2.IMREAD_COLOR)[:,:,::-1] for f in files}
    
    pickle_path = Path('.').joinpath(eye_path)
    with pickle_path.open('rb') as f:
        # The protocol version used is detected automatically, so we do not
        # have to specify it.
        eyes = pickle.load(f)
    
    return faces, eyes
    
faces, eyes = load_faces_and_eyes()

Here is what the input images look like:

In [None]:
visualize(faces.values())

You may make the transformation using your own algorithm or by the following steps:
1. Find the angle between the segment connecting two eyes and horizontal line;
2. Rotate the image;
3. Find the coordinates of the eyes on the rotated image
4. Find the width and height of the box containing the face depending on the eyes coordinates
5. Crop the box and resize it to $224\times224$
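
Steps 1 to 3 can be checked on a pair of toy eye coordinates with NumPy alone. The sketch below is illustrative only: the helpers `eye_angle_deg` and `rotate_points`, the eye positions, and the rotation center are assumptions. The rotation uses the same sign convention as `cv2.getRotationMatrix2D` (a positive angle is a counter-clockwise rotation, with the origin in the top-left corner).

```python
import numpy as np

def eye_angle_deg(eye_a, eye_b):
    """Signed angle (degrees) between the eye segment and the horizontal line.

    The eyes are (x, y) points with y pointing down, as in image coordinates.
    """
    left, right = sorted([eye_a, eye_b], key=lambda p: p[0])
    return np.degrees(np.arctan2(right[1] - left[1], right[0] - left[0]))

def rotate_points(points, angle_deg, center):
    """Rotate (x, y) points about center, positive angle = counter-clockwise on screen."""
    a = np.radians(angle_deg)
    rot = np.array([[np.cos(a), np.sin(a)],
                    [-np.sin(a), np.cos(a)]])
    center = np.asarray(center, dtype=float)
    return (np.asarray(points, dtype=float) - center) @ rot.T + center

# Toy eyes: the right eye is 20 pixels lower than the left eye
toy_eyes = [(100, 120), (160, 140)]
angle = eye_angle_deg(*toy_eyes)

# After rotating by that angle about the image center, both eyes share one height
leveled = rotate_points(toy_eyes, angle, center=(128, 128))
print(angle, leveled)
```

Using `arctan2` on the difference of the two eye positions gives a signed angle directly, so no separate case distinction for the tilt direction is needed. The rotation preserves the distance between the eyes, which can then be used to scale the crop box in step 4.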

In [None]:
from skimage.transform import resize

def transform_face(image, eyes):
    """
    Rotates and crops so that eyes are horizontally aligned and in the middle of the image
    
    Parameters
    ----------
    image : np.array, shape (rows, cols, 3)
        The image to transform
    eyes : list
        List of the two eye coordinates, each given as (x, y)
    
    Returns
    -------
    transformed_image : np.array, shape (224, 224, 3)
        The transformed image
    """ 
    
    # 1. Find the angle between the segment connecting two eyes and horizontal line
    # NOTE: The eye coordinates are given as (x, y), with y increasing downwards
    eye_1, eye_2 = eyes
    if eye_1[0] < eye_2[0]:
        left_eye = np.array(eye_1)
        right_eye = np.array(eye_2)
    else:
        left_eye = np.array(eye_2)
        right_eye = np.array(eye_1)
    
    # The signed angle of the eye segment with respect to the horizontal line
    # NOTE: The angle must be measured on the difference between the eye positions,
    #       not between the position vectors themselves
    d_x, d_y = right_eye - left_eye
    eyes_angle_rad = np.arctan2(d_y, d_x)
    
    # Convert to degrees
    eyes_angle_deg = eyes_angle_rad * 180/np.pi
    
    # 2. Rotate the image
    image_center = tuple(np.array(image.shape[1::-1]) / 2)
    rot_mat = cv2.getRotationMatrix2D(image_center, eyes_angle_deg, 1.0)
    rot_image = cv2.warpAffine(image, rot_mat, image.shape[1::-1], flags=cv2.INTER_LINEAR)
    
    # 3. Find the coordinates of the eyes on the rotated image
    # We use the rotation matrix to rotate the vectors
    # NOTE: We append ones (homogeneous coordinates) as the affine matrix from cv2 has shape (2, 3)
    stacked_eyes = np.vstack((left_eye, right_eye))
    eyes_w_ones = np.hstack((stacked_eyes, np.ones((stacked_eyes.shape[0], 1))))
    rot_eyes = np.round(rot_mat.dot(eyes_w_ones.T)).T.astype(int)

    # NOTE: rot_eyes are given as (x, y), so we reverse them to (row, col) for array indexing
    rot_left_eye = rot_eyes[0, :][::-1]
    rot_right_eye = rot_eyes[1, :][::-1]
    
    # 4. Find the width and height of the box containing the face depending on the eyes coordinates
    # Based on the findings from
    # https://upload.wikimedia.org/wikipedia/commons/0/06/AvgHeadSizes.png
    # the head to chin distance at the 99th percentile for men is 10 cm,
    # and the center of eye distance at the 99th percentile for men is 7.4 cm
    # This means that the box should include most heads if we multiply the
    # center of eye distance in pixels with the head_eye_ratio
    # However, we add some pad as people are usually not at the 99th percentile
    pad = 3
    head_eye_ratio = pad+10/7.4
    eye_distance = np.linalg.norm(rot_right_eye - rot_left_eye)
    box_size = np.floor(eye_distance*head_eye_ratio).astype(int)
    half_size = (box_size/2).astype(int)
    center_eyes = rot_left_eye + ((rot_right_eye - rot_left_eye)/2).astype(int)
    
    # 5. Crop the box and resize it to 224 x 224
    aligned_image = rot_image[center_eyes[0]-half_size:center_eyes[0]+half_size,
                              center_eyes[1]-half_size:center_eyes[1]+half_size,
                              :]
    
    transformed_image = resize(aligned_image, (224, 224), mode='reflect', anti_aliasing=True)
    
    return transformed_image

In [None]:
transformed_imgs = []
for i in faces:
    img = faces[i]
    eye = eyes[i]
    transformed = transform_face(img, eye)
    transformed_imgs.append(transformed)
    
visualize(transformed_imgs)