# Computer Vision
## Exercise Sheet 5: Correlation-based Stereo Vision
### Erhardt Barth / Philipp Gruening / Christoph Linse / Manuel Laufer
Universität zu Lübeck, Institut für Neuro- und Bioinformatik

In case of questions, contact us via email: *{barth, gruening, linse, laufer} @inb.uni-luebeck.de*

## Note: Please insert the names of all participating students:

1. 
2. 
3. 
4. 
5. 


In [None]:
import sys, os
if 'google.colab' in sys.modules:
  if os.getcwd() == '/content':
    !git clone 'https://github.com/inb-luebeck/cs4250.git'
    os.chdir('cs4250')

In [None]:
import cv2
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

#### Cross-correlation and Autocorrelation
*Cross-correlation* is a standard way to estimate the degree of similarity (correlation) between two signals. For two discrete series $x$ and $y$ it is defined as 
$$\rho(dt)=\frac{\sum_{i}{[(x_i-\mu_{x})(y_{i+dt}-\mu_{y})]}}{\sqrt{\sum_{i}{(x_i-\mu_{x})^2}}\sqrt{\sum_{i}{(y_{i+dt}-\mu_{y})^2}}}$$ 
where $\rho$ denotes the *correlation coefficient*, $dt$ is the time shift, and $\mu_{x}$ and $\mu_{y}$ are the means of the two signals $x$ and $y$. The denominator normalizes the correlation coefficient such that $\rho \in [-1,1]$: the bounds $\pm 1$ indicate maximum (positive or negative) correlation, and $0$ means no correlation at all. The sums are only evaluated for indices where both $x_i$ and $y_{i+dt}$ exist.
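
As a minimal sketch of the definition above, the following evaluates $\rho$ for a single shift on a toy pair of signals. The function name `corr_at_shift` is hypothetical, and two choices are assumptions rather than part of the exercise: the means are taken over the overlapping samples only (which keeps $\rho$ within $[-1,1]$), and an empty or constant overlap returns `0.0`.

```python
import numpy as np

def corr_at_shift(x, y, dt):
    """Correlation coefficient between x_i and y_{i+dt} on the valid overlap."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # indices i for which both x_i and y_{i+dt} exist
    i_start = max(0, -dt)
    i_stop = min(len(x), len(y) - dt)
    if i_stop - i_start < 2:
        return 0.0  # assumption: too little overlap counts as "no correlation"
    xs = x[i_start:i_stop]
    ys = y[i_start + dt:i_stop + dt]
    # assumption: means are computed on the overlapping samples
    xc = xs - xs.mean()
    yc = ys - ys.mean()
    denom = np.sqrt((xc ** 2).sum()) * np.sqrt((yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else 0.0

x = np.array([1., 2., 3., 4., 5.])
y = np.array([0., 1., 2., 3., 4., 5.])  # x delayed by one sample
rho = corr_at_shift(x, y, 1)            # close to 1 for this toy pair
```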

When the correlation of a signal is computed against a temporally shifted version of itself, we call it *autocorrelation*, and define it as 
$$\rho(dt)=\frac{\sum_{i}{[(x_i-\mu_{x})(x_{i+dt}-\mu_{x})]}}{\sum_{i}{(x_i-\mu_{x})^2}}.$$ 

Cross-correlation can be used to determine the delay between two signals. In order to do this, we shift the second signal across a range of time shifts $[-dt, dt]$ and cross-correlate it with the first signal. The point of maximum correlation corresponds to the signal delay. 
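
The delay search described above can be sketched on synthetic data as follows. This is an illustration under assumptions, not the reference solution: `estimate_delay` is a hypothetical name, the correlation on each overlap is computed with `np.corrcoef` (which uses the means of the overlapping samples), and the test signal is white noise with a known delay of 7 samples.

```python
import numpy as np

def estimate_delay(x, y, max_shift):
    """Return (best_shift, rhos) for integer shifts in [-max_shift, max_shift]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    shifts = np.arange(-max_shift, max_shift + 1)
    rhos = np.empty(len(shifts))
    for k, dt in enumerate(shifts):
        # overlap of x_i and y_{i+dt}
        lo = max(0, -dt)
        hi = min(len(x), len(y) - dt)
        rhos[k] = np.corrcoef(x[lo:hi], y[lo + dt:hi + dt])[0, 1]
    return int(shifts[np.argmax(rhos)]), rhos

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
true_delay = 7
# y is x delayed by 7 samples, i.e. y_{i+7} = x_i
y = np.concatenate([np.zeros(true_delay), x])[:500]

best_shift, rhos = estimate_delay(x, y, max_shift=20)
```

For this synthetic pair the maximum of `rhos` sits at the shift equal to `true_delay`; for real signals the peak is usually lower and broader.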

Write a Python function to calculate the cross-correlation between two signals for the range of delays $[-dt,dt]$.

The `ultrasound.npy` data file from the archive contains two ultrasound signals. Plot the two signals in one plot. Using the above algorithm, cross-correlate them to find the signal delay (using a maximum time shift of 100). Plot the values of the correlation coefficient for the given delay range.

In [None]:
def cross_corr_seq(X, Y, dt):
    # returns the cross-correlation sequence over the integer
    # time shifts -dt, ..., dt-1 (np.arange excludes the upper bound)

    rho_seq=[]
    for t in np.arange(-dt, dt):
        rho = cross_corr(X, Y, t)
        rho_seq.append(rho)

    return np.array(rho_seq)

def cross_corr(X, Y, dt):
    # TODO: compute the correlation coeff. for time shift dt
    rho = -2.

    return rho

In [None]:
# the file stores a pickled dict holding the signals under the keys 'X' and 'Y'
data = np.load('data/exercise_5/ultrasound.npy', allow_pickle=True).tolist()
X = data['X'][0]
Y = data['Y'][0]

_, ax = plt.subplots(1)
plt.plot(X, '-b')
plt.plot(Y, '-r')
ax.set_title('The two ultrasound signals')

# TODO: compute cross correlation
dt=42
rho_seq=cross_corr_seq(X, Y, dt)
max_idx = np.argmax(rho_seq)
max_rho = rho_seq.max()

# TODO: compute lag value
lag = 42 
print('The signal lag is {}.'.format(lag))

_, ax = plt.subplots(1)
ax.plot(np.arange(-dt,dt),rho_seq)
ax.set_title(
    'cross-correlation sequence over the lag range [{},{}]'.format(
    -dt, dt)
)

_, ax = plt.subplots(1)
plt.plot(X, '-b')
plt.plot(Y[lag:], '-r')
ax.set_title('Two signals with adjusted lag')
print('Signal difference after lag-adjustment:')
print((X[:-lag]-Y[lag:]).sum())


#### Correlation-based stereo algorithms

In this exercise, we will deal with the first problem of stereo vision: the *correspondence problem*. For each image point in the left image, we want to find the corresponding point in the right image which is the projection of the same 3D-point.

**Keep in mind**: x denotes the horizontal axis of an image, in a numpy matrix this is the second axis (column x). Accordingly, y is the vertical axis in an image, which is the first axis in a numpy matrix (row y).
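
A tiny illustration of this axis convention (the array and the coordinates are made up for the example):

```python
import numpy as np

img = np.zeros((4, 6))       # 4 rows (vertical, y) and 6 columns (horizontal, x)
height, width = img.shape    # shape is (rows, columns) = (height, width)

x, y = 5, 2                  # pixel at column 5, row 2
img[y, x] = 1.0              # row index first, then column index
```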

Our basic assumption is that corresponding image regions are similar, i.e. correlated. For each image pixel in the left image we are searching for its best match in the right image (or vice versa). Matching only single pixels results in too many false positives, so we choose a neighborhood window around the pixel and correlate it with all candidate blocks in the right image to find its best match (*block matching*). We assume rectified images, i.e. the epipolar lines are aligned, so we only need to search along the horizontal direction.


Possible similarity measures for block matching:

* Sum of Squared Differences (SSD): $$D(x,y,dx,dy)=\sum_{(i,j)\in W_{x,y}}{[I_l(i,j)-I_r(i-dx, j-dy)]^2}$$
* Normalized Cross-correlation (NCC): $$D(x,y,dx,dy)=\frac{\sum_{(i,j) \in W_{x,y}}{I_l(i,j) I_r(i-dx, j-dy)} } {\sqrt{\sum_{(i,j)\in W_{x,y}}{I_{l}^{2} (i,j)} \sum_{(i,j)\in W_{x,y}}{I_{r}^{2}{(i-dx, j-dy)} } } }$$

where $W_{x,y}$ is the square window of a given size centered on pixel $(x,y)$, $I_l$ and $I_r$ are the left and right intensity images, and $(dx, dy)$ are the horizontal and vertical *disparities* (shift amounts). Note that $dy$ is zero because the images are rectified and we only search for horizontal shifts.

The goal is to find for each pixel $(x,y)$ the disparity $(dx,0)$ that either minimizes the error (sum of squared differences) or maximizes the similarity (cross-correlation). 
In order to do this, we need to search over a range of disparities up to an allowed maximum disparity. 
The output is the so-called *disparity map*: a map whose pixel values encode the relative depth of points within a scene (a larger disparity means the point is closer to the cameras).
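
To make the search concrete, here is a deliberately naive SSD block-matching sketch on a small synthetic pair. It is an illustration under assumptions, not the expected solution: `ssd_disparity_naive` is a hypothetical name, `win_size` is assumed odd, disparities are assumed non-negative (the match for left column `x` is searched at right column `x - d`), and border pixels without a full window are simply left at 0.

```python
import numpy as np

def ssd_disparity_naive(left, right, win_size, max_disp):
    """Left-referenced disparity map via exhaustive SSD search (slow loops)."""
    height, width = left.shape
    r = win_size // 2
    disp = np.zeros((height, width), dtype=int)
    for y in range(r, height - r):
        for x in range(r, width - r):
            patch_l = left[y - r:y + r + 1, x - r:x + r + 1]
            best_cost, best_d = np.inf, 0
            for d in range(max_disp + 1):
                if x - d - r < 0:
                    break  # candidate window would leave the right image
                patch_r = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                cost = ((patch_l - patch_r) ** 2).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

# synthetic pair: the left image is the right image shifted by 3 columns,
# so left[y, x] == right[y, x - 3] away from the wrap-around columns
rng = np.random.default_rng(1)
right = rng.random((12, 30))
left = np.roll(right, 3, axis=1)

disp = ssd_disparity_naive(left, right, win_size=5, max_disp=6)
```

Away from the image borders, `disp` equals the constant shift of 3 used to build the pair. On real images the triple loop becomes very slow, which motivates vectorized variants.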

Implement functions `stereo_corr_...(left, right, win_size, max_disp)` that return the disparity map `disp_map` for the stereo image pair `left` and `right`, given a correlation window size `win_size` and an upper limit on the allowed disparity range `max_disp`. Implement both the SSD-based and the NCC-based block matching and match from left to right (i.e., for each window in the left image, search in the right image, so that the disparity map is with respect to the left image). 

Note: When coded as nested `for` loops in Python, this can be very slow. Be creative about how you code this. Using **convolution** (e.g. `cv2.blur`) is one possibility.
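
One way to read this hint: for a fixed disparity `d`, the SSD of every window is a box-filtered image of the squared pixel differences, so only the loop over disparities remains. The sketch below uses a NumPy integral image as the box filter to stay dependency-free; `cv2.blur` computes the corresponding window mean and could take its place. The names `box_sum` and `ssd_disparity_box`, the zero padding at the borders, the odd `win_size`, and the non-negative disparity range are all assumptions of this example.

```python
import numpy as np

def box_sum(img, win_size):
    """Sum over a win_size x win_size window around each pixel (zero padding)."""
    r = win_size // 2
    k = 2 * r + 1
    padded = np.pad(img, r, mode='constant')
    # integral image with a leading row and column of zeros
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]

def ssd_disparity_box(left, right, win_size, max_disp):
    """Left-referenced disparity map; one box filter per candidate disparity."""
    height, width = left.shape
    costs = np.full((max_disp + 1, height, width), np.inf)
    for d in range(max_disp + 1):
        # compare left column x with right column x - d for all x >= d
        diff2 = (left[:, d:] - right[:, :width - d]) ** 2
        costs[d, :, d:] = box_sum(diff2, win_size)
    return costs.argmin(axis=0)

# synthetic pair with a constant shift of 3 columns
rng = np.random.default_rng(1)
right = rng.random((12, 30))
left = np.roll(right, 3, axis=1)

disp = ssd_disparity_box(left, right, win_size=5, max_disp=6)
```

Columns `x < d` have no valid partner in the right image for disparity `d`; their cost stays at infinity so they are never selected.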

Use the stereo image pair `left.jpg` and `right.jpg` in the archive to test your algorithm. You may assume that the images are rectified. Visualize the resulting disparity map with the `plt.imshow` command.

Experiment with the following and explain the effects:

* try out different window sizes (e.g. `win_size` 3, 5, 9, 11),
* try out different values of maximum disparity (e.g. 10, 16),
* compare the results obtained with SSD and NCC.

**Task**: Find an example where NCC clearly outperforms SSD.
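
As a possible starting point (a patch-level sketch only, not a full answer), it may help to see how the two measures react when one patch is a brightness-scaled copy of the other. The helper names `ssd` and `ncc`, the random patches, and the gain of 1.5 are assumptions made for illustration.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    return float(((a - b) ** 2).sum())

def ncc(a, b):
    """Normalized cross-correlation as defined above (no mean subtraction)."""
    return float((a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum()))

rng = np.random.default_rng(2)
patch = rng.random((5, 5)) + 0.1
brighter = 1.5 * patch             # same structure, multiplicative gain
other = rng.random((5, 5)) + 0.1   # unrelated patch

ncc_same = ncc(patch, brighter)    # the gain cancels in the normalization
ncc_other = ncc(patch, other)
ssd_same = ssd(patch, brighter)    # non-zero although the structure is identical
```

The gain cancels in the NCC quotient, so the scaled copy still scores (numerically) 1, whereas its SSD is no longer 0 and can lose against unrelated candidates.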

In [None]:
def stereo_corr_ssd(left, right, win_size, max_disp, use_convolution=False):
    """Computes the disparity map (from left to right) for a stereo image pair subject 
    to a maximum allowed disparity using Sum of Squared Differences as 
    similarity measure.
    It assumes rectified images (search only in the horizontal direction).

    In: 
       left, right: the left and right images in the stereo pair
       win_size: correlation window size
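       max_disp: upper bound on the allowed disparity
       use_convolution: if True, take the convolution-based code path
           (e.g. box filtering) instead of explicit loops over pixels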
    Out:
       disparity_map: disparity map of the same size as the input
    """

    # compare the full shape tuples (also catches a differing number of dimensions)
    not_same_size = left.shape != right.shape
    if not_same_size:
        raise ValueError('The images should have the same size.')
    
    height, width = left.shape[0], left.shape[1]

    # TODO: compute squared diff-based disparity map
    
    if use_convolution:
        pass
    else:
        pass


    for d in range(max_disp):
        if use_convolution:
            pass
        else:
            pass
    
    
    disparity_map = 42*np.ones((height, width))
    return disparity_map              
        

In [None]:
def stereo_corr_NCC(left, right, win_size, max_disp):
    """Computes the disparity map (from left to right) for a stereo image pair subject 
    to a maximum allowed disparity using Normalized Cross-Correlation as 
    similarity measure.
    It assumes rectified images (search only in the horizontal direction).

    In: 
       left, right: the left and right images in the stereo pair
       win_size: correlation window size
       max_disp: upper bound on the allowed disparity
    Out:
       disparity_map: disparity map of the same size as the input
    """
    
    # compare the full shape tuples (also catches a differing number of dimensions)
    not_same_size = left.shape != right.shape
    if not_same_size:
        raise ValueError('The images should have the same size.')
    
    height, width = left.shape[0], left.shape[1]
    
    # TODO: compute correlation-based disparity map
    
    disparity_map = 13*np.ones((height, width))
    return disparity_map            
        

In [None]:
# load images
# cv2.imread returns channels in BGR order, so convert with COLOR_BGR2GRAY
left = cv2.cvtColor(cv2.imread('data/exercise_5/left.jpg'), cv2.COLOR_BGR2GRAY).astype('float32')/255.
right = cv2.cvtColor(cv2.imread('data/exercise_5/right.jpg'), cv2.COLOR_BGR2GRAY).astype('float32')/255.

# TODO: define parameters
win_sizes_ = [3]
max_disps_ = [5]

for win_size in win_sizes_:
    for max_disp in max_disps_:
        disparity_map_ssd = stereo_corr_ssd(left, right, win_size, max_disp)
        disparity_map_NCC = stereo_corr_NCC(left, right, win_size, max_disp)
        
        _, ax = plt.subplots(1)
        ax.imshow(disparity_map_ssd)
        ax.set_title('SSD: win_size: {}, max_disp: {}'.format(win_size, max_disp))
        
        _, ax = plt.subplots(1)
        ax.imshow(disparity_map_NCC)
        ax.set_title('NCC: win_size: {}, max_disp: {}'.format(win_size, max_disp))
        plt.show()