# AR PROJECT 


### Michael Albarello and Matteo Nestola

## Introduction

Augmented Reality (AR) is an immersive technology that enhances the users' sensory experience through technological tools. In other words, it merges real and virtual elements to intensify the user's perception of reality.

In our specific case, we were asked to superimpose a layer containing the course’s logo and an author’s name onto the cover of a well-known Computer Vision book, so that it looks real to the observer. In particular, the professor provided us with the following files:

- Input video sequence to be augmented;
- First frame of the input sequence;
- Binary mask that identifies the pixels belonging to the book in the reference frame;
- Image containing the augmented reality layer;
- Binary mask of the augmented reality layer.

In the following sections we explain, step by step, how we obtained the final result from these files.

## Libraries 
First of all, we imported the libraries we need: NumPy for array manipulation and OpenCV (`cv2`) for the computer vision tools.

```python
import numpy as np
import cv2
```

## Input data 
All the media files listed in the introduction have been loaded into the following Python variables:

* `ref_frame`, the first frame of the video, used as the anchor to find correspondences with all the other frames;

* `video`, the video to be augmented: each of its frames is extracted and the transformed augmented layer is added on top of it;

* `aug_layer`, the layer that will be added onto the original video;

* `aug_layer_mask`, the binary mask of the layer, where white marks the area of interest and black the background, which will be made transparent;

* `ref_mask`, the binary mask identifying the book in the reference frame.


```python
# video to augment
video = cv2.VideoCapture('./Data/Multiple_View.avi')

if not video.isOpened():
    print("Error opening video stream or file")

# reference frame and associated mask (cv2.imread returns None if a file cannot be read)
ref_frame = cv2.imread('./Data/ReferenceFrame.png')
ref_mask = cv2.imread('./Data/ObjectMask.PNG')

# augmented layer and associated mask
aug_layer = cv2.imread('./Data/AugmentedLayer.PNG')
aug_layer_mask = cv2.imread('./Data/AugmentedLayerMask.PNG')
```


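## Cropping the augmented layer and building its mask
First, we cropped `aug_layer` and its mask to the size of `ref_frame` (by slicing, so the content is cut rather than rescaled), so that the layer and the reference frame can be overlapped pixel by pixel. Then we built a new mask from the augmented layer itself, in order to select only the words and the logo and not their background: the layer is converted to grayscale, a morphological closing fills small gaps, and Otsu's method binarizes the result.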

```python
# cropping the augmented layer and its mask to the size of the reference frame
h_r, w_r, channels = ref_frame.shape

aug_layer_resized = aug_layer[:h_r, :w_r]
aug_layer_mask_resized = aug_layer_mask[:h_r, :w_r]

cv2.imshow('aug_layer_resized', aug_layer_resized)

# building the mask of the augmented layer from its grayscale version
layer_gray = cv2.cvtColor(aug_layer_resized, cv2.COLOR_BGR2GRAY)

# 5x5 structuring element
kernel = np.ones((5, 5), np.uint8)

# morphological closing (dilation --> erosion) to fill small gaps in the words and logo
layer_gray = cv2.morphologyEx(layer_gray, cv2.MORPH_CLOSE, kernel)

# Otsu's method chooses the threshold automatically (the value 0 passed here is ignored)
_, im_gray_th_otsu = cv2.threshold(layer_gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
aug_layer_mask = cv2.cvtColor(im_gray_th_otsu, cv2.COLOR_GRAY2BGR)
```
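
To make the two OpenCV calls above less of a black box, here is a minimal NumPy sketch of what they compute: a grayscale closing with a square kernel (a dilation, i.e. a maximum over each window, followed by an erosion, i.e. a minimum), and Otsu's method, which picks the threshold that maximizes the between-class variance of the histogram. The function names `dilate`, `erode`, `closing` and `otsu_threshold` are ours, and this is an illustration only; OpenCV's implementations may differ in details such as border handling.

```python
import numpy as np


def dilate(img, k=5):
    """Grayscale dilation with a k x k square of ones: maximum over each window."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    # every shifted view of the image, one per kernel position
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return np.max(windows, axis=0)


def erode(img, k=5):
    """Grayscale erosion with a k x k square of ones: minimum over each window."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return np.min(windows, axis=0)


def closing(img, k=5):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(img, k), k)


def otsu_threshold(gray):
    """Return the threshold t (pixels > t are foreground) maximising the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 = prob[:t + 1].sum()          # weight of the class "<= t"
        w1 = 1.0 - w0                    # weight of the class "> t"
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:t + 1] * prob[:t + 1]).sum() / w0
        mu1 = (levels[t + 1:] * prob[t + 1:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t
```

For example, a white square with a one-pixel black hole comes out of `closing()` with the hole filled, while the area around the square stays black.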


## Defining SIFT object
At this point, we created a SIFT object to detect the salient points and compute their descriptors, first for the reference frame and then for every other frame of the video:

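```python
# SIFT is in the main OpenCV module since version 4.4;
# older builds with opencv-contrib expose it as cv2.xfeatures2d.SIFT_create()
sift = cv2.SIFT_create()

kp_reference = sift.detect(ref_frame)                                  # detects salient points

kp_reference, des_reference = sift.compute(ref_frame, kp_reference)    # computes a descriptor for each salient point
```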
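
Note that `ref_mask` is loaded but not used above, so keypoints on the background of the reference frame can also be matched. One possible refinement, which is not part of our pipeline, is to keep only the keypoints lying on the book: OpenCV's `detect()` accepts an optional 8-bit single-channel mask as second argument, e.g. `sift.detect(ref_frame, cv2.cvtColor(ref_mask, cv2.COLOR_BGR2GRAY))`. The NumPy sketch below shows a similar filtering applied afterwards to the keypoint coordinates; the helper `keep_inside_mask` and the use of plain `(x, y)` tuples in place of `KeyPoint.pt` are assumptions for illustration.

```python
import numpy as np


def keep_inside_mask(points, descriptors, mask):
    """Hypothetical helper: keep only keypoints whose location is on a non-zero mask pixel.

    points: list of (x, y) float tuples, like KeyPoint.pt in OpenCV
    descriptors: array with one row per keypoint
    mask: 2D array, non-zero where the object (the book) is
    """
    keep = []
    h, w = mask.shape
    for i, (x, y) in enumerate(points):
        # keypoint coordinates are sub-pixel: round them to the nearest pixel
        col, row = int(round(x)), int(round(y))
        if 0 <= row < h and 0 <= col < w and mask[row, col] > 0:
            keep.append(i)
    return [points[i] for i in keep], descriptors[keep]
```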

# Functions to iterate
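From now on, the same procedure is repeated for every frame of the video. To make the code clearer, we split it into blocks, each wrapped in its own function. Note that these functions take no arguments: they read the global variables (such as `current_frame`, `matches` and `good`) that are set in the main loop at each iteration.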

## Detecting and computing the current frame's salient points 
The `computeFrame()` function detects the salient points of the current frame of the video and computes their descriptors.

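```python
def computeFrame():
    # detects the salient points of the current frame (a global set in the main loop)
    kp_currentFrame = sift.detect(current_frame)
    # computes a descriptor for each salient point
    kp_currentFrame, des_currentFrame = sift.compute(current_frame, kp_currentFrame)
    return kp_currentFrame, des_currentFrame
```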

## Finding correspondences with FLANN's matching algorithm
Once the salient points and their descriptors have been found, we need to match the descriptors of the reference frame with those of the current frame, which show the same scene. A common way to do this is FLANN (Fast Library for Approximate Nearest Neighbors), a library that collects algorithms optimized for fast nearest-neighbour search in large datasets.

```python
def flann_algorithm():

    # creating the FLANN matcher: a KD-tree index suits float descriptors such as SIFT's
    FLANN_INDEX_KDTREE = 1
    index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
    search_params = dict(checks=50)   # higher values give more accurate matches but take more time
    flann = cv2.FlannBasedMatcher(index_params, search_params)

    # for each descriptor of the reference frame, finding its k nearest descriptors in the current frame
    matches = flann.knnMatch(des_reference, des_currentFrame, k=2)  # k=2 is needed by the ratio test

    return matches
```
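
FLANN returns *approximate* nearest neighbours in order to be fast. For reference, the exact computation that `knnMatch(..., k=2)` approximates can be sketched in NumPy as a brute-force search: for each query descriptor, the indices and Euclidean (L2) distances of its two closest train descriptors. The function name `knn2` is ours, and the sketch is only practical for small descriptor sets.

```python
import numpy as np


def knn2(query, train):
    """Exact 2-nearest-neighbour search with Euclidean (L2) distance.

    Returns (idx, dist), both of shape (len(query), 2), sorted by increasing distance.
    """
    # pairwise squared distances via ||q||^2 - 2 q.t + ||t||^2
    d2 = (np.sum(query ** 2, axis=1)[:, None]
          - 2.0 * query @ train.T
          + np.sum(train ** 2, axis=1)[None, :])
    d2 = np.maximum(d2, 0.0)           # clip tiny negative values caused by rounding
    idx = np.argsort(d2, axis=1)[:, :2]
    dist = np.sqrt(np.take_along_axis(d2, idx, axis=1))
    return idx, dist
```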

## Validating matches
For each descriptor of the reference frame, `knnMatch()` returns its k nearest descriptors in the current frame, sorted by increasing distance. Since a descriptor often has several similar candidates, the nearest one alone is not reliable. To filter out false matches, Lowe (in the SIFT paper) proposed a distance ratio test: the distance to the nearest neighbour is divided by the distance to the second nearest, and the match is kept only if this ratio is below a threshold. Lowe's paper used 0.8; we use the stricter value 0.7.
The following function discards every match that fails the test. Once the ambiguous matches have been filtered out, it draws the remaining ones, so that we can see how they change throughout the frames.

```python
def filter_and_draw_kp():

    # filtering the matches: keep only those that pass Lowe's ratio test
    good = []
    matchesMask = [[0, 0] for _ in range(len(matches))]

    for i, pair in enumerate(matches):
        # knnMatch can return fewer than 2 neighbours for a descriptor
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < 0.7 * n.distance:
            matchesMask[i] = [1, 0]
            good.append(m)

    # drawing the results for the current frame
    draw_params = dict(matchColor=(255, 0, 255),
                       singlePointColor=(0, 255, 0),
                       matchesMask=matchesMask,
                       flags=cv2.DrawMatchesFlags_DEFAULT)

    # the matches were computed with the reference frame as query, so it must be the first image
    kp_matches = cv2.drawMatchesKnn(ref_frame, kp_reference, current_frame, kp_currentFrame,
                                    matches, None, **draw_params)

    cv2.imshow('kp correspondences', kp_matches)

    return matchesMask, good
```
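
Once the two nearest distances of each match are known, the ratio test itself reduces to a single vectorised comparison. A minimal NumPy sketch, where `dist` is assumed to be an (N, 2) array of nearest and second-nearest distances (the layout of a k = 2 search) and `ratio_test` is a hypothetical helper name:

```python
import numpy as np


def ratio_test(dist, ratio=0.7):
    """Return the indices of the matches that pass Lowe's ratio test.

    dist: array of shape (N, 2) with the nearest and second-nearest distances.
    A match is kept when d1 < ratio * d2, i.e. the best candidate is clearly
    better than the runner-up.
    """
    dist = np.asarray(dist, dtype=np.float64)
    return np.flatnonzero(dist[:, 0] < ratio * dist[:, 1])
```

With distances `[[1.0, 3.0], [2.0, 2.5], [0.5, 0.8]]`, only the first and third matches survive: in the second one the runner-up is almost as close as the best candidate, so the match is ambiguous.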

## Computing the homography
At this point, we have to find the *homography* that maps the matched salient points of the reference frame (`src_pts`) to the corresponding salient points of the current frame (`dst_pts`). To do so, we used the `cv2.findHomography()` function, which takes the source and destination points (as float32 arrays of shape N×1×2, i.e. `CV_32FC2` / `Point2f`), the estimation method, in our case `cv2.RANSAC`, and the RANSAC reprojection threshold, here 5 pixels.
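
As background, the relation that `cv2.findHomography()` estimates can be written explicitly. In homogeneous coordinates, a point (x, y) of the reference frame is mapped to (x', y') in the current frame by a 3×3 matrix H, defined only up to a scale factor λ:

```latex
\lambda \begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
  = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}
    \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
\quad\Longrightarrow\quad
x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad
y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}
```

Because of the free scale, H has 8 degrees of freedom, and each correspondence gives two equations (one for x', one for y'). Hence at least four correspondences, no three of them collinear, are needed: this is the size of the sample RANSAC draws at each iteration, and the reason why `MIN_MATCH_COUNT` is 4 in the main loop.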

```python
def find_homography():

    # casting and reshaping the source and destination points to the N x 1 x 2 float32 layout of findHomography()
    src_pts = np.float32([kp_reference[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp_currentFrame[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC with a reprojection threshold of 5 pixels; mask marks the inliers,
    # and H is None if no homography could be estimated
    H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)

    return H
```
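
`cv2.findHomography()` with `cv2.RANSAC` combines two ideas: estimating H linearly from a set of correspondences (the direct linear transform, DLT, built from the equations of the homography) and RANSAC, which repeatedly fits H on 4 random matches and keeps the model with the most inliers within the reprojection threshold. The NumPy sketch below illustrates both; it is a simplified version under our own assumptions (no coordinate normalisation, fixed iteration count, one final refit on the inliers; names `dlt_homography`, `project` and `ransac_homography` are ours), not OpenCV's implementation.

```python
import numpy as np


def dlt_homography(src, dst):
    """Estimate H (normalised so that h33 = 1) from N >= 4 point pairs with the DLT."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # each correspondence gives two linear equations in the nine entries of H
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(rows, dtype=np.float64)
    # least-squares solution of A h = 0 with ||h|| = 1: last right singular vector
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def project(H, pts):
    """Apply H to an (N, 2) array of points using homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return pts_h[:, :2] / pts_h[:, 2:3]


def ransac_homography(src, dst, thresh=5.0, iters=500, seed=0):
    """Fit H on random 4-point samples and keep the model with the most inliers.

    src, dst: (N, 2) float arrays of matched points, N >= 4.
    thresh: maximum reprojection error (pixels) for a pair to count as an inlier.
    Returns (H, inliers), with H = None if fewer than 4 inliers were found.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(src), 4, replace=False)
        H = dlt_homography(src[sample], dst[sample])
        err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 4:
        return None, best_inliers
    # refit on all the inliers of the best model
    return dlt_homography(src[best_inliers], dst[best_inliers]), best_inliers
```

With exact correspondences plus a few grossly wrong ones, the wrong pairs end up outside the inlier set and do not affect the final estimate, which is exactly why RANSAC is preferred over a plain least-squares fit on all the good matches.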

## Applying the homography to the corners of the reference frame
With the following function we define the four corners of the reference frame and transform them with the homography just found, obtaining the area of the current frame onto which the augmented layer will be projected.

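```python
def get_perspectivePts():

    # corners of the reference frame: top-left, bottom-left, bottom-right, top-right
    pts = np.float32([[0, 0], [0, h_r - 1], [w_r - 1, h_r - 1], [w_r - 1, 0]]).reshape(-1, 1, 2)
    # projecting them into the current frame with the homography H
    dst = cv2.perspectiveTransform(pts, H)

    return dst
```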
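
`cv2.perspectiveTransform()` applies H to each point in homogeneous coordinates and then divides by the third component. A NumPy sketch of the same computation on the four corners, in the order used above (top-left, bottom-left, bottom-right, top-right); the function `transform_corners` is a hypothetical name, and it returns a plain (4, 2) array instead of OpenCV's (4, 1, 2) layout.

```python
import numpy as np


def transform_corners(H, w, h):
    """Map the four corners of a w x h image through the homography H."""
    # corners as (x, y) pairs, in the same order as get_perspectivePts()
    pts = np.array([[0, 0], [0, h - 1], [w - 1, h - 1], [w - 1, 0]], dtype=np.float64)
    pts_h = np.hstack([pts, np.ones((4, 1))])   # homogeneous coordinates (x, y, 1)
    mapped = pts_h @ H.T                        # each row is H @ (x, y, 1)
    return mapped[:, :2] / mapped[:, 2:3]       # back to Cartesian coordinates
```

For instance, a pure translation by (10, 20) moves the corners of a 100×50 image to (10, 20), (10, 69), (109, 69) and (109, 20).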

## Computing final homography
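Now we compute the homography that maps the four corners of the augmented layer onto the four transformed corners found above. This final homography projects the augmented layer consistently with the perspective of the current frame. When `aug_layer_resized` has exactly the same size as the reference frame, its corners coincide with those used in `get_perspectivePts()`, so this homography equals the one returned by `find_homography()` up to scale and numerical rounding; the extra step matters only if the layer has a different size.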


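```python
def get_last_homography():

    # corners of the augmented layer, in the same order as in get_perspectivePts()
    pts_layer = np.float32([[0, 0], [0, h_l - 1], [w_l - 1, h_l - 1], [w_l - 1, 0]]).reshape(-1, 1, 2)
    # exact homography mapping these four corners onto the projected corners dst
    H = cv2.getPerspectiveTransform(pts_layer, dst)
    return H
```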
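
With exactly four correspondences, a homography can be computed without any fitting by fixing h33 = 1 and solving an 8×8 linear system, two rows per corner. The following NumPy sketch shows this exact four-point solve; `four_point_homography` is a hypothetical name, and we make no claim that OpenCV's `getPerspectiveTransform()` is implemented in exactly this way.

```python
import numpy as np


def four_point_homography(src, dst):
    """Solve for H (with h33 = 1) mapping exactly four src points onto four dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), rearranged to be linear in h
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # same for v with the second row of H
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.asarray(A, float), np.asarray(b, float))
    return np.append(h, 1.0).reshape(3, 3)
```

If the destination corners are obtained by projecting the source corners with some H, this solve recovers that same H, which is the reason given above why the final homography coincides with the first one when the layer and the reference frame have the same size.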

## The warping process
At this point we can use the homography just computed to warp the augmented layer and its mask into the perspective of the current frame. Then we turn the warped mask into a boolean mask that is `True` on its black pixels, and replace those pixels of the warped layer with the pixels at the same coordinates in the current frame. In this way the current frame shows through everywhere except on the words and the logo.

```python
def warping_process():

    # output size (width, height) taken from the current frame
    h_f, w_f = current_frame.shape[:2]

    warped_aug_layer = cv2.warpPerspective(aug_layer_resized, H, (w_f, h_f))
    #cv2.imshow("warped augmented layer", warped_aug_layer)

    # same output size as the layer; nearest-neighbour interpolation keeps the mask binary
    warped_aug_layer_mask = cv2.warpPerspective(aug_layer_mask, H, (w_f, h_f), flags=cv2.INTER_NEAREST)
    #cv2.imshow("warped augmented layer mask", warped_aug_layer_mask)

    # creating a boolean mask instead of a binary image
    warped_aug_layer_mask = np.equal(warped_aug_layer_mask, 0)  # True for all black pixels of the mask

    # substituting the black pixels of the mask with the pixels of the current frame
    warped_aug_layer[warped_aug_layer_mask] = current_frame[warped_aug_layer_mask]

    cv2.imshow('Frame', warped_aug_layer)
```
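
`cv2.warpPerspective()` fills each output pixel by mapping it back through the inverse homography and sampling the source image; output pixels whose pre-image falls outside the source stay black, which is why, outside the projected area, both the warped layer and its warped mask are zero and the current frame shows through. The NumPy sketch below reproduces this with nearest-neighbour sampling (OpenCV uses bilinear interpolation by default) and then applies the same boolean substitution as `warping_process()`. The names `warp_nearest` and `composite` are hypothetical.

```python
import numpy as np


def warp_nearest(src, H, out_w, out_h):
    """Warp src by the homography H into an out_h x out_w image.

    Each output pixel is mapped back through the inverse of H and takes the value of
    the nearest source pixel; pixels whose pre-image falls outside src stay zero.
    """
    H_inv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    # homogeneous coordinates (x, y, 1) of every output pixel, shape (3, out_h * out_w)
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    sx, sy, sw = H_inv @ pts
    sx = np.rint(sx / sw).astype(int)
    sy = np.rint(sy / sw).astype(int)
    valid = (sx >= 0) & (sx < src.shape[1]) & (sy >= 0) & (sy < src.shape[0])
    flat = np.zeros((out_h * out_w,) + src.shape[2:], dtype=src.dtype)
    flat[valid] = src[sy[valid], sx[valid]]
    return flat.reshape((out_h, out_w) + src.shape[2:])


def composite(frame, warped_layer, warped_mask):
    """Show frame wherever the warped mask is black, the warped layer elsewhere."""
    keep_frame = warped_mask == 0          # True on the black pixels of the mask
    out = warped_layer.copy()
    out[keep_frame] = frame[keep_frame]    # same boolean substitution as warping_process()
    return out
```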

# Iteration of the frames of the video
The following loop calls the previously defined functions on every frame of the video, producing the augmented reality video.

```python
MIN_MATCH_COUNT = 4   # a homography needs at least four point correspondences

while True:

    # reading the current frame
    success, current_frame = video.read()   # success is True if the frame was read correctly

    if not success:
        print("Can't read frame. Exiting loop...")
        break

    # detecting and computing the salient points of the current frame
    kp_currentFrame, des_currentFrame = computeFrame()

    # finding correspondences with FLANN's matching algorithm
    matches = flann_algorithm()

    # filtering the matches and showing those of the current frame
    matchesMask, good = filter_and_draw_kp()

    # checking that there are at least four good matches
    if len(good) >= MIN_MATCH_COUNT:

        # finding src_pts and dst_pts, then computing the current homography
        H = find_homography()
        if H is None:
            print("Homography could not be estimated. Exiting loop...")
            break

        # applying the homography to the corners of the reference frame
        dst = get_perspectivePts()

        # size of the (cropped) augmented layer
        h_l, w_l, _ = aug_layer_resized.shape

        # homography from the corners of the augmented layer to the projected corners in the current frame
        H = get_last_homography()

        # warping the augmented layer and merging it with the current frame
        warping_process()

        # press 'q' to stop the video
        if cv2.waitKey(1) == ord('q'):
            break

    else:
        print("Not enough matches are found - {}/{}".format(len(good), MIN_MATCH_COUNT))
        break

# once all the frames have been processed, release the capture and close the windows
video.release()
cv2.destroyAllWindows()
```


## Conclusions

As we have seen, the result is convincing and the augmented reality looks very realistic. This was achieved mainly thanks to the following algorithms:

* Otsu's method, to binarize the mask of the augmented layer;
* the FLANN-based matcher with a k-d tree index, to find the k = 2 nearest matches of each keypoint;
* Lowe's ratio test on the distances of the two best matches, to discard ambiguous correspondences.
