# Computer Vision

<div class="alert alert-block alert-info"> <strong>Name: </strong> [ Write your surname.name between the brackets (like that surname.name) ]  
</div>

#### Welcome to this exercise which introduces Computer Vision!

<img style="float:right;" src="./img/pinhole_model.png" width= 50% />

<br>
While photogrammetry has over 100 years of tradition, the past several years have seen Computer Vision increasingly applied in the geospatial domain.
<br> <br>


Newer technologies often come with their own jargon (terms, definitions and language). 

This exercise is not meant to be exhaustive but serves as a gentle introduction to the concepts and potential of Computer Vision.

<div class="alert alert-block alert-info"> <strong> Let's first define the core difference between these two methods. </strong>  <br> <br>

**Computer Vision**: enables machines to interpret and understand visual data, often for real-time decision-making. _Medical imaging and autonomous navigation_  
**Photogrammetry**: accurately measures and reconstructs three-dimensional shape and spatial relationships. _Remote sensing and mapping_</div>

**In order to realise the goal of Computer Vision it is often necessary to harvest 3D data from imagery**. This is also the aim of photogrammetry, but the processes differ slightly.

|Aspect|Traditional Photogrammetry|Computer Vision|
|---|---|---|
|Stereo|Primarily relies on **_stereo matching_** to create 3D models|Combines stereo vision with **_advanced techniques_** like Structure from Motion Multi-view Stereo (SfM-MVS) and SLAM (Simultaneous Localization and Mapping)|
|Calibration|**_Precise sensor calibration_** for accurate results|May involve more **_automated_** calibration procedures|
|Processing Speed|Time-consuming manual processes|Often performs real-time or **_near-real-time processing_** thanks to powerful hardware and **_optimized algorithms_**.|
|Data Volume|Deals with smaller datasets of images.|Analyzes large datasets of images, videos, and 3D point clouds.|
|Data Availability|Data acquisition and processing may be costly and time-consuming.|Access to data is becoming more abundant and affordable, thanks to the proliferation of digital cameras and sensors|
|Accuracy|High accuracy _(or rather the **confidence** in the quality)_ making it suitable for geospatial applications|Accuracy varies depending on the application and the quality of data and algorithms|


<div class="alert alert-danger">
  <strong>REQUIRED!</strong> 
  
You are required to insert your outputs and any comment into this document. 
    
The **aim** of this exercise is for you to execute this Notebook on **_a series of photographs you have captured yourself_**. The document you submit should therefore contain the existing text in addition to:   
        

 - Plots and other outputs from executing the code cells  
 - Discussion of your plots and other outputs as well as conclusions reached.  
 - This should also include any hypotheses and assumptions made as well as factors that may affect your conclusions.
</div>

In [None]:
#- load the magic
import numpy as np
import cv2

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

In [3]:
#- if you find it challenging to install and import the libraries on a local machine; give colab a go. 
from google.colab import drive
import os

drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
#- path (main folder)
dir = os.path.join(".../collinearity/")

<div class="alert alert-block alert-success"> <strong> For this exercise we will harvest 3D data from imagery with Computer Vision. </strong>  
<br><br>
We will do so in two ways. 
<br>  
    
&nbsp;&nbsp;&nbsp;&nbsp;**1. Traditional stereo-vision** _(two photographs)_  
&nbsp;&nbsp;&nbsp;&nbsp;**2. Structure-from-Motion (SfM)**
<!-- &nbsp;&nbsp;&nbsp;&nbsp;**2.** extend the SfM process with **Multi-view Stereo (MVS)** _(and show how the method accomodates many images)_-->

In the process we will work through such concepts as **_Feature Matching, Essential and Fundamental matrices, Disparity, Triangulation and Bundle Adjustment_**.
</div>

In [30]:
#- path (to the data)
input_dir = os.path.join(dir, 'stereo')

### 1. Stereo-vision

Before we _really_ get into **Computer Vision**; it might be helpful to introduce _'algorithm-based'_ stereo photogrammetry. Modern **Computer Vision** is in fact already very mature, with completely automated workflows that harvest 3D information via Deep Learning _(neural networks)_. 

We therefore take a step back and start with the basics to understand how these procedures execute a solution. Two images, of the same feature, taken from **_two different viewpoints_**.

In [None]:
img_list = sorted(os.listdir(input_dir))
images = []
for img in img_list:
    if '.jpg' in img.lower() or '.png' in img.lower():
        images = images + [img]
i = 0

print(images)

<table><tr>
<td> <img src="./data/stereo/im0.png" alt="Drawing" style="width: 550px;"/> </td>
<td> <img src="./data/stereo/im1.png" alt="Drawing" style="width: 550px;"/> </td>
</tr></table>

#### 1. a. Feature Matching

In order to harvest 3D data we need the location and orientation of the cameras; at the moment the image was captured. 

Like traditional **Photogrammetry**, which needs _**some**_ known variables _(ground control, location and orientation of photographs)_ to solve the **collinearity equations**, **Computer Vision** also needs a few known variables.  

Even so; modern **Computer Vision** can recover the geometry and pose of the camera _(the relative orientation)_ from the imagery itself. It can do so through feature detection and matching algorithms which work in two steps:

&nbsp;&nbsp;&nbsp;&nbsp;i) find features _(known as **keypoints**)_ in images and  
&nbsp;&nbsp;&nbsp;&nbsp;ii) **match** corresponding features _(connect the features that match and discard those that don't)_
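
Step ii) can be illustrated without OpenCV. The sketch below is a minimal NumPy illustration, not the implementation used in this Notebook: it compares synthetic binary descriptors by Hamming distance and keeps a match only when the best candidate is clearly closer than the second best (the ratio test). The function names, the 0.7 threshold and the random 61-byte descriptors are assumptions chosen for the example.

```python
import numpy as np

def hamming_distances(des_a, des_b):
    """Pairwise Hamming distances between two sets of binary descriptors (uint8 rows)."""
    # XOR every pair of descriptors, then count the bits that differ
    xor = des_a[:, None, :] ^ des_b[None, :, :]
    return np.unpackbits(xor, axis=2).sum(axis=2)

def ratio_test_matches(des_a, des_b, ratio=0.7):
    """Return (index_a, index_b) pairs whose best match clearly beats the second best."""
    dist = hamming_distances(des_a, des_b)
    order = np.argsort(dist, axis=1)          # candidates sorted from nearest to farthest
    matches = []
    for i in range(dist.shape[0]):
        best, second = order[i, 0], order[i, 1]
        if dist[i, best] < ratio * dist[i, second]:
            matches.append((i, int(best)))
    return matches

# synthetic data: "image B" holds the descriptors of "image A" in shuffled order
rng = np.random.default_rng(0)
des_a = rng.integers(0, 256, size=(20, 61), dtype=np.uint8)
perm = rng.permutation(20)
des_b = des_a[perm]

matches = ratio_test_matches(des_a, des_b)
print(len(matches), "matches kept")
```

With real photographs fewer matches survive, because repeated textures produce a second-best candidate that is almost as close as the best one.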

In [50]:
#- feature matching
def PointMatchingAKAZE(img1, img2, path):

    akaze = cv2.AKAZE_create()
    kp1, des1 = akaze.detectAndCompute(img1, None)
    kp2, des2 = akaze.detectAndCompute(img2, None)

    img4 = cv2.drawKeypoints(img1, kp1, None, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
    cv2.imwrite(os.path.join(path, 'keypoints.jpg'), img4)

    #- match features
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = bf.knnMatch(des1, des2, k=2)

    #- ratio test
    good = []
    for m, n in matches:
        if m.distance < 0.70 * n.distance:
            good.append(m)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1,1,2)
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1,1,2)

    #- cv2.drawMatches takes a flat list of matches (cv2.drawMatchesKnn would expect a list of lists)
    #img3 = cv2.drawMatches(img1, kp1, img2, kp2, good[:50], None, flags=2)
    img3 = cv2.drawMatches(img1, kp1, img2, kp2, good, None, flags=2)
    cv2.imwrite(os.path.join(path, 'imL_x_imR.jpg'), img3)

    return pts1, pts2, good

**keypoints**

<img src="./data/stereo/keypoints.jpg" alt="Drawing" style="width: 1000px;"/>

**matches**

<img src="./data/stereo/imL_x_imR.jpg" alt="Drawing" style="width: 1000px;"/>

<div class="alert alert-block alert-warning"><b>TASK! </b>  </div>

- **You are expected to insert the result** _(the `keypoints` and `imL_x_imR.jpg`)_ **of your own dataset in the `cells`** _(double-click the image and change the file)_ **above and discuss the result.** 

<div class="alert alert-block alert-info"><b>HINT!</b> Open the image in another application. Zoom and look at the number of features and the respective matches. </div> 

{ click in this cell and write your answer here }

In [43]:
#- sift
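def PointMatchingSIFT(img1, img2, path):
    #- matches using SIFT (on OpenCV 4.4 and later cv2.SIFT_create() is also available)
    sift = cv2.xfeatures2d.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    
    img4 = cv2.drawKeypoints(img1, kp1, None)
    cv2.imwrite(os.path.join(path, 'keypoints.jpg'), img4)
    
    #- SIFT descriptors are floating point; BFMatcher defaults to the L2 norm
    bf = cv2.BFMatcher_create()
    matches = bf.match(des1, des2)
    #- strongest matches first, so the slice below draws the best 200
    matches = sorted(matches, key=lambda m: m.distance)

    img3 = cv2.drawMatches(img1, kp1, img2, kp2, matches[:200], None, flags=2)
    cv2.imwrite(os.path.join(path, 'imL_x_imR.jpg'), img3)
    
    return kp1, kp2, matches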

<div class="alert alert-block alert-warning"><b>QUESTION! </b>  </div>

- **This exercise does not use `Scale-invariant feature transform (SIFT)` to match features but executes the `AKAZE` algorithm** _(you are expected to change the `code` and test both solutions)_.
  
    **Discuss the difference between these two methods. Mention their strengths and weaknesses. A note about intellectual property is expected. Your answer cannot be more than 75 words.**

{ click in this cell and write your answer here }

#### 1. b. Camera calibration

**While we can calibrate a camera with `opencv` we use existing calibration information.** 

You will need to create your own for the **TASK** at the end of this exercise so let's see what it contains.  The `calib.txt` is structured:

\begin{equation}
\begin{pmatrix}
  fx       & 0   & cx   \\
  0       & fy   & cy   \\
  0       & 0   & 1     \\
\end{pmatrix}
\end{equation}

where `fx = f * resX (the number of columns) / sensorSizeX` and `fy = f * resY (the number of rows) / sensorSizeY` with `f = focal length`.  
`cx` and `cy` represent the principal point of the image and, with modern digital cameras, can be approximated with `cx = resX (the number of columns) / 2` and `cy = resY (the number of rows) / 2`.
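
As a minimal sketch of how the values in `calib.txt` could be derived, the helper below applies the formulas above to a hypothetical camera (24 mm lens, 36 x 24 mm sensor, 6000 x 4000 pixels; all assumed values, not the camera used in this exercise). The function names are illustrative, and the space-separated three-row layout is an assumption based on how this Notebook parses the file.

```python
def intrinsic_matrix(f_mm, sensor_w_mm, sensor_h_mm, res_x, res_y):
    """Build the 3x3 calibration matrix from focal length, sensor size and image size."""
    fx = f_mm * res_x / sensor_w_mm   # focal length in pixels (columns)
    fy = f_mm * res_y / sensor_h_mm   # focal length in pixels (rows)
    cx = res_x / 2                    # principal point assumed at the image centre
    cy = res_y / 2
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]

def format_calib(K):
    """One matrix row per line, values separated by a single space."""
    return "\n".join(" ".join(f"{v:.3f}" for v in row) for row in K)

# hypothetical camera, values assumed for illustration only
K = intrinsic_matrix(24.0, 36.0, 24.0, 6000, 4000)
calib_text = format_calib(K)
print(calib_text)
```

The sensor size and focal length of your own camera can usually be found in the manufacturer's specifications; the text produced here would then be saved as `calib.txt` in the main folder.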

In [32]:
#- calib.
def findCalibrationMat():
    #with open(input_dir+'intrinsic.txt') as f: #os.path.join(input_dir, 'im3.jpg'))
    with open(os.path.join(dir, 'calib.txt')) as f:
        lines = f.readlines()
    #- one matrix row per line; split() tolerates repeated spaces or tabs, blank lines are skipped
    return np.array(
        [l.split() for l in lines if l.strip()],
        dtype=np.float32
    )

#### 1. c. Fundamental and Essential matrices _(relative orientation)_

In **Photogrammetry** we locate and orient _**one**_ image and are able to calculate a 3D coordinate via the **collinearity equation**. When we do intersect **image rays** from more than one image; we do so from already oriented imagery. 

<img style="float:right;" src="./img/e_f-matrices.png" width= 40% />

In order to harvest 3D information; we need to know the relationship between the images _(i.e. the relative orientation)_.

In Computer Vision this relationship is encoded in the **essential and fundamental matrices**. These matrices describe the relative geometry and pose of the cameras at the moment the images were captured.

The previous step (feature matching) in **Computer Vision** thus serves two purposes. From the matched points we estimate the **essential matrix**, which encodes the _relative orientation_ (rotation and direction of translation) between two calibrated cameras. The closely related **fundamental matrix** expresses the same relationship in pixel coordinates and maps a point in one image to a line in the other. 

Think of the **essential and fundamental matrices** as generalizations of the **collinearity equations** that account for the relative geometry between multiple cameras. They allow for the reconstruction of 3D structures by considering the relationship between corresponding points in multiple cameras.
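
The relationship between the two matrices can be checked numerically. The sketch below is an illustration under assumed values (the calibration matrix, the 5 degree rotation, the translation and the 3D point are all made up): it builds an essential matrix from a known pose with the convention `X2 = R X1 + t`, converts it to a fundamental matrix, and verifies that a pair of corresponding pixels satisfies the epipolar constraint `x2ᵀ F x1 = 0`.

```python
import numpy as np

def skew(v):
    """Cross-product matrix, so that skew(v) @ a equals np.cross(v, a)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# assumed calibration and relative pose of camera 2 with respect to camera 1
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
theta = np.deg2rad(5.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([-0.5, 0.0, 0.05])

E = skew(t) @ R                    # essential matrix (calibrated coordinates)
K_inv = np.linalg.inv(K)
F = K_inv.T @ E @ K_inv            # fundamental matrix (pixel coordinates)

# project one 3D point, given in the frame of camera 1, into both images
X = np.array([0.3, -0.2, 4.0])
x1 = K @ X
x1 = x1 / x1[2]
x2 = K @ (R @ X + t)
x2 = x2 / x2[2]

residual = x2 @ F @ x1             # epipolar constraint, zero for a true correspondence
line = F @ x1                      # epipolar line in image 2: a*u + b*v + c = 0
print(abs(residual) < 1e-9)
```

The vector `line` holds the coefficients of the epipolar line on which the match of `x1` must lie in the second image, which is why the search for correspondences reduces from the whole image to a single line.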

In [14]:
def FindEssentialMat(kp1, kp2, matches, K):
    
    #- essential matrix
    E, mask = cv2.findEssentialMat(kp1, kp2, K, method = cv2.RANSAC, prob = 0.999, threshold = 0.4, mask = None)
    kp1 = kp1[mask.ravel() ==1]
    kp2 = kp2[mask.ravel() ==1]
    #- fundamental matrix
    #fundamental_matrix, inliers = cv.findFundamentalMat(kp1, kp2, cv.FM_RANSAC)
    
    #- essential matrix to rotation and translation
    _, R, t, mask = cv2.recoverPose(E, kp1, kp2, K)
    
    return E, R, t

<div class="alert alert-block alert-warning"><b>EXTRA!</b> </div> 

- **While beyond the scope of this exercise; the overall aim of the Fundamental and Essential matrices is to recover the Epipolar Geometry and the Epipolar Line. These _narrow_ the search space to recover the Disparity** _(the horizontal difference)_ **between pixels.**

  **You are welcome to write 100 words on Epipolar Geometry and the Epipolar Line and what they do. The choice is yours. It's up to you.**

{ click in this cell and write your answer here }

#### 1. d. Disparity


<img style="float:right;" src="./img/Stereo-Disparity-in-Cameras.png" width= 40% />
<br><br>

We see **Disparity** and _perceive depth_. 

Choose an object 5-meters away from you. Close your left eye... open your left eye.  
Now close your right eye. Alternate. Left eye. Right eye.  
The object will appear to move _(horizontally)_. left / right. 
<br><br>
We see _**horizontal displacement**_ and interpret the phenomenon as _3D vision_.

Modern **Computer Vision** harvests 3D information in much the same way as human binocular vision. 

Because we have correctly oriented photographs we know the _perceived shift in location_ of any feature. We use this knowledge to create a depth _(disparity)_ map that describes the distance from the camera.
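
To make the idea of horizontal displacement concrete, the sketch below measures the disparity of a single patch by sliding it along the same row of the other image and keeping the shift with the smallest sum of absolute differences (SAD). It is a deliberately simple illustration on synthetic data, not the semi-global matcher used later in this Notebook; the function name, patch size and search range are assumptions.

```python
import numpy as np

def row_disparity(left, right, row, col, half=3, max_disp=32):
    """Disparity of the patch centred on (row, col) in the left image, found by SAD search."""
    patch = left[row - half:row + half + 1, col - half:col + half + 1].astype(np.int32)
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        c = col - d                      # a feature appears further left in the right image
        if c - half < 0:
            break
        cand = right[row - half:row + half + 1, c - half:c + half + 1].astype(np.int32)
        cost = np.abs(patch - cand).sum()
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# synthetic rectified pair: the right image is the left image shifted by 12 pixels
rng = np.random.default_rng(1)
left = rng.integers(0, 256, size=(40, 120), dtype=np.uint8)
true_shift = 12
right = np.roll(left, -true_shift, axis=1)

d = row_disparity(left, right, row=20, col=60)
print(d)
```

Repeating this search for every pixel gives a disparity map; the search only runs along one row because the images are assumed to be rectified.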
<br><br><br><br>
We do however need the dimensions (**_width and height_**) of the image.

In [None]:
w = 2964
h = 2000

In [34]:
#def rectifyToDisparity(imgL, imgR, K, D, R, t, path):
def rectifyToDisparity(imgL, imgR, K, D, R, t, w, h, path):
    
    R1, R2, P1, P2, Q, a, b = cv2.stereoRectify(K, D, K, D, (w, h), R, t)
    map1, map2 = cv2.initUndistortRectifyMap(K, D, R1, P1, (w, h), cv2.CV_16SC2)
    imgLrec = cv2.remap(imgL, map1, map2, cv2.INTER_CUBIC)
    #cv2.imwrite(os.path.join(path, 'imgLrectfy.jpg'), imgLrec)
    
    map3, map4 = cv2.initUndistortRectifyMap(K, D, R2, P2, (w, h), cv2.CV_16SC2)
    imgRrec = cv2.remap(imgR, map3, map4, cv2.INTER_CUBIC)
    #cv2.imwrite(os.path.join(path, 'imgRrectfy.jpg'), imgRrec)
    
    max_disparity = 199
    min_disparity = 23
    num_disparities = max_disparity - min_disparity
    window_size = 5
    stereo = cv2.StereoSGBM_create(minDisparity = min_disparity, numDisparities = num_disparities, blockSize = 5,
                                 uniquenessRatio = 5, speckleWindowSize = 5, speckleRange = 5, disp12MaxDiff = 2,
                                 P1 = 8*3*window_size**2, P2 = 32*3*window_size**2)
    
    stereo2 = cv2.ximgproc.createRightMatcher(stereo)
    
    lamb = 8000
    sig = 1.5
    visual_multiplier = 1.0
    wls_filter = cv2.ximgproc.createDisparityWLSFilter(stereo)
    wls_filter.setLambda(lamb)
    wls_filter.setSigmaColor(sig)
    
    disparity = stereo.compute(imgLrec, imgRrec)
    disparity2 = stereo2.compute(imgRrec, imgLrec)
    disparity2 = np.int16(disparity2)
    
    filteredImg = wls_filter.filter(disparity, imgL, None, disparity2)
    _, filteredImg = cv2.threshold(filteredImg, 0, max_disparity * 16, cv2.THRESH_TOZERO)
    filteredImg = (filteredImg / 16).astype(np.uint8)
    cv2.imwrite(os.path.join(path, 'dsprty.jpg'), filteredImg)
    
    return filteredImg, Q

**Disparity image**

In [None]:
#- look at the disparity image
img = mpimg.imread(input_dir + '/dsprty.jpg')
print(repr(img))

In [None]:
#- plot
imgplot = plt.imshow(img, cmap="Greys_r")
plt.colorbar()
plt.show()

<div class="alert alert-block alert-warning"><b>QUESTION! </b>  </div>

<img style="float:right;" src="./img/plt_dsprtyPlot.png" width= 20% />

- **Why is the structure of a Disparity image** _(every pixel is a z)_ **exactly the same as a Digital Elevation Model (DEM)? What does this say about what we have just done?**

<div class="alert alert-block alert-info"><b>BONUS!</b> Change the colour palette from gray-scale. </div> 

{ click in this cell and write your answer here }

#### 1. e. Project to 3D

We are now at the final stage of the stereo-vision process. We want to harvest the values, from the **disparity** map, and convert the _depth_ into 3D information _(a point cloud we can render in a 3D environment)_.

In order to do so we need additional information. We need the distance _(the baseline)_ between the cameras (the eyes).
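
The role of the baseline can be sketched with the standard relation for a rectified stereo pair, `Z = f * B / d`: depth equals focal length (in pixels) times baseline, divided by disparity (in pixels). The snippet below is a minimal illustration with made-up numbers. The function name is hypothetical, and `doff` is an optional x-offset between the two principal points, assumed to be zero here.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline, doff=0.0):
    """Convert a disparity map (pixels) to depth, in the same units as the baseline."""
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.nan)      # pixels without a valid disparity stay NaN
    valid = (disparity + doff) > 0
    depth[valid] = focal_px * baseline / (disparity[valid] + doff)
    return depth

# made-up values: focal length 4000 px, baseline 200 mm
disp = np.array([[0, 40],
                 [80, 160]])
depth = disparity_to_depth(disp, focal_px=4000.0, baseline=200.0)
print(depth)
```

Doubling the disparity halves the depth, so nearby objects are resolved far more finely than distant ones. A longer baseline increases disparity and therefore improves depth resolution, at the cost of making matching harder.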

In [None]:
baseline = 193.001

In [16]:
#def Reprojection3D(image, disparity, f, b):
def Reprojection3D(image, disparity, f, b, w, h, doff):
    #- Q. you might need to... 
    #Q = np.array([[1, 0, 0, -2964/2], [0, 1, 0, -2000/2],[0, 0, 0, f],[0, 0, -1/b, -124.343/b]])
    Q = np.array([[1, 0, 0, -w/2], [0, 1, 0, -h/2],[0, 0, 0, f],[0, 0, -1/b, -doff/b]])
    
    points = cv2.reprojectImageTo3D(disparity, Q)
    mask = disparity > disparity.min()
    colors = image
    
    out_points = points[mask]
    out_colors = image[mask]
    
    verts = out_points.reshape(-1,3)
    colors = out_colors.reshape(-1,3)
    verts = np.hstack([verts, colors])
    
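    #- header lines must start in column 0; indentation inside the string would end up in the file
    ply_header = '\n'.join([
        'ply',
        'format ascii 1.0',
        'element vertex %(vert_num)d',
        'property float x',
        'property float y',
        'property float z',
        'property uchar blue',
        'property uchar green',
        'property uchar red',
        'end_header',
        ''])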
    
    with open(input_dir + '/stereo.ply', 'w') as f:
        f.write(ply_header %dict(vert_num = len(verts)))
        np.savetxt(f, verts, '%f %f %f %d %d %d')
        
def StereoCV(w, h, b):
    #- the function
    img1 = cv2.imread(os.path.join(input_dir, 'im0.png'))
    img2 = cv2.imread(os.path.join(input_dir, 'im1.png'))

    pts1, pts2, matches = PointMatchingAKAZE(img1, img2, path = input_dir)

    K = findCalibrationMat()
    #K = np.array([[3979.911, 0, 1369.115], [0, 3979.911, 1019.507], [0, 0, 1]], dtype = np.float32)
    D = np.zeros((5,1), dtype = np.float32)

    E, R, t = FindEssentialMat(pts1, pts2, matches, K)

    P1 = np.hstack((np.eye(3), np.zeros((3, 1))))  #- reference camera pose [I | 0]
    P1 = np.matmul(K, P1)
    P2 = np.hstack((R, t))
    P2 = np.matmul(K, P2)

    #filteredImg, Q = rectifyToDisparity(img1, img2, K, D, (2964, 2000), R, t, input_dir)
    #filteredImg, Q = rectifyToDisparity(img1, img2, K, D, R, t, input_dir)
    filteredImg, Q = rectifyToDisparity(img1, img2, K, D, R, t, w, h, input_dir)

    f = K[0][0]/2
    baseline = b/2
    doff = (baseline / f) * w # horizontal disparity or shift between the optical centers of the left and right cameras in terms of pixels
    #Reprojection3D(img1, filteredImg, f, baseline)
    Reprojection3D(img1, filteredImg, f, baseline, w, h, doff)

    cv2.destroyAllWindows()

In [17]:
#- execute
StereoCV(w, h, baseline)

<img src="./data/stereo/result.png" alt="Drawing" style="width: 1000px;"/>

<div class="alert alert-block alert-warning"><b>TASK! </b>  </div>

- **Capture two photographs of any object and execute this exercise. You will need to determine the distance between the cameras** _(baseline)_ **and create a `calib.txt`.**

   **In no more than 50 words; discuss the result** _(the stereo.ply)_. **You are expected to comment on the quality, its potential use and how the solution can be improved**.

{ click in this cell and write your answer here }

<div class="alert alert-danger">
  <strong>REQUIRED!</strong> 
  
Don't forget: besides the contents of this Notebook, your project folders also need to be submitted.
</div>

____

## 2. Structure-from-Motion (SfM)

<img style="float:right;" src="./img/global_sfm.png" width= 50% />
<br>

**Structure-from-Motion** (SfM) has had an enormous impact on the geospatial community and radically transformed the types of products the geomatics practitioner **_offers clients_**. 

These range from foundational datasets (orthomosaics and elevation models) to value-added products such as immersive 3D environments.
<br><br>
The current trajectory of technology and demand for more interactive realistic 3D representations of the world (which is a key geomatics skill) means a basic understanding of how these products are created is a decisive advantage.

<div class="alert alert-block alert-success"> <strong> In the second part of this exercise we will work-through a typical SfM pipeline. </strong>  
<br><br>
    
**The aim** is to equip the user with the necessary knowledge to understand the underlying process that readily available geospatial software executes at the click of a button. 
<br><br>
We will also revisit and strengthen our knowledge of feature matching, triangulation (space intersection) and bundle adjustment. 
</div>

From **Part 1.** of this exercise we have seen that we are able to recover the _**relative orientation**_ (pose and geometry of the cameras: the **essential and fundamental matrices**) and **keypoints that match** across images. 

This is essentially the goal of **SfM**: to recover the orientation of the cameras at the moment each image was captured, together with a sparse point cloud of tie points (keypoints) that represent matching features. **SfM** is the preparation before dense reconstruction with **Multi-view Stereo**.

Unlike **photogrammetry**, the **SfM** approach does not require prior knowledge (camera location and orientation, or control point information). And while essential for geomatics applications; scaling and transformation, from an arbitrary relative coordinate system to a local coordinate system, are not necessary but can be introduced later if desired. The typical **SfM** pipeline executes as follows:

&nbsp;&nbsp;&nbsp;&nbsp;**a) Feature matching**;  
&nbsp;&nbsp;&nbsp;&nbsp;**b)** Estimate the **essential matrix and recover the geometry and pose** (rotation and translation) of the camera;   
&nbsp;&nbsp;&nbsp;&nbsp;**c) Triangulation** _(of 2D points into a 3D world)_ with their respective reprojection errors;  
&nbsp;&nbsp;&nbsp;&nbsp;**d) Refinement** of the rotation and translation parameters (above) via PnP  
&nbsp;&nbsp;&nbsp;&nbsp;**e) Iteratively add a new image into the pipeline**; feature match, estimate pose, triangulate and refine **with a final least squares bundle adjustment**.
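
The reprojection error mentioned in step c) is the quantity that the refinement and bundle adjustment steps try to minimise. As a minimal sketch (the function name, calibration matrix and points are assumptions for illustration), it can be computed by projecting the triangulated 3D points back through the camera's 3x4 projection matrix and measuring the pixel distance to the observed keypoints:

```python
import numpy as np

def reprojection_error(P, points_3d, points_2d):
    """Mean pixel distance between observed 2D points and 3D points projected through P."""
    homog = np.hstack([points_3d, np.ones((len(points_3d), 1))])   # N x 4 homogeneous points
    proj = (P @ homog.T).T                                         # N x 3 projected points
    proj = proj[:, :2] / proj[:, 2:3]                              # divide by the third coordinate
    return float(np.linalg.norm(proj - points_2d, axis=1).mean())

# assumed calibration and a reference camera at the origin, P = K [I | 0]
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])

pts3d = np.array([[0.0, 0.0, 5.0],
                  [1.0, -1.0, 4.0],
                  [-0.5, 0.5, 6.0]])
observed = (K @ pts3d.T).T
observed = observed[:, :2] / observed[:, 2:3]      # perfect observations of the three points

err_exact = reprojection_error(P, pts3d, observed)
err_shifted = reprojection_error(P, pts3d, observed + np.array([3.0, 4.0]))
print(err_exact, err_shifted)
```

Shifting every observation by 3 pixels in x and 4 pixels in y gives a mean error of 5 pixels, the length of that shift. In a real pipeline a large error flags a poor pose estimate or a wrong match.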

In [None]:
#- path (to the data)
input_dir = os.path.join(dir, 'sfm')
output_dir = os.path.join(dir, 'sfm', 'result')

In [None]:
img_list = sorted(os.listdir(input_dir))
images = []
for img in img_list:
    if '.jpg' in img.lower() or '.png' in img.lower():
        images = images + [img]
i = 0
images.sort()
print(images)

#### 2. a. Feature Matching

In [None]:
def PointMatchingOpticalFlow(img1, img2, path, filename1 = None, filename2 = None):
    # Initialize the STAR (CenSurE-based) detector and BRIEF descriptor
    fast = cv2.xfeatures2d.StarDetector_create()
    brief = cv2.xfeatures2d.BriefDescriptorExtractor_create()

    # Detector
    kp = fast.detect(img1,None)
    # Descriptor
    kp1, des1 = brief.compute(img1, kp)

    # Detector
    kp = fast.detect(img2,None)
    # Descriptor
    kp2, des2 = brief.compute(img2, kp)

    # FLANN matching
    FLANN_INDEX_KDTREE = 0
    index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
    search_params = dict(checks = 50)

    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(np.float32(des1),np.float32(des2), k=2) # Use NP.FLOAT32 for ORB, BRIEF, etc

    # store all the good matches as per Lowe's ratio test.
    good_matches = []
    for m,n in matches:
      if m.distance < 0.7 * n.distance:
        good_matches.append(m)
    
    #if len(good_matches)>10:
    p1 = np.float32([ kp1[m.queryIdx].pt for m in good_matches ])#.reshape(-1,1,2)
    p2 = np.float32([ kp2[m.trainIdx].pt for m in good_matches ])#.reshape(-1,1,2)
    
    img3 = cv2.drawMatches(img1, kp1, img2, kp2, good_matches, None)
    #if save:
        #cv2.imwrite(path + "_x_"+filename2+".jpg", img3) #os.path.join(input_dir, 'im3.jpg'))
    #cv2.imwrite(os.path.join(path, 'im2_x_im4.jpg'), img3)
    cv2.imwrite(os.path.join(path, filename1 + "_x_" + filename2 + ".jpg"), img3)

    return p1, p2 

In [None]:
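#- execute. 
#- load the first image pair (img0 and img1 are not defined earlier in this Notebook)
img0 = cv2.imread(os.path.join(input_dir, images[i]))
img1 = cv2.imread(os.path.join(input_dir, images[i+1]))
os.makedirs(output_dir, exist_ok=True)   #- cv2.imwrite does not create missing folders

#- PointMatchingAKAZE is the same function as Part 1 (it takes three arguments and returns pts1, pts2, good).
#pts0, pts1, _ = PointMatchingAKAZE(img0, img1, output_dir)
pts0, pts1 = PointMatchingOpticalFlow(img0, img1, output_dir, images[i][:-4], images[i+1][:-4])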

<div class="alert alert-block alert-warning"><b>QUESTION! </b>  </div>

- **Both the `Optical Flow` and `AKAZE` functions are available for feature detection and matching.** You should test both. **In no more than 75 words discuss the difference between these two algorithms, their major strengths and weaknesses and their typical use cases** _(when / where would you execute the solution)_.

#### 2. Camera calibration

In [None]:
#- same function as part 1.

K = findCalibrationMat()
posearr = K.ravel()

#### 2. b. Essential matrix and pose _(relative orientation)_

In [None]:
#- define empty rotation matrices for each left-right image and fill them later
R_t_0 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
R_t_1 = np.empty((3, 4))

#- set left image calibration - rotation
P1 = np.matmul(K, R_t_0)
Pref = P1
P2 = np.empty((3, 4))

Xtot = np.zeros((1, 3))
colorstot = np.zeros((1, 3))

# Finding essential matrix
E, mask = cv2.findEssentialMat(pts0, pts1, K, method=cv2.RANSAC, prob=0.999, threshold=0.4, mask=None)
pts0 = pts0[mask.ravel() == 1]
pts1 = pts1[mask.ravel() == 1]

# The pose obtained is for second image with respect to first image
_, R, t, mask = cv2.recoverPose(E, pts0, pts1, K)                   #- finding the pose
pts0 = pts0[mask.ravel() > 0]
pts1 = pts1[mask.ravel() > 0]
R_t_1[:3, :3] = np.matmul(R, R_t_0[:3, :3])
R_t_1[:3, 3] = R_t_0[:3, 3] + np.matmul(R_t_0[:3, :3], t.ravel())

#- set right image projection matrix (calibration x pose)
P2 = np.matmul(K, R_t_1)

#### 2. c. Triangulation

In [None]:
# A function, for triangulation, given the image pair and their corresponding projection matrices
def Triangulation(P1, P2, pts1, pts2, K, repeat):
    if not repeat:
        points1 = np.transpose(pts1)
        points2 = np.transpose(pts2)
    else:
        points1 = pts1
        points2 = pts2

    cloud = cv2.triangulatePoints(P1, P2, points1, points2)
    cloud = cloud / cloud[3]

    return points1, points2, cloud

# Triangulation is done for the first image pair. The poses will be set as reference and used for incremental SfM
pts0, pts1, points_3d = Triangulation(P1, P2, pts0, pts1, K, repeat=False)
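
To show what triangulation (space intersection) does with the two projection matrices, the sketch below implements a standard linear (DLT) triangulation of a single point in NumPy. It is an illustration on synthetic data and makes no claim about how `cv2.triangulatePoints` is implemented internally; the function names, calibration matrix, pose and test point are all assumptions.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear triangulation of one point from two 3x4 projection matrices and two pixels."""
    # each image contributes two linear equations in the homogeneous 3D point
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                    # right singular vector of the smallest singular value
    return X[:3] / X[3]           # back from homogeneous to Euclidean coordinates

def project(P, X):
    """Project a 3D point to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# assumed calibration; camera 2 is shifted 0.5 units along x relative to camera 1
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

X_true = np.array([0.3, -0.2, 4.0])
x1 = project(P1, X_true)
x2 = project(P2, X_true)

X_est = triangulate_dlt(P1, P2, x1, x2)
print(np.round(X_est, 6))
```

With noise-free observations the point is recovered exactly. With real keypoints the two image rays do not intersect perfectly, and the SVD returns the least-squares compromise, which is later refined by bundle adjustment.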

<div class="alert alert-block alert-warning"><b>TASK / QUESTION! </b>  </div>

- **Execute this Notebook on a series of photographs you have captured yourself** _(no more than 15)_. **This Notebook must therefore contain the output from your own dataset.**

_images:_

- **pinhole model**: https://kornia.readthedocs.io/en/latest/geometry.camera.pinhole.html
- **epipolar geometry**: https://cw.fel.cvut.cz/b181/_media/courses/b3b33vir/exteroceptive_sensors.pdf
- **disparity**: https://www.e-consystems.com/blog/camera/technology/what-is-a-stereo-vision-camera-2/
- **global sfm**: http://theia-sfm.org/sfm.html