# Computer Vision

#### Welcome to this exercise which introduces Computer Vision!

<img style="float:right;" src="./img/pinhole_model.png" width= 50% />

<br>
While photogrammetry has over 100 years of tradition, over the past several years, advances in technology has seen the application of Computer Vision applied in the geospatial domain.
<br> <br>


Newer technologies often come with their own jargon (terms, definitions and language). This exercise is not meant to be exhaustive but serves as a gentle introduction to the concepts and potential of Computer Vision.

<div class="alert alert-block alert-info"> <strong> Lets first define the core difference between these two methods. </strong>  <br> <br>

**Computer Vision**: interpretation and understanding of visual data often involving real-time decision-making by machines. Enable machines to interpret and understand visual data. Medical imaging and autonomous navigation|<br><br>
**Photogrammetry**: reconstruct 3D structures and spatial relationships. Accurately measure and reconstruct the three-dimensional shape. Remote sensing and mapping.   
</div>

**In order for realise the goal of Computer Vision it is often necessary to harvest 3D data from imagery**. While this is also the aim of photogrammetry; the processes differ slightly.he result.
</div>

|Aspect|Traditional Photogrammetry|Computer Vision|
|---|---|---|
|Stereo|Primarily relies on **_stereo matching_** to create 3D models|Combines stereo vision with **_advanced techniques_** like Structure from Motion Multi-view Stereo (SfM-MVS) and SLAM (Simultaneous Localization and Mapping)|
|Calibration|**_Precise sensor calibration_** for accurate results|May involve more **automated_** calibration procedures|
|Processing Speed|Time-consuming manual processes|Often performs real-time or **_near-real-time processing_** thanks to powerful hardware and **_optimized algorithms_**.|
|Data Volume|Deals with smaller datasets of images.|Analyzes large datasets of images, videos, and 3D point clouds.|
|Data Availability|Data acquisition and processing may be costly and time-consuming.|Access to data is becoming more abundant and affordable, thanks to the proliferation of digital cameras and sensors|


<div class="alert alert-danger">
  <strong>REQUIRED!</strong> 
  
You are required to insert your outputs and any comment into this document. The document you submit should therefore contain the existing text in addition   
    
    

 - Plots and other outputs from exec the code
    nks
 - Discussion of your plots and other outputs as well as conclusions r. 
    hed.
 - This should also include any hypotheses and assumptions made as well as factors that may affect your conclusions.
</div>

In [None]:
!pip3 install open3d

#- _ https://github.com/Vedang-101/Structure-From-Motion/blob/main/Python/main.py , https://github.com/Vedang-101/Structure-From-Motion/blob/main/Python/New%20Code/SFM.py _

In [2]:
#- load the magic
import cv2
import numpy as np
import math
import os
import glob

In [3]:
from google.colab import drive
import os

drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
#import os
dir = os.path.join("drive/My Drive/UCTJuly-Nov2023/APG3012S-RSandP/assignments/collinearity/data/sfm/")#data/sfm")

<div class="alert alert-block alert-success"> <strong> For this exercise we will harvest 3D data from imagery with Computer Vision. </strong>  
<br><br>
We will do so in two stages. 

&nbsp;&nbsp;&nbsp;&nbsp;**1. Structure-from-Motion (SfM)** _(with two images to compare with traditional photogrammetry)_  
&nbsp;&nbsp;&nbsp;&nbsp;**2.** extend the SfM process with **Multi-view Stereo (MVS)** _(and show how the method accomodates many images)_

In the process we will work through such concepts as Feature Matching and Disparity
</div>

**For this exercise we will work through the well known [(Christoph Strecha's) Fountain-P11 dataset](https://certis.enpc.fr/demos/stereo/Data/Fountain11/index.html).**

In [30]:
#input_dir = "./data/sfm/fountain/two/"
input_dir = os.path.join(dir, 'two')
#output_dir = "./data/sfm/fountain/two/result/"
output_dir = os.path.join(dir, 'two', 'result')
format_img = ".jpg"

<table><tr>
<td> <img src="./data/sfm/two/im2.jpg" alt="Drawing" style="width: 550px;"/> </td>
<td> <img src="./data/sfm/two/im4.jpg" alt="Drawing" style="width: 550px;"/> </td>
</tr></table>

## 1. Structure-from-Motion (SfM)

In [50]:
def PointMatchingOpticalFlow(img1, img2, path, save=False):
    #matches using optical flow
    ffd = cv2.FastFeatureDetector_create()
    left_keypoints = ffd.detect(img1, None)
    right_keypoints = ffd.detect(img2, None)
    left_points = KeyPointsToPoints(left_keypoints)
    right_points = np.zeros_like(left_points)

    #Checking if images are in greyscale
    prevgray = img1
    gray = img2
    if len(img1.shape) == 3:
        prevgray = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
        gray = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)

    right_points, vstatus, verror = cv2.calcOpticalFlowPyrLK(prevgray,gray,left_points,right_points)
    #Filterout points with high error
    right_points_to_find = []
    right_points_to_find_back_index = []
    for i in range(0, len(vstatus)):
        if(vstatus[i] and verror[i] < 12.0):
            right_points_to_find_back_index.append(i)
            right_points_to_find.append(right_points[i])
        else:
            vstatus[i] = 0

    found_in_imgpts_j = []
    right_points_to_find_flat = np.array(right_points_to_find).reshape(len(right_points_to_find), 2)
    right_features = KeyPointsToPoints(right_keypoints)
    right_features_flat = right_features.reshape(len(right_features), 2)

    #Look around each of point in right image for any features that were detected in its area and make a match
    matcher = cv2.BFMatcher_create(cv2.NORM_L2)
    nearest_neighbours = matcher.radiusMatch(right_points_to_find_flat, right_features_flat, 2.0)#THIS IS THE NEW LINE added(Sarthak)
    #nearest_neighbours = cv2.BFMatcher().radiusMatch(right_features_flat, right_points_to_find_flat, 2.0)
    matches = []
    # print(len(nearest_neighbours))

    for i in range(0, len(nearest_neighbours)):
        _m = None
        if len(nearest_neighbours[i]) == 1:
            _m = nearest_neighbours[i][0]
        elif len(nearest_neighbours[i]) > 1:
            if (nearest_neighbours[i][0].distance / nearest_neighbours[i][1].distance) < 0.7:
                _m = nearest_neighbours[i][0]
            else:
                #did not pass ratio test
                pass
        else:
            #no match
            pass

        #prevent duplicates
        if _m != None:
            if found_in_imgpts_j.count(_m.trainIdx) == 0:
                #back to original indexing of points for <i_idx>
                _m.queryIdx = right_points_to_find_back_index[_m.queryIdx]
                matches.append(_m)
                right_points_to_find_back_index.append(_m.trainIdx) #Added this LINE(Sarthak)

    img3 = cv2.drawMatches(img1, left_keypoints, img2, right_keypoints, matches, None)

    if save:
        #cv2.imwrite(path + "_x_"+filename2+".jpg", img3) #os.path.join(input_dir, 'im3.jpg'))
        cv2.imwrite(os.path.join(path, 'im3_x_im4.jpg'), img3)
    return left_keypoints, right_keypoints, matches

def KeyPointsToPoints(keypoints):
  out = []
  for kp in keypoints:
    out.append([[kp.pt[0], kp.pt[1]]])
  res = np.array(out, dtype=np.float32)
  return res

<img src="./data/sfm/result_two/im3_x_im4.jpg" alt="Drawing" style="width: 1000px;"/> </td>

In [47]:
def PairStructureFromMotion():
    img1 = cv2.imread(os.path.join(input_dir, 'im3.jpg'))#"../im3.jpg")
    img2 = cv2.imread(os.path.join(input_dir, 'im4.jpg'))
    #kp1, kp2, matches = PointMatchingSURF(img1, img2)
    kp1, kp2, matches = PointMatchingOpticalFlow(img1, img2, save=True, path = output_dir)
    K = findCalibrationMat()
    E = FindEssentialMat(kp1, kp2, matches, K)

    P0 = np.float32([[1,0,0,0],
                     [0,1,0,0],
                     [0,0,1,0]])

    P1 = FindPMat(E)
    print(P1)

    ply = []
    error, ply = TraingulatePoints(kp1, kp2, matches, K, P0, P1, img1, ply)
    print("Mean Error = ", error)

    out = PLY(output_dir)
    out.insert_header(len(ply), "fountain")
    for i in range(0,len(ply)):
      out.insert_point(ply[i][0],ply[i][1],ply[i][2],ply[i][3],ply[i][4],ply[i][5])


    cv2.destroyAllWindows()

In [43]:
def PointMatchingSURF(img1, img2, save = False, path = None):
    #Match with SURF
    surf = cv2.xfeatures2d.SURF_create()
    keypoints1, descriptors1 = surf.detectAndCompute(img1, None)
    keypoints2, descriptors2 = surf.detectAndCompute(img2, None)

    img4 = cv2.drawKeypoints(img1, keypoints1, None)

    cv2.imwrite("KP.jpg", img4)
    bf = cv2.BFMatcher_create()

    matches = bf.match(descriptors1, descriptors2)

    img3 = cv2.drawMatches(img1, keypoints1, img2, keypoints2, matches[:200], None, flags=2)

    if save:
        cv2.imwrite('03' + "_x_" + '04' +".jpg", img3)

    return keypoints1, keypoints2,matches

<div class="alert alert-block alert-warning"><b>QUESTION! </b>  </div>

- **This exercise does not use `Scale-invariant feature transform (SURF)` to match features but executes the `Optical Flow` algorithm** _(you can change the code and see what happens)_.
  
    **Discuss the difference between these two methods. Mention their strenghts and weaknessess. A note about intellectual property is expected. Your answer cannot be more than 150 words.**

{ click in this cell and write your answer here }

In [32]:
def findCalibrationMat():
    #with open(input_dir+'intrinsic.txt') as f: #os.path.join(input_dir, 'im3.jpg'))
    with open(os.path.join(dir, 'intrinsic.txt')) as f:
        lines = f.readlines()
    return np.array(
        [l.strip().split(' ') for l in lines],
        dtype=np.float32
    )

In [14]:
def FindEssentialMat(kp1, kp2, matches, K):
    imgpts1 = []
    imgpts2 = []
    for i in range(0, len(matches)):
        imgpts1.append([[kp1[matches[i].queryIdx].pt[0], kp1[matches[i].queryIdx].pt[1]]])
        imgpts2.append([[kp2[matches[i].trainIdx].pt[0], kp2[matches[i].trainIdx].pt[1]]])

    F = cv2.findFundamentalMat(np.array(imgpts1, dtype=np.float32), np.array(imgpts2, dtype=np.float32), method=cv2.FM_RANSAC, ransacReprojThreshold=0.1, confidence=0.99)
    E = np.matmul(np.matmul(np.transpose(K), F[0]), K)
    return E

In [34]:
def checkCoherentRotation(R):
    if(math.fabs(np.linalg.det(R))-1.0 > 1e-07):
        print("Not a coherent rotational Matrix")
        return False
    else:
        print("Coherent Rotational Matrix found")
        return True

def FindPMat(E):
    _, u, vt = cv2.SVDecomp(E, flags=cv2.SVD_MODIFY_A)
    W = np.float32([[0,-1,0],
                    [1,0,0],
                    [0,0,1]])
    R = np.matmul(np.matmul(u, W), vt)
    t = u[:, 2]

    P = np.float32([])
    if checkCoherentRotation(R):
        P = np.float32([[R[0][0], R[0][1], R[0][2], t[0]],
                        [R[1][0], R[1][1], R[1][2], t[1]],
                        [R[2][0], R[2][1], R[2][2], t[2]]])
    else:
        P = None
    return P

In [16]:
def TraingulatePoints(pt_set1, pt_set2, matches, K, P, P1, img1, ply, _current = None):
    Kinv = np.linalg.inv(K)
    reproj_error = []
    for i in range(0, len(matches)):
        kp = pt_set1[matches[i].queryIdx].pt
        u = np.float32([[[kp[0]], [kp[1]], [1]]])
        um = np.matmul(Kinv, u)
        u = um[0]

        kp1 = pt_set2[matches[i].trainIdx].pt
        u1 = np.float32([[[kp1[0]], [kp1[1]], [1]]])
        um1 = np.matmul(Kinv, u1)
        u1 = um1[0]

        #Triangulate
        X = LinearLSTriangulation(u, P, u1, P1)

        #Calculate reprojection error
        X1 = [[X[0][0]],
              [X[1][0]],
              [X[2][0]],
              [1]]
        xPt_img = np.matmul(np.matmul(K, P1), X1)

        xPt_img_ = np.float32([[xPt_img[0]/xPt_img[2], xPt_img[1]/xPt_img[2]]])
        reproj_error.append(np.linalg.norm(xPt_img_-kp1))
        reproj_error.append(1.0)

        #print(kp[0], kp[1])
        bgr = img1[int(kp[1]),int(kp[0])]

        if _current is not None:
            _current.add_entry((X[0],X[1],X[2]), (kp1[0], kp1[1]))

        #x = X[0] y = Y[1] z = X[2]
        ply.append([X[0],X[1],X[2],bgr[0],bgr[1],bgr[2]])
    me = np.mean(reproj_error)
    return me, ply

In [17]:
def LinearLSTriangulation(u, P, u1, P1):
    A = np.float32([[u[0]*P[2][0]-P[0][0], u[0]*P[2][1]-P[0][1], u[0]*P[2][2]-P[0][2]],
                    [u[1]*P[2][0]-P[1][0], u[1]*P[2][1]-P[1][1], u[1]*P[2][2]-P[1][2]],
                    [u1[0]*P1[2][0]-P1[0][0], u1[0]*P1[2][1]-P1[0][1], u1[0]*P1[2][2]-P[0][2]],
                    [u1[1]*P1[2][0]-P1[1][0], u1[1]*P1[2][1]-P1[1][1], u1[1]*P1[2][2]-P1[1][2]]])
    A = np.reshape(A, [4, 3])
    B = np.float32([[-(u[0]*P[2][3]-P[0][3])],
                    [-(u[1]*P[2][3]-P[1][3])],
                    [-(u1[0]*P1[2][3]-P1[0][3])],
                    [-(u1[1]*P1[2][3]-P1[1][3])]])
    B = np.reshape(B, [4, 1])
    _, X = cv2.solve(A,B,flags=cv2.DECOMP_SVD)
    return X

In [36]:
ply_header = '''ply
format ascii 1.0
element vertex %(vert_num)d
property float x
property float y
property float z
property uchar red
property uchar green
property uchar blue
end_header
'''


class PLY:
	def __init__(self, results_dir):
		self.dir = results_dir
		self.name = None

	def insert_header(self, point_cloud_size, Name):
		self.name = self.dir + Name + '.ply'

		with open(self.name, 'wb') as file:
			file.write((ply_header % dict(vert_num=point_cloud_size+1)).encode('utf-8'))
			file.write('0 0 0 255 0 0\n'.encode('utf-8'))

	def insert_point(self, x, y, z, b, g, r):

		with open(self.name, 'ab') as file:
			file.write((str(x[0]) + ' ').encode('utf-8'))
			file.write((str(y[0]) + ' ').encode('utf-8'))
			file.write((str(z[0]) + ' ').encode('utf-8'))
			file.write((str(r) + ' ').encode('utf-8'))
			file.write((str(g) + ' ').encode('utf-8'))
			file.write((str(b) + '\n').encode('utf-8'))

In [51]:
#- execute
PairStructureFromMotion()

Coherent Rotational Matrix found
[[ 0.6421624  -0.06788093 -0.7635572  -0.4350088 ]
 [ 0.01265835  0.99687475 -0.07797722 -0.02798646]
 [-0.7664641  -0.04040866 -0.64101475 -0.89999115]]
Mean Error =  13976.266673563205


## 2. Multi-view Stereo (MVS)

<div class="alert alert-block alert-warning"><b>TASK / QUESTION! </b>  </div>

- **Execute this NoteBook on a series of photographs you have captured yourself** _(no more than 15)_. **This Notebook must therefore contain the output from you own dataset.**

_images:_

- **pinhole model**: https://kornia.readthedocs.io/en/latest/geometry.camera.pinhole.html