# Stereovision

![Suzanne](main.png)

Stereovision deals with the reconstruction of 3D information from images. Reconstructing a point requires several images of it, taken from different viewpoints. The key step of the reconstruction, and often the most problematic one, is identifying the image of that point in each view.

## Epipolar Geometry

Epipolar geometry involves two cameras. It describes the geometric relations between two views of the same scene and depends only on the intrinsic parameters of the cameras and their relative pose. In particular, it provides the epipolar constraint, which is very useful for matching points between views.

## The Fundamental Matrix

![Epipolar Geometry - Sanyam Kapoor](epipolar.png)

Imagine that we have two images, right and left, of the world space. Take a point $\vec{x}$ in the right image. The world point $\vec{X}$ of which $\vec{x}$ is the image can lie anywhere on the line passing through $\vec{x}$ and the optical center of the right camera. We call this line the back-projected ray of $\vec{x}$. Let $\vec{x}'$ denote the image of $\vec{X}$ in the left image. The locus of $\vec{x}'$ is therefore the image, in the left view, of the back-projected ray of $\vec{x}$. This line is called the epipolar line and is denoted $\vec{l}'$. The epipolar line passes through the epipole $\vec{e}'$, the image of the optical center of the right camera.

In 2D projective geometry, a line with equation $ax+by+c = 0$ is represented by a three-component vector $(a, b, c)^T$ defined up to a scale factor. This gives the following relationship:

>The point $\vec{x}$ belongs to the line $\vec{l}$ if and only if $\vec{x}^T\vec{l} = 0$.

Moreover, in 2D projective geometry, the following relations hold:

- The intersection of two lines $l$ and $l'$ is given by $x = l \times l'$,
- The line passing through two points $x$ and $x'$ is given by $l = x \times x'$.

Note that the cross product can be written as a matrix product $x \times y = [x]_\times y$, where

$$[x]_\times = \begin{pmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{pmatrix}$$
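
The incidence relation, the two cross-product relations and the matrix $[x]_\times$ can be checked numerically. The following is a minimal NumPy sketch; the helper name `skew` and the sample points and lines are illustrative choices, not part of the assignment.

```python
import numpy as np

def skew(v):
    """Return the 3x3 matrix [v]_x such that skew(v) @ w equals np.cross(v, w)."""
    x1, x2, x3 = v
    return np.array([[0.0, -x3, x2],
                     [x3, 0.0, -x1],
                     [-x2, x1, 0.0]])

# Two points in homogeneous coordinates: (1, 2) and (3, 4).
p = np.array([1.0, 2.0, 1.0])
q = np.array([3.0, 4.0, 1.0])

# Line through p and q: l = p x q, written here as a matrix product.
line_pq = skew(p) @ q

# Both points satisfy the incidence relation x^T l = 0.
incidence_p = p @ line_pq
incidence_q = q @ line_pq

# Intersection of the lines x = 1, i.e. (1, 0, -1), and y = 2, i.e. (0, 1, -2).
meet = np.cross(np.array([1.0, 0.0, -1.0]), np.array([0.0, 1.0, -2.0]))
meet = meet / meet[2]  # rescale so that the last coordinate is 1
```

Dividing by the last coordinate only picks a convenient representative: any non-zero multiple of `meet` or `line_pq` represents the same point or line.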

To find the equation of the epipolar line in the left image, we only need the coordinates of two points on this line. The first is the image $P'\vec{C}$ of the optical center $\vec{C}$ of the right camera, where $P'$ is the projection matrix of the left camera. The second is $P'P^{+}\vec{x}$, where $P^{+}$ is the pseudo-inverse of the projection matrix $P$ of the right camera. The epipolar line is the line through these two points, so $l' = [P'\vec{C}]_\times{}P'P^{+}\vec{x} = F\vec{x}$ with $F = [P'\vec{C}]_\times{}P'P^{+}$. $F$ is called the fundamental matrix.

Since the epipolar line $\vec{l}' = F\vec{x}$ is the locus of $\vec{x}'$, the point $\vec{x}'$ belongs to $\vec{l}'$, which leads to the epipolar constraint:

>**The fundamental matrix is such that for any pair of corresponding points $\vec{x} \leftrightarrow \vec{x}'$ in the two images, we have $\vec{x}'^{T}F\vec{x} = 0$.**
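
As a sanity check, the formula $F = [P'\vec{C}]_\times{}P'P^{+}$ can be evaluated on synthetic cameras and tested against the epipolar constraint. The sketch below is illustrative only: the intrinsics `K`, the pose `R`, `t` and the random world points are made-up values, and the function names are hypothetical.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x."""
    x1, x2, x3 = v
    return np.array([[0.0, -x3, x2], [x3, 0.0, -x1], [-x2, x1, 0.0]])

def fundamental_from_projections(P_right, P_left):
    """F = [P' C]_x P' P^+ with P = P_right, P' = P_left, C = right optical center."""
    # C is the null vector of P (P C = 0): the last right singular vector of P.
    _, _, vt = np.linalg.svd(P_right)
    C = vt[-1]
    epipole_left = P_left @ C
    F = skew(epipole_left) @ P_left @ np.linalg.pinv(P_right)
    return F / np.linalg.norm(F)  # F is only defined up to scale

# Made-up setup: shared intrinsics, right camera at the origin,
# left camera slightly rotated about the y axis and translated.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([[-0.5], [0.0], [0.1]])
P_right = K @ np.hstack((np.eye(3), np.zeros((3, 1))))
P_left = K @ np.hstack((R, t))

F = fundamental_from_projections(P_right, P_left)

# Project random world points lying in front of both cameras.
rng = np.random.default_rng(0)
X = np.vstack((rng.uniform(-1.0, 1.0, (2, 20)),
               rng.uniform(4.0, 6.0, (1, 20)),
               np.ones((1, 20))))
x_right = P_right @ X
x_right = x_right / x_right[2]
x_left = P_left @ X
x_left = x_left / x_left[2]

# Epipolar constraint x'^T F x, one value per correspondence (all close to zero).
residuals = np.einsum('ij,ij->j', x_left, F @ x_right)
```

The residuals are zero only up to floating-point error, and with real, noisy correspondences they would be small rather than exactly zero.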

## Computation of the Fundamental Matrix

The fundamental matrix $F$ has seven degrees of freedom. It has nine components, but these are defined only up to a scale factor, which removes one degree of freedom. Moreover, $F$ is singular ($\det(F) = 0$), which removes another one and leaves seven. We therefore need at least seven correspondences to compute $F$. Each correspondence gives one equation $x'^{T}_iFx_i = 0$ that is linear in the components of $F$, so seven correspondences yield a system of the form $Af = 0$, where $f$ is the vector containing the components of $F$. Let us assume that $A$ is a 7×9 matrix of rank 7. Its null space is then two-dimensional, and the general solution of $Af = 0$ can be written, up to scale, as $\alpha f_1 + (1-\alpha) f_2$, where $f_1$ and $f_2$ are two independent solutions of $Af = 0$. We then use the singularity constraint $\det(\alpha F_1 + (1 - \alpha)F_2) = 0$ to determine $\alpha$. Since this constraint is a cubic equation in $\alpha$, it has one or three real roots, hence one or three solutions for $F$.
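
The seven-point procedure described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the OpenCV implementation: the function name `seven_point` is hypothetical, the demo data are synthetic and noise-free, and the cameras are assumed to have identity intrinsics so that the coordinates stay well conditioned (pixel coordinates would normally be normalized first).

```python
import numpy as np

def seven_point(x, x_prime):
    """Return the candidate matrices F with x'^T F x = 0 for seven correspondences.

    x, x_prime: arrays of shape (7, 2) holding matched image points.
    """
    # One row of A per correspondence; np.kron(x', x) matches the row-major
    # ordering of f = F.reshape(-1), since x'^T F x = sum_ij x'_i F_ij x_j.
    A = np.array([np.kron(np.append(b, 1.0), np.append(a, 1.0))
                  for a, b in zip(x, x_prime)])

    # With rank(A) = 7, the last two right singular vectors span the null space.
    _, _, vt = np.linalg.svd(A)
    F1 = vt[-1].reshape(3, 3)
    F2 = vt[-2].reshape(3, 3)

    # det(alpha F1 + (1 - alpha) F2) is a cubic in alpha: four samples fix it exactly.
    samples = np.array([-1.0, 0.0, 1.0, 2.0])
    dets = [np.linalg.det(a * F1 + (1.0 - a) * F2) for a in samples]
    coeffs = np.polyfit(samples, dets, 3)

    # Keep the real roots: one or three in the generic case.
    roots = np.roots(coeffs)
    real_roots = roots[np.abs(roots.imag) < 1e-8].real
    return [a * F1 + (1.0 - a) * F2 for a in real_roots]

# Synthetic data: seven world points seen by two cameras with identity intrinsics.
rng = np.random.default_rng(1)
X = np.vstack((rng.uniform(-1.0, 1.0, (2, 7)), rng.uniform(2.0, 8.0, (1, 7))))
angle = 0.3
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([-1.0, 0.2, 0.3])
x_right = (X[:2] / X[2]).T       # right camera: P = [I | 0]
X_left = R @ X + t[:, None]      # left camera:  P' = [R | t]
x_left = (X_left[:2] / X_left[2]).T

candidates = seven_point(x_right, x_left)
```

When three candidates are returned, the seven points alone cannot tell them apart; an additional correspondence is typically used to pick the right one.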

## OpenCV

In practice, you will use the OpenCV library. In Python, its functions are available through the `cv2` module.

You can find documentation for the calibration and reconstruction functions at https://docs.opencv.org/4.0.0/d9/d0c/group__calib3d.html

## Goal

In the zip archive provided with the assignment, you will find two sequences of images taken by two cameras while an object was being scanned by a laser plane.

![Laser](scanRight/scan0010.png)

You will also find shots of a checkerboard in different positions that will help you calibrate your cameras.

![Checkerboard](chessboards/c2Right.png)

The goal is to reconstruct the scanned object in 3D.

In [2]:
import numpy as np
import cv2
import glob
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from mpl_toolkits.mplot3d import Axes3D

# Step 1
# Retrieve the data from our two cameras

# Preparation
objp = np.zeros((7*7,3), np.float32) # array of (0,0,0) rows, one per checkerboard intersection
objp[:,:2] = np.mgrid[0:7,0:7].T.reshape(-1,2) # coordinates of every checkerboard intersection (numbered by column and row)

# Arrays to store object points and image points from all the images.
objpoints = [] # 3d point in real world space
imgpoints = [] # 2d points in image plane. (pixels)

def getCam(images):
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    # TERM_CRITERIA_EPS + TERM_CRITERIA_MAX_ITER: criteria type, 30: maximum number of iterations, 0.001: required accuracy


    for fname in images:
        img = cv2.imread(fname)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) # convert to grayscale

        # Find the checkerboard corners
        ret, corners = cv2.findChessboardCorners(gray, (7,7), None) # corners = pixel positions of the intersections between the white and black squares
        # If found, add object points, image points (after refining them)
        if ret:
            objpoints.append(objp)
            corners2 = cv2.cornerSubPix(gray, corners, (11,11), (-1,-1), criteria) # refine to sub-pixel accuracy
            imgpoints.append(corners2)

            # Draw and display the corners
            # img = cv2.drawChessboardCorners(img, (7,7), corners2, ret)
            # cv2.imshow('img',img)
            # cv2.waitKey(1000)

    # cv2.destroyAllWindows()
    
    
    # The fifth positional argument is distCoeffs, so criteria is passed by keyword
    ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None, criteria=criteria)
    # RMS error, intrinsic camera matrix, distortion coefficients, rotation vectors, translation vectors
    return ret, mtx, dist, rvecs, tvecs, corners2

#Load images
images = glob.glob('chessboards/c4*.png')

# Call the calibration function
ret, mtx, dist, rvecs, tvecs, corners = getCam(images)
 

# Rotation matrices: Rodrigues converts each rotation vector (3x1) into a rotation matrix (3x3)
# NOTE: glob does not guarantee file order; this assumes index 0 is the right view and index 1 the left view
rmatRight = cv2.Rodrigues(rvecs[0])[0]
rmatLeft = cv2.Rodrigues(rvecs[1])[0]


# Extrinsic matrices: append the translation vector to the rotation matrix, which turns the 3x3 into a 3x4 (rows x columns)
rotMatRight = np.concatenate((rmatRight,tvecs[0]), axis=1)
rotMatLeft = np.concatenate((rmatLeft,tvecs[1]), axis=1)


# Camera matrices (see Lecture 1): intrinsic matrix @ extrinsic matrix (3x4)
camLeft = mtx @ rotMatLeft
camRight = mtx @ rotMatRight

# Intrinsic matrices
KLeft = camLeft[:, :3] @ np.linalg.inv(rmatLeft)
KRight = camRight[:, :3] @ np.linalg.inv(rmatRight)

# Normalize the intrinsic matrices K (expected to change nothing here, since K[2, 2] is already 1)
KLeft = KLeft / KLeft[2, 2]
KRight = KRight / KRight[2, 2]

# Projection centers: homogeneous (4x1) world coordinates of each camera
posCamWorldCenterLeft = np.linalg.inv(np.concatenate((rotMatLeft,[[0,0,0,1]]), axis=0)) @ np.transpose([[0,0,0,1]])
posCamWorldCenterRight = np.linalg.inv(np.concatenate((rotMatRight,[[0,0,0,1]]), axis=0)) @ np.transpose([[0,0,0,1]])


def crossMat(v):
    # Given the column vector       v = [[x], [y], [z]]
    v = v[:,0]    # flatten it to   v = [x, y, z]
            # and return          [[ 0, -z,  y],
            #                      [ z,  0, -x],
            #                      [-y,  x,  0]]

    return np.array([[0,-v[2],v[1]] , [v[2],0,-v[0]] , [-v[1],v[0],0]])


def matFondamental(camLeft,centerRight,camRight):
    # F = [P'C]_x P' P^+ with P' = camLeft, P = camRight and C = centerRight:
    # camLeft @ centerRight is the epipole, i.e. the image of the right optical center in the left view;
    # its cross-product matrix is multiplied by camLeft and by the pseudo-inverse of camRight
    return np.array(crossMat(camLeft @ centerRight) @ camLeft @ np.linalg.pinv(camRight))

def matEssentiel(F, K1, K2):
    # E = K2^T * F * K1
    E = K2.T @ F @ K1
    # Normalize 
    E = E / np.linalg.norm(E)
    return E

Fondamental = matFondamental(camLeft,posCamWorldCenterRight,camRight)
# x lies in the right image and x' in the left one, so K1 = KRight and K2 = KLeft
Essential = matEssentiel(Fondamental, KRight, KLeft)

#print(Fondamental)
#print(np.linalg.matrix_rank(Fondamental))
#print(Essential)

In [22]:
# Step 2

# 1. For each pixel in first image find corresponding epipolar line in second images
# 2. Examine all corresponding pixels on epipolar line and pick best match
# 3. Triangulate matches to get depth information

# Stereo rectification: bring both cameras to the same "height" so that their epipolar lines coincide with the scanlines
# Step 1: find projective transformations HLeft and HRight (3x3) such that the epipoles e and e' are mapped to the point at infinity [1,0,0]^T

def normalizedEpipole(F):
    # Left epipole e' spans the left null space of F (F^T e' = 0): cross product of two columns
    eLeft = np.cross(F[:, 0], F[:, 1])
    # Right epipole e spans the right null space of F (F e = 0): cross product of two rows
    eRight = np.cross(F[0, :], F[1, :])
    
    # Normalize epipoles
    eLeft = eLeft / np.linalg.norm(eLeft)
    eRight = eRight / np.linalg.norm(eRight) 
    return eLeft, eRight

eLeft, eRight = normalizedEpipole(Fondamental)
# print(eLeft)
# print(eRight)

def projectiveTransformationH(e, R):
    # Projective transformation H meant to map the normalized epipole e to [1,0,0]^T
    # (treated as a point at infinity), built from the rotation matrix
    # The matrix below is the cross-product matrix of e (same layout as crossMat)
    RPrime = R @ np.array([[0, -e[2], e[1]], [e[2], 0, -e[0]], [-e[1], e[0], 0]])
    
    # Calculate translation vector t
    # NOTE: e is 1-D, so the (3,1) column minus RPrime @ e broadcasts to a 3x3 array
    t = np.array([[1], [0], [0]]) - RPrime @ e
    
    # Calculate projective transformation matrix H
    H = np.hstack((RPrime, t))
    
    print(H)
    # H is 3x6 here rather than 3x3 (because of the broadcast above), so only the first three columns are kept
    H = H[:, :3].reshape((3,3))

    return H

eLeftVect = np.array(eLeft)
eLeftVect = eLeftVect.transpose()
# print(eLeft)
# print(eLeftVect)
HLeft = projectiveTransformationH(eLeftVect, rmatLeft)
# print(HLeft)

def projectiveTransformationHR(H, R):
    HRight = H @ R
    return HRight

HRight = projectiveTransformationHR(HLeft, rmatLeft.T)
# print(HRight)


def applyHToLeftRightImage():
    return

# Step 3
# Correspondence search
def correspondanceSearch():
    # 4 different techniques
    return

# Step 4
# Depth from disparity
def depth():

    # depth = z = (baseline * f) / (x - x')
    # where f is the distance between the camera center and the image plane
    return

[[ 1.42146707e-02  2.13881367e-01  8.38852043e-03  1.00000000e+00
   1.00000000e+00  1.00000000e+00]
 [ 3.48622151e-03  5.40045872e-02 -9.98529146e-01 -5.13797982e-20
  -6.04440308e-20 -1.06046087e-17]
 [ 6.46749350e-02  9.73109047e-01  5.35646020e-02 -5.13797982e-20
  -6.04440308e-20 -1.06046087e-17]]
[[ 0.01421467  0.21388137  0.00838852]
 [ 0.00348622  0.05400459 -0.99852915]
 [ 0.06467493  0.97310905  0.0535646 ]]
[[ 8.15988828e-19  2.14491883e-01 -3.30079004e-03]
 [-2.14491883e-01 -1.94573466e-18 -9.76720194e-01]
 [ 3.30079004e-03  9.76720194e-01  2.29849170e-19]]
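
The relation noted in the `depth` stub, $z = Bf / (x - x')$, can be sketched for a rectified pair as follows. This is a minimal illustration under stated assumptions: the function name and its parameters are hypothetical, the disparity is taken as the left column minus the right column so that it is positive for points in front of the cameras, and the focal length is expressed in pixels.

```python
import numpy as np

def depth_from_disparity(x_left, x_right, focal_px, baseline):
    """Depth z = baseline * f / (x - x') for matched columns of a rectified pair."""
    disparity = np.asarray(x_left, dtype=float) - np.asarray(x_right, dtype=float)
    # A zero (or negative) disparity has no finite depth: leave it at infinity.
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = baseline * focal_px / disparity[valid]
    return depth

# Made-up matches: disparities of 20, 40 and 0 pixels.
depths = depth_from_disparity([420.0, 400.0, 350.0],
                              [400.0, 360.0, 350.0],
                              focal_px=800.0, baseline=0.1)
```

The depth comes out in the same unit as `baseline`, and larger disparities correspond to closer points.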
