# "Stereo Vision - PART 2"

> "Calibration and Summary"

- toc: true
- branch: master
- badges: false
- comments: true
- categories: [Computer Vision]
- hide: false
- search_exclude: false
- image: images/post-thumbnails/sv2.png
- metadata_key1: notes
- metadata_key2: 


# Calibration of the Stereo

Earlier we assumed the stereo pair was calibrated, meaning we knew how the two cameras are aligned with respect to each other.

Suppose we take a photo of the Eiffel Tower on an iPhone. Then another person takes a photo of the same scene from a slightly different angle with a Samsung Android phone. Is it possible to compute the depth Z and hence recover the 3D structure of the scene? The answer turns out to be yes.

![](https://abhisheksreesaila.github.io/blog/images/stereo/Uncalibrated-stereo.png "Uncalibrated Stereo")

Every digital camera embeds metadata in the image, such as the focal length, which can be read to obtain the internal (intrinsic) parameters. All we need to compute are the external (extrinsic) parameters.


In practice, if two cameras photograph the same scene from two different angles and we know the internal parameters of each camera, we can work out their relative alignment ourselves and hence compute the depth. That is what we will explore in this section.

Consider the above picture. It is identical to the one in the earlier section except that the left and right cameras have their own coordinate systems $(x_l, y_l, z_l)$ and $(x_r, y_r, z_r)$ respectively.

Our goal is to compute the "translation" and "rotation" of one camera w.r.t the other.
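To make "translation" and "rotation" concrete, here is a minimal sketch of how a point moves between the two camera frames. It assumes the convention $X_l = R X_r + t$ used later in this post; the pose values and the helper names (`rotation_y`, `right_to_left`, `left_to_right`) are made up for illustration.

```python
import numpy as np

def rotation_y(angle_rad):
    """Rotation matrix for a rotation about the camera's y axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# Hypothetical pose of the right camera w.r.t. the left camera.
R = rotation_y(np.deg2rad(10.0))   # orientation
t = np.array([0.5, 0.0, 0.0])      # position (same units as the scene)

def right_to_left(X_r):
    """Express a point given in the right camera frame in the left camera frame."""
    return R @ X_r + t

def left_to_right(X_l):
    """Inverse mapping: left camera frame back to right camera frame."""
    return R.T @ (X_l - t)

# The right camera origin (0, 0, 0), seen from the left camera, is exactly t.
o_r_in_left = right_to_left(np.zeros(3))

# A made-up scene point, given in the right frame and mapped to the left frame.
P_r = np.array([0.3, -0.2, 4.0])
P_l = right_to_left(P_r)
print(o_r_in_left, P_l)
```

Calibrating the stereo pair amounts to recovering `R` and `t` from the images alone.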



## Epipolar Geometry

![](https://abhisheksreesaila.github.io/blog/images/stereo/epipolar_geo.png "Epipolar Plane")


- The highlighted triangle is the epipolar plane: the plane formed by the scene point $P$ and the camera origins $o_l$ and $o_r$.

- $e_l$ and $e_r$ are the epipoles. $e_l$ is the projection of the right camera's origin $o_r$ onto the left image plane, and $e_r$ is the projection of the left camera's origin $o_l$ onto the right image plane.

- Every scene point has its own epipolar plane.


## Now why do we care about epipolar geometry?

> Our goal is to find an equation from which we can calculate t and R (translation and rotation)

![](https://abhisheksreesaila.github.io/blog/images/stereo/epipolar_cons.png "Epipolar Constraint")



### Epipolar Constraint

Consider a vector normal to the epipolar plane, and therefore perpendicular to $X_l$ (highlighted in pink). Let's call it $N$.

From linear algebra,
 - $N$ is the cross product of $t$ and $X_l$, since both lie in the epipolar plane
 - $N = t \times X_l$ ....(1)

Also,
 - $X_l \cdot N = 0$ (the dot product of $X_l$ and $N$ is 0) ....(2)

Hence from (1) and (2)

$X_l \cdot (t \times X_l) = 0$

> This is the epipolar constraint. 

$X_l$ is the vector $(x_l, y_l, z_l)$, and the two camera frames are related by the rigid transformation $X_l = R X_r + t$, where t is the position of the right camera w.r.t. the left and R is the orientation of the right camera w.r.t. the left. Substituting this into the epipolar constraint (the $t \times t$ term vanishes) and writing the cross product as a matrix product, we end up with

> $X_l^T E X_r = 0$   ...(3)

- $E = [t]_\times R$ is a 3x3 matrix called the Essential Matrix, where $[t]_\times$ is the skew-symmetric matrix form of the cross product with t

But notice that $X_l$ and $X_r$ still appear, and these 3D coordinates are exactly what we do not know. So we use perspective projection to bring in pixel coordinates:

$u_l = f_x \frac{x_l}{z_l} + O_x$ ; $v_l = f_y \frac{y_l}{z_l} + O_y$, where $f_x$ and $f_y$ are the focal lengths measured in pixels and $(O_x, O_y)$ is the principal point

Substituting into equation (3) and expressing it in matrix form, we get rid of $x_l$ and $y_l$, but $z_l$ remains (and likewise $z_r$ for the right camera). However, the depth can never be 0. In plain terms, the world lies in front of the camera, so every scene point has $z \neq 0$ and we can divide it out. Using these ideas we arrive at

> $U_l^T K_l^{-T} E K_r^{-1} U_r = 0$

> $U_l^T F U_r = 0$

where $F = K_l^{-T} E K_r^{-1}$, $K_l$ and $K_r$ are the camera (intrinsic) matrices, and

$$ U_l = \begin{bmatrix} u_l \\  v_l \\  1 \end{bmatrix}, \quad U_r = \begin{bmatrix} u_r \\  v_r \\  1 \end{bmatrix} $$

F is called the fundamental matrix. I have intentionally skipped the math, but for those mathematically inclined, check out the explanation [here](https://www.youtube.com/watch?v=6kpBqfgSPRc)
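Both constraints are easy to check numerically. The sketch below builds $E = [t]_\times R$ and $F = K_l^{-T} E K_r^{-1}$ for a made-up camera pair and one made-up scene point, assuming the convention $X_l = R X_r + t$. The intrinsics, the pose and the helper names (`skew`, `rotation_y`, `project`) are all hypothetical values chosen for illustration.

```python
import numpy as np

def skew(t):
    """Matrix [t]_x such that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def rotation_y(angle_rad):
    """Rotation matrix for a rotation about the y axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def project(K, X):
    """Homogeneous pixel [u, v, 1] of a 3D point X given in that camera's frame."""
    x = K @ X
    return x / x[2]

# Made-up intrinsics: focal lengths in pixels and principal point.
K_l = np.array([[800.0, 0.0, 320.0],
                [0.0, 800.0, 240.0],
                [0.0, 0.0, 1.0]])
K_r = np.array([[750.0, 0.0, 310.0],
                [0.0, 760.0, 250.0],
                [0.0, 0.0, 1.0]])

# Made-up pose of the right camera w.r.t. the left camera.
R = rotation_y(np.deg2rad(10.0))
t = np.array([0.5, 0.02, 0.01])

E = skew(t) @ R                                    # essential matrix
F = np.linalg.inv(K_l).T @ E @ np.linalg.inv(K_r)  # fundamental matrix

# One scene point, expressed in both camera frames and in both images.
X_r = np.array([0.3, -0.2, 4.0])
X_l = R @ X_r + t
U_l = project(K_l, X_l)
U_r = project(K_r, X_r)

essential_residual = X_l @ E @ X_r      # X_l^T E X_r
fundamental_residual = U_l @ F @ U_r    # U_l^T F U_r
print(essential_residual, fundamental_residual)
```

Both residuals come out as zero up to floating-point round-off, which is the point: E works on 3D camera coordinates, while F works directly on pixels.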
 
###  How does this work in practice?


1. Suppose we are given the "F" matrix. We can easily get "E", since "K" is given to us: $E = K_l^T F K_r$


2. Once we have E from step 1, a technique called "[singular value decomposition](https://keisan.casio.com/exec/system/15076953160460)" lets us decompose it into "t" and "R" (t is recovered only up to an unknown scale)
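The two steps above can be sketched with NumPy. This is a minimal illustration, not a production routine: it assumes the conventions $U_l^T F U_r = 0$ and $E = [t]_\times R$, uses a made-up camera pair to create F, and the helper `decompose_essential` is a hypothetical name. The SVD yields four candidate (R, t) pairs. In practice the right one is usually picked by checking which candidate places triangulated points in front of both cameras; here the synthetic ground truth is used instead.

```python
import numpy as np

def skew(t):
    """Matrix [t]_x such that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def rotation_y(angle_rad):
    """Rotation matrix for a rotation about the y axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def decompose_essential(E):
    """Return the four (R, t) candidates for E = [t]_x R, with t of unit length."""
    U, _, Vt = np.linalg.svd(E)
    # E is only defined up to sign, so flip factors to get proper rotations.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t_dir = U[:, 2]  # direction of t; its length cannot be recovered
    return [(R1, t_dir), (R1, -t_dir), (R2, t_dir), (R2, -t_dir)]

# Made-up ground truth, used only to manufacture an F to decompose.
K_l = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K_r = np.array([[750.0, 0.0, 310.0], [0.0, 760.0, 250.0], [0.0, 0.0, 1.0]])
R_true = rotation_y(np.deg2rad(10.0))
t_true = np.array([0.5, 0.02, 0.01])
F = np.linalg.inv(K_l).T @ skew(t_true) @ R_true @ np.linalg.inv(K_r)

# Step 1: F -> E using the known intrinsics.
E = K_l.T @ F @ K_r

# Step 2: E -> candidate (R, t) pairs.
candidates = decompose_essential(E)
t_unit = t_true / np.linalg.norm(t_true)
matches = [np.allclose(R_c, R_true, atol=1e-6) and np.allclose(t_c, t_unit, atol=1e-6)
           for R_c, t_c in candidates]
print(matches)
```

Exactly one of the four candidates reproduces the true rotation and the true translation direction.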


## Finding correspondence

In the previous section, we said that given a point $(u_l, v_l)$, finding the matching point $(u_r, v_r)$ is a 1D search problem, i.e. we have to search in only one direction, horizontally. But wait, where? Can epipolar geometry tell us which part of the image to search?

Fortunately the answer is yes! There is one more component of epipolar geometry that comes to the rescue: the epipolar line.

Imagine looking from the second camera's origin: all the points along $X_l$ project onto its image plane as the line shown in red. This red line is also where the epipolar plane intersects the image plane. This intersection is the epipolar line. In other words, the projections of all the points on the vector $X_l$ lie on a single line called the epipolar line.

![](https://abhisheksreesaila.github.io/blog/images/stereo/epipolar-line.png "Epipolar line")

![](https://abhisheksreesaila.github.io/blog/images/stereo/epipolarline2.png)

[source](https://en.wikipedia.org/wiki/Epipolar_geometry)


![](https://abhisheksreesaila.github.io/blog/images/stereo/epipolar-formation.gif "Epipolar line Formation- Animated")


From the last section, we have

> $U_l^T F U_r = 0$

Expanding..



$$
\begin{bmatrix} u_{l} & v_{l} & 1 \end{bmatrix}
\begin{bmatrix} f_{11} & f_{12} & f_{13} \\  f_{21} & f_{22} & f_{23}  \\  f_{31} & f_{32} & f_{33}  \end{bmatrix}
\begin{bmatrix} u_r \\  v_r \\  1 \end{bmatrix}
= 0
$$

Multiplying...

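$(f_{11}u_l + f_{21}v_l + f_{31})u_r + (f_{12}u_l + f_{22}v_l + f_{32})v_r + (f_{13}u_l + f_{23}v_l + f_{33}) = 0$

Simplified to...

$Au_r + Bv_r + C = 0$, where A, B and C are the three bracketed terms above, which depend only on F and the left pixel $(u_l, v_l)$. This is a simple linear equation of a line in the right image, and that line contains the projections of all the points along $X_l$.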

> This is the equation for the epipolar line
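As a sanity check, the sketch below computes the line coefficients $(A, B, C) = F^T U_l$ for one left pixel, then walks along the left camera ray at several depths and measures how far each projection in the right image is from that line. The camera pair, the chosen pixel, the depth values and the helper names (`epipolar_line`, `point_line_distance`) are assumptions made up for this example.

```python
import numpy as np

def skew(t):
    """Matrix [t]_x such that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def rotation_y(angle_rad):
    """Rotation matrix for a rotation about the y axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def epipolar_line(F, U_l):
    """Coefficients (A, B, C) of the line A*u_r + B*v_r + C = 0 in the right image."""
    return F.T @ U_l

def point_line_distance(line, U):
    """Perpendicular distance in pixels from homogeneous pixel U to the line."""
    A, B, C = line
    return abs(A * U[0] + B * U[1] + C) / np.hypot(A, B)

# Made-up calibrated pair, with X_l = R X_r + t.
K_l = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K_r = np.array([[750.0, 0.0, 310.0], [0.0, 760.0, 250.0], [0.0, 0.0, 1.0]])
R = rotation_y(np.deg2rad(10.0))
t = np.array([0.5, 0.02, 0.01])
F = np.linalg.inv(K_l).T @ skew(t) @ R @ np.linalg.inv(K_r)

# Pick a pixel in the left image and compute its epipolar line in the right image.
U_l = np.array([400.0, 260.0, 1.0])
line = epipolar_line(F, U_l)

# Every point on the left ray, whatever its depth, should project onto that line.
ray = np.linalg.inv(K_l) @ U_l           # direction of the ray through U_l (z = 1)
distances = []
for depth in [1.0, 2.0, 5.0, 10.0, 20.0]:
    X_l = depth * ray                    # candidate scene point on the ray
    X_r = R.T @ (X_l - t)                # same point in the right camera frame
    U_r = K_r @ X_r
    U_r = U_r / U_r[2]                   # homogeneous pixel in the right image
    distances.append(point_line_distance(line, U_r))
print(max(distances))
```

The distances are zero up to round-off, so the search for the match of `U_l` only has to visit pixels along this one line.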


## Summary

1. Assume camera matrix K is known
2. Find a few correspondences between the 2 images (using SIFT, ORB or by hand). At least 8 corresponding points are needed for a linear estimate of F.
3. Using the points from step #2, estimate F, convert it to E using K, and decompose E into t and R. At this point the stereo pair is said to be calibrated.
4. Now that the cameras are calibrated, for every point in one image we can find the corresponding point in the other. It turns out to be a 1D search problem along the epipolar line.
5. Compute depth by triangulation.
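Steps 2 and 3 can be sketched with the classic eight-point algorithm. The example below is an illustration under stated assumptions rather than a reference implementation: the correspondences are synthetic and noise-free (generated from a made-up camera pair instead of SIFT or ORB matches), both cameras share one hypothetical K, and the helper names are invented. It also applies Hartley's point normalization, a common numerical safeguard that this post does not discuss.

```python
import numpy as np

def skew(t):
    """Matrix [t]_x such that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def to_pixels(K, X):
    """Project N x 3 camera-frame points to N x 3 homogeneous pixels [u, v, 1]."""
    x = X @ K.T
    return x / x[:, 2:3]

def normalize_points(U):
    """Shift pixels to their centroid and scale the mean distance to sqrt(2)."""
    centroid = U[:, :2].mean(axis=0)
    mean_dist = np.linalg.norm(U[:, :2] - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    return U @ T.T, T

def eight_point(U_l, U_r):
    """Estimate F (unit Frobenius norm) with U_l^T F U_r = 0 from N >= 8 pixel pairs."""
    Ul_n, T_l = normalize_points(U_l)
    Ur_n, T_r = normalize_points(U_r)
    # Each pair gives one linear equation in the 9 entries of F (row-major).
    A = np.stack([np.kron(a, b) for a, b in zip(Ul_n, Ur_n)])
    _, _, Vt = np.linalg.svd(A)
    F_n = Vt[-1].reshape(3, 3)           # null vector of A
    # A valid fundamental matrix has rank 2, so zero the smallest singular value.
    Uf, S, Vft = np.linalg.svd(F_n)
    S[2] = 0.0
    F_n = Uf @ np.diag(S) @ Vft
    F = T_l.T @ F_n @ T_r                # undo the normalization
    return F / np.linalg.norm(F)

# Synthetic scene: 12 made-up points in front of a made-up camera pair.
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(np.deg2rad(10.0)), np.sin(np.deg2rad(10.0))
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([0.5, 0.02, 0.01])
X_r = np.column_stack([rng.uniform(-1, 1, 12),
                       rng.uniform(-1, 1, 12),
                       rng.uniform(3, 8, 12)])
X_l = X_r @ R.T + t                      # X_l = R X_r + t for every point
U_l, U_r = to_pixels(K, X_l), to_pixels(K, X_r)

F_est = eight_point(U_l, U_r)

# Compare with the ground-truth F (only possible because the data is synthetic).
F_true = np.linalg.inv(K).T @ skew(t) @ R @ np.linalg.inv(K)
F_true = F_true / np.linalg.norm(F_true)
alignment = abs(np.sum(F_est * F_true))  # 1.0 means equal up to sign
residuals = np.abs(np.einsum("ni,ij,nj->n", U_l, F_est, U_r))
print(alignment, residuals.max())
```

With real, noisy matches the estimate would be less clean, and a robust wrapper such as RANSAC is commonly added around this step.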

# References

[Stereo Vision](https://www.youtube.com/watch?v=hUVyDabn1Mg)

