# Epipolar Geometry: Understanding 3D from Multiple Views

These lecture notes cover epipolar geometry, a core concept in computer vision and 3D scene reconstruction. Epipolar geometry describes the relationships between multiple cameras, a 3D point, and the 2D projections of that point in each image.

## The Challenge of Ambiguity

![Ambiguity in 3D Scene Interpretation](images/figure1_eg.png)

Before turning to epipolar geometry, note the intrinsic ambiguity in mapping a 3D scene to a 2D image: every 3D point along a viewing ray projects to the same pixel. Figure 1 illustrates how a single image can admit several interpretations of the 3D world. Additional views of the same scene help resolve these ambiguities.

## Epipolar Geometry

![Epipolar Geometry](images/figure2_eg.png)

**Epipolar geometry** is the key to resolving these ambiguities when two cameras are involved. Figure 2 shows its essential elements. Two cameras with centers `O1` and `O2` observe the same 3D point `P`, whose projections onto the two image planes are `p` and `p'`. The line connecting the camera centers is the **baseline**. The plane defined by `P`, `O1`, and `O2` is the **epipolar plane**. The points where the baseline intersects the image planes are the **epipoles** (`e` and `e'`). The lines where the epipolar plane intersects the image planes are the **epipolar lines**; in each image, the epipolar line passes through the epipole and the projection of `P`.

## Parallel Image Planes

![Parallel Image Planes](images/figure3_eg.png)

In some scenarios, the image planes are parallel to each other, as shown in Figure 3. In this configuration the baseline joining `O1` and `O2` is parallel to the image planes and never intersects them, so the epipoles `e` and `e'` lie at infinity. As a result, the epipolar lines are parallel to one of the axes of each image plane. Image rectification exploits this: it warps a general image pair into this configuration so that the search for a corresponding point runs along a single image row or column.
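As a small numerical illustration of the parallel case, the sketch below places two identically oriented cameras side by side and projects several candidate 3D points that all lie on one viewing ray of the first camera. The intrinsic matrix `K`, the baseline vector, the ray, and the depths are made-up values chosen for illustration, not values from these notes.

```python
import numpy as np

def project(K, points):
    """Project 3D points (N x 3, in camera coordinates) to pixels (N x 2)."""
    x = points @ K.T
    return x[:, :2] / x[:, 2:3]

# Hypothetical intrinsics shared by both cameras.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Centre of camera 2 in camera 1 coordinates: a shift along the x axis only.
baseline = np.array([0.5, 0.0, 0.0])

# Candidate 3D points at several depths along one viewing ray of camera 1.
ray = np.array([0.1, -0.05, 1.0])
depths = np.array([1.0, 2.0, 5.0, 20.0])
points_cam1 = depths[:, None] * ray

# Both cameras share the same orientation, so changing frames is a translation.
points_cam2 = points_cam1 - baseline

pix1 = project(K, points_cam1)  # one pixel, whatever the depth
pix2 = project(K, points_cam2)  # pixels spread along a horizontal line
print(pix1)
print(pix2)
```

All candidates land on the same pixel in the first image, which is the single-view ambiguity. In the second image they share the same `v` coordinate and differ only in `u`, so the epipolar line is parallel to the horizontal image axis.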

## Leveraging Epipolar Geometry

In practice, we often know the camera locations, orientations, and camera matrices, but not the 3D location of `P`. Given a point `p` in the first image, the epipolar plane is then determined by `p`, `O1`, and `O2`, and its intersection with the second image plane is the epipolar line on which the corresponding point `p'` must lie. Epipolar geometry therefore provides a strong constraint between a pair of images without requiring the 3D structure of the scene.

## Mapping Points and Epipolar Lines

To map points and epipolar lines across views, we use the camera projection matrices `M` and `M'`, which map 3D points to their 2D image plane locations. Assume the world reference system is attached to the first camera, and the second camera is offset from it by a rotation `R` and a translation `T`, so that a point with coordinates `X'` in the second camera's frame has coordinates `RX' + T` in the first. Projecting into the second camera requires the inverse of this transformation, so the camera projection matrices are:

```
M = K [I | 0]
M' = K' [Rᵀ | -RᵀT]
```

With these matrices, the relationship between corresponding points in the two views can be written directly in terms of `K`, `K'`, `R`, and `T`.

Epipolar geometry is a fundamental concept in computer vision, enabling us to resolve ambiguities and extract valuable information about 3D scenes from 2D images. These lecture notes lay the foundation for further exploration of epipolar geometry's applications and implications in the field of computer vision.

## The Essential Matrix: Unveiling the Epipolar Constraint

For canonical cameras, where both intrinsic matrices are the identity (`K = K' = I`), and with `R` and `T` giving the pose of the second camera in the first camera's frame, the camera projection matrices simplify to:

```
M = [I | 0]
M' = [Rᵀ | -RᵀT]
```

In this setup, the point `p'`, expressed in the reference system of the first camera, is `Rp' + T`. The vectors `Rp' + T` and `T` both lie in the epipolar plane, so their cross product, `T × (Rp' + T) = T × (Rp')`, is normal to the epipolar plane. Since `p` also lies in the epipolar plane, it must be perpendicular to `T × (Rp')`, which gives the dot product:

```
pᵀ · [T × (Rp')] = 0
```

Writing the cross product with `T` as multiplication by the skew-symmetric matrix `[T×]`, this equation becomes:

```
pᵀ [T×] Rp' = 0
```

The matrix `E = [T×]R` is termed the **Essential Matrix**, leading to a concise expression of the epipolar constraint:

```
pᵀEp' = 0
```

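The Essential Matrix is a 3x3 matrix with 5 degrees of freedom: three for the rotation, three for the translation, minus one because it is only defined up to scale. It has rank 2 and is therefore singular.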
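The derivation above can be checked numerically. The following is a minimal sketch, assuming canonical cameras and the convention used in the derivation (a point with coordinates `X'` in the second camera's frame sits at `RX' + T` in the first). The helper names `skew` and `rotation_y`, the rotation angle, the translation, and the 3D point are all illustrative choices.

```python
import numpy as np

def skew(t):
    """Return [t×], the matrix for which skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def rotation_y(angle):
    """Rotation by `angle` radians about the y axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# Hypothetical relative pose of camera 2, expressed in camera 1's frame.
R = rotation_y(0.2)
T = np.array([1.0, 0.1, 0.2])
E = skew(T) @ R

# A made-up 3D point in camera 1 coordinates, then in camera 2 coordinates.
P_cam1 = np.array([0.3, -0.2, 4.0])
P_cam2 = R.T @ (P_cam1 - T)

# Canonical projections in homogeneous coordinates.
p = P_cam1 / P_cam1[2]
p_prime = P_cam2 / P_cam2[2]

residual = p @ E @ p_prime                         # epipolar constraint pᵀ E p'
singular_values = np.linalg.svd(E, compute_uv=False)
print(residual, singular_values)
```

The residual is zero up to floating-point error, and the singular values show the rank-2 structure: two equal values (the length of `T`) and one that vanishes.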

The Essential Matrix also gives the epipolar lines associated with `p` and `p'`. Specifically, `l' = Eᵀp` is the epipolar line in the image plane of camera 2, and `l = Ep'` is the epipolar line in the image plane of camera 1. Moreover, the product of the Essential Matrix with each epipole is zero: `Eᵀe = Ee' = 0`. To see why, take any point `x` (other than the epipole `e`) in the image of camera 1. Its epipolar line in the image of camera 2, `l' = Eᵀx`, contains the epipole `e'`, so `e'ᵀ(Eᵀx) = (e'ᵀEᵀ)x = 0` for all such `x`. This forces `e'ᵀEᵀ = 0`, that is, `Ee' = 0`. The same argument with the roles of the two cameras swapped gives `Eᵀe = 0`.
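These properties can be sketched in a few lines of NumPy. The example assumes canonical cameras and the same pose convention as the derivation of `E`; the pose, the 3D point, and the helper names `skew` and `null_vector` are illustrative. The epipoles are recovered as null vectors of `E` and `Eᵀ` through the SVD.

```python
import numpy as np

def skew(t):
    """Return [t×], the matrix for which skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def null_vector(A):
    """Unit vector x minimising ||A x||: the last right singular vector of A."""
    return np.linalg.svd(A)[2][-1]

# Hypothetical pose of camera 2 in camera 1's frame (rotation about the y axis).
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s],
              [0.0, 1.0, 0.0],
              [-s, 0.0, c]])
T = np.array([1.0, 0.1, 0.2])
E = skew(T) @ R

# One made-up 3D point and its two canonical projections.
P_cam1 = np.array([0.3, -0.2, 4.0])
P_cam2 = R.T @ (P_cam1 - T)
p = P_cam1 / P_cam1[2]
p_prime = P_cam2 / P_cam2[2]

line_in_1 = E @ p_prime    # l  = E p'  : epipolar line in image 1
line_in_2 = E.T @ p        # l' = Eᵀ p  : epipolar line in image 2

e_prime = null_vector(E)   # epipole in image 2, satisfies E e' = 0
e = null_vector(E.T)       # epipole in image 1, satisfies Eᵀ e = 0

print(p @ line_in_1, p_prime @ line_in_2)        # each point lies on its line
print(e @ line_in_1, e_prime @ line_in_2)        # each epipole lies on the line too
```

In this setup the epipole `e` is parallel to `T`, which matches the geometry: `e` is the image of the second camera center, located at `T` in the first camera's frame.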

The Essential Matrix serves as a foundational component in epipolar geometry, enabling the derivation of epipolar constraints and facilitating the understanding of correspondences between points and epipolar lines across two cameras. Its compact representation encapsulates essential geometric relationships in multiple view geometry.

## The Fundamental Matrix: Unveiling Correspondences Across Multiple Views

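When the cameras are not canonical, the projection matrices `M` and `M'` include the intrinsic matrices `K` and `K'`. With `R` and `T` giving the pose of the second camera in the first camera's frame, they are:

```
M = K [I | 0]
M' = K' [Rᵀ | -RᵀT]
```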

To derive a more general expression, consider `pc = K⁻¹p` and `p'c = K'⁻¹p'` as the projections of point `P` onto the image planes of the corresponding cameras if they were canonical. In the canonical case, we had the epipolar constraint:

```
pcᵀ [T×] R p'c = 0
```

Substituting `pc` and `p'c`, we get:

```
pᵀ K⁻ᵀ [T×]RK'⁻¹p' = 0
```

The matrix `F = K⁻ᵀ[T×]RK'⁻¹` is known as the **Fundamental Matrix**, and the epipolar constraint becomes `pᵀFp' = 0`. It plays the same role as the Essential Matrix discussed earlier, but it additionally encodes the camera matrices `K` and `K'` along with the relative transformation `R` and `T` between the cameras. The epipolar lines follow as before, `l = Fp'` and `l' = Fᵀp`, and because `F` can be estimated from image correspondences alone, they can be computed even when the camera matrices and the transformation are unknown. The Fundamental Matrix has 7 degrees of freedom (nine entries, minus one for overall scale and one for the rank-2 constraint `det(F) = 0`), whereas the Essential Matrix has 5.

The Fundamental Matrix serves as a bridge between corresponding points in two images. If we know the Fundamental Matrix, we can easily establish a constraint (the epipolar line) for a point in one image and its corresponding point in the other image, even without knowing the actual 3D structure or the camera parameters.
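A short numerical sketch of this constraint in pixel coordinates follows. It assumes the convention `pᵀFp' = 0` with `F = K⁻ᵀ[T×]RK'⁻¹`, and that `R` and `T` give the pose of the second camera in the first camera's frame. The intrinsics, pose, and 3D point are invented for illustration.

```python
import numpy as np

def skew(t):
    """Return [t×], the matrix for which skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Hypothetical intrinsics for the two cameras.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
K_prime = np.array([[700.0, 0.0, 300.0],
                    [0.0, 700.0, 250.0],
                    [0.0, 0.0, 1.0]])

# Hypothetical pose of camera 2 in camera 1's frame (rotation about the y axis).
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s],
              [0.0, 1.0, 0.0],
              [-s, 0.0, c]])
T = np.array([1.0, 0.1, 0.2])

F = np.linalg.inv(K).T @ skew(T) @ R @ np.linalg.inv(K_prime)

# Project one made-up 3D point into both images (homogeneous pixel coordinates).
P_cam1 = np.array([0.3, -0.2, 4.0])
P_cam2 = R.T @ (P_cam1 - T)
p = K @ P_cam1
p = p / p[2]
p_prime = K_prime @ P_cam2
p_prime = p_prime / p_prime[2]

residual = p @ F @ p_prime          # pᵀ F p'
line_in_1 = F @ p_prime             # epipolar line l = F p' in image 1
# Distance in pixels from p to its epipolar line a*u + b*v + c = 0.
distance = abs(p @ line_in_1) / np.hypot(line_in_1[0], line_in_1[1])
print(residual, distance)
```

Both the algebraic residual and the point-to-line distance vanish up to floating-point error. With real, noisy correspondences the distance is a more interpretable error measure than the raw residual, because it is expressed in pixels.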

### The Eight-Point Algorithm

To estimate the Fundamental Matrix from image data alone, we can use the Eight-Point Algorithm, which requires at least 8 pairs of corresponding points between the two images. Each correspondence `(pi, p'i)` gives one epipolar constraint, `piᵀFp'i = 0`, which is a linear homogeneous equation in the nine entries of `F`. Stacking these equations gives the system:

```
Wf = 0
```

where `W` is an `N x 9` matrix built from `N ≥ 8` correspondences, and `f` is the vector of the nine entries of the Fundamental Matrix we seek.
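One way to assemble `W` is sketched below. It assumes the convention `pᵀFp' = 0` used in these notes and flattens `F` row by row into `f`; the function name `build_constraint_matrix` and the sample points are illustrative.

```python
import numpy as np

def build_constraint_matrix(pts, pts_prime):
    """Stack one epipolar constraint per correspondence into W (N x 9).

    pts, pts_prime: N x 2 arrays of matching pixel coordinates (p_i, p'_i).
    Row i holds the products p_i[a] * p'_i[b] in row-major order, so that
    (W @ F.ravel())[i] equals p_iᵀ F p'_i for any 3x3 matrix F.
    """
    ones = np.ones((len(pts), 1))
    p = np.hstack([pts, ones])              # homogeneous points, image 1
    q = np.hstack([pts_prime, ones])        # homogeneous points, image 2
    return np.einsum("ni,nj->nij", p, q).reshape(len(pts), 9)

# Made-up pixel coordinates, only to show the shapes involved.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 640.0, size=(8, 2))
pts_prime = rng.uniform(0.0, 640.0, size=(8, 2))
W = build_constraint_matrix(pts, pts_prime)
print(W.shape)
```

Because each row mixes products of pixel coordinates with the constant 1, the columns of `W` can differ by several orders of magnitude when raw pixel coordinates are used.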

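With noisy correspondences, this system generally has no exact nonzero solution, so we solve it in the least-squares sense using Singular Value Decomposition (SVD): the unit vector `f` that minimizes `||Wf||` is the right singular vector of `W` associated with its smallest singular value. Reshaping this vector gives an estimate `F̂` of the Fundamental Matrix, which may have full rank. However, we know that the true Fundamental Matrix has rank 2. Therefore, we look for the best rank-2 approximation of `F̂` by solving the optimization problem:

```
minimize ||F - F̂||_F
subject to det(F) = 0
```

Here `||·||_F` denotes the Frobenius norm. This problem can be solved using SVD once more: decompose `F̂ = UΣVᵀ`, set the smallest singular value in `Σ` to zero, and recompose the matrix. The resulting `F` is the rank-2 matrix closest to `F̂`.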
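The rank-2 projection is short enough to sketch directly. The function name `enforce_rank2` and the sample matrix are illustrative; the approach is the standard truncated-SVD approximation, which gives the closest lower-rank matrix in Frobenius norm.

```python
import numpy as np

def enforce_rank2(F_hat):
    """Return the rank-2 matrix closest to F_hat in Frobenius norm."""
    U, s, Vt = np.linalg.svd(F_hat)
    s[2] = 0.0                      # drop the smallest singular value
    return U @ np.diag(s) @ Vt

# A made-up full-rank 3x3 matrix standing in for the estimate.
F_hat = np.array([[2.0, 0.0, 1.0],
                  [1.0, 3.0, 0.0],
                  [0.0, 1.0, 4.0]])
F = enforce_rank2(F_hat)

print(np.linalg.det(F))             # zero up to floating-point error
print(np.linalg.norm(F - F_hat))    # equals the singular value that was dropped
```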

### The Normalized Eight-Point Algorithm

In practice, the standard Eight-Point Algorithm can be inaccurate. Pixel coordinates are often in the hundreds or thousands while the homogeneous coordinate is 1, so the entries of `W` span several orders of magnitude and `W` is ill-conditioned for SVD. To mitigate this problem, the Normalized Eight-Point Algorithm is often employed.

Normalization translates and scales the image coordinates, separately for each image, so that they meet two criteria: (1) the origin of the new coordinate system is at the centroid of the image points, and (2) the mean squared distance of the transformed image points from the origin is 2 (a root-mean-square distance of √2).

By normalizing the coordinates, constructing `W`, and subsequently denormalizing the computed `F`, the Normalized Eight-Point Algorithm can significantly improve the accuracy of the estimated Fundamental Matrix in real-world applications.
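The whole pipeline can be sketched as follows, under the convention `pᵀFp' = 0` used in these notes. The names `H` and `H_prime` for the two normalizing transformations are introduced here for illustration: if `q = Hp` and `q' = H'p'`, then `qᵀFq q' = 0` implies `F = Hᵀ Fq H'`, which is the denormalization step. The synthetic scene (intrinsics, pose, random points) is made up and noise-free.

```python
import numpy as np

def normalization_transform(pts):
    """3x3 transform: centroid to the origin, mean squared distance equal to 2."""
    centroid = pts.mean(axis=0)
    mean_sq = np.mean(np.sum((pts - centroid) ** 2, axis=1))
    s = np.sqrt(2.0 / mean_sq)
    return np.array([[s, 0.0, -s * centroid[0]],
                     [0.0, s, -s * centroid[1]],
                     [0.0, 0.0, 1.0]])

def to_homogeneous(pts):
    return np.hstack([pts, np.ones((len(pts), 1))])

def normalized_eight_point(pts, pts_prime):
    """Estimate F such that p_iᵀ F p'_i ≈ 0 from N >= 8 pixel correspondences."""
    H = normalization_transform(pts)
    H_prime = normalization_transform(pts_prime)
    q = to_homogeneous(pts) @ H.T                    # rows are H p_i
    q_prime = to_homogeneous(pts_prime) @ H_prime.T  # rows are H' p'_i

    # Build W and take the right singular vector of the smallest singular value.
    W = np.einsum("ni,nj->nij", q, q_prime).reshape(len(pts), 9)
    F_q = np.linalg.svd(W)[2][-1].reshape(3, 3)

    # Enforce rank 2 by zeroing the smallest singular value.
    U, s, Vt = np.linalg.svd(F_q)
    F_q = U @ np.diag([s[0], s[1], 0.0]) @ Vt

    # Denormalize back to pixel coordinates.
    return H.T @ F_q @ H_prime

def project(K, points):
    x = points @ K.T
    return x[:, :2] / x[:, 2:3]

# Synthetic, noise-free test scene with made-up parameters.
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
K_prime = np.array([[700.0, 0.0, 300.0], [0.0, 700.0, 250.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.2), np.sin(0.2)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
T = np.array([1.0, 0.1, 0.2])

X_cam1 = np.column_stack([rng.uniform(-1.0, 1.0, 12),
                          rng.uniform(-1.0, 1.0, 12),
                          rng.uniform(4.0, 8.0, 12)])
X_cam2 = (X_cam1 - T) @ R           # rows are Rᵀ (X - T)

pts = project(K, X_cam1)
pts_prime = project(K_prime, X_cam2)
F_est = normalized_eight_point(pts, pts_prime)
print(F_est / np.linalg.norm(F_est))
```

The estimate is only defined up to scale and sign, so it should be compared with a reference matrix after normalizing both, for example by their Frobenius norms.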

### References
Figures 1, 2, 3: Stanford CS231A notes on epipolar geometry