# Single View Metrology

In this lecture, we will explore the intriguing world of computer vision, focusing on the concept of recovering three-dimensional (3D) structure from a single 2D image. This process involves leveraging our understanding of the intrinsic and extrinsic properties of cameras, allowing us to extract valuable information about the 3D world from the 2D perspective captured by the camera. 

We will begin by discussing various transformations in 2D space, such as isometric, similarity, affine, and projective transformations. These transformations serve as the mathematical foundation for understanding how points and objects in the 3D world are projected onto a 2D image. By comprehending these transformations, we can better grasp the potential information that can be gleaned from a single image.

## Transformations in 2D

### Isometric Transformations

Isometric transformations are the transformations that preserve distances. The most basic form of an isometry involves a combination of rotation (R) and translation (t). In mathematical terms, an isometric transformation can be represented as:

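```
[x']   [R₁₁ R₁₂ t_x]   [x]
[y'] = [R₂₁ R₂₂ t_y] * [y]
[1 ]   [ 0   0   1 ]   [1]
```

Here, `(x', y', 1)` is the transformed point, `R` (with entries `Rᵢⱼ`) is a 2×2 rotation matrix, and `t = [t_x, t_y]ᵀ` is a translation vector. In block form the matrix is `[R t; 0ᵀ 1]`. It has three degrees of freedom: one rotation angle and two translation components.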
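As a minimal NumPy sketch (the helpers `isometry` and `apply` are illustrative names, not from any library), we can build the matrix above for a rotation angle θ and check that the distance between two points does not change:

```python
import numpy as np

def isometry(theta, tx, ty):
    """3x3 homogeneous matrix [R t; 0 1]: rotation by theta (radians), then translation (tx, ty)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s, c, ty],
                     [0.0, 0.0, 1.0]])

def apply(H, pts):
    """Apply a 3x3 homogeneous transform to an (N, 2) array of Euclidean points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # (N, 3): append 1 to every point
    out = homog @ H.T                                   # row i is H @ [x_i, y_i, 1]
    return out[:, :2] / out[:, 2:3]                     # back to Euclidean coordinates

H = isometry(np.pi / 6, 2.0, -1.0)
pts = np.array([[0.0, 0.0], [3.0, 4.0]])
moved = apply(H, pts)
# Both distances are 5.0, up to floating-point rounding.
print(np.linalg.norm(pts[1] - pts[0]), np.linalg.norm(moved[1] - moved[0]))
```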

### Similarity Transformations

Similarity transformations extend isometric transformations by introducing scaling. These transformations preserve shape, including the ratio of lengths and angles. Mathematically, similarity transformations can be denoted as:

```
[x']   [sR₁₁ sR₁₂ t_x]   [x]
[y'] = [sR₂₁ sR₂₂ t_y] * [y]
[1 ]   [  0    0   1 ]   [1]
```

Here, `s` is a scalar scale factor applied equally to both axes (in block form, `[SR t; 0ᵀ 1]` with `S = diag(s, s)`), which is why shapes are preserved. Isometric transformations are the special case `s = 1`. A similarity transformation has four degrees of freedom.
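A quick numerical check, sketched with NumPy under the assumption of an arbitrary example scale `s = 2.5` and angle `0.4` rad (the helper names are illustrative), shows that a similarity keeps angles fixed and multiplies every length by `s`:

```python
import numpy as np

def similarity(s, theta, tx, ty):
    """3x3 similarity [sR t; 0 1]: uniform scale s, rotation by theta (radians), translation (tx, ty)."""
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([[s * c, -s * sn, tx],
                     [s * sn, s * c, ty],
                     [0.0, 0.0, 1.0]])

def apply(H, pts):
    """Apply a 3x3 homogeneous transform to an (N, 2) array of Euclidean points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    out = homog @ H.T
    return out[:, :2] / out[:, 2:3]

def angle_at(a, b, c):
    """Angle (radians) at vertex b of the triangle abc."""
    u, v = a - b, c - b
    return np.arccos(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

tri = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
tri2 = apply(similarity(2.5, 0.4, 1.0, 1.0), tri)

ratio = np.linalg.norm(tri2[1] - tri2[0]) / np.linalg.norm(tri[1] - tri[0])
print(angle_at(*tri), angle_at(*tri2))  # the same angle before and after
print(ratio)                            # the scale factor 2.5, up to rounding
```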

### Affine Transformations

Affine transformations preserve points, straight lines, and parallelism. These transformations are represented as:

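```
[x']   [a₁₁ a₁₂ t_x]   [x]
[y'] = [a₂₁ a₂₂ t_y] * [y]
[1 ]   [ 0   0   1 ]   [1]
```

Here, `A` (with entries `aᵢⱼ`) is an arbitrary invertible 2×2 linear transformation, and `t` is a translation vector. Affine transformations are more general than similarity transformations: `A` may scale the two axes differently and introduce shear, so angles and general length ratios are not preserved, although parallel lines remain parallel. An affine transformation has six degrees of freedom.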
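The following NumPy sketch uses an arbitrarily chosen `A` with unequal scaling and shear (the values are assumptions for illustration) to show that parallel segments stay parallel while a right angle does not survive:

```python
import numpy as np

# Example affine map: non-uniform scaling plus shear (values chosen arbitrarily).
A = np.array([[2.0, 0.7],
              [0.0, 0.5]])
t = np.array([3.0, -1.0])
H = np.block([[A, t[:, None]],
              [np.zeros((1, 2)), np.ones((1, 1))]])  # [A t; 0 1]

def apply(H, pts):
    """Apply a 3x3 homogeneous transform to an (N, 2) array of Euclidean points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    out = homog @ H.T
    return out[:, :2] / out[:, 2:3]

def cross2(u, v):
    """z-component of the 2D cross product; zero iff u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

# Two parallel segments (direction (1, 1)) and one perpendicular segment (direction (1, -1)).
seg1 = apply(H, np.array([[0.0, 0.0], [1.0, 1.0]]))
seg2 = apply(H, np.array([[5.0, 2.0], [7.0, 4.0]]))
seg3 = apply(H, np.array([[0.0, 0.0], [1.0, -1.0]]))

d1, d2, d3 = seg1[1] - seg1[0], seg2[1] - seg2[0], seg3[1] - seg3[0]
print(cross2(d1, d2))  # ~0: the images of the parallel segments are still parallel
print(d1 @ d3)         # clearly non-zero: the right angle is not preserved
```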

### Projective Transformations (Homographies)

Projective transformations, or homographies, are even more general transformations that map lines to lines but do not necessarily preserve parallelism. In homogeneous coordinates, projective transformations are represented as:

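```
[x']   [a₁₁ a₁₂ t_x]   [x]
[y'] = [a₂₁ a₂₂ t_y] * [y]
[w']   [v₁  v₂   b ]   [1]
```

In block form this is `[A t; vᵀ b]`. The vector `v = [v₁, v₂]ᵀ` in the last row is what distinguishes a homography from an affine transformation: because `w' = v₁x + v₂y + b` is generally not 1, the Euclidean coordinates of the transformed point are `(x'/w', y'/w')`. A homography is defined only up to scale, so its nine entries give eight degrees of freedom. Although projective transformations do not preserve parallelism, they do preserve collinearity: points on a line are mapped to points on a line.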
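Here is a minimal NumPy sketch with an arbitrarily chosen homography `[A t; vᵀ b]` with non-zero `v` (the numbers are illustrative assumptions). It checks that three collinear points stay collinear, while two parallel segments do not stay parallel:

```python
import numpy as np

# Example homography with a non-zero last row [v1, v2, b] (values chosen arbitrarily).
H = np.array([[1.0, 0.2, 3.0],
              [0.1, 1.1, -2.0],
              [0.002, 0.001, 1.0]])

def apply(H, pts):
    """Apply a 3x3 homogeneous transform to an (N, 2) array of Euclidean points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    out = homog @ H.T
    return out[:, :2] / out[:, 2:3]  # w' is no longer 1, so this division matters

def collinear(p, q, r, tol=1e-9):
    """True if three 2D points lie on one line (zero signed triangle area)."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])) < tol

line_pts = apply(H, np.array([[0.0, 0.0], [10.0, 5.0], [20.0, 10.0]]))
print(collinear(*line_pts))  # True: collinearity is preserved

# Two parallel horizontal segments.
a = apply(H, np.array([[0.0, 0.0], [10.0, 0.0]]))
b = apply(H, np.array([[0.0, 10.0], [10.0, 10.0]]))
da, db = a[1] - a[0], b[1] - b[0]
print(da[0] * db[1] - da[1] * db[0])  # non-zero: the images are no longer parallel
```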

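It's worth noting that the cross ratio of four collinear points `P₁, P₂, P₃, P₄`, defined as `(‖P₃ − P₁‖ · ‖P₄ − P₂‖) / (‖P₃ − P₂‖ · ‖P₄ − P₁‖)`, remains invariant under projective transformations. Distances and ordinary ratios of distances change under a homography, but the cross ratio does not, which makes it a useful tool for characterizing projective transformations.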
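To make the invariance concrete, the sketch below (NumPy, with an arbitrary example homography) evaluates the cross ratio `(‖P₃ − P₁‖ · ‖P₄ − P₂‖) / (‖P₃ − P₂‖ · ‖P₄ − P₁‖)` for four points on the line `y = 2x` before and after the mapping:

```python
import numpy as np

def dist(a, b):
    return np.linalg.norm(b - a)

def cross_ratio(p1, p2, p3, p4):
    """Cross ratio of four collinear points: (|P3-P1| |P4-P2|) / (|P3-P2| |P4-P1|)."""
    return (dist(p1, p3) * dist(p2, p4)) / (dist(p2, p3) * dist(p1, p4))

def apply(H, pts):
    """Apply a 3x3 homogeneous transform to an (N, 2) array of Euclidean points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    out = homog @ H.T
    return out[:, :2] / out[:, 2:3]

# Example homography (values chosen arbitrarily).
H = np.array([[1.0, 0.2, 3.0],
              [0.1, 1.1, -2.0],
              [0.002, 0.001, 1.0]])
pts = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 6.0], [7.0, 14.0]])  # all on y = 2x

before = cross_ratio(*pts)
after = cross_ratio(*apply(H, pts))
print(before, after)  # equal up to rounding (9/7 for these points)
```

Individual distances along the line change after the mapping; only the combination above is preserved.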

In the subsequent sections, we will delve deeper into the application of these transformations and explore how they enable us to recover 3D structure from 2D images, furthering our understanding of the world of computer vision.

## Points and Lines at Infinity

Understanding the concepts of points and lines at infinity is crucial in the field of computer vision as they play a fundamental role in various transformations and projective geometry. In this section, we will explore the definitions and properties of points and lines at infinity.

### Lines in 2D

In 2D, a line can be represented using homogeneous coordinates as a vector `l = [a, b, c]ᵀ`. This vector defines the line as:

```
ax + by + c = 0
```

Here, `-a/b` captures the slope of the line, and `-c/b` represents the y-intercept. In other words, the coefficients `a`, `b`, and `c` encode essential information about the line.

#### Intersection of Lines

Two non-parallel lines `l` and `l'` intersect at a single point `x`, which can be computed as the cross product `x = l × l'`. The cross product is orthogonal to both of its arguments, so `lᵀx = 0` and `l'ᵀx = 0`. These are exactly the conditions for `x` to lie on both lines.
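A short NumPy sketch of this computation, using the example lines `y = x` and `x + y = 4`:

```python
import numpy as np

# Lines as homogeneous vectors [a, b, c] for ax + by + c = 0.
l1 = np.array([1.0, -1.0, 0.0])  # y = x
l2 = np.array([1.0, 1.0, -4.0])  # x + y = 4

x = np.cross(l1, l2)             # homogeneous intersection point, here [4, 4, 2]
print(x[:2] / x[2])              # Euclidean point (2, 2)
print(l1 @ x, l2 @ x)            # both 0: x lies on both lines
```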

#### Parallel Lines and Points at Infinity

When dealing with parallel lines, the conventional understanding is that these lines never intersect. However, in the framework of homogeneous coordinates, we can redefine this scenario as the intersection of lines at infinity. 

In homogeneous coordinates, a point at infinity (an ideal point) has the form `[x, y, 0]ᵀ`. Converting to Euclidean coordinates would require dividing by the last coordinate, which is zero, so the point has no finite Euclidean counterpart: it lies infinitely far away in the direction `(x, y)`.

Let's consider two parallel lines `l = [a, b, c]ᵀ` and `l' = [a, b, c']ᵀ`, which share `a` and `b` (and hence the slope) but have different offsets `c ≠ c'`. Computing their intersection with the cross product gives:

```
l × l' = (c' − c) [b, −a, 0]ᵀ ∝ [b, −a, 0]ᵀ
```

Because its last coordinate is zero, this vector is a point at infinity, and it encodes the common direction of the two lines. The intuition that two parallel lines intersect at infinity is therefore confirmed.

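Furthermore, every line with slope `-a/b` passes through this same ideal point, regardless of its offset `c`:

```
lᵀx∞ = [a b c] [b, −a, 0]ᵀ = ab − ba = 0
```

All points at infinity, in turn, lie on a single line, the **line at infinity** `l∞ = [0, 0, 1]ᵀ`, since `l∞ᵀ [x, y, 0]ᵀ = 0` for any `x` and `y`.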
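A minimal NumPy check, with two example parallel lines `2x + 3y + 1 = 0` and `2x + 3y − 5 = 0` that share `a = 2` and `b = 3`:

```python
import numpy as np

l = np.array([2.0, 3.0, 1.0])
l_prime = np.array([2.0, 3.0, -5.0])

x_inf = np.cross(l, l_prime)
print(x_inf)                      # [-18, 12, 0], proportional to [b, -a, 0] = [3, -2, 0]
print(l @ x_inf, l_prime @ x_inf) # both 0

# Any other line with the same a and b passes through the same ideal point.
l_other = np.array([2.0, 3.0, 42.0])
print(l_other @ x_inf)            # 0

# The line at infinity [0, 0, 1] contains every point with last coordinate 0.
print(np.array([0.0, 0.0, 1.0]) @ x_inf)  # 0
```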

### Lines and Points at Infinity in Projective Transformations

Projective transformations, represented by the matrix `H = [A t; vᵀ b]` (with `aᵢⱼ` the entries of `A`), are fundamental in computer vision. Applying `H` to a point at infinity `p∞ = [x, y, 0]ᵀ` gives:

```
       [a₁₁ a₁₂ t_x] [x]   [a₁₁x + a₁₂y]
Hp∞ =  [a₂₁ a₂₂ t_y] [y] = [a₂₁x + a₂₂y]
       [v₁  v₂   b ] [0]   [v₁x + v₂y  ]
```

The last coordinate, `v₁x + v₂y`, is generally non-zero, so a projective transformation generally maps points at infinity to points that are no longer at infinity.

However, this is not the case for affine transformations. An affine transformation `[A t; 0ᵀ 1]` has last row `[0 0 1]`, so the last coordinate stays zero:

```
       [a₁₁ a₁₂ t_x] [x]   [a₁₁x + a₁₂y]
Hp∞ =  [a₂₁ a₂₂ t_y] [y] = [a₂₁x + a₂₂y]
       [ 0   0   1 ] [0]   [     0     ]
```

Affine transformations therefore map points at infinity to points at infinity (though not necessarily to the same ones), and consequently map the line at infinity to itself.
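The contrast can be seen numerically. This sketch applies an arbitrarily chosen projective matrix and an arbitrarily chosen affine matrix (both illustrative assumptions) to the same point at infinity:

```python
import numpy as np

p_inf = np.array([1.0, 2.0, 0.0])  # point at infinity in the direction (1, 2)

H_proj = np.array([[1.0, 0.2, 3.0],
                   [0.1, 1.1, -2.0],
                   [0.002, 0.001, 1.0]])  # last row [v1, v2, b] with v != 0
H_aff = np.array([[2.0, 0.7, 3.0],
                  [0.0, 0.5, -1.0],
                  [0.0, 0.0, 1.0]])       # last row [0, 0, 1]

print(H_proj @ p_inf)  # [1.4, 2.3, 0.004]: last coordinate non-zero, a finite point
print(H_aff @ p_inf)   # [3.4, 1.0, 0.0]: still a point at infinity
```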

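Now, let's consider the projective transformation of a line `l` to obtain a new line `l'`. Under a projective transformation, lines are still mapped to lines, and the transformed line is `l' = H⁻ᵀl`. To see why, note that any point `x` on `l` satisfies `lᵀx = 0`. Its image `x' = Hx` then satisfies `(H⁻ᵀl)ᵀx' = lᵀH⁻¹Hx = lᵀx = 0`, so `x'` lies on `l'`.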
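A small NumPy sketch (with an example homography chosen for illustration) confirms that points of `l` land on `H⁻ᵀl`. It also shows that this projective `H` does not map the line at infinity `[0, 0, 1]ᵀ` to itself:

```python
import numpy as np

# Example homography (values chosen arbitrarily).
H = np.array([[1.0, 0.2, 3.0],
              [0.1, 1.1, -2.0],
              [0.002, 0.001, 1.0]])
H_inv_T = np.linalg.inv(H).T

l = np.array([1.0, -1.0, 0.0])  # the line y = x
l_new = H_inv_T @ l             # l' = H^{-T} l

# Map a few points of l with H; each image should satisfy l'^T x' = 0.
for s in (0.0, 5.0, -3.0):
    p = np.array([s, s, 1.0])
    print(l_new @ (H @ p))      # ~0 for every point

l_inf_image = H_inv_T @ np.array([0.0, 0.0, 1.0])
print(l_inf_image)              # not proportional to [0, 0, 1]: a finite line
```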

Just as with points at infinity, a general projective transformation does not map the line at infinity `l∞` to itself: its image `H⁻ᵀl∞` is usually a finite line. The line at infinity is a valuable tool in computer vision, allowing us to represent the intersection points of parallel lines and to understand how different transformations affect points and lines in both Euclidean and homogeneous coordinates.

## Vanishing Points and Lines in 3D

In the realm of 3D geometry and computer vision, the concepts of vanishing points and lines play a pivotal role in understanding the relationships between 3D space and 2D images. In this section, we delve into the definitions and properties of vanishing points and lines in 3D, which are fundamental for tasks such as camera calibration and structure from motion.

### Planes in 3D

In three-dimensional space, a plane can be represented as a vector `[a, b, c, d]ᵀ`, where `(a, b, c)` is a normal vector to the plane and `d` fixes the plane's offset from the origin. Formally, a plane is the set of points `(x, y, z)` satisfying:

```
ax + by + cz + d = 0
```

The vector `[a, b, c, d]ᵀ` is defined only up to scale. If it is scaled so that `(a, b, c)` is a unit normal, then `|d|` is the distance from the origin to the plane. In general, the distance is `|d| / ||(a, b, c)||`.
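As an illustration, this NumPy sketch builds the plane through three points, normalizing `(a, b, c)` so that `|d|` is the distance to the origin. The helper `plane_from_points` is a hypothetical name used only here:

```python
import numpy as np

def plane_from_points(p1, p2, p3):
    """Plane [a, b, c, d] through three 3D points, with (a, b, c) scaled to unit length."""
    n = np.cross(p2 - p1, p3 - p1)
    n = n / np.linalg.norm(n)
    d = -n @ p1                  # makes n . x + d = 0 for every point x on the plane
    return np.append(n, d)

plane = plane_from_points(np.array([0.0, 0.0, 2.0]),
                          np.array([1.0, 0.0, 2.0]),
                          np.array([0.0, 1.0, 2.0]))
print(plane)          # [0, 0, 1, -2]: the plane z = 2
print(abs(plane[3]))  # 2.0: distance from the origin (valid because the normal is unit length)
```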

### Lines in 3D

Lines in three dimensions are more complex to represent than in 2D. A 3D line has four degrees of freedom: for example, two for the point where it crosses a reference plane and two for its direction. Representing lines in 3D space requires additional mathematical machinery, which is beyond the scope of this discussion.

### Points at Infinity in 3D

Points at infinity in three dimensions are defined similarly to their 2D counterparts. A point at infinity `x∞ = [x, y, z, 0]ᵀ` is the intersection point of a set of parallel lines in 3D space that share the direction `(x, y, z)`. When the camera projects such a point into the image (a projective transformation), the result is generally a point `v` in the image plane that is no longer at infinity in homogeneous coordinates. This image point is termed a **vanishing point**: it is where the images of the parallel lines appear to converge. Vanishing points are valuable tools for relating 3D geometry to 2D images.

### Relationship Between Vanishing Points and Camera Parameters

A critical relationship exists between parallel lines in 3D, their corresponding vanishing point in an image, and the camera parameters `K` (intrinsic matrix), `R` (rotation matrix), and `T` (translation vector). Specifically, if we define `d` as the direction of a set of 3D parallel lines in the camera's reference system, then these lines intersect at a point at infinity, and the projection of this point in the image is the vanishing point `v`, given (up to scale) by:

```
v = Kd
```

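This equation relates the vanishing points in an image to the directions of parallel lines in 3D space. Because a point at infinity has a zero last coordinate, the translation `T` has no effect on it, and expressing `d` in the camera frame absorbs the rotation (for a direction `d_w` given in world coordinates, `d = R d_w`). Conversely, a direction can be recovered from its vanishing point, up to scale, as `d = K⁻¹v`.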
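The following NumPy sketch assumes hypothetical intrinsics (focal length 800 px, principal point (320, 240), zero skew) and a camera at the origin with `R = I`, `T = 0`. It projects points far along two parallel 3D lines and checks that both approach `v = Kd`:

```python
import numpy as np

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240), zero skew.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

d = np.array([1.0, 0.0, 1.0])  # a 3D direction in the camera frame

def project(X):
    """Pinhole projection of a 3D camera-frame point (camera at the origin, R = I, T = 0)."""
    x = K @ X
    return x[:2] / x[2]

v_h = K @ d
v = v_h[:2] / v_h[2]           # Euclidean vanishing point
print(v)                       # [1120, 240]

# Points far along two different parallel lines X0 + s*d approach v in the image.
for X0 in (np.array([0.0, 1.0, 5.0]), np.array([-2.0, -1.0, 4.0])):
    print(project(X0 + 1e6 * d))  # both within a fraction of a pixel of v
```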

### Horizon Lines and Plane Orientation

The concept of a **horizon line** arises from vanishing points and lines. Each plane in 3D contains its own line at infinity `l∞`, formed by the points at infinity of all directions parallel to that plane. The projection of this line onto the image plane is the plane's vanishing line, or horizon line (the familiar horizon is the vanishing line of the ground plane). It passes through the vanishing points of every set of parallel lines lying in, or parallel to, that plane. The horizon line also makes some properties of the image easy to read off. For example, two lines on the ground that are not parallel in image coordinates but whose images meet on the horizon line are in fact parallel in the 3D world.

Moreover, the horizon line enables us to compute the orientation of planes in 3D space. Given a plane with normal vector `n` (in the camera's reference system) and its associated horizon line `l_horiz` in an image, the two are related by:

```
n = Kᵀ l_horiz
```

This equation allows us to estimate the orientation of a plane, up to scale, once we identify its horizon line in an image and know `K`.
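To check `n = Kᵀ l_horiz` numerically, this sketch assumes hypothetical intrinsics and a hypothetical plane normal. It picks two directions lying in the plane, joins their vanishing points into a horizon line, and recovers the normal:

```python
import numpy as np

# Hypothetical intrinsics (f = 800 px, principal point (320, 240), zero skew).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# Hypothetical plane normal (camera frame) and two directions lying in the plane.
n_true = np.array([0.0, 1.0, 0.2])
d1 = np.array([1.0, -0.2, 1.0])  # n_true . d1 = 0
d2 = np.cross(n_true, d1)        # orthogonal to n_true, so also parallel to the plane

# The horizon line passes through the vanishing points K d1 and K d2.
l_horiz = np.cross(K @ d1, K @ d2)

n = K.T @ l_horiz                # n = K^T l_horiz, defined only up to scale
n_unit = n / np.linalg.norm(n)
n_true_unit = n_true / np.linalg.norm(n_true)
print(n_unit, n_true_unit)       # the same direction
```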

### Angle Between Lines and Planes

The relationships between vanishing points and lines extend to angles between lines and planes in 3D. Suppose we have two sets of parallel lines in 3D with directions `d1` and `d2`, associated with vanishing points `v1` and `v2`. Substituting `d = K⁻¹v` into the dot-product formula for the angle `θ` between `d1` and `d2` gives:

```
cosθ = d1·d2 / (||d1|| * ||d2||) = v1ᵀ ω v2 / (sqrt(v1ᵀ ω v1) * sqrt(v2ᵀ ω v2))
```

Here, `ω = (KKᵀ)⁻¹`, since `d1·d2 = v1ᵀK⁻ᵀK⁻¹v2 = v1ᵀ(KKᵀ)⁻¹v2`.
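A minimal NumPy sketch of this formula, assuming hypothetical intrinsics and two example directions, compares the angle computed from the vanishing points with the angle computed directly from the 3D directions:

```python
import numpy as np

# Hypothetical intrinsics.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
omega = np.linalg.inv(K @ K.T)  # omega = (K K^T)^{-1}

def angle_from_vps(v1, v2, omega):
    """Angle (degrees) between the 3D directions whose homogeneous vanishing points are v1, v2."""
    c = (v1 @ omega @ v2) / np.sqrt((v1 @ omega @ v1) * (v2 @ omega @ v2))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

d1 = np.array([1.0, 0.0, 2.0])
d2 = np.array([0.0, 1.0, 1.0])
true_angle = np.degrees(np.arccos(d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))))
est = angle_from_vps(K @ d1, K @ d2, omega)
print(true_angle, est)  # the same angle, about 50.8 degrees
```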

We can also compute the angle `θ` between two planes, which equals the angle between their normal vectors. Substituting `n = Kᵀl` for each plane's vanishing line `l` gives:

```
cosθ = n1·n2 / (||n1|| * ||n2||) = l1ᵀ ω⁻¹ l2 / (sqrt(l1ᵀ ω⁻¹ l1) * sqrt(l2ᵀ ω⁻¹ l2))
```
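The sketch below (hypothetical intrinsics and plane normals) builds the vanishing lines as `l = K⁻ᵀn`, which inverts `n = Kᵀl`. It then recovers the 45° angle between the planes using only the lines and `ω⁻¹ = KKᵀ`:

```python
import numpy as np

# Hypothetical intrinsics.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
omega_inv = K @ K.T  # omega^{-1} = K K^T

def plane_angle(l1, l2, omega_inv):
    """Angle (degrees) between two planes given their vanishing lines l1, l2."""
    c = (l1 @ omega_inv @ l2) / np.sqrt((l1 @ omega_inv @ l1) * (l2 @ omega_inv @ l2))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

# Vanishing lines of two hypothetical planes whose normals are 45 degrees apart.
n1 = np.array([0.0, 1.0, 0.0])
n2 = np.array([0.0, 1.0, 1.0])
K_inv_T = np.linalg.inv(K).T
l1, l2 = K_inv_T @ n1, K_inv_T @ n2

print(plane_angle(l1, l2, omega_inv))  # 45 degrees, up to rounding
```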

These relationships between angles, lines, planes, and vanishing points enable us to extract valuable geometric information from 3D scenes and their 2D representations, contributing significantly to computer vision and image understanding.

## Estimating Camera Parameters from Vanishing Points

In this section, we explore a practical example of single-view metrology, demonstrating how we can estimate camera parameters from vanishing points in an image. This technique is valuable in computer vision and 3D scene reconstruction when we have limited information.

### Example Setup

![Example Setup](images/example_svm1.png)

Consider the example setup depicted in the figure above, where we have two perpendicular planes within a 3D scene. We can identify a set of parallel lines on each of these planes in an image, allowing us to estimate two vanishing points, denoted `v1` and `v2`. Importantly, the two planes are mutually perpendicular in 3D, and the two sets of lines are chosen so that their 3D directions are perpendicular to each other as well.

### Using Vanishing Points

Because the 3D directions corresponding to `v1` and `v2` are perpendicular, the angle formula relating vanishing points and `ω` gives `cosθ = 0`, that is, `v1ᵀωv2 = 0`. However, the matrix `ω` depends on the camera matrix `K`, which is unknown at this point. This equation provides only one constraint, while `K` (and hence `ω`, a symmetric 3×3 matrix defined up to scale) has five degrees of freedom. Therefore, having just `v1` and `v2` is insufficient for a complete estimation of the camera parameters.

### Adding More Vanishing Points

![Adding More Vanishing Points](images/example_svm2.png)

To overcome this limitation, let's assume that we can find a third vanishing point `v3` whose 3D direction is orthogonal to both of the others (for example, from lines on a third, mutually orthogonal plane), as shown in the figure. Now we have three pairs of vanishing points, `(v1, v2)`, `(v1, v3)`, and `(v2, v3)`. Each pair provides one constraint of the form `viᵀωvj = 0`, giving three constraints in total.

### Leveraging Assumptions

To fully estimate the camera matrix `K`, we need a total of five constraints. We can obtain the remaining two from common assumptions in camera calibration: zero skew and square pixels. Zero skew implies `ω₁₂ = ω₂₁ = 0`, and square pixels (together with zero skew) imply `ω₁₁ = ω₂₂`. This brings the total to the required five constraints.

### Solving for ω

With these assumptions, `ω` takes the form `[ω₁ 0 ω₂; 0 ω₁ ω₃; ω₂ ω₃ ω₄]`, which has four unknowns. Since `ω` can only be determined up to scale, only three of them are independent, and the three orthogonality constraints, which are linear in the unknowns, are enough to solve for them. Once we obtain `ω`, we can recover the camera matrix `K` from `ω = (KKᵀ)⁻¹`, for example with a Cholesky decomposition.
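The procedure can be sketched end to end in NumPy, assuming zero skew and square pixels so that `ω = [w1 0 w2; 0 w1 w3; w2 w3 w4]`. Each orthogonal pair contributes one linear equation in `(w1, w2, w3, w4)`. The null space of the 3×4 system (found by SVD) gives `ω` up to scale. A Cholesky factorization `ω = LLᵀ` then yields `L ∝ K⁻ᵀ`. The pixel rescaling by `scale` is an added conditioning step, not part of the method described above. The function name and the synthetic `K_true` and rotation are illustrative assumptions. The method fails if a vanishing point is at infinity (a direction parallel to the image plane).

```python
import numpy as np

def calibrate_from_orthogonal_vps(v1, v2, v3, scale=1e-3):
    """Estimate K from homogeneous vanishing points (pixel units) of three mutually
    orthogonal directions, assuming zero skew and square pixels."""
    T = np.diag([scale, scale, 1.0])  # rescale pixels so the linear system is well conditioned
    v1, v2, v3 = T @ v1, T @ v2, T @ v3
    rows = []
    for a, b in ((v1, v2), (v1, v3), (v2, v3)):
        # a^T omega b = 0 with omega = [[w1, 0, w2], [0, w1, w3], [w2, w3, w4]],
        # written as a linear equation in (w1, w2, w3, w4).
        rows.append([a[0] * b[0] + a[1] * b[1],
                     a[0] * b[2] + a[2] * b[0],
                     a[1] * b[2] + a[2] * b[1],
                     a[2] * b[2]])
    _, _, Vt = np.linalg.svd(np.array(rows))
    w1, w2, w3, w4 = Vt[-1]           # null vector: omega up to scale
    omega = np.array([[w1, 0.0, w2],
                      [0.0, w1, w3],
                      [w2, w3, w4]])
    if w1 < 0:                        # pick the sign that makes omega positive definite
        omega = -omega
    L = np.linalg.cholesky(omega)     # omega = L L^T, with L proportional to K^{-T}
    K_scaled = np.linalg.inv(L).T     # intrinsics in the rescaled pixel units
    K = np.linalg.inv(T) @ K_scaled   # undo the rescaling
    return K / K[2, 2]

# Synthetic test: hypothetical ground-truth intrinsics and an arbitrary camera rotation.
K_true = np.array([[800.0, 0.0, 320.0],
                   [0.0, 800.0, 240.0],
                   [0.0, 0.0, 1.0]])
a, b = np.radians(20.0), np.radians(35.0)
Rx = np.array([[1.0, 0.0, 0.0],
               [0.0, np.cos(a), -np.sin(a)],
               [0.0, np.sin(a), np.cos(a)]])
Ry = np.array([[np.cos(b), 0.0, np.sin(b)],
               [0.0, 1.0, 0.0],
               [-np.sin(b), 0.0, np.cos(b)]])
R = Ry @ Rx                           # its columns are three mutually orthogonal directions
v1, v2, v3 = (K_true @ R[:, i] for i in range(3))

K_est = calibrate_from_orthogonal_vps(v1, v2, v3)
print(np.round(K_est, 3))             # matches K_true
```

With noisy vanishing points measured from a real image, the recovered `K` is only approximate. The Cholesky step can also fail if noise makes the estimated `ω` indefinite.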

### Camera Calibration and 3D Reconstruction

With `K` known, we have successfully calibrated the camera using only a single image. Once we have `K`, we can perform various tasks in 3D reconstruction and scene understanding. For instance, we can compute the orientation of the planes identified in the scene. This illustrates the power of single-view metrology, where a single image can provide a wealth of information about the captured scene, including camera parameters and 3D geometry.

This example showcases how geometric principles and common assumptions in camera calibration can enable us to extract valuable information from a single image, facilitating the understanding of 3D scenes from 2D representations.