# 1 Pinhole Camera

The pinhole camera model is a mathematical abstraction used in computer vision and photogrammetry to describe how a 3D world point is projected onto a 2D image plane. It assumes an idealized camera with no lens distortion, where light rays pass through a single point, known as the camera center or optical center, before hitting the image plane.

This model simplifies the imaging process by ignoring lens effects such as distortion, focusing only on the geometry of projection. The key components of the pinhole camera include:

  * **Intrinsic parameters**: Define the internal characteristics of the camera, such as focal length, skew, and the principal point.
    * These are represented by the camera calibration matrix $K$.
  * **Projection process**: Maps 3D points from world or camera space to 2D image coordinates.
  * **Normalized camera space**: A simplified, unitless coordinate system used for geometric reasoning.

This section explores both forward projection, which describes how 3D points are projected into the image plane, and backward projection, which traces 2D image points back into 3D normalized space. Additionally, the relationship between physical camera parameters (e.g., focal length in millimeters) and their pixel-based counterparts is discussed.

## 1.1 Forward Projection

In forward projection, the camera calibration matrix $K$ maps a 3D point $X$, defined in camera coordinates, to homogeneous 2D image coordinates $x$; dividing $x$ by its third component yields the pixel coordinates. This mapping can be expressed as

$$x = K \cdot X $$

where

$$ K = 
\begin{bmatrix}
f_x & s & p_x \\
0 & f_y & p_y \\
0 & 0 & 1 \\
\end{bmatrix}
$$

The parameters $f_x$ and $f_y$ are the horizontal and vertical focal lengths, $p_x$ and $p_y$ are the coordinates of the principal point (the point where the optical axis intersects the image plane), and $s$ is the skew factor. Pixels are typically not slanted, so we can set $s=0$, obtaining

$$ K = 
\begin{bmatrix}
f_x & 0 & p_x \\
0 & f_y & p_y \\
0 & 0 & 1 \\
\end{bmatrix}
$$
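As a minimal sketch of the forward projection above, the snippet below multiplies a camera-space point by $K$ and divides by the third homogeneous component. The function name `project` and the intrinsic values are illustrative assumptions, not values from a real camera.

```python
def project(point_cam, fx, fy, px, py, s=0.0):
    """Project a 3D point in camera coordinates to pixel coordinates.

    Hypothetical helper: computes x = K X and then dehomogenizes.
    """
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Rows of K applied to (X, Y, Z): homogeneous image point (xh, yh, wh)
    xh = fx * X + s * Y + px * Z
    yh = fy * Y + py * Z
    wh = Z
    # Divide by the third component to obtain pixel coordinates
    return xh / wh, yh / wh


# Assumed intrinsics: 800 px focal length, principal point at (320, 240)
u, v = project((0.5, -0.25, 2.0), fx=800.0, fy=800.0, px=320.0, py=240.0)
print(u, v)  # 520.0 140.0
```

Note that every point along the same ray through the camera center, for example `(1.0, -0.5, 4.0)`, lands on the same pixel.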

## 1.2 Backward Projection

Backward projection maps homogeneous image coordinates $x$ back into normalized camera space by using $K^{-1}$ as follows

$$
X_{unit} = K^{-1} \cdot x
$$

This process undoes the intrinsic effects of the camera, bringing the image point back to a normalized 3D coordinate system centered at the camera, where:

  * The focal length is 1
  * The principal point is at the origin
  * The result is expressed in a unitless, normalized space 

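The result $X_{unit}$ represents a **direction vector** in the normalized camera space: it defines the ray from the camera center through the image point. The depth of the original 3D point along this ray cannot be recovered from a single image point.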
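The back-projection can be sketched without an explicit matrix inverse, because $K$ is upper triangular and $K^{-1} x$ can be solved by back-substitution. The function name `backproject` and the intrinsic values below are illustrative assumptions.

```python
def backproject(u, v, fx, fy, px, py, s=0.0):
    """Map pixel coordinates (u, v) to normalized camera space.

    Hypothetical helper: solves K * (x_n, y_n, 1) = (u, v, 1) by
    back-substitution, which is equivalent to applying K^{-1}.
    """
    y_n = (v - py) / fy
    x_n = (u - px - s * y_n) / fx
    # The third component is 1: the point lies on the plane Z = 1
    return (x_n, y_n, 1.0)


# Assumed intrinsics: 800 px focal length, principal point at (320, 240)
ray = backproject(520.0, 140.0, fx=800.0, fy=800.0, px=320.0, py=240.0)
print(ray)  # (0.25, -0.125, 1.0)
```

Any positive multiple of the returned vector projects to the same pixel, which is why the result is interpreted as a direction rather than a 3D position.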

## 1.3 Metric vs. Pixel Space in Camera Calibration

Physical camera quantities, such as focal length and sensor size, are measured in metric units (e.g., millimeters). For example:

  * The focal length $f$ of a lens might be 50 mm, and the sensor might be a few millimeters wide and tall.

However, in most applications, we are concerned with projecting 3D world coordinates into 2D image coordinates measured in pixels, or with back-projecting 2D image coordinates, expressed in pixels, into 3D coordinates. Consequently, the camera calibration matrix $K$ is expressed in pixel units.

The relationship between metric and pixel units is defined using the **pixel pitch**, which is the physical size of a single pixel on the sensor. For example, if the sensor's pixel pitch is $p$ (in mm/pixel), the focal length in pixels can be computed as

$$
f_{pixels} = \dfrac{f_{mm}}{p}
$$

Most camera calibration software expresses the camera calibration matrix $K$ in pixel units, so neither the physical sensor size nor the pixel pitch is required during calibration.
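As a small worked example of the conversion $f_{pixels} = f_{mm} / p$, the sketch below uses the 50 mm focal length mentioned above together with an assumed pixel pitch of 0.005 mm/pixel (5 µm). The pitch value and the function name are hypothetical and chosen only for illustration.

```python
def focal_length_in_pixels(focal_mm, pixel_pitch_mm):
    """Convert a focal length from millimeters to pixels.

    pixel_pitch_mm is the physical size of one pixel in mm/pixel.
    """
    if pixel_pitch_mm <= 0:
        raise ValueError("pixel pitch must be positive")
    return focal_mm / pixel_pitch_mm


# Assumed values: 50 mm lens, 0.005 mm/pixel sensor pitch
f_px = focal_length_in_pixels(50.0, 0.005)
print(f_px)  # approximately 10000 pixels
```

If the pixels are not square, the same conversion can be applied separately with the horizontal and vertical pitch to obtain $f_x$ and $f_y$.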

## 1.4 Relationship Between Focal Length and Field of View

The focal length is directly related to the field of view (FoV), as illustrated in Figure 1.

<figure align="center">
    <img src="./images/focal_length_vs_hfov.png" width="400">
    <figcaption>Figure 1: Focal length vs. field of view.</figcaption>
</figure>

The relationship between focal length and field of view is given by

$$
\tan \left(\dfrac{\theta}{2} \right) = \dfrac{w/2}{f}
$$

where

* $\theta$ is the field of view
* $w$ is the width of the image plane
* $f$ is the focal length

Both $w$ and $f$ can be expressed either in pixels or metric units, depending on the context.
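Solving the relation above for $\theta$ gives $\theta = 2 \arctan\left(\frac{w}{2f}\right)$, and solving for $f$ gives $f = \frac{w}{2 \tan(\theta/2)}$. The sketch below implements both directions; the function names and the 640-pixel image width are illustrative assumptions.

```python
import math


def fov_from_focal_length(width, focal_length):
    """Field of view in degrees; width and focal_length share the same unit."""
    return math.degrees(2.0 * math.atan(width / (2.0 * focal_length)))


def focal_length_from_fov(width, fov_degrees):
    """Focal length in the same unit as width (pixels or millimeters)."""
    return width / (2.0 * math.tan(math.radians(fov_degrees) / 2.0))


# Assumed example: a 640-pixel-wide image with a focal length of 320 pixels
theta = fov_from_focal_length(640.0, 320.0)
print(theta)  # approximately 90 degrees

# Round trip: recover the focal length from the field of view
print(focal_length_from_fov(640.0, theta))  # approximately 320 pixels
```

Because only the ratio $w/f$ matters, the same functions work for metric inputs, provided both arguments use the same unit.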