# Computer Vision Laboratory - Volvo Car Ghent 
### Training material for project: Quality Vision Inspection VCG Final Assembly


# Table of Contents

1. **Introduction to Camera Geometry**
   - 1.1. What is a Camera?
   - 1.2. Central Projection
     - 1.2.1. Optical Center
     - 1.2.2. Focal Plane
     - 1.2.3. World Point to Image Point
     - 1.2.4. Important Concepts
   - 1.3. Homogeneous Coordinates
     - 1.3.1. Converting to Cartesian Coordinates
     - 1.3.2. Why Use Homogeneous Coordinates?
     - 1.3.3. Scaling in Homogeneous Coordinates

2. **Projection Matrix**
   - 2.1. Simple Projection Matrix
   - 2.2. Generalized Projection (Camera Rotation and Translation)

3. **Lesson 2: Exercises with matrices**
   - 3.1. Setting up a 2D cube
   - 3.2. Homework


## 1.1. What is a Camera?
#### Camera Geometry
A camera captures the 3D world around us (called **world space**) and transforms it into a 2D image (called **image space**). Understanding this mapping is one of the foundational tasks in computer vision. It is governed by **camera geometry**, which describes how points in 3D space are projected onto a 2D image plane. Once we can model this projection, we can build systems that extract useful information from images, such as object locations, distances, and orientations.

Physically, a camera collects light from the scene and projects it onto a 2D surface (the camera's sensor or film). Modern digital cameras are sophisticated devices, but their fundamental image-forming behavior can be described with basic geometric principles, which lets us model a camera mathematically.

## 1.2. Central Projection
#### How a Camera Projects 3D Points onto a 2D Image

The basic mathematical model of a camera involves a process called **central projection**. Here's an intuitive breakdown:

1. **Imagine a Pinhole Camera**: 
   Think of a simple pinhole camera (a box with a tiny hole). When light from objects in the scene enters through the pinhole, it projects onto a flat surface (like film or a digital sensor) inside the box, forming an image. The pinhole acts like the camera’s **optical center**.

2. **Optical Center**: 
   The **optical center** is the point where all light rays entering the camera converge. It is the origin of the camera's coordinate system. In a real camera with a lens, it corresponds (in the thin-lens approximation) to the center of the lens, through which rays pass without being deflected.

3. **Focal Plane**:
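   The flat surface where the image forms is called the **focal plane** (or image plane). It lies at a fixed distance $f$ from the optical center along the **z-axis**. This distance $f$ is the **focal length**: a longer focal length magnifies the scene and narrows the field of view (zooming in), while a shorter one does the opposite. In a physical pinhole camera the plane sits behind the pinhole and the image is upside down; the mathematical model instead places the plane in front of the optical center, at $z = f$, so the image keeps its orientation.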

4. **World Point to Image Point**:
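   Consider a 3D point in the world, represented as $\vec{X} = (x, y, z)^T$. This point lives in **world space**, the space outside the camera. For now we assume that the world coordinate system coincides with the camera's (camera at the origin, looking along the $z$-axis); the general case is treated later under Generalized Projection. To create an image, we need to project this 3D point onto the 2D **image plane** (which lies in **image space**).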

   To do this, we imagine a line passing through the point $\vec{X}$ and the optical center. The point where this line intersects the image plane is the **image point** $\vec{x}$. This is the point where the 3D object gets projected onto the image sensor or film.

5. **Mathematical Formulation**:
   The relationship between the 3D point and its corresponding 2D image point is given by:

   $$\vec{x} = \left( \frac{fx}{z}, \frac{fy}{z}, f \right)^T$$

   Here’s what this means:

   - $x$ and $y$ are the 3D coordinates of the world point.
   - $z$ is the depth of the point (i.e., how far the point is from the camera along the $z$-axis).
   - $f$ is the focal length of the camera.
   
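   The $x$ and $y$ coordinates are multiplied by the focal length and divided by the depth $z$ of the world point. This follows from similar triangles: along the ray through the optical center, the image coordinate $x'$ satisfies $\frac{x'}{f} = \frac{x}{z}$, and likewise for $y$. As a consequence, the farther away an object is from the camera (larger $z$), the smaller it appears in the image.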

6. **Why Drop the Third Component?**:
   Every projected point lies on the image plane $z = f$, so its third component is always $f$ and carries no information. Since the image is a 2D representation, we drop it, and the image point becomes:

   $$\left(\begin{array}{c} x \\ y \\ z \end{array}\right) \rightarrow \left(\begin{array}{c} \frac{fx}{z} \\ \frac{fy}{z} \end{array}\right)$$

   This gives us the coordinates of the point in the 2D image.
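The projection rule above can be checked numerically. The following is a minimal pure-Python sketch; the function name `central_projection` and the example values (focal length 2, arbitrary units) are illustrative choices, not part of the course material. It assumes the point is already expressed in camera coordinates and lies in front of the camera ($z > 0$).

```python
def central_projection(point, f):
    """Project a 3D point (x, y, z), in camera coordinates, onto the image plane z = f.

    Returns the 2D image point (f*x/z, f*y/z). Assumes the point is in front
    of the camera (z > 0).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("the point must lie in front of the camera (z > 0)")
    # Similar triangles: x'/f = x/z and y'/f = y/z; the constant third coordinate f is dropped.
    return (f * x / z, f * y / z)


f = 2.0  # focal length in arbitrary units (illustrative value)
print(central_projection((4.0, 2.0, 8.0), f))   # (1.0, 0.5)
print(central_projection((4.0, 2.0, 16.0), f))  # (0.5, 0.25): twice as far, half as large
```

Doubling the depth halves both image coordinates, which is the "farther objects appear smaller" effect described above.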

#### Important Concepts:

- **Principal Axis**: The line perpendicular to the image plane that passes through the optical center. This is often aligned with the $z$-axis in the camera's coordinate system.
- **Principal Point**: The point on the image plane where the principal axis intersects the plane.
- **Principal Plane**: A plane parallel to the image plane that passes through the optical center.

![Central Projection](../Labo_Computer_Vision/images/CameraRealXYZ.jpg "Simple projection")
![Central Projection](../Labo_Computer_Vision/images/Projection-from-world-coordinate-to-image-plane-Then-the-3D-camera-coordinates-can-be.png "Coordinates, planes and axis projection")

---

### Homogeneous Coordinates: Simplifying Transformations

In computer vision, we often use **homogeneous coordinates** to make transformations like projection easier to handle mathematically. Homogeneous coordinates add an extra component to each point, which allows us to represent transformations (such as translation and perspective projection) as plain matrix multiplications.

1. **What are Homogeneous Coordinates?**:
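   A 3D point in homogeneous coordinates is represented as a 4D vector $(X, Y, Z, T)^T$. An ordinary Cartesian point $(x, y, z)^T$ is written in homogeneous form by appending a 1: $(x, y, z, 1)^T$. The extra component $T$ is what makes the transformations below expressible as matrix products.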

2. **Converting to Cartesian Coordinates**:
   To convert from homogeneous coordinates back to the usual Cartesian coordinates, we divide each component by the last component $T$:

   $$\begin{pmatrix}X \\ Y \\ Z \\ T\end{pmatrix} \rightarrow \begin{pmatrix}X/T \\ Y/T \\ Z/T\end{pmatrix}$$

   For example, if a point is represented as $(10, 20, 30, 2)^T$ in homogeneous coordinates, its Cartesian coordinates are $(5, 10, 15)^T$ after dividing by 2.

3. **Why Use Homogeneous Coordinates?**:
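   Homogeneous coordinates let us write translations and projections as matrix multiplications. A translation cannot be expressed as a $3 \times 3$ matrix acting on Cartesian coordinates, but it can be expressed as a $4 \times 4$ matrix acting on homogeneous ones, and the division by depth in central projection can be postponed to a single final division by the last component. Chains of transformations then reduce to products of matrices, which is how most computer vision algorithms handle them.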

4. **Scaling in Homogeneous Coordinates**:
   Another important property of homogeneous coordinates is that they are **defined up to a scalar factor**. In other words, multiplying all components of a homogeneous vector by a non-zero constant $k$ does not change the point it represents:

   $$\begin{pmatrix}X \\ Y \\ Z \\ T\end{pmatrix} \Leftrightarrow \begin{pmatrix}kX \\ kY \\ kZ \\ kT\end{pmatrix}$$

   This means that $(10, 20, 30, 2)^T$ and $(5, 10, 15, 1)^T$ represent the same point in Cartesian coordinates.
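The conversion rules and the scale invariance can be sketched in a few lines of Python. The helper names `to_homogeneous` and `from_homogeneous` are hypothetical; the example reuses the point $(10, 20, 30, 2)^T$ from the text.

```python
def to_homogeneous(point):
    """Append a 1 to a Cartesian point: (x, y, z) -> (x, y, z, 1)."""
    return tuple(point) + (1.0,)


def from_homogeneous(h):
    """Divide by the last component: (X, Y, Z, T) -> (X/T, Y/T, Z/T)."""
    *coords, t = h
    if t == 0:
        raise ValueError("T must be non-zero to convert to Cartesian coordinates")
    return tuple(c / t for c in coords)


print(from_homogeneous((10, 20, 30, 2)))  # (5.0, 10.0, 15.0)

# Scaling every component by the same non-zero k gives the same Cartesian point.
k = 3
scaled = tuple(k * c for c in (10, 20, 30, 2))  # (30, 60, 90, 6)
print(from_homogeneous(scaled))           # (5.0, 10.0, 15.0)
```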

---

### Projection Matrix: Transforming 3D Points to 2D Image Points

Now that we have a better understanding of homogeneous coordinates, we can express the projection of 3D points onto a 2D plane using a **projection matrix**. This is a convenient way to represent the transformation that occurs when a 3D point in world space is projected onto the 2D image plane.

1. **Simple Projection Matrix**:
   In the simplest case, where the camera is aligned with the world axes and positioned at the origin, the projection matrix looks like this:

   $$\left(\begin{array}{c} x \\ y \\ z \\ 1 \end{array}\right) \rightarrow \left(\begin{array}{c} fx \\ fy \\ z \end{array}\right) = \left(\begin{array}{cccc} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right) \left(\begin{array}{c} x \\ y \\ z \\ 1 \end{array}\right)$$

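   - The $3 \times 4$ matrix is the **projection matrix**, which performs the transformation.
   - The column vector on the far right is the 3D point in homogeneous coordinates; the vector $(fx, fy, z)^T$ is the resulting image point, also in homogeneous coordinates.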

   This projection matrix applies the following operations:
   
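   - The $x$ and $y$ coordinates are scaled by the focal length $f$.
   - The $z$ coordinate is copied unchanged into the third component.

   The result $(fx, fy, z)^T$ is a homogeneous 2D point. Converting it to Cartesian coordinates by dividing by the last component gives $(fx/z, fy/z)^T$, exactly the central projection derived earlier. The matrix therefore performs the projection as a single linear operation, with the division by depth deferred to the final conversion.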
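To see that the matrix form reproduces the central projection, here is a small pure-Python sketch that multiplies the $3 \times 4$ matrix by a homogeneous point and then divides by the last component. The helpers `mat_vec` and `simple_projection_matrix` are illustrative names, and plain lists are used instead of a linear algebra library so the example runs without dependencies.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def simple_projection_matrix(f):
    """3x4 projection matrix for a camera at the origin, aligned with the world axes."""
    return [[f, 0, 0, 0],
            [0, f, 0, 0],
            [0, 0, 1, 0]]


P = simple_projection_matrix(2.0)
X = [4.0, 2.0, 8.0, 1.0]   # the 3D point (4, 2, 8) in homogeneous coordinates
u, v, w = mat_vec(P, X)    # homogeneous image point (f*x, f*y, z)
print((u, v, w))           # (8.0, 4.0, 8.0)
print((u / w, v / w))      # (1.0, 0.5), i.e. (f*x/z, f*y/z)
```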

---

### Generalized Projection: Dealing with Camera Rotation and Translation

In real-world situations, a camera is not always perfectly aligned with the world coordinate axes. To handle cases where the camera is rotated or positioned away from the origin, we use a more general form of the projection matrix.

1. **Rotation and Translation**:
   To account for the camera’s orientation and position, we introduce:
   
   - A **rotation matrix** $R$, which represents the orientation of the camera relative to the world.
   - The **camera center** $C$, a 3D vector giving the position of the camera's optical center in world coordinates (the translation that appears in the projection matrix below is $-RC$).

2. **Generalized Projection Matrix**:
   The full projection matrix that accounts for the camera’s rotation and translation is:

   $$\left(\begin{array}{cccc} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right)\left(\begin{array}{cc} R & -RC \\ \mathbf{0}^T & 1 \end{array}\right)$$

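   - $R$ is a $3 \times 3$ rotation matrix; its rows are the camera's axes expressed in world coordinates.
   - $C$ is the $3 \times 1$ position of the camera center in world coordinates, and the bottom row $(0, 0, 0, 1)$ keeps the result in homogeneous form.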

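3. **Explanation**:
   - The first matrix is the simple projection matrix from the previous section: it scales the $x$ and $y$ coordinates by the focal length and copies the depth into the third component.
   - The second matrix accounts for the camera's position and orientation: acting on a homogeneous world point, it computes $R(\vec{X} - C) = R\vec{X} - RC$, which first shifts the world so that the camera center sits at the origin and then rotates it into the camera's orientation. Because the matrices act from right to left, this change of coordinates happens before the projection.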
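The generalized matrix can be assembled and tested the same way. This sketch builds the $3 \times 4$ product of the projection matrix and $\begin{pmatrix} R & -RC \\ \mathbf{0}^T & 1 \end{pmatrix}$ with plain Python lists. `camera_matrix` and `project` are hypothetical helper names, the convention $X_{cam} = R(X - C)$ is assumed, and the two example cameras (one moved back along $z$, one turned to look along the world $x$-axis) are chosen so that both should land on the same image point, which makes a mistake in $R$ or $C$ easy to spot.

```python
def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def camera_matrix(f, R, C):
    """Build the 3x4 matrix K0 * [[R, -RC], [0, 0, 0, 1]].

    Assumed convention: X_cam = R (X_world - C), so the rows of R are the
    camera's x, y, z axes expressed in world coordinates.
    """
    K0 = [[f, 0, 0, 0],
          [0, f, 0, 0],
          [0, 0, 1, 0]]
    t = [-sum(R[i][k] * C[k] for k in range(3)) for i in range(3)]  # -RC
    extrinsic = [list(R[i]) + [t[i]] for i in range(3)] + [[0, 0, 0, 1]]
    return mat_mul(K0, extrinsic)


def project(P, point):
    """Project a Cartesian 3D world point with a 3x4 matrix P, then divide by w."""
    X = list(point) + [1.0]
    u, v, w = (sum(p * x for p, x in zip(row, X)) for row in P)
    return (u / w, v / w)


f = 2.0

# Camera 1: no rotation, optical center 6 units behind the world origin.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
P1 = camera_matrix(f, identity, [0.0, 0.0, -6.0])
print(project(P1, (4.0, 2.0, 2.0)))   # (1.0, 0.5)

# Camera 2: at the origin, looking along the world +x axis.
# Camera x-axis = world -z, y-axis = world +y, z-axis = world +x.
R_look_x = [[0, 0, -1], [0, 1, 0], [1, 0, 0]]
P2 = camera_matrix(f, R_look_x, [0.0, 0.0, 0.0])
print(project(P2, (8.0, 2.0, -4.0)))  # (1.0, 0.5)
```

In practice a library such as NumPy or OpenCV would handle the matrix algebra; the lists here only keep the sketch dependency-free.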

---

### Conclusion

In this detailed explanation, we’ve walked through the fundamentals of camera geometry, covering how cameras project 3D points from the world onto a 2D image plane. By understanding central projection, homogeneous coordinates, and the projection matrix, we can model the behavior of a camera and work with images more effectively in computer vision applications. This knowledge forms the foundation for more advanced tasks, such as reconstructing 3D scenes from 2D images or using cameras in robotics and augmented reality.