
### **UNIT 4:**

* Stereo and Multi-view Reconstruction
* Structure-from-Motion
* Projection Matrices
* Camera Calibration
* Epipolar Geometry
* Fundamental and Essential Matrices
* Disparity Maps
* Optical Flow
* Volumetric Shape Reconstruction
  * From Window-based to Regularization-based Stereo
* Loss Functions

---

# **UNIT 4: Stereo and Multi-view Reconstruction**

---

## **1. Stereo and Multi-view Reconstruction**

**Concept:**
Stereo vision and multi-view reconstruction are techniques that recover **3D information** from **2D images** captured from different viewpoints.

* **Stereo Vision:** Uses **two cameras** (like human eyes) to estimate depth.
* **Multi-view Reconstruction:** Extends stereo vision to **many cameras or frames** (e.g., video frames) to reconstruct the 3D structure of a scene.

**Goal:**
To compute **depth (Z-coordinate)** of points and build a **3D model** of the scene.

**Key Steps:**

1. **Image capture** – Take images from different viewpoints.
2. **Feature matching** – Find corresponding points in different images.
3. **Epipolar geometry** – Use the geometry of the two cameras to restrict where a match can lie.
4. **Triangulation** – Compute 3D coordinates using matched points.
5. **Reconstruction** – Build full 3D scene or object.
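
The triangulation step (4) can be sketched with the linear (DLT) method: each view contributes two linear equations in the homogeneous 3D point, and the SVD gives their least-squares solution. This is a minimal NumPy sketch, not a production triangulator; the function name `triangulate_point` and the camera values (`K`, a 0.1-unit sideways baseline) are hypothetical, and real pipelines usually refine the result by minimizing reprojection error.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point.

    P1, P2: 3x4 projection matrices of the two cameras.
    x1, x2: (u, v) pixel coordinates of the matched point in each image.
    Returns the 3D point (X, Y, Z) in world coordinates.
    """
    u1, v1 = x1
    u2, v2 = x2
    # Each view gives two equations in the homogeneous point X:
    # u * (p3 . X) - (p1 . X) = 0 and v * (p3 . X) - (p2 . X) = 0
    A = np.array([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    # Least-squares solution of A X = 0 with ||X|| = 1:
    # the right singular vector of the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize

# Hypothetical rig: identical intrinsics, second camera shifted 0.1 units along x
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 2.0, 1.0])
x1 = P1 @ X_true
x1 = x1[:2] / x1[2]
x2 = P2 @ X_true
x2 = x2[:2] / x2[2]
X_est = triangulate_point(P1, P2, x1, x2)
```

With noise-free matches, `X_est` recovers the original point `(0.2, -0.1, 2.0)`.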

**Applications:**

* 3D mapping
* Robotics and autonomous driving
* Virtual/augmented reality
* Medical imaging and 3D scanning

---

## **2. Structure-from-Motion (SfM)**

**Definition:**
Structure-from-Motion (SfM) is the process of **estimating 3D structure** and **camera motion** simultaneously from a sequence of 2D images.

**Steps:**

1. **Feature Detection:** Detect keypoints (SIFT, SURF, ORB, etc.)
2. **Feature Matching:** Match keypoints between multiple images.
3. **Camera Pose Estimation:** Compute relative position and orientation of cameras.
4. **Triangulation:** Compute 3D coordinates of matched points.
5. **Bundle Adjustment:** Optimize all camera parameters and 3D points to minimize re-projection error.
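
Bundle adjustment (step 5) minimizes the squared re-projection error summed over all observations. The sketch below, assuming NumPy and an observation-list format chosen only for illustration, evaluates that objective; a real implementation would hand these residuals to a nonlinear least-squares solver (typically Levenberg–Marquardt) that updates both the cameras and the points. `reprojection_residuals` and `reprojection_cost` are hypothetical names.

```python
import numpy as np

def reprojection_residuals(Ps, points3d, observations):
    """Residuals that bundle adjustment drives toward zero.

    Ps:           list of 3x4 projection matrices (one per camera)
    points3d:     (N, 3) array of 3D points
    observations: list of (cam_idx, pt_idx, u, v) tuples meaning point pt_idx
                  was observed at pixel (u, v) in camera cam_idx
    Returns a flat array [du_0, dv_0, du_1, dv_1, ...].
    """
    res = []
    for cam_idx, pt_idx, u, v in observations:
        X = np.append(points3d[pt_idx], 1.0)  # homogeneous 3D point
        x = Ps[cam_idx] @ X                   # project into the image
        res.extend([x[0] / x[2] - u, x[1] / x[2] - v])
    return np.array(res)

def reprojection_cost(Ps, points3d, observations):
    # Total squared reprojection error: the bundle adjustment objective
    r = reprojection_residuals(Ps, points3d, observations)
    return float(r @ r)

K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
pts = np.array([[0.0, 0.0, 2.0]])
obs = [(0, 0, 320.0, 240.0)]  # the point was seen at the principal point
cost_exact = reprojection_cost([P], pts, obs)
cost_shifted = reprojection_cost([P], pts + np.array([0.01, 0.0, 0.0]), obs)
```

Moving the point 0.01 world units sideways at depth 2 shifts its image by 800 · 0.01 / 2 = 4 pixels, so the cost grows from 0 to 16.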

**Output:**

* Sparse 3D point cloud
* Camera positions and orientations

**Applications:**

* 3D scene reconstruction
* Drone mapping
* AR/VR environment creation

---

## **3. Projection Matrices**

**Definition:**
A **projection matrix** defines how a **3D world point** is projected onto a **2D image plane**.

**Mathematical Form:**
$$
s
\begin{bmatrix}
u \\ v \\ 1
\end{bmatrix}
=
P
\begin{bmatrix}
X \\ Y \\ Z \\ 1
\end{bmatrix}
$$

where

* $(X, Y, Z)$ = 3D world coordinates
* $(u, v)$ = 2D image coordinates
* $s$ = scale factor
* $P$ = 3×4 projection matrix

**Camera Projection Matrix:**
$$
P = K [R | t]
$$

* **K** = 3×3 intrinsic matrix (focal length, principal point/optical center, skew)
* **R** = 3×3 rotation matrix
* **t** = 3×1 translation vector
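
A minimal NumPy sketch of $P = K[R \mid t]$ applied to world points; the intrinsic values (focal length 800 px, principal point (320, 240)) are made-up examples and `project` is a hypothetical helper. Dividing by the third homogeneous coordinate removes the scale factor $s$.

```python
import numpy as np

def project(K, R, t, X_world):
    """Project (N, 3) world points to (N, 2) pixel coordinates with P = K [R | t]."""
    P = K @ np.hstack([R, t.reshape(3, 1)])                     # 3x4 projection matrix
    X_h = np.hstack([X_world, np.ones((X_world.shape[0], 1))])  # homogeneous, (N, 4)
    x_h = (P @ X_h.T).T                                         # s * [u, v, 1], (N, 3)
    return x_h[:, :2] / x_h[:, 2:3]                             # divide out the scale s

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                    # camera axes aligned with the world axes
t = np.array([0.0, 0.0, 0.0])    # camera at the world origin
X = np.array([[0.0, 0.0, 5.0],   # a point on the optical axis
              [1.0, 0.5, 5.0]])
uv = project(K, R, t, X)
```

The point on the optical axis lands exactly on the principal point $(320, 240)$; the second point lands at $(480, 320)$.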

---

## **4. Camera Calibration**

**Purpose:**
Camera calibration finds the **intrinsic** and **extrinsic** parameters of the camera.

**A. Intrinsic Parameters (inside camera):**

* Focal length ($f_x, f_y$)
* Optical center ($c_x, c_y$)
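* Skew $\gamma$ (accounts for non-perpendicular x and y pixel axes; close to 0 for most modern cameras)
* Distortion coefficients (radial and tangential lens distortion; modeled separately from $K$)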

**Intrinsic matrix (K):**
$$
K =
\begin{bmatrix}
f_x & \gamma & c_x \\
0 & f_y & c_y \\
0 & 0 & 1
\end{bmatrix}
$$

**B. Extrinsic Parameters (camera pose):**

* Rotation (R)
* Translation (t)

They define the camera's position and orientation relative to the world coordinate frame.

**Calibration Process:**

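1. Capture several images of a known pattern (e.g., a chessboard) in different orientations.
2. Detect the pattern corners in all images.
3. Solve for $K$ (plus distortion) and a separate $R$, $t$ for each image, e.g., with Zhang’s method: a closed-form initialization from the plane-to-image homographies, followed by nonlinear refinement that minimizes reprojection error.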
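
Zhang's method needs several views of a planar pattern and is normally used through a library. As a simpler, self-contained illustration of the same idea (recover camera parameters from known 3D points and their measured pixels), the sketch below uses a different technique, Direct Linear Transform (DLT) resectioning, which estimates the full $3 \times 4$ matrix $P$ from at least six non-coplanar points. It assumes NumPy and noise-free synthetic data; `dlt_projection_matrix` is a hypothetical name. With noisy measurements the points should first be normalized, and $K$, $R$, $t$ can then be extracted from $P$ with an RQ decomposition (not shown).

```python
import numpy as np

def dlt_projection_matrix(X_world, x_img):
    """Estimate a 3x4 projection matrix from >= 6 known 3D-2D correspondences.

    X_world: (N, 3) non-coplanar 3D points (e.g. corners of a 3D calibration rig)
    x_img:   (N, 2) their measured pixel coordinates
    """
    rows = []
    for (X, Y, Z), (u, v) in zip(X_world, x_img):
        Xh = [X, Y, Z, 1.0]
        # Two equations per point: p1.X - u * p3.X = 0 and p2.X - v * p3.X = 0
        rows.append([*Xh, 0, 0, 0, 0, *[-u * c for c in Xh]])
        rows.append([0, 0, 0, 0, *Xh, *[-v * c for c in Xh]])
    A = np.array(rows)
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)  # null vector of A, defined only up to scale
    return P / P[2, 3]        # fix the scale for readability (assumes P[2, 3] != 0)

# Synthetic check with a hypothetical ground-truth camera
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P_true = K @ np.hstack([np.eye(3), np.array([[0.1], [-0.2], [3.0]])])
X_world = rng.uniform(-1.0, 1.0, size=(10, 3))
x_h = (P_true @ np.hstack([X_world, np.ones((10, 1))]).T).T
x_img = x_h[:, :2] / x_h[:, 2:3]
P_est = dlt_projection_matrix(X_world, x_img)
```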

**Applications:**

* 3D reconstruction
* Robotics
* AR/VR alignment

---

## **5. Epipolar Geometry**

**Definition:**
Epipolar geometry is the **geometric relationship** between **two cameras** that view the same 3D scene.

**Key Terms:**

* **Epipole (e, e’):** Projection of one camera center onto the other’s image plane.
* **Epipolar Line:** Line along which the matching point in the other image must lie.
* **Epipolar Plane:** Plane passing through both camera centers and a 3D point.

**Property:**
A point in one image corresponds to an **epipolar line** in the other image.
This reduces the correspondence search from a 2D region to a 1D line.
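
The 2D → 1D reduction can be made concrete: back-project a pixel of image 1 to a ray, project two points of that ray into image 2, and the line through them is the epipolar line. One of those points is the first camera center, whose image is the epipole $e'$. This NumPy sketch assumes both $3 \times 4$ projection matrices are known; `epipolar_line` and the camera values are hypothetical.

```python
import numpy as np

def epipolar_line(P1, P2, x1):
    """Epipolar line in image 2 for pixel x1 = (u, v) in image 1.

    Returns homogeneous line coefficients (a, b, c), i.e. a*u + b*v + c = 0,
    scaled so that |a*u + b*v + c| is the pixel distance to the line.
    """
    # Camera-1 center: right null vector of P1 (P1 @ C1 = 0)
    _, _, Vt = np.linalg.svd(P1)
    C1 = Vt[-1]
    # Another point on the back-projected ray, via the pseudo-inverse of P1
    X_ray = np.linalg.pinv(P1) @ np.array([x1[0], x1[1], 1.0])
    e2 = P2 @ C1       # epipole: image of the camera-1 center in image 2
    p2 = P2 @ X_ray    # image of a second point on the same ray
    line = np.cross(e2, p2)
    return line / np.linalg.norm(line[:2])

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])  # pure sideways shift
X = np.array([0.2, -0.1, 2.0, 1.0])
x1 = P1 @ X
x1 = x1[:2] / x1[2]
x2 = P2 @ X
x2 = x2[:2] / x2[2]
line = epipolar_line(P1, P2, x1)
```

Because the second camera is only shifted sideways here, the epipolar line comes out horizontal ($a \approx 0$) and passes through the true match `x2`.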

---

## **6. Fundamental and Essential Matrices**

### **A. Fundamental Matrix (F):**

Defines the epipolar constraint between two images.

Equation:
$$
x'^T F x = 0
$$
where

* $x$ = point in first image (in homogeneous form)
* $x'$ = corresponding point in second image

**F** is a 3×3 matrix of rank 2 (7 degrees of freedom) and is valid for **uncalibrated cameras** (no knowledge of $K$ is needed).

**Computation:**
Estimated from at least 8 correspondences using the **8-point algorithm** or the more numerically stable **normalized 8-point algorithm**.
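
A compact sketch of the normalized 8-point algorithm, assuming NumPy and $N \ge 8$ matched pixel coordinates: normalize both point sets (Hartley normalization), solve the linear system $A\,\mathrm{vec}(F) = 0$ with the SVD, force rank 2, then undo the normalization. The function names and the synthetic two-camera setup are hypothetical, and the sketch has no outlier handling (real matches are usually filtered with RANSAC).

```python
import numpy as np

def _normalize(pts):
    """Hartley normalization: centroid to the origin, mean distance sqrt(2)."""
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    s = np.sqrt(2) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return (T @ pts_h.T).T, T

def eight_point(x1, x2):
    """Normalized 8-point estimate of F from N >= 8 matches x1[i] <-> x2[i]."""
    p1, T1 = _normalize(x1)
    p2, T2 = _normalize(x2)
    # One row per match from x2^T F x1 = 0, with f = F flattened row-major
    A = np.column_stack([
        p2[:, 0] * p1[:, 0], p2[:, 0] * p1[:, 1], p2[:, 0],
        p2[:, 1] * p1[:, 0], p2[:, 1] * p1[:, 1], p2[:, 1],
        p1[:, 0], p1[:, 1], np.ones(len(p1)),
    ])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    # Undo the normalization: (T2 x2)^T F (T1 x1) = x2^T (T2^T F T1) x1
    F = T2.T @ F @ T1
    return F / np.linalg.norm(F)

# Synthetic matches from two hypothetical cameras
rng = np.random.default_rng(1)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
th = 0.1
R = np.array([[np.cos(th), 0.0, np.sin(th)],
              [0.0, 1.0, 0.0],
              [-np.sin(th), 0.0, np.cos(th)]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, np.array([[-0.5], [0.05], [0.1]])])
X = np.hstack([rng.uniform(-1, 1, (20, 2)), rng.uniform(4, 6, (20, 1)), np.ones((20, 1))])

def proj(P, X):
    x = (P @ X.T).T
    return x[:, :2] / x[:, 2:3]

x1, x2 = proj(P1, X), proj(P2, X)
F = eight_point(x1, x2)
ones = np.ones((20, 1))
# Epipolar residuals x2^T F x1 for every match (should be ~0)
residuals = np.einsum('ij,jk,ik->i', np.hstack([x2, ones]), F, np.hstack([x1, ones]))
```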

---

### **B. Essential Matrix (E):**

Used for **calibrated cameras** (known intrinsic parameters).

Relation between Fundamental and Essential matrices:
$$
E = K'^T F K
$$
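**E** satisfies the same constraint for **normalized** image coordinates $\hat{x} = K^{-1}x$ and $\hat{x}' = K'^{-1}x'$:
$$
\hat{x}'^T E \hat{x} = 0
$$
For a second camera at pose $(R, t)$ relative to the first, $E = [t]_\times R$, where $[t]_\times$ is the cross-product matrix of $t$.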

**Decomposition of E:**

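* Extracts the rotation (R) and translation (t) between the two cameras via SVD; $t$ is recovered only up to scale, and of the four candidate $(R, t)$ pairs the correct one places triangulated points in front of both cameras.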
* Used for stereo calibration and pose estimation.
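
The decomposition can be sketched as follows, assuming NumPy: with $E = U \Sigma V^T$ (and $U$, $V$ made proper rotations), the rotation is $UWV^T$ or $UW^TV^T$ and the translation direction is $\pm$ the third column of $U$. `skew` and `decompose_essential` are hypothetical names; the sketch returns all four candidates and leaves out the cheirality test (triangulate a point and keep the pair that puts it in front of both cameras).

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def decompose_essential(E):
    """Return the four candidate (R, t) pairs encoded by E (t has unit norm)."""
    U, _, Vt = np.linalg.svd(E)
    # Make U and V proper rotations (det = +1); E is only defined up to sign
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]  # translation direction: left null vector of E
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

# Hypothetical relative pose: rotation about y, unit translation along x
th = 0.2
R_true = np.array([[np.cos(th), 0.0, np.sin(th)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(th), 0.0, np.cos(th)]])
t_true = np.array([1.0, 0.0, 0.0])
E = skew(t_true) @ R_true
candidates = decompose_essential(E)
found = any(np.allclose(R, R_true) and np.allclose(t, t_true) for R, t in candidates)
```

The test pose is built as $E = [t]_\times R$, and the true $(R, t)$ shows up among the four candidates.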

---

## **7. Disparity Maps**

**Concept:**
Disparity is the difference in the horizontal position of corresponding points in the left and right images of a **rectified** stereo pair (where corresponding points lie on the same row).

$$
\text{Disparity} = x_L - x_R
$$

**Depth Estimation:**
Depth is inversely proportional to disparity:
$$
Z = \frac{fB}{d}
$$
where

* $f$ = focal length (in pixels)
* $B$ = baseline distance between the camera centers (sets the unit of $Z$)
* $d$ = disparity (in pixels)
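
Converting a disparity map to depth is one vectorized division. This sketch assumes NumPy, a rectified pair, $f$ in pixels and $B$ in metres; the rig values and the `disparity_to_depth` name are hypothetical. Zero disparity corresponds to a point at infinity, so those pixels are marked `inf` instead of dividing by zero.

```python
import numpy as np

def disparity_to_depth(disparity, f_px, baseline_m, min_disp=1e-6):
    """Convert a disparity map (pixels) to a depth map (metres) via Z = f * B / d.

    Pixels with (near-)zero disparity have no finite depth and get np.inf.
    """
    d = np.asarray(disparity, dtype=float)
    depth = np.full(d.shape, np.inf)
    valid = d > min_disp
    depth[valid] = f_px * baseline_m / d[valid]
    return depth

# Hypothetical rig: f = 700 px, baseline 0.12 m, so f * B = 84 px * m
disp = np.array([[84.0, 42.0],
                 [21.0, 0.0]])
depth = disparity_to_depth(disp, f_px=700.0, baseline_m=0.12)
```

Halving the disparity doubles the depth: 84 px maps to 1 m, 42 px to 2 m and 21 px to 4 m.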

**Disparity Map:**
A grayscale image storing the disparity of every pixel.
Because depth is inversely proportional to disparity, brighter (larger-disparity) areas are closer and darker areas are farther away.

**Applications:**

* Depth estimation
* 3D modeling
* Self-driving vehicles

---

## **8. Optical Flow**

**Definition:**
Optical flow estimates the apparent **motion of pixels** between two consecutive frames of a video.

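**Assumptions:**

* Brightness constancy: a pixel's intensity does not change as it moves.
* Small motion: displacements between frames are small enough for a first-order Taylor expansion, which yields the equation below.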

**Optical Flow Equation:**
$$
I_x u + I_y v + I_t = 0
$$
where

* $I_x, I_y$ = spatial derivatives
* $I_t$ = temporal derivative
* $(u, v)$ = pixel velocity components (flow vector)

**Methods:**

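1. **Lucas–Kanade Method:** Local approach; assumes the flow is constant within a small window and solves the stacked flow equations by least squares.
2. **Horn–Schunck Method:** Global approach; minimizes the flow-equation error plus a smoothness term over the whole image.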
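
The Lucas–Kanade idea for a single window: stack $I_x u + I_y v + I_t = 0$ for every pixel in the window and solve for $(u, v)$ by least squares. This is a minimal NumPy sketch assuming small (sub-pixel) motion and a single image scale; `lucas_kanade_window` and the synthetic Gaussian-blob frames are hypothetical, and practical trackers add image pyramids and iterative refinement.

```python
import numpy as np

def lucas_kanade_window(I1, I2, y, x, half=7):
    """Estimate the flow (u, v) of the window centred at (y, x) from frame I1 to I2."""
    I1 = I1.astype(float)
    I2 = I2.astype(float)
    Iy, Ix = np.gradient(I1)   # spatial derivatives (np.gradient returns d/drow, d/dcol)
    It = I2 - I1               # temporal derivative (frame difference)
    win = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    # One equation Ix*u + Iy*v = -It per pixel of the window
    A = np.column_stack([Ix[win].ravel(), Iy[win].ravel()])
    b = -It[win].ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares solution
    return flow  # [u, v] in pixels per frame

# Synthetic frames: a Gaussian blob that moves +0.4 px in x and -0.25 px in y
yy, xx = np.mgrid[0:64, 0:64].astype(float)

def blob(cx, cy, sigma=6.0):
    return 255.0 * np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * sigma ** 2))

frame1 = blob(32.0, 32.0)
frame2 = blob(32.4, 31.75)
u, v = lucas_kanade_window(frame1, frame2, y=32, x=32)
```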

**Applications:**

* Motion detection
* Object tracking
* Video stabilization

---

## **9. Volumetric Shape Reconstruction**

**Concept:**
Builds a **3D volumetric model** from multiple 2D images or depth maps.

**Approaches:**

1. **Voxel-based (volume element) Reconstruction:**

   * Divide space into small cubes (voxels).
   * Label each voxel as inside or outside the object using silhouettes or depth data.
2. **Surface-based Reconstruction:**

   * Uses point clouds and fits surfaces (like meshes) through them.
   * Example: Poisson surface reconstruction.
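
Voxel carving (the silhouette-based variant of approach 1) keeps only the voxels that project inside every silhouette; the result is called the visual hull. To stay self-contained, this NumPy sketch assumes three orthographic views along the coordinate axes instead of calibrated perspective cameras, and `carve_visual_hull` is a hypothetical name.

```python
import numpy as np

def carve_visual_hull(silhouettes, n):
    """Voxel carving from orthographic silhouettes along the coordinate axes.

    silhouettes: dict mapping a viewing axis (0, 1 or 2) to an (n, n) boolean
                 mask of the object as seen along that axis.
    Returns an (n, n, n) boolean occupancy grid (True = voxel kept).
    """
    occupied = np.ones((n, n, n), dtype=bool)
    coords = tuple(np.indices((n, n, n)))
    for axis, mask in silhouettes.items():
        # Orthographic projection along `axis`: drop that coordinate
        a, b = [c for ax, c in enumerate(coords) if ax != axis]
        # Carve away every voxel that projects outside this silhouette
        occupied &= mask[a, b]
    return occupied

n = 32
c = (n - 1) / 2.0
r = 10.0
u, v = np.indices((n, n))
disc = (u - c) ** 2 + (v - c) ** 2 <= r ** 2  # a sphere looks like a disc from any axis
hull = carve_visual_hull({0: disc, 1: disc, 2: disc}, n)

i, j, k = np.indices((n, n, n))
sphere = (i - c) ** 2 + (j - c) ** 2 + (k - c) ** 2 <= r ** 2
```

Every voxel of the true sphere survives, yet the hull contains extra voxels: the intersection of three silhouette cylinders over-approximates the sphere, and in general silhouettes can never reveal concavities.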

**Applications:**

* 3D scanning
* CAD modeling
* Medical CT/MRI reconstruction

---

### **From Window-based to Regularization-based Stereo**

**1. Window-based Stereo:**

* Matches image patches (windows) between left and right images.
* Uses metrics like SAD (Sum of Absolute Differences) or SSD (Sum of Squared Differences).
* Simple and fast, but unreliable near depth discontinuities (a window straddles two surfaces) and in textureless regions (many windows look alike).
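
A brute-force sketch of window-based matching with SAD and winner-takes-all selection, assuming NumPy, a rectified grayscale pair, and that the match of left pixel $x$ lies at $x - d$ in the right image. `sad_block_matching` and the synthetic shifted-noise test are hypothetical; replacing the absolute difference by a squared difference turns it into SSD.

```python
import numpy as np

def sad_block_matching(left, right, max_disp, half=3):
    """Window-based stereo on a rectified pair using SAD and winner-takes-all.

    For each left pixel (y, x), compares the (2*half+1)^2 window around it with
    windows centred at (y, x - d) in the right image for d = 0..max_disp and
    keeps the d with the smallest Sum of Absolute Differences.
    Border pixels where a window does not fit are left at disparity 0.
    """
    left = left.astype(float)
    right = right.astype(float)
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [np.abs(patch - right[y - half:y + half + 1,
                                          x - d - half:x - d + half + 1]).sum()
                     for d in range(max_disp + 1)]
            disp[y, x] = int(np.argmin(costs))
    return disp

# Random texture; the left view is the right view shifted by 5 pixels
rng = np.random.default_rng(0)
right = rng.integers(0, 256, size=(20, 40)).astype(float)
true_d = 5
left = np.zeros_like(right)
left[:, true_d:] = right[:, :-true_d]   # left[y, x] = right[y, x - 5]
disp = sad_block_matching(left, right, max_disp=8)
```

In the region where windows fit entirely inside both images, every pixel recovers the true disparity of 5.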

**2. Regularization-based Stereo:**

* Adds smoothness or prior constraints to improve accuracy.
* Uses energy minimization:
  $$
  E(D) = E_{data}(D) + \lambda E_{smooth}(D)
  $$

  * $E_{data}$: matching cost of the assigned disparities
  * $E_{smooth}$: penalizes large disparity jumps between neighboring pixels
  * $\lambda$: controls smoothness strength
* Solved using graph cuts, belief propagation, or dynamic programming (which solves the problem exactly along individual scanlines).
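
For a single scanline with the smoothness term $\lambda\,|d_x - d_{x-1}|$ (one common choice, assumed here), dynamic programming finds the exact minimum of $E(D)$ with a Viterbi-style recursion. The NumPy sketch below takes a precomputed cost table as $E_{data}$; `scanline_dp` and the toy costs are hypothetical. Running it independently per row is fast but causes streaks between rows, which graph cuts and belief propagation avoid by using a 2D neighborhood.

```python
import numpy as np

def scanline_dp(cost, lam):
    """Minimize sum_x cost[x, d_x] + lam * sum_x |d_x - d_{x-1}| on one scanline.

    cost: (W, D) matching costs (E_data) for each pixel x and disparity d.
    lam:  smoothness weight lambda.
    Returns the globally optimal disparity for every pixel of the scanline.
    """
    W, D = cost.shape
    disps = np.arange(D)
    jump = lam * np.abs(disps[:, None] - disps[None, :])  # jump[p, q]: penalty for q -> p
    acc = cost[0].astype(float)       # best energy of any path ending in each d
    back = np.zeros((W, D), dtype=int)  # best predecessor, for backtracking
    for x in range(1, W):
        total = acc[None, :] + jump   # total[p, q] = acc[q] + lam * |p - q|
        back[x] = np.argmin(total, axis=1)
        acc = cost[x] + total[disps, back[x]]
    # Backtrack from the best final disparity
    d = np.zeros(W, dtype=int)
    d[-1] = int(np.argmin(acc))
    for x in range(W - 1, 0, -1):
        d[x - 1] = back[x, d[x]]
    return d

D = 4
true_d = np.array([1, 1, 1, 1, 1, 3, 3, 3, 3, 3])
cost = np.abs(np.arange(D)[None, :] - true_d[:, None]).astype(float)  # 0 at the true d
cost[2] = [1.0, 0.4, 1.0, 0.0]  # a noisy pixel whose data term prefers d = 3
d_wta = cost.argmin(axis=1)     # winner-takes-all ignores smoothness
d_dp = scanline_dp(cost, lam=1.0)
```

Winner-takes-all picks $d = 3$ at the noisy pixel, while the DP solution pays 0.4 in data cost there to avoid two disparity jumps costing $4\lambda$.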

---

## **10. Loss Functions in Reconstruction**

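Loss functions measure how well an estimated depth/disparity map (or the image reconstructed from it) agrees with the ground truth or with the other input views; self-supervised methods rely on the latter.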

**Common types:**

1. **Photometric Loss:**
   $$
   L_{photo} = |I_1(x) - I_2(x + d(x))|
   $$
   Measures brightness difference between matched pixels.

2. **Smoothness Loss:**
   Encourages smooth disparity/depth maps:
   $$
   L_{smooth} = |\nabla D(x)|
   $$

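3. **Structural Similarity (SSIM) Loss:**
   Measures structural similarity (luminance, contrast, and local structure) between the original and reconstructed images. Since SSIM = 1 for identical images, it is typically used as $L_{SSIM} = 1 - \mathrm{SSIM}$ (some methods halve this).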

4. **Depth Consistency Loss:**
   Ensures reconstructed depth aligns across views.

**Total Loss:**
$$
L = \alpha L_{photo} + \beta L_{smooth} + \gamma L_{SSIM}
$$
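
The photometric and smoothness terms can be sketched in NumPy as below. Assumptions: integer disparities (real systems use differentiable bilinear sampling), border clamping for samples that fall outside the image, $|\nabla D|$ approximated by mean absolute horizontal plus vertical differences, and made-up weights `alpha` and `beta`; the SSIM term is omitted. All function names are hypothetical.

```python
import numpy as np

def photometric_loss(I1, I2, d):
    """Mean |I1(x) - I2(x + d(x))| along image rows, with integer disparities.

    Samples falling outside I2 are clamped to the border (an assumption;
    real systems usually mask them out).
    """
    h, w = I1.shape
    cols = np.clip(np.arange(w)[None, :] + d.astype(int), 0, w - 1)
    warped = I2[np.arange(h)[:, None], cols]  # I2(x + d(x))
    return float(np.mean(np.abs(I1 - warped)))

def smoothness_loss(D):
    """Mean absolute finite differences of D, horizontal plus vertical."""
    dx = np.abs(np.diff(D, axis=1)).mean()
    dy = np.abs(np.diff(D, axis=0)).mean()
    return float(dx + dy)

def total_loss(I1, I2, d, alpha=1.0, beta=0.1):
    # alpha, beta are hypothetical weights; the SSIM term is left out of this sketch
    return alpha * photometric_loss(I1, I2, d) + beta * smoothness_loss(d)

rng = np.random.default_rng(0)
I2 = rng.random((8, 16))
I1 = I2[:, np.clip(np.arange(16) + 2, 0, 15)]  # I1(x) = I2(x + 2)
d_good = np.full((8, 16), 2)
d_bad = np.zeros((8, 16), dtype=int)
loss_good = total_loss(I1, I2, d_good)
loss_bad = total_loss(I1, I2, d_bad)
```

The correct constant disparity gives zero loss (perfect warp, no disparity changes), while the wrong one is penalized by the photometric term.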

---

## **Summary Table**

| **Topic**                 | **Purpose / Use**                      |
| ------------------------- | -------------------------------------- |
| Stereo Vision             | Depth from two cameras                 |
| Structure-from-Motion     | 3D + camera motion from multiple views |
| Projection Matrix         | Maps 3D world to 2D image              |
| Camera Calibration        | Find intrinsic & extrinsic parameters  |
| Epipolar Geometry         | Geometric link between two cameras     |
| Fundamental Matrix        | For uncalibrated stereo pairs          |
| Essential Matrix          | For calibrated stereo pairs            |
| Disparity Map             | Per-pixel disparity (inverse depth)    |
| Optical Flow              | Pixel motion tracking                  |
| Volumetric Reconstruction | Build 3D models                        |
| Regularization Stereo     | Smooth, accurate depth                 |
| Loss Functions            | Optimization in 3D reconstruction      |

---
