# Stereo Calibration

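The OpenCV function `cv::stereoCalibrate` computes the extrinsic parameters (rotation matrix $ R $ and translation vector $ T $) between the two cameras, together with the essential and fundamental matrices. It can also refine the intrinsic parameters of each camera, unless `CALIB_FIX_INTRINSIC` (the default flag) is set.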

- **Rotation matrix $ R = R_{c_1}^{c_2}$**: Relative orientation  of **Camera 1** expressed in **Camera 2**'s coordinate frame.
- **Translation vector $ T = T_{c_1}^{c_2}$**: Relative position (translation)  of **Camera 1** expressed in **Camera 2**'s coordinate frame.


So, if you have a 3D point $ P_1 $ in the coordinate frame of Camera 1, you can transform it to the coordinate frame of Camera 2 using the following equation:

$
P_2 = R \cdot P_1 + T
$

Where:
- $ P_1 $ is the 3D point in Camera 1's frame.
- $ P_2 $ is the corresponding 3D point in Camera 2's frame.
- $ R $ and $ T $ are the rotation and translation between the two camera frames.
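As a minimal sketch of this equation, the snippet below applies $R$ and $T$ to a point using only the standard library. The helper name `transformToCamera2` and the `Vec3`/`Mat3` aliases are illustrative assumptions, not OpenCV API; in real code `R` and `T` would come from `cv::stereoCalibrate`.

```cpp
#include <array>
#include <cstddef>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Hypothetical helper: maps a point from Camera 1's frame to Camera 2's frame,
// P2 = R * P1 + T.
Vec3 transformToCamera2(const Mat3& R, const Vec3& T, const Vec3& P1) {
    Vec3 P2{};
    for (std::size_t i = 0; i < 3; ++i) {
        P2[i] = T[i];  // start from the translation component
        for (std::size_t j = 0; j < 3; ++j) {
            P2[i] += R[i][j] * P1[j];  // add row i of R times P1
        }
    }
    return P2;
}
```

For an ideal side-by-side rig with Camera 2 mounted 0.1 m to the right of Camera 1 and no relative rotation, `R` is the identity and `T = (-0.1, 0, 0)`, so the point `(0, 0, 1)` in Camera 1's frame becomes `(-0.1, 0, 1)` in Camera 2's frame.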

```cpp
// Returns the overall RMS reprojection error.
double cv::stereoCalibrate(
    InputArrayOfArrays objectPoints,   // 3D points of the calibration pattern, in the pattern's coordinate frame (one vector per view)
    InputArrayOfArrays imagePoints1,   // 2D projections of the pattern points in the first camera's images
    InputArrayOfArrays imagePoints2,   // 2D projections of the pattern points in the second camera's images
    InputOutputArray cameraMatrix1,    // Intrinsic matrix of the first camera
    InputOutputArray distCoeffs1,      // Distortion coefficients of the first camera
    InputOutputArray cameraMatrix2,    // Intrinsic matrix of the second camera
    InputOutputArray distCoeffs2,      // Distortion coefficients of the second camera
    Size imageSize,                    // Size of the images
    InputOutputArray R,                // Rotation matrix between the cameras
    InputOutputArray T,                // Translation vector between the cameras
    OutputArray E,                     // Essential matrix
    OutputArray F,                     // Fundamental matrix
    OutputArray perViewErrors,         // RMS reprojection error for each view
    int flags = CALIB_FIX_INTRINSIC,   // Default: keep the given intrinsics fixed
    TermCriteria criteria = TermCriteria(TermCriteria::COUNT + TermCriteria::EPS, 30, 1e-6)  // Stopping rule of the optimizer
);
```





Refs: [1](https://www.cs.cmu.edu/~16385/s17/Slides/13.1_Stereo_Rectification.pdf), [2](https://docs.opencv.org/3.4/d9/d0c/group__calib3d.html#ga3207604e4b1a1758aa66acb6ed5aa65d)



<img src="images/virtual_stereo.png" />
<img src="images/virtual_stereo_ray.png" />




# Image Rectification
Image rectification reprojects an image of a scene onto a chosen virtual image plane, so that the result is aligned with a desired coordinate system. The goal is to remove the effects of camera rotation and perspective (and, when combined with undistortion, of lens distortion), so that the rectified image appears to have been captured from a front-facing camera.


In the following:
 
- The camera rotates about the `z` axis.
- The virtual image plane at `5°` is shown in red and the one at `90°` in green.
- The rectified images are in the blue virtual image plane. 
- The virtual plane must be parallel to the stereo baseline (orange). 


|   |   |
|---|---|
|<img src="images/image_rectification_1.png" alt="" />    |<img src="images/image_rectification_8.png" alt="" />  |
|<img src="images/image_rectification_20.png" alt="" />   | <img src="images/image_rectification_30.png" alt="" />  |

Refs:  [1](https://www.cs.cmu.edu/~16385/s17/Slides/13.1_Stereo_Rectification.pdf), [2](https://people.scs.carleton.ca/~c_shu/Courses/comp4900d/notes/rectification.pdf)
[3](https://www.andreasjakl.com/understand-and-apply-stereo-rectification-for-depth-maps-part-2/)

[code](../scripts/image_rectification.py)
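
One common way to realise such a reprojection is the homography $K R K^{-1}$: back-project the pixel to a ray, rotate the ray into the virtual camera's frame, and project it again. The sketch below assumes an ideal pinhole camera without lens distortion, square pixels, and a virtual plane that differs from the original one by a pure rotation `R`. The names `Pixel`, `rotationAboutY`, and `reprojectToVirtualPlane` are hypothetical, and this is not necessarily how the linked script does it.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

struct Pixel {
    double u;
    double v;
};

// Example rotation about the camera's y axis (assumed convention: x right, y down, z forward).
Mat3 rotationAboutY(double angleRad) {
    const double c = std::cos(angleRad);
    const double s = std::sin(angleRad);
    return {{ {{c, 0.0, s}}, {{0.0, 1.0, 0.0}}, {{-s, 0.0, c}} }};
}

// Hypothetical helper: re-projects pixel p onto a virtual image plane rotated by R,
// for a pinhole camera with focal length f and principal point (cx, cy), all in pixels.
Pixel reprojectToVirtualPlane(double f, double cx, double cy, const Mat3& R, Pixel p) {
    // Back-project the pixel to a viewing ray: ray = K^-1 * [u, v, 1]^T.
    const Vec3 ray = {(p.u - cx) / f, (p.v - cy) / f, 1.0};
    // Express the ray in the virtual camera's frame.
    Vec3 r{};
    for (std::size_t i = 0; i < 3; ++i) {
        for (std::size_t j = 0; j < 3; ++j) {
            r[i] += R[i][j] * ray[j];
        }
    }
    // Project with the same intrinsics: x' ~ K * R * K^-1 * x.
    return {cx + f * r[0] / r[2], cy + f * r[1] / r[2]};
}
```

With the identity rotation every pixel maps to itself. With `rotationAboutY(theta)`, the principal point moves horizontally by `f * tan(theta)` pixels.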

# Stereo Rectification

If cameras are calibrated:
```cpp
void cv::stereoRectify(
    InputArray cameraMatrix1,
    InputArray distCoeffs1,
    InputArray cameraMatrix2,
    InputArray distCoeffs2,
    Size imageSize,
    InputArray R,
    InputArray T,
    OutputArray R1,
    OutputArray R2,
    OutputArray P1,
    OutputArray P2,
    OutputArray Q,
    int flags = CALIB_ZERO_DISPARITY,
    double alpha = -1,
    Size newImageSize = Size(),
    Rect* validPixROI1 = 0,
    Rect* validPixROI2 = 0
);
```



---

#### **`R1` (3×3 Rectification Rotation Matrix for the First Camera):**

- **Purpose**: `R1` is a rotation matrix that transforms points from the unrectified coordinate system of the first camera to the rectified coordinate system of the first camera.
- **Technical Explanation**:
  - A *rectified coordinate system* aligns the epipolar lines (lines along which corresponding points between stereo images lie) to be parallel to the image rows.
  - This transformation simplifies stereo correspondence and disparity estimation.
  - In essence, `R1` changes the *basis* of points in the 3D space for the first camera so that its coordinate system aligns with the rectified one.
- **Use Case**: After applying `R1` to 3D points, they are in the rectified coordinate system for the first camera.

---

#### **`R2` (3×3 Rectification Rotation Matrix for the Second Camera):**

- **Purpose**: Similar to `R1`, `R2` performs a transformation for the second camera.
- **Technical Explanation**:
  - `R2` aligns the second camera's coordinate system to a rectified frame, ensuring its epipolar lines are also parallel and aligned with those of the first camera.
  - This matrix ensures that corresponding points in the second image align row-wise with those in the first image.
- **Use Case**: After applying `R2`, the 3D points in the second camera's coordinate system are rectified.

---

#### **`P1` (3×4 Rectified Projection Matrix for the First Camera):**

- **Purpose**: `P1` projects 3D points in the rectified coordinate system of the first camera into its 2D image plane.
- **Technical Explanation**:
  - A projection matrix maps 3D points $(X, Y, Z)$ to 2D image points $(u, v)$ using the intrinsic camera parameters and extrinsic parameters (rotation and translation).
  - `P1` includes the rectified camera matrix (intrinsics of the rectified system) and any additional translation or baseline offset due to rectification.
  - `P1` takes points from the rectified coordinate system and produces the corresponding 2D pixel coordinates in the first rectified image.
- **Use Case**: Used for projecting rectified 3D points into the rectified image from the first camera.

---

#### **`P2` (3×4 Rectified Projection Matrix for the Second Camera):**

- **Purpose**: Similar to `P1`, `P2` projects 3D points into the 2D image plane of the second camera in the rectified coordinate system.
- **Technical Explanation**:
  - `P2` contains the intrinsic parameters of the rectified second camera and includes the stereo baseline between the two cameras.
  - This matrix accounts for the rectification process and ensures that corresponding points between images are horizontally aligned.
- **Use Case**: Used for projecting rectified 3D points into the rectified image from the second camera.
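
For reference, and assuming a horizontal stereo pair (the common left/right layout), the OpenCV documentation for `cv::stereoRectify` gives the two projection matrices the structure below. Here $f$ is the rectified focal length, $(c_{x1}, c_y)$ and $(c_{x2}, c_y)$ are the rectified principal points, and $T_x$ is the horizontal shift between the cameras:

$
P_1 = \begin{bmatrix} f & 0 & c_{x1} & 0 \\ 0 & f & c_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad
P_2 = \begin{bmatrix} f & 0 & c_{x2} & T_x \cdot f \\ 0 & f & c_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
$

With `CALIB_ZERO_DISPARITY` set, $c_{x1} = c_{x2}$, so the two matrices differ only in the term $T_x \cdot f$, which is how the baseline enters `P2`.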

---

#### **`Q` (4×4 Disparity-to-Depth Mapping Matrix):**

- **Purpose**: `Q` maps disparity (difference in horizontal pixel coordinates of a point in the two images) to depth information.
- **Technical Explanation**:
  - Disparity values $d = u_1 - u_2$ (where $u_1$ and $u_2$ are the horizontal coordinates of a point in the first and second rectified images) are used to compute depth $Z$:
    $
    Z = \frac{f \cdot B}{d}
    $
    where:
    - $f$: Focal length of the rectified cameras.
    - $B$: Baseline (distance between the two cameras).
  - `Q` encodes this relationship in a single transformation matrix, allowing for efficient computation of 3D points using the `reprojectImageTo3D` function.
- **Use Case**: Converts disparity maps into 3D point clouds by mapping disparity and pixel coordinates to 3D space.
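
As a worked check, and again assuming a horizontal stereo pair, the OpenCV documentation for `cv::stereoRectify` gives `Q` the following structure, where $f$ is the rectified focal length, $(c_{x1}, c_y)$ and $(c_{x2}, c_y)$ are the rectified principal points, and $T_x$ is the horizontal shift between the cameras:

$
Q = \begin{bmatrix} 1 & 0 & 0 & -c_{x1} \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & -\frac{1}{T_x} & \frac{c_{x1} - c_{x2}}{T_x} \end{bmatrix}
$

Multiplying $Q$ by $[u, v, d, 1]^\top$ gives a third component equal to $f$ and a fourth component $W = \frac{-d + c_{x1} - c_{x2}}{T_x}$. When $c_{x1} = c_{x2}$ (the `CALIB_ZERO_DISPARITY` case) and $T_x = -B$ (an assumption that holds when the first camera is the left one), this reduces to $W = d / B$, so the depth after the homogeneous division is $f / W = \frac{f \cdot B}{d}$, which matches the formula above.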


Refs: [1](https://docs.opencv.org/3.4/d9/d0c/group__calib3d.html#ga617b1685d4059c6040827800e72ad2b6)

#### **1. Using `R1` or `R2`:**

- **What You Have**: Points in the original, unrectified 2D image coordinates (e.g., $(u, v)$) for one or both cameras.
- **How It's Used**:
  - `R1` is part of the rectification process. It transforms 3D rays corresponding to the unrectified image points into a rectified coordinate system.
  - Internally, the rectification process uses camera calibration parameters (intrinsic and extrinsic) to compute the rectified images. You typically don't directly apply `R1` unless you are manually transforming rays or doing advanced tasks like custom 3D geometry adjustments.

- **Practical Use**:
  - When stereo images are rectified, you typically get the rectified images as output. After that, the alignment of epipolar lines is already ensured.
  - Once the images are rectified, you don't usually need to apply `R1` explicitly.

---

#### **2. Using `P1` and `P2`:**

- **What You Have**:
  - 3D points in the rectified coordinate system (e.g., derived from disparity and `Q`).
  - Or, you may have 2D image points in the rectified images and want to work backward to check their 3D projections.

- **How It's Used**:
  - `P1` projects rectified 3D points back to the 2D image plane for the first rectified image. Similarly, `P2` does this for the second rectified image.
  - For stereo depth estimation, `P1` and `P2` are mostly used indirectly, as they help compute rectified disparity maps.
  
- **Practical Use**:
  - If you're computing reprojection errors, for example, you might project estimated 3D points back into 2D images using `P1` or `P2` and compare with observed 2D points.
  - If you are generating synthetic stereo pairs from a 3D model, you might use `P1` and `P2` to project the model's points into rectified images.

---

#### **3. Using `Q`:**

- **What You Have**:
  - A disparity map, where each pixel's value represents the disparity $d$ between the corresponding points in the rectified images.
  - $d = u_{\text{left}} - u_{\text{right}}$, where $u_{\text{left}}$ and $u_{\text{right}}$ are the horizontal coordinates of corresponding points in the rectified images.

- **How It's Used**:
  - `Q` is used to convert disparity into 3D points in the rectified coordinate system of the first camera:
    $
    [X, Y, Z, W]^\top = Q \cdot [u, v, d, 1]^\top
    $
    where:
    - $u, v$: Pixel coordinates in the rectified image.
    - $d$: Disparity value.
    - $W$: Homogeneous scale factor. It depends on the disparity and is generally not 1, which is why the division below is required.

  - After the multiplication, divide by $W$ to get 3D coordinates:
    $
    X' = X/W, \quad Y' = Y/W, \quad Z' = Z/W
    $

- **Practical Use**:
  - This is the core of depth estimation. `Q` allows you to take a disparity map and compute a 3D point cloud for the scene.
  - For example, if you use `cv::reprojectImageTo3D`, OpenCV internally applies `Q` to compute the 3D points.

---

#### **Why Don't You Usually Use `R1`, `P1`, or `Q` Explicitly?**

1. **Rectification (`R1` and `R2`)**:
   - The rectification process is typically handled by functions like `cv::stereoRectify` or `cv::initUndistortRectifyMap`.
   - After rectification, you work with rectified images directly, where epipolar geometry is simplified.

2. **Projection Matrices (`P1` and `P2`)**:
   - These are mostly useful for reprojection tasks (e.g., projecting 3D points back to 2D for validation).
   - In most stereo vision workflows, you work with disparity maps and depth directly, not raw 3D projection.

3. **Disparity-to-Depth Mapping (`Q`)**:
   - `Q` is essential but is often used indirectly by functions like `cv::reprojectImageTo3D`, which compute 3D point clouds from disparity maps.

---

#### **Summary of Typical Workflow**:

1. **Calibration**: Compute camera parameters (intrinsic, extrinsic, and distortion coefficients).
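2. **Rectification**: Compute `R1`, `R2`, `P1`, and `P2` with `cv::stereoRectify`, then use them (via `cv::initUndistortRectifyMap` and `cv::remap`) to rectify the images.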
3. **Disparity Estimation**: Compute a disparity map from the rectified stereo images.
4. **3D Reconstruction**: Use `Q` with the disparity map to compute the 3D point cloud.

This abstraction ensures that you rarely need to manually manipulate matrices like `R1`, `P1`, or `Q`. Instead, high-level functions handle these computations for you.

If cameras are not calibrated:
```cpp
bool cv::stereoRectifyUncalibrated(
    InputArray points1,
    InputArray points2,
    InputArray F,
    Size imgSize,
    OutputArray H1,
    OutputArray H2,
    double threshold = 5
);
```

Refs: [1](https://docs.opencv.org/3.4/d9/d0c/group__calib3d.html#gaadc5b14471ddc004939471339294f052)


#### **Disparity Definition:**
Disparity $ d $ is given by the horizontal difference between the $ x $-coordinates of a corresponding point in the left and right images:
$
d = x - x'
$
Where:
- $ x $ = x-coordinate of the point in the left image.
- $ x'$ = x-coordinate of the corresponding point in the right image.

The distance between the centers of the two camera lenses is $BD = BC + CD$.

The following pairs of triangles are similar:

- $ACB$ and $BFE$
- $ACD$ and $DGH$

Since $BF = DG$ (both equal the distance from the lens to the image plane):

${\displaystyle {\begin{aligned}{\text{disparity}}=d&=EF+GH\\&=BF\left({\frac {EF}{BF}}+{\frac {GH}{BF}}\right)\\&=BF\left({\frac {EF}{BF}}+{\frac {GH}{DG}}\right)\\&=BF\left({\frac {BC+CD}{AC}}\right)\\&=BF\cdot {\frac {BD}{AC}}\\&={\frac {k}{Z}}\end{aligned}}}$

Here $Z = AC$ is the depth of the point, and $k$ is the distance between the two cameras times the distance from the lens to the image:

$k = BF \cdot BD = f \times \text{Baseline}$

Substituting $k$ gives the disparity in terms of depth:

${\displaystyle d = x - x' = {\frac {f \times \text{Baseline}}{Z}}}$

Solving for depth, with $B$ denoting the baseline:

$Z = \frac{f \cdot B}{d} = \frac{f \cdot B}{x - x'}$



<img src="images/images_depth_to_displacement_relationship.png" width="30%" height="30%" />
<img src="images/stereo_depth.jpg" width="30%" height="30%" />

```cpp
void cv::triangulatePoints(
    InputArray projMatr1,
    InputArray projMatr2,
    InputArray projPoints1,
    InputArray projPoints2,
    OutputArray points4D
);
```

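The output `points4D` is a 4×N array of points in homogeneous coordinates. When `projMatr1` and `projMatr2` are the `P1` and `P2` matrices from `cv::stereoRectify`, the points are expressed in the rectified coordinate frame of the first (left) camera.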


```cpp
// Convert points from homogeneous to 3D (divide by w).
// Assumes points4D is a 4xN CV_32F matrix; use at<double> if it is CV_64F.
std::vector<cv::Point3f> triangulatedPoints;
triangulatedPoints.reserve(points4D.cols);
for (int i = 0; i < points4D.cols; ++i) {
  const float w = points4D.at<float>(3, i);  // homogeneous scale of point i
  triangulatedPoints.emplace_back(points4D.at<float>(0, i) / w,
                                  points4D.at<float>(1, i) / w,
                                  points4D.at<float>(2, i) / w);
}
```



Ref: [1](https://docs.opencv.org/3.4/d9/d0c/group__calib3d.html#gad3fc9a0c82b08df034234979960b778c)

# Undistort and RectifyMap

```cpp
void cv::initUndistortRectifyMap(
    InputArray cameraMatrix,
    InputArray distCoeffs,
    InputArray R,
    InputArray newCameraMatrix,
    Size size,
    int m1type,
    OutputArray map1,
    OutputArray map2
);
```

The function computes the joint undistortion and rectification transformation and represents the result in the form of maps for `cv::remap`. The undistorted image looks like the original, **as if it had been captured with a camera using the camera matrix `newCameraMatrix`** and zero distortion.

In the case of a monocular camera, `newCameraMatrix` is usually equal to `cameraMatrix`, or it can be computed by `cv::getOptimalNewCameraMatrix` for better control over scaling. In the case of a stereo camera, `newCameraMatrix` is normally set to `P1` or `P2` computed by `cv::stereoRectify`.
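
To make the role of the maps concrete: for every pixel of the destination (rectified) image, they store the source coordinates to sample from, that is `dst(x, y) = src(mapX(x, y), mapY(x, y))`. The sketch below shows that lookup with nearest-neighbour sampling on a plain grayscale buffer. The `Image` struct and `remapNearest` are hypothetical stand-ins; real code would call `cv::remap`, which also offers interpolation and border handling.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal grayscale image: row-major pixels, width * height entries.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Hypothetical nearest-neighbour remap. mapX and mapY hold one source coordinate
// per destination pixel (row-major). Samples that fall outside src are set to 0.
Image remapNearest(const Image& src, const std::vector<float>& mapX,
                   const std::vector<float>& mapY, int dstWidth, int dstHeight) {
    Image dst{dstWidth, dstHeight,
              std::vector<std::uint8_t>(static_cast<std::size_t>(dstWidth) * dstHeight, 0)};
    for (int y = 0; y < dstHeight; ++y) {
        for (int x = 0; x < dstWidth; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * dstWidth + x;
            const int sx = static_cast<int>(std::lround(mapX[i]));  // source column
            const int sy = static_cast<int>(std::lround(mapY[i]));  // source row
            if (sx >= 0 && sx < src.width && sy >= 0 && sy < src.height) {
                dst.pixels[i] = src.pixels[static_cast<std::size_t>(sy) * src.width + sx];
            }
        }
    }
    return dst;
}
```

Because the maps depend only on the calibration, they can be computed once and reused for every frame from the same camera.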


Refs: [1](https://docs.opencv.org/3.4/da/d54/group__imgproc__transform.html#ga7dfb72c9cf9780a347fbe3d1c47e5d5a)


# Stereo matcher

```cpp
static Ptr<StereoBM> cv::StereoBM::create(int numDisparities = 0, int blockSize = 21)
```


- `numDisparities`: the disparity search range. For each pixel, the algorithm finds the best disparity from 0 (the default minimum disparity) to `numDisparities`. The value should be divisible by 16.

- `blockSize`: the linear size of the blocks compared by the algorithm. The size should be odd (as the block is centered at the current pixel). A larger block size implies a smoother, though less accurate, disparity map. A smaller block size gives a more detailed disparity map, but there is a higher chance that the algorithm finds a wrong correspondence.
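
To illustrate how the two parameters interact, here is a simplified block matcher for a single pixel on one rectified scanline, using the sum of absolute differences (SAD) as the cost. It is only a sketch of the idea: `matchPixelSAD` is a hypothetical name, both rows are assumed to have the same length, and OpenCV's `StereoBM` adds pre-filtering, uniqueness and texture checks, and sub-pixel refinement on top of this.

```cpp
#include <cstdlib>
#include <limits>
#include <vector>

// For the pixel at column x of the left row, tries every disparity d in
// [0, numDisparities) and returns the d that minimises the SAD between the window
// centred at x in the left row and the window centred at x - d in the right row.
// Returns -1 when no candidate window fits inside the rows.
int matchPixelSAD(const std::vector<int>& leftRow, const std::vector<int>& rightRow,
                  int x, int numDisparities, int blockSize) {
    const int half = blockSize / 2;
    const int width = static_cast<int>(leftRow.size());
    int bestDisparity = -1;
    long bestCost = std::numeric_limits<long>::max();
    for (int d = 0; d < numDisparities; ++d) {
        // Skip candidates whose windows would leave the image.
        if (x - d - half < 0 || x + half >= width) {
            continue;
        }
        long cost = 0;
        for (int k = -half; k <= half; ++k) {
            cost += std::abs(leftRow[x + k] - rightRow[x - d + k]);
        }
        if (cost < bestCost) {
            bestCost = cost;
            bestDisparity = d;
        }
    }
    return bestDisparity;
}
```

A wider window averages over more pixels, which suppresses noise but blurs depth edges. A larger search range lets the matcher find closer objects at the cost of more candidates to evaluate.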






# Disparity map

```cpp
cv::Mat disparity16S, disparity8U;
stereoBM->compute(rectifiedImageLeft, rectifiedImageRight, disparity16S);
```

- By default, `StereoBM::compute` returns a 16-bit signed single-channel image (`CV_16S`), where each pixel stores the disparity in $\frac{1}{16}$ of a pixel (i.e., the disparity is scaled by 16).
- To visualize or post-process the disparity, you often convert it to an 8-bit image (`CV_8U`):

```cpp
disparity16S.convertTo(disparity8U, CV_8U, 255.0/(numDisparities*16.0));
```

Now `disparity8U` is an 8-bit image you can display with `imshow` or save to disk.
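
The fixed-point convention and the scaling used for display can be checked on single values with the sketch below. Both helper names are hypothetical, and `disparityToGray` only approximates what `convertTo` does per pixel (scale, round, clamp to the 8-bit range).

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// StereoBM stores disparities as fixed-point values with 4 fractional bits,
// so the disparity in pixels is the raw value divided by 16.
float disparityInPixels(std::int16_t raw) {
    return static_cast<float>(raw) / 16.0f;
}

// Scales a raw disparity to [0, 255] for display, using the same factor as the
// convertTo call above: 255 / (numDisparities * 16).
std::uint8_t disparityToGray(std::int16_t raw, int numDisparities) {
    const double scaled = raw * 255.0 / (numDisparities * 16.0);
    const long rounded = std::lround(scaled);
    return static_cast<std::uint8_t>(std::min(255L, std::max(0L, rounded)));
}
```

A raw value of 672 is a disparity of 42 pixels. With `numDisparities = 64` it maps to gray level 167, while negative raw values (used for pixels without a valid match) are clamped to 0.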



<img src="images/disparity.png" />


# Convert disparity to 3D (depth map)

If you want a 3D point cloud or actual depth values, you can use `cv::reprojectImageTo3D` along with the $Q$ matrix, the $4 \times 4$ reprojection matrix returned by `cv::stereoRectify`, which maps disparity values to 3D coordinates.

`cv::reprojectImageTo3D` computes $(X, Y, Z)$ for each pixel. It expects disparities in pixels, so the fixed-point output of `StereoBM` has to be converted to floating point and divided by 16 first:


```cpp
cv::Mat disparity32F;
disparity16S.convertTo(disparity32F, CV_32F, 1.0 / 16.0);  // undo the fixed-point scaling

cv::Mat xyz;  // Will hold 3D coordinates of each pixel
cv::reprojectImageTo3D(disparity32F, xyz, Q, /* handleMissingValues = */ true);
```

The `xyz` matrix will be `CV_32FC3`, where each pixel contains $(X, Y, Z)$ in the **rectified first camera's coordinate system**, in the same units as the baseline (that is, the units of `T`).

---


# Distance Between the two cameras (Baseline) and stereo angle

The short answer is: **it depends on your application.** Unlike human binocular vision, where the eyes are fixed about 6–7 cm apart and slightly converged, engineered stereo setups can vary widely. That said, there are some useful rules of thumb and practical guidelines:

---

#### 1. Baseline (Distance Between Cameras)

- **Human-Sized Baseline (~6–7 cm).** If you aim to replicate human vision (e.g., VR or robotics that interacts with objects at roughly arm’s length to several meters), a baseline in this range is often ideal. It provides a reasonable balance between near and far depth perception.

- **Wider Baseline (> 10 cm).**  
  For detecting depth in environments where objects might be further away (e.g., several meters to tens of meters), a wider baseline increases disparity and yields more accurate depth at longer distances. However, if you’re looking at very close objects, a larger baseline can cause “dead zones” or excessive disparity.

- **Narrow Baseline (< 5 cm).**  
  Useful if you’re mostly interested in small, close objects or have physical constraints (like a compact camera rig). A narrower baseline lowers depth resolution at long range, but the two views overlap more and look more alike, which makes matching easier for very close objects.

**Rule of Thumb:**  
Choose a baseline approximately **1/30 to 1/50 of the target distance** you care about. For example, if you’re consistently looking at objects around 1 meter away, consider a baseline of ~2–3 cm. If the objects are ~3 meters away, a 6–10 cm baseline might be better, and so on.

---

#### 2. Stereo Angle (Toe-In vs. Parallel)

1. **Parallel Cameras (0° Stereo Angle)**  
   - Easiest calibration and rectification (the image planes align more simply in software).  
   - Common choice in many stereo vision setups.  
   - You rely on epipolar geometry in post-processing to “shift” and find matching features.  

2. **Slight Toe-In (Convergence Angle ~5–10°)**  
   - Mimics the human eye convergence more closely.  
   - Can improve depth measurement for a known region of interest (e.g., a near focus point).  
   - More complex geometry: you must ensure accurate calibration since each camera has a different perspective with a non-zero yaw angle.

**Rule of Thumb:**  
- If you have a single, well-defined depth region (e.g., an object on a conveyor belt always at 1 m), you can angle your cameras slightly inward to center that region in your sensors and maximize disparity where it matters.  
- If your depth range varies a lot, parallel mounting keeps it simpler and is easier to generalize.
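
If you do toe the cameras in, the angle needed to make both optical axes meet at a given distance follows from simple geometry: each camera is yawed inward by $\arctan\left(\frac{B/2}{Z}\right)$, so the total convergence angle is twice that. The sketch below assumes a symmetric rig whose axes meet in front of the midpoint of the baseline; `convergenceAngleDeg` is a hypothetical helper.

```cpp
#include <cmath>

// Total convergence angle in degrees for a symmetric toe-in, where both optical
// axes meet at a point `distance` in front of the baseline midpoint.
// `baseline` and `distance` must use the same length unit.
double convergenceAngleDeg(double baseline, double distance) {
    const double kPi = std::acos(-1.0);
    return 2.0 * std::atan2(baseline / 2.0, distance) * 180.0 / kPi;
}
```

A 10 cm baseline converged at 1 m gives about 5.7°, at the lower end of the 5–10° range mentioned above.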

---

#### Putting It All Together

1. **Determine Your Primary Depth Range**  
   - How close/far are the objects you want to measure?

2. **Select a Baseline**  
   - Use a baseline that yields good disparity over that range (e.g., 6–7 cm if it’s roughly human-scale distances, or larger if you need more depth coverage).

3. **Decide on Stereo Angle**  
   - **Parallel** for general-purpose or wide-range stereo.  
   - **Slight convergence** for a more controlled, narrow-depth application.  

4. **Calibrate Carefully**  
   - Stereo calibration (finding intrinsic and extrinsic parameters) is critical no matter the baseline or angle. Good calibration ensures accurate depth reconstruction.

---

### Example Scenarios

1. **Mobile Robot or Drone**  
   - Often uses a baseline close to 6–10 cm (human-like) for obstacle avoidance at 1–5 m range.  
   - Cameras are usually mounted parallel for simpler algorithms.

2. **Industrial 3D Inspection**  
   - Baselines can be larger (10–30 cm or more) if objects are 1–5 meters away.  
   - Might slightly toe-in the cameras to optimize for a conveyor belt’s known path.

3. **VR/AR Headset**  
   - Typically ~6.3–6.5 cm to match the average human inter-pupillary distance (IPD).  
   - Slight toe-in might replicate natural eye convergence; often the software pipeline handles stereo rendering in parallel mode for convenience.

---

#### Key Takeaways

- **Baseline** is your primary lever for tuning the stereo system to a specific depth range.  
- **Small baseline** = better for short-range precision; **large baseline** = better for long-range resolution.  
- **Stereo angle** can be parallel or slightly toed-in; parallel is simpler but toe-in can be useful for a narrower focus range.  

By balancing these two factors and performing a precise stereo calibration, you can optimize your stereo camera setup for your particular application.

Refs: [1](https://towardsdatascience.com/a-comprehensive-tutorial-on-stereo-geometry-and-stereo-rectification-with-python-7f368b09924a)

[Python code](../scripts/multi_snapshot_stereo.py)


[C++ code](../src/virtual_stereo_vision_cameras.cpp)

