# Applied Computer Vision (ACV)

## Lab 1 - January 16th, 2026

### Camera Calibration & Projection + OpenCV Primer


# Camera Calibration with OpenCV: A Complete Guide


This notebook walks you through the concepts and practical steps of camera calibration with OpenCV. We'll cover:
- What camera calibration is and why it's important
- How to capture calibration images
- How calibration is computed
- How to use the results
- A primer on OpenCV basics for beginners

Let's get started!

---

## 1. What is Camera Calibration?

**Camera calibration** is the process of determining the internal characteristics (intrinsic parameters) and lens distortion of a camera. This is essential for:
- Correcting lens distortion (straight lines appear curved in raw images)
- Mapping 3D real-world points to 2D image points accurately
- Enabling precise measurements, 3D reconstruction, and robotics

### Why do we need calibration?
- Real-world cameras are not perfect pinhole cameras. Lenses introduce distortion.
- Calibration finds the camera matrix (focal length, principal point) and distortion coefficients.

### The Pinhole Camera Model
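The relationship between a 3D world point $(X, Y, Z)$ and its image projection $(x, y)$ in pixels is:

$$
\lambda \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = K \cdot [R|t] \cdot \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
$$

where $K$ is the camera matrix, $[R|t]$ is the rotation and translation (extrinsics), and $\lambda$ is a scale factor equal to the point's depth along the camera's optical axis. Because the equation only holds up to this scale, you get pixel coordinates by dividing the first two components of the right-hand side by the third.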

#### Unpacking the Matrices
- **Camera Matrix $K$ (Intrinsics):**
  - $K$ is a $3 \times 3$ matrix that contains the intrinsic parameters of the camera:
    $$
    K = \begin{bmatrix}
      f_x & s & c_x \\
      0 & f_y & c_y \\
      0 & 0 & 1
    \end{bmatrix}
    $$
    - $f_x, f_y$: Focal lengths in pixels (can differ for non-square pixels)
    - $s$: Skew (usually 0, unless the sensor axes are not perpendicular)
    - $c_x, c_y$: Principal point (usually near the image center)
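- **$[R|t]$ (Extrinsics):**
  - $R$ is a $3 \times 3$ rotation matrix that rotates world-frame vectors into the camera frame. It encodes the camera's orientation.
  - $t$ is a $3 \times 1$ translation vector: the world origin expressed in camera coordinates. The camera's own position in world coordinates is $-R^\top t$.
  - Together, $[R|t]$ transforms 3D world coordinates into the camera's coordinate system: $X_c = R X_w + t$.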
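The NumPy sketch below projects one world point through $[R|t]$ and then $K$. The numbers in `K`, `R`, and `t` are made up for illustration. The last step divides by the third homogeneous coordinate, which is the point's depth in the camera frame, to turn the homogeneous result into pixels.

```python
import numpy as np

# Illustrative intrinsics (assumed values): fx = fy = 800 px, zero skew,
# principal point at the centre of a 640x480 image.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Illustrative extrinsics: no rotation, world origin 2 units in front of the camera.
R = np.eye(3)
t = np.array([[0.0], [0.0], [2.0]])

def project(K, R, t, X_world):
    """Project a 3D world point (shape (3,)) to pixel coordinates (u, v)."""
    X_h = np.append(X_world, 1.0)   # homogeneous [X, Y, Z, 1]
    Rt = np.hstack([R, t])          # 3x4 extrinsic matrix [R|t]
    p = K @ Rt @ X_h                # homogeneous image point, scaled by depth
    return p[:2] / p[2]             # divide by depth to get pixels

uv = project(K, R, t, np.array([0.1, -0.05, 0.0]))
print(uv)  # -> [360. 220.]
```

Here the camera-frame point is $(0.1, -0.05, 2)$. Multiplying by $K$ gives $(720, 440, 2)$, and dividing by the depth 2 gives pixel $(360, 220)$.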

### Lens Distortion
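- **Radial distortion:** The lens magnifies differently depending on the distance from the image centre, so straight lines appear curved, more strongly toward the edges (barrel or pincushion).
- **Tangential distortion:** Points shift asymmetrically, so the image looks slightly tilted. This happens when the lens is not perfectly parallel to the sensor.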
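The function below applies the standard radial-plus-tangential model, the five-coefficient form $(k_1, k_2, p_1, p_2, k_3)$ that OpenCV uses. It acts on normalized coordinates $x = X_c/Z_c$, $y = Y_c/Z_c$, that is, before $K$ is applied. The coefficient values in the loop are illustrative only.

```python
def distort(x, y, k1, k2, p1, p2, k3=0.0):
    """Apply radial + tangential distortion to normalized image coordinates (x, y)."""
    r2 = x * x + y * y                                  # squared distance from the optical axis
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3      # radial scaling factor
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x_d, y_d

# Negative k1 pulls points toward the centre (barrel);
# positive k1 pushes them outward (pincushion).
for k1 in (-0.2, 0.0, 0.2):
    print(k1, distort(0.5, 0.0, k1, 0.0, 0.0, 0.0))
```

Calibration estimates these coefficients so the mapping can later be inverted, which is what undistortion does.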

*Add your own example images of radial and tangential distortion here. The figures below illustrate perspective projection, not distortion.*

---

### Perspective Projection Visualizations

<img src="notebook_images/perspective_projection_offset.png" alt="Perspective Projection Offset" width="600"/>

<img src="notebook_images/perspective_projection_equation.png" alt="Perspective Projection Equation" width="600"/>

- To keep pixel coordinates positive, a **principal point offset** $(c_x, c_y)$ is added: $x = f_x X_c / Z_c + c_x$ and $y = f_y Y_c / Z_c + c_y$.
- This moves the origin of the image coordinate system from the optical axis to the top-left corner of the image.
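Here is a quick numeric check. It assumes a 640x480 image with $f_x = f_y = 800$ px and the principal point at the image centre; these values are illustrative. Without the offset, a point left of and above the optical axis lands at negative coordinates. Once $(c_x, c_y)$ is added, it falls inside the image. OpenCV's image $y$ axis points down, so "above" means negative $Y_c$.

```python
f, cx, cy = 800.0, 320.0, 240.0    # assumed focal length and principal point (pixels)
Xc, Yc, Zc = -0.2, -0.1, 1.0       # camera-frame point left of and above the optical axis

x_centered = f * Xc / Zc           # -160.0: origin at the optical axis
y_centered = f * Yc / Zc           # -80.0
u, v = x_centered + cx, y_centered + cy   # 160.0, 160.0: origin at the top-left corner
print((x_centered, y_centered), (u, v))
```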


## 2. Understanding Chessboard Patterns for Calibration

- **Why chessboards?**
  - Chessboard patterns provide a regular grid of high-contrast corners that are easy for algorithms to detect.
  - The known geometry (spacing and number of inner corners) allows us to relate 3D world points to 2D image points.

- **What are "inner corners"?**
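  - Inner corners are the points where four squares meet (two black, two white). They are not the squares themselves, and the corners on the board's outer border do not count.
  - A pattern size of 9x7, as passed to OpenCV, means 9 columns and 7 rows of inner corners, or 63 corners in total. The physical board therefore has 10x8 squares.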

- **How does this help calibration?**
  - For each image, we know where each corner is in 3D (on the chessboard) and where it appears in the image (2D).
  - This correspondence is the foundation for solving the camera parameters.

**Diagram:**

|  |  |  |  |  |  |  |  |  |
|--|--|--|--|--|--|--|--|--|
|● |● |● |● |● |● |● |● |● |
|● |● |● |● |● |● |● |● |● |
|● |● |● |● |● |● |● |● |● |
|● |● |● |● |● |● |● |● |● |
|● |● |● |● |● |● |● |● |● |
|● |● |● |● |● |● |● |● |● |
|● |● |● |● |● |● |● |● |● |

(Each ● is an inner corner for a 9x7 pattern)
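The 3D side of the correspondence is usually built as a flat grid, as in OpenCV's calibration tutorial: one point per inner corner, on the board plane $Z = 0$, ordered row by row like the detected corners. The 25 mm square size below is an assumption; measure your own board.

```python
import numpy as np

PATTERN = (9, 7)        # inner corners per row (columns), per column (rows)
SQUARE_SIZE = 25.0      # assumed square edge length in millimetres

# One 3D point per inner corner on the board plane (Z = 0), x varying fastest.
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_SIZE

print(objp.shape)   # (63, 3)
print(objp[:3])     # first three corners along the top row: x = 0, 25, 50 mm
```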

---

## 3. Capturing Calibration Images with OpenCV

To calibrate a camera, you need several images of a chessboard pattern taken from different angles and positions.

**Tips for capturing good calibration images:**
- Use at least 10-20 images for best results
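- Vary the angle, distance, and orientation of the chessboard, and include views where the board is near the edges and corners of the frame, where lens distortion is strongest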
- Ensure the chessboard is flat and fully visible in each image
- Avoid glare, shadows, and motion blur
- Use good lighting

**What happens during capture:**
- The script shows a live camera feed
- When the chessboard is detected, you can press SPACE to save the image
- The script saves images to a folder for later calibration

---

## 4. Exploring the Image Capture Script

Let's break down the `capture.py` script used for collecting calibration images:

**Key steps:**
1. **Set parameters:**
   - Save directory, number of images, chessboard size, camera index
2. **Open the camera:**
   - Uses `cv2.VideoCapture` to access the webcam
3. **Detect chessboard corners:**
   - Converts each frame to grayscale
   - Uses `cv2.findChessboardCorners` to find the pattern
   - If found, draws corners with `cv2.drawChessboardCorners`
4. **User controls:**
   - Press SPACE to save an image when the chessboard is detected
   - Press ESC to exit
5. **Save images:**
   - Images are saved to the specified directory for calibration

**Example code snippet:**
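```python
import cv2

CHESSBOARD_SIZE = (9, 7)  # inner corners per row and per column

# `frame` is the current BGR image from cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
found, corners = cv2.findChessboardCorners(gray, CHESSBOARD_SIZE, None)
display = frame.copy()  # assumed: overlay drawn on a copy so the saved frame stays clean
if found:
    cv2.drawChessboardCorners(display, CHESSBOARD_SIZE, corners, found)
    # ...
```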

**What to watch for:**
- Only save images when the chessboard is detected (green corners appear)
- Try to capture a variety of angles and positions

---

## 5. Basics of OpenCV: A Primer


OpenCV is a powerful library for computer vision and image processing. Here are some basics to get you started, in a recommended learning sequence:


### 1. Reading and Displaying Images
```python
import cv2
img = cv2.imread('image.jpg')
if img is None:  # imread returns None instead of raising when the file can't be read
    raise FileNotFoundError('Could not read image.jpg')
cv2.imshow('Image', img)
cv2.waitKey(0)  # wait until any key is pressed
cv2.destroyAllWindows()
```


### 2. Viewing Pixel Values and Image Shape
```python
print('Image shape:', img.shape)  # (height, width, channels)
print('Pixel at (50, 50):', img[50, 50])  # [B, G, R] values
```


### 3. Saving Images
```python
ok = cv2.imwrite('output.jpg', img)  # returns False if the image could not be written
```


### 4. Converting to Grayscale
```python
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imshow('Grayscale', gray)
cv2.waitKey(0)
cv2.destroyAllWindows()
```


### 5. Drawing on Images
```python
cv2.rectangle(img, (50, 50), (200, 200), (0, 255, 0), 2)  # Draw a green rectangle
cv2.circle(img, (100, 100), 40, (255, 0, 0), -1)           # Draw a filled blue circle
cv2.imshow('Drawn Image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
```


### 6. Reading Video Files
```python
cap = cv2.VideoCapture('video.mp4')
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('Video', frame)
    if cv2.waitKey(30) & 0xFF == 27:  # ESC to quit
        break
cap.release()
cv2.destroyAllWindows()
```


### 7. Reading from Webcam or USB Camera
```python
cap = cv2.VideoCapture(0)  # 0 for default webcam, 1/2/... for USB cameras
while True:
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('Webcam', frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break
cap.release()
cv2.destroyAllWindows()
```


### 8. Basic Image Processing
```python
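# Thresholding (uses `gray` from step 4): pixels above 127 become 255, all others 0
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Edge detection: Canny with hysteresis thresholds 100 (low) and 200 (high)
edges = cv2.Canny(gray, 100, 200)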

# Show results
cv2.imshow('Threshold', thresh)
cv2.imshow('Edges', edges)
cv2.waitKey(0)
cv2.destroyAllWindows()
```


```python
# Practice: Reading and displaying an image
import cv2
img = cv2.imread('calib_images/img_00.jpg')  # forward slashes work on every OS
cv2.imshow('Image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

```python
# Practice: Viewing pixel values and image shape
import cv2
img = cv2.imread('calib_images/img_00.jpg')
print('Image shape:', img.shape)  # (height, width, channels)
print('Pixel at (50, 50):', img[50, 50])  # [B, G, R] values
```

```
Image shape: (480, 640, 3)
Pixel at (50, 50): [20 26 21]
```


```python
# Practice: Converting to grayscale
import cv2
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.imshow('Grayscale', gray)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

```python
# Practice: Drawing on images
import cv2
img = cv2.imread('image.jpg')
cv2.rectangle(img, (50, 50), (200, 200), (0, 255, 0), 2)  # Green rectangle
cv2.circle(img, (100, 100), 40, (255, 0, 0), -1)           # Filled blue circle
cv2.imshow('Drawn Image', img)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

```python
# Practice: Saving an image
import cv2
img = cv2.imread('image.jpg')
cv2.imwrite('output.jpg', img)
```

```python
# Practice: Reading from webcam or USB camera
import cv2
cap = cv2.VideoCapture(0)  # 0 for default webcam, 1/2/... for USB cameras
while True:
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('Webcam', frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break
cap.release()
cv2.destroyAllWindows()
```

```python
# Practice: Reading a video file
import cv2
cap = cv2.VideoCapture('video.mp4')
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow('Video', frame)
    if cv2.waitKey(30) & 0xFF == 27:  # ESC to quit
        break
cap.release()
cv2.destroyAllWindows()
```

```python
# Practice: Basic image processing (thresholding and edge detection)
import cv2
img = cv2.imread('image.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Thresholding
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
# Edge detection
edges = cv2.Canny(gray, 100, 200)
# Show results
cv2.imshow('Threshold', thresh)
cv2.imshow('Edges', edges)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

## 6. How Camera Calibration is Computed

Camera calibration is a mathematical process that finds the camera's intrinsic parameters and lens distortion.

### Steps:
1. **Collect object points:**
   - The 3D coordinates of the chessboard corners in the board's own coordinate frame, in real-world units (e.g., millimeters). Because the board is flat, every point has $Z = 0$.
2. **Collect image points:**
   - The 2D pixel coordinates where those corners appear in each image
3. **Run calibration:**
   - Use `cv2.calibrateCamera` to solve for the camera matrix and distortion coefficients

### The Camera Matrix (Intrinsics)
$$
K = \begin{bmatrix}
  f_x & 0 & c_x \\
  0 & f_y & c_y \\
  0 & 0 & 1
\end{bmatrix}
$$
- $f_x, f_y$: Focal lengths (in pixels)
- $c_x, c_y$: Principal point (optical center)
- The skew term $s$ from Section 1 is omitted because OpenCV's camera model assumes zero skew.

### Distortion Coefficients
- $k_1, k_2, k_3$: Radial distortion
- $p_1, p_2$: Tangential distortion
- OpenCV returns them in the order $(k_1, k_2, p_1, p_2, k_3)$.

### The Calibration Equation
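For corner $j$ in image $i$, the model predicts the pixel location

$$
\hat{u}_{ij} = \pi\left(K, d, R_i, t_i, X_j\right)
$$

where $X_j$ is the $j$-th object point, $(R_i, t_i)$ is the pose of the board in image $i$, and $d$ holds the distortion coefficients. The function $\pi$ transforms the point into the camera frame, divides by depth, applies distortion, and maps the result through $K$.

`cv2.calibrateCamera` chooses $K$, $d$, and every $(R_i, t_i)$ to minimize the total squared reprojection error between the detected corners $u_{ij}$ and these predictions:

$$
\min_{K,\, d,\, \{R_i, t_i\}} \sum_i \sum_j \left\| u_{ij} - \hat{u}_{ij} \right\|^2
$$

The RMS of these residuals, in pixels, is the reprojection error that the calibration reports.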

---

## 7. Exploring the Calibration Script

Let's walk through the `calibrate.py` script:

**Key steps:**
1. **Set parameters:**
   - Chessboard size, square size, image directory, output file
2. **Prepare object points:**
   - Create a 3D grid of chessboard corners in real-world units
3. **Load images:**
   - Find all `.jpg` images in the calibration folder
4. **Detect and refine corners:**
   - For each image, detect chessboard corners with `cv2.findChessboardCorners`
   - Refine corner locations to subpixel accuracy with `cv2.cornerSubPix`
5. **Collect points:**
   - Store 3D object points and 2D image points for each successful detection
6. **Run calibration:**
   - Use `cv2.calibrateCamera` to compute the camera matrix and distortion
7. **Assess quality:**
   - Print the reprojection error and interpret the results
8. **Save results:**
   - Store the calibration data in a YAML file for later use

**Error handling:**
- The script checks for missing images, failed detections, and calibration errors, and provides helpful messages.

---

## 8. Saving and Using Calibration Parameters

After calibration, the camera matrix and distortion coefficients are saved to a file (e.g., `calibration.yaml`).
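One way to write such a file is sketched below with PyYAML. It assumes the same `camera_matrix` and `dist_coeff` keys that the loading code in this section reads. The actual `calibrate.py` output format may differ, and both the helper `save_calibration` and the extra `rms_reprojection_error` key are hypothetical.

```python
import numpy as np
import yaml

def save_calibration(path, camera_matrix, dist_coeffs, rms=None):
    """Write calibration results as plain YAML lists (hypothetical helper)."""
    data = {
        "camera_matrix": np.asarray(camera_matrix, dtype=float).tolist(),      # 3x3 nested list
        "dist_coeff": np.asarray(dist_coeffs, dtype=float).ravel().tolist(),   # flat list
    }
    if rms is not None:
        data["rms_reprojection_error"] = float(rms)
    with open(path, "w") as f:
        yaml.safe_dump(data, f)
```

NumPy arrays are converted to plain lists because `yaml.safe_dump` cannot serialize NumPy types directly.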

**How to load and use these parameters:**

```python
import cv2
import yaml
import numpy as np

# Load calibration data (keys must match what calibrate.py wrote)
with open('calibration.yaml') as f:
    calib = yaml.safe_load(f)
mtx = np.array(calib['camera_matrix'], dtype=np.float64)
dist = np.array(calib['dist_coeff'], dtype=np.float64)

# Undistort an image
img = cv2.imread('test_image.jpg')
if img is None:
    raise FileNotFoundError('Could not read test_image.jpg')
undistorted = cv2.undistort(img, mtx, dist)
cv2.imshow('Undistorted', undistorted)
cv2.waitKey(0)
cv2.destroyAllWindows()
```

**Applications:**
- Undistorting images for measurement or further processing
- Accurate 3D reconstruction
- Robotics and augmented reality

---

## ArUco Marker Pose Detection: Code Walkthrough and Concepts

### What is "Pose"?
- In computer vision, the **pose** of an object means its position and orientation in 3D space relative to the camera.
- For a marker, this is usually given as:
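  - **Translation vector (tvec):** X, Y, Z position of the marker's center relative to the camera, in the same units as the marker size (meters if the marker length is given in meters).
  - **Rotation vector (rvec):** Orientation as an axis-angle (Rodrigues) vector: its direction is the rotation axis and its length is the rotation angle in radians. It can be converted to a rotation matrix with `cv2.Rodrigues`, or to Euler angles for display.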
- Knowing the pose allows you to overlay graphics (augmented reality), localize robots, or measure real-world distances.

### How ArUco Pose Detection Works
1. **Camera Calibration**:
   - Loads the camera matrix and distortion coefficients from a YAML file (from your calibration step).
2. **ArUco Marker Detection**:
   - Uses OpenCV's ArUco module to detect markers in the video frame.
   - Finds the 2D image coordinates of the marker corners.
3. **Pose Estimation**:
   - Uses the known real-world size of the marker to define the 3D coordinates of its corners.
   - Uses `cv2.solvePnP` to compute the marker's pose (rvec, tvec) from the 2D-3D correspondences.
   - Draws axes on the marker to visualize orientation.
4. **Display**:
   - Shows the marker ID, distance, position (X, Y, Z), and rotation (roll, pitch, yaw) on the video.

### Code Walkthrough
- **Imports and Config**: Loads OpenCV, numpy, yaml, and sets marker size, dictionary, and calibration file.
- **Calibration Load**: Reads camera intrinsics and distortion from YAML.
- **ArUco Detector**: Sets up the ArUco dictionary and detector parameters.
- **Marker 3D Points**: Defines the real-world coordinates of the marker corners (centered at (0,0,0)).
- **Video Capture**: Opens the webcam.
- **Main Loop**:
  - Reads a frame, detects markers.
  - For each detected marker:
    - Finds its corners in the image.
    - Calls `cv2.solvePnP` to estimate pose.
    - Draws axes and overlays pose info.
    - Optionally prints detailed pose to terminal.
  - If no marker is found, displays a warning.
- **Rotation Conversion**: Converts the rotation vector to roll, pitch, yaw (Euler angles) for easier interpretation.
- **Controls**: ESC/Q to quit, P to print pose details.

### Key Functions
- `cv2.aruco.ArucoDetector`: Detects ArUco markers in the image.
- `cv2.solvePnP`: Computes the pose of the marker from 2D-3D correspondences.
- `cv2.drawFrameAxes`: Draws the 3D axes on the marker.

### Example Output
- **Position (X, Y, Z):** Where the marker is relative to the camera, in meters when the marker size is specified in meters.
- **Rotation (Roll, Pitch, Yaw):** How the marker is oriented.
- **Axes Colors:** X (red), Y (green), Z (blue).

**Generate your own ArUco markers here:** [https://chev.me/arucogen/](https://chev.me/arucogen/)

This technique is widely used in robotics, AR, and camera localization!
