# NS1 - Realsense
## Depth Estimation and Depth Sensors
The goal is to estimate, for every pixel in the image plane, the distance to the object seen at that pixel. Without prior knowledge about the objects in the scene, this cannot be estimated from a single monocular image.
<figure style="text-align:center">
    <img src="assets/depth_sample_nyu.jpeg" />
    <figcaption > Indoor Depth Image (from NYU dataset) </figcaption>
</figure>

There are different approaches to estimating depth, depending on which sensors are used.

### Time of Flight Sensors (ToF Sensors)
<figure style="text-align:center">
    <center> <img src="assets/PPHAU-Time-of-Flight-Sensors.png" /> </center>
    <figcaption > ToF sensors</figcaption>
</figure>

* Estimate the depth by measuring the time it takes for a light pulse to travel to the target, reflect off it, and return to the sensor.
* An example is the Azure Kinect sensor.
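As a minimal sketch of the time-of-flight principle: the light covers the sensor-to-target distance twice, so the depth is half the round-trip path, $d = c \cdot t / 2$. The sketch assumes direct (pulsed) timing and uses the vacuum speed of light. Many ToF cameras, the Azure Kinect among them, instead measure the phase shift of amplitude-modulated light, but they recover the same round-trip distance.

```python
# Speed of light in vacuum, in metres per second (used as an approximation in air).
SPEED_OF_LIGHT = 299_792_458.0


def tof_depth(round_trip_time_s: float) -> float:
    """Distance to the target from the round-trip time of a light pulse.

    The pulse travels to the target and back, so the one-way distance
    is half of the total path length c * t.
    """
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0


# A target 1.5 m away returns the pulse after roughly 10 nanoseconds.
t = 2 * 1.5 / SPEED_OF_LIGHT
print(f"round trip: {t * 1e9:.2f} ns -> depth: {tof_depth(t):.3f} m")
```

The numbers show why direct timing is hard: resolving 1 mm of depth means resolving about 6.7 ps of round-trip time.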

### Stereo Depth Sensors
<figure style="text-align:center">
    <center><img style="allignt:center" src="assets/stereo-ssd-1.png" /></center>
    <figcaption > Stereo Depth Reconstruction (image source https://www.intelrealsense.com/stereo-depth-vision-basics/)</figcaption>
</figure>

* Estimate the depth using two infrared/color cameras mounted on the same baseline with a known displacement between them.
* Match pixels/blocks between the two images using epipolar line search.
* An example is the Intel RealSense D435i.
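The depth follows from similar triangles. Assuming rectified cameras with focal length $f$ (in pixels) and baseline $B$, a point at lateral position $X$ and depth $Z$ in the left camera frame projects to $x_l = f X / Z$ in the left image and $x_r = f (X - B) / Z$ in the right image (both measured from the principal point). The disparity is therefore $d = x_l - x_r = f B / Z$, i.e. $Z = f B / d$. A minimal sketch of that conversion, with illustrative (assumed) values for $f$ and $B$:

```python
import numpy as np


def disparity_to_depth(disparity: np.ndarray, focal_px: float, baseline_m: float) -> np.ndarray:
    """Convert a disparity map (pixels) to a depth map (metres) with Z = f * B / d.

    Pixels with zero or negative disparity have no valid match and are set to 0.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(disparity)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


# Illustrative values (assumed): f = 640 px, baseline = 50 mm.
d = np.array([[64.0, 32.0], [16.0, 0.0]])
print(disparity_to_depth(d, focal_px=640.0, baseline_m=0.05))
```

Because depth is inversely proportional to disparity, one pixel of disparity covers a much larger depth interval for distant points than for close ones.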

## Realsense D435(i): Tools and Setup

We will mainly use this sensor in both the multiview tabletop setup and the head-mounted setup, so it is important to understand it well, including its limitations.

The D435 and D435i have a stereo infrared depth sensor and a projector that emits a static laser pattern for active stereo.
Sensor specs from the [Intel-RealSense-D400-Series-Datasheet](https://www.intel.com/content/dam/support/us/en/documents/emerging-technologies/intel-realsense-technology/Intel-RealSense-D400-Series-Datasheet.pdf):

* 1280x720 active stereo depth resolution
* 1920x1080 RGB resolution
* Depth Diagonal Field of View over 90°
* Dual global shutter sensors for up to 90 FPS depth streaming
* Range 0.2m to over 10m (Varies with lighting conditions)
* The D435i additionally includes an Inertial Measurement Unit (IMU) for 6 degrees of freedom (6DoF) data.
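The range bullet hides a strong distance dependence. For a stereo camera, depth and disparity are related by $Z = f B / d$ ($f$ focal length in pixels, $B$ baseline, $d$ disparity in pixels), so $|dZ/dd| = Z^2 / (f B)$: a fixed disparity error $\Delta d$ becomes a depth error $\Delta Z \approx Z^2 \Delta d / (f B)$ that grows quadratically with distance. The sketch below evaluates this; the image width, horizontal FOV, baseline, and the 0.08 px subpixel error are assumed, roughly D435-like values, not calibration results.

```python
import math


def focal_length_px(width_px: int, hfov_deg: float) -> float:
    """Pinhole focal length in pixels from image width and horizontal FOV."""
    return 0.5 * width_px / math.tan(math.radians(hfov_deg) / 2.0)


def depth_rms_error(depth_m: float, focal_px: float, baseline_m: float,
                    subpixel: float = 0.08) -> float:
    """Approximate depth error (metres) for a disparity error of `subpixel` pixels.

    From Z = f * B / d it follows that |dZ/dd| = Z**2 / (f * B), so a fixed
    disparity error grows into a depth error proportional to Z**2.
    """
    return depth_m ** 2 * subpixel / (focal_px * baseline_m)


# Assumed values: 1280 px wide depth image, ~87 deg horizontal FOV, 50 mm baseline.
f_px = focal_length_px(1280, 87.0)
for z in (0.5, 1.0, 2.0, 4.0):
    print(f"Z = {z:.1f} m -> ~{1000 * depth_rms_error(z, f_px, 0.05):.1f} mm")
```

Doubling the distance quadruples the expected error, which is one reason the usable range depends so much on the scene and lighting.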


There are multiple [whitepapers](https://dev.intelrealsense.com/docs/whitepapers) available that describe the performance and limitations of these sensors. We will not cover them in this project lab.

### [realsense-viewer](https://github.com/IntelRealSense/librealsense/tree/master/tools/realsense-viewer)
A tool for visualizing live streams, recording them, and playing back recorded sequences.
It is good for recording short sequences and experimenting with different post-processing filters.
We will use it for intrinsics and stereo extrinsics calibration.
<figure style="text-align:center">
    <center><img src="assets/realsense-viewer.png" /> </center>
    <figcaption> Realsense Viewer </figcaption>
</figure>
More on calibration and post-processing filters later.

#### Setup
Installation instructions for the RealSense SDK and Viewer can be found in the [GitHub repo](https://github.com/IntelRealSense/librealsense).

### [ROS Wrapper](https://github.com/IntelRealSense/realsense-ros)
A ROS package for streaming color, depth, and point clouds, with further examples such as SLAM (in combination with tracking sensors).
<figure style="text-align:center">
    <img src="assets/rs-pointcloud-rviz.png" />
    <figcaption> Rviz visualization of rs-pointcloud launch</figcaption>
</figure>    
We will mainly use rosbags to record ego-perspective and multiview sequences.

#### Setup

We are using [ROS Noetic](https://wiki.ros.org/noetic) with the desktop-full installation.
The packages we need are in https://github.com/IntelRealSense/realsense-ros; on Ubuntu you can install the pre-built packages:

```bash
apt install ros-noetic-realsense2-camera ros-noetic-realsense2-camera-dbgsym ros-noetic-realsense2-description
```

#### ROS Messages
When reading data from a rosbag recorded with the ROS wrapper, or directly from the sensor, the following topics are important. Which of them are published depends on the [launch file](https://github.com/IntelRealSense/realsense-ros/tree/development/realsense2_camera/launch) and its options:



| topic | type | description | launch file | required options |
|---|---|---|---|---|
| /camera/color/image_raw | sensor_msgs/Image | raw color image | rs_camera.launch | enable_color:=true |
| /camera/aligned_depth_to_color/image_raw | sensor_msgs/Image | depth image aligned to the color image | rs_camera.launch | align_depth:=true |
| /camera/infra1/image_rect_raw | sensor_msgs/Image | rectified left infrared image | rs_camera.launch | enable_infra1:=true |
| /camera/infra2/image_rect_raw | sensor_msgs/Image | rectified right infrared image | rs_camera.launch | enable_infra2:=true |
 

#### Camera Relative Transformation
The sensor has two infrared cameras and one color camera, so there is a separate coordinate frame for each camera. The static transformations between these frames are needed, for example, to map depth data into the color camera's frame.
<figure style="text-align:center">
    <center><img src="assets/tf_static.svg" /> </center>
    <figcaption> Camera Transformation </figcaption>
</figure>
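A minimal sketch of how such relative transformations are chained, using 4x4 homogeneous matrices. The convention `T_a_b` means "maps points from frame b into frame a"; both static transforms are given relative to a common parent frame such as `camera_link`, and the numbers are made up for illustration, not D435 calibration values. In ROS you would normally let tf2 do this composition (e.g. `tf2_ros.Buffer.lookup_transform`).

```python
import numpy as np


def make_transform(rotation: np.ndarray, translation) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T


# Hypothetical static transforms, both expressed relative to camera_link.
T_link_depth = make_transform(np.eye(3), [0.0, 0.0, 0.0])
T_link_color = make_transform(np.eye(3), [0.0, 0.015, 0.0])

# depth -> color: go from depth up to the common parent, then down to color.
T_color_depth = np.linalg.inv(T_link_color) @ T_link_depth

# A 3D point 1 m in front of the depth camera, in homogeneous coordinates.
p_depth = np.array([0.0, 0.0, 1.0, 1.0])
p_color = T_color_depth @ p_depth
print(p_color[:3])
```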

### [SDK (python/c++)](https://github.com/IntelRealSense/librealsense)
A software development kit for reading data from a device or from recordings (rosbags).
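To illustrate what happens when a depth image is turned into a point cloud, here is a minimal sketch that back-projects every pixel with the pinhole model. It assumes no lens distortion and a depth scale of 1 mm per raw unit (a common D400 default; with pyrealsense2 the real value can be queried via `profile.get_device().first_depth_sensor().get_depth_scale()`). The SDK itself provides `rs.pointcloud` and `rs.rs2_deproject_pixel_to_point`, which also account for the stream's distortion model.

```python
import numpy as np


def depth_to_points(depth: np.ndarray, fx: float, fy: float, cx: float, cy: float,
                    depth_scale: float = 0.001) -> np.ndarray:
    """Back-project a depth image into an (N, 3) point cloud in the camera frame.

    `depth` holds raw sensor units; multiplying by `depth_scale` gives metres.
    Lens distortion is ignored and pixels with zero depth (no data) are dropped.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row indices
    z = depth.astype(np.float64) * depth_scale
    valid = z > 0
    x = (u[valid] - cx) * z[valid] / fx  # invert u = fx * x / z + cx
    y = (v[valid] - cy) * z[valid] / fy  # invert v = fy * y / z + cy
    return np.stack([x, y, z[valid]], axis=1)


# Tiny synthetic example: a 2x2 depth image of raw 16-bit values (0 = no data).
raw = np.array([[1000, 0], [2000, 1000]], dtype=np.uint16)
pts = depth_to_points(raw, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
print(pts)
```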


## Perspective Projection
<figure style="text-align:center">
    <img src="assets/perspective_projection.png" />
    <figcaption > Pinhole camera and Perspective Projection </figcaption>
</figure>

The equation to project 3D points onto the image plane holds up to a scale factor $s$, which equals the depth of the point in camera coordinates:


$$s \cdot \text{pixels} = K \, [R|t] \, \text{points}$$


$$s\begin{bmatrix}
u \\
v \\
1\end{bmatrix} = 
\begin{bmatrix} 
f_x & 0 & c_x \\
0 & f_y & c_y \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
r_{0,0} & r_{0,1} & r_{0,2} & t_x \\
r_{1,0} & r_{1,1} & r_{1,2} & t_y\\
r_{2,0} & r_{2,1} & r_{2,2} & t_z\\
\end{bmatrix}
\begin{bmatrix}
x\\
y\\
z\\
1
\end{bmatrix}
$$
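The projection equation translates directly into a few lines of NumPy. This minimal sketch multiplies homogeneous points by $K [R|t]$ and then divides by the third component (the scale $s$, i.e. the depth in camera coordinates) to obtain pixel coordinates. The intrinsics are illustrative values, not those of a real D435.

```python
import numpy as np


def project_points(points: np.ndarray, K: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Project (N, 3) world points to (N, 2) pixel coordinates with K [R|t]."""
    Rt = np.hstack([R, t.reshape(3, 1)])                       # 3x4 extrinsics
    points_h = np.hstack([points, np.ones((len(points), 1))])  # (N, 4) homogeneous
    uvw = (K @ Rt @ points_h.T).T                              # (N, 3) = (s*u, s*v, s)
    return uvw[:, :2] / uvw[:, 2:3]                            # divide by depth s


K = np.array([[640.0, 0.0, 320.0],
              [0.0, 640.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
points = np.array([[0.0, 0.0, 2.0],     # on the optical axis -> principal point
                   [0.5, -0.25, 2.0]])
print(project_points(points, K, R, t))
```

A point on the optical axis lands on the principal point $(c_x, c_y)$ regardless of its depth, which is a quick sanity check for any projection code.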
 
### Camera Intrinsics
These parameters describe the camera's physical model:
* $c_x, c_y$ are the principal point coordinates (in pixels)
* $f$ is the focal length, i.e. the distance from the camera center to the image plane (the $z$ value of the principal point)
* $f_x$ is the focal length $f$ divided by the pixel width
* $f_y$ is the focal length $f$ divided by the pixel height
* Calibration matrix (3x3) $$K = 
    \begin{bmatrix}
        f_x & 0 & c_x \\
       0 & f_y & c_y \\
       0 & 0 & 1 \end{bmatrix}$$
* Camera Extrinsics: a 3x4 matrix $T=[R|t]$ made of the rotation $R$ and translation $t$ that map world coordinates into camera coordinates

$$ T = \begin{bmatrix}
r_{0,0} & r_{0,1} & r_{0,2} & t_x\\
r_{1,0} & r_{1,1} & r_{1,2} & t_y\\
r_{2,0} & r_{2,1} & r_{2,2} & t_z\\
\end{bmatrix}$$
* Distortion Model: describes how the real lens deviates from the ideal pinhole projection, depending on the lens shape and assembly.
The D435i uses the "plumb bob"/"Brown-Conrady" distortion model, which covers two types of distortion [7](https://calib.io/blogs/knowledge-base/camera-models):
  * Radial distortion: caused by the curved (circular) shape of the lens; points are displaced along the radius from the image center.
  * Tangential distortion: the image appears tilted and stretched because the lens elements are not perfectly aligned, or because the optical axis is not perfectly perpendicular to the sensor plane.
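A minimal sketch of the Brown-Conrady model as it is commonly written (e.g. in OpenCV), applied to normalized image coordinates $x = X/Z$, $y = Y/Z$ before the intrinsics $K$. It uses OpenCV's coefficient order $(k_1, k_2, p_1, p_2, k_3)$, which the ROS `plumb_bob` `D` array also follows; the coefficient values below are made up. For the actual sensor, pyrealsense2 reports the model and coefficients per stream in the intrinsics' `model` and `coeffs` fields, and librealsense distinguishes Brown-Conrady from an inverse variant, so check which one applies before reusing this.

```python
import numpy as np


def distort_brown_conrady(x: np.ndarray, y: np.ndarray, k1: float, k2: float,
                          p1: float, p2: float, k3: float = 0.0):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion to
    normalized image coordinates x = X/Z, y = Y/Z (before applying K)."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d


def to_pixels(x_d, y_d, fx, fy, cx, cy):
    """Map (distorted) normalized coordinates to pixels with the intrinsics K."""
    return fx * x_d + cx, fy * y_d + cy


# Made-up coefficients: mild barrel distortion (k1 < 0), no tangential part.
x = np.array([0.0, 0.2, 0.4])
y = np.zeros(3)
x_d, y_d = distort_brown_conrady(x, y, k1=-0.1, k2=0.0, p1=0.0, p2=0.0)
print(x_d)  # points move toward the centre, more so the farther out they are
```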

We will use the pyrender library to render images from a virtual camera while controlling its intrinsics.


```python
import matplotlib.pyplot as plt
import numpy as np
import pyrender
import trimesh
import ipywidgets as widgets
from ipywidgets import interact
from scipy.spatial.transform import Rotation

# inspired by https://pyrender.readthedocs.io/en/latest/examples/quickstart.html
model_path = 'assets/004_sugar_box/textured.obj'
trimesh_model = trimesh.load(model_path)
model = pyrender.Mesh.from_trimesh(trimesh_model)

scene = pyrender.Scene()
scene.add(model)  # mesh placed at the world origin (identity pose)

# Camera with initial intrinsics; the sliders below overwrite them.
camera = pyrender.IntrinsicsCamera(fx=640, fy=640, cx=200, cy=200)
camera_pose = np.eye(4)
camera_pose[:3, 3] = [0.3, 0.0, 0.35]
camera_pose[:3, :3] = Rotation.from_euler('xyz', [45, 0, 90], degrees=True).as_matrix()
scene.add(camera, pose=camera_pose)

# Spot light placed at the camera pose so the visible side of the object is lit.
light = pyrender.SpotLight(color=np.ones(3), intensity=50.0,
                           innerConeAngle=np.pi / 16.0,
                           outerConeAngle=np.pi / 6.0)
scene.add(light, pose=camera_pose)


def update_camera(f: float, mx: float, my: float, cx: float, cy: float,
                  height: int, width: int):
    # fx, fy: focal length divided by the pixel width / height
    camera.fx = f / mx
    camera.fy = f / my
    camera.cx = cx
    camera.cy = cy

    r = pyrender.OffscreenRenderer(width, height)
    color, depth = r.render(scene)
    r.delete()  # release the OpenGL context before the next slider update

    plt.figure()
    plt.subplot(1, 2, 1)
    plt.imshow(color)
    plt.title('color')
    plt.subplot(1, 2, 2)
    plt.imshow(depth, cmap=plt.cm.gray_r)
    plt.title('depth')
    plt.show()


interact(update_camera,
         f=widgets.FloatSlider(value=400, min=1, max=2000, step=0.01),
         mx=widgets.FloatSlider(value=1, min=0.1, max=100, step=0.01),
         my=widgets.FloatSlider(value=1, min=0.1, max=100, step=0.01),
         cx=widgets.FloatText(value=320),
         cy=widgets.FloatText(value=240),
         height=widgets.IntText(value=480),
         width=widgets.IntText(value=640));
```

## Stereo Reconstruction
<figure style="text-align:center">
    <img src="assets/realsense-stereo.png" />
    <figcaption > Stereo Reconstruction and Disparity Map </figcaption>
</figure>

### Epipolar Line Search and Disparity
* Our goal is to find matching pixels in the left and right images. Once we know a match, we can use similar triangles to estimate the depth.
* The projection ray $C_{left}-P$ appears in the right image plane as a line called the epipolar line (through $P_{right}$ and $e_{right}$).
* The intersection of the baseline (the line connecting $C_{left}$ and $C_{right}$) with each image plane is called an epipole ($e_{left}$, $e_{right}$).
* This is shown in the figure below.
<figure style="text-align:center">
    <img src="assets/EpipolarLineSearch.png" />
    <figcaption > Epipolar Line Search </figcaption>
</figure>

* This means that to find the pixel matching $P$ in the right view, we only need to search along the epipolar line corresponding to the ray $C_{left}-P$.

* In other words, the epipolar constraint reduces the search for a matching pixel from the whole 2D image to a 1D line.

* If the relative transformation between the two views is only a horizontal displacement (i.e. the cameras are aligned on the same baseline), then the projection of $C_{left}-P$ into the right view lies on the same pixel row as $P$ in the left image (the epipoles are at infinity because the baseline is parallel to the pixel rows).

* In practice, the two cameras are not exactly aligned on the same baseline but have a slight relative rotation. The images are therefore reprojected into views aligned with the baseline; this process is known as rectification.

* Note: In the RealSense ROS wrapper, the rectified image topics have the suffix `_rect_raw`.
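For rectified images, the epipolar search reduces to scanning a single pixel row. The sketch below matches one block from the left image against candidate blocks on the same row of the right image using the sum of absolute differences (SAD). It is a deliberately naive illustration of the idea, not how OpenCV's `StereoBM` or the D435 hardware matcher are implemented (those add pre-filtering, sub-pixel refinement, and validity checks); the block size, disparity range, and synthetic images are assumptions.

```python
import numpy as np


def block_match_row(left: np.ndarray, right: np.ndarray, row: int, col: int,
                    block: int = 5, max_disparity: int = 32) -> int:
    """Disparity of pixel (row, col) in the left image by searching the SAME row
    of the rectified right image, using the sum of absolute differences (SAD)."""
    half = block // 2
    ref = left[row - half:row + half + 1, col - half:col + half + 1].astype(np.float64)
    best_d, best_cost = 0, np.inf
    # A point at column `col` in the left image appears at `col - d` in the right one.
    for d in range(0, max_disparity + 1):
        c = col - d
        if c - half < 0:
            break  # candidate block would leave the image
        cand = right[row - half:row + half + 1, c - half:c + half + 1].astype(np.float64)
        cost = np.abs(ref - cand).sum()
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Synthetic rectified pair: random texture, right image = left shifted by 7 px.
rng = np.random.default_rng(0)
left = rng.integers(0, 256, size=(40, 80)).astype(np.uint8)
true_d = 7
right = np.zeros_like(left)
right[:, :-true_d] = left[:, true_d:]
print(block_match_row(left, right, row=20, col=50))
```

On a textureless patch every candidate block has the same cost, so the returned disparity is arbitrary (here it falls back to 0). This is why the projected laser pattern, which adds artificial texture, helps on plain surfaces.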




<figure style="text-align:center">
    <img src="assets/realsense-stereo.png" />
    <figcaption > (https://dev.intelrealsense.com/docs/stereo-depth-cameras-for-phones) </figcaption>
</figure>

## D435(i) Calibration
During calibration, we optimize a subset of the sensor parameters to improve the depth estimation. RealSense provides calibration through `realsense-viewer` and the Dynamic Calibration Tool. We will not cover them in detail in this course.
With realsense-viewer we can run the following calibrations:
* On Chip Calibration (stereo camera extrinsics)
* Focal length Calibration (focal length)
* Tare Calibration (stereo camera extrinsics)

For more details, see reference [9].

## Homework
### Note: Data for tasks 1 and 2 can be found [here](https://syncandshare.lrz.de/getlink/fiLmDyv8FXqFyN1X3hbhwazH/01-Realsense)

### 1. Stereo Reconstruction and laser pattern (workload 1 student):
In this exercise, we will look at stereo reconstruction and at how the laser pattern affects the depth quality.
1. Read the color, infrared1, and infrared2 images in the folder `Homework/HW-1-data` (image numbers 1262, 1755, 1131, and 0000).
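2. Use OpenCV Stereo Block Matching (`cv2.StereoBM_create`) to find the disparity map, then use the equation for depth to calculate the estimated depth map. Note that `compute` returns the disparity as fixed-point `int16` values scaled by 16. You can assume focal_length=970 px (the focal length must be in pixels when the disparity is in pixels) and baseline=50 mm.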
3. Use OpenCV to visualize the reconstructed depth image along with the infrared images using `cv2.imshow`.
4. How does the depth quality differ between
     1. planes with texture (checkerboard) vs. planes without texture (the PC case)?
     2. images with the laser pattern (1262, 1755) vs. without the laser pattern (0000, 1131)?

### 2. Object Twin (workload 3 students):
In this exercise, we will load a realsense-viewer rosbag recording, then use OpenCV and pyrender to create a twin of a moving checkerboard.
1. Loading color and depth data:
     * Use pyrealsense2 to read the bag file and acquire the color image, the depth image, the depth aligned to color, and the color and depth camera intrinsics. (Show the images in a loop using `cv2.imshow`.)
     
2. Checkerboard detection and tracking: 
     * The checkerboard has a `6x9` pattern where each square has an edge length of 4 cm.
     * Using OpenCV, find its corners (use `cv2.findChessboardCorners` and `cv2.cornerSubPix`), then use `cv2.drawChessboardCorners` to overlay the detections on the color image.
     * From the previous step, you will have 2D/3D correspondences for the corners. Use `cv2.solvePnP` to estimate the object to camera translation and rotation vectors.
     * *Extra:* Use OpenCV drawing utilities and the perspective projection function to draw a 3D axis and a cropping mask for the board. Useful functions include `cv2.line`, `cv2.projectPoints`, and `cv2.fillPoly`.
3. Modeling the checkerboard in pyrender:
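    * Using pyrender, create a scene with a camera and a box mesh corresponding to the checkerboard (for example, created with `trimesh.creation.box` and converted with `pyrender.Mesh.from_trimesh`).
    * Notes:
      1. You will need to scale the box and shift its center to match the checkerboard's 3D coordinate system in OpenCV.
      2. The OpenCV camera frame (x right, y down, z forward) differs from the pyrender camera frame (x right, y up, looking along -z), so converting poses between them requires a 180 degree rotation around the X-axis; depending on your implementation, the box itself may need an additional rotation.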
4. Visualization:
    * In the loop, update the mesh pose with the updated pose of the checkerboard
    * Compare the rendered depth values to the actual aligned depth values from the RealSense (convert both to the same unit, e.g. meters, using the depth scale).
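For step 2, `cv2.solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs)` needs the 3D corner coordinates in the board's own frame. A minimal sketch, assuming that `6x9` counts the inner corners (passed as `patternSize=(9, 6)` to `cv2.findChessboardCorners`; if it counts squares, the inner-corner grid is 5x8 instead). The board lies in the $z = 0$ plane and the corners are ordered row by row with x varying fastest, which is the order the detector usually returns, though it can flip with the board's orientation.

```python
import numpy as np


def checkerboard_object_points(cols: int, rows: int, square_size_m: float) -> np.ndarray:
    """3D inner-corner coordinates of a planar checkerboard in its own frame.

    The board lies in the z = 0 plane; corners are ordered row by row,
    with x (column index) varying fastest.
    """
    objp = np.zeros((rows * cols, 3), np.float32)
    objp[:, :2] = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2) * square_size_m
    return objp


obj = checkerboard_object_points(cols=9, rows=6, square_size_m=0.04)
print(obj.shape)
print(obj[:3])
```

Using metres here means the translation vector from `cv2.solvePnP` is also in metres, which matches the units pyrender renders depth in.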

## References and Resources
[1]. https://www.intelrealsense.com/stereo-depth-vision-basics/

[2]. https://dev.intelrealsense.com/docs/intel-realsensetm-d400-series-calibration-tools-user-guide

[3]. https://dev.intelrealsense.com/docs/whitepapers

[4]. https://docs.opencv.org/4.x/

[5]. https://pyrender.readthedocs.io/en/latest/examples/quickstart.html

[6]. https://wiki.ros.org/noetic

[7]. https://calib.io/blogs/knowledge-base/camera-models

[8]. https://web.stanford.edu/class/cs231a/course_notes/03-epipolar-geometry.pdf

[9]. https://dev.intelrealsense.com/docs/intel-realsensetm-d400-series-calibration-tools-user-guide