# "Depth Estimation and Object Detection"

> "Depth Estimation via stereo images and Object Detection"

- toc: true
- branch: master
- badges: false
- comments: true
- categories: [Computer Vision]
- hide: false
- search_exclude: false
- image: images/post-thumbnails/DepthOfCar.png

# Purpose

Use stereo vision to estimate the depth of objects in an image. Specifically, we will find out how far cars and people are from the camera.

**Dataset** [KITTI Dataset](http://www.cvlibs.net/datasets/kitti/)

**Output** 

![](https://abhisheksreesaila.github.io/blog/images/stereo/DepthOfCar.png "Depth of Objects")

> youtube: https://youtu.be/ewBLt1lZ2Ik


## Outline of the implementation

The code is available here

- Load Left and Right Images from [KITTI Dataset](http://www.cvlibs.net/datasets/kitti/)

- Compute **Disparity**. Refer [here](https://ablearn.io/computer%20vision/2021/12/07/StereoVision.html#Finding-Depth) for a definition and explanation
    - Apply the Stereo SGBM matcher algorithm to compute the disparity
    - We will obtain a disparity map
    - Each pixel in the disparity map holds the disparity value at that location
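- Read calibration parameters
    - P0 and P1 are the projection matrices for the grayscale cameras
    - P2 and P3 are the projection matrices for the color cameras
    - R0_rect is the rectifying rotation matrix
    - Tr_velo_to_cam and Tr_imu_to_velo are rigid-body transformation matrices (rotation and translation) between the sensors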
- Compute the **Depth Map**, a map that contains the depth Z for each pixel. Check out the explanation [here](https://ablearn.io/computer%20vision/2021/12/07/StereoVision.html#Finding-Depth)

    - From the calibration parameters we get the projection matrices. Decompose them to get the camera intrinsic matrix (K), the rotation matrix (R) and the translation vector (T) through a process called **QR Factorization**. Check out the explanation [here](https://ablearn.io/computer%20vision/2021/12/07/CameraCalibration.html#Decomposing-Projection-Matrix)
 
- Use the YOLO object detector to detect cars
    - You get bounding box coordinates. We need a single point that represents the object (not the 4 coordinates of the bounding box), so we take the center of the bounding box, which effectively represents the object.
  
- Build the pipeline and run it on an image
- Run the pipeline on the video

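The depth and bounding-box steps can be sketched with NumPy alone. The sketch below makes several assumptions: the helper names are hypothetical, the projection matrices use made-up values (f = 700 px, baseline = 0.5 m) rather than real KITTI calibration, and the box format `(x_min, y_min, x_max, y_max)` is assumed (YOLO implementations differ). It also reads the focal length and baseline straight from the rectified projection matrices instead of performing the full matrix decomposition mentioned in the outline.

```python
import numpy as np


def focal_and_baseline(p_left, p_right):
    """Focal length (pixels) and baseline (metres) from two rectified 3x4 projection matrices."""
    f = float(p_left[0, 0])
    # For a rectified camera, P[0, 3] = -f * b, where b is the camera's offset along x
    b_left = -p_left[0, 3] / p_left[0, 0]
    b_right = -p_right[0, 3] / p_right[0, 0]
    return f, float(abs(b_right - b_left))


def depth_from_disparity(disparity, f, baseline):
    """Z = f * B / d for every pixel with a positive disparity; inf elsewhere."""
    depth = np.full(disparity.shape, np.inf, dtype=np.float32)
    valid = disparity > 0
    depth[valid] = f * baseline / disparity[valid]
    return depth


def depth_at_box_center(depth, box):
    """Depth at the center of an (x_min, y_min, x_max, y_max) bounding box."""
    x_min, y_min, x_max, y_max = box
    cx = int((x_min + x_max) / 2)
    cy = int((y_min + y_max) / 2)
    return float(depth[cy, cx])


# Illustrative projection matrices (not real KITTI values)
p_left = np.array([[700.0, 0.0, 320.0, 0.0],
                   [0.0, 700.0, 240.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
p_right = p_left.copy()
p_right[0, 3] = -350.0  # -f * baseline = -700 * 0.5

f, baseline = focal_and_baseline(p_left, p_right)
disparity = np.full((480, 640), 35.0, dtype=np.float32)  # constant, for illustration
depth = depth_from_disparity(disparity, f, baseline)
distance = depth_at_box_center(depth, (300, 200, 340, 280))
print(f, baseline, distance)
```

A single center pixel can land on the background or on an invalid disparity, so taking the median depth over a small window around the center is a common, more robust variation.
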
### Bonus Section

Given the following

1. disparity map
2. camera matrices
3. baseline

- Use **cv2.stereoRectify** to get a perspective transformation matrix (the Q matrix)

- Use the Q matrix and **cv2.reprojectImageTo3D** to get points in 3D space (camera coordinates)

- Use a library such as "open3d" to visualize the point cloud

![](https://abhisheksreesaila.github.io/blog/images/stereo/3d-reconstruction.png "3D reconstruction")

See the OpenCV documentation [here](https://docs.opencv.org/3.4/d9/d0c/group__calib3d.html#ga1bc1152bd57d63bc524204f21fde6e02) for an explanation.
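
To make the role of the Q matrix concrete, here is a NumPy sketch of the arithmetic behind the reprojection step. It is not a replacement for **cv2.stereoRectify** or **cv2.reprojectImageTo3D**: the helpers `build_q_matrix` and `reproject_to_3d` are hypothetical, the camera values are made up, and the Q layout assumes an ideal rectified pair with a shared principal point and the OpenCV sign convention for the x-translation.

```python
import numpy as np


def build_q_matrix(f, cx, cy, baseline):
    """Q matrix for an ideal rectified pair whose cameras share one principal point."""
    tx = -baseline  # assumed OpenCV convention: x-translation of the second camera
    return np.array([
        [1.0, 0.0, 0.0, -cx],
        [0.0, 1.0, 0.0, -cy],
        [0.0, 0.0, 0.0, f],
        [0.0, 0.0, -1.0 / tx, 0.0],
    ])


def reproject_to_3d(disparity, q):
    """Map every pixel (x, y, d) to (X, Y, Z) via [X Y Z W]^T = Q [x y d 1]^T."""
    h, w = disparity.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([xs, ys, disparity, np.ones_like(disparity)], axis=-1)
    homogeneous = pixels @ q.T
    # Zero disparity gives W = 0, so suppress the divide warnings (result is inf/nan)
    with np.errstate(divide="ignore", invalid="ignore"):
        return homogeneous[..., :3] / homogeneous[..., 3:4]


# Illustrative values: f = 700 px, principal point (320, 240), baseline = 0.5 m
q = build_q_matrix(700.0, 320.0, 240.0, 0.5)
disparity = np.full((480, 640), 10.0)
points = reproject_to_3d(disparity, q)
print(points[240, 320])  # 3D point seen at the principal point
```

Flattening the result with `points.reshape(-1, 3)` gives an N×3 array of camera-frame points, which is the kind of input a point-cloud viewer such as open3d expects.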


# References

[Think Autonomous Course](https://courses.thinkautonomous.ai/stereo-vision)
