# Pseudo-LIDAR

# Overview
An approach for creating LIDAR-like point clouds from a single input image, without stereo images or a depth camera (applicable to Tesla).

## First Step:
- Perform monocular depth estimation and generate pseudo-LIDAR for the entire scene by lifting every pixel within the image into its 3D coordinate.
- Then train a LIDAR-based 3D detection network on the pseudo-LIDAR.
- Using the LIDAR-based 3D detector **Frustum PointNets**, we detect 2D object proposals in the input image and extract a point cloud frustum from the pseudo-LIDAR for each 2D proposal. An oriented 3D bounding box is then detected for each frustum.
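
The lifting step in the first bullet can be sketched with a pinhole camera model. This is a minimal illustration, not the original implementation: the function name `depth_to_pseudo_lidar` and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions introduced here, and the points are left in the camera frame (x right, y down, z forward).

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Lift every pixel of a depth map into camera-frame 3D coordinates.

    depth: (H, W) array of per-pixel depth along the optical axis.
    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths and
        principal point, in pixels).
    Returns an (H, W, 3) array holding one (x, y, z) point per pixel.
    """
    h, w = depth.shape
    # u indexes image columns, v indexes image rows; both have shape (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

# Toy example: a flat wall 10 units in front of the camera.
depth = np.full((4, 6), 10.0)
points = depth_to_pseudo_lidar(depth, fx=2.0, fy=2.0, cx=3.0, cy=2.0)
```

Keeping the image layout in the output (one point per pixel) makes it easy to select the points that belong to a 2D proposal later on.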

## Problems:
1. Depth estimation from a monocular image is inaccurate because of local misalignment, especially for distant objects.
2. The extracted point cloud always has a long tail because it is hard to estimate depth near the edge/periphery of an object. This means there are always extra points attributed to the object that do not actually belong to it.

## Solutions:
1. To solve local misalignment, we use a 2D-3D bounding box consistency constraint: when the 3D bounding box is projected onto the image, it should overlap with the detected 2D proposal. During training, we formulate the constraint as a bounding box consistency loss (BBCL) to supervise learning.
    - During testing, a bounding box consistency optimization (BBCO) subject to this constraint is solved with a global optimization method to further improve the predictions.

2. To deal with the long tail of points wrongly assigned to the object, we propose to use instance mask segmentation instead of 2D bounding boxes, because a mask defines the object pixel by pixel.
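
The consistency constraint in solution 1 can be sketched as follows: project the eight corners of the 3D box into the image, take the tightest rectangle around them, and penalize its distance to the detected 2D box. This is one plausible form of such a loss, written in NumPy for illustration; the helper names, the smooth L1 penalty, and the intrinsics matrix `K` are assumptions made here, not details taken from the notes above.

```python
import numpy as np

def project_to_image(corners_3d, K):
    """Project (8, 3) camera-frame corners to (8, 2) pixels with a 3x3 intrinsics matrix K."""
    uvw = corners_3d @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def enclosing_box(corners_2d):
    """Tightest axis-aligned rectangle (u_min, v_min, u_max, v_max) around the projected corners."""
    return np.concatenate([corners_2d.min(axis=0), corners_2d.max(axis=0)])

def smooth_l1(diff, beta=1.0):
    """Elementwise smooth L1: quadratic below beta, linear above."""
    a = np.abs(diff)
    return np.where(a < beta, 0.5 * a ** 2 / beta, a - 0.5 * beta)

def bbox_consistency_loss(corners_3d, K, box_2d):
    """Distance between the projected 3D box and the 2D proposal (assumed loss form)."""
    projected = enclosing_box(project_to_image(corners_3d, K))
    return float(smooth_l1(projected - np.asarray(box_2d, dtype=float)).sum())

# Toy example: a box spanning x, y in [-1, 1] and z in [10, 20].
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
corners = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (10, 20)], dtype=float)
loss_aligned = bbox_consistency_loss(corners, K, (40, 40, 60, 60))
loss_shifted = bbox_consistency_loss(corners, K, (40, 40, 60, 62))
```

At test time, the same quantity could serve as the objective that a BBCO-style search minimizes over the box parameters, though the optimizer itself is not sketched here.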


# Other Approaches
The 2D-3D bounding box consistency constraint is also used by models that predict 3D bounding boxes with 2D processing. For example, one proposal is to use 2D CNNs to predict a subset of the parameters, such as the object orientation and size. During testing, these estimates are combined with the constraint to compute the remaining parameters, such as the object center location.

# Pseudo-LiDAR Approach:

Goal: Estimate the 3D bounding boxes of objects from one RGB image.

Parameters for the 3D bounding box (7 in total):
- Object center: (x, y, z)
- Object size: (h, w, l)
- Heading angle: theta
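
To make the 7-parameter box concrete, here is a small sketch that turns the parameters into the eight box corners. The axis convention is an assumption chosen for illustration (camera frame with y vertical, length along x, height along y, width along z, heading as a rotation about the y axis, and (x, y, z) as the geometric center); datasets differ on these details, and the class name `Box3D` is hypothetical.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Box3D:
    x: float
    y: float
    z: float
    h: float
    w: float
    l: float
    theta: float

    def corners(self):
        """Return the (8, 3) corner coordinates of the oriented box."""
        # Corner offsets in the box's own frame (assumed axis convention).
        dx = self.l / 2 * np.array([1, 1, -1, -1, 1, 1, -1, -1])
        dy = self.h / 2 * np.array([1, 1, 1, 1, -1, -1, -1, -1])
        dz = self.w / 2 * np.array([1, -1, -1, 1, 1, -1, -1, 1])
        c, s = np.cos(self.theta), np.sin(self.theta)
        # Rotation about the vertical (y) axis by the heading angle.
        rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
        center = np.array([self.x, self.y, self.z])
        return (rot @ np.vstack([dx, dy, dz])).T + center

# A 4-long, 2-wide, 2-high box centered 10 units in front of the camera.
box = Box3D(x=0.0, y=0.0, z=10.0, h=2.0, w=2.0, l=4.0, theta=0.0)
corners = box.corners()
```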



![Pseudo LiDAR](PseudoLiDAR.png)

## Approach
The input image is passed into two modules simultaneously:
- Pseudo-LiDAR generator
- 2D instance mask proposal detection (a proposal loss is used to train this part of the network)
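
The difference between selecting points with a 2D box and with an instance mask can be seen on a toy scene. The sketch below assumes the pseudo-LiDAR is stored as an (H, W, 3) array with one point per pixel; the function names and the synthetic depths (object at 10, background at 40) are made up for illustration.

```python
import numpy as np

def select_by_box(point_map, box):
    """Points whose pixels fall inside a 2D box (u_min, v_min, u_max, v_max), max edges exclusive."""
    u_min, v_min, u_max, v_max = box
    return point_map[v_min:v_max, u_min:u_max].reshape(-1, 3)

def select_by_mask(point_map, mask):
    """Points whose pixels are marked True in a boolean (H, W) instance mask."""
    return point_map[mask]

# Toy scene: background at depth 40 everywhere.
point_map = np.zeros((6, 6, 3))
point_map[..., 2] = 40.0

# The object fills a 4x4 region except for its four corner pixels.
mask = np.zeros((6, 6), dtype=bool)
mask[1:5, 1:5] = True
mask[[1, 1, 4, 4], [1, 4, 1, 4]] = False
point_map[mask, 2] = 10.0

box_points = select_by_box(point_map, (1, 1, 5, 5))   # 16 points, 4 at background depth
mask_points = select_by_mask(point_map, mask)         # 12 points, all on the object
```

The four background points picked up by the box are the kind of stray points that form the long tail; the mask leaves them out.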

The outputs of both modules are fed into Frustum PointNet, which performs 3D point cloud segmentation and is trained with a 3D segmentation loss. The segmented point cloud is then passed into the 3D box estimation module and the 3D box correction module simultaneously.

The 3D box estimation module outputs the 7 box parameters, which are added to the 7 parameters output by the correction module to form the final estimate. The final estimate is then projected onto the image.
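
The way the two outputs combine can be sketched as a plain element-wise sum over the 7 parameters. The function name `refine_box` is hypothetical, and wrapping the heading angle back into [-pi, pi) is an extra step assumed here, not something stated above.

```python
import math

def refine_box(estimate, correction):
    """Add the correction module's output to the initial box estimate.

    Both arguments are 7-tuples ordered as (x, y, z, h, w, l, theta).
    """
    refined = [e + c for e, c in zip(estimate, correction)]
    # Assumed extra step: keep the heading angle in [-pi, pi).
    refined[6] = (refined[6] + math.pi) % (2 * math.pi) - math.pi
    return tuple(refined)

initial = (1.0, 2.0, 10.0, 1.5, 1.6, 4.0, 3.0)
residual = (0.1, -0.2, 0.5, 0.0, 0.0, 0.1, 0.3)
final = refine_box(initial, residual)
```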

# Monocular Depth Estimation:

The DORN network comes with pre-trained weights and estimates monocular depth from a single RGB image. We do not update its weights, so it can be thought of as an offline module.