# 3D Object Detection via Sensor Fusion (Lidar and Camera) 

Originally written by: **PixelOverflow**

Video versions of this tutorial are available at the following links:
### Sensor Fusion Tutorial:

- Part 1 - [3D Object Detection Overview](https://www.youtube.com/watch?v=hXpXKRnnM9o&t=0s)
- Part 2 - [Coordinate Transformations](https://www.youtube.com/watch?v=EfiYr61RGUA&t=0s) 
- Part 3 - [Loading Calibration Data](https://www.youtube.com/watch?v=pRAPXfWy-3A&t=0s)     
- Part 4 - [Sensor Fusion Pipeline](https://www.youtube.com/watch?v=vVtpKzEwEFM&t=0s)  
- Part 5 - [Check the Math](https://www.youtube.com/watch?v=lpjQnIrnt20&t=0s)  

In this tutorial we will dive into the KITTI dataset and detect objects in 3D using sensor fusion. Early Sensor Fusion, or Early Fusion, fuses raw data from multiple sources and then performs detection on the fused data. Late Fusion, on the other hand, first detects objects and then fuses the detections. In this case we will perform a modified fusion, where we detect objects in the camera images and then fuse their centers with the LiDAR data to get depth.

The main steps are:

- Detect objects in the camera images (Detection)
- Project 3D LiDAR point clouds to 2D image space (Fusion)
- Associate LiDAR depth with each detected object (Association to get Depth)

Detection in 3D is much more useful to an autonomous vehicle than detection in 2D, since it lets the system know where objects are physically located in the world.
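As a rough illustration of the association step, the sketch below assumes the LiDAR points have already been projected into the image as `(u, v, z)` triples and that the detector has produced bounding-box centers in pixels. The function name `depth_at_centers` and the toy numbers are hypothetical, and a nearest-point lookup is only one simple way to assign depth; it is not necessarily what the tutorial's utility code does.

```python
import numpy as np

def depth_at_centers(uvz, centers):
    """Return the depth of the nearest projected LiDAR point for each center.

    uvz:     (N, 3) array of projected LiDAR points: pixel u, pixel v, depth z (m)
    centers: (M, 2) array of detection centers in pixels (u, v)
    """
    # Pairwise pixel offsets between every center and every projected point: (M, N, 2)
    diffs = centers[:, None, :] - uvz[None, :, :2]
    # Euclidean pixel distance for each (center, point) pair: (M, N)
    dists = np.linalg.norm(diffs, axis=2)
    # Index of the closest projected point for each center
    nearest = np.argmin(dists, axis=1)
    return uvz[nearest, 2]

# Toy data (made up for illustration, not taken from KITTI)
uvz = np.array([[100.0, 200.0, 12.5],
                [400.0, 180.0, 30.0],
                [640.0, 220.0, 7.2]])
centers = np.array([[105.0, 198.0],
                    [630.0, 225.0]])

depths = depth_at_centers(uvz, centers)
print(depths)  # depth in meters for each detection center
```

Taking a single nearest point is sensitive to noise; a median over the points that fall inside each bounding box would be a more robust variant of the same idea.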


For more information, see the KITTI data [README](https://github.com/yanii/kitti-pcl/blob/master/KITTI_README.TXT) and the [paper](http://www.cvlibs.net/publications/Geiger2013IJRR.pdf) that details the data collection and coordinate systems.


Now let's get the data and get started.

### Data preparation

In [5]:
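# Download the synced drive data and the calibration files for the same date
!wget https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_10_03_drive_0047/2011_10_03_drive_0047_sync.zip
!wget https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_10_03_calib.zip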


In [1]:
# Unzip both archives
!unzip 2011_10_03_drive_0047_sync.zip
!unzip 2011_10_03_calib.zip

### Base Library Import

In [2]:
# Base Library Import
import os
from glob import glob
import cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams["figure.figsize"] = (20, 10)

### Import KITTI Utility functions

In [2]:
!wget https://github.com/itberrios/CV_tracking/raw/main/kitti_tracker/kitti_utils.py
from kitti_utils import *

### Data Overview
In the KITTI raw dataset we get images from four cameras (two grayscale and two RGB), point clouds from the Velodyne LiDAR, and navigation data from the OXTS GPS/IMU system.

The update rates are as follows:

- RGB camera: 15 Hz (15 fps)
- OXTS GPS navigation system: 100Hz
- Velodyne LiDAR: 10Hz

The data is synced to the LiDAR, since it has the lowest update rate, but the sync between the camera, GPS/IMU (navigation), and LiDAR is not exact, even though we are using the synced raw data. Per the KITTI [description](http://www.cvlibs.net/publications/Geiger2013IJRR.pdf), the worst-case time difference between the camera/Velodyne and GPS/IMU is at most 5 ms. More precise measurements can be obtained with interpolation, but for simplicity we will neglect these differences, since the small error from the imprecise sync will not greatly impact our measurements. We will see later, when we project LiDAR points onto the camera images, that there is no noticeable difference.
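For readers who do want to interpolate, the sketch below shows the general idea with `np.interp`. It assumes you have per-sample timestamps for the navigation data and for the LiDAR frames; the timestamps and speed values here are invented for illustration and are not read from the dataset.

```python
import numpy as np

# Hypothetical timestamps in seconds (not taken from the dataset)
oxts_t = np.array([0.00, 0.01, 0.02, 0.03])      # 100 Hz navigation samples
oxts_speed = np.array([10.0, 10.2, 10.4, 10.6])  # e.g. forward velocity in m/s
lidar_t = np.array([0.004, 0.025])               # times at which a value is wanted

# Linearly interpolate the navigation signal at the LiDAR timestamps
speed_at_lidar = np.interp(lidar_t, oxts_t, oxts_speed)
print(speed_at_lidar)
```

Linear interpolation is reasonable for slowly varying quantities such as velocity over a few milliseconds; angles that wrap around would need extra care.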

Now let's get the paths to all of the data files. The RGB images are standard .png files and the navigation frames are .txt files, but the LiDAR point clouds are binary files. The [KITTI README](https://github.com/yanii/kitti-pcl/blob/master/KITTI_README.TXT) describes the structure of the binary files, and we will import a utility function to handle them.

In [8]:
PATH_DATA = r'2011_10_03/2011_10_03_drive_0047_sync'

# Get RGB camera data
left_image_paths = sorted(glob(os.path.join(PATH_DATA, 'image_02/data/*.png')))
right_image_paths = sorted(glob(os.path.join(PATH_DATA, 'image_03/data/*.png')))

# Get LiDAR data
bin_paths = sorted(glob(os.path.join(PATH_DATA, 'velodyne_points/data/*.bin')))

# Get GPS/IMU data
oxts_path = sorted(glob(os.path.join(PATH_DATA, 'oxts/data/*.txt')))

# Report the count for each sensor separately so a mismatch is visible
print(f"Number of left images: {len(left_image_paths)}")
print(f"Number of right images: {len(right_image_paths)}")
print(f"Number of LiDAR point clouds: {len(bin_paths)}")
print(f"Number of GPS/IMU frames: {len(oxts_path)}")

Number of left images: 837
Number of right images: 837
Number of LiDAR point clouds: 837
Number of GPS/IMU frames: 837
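To make the binary LiDAR format concrete: per the KITTI README, each Velodyne scan is a flat sequence of 32-bit floats, four per point (x, y, z, reflectance). The sketch below reads such a file with NumPy. The function name `read_velodyne_bin` is hypothetical and the utility imported from `kitti_utils` may work differently; the example round-trips a synthetic two-point scan rather than a real file.

```python
import os
import tempfile
import numpy as np

def read_velodyne_bin(path):
    """Load a KITTI Velodyne scan as an (N, 4) float32 array: x, y, z, reflectance."""
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

# Synthetic two-point scan used in place of a real KITTI file
fake_scan = np.array([[1.0, 2.0, 3.0, 0.5],
                      [4.0, 5.0, 6.0, 0.1]], dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "0000000000.bin")
    fake_scan.tofile(path)           # write raw float32 values, no header
    scan = read_velodyne_bin(path)   # read them back as (N, 4)

print(scan.shape)
```

With the real data, the same function could be called on any entry of `bin_paths`, and `scan[:, :3]` would give the 3D coordinates in the LiDAR frame.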


### Camera/LiDAR/IMU Data

To understand the code that follows, it helps to cover the different reference frames we will be working with and how to convert between them. The camera, LiDAR, and IMU are mounted at different positions on the vehicle, and each has its own reference frame. In autonomous driving research, the vehicle that collects the perception data (camera/LiDAR) is usually called the ego vehicle.

- Camera

    - x → right
    - y → down
    - z → forward

- LiDAR

    - x → forward
    - y → left
    - z → up

- IMU

    - x → forward
    - y → left
    - z → up
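The axis conventions listed above imply a fixed axis swap between the LiDAR and camera frames. The sketch below applies only that swap; it deliberately ignores the translation between the sensors and the small calibrated rotation from the KITTI calibration files, so it is an approximation for building intuition. The names `R_AXES` and `lidar_to_camera_axes` are hypothetical.

```python
import numpy as np

# Rows express each camera axis in terms of the LiDAR axes:
#   camera x (right)   = -LiDAR y (left)
#   camera y (down)    = -LiDAR z (up)
#   camera z (forward) =  LiDAR x (forward)
R_AXES = np.array([[0.0, -1.0,  0.0],
                   [0.0,  0.0, -1.0],
                   [1.0,  0.0,  0.0]])

def lidar_to_camera_axes(points_lidar):
    """Re-express (N, 3) LiDAR-frame points using the camera axis convention.

    Axis swap only: no sensor offset and no calibrated rotation is applied.
    """
    return points_lidar @ R_AXES.T

# A point 10 m ahead, 2 m to the left, and 1 m below the LiDAR
p_lidar = np.array([[10.0, 2.0, -1.0]])
p_cam = lidar_to_camera_axes(p_lidar)
print(p_cam)  # camera convention: x = -2 (left of center), y = 1 (down), z = 10 (forward)
```

Since the IMU uses the same axis convention as the LiDAR, going between those two frames needs no axis swap, only the calibrated rotation and translation.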