# 3D object detection

### The chosen model and justification for its selection.
  - We chose [SFA3D](https://github.com/maudzung/SFA3D) for the 3D object detection task.
  - SFA3D focuses on 3D object detection using LiDAR point clouds.
  - The method does not use Non-Max Suppression (NMS), which is computationally expensive. This makes it well suited to real-time 3D object detection applications such as our vehicle detection task.
  - The model input is simple: a bird's-eye view (BEV) map encoded with height, intensity, and density features extracted from the raw LiDAR data.
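
Keypoint-style detectors commonly avoid box-level NMS by keeping only the local maxima of a predicted center heatmap. The following is a minimal sketch of that idea, not the exact SFA3D code: the function name `heatmap_peaks`, the 3x3 window, and the score threshold are assumptions made for illustration.

```python
import numpy as np


def heatmap_peaks(heatmap, threshold=0.2):
    """Return (row, col, score) for cells that are the maximum of their 3x3 window.

    Hypothetical helper: keeps a cell only if no neighbour has a higher score
    and the score reaches `threshold` (an assumed value).
    """
    h, w = heatmap.shape
    # Pad with -inf so border cells are compared only with real neighbours.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Element-wise maximum over the nine shifted views of the padded map.
    shifted = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    window_max = np.max(shifted, axis=0)
    keep = (heatmap == window_max) & (heatmap >= threshold)
    rows, cols = np.nonzero(keep)
    return [(int(r), int(c), float(heatmap[r, c])) for r, c in zip(rows, cols)]
```

Because each object is represented by a single peak, no pairwise box comparison is needed, which is where the speed advantage over NMS comes from.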
  

 





## Model configuration.
- Feature Pyramid Network (FPN) with ResNet: the model uses an FPN built on a ResNet backbone to process the BEV-encoded LiDAR data. The hierarchical feature pyramid helps detect objects across a range of scales and sizes.
- No Non-Max Suppression (NMS): unlike typical detection frameworks that use NMS to filter out overlapping boxes, SFA3D skips this step to speed up detection.
- The model takes a BEV map as input and returns 3D bounding box information: class_scores, x, y, z, length, width, height, and yaw.
- The predictions are returned in BEV coordinates; we then scale them to LiDAR coordinates.
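
The scaling from BEV coordinates to LiDAR coordinates can be sketched as follows. This is an illustration under stated assumptions rather than the project's actual code: the function name `bev_to_lidar` is hypothetical, the 608-pixel map is assumed to cover 0 to 50 m along x and -25 to 25 m along y, and BEV rows are assumed to map to the LiDAR x axis. Quantities that are already metric (z, height, yaw) are left out of the sketch.

```python
# Assumed BEV geometry (hypothetical values; adjust to the actual configuration).
BEV_SIZE = 608              # pixels per side, as stated in this report
X_MIN, X_MAX = 0.0, 50.0    # metres along the forward axis (assumed)
Y_MIN, Y_MAX = -25.0, 25.0  # metres along the lateral axis (assumed)


def bev_to_lidar(row, col, length_px, width_px):
    """Map a BEV pixel position and box footprint to metric LiDAR coordinates."""
    metres_per_px_x = (X_MAX - X_MIN) / BEV_SIZE
    metres_per_px_y = (Y_MAX - Y_MIN) / BEV_SIZE
    # Positions are scaled and then shifted by the lower bound of each axis.
    x = row * metres_per_px_x + X_MIN
    y = col * metres_per_px_y + Y_MIN
    # Box extents only need scaling, not shifting.
    length = length_px * metres_per_px_x
    width = width_px * metres_per_px_y
    return x, y, length, width
```

For example, under these assumptions the center pixel `(304, 304)` maps to roughly 25 m ahead of the sensor and 0 m laterally.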



 


## Overview of Bird's Eye View (BEV) calculation.
- The model uses a BEV map to represent the LiDAR point cloud. The BEV map is a 2D, top-down representation of the 3D point cloud.
- The BEV map is created by discretizing the 3D space into a grid and projecting the LiDAR points onto it. Each cell holds information about the points that fall within it, such as height, intensity, and density.
- Roughly, the BEV computation can be broken down into the following steps:
  - Filtering points: discard points outside the specified x, y, and z limits.
  - Normalization: shift the z-coordinates by subtracting the minimum z limit, so that heights are measured from the bottom of the region of interest.
  - Discretization: convert x and y coordinates to BEV map indices. The y-coordinates are offset so that no index is negative.
  - Channel computation: calculate the height, density, and intensity channels.
  - The returned BEV map has shape 608x608x3.
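
The steps above can be sketched with NumPy as follows. This is a simplified illustration, not the code used to produce `bev_maps_v1.pkl`: the function name `make_bev_map`, the axis limits, the channel order, the log-based density normalization, and the use of the per-cell maximum intensity are all assumptions.

```python
import numpy as np


def make_bev_map(points, x_lim=(0.0, 50.0), y_lim=(-25.0, 25.0),
                 z_lim=(-2.73, 1.27), size=608):
    """Build a (size, size, 3) BEV map from an (N, 4) array of x, y, z, intensity."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # 1. Filtering: keep only points inside the region of interest.
    mask = ((x >= x_lim[0]) & (x < x_lim[1]) &
            (y >= y_lim[0]) & (y < y_lim[1]) &
            (z >= z_lim[0]) & (z < z_lim[1]))
    pts = points[mask].astype(np.float64)
    # 2. Normalization: measure height from the lower z limit.
    pts[:, 2] -= z_lim[0]
    # 3. Discretization: metres to grid indices; y is shifted to be non-negative.
    rows = np.floor((pts[:, 0] - x_lim[0]) / ((x_lim[1] - x_lim[0]) / size))
    cols = np.floor((pts[:, 1] - y_lim[0]) / ((y_lim[1] - y_lim[0]) / size))
    rows = np.clip(rows, 0, size - 1).astype(np.int64)
    cols = np.clip(cols, 0, size - 1).astype(np.int64)
    # 4. Channels: per-cell maximum height, maximum intensity, and point density.
    height = np.zeros((size, size))
    intensity = np.zeros((size, size))
    counts = np.zeros((size, size))
    np.maximum.at(height, (rows, cols), pts[:, 2] / (z_lim[1] - z_lim[0]))
    np.maximum.at(intensity, (rows, cols), pts[:, 3])
    np.add.at(counts, (rows, cols), 1.0)
    density = np.minimum(1.0, np.log(counts + 1.0) / np.log(64.0))
    # Assumed channel order: intensity, height, density.
    return np.stack([intensity, height, density], axis=-1)
```

`np.maximum.at` and `np.add.at` are used instead of plain fancy-index assignment because several points can fall into the same cell, and unbuffered updates are needed to accumulate them correctly.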



# Performance metrics attained on the provided dataset.
- The metrics are computed at an IoU threshold of 0.5.
- Only detections within the x range of 0 to 50 meters are considered.
- Each value is the average over all frames in the dataset for the car class.


- `Precision: 0.6959`
- `Recall: 0.6724`
- `F1 Score: 0.6806`
- `Average IoU (bounding box overlap): 0.4320`
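
For readers unfamiliar with these metrics, the sketch below shows one way to compute per-frame precision, recall, and F1 from an IoU threshold. It is a simplified illustration and may differ from the evaluation in `3d_object_detection.py`: the function names are hypothetical, boxes are treated as axis-aligned BEV rectangles `(x_min, y_min, x_max, y_max)` with yaw ignored, and matching is greedy.

```python
def iou_axis_aligned(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def frame_metrics(predictions, ground_truths, iou_threshold=0.5):
    """Greedily match predictions to ground truths; return (precision, recall, f1)."""
    matched = set()
    true_positives = 0
    for pred in predictions:
        best_iou, best_index = 0.0, None
        for index, gt in enumerate(ground_truths):
            if index in matched:
                continue  # each ground-truth box can be matched only once
            iou = iou_axis_aligned(pred, gt)
            if iou > best_iou:
                best_iou, best_index = iou, index
        if best_index is not None and best_iou >= iou_threshold:
            matched.add(best_index)
            true_positives += 1
    false_positives = len(predictions) - true_positives
    false_negatives = len(ground_truths) - true_positives
    precision = true_positives / (true_positives + false_positives) if predictions else 0.0
    recall = true_positives / (true_positives + false_negatives) if ground_truths else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1
```

Since the reported values are averages of per-frame scores, the reported F1 score does not have to equal the harmonic mean of the reported precision and recall.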


# Instructions to run the code

 - `git clone git@github.com:Harighs/Autonomous_vehicles_ue_2.git`
 - `cd Autonomous_vehicles_ue_2/SFA3D`
 - `pip install -r requirements.txt`
 - `cd sfa`
 - We provide the BEV maps as a `.pkl` file, which can be downloaded from https://drive.google.com/file/d/1QwRw2PNACQQQuhhbGUw3f2RoqwmRPS3c/view?usp=sharing
 - Place `bev_maps_v1.pkl` inside the `sfa` folder.
 - Run the following command to evaluate the model on the provided dataset:
   - `python 3d_object_detection.py --data_path /path/to/bev_maps_v1.pkl --image_path /path/to/images`
 - Set `--data_path` to the location of `bev_maps_v1.pkl` and `--image_path` to the directory where the Waymo dataset is stored in `.pb` format.
 - By default, running `3d_object_detection.py` prints the performance metrics attained on the provided dataset.
   - To visualize the detections on the images, such as BEV predictions or tracking for every frame, you can comment out the relevant lines of code. The instructions are provided in `3d_object_detection.py`.
 - The prediction results for every frame are also provided as a video, `output.mp4`.
                 


# Contributions

| Name                    | Student ID | Task                                                                                              |
|-------------------------|------------|---------------------------------------------------------------------------------------------------|
| Ariharasudhan Muthusami | K52008888  | 3D object detection and evaluation, performance metrics (3d_object_detection.py), report writing. |
| Harishankar Govindasamy | K11931161  |                                                                                                   |
| Ayman Kamel             | K12136508  |                                                                                                   |
| Jonathan Uyi Ehiosu     | K01628444  |                                                                                                   |
 
