

# ML Accelerator for Autonomous Driving Algorithm

Ritarka Samanta, Connor Talley, Minseung Jung, Gayoung Nam, Dinyar Islam, Owen Kew

Advisor: Dr. Callie Hao



#### Motivation

Object tracking algorithms are quite slow.

High-level synthesis (HLS) and FPGAs can be used to design complex hardware.

Goal: use FPGAs to accelerate the object detection algorithm.



- Detect multiple objects in an HD frame
- Track objects across frames
- Accurately, efficiently, and in real time!

| Dataset | FPS   |
|---------|-------|
| MOT17   | 25-30 |
| MOT20   | 25    |
| BDD100K | 5     |
| Waymo   | 10    |
| TAO     | 1     |
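To make the real-time goal concrete, the frame rates in the table can be turned into a per-frame time budget. The sketch below assumes the FPS column is the rate at which frames must be processed for each dataset; the function names are hypothetical and only illustrate the arithmetic.

```python
# Frame rates (FPS) copied from the table above.
# MOT17 is listed as a range, so each entry is stored as (low, high).
DATASET_FPS = {
    "MOT17": (25, 30),
    "MOT20": (25, 25),
    "BDD100K": (5, 5),
    "Waymo": (10, 10),
    "TAO": (1, 1),
}


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def tightest_budget_ms(fps_range) -> float:
    """Budget at the upper end of the FPS range, i.e. the hardest case."""
    return frame_budget_ms(max(fps_range))


if __name__ == "__main__":
    for name, fps_range in DATASET_FPS.items():
        print(f"{name}: {tightest_budget_ms(fps_range):.1f} ms per frame")
```

Under this assumption, a 25 FPS dataset leaves 40 ms to detect and track all objects in one frame.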









## ResNet50 Per-Layer Latency

Run on Xilinx ZCU

| Layer Name | Latency (ns) | BRAM            | DSP      | FF          | LUT        |
|------------|--------------|-----------------|----------|-------------|------------|
| 0.0.1      | 3.117e+12    | 5488 (300%)     | 26 (1%)  | 3273 (~0%)  | 7012 (2%)  |
| 0.0.2      | 2.894e+10    | 5209 (300%)     | 13 (~0%) | 4523 (~0%)  | 9823 (3%)  |
| 1.0.0      | 5.529e+09    | 214130 (11739%) | 15 (~0%) | 10142 (1%)  | 10754 (3%) |
| 1.0.1      | 1.448e+09    | 427570 (23441%) | 17 (~0%) | 10275 (1%)  | 9885 (3%)  |
| 1.0.2      | 2.452e+09    | 322349 (17672%) | 17 (~0%) | 10242 (1%)  | 10235 (3%) |
| ...        | ...          | ...             | ...      | ...         | ...        |
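The latencies above are easier to interpret in seconds. The following sketch converts the listed values and sums them, assuming the layers execute one after another. Because the table is truncated, the total covers only the five rows shown and is a lower bound for the full network. The helper names are hypothetical.

```python
# Latencies in nanoseconds, copied from the rows shown in the table.
# The table is truncated, so this is only a subset of the ResNet50 layers.
LAYER_LATENCY_NS = {
    "0.0.1": 3.117e12,
    "0.0.2": 2.894e10,
    "1.0.0": 5.529e9,
    "1.0.1": 1.448e9,
    "1.0.2": 2.452e9,
}


def ns_to_s(latency_ns: float) -> float:
    """Convert a latency from nanoseconds to seconds."""
    return latency_ns * 1e-9


def total_latency_s(latencies_ns: dict) -> float:
    """Sum of layer latencies in seconds, assuming sequential execution."""
    return sum(ns_to_s(v) for v in latencies_ns.values())


def slowest_layer(latencies_ns: dict) -> str:
    """Name of the layer with the largest latency."""
    return max(latencies_ns, key=latencies_ns.get)


if __name__ == "__main__":
    for name, ns in LAYER_LATENCY_NS.items():
        print(f"layer {name}: {ns_to_s(ns):.3f} s")
    print(f"listed layers total: {total_latency_s(LAYER_LATENCY_NS):.3f} s")
    print(f"slowest listed layer: {slowest_layer(LAYER_LATENCY_NS)}")
```

Read this way, layer 0.0.1 alone takes roughly 3117 s, which dominates the listed rows and is far from any per-frame real-time budget.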

### Live Inference for QDTrack









#### **Future Work**

1. Reduce BRAM utilization
2. Implement inter-FPGA communication
3. Synthesize the entire algorithm into hardware



