Custom YOLOv4 (You Only Look Once) model for apple recognition (clean/damaged) on an Alveo U280 accelerator card, using the Vitis AI framework.
A deep-learning model is characterized by two distinct computation-intensive processes: training and inference. During the training step, the model is taught to perform a specific task. Inference, on the other hand, is the deployment of the trained model to perform on new data. Real-time inference of deep neural network (DNN) models is a major challenge for industry, given the growth of latency-constrained applications. For this reason, accelerating inference has become more critical than speeding up training. While training is most often carried out on GPUs, thanks to their high throughput, massive parallelism, simple control flow, and energy efficiency, FPGAs (Field Programmable Gate Arrays) are better suited to AI inference, providing better performance per watt than GPUs thanks to their flexible hardware configuration.
An important axis of research is the deployment of AI models on embedded platforms. To achieve this, along with smaller neural network architectures, techniques such as quantization and pruning make it possible to reduce the size of existing architectures without losing much accuracy, minimizing the hardware footprint and energy consumption of the target board. These techniques perform especially well on FPGAs compared to GPUs.
One significant obstacle to combining AI inference with hardware acceleration is the expertise required in both domains, especially for low-level development on accelerator cards. Fortunately, some frameworks make the hardware more accessible to software engineers and data scientists. With Xilinx's Vitis AI toolset, we can quite easily deploy models from Keras-TensorFlow straight onto FPGAs.
Vitis™ is a unified software platform for developing embedded software and accelerated applications on Xilinx® hardware platforms, whether for Edge, Cloud, or hybrid computing. The application code can be developed in high-level programming languages such as C++ and Python.
Vitis™ AI is a development environment whose purpose is to accelerate AI inference. Thanks to optimized IP cores and tools, it supports pre-compiled as well as custom AI models, and provides libraries to accelerate the application by interacting with the processing unit of the target platform. With Vitis AI, users can develop deep-learning inference applications without strong FPGA knowledge.
We chose to use the Vitis AI TensorFlow framework. For more information on Vitis AI, please refer to the official user guide.
In our case, the hardware platform is an Alveo™ Data Center Accelerator Card, a Cloud FPGA device designed to accelerate the computing workloads of deep-learning inference algorithms. Its processing unit is called a Deep-Learning Processor Unit (DPU): a group of parameterizable IP cores pre-implemented on the hardware, optimized for deep neural networks and compatible with the Vitis AI specialized instruction set. Different versions exist, offering different levels of throughput, latency, scalability, and power. The Alveo U280 Data Center Accelerator Card supports the Xilinx DPUCAHX8H DPU, optimized for high-throughput applications involving Convolutional Neural Networks (CNNs). It is composed of a high-performance scheduler module, a hybrid computing array module, an instruction fetch unit module, and a global memory pool module.
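To give an idea of how a model is targeted at this DPU: the Vitis AI compiler selects the DPU through an arch file shipped in the Docker image. Below is a minimal sketch, assuming the Vitis AI 1.3 image layout; the input/output file names are placeholders, not the exact ones used by our scripts.

```bash
# Sketch: compile a quantized TensorFlow graph for the U280's DPUCAHX8H.
# File names are placeholders; the arch.json path follows the Vitis AI 1.3 image layout.
vai_c_tensorflow \
    --frozen_pb ./quantize_results/quantize_eval_model.pb \
    --arch /opt/vitis_ai/compiler/arch/DPUCAHX8H/U280/arch.json \
    --output_dir ./compile_results \
    --net_name yolov4_apples
```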
A YOLOv4 model is able to detect objects in images through bounding boxes, classify the objects among a predefined list of classes, and attribute a confidence score to each prediction. Please read this article and this one too to better understand the concept.
The original Darknet model was built by following this tutorial. To implement your own custom model, make your changes according to the section "Create your custom config file and upload it to your drive".
Our model was trained to detect apples in images and determine whether they are clean or damaged. The classes are written in this file and the anchors here.
To build the dataset, we used this scraper.
To annotate the samples, we used this GitHub project by developer0hye. The annotations follow the template "image_name class_label x_top_left y_top_left width height", one object per line; for example, a line such as "apple_001.jpg 0 154 76 210 245" (values invented for illustration) describes one bounding box.
To make the model fit the accelerator card, we had to change the MaxPool kernel sizes and convert the mish activations to leaky ReLU. Our changes are based on this tutorial. The '.cfg' file can be found here and the '.weights' can be downloaded here.
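For reference, the activation change can be expressed as a one-line text substitution on the Darknet config. This is a sketch assuming the file name below, not the exact command we ran; the MaxPool sizes were edited by hand in the '.cfg' file.

```bash
# Sketch: replace every mish activation with leaky ReLU in the Darknet config.
# The .cfg path is a placeholder for the actual model file.
sed -i 's/activation=mish/activation=leaky/g' ./model/darknet/yolov4.cfg
```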
Before running the project, check the requirements from Vitis AI and make sure to complete the following steps:
Weights file:
🠊 Please download the weights of the trained YOLOv4 model here. Place the file in the /model/darknet folder, alongside the '.cfg' Darknet model.
Dataset folder:
🠊 Please unzip the dataset folder.
Versions:
- Docker: 20.10.6
- Docker Vitis AI image: 1.3.598
- Vitis AI: 1.3.2
- TensorFlow: 1.15.2
- Python: 3.6.12
- Anaconda: 4.9.2
Hardware:
- Alveo U280 Data Center Accelerator Card
This section explains how to run the project.
Open a terminal and make sure to be located in the workspace directory.
This project is executed through a succession of bash scripts, located in the /workflow/ folder.
You may first need to make the scripts executable:
cd ./docker_ws/workflow/
chmod +x *.sh
cd ..
chmod +x *.sh
You can either run the scripts from the /workflow/ folder step by step, or run the two main scripts.
The first script to run launches the Vitis AI image in a Docker container.
Indeed, the Vitis™ AI software is available through Docker Hub: the image contains tools such as the Vitis AI quantizer, AI compiler, and AI runtime for cloud DPUs. We chose to use the Vitis AI Docker image for host CPU.
cd docker_ws
source ./workflow/0_run_docker_cpu.sh
See this guide.
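For reference, here is a minimal sketch of what such a script typically does, assuming the standard docker_run.sh helper from the Vitis AI repository (the actual contents of 0_run_docker_cpu.sh may differ):

```bash
# Sketch: pull the CPU image matching our tested version and start the container
# with the docker_run.sh helper from the Vitis AI repository.
docker pull xilinx/vitis-ai-cpu:1.3.598
./docker_run.sh xilinx/vitis-ai-cpu:1.3.598
```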
source ./run_demo.sh
We used this model and dataset to quickly test our application code before deploying our own model.
Run the following script to execute the whole process.
source ./run_all.sh
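As a rough map of what run_all.sh chains together (freezing, quantization, compilation, then inference), here is a hedged sketch of the central quantization call; the node names, input shape, and paths are illustrative assumptions, so refer to the workflow scripts for the exact flags:

```bash
# Sketch: post-training quantization of the frozen TensorFlow graph to INT8.
# Node names, input shape, and paths are illustrative, not the project's exact values.
vai_q_tensorflow quantize \
    --input_frozen_graph ./freeze/frozen_graph.pb \
    --input_nodes image_input \
    --input_shapes ?,416,416,3 \
    --output_nodes "conv2d_93/BiasAdd,conv2d_101/BiasAdd,conv2d_109/BiasAdd" \
    --input_fn input_fn.calib_input \
    --calib_iter 100
# The quantized graph is then compiled for the DPU (see the compile sketch above).
```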
This project is based on the workflow from the Vitis AI tutorials, using the Anaconda environment for TensorFlow.
For more details, please consult this guide.
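Inside the container, the TensorFlow toolchain is exposed through a Conda environment, which is activated as follows (environment name as shipped in the Vitis AI 1.3 image):

```bash
# Activate the TensorFlow 1.15 Conda environment provided by the Vitis AI image.
conda activate vitis-ai-tensorflow
```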
Here are some results after running the model on the FPGA:
Let's evaluate the mAP score of the model running on the accelerator card. We set the confidence threshold to 0.6 and the IoU threshold to 0.5.
Model | Original | Intermediate graph | App (on Alveo U280)
---|---|---|---
mAP @ IoU50 score | 75.0 % | x | 91.0 % (on the training set)
FPS | x | x | 12
- Find a way to set the input shape through a variable when compiling the model;
- Create and annotate a new test set;
- Increase the FPS;
- Modify the AlexeyAB application that runs the Darknet model on the host machine to measure the execution time of the inference;
- Modify the AlexeyAB application to process the whole test set at once;
- Evaluate the mAP score for the AlexeyAB application after changing the output data to fit the annotations;
- Modify the code that runs the frozen/quantized TensorFlow graph to normalize the data, so that its score can be evaluated;
- Modify the code that runs the frozen/quantized TensorFlow graph to draw boxes when running the graph;
- Improve the labels display in the application code;
- Run the Vitis AI Profiler.
To deploy your own YOLOv4 or YOLOv3 model on the accelerator card, replace the '.cfg' and '.weights' files in this folder. Then, change the environment variables that define the model specifications in the script "1_set_env.sh". Set the input shape in the script that compiles the model, and don't forget to update the names of the input and output tensors as well as the shape of the input tensor. Finally, replace the current dataset with your own in this folder.
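For illustration, the variables to adapt could look like the sketch below. The variable names and values are assumptions for a generic YOLO model, not the actual contents of "1_set_env.sh":

```bash
# Hypothetical excerpt of workflow/1_set_env.sh: adapt these to your own model.
export NET_NAME=yolov4_custom             # name given to the compiled model
export INPUT_NODE=image_input             # name of the graph's input tensor
export OUTPUT_NODES="conv2d_93/BiasAdd,conv2d_101/BiasAdd,conv2d_109/BiasAdd"
export INPUT_SHAPE="?,416,416,3"          # update to match your model's input shape
export CLASSES=2                          # e.g. clean / damaged
```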
The projects mentioned below were used for this project as tools or sources of inspiration: