# Object Detection with YOLO-v2 (Darknet -> Caffe)

This tutorial demonstrates the steps required to prepare and deploy a trained Darknet model for FPGA acceleration.  
We will prepare a trained YOLOv2 model, and then run a single detection.  

## Introduction

You Only Look Once (YOLO) is a state-of-the-art, real-time object detection algorithm.  
The algorithm was published by Redmon et al. in 2016 in the following papers:
[YOLOv1](https://arxiv.org/abs/1506.02640),
[YOLOv2](https://arxiv.org/abs/1612.08242).  
The same author has since released YOLOv3 and some experimental tiny YOLO networks. This tutorial focuses on YOLOv2.  
Object detection requires more than simple classification: the task is to detect the presence of objects and localize them within a frame.  
Please refer to the papers for full algorithm details, and/or watch [this video](https://www.youtube.com/watch?v=9s_FpMpdYW8).  
In this tutorial, the network was trained on the 80-class [COCO dataset](http://cocodataset.org/#home).

## Background

The authors of the YOLO papers used their own programming framework, called "Darknet", for research and development. 
The framework is written in C and is [open source](https://github.com/pjreddie/darknet).
Additionally, they host documentation and pretrained weights [here](https://pjreddie.com/darknet/yolov2/).

Currently, the Darknet framework is not supported by Xilinx Vitis-AI.
Additionally, some aspects of the YOLOv2 network, such as the reorg layer, are not supported by the hardware accelerator. For these reasons, the network was modified, retrained, and converted to Caffe. In this tutorial we will run the network accelerated on an FPGA using INT8 quantized weights. All convolutions and pools are accelerated on the FPGA fabric, while the final sigmoid, softmax, and non-max suppression functions are executed on the CPU.  
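
To make the CPU-side functions named above concrete, here is a minimal sketch of sigmoid and a numerically stable softmax in plain Python. It is an illustration only, not the Vitis-AI post-processing code; the function names are chosen for this example.

```python
import math

def sigmoid(x):
    """Logistic function, mapping any real number into the range [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative inputs, use the equivalent form that avoids overflow in exp()
    z = math.exp(x)
    return z / (1.0 + z)

def softmax(scores):
    """Turn a list of raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))
print(softmax([1.0, 2.0, 3.0]))
```

In YOLOv2, sigmoid is applied to the box-center offsets and the objectness score, and softmax to the per-box class scores.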

# Model Preparation (Offline Process, Performed Once):


## Setup (Before Running Notebook)

```sh
source $VAI_ALVEO_ROOT/overlaybins/setup.sh
```

## Convert to Caffe

- Xilinx provides a `darknet2caffe.py` Python script.  
- The script takes a Darknet `.cfg` file and a Darknet `.weights` file as arguments, then generates a `.prototxt` and a `.caffemodel`.  
- This conversion is necessary for integration into the downstream components of Vitis-AI.  
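
For readers unfamiliar with the Darknet `.cfg` format, the sketch below parses it into an ordered list of layers. This is not the `darknet2caffe.py` implementation, only an illustration of the input such a converter has to read; `parse_darknet_cfg` and the example config text are made up for this sketch.

```python
def parse_darknet_cfg(text):
    """Parse Darknet .cfg text into an ordered list of (section, options) pairs.

    Section names such as [convolutional] repeat once per layer, so the
    result is a list rather than a dictionary keyed by section name.
    """
    layers = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            layers.append((line[1:-1], {}))
        elif "=" in line and layers:
            key, value = line.split("=", 1)
            layers[-1][1][key.strip()] = value.strip()
    return layers

# Small illustrative config, not the real YOLOv2 cfg
example_cfg = """
[net]
width=608
height=608

[convolutional]
filters=32
size=3
activation=leaky

[maxpool]
size=2
stride=2
"""

layers = parse_darknet_cfg(example_cfg)
for name, options in layers:
    print(name, options)
```

Each parsed section corresponds roughly to one layer that the converter must map to a Caffe layer in the generated `.prototxt`.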
                                                                          
## Quantize The Model

- The Quantizer will generate a `quantize_info.txt` file holding parameters for quantizing floats to INT8.
- This is required because FPGAs take advantage of fixed-point precision to achieve faster inference.
- While floating-point precision is useful for model training, it is not required for high-speed, high-accuracy inference.
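
As a rough illustration of what quantizing floats to INT8 means, the sketch below applies simple symmetric per-tensor scaling. This is a generic textbook scheme and an assumption for illustration; the scheme that `vai_q_caffe` actually applies (and the contents of `quantize_info.txt`) may differ.

```python
def quantize_int8(values):
    """Symmetric quantization of a list of floats to signed 8-bit integers."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the INT8 range
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Map INT8 values back to approximate floats."""
    return [q * scale for q in quantized]

weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, scale)
print(restored)
```

The restored values differ from the originals by at most half a quantization step, which is why a calibration pass over representative images is used to choose good scaling parameters.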
          
## Compile The Model

- A Network Graph (`prototxt`) and a Weights Blob (`caffemodel`) are compiled along with the `quantize_info` that was generated in the previous step.
- The network is optimized.
- FPGA instructions are generated.
- These instructions are required to run the network in "one-shot" and minimize data movement.
- This step also generates a `quantizer.json`, which holds channel-wise parameters for carrying out fixed-point operations across channels and layers.

### Get VAI Environment and Import libraries

In [None]:
# Environment Variables ("source overlaybins/setup.sh")
import os
os.environ["DECENT_DEBUG"] = '1'
VAI_ALVEO_ROOT = os.getenv("VAI_ALVEO_ROOT","../")
VAI_LIB = os.getenv("LIBXDNN_PATH")
print("Running w/ VAI_ALVEO_ROOT: %s" % VAI_ALVEO_ROOT)
print("Running w/ VAI_LIB: %s" % VAI_LIB)

In [None]:
import sys, cv2, timeit
import numpy as np

from matplotlib import pyplot as plt
%matplotlib inline

from vai.dpuv1.rt import xdnn, xdnn_io
from vai.dpuv1.rt.vitis.python.dpu.runner import Runner
from vai.dpuv1.utils.postproc import yolo

sys.path.append(os.path.join(VAI_ALVEO_ROOT, "apps/yolo"))
from get_decent_q_prototxt import get_train_prototxt_deephi
from yolo_utils import bias_selector, saveDetectionDarknetStyle, yolo_parser_args
from yolo_utils import draw_boxes, generate_colors
from get_mAP_darknet import calc_detector_mAP

### Load an image from disk.

Let's load an image (image courtesy of Open Images).

In [None]:
img = cv2.imread(VAI_ALVEO_ROOT+"/apps/yolo/test_image_set/5904386289_924b24d75d_z.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.rcParams['figure.figsize'] = [24.0,16.0]
plt.imshow(img)
plt.show()

### Now load yolo_v2 model

In [None]:
config = {}

# Required files
config["prototxt"] = VAI_ALVEO_ROOT+"/models/caffe/yolov2/fp32/yolo_v2_prelu_608.prototxt" 
config["caffemodel"] = VAI_ALVEO_ROOT+"/models/caffe/yolov2/fp32/yolo_v2_prelu.caffemodel" 
config["labels"] = VAI_ALVEO_ROOT+"/apps/yolo/coco.names"
config["images"] = [VAI_ALVEO_ROOT+"/apps/yolo/test_image_set/5904386289_924b24d75d_z.jpg"]

# YOLO Configs
config["yolo_model"] = "yolo_v2_prelu"
config["yolo_version"] = "v2"
config['net_h'] = 608
config['net_w'] = 608
config['scorethresh'] = 0.7
config['iouthresh'] =  0.4
config['anchorCnt'] = 5
config['classes'] = 80

# VAI configs
config['batch_sz']  = 1
config['vitis_rundir'] = 'work'
config['architecture'] = os.path.join(VAI_ALVEO_ROOT, 'apps/yolo/arch.json')
config['xlnxlib'] = VAI_LIB


In [None]:
# Create the run directory named in config['vitis_rundir'] ('work')
!mkdir -p {config['vitis_rundir']}

## Download Model From Xilinx.com
prototxt = "$VAI_ALVEO_ROOT/models/caffe/yolov2/fp32/yolo_v2_prelu_608.prototxt"
!if [ ! -f $prototxt ]; then cd $VAI_ALVEO_ROOT && \
wget https://www.xilinx.com/bin/public/openDownload?filename=models.caffe.yolov2_2019-08-01.zip -O temp.zip && \
unzip -o temp.zip && cd -; fi;
##
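
Before quantizing, it can help to confirm that the files referenced in `config` actually exist. The helper below is a small sketch for that purpose; `missing_files` is a hypothetical name and not part of Vitis-AI, and the demo uses a temporary file in place of real model files.

```python
import os
import tempfile

def missing_files(config, keys):
    """Return the config keys whose path does not point to an existing file."""
    return [key for key in keys if not os.path.isfile(str(config.get(key, "")))]

# Self-contained demo: one path that exists and one that does not
with tempfile.NamedTemporaryFile(suffix=".prototxt", delete=False) as tmp:
    demo_config = {
        "prototxt": tmp.name,
        "caffemodel": tmp.name + ".does_not_exist",
    }

demo_missing = missing_files(demo_config, ["prototxt", "caffemodel"])
print(demo_missing)
os.remove(demo_config["prototxt"])  # clean up the temporary file
```

In this notebook, the call would look like `missing_files(config, ["prototxt", "caffemodel", "labels", "architecture"])`, and an empty list means every required file was found.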

### Quantize the model

Here, we will quantize the model. The inputs are the model's train_val prototxt, the model weights, and the number of test and calibration iterations. The outputs are a quantized prototxt, quantized weights, and `quantize_info.txt`, which are written to the directory given by `--output_dir` (`work/` in this notebook).

The Quantizer will generate a `quantize_info.txt` file holding scaling parameters for quantizing floats to INT8. This is required because FPGAs take advantage of fixed-point precision to achieve accelerated inference.

In [None]:
# We need to generate a train_val prototxt from the deploy prototxt
get_train_prototxt_deephi(".", config["prototxt"], 
                          config["vitis_rundir"] +"/"+ "train_val.prototxt",
                          VAI_ALVEO_ROOT+"/apps/yolo/images.txt", 
                          VAI_ALVEO_ROOT+"/apps/yolo/test_image_set/")

In [None]:
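# Python wrapper around the quantizer, kept for reference.
# Note: this function is not called in this notebook, and `xfdnnQuantizer` is not
# imported above; the `vai_q_caffe` command in the next cell performs the quantization.
# Quantizer Arguments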
#config["outmodel"] = Defined in Step 4 # String for naming intermediate prototxt, caffemodel
def Quantize(prototxt,caffemodel,calib_iter=1):
    
    quantizer = xfdnnQuantizer(
        model=prototxt,
        weights=caffemodel,
        calib_iter=calib_iter,
    )
    
    quantizer.quantize()

In [None]:
# Run the Quantizer
!vai_q_caffe quantize \
    --model work/train_val.prototxt \
    --weights {config["caffemodel"]} \
    --output_dir work/ \
    --calib_iter 5 

### Define a VAI Compiler instance and pass it arguments.  
The compiler takes in the quantizer outputs from the previous step (prototxt, weights, quantize_info) and outputs a compiler.json and quantizer.json.

* A Network Graph (prototxt) and a Weights Blob (caffemodel) are compiled
* The network is optimized
* FPGA Instructions are generated
  

In [None]:
compiler_flags = "{ \
    'ddr':1024, \
    'quant_cfgfile':'work/quantize_info.txt', \
    'mixmemorystrategy':True, \
    'poolingaround':True, \
    'parallism':True, \
    'parallelread':['bottom','tops'], \
    'parallelismstrategy':['tops','bottom'], \
    'pipelineconvmaxpool':True, \
    'fancyreplication':True }"

!vai_c_caffe \
    --prototxt work/deploy.prototxt \
    --caffemodel work/deploy.caffemodel \
    --arch {config['architecture']} \
    --output_dir work \
    --net_name "compiler" \
    --options "{compiler_flags}"
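
The hand-written option string above can be error-prone to edit. As an alternative sketch, the same string can be generated from a Python dictionary. The option names and values are copied verbatim from the cell above; this assumes, without having verified it against the compiler, that `vai_c_caffe` accepts any valid Python dict literal for `--options`.

```python
import ast

# Option names and values copied verbatim from the hand-written string above
compiler_options = {
    'ddr': 1024,
    'quant_cfgfile': 'work/quantize_info.txt',
    'mixmemorystrategy': True,
    'poolingaround': True,
    'parallism': True,
    'parallelread': ['bottom', 'tops'],
    'parallelismstrategy': ['tops', 'bottom'],
    'pipelineconvmaxpool': True,
    'fancyreplication': True,
}

# repr() of a dict of plain values yields a single-quoted Python dict literal
compiler_flags_from_dict = repr(compiler_options)

# Round-trip check: the generated string parses back to the same dictionary
assert ast.literal_eval(compiler_flags_from_dict) == compiler_options
print(compiler_flags_from_dict)
```

Assigning `compiler_flags_from_dict` to `compiler_flags` before the `!vai_c_caffe` line would then pass the generated string instead of the hand-written one.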

In [None]:
config['netcfg'] = 'work/compiler.json'
config['quantizecfg'] = 'work/quantizer.json'
config['weights'] = 'work/weights.h5'
config['xclbin'] = '/opt/xilinx/overlaybins/xdnnv3'

# Model Deployment (Online Process, Typically Performed Iteratively):  
    

Next, we'll utilize the Vitis-AI APIs to deploy our network to the FPGA. We will walk through the deployment APIs, step by step:

1. Open a handle for FPGA communication
2. Load weights, biases, and quantization parameters to the FPGA DDR
3. Allocate storage for FPGA inputs (such as images to process)
4. Allocate storage for FPGA outputs (the activation of the final layer run on the FPGA)
5. Execute the network
6. Run the postprocessing on CPU
7. Print the result 

First, we will create the handle to communicate with the FPGA and choose which FPGA overlay to run the inference on. 
        
### Open a handle for FPGA communication.

In [None]:
runner = Runner(config['vitis_rundir'])

### Allocate space in host memory for inputs, load images from disk, and prepare images. 

In [None]:
inTensors = runner.get_input_tensors()
outTensors = runner.get_output_tensors()
batch_sz = config['batch_sz']
if batch_sz == -1:
    batch_sz = inTensors[0].dims[0]

fpgaBlobs = []
for io in [inTensors, outTensors]:
    blobs = []
    for t in io:
        shape = (batch_sz,) + tuple([t.dims[i] for i in range(t.ndims)][1:])
        blobs.append(np.empty((shape), dtype=np.float32, order='C'))
    fpgaBlobs.append(blobs)
fpgaInput = fpgaBlobs[0][0]

# Load the image to the buffers
fpgaInput[0,...], img_shape = xdnn_io.loadYoloImageBlobFromFile(img,  config['net_h'], config['net_w'])
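
The shape arithmetic in the allocation loop above can be checked in isolation. The sketch below uses a hypothetical `FakeTensor` stand-in, since the real tensor objects come from the runner, and assumes NCHW layouts for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the tensor objects returned by
# runner.get_input_tensors(); only the two attributes used above are modeled.
FakeTensor = namedtuple("FakeTensor", ["dims", "ndims"])

def batched_shape(tensor, batch_sz):
    """Replace the tensor's leading (batch) dimension with batch_sz."""
    per_image_dims = tuple(tensor.dims[i] for i in range(tensor.ndims))[1:]
    return (batch_sz,) + per_image_dims

# Assumed layouts for a 608x608 input:
# 425 channels = 5 anchors * (80 classes + 5 box/objectness values),
# 19 = 608 / 32 grid cells per side.
fake_input = FakeTensor(dims=[4, 3, 608, 608], ndims=4)
fake_output = FakeTensor(dims=[4, 425, 19, 19], ndims=4)

print(batched_shape(fake_input, 1))
print(batched_shape(fake_output, 1))
```

Only the leading dimension changes, so the host buffers hold exactly `batch_sz` images regardless of the batch size the compiled network reports.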


### Execute the network.

In [None]:
# Run the inference on the FPGA
jid = runner.execute_async(fpgaBlobs[0], fpgaBlobs[1])
runner.wait(jid)

# Run the postprocessing on CPU
boxes = yolo.yolov2_postproc(fpgaBlobs[1], config, [img_shape], biases=yolo.yolov2_bias_coco)
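
The post-processing step includes non-max suppression, which uses the `iouthresh` value from `config`. The sketch below shows a generic greedy version on plain tuples. It is not the `yolo.yolov2_postproc` implementation, which may, for example, apply suppression per class; the function names and box format are chosen for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2, ...)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_thresh):
    """Greedy non-max suppression over (x1, y1, x2, y2, score) tuples."""
    kept = []
    # Visit detections from highest to lowest score
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        # Keep a box only if it does not overlap too much with any kept box
        if all(iou(det, k) <= iou_thresh for k in kept):
            kept.append(det)
    return kept

candidates = [
    (10, 10, 110, 110, 0.90),
    (12, 12, 112, 112, 0.80),    # near duplicate of the first box
    (200, 200, 300, 300, 0.75),  # separate object
]
kept = nms(candidates, iou_thresh=0.4)
print(kept)
```

A lower IoU threshold suppresses more overlapping boxes, while the score threshold (`scorethresh`) filters out low-confidence boxes.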

### Draw the boxes on image.
We can now draw the detections on the original image for reference, and then print the results.

In [None]:
# Create a list of class labels given a file containing the coco dataset classes
with open(VAI_ALVEO_ROOT+"/apps/yolo/coco.names") as f:      
    namez = f.readlines()      
    names = [x.strip() for x in namez]

# Given the detection results above, let's draw our findings on the original image and display it
bboxes = boxes[0]
result_image = "work/result.jpg"
colors = generate_colors(config["classes"])
draw_boxes(config["images"][0], bboxes, names, colors, result_image, VAI_ALVEO_ROOT+"/apps/yolo/font", False)
img = cv2.imread(result_image)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.rcParams['figure.figsize'] = [24.0,16.0]
plt.imshow(img)
plt.title("Output Image w/ Bounding Boxes Drawn")
plt.show()

In [None]:
# Let's print the detections our model made
for j in range(len(bboxes)):
    print("Obj %d: %s, class id = %d" % (j, names[bboxes[j]['classid']], bboxes[j]['classid']))
    print("\t score = %f" % (bboxes[j]['prob']))
    print("\t (xlo, ylo) = (%d, %d)" % (bboxes[j]['ll']['x'], bboxes[j]['ll']['y']))
    print("\t (xhi, yhi) = (%d, %d)" % (bboxes[j]['ur']['x'], bboxes[j]['ur']['y']))
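
For images with many detections, a per-class summary can be easier to read than the full listing. The sketch below uses made-up detections with the same dictionary keys as the loop above (`classid`, `prob`, `ll`, `ur`); the helper names and sample values are hypothetical.

```python
from collections import Counter

def count_by_class(bboxes, names):
    """Count detections per class name."""
    return Counter(names[b['classid']] for b in bboxes)

def box_size(bbox):
    """Width and height of a box; abs() avoids assuming an axis direction."""
    width = abs(bbox['ur']['x'] - bbox['ll']['x'])
    height = abs(bbox['ur']['y'] - bbox['ll']['y'])
    return width, height

# Made-up detections for illustration only
sample_names = ["person", "bicycle", "car"]
sample_bboxes = [
    {'classid': 0, 'prob': 0.91, 'll': {'x': 10, 'y': 200}, 'ur': {'x': 60, 'y': 40}},
    {'classid': 0, 'prob': 0.84, 'll': {'x': 90, 'y': 210}, 'ur': {'x': 150, 'y': 50}},
    {'classid': 2, 'prob': 0.77, 'll': {'x': 300, 'y': 180}, 'ur': {'x': 420, 'y': 100}},
]

counts = count_by_class(sample_bboxes, sample_names)
for name, count in counts.most_common():
    print("%s: %d" % (name, count))
print(box_size(sample_bboxes[0]))
```

With the notebook's own results, the call would be `count_by_class(bboxes, names)`.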
    