# OpenCV Overlay - Build Your Own (Tracking Example)

In this notebook, you are given an overlay that contains several accelerator cores, and the goal is to use those cores to accelerate a custom design of your own. This overlay contains the following accelerator cores:
* Gaussian Blur (2D filter)
* Dilate
* Absdiff

There are many creative ways to use these accelerator blocks in your own vision processing design. The goal is to develop an algorithm, profile it, and check whether the available overlay blocks can speed up its bottlenecks.
To provide another real example of this process, this notebook uses a tracking design developed by Adam Taylor and posted on http://www.hackster.io.
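Profiling is how those bottlenecks are found. IPython's `%%prun` magic, used later in this notebook, wraps Python's built-in `cProfile` profiler. The standard-library sketch below shows the same idea outside a notebook. `slow_step`, `fast_step` and `pipeline` are hypothetical stand-ins for an expensive kernel, a cheap kernel, and the per-frame loop. Sorting by `tottime` (time spent inside a function itself, excluding functions it calls) mirrors `%%prun -s tottime`. Reading the numbers from `pstats.Stats.stats` is an alternative to grepping the text report. That attribute is long-standing but not formally documented, so treat this as an assumption.

```python
import cProfile
import io
import pstats

def slow_step(n):
    # Hypothetical expensive stage (stands in for something like a blur)
    total = 0
    for i in range(n):
        total += i * i
    return total

def fast_step(n):
    # Hypothetical cheap stage (stands in for something like a threshold)
    return n * 2

def pipeline(frames):
    # Hypothetical per-frame loop calling both stages
    for _ in range(frames):
        slow_step(200_000)
        fast_step(10)

profiler = cProfile.Profile()
profiler.enable()
pipeline(5)
profiler.disable()

# Text report sorted by tottime and limited to 20 lines, like %%prun -s tottime -l 20
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report).sort_stats("tottime")
stats.print_stats(20)
print(report.getvalue())

def tottime_of(stats, name):
    # stats.stats maps (file, line, func) -> (prim_calls, calls, tottime, cumtime, callers)
    return sum(v[2] for k, v in stats.stats.items() if k[2] == name)

print("slow_step tottime:", tottime_of(stats, "slow_step"))
print("fast_step tottime:", tottime_of(stats, "fast_step"))
```

The function with the largest `tottime` is the first candidate for hardware acceleration.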

## Program overlay

Here we program the overlay on the FPGA, load the associated overlay library and load the PYNQ xlnk memory manager library.

In [None]:
import cv2 #NOTE: This needs to be loaded first

# Load filter2D + dilate overlay
from pynq import Overlay
bs = Overlay("/usr/local/lib/python3.6/dist-packages/pynq_cv/overlays/xv2Filter2DDilateAbsdiff.bit")
bs.download()
import pynq_cv.overlays.xv2Filter2DDilateAbsdiff as xv2

# Load xlnk memory manager
from pynq import Xlnk
Xlnk.set_allocator_library('/usr/local/lib/python3.6/dist-packages/pynq_cv/overlays/xv2Filter2DDilateAbsdiff.so')
mem_manager = Xlnk()

## Setup and configure USB camera 

We use OpenCV (cv2) to capture frames from a USB camera and to process them. Here, we set up the interface to the USB camera and configure its resolution (1080p).
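As a quick sanity check on memory use, the arithmetic below computes the size of one 1080p frame. It assumes the camera actually honors the requested 1920x1080 resolution, since `camera.set` can silently fall back to a mode the device supports. `camera.read()` normally returns 3-channel 8-bit BGR frames, while the grayscale images are single-channel 8-bit. The HW cell later allocates four grayscale CMA buffers (`xFin`, `xFblur`, `xFdiff`, `xFout`) of that size.

```python
# 1080p, matching the width/height requested from the camera
WIDTH, HEIGHT = 1920, 1080

gray_bytes = WIDTH * HEIGHT * 1   # one uint8 channel (cv2.COLOR_BGR2GRAY output)
bgr_bytes = WIDTH * HEIGHT * 3    # three uint8 channels (a BGR frame from camera.read())
cma_bytes = 4 * gray_bytes        # xFin, xFblur, xFdiff and xFout in the HW cell

print(gray_bytes)  # 2073600
print(bgr_bytes)   # 6220800
print(cma_bytes)   # 8294400
print(round(cma_bytes / 2**20, 1), "MiB of contiguous memory")
```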

In [None]:
import cv2

camera = cv2.VideoCapture(0)

width = 1920
height = 1080
camera.set(cv2.CAP_PROP_FRAME_WIDTH,width)
camera.set(cv2.CAP_PROP_FRAME_HEIGHT,height)

Set up an IPython-based imshow function, which encodes OpenCV image data as JPEG before displaying it in the notebook, and a matching imwrite helper that saves an image to a file. Other ways of displaying image data perform similar conversions.

In [None]:
import IPython

def imshow(img):
    returnValue, buffer = cv2.imencode('.jpg',img)
    IPython.display.display(IPython.display.Image(data=buffer.tobytes()))

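def imwrite(img, name):
    # cv2.imwrite picks the encoding (e.g. JPEG) from the file extension in name,
    # so the raw image is passed directly rather than a pre-encoded buffer
    cv2.imwrite(name, img)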

## Run SW Algorithm
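The software loop below detects motion by frame differencing. Each frame is converted to grayscale and blurred. It is compared with a blurred reference frame using `cv2.absdiff`, thresholded to a binary mask, and dilated to merge nearby pixels. The mask is then split into contours whose bounding rectangles are drawn. The pure-Python sketch that follows is a toy illustration of the core steps on a tiny synthetic 8x8 frame, not how OpenCV implements them. The blur is omitted to keep the numbers easy to follow. A single box around all foreground pixels (the hypothetical helper `bounding_box`) stands in for `cv2.findContours` plus `cv2.boundingRect`, which box each blob separately.

```python
def absdiff(a, b):
    # Per-pixel |a - b|, like cv2.absdiff on 8-bit images
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def threshold(img, t, maxval=255):
    # Like cv2.THRESH_BINARY: pixels strictly greater than t become maxval, others 0
    return [[maxval if p > t else 0 for p in row] for row in img]

def dilate3x3(img):
    # One pass of a 3x3 max filter (dilation with a 3x3 rectangular kernel);
    # neighbours outside the image are simply ignored
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = max(img[yy][xx]
                            for yy in range(max(0, y - 1), min(h, y + 2))
                            for xx in range(max(0, x - 1), min(w, x + 2)))
    return out

def bounding_box(mask):
    # Hypothetical helper: one (x, y, w, h) box around all non-zero pixels
    pts = [(x, y) for y, row in enumerate(mask) for x, p in enumerate(row) if p]
    if not pts:
        return None
    xs, ys = zip(*pts)
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)

# Static 8x8 background (value 50) and a frame where a bright 2x2 object appeared
reference = [[50] * 8 for _ in range(8)]
frame = [row[:] for row in reference]
for y in (3, 4):
    for x in (4, 5):
        frame[y][x] = 200

mask = dilate3x3(threshold(absdiff(reference, frame), 25))
print(bounding_box(mask))  # (3, 2, 4, 4)
```

The object covers columns 4-5 and rows 3-4. One 3x3 dilation pass grows it by one pixel on every side, which is why the box is 4x4. In the notebook, `cv2.dilate(..., iterations=2)` grows it twice, and the contour-area filter then discards blobs smaller than 3000.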

In [None]:
%%prun -s tottime -q -l 20 -T prunSW
#%%prun -s cumulative -q -l 20 -T prunSW
import numpy as np
import time

# Read reference frame (assumes camera and scene remains stationary)
ret, frame_in = camera.read()
if (not ret):
    # Release the Video Device if ret is false
    camera.release()
    # Message to be displayed after releasing the device
    print("Release camera resource")
else:
    img_gray       = cv2.cvtColor(frame_in, cv2.COLOR_BGR2GRAY)
    reference_blur = cv2.GaussianBlur(img_gray, (5,5), 0)

num_frames = 15

start = time.time()
for _ in range(num_frames):
    ret, frame_in = camera.read()
    if (not ret):
        # Release the Video Device if ret is false
        camera.release()
        # Message to be displayed after releasing the device
        print("Release camera resource")
        break
    else:
        img_gray          = cv2.cvtColor(frame_in, cv2.COLOR_BGR2GRAY)
        blur              = cv2.GaussianBlur(img_gray, (5,5), 0)
        difference        = cv2.absdiff(reference_blur, blur)
        _, threshold      = cv2.threshold(difference, 25, 255, cv2.THRESH_BINARY)
        dilated           = cv2.dilate(threshold, None, iterations=2)
        # OpenCV 3.x returns (image, contours, hierarchy); OpenCV 4.x returns only (contours, hierarchy)
        _, contours, hierarchy = cv2.findContours(dilated.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for i in contours:
            # Skip small blobs (noise); only outline regions with a contour area of at least 3000
            if cv2.contourArea(i) < 3000:
                continue
            (x,y,w,h) = cv2.boundingRect(i)
            cv2.rectangle(frame_in, (x,y), (x+w,y+h),(0,0,255),2)
    imshow(frame_in)
    IPython.display.clear_output(wait=True)    
time_sw_total = time.time() - start
print("frames per second: " + str(num_frames/time_sw_total))

In [None]:
print(open('prunSW','r').read())
res = !cat prunSW | grep GaussianBlur | awk '{{print $$2}}'
tottime_sw_blur = float(res[0])
res = !cat prunSW | grep dilate | awk '{{print $$2}}'
tottime_sw_dilate = float(res[0])
res = !cat prunSW | grep absdiff | awk '{{print $$2}}'
tottime_sw_absdiff = float(res[0])

## Run HW Acceleration
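The HW cell below blurs each live frame with the accelerated `xv2.filter2D` and a 3x3 kernel, `kernelG`, instead of the 5x5 `cv2.GaussianBlur` used in software. It also calls `xv2.dilate` without the `iterations=2` argument the software path passes. As a result, the two pipelines do not perform exactly the same work, which is worth keeping in mind when comparing their timings. The `kernelG` coefficients form the normalized 3x3 binomial kernel, a common small approximation of a Gaussian. The standard-library check below rebuilds it as the outer product of the 1-D taps [1, 2, 1]/4 and confirms that its weights sum to 1, so the blur preserves overall brightness.

```python
from fractions import Fraction

# 1-D binomial smoothing taps [1, 2, 1] / 4
taps = [Fraction(1, 4), Fraction(2, 4), Fraction(1, 4)]

# Separable 2-D kernel: outer product of the 1-D taps with themselves
kernel = [[a * b for b in taps] for a in taps]

as_float = [[float(v) for v in row] for row in kernel]
print(as_float)  # [[0.0625, 0.125, 0.0625], [0.125, 0.25, 0.125], [0.0625, 0.125, 0.0625]]
print(sum(sum(row) for row in kernel))  # 1
```

Because the kernel is separable, the same blur could also be computed as a horizontal 3-tap pass followed by a vertical one.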

In [None]:
%%prun -s tottime -q -l 20 -T prunHW
#%%prun -s cumulative -q -l 20 -T prunHW
import numpy as np
import time

kernelG    = np.array([[0.0625,0.125,0.0625],[0.125,0.25,0.125],[0.0625,0.125,0.0625]],np.float32) # 3x3 Gaussian blur kernel
kernelD    = np.ones((3,3),np.uint8)
kernelVoid = np.zeros(0)
# Ordinary NumPy buffers for results handled in software
reference_blur = np.ones((height,width),np.uint8)
threshold  = np.ones((height,width),np.uint8)
dilated    = np.ones((height,width),np.uint8)
frame_out  = np.ones((height,width),np.uint8)
# Physically contiguous (CMA) buffers that the hardware accelerators write into
xFin       = mem_manager.cma_array((height,width),np.uint8)
xFblur     = mem_manager.cma_array((height,width),np.uint8)
xFdiff     = mem_manager.cma_array((height,width),np.uint8)
xFout      = mem_manager.cma_array((height,width),np.uint8)

# Read reference frame (assumes camera and scene remains stationary)
ret, frame_in = camera.read()
if (not ret):
    # Release the Video Device if ret is false
    camera.release()
    # Message to be displayed after releasing the device
    print("Release camera resource")
else:
    img_gray       = cv2.cvtColor(frame_in, cv2.COLOR_BGR2GRAY)
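    # Blur the reference with the same 3x3 kernel used for the live frames below,
    # so that absdiff compares images smoothed the same way
    reference_blur = cv2.filter2D(img_gray, -1, kernelG, borderType=cv2.BORDER_CONSTANT)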


num_frames = 15

start = time.time()
for _ in range(num_frames):
    ret, frame_in = camera.read()
    if (not ret):
        # Release the Video Device if ret is false
        camera.release()
        # Message to be displayed after releasing the device
        print("Release camera resource")
        break
    else:
        frame_in_gray     = cv2.cvtColor(frame_in, cv2.COLOR_BGR2GRAY)
        xv2.filter2D(frame_in_gray, -1, kernelG, xFblur, borderType=cv2.BORDER_CONSTANT) #Gaussian blur
        xv2.absdiff(reference_blur, xFblur,xFdiff)
        cv2.threshold(xFdiff, 25, 255, cv2.THRESH_BINARY,threshold)
        xv2.dilate(threshold, kernelVoid, xFout, borderType=cv2.BORDER_CONSTANT)
        dilated[:]        = xFout[:]
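        # OpenCV 3.x returns (image, contours, hierarchy); OpenCV 4.x returns only (contours, hierarchy)
        _, contours, hierarchy = cv2.findContours(dilated.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)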
        for i in contours:
            if cv2.contourArea(i) < 3000:
                continue
            (x,y,w,h) = cv2.boundingRect(i)
            cv2.rectangle(frame_in, (x,y), (x+w,y+h),(0,0,255),2)
    imshow(frame_in)
    IPython.display.clear_output(wait=True)    
time_hw_total = time.time() - start
print("frames per second: " + str(num_frames/time_hw_total))

In [None]:
print(open('prunHW','r').read())
res = !cat prunHW | grep filter2D | awk '{{print $$2}}'
tottime_hw_blur = float(res[0])
res = !cat prunHW | grep dilate | awk '{{print $$2}}'
tottime_hw_dilate = float(res[0])
res = !cat prunHW | grep absdiff | awk '{{print $$2}}'
tottime_hw_absdiff = float(res[0])

## Plot performance
In addition to OpenCV functions, we can use matplotlib's pyplot module to plot results in graphs and charts. Here, we convert the recorded times into milliseconds per frame and plot the equivalent frames per second for the whole loop and for each profiled function in a bar chart. Pay particular attention to the actual performance of each function, and note the effect of placing two functions back-to-back in this example.
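A simple model for chaining stages is that when functions run one after another on each frame, their per-frame times add. The combined rate is therefore lower than the rate of either stage alone. The sketch below reuses the same conversions as `TIME_SW`/`TIME_HW` and `FPS_SW`/`FPS_HW`. The stage timings in it are made-up numbers for illustration, not measurements from this notebook, and the model ignores any overlap between stages.

```python
def per_frame_ms(total_seconds, num_frames):
    # Same conversion as TIME_SW/TIME_HW: seconds for the whole run -> ms per frame
    return total_seconds * 1000 / num_frames

def fps(ms):
    # Same conversion as FPS_SW/FPS_HW: ms per frame -> frames per second
    return 1000 / ms

# Hypothetical per-frame costs of two stages run back-to-back (not measured)
blur_ms, dilate_ms = 10.0, 15.0

print(fps(blur_ms), fps(dilate_ms))   # each stage on its own
print(fps(blur_ms + dilate_ms))       # 40.0: both stages in sequence
print(per_frame_ms(1.5, 15))          # 100.0: 1.5 s spread over 15 frames
```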

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt

TIME_SW   = [t*1000/num_frames for t in (time_sw_total, tottime_sw_dilate, tottime_sw_blur, tottime_sw_absdiff)]
FPS_SW    = [1000/t for t in TIME_SW]
TIME_HW   = [t*1000/num_frames for t in (time_hw_total, tottime_hw_dilate, tottime_hw_blur, tottime_hw_absdiff)]
FPS_HW    = [1000/t for t in TIME_HW]
LABELS    = ['Total','Dilate','Gaussian Blur(Filter2D)', 'Absdiff']

f, (ax1, ax2) = plt.subplots(2, 1, sharex='col', figsize=(7,4))
x_pos = np.arange(len(LABELS))

# Software timings (top panel); label the bars on this axis explicitly
ax1.barh(x_pos, FPS_SW, height=0.6, color='g', zorder=4)
ax1.set_yticks(x_pos)
ax1.set_yticklabels(LABELS)
ax1.invert_yaxis()
ax1.set_ylabel("Kernel (SW)")
ax1.grid(zorder=0)

# Hardware-accelerated timings (bottom panel)
ax2.barh(x_pos, FPS_HW, height=0.6, color='b', zorder=4)
ax2.set_yticks(x_pos)
ax2.set_yticklabels(LABELS)
ax2.invert_yaxis()
ax2.set_xlabel("Frames per second")
ax2.set_ylabel("Kernel (HW)")
ax2.grid(zorder=0)

plt.tight_layout()
plt.show()

## Release USB camera

NOTE: This is needed to close the camera between subsequent runs. If the camera is unable to read a frame, be sure to call camera.release() and then try opening the VideoCapture again.
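The release-and-reopen advice can be wrapped in a small helper. This is a sketch under assumptions: `read_with_reopen`, `open_camera` and `FakeCamera` are hypothetical names, and the helper relies only on the `read()` and `release()` methods that `cv2.VideoCapture` provides. In the notebook, `open_camera` would be a function that calls `cv2.VideoCapture(0)` and re-applies the width and height settings, since a newly opened device is not guaranteed to keep the earlier resolution. `FakeCamera` only lets the sketch run without hardware.

```python
import time

def read_with_reopen(open_camera, camera, retries=3, delay_s=0.5):
    """Read one frame; on failure release the device, reopen it and retry.

    open_camera: zero-argument callable returning a new capture object.
    Returns (camera, frame) where camera may be a reopened object.
    """
    for attempt in range(retries):
        ok, frame = camera.read()
        if ok:
            return camera, frame
        camera.release()                  # free the device before reopening it
        if attempt + 1 < retries:
            time.sleep(delay_s)
            camera = open_camera()
    raise RuntimeError("no frame after %d attempts" % retries)

class FakeCamera:
    # Stand-in for cv2.VideoCapture so the sketch runs without a camera
    def __init__(self, works):
        self.works = works
        self.released = False

    def read(self):
        return (True, "frame") if self.works else (False, None)

    def release(self):
        self.released = True

broken = FakeCamera(works=False)
camera, frame = read_with_reopen(lambda: FakeCamera(works=True), broken, delay_s=0)
print(frame, broken.released)  # frame True
```

In the notebook this might be called as `camera, frame_in = read_with_reopen(open_camera, camera)`. Always keep the returned `camera`, because it may be a new capture object.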

In [None]:
camera.release()

<font color=red size=4>IMPORTANT NOTE</font>: Be sure to run the cell below, which shuts down this notebook, before starting a new one. Once the notebook interface shows "No Kernel", the cell below will incorrectly show a running status [ * ]. You can ignore this and safely close the notebook tab.

In [None]:
%%javascript
Jupyter.notebook.session.delete();