# Exercise: Heterogenous Plugin and the DevCloud

In this exercise, we will load a model using the hetero plugin on to the FPGA and CPU, and the GPU and CPU. We will then perform an inference on it and compare the time it takes to do the same for each device pair.

<span class="graffiti-highlight graffiti-id_z8bfs11-id_d97ox8f"><i></i><button>Graffiti Sample Button (edit me)</button></span>



#### Set up paths so we can run Dev Cloud utilities
You *must* run this every time they enter a Workspace session.

In [None]:
%env PATH=/opt/conda/bin:/opt/spark-2.4.3-bin-hadoop2.7/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/intel_devcloud_support
import os
import sys
sys.path.insert(0, os.path.abspath('/opt/intel_devcloud_support'))
sys.path.insert(0, os.path.abspath('/opt/intel'))

## The model

We will be using the `vehicle-license-plate-detection-barrier-0106` model for this exercise. Remember that to run a model using the HETERO Plugin, we need to use FP16 as the model precision.

The model is present in the `/data/models/intel` folder.

# Step 1: Creating a Python Script

The first step is to create a python script that you can use to load the model and perform an inference. I have used the `writefile` magic to create a python file called `inference_on_device.py`. You will need to complete this file.

In [None]:
%%writefile inference_on_device.py

import time
import numpy as np
import cv2
from openvino.inference_engine import IENetwork
from openvino.inference_engine import IECore
import argparse

def main(args):
    model=args.model_path
    model_weights=model+'.bin'
    model_structure=model+'.xml'
    
    start=time.time()
    
    # TODO: Load the model on VPU
    
    print(f"Time taken to load model = {time.time()-start} seconds")
    
    # Reading and Preprocessing Image
    input_img=cv2.imread('car.png')
    input_img=cv2.resize(input_img, (300,300), interpolation = cv2.INTER_AREA)
    input_img=np.moveaxis(input_img, -1, 0)

    # TODO: Prepare the model for inference (create input dict etc.)
    
    start=time.time()
    for _ in range(100):
        # TODO: Run Inference in a Loop
    
    print(f"Time Taken to run 100 Inference is = {time.time()-start} seconds")

if __name__=='__main__':
    parser=argparse.ArgumentParser()
    parser.add_argument('--model_path', required=True)
    parser.add_argument('--device', default=None)
    
    args=parser.parse_args() 
    main(args)

<span class="graffiti-highlight graffiti-id_1rnmf5g-id_nmeqj1a"><i></i><button>Hide Solution</button></span>

In [None]:
%%writefile inference_on_device.py

import time
import cv2
import numpy as np
from openvino.inference_engine import IENetwork
from openvino.inference_engine import IECore
import argparse

def main(args):
    model=args.model_path
    model_weights=model+'.bin'
    model_structure=model+'.xml'
    
    start=time.time()
    model=IENetwork(model_structure, model_weights)

    core = IECore()
    net = core.load_network(network=model, device_name=args.device, num_requests=1)
    load_time=time.time()-start
    print(f"Time taken to load model = {load_time} seconds")
    
    # Get the name of the input node
    input_name=next(iter(model.inputs))

    # Reading and Preprocessing Image
    input_img=cv2.imread('/data/resources/car.png')
    input_img=cv2.resize(input_img, (300,300), interpolation = cv2.INTER_AREA)
    input_img=np.moveaxis(input_img, -1, 0)

    # Running Inference in a loop on the same image
    input_dict={input_name:input_img}

    start=time.time()
    for _ in range(100):
        net.infer(input_dict)
    
    inference_time=time.time()-start
    fps=100/inference_time
    
    print(f"Time Taken to run 100 Inference is = {inference_time} seconds")
    
    with open(f"/output/{args.path}.txt", "w") as f:
        f.write(str(load_time)+'\n')
        f.write(str(inference_time)+'\n')
        f.write(str(fps)+'\n')

if __name__=='__main__':
    parser=argparse.ArgumentParser()
    parser.add_argument('--model_path', required=True)
    parser.add_argument('--device', default=None)
    parser.add_argument('--path', default=None)
    
    args=parser.parse_args() 
    main(args)


## Step 2: Creating a job submission script

To submit a job to the devcloud, we need to create a script. I have named the script as `inference_hetero_model_job.sh`.

Can you write a script that will take the model path and device as a command line argument and then call the python file you created in the previous cell with the path to the model?

In [None]:
%%writefile inference_model_job.sh

#TODO: Create job submission script

<span class="graffiti-highlight graffiti-id_f1nbmn9-id_ia7yjlq"><i></i><button>Hide Solution</button></span>

In [19]:
%%writefile inference_model_job.sh
#!/bin/bash

exec 1>/output/stdout.log 2>/output/stderr.log

mkdir -p /output

DEVICE=$1
MODELPATH=$2


source /opt/intel/init_openvino.sh
aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/2019R4_PL1_FP16_MobileNet_Clamp.aocx


# Run the load model python script
python3 inference_on_device.py  --model_path ${MODELPATH} --device ${DEVICE}

cd /output

tar zcvf output.tgz *

Overwriting inference_model_job.sh


## Step 3a: Running on the FPGA and CPU

In the cell below, can you write the qsub command that will submit your job to the CPU?

In [None]:
fpga_cpu_job = # TODO: Write qsub command
print(fpga_cpu_job[0])

<span class="graffiti-highlight graffiti-id_cvp3lyi-id_chmeh50"><i></i><button>Hide Solution</button></span>

In [20]:
fpga_cpu_job = !qsub inference_model_job.sh -d . -l nodes=1:tank-870:i5-6500te:iei-mustang-f100-a10 -F "HETERO:FPGA,CPU /data/models/intel/vehicle-license-plate-detection-barrier-0106/FP16/vehicle-license-plate-detection-barrier-0106 fpga_cpu_stats" -N store_core 
print(fpga_cpu_job[0])

S4JnyfRTRLgnuYAELRGDGPPyWcBeLu5K


## Step 3b: Running on CPU and GPU

In [None]:
fpga_gpu_job = # TODO: Write qsub command
print(fpga_gpu_job[0])

<span class="graffiti-highlight graffiti-id_7k34s6u-id_022l4bj"><i></i><button>Hide Solution</button></span>

In [21]:
cpu_gpu_job = !qsub inference_model_job.sh -d . -l nodes=tank-870:i5-6500te:intel-hd-530 -F "HETERO:CPU,GPU /data/models/intel/vehicle-license-plate-detection-barrier-0106/FP16/vehicle-license-plate-detection-barrier-0106 cpu_gpu_stats" -N store_core 
print(cpu_gpu_job[0])

Eh6UyjrAVHGzoOtXzCduZQ2Xew2GnSJf


## Step 3c: Running on FPGA, GPU and CPU

In [None]:
fpga_gpu_cpu_job = # TODO: Write qsub command
print(fpga_gpu_cpu_job[0])

<span class="graffiti-highlight graffiti-id_mxh5ozv-id_qicoukm"><i></i><button>Hide Solution</button></span>

In [22]:
fpga_gpu_cpu_job = !qsub inference_model_job.sh -d . -l nodes=tank-870:i5-6500te:intel-hd-530:iei-mustang-f100-a10 -F "HETERO:FPGA,GPU,CPU /data/models/intel/vehicle-license-plate-detection-barrier-0106/FP16/vehicle-license-plate-detection-barrier-0106 fpga_gpu_cpu_stats" -N store_core 
print(fpga_gpu_cpu_job[0])

0nJKbX8NJKvekIoxvVQ77gTk9bt2ldBk


## Step 4: Getting the Live Stat Values

By running the below command, we can see the live status of the commands.

<span class="graffiti-highlight graffiti-id_clj7fxa-id_d3gqjz0"><i></i><button>Graffiti Sample Button (edit me)</button></span>

In [None]:
import liveQStat
liveQStat.liveQStat()

## Step 5a: Get the results for FPGA and CPU

Running the cell below will get the output files from our job

<span class="graffiti-highlight graffiti-id_cygruth-id_6nd1x96"><i></i><button>Graffiti Sample Button (edit me)</button></span>

In [23]:
import get_results

get_results.getResults(fpga_cpu_job[0], get_stderr=True, filename="output.tgz", blocking=True)

getResults() is blocking until results of the job (id:S4JnyfRTRLgnuYAELRGDGPPyWcBeLu5K) are ready.
Please wait................................................Success!
output.tgz was downloaded in the same folder as this notebook.


In [24]:
!tar zxf output.tgz

In [25]:
!cat stdout.log

INTELFPGAOCLSDKROOT is set to /opt/altera/aocl-pro-rte/aclrte-linux64. Using that.

aoc was not found, but aocl was found. Assuming only RTE is installed.

AOCL_BOARD_PACKAGE_ROOT is set to /opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/BSP/a10_1150_sg1. Using that.
Adding /opt/altera/aocl-pro-rte/aclrte-linux64/bin to PATH
Adding /opt/altera/aocl-pro-rte/aclrte-linux64/host/linux64/lib to LD_LIBRARY_PATH
Adding /opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/BSP/a10_1150_sg1/linux64/lib to LD_LIBRARY_PATH
[setupvars.sh] OpenVINO environment initialized
aocl program: Running program from /opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/BSP/a10_1150_sg1/linux64/libexec
Programming device: a10gx_2ddr : Intel Vision Accelerator Design with Intel Arria 10 FPGA (acla10_1150_sg10)
Program succeed. 
Time taken to load model = 4.475625038146973 seconds
Time Taken to run 100 Inference is = 0.8625667095184326 seconds
None.txt
stderr.

## Step 5b: Get the result for CPU and GPU

In [26]:
import get_results

get_results.getResults(cpu_gpu_job[0], filename="output.tgz", blocking=True)

getResults() is blocking until results of the job (id:Eh6UyjrAVHGzoOtXzCduZQ2Xew2GnSJf) are ready.
Please wait...Success!
output.tgz was downloaded in the same folder as this notebook.


In [27]:
!tar zxf output.tgz

In [32]:
!cat stdout.log

INTELFPGAOCLSDKROOT is set to /opt/altera/aocl-pro-rte/aclrte-linux64. Using that.

aoc was not found, but aocl was found. Assuming only RTE is installed.

AOCL_BOARD_PACKAGE_ROOT is set to /opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/BSP/a10_1150_sg1. Using that.
Adding /opt/altera/aocl-pro-rte/aclrte-linux64/bin to PATH
Adding /opt/altera/aocl-pro-rte/aclrte-linux64/host/linux64/lib to LD_LIBRARY_PATH
Adding /opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/BSP/a10_1150_sg1/linux64/lib to LD_LIBRARY_PATH
[setupvars.sh] OpenVINO environment initialized
aocl program: Running program from /opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/BSP/a10_1150_sg1/linux64/libexec
Programming device: a10gx_2ddr : Intel Vision Accelerator Design with Intel Arria 10 FPGA (acla10_1150_sg10)
Program succeed. 
DetectionOutput_Reshape_priors_/Output_0/Data__const is CPU
DetectionOutput_Reshape_conf_ is CPU
SSD/concat_reshape_softmax/mbox_conf_

## Step 5c: Get the result for FPGA, GPU and CPU

In [29]:
import get_results

get_results.getResults(fpga_gpu_cpu_job[0], filename="output.tgz", blocking=True)

getResults() is blocking until results of the job (id:0nJKbX8NJKvekIoxvVQ77gTk9bt2ldBk) are ready.
Please wait.....Success!
output.tgz was downloaded in the same folder as this notebook.


In [30]:
!tar zxf output.tgz

In [31]:
!cat stdout.log

INTELFPGAOCLSDKROOT is set to /opt/altera/aocl-pro-rte/aclrte-linux64. Using that.

aoc was not found, but aocl was found. Assuming only RTE is installed.

AOCL_BOARD_PACKAGE_ROOT is set to /opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/BSP/a10_1150_sg1. Using that.
Adding /opt/altera/aocl-pro-rte/aclrte-linux64/bin to PATH
Adding /opt/altera/aocl-pro-rte/aclrte-linux64/host/linux64/lib to LD_LIBRARY_PATH
Adding /opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/BSP/a10_1150_sg1/linux64/lib to LD_LIBRARY_PATH
[setupvars.sh] OpenVINO environment initialized
aocl program: Running program from /opt/intel/openvino/bitstreams/a10_vision_design_sg1_bitstreams/BSP/a10_1150_sg1/linux64/libexec
Programming device: a10gx_2ddr : Intel Vision Accelerator Design with Intel Arria 10 FPGA (acla10_1150_sg10)
Program succeed. 
DetectionOutput_Reshape_priors_/Output_0/Data__const is CPU
DetectionOutput_Reshape_conf_ is CPU
SSD/concat_reshape_softmax/mbox_conf_

## Step 6: View the Outputs

Can you plot the load time, inference time and the frames per second in the cell below?

In [None]:
import matplotlib.pyplot as plt

#File Paths to stats files
paths=['gpu_stats.txt', 'cpu_stats.txt']

# TODO: Plot the different stats

<span class="graffiti-highlight graffiti-id_m9kxw9k-id_4h5tl2h"><i></i><button>Hide Solution</button></span>

In [33]:
import matplotlib.pyplot as plt

def plot(labels, data, title, label):
    fig = plt.figure()
    ax = fig.add_axes([0,0,1,1])
    ax.set_ylabel(label)
    ax.set_title(title)
    ax.bar(labels, data)
    
def read_files(paths, labels):
    load_time=[]
    inference_time=[]
    fps=[]
    
    for path in paths:
        if os.path.isfile(path):
            f=open(path, 'r')
            load_time.append(float(f.readline()))
            inference_time.append(float(f.readline()))
            fps.append(float(f.readline()))

    plot(labels, load_time, 'Model Load Time', 'seconds')
    plot(labels, inference_time, 'Inference Time', 'seconds')
    plot(labels, fps, 'Frames per Second', 'Frames')

paths=['fpga_cpu_stats.txt', 'cpu_gpu_stats.txt', 'fpga_gpu_cpu_stats.txt']
read_files(paths, ['FPGA/CPU', 'CPU/GPU', 'FPGA/GPU/CPU'])

ValueError: shape mismatch: objects cannot be broadcast to a single shape