

# Intel® Distribution of OpenVINO™ toolkit hetero plugin


    
This example shows how to use the hetero plugin to set preferences for running different network layers on different hardware types. Here, we select the hetero plugin through a command-line option, in which case the layer distribution is predefined. The hetero plugin also lets developers customize how layer execution is distributed across hardware by specifying it in the application code.
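
The device preference is written as a single string such as `HETERO:GPU,CPU` or `HETERO:CPU,GPU`, with the devices listed in priority order. The sketch below shows one way such a string could be assembled and read back. `hetero_device` and `parse_priorities` are hypothetical helper names used for illustration and are not part of the OpenVINO™ API.

```python
def hetero_device(*priorities):
    """Build a HETERO device string from devices listed in priority order."""
    if not priorities:
        raise ValueError("at least one device is required")
    return "HETERO:" + ",".join(priorities)


def parse_priorities(device):
    """Return the device priority list encoded in a device string."""
    prefix = "HETERO:"
    if device.startswith(prefix):
        return device[len(prefix):].split(",")
    # A plain device name such as "CPU" has a single priority.
    return [device]


gpu_first = hetero_device("GPU", "CPU")
cpu_first = hetero_device("CPU", "GPU")
print(gpu_first)
print(cpu_first)
print(parse_priorities(gpu_first))
```

Keeping the priority list in one place makes it easy to submit the same job twice with the order reversed, which is what the rest of this notebook does.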

## Car detection tutorial example

### 1. Import dependencies, set the environment variables, and generate the IR files

In [7]:
from IPython.display import HTML
import os
import time
import sys                                     
from pathlib import Path
sys.path.insert(0, str(Path().resolve().parent.parent))
from demoTools.demoutils import *

In [8]:
!/opt/intel/openvino/bin/setupvars.sh

[setupvars.sh] OpenVINO environment initialized


In [9]:
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name mobilenet-ssd -o models


###############|| Downloading topologies ||###############

... 100%, 28 KB, 60279 KB/s, 0 seconds passed

... 100%, 22605 KB, 19274 KB/s, 1 seconds passed


###############|| Post processing ||###############



In [10]:
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/object_detection/common/mobilenet-ssd/caffe/mobilenet-ssd.caffemodel -o models/object_detection/common/mobilenet-ssd/FP32/ --scale 256 --mean_values [127,127,127]

Model Optimizer arguments:
Common parameters:
	- Path to the Input Model: 	/home/u28225/Reference-samples/18oct/smart-video-workshop/hardware-heterogeneity/devcloud/models/object_detection/common/mobilenet-ssd/caffe/mobilenet-ssd.caffemodel
	- Path for generated IR: 	/home/u28225/Reference-samples/18oct/smart-video-workshop/hardware-heterogeneity/devcloud/models/object_detection/common/mobilenet-ssd/FP32/
	- IR output name: 	mobilenet-ssd
	- Log level: 	ERROR
	- Batch: 	Not specified, inherited from the model
	- Input layers: 	Not specified, inherited from the model
	- Output layers: 	Not specified, inherited from the model
	- Input shapes: 	Not specified, inherited from the model
	- Mean values: 	[127,127,127]
	- Scale values: 	Not specified
	- Scale factor: 	256.0
	- Precision of IR: 	FP32
	- Enable fusing: 	True
	- Enable grouped convolutions fusing: 	True
	- Move mean values to preprocess section: 	False
	- Reverse input channels: 	False
Caffe specific parameters:
	- Enable resnet 
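
The `--mean_values` and `--scale` options build input normalization into the generated IR. As a rough sketch of the arithmetic, assuming the mean is subtracted before the value is divided by the scale factor, each input pixel is transformed as follows. `normalize` is a hypothetical helper written only to illustrate the calculation.

```python
MEAN = (127.0, 127.0, 127.0)  # per-channel mean from --mean_values
SCALE = 256.0                 # scale factor from --scale


def normalize(pixel):
    """Normalize one 3-channel pixel: subtract the mean, then divide by the scale."""
    return tuple((value - mean) / SCALE for value, mean in zip(pixel, MEAN))


print(normalize((127, 127, 127)))
print(normalize((255, 0, 63)))
```

With these settings, 8-bit pixel values in the range 0 to 255 end up roughly within -0.5 to 0.5.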



### 2. Run the car detection tutorial with the hetero plugin


#### Create Job Script 

We will run the workload on several of the DevCloud's edge compute nodes. We send work to the edge compute nodes by submitting jobs to a queue. For each job, we specify the type of edge compute server that must be allocated for the job.

To pass specific variables to the Python code, we will use the following arguments:

* `-f`: location of the optimized model's XML file
* `-i`: location of the input video
* `-r`: output directory
* `-d`: hardware device type (CPU, GPU, MYRIAD, HDDL or HETERO:FPGA,CPU)
* `-n`: number of infer requests

The job file will be executed directly on the edge compute node.

In [11]:
%%writefile object_detection_job.sh

# The default path for the job is your home directory, so we change directory to where the files are.
cd $PBS_O_WORKDIR
OUTPUT_FILE=$1
DEVICE=$2
FP_MODEL=$3
# Object detection script writes output to a file inside a directory. We make sure that this directory exists.
#  The output directory is the first argument of the bash script
mkdir -p $OUTPUT_FILE
ROIFILE=$OUTPUT_FILE/ROIs.txt
OVIDEO=$OUTPUT_FILE/output.mp4

# Running the object detection code
SAMPLEPATH=$PBS_O_WORKDIR
./tutorial1 -i cars_1900.mp4 \
            -m models/object_detection/common/mobilenet-ssd/$FP_MODEL/mobilenet-ssd.xml \
            -d $DEVICE \
            -o $OUTPUT_FILE\
            -fr 3000 

# Converting the text output to a video
./ROI_writer -i cars_1900.mp4 \
             -o $OUTPUT_FILE \
             -ROIfile $ROIFILE \
             -l pascal_voc_classes.txt \
             -r 2.0 # output in half res

Overwriting object_detection_job.sh
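
The job script reads its inputs as the positional parameters `$1`, `$2` and `$3`, which the submission cells supply through the `-F` option of `qsub` as one space-separated string. The sketch below mimics that mapping in Python. The `JobArgs` tuple and the `parse_job_args` function are hypothetical names used only for illustration.

```python
import shlex
from collections import namedtuple

# Field order mirrors $1, $2 and $3 in object_detection_job.sh
JobArgs = namedtuple("JobArgs", ["output_dir", "device", "fp_model"])


def parse_job_args(arg_string):
    """Split a qsub -F argument string into the script's three positional values."""
    parts = shlex.split(arg_string)
    if len(parts) != 3:
        raise ValueError("expected 3 arguments, got %d" % len(parts))
    return JobArgs(*parts)


args = parse_job_args("results/GPU HETERO:GPU,CPU FP32")
print(args.output_dir)
print(args.device)
print(args.fp_model)
# Model path assembled the same way the job script does
print("models/object_detection/common/mobilenet-ssd/%s/mobilenet-ssd.xml" % args.fp_model)
```

Because the device string contains a comma but no spaces, it survives the split as a single argument.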


#### a) Prioritizing the GPU

In [12]:
os.environ["VIDEO"] = "cars_1900.mp4"

In [13]:
print("Submitting a job to an edge compute node with an Intel Core CPU...")
#Submit job to the queue
job_id_core = !qsub object_detection_job.sh -l nodes=1:tank-870:i5-6500te -F "results/GPU HETERO:GPU,CPU FP32" -N obj_det_core
print(job_id_core[0])
#Progress indicators
if job_id_core:
    progressIndicator('results/GPU', 'i_progress_'+job_id_core[0]+'.txt', "Inference", 0, 100)
    progressIndicator('results/GPU', 'v_progress_'+job_id_core[0]+'.txt', "Rendering", 0, 100)

Submitting a job to an edge compute node with an Intel Core CPU...
62871.c003


HBox(children=(FloatProgress(value=0.0, bar_style='info', description='Inference', style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, bar_style='info', description='Rendering', style=ProgressStyle(descrip…


    
#### b) Prioritizing the CPU

In [14]:
print("Submitting a job to an edge compute node with an Intel Core CPU...")
#Submit job to the queue
job_id_core = !qsub object_detection_job.sh -l nodes=1:tank-870:i5-6500te -F "results/Core HETERO:CPU,GPU FP32" -N obj_det_core
print(job_id_core[0])
#Progress indicators
if job_id_core:
    progressIndicator('results/Core', 'i_progress_'+job_id_core[0]+'.txt', "Inference", 0, 100)
    progressIndicator('results/Core', 'v_progress_'+job_id_core[0]+'.txt', "Rendering", 0, 100)

Submitting a job to an edge compute node with an Intel Core CPU...
62873.c003


HBox(children=(FloatProgress(value=0.0, bar_style='info', description='Inference', style=ProgressStyle(descrip…

HBox(children=(FloatProgress(value=0.0, bar_style='info', description='Rendering', style=ProgressStyle(descrip…


Observe the time the Inference Engine takes to process each frame. In this particular example, inference ran faster when the CPU was prioritized than when the GPU was the first priority.

 
## Inference Engine classification sample


The Intel® Distribution of OpenVINO™ toolkit install folder (/opt/intel/openvino) includes various samples that show developers how the Inference Engine APIs can be used. These samples implement a `-pc` flag that prints a per-layer performance report for the topology, which lets you see which layers run on which hardware. In this section, we run a very basic classification sample as an example. We provide a car image as input to the classification sample, and the output is a list of object labels with confidence values.

### 1. First, get the classification model and convert it to IR using Model Optimizer

For this example, we will use the SqueezeNet* model, downloaded with the model downloader script while setting up the OS for the workshop.

In [15]:
! /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name squeezenet1.1 -o models


###############|| Downloading topologies ||###############

... 100%, 9 KB, 25256 KB/s, 0 seconds passed

... 100%, 4834 KB, 714 KB/s, 6 seconds passed


###############|| Post processing ||###############



In [16]:
! /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/classification/squeezenet/1.1/caffe/squeezenet1.1.caffemodel -o models/squeezenet/FP32/

Model Optimizer arguments:
Common parameters:
	- Path to the Input Model: 	/home/u28225/Reference-samples/18oct/smart-video-workshop/hardware-heterogeneity/devcloud/models/classification/squeezenet/1.1/caffe/squeezenet1.1.caffemodel
	- Path for generated IR: 	/home/u28225/Reference-samples/18oct/smart-video-workshop/hardware-heterogeneity/devcloud/models/squeezenet/FP32/
	- IR output name: 	squeezenet1.1
	- Log level: 	ERROR
	- Batch: 	Not specified, inherited from the model
	- Input layers: 	Not specified, inherited from the model
	- Output layers: 	Not specified, inherited from the model
	- Input shapes: 	Not specified, inherited from the model
	- Mean values: 	Not specified
	- Scale values: 	Not specified
	- Scale factor: 	Not specified
	- Precision of IR: 	FP32
	- Enable fusing: 	True
	- Enable grouped convolutions fusing: 	True
	- Move mean values to preprocess section: 	False
	- Reverse input channels: 	False
Caffe specific parameters:
	- Enable resnet optimization: 	True
	- Path


To display labels after classification, you need a labels file for the SqueezeNet* model. Copy the available labels file from the demo directory to your working directory.

In [17]:
!cp /opt/intel/openvino/deployment_tools/demo/squeezenet1.1.labels models/squeezenet/FP32/
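
The labels file is what allows class names to be shown instead of bare class indices. As a minimal sketch, assuming the file holds one label per line in class-index order, the top results could be paired with their labels like this. `load_labels`, `top_labels` and the sample values are hypothetical and are not taken from the sample's source code.

```python
def load_labels(lines):
    """Turn the lines of a labels file into a list indexed by class id."""
    return [line.strip() for line in lines if line.strip()]


def top_labels(probabilities, labels, k=3):
    """Return the k most probable (label, confidence) pairs, best first."""
    ranked = sorted(enumerate(probabilities), key=lambda item: item[1], reverse=True)
    return [(labels[index], score) for index, score in ranked[:k]]


# Hypothetical stand-ins for a labels file and a network output
labels = load_labels(["sports car\n", "minivan\n", "pickup\n", "\n"])
scores = [0.72, 0.08, 0.20]
for label, score in top_labels(scores, labels, k=2):
    print(label, score)
```

To use the real file, pass an open file handle to `load_labels`, since iterating over a text file yields its lines.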

In [25]:
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name mobilenet-ssd -o models


###############|| Downloading topologies ||###############

... 100%, 28 KB, 63151 KB/s, 0 seconds passed

... 100%, 22605 KB, 22484 KB/s, 1 seconds passed


###############|| Post processing ||###############



In [26]:
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/object_detection/common/mobilenet-ssd/caffe/mobilenet-ssd.caffemodel -o models/object_detection/common/mobilenet-ssd/FP32/ --scale 256 --mean_values [127,127,127]

Model Optimizer arguments:
Common parameters:
	- Path to the Input Model: 	/home/u28225/Reference-samples/18oct/smart-video-workshop/hardware-heterogeneity/devcloud/models/object_detection/common/mobilenet-ssd/caffe/mobilenet-ssd.caffemodel
	- Path for generated IR: 	/home/u28225/Reference-samples/18oct/smart-video-workshop/hardware-heterogeneity/devcloud/models/object_detection/common/mobilenet-ssd/FP32/
	- IR output name: 	mobilenet-ssd
	- Log level: 	ERROR
	- Batch: 	Not specified, inherited from the model
	- Input layers: 	Not specified, inherited from the model
	- Output layers: 	Not specified, inherited from the model
	- Input shapes: 	Not specified, inherited from the model
	- Mean values: 	[127,127,127]
	- Scale values: 	Not specified
	- Scale factor: 	256.0
	- Precision of IR: 	FP32
	- Enable fusing: 	True
	- Enable grouped convolutions fusing: 	True
	- Move mean values to preprocess section: 	False
	- Reverse input channels: 	False
Caffe specific parameters:
	- Enable resnet 

We will use the [car_1.bmp](car_1.bmp) image to run our classification job, as described in the next steps.


    
### 2. Run the classification sample with the hetero plugin, prioritizing the GPU

In [35]:
%%writefile classification_job.sh
ME=`basename $0`

DEVICE=$2

# Parse optional flags. When the script is called with positional arguments instead,
# no flags are found and DEVICE keeps the value of $2 set above.
while getopts 'd:f:i:r:n:?' OPTION; do
    case "$OPTION" in
    d)
        DEVICE=$OPTARG
        echo "$ME is using device $OPTARG"
      ;;

    f)
        FP_MODEL=$OPTARG
        echo "$ME is using floating point model $OPTARG"
      ;;

    i)
        INPUT_FILE=$OPTARG
        echo "$ME is using input file $OPTARG"
      ;;
    r)
        RESULTS_BASE=$OPTARG
        echo "$ME is using results base $OPTARG"
      ;;
    n)
        NUM_INFER_REQS=$OPTARG
        echo "$ME is running $OPTARG inference requests"
      ;;
    esac  
done

# The default path for the job is your home directory, so we change directory to where the files are.
cd $PBS_O_WORKDIR

#NN_MODEL="mobilenet-ssd.xml"
RESULTS_PATH="${RESULTS_BASE}"
#mkdir -p $RESULTS_PATH
echo "$ME is using results path $RESULTS_PATH"

if [ "$DEVICE" == "HETERO:FPGA,CPU" ]; then
    # Environment variables for edge compute nodes with FPGAs
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/altera/aocl-pro-rte/aclrte-linux64/
    # Set up the FPGA runtime environment, then program the bitstream
    source /opt/fpga_support_files/setup_env.sh
    aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_bitstreams/2019R1_PL1_FP11_MobileNet_Clamp.aocx
fi
    
# Running the object detection code
#SAMPLEPATH=$PBS_O_WORKDIR
./object_detection_demo_ssd_async        -i cars_1900.mp4 \
                                            -m models/object_detection/common/mobilenet-ssd/FP32/mobilenet-ssd.xml \
                                            -d $DEVICE   \
                                            -pc

                                            

Overwriting classification_job.sh


In [36]:
#Submit job to the queue
job_id_gpu = !qsub classification_job.sh -l nodes=1:idc001skl:intel-hd-530 -F "results/GPU HETERO:GPU,CPU FP32" -N obj_det_gpu 
print(job_id_gpu[0]) 
var=job_id_gpu[0].split(".")
file="obj_det_gpu.o"+var[0]
# Poll once per second until the job's output file appears, then print it
while True:
    if os.path.isfile(file):
        ! cat $file
        break
    time.sleep(1)

62896.c003

########################################################################
#      Date:           Wed Oct 23 00:07:05 PDT 2019
#    Job ID:           62896.c003
#      User:           u28225
# Resources:           neednodes=1:idc001skl:intel-hd-530,nodes=1:idc001skl:intel-hd-530,walltime=01:00:00
########################################################################

[setupvars.sh] OpenVINO environment initialized
62896.c003.SC is using results path 
InferenceEngine: 
	API version ............ 1.6
	Build .................. custom_releases/2019/R1_c9b66a26e4d65bb986bb740e73f58c6e9e84c7c2
[ INFO ] Parsing input parameters
[ INFO ] Reading input
[ INFO ] Loading plugin

	API version ............ 1.6
	Build .................. heteroPlugin
	Description ....... heteroPlugin
[ INFO ] Loading network files

########################################################################
# End of output for job 62896.c003
# Date: Wed Oct 23 00:07:12 PDT 2019
################################
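
The submission cell waits for the job's output file, whose name combines the `-N` job name with the numeric part of the job ID. A bounded variant of that wait could look like the sketch below. `job_output_file` and `wait_for_file` are hypothetical helpers, and the timeout and interval values are arbitrary.

```python
import os
import time


def job_output_file(job_name, job_id):
    """Build the output file name, e.g. 'obj_det_gpu.o62896' for job '62896.c003'."""
    return "%s.o%s" % (job_name, job_id.split(".")[0])


def wait_for_file(path, timeout=60.0, interval=1.0):
    """Poll until path exists; return True if found, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.isfile(path):
            return True
        time.sleep(interval)  # avoid a busy loop between checks
    return os.path.isfile(path)


name = job_output_file("obj_det_gpu", "62896.c003")
print(name)
```

Returning `False` on timeout lets the notebook report a stalled job instead of blocking the kernel indefinitely.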



After the execution, you should see performance counter output like that in the screenshot below:


<img src='gpu.png'>
    


    
### 3. Now, run with CPU first

In [20]:
#Submit job to the queue
job_id_cpu = !qsub classification_job.sh -l nodes=1:idc001skl:intel-hd-530 -F "results/GPU HETERO:CPU,GPU FP32" -N obj_det_cpu
print(job_id_cpu[0]) 
var=job_id_cpu[0].split(".")
file="obj_det_cpu.o"+var[0]
# Poll once per second until the job's output file appears, then print it
while True:
    if os.path.isfile(file):
        ! cat $file
        break
    time.sleep(1)


62876.c003

########################################################################
#      Date:           Tue Oct 22 22:54:33 PDT 2019
#    Job ID:           62876.c003
#      User:           u28225
# Resources:           neednodes=1:idc001skl:intel-hd-530,nodes=1:idc001skl:intel-hd-530,walltime=01:00:00
########################################################################

[setupvars.sh] OpenVINO environment initialized
62876.c003.SC is using results path 
InferenceEngine: 
	API version ............ 1.6
	Build .................. custom_releases/2019/R1_c9b66a26e4d65bb986bb740e73f58c6e9e84c7c2
[ INFO ] Parsing input parameters
[ INFO ] Reading input
[ INFO ] Loading plugin

	API version ............ 1.6
	Build .................. heteroPlugin
	Description ....... heteroPlugin
[ INFO ] Loading network files
[ INFO ] Batch size is forced to  1.
[ INFO ] Checking that the inputs are as the demo expects
[ INFO ] Checking that the outputs are as the demo expects

#######################



After the execution, you should see performance counter output like that in the screenshot below:


<img src='cpu.png'>