

# Intel® Distribution of OpenVINO™ toolkit hetero plugin


    
This example shows how to use the hetero plugin to set preferences for running different network layers on different hardware types. Here, we will use a command-line option to select the hetero plugin, where the layer distribution is already defined. However, the hetero plugin also allows developers to customize how layer execution is distributed across hardware by specifying it in the application code.
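To make the idea of a device priority list concrete, here is a toy model of priority-based fallback: each layer goes to the first device in the list that supports its layer type. This is only an illustration, not the plugin's implementation. The layer names, the support tables, and the helper functions `parse_hetero_device` and `assign_layers` are all hypothetical.

```python
# Toy model of priority-based fallback. This is NOT the hetero plugin's code;
# every name and table below is a hypothetical illustration.

def parse_hetero_device(device):
    """Split a device string such as 'HETERO:GPU,CPU' into a priority list."""
    prefix = "HETERO:"
    if not device.startswith(prefix):
        # A plain device name such as 'CPU' is a one-element priority list.
        return [device]
    return [name.strip() for name in device[len(prefix):].split(",") if name.strip()]


def assign_layers(layer_types, priorities, supported):
    """Give each layer to the first device in `priorities` that supports its type."""
    assignment = {}
    for layer_name, layer_type in layer_types.items():
        for device in priorities:
            if layer_type in supported.get(device, set()):
                assignment[layer_name] = device
                break
        else:
            # No device in the priority list can run this layer type.
            raise ValueError(f"no device supports layer {layer_name!r} ({layer_type})")
    return assignment


# Hypothetical layers and per-device support tables, for illustration only.
layers = {"conv1": "Convolution", "relu1": "ReLU", "detection_out": "DetectionOutput"}
supported = {
    "GPU": {"Convolution", "ReLU"},
    "CPU": {"Convolution", "ReLU", "DetectionOutput"},
}

gpu_first = assign_layers(layers, parse_hetero_device("HETERO:GPU,CPU"), supported)
cpu_first = assign_layers(layers, parse_hetero_device("HETERO:CPU,GPU"), supported)
print(gpu_first)
print(cpu_first)
```

In this toy setup, `HETERO:GPU,CPU` sends the unsupported layer back to the CPU, while `HETERO:CPU,GPU` keeps every layer on the CPU because it supports all of them.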

## Car detection tutorial example

### 1. Import dependencies, set the environment variables and generate the IR files

In [1]:
from IPython.display import HTML
import os
import time
import sys                                     
from pathlib import Path
sys.path.insert(0, str(Path().resolve().parent.parent.parent))
from demoTools.demoutils import *

In [2]:
!/opt/intel/openvino/bin/setupvars.sh

[setupvars.sh] OpenVINO environment initialized


In [3]:
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name mobilenet-ssd  -o models
!/opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name squeezenet1.1  -o models

################|| Downloading models ||################

... 100%, 28 KB, 52579 KB/s, 0 seconds passed

... 100%, 22605 KB, 22600 KB/s, 1 seconds passed

################|| Post-processing ||################

################|| Downloading models ||################

... 100%, 9 KB, 26674 KB/s, 0 seconds passed

... 100%, 4834 KB, 27628 KB/s, 0 seconds passed

################|| Post-processing ||################



In [4]:
import os
os.environ["OMP_NUM_THREADS"] = "4" # export OMP_NUM_THREADS=4
os.environ["OPENBLAS_NUM_THREADS"] = "4" # export OPENBLAS_NUM_THREADS=4 
os.environ["MKL_NUM_THREADS"] = "6" # export MKL_NUM_THREADS=6
os.environ["VECLIB_MAXIMUM_THREADS"] = "4" # export VECLIB_MAXIMUM_THREADS=4
os.environ["NUMEXPR_NUM_THREADS"] = "6" # export NUMEXPR_NUM_THREADS=6

In [32]:
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/public/mobilenet-ssd/mobilenet-ssd.caffemodel -o models/mobilenet-ssd/FP32/ --scale 256 --mean_values [127,127,127]
! python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/public/squeezenet1.1/squeezenet1.1.caffemodel -o models/squeezenet1.1/  --scale 256 --mean_values [127,127,127]

Model Optimizer arguments:
Common parameters:
	- Path to the Input Model: 	/home/u33131/13Nov-SVW-R3/hardware-heterogeneity/devcloud/python/models/public/mobilenet-ssd/mobilenet-ssd.caffemodel
	- Path for generated IR: 	/home/u33131/13Nov-SVW-R3/hardware-heterogeneity/devcloud/python/models/mobilenet-ssd/FP32/
	- IR output name: 	mobilenet-ssd
	- Log level: 	ERROR
	- Batch: 	Not specified, inherited from the model
	- Input layers: 	Not specified, inherited from the model
	- Output layers: 	Not specified, inherited from the model
	- Input shapes: 	Not specified, inherited from the model
	- Mean values: 	[127,127,127]
	- Scale values: 	Not specified
	- Scale factor: 	256.0
	- Precision of IR: 	FP32
	- Enable fusing: 	True
	- Enable grouped convolutions fusing: 	True
	- Move mean values to preprocess section: 	False
	- Reverse input channels: 	False
Caffe specific parameters:
	- Path to Python Caffe* parser generated from caffe.proto: 	/opt/intel/openvino/deployment_tools/model_optimizer/



### 2. Run the car detection tutorial with hetero plugin


#### Create Job Script 

We will run the workload on several of the DevCloud's edge compute nodes. We will send work to the edge compute nodes by submitting jobs into a queue. For each job, we will specify the type of edge compute server that must be allocated for the job.

To pass the specific variables to the job script, we will use the following arguments:

* `-f`: precision subdirectory of the optimized model IR (for example, `FP32`)
* `-i`: location of the input video
* `-r`: output directory
* `-d`: hardware device type (CPU, GPU, MYRIAD, HDDL, or a hetero combination such as HETERO:FPGA,CPU)
* `-n`: number of infer requests

The job file will be executed directly on the edge compute node.
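The option string handed to the job script can be assembled in Python before submission. The sketch below is a minimal example; `build_job_args` is a hypothetical helper, not part of `demoTools` or the toolkit. It only formats the five options listed above and rejects values containing whitespace, since the string is later split into separate arguments.

```python
def build_job_args(results_dir, device, precision, video, num_requests):
    """Return the option string for the job script (hypothetical helper)."""
    options = [
        ("-r", results_dir),        # output directory
        ("-d", device),             # device or HETERO priority list
        ("-f", precision),          # precision subdirectory, e.g. FP32
        ("-i", video),              # input video
        ("-n", str(num_requests)),  # number of infer requests
    ]
    for flag, value in options:
        # Whitespace would split one value into several arguments.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"invalid value for {flag}: {value!r}")
    return " ".join(f"{flag} {value}" for flag, value in options)


print(build_job_args("results/GPU", "HETERO:GPU,CPU", "FP32", "cars_1900.mp4", 4))
```

The returned string can then be placed inside the quotes of the `-F` argument of `qsub`.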

In [5]:
%%writefile object_detection.sh

ME=`basename $0`

# The default path for the job is your home directory, so we change directory to where the files are.
cd $PBS_O_WORKDIR

# Parse the command-line options passed to the job.
# The output directory is given with -r and is created further below.
while getopts 'd:f:i:r:n:?' OPTION; do
    case "$OPTION" in
    d)
        DEVICE=$OPTARG
        echo "$ME is using device $OPTARG"
      ;;

    f)
        FP_MODEL=$OPTARG
        echo "$ME is using floating point model $OPTARG"
      ;;

    i)
        INPUT_FILE=$OPTARG
        echo "$ME is using input file $OPTARG"
      ;;
    r)
        RESULTS_BASE=$OPTARG
        echo "$ME is using results base $OPTARG"
      ;;
    n)
        NUM_INFER_REQS=$OPTARG
        echo "$ME is running $OPTARG inference requests"
      ;;
    esac  
done

NN_MODEL="mobilenet-ssd.xml"
RESULTS_PATH="${RESULTS_BASE}"
mkdir -p $RESULTS_PATH
echo "$ME is using results path $RESULTS_PATH"

if [ "$DEVICE" = "HETERO:FPGA,CPU" ]; then
    # Environment variables and compilation for edge compute nodes with FPGAs
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/altera/aocl-pro-rte/aclrte-linux64/
    # Load the FPGA support environment and program the device with the bitstream
    source /opt/fpga_support_files/setup_env.sh
    aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_bitstreams/2019R1_PL1_FP11_MobileNet_Clamp.aocx
fi
    
# Running the object detection code
SAMPLEPATH=$PBS_O_WORKDIR
python3 tutorial1.py                        -m models/mobilenet-ssd/${FP_MODEL}/${NN_MODEL}  \
                                            -i $INPUT_FILE \
                                            -o $RESULTS_PATH \
                                            -d $DEVICE \
                                            -nireq $NUM_INFER_REQS \
                                            -ce /opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_avx2.so

g++ -std=c++14 ROI_writer.cpp -o ROI_writer  -lopencv_core -lopencv_videoio -lopencv_imgproc -lopencv_highgui  -fopenmp -I/opt/intel/openvino/opencv/include/ -L/opt/intel/openvino/opencv/lib/
# Rendering the output video
SKIPFRAME=1
RESOLUTION=0.5
./ROI_writer $INPUT_FILE $RESULTS_PATH $SKIPFRAME $RESOLUTION

Overwriting object_detection.sh


#### a) Prioritizing running on GPU first.

In [34]:
os.environ["VIDEO"] = "cars_1900.mp4"

In [35]:
#Submit job to the queue
job_id_gpu = !qsub object_detection.sh -l nodes=1:idc001skl:intel-hd-530 -F "-r results/GPU -d HETERO:GPU,CPU -f FP32 -i $VIDEO -n 4" -N obj_det_gpu 
print(job_id_gpu[0]) 
#Progress indicators
if job_id_gpu:
    progressIndicator('results/GPU', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/GPU', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/GPU', 'post_progress.txt', "Rendering", 0, 100)
    
while True:
    var=job_id_gpu[0].split(".")
    file="obj_det_gpu.o"+var[0]
    if os.path.isfile(file): 
        ! cat $file
        break

2915.v-qsvr-1.devcloud-edge




########################################################################
#      Date:           Thu Nov 14 01:22:01 PST 2019
#    Job ID:           2915.v-qsvr-1.devcloud-edge
#      User:           u33131
# Resources:           neednodes=1:idc001skl:intel-hd-530,nodes=1:idc001skl:intel-hd-530,walltime=01:00:00
########################################################################

[setupvars.sh] OpenVINO environment initialized
2915.v-qsvr-1.devcloud-edge.SC is using results base results/GPU
2915.v-qsvr-1.devcloud-edge.SC is using device HETERO:GPU,CPU
2915.v-qsvr-1.devcloud-edge.SC is using floating point model FP32
2915.v-qsvr-1.devcloud-edge.SC is using input file cars_1900.mp4
2915.v-qsvr-1.devcloud-edge.SC is running 4 inference requests
2915.v-qsvr-1.devcloud-edge.SC is using results path results/GPU
[ INFO ] Initializing plugin for HETERO:GPU,CPU device...
[ INFO ] Loading plugins for HETERO:GPU,CPU device...
[ INFO ] Reading IR...
[ INFO ] Loading IR to th
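The `while True` loop in the cell above checks for the job's output file continuously. A gentler variant sleeps between checks and gives up after a timeout. This is a minimal sketch: `wait_for_file` and `job_output_file` are hypothetical helpers, and the one-hour default simply mirrors the `walltime=01:00:00` shown in the job log.

```python
import os
import time


def job_output_file(job_id, job_name):
    """Build '<job name>.o<numeric id>' from an id like '2915.v-qsvr-1.devcloud-edge'."""
    return f"{job_name}.o{job_id.split('.')[0]}"


def wait_for_file(path, poll_seconds=1.0, timeout_seconds=3600.0):
    """Wait until `path` exists. Return True if it appeared, False on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if os.path.isfile(path):
            return True
        # Sleep instead of spinning so the notebook kernel stays responsive.
        time.sleep(poll_seconds)
    return os.path.isfile(path)


print(job_output_file("2915.v-qsvr-1.devcloud-edge", "obj_det_gpu"))
```

In the notebook, `wait_for_file(job_output_file(job_id_gpu[0], "obj_det_gpu"))` could replace the polling loop before printing the file.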


    
#### b) Prioritizing running on CPU first.

In [36]:
#Submit job to the queue
job_id_cpu = !qsub object_detection.sh -l nodes=1:idc001skl:tank-870:i5-6500te -F "-r results/Core -d HETERO:CPU,GPU -f FP32 -i $VIDEO -n 4" -N obj_det_cpu 
print(job_id_cpu[0]) 
if job_id_cpu:
    progressIndicator('results/Core', 'pre_progress.txt', "Preprocessing", 0, 100)
    progressIndicator('results/Core', 'i_progress.txt', "Inference", 0, 100)
    progressIndicator('results/Core', 'post_progress.txt', "Rendering", 0, 100)
while True:
    var=job_id_cpu[0].split(".")
    file="obj_det_cpu.o"+var[0]
    if os.path.isfile(file): 
        ! cat $file
        break

2917.v-qsvr-1.devcloud-edge




########################################################################
#      Date:           Thu Nov 14 01:24:24 PST 2019
#    Job ID:           2917.v-qsvr-1.devcloud-edge
#      User:           u33131
# Resources:           neednodes=1:idc001skl:tank-870:i5-6500te,nodes=1:idc001skl:tank-870:i5-6500te,walltime=01:00:00
########################################################################

[setupvars.sh] OpenVINO environment initialized
2917.v-qsvr-1.devcloud-edge.SC is using results base results/Core
2917.v-qsvr-1.devcloud-edge.SC is using device HETERO:CPU,GPU
2917.v-qsvr-1.devcloud-edge.SC is using floating point model FP32
2917.v-qsvr-1.devcloud-edge.SC is using input file cars_1900.mp4
2917.v-qsvr-1.devcloud-edge.SC is running 4 inference requests
2917.v-qsvr-1.devcloud-edge.SC is using results path results/Core
[ INFO ] Initializing plugin for HETERO:CPU,GPU device...
[ INFO ] Loading plugins for HETERO:CPU,GPU device...
[ INFO ] Reading IR...
[ INFO ] Lo


Observe the time that Inference Engine needs to process each frame. For this particular example, inference ran faster when the CPU had first priority than when the GPU did.

 
## Inference Engine classification sample


The Intel® Distribution of OpenVINO™ toolkit install folder (/opt/intel/openvino) includes various samples that show developers how the Inference Engine APIs can be used. These samples implement a `-pc` flag, which prints a per-layer performance report for the topology. The report shows which layers run on which hardware. In this section, we will run a very basic classification sample as an example. We will provide a car image as input to the classification sample. The output will be object labels with confidence numbers.
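A per-layer report of this kind can also be summarized in Python. The sketch below assumes the counters are available as a dictionary keyed by layer name, with `status`, `exec_type`, `layer_type` and `real_time` fields. That shape is an assumption here, modeled on what the Inference Engine Python API is understood to return. The sample values and the kernel names `kernel_a` and `kernel_b` are made up for illustration.

```python
from collections import defaultdict


def summarize_by_exec_type(perf_counts):
    """Sum `real_time` per `exec_type` over layers that were actually executed."""
    totals = defaultdict(int)
    for stats in perf_counts.values():
        if stats["status"] != "EXECUTED":
            # Skip layers that were fused away or never ran.
            continue
        totals[stats["exec_type"]] += stats["real_time"]
    return dict(totals)


# Made-up counters in the assumed shape; not taken from a real run.
sample_counts = {
    "conv1": {"status": "EXECUTED", "exec_type": "kernel_a",
              "layer_type": "Convolution", "real_time": 120},
    "relu1": {"status": "OPTIMIZED_OUT", "exec_type": "undef",
              "layer_type": "ReLU", "real_time": 0},
    "conv2": {"status": "EXECUTED", "exec_type": "kernel_a",
              "layer_type": "Convolution", "real_time": 80},
    "prob": {"status": "EXECUTED", "exec_type": "kernel_b",
             "layer_type": "SoftMax", "real_time": 30},
}

print(summarize_by_exec_type(sample_counts))
```

Grouping by execution type gives a quick view of where the time goes. Which execution types correspond to which device depends on the plugins involved.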

### 1. First, get the classification model and convert it to IR using Model Optimizer

For this example, we will use the SqueezeNet model, which the model downloader script fetched while setting up the OS for the workshop.

In [6]:
! /opt/intel/openvino/deployment_tools/tools/model_downloader/downloader.py --name squeezenet1.1 -o models

################|| Downloading models ||################

... 100%, 9 KB, 31560 KB/s, 0 seconds passed

... 100%, 4834 KB, 27619 KB/s, 0 seconds passed

################|| Post-processing ||################



In [7]:
! /opt/intel/openvino/deployment_tools/model_optimizer/mo_caffe.py --input_model models/public/squeezenet1.1/squeezenet1.1.caffemodel -o models/squeezenet/FP32/

Model Optimizer arguments:
Common parameters:
	- Path to the Input Model: 	/home/u33131/13Nov-SVW-R3/hardware-heterogeneity/devcloud/python/models/public/squeezenet1.1/squeezenet1.1.caffemodel
	- Path for generated IR: 	/home/u33131/13Nov-SVW-R3/hardware-heterogeneity/devcloud/python/models/squeezenet/FP32/
	- IR output name: 	squeezenet1.1
	- Log level: 	ERROR
	- Batch: 	Not specified, inherited from the model
	- Input layers: 	Not specified, inherited from the model
	- Output layers: 	Not specified, inherited from the model
	- Input shapes: 	Not specified, inherited from the model
	- Mean values: 	Not specified
	- Scale values: 	Not specified
	- Scale factor: 	Not specified
	- Precision of IR: 	FP32
	- Enable fusing: 	True
	- Enable grouped convolutions fusing: 	True
	- Move mean values to preprocess section: 	False
	- Reverse input channels: 	False
Caffe specific parameters:
	- Path to Python Caffe* parser generated from caffe.proto: 	/opt/intel/openvino/deployment_tools/model_optim


To display labels after classification, you will need a labels file for the SqueezeNet* model. Copy the available labels file from the demo directory to your working directory.

In [8]:
!cp /opt/intel/openvino/deployment_tools/demo/squeezenet1.1.labels models/squeezenet/FP32/

We will use the [car_1.bmp](car_1.bmp) image to run our classification job as described in the next steps.


    
### 2. Run classification sample with hetero plugin, prioritizing running on GPU first.

In [21]:
%%writefile classification_job.sh
ME=`basename $0`

DEVICE=$2

# Parse any command-line options. In this notebook the job is submitted with
# positional arguments, so DEVICE keeps the value of $2 assigned above.
while getopts 'd:f:i:r:n:?' OPTION; do
    case "$OPTION" in
    d)
        DEVICE=$OPTARG
        echo "$ME is using device $OPTARG"
      ;;

    f)
        FP_MODEL=$OPTARG
        echo "$ME is using floating point model $OPTARG"
      ;;

    i)
        INPUT_FILE=$OPTARG
        echo "$ME is using input file $OPTARG"
      ;;
    r)
        RESULTS_BASE=$OPTARG
        echo "$ME is using results base $OPTARG"
      ;;
    n)
        NUM_INFER_REQS=$OPTARG
        echo "$ME is running $OPTARG inference requests"
      ;;
    esac  
done

# The default path for the job is your home directory, so we change directory to where the files are.
cd $PBS_O_WORKDIR

#NN_MODEL="mobilenet-ssd.xml"
RESULTS_PATH="${RESULTS_BASE}"
#mkdir -p $RESULTS_PATH
echo "$ME is using results path $RESULTS_PATH"

if [ "$DEVICE" == "HETERO:FPGA,CPU" ]; then
    # Environment variables and compilation for edge compute nodes with FPGAs
    export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/opt/altera/aocl-pro-rte/aclrte-linux64/
    # Environment variables and compilation for edge compute nodes with FPGAs
    source /opt/fpga_support_files/setup_env.sh
    aocl program acl0 /opt/intel/openvino/bitstreams/a10_vision_design_bitstreams/2019R1_PL1_FP11_MobileNet_Clamp.aocx
fi
    
# Running the classification sample with per-layer performance counters (-pc)
#SAMPLEPATH=$PBS_O_WORKDIR
python3 classification_sample.py                     -i car_1.bmp \
                                            -m models/squeezenet/FP32/squeezenet1.1.xml \
                                            -d $DEVICE \
                                            -pc  

                                            

Overwriting classification_job.sh


In [22]:
#Submit job to the queue
job_id_gpu = !qsub classification_job.sh -l nodes=1:idc001skl:intel-hd-530 -F "results/GPU HETERO:GPU,CPU FP32" -N obj_det_gpu 
print(job_id_gpu[0]) 
while True:
    var=job_id_gpu[0].split(".")
    file="obj_det_gpu.o"+var[0]
    if os.path.isfile(file): 
        ! cat $file
        break

2991.v-qsvr-1.devcloud-edge

########################################################################
#      Date:           Thu Nov 14 04:05:20 PST 2019
#    Job ID:           2991.v-qsvr-1.devcloud-edge
#      User:           u33131
# Resources:           neednodes=1:idc001skl:intel-hd-530,nodes=1:idc001skl:intel-hd-530,walltime=01:00:00
########################################################################

[setupvars.sh] OpenVINO environment initialized
2991.v-qsvr-1.devcloud-edge.SC is using results path 
[ INFO ] Loading network files:
	models/squeezenet/FP32/squeezenet1.1.xml
	models/squeezenet/FP32/squeezenet1.1.bin
[ INFO ] Preparing input blobs
size is 1
[ INFO ] Batch size is 1
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (1 iterations)
[ INFO ] Processing output blob
[ INFO ] Top 10 results: 
Image car_1.bmp

899  0.2101037 label jug
882  0.1619387 label vacuum cleaner
438  0.0896358 label beaker
804  0.0695650 label dispenser
898  0.0602500 label bott



After the execution, you should get performance counters output like the screenshot below:


<img src='gpu.png'>
    


    
### 3. Now, run with CPU first

In [23]:
#Submit job to the queue
job_id_cpu = !qsub classification_job.sh -l nodes=1:idc001skl:tank-870:i5-6500te  -F "results/GPU HETERO:CPU,GPU FP32" -N obj_det_cpu
print(job_id_cpu[0]) 
while True:
    var=job_id_cpu[0].split(".")
    file="obj_det_cpu.o"+var[0]
    if os.path.isfile(file): 
        ! cat $file
        break


2992.v-qsvr-1.devcloud-edge

########################################################################
#      Date:           Thu Nov 14 04:06:22 PST 2019
#    Job ID:           2992.v-qsvr-1.devcloud-edge
#      User:           u33131
# Resources:           neednodes=1:idc001skl:tank-870:i5-6500te,nodes=1:idc001skl:tank-870:i5-6500te,walltime=01:00:00
########################################################################

[setupvars.sh] OpenVINO environment initialized
2992.v-qsvr-1.devcloud-edge.SC is using results path 
[ INFO ] Loading network files:
	models/squeezenet/FP32/squeezenet1.1.xml
	models/squeezenet/FP32/squeezenet1.1.bin
[ INFO ] Preparing input blobs
size is 1
[ INFO ] Batch size is 1
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (1 iterations)
[ INFO ] Processing output blob
[ INFO ] Top 10 results: 
Image car_1.bmp

899  0.2101035 label jug
882  0.1619391 label vacuum cleaner
438  0.0896353 label beaker
804  0.0695652 label dispenser
898  0.060250



After the execution, you should get performance counters output like the screenshot below:


<img src='cpu.png'>
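The two job logs above list the same labels with confidences that differ only in the last digits. The following sketch parses the result lines and compares them. `parse_result_line` is a hypothetical helper, and the input lines are the complete result lines copied from the two logs (the truncated fifth line is left out).

```python
def parse_result_line(line):
    """Parse '<class id> <confidence> label <name>' into (id, confidence, name)."""
    class_id, confidence, keyword, name = line.split(None, 3)
    if keyword != "label":
        raise ValueError(f"unexpected line format: {line!r}")
    return int(class_id), float(confidence), name.strip()


# Complete result lines from the GPU-first and CPU-first job logs.
gpu_lines = [
    "899  0.2101037 label jug",
    "882  0.1619387 label vacuum cleaner",
    "438  0.0896358 label beaker",
    "804  0.0695650 label dispenser",
]
cpu_lines = [
    "899  0.2101035 label jug",
    "882  0.1619391 label vacuum cleaner",
    "438  0.0896353 label beaker",
    "804  0.0695652 label dispenser",
]

differences = []
for gpu_line, cpu_line in zip(gpu_lines, cpu_lines):
    gpu_id, gpu_conf, name = parse_result_line(gpu_line)
    cpu_id, cpu_conf, _ = parse_result_line(cpu_line)
    assert gpu_id == cpu_id  # both runs rank the same classes
    differences.append(abs(gpu_conf - cpu_conf))
    print(f"{name}: |GPU - CPU| = {differences[-1]:.7f}")
```

Small differences like these are consistent with the same FP32 model being evaluated by different device back ends.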