# OpenVINO™ benchmark_app profiling

This page demonstrates how to use the Benchmark Tool to estimate deep learning inference performance on supported devices.

Note:
- The Python version is recommended for benchmarking models that will be used in Python applications, and the C++ version is recommended for benchmarking models that will be used in C++ applications.
- Both tools have a similar command interface and backend. This page uses the Python version as the example.

In [10]:
# Install openvino package
%pip install -q "openvino-nightly"

Note: you may need to restart the kernel to use updated packages.


The Python benchmark_app is automatically installed when you install OpenVINO using PyPI. Before running benchmark_app, make sure the openvino_env virtual environment is activated, and navigate to the directory where your model is located.

The benchmarking application works with models in the OpenVINO IR (model.xml and model.bin) and ONNX (model.onnx) formats. Make sure to convert your models if necessary.

To run benchmarking with default options on a model, use the following command:

`benchmark_app -m <model.xml>`

For example, using the IR model produced in the previous demo:

`benchmark_app -m model/resnet18_fp32.xml`
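The same invocation can also be assembled from Python, which is convenient when the model path or run time comes from a variable. The sketch below is only an illustration: `build_benchmark_cmd` and `run_if_available` are hypothetical helper names, not part of OpenVINO. Only the `-m` and `-t` flags come from benchmark_app itself.

```python
import shutil
import subprocess


def build_benchmark_cmd(model_path, seconds=None):
    """Return the benchmark_app argument list for a model (hypothetical helper)."""
    cmd = ["benchmark_app", "-m", str(model_path)]
    if seconds is not None:
        # -t limits the benchmark run to the given number of seconds
        cmd += ["-t", str(seconds)]
    return cmd


def run_if_available(cmd):
    """Run the command and return its stdout, or None if the tool is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout


cmd = build_benchmark_cmd("model/resnet18_fp32.xml", seconds=5)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.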

In [11]:
! benchmark_app -m model/resnet18_fp32.xml -t 5

[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.4.0-16079-814f3067bdd
[ INFO ] 
[ INFO ] Device info:
[ INFO ] CPU
[ INFO ] Build ................................. 2024.4.0-16079-814f3067bdd
[ INFO ] 
[ INFO ] 
[Step 3/11] Setting device configuration
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 12.82 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     x (node: x) : f32 / [...] / [1,3,64,64]
[ INFO ] Model outputs:
[ INFO ]     ***NO_NAME*** (node: __module.fc/aten::linear/Add) : f32 / [...] / [1,200]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     x (node: x) : u8 / [N,C,H,W] / [1,3,64,64]
[ INFO ] Model outputs:
[ INFO ]     ***NO_NAME*** (node: __mod

By default, the application will load the specified model onto the CPU and perform inference on batches of randomly-generated data inputs for 60 seconds. As it loads, it prints information about the benchmark parameters. 
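The log above is cut off before the final report. In recent releases the last step typically prints summary lines such as `[ INFO ] Throughput: ... FPS` and `[ INFO ]    Median: ... ms`. Assuming that format (it is not shown on this page, and the numbers in `sample` are made up), a small parser could pull the headline figures out of captured output:

```python
import re


def parse_report(text):
    """Extract throughput (FPS) and median latency (ms) from benchmark_app output.

    Assumes summary lines shaped like '[ INFO ] Throughput:   73.28 FPS'.
    Keys are omitted from the result when the matching line is not found.
    """
    patterns = {
        "throughput_fps": r"Throughput:\s*([\d.]+)\s*FPS",
        "median_latency_ms": r"Median:\s*([\d.]+)\s*ms",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            result[key] = float(match.group(1))
    return result


# Made-up sample text, for illustration only
sample = """[ INFO ] Latency:
[ INFO ]    Median:        4.10 ms
[ INFO ] Throughput:   970.50 FPS
"""
print(parse_report(sample))
```

If the report layout differs in your OpenVINO version, adjust the two regular expressions accordingly.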

You may be able to improve benchmark results beyond the default configuration by configuring some of the execution parameters for your model.

Run `benchmark_app -h` to see the full list of configuration options available with benchmark_app.

The benchmark app provides various options for configuring execution parameters. This section covers the key options for tuning a benchmark run to achieve better performance on your device.

In [None]:
# benchmark_app -h 
# [Step 1/11] Parsing and validating input arguments
# [ INFO ] Parsing input parameters
# usage: benchmark_app [-h [HELP]] [-i PATHS_TO_INPUT [PATHS_TO_INPUT ...]] -m
#                      PATH_TO_MODEL [-d TARGET_DEVICE]
#                      [-hint {throughput,tput,cumulative_throughput,ctput,latency,none}]
#                      [-niter NUMBER_ITERATIONS] [-t TIME] [-b BATCH_SIZE]
#                      [-shape SHAPE] [-data_shape DATA_SHAPE] [-layout LAYOUT]
#                      [-extensions EXTENSIONS] [-c PATH_TO_CLDNN_CONFIG]
#                      [-cdir CACHE_DIR] [-lfile [LOAD_FROM_FILE]]
#                      [-api {sync,async}] [-nireq NUMBER_INFER_REQUESTS]
#                      [-nstreams NUMBER_STREAMS]
#                      [-inference_only [INFERENCE_ONLY]]
#                      [-infer_precision INFER_PRECISION]
#                      [-ip {bool,f16,f32,f64,i8,i16,i32,i64,u8,u16,u32,u64}]
#                      [-op {bool,f16,f32,f64,i8,i16,i32,i64,u8,u16,u32,u64}]
#                      [-iop INPUT_OUTPUT_PRECISION] [--mean_values [R,G,B]]
#                      [--scale_values [R,G,B]] [-nthreads NUMBER_THREADS]
#                      [-pin {YES,NO,NUMA,HYBRID_AWARE}]
#                      [-latency_percentile LATENCY_PERCENTILE]
#                      [-report_type {no_counters,average_counters,detailed_counters}]
#                      [-report_folder REPORT_FOLDER] [-json_stats [JSON_STATS]]
#                      [-pc [PERF_COUNTS]] [-pcsort {no_sort,sort,simple_sort}]
#                      [-pcseq [PCSEQ]] [-exec_graph_path EXEC_GRAPH_PATH]
#                      [-dump_config DUMP_CONFIG] [-load_config LOAD_CONFIG]

# Options:
#   -h [HELP], --help [HELP]
#                         Show this help message and exit.
                        
#   -i PATHS_TO_INPUT [PATHS_TO_INPUT ...], --paths_to_input PATHS_TO_INPUT [PATHS_TO_INPUT ...]
#                         Optional. Path to a folder with images and/or binaries
#                         or to specific image or binary file.It is also allowed
#                         to map files to model inputs:
#                         input_1:file_1/dir1,file_2/dir2,input_4:file_4/dir4
#                         input_2:file_3/dir3 Currently supported data types:
#                         bin, npy. If OPENCV is enabled, this functionalityis
#                         extended with the following data types: bmp, dib,
#                         jpeg, jpg, jpe, jp2, png, pbm, pgm, ppm, sr, ras,
#                         tiff, tif.
                        
#   -m PATH_TO_MODEL, --path_to_model PATH_TO_MODEL
#                         Required. Path to an .xml/.onnx file with a trained
#                         model or to a .blob file with a trained compiled
#                         model.
                        
#   -d TARGET_DEVICE, --target_device TARGET_DEVICE
#                         Optional. Specify a target device to infer on (the
#                         list of available devices is shown below). Default
#                         value is CPU. Use '-d HETERO:<comma separated devices
#                         list>' format to specify HETERO plugin. Use '-d
#                         MULTI:<comma separated devices list>' format to
#                         specify MULTI plugin. The application looks for a
#                         suitable plugin for the specified device.
                        
#   -hint {throughput,tput,cumulative_throughput,ctput,latency,none}, 
#   --perf_hint {throughput,tput,cumulative_throughput,ctput,latency,none}
#                         Optional. Performance hint (latency or throughput or
#                         cumulative_throughput or none). Performance hint
#                         allows the OpenVINO device to select the right model-
#                         specific settings. 'throughput': device performance
#                         mode will be set to THROUGHPUT.
#                         'cumulative_throughput': device performance mode will
#                         be set to CUMULATIVE_THROUGHPUT. 'latency': device
#                         performance mode will be set to LATENCY. 'none': no
#                         device performance mode will be set. Using explicit
#                         'nstreams' or other device-specific options, please
#                         set hint to 'none'
                        
#   -niter NUMBER_ITERATIONS, --number_iterations NUMBER_ITERATIONS
#                         Optional. Number of iterations. If not specified, the
#                         number of iterations is calculated depending on a
#                         device.
                        
#   -t TIME, --time TIME  Optional. Time in seconds to execute topology.
                        
#   -api {sync,async}, --api_type {sync,async}
#                         Optional. Enable using sync/async API. Default value
#                         is async.
                        
#   -json_stats [JSON_STATS], --json_stats [JSON_STATS]
#                         Optional. Enables JSON-based statistics output (by
#                         default reporting system will use CSV format). Should
#                         be used together with -report_folder option.
                        

# Input shapes:
#   -b BATCH_SIZE, --batch_size BATCH_SIZE
#                         Optional. Batch size value. If not specified, the
#                         batch size value is determined from Intermediate
#                         Representation
                        
#   -shape SHAPE          Optional. Set shape for input. For example,
#                         "input1[1,3,224,224],input2[1,4]" or "[1,3,224,224]"
#                         in case of one input size. This parameter affect model
#                         Parameter shape, can be dynamic. For dynamic dimesions
#                         use symbol `?`, `-1` or range `low.. up`.
                        
#   -data_shape DATA_SHAPE
#                         Optional. Optional if model shapes are all static
#                         (original ones or set by -shape).Required if at least
#                         one input shape is dynamic and input images are not
#                         provided.Set shape for input tensors. For example,
#                         "input1[1,3,224,224][1,3,448,448],input2[1,4][1,8]" or
#                         "[1,3,224,224][1,3,448,448] in case of one input size.
                        
#   -layout LAYOUT        Optional. Prompts how model layouts should be treated
#                         by application. For example, "input1[NCHW],input2[NC]"
#                         or "[NCHW]" in case of one input size.
                        

# Advanced options:
#   -extensions EXTENSIONS, --extensions EXTENSIONS
#                         Optional. Path or a comma-separated list of paths to
#                         libraries (.so or .dll) with extensions.
                        
#   -c PATH_TO_CLDNN_CONFIG, --path_to_cldnn_config PATH_TO_CLDNN_CONFIG
#                         Optional. Required for GPU custom kernels. Absolute
#                         path to an .xml file with the kernels description.
                        
#   -cdir CACHE_DIR, --cache_dir CACHE_DIR
#                         Optional. Enable model caching to specified directory
                        
#   -lfile [LOAD_FROM_FILE], --load_from_file [LOAD_FROM_FILE]
#                         Optional. Loads model from file directly without
#                         read_model.
                        
#   -nireq NUMBER_INFER_REQUESTS, --number_infer_requests NUMBER_INFER_REQUESTS
#                         Optional. Number of infer requests. Default value is
#                         determined automatically for device.
                        
#   -nstreams NUMBER_STREAMS, --number_streams NUMBER_STREAMS
#                         Optional. Number of streams to use for inference on
#                         the CPU/GPU (for HETERO and MULTI device cases use
#                         format <device1>:<nstreams1>,<device2>:<nstreams2> or
#                         just <nstreams>). Default value is determined
#                         automatically for a device. Please note that although
#                         the automatic selection usually provides a reasonable
#                         performance, it still may be non - optimal for some
#                         cases, especially for very small models. Also, using
#                         nstreams>1 is inherently throughput-oriented option,
#                         while for the best-latency estimations the number of
#                         streams should be set to 1. See samples README for
#                         more details.
                        
#   -inference_only [INFERENCE_ONLY], --inference_only [INFERENCE_ONLY]
#                         Optional. If true inputs filling only once before
#                         measurements (default for static models), else inputs
#                         filling is included into loop measurement (default for
#                         dynamic models)
                        
#   -infer_precision INFER_PRECISION
#                         Optional. Specifies the inference precision. Example
#                         #1: '-infer_precision bf16'. Example #2:
#                         '-infer_precision CPU:bf16,GPU:f32'
                        
#   -exec_graph_path EXEC_GRAPH_PATH, --exec_graph_path EXEC_GRAPH_PATH
#                         Optional. Path to a file where to store executable
#                         graph information serialized.
                        

# Preprocessing options:
#   -ip {bool,f16,f32,f64,i8,i16,i32,i64,u8,u16,u32,u64}, 
#   --input_precision {bool,f16,f32,f64,i8,i16,i32,i64,u8,u16,u32,u64}
#                         Optional. Specifies precision for all input layers of
#                         the model.
                        
#   -op {bool,f16,f32,f64,i8,i16,i32,i64,u8,u16,u32,u64}, 
#   --output_precision {bool,f16,f32,f64,i8,i16,i32,i64,u8,u16,u32,u64}
#                         Optional. Specifies precision for all output layers of
#                         the model.
                        
#   -iop INPUT_OUTPUT_PRECISION, --input_output_precision INPUT_OUTPUT_PRECISION
#                         Optional. Specifies precision for input and output
#                         layers by name. Example: -iop "input:f16, output:f16".
#                         Notice that quotes are required. Overwrites precision
#                         from ip and op options for specified layers.
                        
#   --mean_values [R,G,B]
#                         Optional. Mean values to be used for the input image
#                         per channel. Values to be provided in the [R,G,B]
#                         format. Can be defined for desired input of the model,
#                         for example: "--mean_values
#                         data[255,255,255],info[255,255,255]". The exact
#                         meaning and order of channels depend on how the
#                         original model was trained. Applying the values
#                         affects performance and may cause type conversion
                        
#   --scale_values [R,G,B]
#                         Optional. Scale values to be used for the input image
#                         per channel. Values are provided in the [R,G,B]
#                         format. Can be defined for desired input of the model,
#                         for example: "--scale_values
#                         data[255,255,255],info[255,255,255]". The exact
#                         meaning and order of channels depend on how the
#                         original model was trained. If both --mean_values and
#                         --scale_values are specified, the mean is subtracted
#                         first and then scale is applied regardless of the
#                         order of options in command line. Applying the values
#                         affects performance and may cause type conversion
                        

# Device-specific performance options:
#   -nthreads NUMBER_THREADS, --number_threads NUMBER_THREADS
#                         Number of threads to use for inference on the CPU
#                         (including HETERO and MULTI cases).
                        
#   -pin {YES,NO,NUMA,HYBRID_AWARE}, --infer_threads_pinning {YES,NO,NUMA,HYBRID_AWARE}
#                         Optional. Enable threads->cores ('YES' which is
#                         OpenVINO runtime's default for conventional CPUs),
#                         threads->(NUMA)nodes ('NUMA'), threads->appropriate
#                         core types ('HYBRID_AWARE', which is OpenVINO
#                         runtime's default for Hybrid CPUs) or completely
#                         disable ('NO') CPU threads pinning for CPU-involved
#                         inference.
                        

# Statistics dumping options:
#   -latency_percentile LATENCY_PERCENTILE, --latency_percentile LATENCY_PERCENTILE
#                         Optional. Defines the percentile to be reported in
#                         latency metric. The valid range is [1, 100]. The
#                         default value is 50 (median).
                        
#   -report_type {no_counters,average_counters,detailed_counters}, 
#   --report_type {no_counters,average_counters,detailed_counters}
#                         Optional. Enable collecting statistics report.
#                         "no_counters" report contains configuration options
#                         specified, resulting FPS and latency.
#                         "average_counters" report extends "no_counters" report
#                         and additionally includes average PM counters values
#                         for each layer from the model. "detailed_counters"
#                         report extends "average_counters" report and
#                         additionally includes per-layer PM counters and
#                         latency for each executed infer request.
                        
#   -report_folder REPORT_FOLDER, --report_folder REPORT_FOLDER
#                         Optional. Path to a folder where statistics report is
#                         stored.
                        
#   -pc [PERF_COUNTS], --perf_counts [PERF_COUNTS]
#                         Optional. Report performance counters.
                        
#   -pcsort {no_sort,sort,simple_sort}, --perf_counts_sort {no_sort,sort,simple_sort}
#                         Optional. Report performance counters and analysis the
#                         sort hotpoint opts. sort: Analysis opts time cost,
#                         print by hotpoint order no_sort: Analysis opts time
#                         cost, print by normal order simple_sort: Analysis opts
#                         time cost, only print EXECUTED opts by normal order
                        
#   -pcseq [PCSEQ], --pcseq [PCSEQ]
#                         Optional. Report latencies for each shape in
#                         -data_shape sequence.
                        
#   -dump_config DUMP_CONFIG
#                         Optional. Path to JSON file to dump OpenVINO
#                         parameters, which were set by application.
                        
#   -load_config LOAD_CONFIG
#                         Optional. Path to JSON file to load custom OpenVINO parameters.
#                         Please note, command line parameters have higher priority then parameters from configuration file.
#                         Example 1: a simple JSON file for HW device with primary properties.
#                                      {
#                                         "CPU": {"NUM_STREAMS": "3", "PERF_COUNT": "NO"}
#                                      }
#                         Example 2: a simple JSON file for meta device(AUTO/MULTI) with HW device properties.
#                                      {
#                                         "AUTO": {
#                                              "PERFORMANCE_HINT": "THROUGHPUT",
#                                              "PERF_COUNT": "NO",
#                                              "DEVICE_PROPERTIES": "{CPU:{INFERENCE_PRECISION_HINT:f32,NUM_STREAMS:3},GPU:{INFERENCE_PRECISION_HINT:f32,NUM_STREAMS:5}}"
#                                         }
#                                      }

# Available target devices:   CPU  GPU.0  GPU.1
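The `-load_config` help text above includes a JSON example. A minimal sketch of writing that same file from Python and passing it to benchmark_app follows. The file name `cpu_config.json` and the helper name `write_config` are arbitrary choices for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_config(path, config):
    """Write an OpenVINO property dictionary as JSON for use with -load_config."""
    path = Path(path)
    path.write_text(json.dumps(config, indent=4))
    return path


# Example 1 from the help text: primary properties for a hardware device
config = {"CPU": {"NUM_STREAMS": "3", "PERF_COUNT": "NO"}}

config_path = write_config(Path(tempfile.gettempdir()) / "cpu_config.json", config)
cmd = ["benchmark_app", "-m", "model/resnet18_fp32.xml", "-load_config", str(config_path)]
print(" ".join(cmd))
```

As the help text notes, options given on the command line take priority over values in the configuration file.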

## Performance optimization

### "-hint" : latency and throughput

The benchmark app allows users to provide high-level __performance hints__ for setting ___latency-focused___ or ___throughput-focused___ inference modes. The performance hints do not require any device-specific settings and they are completely portable between devices. Parameters are __automatically__ configured based on whichever device is being used. 

The hint causes the runtime to automatically adjust __runtime parameters__, such as ___-nstreams___ (number of processing streams) and ___-b___ (inference batch size), to prioritize either low latency or high throughput.

In [None]:
#   -hint {throughput,tput,cumulative_throughput,ctput,latency,none}, 
#   --perf_hint {throughput,tput,cumulative_throughput,ctput,latency,none}
#                         Optional. Performance hint (latency or throughput or
#                         cumulative_throughput or none). Performance hint
#                         allows the OpenVINO device to select the right model-
#                         specific settings. 'throughput': device performance
#                         mode will be set to THROUGHPUT.
#                         'cumulative_throughput': device performance mode will
#                         be set to CUMULATIVE_THROUGHPUT. 'latency': device
#                         performance mode will be set to LATENCY. 'none': no
#                         device performance mode will be set. Using explicit
#                         'nstreams' or other device-specific options, please
#                         set hint to 'none'
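The last sentence of the help text matters when you tune streams by hand: explicit device-specific options are meant to be combined with `-hint none`. The sketch below shows a hypothetical helper (`tuning_args` is not part of OpenVINO) that enforces this pairing when building the argument list:

```python
def tuning_args(hint="throughput", nstreams=None, nireq=None):
    """Build the hint-related part of a benchmark_app command line (hypothetical helper).

    If the number of streams is set explicitly, the hint is forced to 'none',
    following the advice in the -hint help text.
    """
    allowed = {"throughput", "tput", "cumulative_throughput", "ctput", "latency", "none"}
    if hint not in allowed:
        raise ValueError(f"unknown hint: {hint}")
    if nstreams is not None:
        hint = "none"
    args = ["-hint", hint]
    if nstreams is not None:
        args += ["-nstreams", str(nstreams)]
    if nireq is not None:
        args += ["-nireq", str(nireq)]
    return args


print(tuning_args("latency"))
print(tuning_args("throughput", nstreams=4, nireq=4))
```

The returned list can be appended to a base command such as `["benchmark_app", "-m", "model/resnet18_fp32.xml"]`.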

In [None]:
# If not specified, throughput is used as the default. 
# Use -hint latency or -hint throughput when running benchmark_app:

! benchmark_app -m model/resnet18_fp32.xml -t 5 -hint latency
! benchmark_app -m model/resnet18_fp32.xml -t 5 -hint throughput

### Latency Mode
Latency is the amount of time it takes to process a single inference request. Low latency matters in applications where data needs to be inferenced and acted on as quickly as possible. For conventional devices, lower latency is achieved by reducing the number of parallel processing streams so the system can dedicate as many resources as possible to each inference request.

When benchmark_app is run with ___-hint latency___, it determines the optimal number of parallel inference requests for __minimizing__ latency while still maximizing the parallelization capabilities of the hardware.

### Throughput Mode
Throughput is the amount of data an inference pipeline can process per unit of time, and it is usually measured in frames per second (FPS) or inferences per second. High throughput is needed in applications where large amounts of data must be inferenced simultaneously (such as multi-camera video streams).

When benchmark_app is run with ___-hint throughput___, it utilizes as much memory and as many parallel streams as possible to __maximize__ the amount of data that can be processed simultaneously.
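The trade-off between the two modes can be made concrete with a back-of-the-envelope estimate. Assuming requests run fully in parallel and each takes the same time (an idealization, since real hardware shares resources and latency tends to rise as parallelism grows), throughput is roughly the number of items in flight divided by the per-request latency. The numbers below are illustrative, not measurements.

```python
def estimated_fps(latency_ms, parallel_requests=1, batch_size=1):
    """Idealized throughput: items in flight divided by per-request latency in seconds."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return parallel_requests * batch_size / (latency_ms / 1000.0)


# One request at a time, 5 ms each
print(estimated_fps(5.0))
# Four parallel requests, each assumed to slow down to 8 ms because resources are shared
print(estimated_fps(8.0, parallel_requests=4))
```

In this made-up scenario the per-request latency gets worse while the total throughput improves, which is the pattern the two hints trade between.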

--------------------

## Device
To set which device benchmarking runs on, use the ___-d DEVICE___ argument. This will tell benchmark_app to run benchmarking on that specific device. The benchmark app supports CPU, GPU, and GNA devices. In order to use GPU, the system must have the appropriate drivers installed. If no device is specified, benchmark_app will default to using CPU.

In [None]:
#   -d TARGET_DEVICE, --target_device TARGET_DEVICE
#                         Optional. Specify a target device to infer on (the
#                         list of available devices is shown below). Default
#                         value is CPU. Use '-d HETERO:<comma separated devices
#                         list>' format to specify HETERO plugin. Use '-d
#                         MULTI:<comma separated devices list>' format to
#                         specify MULTI plugin. The application looks for a
#                         suitable plugin for the specified device.

In [None]:
# For example, to run benchmarking on GPU
! benchmark_app -m model.xml -d GPU

Alternatively, specify ___AUTO___ as the device. In that case benchmark_app detects the available devices, picks the one best suited for the task, and configures its optimization settings.  
If the latency or throughput hint is set, it automatically configures streams and batch sizes for optimal performance on the selected device.

In [None]:
# For example, to run benchmarking on the best available device (CPU, GPU)
! benchmark_app -m model.xml -d AUTO
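The `-d` help text also describes the `HETERO:` and `MULTI:` prefixes, each followed by a comma-separated device list. A small sketch of composing such a value is shown below. The helper name `device_arg` is hypothetical, and device names such as `GPU` depend on what is available on your system.

```python
def device_arg(devices, mode=None):
    """Compose a value for -d, e.g. 'CPU' or 'MULTI:CPU,GPU' (hypothetical helper).

    `devices` is a list of device names; `mode` is None, 'MULTI' or 'HETERO'.
    """
    devices = list(devices)
    if not devices:
        raise ValueError("at least one device is required")
    if mode is None:
        if len(devices) != 1:
            raise ValueError("several devices need a mode such as MULTI or HETERO")
        return devices[0]
    if mode not in {"MULTI", "HETERO"}:
        raise ValueError(f"unsupported mode: {mode}")
    return f"{mode}:{','.join(devices)}"


print(device_arg(["CPU"]))
print(device_arg(["GPU", "CPU"], mode="HETERO"))
```

The result is passed as the value after `-d`, for example `["benchmark_app", "-m", "model.xml", "-d", device_arg(["CPU", "GPU"], mode="MULTI")]`.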

--------------------------

### Input-Shape



#### -shape


In [None]:
#   -shape SHAPE          Optional. Set shape for input. For example,
#                         "input1[1,3,224,224],input2[1,4]" or "[1,3,224,224]"
#                         in case of one input size. This parameter affect model
#                         Parameter shape, can be dynamic. For dynamic dimesions
#                         use symbol `?`, `-1` or range `low.. up`.
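The string format accepted by `-shape` is easy to get wrong by hand. The sketch below shows a hypothetical formatter (`format_shape` is not part of OpenVINO) that turns a mapping of input names to dimensions into that syntax and passes dynamic markers such as `?` or `-1` through unchanged:

```python
def format_shape(shapes):
    """Format {'input1': [1, 3, 224, 224]} as 'input1[1,3,224,224]' for -shape.

    Dimensions may be ints or strings such as '?' for dynamic dimensions.
    A bare list is treated as the shape of a single unnamed input.
    """
    def dims(values):
        return "[" + ",".join(str(v) for v in values) + "]"

    if isinstance(shapes, dict):
        return ",".join(name + dims(values) for name, values in shapes.items())
    return dims(shapes)


print(format_shape({"input1": [1, 3, 224, 224], "input2": [1, 4]}))
print(format_shape(["?", 3, 64, 64]))
```

When the value is used in a shell command, keep it in quotes, as in the help text examples.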

#### -data_shape

In [None]:
#   -data_shape DATA_SHAPE
#                         Optional. Optional if model shapes are all static
#                         (original ones or set by -shape).Required if at least
#                         one input shape is dynamic and input images are not
#                         provided.Set shape for input tensors. For example,
#                         "input1[1,3,224,224][1,3,448,448],input2[1,4][1,8]" or
#                         "[1,3,224,224][1,3,448,448] in case of one input size.
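`-data_shape` uses a similar syntax but lists several concrete shapes per input, one bracketed group after another. Assuming the naming convention from the help text, a hypothetical formatter (`format_data_shape` is an illustrative name) might look like this:

```python
def format_data_shape(data_shapes):
    """Format {'input1': [[1,3,224,224], [1,3,448,448]]} for -data_shape."""
    def dims(values):
        return "[" + ",".join(str(v) for v in values) + "]"

    return ",".join(
        name + "".join(dims(shape) for shape in shape_list)
        for name, shape_list in data_shapes.items()
    )


arg = format_data_shape({
    "input1": [[1, 3, 224, 224], [1, 3, 448, 448]],
    "input2": [[1, 4], [1, 8]],
})
print(arg)
```

The input mirrors the first example in the help text, so the printed string can be compared against it directly.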