
Dynamic dimensions required for input: input, but no shapes were provided. Automatically overriding shape to: 1x3x608x608 #1111

Closed
vilmara opened this issue Mar 10, 2021 · 13 comments
Labels
Topic: Dynamic Shape · triaged (Issue has been triaged by maintainers)

Comments

@vilmara

vilmara commented Mar 10, 2021

Description

I am trying to convert the pre-trained PyTorch YOLOv4 (darknet) model to TensorRT INT8 with dynamic batching, to later deploy it on DS-Triton. I am following the general steps in NVIDIA-AI-IOT/yolov4_deepstream, but I am running into issues, first with dynamic dimensions at the ONNX-TRT conversion step, and then when loading the model on DS-Triton:

Environment

TensorRT Version: 7.2.1
NVIDIA GPU: T4
NVIDIA Driver Version: 450.51.06
CUDA Version: 11.1
CUDNN Version: 8.0.4
Operating System: Ubuntu 18.04
Python Version (if applicable): 1.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable): container image nvcr.io/nvidia/pytorch:20.11-py3
Baremetal or Container (if so, version): container image deepstream:5.1-21.02-triton

Relevant Files

YOLOV4 pre-trained model weights and cfg downloaded from
https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg
https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights

Steps To Reproduce

Complete Pipeline: PyTorch YOLOv4 (darknet) --> ONNX --> TensorRT --> DeepStream-Triton

Step 1: download cfg file and weights from the above link

Step 2: git clone repository pytorch-YOLOv4
$ sudo git clone https://github.com/Tianxiaomo/pytorch-YOLOv4.git

Step 3: Convert model YOLOv4 PyTorch --> ONNX | Dynamic Batch size

$ sudo docker run --gpus all -it --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v /pytorch-YOLOv4/:/workspace/pytorch-YOLOv4/ nvcr.io/nvidia/pytorch:20.11-py3
$ cd /workspace/pytorch-YOLOv4
$ python demo_darknet2onnx.py "/workspace/pytorch-YOLOv4/models_cfg_weights/yolov4.cfg" "/workspace/pytorch-YOLOv4/models_cfg_weights/yolov4.weights" "/workspace/pytorch-YOLOv4/data/dog.jpg" -1

Result:

Onnx model exporting done
The model expects input shape:  ['batch_size', 3, 608, 608]
Saved model: yolov4_-1_3_608_608_dynamic.onnx
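
For reference, the dynamic-batch export that demo_darknet2onnx.py performs boils down to passing dynamic_axes to torch.onnx.export. A minimal sketch, assuming a loaded darknet model object; the output names below are illustrative, not taken from the script:

```python
import torch

def export_dynamic_onnx(model, onnx_path="yolov4_-1_3_608_608_dynamic.onnx"):
    # Export with a named dynamic batch axis; "input" must match the tensor
    # name later passed to trtexec's --minShapes/--optShapes/--maxShapes.
    model.eval()
    dummy = torch.randn(1, 3, 608, 608)  # H and W stay fixed at 608x608
    torch.onnx.export(
        model, dummy, onnx_path,
        opset_version=11,
        input_names=["input"],
        output_names=["boxes", "confs"],           # illustrative names
        dynamic_axes={"input": {0: "batch_size"},  # batch dimension left dynamic
                      "boxes": {0: "batch_size"},
                      "confs": {0: "batch_size"}},
    )
```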

Step 4: Convert model ONNX --> TensorRT | Dynamic Batch size
$ sudo docker run --gpus all -it --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=$DISPLAY -v /pytorch-YOLOv4/:/workspace/pytorch-YOLOv4/ deepstream:5.1-21.02-triton

$ /usr/src/tensorrt/bin/trtexec --onnx=yolov4_-1_3_608_608_dynamic.onnx --explicitBatch --minShapes=\'data\':1x3x608x608 --optShapes=\'data\':2x3x608x608 --maxShapes=\'data\':8x3x608x608 --workspace=4096 --buildOnly --saveEngine=yolov4_-1_3_608_608_dynamic.onnx_int8.engine --int8

Note: trtexec automatically overrides the engine shape to 1x3x608x608 instead of keeping the dynamic batching

[03/09/2021-22:24:24] [W] Dynamic dimensions required for input: input, but no shapes were provided. Automatically overriding shape to: 1x3x608x608
[03/09/2021-22:24:24] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[03/09/2021-22:24:25] [W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.
[03/09/2021-22:43:52] [I] [TRT] Detected 1 inputs and 8 output network tensors.
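
For context, the --minShapes/--optShapes/--maxShapes flags map to an optimization profile on the builder config. A rough TensorRT 7.x Python sketch of the same build follows; note that setting the INT8 flag without a calibrator is exactly what produces the "Calibrator is not being used" warning above:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_dynamic_engine(onnx_path):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None
    config = builder.create_builder_config()
    config.max_workspace_size = 4 << 30  # 4 GiB, matching --workspace=4096
    profile = builder.create_optimization_profile()
    profile.set_shape("input",           # must match the ONNX input name
                      (1, 3, 608, 608),  # min
                      (2, 3, 608, 608),  # opt
                      (8, 3, 608, 608))  # max
    config.add_optimization_profile(profile)
    config.set_flag(trt.BuilderFlag.INT8)  # without a calibrator, TRT warns as above
    return builder.build_engine(network, config)
```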

$ /usr/src/tensorrt/bin/trtexec --loadEngine=yolov4_-1_3_608_608_dynamic.onnx_int8.engine --int8
Result BS=1:

.
[03/09/2021-22:48:45] [E] [TRT] Parameter check failed at: engine.cpp::enqueue::445, condition: batchSize > 0 && batchSize <= mEngine.getMaxBatchSize(). Note: Batch size was: 2, but engine max batch size was: 1
[03/09/2021-22:48:45] [I] Warmup completed 312 queries over 200 ms
[03/09/2021-22:48:45] [I] Timing trace has 4704 queries over 3.0043 s
[03/09/2021-22:48:45] [I] Trace averages of 10 runs:
.
[03/09/2021-22:46:29] [I] Host Latency
[03/09/2021-22:46:29] [I] min: 6.81131 ms (end to end 11.6827 ms)
[03/09/2021-22:46:29] [I] max: 10.3354 ms (end to end 21.7613 ms)
[03/09/2021-22:46:29] [I] mean: 7.02095 ms (end to end 12.1098 ms)
[03/09/2021-22:46:29] [I] median: 7.00833 ms (end to end 12.0729 ms)
[03/09/2021-22:46:29] [I] percentile: 7.2074 ms at 99% (end to end 12.4701 ms at 99%)
[03/09/2021-22:46:29] [I] throughput: 163.949 qps
[03/09/2021-22:46:29] [I] walltime: 3.02533 s
[03/09/2021-22:46:29] [I] Enqueue Time
[03/09/2021-22:46:29] [I] min: 1.49683 ms
[03/09/2021-22:46:29] [I] max: 1.841 ms
[03/09/2021-22:46:29] [I] median: 1.52332 ms
[03/09/2021-22:46:29] [I] GPU Compute
[03/09/2021-22:46:29] [I] min: 5.86343 ms
[03/09/2021-22:46:29] [I] max: 9.38628 ms
[03/09/2021-22:46:29] [I] mean: 6.0721 ms
[03/09/2021-22:46:29] [I] median: 6.05927 ms
[03/09/2021-22:46:29] [I] percentile: 6.25732 ms at 99%
[03/09/2021-22:46:29] [I] total compute time: 3.01176 s

Result BS=2:
Error:
03/09/2021-22:48:45] [E] [TRT] Parameter check failed at: engine.cpp::enqueue::445, condition: batchSize > 0 && batchSize <= mEngine.getMaxBatchSize(). Note: Batch size was: 2, but engine max batch size was: 1

Step 5: Config the DS-Triton files as described in the sample NVIDIA-AI-IOT/yolov4_deepstream

Step 6: Run YOLOV4 INT8 mode with Dynamic shapes with DS-Triton
$ deepstream-app -c deepstream_app_config_yoloV4.txt
Error: "unable to autofill for 'yolov4_nvidia', either all model tensor configuration should specify their dims or none"

root@1101333383d9:/workspace/Deepstream_5.1_Triton/samples/configs/deepstream-app-trtis# deepstream-app -c source1_primary_yolov4.txt
I0309 23:25:10.628131 260 metrics.cc:219] Collecting metrics for GPU 0: Tesla T4
I0309 23:25:10.634856 260 metrics.cc:219] Collecting metrics for GPU 1: Tesla T4
I0309 23:25:10.641297 260 metrics.cc:219] Collecting metrics for GPU 2: Tesla T4
I0309 23:25:10.647843 260 metrics.cc:219] Collecting metrics for GPU 3: Tesla T4
I0309 23:25:10.706528 260 pinned_memory_manager.cc:199] Pinned memory pool is created at '0x7febf8000000' with size 268435456
I0309 23:25:10.710959 260 cuda_memory_manager.cc:99] CUDA memory pool is created on device 0 with size 67108864
I0309 23:25:10.710967 260 cuda_memory_manager.cc:99] CUDA memory pool is created on device 1 with size 67108864
I0309 23:25:10.710972 260 cuda_memory_manager.cc:99] CUDA memory pool is created on device 2 with size 67108864
I0309 23:25:10.710976 260 cuda_memory_manager.cc:99] CUDA memory pool is created on device 3 with size 67108864
I0309 23:25:10.991848 260 server.cc:141]
.
| Backend | Config | Path |
.
.

I0309 23:25:10.991880 260 server.cc:184]
.
| Model | Version | Status |
.
.

I0309 23:25:10.991971 260 tritonserver.cc:1620]
.
| Option                           | Value                                                                                                                            |
.
| server_id                        | triton                                                                                                                           |
| server_version                   | 2.5.0                                                                                                                            |
| server_extensions                | classification sequence model_repository schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tens |
|                                  | or_data statistics                                                                                                               |
| model_repository_path[0]         | /workspace/Deepstream_5.1_Triton/samples/trtis_model_repo                                                                        |
| model_control_mode               | MODE_EXPLICIT                                                                                                                    |
| strict_model_config              | 0                                                                                                                                |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                        |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                         |
| cuda_memory_pool_byte_size{1}    | 67108864                                                                                                                         |
| cuda_memory_pool_byte_size{2}    | 67108864                                                                                                                         |
| cuda_memory_pool_byte_size{3}    | 67108864                                                                                                                         |
| min_supported_compute_capability | 6.0                                                                                                                              |
| strict_readiness                 | 1                                                                                                                                |
| exit_timeout                     | 30                                                                                                                               |
.

E0309 23:25:22.300254 260 model_repository_manager.cc:1705] unable to autofill for 'yolov4_nvidia', either all model tensor configuration should specify their dims or none.
ERROR: infer_trtis_server.cpp:1044 Triton: failed to load model yolov4_nvidia, triton_err_str:Internal, err_msg:failed to load 'yolov4_nvidia', no version is available
ERROR: infer_trtis_backend.cpp:45 failed to load model: yolov4_nvidia, nvinfer error:NVDSINFER_TRTIS_ERROR
ERROR: infer_trtis_backend.cpp:184 failed to initialize backend while ensuring model:yolov4_nvidia ready, nvinfer error:NVDSINFER_TRTIS_ERROR
0:00:14.399726167   260 0x564fdec902f0 ERROR          nvinferserver gstnvinferserver.cpp:362:gst_nvinfer_server_logger:<primary_gie> nvinferserver[UID 1]: Error in createNNBackend() <infer_trtis_context.cpp:246> [UID = 1]: failed to initialize trtis backend for model:yolov4_nvidia, nvinfer error:NVDSINFER_TRTIS_ERROR
I0309 23:25:22.300489 260 server.cc:280] Waiting for in-flight requests to complete.
I0309 23:25:22.300497 260 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
0:00:14.399831360   260 0x564fdec902f0 ERROR          nvinferserver gstnvinferserver.cpp:362:gst_nvinfer_server_logger:<primary_gie> nvinferserver[UID 1]: Error in initialize() <infer_base_context.cpp:81> [UID = 1]: create nn-backend failed, check config file settings, nvinfer error:NVDSINFER_TRTIS_ERROR
0:00:14.399843072   260 0x564fdec902f0 WARN           nvinferserver gstnvinferserver_impl.cpp:439:start:<primary_gie> error: Failed to initialize InferTrtIsContext
0:00:14.399868241   260 0x564fdec902f0 WARN           nvinferserver gstnvinferserver_impl.cpp:439:start:<primary_gie> error: Config file path: /workspace/Deepstream_5.1_Triton/samples/configs/deepstream-app-trtis/config_infer_primary_yolov4.txt
0:00:14.400284532   260 0x564fdec902f0 WARN           nvinferserver gstnvinferserver.cpp:460:gst_nvinfer_server_start:<primary_gie> error: gstnvinferserver_impl start failed
** ERROR: <main:655>: Failed to set pipeline to PAUSED
Quitting
ERROR from primary_gie: Failed to initialize InferTrtIsContext
Debug info: gstnvinferserver_impl.cpp(439): start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInferServer:primary_gie:
Config file path: /workspace/Deepstream_5.1_Triton/samples/configs/deepstream-app-trtis/config_infer_primary_yolov4.txt
ERROR from primary_gie: gstnvinferserver_impl start failed
Debug info: gstnvinferserver.cpp(460): gst_nvinfer_server_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInferServer:primary_gie
App run failed
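
For what it's worth, the "unable to autofill" error means Triton could not derive a complete model configuration on its own; supplying an explicit config.pbtxt in the model repository usually sidesteps autofill. A sketch under assumed names and dims (the output tensor name and its dims are placeholders, not read from the engine):

```
name: "yolov4_nvidia"
platform: "tensorrt_plan"
max_batch_size: 8
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 608, 608 ]   # batch dim is excluded when max_batch_size > 0
  }
]
output [
  {
    name: "boxes"           # placeholder; must match the engine's output binding
    data_type: TYPE_FP32
    dims: [ 22743, 1, 4 ]   # placeholder dims
  }
]
```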

I think the problem is with trtexec. Is there a sample/tool that shows how to optimize a YOLO PyTorch-ONNX model to a TensorRT engine in INT8 mode with full INT8 calibration and dynamic input shapes?
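
For the "full INT8 calibration" part, the usual programmatic route is an IInt8EntropyCalibrator2 implementation fed with representative preprocessed batches (trtexec can then reuse the resulting cache via --calib). A minimal sketch, assuming pycuda and an iterable of (N, 3, 608, 608) float32 arrays; all names here are mine, not from any NVIDIA sample:

```python
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
import tensorrt as trt

class YoloEntropyCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, batches, batch_size=1, cache_file="yolov4_int8.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(batches)  # iterable of (N, 3, 608, 608) float32 arrays
        self.batch_size = batch_size
        self.cache_file = cache_file
        self.device_input = cuda.mem_alloc(batch_size * 3 * 608 * 608 * 4)

    def get_batch_size(self):
        return self.batch_size

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None  # tells TRT the calibration data is exhausted
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        try:
            with open(self.cache_file, "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```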

@vilmara changed the title from "Unable to autofill for 'yolov4_nvidia', either all model tensor configuration should specify their dims or none." to "Dynamic dimensions required for input: input, but no shapes were provided. Automatically overriding shape to: 1x3x608x608" Mar 10, 2021
@pranavm-nvidia
Collaborator

@vilmara It looks like the model input is called input, but you're using data:

--minShapes=\'data\':1x3x608x608 --optShapes=\'data\':2x3x608x608 --maxShapes=\'data\':8x3x608x608

@vilmara
Author

vilmara commented Mar 10, 2021

Hi @pranavm-nvidia, thanks for your prompt reply. You are right; I tried with the input name "input" and got the same result, with trtexec generating the engine with a static batch size (Dynamic dimensions required for input: input, but no shapes were provided. Automatically overriding shape to: 1x3x608x608). Please see below:

Input shape ONNX model:
[screenshot: the ONNX model's input has a dynamic batch_size dimension]

Step: Generating the engine with the input name 'input'
$ /usr/src/tensorrt/bin/trtexec --onnx=yolov4_-1_3_608_608_dynamic.onnx --explicitBatch --minShapes=\'input\':1x3x608x608 --optShapes=\'input\':2x3x608x608 --maxShapes=\'input\':8x3x608x608 --workspace=4096 --buildOnly --saveEngine=yolov4_-1_3_608_608_dynamic.onnx_int8_trtexec_3.engine --int8

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=yolov4_-1_3_608_608_dynamic.onnx --explicitBatch --minShapes='input':1x3x608x608 --optShapes='input':2x3x608x608 --maxShapes='input':8x3x608x608 --workspace=4096 --buildOnly --saveEngine=yolov4_-1_3_608_608_dynamic.onnx_int8_trtexec_3.engine --int8
[03/10/2021-00:43:28] [I] === Model Options ===
[03/10/2021-00:43:28] [I] Format: ONNX
[03/10/2021-00:43:28] [I] Model: yolov4_-1_3_608_608_dynamic.onnx
[03/10/2021-00:43:28] [I] Output:
[03/10/2021-00:43:28] [I] === Build Options ===
[03/10/2021-00:43:28] [I] Max batch: explicit
[03/10/2021-00:43:28] [I] Workspace: 4096 MB
[03/10/2021-00:43:28] [I] minTiming: 1
[03/10/2021-00:43:28] [I] avgTiming: 8
[03/10/2021-00:43:28] [I] Precision: FP32+INT8
[03/10/2021-00:43:28] [I] Calibration: Dynamic
[03/10/2021-00:43:28] [I] Safe mode: Disabled
[03/10/2021-00:43:28] [I] Save engine: yolov4_-1_3_608_608_dynamic.onnx_int8_trtexec_3.engine
[03/10/2021-00:43:28] [I] Load engine:
[03/10/2021-00:43:28] [I] Builder Cache: Enabled
[03/10/2021-00:43:28] [I] NVTX verbosity: 0
[03/10/2021-00:43:28] [I] Inputs format: fp32:CHW
[03/10/2021-00:43:28] [I] Outputs format: fp32:CHW
[03/10/2021-00:43:28] [I] Input build shape: input=1x3x608x608+2x3x608x608+8x3x608x608
[03/10/2021-00:43:28] [I] Input calibration shapes: model
[03/10/2021-00:43:28] [I] === System Options ===
[03/10/2021-00:43:28] [I] Device: 0
[03/10/2021-00:43:28] [I] DLACore:
[03/10/2021-00:43:28] [I] Plugins:
[03/10/2021-00:43:28] [I] === Inference Options ===
[03/10/2021-00:43:28] [I] Batch: Explicit
[03/10/2021-00:43:28] [I] Input inference shape: input=2x3x608x608
[03/10/2021-00:43:28] [I] Iterations: 10
[03/10/2021-00:43:28] [I] Duration: 3s (+ 200ms warm up)
[03/10/2021-00:43:28] [I] Sleep time: 0ms
[03/10/2021-00:43:28] [I] Streams: 1
[03/10/2021-00:43:28] [I] ExposeDMA: Disabled
[03/10/2021-00:43:28] [I] Spin-wait: Disabled
[03/10/2021-00:43:28] [I] Multithreading: Disabled
[03/10/2021-00:43:28] [I] CUDA Graph: Disabled
[03/10/2021-00:43:28] [I] Skip inference: Enabled
[03/10/2021-00:43:28] [I] Inputs:
[03/10/2021-00:43:28] [I] === Reporting Options ===
[03/10/2021-00:43:28] [I] Verbose: Disabled
[03/10/2021-00:43:28] [I] Averages: 10 inferences
[03/10/2021-00:43:28] [I] Percentile: 99
[03/10/2021-00:43:28] [I] Dump output: Disabled
[03/10/2021-00:43:28] [I] Profile: Disabled
[03/10/2021-00:43:28] [I] Export timing to JSON file:
[03/10/2021-00:43:28] [I] Export output to JSON file:
[03/10/2021-00:43:28] [I] Export profile to JSON file:
[03/10/2021-00:43:28] [I]
----------------------------------------------------------------
Input filename:   yolov4_-1_3_608_608_dynamic.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.8
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
[03/10/2021-00:43:39] [W] [TRT] /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/10/2021-00:43:39] [W] [TRT] /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[03/10/2021-00:43:39] [W] [TRT] /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[03/10/2021-00:43:39] [W] [TRT] /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[03/10/2021-00:43:39] [W] [TRT] /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[03/10/2021-00:43:39] [W] [TRT] /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[03/10/2021-00:43:39] [W] [TRT] /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[03/10/2021-00:43:39] [W] [TRT] Output type must be INT32 for shape outputs
[03/10/2021-00:43:39] [W] [TRT] Output type must be INT32 for shape outputs
[03/10/2021-00:43:39] [W] [TRT] Output type must be INT32 for shape outputs
[03/10/2021-00:43:39] [W] [TRT] Output type must be INT32 for shape outputs
[03/10/2021-00:43:39] [I] FP32 and INT8 precisions have been specified - more performance might be enabled by additionally specifying --fp16 or --best
[03/10/2021-00:43:39] [W] [TRT] Calibrator is not being used. Users must provide dynamic range for all tensors that are not Int32.
[03/10/2021-00:48:29] [I] [TRT] Detected 1 inputs and 8 output network tensors.
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=yolov4_-1_3_608_608_dynamic.onnx --explicitBatch --minShapes='input':1x3x608x608 --optShapes='input':2x3x608x608 --maxShapes='input':8x3x608x608 --workspace=4096 --buildOnly --saveEngine=yolov4_-1_3_608_608_dynamic.onnx_int8_trtexec_3.engine --int8

Running the model | BS=2
$ /usr/src/tensorrt/bin/trtexec --loadEngine=yolov4_-1_3_608_608_dynamic.onnx_int8_trtexec_3.engine --int8 --batch=2

Note: I got batch dimension errors again; the engine appears to have been converted to a static batch size of 1x3x608x608 instead of keeping the dynamic shape, and the reported throughput is wrong

[03/10/2021-00:52:09] [I] === Model Options ===
[03/10/2021-00:52:09] [I] Format: *
[03/10/2021-00:52:09] [I] Model:
[03/10/2021-00:52:09] [I] Output:
[03/10/2021-00:52:09] [I] === Build Options ===
[03/10/2021-00:52:09] [I] Max batch: 2
[03/10/2021-00:52:09] [I] Workspace: 16 MB
[03/10/2021-00:52:09] [I] minTiming: 1
[03/10/2021-00:52:09] [I] avgTiming: 8
[03/10/2021-00:52:09] [I] Precision: FP32+INT8
[03/10/2021-00:52:09] [I] Calibration: Dynamic
[03/10/2021-00:52:09] [I] Safe mode: Disabled
[03/10/2021-00:52:09] [I] Save engine:
[03/10/2021-00:52:09] [I] Load engine: yolov4_-1_3_608_608_dynamic.onnx_int8_trtexec_3.engine
[03/10/2021-00:52:09] [I] Builder Cache: Enabled
[03/10/2021-00:52:09] [I] NVTX verbosity: 0
[03/10/2021-00:52:09] [I] Inputs format: fp32:CHW
[03/10/2021-00:52:09] [I] Outputs format: fp32:CHW
[03/10/2021-00:52:09] [I] Input build shapes: model
[03/10/2021-00:52:09] [I] Input calibration shapes: model
[03/10/2021-00:52:09] [I] === System Options ===
[03/10/2021-00:52:09] [I] Device: 0
[03/10/2021-00:52:09] [I] DLACore:
[03/10/2021-00:52:09] [I] Plugins:
[03/10/2021-00:52:09] [I] === Inference Options ===
[03/10/2021-00:52:09] [I] Batch: 2
[03/10/2021-00:52:09] [I] Input inference shapes: model
[03/10/2021-00:52:09] [I] Iterations: 10
[03/10/2021-00:52:09] [I] Duration: 3s (+ 200ms warm up)
[03/10/2021-00:52:09] [I] Sleep time: 0ms
[03/10/2021-00:52:09] [I] Streams: 1
[03/10/2021-00:52:09] [I] ExposeDMA: Disabled
[03/10/2021-00:52:09] [I] Spin-wait: Disabled
[03/10/2021-00:52:09] [I] Multithreading: Disabled
[03/10/2021-00:52:09] [I] CUDA Graph: Disabled
[03/10/2021-00:52:09] [I] Skip inference: Disabled
[03/10/2021-00:52:09] [I] Inputs:
[03/10/2021-00:52:09] [I] === Reporting Options ===
[03/10/2021-00:52:09] [I] Verbose: Disabled
[03/10/2021-00:52:09] [I] Averages: 10 inferences
[03/10/2021-00:52:09] [I] Percentile: 99
[03/10/2021-00:52:09] [I] Dump output: Disabled
[03/10/2021-00:52:09] [I] Profile: Disabled
[03/10/2021-00:52:09] [I] Export timing to JSON file:
[03/10/2021-00:52:09] [I] Export output to JSON file:
[03/10/2021-00:52:09] [I] Export profile to JSON file:
[03/10/2021-00:52:09] [I]
[03/10/2021-00:52:20] [W] Dynamic dimensions required for input: input, but no shapes were provided. Automatically overriding shape to: 1x3x608x608
[03/10/2021-00:52:21] [I] Starting inference threads
[03/10/2021-00:52:21] [E] [TRT] Parameter check failed at: engine.cpp::enqueue::445, condition: batchSize > 0 && batchSize <= mEngine.getMaxBatchSize(). Note: Batch size was: 2, but engine max batch size was: 1
[03/10/2021-00:52:21] [E] [TRT] Parameter check failed at: engine.cpp::enqueue::445, condition: batchSize > 0 && batchSize <= mEngine.getMaxBatchSize(). Note: Batch size was: 2, but engine max batch size was: 1
[03/10/2021-00:52:21] [E] [TRT] Parameter check failed at: engine.cpp::enqueue::445, condition: batchSize > 0 && batchSize <= mEngine.getMaxBatchSize(). Note: Batch size was: 2, but engine max batch size was: 1
.
[03/10/2021-00:52:24] [I] min: 1.95605 ms (end to end 2.39429 ms)
[03/10/2021-00:52:24] [I] max: 2.0972 ms (end to end 2.50848 ms)
[03/10/2021-00:52:24] [I] mean: 2.06553 ms (end to end 2.50324 ms)
[03/10/2021-00:52:24] [I] median: 2.06543 ms (end to end 2.50348 ms)
[03/10/2021-00:52:24] [I] percentile: 2.06836 ms at 99% (end to end 2.50635 ms at 99%)
**[03/10/2021-00:52:24] [I] throughput: 1565.76 qps**
[03/10/2021-00:52:24] [I] walltime: 3.0043 s
[03/10/2021-00:52:24] [I] Enqueue Time
[03/10/2021-00:52:24] [I] min: 0.0118408 ms
[03/10/2021-00:52:24] [I] max: 0.0271606 ms
[03/10/2021-00:52:24] [I] median: 0.0124512 ms
[03/10/2021-00:52:24] [I] GPU Compute
[03/10/2021-00:52:24] [I] min: 0.00195312 ms
[03/10/2021-00:52:24] [I] max: 0.00463867 ms
[03/10/2021-00:52:24] [I] mean: 0.00305392 ms
[03/10/2021-00:52:24] [I] median: 0.00292969 ms
[03/10/2021-00:52:24] [I] percentile: 0.00390625 ms at 99%
[03/10/2021-00:52:24] [I] total compute time: 0.00718283 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=yolov4_-1_3_608_608_dynamic.onnx_int8_trtexec_3.engine --int8 --batch=2
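
(Note: the microsecond-scale GPU Compute times above, around 0.003 ms, together with the repeated enqueue errors suggest that no inference actually executed in this run; the 1565.76 qps figure is counting failed enqueues, not real work.)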

@pranavm-nvidia
Collaborator

@vilmara Can you try using --shapes to set the inference shapes in your second command? --batch is meant for implicit batch networks and is deprecated.
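
For reference, --shapes at load time corresponds to setting the binding shape on the execution context before enqueuing; roughly, in the TensorRT 7.x Python API (sketch only; engine deserialization and buffer allocation omitted):

```python
import tensorrt as trt

def make_context(engine, batch=2):
    # Explicit-batch engines ignore the legacy --batch flag; the runtime
    # shape is chosen per context via set_binding_shape instead.
    context = engine.create_execution_context()
    context.set_binding_shape(0, (batch, 3, 608, 608))  # binding 0 is "input"
    assert context.all_binding_shapes_specified
    return context  # then bind buffers and call context.execute_v2(bindings)
```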

@vilmara
Author

vilmara commented Mar 10, 2021

@pranavm-nvidia, please see the results below, with batch size 1 and batch size 2

With Max batch: 1
$ /usr/src/tensorrt/bin/trtexec --loadEngine=yolov4_-1_3_608_608_dynamic.onnx_int8_trtexec_3.engine --int8

Throughput: 154.158 qps

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=yolov4_-1_3_608_608_dynamic.onnx_int8_trtexec_3.engine --int8
[03/10/2021-01:20:54] [I] === Model Options ===
[03/10/2021-01:20:54] [I] Format: *
[03/10/2021-01:20:54] [I] Model:
[03/10/2021-01:20:54] [I] Output:
[03/10/2021-01:20:54] [I] === Build Options ===
[03/10/2021-01:20:54] [I] Max batch: 1
[03/10/2021-01:20:54] [I] Workspace: 16 MB
[03/10/2021-01:20:54] [I] minTiming: 1
[03/10/2021-01:20:54] [I] avgTiming: 8
[03/10/2021-01:20:54] [I] Precision: FP32+INT8
[03/10/2021-01:20:54] [I] Calibration: Dynamic
[03/10/2021-01:20:54] [I] Safe mode: Disabled
[03/10/2021-01:20:54] [I] Save engine:
[03/10/2021-01:20:54] [I] Load engine: yolov4_-1_3_608_608_dynamic.onnx_int8_trtexec_3.engine
[03/10/2021-01:20:54] [I] Builder Cache: Enabled
[03/10/2021-01:20:54] [I] NVTX verbosity: 0
[03/10/2021-01:20:54] [I] Inputs format: fp32:CHW
[03/10/2021-01:20:54] [I] Outputs format: fp32:CHW
[03/10/2021-01:20:54] [I] Input build shapes: model
[03/10/2021-01:20:54] [I] Input calibration shapes: model
[03/10/2021-01:20:54] [I] === System Options ===
[03/10/2021-01:20:54] [I] Device: 0
[03/10/2021-01:20:54] [I] DLACore:
[03/10/2021-01:20:54] [I] Plugins:
[03/10/2021-01:20:54] [I] === Inference Options ===
[03/10/2021-01:20:54] [I] Batch: 1
[03/10/2021-01:20:54] [I] Input inference shapes: model
[03/10/2021-01:20:54] [I] Iterations: 10
[03/10/2021-01:20:54] [I] Duration: 3s (+ 200ms warm up)
[03/10/2021-01:20:54] [I] Sleep time: 0ms
[03/10/2021-01:20:54] [I] Streams: 1
[03/10/2021-01:20:54] [I] ExposeDMA: Disabled
[03/10/2021-01:20:54] [I] Spin-wait: Disabled
[03/10/2021-01:20:54] [I] Multithreading: Disabled
[03/10/2021-01:20:54] [I] CUDA Graph: Disabled
[03/10/2021-01:20:54] [I] Skip inference: Disabled
[03/10/2021-01:20:54] [I] Inputs:
[03/10/2021-01:20:54] [I] === Reporting Options ===
[03/10/2021-01:20:54] [I] Verbose: Disabled
[03/10/2021-01:20:54] [I] Averages: 10 inferences
[03/10/2021-01:20:54] [I] Percentile: 99
[03/10/2021-01:20:54] [I] Dump output: Disabled
[03/10/2021-01:20:54] [I] Profile: Disabled
[03/10/2021-01:20:54] [I] Export timing to JSON file:
[03/10/2021-01:20:54] [I] Export output to JSON file:
[03/10/2021-01:20:54] [I] Export profile to JSON file:
[03/10/2021-01:20:54] [I]
[03/10/2021-01:21:05] [W] Dynamic dimensions required for input: input, but no shapes were provided. Automatically overriding shape to: 1x3x608x608
[03/10/2021-01:21:06] [I] Starting inference threads
[03/10/2021-01:21:09] [I] Warmup completed 15 queries over 200 ms
[03/10/2021-01:21:09] [I] Timing trace has 466 queries over 3.02287 s
[03/10/2021-01:21:09] [I] Trace averages of 10 runs:
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 10.3064 ms - Host latency: 11.2575 ms (end to end 21.3018 ms, enqueue 1.58119 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.2913 ms - Host latency: 7.24501 ms (end to end 12.5531 ms, enqueue 1.541 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.31232 ms - Host latency: 7.26759 ms (end to end 12.5699 ms, enqueue 1.54997 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.34031 ms - Host latency: 7.29834 ms (end to end 12.6406 ms, enqueue 1.54017 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.33834 ms - Host latency: 7.29172 ms (end to end 12.6288 ms, enqueue 1.50324 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.33023 ms - Host latency: 7.28496 ms (end to end 12.6171 ms, enqueue 1.49459 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.35021 ms - Host latency: 7.30355 ms (end to end 12.6524 ms, enqueue 1.56425 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.33003 ms - Host latency: 7.28468 ms (end to end 12.6167 ms, enqueue 1.49919 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.3475 ms - Host latency: 7.30049 ms (end to end 12.6528 ms, enqueue 1.53017 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.31423 ms - Host latency: 7.26859 ms (end to end 12.58 ms, enqueue 1.5191 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.32168 ms - Host latency: 7.27675 ms (end to end 12.6064 ms, enqueue 1.52623 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.33661 ms - Host latency: 7.29273 ms (end to end 12.6216 ms, enqueue 1.50298 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.335 ms - Host latency: 7.28808 ms (end to end 12.635 ms, enqueue 1.5046 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.37668 ms - Host latency: 7.32964 ms (end to end 12.7022 ms, enqueue 1.5267 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.46835 ms - Host latency: 7.42238 ms (end to end 12.8584 ms, enqueue 1.57943 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.59392 ms - Host latency: 7.54982 ms (end to end 13.1529 ms, enqueue 1.54055 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.39971 ms - Host latency: 7.35326 ms (end to end 12.7646 ms, enqueue 1.56095 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.37819 ms - Host latency: 7.33138 ms (end to end 12.7025 ms, enqueue 1.56886 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.34188 ms - Host latency: 7.29685 ms (end to end 12.6504 ms, enqueue 1.52433 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.3759 ms - Host latency: 7.33035 ms (end to end 12.6989 ms, enqueue 1.52706 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.3494 ms - Host latency: 7.30215 ms (end to end 12.6496 ms, enqueue 1.54875 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.40233 ms - Host latency: 7.35695 ms (end to end 12.7567 ms, enqueue 1.53451 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.3105 ms - Host latency: 7.26461 ms (end to end 12.5787 ms, enqueue 1.5144 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.34429 ms - Host latency: 7.30092 ms (end to end 12.6401 ms, enqueue 1.51837 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.37424 ms - Host latency: 7.32961 ms (end to end 12.6914 ms, enqueue 1.56772 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.35863 ms - Host latency: 7.31173 ms (end to end 12.6829 ms, enqueue 1.52521 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.35978 ms - Host latency: 7.31337 ms (end to end 12.6607 ms, enqueue 1.52924 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.32866 ms - Host latency: 7.28274 ms (end to end 12.6234 ms, enqueue 1.53496 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.34381 ms - Host latency: 7.29774 ms (end to end 12.6369 ms, enqueue 1.52136 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.34426 ms - Host latency: 7.29944 ms (end to end 12.6243 ms, enqueue 1.4979 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.59482 ms - Host latency: 7.54722 ms (end to end 13.1421 ms, enqueue 1.54148 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.42817 ms - Host latency: 7.38113 ms (end to end 12.8258 ms, enqueue 1.55986 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.37895 ms - Host latency: 7.33132 ms (end to end 12.7084 ms, enqueue 1.52351 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.38203 ms - Host latency: 7.3373 ms (end to end 12.7233 ms, enqueue 1.54202 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.36753 ms - Host latency: 7.32026 ms (end to end 12.6802 ms, enqueue 1.54561 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.35054 ms - Host latency: 7.30452 ms (end to end 12.6658 ms, enqueue 1.54045 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.3325 ms - Host latency: 7.2875 ms (end to end 12.6056 ms, enqueue 1.49817 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.33772 ms - Host latency: 7.29207 ms (end to end 12.6232 ms, enqueue 1.50571 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.38477 ms - Host latency: 7.33865 ms (end to end 12.7271 ms, enqueue 1.54875 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.3635 ms - Host latency: 7.31834 ms (end to end 12.6756 ms, enqueue 1.503 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.36074 ms - Host latency: 7.31372 ms (end to end 12.6804 ms, enqueue 1.58777 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.35349 ms - Host latency: 7.30757 ms (end to end 12.6509 ms, enqueue 1.52463 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.3543 ms - Host latency: 7.30801 ms (end to end 12.6764 ms, enqueue 1.52825 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.3792 ms - Host latency: 7.33228 ms (end to end 12.7003 ms, enqueue 1.54377 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.35442 ms - Host latency: 7.30952 ms (end to end 12.6701 ms, enqueue 1.52278 ms)
[03/10/2021-01:21:09] [I] Average on 10 runs - GPU latency: 6.4991 ms - Host latency: 7.45127 ms (end to end 12.935 ms, enqueue 1.54097 ms)
[03/10/2021-01:21:09] [I] Host Latency
[03/10/2021-01:21:09] [I] min: 7.22192 ms (end to end 12.4314 ms)
[03/10/2021-01:21:09] [I] max: 14.8126 ms (end to end 27.6749 ms)
[03/10/2021-01:21:09] [I] mean: 7.40889 ms (end to end 12.879 ms)
[03/10/2021-01:21:09] [I] median: 7.30386 ms (end to end 12.6557 ms)
[03/10/2021-01:21:09] [I] percentile: 14.8004 ms at 99% (end to end 27.6423 ms at 99%)
[03/10/2021-01:21:09] [I] throughput: 154.158 qps
[03/10/2021-01:21:09] [I] walltime: 3.02287 s
[03/10/2021-01:21:09] [I] Enqueue Time
[03/10/2021-01:21:09] [I] min: 1.46539 ms
[03/10/2021-01:21:09] [I] max: 1.80786 ms
[03/10/2021-01:21:09] [I] median: 1.48773 ms
[03/10/2021-01:21:09] [I] GPU Compute
[03/10/2021-01:21:09] [I] min: 6.26685 ms
[03/10/2021-01:21:09] [I] max: 13.8603 ms
[03/10/2021-01:21:09] [I] mean: 6.45483 ms
[03/10/2021-01:21:09] [I] median: 6.34949 ms
[03/10/2021-01:21:09] [I] percentile: 13.8505 ms at 99%
[03/10/2021-01:21:09] [I] total compute time: 3.00795 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=yolov4_-1_3_608_608_dynamic.onnx_int8_trtexec_3.engine --int8

With BS=2 | --shapes='input':2x3x608x608

$ /usr/src/tensorrt/bin/trtexec --loadEngine=yolov4_-1_3_608_608_dynamic.onnx_int8_trtexec_3.engine --int8 --shapes=\'input\':2x3x608x608

Throughput: 0 qps

&&&& RUNNING TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=yolov4_-1_3_608_608_dynamic.onnx_int8_trtexec_3.engine --int8 --shapes='input':2x3x608x608
[03/10/2021-01:25:58] [I] === Model Options ===
[03/10/2021-01:25:58] [I] Format: *
[03/10/2021-01:25:58] [I] Model:
[03/10/2021-01:25:58] [I] Output:
[03/10/2021-01:25:58] [I] === Build Options ===
[03/10/2021-01:25:58] [I] Max batch: explicit
[03/10/2021-01:25:58] [I] Workspace: 16 MB
[03/10/2021-01:25:58] [I] minTiming: 1
[03/10/2021-01:25:58] [I] avgTiming: 8
[03/10/2021-01:25:58] [I] Precision: FP32+INT8
[03/10/2021-01:25:58] [I] Calibration: Dynamic
[03/10/2021-01:25:58] [I] Safe mode: Disabled
[03/10/2021-01:25:58] [I] Save engine:
[03/10/2021-01:25:58] [I] Load engine: yolov4_-1_3_608_608_dynamic.onnx_int8_trtexec_3.engine
[03/10/2021-01:25:58] [I] Builder Cache: Enabled
[03/10/2021-01:25:58] [I] NVTX verbosity: 0
[03/10/2021-01:25:58] [I] Inputs format: fp32:CHW
[03/10/2021-01:25:58] [I] Outputs format: fp32:CHW
[03/10/2021-01:25:58] [I] Input build shape: input=2x3x608x608+2x3x608x608+2x3x608x608
[03/10/2021-01:25:58] [I] Input calibration shapes: model
[03/10/2021-01:25:58] [I] === System Options ===
[03/10/2021-01:25:58] [I] Device: 0
[03/10/2021-01:25:58] [I] DLACore:
[03/10/2021-01:25:58] [I] Plugins:
[03/10/2021-01:25:58] [I] === Inference Options ===
[03/10/2021-01:25:58] [I] Batch: Explicit
[03/10/2021-01:25:58] [I] Input inference shape: input=2x3x608x608
[03/10/2021-01:25:58] [I] Iterations: 10
[03/10/2021-01:25:58] [I] Duration: 3s (+ 200ms warm up)
[03/10/2021-01:25:58] [I] Sleep time: 0ms
[03/10/2021-01:25:58] [I] Streams: 1
[03/10/2021-01:25:58] [I] ExposeDMA: Disabled
[03/10/2021-01:25:58] [I] Spin-wait: Disabled
[03/10/2021-01:25:58] [I] Multithreading: Disabled
[03/10/2021-01:25:58] [I] CUDA Graph: Disabled
[03/10/2021-01:25:58] [I] Skip inference: Disabled
[03/10/2021-01:25:58] [I] Inputs:
[03/10/2021-01:25:58] [I] === Reporting Options ===
[03/10/2021-01:25:58] [I] Verbose: Disabled
[03/10/2021-01:25:58] [I] Averages: 10 inferences
[03/10/2021-01:25:58] [I] Percentile: 99
[03/10/2021-01:25:58] [I] Dump output: Disabled
[03/10/2021-01:25:58] [I] Profile: Disabled
[03/10/2021-01:25:58] [I] Export timing to JSON file:
[03/10/2021-01:25:58] [I] Export output to JSON file:
[03/10/2021-01:25:58] [I] Export profile to JSON file:
[03/10/2021-01:25:58] [I]
[03/10/2021-01:26:10] [I] Starting inference threads
[03/10/2021-01:26:13] [I] Warmup completed 0 queries over 200 ms
[03/10/2021-01:26:13] [I] Timing trace has 0 queries over 3.04473 s
[03/10/2021-01:26:13] [I] Trace averages of 10 runs:
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 12.1201 ms - Host latency: 14.0069 ms (end to end 25.1735 ms, enqueue 1.49378 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.137 ms - Host latency: 12.0237 ms (end to end 20.2425 ms, enqueue 1.57295 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.0741 ms - Host latency: 11.9609 ms (end to end 20.111 ms, enqueue 1.5246 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.0996 ms - Host latency: 11.986 ms (end to end 20.1477 ms, enqueue 1.57426 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.0957 ms - Host latency: 11.9826 ms (end to end 20.1541 ms, enqueue 1.52883 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.0296 ms - Host latency: 11.9163 ms (end to end 20.0398 ms, enqueue 1.61193 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.1328 ms - Host latency: 12.0197 ms (end to end 20.1968 ms, enqueue 1.52691 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.0667 ms - Host latency: 11.9539 ms (end to end 20.0923 ms, enqueue 1.57367 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.316 ms - Host latency: 12.2041 ms (end to end 20.5112 ms, enqueue 1.47996 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.9389 ms - Host latency: 12.8307 ms (end to end 21.843 ms, enqueue 1.5741 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.4337 ms - Host latency: 12.3211 ms (end to end 20.8643 ms, enqueue 1.53197 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.4604 ms - Host latency: 12.3465 ms (end to end 20.8491 ms, enqueue 1.57727 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.3613 ms - Host latency: 12.2481 ms (end to end 20.7303 ms, enqueue 1.53014 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.0905 ms - Host latency: 11.9771 ms (end to end 20.1365 ms, enqueue 1.57793 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.0721 ms - Host latency: 11.9593 ms (end to end 20.1225 ms, enqueue 1.52898 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.0807 ms - Host latency: 11.9682 ms (end to end 20.098 ms, enqueue 1.53143 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.0834 ms - Host latency: 11.9693 ms (end to end 20.126 ms, enqueue 1.6184 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.1005 ms - Host latency: 11.9877 ms (end to end 20.1294 ms, enqueue 1.48313 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.5925 ms - Host latency: 12.4824 ms (end to end 21.1131 ms, enqueue 1.52883 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.7158 ms - Host latency: 12.6058 ms (end to end 21.4158 ms, enqueue 1.48391 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.3664 ms - Host latency: 12.2531 ms (end to end 20.7108 ms, enqueue 1.53105 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.6887 ms - Host latency: 12.5782 ms (end to end 21.3066 ms, enqueue 1.53381 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.2405 ms - Host latency: 12.1284 ms (end to end 20.4625 ms, enqueue 1.56887 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.3204 ms - Host latency: 12.208 ms (end to end 20.577 ms, enqueue 1.52947 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.2967 ms - Host latency: 12.1829 ms (end to end 20.5491 ms, enqueue 1.52668 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.4952 ms - Host latency: 12.3848 ms (end to end 20.9506 ms, enqueue 1.57275 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.1634 ms - Host latency: 12.0503 ms (end to end 20.3115 ms, enqueue 1.57158 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.2122 ms - Host latency: 12.1003 ms (end to end 20.3671 ms, enqueue 1.5252 ms)
[03/10/2021-01:26:13] [I] Average on 10 runs - GPU latency: 10.4627 ms - Host latency: 12.3501 ms (end to end 20.8417 ms, enqueue 1.52703 ms)
[03/10/2021-01:26:13] [I] Host Latency
[03/10/2021-01:26:13] [I] min: 11.7104 ms (end to end 19.7689 ms)
[03/10/2021-01:26:13] [I] max: 21.8932 ms (end to end 39.9739 ms)
[03/10/2021-01:26:13] [I] mean: 12.2437 ms (end to end 20.7003 ms)
[03/10/2021-01:26:13] [I] median: 12.1147 ms (end to end 20.4098 ms)
[03/10/2021-01:26:13] [I] percentile: 13.0126 ms at 99% (end to end 29.7882 ms at 99%)
[03/10/2021-01:26:13] [I] throughput: 0 qps
[03/10/2021-01:26:13] [I] walltime: 3.04473 s
[03/10/2021-01:26:13] [I] Enqueue Time
[03/10/2021-01:26:13] [I] min: 1.4682 ms
[03/10/2021-01:26:13] [I] max: 1.94559 ms
[03/10/2021-01:26:13] [I] median: 1.48224 ms
[03/10/2021-01:26:13] [I] GPU Compute
[03/10/2021-01:26:13] [I] min: 9.82251 ms
[03/10/2021-01:26:13] [I] max: 20.0069 ms
[03/10/2021-01:26:13] [I] mean: 10.3562 ms
[03/10/2021-01:26:13] [I] median: 10.2275 ms
[03/10/2021-01:26:13] [I] percentile: 11.1216 ms at 99%
[03/10/2021-01:26:13] [I] total compute time: 3.02401 s
&&&& PASSED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --loadEngine=yolov4_-1_3_608_608_dynamic.onnx_int8_trtexec_3.engine --int8 --shapes='input':2x3x608x608

@pranavm-nvidia
Collaborator

Looks right - with batch size 1, the latency is 6.45ms and with batch size 2 it's 10.35ms. Those numbers seem reasonable to me.
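
(Worked out: batch 2 moves 2 images in about 10.35 ms, roughly 193 images/s, while batch 1 moves 1 image in about 6.45 ms, roughly 155 images/s, so batching does improve effective throughput even though per-query latency grows.)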

@vilmara
Author

vilmara commented Mar 10, 2021

@pranavm-nvidia, and the throughput? With shapes > 1 it shows 0 qps (I guess qps means FPS here)

@pranavm-nvidia
Collaborator

@vilmara Yeah, I hadn't noticed that before. That looks like a bug in trtexec. I think the generated engine should be fine, though.

@vilmara
Author

vilmara commented Mar 10, 2021

@pranavm-nvidia, it seems the model generated with trtexec has issues when deployed on DS-Triton. Is there another sample/tool that shows how to optimize a YOLO PyTorch-ONNX model to a TensorRT engine in INT8 mode with full INT8 calibration and dynamic input shapes? I have reported the DS-Triton issue here: triton-inference-server/server#2606
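
If it helps while waiting for a dedicated sample: on the API side, a calibrator like the sketch earlier in this thread would be wired into a dynamic-shape build roughly as below (TensorRT 7.x; the static calibration profile is an assumption about how this release expects calibration shapes, so treat it as a sketch):

```python
import tensorrt as trt

def configure_int8(builder, config, calibrator):
    # Attach the calibrator and pin calibration to one static shape,
    # since the network input itself is dynamic.
    config.set_flag(trt.BuilderFlag.INT8)
    config.int8_calibrator = calibrator
    calib_profile = builder.create_optimization_profile()
    calib_profile.set_shape("input",
                            (1, 3, 608, 608),   # min
                            (1, 3, 608, 608),   # opt
                            (1, 3, 608, 608))   # max
    config.set_calibration_profile(calib_profile)
```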

@pranavm-nvidia
Collaborator

You could look at https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleDynamicReshape
@ttyio added the Release: 7.x, Topic: Dynamic Shape, and triaged (Issue has been triaged by maintainers) labels Mar 11, 2021
@vilmara
Author

vilmara commented Mar 16, 2021

You could look at https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/sampleDynamicReshape

Hi @pranavm-nvidia, I will take a look at it later on.

Regarding the === Build and Inference Batch Options === in trtexec: what options should I use to build the engine with dynamic input shapes so that it can later be deployed on DS with BS > 1? Right now I am getting the error "TensorRT engine only supports max-batch 1" with DS

[screenshot: trtexec help output for === Build and Inference Batch Options ===]

See below:
Build the engine with dynamic shapes:
$ /usr/src/tensorrt/bin/trtexec --onnx=yolov4_-1_3_608_608_dynamic.onnx --explicitBatch --minShapes=\'input\':1x3x608x608 --optShapes=\'input\':4x3x608x608 --maxShapes=\'input\':8x3x608x608 --workspace=4096 --saveEngine=yolov4_-1_3_608_608_dynamic_int8_.engine --int8

Run the inference with trtexec and default batch size
$ /usr/src/tensorrt/bin/trtexec --loadEngine=yolov4_-1_3_608_608_dynamic_onnx_int8_trtexec_4.engine --int8
Result:

[03/16/2021-00:23:14] [I] Host Latency
[03/16/2021-00:23:14] [I] min: 7.01904 ms (end to end 12.0889 ms)
[03/16/2021-00:23:14] [I] max: 7.89343 ms (end to end 13.8339 ms)
[03/16/2021-00:23:14] [I] mean: 7.15021 ms (end to end 12.3533 ms)
[03/16/2021-00:23:14] [I] median: 7.09982 ms (end to end 12.2517 ms)
[03/16/2021-00:23:14] [I] percentile: 7.88986 ms at 99% (end to end 13.818 ms at 99%)
[03/16/2021-00:23:14] [I] throughput: 160.912 qps
[03/16/2021-00:23:14] [I] walltime: 3.02029 s
[03/16/2021-00:23:14] [I] Enqueue Time
[03/16/2021-00:23:14] [I] min: 1.4646 ms
[03/16/2021-00:23:14] [I] max: 1.79004 ms
[03/16/2021-00:23:14] [I] median: 1.48828 ms
[03/16/2021-00:23:14] [I] GPU Compute
[03/16/2021-00:23:14] [I] min: 6.0675 ms
[03/16/2021-00:23:14] [I] max: 6.93729 ms
[03/16/2021-00:23:14] [I] mean: 6.1978 ms
[03/16/2021-00:23:14] [I] median: 6.14783 ms
[03/16/2021-00:23:14] [I] percentile: 6.9351 ms at 99%
[03/16/2021-00:23:14] [I] total compute time: 3.01213 s

Print the engine's input and output shapes:

input shape :  (-1, 3, 608, 608)
out shape :  (-1, 22743, 1, 4)
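
A printout like this can be reproduced from the serialized engine with a few lines of the TensorRT Python API (sketch; the -1 entries are the dynamic batch dimension):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
with open("yolov4_-1_3_608_608_dynamic_onnx_int8_trtexec_4.engine", "rb") as f:
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())
for i in range(engine.num_bindings):
    kind = "input" if engine.binding_is_input(i) else "out"
    print(kind, "shape : ", tuple(engine.get_binding_shape(i)))
```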

Deploy the engine with DS

Run inference on DS with max_batch_size=1
$ deepstream-app -c source1_primary_yolov4.txt

I0316 01:15:35.232182 159 model_repository_manager.cc:810] loading: yolov4_nvidia:1
I0316 01:15:46.895954 159 plan_backend.cc:333] Creating instance yolov4_nvidia_0_0_gpu0 on GPU 0 (7.5) using yolov4_-1_3_608_608_dynamic_onnx_int8_trtexec_4.engine
I0316 01:15:47.333165 159 plan_backend.cc:666] Created instance yolov4_nvidia_0_0_gpu0 on GPU 0 with stream priority 0 and optimization profile default[0];
I0316 01:15:47.334265 159 model_repository_manager.cc:983] successfully loaded 'yolov4_nvidia' version 1
INFO: infer_trtis_backend.cpp:206 TrtISBackend id:1 initialized model: yolov4_nvidia

Runtime commands:
        h: Print this help
        q: Quit

        p: Pause
        r: Resume

NOTE: To expand a source in the 2D tiled display and view object details, left-click on the source.
      To go back to the tiled display, right-click anywhere on the window.


**PERF:  FPS 0 (Avg)
**PERF:  0.00 (0.00)
** INFO: <bus_callback:181>: Pipeline ready

** INFO: <bus_callback:167>: Pipeline running

**PERF:  138.28 (138.17)
**PERF:  141.00 (139.60)
** INFO: <bus_callback:204>: Received EOS. Exiting ...

Quitting
I0316 01:16:00.336337 159 model_repository_manager.cc:837] unloading: yolov4_nvidia:1
I0316 01:16:00.338973 159 server.cc:280] Waiting for in-flight requests to complete.
I0316 01:16:00.338986 159 server.cc:295] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0316 01:16:00.378079 159 model_repository_manager.cc:966] successfully unloaded 'yolov4_nvidia' version 1
I0316 01:16:01.339052 159 server.cc:295] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
App run successful

Run inference on DS with max_batch_size=4
$ deepstream-app -c source1_primary_yolov4.txt
Error:

E0316 01:20:12.879238 195 model_repository_manager.cc:1705] unable to autofill for 'yolov4_nvidia', configuration specified max-batch 4 but TensorRT engine only supports max-batch 1
ERROR: infer_trtis_server.cpp:1044 Triton: failed to load model yolov4_nvidia, triton_err_str:Internal, err_msg:failed to load 'yolov4_nvidia', no version is available
ERROR: infer_trtis_backend.cpp:45 failed to load model: yolov4_nvidia, nvinfer error:NVDSINFER_TRTIS_ERROR
ERROR: infer_trtis_backend.cpp:184 failed to initialize backend while ensuring model:yolov4_nvidia ready, nvinfer error:NVDSINFER_TRTIS_ERROR
0:00:14.484600140   195 0x56007e1c7cf0 ERROR          nvinferserver gstnvinferserver.cpp:362:gst_nvinfer_server_logger:<primary_gie> nvinferserver[UID 1]: Error in createNNBackend() <infer_trtis_context.cpp:246> [UID = 1]: failed to initialize trtis backend for model:yolov4_nvidia, nvinfer error:NVDSINFER_TRTIS_ERROR
I0316 01:20:12.879481 195 server.cc:280] Waiting for in-flight requests to complete.
I0316 01:20:12.879488 195 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
0:00:14.484704250   195 0x56007e1c7cf0 ERROR          nvinferserver gstnvinferserver.cpp:362:gst_nvinfer_server_logger:<primary_gie> nvinferserver[UID 1]: Error in initialize() <infer_base_context.cpp:81> [UID = 1]: create nn-backend failed, check config file settings, nvinfer error:NVDSINFER_TRTIS_ERROR
0:00:14.484716684   195 0x56007e1c7cf0 WARN           nvinferserver gstnvinferserver_impl.cpp:439:start:<primary_gie> error: Failed to initialize InferTrtIsContext
0:00:14.484722696   195 0x56007e1c7cf0 WARN           nvinferserver gstnvinferserver_impl.cpp:439:start:<primary_gie> error: Config file path: /workspace/Deepstream_5.1_Triton/samples/configs/deepstream-app-trtis/config_infer_primary_yolov4.txt
0:00:14.485106084   195 0x56007e1c7cf0 WARN           nvinferserver gstnvinferserver.cpp:460:gst_nvinfer_server_start:<primary_gie> error: gstnvinferserver_impl start failed
** ERROR: <main:655>: Failed to set pipeline to PAUSED
Quitting
ERROR from primary_gie: Failed to initialize InferTrtIsContext
Debug info: gstnvinferserver_impl.cpp(439): start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInferServer:primary_gie:
Config file path: /workspace/Deepstream_5.1_Triton/samples/configs/deepstream-app-trtis/config_infer_primary_yolov4.txt
ERROR from primary_gie: gstnvinferserver_impl start failed
Debug info: gstnvinferserver.cpp(460): gst_nvinfer_server_start (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInferServer:primary_gie
App run failed

@pranavm-nvidia
Collaborator

@vilmara It looks like you've built the TRT engine correctly. Regarding the deepstream issue, you'd probably want to ask here: https://forums.developer.nvidia.com/c/accelerated-computing/intelligent-video-analytics/deepstream-sdk/15

@vilmara
Author

vilmara commented Mar 16, 2021

Hi @pranavm-nvidia, thanks for helping me build the TRT engine correctly. I have submitted the new issue on the forum.

@ttyio
Collaborator

ttyio commented May 27, 2021

Closing since there is no remaining issue in this thread, according to the last comment. Thanks!
