How to load Mobilenet V2 correctly in Cuda Engine (Python)? #259

Closed

Semihal opened this issue Dec 9, 2019 · 26 comments

Semihal commented Dec 9, 2019

Environment

TensorRT Version: 6.0.1.5
GPU Type: Quadro P5000
Nvidia Driver Version: 430.26
CUDA Version: Cuda compilation tools, release 10.1, V10.1.243
> nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019

> nvidia-smi

CUDA Version: 10.2

CUDNN Version:
Operating System + Version:
Python Version (if applicable):
TensorFlow Version (if applicable): v1.12.0-0-ga6d8ffae09 1.12.0
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:19.11-py3

Steps To Reproduce

  1. Download SSD Mobilenet V2 from the Object Detection API model zoo (http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz)
  2. Export the model to a frozen protobuf graph (using object_detection.export_inference_graph):
export PYTHONPATH=$(realpath ../tensorflow-models/research):$(realpath ../tensorflow-models/research/slim)
INPUT_TYPE=image_tensor
PIPELINE_CONFIG_PATH=$(realpath ssd_mobilenet_v2/pipeline.config)
TRAINED_CKPT_PREFIX=$(realpath ssd_mobilenet_v2/model.ckpt)
EXPORT_DIR=$(realpath ssd_mobilenet_v2/export/)
python3 -m object_detection.export_inference_graph \
    --input_type=image_tensor \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --trained_checkpoint_prefix=${TRAINED_CKPT_PREFIX} \
    --output_directory=${EXPORT_DIR}
  3. Convert ssd_mobilenet_v2/export/frozen_inference_graph.pb to *.uff:
# config.py
import tensorflow as tf
import graphsurgeon as gs


def mobilenet_v2(ssd_graph, input_shape):
    """
    Reference:
        https://jkjung-avt.github.io/tensorrt-ssd/
    """
    
    # Find and remove all Assert Tensorflow nodes from the graph
    all_assert_nodes = ssd_graph.find_nodes_by_op("Assert")
    ssd_graph.remove(all_assert_nodes, remove_exclusive_dependencies=True)
    # Find all Identity nodes and forward their inputs (removing the Identity ops)
    all_identity_nodes = ssd_graph.find_nodes_by_op("Identity")
    ssd_graph.forward_inputs(all_identity_nodes)

    # Create TRT plugin nodes to replace unsupported ops in Tensorflow graph
    channels, height, width = input_shape
    

    Input = gs.create_plugin_node(
        name="Input",
        op="Placeholder",
        dtype=tf.float32,
        shape=[1, channels, height, width]
    )
    
    PriorBox = gs.create_plugin_node(
        name="GridAnchor", 
        op="GridAnchor_TRT",
        minSize=0.2,
        maxSize=0.95,
        aspectRatios=[1.0, 2.0, 0.5, 3.0, 0.33],
        variance=[0.1,0.1,0.2,0.2],
        featureMapShapes=[19, 10, 5, 3, 2, 1], 
        numLayers=6
    )
    
    NMS = gs.create_plugin_node(
        name="NMS",
        op="NMS_TRT",
        shareLocation=1,
        varianceEncodedInTarget=0,
        backgroundLabelId=0,
        confidenceThreshold=1e-8,
        nmsThreshold=0.6,
        topK=100,
        keepTopK=100,
        numClasses=91,
        inputOrder=[1, 0, 2],
        confSigmoid=1,
        isNormalized=1,
        scoreConverter="SIGMOID"
    )
    
    concat_priorbox = gs.create_node(
        "concat_priorbox",
        op="ConcatV2",
        dtype=tf.float32,
        axis=2
    )
    
    concat_box_loc = gs.create_plugin_node(
        "concat_box_loc",
        op="FlattenConcat_TRT",
        dtype=tf.float32,
        axis=1,
        ignoreBatch=0
    )
    
    concat_box_conf = gs.create_plugin_node(
        "concat_box_conf",
        op="FlattenConcat_TRT",
        dtype=tf.float32,
        axis=1,
        ignoreBatch=0
    )

    # Create a mapping of namespace names -> plugin nodes.
    namespace_plugin_map = {
        "Postprocessor": NMS,
        "Preprocessor": Input,
        "ToFloat": Input,
        "image_tensor": Input,
        "MultipleGridAnchorGenerator": PriorBox,
        "Concatenate/concat": concat_priorbox,
        "Squeeze": concat_box_loc,
        "concat_1": concat_box_conf
    }

    # Create a new graph by collapsing namespaces
    ssd_graph.collapse_namespaces(namespace_plugin_map)
    # Remove the outputs, so we just have a single output node (NMS).
    # If remove_exclusive_dependencies is True, the whole graph will be removed!
    ssd_graph.remove(ssd_graph.graph_outputs, remove_exclusive_dependencies=False)
    # Disconnect the Input node from NMS, as it expects to have only 3 inputs
    ssd_graph.find_nodes_by_op('NMS_TRT')[0].input.remove('Input')

    return ssd_graph
# convert.py
import graphsurgeon as gs
import uff

from config import mobilenet_v2

# Paths and input shape assumed from the rest of this issue.
MODEL_PATH = "ssd_mobilenet_v2/export/frozen_inference_graph.pb"
OUTPUT_UFF_FILENAME = "mobilenet_v2.uff"
INPUT_SHAPE = (3, 300, 300)

dynamic_graph = gs.DynamicGraph(str(MODEL_PATH))
dynamic_graph = mobilenet_v2(dynamic_graph, INPUT_SHAPE)
_ = uff.from_tensorflow(
    dynamic_graph.as_graph_def(),
    output_nodes=["NMS"],
    output_filename=str(OUTPUT_UFF_FILENAME),
    text=True,
    debug_mode=False
)
  4. Load the UFF model into a CudaEngine:
import ctypes

import tensorrt as trt

OUTPUT_UFF_FILENAME = "mobilenet_v2.uff"  # produced by convert.py above
ENGINE_PATH = "mobilenet_v2.engine"
TRT_LOGGER = trt.Logger(trt.Logger.INFO)

# Load the FlattenConcat plugin and register TensorRT's built-in plugins.
ctypes.CDLL('libflattenconcat.so')
trt.init_libnvinfer_plugins(TRT_LOGGER, '')

with trt.Builder(TRT_LOGGER) as builder, builder.create_network() as network, trt.UffParser() as parser:
    builder.max_workspace_size = 1 << 30
    builder.fp16_mode = True
    builder.max_batch_size = 1
    parser.register_input("Input", (3, 300, 300))
    parser.register_output("MarkOutput_0")
    parse_status = parser.parse(str(OUTPUT_UFF_FILENAME), network)
    print(f'Parse status: {parse_status}')

    trt_engine = builder.build_cuda_engine(network)
    print(f'Engine: {trt_engine}')

But I get this output:

Parse status: False
Engine: None

How can I determine what mistake was made? Does the UffParser produce any logs? How do I properly import Mobilenet V2 into a CudaEngine (Python)?


rmccorm4 commented Dec 9, 2019

Hi @Semihal,

You can try increasing the logger's verbosity with trt.Logger(trt.Logger.VERBOSE), though I'm not sure how helpful that is when the parser fails - I think it generally provides more useful error output when building the engine.

For Mobilenet specifically, I think there are several public ONNX models you could try to use along with the ONNX Parser instead, such as from the ONNX model zoo: https://github.com/onnx/models/tree/master/vision/classification/mobilenet#model.

If you want to keep the TF code, you could also try to use tf2onnx to convert the .pb model to ONNX: https://github.com/onnx/tensorflow-onnx

There is some code in this repo that could help serve as a reference for converting the Mobilenet ONNX model to TensorRT: https://github.com/rmccorm4/tensorrt-utils/tree/master/classification/imagenet#onnx-model-zoo

You can also try using trtexec --onnx=<mobilenet.onnx>.

Also, if you encounter similar issues with the ONNX parser, its API provides a get_error() method to help debug further. I don't believe the UFF parser has the same capability.
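
For example, a minimal sketch of combining a verbose logger with the ONNX parser's error reporting (the model path is just a placeholder):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE)  # more detail than trt.Logger.INFO

with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network() as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
    with open("model.onnx", "rb") as f:  # placeholder path
        if not parser.parse(f.read()):
            # The ONNX parser records its errors; print all of them.
            for i in range(parser.num_errors):
                print(parser.get_error(i))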


rmccorm4 commented Dec 9, 2019

If you're determined to use UFF instead, I noticed that you're using CUDA 10.1, and I think there are currently some known issues with that due to TF binaries not being built with CUDA 10.1. There is some discussion on that in this issue: #123 (comment)


Semihal commented Dec 10, 2019

Hi @rmccorm4 ,

I found that there are some logs when running the UFF -> CudaEngine conversion from the CLI (they are not shown, for example, when running in Jupyter).

If you're determined to use UFF instead, I noticed that you're using CUDA 10.1, and I think there are currently some known issues with that due to TF binaries not being built with CUDA 10.1. There is some discussion on that in this issue: #123 (comment)

I tried running on a Jetson Xavier with the following environment:

  • CUDNN: 7.5.0
  • Tensorflow: 1.13.1
  • TensorRT: 5.1.6.1
  • Python version: 3.6.8
  • Uff version: 0.6.3

I get the following error:

[TensorRT] VERBOSE: Plugin Creator registration succeeded - GridAnchor_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - NMS_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Reorg_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Region_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Clip_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - LReLU_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - PriorBox_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - Normalize_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - RPROI_TRT
[TensorRT] VERBOSE: Plugin Creator registration succeeded - BatchedNMS_TRT
UFF Version 0.6.3
=== Automatically deduced input nodes ===
[name: "Input"
op: "Placeholder"
attr {
key: "dtype"
value {
type: DT_FLOAT
}
}
attr {
key: "shape"
value {
shape {
dim {
size: 1
}
dim {
size: 3
}
dim {
size: 300
}
dim {
size: 300
}
}
}
}
]
=========================================
Using output node NMS
Converting to UFF graph
Warning: No conversion function registered for layer: NMS_TRT yet.
Converting NMS as custom op: NMS_TRT
Warning: No conversion function registered for layer: FlattenConcat_TRT yet.
Converting concat_box_conf as custom op: FlattenConcat_TRT
Warning: No conversion function registered for layer: FlattenConcat_TRT yet.
Converting concat_box_loc as custom op: FlattenConcat_TRT
Warning: No conversion function registered for layer: GridAnchor_TRT yet.
Converting MultipleGridAnchorGenerator as custom op: GridAnchor_TRT
No. nodes: 585
[TensorRT] INFO: UFFParser: parsing MultipleGridAnchorGenerator
[libprotobuf FATAL /home/erisuser/p4sw/sw/gpgpu/MachineLearning/DIT/externals/protobuf/aarch64/10.0/include/google/protobuf/repeated_field.h:1408] CHECK failed: (index) < (current_size_):
Traceback (most recent call last):
  File "ssd/build_engine.py", line 217, in <module>
    main()
  File "ssd/build_engine.py", line 208, in main
    parser.parse(spec['tmp_uff'], network)
RuntimeError: CHECK failed: (index) < (current_size_):

... UFF or ONNX?

For Mobilenet specifically, I think there are several public ONNX models you could try to use along with the ONNX Parser instead, such as from the ONNX model zoo: https://github.com/onnx/models/tree/master/vision/classification/mobilenet#model.

If you want to keep the TF code, you could also try to use tf2onnx to convert the .pb model to ONNX: https://github.com/onnx/tensorflow-onnx

I would like to use TensorFlow for model development; is there any fundamental difference between using UFF and ONNX?

@rmccorm4
Copy link
Collaborator

rmccorm4 commented Dec 10, 2019

I would like to use TensorFlow for model development; is there any fundamental difference between using UFF and ONNX?

The ONNX parser is generally better supported and tends to support more ops than UFF. There is also more infrastructure around ONNX, particularly 3rd-party tools for exporting, converting, etc. - especially Netron, which is helpful for debugging bad ONNX models.

You can continue to use TF for development, but I would suggest trying tf2onnx to see if it works for you.
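
For reference, a minimal tf2onnx invocation could look something like this (the paths are placeholders, and the exact flags depend on the tf2onnx version you have installed - check its README):

python3 -m tf2onnx.convert \
    --saved-model ssd_mobilenet_v2/export/saved_model \
    --output ssd_mobilenet_v2.onnx \
    --opset 10 \
    --verbose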


Semihal commented Dec 10, 2019

The ONNX parser is generally better supported and tends to support more ops than UFF. There is also more infrastructure around ONNX, particularly 3rd-party tools for exporting, converting, etc. - especially Netron, which is helpful for debugging bad ONNX models.

You can continue to use TF for development, but I would suggest trying tf2onnx to see if it works for you.

When converting ssd_mobilenet_v2 (http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz) I do get a *.onnx file, but the conversion reports ReferenceError errors. The situation is similar for mobilenet_v1 (http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2018_01_28.tar.gz).

2019-12-10 12:45:48,624 - VERBOSE - tf2onnx.tfonnx: Summay Stats:
tensorflow ops: Counter({'Const': 1856, 'Gather': 549, 'Minimum': 452, 'Maximum': 360, 'Reshape': 305, 'Sub': 197, 'Cast': 185, 'Greater': 183, 'Where': 180, 'Split': 180, 'Add': 140, 'Mul': 135, 'StridedSlice': 121, 'Shape': 117, 'Pack': 115, 'ConcatV2': 108, 'Unpack': 94, 'Slice': 93, 'Squeeze': 92, 'ZerosLike': 92, 'NonMaxSuppressionV2': 90, 'Relu6': 35, 'Conv2D': 34, 'Switch': 29, 'Identity': 28, 'Enter': 26, 'RealDiv': 15, 'Merge': 14, 'DepthwiseConv2dNative': 13, 'Tile': 13, 'Range': 12, 'BiasAdd': 12, 'TensorArrayV3': 11, 'ExpandDims': 9, 'NextIteration': 8, 'TensorArrayWriteV3': 6, 'Exit': 6, 'TensorArraySizeV3': 6, 'TensorArrayGatherV3': 6, 'TensorArrayScatterV3': 5, 'TensorArrayReadV3': 5, 'Fill': 4, 'Assert': 3, 'Transpose': 3, 'Less': 2, 'LoopCond': 2, 'Exp': 2, 'Equal': 2, 'Placeholder': 1, 'ResizeBilinear': 1, 'Sigmoid': 1, 'Size': 1, 'TopKV2': 1})
tensorflow attr: Counter({'T': 3327, 'dtype': 1879, 'value': 1856, 'Tindices': 549, 'validate_indices': 549, 'Tparams': 549, 'Tshape': 305, 'N': 237, 'Index': 214, 'axis': 209, 'SrcT': 185, 'Truncate': 185, 'DstT': 185, 'num_split': 180, 'shrink_axis_mask': 121, 'begin_mask': 121, 'ellipsis_mask': 121, 'new_axis_mask': 121, 'end_mask': 121, 'Tidx': 120, 'out_type': 118, 'num': 94, 'squeeze_dims': 92, 'data_format': 59, 'dilations': 47, 'strides': 47, 'padding': 47, 'use_cudnn_on_gpu': 34, 'is_constant': 26, 'parallel_iterations': 26, 'frame_name': 26, 'element_shape': 17, 'Tmultiples': 13, 'dynamic_size': 11, 'clear_after_read': 11, 'identical_element_shapes': 11, 'tensor_array_name': 11, 'Tdim': 9, 'index_type': 4, 'summarize': 3, 'Tperm': 3, 'shape': 1, 'align_corners': 1, 'sorted': 1})
onnx mapped: Counter({'Const': 1808, 'Gather': 549, 'Minimum': 452, 'Maximum': 360, 'Reshape': 305, 'Sub': 197, 'Cast': 185, 'Greater': 183, 'Where': 180, 'Split': 180, 'Add': 140, 'Mul': 135, 'StridedSlice': 115, 'Pack': 115, 'Shape': 111, 'ConcatV2': 108, 'Unpack': 94, 'Slice': 93, 'Squeeze': 92, 'ZerosLike': 92, 'NonMaxSuppressionV2': 90, 'Relu6': 35, 'Conv2D': 34, 'RealDiv': 15, 'Tile': 13, 'DepthwiseConv2dNative': 13, 'BiasAdd': 12, 'ExpandDims': 9, 'Identity': 8, 'If': 6, 'Fill': 4, 'Transpose': 3, 'Loop': 2, 'Less': 2, 'Exp': 2, 'Placeholder': 1, 'ResizeBilinear': 1, 'Sigmoid': 1, 'TopKV2': 1, 'Range': 1})
onnx unmapped: Counter()
2019-12-10 12:45:56,116 - INFO - tf2onnx:
2019-12-10 12:45:59,663 - INFO - tf2onnx.optimizer: Optimizing ONNX model
2019-12-10 12:45:59,716 - VERBOSE - tf2onnx.optimizer: Apply optimize_transpose
2019-12-10 12:46:46,060 - VERBOSE - tf2onnx.optimizer.TransposeOptimizer: Add -47 (337->290), Const -20 (2076->2056), Gather +6 (554->560), Mul -13 (319->306), Reshape -13 (326->313), Transpose -81 (279->198)
2019-12-10 12:46:46,060 - VERBOSE - tf2onnx.optimizer: Apply fold_constants
2019-12-10 12:46:48,387 - WARNING - tf2onnx.optimizer: Failed to apply fold_constants
Traceback (most recent call last):
File "/home/jet/.local/lib/python3.6/site-packages/tf2onnx/optimizer/init.py", line 43, in optimize_graph
current = copy.deepcopy(graph)
File "/usr/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 220, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "/usr/lib/python3.6/copy.py", line 220, in
y = [deepcopy(a, memo) for a in x]
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.6/copy.py", line 159, in deepcopy
copier = getattr(x, "deepcopy", None)
ReferenceError: weakly-referenced object no longer exists
2019-12-10 12:46:48,429 - VERBOSE - tf2onnx.optimizer: Apply merge_duplication
2019-12-10 12:46:49,340 - WARNING - tf2onnx.optimizer: Failed to apply merge_duplication
Traceback (most recent call last):
File "/home/jet/.local/lib/python3.6/site-packages/tf2onnx/optimizer/init.py", line 43, in optimize_graph
current = copy.deepcopy(graph)
File "/usr/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 220, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "/usr/lib/python3.6/copy.py", line 220, in
y = [deepcopy(a, memo) for a in x]
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.6/copy.py", line 159, in deepcopy
copier = getattr(x, "deepcopy", None)
ReferenceError: weakly-referenced object no longer exists
2019-12-10 12:46:49,381 - VERBOSE - tf2onnx.optimizer: Apply remove_identity
2019-12-10 12:46:51,799 - WARNING - tf2onnx.optimizer: Failed to apply remove_identity
Traceback (most recent call last):
File "/home/jet/.local/lib/python3.6/site-packages/tf2onnx/optimizer/init.py", line 43, in optimize_graph
current = copy.deepcopy(graph)
File "/usr/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 220, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "/usr/lib/python3.6/copy.py", line 220, in
y = [deepcopy(a, memo) for a in x]
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.6/copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/usr/lib/python3.6/copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "/usr/lib/python3.6/copy.py", line 150, in deepcopy
y = copier(x, memo)
File "/usr/lib/python3.6/copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/usr/lib/python3.6/copy.py", line 159, in deepcopy
copier = getattr(x, "deepcopy", None)
ReferenceError: weakly-referenced object no longer exists
2019-12-10 12:46:51,908 - INFO - tf2onnx.optimizer: After optimization: Add -47 (337->290), Const -20 (2076->2056), Gather +6 (554->560), Mul -13 (319->306), Reshape -13 (326->313), Transpose -81 (279->198)
2019-12-10 12:46:58,882 - INFO - tf2onnx:
2019-12-10 12:46:58,883 - INFO - tf2onnx: Successfully converted TensorFlow model /home/jet/Projects/TensorRT/work/ssd_mobilenet_v1_coco_2018_01_28/saved_model to ONNX
2019-12-10 12:46:59,906 - INFO - tf2onnx: ONNX model is saved at /home/jet/Projects/TensorRT/work/ssd_mobilenet_v1_coco_2018_01_28.onnx

Nevertheless, I tried running the conversion to an engine through onnx_to_tensorrt.py and got the following:

jet@jet-desktop:~/Projects/TensorRT$ python3 onnx_to_tensorrt.py --onnx work/ssd_mobilenet_v1_coco_2018_01_28.onnx -o mobilenet_v1.engine
2019-12-10 13:03:06 - __main__ - INFO - TRT_LOGGER Verbosity: Severity.ERROR
Unsupported ONNX data type: UINT8 (2)
2019-12-10 13:03:07 - __main__ - ERROR - Failed to parse model.

rmccorm4 commented:

Hi @Semihal,

It looks like there are two steps in tf2onnx: (1) conversion and then (2) optimization. It seems like some errors happened during optimization, but that step might not be strictly necessary.

You can try to quickly verify the ONNX files with something like trtexec --onnx=model.onnx

You can also try to do some offline optimization with something like this: https://github.com/daquexian/onnx-simplifier#our-solution

But I haven't tried that myself.
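
For reference, the onnx-simplifier Python API is roughly the following sketch (file names are placeholders, and I haven't verified it on this particular model):

import onnx
from onnxsim import simplify  # pip install onnx-simplifier

model = onnx.load("ssd_mobilenet_v1_coco_2018_01_28.onnx")  # placeholder path
simplified_model, check = simplify(model)
assert check, "simplified ONNX model could not be validated"
onnx.save(simplified_model, "ssd_mobilenet_v1_simplified.onnx")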


Semihal commented Dec 11, 2019

You can try to quickly verify the ONNX files with something like trtexec --onnx=model.onnx

I tried running the trtexec command and got the following:

&&&& RUNNING TensorRT.trtexec # trtexec --onnx=work/ssd_mobilenet_v1_coco_2018_01_28.onnx
[11/11/2019-06:19:52] [I] === Model Options ===
[11/11/2019-06:19:52] [I] Format: ONNX
[11/11/2019-06:19:52] [I] Model: work/ssd_mobilenet_v1_coco_2018_01_28.onnx
[11/11/2019-06:19:52] [I] Output:
[11/11/2019-06:19:52] [I] === Build Options ===
[11/11/2019-06:19:52] [I] Max batch: 1
[11/11/2019-06:19:52] [I] Workspace: 16 MB
[11/11/2019-06:19:52] [I] minTiming: 1
[11/11/2019-06:19:52] [I] avgTiming: 8
[11/11/2019-06:19:52] [I] Precision: FP32
[11/11/2019-06:19:52] [I] Calibration:
[11/11/2019-06:19:52] [I] Safe mode: Disabled
[11/11/2019-06:19:52] [I] Save engine:
[11/11/2019-06:19:52] [I] Load engine:
[11/11/2019-06:19:52] [I] Inputs format: fp32:CHW
[11/11/2019-06:19:52] [I] Outputs format: fp32:CHW
[11/11/2019-06:19:52] [I] Input build shapes: model
[11/11/2019-06:19:52] [I] === System Options ===
[11/11/2019-06:19:52] [I] Device: 0
[11/11/2019-06:19:52] [I] DLACore:
[11/11/2019-06:19:52] [I] Plugins:
[11/11/2019-06:19:52] [I] === Inference Options ===
[11/11/2019-06:19:52] [I] Batch: 1
[11/11/2019-06:19:52] [I] Iterations: 10 (200 ms warm up)
[11/11/2019-06:19:52] [I] Duration: 10s
[11/11/2019-06:19:52] [I] Sleep time: 0ms
[11/11/2019-06:19:52] [I] Streams: 1
[11/11/2019-06:19:52] [I] Spin-wait: Disabled
[11/11/2019-06:19:52] [I] Multithreading: Enabled
[11/11/2019-06:19:52] [I] CUDA Graph: Disabled
[11/11/2019-06:19:52] [I] Skip inference: Disabled
[11/11/2019-06:19:52] [I] Input inference shapes: model
[11/11/2019-06:19:52] [I] === Reporting Options ===
[11/11/2019-06:19:52] [I] Verbose: Disabled
[11/11/2019-06:19:52] [I] Averages: 10 inferences
[11/11/2019-06:19:52] [I] Percentile: 99
[11/11/2019-06:19:52] [I] Dump output: Disabled
[11/11/2019-06:19:52] [I] Profile: Disabled
[11/11/2019-06:19:52] [I] Export timing to JSON file:
[11/11/2019-06:19:52] [I] Export profile to JSON file:
[11/11/2019-06:19:52] [I]
----------------------------------------------------------------
Input filename: work/ssd_mobilenet_v1_coco_2018_01_28.onnx
ONNX IR version: 0.0.5
Opset version: 10
Producer name: tf2onnx
Producer version: 1.5.3
Domain:
Model version: 0
Doc string:
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.5) than this parser was built against (0.0.3).
Unsupported ONNX data type: UINT8 (2)
ERROR: ModelImporter.cpp:54 In function importInput:
[8] Assertion failed: convert_dtype(onnx_tensor_type.elem_type(), &trt_dtype)
[11/11/2019-06:19:53] [E] Failed to parse onnx file
[11/11/2019-06:19:53] [E] Parsing model failed
[11/11/2019-06:19:53] [E] Engine could not be created
&&&& FAILED TensorRT.trtexec # trtexec --onnx=work/ssd_mobilenet_v1_coco_2018_01_28.onnx

The Unsupported ONNX data type: UINT8 error still appears :(


Semihal commented Dec 13, 2019

Hi @rmccorm4 ,

Do you have any more ideas how to deal with my problem?


Semihal commented Dec 14, 2019

I tried using graphsurgeon to change the data type and now get this error:

[TensorRT] ERROR: Parameter check failed at: ../builder/Network.cpp::addInput::671, condition: isValidDims(dims, hasImplicitBatchDimension())

Do you have any suggestions?
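
For reference, the kind of data type change I mean can be sketched with the onnx package instead of graphsurgeon (this is an illustrative sketch, not my exact script; it assumes the image tensor is the first graph input):

import onnx

model = onnx.load("ssd_mobilenet_v1_coco_2018_01_28.onnx")  # placeholder path
# Assume the image tensor is the first (and only) graph input; it was UINT8.
model.graph.input[0].type.tensor_type.elem_type = onnx.TensorProto.FLOAT
onnx.save(model, "ssd_mobilenet_v1_float_input.onnx")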

rmccorm4 commented:

Hi @Semihal,

I think another user had a similar issue here which was resolved: https://devtalk.nvidia.com/default/topic/1067555/tensorrt/tensorrt-inference-error-while-load-onnx-model/post/5409905/?offset=7#5412924

Please try the suggestions there and let me know if they help.

rmccorm4 commented:

[TensorRT] ERROR: Parameter check failed at: ../builder/Network.cpp::addInput::671, condition: isValidDims(dims, hasImplicitBatchDimension())

You could try to use the --explicitBatch flag if using trtexec, or add the corresponding explicitBatch flag when creating the Network object if using the API.
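
With the Python API that would look roughly like this sketch (it needs TensorRT 6 or newer, where NetworkDefinitionCreationFlag was introduced):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(EXPLICIT_BATCH) as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
    pass  # parse the ONNX file and build the engine as in the earlier snippets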

rmccorm4 commented:

@Semihal your best bet would also be to try now with TensorRT 7.0 which was just released today: https://developer.nvidia.com/nvidia-tensorrt-download. It expanded a lot on the ONNX parser: https://docs.nvidia.com/deeplearning/sdk/tensorrt-release-notes/tensorrt-7.html#rel_7-0-0


Semihal commented Dec 19, 2019

@rmccorm4 , Thanks, I'll try!
But there is a problem: it seems TensorRT 7 has not yet been released for Jetson Nano :(

rmccorm4 commented:

Hi @Semihal,

Yes unfortunately that's correct: #292 (comment)

Did --explicitBatch help though?


Semihal commented Dec 25, 2019

Hi @rmccorm4,
No, it didn't. I am now using PyTorch + ONNX.


rmccorm4 commented Dec 25, 2019

Did that give you a new error?

Can you share an ONNX model to reproduce the error?

lorenzolightsgdwarf commented:

Hi, I ran into the same issues with TensorRT 7 using the ONNX parser for a custom SSD MobileNet v2. My net was trained with TF 1.14 using the Object Detection API and translated to ONNX with tf2onnx using opset 11. I've also changed the type of the image_tensor node to float32 to avoid the unsupported type (line 121 of https://github.com/tensorflow/models/blob/master/research/object_detection/exporter.py), but then it fails to load some weights because they are int64. Have you found any solution? My last hope is that downgrading TF will make it work.

rmccorm4 commented:

Hi @lorenzolightsgdwarf,

Can you share the explicit commands you ran and the errors you got? I thought TensorRT cast INT64 -> INT32 when possible and gave a warning, but I don't think I've seen that fail.

If possible as well, please share your ONNX model so I can reproduce the issue.

JingyuQian commented:

@lorenzolightsgdwarf
I think I did exactly the same steps as you, including modifying exporter.py. Given the upcoming deadline of my project, I switched to UFF in the end.

  1. This walkthrough is quite useful. Note that it used a model from 2018, so I checked out the TF object detection API at a 2018 commit and trained my model there.
  2. I compiled TensorRT 7 from source, which gives me some freedom to change the source code and examine inputs/outputs.

Hope this helps.

JingyuQian commented:

Hi @rmccorm4,
I once tried nvonnxparser in C++. The verbose log looks like this:

onnx2trt_utils.cpp:198: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
...(skipped)
node {
      input: "Resize__106:0"
      output: "Preprocessor/map/while/ResizeImage/resize/Squeeze:0"
      name: "Preprocessor/map/while/ResizeImage/resize/Squeeze"
      op_type: "Squeeze"
      attribute {
        name: "axes"
        ints: 0
        type: INTS
      }
      domain: ""
    }
    name: "tf2onnx"
    initializer {
      dims: 0
      data_type: FLOAT
      name: "roi__105"
      raw_data: ""
    }
    initializer {
      dims: 2
      data_type: FLOAT
      name: "one__100"
      raw_data: "\000\000\200?\000\000\200?"
    }
    initializer {
      dims: 1
      data_type: INT64
      name: "const_slice__95"
      raw_data: "\000\000\000\000\000\000\000\000"
    }
    initializer {
      dims: 1
      data_type: INT64
      name: "const_slice__94"
      raw_data: "\003\000\000\000\000\000\000\000"
    }
    initializer {
      dims: 1
      data_type: INT64
      name: "const_slice__93"
      raw_data: "\001\000\000\000\000\000\000\000"
    }
    initializer {
      dims: 2
      data_type: FLOAT
      name: "const_fold_opt__1602"
      raw_data: "\000\000 D\000\000 D"
    }
    doc_string: "graph for generic_loop_Loop__39 body"
    input {
      name: "i__22"
      type {
        tensor_type {
          elem_type: INT64
          shape {
          }
        }
      }
    }
    input {
      name: "cond__24"
      type {
        tensor_type {
          elem_type: BOOL
          shape {
          }
        }
      }
    }

The first line is what you suggested, but note that the messages that follow still contain INT64 initializers and inputs. I wonder what that means. Thanks!

rmccorm4 commented:

Hi @JingyuQian,

For the TensorRT warning, it should be fine. It will give an error if the cast from INT64->INT32 failed (INT64 value was outside of the range of INT32).

As to why the INT64 types are generated, I'm not totally sure. Check this FAQ: "Q: Does ONNX support implicit scalar datatype casting?" at https://pytorch.org/docs/stable/onnx.html. That might be related to why INT64 types are so common - I guess that's the default when the type is inferred/unknown.
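
If you want to see which tensors are actually INT64 before TensorRT casts them, here is a quick sketch with the onnx package (the path is a placeholder):

import onnx

model = onnx.load("model.onnx")  # placeholder path
for init in model.graph.initializer:
    if init.data_type == onnx.TensorProto.INT64:
        print(init.name, list(init.dims))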

rmccorm4 commented:

@Semihal did switching to PyTorch work out for you? Still having any issues?


Semihal commented Feb 28, 2020

@Semihal did switching to PyTorch work out for you? Still having any issues?

Hi! Yes, it works.

rmccorm4 commented:

Awesome, glad to hear.


Ram-Godavarthi commented Jun 29, 2020

I got the error below when I run ./trtexec --onnx=inception.onnx. The error is repeated for both my custom-trained model and the standard SSD Inception V2 model.

WARNING: ONNX model has a newer ir_version (0.0.5) than this parser was built against (0.0.3).
Unsupported ONNX data type: UINT8 (2)
ERROR: ModelImporter.cpp:54 In function importInput:
[8] Assertion failed: convert_dtype(onnx_tensor_type.elem_type(), &trt_dtype)

Any suggestions for this? I am using a Jetson Nano with the specifications below.

sudo apt-cache show nvidia-jetpack
Package: nvidia-jetpack
Version: 4.3-b134
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-container-csv-cuda (= 10.0.326-1), libopencv-python (= 4.1.1-2-gd5a58aa75), libvisionworks-sfm-dev (= 0.90.4), libvisionworks-dev (= 1.6.0.500n), libvisionworks-samples (= 1.6.0.500n), libnvparsers6 (= 6.0.1-1+cuda10.0), libcudnn7-doc (= 7.6.3.28-1+cuda10.0), libcudnn7-dev (= 7.6.3.28-1+cuda10.0), libnvinfer-samples (= 6.0.1-1+cuda10.0), libnvinfer-bin (= 6.0.1-1+cuda10.0), nvidia-container-csv-cudnn (= 7.6.3.28-1+cuda10.0), libvisionworks-tracking-dev (= 0.88.2), vpi-samples (= 0.1.0), tensorrt (= 6.0.1.10-1+cuda10.0), libopencv (= 4.1.1-2-gd5a58aa75), libnvinfer-doc (= 6.0.1-1+cuda10.0), libnvparsers-dev (= 6.0.1-1+cuda10.0), libcudnn7 (= 7.6.3.28-1+cuda10.0), libnvidia-container0 (= 0.9.0beta.1), cuda-toolkit-10-0 (= 10.0.326-1), nvidia-container-csv-visionworks (= 1.6.0.500n), graphsurgeon-tf (= 6.0.1-1+cuda10.0), libopencv-samples (= 4.1.1-2-gd5a58aa75), python-libnvinfer-dev (= 6.0.1-1+cuda10.0), libnvinfer-plugin-dev (= 6.0.1-1+cuda10.0), libnvinfer-plugin6 (= 6.0.1-1+cuda10.0), nvidia-container-toolkit (= 1.0.1-1), libnvinfer-dev (= 6.0.1-1+cuda10.0), libvisionworks (= 1.6.0.500n), libopencv-dev (= 4.1.1-2-gd5a58aa75), nvidia-l4t-jetson-multimedia-api (= 32.3.1-20191209225816), vpi-dev (= 0.1.0), vpi (= 0.1.0), python3-libnvinfer (= 6.0.1-1+cuda10.0), python3-libnvinfer-dev (= 6.0.1-1+cuda10.0), opencv-licenses (= 4.1.1-2-gd5a58aa75), nvidia-container-csv-tensorrt (= 6.0.1.10-1+cuda10.0), libnvinfer6 (= 6.0.1-1+cuda10.0), libnvonnxparsers-dev (= 6.0.1-1+cuda10.0), libnvonnxparsers6 (= 6.0.1-1+cuda10.0), uff-converter-tf (= 6.0.1-1+cuda10.0), nvidia-docker2 (= 2.2.0-1), libvisionworks-sfm (= 0.90.4), libnvidia-container-tools (= 0.9.0beta.1), nvidia-container-runtime (= 3.1.0-1), python-libnvinfer (= 6.0.1-1+cuda10.0), libvisionworks-tracking (= 0.88.2)
Homepage: http://developer.nvidia.com/jetson
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_4.3-b134_arm64.deb
Size: 29742
SHA256: 1fd73e258509822b928b274f61a413038a29c3705ee8eef351a914b9b1b060ce
SHA1: a7c4ab8b241ab1d2016d2c42f183c295e66d67fe
MD5sum: de856bb9607db87fd298faf7f7cc320f
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8

roborocklsm commented:

Actually, I would recommend building a pure TensorRT engine (defining the network directly with the TensorRT API) and loading the weights from your PyTorch/TensorFlow models.
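
For example, a rough sketch of that approach with the TensorRT Python network-definition API (the weight file, weight keys and layer parameters below are made-up placeholders, not a working MobileNet definition):

import numpy as np
import tensorrt as trt

# Hypothetical weights exported from a PyTorch/TensorFlow checkpoint to .npz.
weights = np.load("mobilenet_weights.npz")

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network()

data = network.add_input("input", trt.float32, (3, 300, 300))
conv1 = network.add_convolution(input=data, num_output_maps=32, kernel_shape=(3, 3),
                                kernel=weights["conv1.weight"], bias=weights["conv1.bias"])
conv1.stride = (2, 2)
conv1.padding = (1, 1)
# ... add the remaining layers the same way, then:
# network.mark_output(last_layer.get_output(0))
# engine = builder.build_cuda_engine(network)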
