
How to add NMS with Tensorflow Model (that was converted to ONNX) #1379

Closed
letdivedeep opened this issue Jul 16, 2021 · 43 comments
Labels: ONNX, triaged


@letdivedeep

I have taken an SSDLite MobileNet V2 model from the TensorFlow model zoo.

Steps:

  1. Generated the ONNX model using the tf2onnx library:
    python -m tf2onnx.convert --graphdef mv2/ssdlite_mobilenet_v2_coco_2018_05_09/frozen_inference_graph.pb \
        --output MODEL_frozen.onnx --fold_const --opset 11 \
        --inputs image_tensor:0 \
        --outputs num_detections:0,detection_boxes:0,detection_scores:0,detection_classes:0

  2. Added the NMS layers to the ONNX model, based on references from this issue:

import onnx_graphsurgeon as gs
import onnx
import numpy as np

input_model_path = "MODEL_frozen.onnx"
output_model_path = "model_gs.onnx"

@gs.Graph.register()
def trt_batched_nms(self, boxes_input, scores_input, nms_output,
                    share_location, num_classes):

    boxes_input.outputs.clear()
    scores_input.outputs.clear()
    nms_output.inputs.clear()

    attrs = {
        "shareLocation": share_location,
        "numClasses": num_classes,
        "backgroundLabelId": 0,
        "topK": 116740,
        "keepTopK": 100,
        "scoreThreshold": 0.3,
        "iouThreshold": 0.6,
        "isNormalized": True,
        "clipBoxes": True
    }
    return self.layer(op="BatchedNMS_TRT", attrs=attrs,
                      inputs=[boxes_input, scores_input],
                      outputs=[nms_output])


graph = gs.import_onnx(onnx.load(input_model_path))
graph.inputs[0].shape=[1,300,300,3]
print(graph.inputs[0].shape)

for inp in graph.inputs:
    inp.dtype = np.int

input = graph.inputs[0]

tmap = graph.tensors()

graph.trt_batched_nms(tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_2/NonMaxSuppressionV5__1712:0"],
                      tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores/NonMaxSuppressionV5__1761:0"],
                      tmap["NonMaxSuppression__1763:0"],
                      share_location=False,
                      num_classes=8)

graph.trt_batched_nms(tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_2/NonMaxSuppressionV5__1712:0"],
                      tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_1/NonMaxSuppressionV5__1737:0"],
                      tmap["NonMaxSuppression__1739:0"],
                      share_location=False,
                      num_classes=8)

graph.trt_batched_nms(tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_2/NonMaxSuppressionV5__1712:0"],
                      tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_2/NonMaxSuppressionV5__1713:0"],
                      tmap["NonMaxSuppression__1715:0"],
                      share_location=False,
                      num_classes=8)

graph.trt_batched_nms(tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_2/NonMaxSuppressionV5__1712:0"],
                      tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_3/NonMaxSuppressionV5__1689:0"],
                      tmap["NonMaxSuppression__1691:0"],
                      share_location=False,
                      num_classes=8)


# Remove unused nodes, and topologically sort the graph.
# graph.cleanup()
# graph.toposort()
# graph.fold_constants().cleanup()

# Export the ONNX graph from graphsurgeon
onnx.checker.check_model(gs.export_onnx(graph))
onnx.save_model(gs.export_onnx(graph), output_model_path)

print("Saving the ONNX model to {}".format(output_model_path))

I am not able to figure out which nodes in the ONNX graph I should use in place of "Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_2/NonMaxSuppressionV5__1712:0" and the other tensor names passed to trt_batched_nms above.


MODEL_frozen.onnx.zip

I have also attached the ONNX file. Any suggestions on how to find them?

@letdivedeep
Author

@pranavm-nvidia, can you help me out with this?

@pranavm-nvidia
Collaborator

@letdivedeep It doesn't look like tf2onnx is generating an NMS node in your ONNX graph and it's not entirely clear to me which part of the model is responsible for NMS (I'm guessing at least the final loop node and some of its preceding nodes?). The NMS outputs seem fairly straightforward, as they're the network outputs, but for the inputs, you'll need to track down where the boxes and scores are generated.

@letdivedeep
Author

@pranavm-nvidia Thanks for your reply.

This may be a very naive question:
- I am a bit confused about which node names go in those arguments and how that decision is made. Can you help me with this? (Is there a simpler way?)
- I also converted this model from TFLite to ONNX, which has an NMS layer in it (here too I am confused about which node names to include).

Attaching the TFLite-to-ONNX model for reference:
tflite_to_onnx.zip

@pranavm-nvidia
Collaborator

@letdivedeep The goal of the script is to replace a group of nodes in the original graph with a BatchedNMSDynamic_TRT node which will be imported by TRT as a plugin. The names are the input/output tensors to this node (the existing subgraph between them will be replaced).

This is not required in TRT 8, which supports the ONNX NMS op out-of-the-box, so your tflite model should work without any modifications.
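For example, on TRT 8 you should be able to build the engine directly with something like this (the model path is just a placeholder):

trtexec --onnx=model.onnx --saveEngine=engine.trt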

@letdivedeep
Author

@pranavm-nvidia thanks for the clarification.

The NVIDIA JetPack SDK still officially ships TensorRT 7.1.3: https://developer.nvidia.com/embedded/jetpack

Any idea when the JetPack SDK will be upgraded to TensorRT 8.0?

@Source82

Source82 commented Jul 23, 2021

@letdivedeep The goal of the script is to replace a group of nodes in the original graph with a BatchedNMSDynamic_TRT node which will be imported by TRT as a plugin. The names are the input/output tensors to this node (the existing subgraph between them will be replaced).

This is not required in TRT 8, which supports the ONNX NMS op out-of-the-box, so your tflite model should work without any modifications.

It says that "it has the limitation that the output shape is always padded to length". Can you please clarify this: what post-processing is required? Do you have an example somewhere, please?

@pranavm-nvidia
Collaborator

Any Idea on when we going to get an upgrade to 8.0 of jetpack SDK?

I'm not sure, sorry

@pranavm-nvidia
Collaborator

it says that "it has the limitation that the output shape is always padded to length" please can you clarify on this, what post processing is required. Have you got an example somewhere please.

Can you clarify what you mean by "it"? If you mean the NMS plugin, the output shape is determined by some of the plugin parameters.

@letdivedeep
Author

letdivedeep commented Jul 27, 2021

@letdivedeep The goal of the script is to replace a group of nodes in the original graph with a BatchedNMSDynamic_TRT node which will be imported by TRT as a plugin. The names are the input/output tensors to this node (the existing subgraph between them will be replaced).

This is not required in TRT 8, which supports the ONNX NMS op out-of-the-box, so your tflite model should work without any modifications.

@pranavm-nvidia Thanks for your reply, and I really appreciate your efforts.

I want to use TensorRT 7.1 itself, and thus need to replace those NMS layers for it to work.

I am very new to this TensorRT part and am still not clear about the names. I have taken the TFLite-to-ONNX converted model shared above as a reference and attached a screenshot.

[Screenshot: NonMaxSuppression node in the ONNX graph]

Now, the NonMaxSuppression node I want to replace has two inputs (Transpose__656:0, Concat__652:0) and two outputs (Slice__661:0, Slice__669:0).

##  parameters required are boxes_input, scores_input, nms_output, share_location, num_classes
graph.trt_batched_nms(tmap["what should come here? => boxes_input"],
                      tmap["what should come here? => scores_input"],
                      tmap["what should come here? => nms_output"],
                      share_location=False,
                      num_classes=8)

Your inputs will be of great help.
tflite_to_onnx.zip

@pranavm-nvidia
Collaborator

@letdivedeep One issue with that is that the behavior of the BatchedNMS plugin is different from ONNX NMS (e.g. different outputs), so it won't work as a drop-in replacement here.

So I'd recommend starting off by reading about the outputs of the plugin and then determining how those should map to the tensors in the model (I'm guessing you'll probably just want to connect the NMS directly to the model outputs).

Once you know the tensor names, it's just a matter of looking them up in the tensor map (tmap) and then using trt_batched_nms to insert the plugin node in that location. Using the tensors you mentioned, that would be:

graph.trt_batched_nms(tmap["Transpose__656"], tmap["Concat__652"], ...)

Also keep in mind that the BatchedNMS plugin has 4 outputs (not just 1), so you'll probably want to change trt_batched_nms to let you pass in more than one output, i.e. instead of:

    return self.layer(op="BatchedNMS_TRT", attrs=attrs,
                      inputs=[boxes_input, scores_input],
                      outputs=[nms_output])

you could do:

    return self.layer(op="BatchedNMS_TRT", attrs=attrs,
                      inputs=[boxes_input, scores_input],
                      outputs=outputs) # Outputs is a List[Tensor]
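Putting both changes together, a rough sketch of the registration might look like this (the attribute values below are placeholders to illustrate the shape of the call, not recommendations):

import onnx_graphsurgeon as gs

@gs.Graph.register()
def trt_batched_nms(self, boxes_input, scores_input, outputs,
                    share_location, num_classes):
    # Disconnect the existing subgraph on both sides of the new plugin node.
    boxes_input.outputs.clear()
    scores_input.outputs.clear()
    for out in outputs:
        out.inputs.clear()

    # Plugin attributes are camelCase.
    attrs = {
        "shareLocation": share_location,
        "numClasses": num_classes,
        "backgroundLabelId": 0,
        "topK": 100,
        "keepTopK": 100,
        "scoreThreshold": 0.3,
        "iouThreshold": 0.6,
        "isNormalized": True,
        "clipBoxes": True,
    }
    return self.layer(op="BatchedNMS_TRT", attrs=attrs,
                      inputs=[boxes_input, scores_input],
                      outputs=outputs)  # outputs is a List[Tensor] with 4 entries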

@Source82

it says that "it has the limitation that the output shape is always padded to length" please can you clarify on this, what post processing is required. Have you got an example somewhere please.

Can you clarify what you mean by "it"? If you mean the NMS plugin, the output shape is determined by some of the plugin parameters.

Hello,

Please, what am I doing wrong? Help me please.

My TensorRT conversion keeps failing with this error:

WARNING: ONNX model has a newer ir_version (0.0.7) than this parser was built against (0.0.6).
[2021-07-26 14:57:16 INFO] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 0, GPU 254 (MiB)
Parsing model
[2021-07-26 14:57:16 WARNING] /content/nvidia/onnx-tensorrt/onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[2021-07-26 14:57:16 WARNING] /content/nvidia/onnx-tensorrt/onnx2trt_utils.cpp:390: One or more weights outside the range of INT32 was clamped
While parsing node number 255 [TopK → “StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/SortByField/TopKV2:0”]:
— Begin node —
input: “StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Select:0”
input: “Unsqueeze__600:0”
output: “StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/SortByField/TopKV2:0”
output: “StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/SortByField/TopKV2:1”
name: “StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/SortByField/TopKV2”
op_type: “TopK”
attribute {
name: “sorted”
i: 1
type: INT
}

— End node —
ERROR: /content/nvidia/onnx-tensorrt/builtin_op_importers.cpp:4293 In function importTopK:
[8] Assertion failed: (inputs.at(1).is_weights()) && “This version of TensorRT only supports input K as an initializer.”

I have used onnx-simplifier, yet the error persists. I have also used polygraphy surgeon, all to no avail.

@letdivedeep
Author

letdivedeep commented Jul 28, 2021


@pranavm-nvidia thanks for the detailed explanation, this helped me understand and debug better.

I followed these steps:

1) Get the output node names prior to NMS from the frozen graph:

I checked the frozen graph to understand the model outputs prior to NMS, as seen below, and found that the reshape and logistic nodes were the model outputs:

[Screenshot: frozen graph outputs prior to NMS]

2) Clear the outputs of these two nodes and use them as inputs to the BatchedNMSDynamic_TRT node.

3) Clear the inputs to the output nodes (["TFLite_Detection_PostProcess", "TFLite_Detection_PostProcess:1", "TFLite_Detection_PostProcess:2", "TFLite_Detection_PostProcess:3"]) and add them as the outputs of the BatchedNMSDynamic_TRT node.

The generated ONNX graph after the above steps looks like this:

[Screenshot: modified ONNX graph with the BatchedNMSDynamic_TRT node]

The other nodes that were previously in the graph and contributed to the NMS output were removed:

[Screenshot: removed NMS subgraph]

Am I going in the right direction, or do I need to do something more?

@pranavm-nvidia
Collaborator

@letdivedeep Yes, that seems right to me. Were you able to try running it?

@letdivedeep
Author

letdivedeep commented Jul 29, 2021

@pranavm-nvidia ... No, I am getting this error:

2021-07-29 20:23:07.824 [INFO ] [TRT] - Model Name: /usr/local/share/ubuntu/models/model_f32_replace_nms.onnx.ls
2021-07-29 20:23:09.969 [INFO ] [TRT] - Begin parsing model...
2021-07-29 20:23:10.024 [ERROR] [TRT] - Could not register plugin creator -  ::FlattenConcat_TRT version 1
2021-07-29 20:23:10.026 [WARN ] [TRT] - onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
2021-07-29 20:23:10.045 [ERROR] [TRT] - INVALID_ARGUMENT: getPluginCreator could not find plugin BatchedNMSDynamic_TRT version 1
2021-07-29 20:23:10.046 [ERROR] [TRT] - Failure while parsing ONNX file
2021-07-29 20:23:10.046 [INFO ] [TRT] - End parsing model...

Using TensorRT version 7.1.3.

@pranavm-nvidia
Collaborator

What does your code look like? Are you calling initLibNvInferPlugins()?
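For reference, with the TensorRT Python API the plugin registry is usually initialized before parsing, roughly like this (a minimal sketch; the model path is a placeholder). I believe trtexec does this registration for you, so this only matters if you are building the engine with your own code:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
# Register the built-in TRT plugins (BatchedNMS_TRT, BatchedNMSDynamic_TRT, ...) with the plugin registry.
trt.init_libnvinfer_plugins(TRT_LOGGER, "")

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("model_gs.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))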

@letdivedeep
Author

@pranavm-nvidia .. No, I am not calling initLibNvInferPlugins().

I have generated a new model with BatchedNMS_TRT, as TensorRT v7.1 does not support BatchedNMSDynamic_TRT.

This is my code:

import onnx_graphsurgeon as gs
import onnx
import numpy as np

# Input and output onnx model path
input_model_path = "model_f32.onnx"
output_model_path = "model_f32_nms.onnx"

# Add BatchedNMS_TRT plugin method
@gs.Graph.register()
def trt_batched_nms(self, boxes_input, scores_input, nms_output,
                    share_location, num_classes):
    # and other attrs
    boxes_input.outputs.clear()
    scores_input.outputs.clear()
    # nms_output.inputs.clear()

    attrs = {
        "share_location": share_location,
        "num_classes": num_classes,
        # etc.
    }
    return self.layer(op="BatchedNMS_TRT", attrs=attrs,
                      inputs=[boxes_input, scores_input],
                      outputs=nms_output)

# Load the onnx graph
graph = gs.import_onnx(onnx.load(input_model_path))

# Set the input shape
graph.inputs[0].shape = [1, 300, 300, 3]
print(graph.inputs[0].shape)

# Convert dtype into int
for inp in graph.inputs:
    inp.dtype = np.int


tmap = graph.tensors()

outArray = ["TFLite_Detection_PostProcess", "TFLite_Detection_PostProcess:1", "TFLite_Detection_PostProcess:2",
            "TFLite_Detection_PostProcess:3"]

# Clear all output nodes inputs
for i in range(len(outArray)):
    nms_out_test = tmap[outArray[i]]
    nms_out_test.inputs.clear()

# Create a list of the outputs
nms_out = []
for i in range(len(outArray)):
    nms_out.append(tmap[outArray[i]])

# Call the method
graph.trt_batched_nms(tmap["Squeeze"], tmap["convert_scores"],
                      nms_out, share_location=True,
                      num_classes=8)

# Clean the graph
graph.cleanup().toposort()

# Export the ONNX graph from graphsurgeon
onnx.checker.check_model(gs.export_onnx(graph))

# Save the model 
onnx.save_model(gs.export_onnx(graph), output_model_path)

print("Saving the ONNX model to {}".format(output_model_path))

model_nms.onnx.zip

@pranavm-nvidia
Collaborator

What does your TRT code look like? i.e. how are you building the engine?

@ttyio added the ONNX (Topic: ONNX Plugin) and triaged (Issue has been triaged by maintainers) labels on Aug 2, 2021
@letdivedeep
Author

letdivedeep commented Aug 3, 2021

@pranavm-nvidia

I am trying to convert the ONNX model to .trt using trtexec on Docker with TensorRT 7.2.3.4:
trtexec --onnx=model.onnx --saveEngine=engine.trt --explicitBatch

Getting the following error:

[08/03/2021-12:46:38] [E] [TRT] /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:704: --- Begin node ---
[08/03/2021-12:46:38] [E] [TRT] /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:705: input: "Relu6__12:0"
input: "const_fold_opt__974"
input: "FeatureExtractor/MobilenetV2/expanded_conv/depthwise/depthwise_bias"
output: "FeatureExtractor/MobilenetV2/expanded_conv/depthwise/Relu6"
name: "FeatureExtractor/MobilenetV2/expanded_conv/depthwise/Relu6"
op_type: "Conv"
attribute {
  name: "strides"
  ints: 1
  ints: 1
  type: INTS
}
attribute {
  name: "dilations"
  ints: 1
  ints: 1
  type: INTS
}
attribute {
  name: "kernel_shape"
  ints: 3
  ints: 3
  type: INTS
}
attribute {
  name: "group"
  i: 32
  type: INT
}
attribute {
  name: "pads"
  ints: 1
  ints: 1
  ints: 1
  ints: 1
  type: INTS
}

[08/03/2021-12:46:38] [E] [TRT] /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:706: --- End node ---
[08/03/2021-12:46:38] [E] [TRT] /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:708: ERROR: /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/builtin_op_importers.cpp:548 In function importConv:
[8] Assertion failed: nbSpatialDims == kernelWeights.shape.nbDims - 2
[08/03/2021-12:46:38] [E] Failed to parse onnx file
[08/03/2021-12:46:38] [E] Parsing model failed
[08/03/2021-12:46:38] [E] Engine creation failed
[08/03/2021-12:46:38] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # trtexec --onnx=3_Aug/polygraphy-nms-static_batch-shape.onnx --saveEngine=rb_v8_engine.trt --explicitBatch

I tried various approaches:

  1. Replacing BatchedNMS_TRT with BatchedNMSDynamic_TRT to deal with dynamic shapes.

  2. Using Polygraphy to freeze the input shapes (make them static).

  3. Removing the reshape ops after the concat op, as suggested in this issue.

But all of this leads to the same error above. @pranavm-nvidia, can you help me figure out what's going wrong?

I have attached all three ONNX models {BatchedNMS_TRT, BatchedNMSDynamic_TRT, without_Squeeze, and polygraphy} on Drive.

@pranavm-nvidia
Collaborator

@letdivedeep Can you check if the exported ONNX model (prior to adding the BatchedNMS node) works with ONNX-Runtime? You can use Polygraphy to try it out:

polygraphy run </path/to/model.onnx> --onnxrt

I'm wondering if it could be a bug in tf2onnx
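If ONNX-Runtime passes, you could also compare ONNX-Runtime and TensorRT outputs on the same model with Polygraphy to narrow things down, something like this (assuming both backends are installed):

polygraphy run </path/to/model.onnx> --trt --onnxrt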

@letdivedeep
Author

polygraphy run </path/to/model.onnx> --onnxrt

Sure, I will try this out.

@letdivedeep
Author

@pranavm-nvidia

It passes the test:

[Screenshot: polygraphy run --onnxrt passing]

Moreover, I converted to ONNX using the TFLite model.

@pranavm-nvidia
Collaborator

And if you run that original ONNX model with TRT, does it still fail in the same place? The error you posted seems to be happening well before the NMS, so I'm guessing the problem is probably not with your node replacement script.

@letdivedeep
Author

letdivedeep commented Aug 3, 2021

@pranavm-nvidia Yes, agreed.

If I try to run the original ONNX model with TRT I get this error:

[08/03/2021-14:04:45] [I] [TRT] No importer registered for op: NonMaxSuppression. Attempting to import as plugin.
[08/03/2021-14:04:45] [I] [TRT] Searching for plugin: NonMaxSuppression, plugin_version: 1, plugin_namespace: 
[08/03/2021-14:04:45] [E] [TRT] INVALID_ARGUMENT: getPluginCreator could not find plugin NonMaxSuppression version 1
[08/03/2021-14:04:45] [E] [TRT] /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:703: While parsing node number 200 [NonMaxSuppression -> "TFLite_Detection_PostProcess_NonMaxSuppression__655:0"]:
[08/03/2021-14:04:45] [E] [TRT] /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:704: --- Begin node ---
[08/03/2021-14:04:45] [E] [TRT] /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:705: input: "Concat__650:0"
input: "Transpose__654:0"
input: "max_boxes_per_class__653"
input: "iou_threshold__651"
input: "score_threshold__652"
output: "TFLite_Detection_PostProcess_NonMaxSuppression__655:0"
name: "TFLite_Detection_PostProcess_NonMaxSuppression__655"
op_type: "NonMaxSuppression"
attribute {
  name: "center_point_box"
  i: 0
  type: INT
}
domain: ""

[08/03/2021-14:04:45] [E] [TRT] /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:706: --- End node ---
[08/03/2021-14:04:45] [E] [TRT] /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:708: ERROR: /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/builtin_op_importers.cpp:4298 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[08/03/2021-14:04:45] [E] Failed to parse onnx file
[08/03/2021-14:04:45] [E] Parsing model failed
[08/03/2021-14:04:45] [E] Engine creation failed
[08/03/2021-14:04:45] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # trtexec --onnx=3_Aug/rb_v8_tflite_model.onnx --saveEngine=rb_v8_engine.trt --explicitBatch

The above error suggests that the NMS op is not supported ... thus I added the BatchedNMS node using the following script,
which is pretty straightforward, as suggested above:

 
import onnx_graphsurgeon as gs
import onnx
import numpy as np

# Input and output onnx model path
input_model_path = "model_f32.onnx"
output_model_path = "model_f32_nms.onnx"

# Add BatchedNMS_TRT plugin method
@gs.Graph.register()
def trt_batched_nms(self, boxes_input, scores_input, nms_output,
                    share_location, num_classes):
    # and other attrs
    boxes_input.outputs.clear()
    scores_input.outputs.clear()
    # nms_output.inputs.clear()

    attrs = {
        "share_location": share_location,
        "num_classes": num_classes,
        # etc.
    }
    return self.layer(op="BatchedNMS_TRT", attrs=attrs,
                      inputs=[boxes_input, scores_input],
                      outputs=nms_output)

# Load the onnx graph
graph = gs.import_onnx(onnx.load(input_model_path))

# Set the input shape
graph.inputs[0].shape = [1, 300, 300, 3]
print(graph.inputs[0].shape)

# Convert dtype into int
for inp in graph.inputs:
    inp.dtype = np.int


tmap = graph.tensors()

outArray = ["TFLite_Detection_PostProcess", "TFLite_Detection_PostProcess:1", "TFLite_Detection_PostProcess:2",
            "TFLite_Detection_PostProcess:3"]

# Clear all output nodes inputs
for i in range(len(outArray)):
    nms_out_test = tmap[outArray[i]]
    nms_out_test.inputs.clear()

# Create a list of the outputs
nms_out = []
for i in range(len(outArray)):
    nms_out.append(tmap[outArray[i]])

# Call the method
graph.trt_batched_nms(tmap["Squeeze"], tmap["convert_scores"],
                      nms_out, share_location=True,
                      num_classes=8)

# Clean the graph
graph.cleanup().toposort()

# Export the ONNX graph from graphsurgeon
onnx.checker.check_model(gs.export_onnx(graph))

# Save the model 
onnx.save_model(gs.export_onnx(graph), output_model_path)

print("Saving the ONNX model to {}".format(output_model_path))


I can't figure out what's going wrong.

@letdivedeep
Author

Hi @pranavm-nvidia, I was able to resolve the above error by commenting out these lines in the above code, which change the dtype:

# Convert dtype into int
# for inp in graph.inputs:
#    inp.dtype = np.int

Logs from polygraphy for this model, where there is a warning: "Unsupported operator BatchedNMS_TRT. No schema registered for this operator."

root@b95c6856a974:/workspace# polygraphy surgeon sanitize 5_Aug/rb_v8_nms_BatchedNMS_TRT_5Aug.onnx     --override-input-shapes normalized_input_image_tensor:[1,300,300,3] 
    -o 5_Aug/polygraphy-nms-static-shape_5_aug_node.onnx
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] Loading model: /workspace/5_Aug/rb_v8_nms_BatchedNMS_TRT_5Aug.onnx
Warning: Unsupported operator BatchedNMS_TRT. No schema registered for this operator.
[I] Original Model:
    Name: tf2onnx | Opset: 11
    
    ---- 1 Graph Input(s) ----
    {normalized_input_image_tensor [dtype=float32, shape=(1, 300, 300, 3)]}
    
    ---- 4 Graph Output(s) ----
    {TFLite_Detection_PostProcess [dtype=float32, shape=(1, 20, 4)],
     TFLite_Detection_PostProcess:1 [dtype=float32, shape=(1, 20)],
     TFLite_Detection_PostProcess:2 [dtype=float32, shape=(1, 20)],
     TFLite_Detection_PostProcess:3 [dtype=float32, shape=(1,)]}
    
    ---- 180 Initializer(s) ----
    
    ---- 184 Node(s) ----
    
[I] Overriding input shapes to:
    {normalized_input_image_tensor [dtype=float32, shape=(1, 300, 300, 3)]}
[I] Saving ONNX model to: 5_Aug/polygraphy-nms-static-shape_5_aug_node.onnx
Warning: Unsupported operator BatchedNMS_TRT. No schema registered for this operator.
[I] New Model:
    Name: tf2onnx | Opset: 11
    
    ---- 1 Graph Input(s) ----
    {normalized_input_image_tensor [dtype=float32, shape=(1, 300, 300, 3)]}
    
    ---- 4 Graph Output(s) ----
    {TFLite_Detection_PostProcess [dtype=float32, shape=()],
     TFLite_Detection_PostProcess:1 [dtype=float32, shape=()],
     TFLite_Detection_PostProcess:2 [dtype=float32, shape=()],
     TFLite_Detection_PostProcess:3 [dtype=float32, shape=()]}
    
    ---- 180 Initializer(s) ----
    
    ---- 184 Node(s) ---- 

And when trying to convert to .trt with the trtexec command I get this error:

[08/05/2021-10:03:16] [V] [TRT] Registering layer: convert_scores for ONNX node: convert_scores
[08/05/2021-10:03:16] [V] [TRT] Registering tensor: convert_scores for ONNX tensor: convert_scores
[08/05/2021-10:03:16] [V] [TRT] convert_scores [Sigmoid] outputs: [convert_scores -> (1, 1917, 9)], 
[08/05/2021-10:03:16] [V] [TRT] Parsing node: onnx_graphsurgeon_node_0 [BatchedNMS_TRT]
[08/05/2021-10:03:16] [V] [TRT] Searching for input: concat
[08/05/2021-10:03:16] [V] [TRT] Searching for input: convert_scores
[08/05/2021-10:03:16] [V] [TRT] onnx_graphsurgeon_node_0 [BatchedNMS_TRT] inputs: [concat -> (1, 1917, 1, 4)], [convert_scores -> (1, 1917, 9)], 
[08/05/2021-10:03:16] [I] [TRT] No importer registered for op: BatchedNMS_TRT. Attempting to import as plugin.
[08/05/2021-10:03:16] [I] [TRT] Searching for plugin: BatchedNMS_TRT, plugin_version: 1, plugin_namespace: 
terminate called after throwing an instance of 'std::out_of_range'
  what():  Attribute not found: shareLocation
Aborted (core dumped)

the detailed logs is available here
logs.txt

I even tried using a lower opset of 11, as suggested in this issue.

Still the same error. @pranavm-nvidia, can you guide me on this?

@pranavm-nvidia
Collaborator

@letdivedeep The plugin accepts various attributes, which you'd need to populate in attrs:

    attrs = {
        "shareLocation": share_location,
        "numClasses": num_classes,
        # etc.
    }

Note: The plugin attributes should be camelCase, not snake_case, which is why you're seeing the error:

  what():  Attribute not found: shareLocation

@letdivedeep
Author

letdivedeep commented Aug 5, 2021

@pranavm-nvidia thanks for the reply.

I even added this set of attributes:

 attrs = {
        "shareLocation": share_location,
        "numClasses": num_classes,
        "backgroundLabelId": 0,
        "topK": 100,
        "keepTopK": 100,
        "scoreThreshold": 0.2,
        "iouThreshold": 0.6,
        "isNormalized": True,
        "clipBoxes": True,
         "scoreBits":16

    }

After that, once I run the trtexec command, I get the following error:

[08/05/2021-15:23:46] [V] [TRT] convert_scores [Sigmoid] inputs: [concat_1 -> (1, 1917, 9)], 
[08/05/2021-15:23:46] [V] [TRT] Registering layer: convert_scores for ONNX node: convert_scores
[08/05/2021-15:23:46] [V] [TRT] Registering tensor: convert_scores for ONNX tensor: convert_scores
[08/05/2021-15:23:46] [V] [TRT] convert_scores [Sigmoid] outputs: [convert_scores -> (1, 1917, 9)], 
[08/05/2021-15:23:46] [V] [TRT] Parsing node: Squeeze [Reshape]
[08/05/2021-15:23:46] [V] [TRT] Searching for input: concat
[08/05/2021-15:23:46] [V] [TRT] Searching for input: const_fold_opt__793
[08/05/2021-15:23:46] [V] [TRT] Squeeze [Reshape] inputs: [concat -> (1, 1917, 1, 4)], [const_fold_opt__793 -> (3)], 
[08/05/2021-15:23:46] [V] [TRT] Registering layer: Squeeze for ONNX node: Squeeze
[08/05/2021-15:23:46] [V] [TRT] Registering tensor: Squeeze for ONNX tensor: Squeeze
[08/05/2021-15:23:46] [V] [TRT] Squeeze [Reshape] outputs: [Squeeze -> (1, 1917, 4)], 
[08/05/2021-15:23:46] [V] [TRT] Parsing node: onnx_graphsurgeon_node_0 [BatchedNMS_TRT]
[08/05/2021-15:23:46] [V] [TRT] Searching for input: Squeeze
[08/05/2021-15:23:46] [V] [TRT] Searching for input: convert_scores
[08/05/2021-15:23:46] [V] [TRT] onnx_graphsurgeon_node_0 [BatchedNMS_TRT] inputs: [Squeeze -> (1, 1917, 4)], [convert_scores -> (1, 1917, 9)], 
[08/05/2021-15:23:46] [I] [TRT] No importer registered for op: BatchedNMS_TRT. Attempting to import as plugin.
[08/05/2021-15:23:46] [I] [TRT] Searching for plugin: BatchedNMS_TRT, plugin_version: 1, plugin_namespace: 
[08/05/2021-15:23:46] [I] [TRT] Successfully created plugin: BatchedNMS_TRT
[08/05/2021-15:23:46] [V] [TRT] Registering layer: onnx_graphsurgeon_node_0 for ONNX node: onnx_graphsurgeon_node_0
[08/05/2021-15:23:46] [F] [TRT] Assertion failed: inputs[0].nbDims == 3
/home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp:105
Aborting...


@pranavm-nvidia
Collaborator

I think you also need to set scoreBits:

...
        "clipBoxes": True,
        "scoreBits": 16,
    }

@letdivedeep
Author

letdivedeep commented Aug 5, 2021

@pranavm-nvidia thanks for the quick reply ...
I did it, and now this error comes up:

[08/05/2021-15:23:46] [V] [TRT] convert_scores [Sigmoid] inputs: [concat_1 -> (1, 1917, 9)], 
[08/05/2021-15:23:46] [V] [TRT] Registering layer: convert_scores for ONNX node: convert_scores
[08/05/2021-15:23:46] [V] [TRT] Registering tensor: convert_scores for ONNX tensor: convert_scores
[08/05/2021-15:23:46] [V] [TRT] convert_scores [Sigmoid] outputs: [convert_scores -> (1, 1917, 9)], 
[08/05/2021-15:23:46] [V] [TRT] Parsing node: Squeeze [Reshape]
[08/05/2021-15:23:46] [V] [TRT] Searching for input: concat
[08/05/2021-15:23:46] [V] [TRT] Searching for input: const_fold_opt__793
[08/05/2021-15:23:46] [V] [TRT] Squeeze [Reshape] inputs: [concat -> (1, 1917, 1, 4)], [const_fold_opt__793 -> (3)], 
[08/05/2021-15:23:46] [V] [TRT] Registering layer: Squeeze for ONNX node: Squeeze
[08/05/2021-15:23:46] [V] [TRT] Registering tensor: Squeeze for ONNX tensor: Squeeze
[08/05/2021-15:23:46] [V] [TRT] Squeeze [Reshape] outputs: [Squeeze -> (1, 1917, 4)], 
[08/05/2021-15:23:46] [V] [TRT] Parsing node: onnx_graphsurgeon_node_0 [BatchedNMS_TRT]
[08/05/2021-15:23:46] [V] [TRT] Searching for input: Squeeze
[08/05/2021-15:23:46] [V] [TRT] Searching for input: convert_scores
[08/05/2021-15:23:46] [V] [TRT] onnx_graphsurgeon_node_0 [BatchedNMS_TRT] inputs: [Squeeze -> (1, 1917, 4)], [convert_scores -> (1, 1917, 9)], 
[08/05/2021-15:23:46] [I] [TRT] No importer registered for op: BatchedNMS_TRT. Attempting to import as plugin.
[08/05/2021-15:23:46] [I] [TRT] Searching for plugin: BatchedNMS_TRT, plugin_version: 1, plugin_namespace: 
[08/05/2021-15:23:46] [I] [TRT] Successfully created plugin: BatchedNMS_TRT
[08/05/2021-15:23:46] [V] [TRT] Registering layer: onnx_graphsurgeon_node_0 for ONNX node: onnx_graphsurgeon_node_0
[08/05/2021-15:23:46] [F] [TRT] Assertion failed: inputs[0].nbDims == 3
/home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp:105
Aborting...

log.txt

@pranavm-nvidia
Collaborator

Can you try using the BatchedNMSDynamic_TRT plugin instead and not squeezing the inputs?
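Roughly, that would mean something like this in your script (a sketch only, using the tensor names from your logs):

# Feed the 4-D boxes tensor (shape [1, 1917, 1, 4]) directly instead of the squeezed one,
# and switch the registered op to the dynamic-shape variant of the plugin, i.e.
#     return self.layer(op="BatchedNMSDynamic_TRT", attrs=attrs, ...)
graph.trt_batched_nms(tmap["concat"], tmap["convert_scores"],
                      nms_out, share_location=True,
                      num_classes=8)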

@letdivedeep
Author

Can you try using the BatchedNMSDynamic_TRT plugin instead and not squeezing the inputs?

Cool, I will try that and let you know.

@letdivedeep
Author

letdivedeep commented Aug 5, 2021

@pranavm-nvidia

A different error, related to output_trt_dtype:

[08/05/2021-15:37:30] [V] [TRT] Registering tensor: concat for ONNX tensor: concat
[08/05/2021-15:37:30] [V] [TRT] concat [Concat] outputs: [concat -> (1, 1917, 1, 4)], 
[08/05/2021-15:37:30] [V] [TRT] Parsing node: convert_scores [Sigmoid]
[08/05/2021-15:37:30] [V] [TRT] Searching for input: concat_1
[08/05/2021-15:37:30] [V] [TRT] convert_scores [Sigmoid] inputs: [concat_1 -> (1, 1917, 9)], 
[08/05/2021-15:37:30] [V] [TRT] Registering layer: convert_scores for ONNX node: convert_scores
[08/05/2021-15:37:30] [V] [TRT] Registering tensor: convert_scores for ONNX tensor: convert_scores
[08/05/2021-15:37:30] [V] [TRT] convert_scores [Sigmoid] outputs: [convert_scores -> (1, 1917, 9)], 
[08/05/2021-15:37:30] [V] [TRT] Parsing node: onnx_graphsurgeon_node_0 [BatchedNMSDynamic_TRT]
[08/05/2021-15:37:30] [V] [TRT] Searching for input: concat
[08/05/2021-15:37:30] [V] [TRT] Searching for input: convert_scores
[08/05/2021-15:37:30] [V] [TRT] onnx_graphsurgeon_node_0 [BatchedNMSDynamic_TRT] inputs: [concat -> (1, 1917, 1, 4)], [convert_scores -> (1, 1917, 9)], 
[08/05/2021-15:37:30] [I] [TRT] No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[08/05/2021-15:37:30] [I] [TRT] Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace: 
[08/05/2021-15:37:30] [I] [TRT] Successfully created plugin: BatchedNMSDynamic_TRT
[08/05/2021-15:37:30] [V] [TRT] Registering layer: onnx_graphsurgeon_node_0 for ONNX node: onnx_graphsurgeon_node_0
[08/05/2021-15:37:30] [V] [TRT] Registering tensor: TFLite_Detection_PostProcess_0 for ONNX tensor: TFLite_Detection_PostProcess
[08/05/2021-15:37:30] [V] [TRT] Registering tensor: TFLite_Detection_PostProcess:1_1 for ONNX tensor: TFLite_Detection_PostProcess:1
[08/05/2021-15:37:30] [V] [TRT] Registering tensor: TFLite_Detection_PostProcess:2_2 for ONNX tensor: TFLite_Detection_PostProcess:2
[08/05/2021-15:37:30] [V] [TRT] Registering tensor: TFLite_Detection_PostProcess:3_3 for ONNX tensor: TFLite_Detection_PostProcess:3
[08/05/2021-15:37:30] [V] [TRT] onnx_graphsurgeon_node_0 [BatchedNMSDynamic_TRT] outputs: [TFLite_Detection_PostProcess -> (1, 1)], [TFLite_Detection_PostProcess:1 -> (1, 100, 4)], [TFLite_Detection_PostProcess:2 -> (1, 100)], [TFLite_Detection_PostProcess:3 -> (1, 100)], 
[08/05/2021-15:37:30] [V] [TRT] Marking TFLite_Detection_PostProcess_0 as output: TFLite_Detection_PostProcess
[08/05/2021-15:37:30] [E] [TRT] /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:708: ERROR: /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:567 In function importModel:
[8] Assertion failed: output_tensor_ptr->getType() != nvinfer1::DataType::kINT32 || output_trt_dtype == nvinfer1::DataType::kINT32
[08/05/2021-15:37:30] [E] Failed to parse onnx file
[08/05/2021-15:37:30] [E] Parsing model failed
[08/05/2021-15:37:30] [E] Engine creation failed
[08/05/2021-15:37:30] [E] Engine set up failed

log.txt

@pranavm-nvidia
Collaborator

You could probably work around this by setting the output data types in your script:

for out in graph.outputs:
    out.dtype = np.float32

@letdivedeep
Author

@pranavm-nvidia

I have added the above part but it gives the same error. I think while converting to .trt it is cast down to INT32.

root@5241314d5653:/workspace# trtexec --onnx=5_Aug/polygraphy_godsu.onnx --saveEngine=Engine.trt  --explicitBatch --verbose > log.txt
[W] [TRT] /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:227: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[E] [TRT] /home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:708: ERROR: /home/jenkins/agent/workspace/OSS/OSS_L0_MergeReque
st/oss/parsers/onnx/ModelImporter.cpp:567 In function importModel:
[8] Assertion failed: output_tensor_ptr->getType() != nvinfer1::DataType::kINT32 || output_trt_dtype == nvinfer1::DataType::kINT32
[E] Failed to parse onnx file
[E] Parsing model failed
[E] Engine creation failed
[E] Engine set up failed

@pranavm-nvidia
Collaborator

Would you be able to share the model?

@letdivedeep
Author

@pranavm-nvidia
Collaborator

pranavm-nvidia commented Aug 5, 2021

Ah I got the error backwards - it expects the first output to be int32, but it's float in the model.

graph.outputs[0].dtype = np.int32

fixes that, but now I'm seeing:

[08/05/2021-09:07:22] [TRT] [F] Assertion failed: in[0].desc.dims.d[2] == numLocClasses
batchedNMSPlugin/batchedNMSPlugin.cpp:413
Aborting...

I think the reason is that the model is only producing boxes for a single class (concat [dtype=float32, shape=(1, 1917, 1, 4)]), but in your script, you're setting numClasses to 8.

@letdivedeep
Author

@pranavm-nvidia .. thanks for the debugging .. I really appreciate your help.

When I changed numClasses to 1, it successfully converted to a .trt file.

But I am still confused about why this is happening. My model has 8 classes.

I even tried with the SSDLite MobileNetV2 model from the TFOD 1.x model zoo.

Generated a TFLite file from it; as seen in the graph, it has 90 COCO classes:

[Screenshot: TFLite graph with 90 COCO classes]

Converted it to ONNX with the tf2onnx converter:
python -m tf2onnx.convert --opset 11 --tflite mv2_float.tflite --output mv2_tflite_model.onnx

And later used the same code to add BatchedNMSDynamic_TRT to the graph with num_classes set to 90; it gives the same error that you got:

[08/05/2021-17:31:20] [V] [TRT] Tactic: 1002 time 0.121252
[08/05/2021-17:31:20] [V] [TRT] Tactic: 0 time 0.011772
[08/05/2021-17:31:20] [V] [TRT] Fastest Tactic: 0 Time: 0.011772
[08/05/2021-17:31:20] [V] [TRT] *************** Autotuning format combination: Float(1,4,4,7668), Float(1,91,174447) -> Int32(1,1), Float(1,4,400), Float(1,100), Float(1,100) ***************
[08/05/2021-17:31:20] [V] [TRT] Formats and tactics selection completed in 102.21 seconds.
[08/05/2021-17:31:20] [V] [TRT] After reformat layers: 109 layers
[08/05/2021-17:31:20] [V] [TRT] Block size 16777216
[08/05/2021-17:31:20] [V] [TRT] Block size 8640000
[08/05/2021-17:31:20] [V] [TRT] Block size 3240448
[08/05/2021-17:31:20] [V] [TRT] Block size 1440256
[08/05/2021-17:31:20] [V] [TRT] Block size 394240
[08/05/2021-17:31:20] [V] [TRT] Block size 218624
[08/05/2021-17:31:20] [V] [TRT] Block size 51200
[08/05/2021-17:31:20] [V] [TRT] Block size 19968
[08/05/2021-17:31:20] [V] [TRT] Block size 17408
[08/05/2021-17:31:20] [V] [TRT] Block size 9728
[08/05/2021-17:31:20] [V] [TRT] Block size 9216
[08/05/2021-17:31:20] [V] [TRT] Block size 2560
[08/05/2021-17:31:20] [V] [TRT] Block size 2560
[08/05/2021-17:31:20] [V] [TRT] Block size 1024
[08/05/2021-17:31:20] [V] [TRT] Block size 512
[08/05/2021-17:31:20] [V] [TRT] Total Activation Memory: 30824960
[08/05/2021-17:31:20] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[08/05/2021-17:31:20] [F] [TRT] Assertion failed: in[0].desc.dims.d[2] == numLocClasses
/home/jenkins/agent/workspace/OSS/OSS_L0_MergeRequest/oss/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp:321
Aborting...

Ideally, in TFLite it was the NMS layer that handled this part; I don't know if I am wrong on this.

But still, I will try to run the file and let you know, @pranavm-nvidia.

Once again, @pranavm-nvidia, thanks for your help.

@pranavm-nvidia
Collaborator

I think you'd have to set numClasses to 91 in that case (90 + background).
Not sure why your model with 8 classes ends up only having one in the ONNX model though; maybe an exporter bug?

@letdivedeep
Author


Maybe ... let me try to run inference over a few images and check what the output is.

@letdivedeep
Author

Hi @pranavm-nvidia

What I found was that the output we get from the model is not actual bbox coordinates; it is just the per-anchor predictions, which are 1917 in count.

Ideally, the TensorFlow NMS operator handles this part internally, while in ONNX, I think the BatchedNMS plugin needs the bbox coordinates as input.

This is what is causing the issue.
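For illustration, the kind of decode step that would have to happen before the plugin might look roughly like this. This is only a sketch; I am assuming the TF Object Detection API's default box coder, with anchors given as (ycenter, xcenter, height, width), raw predictions as (ty, tx, th, tw), and scale factors 10, 10, 5, 5:

import numpy as np

def decode_boxes(raw_boxes, anchors, scales=(10.0, 10.0, 5.0, 5.0)):
    # raw_boxes, anchors: arrays of shape [num_anchors, 4]
    ty, tx, th, tw = [raw_boxes[..., i] / scales[i] for i in range(4)]
    ya, xa, ha, wa = [anchors[..., i] for i in range(4)]

    ycenter = ty * ha + ya
    xcenter = tx * wa + xa
    h = np.exp(th) * ha
    w = np.exp(tw) * wa

    # Return corner coordinates (ymin, xmin, ymax, xmax) for the NMS plugin.
    return np.stack([ycenter - h / 2.0, xcenter - w / 2.0,
                     ycenter + h / 2.0, xcenter + w / 2.0], axis=-1)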

@letdivedeep
Author

Hi @pranavm-nvidia

I have raised a support ticket on the NVIDIA forum: https://forums.developer.nvidia.com/t/adding-batchednmsdynamic-trt-plugin-in-the-ssd-mobilenet-onnx-model/185874

But have received no reply yet.

@pranavm-nvidia
Collaborator

@letdivedeep Maybe you could try with TRT 8, since it's available in JetPack now? https://developer.nvidia.com/embedded/jetpack

As I mentioned before, TRT 8 should be able to import the ONNX NMS node, so you won't need to replace it with the BatchedNMS plugin.
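(If it helps, you can check which TensorRT version your JetPack image ships with something like: python3 -c "import tensorrt; print(tensorrt.__version__)".)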

@letdivedeep
Author

@pranavm-nvidia thanks for the update, that makes sense.
