
How to use NMS with Pytorch model (that was converted to ONNX -> TensorRT) #795

Closed
ivanpanshin opened this issue Sep 23, 2020 · 83 comments
Labels: good-reference, Plugins, triaged (Issue has been triaged by maintainers)

Comments

@ivanpanshin

All right, so I have a PyTorch SSD detector with a MobileNet backbone. Since I failed to convert the model with NMS included (to be more precise, I converted it, but the TRT engine is built incorrectly from that .onnx file), I decided to leave the NMS part to TRT.

In general, there are several ways to add NMS in TRT:

  1. Use graphsurgeon with TensorFlow model and add NMS as graphsurgeon.create_plugin_node
  2. Use CPP code for plugin (https://github.com/NVIDIA/TensorRT/tree/master/plugin/batchedNMSPlugin)
  3. Use DeepStream that has NMS plugin

But I have a PyTorch model that I converted to ONNX and then to TRT without any CPP code (Python only). My question is very simple: how can I combine my current pipeline with the CPP plugin for NMS?

@pranavm-nvidia
Collaborator

You can use ONNX-GraphSurgeon to modify the ONNX model to include a plugin node. You can look at the onnx_packnet example for details.
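
In outline, that workflow is just load, edit, save. A minimal ONNX-GS skeleton (file names here are placeholders; the plugin-specific surgery goes where the comment indicates) looks roughly like this:

import onnx
import onnx_graphsurgeon as gs

# Load the ONNX model into a GraphSurgeon graph.
graph = gs.import_onnx(onnx.load("model.onnx"))

# ... modify the graph here, e.g. insert a node whose op is set to the TRT plugin name ...

# Drop nodes that became dangling, re-sort, and save the result.
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_with_plugin.onnx")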

@ivanpanshin
Author

Wow, ONNX-GS, nice! And just to be 100% sure: it's possible to use ONNX-GS in the same way as the regular GS to use this CPP NMS plugin (https://github.com/NVIDIA/TensorRT/tree/master/plugin/batchedNMSPlugin), correct?

@pranavm-nvidia
Collaborator

pranavm-nvidia commented Sep 24, 2020

Yes, correct. You can replace the old NMS subgraph/node with a new node whose op is set to the plugin name. There are two ways to do this, which I've outlined below.

Old-GS Style

In the simplest case, you can create the node and insert it into the graph like old GS, e.g.:

# Find tensors
tmap = graph.tensors()
boxes, scores, nms_out = tmap["boxes"], tmap["scores"], tmap["nms_out"]

# Disconnect old subgraph
boxes.outputs.clear()
scores.outputs.clear()
nms_out.inputs.clear()

attrs = {
    "share_location": False,
    "num_classes": 8,
    # etc.
}
node = gs.Node(op="BatchedNMS_TRT", attrs=attrs, 
               inputs=[boxes, scores], outputs=[nms_out])
graph.nodes.append(node)

Layer + Register API

If you're reusing this across multiple models, or using the plugin multiple times in the model, then the layer/register API might simplify things.

First you'd register a function to insert the plugin:

@gs.Graph.register()
def trt_batched_nms(self, boxes_input, scores_input, nms_output, 
                    share_location, num_classes): # and other attrs
    boxes_input.outputs.clear()
    scores_input.outputs.clear()
    nms_output.inputs.clear()

    attrs = {
        "share_location": share_location,
        "num_classes": num_classes,
        # etc.
    }
    return self.layer(op="BatchedNMS_TRT", attrs=attrs, 
                      inputs=[boxes_input, scores_input], 
                      outputs=[nms_output])

And then in your models, you can use it without all the boilerplate:

tmap = graph.tensors()

# Can also get attributes from the original graph instead of hard-coding
graph.trt_batched_nms(tmap["boxes"], tmap["scores"], 
                      tmap["nms_out"], share_location=True, 
                      num_classes=81)

graph.trt_batched_nms(tmap["boxes2"], tmap["scores2"], 
                      tmap["nms_out2"], share_location=False, 
                      num_classes=4)

@qraleq

qraleq commented Nov 4, 2020

Hi, @pranavm-nvidia, I tried using the ONNX-GS method as you proposed and I've successfully replaced the NMS operator with BatchedNMS_TRT using the following code:

import onnx_graphsurgeon as gs
import onnx
import numpy as np

input_model_path = "model.onnx"
output_model_path = "model_gs.onnx"

@gs.Graph.register()
def trt_batched_nms(self, boxes_input, scores_input, nms_output,
                    share_location, num_classes):

    boxes_input.outputs.clear()
    scores_input.outputs.clear()
    nms_output.inputs.clear()

    attrs = {
        "shareLocation": share_location,
        "numClasses": num_classes,
        "backgroundLabelId": 0,
        "topK": 116740,
        "keepTopK": 100,
        "scoreThreshold": 0.3,
        "iouThreshold": 0.6,
        "isNormalized": True,
        "clipBoxes": True
    }
    return self.layer(op="BatchedNMS_TRT", attrs=attrs,
                      inputs=[boxes_input, scores_input],
                      outputs=[nms_output])


graph = gs.import_onnx(onnx.load(input_model_path))
graph.inputs[0].shape=[1,1280,720,3]
print(graph.inputs[0].shape)

for inp in graph.inputs:
    inp.dtype = np.int

input = graph.inputs[0]

tmap = graph.tensors()

graph.trt_batched_nms(tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_2/NonMaxSuppressionV5__1712:0"],
                      tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores/NonMaxSuppressionV5__1761:0"],
                      tmap["NonMaxSuppression__1763:0"],
                      share_location=False,
                      num_classes=4)

graph.trt_batched_nms(tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_2/NonMaxSuppressionV5__1712:0"],
                      tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_1/NonMaxSuppressionV5__1737:0"],
                      tmap["NonMaxSuppression__1739:0"],
                      share_location=False,
                      num_classes=4)

graph.trt_batched_nms(tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_2/NonMaxSuppressionV5__1712:0"],
                      tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_2/NonMaxSuppressionV5__1713:0"],
                      tmap["NonMaxSuppression__1715:0"],
                      share_location=False,
                      num_classes=4)

graph.trt_batched_nms(tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_2/NonMaxSuppressionV5__1712:0"],
                      tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_3/NonMaxSuppressionV5__1689:0"],
                      tmap["NonMaxSuppression__1691:0"],
                      share_location=False,
                      num_classes=4)


# Remove unused nodes, and topologically sort the graph.
# graph.cleanup()
# graph.toposort()
# graph.fold_constants().cleanup()

# Export the ONNX graph from graphsurgeon
onnx.checker.check_model(gs.export_onnx(graph))
onnx.save_model(gs.export_onnx(graph), output_model_path)

print("Saving the ONNX model to {}".format(output_model_path))

The problem I'm currently facing is that the onnx.checker.check_model(gs.export_onnx(graph)) call returns the following error:

onnx.onnx_cpp2py_export.checker.ValidationError: 
Node (NonMaxSuppression__1763) has output size 0 not in range [min=1, max=1].

==> Context: Bad node spec: 
input: "const_fold_opt__2119" 
input: "Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_3/iou_threshold:0" 
input:"Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_2/score_threshold:0" 
name: "NonMaxSuppression__1763" op_type: "NonMaxSuppression"

I think that the problem is in builtin_op_importers.cpp, where I added this code similar to #523:

DEFINE_BUILTIN_OP_IMPORTER(NonMaxSuppression)
{
    // NonMaxSuppression is not supported for opsets below 10.
    ASSERT(ctx->getOpsetVersion() >= 10, ErrorCode::kUNSUPPORTED_NODE);

    nvinfer1::ITensor* boxes_tensor = &convertToTensor(inputs.at(0), ctx);
    nvinfer1::ITensor* scores_tensor = &convertToTensor(inputs.at(1), ctx);
    const int numInputs = inputs.size();
    LOG_ERROR("no of inputs are "<<numInputs);
    LOG_ERROR("node outsize and op type are "<<node.output().size()<< " type " << node.op_type());

    const auto scores_dims = scores_tensor->getDimensions();
    const auto boxes_dims = boxes_tensor->getDimensions();
    
    LOG_ERROR("boxes dims "<< boxes_dims.nbDims << " dim3 has size "<<boxes_dims.d[2]);
    const std::string pluginName = "BatchedNMS_TRT";
    const std::string pluginVersion = "1";
    std::vector<nvinfer1::PluginField> f;

    /*
    bool share_location = true;
    const bool is_normalized = true;
    const bool clip_boxes = true;
    int backgroundLabelId = 0;
    
    // Initialize.
    f.emplace_back("shareLocation", &share_location, nvinfer1::PluginFieldType::kINT8, 1);
    f.emplace_back("isNormalized", &is_normalized, nvinfer1::PluginFieldType::kINT8, 1);
    f.emplace_back("clipBoxes", &clip_boxes, nvinfer1::PluginFieldType::kINT8, 1);
    f.emplace_back("backgroundLabelId", &backgroundLabelId, nvinfer1::PluginFieldType::kINT32, 1);
    */
    // Create plugin from registry
    //nvinfer1::IPluginV2* plugin = importPluginFromRegistry(ctx, pluginName, pluginVersion, node.name$
    nvinfer1::IPluginV2* plugin = createPlugin(node.name(), importPluginCreator(pluginName, pluginVers$

    ASSERT(plugin != nullptr && "NonMaxSuppression plugin was not found in the plugin registry!",
        ErrorCode::kUNSUPPORTED_NODE);

    std::vector<nvinfer1::ITensor*> nms_inputs ={boxes_tensor, scores_tensor};
    RETURN_FIRST_OUTPUT(ctx->network()->addPluginV2(nms_inputs.data(), nms_inputs.size(), *plugin));
}

Do you have an example of properly registering BatchedNMS_TRT as a plugin?
Also, do you have an end-to-end example of replacing BatchedNMS with a TRT version?

@pranavm-nvidia
Collaborator

@qraleq You should not need to modify the parser at all in TRT 7.1 and later (for any plugin, not just NMS). Modifying the node in the ONNX model should be enough.

Since BatchedNMS_TRT is not technically a valid ONNX op, it's expected that ONNX checker will fail.
From the error message though, it seems like you might have missed one?

name: "NonMaxSuppression__1763" op_type: "NonMaxSuppression"

Do you have an example of properly registering BatchedNMS_TRT as a plugin?

BatchedNMS_TRT is shipped as a TRT plugin, so you do not need any extra steps to register it.
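
If you build the engine from the Python API instead of trtexec, one way to confirm the shipped plugins are visible is to initialize the plugin library and list the registered creators; a small sketch (not from this thread):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
# Loads the standard TensorRT plugins, including BatchedNMS_TRT, into the registry.
trt.init_libnvinfer_plugins(TRT_LOGGER, "")

creators = trt.get_plugin_registry().plugin_creator_list
print([c.name for c in creators])  # "BatchedNMS_TRT" should appear in this list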

Also, do you have an end-to-end example of replacing BatchedNMS with a TRT version?

I don't know of an end-to-end example that uses NMS specifically, but we do have an example of GroupNorm (onnx-packnet).

However, I think the code you have right now should be nearly working if you uncomment these 2 lines:

# graph.cleanup()
# graph.toposort()

You can try the resulting ONNX model with trtexec to check, and it should automatically pick up the plugin.
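
If you'd rather check from Python than with trtexec, a parse-only sketch along these lines (the model path is a placeholder) will report whether the parser can resolve the plugin node:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)
trt.init_libnvinfer_plugins(TRT_LOGGER, "")  # make the shipped plugins visible to the parser

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)

with open("model_gs.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))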

@qraleq

qraleq commented Nov 4, 2020

@pranavm-nvidia Thanks for the fast response. I commented out the changes to the parser and rebuilt onnx-tensorrt. Now when I uncomment the two lines as you proposed, I get the following error when running with trtexec:

[11/04/2020-21:04:04] [W] [TRT] /home/f/Development/onnx-tensorrt/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[11/04/2020-21:04:04] [W] [TRT] /home/f/Development/onnx-tensorrt/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/04/2020-21:04:04] [W] [TRT] /home/f/Development/onnx-tensorrt/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/04/2020-21:04:04] [W] [TRT] /home/f/Development/onnx-tensorrt/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[11/04/2020-21:04:04] [I] [TRT] /home/f/Development/onnx-tensorrt/ModelImporter.cpp:139: No importer registered for op: BatchedNMS_TRT. Attempting to import as plugin.
[11/04/2020-21:04:04] [I] [TRT] /home/f/Development/onnx-tensorrt/builtin_op_importers.cpp:3775: Searching for plugin: BatchedNMS_TRT, plugin_version: 1, plugin_namespace: 
[11/04/2020-21:04:04] [I] [TRT] /home/f/Development/onnx-tensorrt/builtin_op_importers.cpp:3792: Successfully created plugin: BatchedNMS_TRT
[11/04/2020-21:04:04] [E] [TRT] onnx_graphsurgeon_node_0: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.
[11/04/2020-21:04:04] [I] [TRT] /home/f/Development/onnx-tensorrt/ModelImporter.cpp:139: No importer registered for op: BatchedNMS_TRT. Attempting to import as plugin.
[11/04/2020-21:04:04] [I] [TRT] /home/f/Development/onnx-tensorrt/builtin_op_importers.cpp:3775: Searching for plugin: BatchedNMS_TRT, plugin_version: 1, plugin_namespace: 
[11/04/2020-21:04:04] [I] [TRT] /home/f/Development/onnx-tensorrt/builtin_op_importers.cpp:3792: Successfully created plugin: BatchedNMS_TRT
[11/04/2020-21:04:04] [E] [TRT] onnx_graphsurgeon_node_0: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.
[11/04/2020-21:04:04] [I] [TRT] /home/f/Development/onnx-tensorrt/ModelImporter.cpp:139: No importer registered for op: BatchedNMS_TRT. Attempting to import as plugin.
[11/04/2020-21:04:04] [I] [TRT] /home/f/Development/onnx-tensorrt/builtin_op_importers.cpp:3775: Searching for plugin: BatchedNMS_TRT, plugin_version: 1, plugin_namespace: 
[11/04/2020-21:04:04] [I] [TRT] /home/f/Development/onnx-tensorrt/builtin_op_importers.cpp:3792: Successfully created plugin: BatchedNMS_TRT
[11/04/2020-21:04:04] [E] [TRT] onnx_graphsurgeon_node_0: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.
[11/04/2020-21:04:04] [E] [TRT] onnx_graphsurgeon_node_0: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.
[11/04/2020-21:04:04] [E] [TRT] onnx_graphsurgeon_node_0: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.
[11/04/2020-21:04:04] [E] [TRT] onnx_graphsurgeon_node_0: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.
While parsing node number 351 [Slice]:
ERROR: /home/f/Development/onnx-tensorrt/builtin_op_importers.cpp:3154 In function importSlice:
[4] Assertion failed: -r <= axis && axis < r
[11/04/2020-21:04:04] [E] Failed to parse onnx file
[11/04/2020-21:04:04] [E] Parsing model failed
[11/04/2020-21:04:04] [E] Engine creation failed
[11/04/2020-21:04:04] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # /usr/src/tensorrt/bin/trtexec --onnx=model_gs.onnx

Do you have an idea what might be wrong?

@pranavm-nvidia
Collaborator

@qraleq Looks like your model is using dynamic shapes somewhere:

[11/04/2020-21:04:04] [E] [TRT] onnx_graphsurgeon_node_0: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.

Seems like the model input dimensions are fixed though, so it must be due to some intermediate layer whose dimensions can't be resolved at build time.
Can you try using the BatchedNMSDynamic_TRT plugin instead?

The good news is that it does seem to be finding the plugin correctly at least:

[11/04/2020-21:04:04] [I] [TRT] /home/f/Development/onnx-tensorrt/builtin_op_importers.cpp:3792: Successfully created plugin: BatchedNMS_TRT

Also I think you'll probably need to populate the other parameters expected by the plugin (see here) or you might run into issues during inference.
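
For reference, the attribute set used for this plugin elsewhere in this thread looks like the dict below; the values here are purely illustrative, so set them to match your model:

attrs = {
    "shareLocation": True,
    "backgroundLabelId": -1,
    "numClasses": 91,        # illustrative; use your model's class count
    "topK": 4096,
    "keepTopK": 100,
    "scoreThreshold": 0.3,
    "iouThreshold": 0.6,
    "isNormalized": True,
    "clipBoxes": True,
}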

@qraleq

qraleq commented Nov 4, 2020

@pranavm-nvidia Yeah, I also figured that something "dynamic" is still left over in the graph, but I can't figure out what.

I tried to use BatchedNMSDynamic_TRT, but I get an error saying:

[11/04/2020-21:32:10] [I] [TRT] /home/f/Development/onnx-tensorrt/ModelImporter.cpp:139: No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[11/04/2020-21:32:10] [I] [TRT] /home/f/Development/onnx-tensorrt/builtin_op_importers.cpp:3775: Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace: 
[11/04/2020-21:32:10] [E] [TRT] INVALID_ARGUMENT: getPluginCreator could not find plugin BatchedNMSDynamic_TRT version 1
While parsing node number 834 [BatchedNMSDynamic_TRT]:
ERROR: /home/f/Development/onnx-tensorrt/builtin_op_importers.cpp:3777 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"

I'm using TensorRT 7.1.3.0-1+cuda10.2 on Jetson AGX. How can I make sure that dynamic NMS gets recognized by the parser?

@pranavm-nvidia
Collaborator

@qraleq The dynamic version was added in 7.2, so that would explain why it doesn't recognize it.

Would you be able to share the verbose trtexec logs (add --verbose) and/or your ONNX model?

@qraleq

qraleq commented Nov 4, 2020

@pranavm-nvidia Please find the log attached.
log.txt

Can I send you the model directly somehow?

@pranavm-nvidia
Collaborator

@qraleq You can email me the model/link to the model at pranavm@nvidia.com

Seems like it's this Resize that's introducing dynamic shapes:

ModelImporter.cpp:107: Parsing node: Resize__1007 [Resize]
ModelImporter.cpp:129: Resize__1007 [Resize] inputs: [Transpose__1002:0 -> (1, 3, 1280, 720)], [roi__998 -> ()], [roi__998 -> ()], [Concat__1006:0 -> (4)], 
ModelImporter.cpp:183: Resize__1007 [Resize] outputs: [Resize__1007:0 -> (-1, -1, -1, -1)], 

Note that the output of the previous layer has a fixed shape: Transpose__1002:0 -> (1, 3, 1280, 720)

Is it possible to express the resize using a constant sizes or scales input instead of rois? That way TRT would be able to compute the output shapes at build time.
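
A possible ONNX-GS fix for that, sketched under the assumption that the Resize target size is known ahead of time (the size used below is a placeholder, not taken from this model), is to swap the runtime-computed sizes input for a constant:

import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.onnx"))  # placeholder path

# Find the Resize node reported in the log above.
resize = [node for node in graph.nodes if node.name == "Resize__1007"][0]

# ONNX Resize inputs are (X, roi, scales, sizes); replace the runtime-computed
# sizes input with a constant so TRT can infer the output shape at build time.
target_size = np.array([1, 3, 320, 320], dtype=np.int64)  # placeholder target size
resize.inputs[3] = gs.Constant("resize_sizes", values=target_size)

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_static_resize.onnx")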

@mk-nvidia

@qraleq Please look at @pranavm-nvidia's comment above to see if it fixes your problem.

@mk-nvidia added the Plugins and triaged (Issue has been triaged by maintainers) labels on Dec 8, 2020
@pranavm-nvidia
Collaborator

We discussed this offline; I believe @qraleq's issue is resolved now.

@qraleq

qraleq commented Dec 9, 2020

@mk-nvidia Thank you for following up. @pranavm-nvidia was of great help and we resolved the issue! Best regards!

@xonobo

xonobo commented Dec 10, 2020

@pranavm-nvidia or anyone, can you help me figure out the problem? I appended a BatchedNMS plugin to the end of an ONNX file and can see the result in Netron, but I get a segmentation fault in trtexec. Here is the trtexec output:

/opt/TensorRT/targets/x86_64-linux-gnu/bin/trtexec --onnx=best.opt_nms.onnx --explicitBatch 
&&&& RUNNING TensorRT.trtexec # /opt/TensorRT/targets/x86_64-linux-gnu/bin/trtexec --onnx=best.opt_nms.onnx --explicitBatch
[12/10/2020-13:21:36] [I] === Model Options ===
[12/10/2020-13:21:36] [I] Format: ONNX
[12/10/2020-13:21:36] [I] Model: best.opt_nms.onnx
[12/10/2020-13:21:36] [I] Output:
[12/10/2020-13:21:36] [I] === Build Options ===
[12/10/2020-13:21:36] [I] Max batch: explicit
[12/10/2020-13:21:36] [I] Workspace: 16 MiB
[12/10/2020-13:21:36] [I] minTiming: 1
[12/10/2020-13:21:36] [I] avgTiming: 8
[12/10/2020-13:21:36] [I] Precision: FP32
[12/10/2020-13:21:36] [I] Calibration: 
[12/10/2020-13:21:36] [I] Refit: Disabled
[12/10/2020-13:21:36] [I] Safe mode: Disabled
[12/10/2020-13:21:36] [I] Save engine: 
[12/10/2020-13:21:36] [I] Load engine: 
[12/10/2020-13:21:36] [I] Builder Cache: Enabled
[12/10/2020-13:21:36] [I] NVTX verbosity: 0
[12/10/2020-13:21:36] [I] Tactic sources: Using default tactic sources
[12/10/2020-13:21:36] [I] Input(s)s format: fp32:CHW
[12/10/2020-13:21:36] [I] Output(s)s format: fp32:CHW
[12/10/2020-13:21:36] [I] Input build shapes: model
[12/10/2020-13:21:36] [I] Input calibration shapes: model
[12/10/2020-13:21:36] [I] === System Options ===
[12/10/2020-13:21:36] [I] Device: 0
[12/10/2020-13:21:36] [I] DLACore: 
[12/10/2020-13:21:36] [I] Plugins:
[12/10/2020-13:21:36] [I] === Inference Options ===
[12/10/2020-13:21:36] [I] Batch: Explicit
[12/10/2020-13:21:36] [I] Input inference shapes: model
[12/10/2020-13:21:36] [I] Iterations: 10
[12/10/2020-13:21:36] [I] Duration: 3s (+ 200ms warm up)
[12/10/2020-13:21:36] [I] Sleep time: 0ms
[12/10/2020-13:21:36] [I] Streams: 1
[12/10/2020-13:21:36] [I] ExposeDMA: Disabled
[12/10/2020-13:21:36] [I] Data transfers: Enabled
[12/10/2020-13:21:36] [I] Spin-wait: Disabled
[12/10/2020-13:21:36] [I] Multithreading: Disabled
[12/10/2020-13:21:36] [I] CUDA Graph: Disabled
[12/10/2020-13:21:36] [I] Separate profiling: Disabled
[12/10/2020-13:21:36] [I] Skip inference: Disabled
[12/10/2020-13:21:36] [I] Inputs:
[12/10/2020-13:21:36] [I] === Reporting Options ===
[12/10/2020-13:21:36] [I] Verbose: Disabled
[12/10/2020-13:21:36] [I] Averages: 10 inferences
[12/10/2020-13:21:36] [I] Percentile: 99
[12/10/2020-13:21:36] [I] Dump refittable layers:Disabled
[12/10/2020-13:21:36] [I] Dump output: Disabled
[12/10/2020-13:21:36] [I] Profile: Disabled
[12/10/2020-13:21:36] [I] Export timing to JSON file: 
[12/10/2020-13:21:36] [I] Export output to JSON file: 
[12/10/2020-13:21:36] [I] Export profile to JSON file: 
[12/10/2020-13:21:36] [I] 
[12/10/2020-13:21:36] [I] === Device Information ===
[12/10/2020-13:21:36] [I] Selected Device: Quadro M1200
[12/10/2020-13:21:36] [I] Compute Capability: 5.0
[12/10/2020-13:21:36] [I] SMs: 5
[12/10/2020-13:21:36] [I] Compute Clock Rate: 1.148 GHz
[12/10/2020-13:21:36] [I] Device Global Memory: 4046 MiB
[12/10/2020-13:21:36] [I] Shared Memory per SM: 64 KiB
[12/10/2020-13:21:36] [I] Memory Bus Width: 128 bits (ECC disabled)
[12/10/2020-13:21:36] [I] Memory Clock Rate: 2.505 GHz
[12/10/2020-13:21:36] [I] 
----------------------------------------------------------------
Input filename:   best.opt_nms.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    
Producer version: 
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[12/10/2020-13:21:36] [W] [TRT] /home/bozkalayci/github/TensorRT/parsers/onnx/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/10/2020-13:21:36] [W] [TRT] /home/bozkalayci/github/TensorRT/parsers/onnx/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[12/10/2020-13:21:36] [W] [TRT] /home/bozkalayci/github/TensorRT/parsers/onnx/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[12/10/2020-13:21:36] [W] [TRT] /home/bozkalayci/github/TensorRT/parsers/onnx/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[12/10/2020-13:21:36] [W] [TRT] /home/bozkalayci/github/TensorRT/parsers/onnx/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[12/10/2020-13:21:36] [W] [TRT] /home/bozkalayci/github/TensorRT/parsers/onnx/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[12/10/2020-13:21:36] [W] [TRT] /home/bozkalayci/github/TensorRT/parsers/onnx/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[12/10/2020-13:21:36] [I] [TRT] /home/bozkalayci/github/TensorRT/parsers/onnx/ModelImporter.cpp:139: No importer registered for op: BatchedNMS_TRT. Attempting to import as plugin.
[12/10/2020-13:21:36] [I] [TRT] /home/bozkalayci/github/TensorRT/parsers/onnx/builtin_op_importers.cpp:3775: Searching for plugin: BatchedNMS_TRT, plugin_version: 1, plugin_namespace: 
[12/10/2020-13:21:36] [I] [TRT] /home/bozkalayci/github/TensorRT/parsers/onnx/builtin_op_importers.cpp:3792: Successfully created plugin: BatchedNMS_TRT
[12/10/2020-13:21:47] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[12/10/2020-13:23:10] [I] [TRT] Detected 1 inputs and 4 output network tensors.
[12/10/2020-13:23:10] [I] Engine built in 94.3465 sec.
[12/10/2020-13:23:10] [I] Starting inference
Segmentation fault (core dumped)

Here are the code snippets to add the NMS plugin:

import onnx
import onnx_graphsurgeon as gs
import numpy as np

def append_nms(graph, num_classes, scoreThreshold, iouThreshold, keepTopK):
    out_tensors = graph.outputs
    bs = out_tensors[0].shape[0]
    num_priors = out_tensors[0].shape[1]

    nms_attrs = {'shareLocation': True,
                 'backgroundLabelId': -1,
                 'numClasses': num_classes,
                 'topK': num_priors,
                 'keepTopK': keepTopK,
                 'scoreThreshold': scoreThreshold,
                 'iouThreshold': iouThreshold,
                 'isNormalized': True,
                 'clipBoxes': True}

    num_detections = gs.Variable(name="num_detections", dtype=np.int32, shape=(bs, 1))
    boxes =  gs.Variable(name="boxes", dtype=np.float32, shape=(bs, keepTopK, 4))
    scores = gs.Variable(name="scores", dtype=np.float32, shape=(bs, keepTopK))
    classes = gs.Variable(name="classes", dtype=np.float32, shape=(bs, keepTopK))

    nms = gs.Node(op="BatchedNMS_TRT", attrs=nms_attrs, inputs=out_tensors, outputs=[num_detections, boxes, scores, classes])
    graph.nodes.append(nms)
    graph.outputs = [num_detections, boxes, scores, classes]

    return graph


def add_nms_to_onnx(model_file, num_classes, confidenceThreshold=0.3, nmsThreshold=0.6, keepTopK=100, opset=11):
    graph = gs.import_onnx(onnx.load(model_file))

    if opset == 11:
        graph = process_pad_nodes(graph)

    graph = append_nms(graph, num_classes, confidenceThreshold, nmsThreshold, keepTopK)

    # Remove unused nodes, and topologically sort the graph.
    graph.cleanup().toposort().fold_constants().cleanup()

    # Export the onnx graph from graphsurgeon
    out_name = model_file[:-5]+'_nms.onnx'
    onnx.save_model(gs.export_onnx(graph), out_name)

    print("Saving the ONNX model to {}".format(out_name))

add_nms_to_onnx(model_file, 10, confidenceThreshold=0.3, nmsThreshold=0.6, keepTopK=100, opset=11)

The output shapes of the ONNX file before appending the NMS are as follows:

[screenshot: ONNX output shapes]

@pranavm-nvidia
Collaborator

@xonobo Can you try getting a backtrace with gdb? Verbose logs would also help (add --verbose)

@xonobo

xonobo commented Dec 11, 2020

I used gdb with trtexec_debug but did not get much info:


[12/11/2020-10:17:31] [V] [TRT] Layer(PluginV2): (Unnamed Layer* 269) [PluginV2Ext], Tactic: 0, 935[Float(5733,1,4)], 936[Float(5733,10)] -> num_detections[Int32()], boxes[Float(100,4)], scores[Float(100)], classes[Float(100)]
[12/11/2020-10:17:31] [I] Engine built in 100.447 sec.
[12/11/2020-10:17:31] [V] [TRT] Allocated persistent device memory of size 602184192
[12/11/2020-10:17:31] [V] [TRT] Allocated activation device memory of size 31899648
[12/11/2020-10:17:31] [V] [TRT] Assigning persistent memory blocks for various profiles
[12/11/2020-10:17:31] [I] Starting inference
[New Thread 0x7fffd1b05700 (LWP 9132)]

Thread 6 "trtexec_debug" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd1b05700 (LWP 9132)]
0x0000000000000001 in ?? ()
(gdb) 

The verbose output states num_detections[Int32()].
Is this OK, or should it be num_detections[Int32(1)]?

@jonakola

@pranavm-nvidia I am having the same issue as @qraleq with regard to dynamic shapes.
Specifically, in my case I am using SSD with MobileNetV2 trained in TensorFlow, and TensorRT 7.1. I was able to configure the mapping between the NMS_TRT plugin and NonMaxSuppression by editing the cpp code, but when I now try to convert the ONNX file to TRT, I get the following error:

(Unnamed Layer* 1073) [PluginV2Ext]: PluginV2Layer must be V2DynamicExt when there are runtime input dimensions.

Could you please share your solution?

@qraleq

qraleq commented Dec 11, 2020

@jonakola You should use the BatchedNMSDynamic_TRT plugin if you have dynamic shapes. This plugin is supported in TensorRT 7.2 and later.

Since I'm using NVIDIA Jetsons, which do not support this plugin at the moment, I ended up freezing the TensorFlow model to a fixed shape, and then everything worked as expected.
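
If re-exporting the TF model is not convenient, pinning the ONNX input shape with ONNX-GS (as in the earlier script) is sometimes worth trying, though it will not remove dynamic dimensions introduced by intermediate layers, which was the problem here; a minimal sketch with placeholder paths and shape:

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.onnx"))
graph.inputs[0].shape = [1, 300, 300, 3]  # placeholder; use your model's real input shape
onnx.save(gs.export_onnx(graph), "model_fixed_input.onnx")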

@jonakola

Thanks @qraleq. I am also using NVIDIA Jetsons. How did you accomplish fixing the shape in TF in your case? I started off with a frozen_inference_graph.pb (exported after training with the TF Object Detection API), which I would have thought already freezes all variables and shapes.

@qraleq

qraleq commented Dec 11, 2020

@jonakola Explicitly define the input shape when freezing the model from ckpt to pb.

Take a look here: https://github.com/tensorflow/models/blob/master/research/object_detection/export_inference_graph.py

and specifically:

flags.DEFINE_string('input_shape', None,
                    'If input_type is `image_tensor`, this can explicitly set '
                    'the shape of this input tensor to a fixed size. The '
                    'dimensions are to be provided as a comma-separated list '
                    'of integers. A value of -1 can be used for unknown '
                    'dimensions. If not specified, for an `image_tensor, the '
                    'default shape will be partially specified as '
                    '`[None, None, None, 3]`.')

@pranavm-nvidia
Collaborator

@xonobo It's weird that one dimension is being cut out from each of the output shapes. Is it possible for you to share your model?

@xonobo

xonobo commented Dec 14, 2020

Here I provide two ONNX files. I zeroed out all weights in order to make them small. The smaller model passes trtexec, but the bigger one fails as I stated above. I appended the NMS plugin to both models using the same Python script.

nms_trt_examples.zip

@pranavm-nvidia
Collaborator

@xonobo It looks like the topK parameter is out of range in the larger model:

Invalid parameter: NMS topK (5733) exceeds limit (4096)

@xonobo

xonobo commented Dec 15, 2020

Thanks for figuring out the problem, but can you share how you inspected it? I cannot see any log message about the invalid parameter. If you found this case by using NVIDIA's internal tools, could you add these invalid cases to the public log messages in the next release?

Thanks a lot for your help.

@jonakola

Thank you for the previous help @qraleq. I am still unable to successfully convert my model from ONNX to TRT. Here are the specifics:

  • The original model is SSD MobileNetV2 from the TF 1 model zoo, fine-tuned with my own images and frozen to a fixed shape as suggested above. I'm using TF 1.15.
  • I am able to successfully convert from either saved_model or frozen_inference_graph.pb to ONNX using tf2onnx 1.7.2 as follows:
python3 -m tf2onnx.convert --saved-model saved_model --opset 11 --output prod.onnx
  • Using the Layer + Register API mentioned previously, I am able to replace the NonMaxSuppression ops with the BatchedNMS_TRT plugin.
  • After all this, converting to a TRT engine with the onnx2trt tool fails with the following error:
Successfully created plugin: BatchedNMS_TRT
#assertionbatchedNMSPlugin.cpp,70

Looking at the batchedNMSPlugin.cpp file at the TensorRT 7.1.3 tag, the error is raised in the BatchedNMSPlugin::getOutputDimensions function, specifically

ASSERT(inputs[0].nbDims == 3);

suggesting that the BatchedNMS plugin is not receiving an input with the shape it expects, which should be the detection boxes.

I'm using TensorRT 7.1 on Jetson Nano.

@pranavm-nvidia any ideas on what could be causing this?

@pranavm-nvidia
Collaborator

@xonobo The friendlier error message will be included in the next version of TRT.

@jonakola The first input (boxes) is expected to have a shape of [batch_size, number_boxes, number_classes, number_box_parameters] (see here). Can you double check the input shape?
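
One quick way to double check is to print the shapes of the tensors feeding the plugin node with ONNX-GS; a small sketch (the model path is a placeholder, and it assumes the node op was set to BatchedNMS_TRT):

import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model_gs.onnx"))

for node in graph.nodes:
    if node.op == "BatchedNMS_TRT":
        for inp in node.inputs:
            print(node.name, inp.name, inp.shape)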

@vilmara

vilmara commented Feb 24, 2021

@vilmara Ah sorry, I didn't realize you were using 7.0. Is it possible for you to upgrade to 7.1? Or maybe just build the 7.1 ONNX parser?
The plugin importer was introduced in 7.1, so for earlier versions, you would need to add a custom importer for BatchedNMS_TRT in builtin_op_importers.cpp.

Hi @pranavm-nvidia, since DS-Triton with TRT 7.2.1 support was released, I upgraded to 7.2.1 to optimize the model. The BatchedNMSDynamic_TRT plugin was created successfully, but then I got this new error: Assertion failed: inputs[0].nbDims == 4

[02/24/2021-21:35:33] [W] [TRT] /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[02/24/2021-21:35:33] [W] [TRT] /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[02/24/2021-21:35:33] [W] [TRT] /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[02/24/2021-21:35:33] [I] [TRT] /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/ModelImporter.cpp:139: No importer registered for op: BatchedNMSDynamic_TRT. Attempting to import as plugin.
[02/24/2021-21:35:33] [I] [TRT] /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/builtin_op_importers.cpp:3775: Searching for plugin: BatchedNMSDynamic_TRT, plugin_version: 1, plugin_namespace:
[02/24/2021-21:35:33] [I] [TRT] /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/builtin_op_importers.cpp:3792: Successfully created plugin: BatchedNMSDynamic_TRT
[02/24/2021-21:35:33] [V] [TRT] /home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/ImporterContext.hpp:154: Registering layer: (Unnamed Layer* 1435) [PluginV2DynamicExt] for ONNX node:
[02/24/2021-21:35:33] [F] [TRT] Assertion failed: inputs[0].nbDims == 4
/home/jenkins/workspace/OSS/L0_MergeRequest/oss/plugin/batchedNMSPlugin/batchedNMSPlugin.cpp:137

@pranavm-nvidia
Collaborator

@vilmara The BatchedNMSDynamic plugin expects the input to have 4 dimensions: [batch_size, number_boxes, number_classes, number_box_parameters]. Can you see which one is missing from your model? Maybe one of them is 1 and is being squeezed out?

@vilmara

vilmara commented Feb 26, 2021

Hi @pranavm-nvidia,

Hi @vilmara, the ONNX parser will automatically attempt to import unknown ops as plugins, so you don't need to modify builtin_op_importers.cpp.

When an unknown op is encountered, the parser searches the plugin registry for a plugin with the same name. So in this case, we just need to change the NMS node to a BatchedNMSDynamic_TRT node with the right attributes.

I think your script just needs a few modifications:

import onnx_graphsurgeon as gs
import onnx
import numpy as np

print ("Running BatchedNMSDynamic_TRT plugin the ONNX model.. ")

input_model_path = "/workspace/onnx-tensorrt/models/yolov3-10.onnx"
output_model_path = "/workspace/onnx-tensorrt/models/yolov3-10-with-plugin.onnx"

graph = gs.import_onnx(onnx.load(input_model_path))

tmap = graph.tensors()
# NOTE: Input and output tensors are model-dependent. 
# From your logging output, it looks like these are the ones of interest:
#     input: "yolonms_layer_1/ExpandDims_1:0"
#     input: "yolonms_layer_1/ExpandDims_3:0"
#     output: "casted"
# The other input tensors turn into plugin attributes (see `attrs` below)
boxes, scores, nms_out = tmap["yolonms_layer_1/ExpandDims_1:0"], tmap["yolonms_layer_1/ExpandDims_3:0"], tmap["casted"]

# Disconnect old subgraph
boxes.outputs.clear()
scores.outputs.clear()
nms_out.inputs.clear()

attrs = {
    "keepTopK": 20, # Based on max_output_boxes_per_class
    "iouThreshold": 0.5,
    "scoreThreshold": 0.6,
    # TODO: Fill out any other attributes you may need 
    # (see https://github.com/NVIDIA/TensorRT/tree/master/plugin/batchedNMSPlugin#parameters)

}
node = gs.Node(op="BatchedNMSDynamic_TRT", attrs=attrs, 
               inputs=[boxes, scores], outputs=[nms_out])
graph.nodes.append(node)

# NOTE: Need to cleanup to remove the old NMS node properly. 
# Finally, we can save the model. 
graph.cleanup().toposort()
onnx.save_model(gs.export_onnx(graph), output_model_path)

Hi @pranavm-nvidia, I replaced the old NMS subgraph/node with a new node using the BatchedNMSDynamic_TRT plugin, and maybe I left a dimension out. How can I check the model and see which one is missing?

Here is how the output node looks after the conversion:
[screenshot: output node after the conversion]

@pranavm-nvidia
Collaborator

@vilmara The issue is the previous layer that supplies the boxes input. For whatever reason, it's only generating 3 dimensions: (unk__580, unk__581, 4). How are you exporting the model to ONNX?

@vilmara

vilmara commented Feb 26, 2021

@pranavm-nvidia I am using the pre-trained YOLOv3 model from the ONNX Model Zoo repo. I see this model only has 3 outputs: out_boxes, out_scores, out_classes.

What other pre-trained object detection models do you recommend, or what could be a workaround?

@pranavm-nvidia
Collaborator

@vilmara Have you looked at the YoloV3 example shipped with TensorRT?

I think the issue with the model zoo model is that the ONNX spec expects a boxes input of shape: [num_batches, spatial_dimension, 4], whereas TRT also supports per-class boxes.
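
One workaround, sketched under the assumption that the boxes are shared across classes (shareLocation=True), is to insert a Reshape that expands the 3-D boxes tensor to the 4-D layout the plugin expects, i.e. [num_batches, spatial_dimension, 1, 4], and feed the reshaped tensor to the plugin node instead. The helper below is hypothetical (a similar reshape is shown later in this thread):

import numpy as np
import onnx_graphsurgeon as gs

def expand_boxes_for_nms(graph, boxes):
    # Reshape a [num_batches, spatial_dimension, 4] boxes tensor to
    # [num_batches, spatial_dimension, 1, 4]; 0 keeps the batch dim, -1 infers the box count.
    boxes_4d = gs.Variable(name=boxes.name + "_expanded", dtype=np.float32)
    shape = gs.Constant(name=boxes.name + "_expanded_shape",
                        values=np.array([0, -1, 1, 4], dtype=np.int64))
    graph.nodes.append(gs.Node(op="Reshape", inputs=[boxes, shape], outputs=[boxes_4d]))
    return boxes_4d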

Also I'm not sure if the model is valid, since both TRT and ONNX-Runtime are having trouble with other layers in the model, e.g:

onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Conv node. Name:'model_1/leaky_re_lu_2/LeakyRelu:0_nchwc' Status Message: Invalid input shape: {1,1}

@vilmara

vilmara commented Feb 26, 2021

2. The ONNX model included in the yolov3_onnx example works out of the box with TRT, so it doesn't require plugins.

@pranavm-nvidia, I got it, I will explore different models/frameworks then. The thing with yolov3_onnx is that it works out of the box with TRT and doesn't require plugins; however, I need to explore popular object detection models that usually have custom operations and require plugin implementations.

@VeeranjaneyuluToka

VeeranjaneyuluToka commented Apr 20, 2021

@pranavm-nvidia or @qraleq, would you mind explaining a bit about the keys that you looked up in graph.tensors(), as below?

graph.trt_batched_nms(tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_2/NonMaxSuppressionV5__1712:0"],
                      tmap["Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores/NonMaxSuppressionV5__1761:0"],
                      tmap["NonMaxSuppression__1763:0"],
                      share_location=False,
                      num_classes=4)

I am using the sample that you shared above to modify the ONNX model, which is an SSD MobileNetV2 FPNLite model created using the TensorFlow Object Detection API 2, and I am getting the error below:

File "modify_onnx.py", line 135, in main
add_nms_plugin()
File "modify_onnx.py", line 128, in add_nms_plugin
onnx.checker.check_model(gs.export_onnx(graph))
File "/home/veeru/.local/lib/python3.6/site-packages/onnx/checker.py", line 102, in check_model
C.check_model(protobuf_string)
onnx.onnx_cpp2py_export.checker.ValidationError: Node (NonMaxSuppression__961) has output size 0 not in range [min=1, max=1].

==> Context: Bad node spec: input: "Unsqueeze__924:0" input: "Unsqueeze__959:0" input: "const_fold_opt__1412" input: "StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores/iou_threshold:0" input: "StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_1/score_threshold:0" name: "NonMaxSuppression__961" op_type: "NonMaxSuppression"

I am a bit unsure how you passed keys to graph.trt_batched_nms(). Below are the keys, especially those related to NMS, when I print the graph.tensors() dict:

StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores/iou_threshold:0
StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_1/score_threshold:0
NonMaxSuppression__995:0
Slice__999:0
Squeeze__1001:0
StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores/NonMaxSuppressionV5:1
NonMaxSuppression__961:0
Slice__965:0
Squeeze__967:0
StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_1/NonMaxSuppressionV5:1
NonMaxSuppression__927:0
Slice__931:0
Squeeze__933:0
StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_2/NonMaxSuppressionV5:1
NonMaxSuppression__893:0
Slice__897:0
Squeeze__899:0
StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_3/NonMaxSuppressionV5:1
NonMaxSuppression__859:0
Slice__863:0
Squeeze__865:0
StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_4/NonMaxSuppressionV5:1
StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/non_max_suppression_with_scores_1/NonMaxSuppressionV5:0

@VeeranjaneyuluToka

VeeranjaneyuluToka commented Apr 20, 2021

I have used a different approach to feed the inputs, by looping over the lists below:

boxt = "Unsqueeze__924:0"
scores_list = ["Unsqueeze__857:0", "Unsqueeze__891:0", "Unsqueeze__925:0", "Unsqueeze__959:0", "Unsqueeze__993:0"]
nms_list = ["NonMaxSuppression__995:0", "NonMaxSuppression__961:0", "NonMaxSuppression__927:0", "NonMaxSuppression__893:0", "NonMaxSuppression__859:0"]

It looks like updating NMS to BatchedNMS_TRT is fine based on the Netron output, attached for your reference:

[screenshots: Netron views of the modified graph]

But when I convert from ONNX to TRT using the command below,
./trtexec --onnx=/tf_git_hubs/tensorflow/workspace/training_demo/exported-models/my_ssd_mobnetv2_fpnlite_model/ssd-mobnetv2_fpnlite_model_no_nms.onnx --saveEngine=/tf_git_hubs/tensorflow/workspace/training_demo/exported-models/my_ssd_mobnetv2_fpnlite_model/ssd-mobnetv2_fpnlite_model.trt --verbose
I get a segmentation fault. Here are a few lines of the verbose output:

[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_1/BatchNorm/feature_0/FusedBatchNormV3/ReadVariableOp_1:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_fold_opt__1402
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_2/BatchNorm/feature_0/ReadVariableOp:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_2/BatchNorm/feature_0/ReadVariableOp_1:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_2/BatchNorm/feature_0/FusedBatchNormV3/ReadVariableOp:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_2/BatchNorm/feature_0/FusedBatchNormV3/ReadVariableOp_1:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3/BatchNorm/feature_0/ReadVariableOp:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3/BatchNorm/feature_0/ReadVariableOp_1:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3/BatchNorm/feature_0/FusedBatchNormV3/ReadVariableOp:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/WeightSharedConvolutionalBoxPredictor/PredictionTower/conv2d_3/BatchNorm/feature_0/FusedBatchNormV3/ReadVariableOp_1:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_fold_opt__1354
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_fold_opt__1324
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_starts__757
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_ends__758
[04/20/2021-14:37:57] [V] [TRT] onnx2trt_utils.cpp:236: Weight at index 0: 9223372036854775807 is out of range. Clamping to: 2147483647
[04/20/2021-14:37:57] [V] [TRT] onnx2trt_utils.cpp:236: Weight at index 1: 9223372036854775807 is out of range. Clamping to: 2147483647
[04/20/2021-14:37:57] [V] [TRT] onnx2trt_utils.cpp:236: Weight at index 2: 9223372036854775807 is out of range. Clamping to: 2147483647
[04/20/2021-14:37:57] [W] [TRT] onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/Decode/truediv_1:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/Decode/truediv:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: ConstantFolding/StatefulPartitionedCall/Postprocessor/Decode/truediv_2_recip:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/Decode/get_center_coordinates_and_sizes/add_1:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/Decode/get_center_coordinates_and_sizes/add:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_starts__760
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_ends__761
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_starts__763
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_ends__764
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_starts__766
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_ends__767
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_starts__769
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_ends__770
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_starts__772
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_ends__773
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/PadOrClipBoxList/Pad__1118
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/Decode/get_center_coordinates_and_sizes/sub:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/Decode/get_center_coordinates_and_sizes/sub_1:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: ConstantFolding/StatefulPartitionedCall/Postprocessor/Decode/truediv_7_recip:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_fold_opt__1323
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Minimum_5/x:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_fold_opt__1406
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/mul:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/Reshape_3:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Concatenate/concat_5:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/mul_5/x:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/range_6/delta:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/PadOrClipBoxList/Select/e:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const__1218
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: largest_int_val__1219
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/PadOrClipBoxList/zeros_10:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/PadOrClipBoxList/zeros:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/PadOrClipBoxList/sub_17/x:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/PadOrClipBoxList/sub_3/x:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_fold_opt__1301
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:90: Importing initializer: const_fold_opt__1350
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:103: Parsing node: __inference_map_while_cond_8144_19055_map/while/Less_1 [Less]
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:119: Searching for input: const_fold_opt__1415
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:119: Searching for input: StatefulPartitionedCall/add/y:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:125: __inference_map_while_cond_8144_19055_map/while/Less_1 [Less] inputs: [const_fold_opt__1415 -> ()], [StatefulPartitionedCall/add/y:0 -> ()],
[04/20/2021-14:37:57] [V] [TRT] ImporterContext.hpp:150: Registering constant layer: const_fold_opt__1415 for ONNX initializer: const_fold_opt__1415
[04/20/2021-14:37:57] [V] [TRT] ImporterContext.hpp:150: Registering constant layer: StatefulPartitionedCall/add/y:0 for ONNX initializer: StatefulPartitionedCall/add/y:0
[04/20/2021-14:37:57] [V] [TRT] ImporterContext.hpp:154: Registering layer: __inference_map_while_cond_8144_19055_map/while/Less_1 for ONNX node: __inference_map_while_cond_8144_19055_map/while/Less_1
[04/20/2021-14:37:57] [V] [TRT] ImporterContext.hpp:120: Registering tensor: __inference_map_while_cond_8144_19055_map/while/Less_1:0 for ONNX tensor: __inference_map_while_cond_8144_19055_map/while/Less_1:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:179: __inference_map_while_cond_8144_19055_map/while/Less_1 [Less] outputs: [__inference_map_while_cond_8144_19055_map/while/Less_1:0 -> ()],
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:103: Parsing node: __inference_map_while_cond_8144_19055_map/while/LogicalAnd [And]
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:119: Searching for input: __inference_map_while_cond_8144_19055_map/while/Less_1:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:119: Searching for input: __inference_map_while_cond_8144_19055_map/while/Less_1:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:125: __inference_map_while_cond_8144_19055_map/while/LogicalAnd [And] inputs: [__inference_map_while_cond_8144_19055_map/while/Less_1:0 -> ()], [__inference_map_while_cond_8144_19055_map/while/Less_1:0 -> ()],
[04/20/2021-14:37:57] [V] [TRT] ImporterContext.hpp:154: Registering layer: __inference_map_while_cond_8144_19055_map/while/LogicalAnd for ONNX node: __inference_map_while_cond_8144_19055_map/while/LogicalAnd
[04/20/2021-14:37:57] [V] [TRT] ImporterContext.hpp:120: Registering tensor: __inference_map_while_cond_8144_19055_map/while/LogicalAnd:0 for ONNX tensor: __inference_map_while_cond_8144_19055_map/while/LogicalAnd:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:179: __inference_map_while_cond_8144_19055_map/while/LogicalAnd [And] outputs: [__inference_map_while_cond_8144_19055_map/while/LogicalAnd:0 -> ()],
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:103: Parsing node: StatefulPartitionedCall/map/while_loop [Loop]
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:119: Searching for input: StatefulPartitionedCall/map/while/maximum_iterations:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:119: Searching for input: __inference_map_while_cond_8144_19055_map/while/LogicalAnd:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:119: Searching for input: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/range_5/start:0
[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:125: StatefulPartitionedCall/map/while_loop [Loop] inputs: [StatefulPartitionedCall/map/while/maximum_iterations:0 -> ()], [__inference_map_while_cond_8144_19055_map/while/LogicalAnd:0 -> ()], [StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/range_5/start:0 -> ()],
[04/20/2021-14:37:57] [V] [TRT] ImporterContext.hpp:150: Registering constant layer: StatefulPartitionedCall/map/while/maximum_iterations:0 for ONNX initializer: StatefulPartitionedCall/map/while/maximum_iterations:0
[04/20/2021-14:37:57] [V] [TRT] ImporterContext.hpp:120: Registering tensor: map_while_map_while_loop_counter:0 for ONNX tensor: map_while_map_while_loop_counter:0
[04/20/2021-14:37:57] [V] [TRT] ImporterContext.hpp:120: Registering tensor: map_while_map_while_loop_counter:0 tripLimit for ONNX tensor: map_while_map_while_loop_counter:0 tripLimit
[04/20/2021-14:37:57] [V] [TRT] ImporterContext.hpp:120: Registering tensor: map_while_placeholder:0 for ONNX tensor: map_while_placeholder:0
[04/20/2021-14:37:57] [V] [TRT] ImporterContext.hpp:150: Registering constant layer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/range_5/start:0 for ONNX initializer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/range_5/start:0
Segmentation fault (core dumped)

@pranavm-nvidia Is there anything that I missed, or any suggestions for debugging this further?

@pranavm-nvidia
Collaborator

@VeeranjaneyuluToka From the error output, it seems like it's failing on a Loop node, e.g. see the last Parsing node line before the error:

[04/20/2021-14:37:57] [V] [TRT] ModelImporter.cpp:103: Parsing node: StatefulPartitionedCall/map/while_loop [Loop]

So I'm not sure this is necessarily related to the BatchedNMS node. Maybe you could open a separate issue?

@xonobo

xonobo commented Apr 21, 2021

Hello xonobo:
Have you converted your best.opt_trt_nms.onnx model to TensorRT successfully? I have encountered the same BatchedNMS_TRT plugin conversion problems as you with my yolov3_dynamic_postprocess_nms.onnx model. The code to add the NMS plugin is like yours.
The following is my code for adding the NMS plugin:

import onnx_graphsurgeon as gs
import onnx
import numpy as np

def append_nms(graph, num_classes, scoreThreshold, iouThreshold, keepTopK):
    out_tensors = graph.outputs
    bs = out_tensors[0].shape[0]
    num_priors = out_tensors[0].shape[1]

    nms_attrs = {'shareLocation': True,
                 'backgroundLabelId': -1,
                 'numClasses': num_classes,
                 'topK': 1024,
                 'keepTopK': keepTopK,
                 'scoreThreshold': scoreThreshold,
                 'iouThreshold': iouThreshold,
                 'isNormalized': True,
                 'clipBoxes': True}

    nms_num_detections = gs.Variable(name="nms_num_detections", dtype=np.int32, shape=(bs, 1))
    nms_boxes =  gs.Variable(name="nms_boxes", dtype=np.float32, shape=(bs, keepTopK, 4))
    nms_scores = gs.Variable(name="nms_scores", dtype=np.float32, shape=(bs, keepTopK))
    nms_classes = gs.Variable(name="nms_classes", dtype=np.float32, shape=(bs, keepTopK))

    nms = gs.Node(op="BatchedNMSDynamic_TRT", attrs=nms_attrs, inputs=out_tensors, outputs=[nms_num_detections, nms_boxes, nms_scores, nms_classes])
    graph.nodes.append(nms)
    graph.outputs = [nms_num_detections, nms_boxes, nms_scores, nms_classes]

    return graph


def add_nms_to_onnx(model_file, num_classes, confidenceThreshold=0.3, nmsThreshold=0.6, keepTopK=100, opset=11):
    graph = gs.import_onnx(onnx.load(model_file))
    
    graph = append_nms(graph, num_classes, confidenceThreshold, nmsThreshold, keepTopK)
    
    # Remove unused nodes, and topologically sort the graph.
    graph.cleanup().toposort().fold_constants().cleanup()

    # Export the onnx graph from graphsurgeon
    out_name = model_file[:-5]+'_nms.onnx'
    onnx.save_model(gs.export_onnx(graph), out_name)

    print("Saving the ONNX model to {}".format(out_name))


if __name__ == "__main__":

    model_file = "yolov3_dynamic_postprocess.onnx"
    add_nms_to_onnx(model_file, 80, confidenceThreshold=0.3, nmsThreshold=0.6, keepTopK=100, opset=11)    

Hi @MAhaitao999. A late reply, but I guess you have to reshape the input blobs of the NMS plugin to match the shapes defined in the plugin documentation.

In my case (prior boxes shared across classes) I need to reshape the boxes and update the input blobs of the plugin like this:

# Reshape boxes
boxes_reshaped = gs.Variable(name="boxes_reshaped", dtype=np.float32, shape=(bs, num_priors, 1, 4))
boxes_shape = gs.Variable(name="boxes_shape", dtype=np.int32).to_constant(np.array([bs, -1, 1, 4]).astype(np.int32))
reshape_boxes = gs.Node(op="Reshape", inputs=[out_tensors[0], boxes_shape], outputs=[boxes_reshaped])
graph.nodes.append(reshape_boxes)

nms = gs.Node(op="BatchedNMS_TRT", attrs=nms_attrs, inputs=[boxes_reshaped, out_tensors[1]], outputs=[nms_num_detections, nms_boxes, nms_scores, nms_classes])

@VeeranjaneyuluToka
Copy link

@pranavm-nvidia, would you mind sharing a reference or explaining a bit about the topK and keepTopK parameters that are passed to self.layer in @qraleq's NMS plugin register sample? Do these parameters change from model to model?

@pranavm-nvidia
Copy link
Collaborator

@VeeranjaneyuluToka The parameters are described in the plugin documentation here. The values you'd pass for them would be specific to your particular model/use-case.

For topK and keepTopK in particular:

  • topK refers to the number of boxes that are fed into the NMS. Boxes are sorted by confidence before deciding which K boxes to keep. This is mostly to reduce the computational load on the plugin by reducing the number of IOU calculations we need to do.
  • keepTopK refers to the number of boxes kept (i.e. output) by the NMS itself (see the sketch after this list).
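
For illustration, here is a minimal sketch of how the two values relate to the plugin attributes and output shapes. This is not from the original comment; the attribute names follow the BatchedNMS_TRT snippets above, and the numbers are placeholders:

# Hypothetical values, for illustration only
num_input_boxes = 51150   # boxes produced by the model before NMS
topK = 1024               # highest-confidence boxes the plugin will actually consider
keepTopK = 100            # boxes the plugin finally emits (must be <= topK)

nms_attrs = {
    'topK': topK,
    'keepTopK': keepTopK,
    # ... other attributes (numClasses, iouThreshold, etc.) as in the snippets above
}

# The plugin outputs are sized by keepTopK, not by the raw box count:
#   nms_boxes:   (batch_size, keepTopK, 4)
#   nms_scores:  (batch_size, keepTopK)
#   nms_classes: (batch_size, keepTopK)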

@VeeranjaneyuluToka
Copy link

VeeranjaneyuluToka commented Apr 22, 2021

@pranavm-nvidia Thanks for the reference and the explanation of the requested parameters.
About topK: does that mean that even if we have 51150 boxes from predictions (which is the case for my model), we do not need to pass that number, but rather the maximum possible number of detections in our test samples?

I have debugged the TF model and it looks like the number of boxes before the NMS step is 51150. I passed the same value while adding the BatchedNMS plugin, but it still gives a segmentation fault, as shown below (I was assuming that passing the wrong number of boxes might be causing the seg fault, but that does not seem to be the case).

[04/22/2021-07:42:04] [V] [TRT] ImporterContext.hpp:120: Registering tensor: map_while_map_while_loop_counter:0 for ONNX tensor: map_while_map_while_loop_counter:0
[04/22/2021-07:42:04] [V] [TRT] ImporterContext.hpp:120: Registering tensor: map_while_map_while_loop_counter:0 tripLimit for ONNX tensor: map_while_map_while_loop_counter:0 tripLimit
[04/22/2021-07:42:04] [V] [TRT] ImporterContext.hpp:120: Registering tensor: map_while_placeholder:0 for ONNX tensor: map_while_placeholder:0
[04/22/2021-07:42:04] [V] [TRT] ImporterContext.hpp:150: Registering constant layer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/range_5/start:0 for ONNX initializer: StatefulPartitionedCall/Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/range_5/start:0
Segmentation fault (core dumped)

You suggested that I raise a new issue on this; I did raise #1205, but there is no reply yet. Could you please have a look and let me know how to analyze this issue further?

@pranavm-nvidia
Copy link
Collaborator

@VeeranjaneyuluToka

About topK: does that mean that even if we have 51150 boxes from predictions (which is the case for my model), we do not need to pass that number, but rather the maximum possible number of detections in our test samples?

Yes, exactly. We don't need to check 51150 boxes if you expect far fewer detections than that. In fact, the NMS plugin is currently limited to 4096 input boxes, so your topK value must be less than or equal to that. That's probably what's triggering the segfault.
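
As a small illustration of that constraint (the variable names here are hypothetical, not from the plugin API):

MAX_PLUGIN_BOXES = 4096      # current input-box limit of the NMS plugin, per the comment above
num_model_boxes = 51150      # boxes the model produces before NMS

topK = min(num_model_boxes, MAX_PLUGIN_BOXES)   # clamp so topK never exceeds the plugin limit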

@Edwardmark
Copy link

(Quoting @pranavm-nvidia's earlier answer in this thread, which describes the old-GS style and the Layer + Register API for inserting the BatchedNMS_TRT plugin.)

Hi @pranavm-nvidia, could you please help me with some questions about graph-surgeon? My ONNX model is exported without NMS, using the following code:

torch.onnx.export(model, img, f,
                  verbose=True,
                  opset_version=11,
                  input_names=['images'],
                  output_names=['boxes', 'scores'])

So how can I add tmap['nms_out']? Thanks in advance.

@pranavm-nvidia
Copy link
Collaborator

@Edwardmark You can create a new output tensor. Something like:

import numpy as np
import onnx_graphsurgeon as gs

nms_out = gs.Variable(name="nms_out", dtype=np.float32, shape=(...))  # TODO: Fill out shape
graph.outputs = [nms_out]

and then use that instead of tmap["nms_out"]
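
Putting that together with the append_nms snippet earlier in this thread, a minimal end-to-end sketch for a model exported with 'boxes' and 'scores' outputs might look like the following. The file name and attribute values are placeholders to tune for your own model, not a confirmed recipe:

import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.onnx"))   # hypothetical file name
boxes, scores = graph.outputs                     # 'boxes' and 'scores' from torch.onnx.export above

bs = boxes.shape[0]
keepTopK = 100

# BatchedNMS_TRT produces four outputs, all sized by keepTopK
num_detections = gs.Variable(name="nms_num_detections", dtype=np.int32, shape=(bs, 1))
nms_boxes = gs.Variable(name="nms_boxes", dtype=np.float32, shape=(bs, keepTopK, 4))
nms_scores = gs.Variable(name="nms_scores", dtype=np.float32, shape=(bs, keepTopK))
nms_classes = gs.Variable(name="nms_classes", dtype=np.float32, shape=(bs, keepTopK))

attrs = {'shareLocation': True, 'backgroundLabelId': -1, 'numClasses': 80,
         'topK': 1024, 'keepTopK': keepTopK, 'scoreThreshold': 0.3,
         'iouThreshold': 0.6, 'isNormalized': True, 'clipBoxes': True}

# Note: with shareLocation=True the plugin expects boxes shaped (bs, num_boxes, 1, 4);
# insert a Reshape first if the 'boxes' output is (bs, num_boxes, 4) (see the Reshape snippet above).
nms = gs.Node(op="BatchedNMS_TRT", attrs=attrs, inputs=[boxes, scores],
              outputs=[num_detections, nms_boxes, nms_scores, nms_classes])
graph.nodes.append(nms)
graph.outputs = [num_detections, nms_boxes, nms_scores, nms_classes]

graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_nms.onnx")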

@ttyio
Copy link
Collaborator

ttyio commented May 21, 2021

Closing since there has been no activity for more than 3 weeks. Please reopen if you still have questions, thanks!

@ttyio ttyio closed this as completed May 21, 2021
@ttanzhiqiang
Copy link

https://github.com/ttanzhiqiang/onnx_tensorrt_project

@Source82
Copy link

@jonakola You should use the BatchedNMSDynamic_TRT plugin if you have dynamic shapes. This plugin is supported in TensorRT versions > 7.2.

Since I'm using NVIDIA Jetsons, which do not support this plugin at the moment, I ended up freezing the TensorFlow model to a fixed shape, and then everything worked as expected.

Please, was your model trained using the TF2 API? How did you go about freezing the shape?

@lucasjinreal
Copy link

Why does BatchedNMS_TRT use a different definition than standard ONNX?

ONNX needs the input to have dimensions [num_batches, spatial_dimension, 4],
while TRT needs [num_batches, spatial_dimension, 1, 4].

What if I want graph surgeon to add a NonMaxSuppression node from the standard ONNX opset? How do I expand that dimension?
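
For the dimension expansion, the Reshape pattern xonobo posted earlier in this thread can be reused. A minimal sketch follows; the tensor name "boxes" and the file name are assumptions, not taken from the original post:

import numpy as np
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.onnx"))   # hypothetical file name
boxes = graph.tensors()["boxes"]                  # assumed (num_batches, spatial_dimension, 4) tensor

# Reshape (N, S, 4) -> (N, S, 1, 4); 0 keeps the batch dim, -1 infers S
boxes_expanded = gs.Variable(name="boxes_expanded", dtype=np.float32)
new_shape = gs.Constant(name="boxes_new_shape",
                        values=np.array([0, -1, 1, 4], dtype=np.int64))
graph.nodes.append(gs.Node(op="Reshape", inputs=[boxes, new_shape],
                           outputs=[boxes_expanded]))
# boxes_expanded can now be fed to BatchedNMS_TRT as in the snippets above.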
