[Bug] crash on poolings with larger-than-317 pool sizes #2094
Comments
We limit the max kernel volume for 2D pooling kernels to less than 100,000 (volume(windowSize) < MAX_KERNEL_DIMS_PRODUCT for nbSpatialDims==2). In your case 317x317=100489, which exceeds the limit. Is this a real case in your scenario?
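For illustration, a minimal sketch of that volume check in Python; the constant name mirrors MAX_KERNEL_DIMS_PRODUCT from the comment above, the 100,000 threshold is the value quoted there, and the exact importer internals may differ:

```python
import math

# Threshold quoted above for 2D pooling (nbSpatialDims == 2); illustrative only.
MAX_KERNEL_DIMS_PRODUCT = 100_000

def pooling_window_is_supported(window_size):
    """Return True if the pooling kernel volume stays under the quoted limit."""
    return math.prod(window_size) < MAX_KERNEL_DIMS_PRODUCT

print(pooling_window_is_supported((316, 316)))  # True  (99856  <  100000)
print(pooling_window_is_supported((317, 317)))  # False (100489 >= 100000)
```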
@zerollzeng Thanks for the explanation. This use case is not from a real-world model but from our project on automatic model generation. Thanks for the feedback; I will lower the kernel size during generation. BTW, just out of curiosity, does TensorRT's Python API intend to report such errors via exceptions instead of crashing?
Hi @zerollzeng, I just wanted to mention that we are actually using this in a real-world segmentation network in our project. We hit a hard crash (SEGFAULT) when the user increases the image above a certain size. Even if there is a limit on the kernel size, I would expect an exception to be thrown rather than a hard crash, since the problem stems from the unchecked return value here: https://github.com/onnx/onnx-tensorrt/blob/0462dc31ae78f48744b6141ae376df1f96d3f459/onnx2trt_utils.cpp#L1511 Thank you very much!
Hi @drproktor Thank you :-) The hard crash looks like a bug to me. Could you please share a reproduction with us? Many thanks!
Hi @zerollzeng, to reproduce you can still use the steps described by @ganler in the initial post. It also occurs with the latest version of TensorRT. Is this sufficient? If it helps, I would be willing to provide a patch for the problem; there are multiple places within https://github.com/onnx/onnx-tensorrt/blob/0462dc31ae78f48744b6141ae376df1f96d3f459/onnx2trt_utils.cpp where return values go unchecked. Thank you very much!
@zerollzeng I created an issue in the onnx-tensorrt repository onnx/onnx-tensorrt#937 |
Filed internal bug 4291317 for this.
If the goal is to do GlobalAvgPool or GlobalMaxPool, it is recommended to use the Reduce layer instead of the Pooling layer for better performance and better support. |
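For reference, a hedged sketch of expressing a global pooling with the Reduce layer via the TensorRT Python API; the input name and shape are placeholders, and it assumes an NCHW tensor so the spatial axes are 2 and 3:

```python
import tensorrt as trt

# Sketch: express a GlobalAvgPool as an IReduceLayer over the spatial axes.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# Placeholder input; in practice this comes from the parsed/constructed network.
input_tensor = network.add_input("input", trt.float32, (1, 64, 512, 512))

spatial_axes = (1 << 2) | (1 << 3)  # reduce over H and W of an NCHW tensor
reduce_layer = network.add_reduce(
    input_tensor,
    trt.ReduceOperation.AVG,  # use trt.ReduceOperation.MAX for GlobalMaxPool
    spatial_axes,
    True,                     # keep_dims: output shape (1, 64, 1, 1)
)
network.mark_output(reduce_layer.get_output(0))
```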
The segfault issue is expected to be fixed in TRT 9.1.
@nvpohanh: Thanks for the information. But in this case it's really just a very large pooling op, not a global op.
@drproktor We are currently considering removing this limitation, but it will require more time to support. It will likely need to wait until at least the next major release.
Description
A simple 2D pooling layer (AvgPool/MaxPool, etc.) whose kernel sizes are greater than or equal to 317 will lead to a TRT crash. This can be worked around on the user side by using multiple smaller poolings, but I just want to report this case to improve TRT's robustness. :-) I am new here so I just randomly cc some active developers: @nvpohanh @zerollzeng @ttyio
Environment
TensorRT Version: 8.4.1.5
NVIDIA GPU: 3080Ti
NVIDIA Driver Version: 510.73.08
CUDA Version: 11.6
CUDNN Version: 8.4.1
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.8
PyTorch Version (if applicable): 1.13.0.dev20220624+cu113
Relevant Files
The ONNX model can be generated by the script below.
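The original script is not preserved here; the following is a minimal sketch of the described setup, assuming a single AvgPool with a 317x317 kernel exported from PyTorch (the file name is a placeholder):

```python
# Sketch: export an ONNX model containing one 2D pooling with a >=317 kernel.
import torch
import torch.nn as nn

class LargePool(nn.Module):
    def __init__(self):
        super().__init__()
        # 317 x 317 = 100489, which exceeds the quoted 100,000 kernel-volume limit.
        self.pool = nn.AvgPool2d(kernel_size=317)

    def forward(self, x):
        return self.pool(x)

model = LargePool().eval()
dummy = torch.randn(1, 3, 317, 317)
torch.onnx.export(model, dummy, "large_pool.onnx", opset_version=13)

# Then build a TensorRT engine from the exported model, e.g.:
#   trtexec --onnx=large_pool.onnx
```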
Steps To Reproduce
Errors: