
Tensorrt supported detection networks #6

Open
HilmiK opened this issue Jul 24, 2018 · 30 comments

@HilmiK

HilmiK commented Jul 24, 2018

Hi,

It seems that TensorRT supports ResNet for the classification task. Does it also support detection networks with a ResNet backbone?

Which modules are the exceptions that TensorRT does not support?

Thanks in advance

@ghost

ghost commented Jul 25, 2018

It's possible that it would work, but we haven't tested it. Currently, the build_detection_graph method that we provide in this repository is tested to work only against the listed models.

That said, it is possible that for similar meta-architectures (SSD), configurations with different feature extractors would work. The feature extractors registered with the tensorflow/models repository are listed here:

https://github.com/tensorflow/models/blob/master/research/object_detection/builders/model_builder.py#L47

You would need to update the object detection configuration proto to select the desired feature extractor.
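For illustration only (the exact field names come from the sample configs in tensorflow/models, so please double-check against those), swapping the feature extractor is a small edit in the pipeline config, e.g. for an SSD model:

model {
  ssd {
    feature_extractor {
      type: 'ssd_inception_v2'   # any extractor name registered in model_builder.py
      ...
    }
    ...
  }
}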

Theoretically, the TensorRT integration in TensorFlow should support any model, as the operations that are not supported by TensorRT are run in native TensorFlow. That said, there may be caveats.

Please let me know if you run into issues.

@HilmiK

HilmiK commented Jul 25, 2018

Thank you for the answer. I will report here after I try Faster R-CNN with different backbones.

@bezero

bezero commented Sep 20, 2018

I was able to convert "faster_rcnn_resnet101_coco"; however, in order to use it you should modify the config file to use a fixed input size. Modify lines 4-8:
keep_aspect_ratio_resizer { min_dimension: 600 max_dimension: 1024 } ==> fixed_shape_resizer { height: 600 width: 1024 }
Use any dimensions you like.
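For reference, the full block in the pipeline config looks roughly like this (dimensions are only an example):

image_resizer {
  fixed_shape_resizer {
    height: 600
    width: 1024
  }
}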

@jkjung-avt

@jaybdub-nv, @bezero, when I tried to convert "faster_rcnn_resnet50_coco" with TF-TRT on TX2, I met a few other issues. I wonder how you got around them. Any help/suggestion is highly appreciated.

  1. TX2 ran out of memory, especially when I tried to load an image and do tf_sess.run(...). The program just got killed.

  2. Same issue as tensorflow.python.framework.errors_impl.InvalidArgumentError #11

File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tensorrt/python/trt_convert.py", line 115, in create_inference_graph
int(msg[0]))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid graph: Frame ids for node BatchMultiClassNonMaxSuppression/map/while/Reshape_1 does not match frame ids for it's fanout.

  3. The following error, which seems to be solved by bezero's fix as shown above.

<log time omitted>: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: ../builder/Network.cpp::addInput::377, condition: isValidDims(dims)

  4. The following error, which I think happens because the 2nd-stage classifier needs to handle an input tensor with a larger batch size (300).

<log time omitted>: F tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:82] input tensor batch larger than max_batch_size: 1

@bezero

bezero commented Oct 1, 2018

@jkjung-avt I also had memory issues. I solved it by closing my browser, since it uses memory resources (if possible, close all idle applications that are using memory). If I am not wrong, the scripts in this repo work with max_batch_size=1, so try to work with single images. For batch sizes >1, TX2 memory might not be sufficient.
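If closing applications is not enough, another thing that sometimes helps (just a suggestion, I have not verified it on every setup) is to stop TensorFlow from grabbing all GPU memory up front, since CPU and GPU share the same physical memory on the TX2:

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all at session creation.
tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
tf_sess = tf.Session(config=tf_config)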

@jkjung-avt

@bezero Thanks for the reply. But closing the web browser and all other applications on TX2 did not solve the OOM issue for me. I also used single-image input for faster_rcnn_resnet50. I had to reduce the number of proposals/detections in the model config to very small numbers to get around that...

@tevisgehr

I am able to run faster_rcnn_resnet50_coco, which is included in the list of supported models, but I don't seem to be getting any speedup, which makes me skeptical that any subgraphs are being optimized at all.

In order to get it to run, I used the following command to build the graph (along with the other code included in the Jupyter example):

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=3,
    maximum_cached_engines=3
)

I am wondering if anyone has had success in speeding up any form of Faster R-CNN, and if so, could you share some insight into what settings need to be adjusted or how to go about getting the graph conversions to work correctly?
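One way to sanity-check whether anything was converted (a minimal sketch, assuming trt_graph is the GraphDef returned by create_inference_graph above) is to count the TRTEngineOp nodes; if the count is zero, no subgraphs were offloaded to TensorRT:

# Count the nodes that were converted into TensorRT engines.
trt_engine_nodes = [n.name for n in trt_graph.node if n.op == 'TRTEngineOp']
print('TRTEngineOp nodes: %d' % len(trt_engine_nodes))
print('total nodes: %d' % len(trt_graph.node))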

@jkjung-avt

I shared my test results on the Jetson TX2 developer forum before: https://devtalk.nvidia.com/default/topic/1037019/jetson-tx2/tensorflow-object-detection-and-image-classification-accelerated-for-nvidia-jetson/post/5288250/#5288250

Note that I had to reduce the number of region proposals in the Faster R-CNN models, otherwise they run too slowly. All the code I used for testing can be found in my GitHub repository: https://github.com/jkjung-avt/tf_trt_models
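For anyone wondering what "reduce the number of region proposals" means concretely: the relevant fields sit in the faster_rcnn block of the pipeline config and look roughly like this (the values below are only illustrative; see the config file in my repository for the exact numbers I used):

first_stage_max_proposals: 20
second_stage_post_processing {
  batch_non_max_suppression {
    score_threshold: 0.0
    iou_threshold: 0.6
    max_detections_per_class: 20
    max_total_detections: 20
  }
  score_converter: SOFTMAX
}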

@inders

inders commented Dec 13, 2018

I am facing the following error while trying to run the Faster R-CNN model with TensorRT. I have tried changing the resizer as per @bezero's comment above, but it still doesn't help. Any pointers would be highly appreciated.

.cc:724] Can't determine the device, constructing an allocator at device 0
2018-12-13 09:49:37.182205: E tensorflow/contrib/tensorrt/log/trt_logger.cc:38] DefaultLogger Parameter check failed at: Network.cpp::addInput::281, condition: isIndexedCHW(dims) && volume(dims) < MAX_TENSOR_SIZE
2018-12-13 09:49:37.182317: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:857] Engine creation for segment 0, composed of 3 nodes failed: Invalid argument: Failed to create Input layer tensor InputPH_0 rank=-2. Skipping...
2018-12-13 09:49:37.182353: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:724] Can't determine the device, constructing an allocator at device 0

@atyshka

atyshka commented Dec 13, 2018

@bezero @jkjung-avt Did you run your Faster R-CNN models in the Jupyter notebook? The notebook code works fine for me for the SSD models, but if I try the Faster R-CNN models I'm getting "Engine buffer is full. buffer limit=1, current entries=1, requested batch=100". I'm using the exact notebook code with three modifications:
1: MODEL = 'faster_rcnn_resnet50_coco'
2: removed score_threshold=0.3 from build_detection_graph(...
3: changed to fixed_shape_resizer { height: 600 width: 1024 } in the config file

Can either of you reproduce this issue? I'm using Tensorflow 1.12 and TensorRT 5.0
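For reference, this is roughly what the build step looks like after those modifications (a sketch, assuming the helpers from this repo's tf_trt_models.detection module; the model name and download directory are just the notebook defaults):

from tf_trt_models.detection import download_detection_model, build_detection_graph

config_path, checkpoint_path = download_detection_model('faster_rcnn_resnet50_coco', 'data')

# score_threshold removed, batch_size left at 1
frozen_graph, input_names, output_names = build_detection_graph(
    config=config_path,
    checkpoint=checkpoint_path,
    batch_size=1
)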

@jkjung-avt

I haven't managed to get the faster_rcnn_resnet50 model to work with tensorflow 1.12.0 and TensorRT. Previously I got it to work using tensorflow 1.8.0, with some tweaks. Details are all in my GitHub repository: https://github.com/jkjung-avt/tf_trt_models/blob/master/data/faster_rcnn_resnet50_egohands.config

@CharlieXie

Hi @jkjung-avt, I used your config file: https://github.com/jkjung-avt/tf_trt_models/blob/master/data/faster_rcnn_inception_v2_egohands.config to train a model on my dataset (13 classes) and then tried to convert it to TRT, but still got the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid graph: Frame ids for node BatchMultiClassNonMaxSuppression/map/while/Reshape_1 does not match frame ids for it's fanout.
How did you get rid of this error?
Another issue I'm facing is that my trained FRCNN-inception-v2 checkpoint file (103.5 MB) is about twice the size of the fine-tuned checkpoint file (53.3 MB). Do you have any idea about this?
Thanks in advance.

@jkjung-avt

@CharlieXie, try setting 'remove_assert' to False. I recall that's how I got rid of the problem previously.

https://github.com/NVIDIA-AI-IOT/tf_trt_models/blob/master/tf_trt_models/detection.py#L108
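In code, that is just one extra keyword argument when building the graph (a sketch; other arguments stay as in the notebook):

frozen_graph, input_names, output_names = build_detection_graph(
    config=config_path,
    checkpoint=checkpoint_path,
    remove_assert=False  # keep the Assert ops; this is what got rid of the "frame ids ... fanout" error for me
)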

@xiaowenhe

@jkjung-avt, I used your tf_trt_models. When I run python3 camera_tf_trt.py --image --filename=xxx --model=faster_rcnn_resnet50_coco --build, I get an error (screenshot omitted). I do not know how to deal with it. Can you help me?

And when I run python3 camera_tf_trt.py --image --filename=xxx --model=faster_rcnn_resnet50_coco without --build, there is no error, but the detection result is not ideal (screenshot omitted).

@jkjung-avt

@xiaowenhe The segmentation fault could be caused by an "out of memory" issue. You could use 'tegrastats' to monitor JTX2 memory usage and try to confirm whether that's the case.

As to the bad detection result from the TF-TRT optimized faster_rcnn_resnet50_coco model, I'm not exactly sure what the problem is. There could be many causes, e.g.

  • mismatched tensorflow versions between training and inference,
  • TF-TRT not optimizing certain operations in the model correctly,
  • ...

@xiaowenhe

@jkjung-avt, thank you! But I do not use a TX2. I want to test it on another GPU first and then use the TX2. GPU usage looks like this (screenshot omitted):

From the screenshot, only 5285M is used!

@hoangtuanvu

I cannot improve performance by using the TensorRT-optimized graph. Can someone tell me why? And after optimizing the frozen graph, I get a bigger model. Why?

@bezero

bezero commented Mar 1, 2019

@hoangtuanvu What do you mean by not being able to optimize? TensorRT optimizes your frozen model for inference, which does not mean that you get a smaller model. Did you compare inference times before and after the TensorRT optimization?
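A rough way to compare (a minimal sketch, assuming you have both the original and the TRT-optimized GraphDef plus the input/output tensor names; image is a placeholder numpy array):

import time
import tensorflow as tf

def benchmark(graph_def, input_name, output_names, image, runs=50):
    with tf.Graph().as_default():
        tf.import_graph_def(graph_def, name='')
        with tf.Session() as sess:
            feed = {input_name + ':0': image[None, ...]}
            fetches = [name + ':0' for name in output_names]
            sess.run(fetches, feed_dict=feed)  # warm-up; TRT engines get built on the first run
            start = time.time()
            for _ in range(runs):
                sess.run(fetches, feed_dict=feed)
            return (time.time() - start) / runs

# e.g. benchmark(frozen_graph, 'image_tensor', output_names, image)
#  vs. benchmark(trt_graph, 'image_tensor', output_names, image)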

@hoangtuanvu

@bezero I used TensorRT to optimize the frozen graph, but I did not get better speed for inference. I am currently working on person detection.

@TomKomar

TomKomar commented Mar 7, 2019

I'm having a similar situation to @atyshka - no improvement whatsoever. The only difference after generating an 'optimized' graph is that with every frame I'm getting the warning "Engine buffer is full". Has anyone figured out how to deal with this?
Xavier TF1.12+TRT5

@zhucheng725

I ran the detection demo using ssd_mobilenet_v1_coco.pb. When I used FP16 in trt.create_inference_graph(), the benchmark was about 0.041013 seconds; when I used INT8, the benchmark was about 0.383557 seconds. Why is INT8 slower than FP16?
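One thing I have not tried and am not sure about (just my guess): maybe the INT8 mode needs an explicit calibration step before the final engines are built, and without it slow fallback paths stay in place? Roughly, the TF 1.x contrib workflow would be:

import tensorflow.contrib.tensorrt as trt

# Step 1: build a calibration graph rather than the final engines.
calib_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='INT8')

# Step 2: run a representative set of images through calib_graph with ordinary
# sess.run() calls so TensorRT can collect activation ranges.

# Step 3: convert the calibrated graph into the final INT8 inference graph.
int8_graph = trt.calib_graph_to_infer_graph(calib_graph)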

@ashispapu

ashispapu commented Jun 3, 2019

@hoangtuanvu I am facing an issue while running inference on a tensorflow object detection model (242 MB). I have TF 1.13 and TensorRT 5.1.2. Below are the log details.
2019-06-03 15:05:21.432164: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1030] TensorRT node resnet_v1_101/conv1/TRTEngineOp_123 added for segment 123 consisting of 2 nodes succeeded.
2019-06-03 15:05:21.432437: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:1030] TensorRT node rpn_proposals/softmax/TRTEngineOp_124 added for segment 124 consisting of 3 nodes succeeded.
2019-06-03 15:05:22.384389: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:616] Optimization results for grappler item: tf_graph
2019-06-03 15:05:22.384595: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] constant folding: Graph size after: 2014 nodes (-599), 2353 edges (-637), time = 4514.9751ms.
2019-06-03 15:05:22.384653: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] layout: Graph size after: 2063 nodes (49), 2422 edges (69), time = 462.632ms.
2019-06-03 15:05:22.384702: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] constant folding: Graph size after: 2059 nodes (-4), 2422 edges (0), time = 908.786ms.
2019-06-03 15:05:22.384748: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:618] TensorRTOptimizer: Graph size after: 1653 nodes (-406), 2000 edges (-422), time = 57351.3477ms.
time(s) (trt_conversion): 72.7292
graph_size(MB)(native_tf): 230.8
graph_size(MB)(trt): 493.0
num_nodes(trt_only): 125
2019-06-03 15:05:49.531006: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for TRTEngineOp_0 with batch size 720
2019-06-03 15:05:49.543881: W tensorflow/contrib/tensorrt/log/trt_logger.cc:34] DefaultLogger Tensor DataType is determined at build time for tensors not marked as input or output.
2019-06-03 15:05:55.369363: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/TRTEngineOp_23 with batch size 1
2019-06-03 15:05:55.837386: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/conv1/TRTEngineOp_123 with batch size 1
2019-06-03 15:05:57.403776: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/TRTEngineOp_24 with batch size 1
2019-06-03 15:06:10.529445: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/unit_1/bottleneck_v1/TRTEngineOp_25 with batch size 1
2019-06-03 15:06:13.628441: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/unit_1/bottleneck_v1/TRTEngineOp_26 with batch size 1
2019-06-03 15:06:20.675574: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/TRTEngineOp_27 with batch size 1
2019-06-03 15:06:25.591558: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/unit_2/bottleneck_v1/TRTEngineOp_28 with batch size 1
2019-06-03 15:06:28.377901: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/unit_2/bottleneck_v1/TRTEngineOp_29 with batch size 1
2019-06-03 15:06:35.168358: I tensorflow/contrib/tensorrt/kernels/trt_engine_op.cc:496] Building a new TensorRT engine for resnet_v1_101/block1/TRTEngineOp_31 with batch size 1
Killed

========================================================================
When I run dmesg --follow to check the process details:

[10663.666441] [12648] 1000 12648 6243614 1507814 3582 12 0 0 python3
[10663.666444] Out of memory: Kill process 12648 (python3) score 751 or sacrifice child
[10663.674368] Killed process 12648 (python3) total-vm:24974456kB, anon-rss:5768628kB, file-rss:262628kB, shmem-rss:0kB
[10664.011176] oom_reaper: reaped process 12648 (python3), now anon-rss:0kB, file-rss:262708kB, shmem-rss:0kB

Any suggestion or feedback is appreciated.

@VincentChong123

Hi @zhucheng725

Why is INT8 slower than FP16?

Do you have any update on this?

Thanks

@srkm009

srkm009 commented Aug 6, 2019

Hello,
Did anyone manage to resolve this issue, or is it still an issue with TF-TRT?
I see the same issue with TF2.0 as well.

@zhucheng725

Hi @zhucheng725

Why is INT8 slower than FP16?

Do you have any update on this?

Thanks

Not yet

@spurani

spurani commented Oct 25, 2020

I tried to run faster_rcnn_inception_v2 and got the following error. Does anyone have any clue about this? Any suggestion or advice would definitely help me to continue my learning by understanding these concepts.
thanks

InvalidArgumentError: node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Slice (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) has inputs from different frames. The input node BatchMultiClassNonMaxSuppression/map/while/Reshape_1 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) is in frame 'BatchMultiClassNonMaxSuppression/map/while/while_context'. The input node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Slice/begin (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) is in frame ''.

@Some-random

I tried to run faster_rcnn_inception_v2 and got the following error. Does anyone have any clue about this? Any suggestion or advice would definitely help me to continue my learning by understanding these concepts.
thanks

InvalidArgumentError: node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Slice (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) has inputs from different frames. The input node BatchMultiClassNonMaxSuppression/map/while/Reshape_1 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) is in frame 'BatchMultiClassNonMaxSuppression/map/while/while_context'. The input node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Slice/begin (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) is in frame ''.

I'm having the same issue. Is there any update on this one? And what is the meaning of this error anyway?

@spurani

spurani commented Mar 30, 2021 via email

@Some-random

If I am not wrong, the error states that the system does not have enough memory to run the faster_rcnn_inception_v2 model.


Thanks for the quick answer! I'm running a different model using TRT and my memory is normal during execution... Do you know the meaning of 'has inputs from different frames' in the error message?

@TClan8023

I tried to run faster_rcnn_inception_v2 and got the following error. Does anyone have any clue about this? Any suggestion or advice would definitely help me to continue my learning by understanding these concepts. thanks

InvalidArgumentError: node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Slice (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) has inputs from different frames. The input node BatchMultiClassNonMaxSuppression/map/while/Reshape_1 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) is in frame 'BatchMultiClassNonMaxSuppression/map/while/while_context'. The input node BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/Slice/begin (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) is in frame ''.

Hi, I'm using TF-TRT on Windows 10, with tf_gpu=2.10.0 and TensorRT 7.2.3, based on CUDA 11.2 and cuDNN 8.1.0. I met the same error while building the TRT engine for inference. Do you know how to deal with it? Thanks a lot for your reply.
