
Srivastava kshitij new trt ops #332

Merged — jaybdub merged 29 commits into master from SrivastavaKshitij-new_trt_ops on Jun 10, 2020
Conversation

@jaybdub (Contributor) commented Jun 9, 2020

  • adds an "enabled" flag to tensorrt_converter and add_module_test
  • adds support for many TensorRT 7+ operations
  • switches the plugin build system to use PyTorch extensions

@jaybdub (Contributor, Author) commented Jun 9, 2020

Tested with

| Platform/GPU | PyTorch Version | TensorRT Version | Notes |
|---|---|---|---|
| Jetson Xavier NX | 1.4 | 7.1 | JetPack 4.4 DP |

@jaybdub (Contributor, Author) commented Jun 9, 2020

@SrivastavaKshitij I've created this PR based on your changes in #324.

It applies some refactoring:

  • adds a '--plugins' flag so that PyTorch versions < 1.3 are at least supported without plugins
  • replaces get_trt_version() -> float with trt_version() -> str
    • Python's lexicographic string ordering handles patch releases like '5.2' > '5.1.23', etc.
  • adds an 'enabled' flag to allow inline filtering of converters / tests
    • this avoids having to explicitly separate converters by TRT version, and also lets us re-use test cases for different converter implementations (see the sketch below)
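
For illustration, here is a minimal sketch of the pattern (the helper names follow the PR description; the body and the decorated converter are assumptions, not the exact code):

```python
# Sketch only: trt_version() returns the TensorRT version as a string.
def trt_version():
    import tensorrt as trt  # assumes the TensorRT Python bindings are installed
    return trt.__version__  # e.g. '7.1.0' (a str, not a float)

# Plain string comparison orders patch releases that a float cannot express:
assert '5.2' > '5.1.23'

# The 'enabled' flag can then filter converters (and tests) inline, e.g.:
# @tensorrt_converter('torch.nn.functional.interpolate', enabled=trt_version() >= '7.1')
```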

Are you able to test this for the configurations you use to make sure nothing is broken in the refactor?

@SrivastavaKshitij (Contributor) commented Jun 10, 2020

We should update the README.md file.

Test Results

| NGC Container | TRT Version | PyTorch Version | Status |
|---|---|---|---|
| PyTorch 19.07 | 5.1.5 | 1.2.0a0 | |
| PyTorch 19.12 | 6.0.1 | 1.4.0a0+a5b4d78 | ✔️ |
| PyTorch 20.03 | 7.0.0 | 1.5.0a0+8f84ded | ✔️ |
| Custom container | 5.1 | PyTorch 1.4, torchvision 0.5 | ✔️ |

@jaybdub: I am not sure how to build for PyTorch < 1.3 without plugins. When you say "adds a '--plugins' flag so that PyTorch versions < 1.3 are at least supported without plugins", I don't see a way not to use the --plugins flag in setup.py.

Error related to NGC container 19.07:

/opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/serialize/output-archive.h:47:8: note:   no known conversion for argument 2 from ‘c10::IValue’ to ‘torch::serialize::OutputArchive&’
torch2trt/plugins/interpolate.cpp: In member function ‘virtual nvinfer1::Dims torch2trt::InterpolatePlugin::getOutputDimensions(int, const nvinfer1::Dims*, int)’:
torch2trt/plugins/interpolate.cpp:123:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (int i = 0; i < size.size(); i++) {
                     ~~^~~~~~~~~~~~~
error: command 'gcc' failed with exit status 1

@jaybdub (Contributor, Author) commented Jun 10, 2020

Thanks for the fast response!

Sorry, I forgot to push the --plugins addition to setup.py. It should be there now.

By updating the README, do you mean adding the test platform matrix?

@SrivastavaKshitij (Contributor) commented Jun 10, 2020

OK, let me test it quickly.

Under the Setup section in the README, we should say to skip the --plugins flag for torch < 1.3 and to pass it for torch >= 1.3. Something like the snippet below.
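
For instance (suggested wording only; the command is the one used later in this thread):

```
python setup.py build_ext --inplace             # PyTorch < 1.3: plugins are skipped
python setup.py build_ext --inplace --plugins   # PyTorch >= 1.3: build the plugins
```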

@SrivastavaKshitij (Contributor) commented Jun 10, 2020

Getting the following error when running the unit tests:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/sw/torch2trt/torch2trt/test.py", line 114, in <module>
    max_error, fps, fps_trt, ms, ms_trt = run(test)
  File "/sw/torch2trt/torch2trt/test.py", line 23, in run
    module_trt = torch2trt(module, inputs_conversion, max_workspace_size=1 << 20,  **self.torch2trt_kwargs)
  File "/sw/torch2trt/torch2trt/torch2trt.py", line 407, in torch2trt
    outputs = module(*inputs)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 525, in __call__
    result = self.forward(*input, **kwargs)
  File "/sw/torch2trt/torch2trt/converters/interpolate.py", line 95, in forward
    return F.interpolate(x, self.size, mode=self.mode, align_corners=self.align_corners)
  File "/sw/torch2trt/torch2trt/torch2trt.py", line 217, in wrapper
    converter["converter"](ctx)
  File "/sw/torch2trt/torch2trt/converters/interpolate.py", line 36, in convert_interpolate_plugin
    plugin = get_interpolate_plugin(size=size, mode=mode, align_corners=align_corners)
  File "/sw/torch2trt/torch2trt/converters/interpolate.py", line 9, in get_interpolate_plugin
    from torch2trt.plugins import InterpolatePlugin
ImportError: cannot import name 'InterpolatePlugin'

which makes sense, because we didn't register the plugin. I think we can add an if condition in converters/__init__.py where we don't register the interpolate plugin op but print a warning saying the interpolate function is not compatible with PyTorch < 1.3 (see the sketch below).
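
A minimal sketch of that guard, assuming the interpolate converter lives in converters/interpolate.py (the exact module layout is an assumption):

```python
# In converters/__init__.py: skip plugin-backed converters on old PyTorch.
import warnings
import torch

if torch.__version__ >= '1.3':
    from .interpolate import *  # registers the plugin-based interpolate converter
else:
    warnings.warn('Interpolate plugin is not compatible with PyTorch < 1.3; skipping registration.')
```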

@jaybdub (Contributor, Author) commented Jun 10, 2020

Added a disclaimer to the README.

Also, I set enabled = ... and torch.__version__ >= '1.3' for the plugin-based interpolate converter and the relevant interpolate test cases.

@SrivastavaKshitij (Contributor) commented Jun 10, 2020

That's a very good idea! We can also add a test platform matrix. People use different combinations of PyTorch, torchvision, and TRT; a test platform matrix will give them an idea of which combinations have been tried and tested. We can add a Dockerfile for reference.

Something like this:

FROM nvcr.io/nvidia/tensorrt:19.07-py3
RUN apt-get update
RUN pip install torch==1.4.0 torchvision==0.5.0

RUN git clone --recursive https://github.com/NVIDIA-AI-IOT/torch2trt.git /sw/torch2trt && \
    cd /sw/torch2trt && \
    python setup.py build_ext --inplace

RUN pip install termcolor graphviz

This will help the community test their environments easily.

@SrivastavaKshitij (Contributor)

Final Results

| NGC Container | TRT Version | PyTorch Version | Status |
|---|---|---|---|
| PyTorch 19.07 | 5.1.5 | 1.2.0a0 | ✔️ |
| PyTorch 19.12 | 6.0.1 | 1.4.0a0+a5b4d78 | ✔️ |
| PyTorch 20.03 | 7.0.0 | 1.5.0a0+8f84ded | ✔️ |
| Custom container | 5.1 | PyTorch 1.4, torchvision 0.5 | ✔️ |

@jaybdub (Contributor, Author) commented Jun 10, 2020

Thanks! This gives some confidence; I'll consider adding a test matrix.

It might be useful to log warnings like you suggested, but we can probably save that for another, smaller PR.

A Dockerfile would also be great. Jetson platforms now heavily support cloud-native integration, so it could be used there as well.

I'd like to get this PR merged first, but then it would be great to consider these other features.

@SrivastavaKshitij (Contributor)

That's perfect. I think this PR is ready to be merged :-)

@SrivastavaKshitij (Contributor) commented Jun 10, 2020

[DO NOT MERGE]: I ran the build again and I can't get the plugins to build. I don't know what broke.

Now:

Step 6/7 : RUN git clone --recursive https://github.com/NVIDIA-AI-IOT/torch2trt.git /sw/torch2trt &&     cd /sw/torch2trt &&     git fetch origin pull/332/head:PR332 &&     git checkout PR332 &&     python setup.py build_ext --inplace
 ---> Running in 4c2bd93fe0b4
Cloning into '/sw/torch2trt'...
From https://github.com/NVIDIA-AI-IOT/torch2trt
 * [new ref]         refs/pull/332/head -> PR332
Switched to branch 'PR332'
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
running build_ext

Earlier:

Step 6/7 : RUN git clone --recursive https://github.com/NVIDIA-AI-IOT/torch2trt.git /sw/torch2trt &&     cd /sw/torch2trt &&     git fetch origin pull/332/head:PR332 &&     git checkout PR332 &&     python setup.py build_ext --inplace
 ---> Running in 7c200b4d45c4                 
Cloning into '/sw/torch2trt'...                       
From https://github.com/NVIDIA-AI-IOT/torch2trt                                                                                                                                                                                                                                  
 * [new ref]         refs/pull/332/head -> PR332                                                                                                                                                                                                                                 
Switched to branch 'PR332'                                                                                                                            
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'                                    
running build_ext                                                                                                                    
building 'plugins' extension                                                                                                            
creating /sw/torch2trt/build                             
creating /sw/torch2trt/build/temp.linux-x86_64-3.6     
creating /sw/torch2trt/build/temp.linux-x86_64-3.6/torch2trt                                        
creating /sw/torch2trt/build/temp.linux-x86_64-3.6/torch2trt/plugins                              
Emitting ninja build file /sw/torch2trt/build/temp.linux-x86_64-3.6/build.ninja...                       
Compiling objects...                                                                                            
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] c++ -MMD -MF /sw/torch2trt/build/temp.linux-x86_64-3.6/torch2trt/plugins/interpolate.o.d -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/include/aarch64-linux-gnu -I/opt/conda/lib/python3.6/site-packages/torch/include -I/opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.6/site-packages/torch/include/TH -I/opt/conda/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.6m -c -c /sw/torch2trt/torch2trt/plugins/interpolate.cpp -o /sw/torch2trt/build/temp.linux-x86_64-3.6/torch2trt/plugins/interpolate.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=plugins -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/utils.h:5:0,
                 from /opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn.h:10,  
                 from /opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,                    
                 from /opt/conda/lib/python3.6/site-packages/torch/include/torch/extension.h:4,                                  
                 from /sw/torch2trt/torch2trt/plugins/interpolate.cpp:1:                                                             
/opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/utils/rnn.h: In function ‘torch::nn::utils::rnn::PackedSequence torch::nn::utils::rnn::pack_sequence(c10::ArrayRef<at::Tensor>, bool)’:
/opt/conda/lib/python3.6/site-packages/torch/include/torch/csrc/api/include/torch/nn/utils/rnn.h:336:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   for (int64_t i = 0; i < sequences.size(); i++) {
                       ~~^~~~~~~~~~~~~~~~~~
/sw/torch2trt/torch2trt/plugins/interpolate.cpp: In member function ‘virtual nvinfer1::Dims torch2trt::InterpolatePlugin::getOutputDimensions(int, const nvinfer1::Dims*, int)’:
/sw/torch2trt/torch2trt/plugins/interpolate.cpp:123:23: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (int i = 0; i < size.size(); i++) {
                     ~~^~~~~~~~~~~~~
/sw/torch2trt/torch2trt/plugins/interpolate.cpp: In member function ‘virtual bool torch2trt::InterpolatePlugin::supportsFormat(nvinfer1::DataType, nvinfer1::PluginFormat) const’:
/sw/torch2trt/torch2trt/plugins/interpolate.cpp:131:33: warning: ‘kNCHW’ is deprecated [-Wdeprecated-declarations]
     if (format != PluginFormat::kNCHW) {
                                 ^~~~~
In file included from /usr/include/x86_64-linux-gnu/NvInferRuntime.h:59:0,
                 from /usr/include/x86_64-linux-gnu/NvInfer.h:53,
                 from /sw/torch2trt/torch2trt/plugins/interpolate.cpp:6:
/usr/include/x86_64-linux-gnu/NvInferRuntimeCommon.h:243:5: note: declared here
     kNCHW TRT_DEPRECATED_ENUM = kLINEAR, //! <-- Deprecated, used for backward compatibility
     ^~~~~
/sw/torch2trt/torch2trt/plugins/interpolate.cpp:131:33: warning: ‘kNCHW’ is deprecated [-Wdeprecated-declarations]
     if (format != PluginFormat::kNCHW) {
                                 ^~~~~
In file included from /usr/include/x86_64-linux-gnu/NvInferRuntime.h:59:0,
                 from /usr/include/x86_64-linux-gnu/NvInfer.h:53,
                 from /sw/torch2trt/torch2trt/plugins/interpolate.cpp:6:
/usr/include/x86_64-linux-gnu/NvInferRuntimeCommon.h:243:5: note: declared here
     kNCHW TRT_DEPRECATED_ENUM = kLINEAR, //! <-- Deprecated, used for backward compatibility
     ^~~~~
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/torch2trt
g++ -pthread -shared -B /opt/conda/compiler_compat -L/opt/conda/lib -Wl,-rpath=/opt/conda/lib -Wl,--no-as-needed -Wl,--sysroot=/ /sw/torch2trt/build/temp.linux-x86_64-3.6/torch2trt/plugins/interpolate.o -L/usr/lib/aarch64-linux-gnu -L/opt/conda/lib/python3.6/site-packages/torch/lib -L/usr/local/cuda/lib64 -lnvinfer -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-3.6/torch2trt/plugins.cpython-36m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.6/torch2trt/plugins.cpython-36m-x86_64-linux-gnu.so -> torch2trt

@jaybdub (Contributor, Author) commented Jun 10, 2020

Which system configuration is this? I only made a small change since the last message, but it shouldn't affect building.

@SrivastavaKshitij (Contributor) commented Jun 10, 2020

Ohhhh! It broke after 57c8188. My Docker intermediate images were cached, hence I didn't catch it. This time I ran with --no-cache and bisected all the commits; 57c8188 is the commit that broke the plugin. Sorry for the oversight.

@SrivastavaKshitij (Contributor) commented Jun 10, 2020

Makes sense.

setup(
    name='torch2trt',
    version='0.1.0',
    description='An easy to use PyTorch to TensorRT converter',
    packages=find_packages(),
    ext_package='torch2trt',
    ext_modules=ext_modules,
    cmdclass={'build_ext': BuildExtension}
)

and ext_modules = [] on line 12. I think it's not appending properly (see the sketch below).
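
For reference, a sketch of the flag handling that would feed the setup() call above (details assumed, not the PR's exact code):

```python
import sys
from torch.utils.cpp_extension import CppExtension

ext_modules = []

if '--plugins' in sys.argv:
    sys.argv.remove('--plugins')  # setuptools does not recognize this flag
    ext_modules.append(
        CppExtension(
            'plugins',
            ['torch2trt/plugins/interpolate.cpp'],
            libraries=['nvinfer'],  # link TensorRT, as in the g++ line in the log above
        )
    )
```

With the flag omitted, ext_modules stays empty and build_ext has nothing to compile.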

@SrivastavaKshitij (Contributor)

There is no problem in the workflow.

@jaybdub: Sorry John, my bad; there was confusion on my side. I didn't add the --plugins flag.

When I ran the following command, it worked:
python setup.py build_ext --inplace --plugins

So after 57c8188 I was supposed to add the --plugins flag, but I didn't, and the Docker images were cached, so I never got an error. When I rebuilt the Dockerfile with --no-cache, I got the error and thought the workflow was broken.

@jaybdub (Contributor, Author) commented Jun 10, 2020

Did you add --plugins to the setup.py call?

Judging from the error you sent, it seems like it's attempting to build the plugin.

My guess would be that it's a linking / include-directory issue in the extension.

Right now it's hard-coded to add the TensorRT paths for Jetson platforms. Maybe your Docker environment was already set up, so it didn't need this.

Can you verify whether this is the issue?

If so, we can probably search for the correct path in setup.py, or, if it can't be found, allow the user to pass the path using flags (see the sketch below).
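
Something like this, perhaps (a hypothetical sketch; the candidate paths come from the build logs above):

```python
import os

def find_trt_include(candidates):
    # Return the first directory that actually contains the TensorRT headers.
    for d in candidates:
        if os.path.exists(os.path.join(d, 'NvInfer.h')):
            return d
    return None  # caller could fall back to a user-supplied flag (hypothetical)

trt_include = find_trt_include([
    '/usr/include/aarch64-linux-gnu',  # Jetson, the currently hard-coded path
    '/usr/include/x86_64-linux-gnu',   # x86 .deb install, as seen in your log
])
```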

@jaybdub (Contributor, Author) commented Jun 10, 2020

Ah, I see. So just to confirm: building with --plugins resolved the issue, and the test matrix is what you listed previously (all succeed)?

@SrivastavaKshitij (Contributor) commented Jun 10, 2020

@jaybdub: I ran the build command again for all the combinations in the test matrix and also ran the unit tests. Everything is fine. Sorry for the false alarm!

@jaybdub (Contributor, Author) commented Jun 10, 2020

No worries, good to hear!

Going to do a few sanity tests on torchvision models and then merge.

@jaybdub jaybdub merged commit 1f66266 into master Jun 10, 2020
@jaybdub jaybdub deleted the SrivastavaKshitij-new_trt_ops branch June 10, 2020 04:09
jaybdub added a commit that referenced this pull request Jun 28, 2021