Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Show PyTorch code of unsupported operators #521

Conversation

lamhoangtung
Copy link
Contributor

@lamhoangtung lamhoangtung commented Jul 1, 2021

Signed-off-by: lamhoangtung lamhoangtung.vz@gmail.com

Description

Inspired by my problem in #511, I'm trying to enable TensorRT for a TorchScript model and getting a bunch of Unsupported operators. I'm willing to change the implementation to avoid those unsupported operators or even trying to add support for it but struggling to find which line of code in my model are causing it.

Thanks to @narendasan guidance. I have tried to add a similar traceback feature like TorchScript where TRTorch will point to the exact line of PyTorch that cause the unsupported operators.

So instead of showing a vague operator schema like this:

ERROR: [TRTorch] - Method requested cannot be compiled by TRTorch.
Unsupported operators listed below:
  -  aten::__contains__.str_list(str[] l, str item) -> (bool)
  -  ...
You can either implement converters for these ops in your application or request implementation
https://www.github.com/nvidia/TRTorch/issues

TRTorch should show the exact line of PyTorch that cause the unsupported operators like this:

ERROR: [TRTorch] - Method requested cannot be compiled by TRTorch.
Unsupported operators listed below:
  - aten::__contains__.str_list(str[] l, str item) -> (bool)
    Related PyTorch code:
  File "/home/techainer/anaconda3/envs/test_pytorch/lib/python3.8/site-packages/torch/nn/functional.py", line 3446
        )

    if mode in ("nearest", "area"):
       ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        if align_corners is not None:
            raise ValueError(
  - ...
You can either implement converters for these ops in your application or request implementation
https://www.github.com/nvidia/TRTorch/issues

So user can have additional context to the problem.

This should fix #511

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project (I used the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation: I reckon there is no documentation change required for this feature
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes

Signed-off-by: lamhoangtung <lamhoangtung.vz@gmail.com>
Copy link
Collaborator

@narendasan narendasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on! I'm sure its going to be incredibly useful!

It might be nice to keep the straight list of the operators that are not supported, which is especially helpful in characterizing the effort to support a model, then an op by op breakdown further down in the message since there might be the case of multiple instances of these operators in the code base. Then users will see all instances they will need to address.

Implementation wise I think it might be better to instead of changing GetUnsupportedOpsInBlock, to return std::set<std::pair<std::string, std::string>> it should probably return a set of torch::jit::FunctionSchemas or the FunctionScema::operator_name()s, so it is more reusable in the code base. We might want to use GetUnsupportedOpsInBlock for other things in the future. The VerifyConverterSupportForBlock should take it from schema to string and create the message and for the source locations basically we should add an additional message to the end of the unsupported_msg which is created by iterating through the block checking if nodes match any schema in the unsupported set and printing the op and source locations. You can compare the schemas by comparing there schema->operator_name()s

So the new message could look like:

ERROR: [TRTorch] - Method requested cannot be compiled by TRTorch.
Unsupported operators listed below:
  - aten::__contains__.str_list(str[] l, str item) -> (bool)
  - aten::flatten(....

You can either implement converters for these ops in your application or request implementation
https://www.github.com/nvidia/TRTorch/issues

In Module: 
  Unsupported operator:  aten::__contains__.str_list(str[] l, str item) -> (bool)
  File "/home/techainer/anaconda3/envs/test_pytorch/lib/python3.8/site-packages/torch/nn/functional.py", line 3446
        )

    if mode in ("nearest", "area"):
       ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        if align_corners is not None:
            raise ValueError(
  
   Unsupported operator: aten::flatten(....)
   File "/home/techainer/test.py", line 10
    torch.flatten(x) 
    ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    
   Unsupported operator:  aten::__contains__.str_list(str[] l, str item) -> (bool)
   File "/home/techainer/anaconda3/envs/test_pytorch/lib/python3.8/site-packages/torch/nn/functional.py", line 4246
        )

    if mode in ("bicubic", "nearest"):
       ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        if align_corners is not None:
            raise ValueError(

An additional nice little detail would be to throw the red error prefix we use for errors in front of each instance of the unsupported operator so its easy to separate each instance similar to like youd see in like gcc.

@lamhoangtung
Copy link
Contributor Author

Thanks for the review @narendasan. I will try to update and get back to you latter.

@lamhoangtung
Copy link
Contributor Author

lamhoangtung commented Jul 12, 2021

Hi @narendasan. I'm trying to make GetUnsupportedOpsInBlock return std::set<torch::jit::FunctionSchema> like this:

std::set<torch::jit::FunctionSchema> GetUnsupportedOpsInBlock(const torch::jit::Block* b) {
  std::set<torch::jit::FunctionSchema> unsupported_ops;
  for (const auto n : b->nodes()) {
    if (n->kind() != torch::jit::prim::Loop && n->kind() != torch::jit::prim::If && !OpSupported(n)) {
      auto schema = n->maybeSchema();
      TRTORCH_CHECK(
          schema,
          "Unable to get schema for Node " << util::node_info(n) << " (conversion.VerifyCoverterSupportForBlock)");
      unsupported_ops.insert((const torch::jit::FunctionSchema)*schema);
    }
    for (const auto sub_b : n->blocks()) {
      auto sub_b_unsupported_ops = GetUnsupportedOpsInBlock(sub_b);
      unsupported_ops.insert(sub_b_unsupported_ops.begin(), sub_b_unsupported_ops.end());
    }
  }
  return unsupported_ops;
}

But I've been stuck because of this error:

In file included from /usr/include/c++/9/string:48,
                 from /usr/include/c++/9/bits/locale_classes.h:40,
                 from /usr/include/c++/9/bits/ios_base.h:41,
                 from /usr/include/c++/9/ios:42,
                 from /usr/include/c++/9/istream:38,
                 from /usr/include/c++/9/sstream:38,
                 from core/conversion/conversion.cpp:1:
/usr/include/c++/9/bits/stl_function.h: In instantiation of 'constexpr bool std::less<_Tp>::operator()(const _Tp&, const _Tp&) const [with _Tp = c10::FunctionSchema]':                                                                       /usr/include/c++/9/bits/stl_tree.h:2095:11:   required from 'std::pair<std::_Rb_tree_node_base*, std::_Rb_tree_node_base*> std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_get_insert_unique_pos(const key_type&) [with _Key = c1
0::FunctionSchema; _Val = c10::FunctionSchema; _KeyOfValue = std::_Identity<c10::FunctionSchema>; _Compare = std::less<c10::FunctionSchema>; _Alloc = std::allocator<c10::FunctionSchema>; std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _A
lloc>::key_type = c10::FunctionSchema]'
/usr/include/c++/9/bits/stl_tree.h:2148:4:   required from 'std::pair<std::_Rb_tree_iterator<_Val>, bool> std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_insert_unique(_Arg&&) [with _Arg = const c10::FunctionSchema&; _Key = c
10::FunctionSchema; _Val = c10::FunctionSchema; _KeyOfValue = std::_Identity<c10::FunctionSchema>; _Compare = std::less<c10::FunctionSchema>; _Alloc = std::allocator<c10::FunctionSchema>]'
/usr/include/c++/9/bits/stl_set.h:511:48:   required from 'std::pair<typename std::_Rb_tree<_Key, _Key, std::_Identity<_Tp>, _Compare, typename __gnu_cxx::__alloc_traits<_Alloc>::rebind<_Key>::other>::const_iterator, bool> std::set<_Key,
_Compare, _Alloc>::insert(const value_type&) [with _Key = c10::FunctionSchema; _Compare = std::less<c10::FunctionSchema>; _Alloc = std::allocator<c10::FunctionSchema>; typename std::_Rb_tree<_Key, _Key, std::_Identity<_Tp>, _Compare, type
name __gnu_cxx::__alloc_traits<_Alloc>::rebind<_Key>::other>::const_iterator = std::_Rb_tree_const_iterator<c10::FunctionSchema>; std::set<_Key, _Compare, _Alloc>::value_type = c10::FunctionSchema]'
core/conversion/conversion.cpp:439:71:   required from here
/usr/include/c++/9/bits/stl_function.h:386:20: error: no match for 'operator<' (operand types are 'const c10::FunctionSchema' and 'const c10::FunctionSchema')
  386 |       { return __x < __y; }
...

And not really know how to get around this ... Can you help me out here ? Thanks

@narendasan
Copy link
Collaborator

Seems like the issue is there is no comparison operator implemented for the torch::jit::FunctionSchema type. Maybe its worth trying out torch::jit::FunctionSchema::operator_name which we use for comparison in the evaluator and converter libraries.

@lamhoangtung
Copy link
Contributor Author

lamhoangtung commented Jul 14, 2021

Thanks for the help @narendasan. I also tried torch::jit:FunctionSchema::operator_name, which is c10::OperatorName and the code ended up like this:

std::set<c10::OperatorName> GetUnsupportedOpsInBlock(const torch::jit::Block* b) {
  std::set<c10::OperatorName>unsupported_ops;
  for (const auto n : b->nodes()) {
    if (n->kind() != torch::jit::prim::Loop && n->kind() != torch::jit::prim::If && !OpSupported(n)) {
      auto schema = n->maybeSchema();
      TRTORCH_CHECK(
          schema,
          "Unable to get schema for Node " << util::node_info(n) << " (conversion.VerifyCoverterSupportForBlock)");
      unsupported_ops.insert(schema->operator_name());
    }
    for (const auto sub_b : n->blocks()) {
      auto sub_b_unsupported_ops = GetUnsupportedOpsInBlock(sub_b);
      unsupported_ops.insert(sub_b_unsupported_ops.begin(), sub_b_unsupported_ops.end());
    }
  }
  return unsupported_ops;
}

But looks like i'm getting the same problem:

In file included from /usr/include/c++/9/string:48,
                 from /usr/include/c++/9/bits/locale_classes.h:40,
                 from /usr/include/c++/9/bits/ios_base.h:41,
                 from /usr/include/c++/9/ios:42,
                 from /usr/include/c++/9/istream:38,
                 from /usr/include/c++/9/sstream:38,
                 from core/conversion/conversion.cpp:1:
/usr/include/c++/9/bits/stl_function.h: In instantiation of 'constexpr bool std::less<_Tp>::operator()(const _Tp&, const _Tp&) const [with _Tp = c10::OperatorName]':
/usr/include/c++/9/bits/stl_tree.h:2095:11:   required from 'std::pair<std::_Rb_tree_node_base*, std::_Rb_tree_node_base*> std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_get_insert_unique_pos(const key_type&) [with _Key = c10::OperatorName; _Val = c10::OperatorName; _KeyOfValue = std::_Identity<c10::OperatorName>; _Compare = std::less<c10::OperatorName>; _Alloc = std::allocator<c10::OperatorName>; std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::key_type = c10::OperatorName]'
/usr/include/c++/9/bits/stl_tree.h:2148:4:   required from 'std::pair<std::_Rb_tree_iterator<_Val>, bool> std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_insert_unique(_Arg&&) [with _Arg = const c10::OperatorName&; _Key = c10::OperatorName; _Val = c10::OperatorName; _KeyOfValue = std::_Identity<c10::OperatorName>; _Compare = std::less<c10::OperatorName>; _Alloc = std::allocator<c10::OperatorName>]'
/usr/include/c++/9/bits/stl_set.h:511:48:   required from 'std::pair<typename std::_Rb_tree<_Key, _Key, std::_Identity<_Tp>, _Compare, typename __gnu_cxx::__alloc_traits<_Alloc>::rebind<_Key>::other>::const_iterator, bool> std::set<_Key, _Compare, _Alloc>::insert(const value_type&) [with _Key = c10::OperatorName; _Compare = std::less<c10::OperatorName>; _Alloc = std::allocator<c10::OperatorName>; typename std::_Rb_tree<_Key, _Key, std::_Identity<_Tp>, _Compare, typename __gnu_cxx::__alloc_traits<_Alloc>::rebind<_Key>::other>::const_iterator = std::_Rb_tree_const_iterator<c10::OperatorName>; std::set<_Key, _Compare, _Alloc>::value_type = c10::OperatorName]'
core/conversion/conversion.cpp:439:53:   required from here
/usr/include/c++/9/bits/stl_function.h:386:20: error: no match for 'operator<' (operand types are 'const c10::OperatorName' and 'const c10::OperatorName')
  386 |       { return __x < __y; }
      |                ~~~~^~~~~
...

I think this is due to std::set require it's member to have comparison operator. So I tried std::vector and it indeed worked. But using std::vector will cause a lot of duplicated operator, so it quickly consume way to much memory and take a tons of time to iterate. I think we might have these fews option:

  1. Kept using std::string ?
  2. Add comparison operator for torch::jit::FunctionSchema or c10::OperatorName (I don't want to mess with libtorch internal yet)
  3. Use something else other than std::set, don't require comparison operator for it element but still allow to filter multiple identical values.

Any suggestion @narendasan ?

@narendasan
Copy link
Collaborator

narendasan commented Jul 14, 2021

Yeah maybe we could switch to a map. Something like std::unordered_map<c10::OperatorName, std::string> where the key is operator name and the value is the schema string

Signed-off-by: lamhoangtung <lamhoangtung.vz@gmail.com>
…OperatorName, std::string>

Signed-off-by: lamhoangtung <lamhoangtung.vz@gmail.com>
@lamhoangtung
Copy link
Contributor Author

Thanks for the suggestion @narendasan. First I thought that std::unordered_set<c10::OperatorName> might be enough but soon realized that I really need std::unordered_map<c10::OperatorName, std::string>. How silly I am 🤣

Everything is working as expected now. Can you take a look again ?

@lamhoangtung
Copy link
Contributor Author

The CI for C++ Linting are failing with:

Run docker exec cpplinter bash -c "cd /workspace && python3 /workspace/.github/scripts/run_cpp_linter.py"
  docker exec cpplinter bash -c "cd /workspace && python3 /workspace/.github/scripts/run_cpp_linter.py"
  shell: /usr/bin/bash -e {0}
Unable to submit in depth review, please review logs for linting issues
Error: Process completed with exit code 1.

Labeler are also failing with:

Run actions/labeler@v2
Error: HttpError: Resource not accessible by integration
Error: Resource not accessible by integration

I don't know which cause this and cannot see any linting logs. Can you help me @narendasan ?

Signed-off-by: lamhoangtung <lamhoangtung.vz@gmail.com>
@lamhoangtung
Copy link
Contributor Author

I've found the problem and pushed a fix to resolve the C++ linting issue. It should pass the CI now

@lamhoangtung
Copy link
Contributor Author

lamhoangtung commented Jul 17, 2021

Only the Labeler are failing but in my research that's due to Github Actions does not allow it to run from fork of outside collaborator. So all the CI check that needed to pass from my part have been passed. Can you take a look and merge this @narendasan. Thanks ^^

@lamhoangtung
Copy link
Contributor Author

I hope you haven't forgotten about this @narendasan :D

@narendasan
Copy link
Collaborator

Nope, sorry the final review is taking so long, probably will get to it by the end of the week

Copy link
Collaborator

@narendasan narendasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks basically ready to go once you make the change to LOG_ERROR

core/conversion/conversion.cpp Outdated Show resolved Hide resolved
Signed-off-by: Naren Dasan <naren@narendasan.com>
Signed-off-by: Naren Dasan <narens@nvidia.com>
@narendasan
Copy link
Collaborator

I opened a PR with the patch onto your branch lamhoangtung#1

…_traceback

Adding error prefix to pytorch traceback
@lamhoangtung
Copy link
Contributor Author

Thanks for the patch @narendasan

Copy link
Collaborator

@narendasan narendasan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, looks good to go! Thanks for working on this, Im sure its going to be useful to a lot of people!

@narendasan narendasan merged commit 7113cbe into pytorch:master Jul 22, 2021
@lamhoangtung
Copy link
Contributor Author

Thanks for helping me implement this @narendasan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

❓ [Question] How can I trace the code that causing Unsupported operators
2 participants