feat: Show PyTorch code of unsupported operators #521
Conversation
Signed-off-by: lamhoangtung <lamhoangtung.vz@gmail.com>
Thanks for taking this on! I'm sure it's going to be incredibly useful!
It might be nice to keep the flat list of unsupported operators, which is especially helpful for characterizing the effort needed to support a model, and then add an op-by-op breakdown further down in the message, since there may be multiple instances of these operators in the code base. That way users will see all the instances they need to address.
Implementation-wise, instead of changing GetUnsupportedOpsInBlock to return std::set<std::pair<std::string, std::string>>, it should probably return a set of torch::jit::FunctionSchemas or of FunctionSchema::operator_name()s, so it is more reusable in the code base; we might want to use GetUnsupportedOpsInBlock for other things in the future. VerifyConverterSupportForBlock should then convert from schema to string and create the message. For the source locations, we should basically append an additional message to the end of the unsupported_msg, created by iterating through the block, checking whether each node matches any schema in the unsupported set, and printing the op and its source location. You can compare the schemas by comparing their schema->operator_name()s.
So the new message could look like:
```
ERROR: [TRTorch] - Method requested cannot be compiled by TRTorch.
Unsupported operators listed below:
  - aten::__contains__.str_list(str[] l, str item) -> (bool)
  - aten::flatten(....
You can either implement converters for these ops in your application or request implementation
https://www.github.com/nvidia/TRTorch/issues

In Module:
Unsupported operator: aten::__contains__.str_list(str[] l, str item) -> (bool)
  File "/home/techainer/anaconda3/envs/test_pytorch/lib/python3.8/site-packages/torch/nn/functional.py", line 3446
        )
    if mode in ("nearest", "area"):
       ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        if align_corners is not None:
            raise ValueError(

Unsupported operator: aten::flatten(....)
  File "/home/techainer/test.py", line 10
    torch.flatten(x)
    ~~~~~~~~~~~~~~~~~~~~~ <--- HERE

Unsupported operator: aten::__contains__.str_list(str[] l, str item) -> (bool)
  File "/home/techainer/anaconda3/envs/test_pytorch/lib/python3.8/site-packages/torch/nn/functional.py", line 4246
        )
    if mode in ("bicubic", "nearest"):
       ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        if align_corners is not None:
            raise ValueError(
```
An additional nice little detail would be to put the red error prefix we use for errors in front of each instance of the unsupported operator, so it's easy to separate the instances, similar to what you'd see in gcc.
Thanks for the review @narendasan. I will try to update and get back to you later.
Hi @narendasan. I'm trying to make:

```cpp
std::set<torch::jit::FunctionSchema> GetUnsupportedOpsInBlock(const torch::jit::Block* b) {
  std::set<torch::jit::FunctionSchema> unsupported_ops;
  for (const auto n : b->nodes()) {
    if (n->kind() != torch::jit::prim::Loop && n->kind() != torch::jit::prim::If && !OpSupported(n)) {
      auto schema = n->maybeSchema();
      TRTORCH_CHECK(
          schema,
          "Unable to get schema for Node " << util::node_info(n) << " (conversion.VerifyCoverterSupportForBlock)");
      unsupported_ops.insert((const torch::jit::FunctionSchema)*schema);
    }
    for (const auto sub_b : n->blocks()) {
      auto sub_b_unsupported_ops = GetUnsupportedOpsInBlock(sub_b);
      unsupported_ops.insert(sub_b_unsupported_ops.begin(), sub_b_unsupported_ops.end());
    }
  }
  return unsupported_ops;
}
```

But I've been stuck because of this error:

```
In file included from /usr/include/c++/9/string:48,
                 from /usr/include/c++/9/bits/locale_classes.h:40,
                 from /usr/include/c++/9/bits/ios_base.h:41,
                 from /usr/include/c++/9/ios:42,
                 from /usr/include/c++/9/istream:38,
                 from /usr/include/c++/9/sstream:38,
                 from core/conversion/conversion.cpp:1:
/usr/include/c++/9/bits/stl_function.h: In instantiation of 'constexpr bool std::less<_Tp>::operator()(const _Tp&, const _Tp&) const [with _Tp = c10::FunctionSchema]':
/usr/include/c++/9/bits/stl_tree.h:2095:11: required from 'std::pair<std::_Rb_tree_node_base*, std::_Rb_tree_node_base*> std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_get_insert_unique_pos(const key_type&) [with _Key = c10::FunctionSchema; _Val = c10::FunctionSchema; _KeyOfValue = std::_Identity<c10::FunctionSchema>; _Compare = std::less<c10::FunctionSchema>; _Alloc = std::allocator<c10::FunctionSchema>; std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::key_type = c10::FunctionSchema]'
/usr/include/c++/9/bits/stl_tree.h:2148:4: required from 'std::pair<std::_Rb_tree_iterator<_Val>, bool> std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_insert_unique(_Arg&&) [with _Arg = const c10::FunctionSchema&; _Key = c10::FunctionSchema; _Val = c10::FunctionSchema; _KeyOfValue = std::_Identity<c10::FunctionSchema>; _Compare = std::less<c10::FunctionSchema>; _Alloc = std::allocator<c10::FunctionSchema>]'
/usr/include/c++/9/bits/stl_set.h:511:48: required from 'std::pair<typename std::_Rb_tree<_Key, _Key, std::_Identity<_Tp>, _Compare, typename __gnu_cxx::__alloc_traits<_Alloc>::rebind<_Key>::other>::const_iterator, bool> std::set<_Key, _Compare, _Alloc>::insert(const value_type&) [with _Key = c10::FunctionSchema; _Compare = std::less<c10::FunctionSchema>; _Alloc = std::allocator<c10::FunctionSchema>; typename std::_Rb_tree<_Key, _Key, std::_Identity<_Tp>, _Compare, typename __gnu_cxx::__alloc_traits<_Alloc>::rebind<_Key>::other>::const_iterator = std::_Rb_tree_const_iterator<c10::FunctionSchema>; std::set<_Key, _Compare, _Alloc>::value_type = c10::FunctionSchema]'
core/conversion/conversion.cpp:439:71: required from here
/usr/include/c++/9/bits/stl_function.h:386:20: error: no match for 'operator<' (operand types are 'const c10::FunctionSchema' and 'const c10::FunctionSchema')
  386 |       { return __x < __y; }
```

I don't really know how to get around this. Can you help me out here? Thanks
Seems like the issue is that there is no comparison operator implemented for the `c10::FunctionSchema` type.
Thanks for the help @narendasan. I also tried:

```cpp
std::set<c10::OperatorName> GetUnsupportedOpsInBlock(const torch::jit::Block* b) {
  std::set<c10::OperatorName> unsupported_ops;
  for (const auto n : b->nodes()) {
    if (n->kind() != torch::jit::prim::Loop && n->kind() != torch::jit::prim::If && !OpSupported(n)) {
      auto schema = n->maybeSchema();
      TRTORCH_CHECK(
          schema,
          "Unable to get schema for Node " << util::node_info(n) << " (conversion.VerifyCoverterSupportForBlock)");
      unsupported_ops.insert(schema->operator_name());
    }
    for (const auto sub_b : n->blocks()) {
      auto sub_b_unsupported_ops = GetUnsupportedOpsInBlock(sub_b);
      unsupported_ops.insert(sub_b_unsupported_ops.begin(), sub_b_unsupported_ops.end());
    }
  }
  return unsupported_ops;
}
```

But it looks like I'm getting the same problem:

```
In file included from /usr/include/c++/9/string:48,
                 from /usr/include/c++/9/bits/locale_classes.h:40,
                 from /usr/include/c++/9/bits/ios_base.h:41,
                 from /usr/include/c++/9/ios:42,
                 from /usr/include/c++/9/istream:38,
                 from /usr/include/c++/9/sstream:38,
                 from core/conversion/conversion.cpp:1:
/usr/include/c++/9/bits/stl_function.h: In instantiation of 'constexpr bool std::less<_Tp>::operator()(const _Tp&, const _Tp&) const [with _Tp = c10::OperatorName]':
/usr/include/c++/9/bits/stl_tree.h:2095:11: required from 'std::pair<std::_Rb_tree_node_base*, std::_Rb_tree_node_base*> std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_get_insert_unique_pos(const key_type&) [with _Key = c10::OperatorName; _Val = c10::OperatorName; _KeyOfValue = std::_Identity<c10::OperatorName>; _Compare = std::less<c10::OperatorName>; _Alloc = std::allocator<c10::OperatorName>; std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::key_type = c10::OperatorName]'
/usr/include/c++/9/bits/stl_tree.h:2148:4: required from 'std::pair<std::_Rb_tree_iterator<_Val>, bool> std::_Rb_tree<_Key, _Val, _KeyOfValue, _Compare, _Alloc>::_M_insert_unique(_Arg&&) [with _Arg = const c10::OperatorName&; _Key = c10::OperatorName; _Val = c10::OperatorName; _KeyOfValue = std::_Identity<c10::OperatorName>; _Compare = std::less<c10::OperatorName>; _Alloc = std::allocator<c10::OperatorName>]'
/usr/include/c++/9/bits/stl_set.h:511:48: required from 'std::pair<typename std::_Rb_tree<_Key, _Key, std::_Identity<_Tp>, _Compare, typename __gnu_cxx::__alloc_traits<_Alloc>::rebind<_Key>::other>::const_iterator, bool> std::set<_Key, _Compare, _Alloc>::insert(const value_type&) [with _Key = c10::OperatorName; _Compare = std::less<c10::OperatorName>; _Alloc = std::allocator<c10::OperatorName>; typename std::_Rb_tree<_Key, _Key, std::_Identity<_Tp>, _Compare, typename __gnu_cxx::__alloc_traits<_Alloc>::rebind<_Key>::other>::const_iterator = std::_Rb_tree_const_iterator<c10::OperatorName>; std::set<_Key, _Compare, _Alloc>::value_type = c10::OperatorName]'
core/conversion/conversion.cpp:439:53: required from here
/usr/include/c++/9/bits/stl_function.h:386:20: error: no match for 'operator<' (operand types are 'const c10::OperatorName' and 'const c10::OperatorName')
  386 |       { return __x < __y; }
      |                ~~~~^~~~~
```

I think this is due to

Any suggestions @narendasan?
Yeah, maybe we could switch to a map. Something like
Signed-off-by: lamhoangtung <lamhoangtung.vz@gmail.com>
…OperatorName, std::string> Signed-off-by: lamhoangtung <lamhoangtung.vz@gmail.com>
Thanks for the suggestion @narendasan. Everything is working as expected now. Can you take a look again?
The CI for C++ linting is failing with:

```
Run docker exec cpplinter bash -c "cd /workspace && python3 /workspace/.github/scripts/run_cpp_linter.py"
docker exec cpplinter bash -c "cd /workspace && python3 /workspace/.github/scripts/run_cpp_linter.py"
shell: /usr/bin/bash -e {0}
Unable to submit in depth review, please review logs for linting issues
Error: Process completed with exit code 1.
```

The labeler is also failing with:

```
Run actions/labeler@v2
Error: HttpError: Resource not accessible by integration
Error: Resource not accessible by integration
```

I don't know what causes these and cannot see any linting logs. Can you help me @narendasan?
Signed-off-by: lamhoangtung <lamhoangtung.vz@gmail.com>
I've found the problem and pushed a fix to resolve the C++ linting issue. It should pass CI now.
Only the labeler is failing, but from my research that's because GitHub Actions does not allow it to run from an outside collaborator's fork. So all the CI checks that needed to pass on my part have passed. Can you take a look and merge this @narendasan? Thanks ^^
I hope you haven't forgotten about this @narendasan :D
Nope, sorry the final review is taking so long, probably will get to it by the end of the week |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks basically ready to go once you make the change to LOG_ERROR
Signed-off-by: Naren Dasan <naren@narendasan.com> Signed-off-by: Naren Dasan <narens@nvidia.com>
I opened a PR with the patch onto your branch: lamhoangtung#1
…_traceback Adding error prefix to pytorch traceback
Thanks for the patch @narendasan
Cool, looks good to go! Thanks for working on this, I'm sure it's going to be useful to a lot of people!
Thanks for helping me implement this @narendasan
Signed-off-by: lamhoangtung <lamhoangtung.vz@gmail.com>
Description
Inspired by my problem in #511: I'm trying to enable TensorRT for a TorchScript model and getting a bunch of unsupported operators. I'm willing to change the implementation to avoid those unsupported operators, or even try to add support for them, but I'm struggling to find which lines of code in my model are causing them.
Thanks to @narendasan's guidance, I have tried to add a traceback feature similar to TorchScript's, where TRTorch will point to the exact line of PyTorch code that causes each unsupported operator.
So instead of showing a vague operator schema like this:
TRTorch should show the exact lines of PyTorch code that cause the unsupported operators, like this:
So users can have additional context for the problem.
This should fix #511
Type of change
Checklist: