Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BugFix] Unexpected crash by tricky parameters in TVM's Python API #8988

Closed
wants to merge 9 commits into from

Conversation

ganler
Copy link
Contributor

@ganler ganler commented Sep 11, 2021

Background

This is a fixing proposal to Discussion#10988 and #8979 .

Approach

This PR fixed unexpected C++ crash by:

  1. check nullptr before using operator-> (in this case, it must crash);
  2. allow None containers (Array) being capable of range-based for; (otherwise crash);
  3. add specific nullptr checking to other complicated cases (e.g., ICHECK(false) << ... << statement_causing_TVMError will result in crash);

This PR passed test cases contributed in #8979 (the crashes that can be trigger by giving None parameters in TVM's core python APIs)

Comparison with other options

According to the discussion, I also tried other proposals but don't think they are good approaches:
(1) NOTNULLABLE and tvm::runtime::Optional suggested by @junrushao1994 and @Hzfengsy ; -> still requires lots of code changes on legacy codes;
(2) Add checkers for all those .set_body* statements; -> not fundamental that checkers will scale when new interface coming in;

This PR fixed a fundamental issue causing invalid memory access crashes: avoid nullptr objects using -> operator because it must result in program crash; It only requires ~40 lines of edit and very few future engineering (since the fundamental issue is resolved).

Known issue

Statements like ICHECK(false) << ... << statement_causing_TVMError will still crash so that we have to manually add new checkers before it. e.g., line 112 in BlockReadWriteDetector::operator().

@junrushao
Copy link
Member

I have to agree that failing, even on adversarially constructed cases by a fuzzer, is actual failing. Thanks for pointing this out 😃🙏

On the other hand, I feel this particular change to the object system might be a bit aggressive, given in TVM nullability by default is allowed by design, and enforcing nullptr checks every time might be contradictory with the assumptions in some pre-historic pieces of the codebase (I'm sure there isn't many though).

Just wanted to share some recent efforts on improving nullability related stuff:

However, the effort was paused because the support is not perfect for composite TVM objects, i.e. TVM objects that contain other TVM objects, and it's less prioritized task given it only helps with adversarially constructed cases, we haven't seen too much progress on this side.

I am happy to chat more if you are interested! More fundamental contribution is warmly welcome ❤️

@ganler
Copy link
Contributor Author

ganler commented Sep 11, 2021

@junrushao1994 Thanks for the comments.

Yes, they are just some tricky edge cases that users would not intentionally meet in most cases (though it is possible by API misuse). I think they are only useful in some extreme and rare security cases, e.g., some hostile server attack to model servers using tvm.

As you said, a more fundamental and elegant workaround would be applying Optional, though I found it needs to change countless legacy codebase (e.g., changing many APIs into Optional<T> and convert them into .value() style). This PR is like a quick fix avoiding most crash issues by such tricky inputs (since the root cause of most issues is that applying data access (->) through nullptr). And it only considers avoiding bad inputs from Python API.

Since the conclusion seems to be clear (I know you guys are busy and such issues are minor and should not be prioritized), I think we can close this PR and simply record this proposal on the channel. Always tyty for the instant reply and patience ❤️

Comment on lines +713 to +718
const ObjectName* operator->() const { \
auto ptr = static_cast<const ObjectName*>(data_.get()); \
ICHECK(nullptr != ptr) << "Calling `->` to <" #ObjectName ">(null)"; \
return ptr; \
} \
const ObjectName* get() const { return static_cast<const ObjectName*>(data_.get()); } \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TVM nullability by default is allowed by design, and enforcing nullptr checks every time might be contradictory with the assumptions...

BTW, I don't think it violates the assumption. Since here ptr->SomeThing is called, crash happens if it's a nullptr, given that the macro here is TVM_DEFINE_OBJECT_REF_METHODS not TVM_DEFINE_**NOTNULLABLE**_OBJECT_REF_METHODS. The checking here can help convert the crash into a readable exception and this can already fix 95% crash resulted from unintentional parameters.

For TVM_DEFINE_NOTNULLABLE_OBJECT_REF_METHODS objects, it is still the same (not violating the assumption by applying redundant checking). :-)

@ganler ganler closed this Sep 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants