[TVM][Bugfix] fix storage_rewrite bug when input is big #2580

zhiics · 2019-02-10T08:08:41Z

This PR fixes a bug that occurs when the input is large. For example, the size of a tensor in bits may exceed the maximum value of int32, which causes a core dump at runtime. The bug was identified with help from @yzhliu.
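
To make the failure mode concrete, here is a minimal Python sketch of the arithmetic. It is not the code in storage_rewrite.cc. It assumes a 16384 x 16384 tensor with 32-bit elements (the element width is an assumption), and the helper names `total_bits` and `wrap_int32` are hypothetical. Two's complement wrap-around is used only to illustrate one possible outcome of storing the value in a signed 32-bit integer.

```python
INT32_MAX = 2**31 - 1

def total_bits(shape, bits_per_element):
    """Size of a tensor in bits, computed with Python's unbounded ints."""
    n = bits_per_element
    for extent in shape:
        n *= extent
    return n

def wrap_int32(value):
    """Simulate storing `value` in a signed 32-bit integer (two's complement wrap)."""
    return (value + 2**31) % 2**32 - 2**31

# Assumed example: 16384 x 16384 elements, 32 bits each.
bits = total_bits((16384, 16384), 32)
print(bits, bits > INT32_MAX, wrap_int32(bits))
```

Under these assumptions the true size is 2^33 bits, which no longer fits in int32, so any allocation size derived from the 32-bit value would be wrong.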

@yzhliu @tqchen @wweic please review

wweic · 2019-02-10T18:53:47Z

vta/tests/python/unittest/test_vta_insn.py is failing at https://github.com/dmlc/tvm/blob/919bea8c79de5de9996cb4714fdb92b2149a023b/vta/python/vta/ir_pass.py#L716. This is because your change wraps the index expression in a Cast expression, like
(int64((((i0*64) + (i1*16)) + i3)) + (int64(cthread.s)*(int64)128)).

https://github.com/dmlc/tvm/blob/919bea8c79de5de9996cb4714fdb92b2149a023b/src/arithmetic/detect_linear_equation.cc#L29 needs to support the Cast node.
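
The point about Cast can be sketched with a toy expression tree. This is not TVM's detect_linear_equation.cc; the classes `Var`, `Const`, `Add`, `Mul`, `Cast` and the function `coeff_of` are hypothetical stand-ins that only illustrate why a linear-form detector fails on the expression above unless it looks through Cast nodes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Add:
    a: object
    b: object

@dataclass(frozen=True)
class Mul:
    a: object
    b: object

@dataclass(frozen=True)
class Cast:
    dtype: str
    value: object

def strip_cast(e):
    """Peel off any Cast wrappers around an expression."""
    while isinstance(e, Cast):
        e = e.value
    return e

def coeff_of(e, var, see_through_cast=True):
    """Integer coefficient of `var` in `e`, or None if `e` is not recognized as linear."""
    if isinstance(e, Var):
        return 1 if e == var else 0
    if isinstance(e, Const):
        return 0
    if isinstance(e, Cast):
        # Only sound for casts that keep the value unchanged (e.g. widening int32 -> int64).
        return coeff_of(e.value, var, see_through_cast) if see_through_cast else None
    if isinstance(e, Add):
        a = coeff_of(e.a, var, see_through_cast)
        b = coeff_of(e.b, var, see_through_cast)
        return None if a is None or b is None else a + b
    if isinstance(e, Mul):
        a = coeff_of(e.a, var, see_through_cast)
        b = coeff_of(e.b, var, see_through_cast)
        if a is None or b is None:
            return None
        if a == 0 and b == 0:
            return 0  # `var` does not appear in either factor
        lhs, rhs = strip_cast(e.a), strip_cast(e.b)
        if b == 0 and isinstance(rhs, Const):
            return a * rhs.value
        if a == 0 and isinstance(lhs, Const):
            return b * lhs.value
        return None  # product of two non-constant terms is not linear
    return None

# The index expression quoted above, rebuilt with the toy nodes.
i0, i1, i3, t = Var("i0"), Var("i1"), Var("i3"), Var("cthread.s")
index = Add(
    Cast("int64", Add(Add(Mul(i0, Const(64)), Mul(i1, Const(16))), i3)),
    Mul(Cast("int64", t), Cast("int64", Const(128))),
)
print(coeff_of(index, t), coeff_of(index, t, see_through_cast=False))
```

With `see_through_cast=False` the detector gives up as soon as it meets a Cast, which mirrors the kind of failure described for the VTA test.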

zhiics · 2019-02-12T08:02:37Z

@tqchen Could you please take a look?
Should we also apply ComputeReduce to e->allocs[0]->extents at line 554? In the CI CPU test the extent is evaluated to 16384 * 16384 = 268435456, but in the ci_i386 test it is kept as 16384 * 16384. Any advice? Thanks.

tqchen · 2019-02-12T18:47:40Z

For now, let us keep the arithmetic in int32 when there is no overflow, and use int64 only when there is a chance of overflow.
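
A minimal sketch of that policy, assuming all extents are compile-time constants. The function name `pick_index_dtype` and the string dtype names are hypothetical, and this is not the logic in storage_rewrite.cc.

```python
INT32_MAX = 2**31 - 1

def pick_index_dtype(extents, bits_per_element):
    """Return "int32" when the total size in bits fits in int32, else "int64"."""
    total = bits_per_element
    for extent in extents:
        total *= extent
    return "int32" if total <= INT32_MAX else "int64"

print(pick_index_dtype((128, 128), 32))
print(pick_index_dtype((16384, 16384), 32))
```

For symbolic extents the total cannot be computed up front, so a real pass would need a more conservative rule.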

src/pass/storage_rewrite.cc

zhiics · 2019-02-12T19:12:43Z

@tqchen Thanks. I will add it and update the PR.

tqchen · 2019-02-13T17:59:54Z

LGTM from my side. @wweic @yzhliu, please approve or request changes explicitly: https://docs.tvm.ai/contribute/code_review.html#approve-and-request-changes-explicitly

wweic

lgtm.

Anthony-Mai · 2019-02-13T19:20:43Z

I have reservations about this code change. It seems to me that it adds too much risk and code complexity for something of little practical value, e.g., the number of bits exceeding max int32. You end up with code that uses int32 in one case and int64 in another? Super confusing! If the number of bits exceeds max int32, you likely have other problems to worry about, like being unable to allocate that much memory.

I suggest either leaving this unfixed, or fixing it in a more conservative way, like using unsigned int32 instead of signed, so you can have up to 4 billion bits. Or, if you feel strongly about supporting more than 4 billion bits, use int64 uniformly. Just don't have a mixed case of int32/int64, which is too risky.

tqchen · 2019-02-13T19:46:59Z

@Anthony-Mai has a point here. Perhaps a simple way forward is to first use a CHECK to make sure we do not go OOM, and use int32 atm. As discussed in #2588, we want to transition to int64 once we have a proper narrowing pass that detects the best data types, and we can move on from there.

zhiics · 2019-02-13T19:47:41Z

@Anthony-Mai Thanks for your comment. I think allocating more than 2G or even 4G of memory on a 64-bit system should be okay. We have a use case that requires a large amount of memory. We are also probably going to test very large input images on some networks, where uint32 might not be enough.

I am not sure we need to use int64 everywhere else, because VTA, at least, seems to use only int32, so it would result in many casts. In addition, as per @yzhliu's comment, we are warning the users here. I understand that using one consistent type is ideal, but casting as needed in this case is more like a small optimization. Anyway, I am happy to hear other voices. @tqchen @yzhliu @wweic

zhiics · 2019-02-13T19:57:50Z

@tqchen Sorry, I didn't see your comment above. It seems we posted at around the same time... Please take another look. But it seems we need more than max int32 to support our use case.

tqchen · 2019-02-13T20:03:49Z

I think we all agree on fixing the problem. The question is whether we should fix it now, or wait and come back to it after we transition most defaults from int32->int64 and apply optimal narrowing.

yzhliu · 2019-02-13T21:04:34Z

Does uint32 work? If so, this can be a valid fix that both satisfies the special use case @zhiics and the team have met so far and avoids the type mismatch.

zhiics · 2019-02-13T21:29:41Z

@yzhliu uint32 should be enough for that special case. NVM, for even larger tests, I think we might be able to wait till everything is fixed. Let me change it to uint32.

tqchen · 2019-02-13T21:51:23Z

I would advise against uint32. int64 is a better choice, mainly because the additional 1 bit won't really bring much benefit.
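
The size limits under discussion can be worked out directly. The sketch below assumes the counter holds a buffer size in bits, as described at the top of this PR; the helper name `max_mib` is hypothetical.

```python
MIB = 2**20

def max_mib(max_bits):
    """Largest buffer, in MiB, whose size in bits is at most `max_bits`."""
    return (max_bits // 8) / MIB

# Largest value each candidate type can hold.
limits = {"int32": 2**31 - 1, "uint32": 2**32 - 1, "int64": 2**63 - 1}
for name, max_bits in limits.items():
    print(f"{name}: up to about {max_mib(max_bits):.0f} MiB")
```

Under this arithmetic, int32 caps a buffer at roughly 256 MiB and uint32 at roughly 512 MiB, so the extra bit only doubles the limit, while int64 removes it for practical purposes.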

zhiics · 2019-02-13T21:55:27Z

To be honest, I personally also think int64 is better. uint32 doesn't change anything in this context because they are still different types.

zhiics · 2019-02-13T21:58:51Z

okay, if nobody disagrees, I will change it back to int64.

yzhliu · 2019-02-13T22:05:50Z

oh I thought with uint32 we can avoid mismatch. If not I'm good with the i64 fix. Let's also track it in #2588

yzhliu · 2019-02-14T16:56:56Z

Thanks @zhiics @wweic @tqchen @Anthony-Mai Let’s merge it for now, and come back when i32->i64 transition has been finished.

* fix storage_rewrite bug when input is big
* cast when necessary
* simplification
* simplification
* int64->uint32
* revert uint32->int64

yzhliu reviewed Feb 12, 2019

zhiics added 3 commits February 12, 2019 20:21

fix storage_rewrite bug when input is big

04f34e6

cast when necessary

b2965b3

simplification

00c41da

zhiics force-pushed the fix_storage_rewrite branch from b451aba to 00c41da Compare February 12, 2019 20:22

simplification

56f3234

tqchen approved these changes Feb 13, 2019

tqchen added the status: need review label Feb 13, 2019

wweic approved these changes Feb 13, 2019

int64->uint32

f512daa

revert uint32->int64

3ed1680

yzhliu approved these changes Feb 13, 2019

wweic approved these changes Feb 13, 2019

yzhliu merged commit 326fff5 into apache:master Feb 14, 2019

yzhliu added status: accepted and removed status: need review labels Feb 14, 2019

zhiics deleted the fix_storage_rewrite branch February 14, 2019 17:08

This was referenced Feb 15, 2019

TVM 0.5 Release Note #2448

Closed

[VOTE] TVM 0.5 Release #2547

Closed

[VOTE] TVM 0.5 Release #2614

Closed

yzhliu mentioned this pull request Mar 2, 2019

[DEV] TVM v0.6 Roadmap #2623

Closed

28 tasks

yzhliu mentioned this pull request Nov 11, 2019

[RELEASE][DRAFT] TVM v0.6 Release candidate #4259

Closed
