Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BugFix][TOPI] Fix the integer overflow problem of the scatter_nd op. #8415

Merged

Conversation

zhuwenxi
Copy link
Contributor

@zhuwenxi zhuwenxi commented Jul 7, 2021

Problem Statement

scatter_nd crashes on cuda backend, when input data shape is slightly larger than usual.

Code to reproduce

import tvm
import numpy as np
import tvm.relay as relay

dev = tvm.cuda()
target = tvm.target.Target("cuda")

# input data:
data_np = np.zeros((32, 128, 128, 256)).astype(np.float32)
indices_np = np.random.uniform(1,5,(32, 600, 3)).astype(np.int64)
updates_np = np.random.rand(32, 600, 256).astype(np.float32)

# Construct relay input nodes:
data = relay.var("data", shape=data_np.shape, dtype=str(data_np.dtype))
indices = relay.var("indices", shape=indices_np.shape, dtype=str(indices_np.dtype))
updates = relay.var("updates", shape=updates_np.shape, dtype=str(updates_np.dtype))

# Compute indices:
indices_dim = len(indices_np.shape)
axes = list(range(indices_dim))
indices_t = relay.transpose(indices, axes[-1:] + axes[:-1])

# Construct relay scatter_nd op:
out = relay.op.scatter_nd(data, indices_t, updates, "update")
func = relay.Function([data, indices, updates], out)

# Execute scatter_nd:
intrp = relay.create_executor("debug", device=dev, target=target)
op_res = intrp.evaluate(func)(data_np, indices_np, updates_np)

Error Message

image

Root Cause

WeChatWorkScreenshot_4675eb72-5bcf-4a4c-8751-1b8e5f5b7bb2
We can see the problem more clearly from the cuda code generated. The TIR implementation of scatter_nd would cause a int32 overflow when "i" is large, thus the if statement is always evaluate to true, and conducts a invalid memory access.

@zhuwenxi
Copy link
Contributor Author

zhuwenxi commented Jul 7, 2021

@tqchen @masahi @icemelon9 Could you help review this fix, or bridge someone who is willing to?

@tkonolige
Copy link
Contributor

I've implemented an alternative fix in #8419

@zhuwenxi
Copy link
Contributor Author

zhuwenxi commented Jul 8, 2021

I've implemented an alternative fix in #8419

@tkonolige Thank you for review this PR.

Just curious, is there any specific reason why you created another PR? I understand the alternative fix you proposed maybe better than the existing one, in someway probably. But why bother create a new separate PR? We can discuss and improve the code here, I think that's exactly what code review is about, right?

@tkonolige
Copy link
Contributor

tkonolige commented Jul 8, 2021

@zhuwenxi Sorry, I definitely shouldn't have opened a new PR. I just got a little hasty and did the fix myself. Feel free to copy the code from that PR into this one. I'll end up closing the other one.

@zhuwenxi
Copy link
Contributor Author

zhuwenxi commented Jul 9, 2021

@tkonolige That's OK, it happens.

@mbrookhart
Copy link
Contributor

You hit a flaky test, I found a fix here: #8431

wenxizhu added 2 commits July 12, 2021 11:26
1. Existing scatter_nd cuda implementation has a very large bound,
   which could overflow int32 range when input tensor shape is
   large enough;
2. The overflow could cause the if statement always evaluate to
   true, thus conducts invalid memory accesses;
3. We fix this problem in this commit by reducing the bound, the
   original large bound is not only unnecessary, but also degrading
   the performance; With this fix, scatter_op's performance improves
   100x on some cases.
@zhuwenxi zhuwenxi force-pushed the bugfix/wenxizhu/fix-scatter-integer-overflow branch from 842243b to cd30a24 Compare July 12, 2021 03:39
@zhuwenxi
Copy link
Contributor Author

@mbrookhart I've fixed the UT failure.

@mbrookhart mbrookhart merged commit d043cb9 into apache:main Jul 13, 2021
@mbrookhart
Copy link
Contributor

Thanks @zhuwenxi @tkonolige

ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
…apache#8415)

* Fix the integer overflow problem of the scatter_nd op.

* Fix scatter_nd's crash problem:
1. Existing scatter_nd cuda implementation has a very large bound,
   which could overflow int32 range when input tensor shape is
   large enough;
2. The overflow could cause the if statement always evaluate to
   true, thus conducts invalid memory accesses;
3. We fix this problem in this commit by reducing the bound, the
   original large bound is not only unnecessary, but also degrading
   the performance; With this fix, scatter_op's performance improves
   100x on some cases.

Co-authored-by: wenxizhu <wenxizhu@tencent.com>
zxy844288792 pushed a commit to zxy844288792/tvm that referenced this pull request Mar 4, 2022
…apache#8415)

* Fix the integer overflow problem of the scatter_nd op.

* Fix scatter_nd's crash problem:
1. Existing scatter_nd cuda implementation has a very large bound,
   which could overflow int32 range when input tensor shape is
   large enough;
2. The overflow could cause the if statement always evaluate to
   true, thus conducts invalid memory accesses;
3. We fix this problem in this commit by reducing the bound, the
   original large bound is not only unnecessary, but also degrading
   the performance; With this fix, scatter_op's performance improves
   100x on some cases.

Co-authored-by: wenxizhu <wenxizhu@tencent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants