Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi #7018

Merged
merged 7 commits into from Dec 4, 2020

Conversation

kevinthesun
Copy link
Contributor

This fix also works for gpu argwhere.

@zhiics @anijain2305 @mbrookhart @Laurawly

Copy link
Contributor

@mbrookhart mbrookhart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) This is awesome, I didn't think to do indexdiv(end[i] - begin[i], strides[i]), how did you find the issue?

A few nitpicks below for code readability.

Can you also enable the test here?

# TODO(mbrookhart): Enable when we can get it working
# @tvm.testing.uses_gpu
def test_dynamic_topk():

include/tvm/topi/transform.h Outdated Show resolved Hide resolved
@mbrookhart
Copy link
Contributor

Out of curiosity, why no nvptx?

Copy link
Member

@icemelon icemelon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Need to fix the CI

@kevinthesun
Copy link
Contributor Author

@mbrookhart Generally we need thrust for this dynamic sorting ops. nvptx will have issue to compile them.
@icemelon9 We need to enable thrust for ci gpu. #7024

@mbrookhart
Copy link
Contributor

I don't love making thrust a necessary component unless we automatically enable it when we turn on cuda? If we don't support the tir-based sort, should we remove it from the codebase?

@kevinthesun
Copy link
Contributor Author

kevinthesun commented Dec 3, 2020

I think we can raise an exception when compiling dynamic topk but Thrust is not enabled. Building with Thrust usually needs extra effort since it requires cmake >=3.13. User can enable it when necessary. For tvm cuda sorting, I'm not sure whether it covers some cases which Thrust doesn't. Maybe we can keep it a while.

Copy link
Contributor

@mbrookhart mbrookhart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can get around the unit test error without forcing users to enable thrust? Just requesting changes while we chat about it, will reapprove once we decide.

Comment on lines -564 to +568
if k > 0:
if not isinstance(k, int) or k > 0:
beg = [0] * ndim
end = data.shape[:-1] + [k]
out = [strided_slice(o, beg, end) for o in out]
end = data.shape[:-1] + [k if isinstance(k, int) else tvm.te.size_var("dim")]
strides = [1] * ndim
out = [strided_slice(o, beg, end, strides) for o in out]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinthesun, why don't we just repeat this change in the tir topk above? that would fix the unit test, I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified cuda topk so that topk in dyn can pass. However, topk in test any in which data has dynamic shape can't pass without Thrust. I disable that test for now.

@zhiics
Copy link
Member

zhiics commented Dec 3, 2020

I think without thrust, we then have to fix sort. We can probably disable the test for now and come back to work on sorting and then enable the test. This would at least unblock downstream users to run models through thrust. @mbrookhart @icemelon9 @kevinthesun how do you think?

@mbrookhart
Copy link
Contributor

I'm not really sure what's wrong with the tir sort, do we have a regression test/issue we could track?

@kevinthesun
Copy link
Contributor Author

AFAIK cuda sort has several issues:

  1. Performance is bad for large workloads.
  2. Can't handle dynamic data shape well.
  3. Can generate flaky result.

There is no clear path to a solution to these problems. For now the best way is to let user turn on Thrust, when they want to compile sort related op on nvidia gpu.

@mbrookhart
Copy link
Contributor

Yeah, the perf of the kernel isn't great, and I see some thread definition issues that will cause issues with dynamic shapes. Do we have a flaky test we can include? I don't think it's important for this PR, but it might be interesting to tackle later.

@zhiics
Copy link
Member

zhiics commented Dec 3, 2020

@mbrookhart yeah, argwhere is flaky on large inputs if sort is used

@mbrookhart
Copy link
Contributor

:/ OddEvenTransportSort should be stable, but something looks very wrong about the threading in this kernel. I'll see if I can edit to to solve these problems at some point in the near-ish future. If somehow this sort isn't stable, that would easily explain flakiness in argwhere/argsort.

@zhiics zhiics merged commit f4c6517 into apache:main Dec 4, 2020
@zhiics
Copy link
Member

zhiics commented Dec 4, 2020

Thanks @kevinthesun @mbrookhart @icemelon9

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Dec 4, 2020
apache#7018)

* Fix GPU dynamic Topk

* Fix style

* Minor fix

* Simplfy dynamic checking

* Fix lint

* More improvements

* Disable test any topk
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Dec 4, 2020
apache#7018)

* Fix GPU dynamic Topk

* Fix style

* Minor fix

* Simplfy dynamic checking

* Fix lint

* More improvements

* Disable test any topk
electriclilies pushed a commit to electriclilies/tvm that referenced this pull request Feb 18, 2021
apache#7018)

* Fix GPU dynamic Topk

* Fix style

* Minor fix

* Simplfy dynamic checking

* Fix lint

* More improvements

* Disable test any topk
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants