[Topi] Fix GPU Dynamic Topk by Improving Dynamic Strided Slice in Topi #7018

kevinthesun · 2020-12-03T00:06:55Z

This fix also works for GPU argwhere.

@zhiics @anijain2305 @mbrookhart @Laurawly

mbrookhart

:) This is awesome. I didn't think to do indexdiv(end[i] - begin[i], strides[i]). How did you find the issue?
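For reference, the dynamic-shape path that indexdiv(end[i] - begin[i], strides[i]) enables can be sketched in plain Python. This is a hypothetical, list-based illustration (dynamic_strided_slice here is not a TVM API). It assumes positive strides, begin/end values already in bounds, and that output index j maps back to input index j * stride + begin.

```python
def dynamic_strided_slice(data, begin, end, strides):
    """Hypothetical pure-Python sketch of a dynamic strided slice on nested lists.

    Output extent per axis: (end - begin) // stride, i.e. floor division,
    which is what indexdiv computes for non-negative operands.
    Output index j reads input index j * stride + begin.
    Assumes positive strides and in-bounds begin/end (no clamping).
    """
    if not begin:
        # All axes consumed: return the element (or remaining sub-list) as-is.
        return data
    b, e, s = begin[0], end[0], strides[0]
    extent = (e - b) // s
    return [
        dynamic_strided_slice(data[j * s + b], begin[1:], end[1:], strides[1:])
        for j in range(extent)
    ]


x = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(dynamic_strided_slice(x, [0, 0], [2, 2], [1, 1]))  # [[1, 2], [5, 6]]
```

Because the extent uses floor division, it matches Python slicing only when the stride divides end - begin. That always holds for the unit strides topk passes.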

A few nitpicks below for code readability.

Can you also enable the test here?

tvm/tests/python/relay/dyn/test_dynamic_op_level6.py

Lines 25 to 27 in e212f96

    
    # TODO(mbrookhart): Enable when we can get it working
    # @tvm.testing.uses_gpu
    def test_dynamic_topk():

include/tvm/topi/transform.h

mbrookhart · 2020-12-03T03:50:17Z

Out of curiosity, why no nvptx?

icemelon

LGTM.
Need to fix the CI.

kevinthesun · 2020-12-03T19:23:50Z

@mbrookhart Generally we need Thrust for these dynamic sorting ops; nvptx has trouble compiling them.
@icemelon9 We need to enable Thrust in the GPU CI. #7024

mbrookhart · 2020-12-03T19:29:50Z

I don't love making Thrust a required component unless we enable it automatically whenever CUDA is turned on. If we don't support the TIR-based sort, should we remove it from the codebase?

kevinthesun · 2020-12-03T20:27:01Z

I think we can raise an exception when compiling dynamic topk while Thrust is not enabled. Building with Thrust usually takes extra effort since it requires CMake >= 3.13, so users can enable it when necessary. As for TVM's CUDA sort, I'm not sure whether it covers some cases that Thrust doesn't, so maybe we can keep it for a while.
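A minimal sketch of the check proposed above. It assumes the Thrust sort is registered as the global function tvm.contrib.thrust.sort and that the lookup behaves like tvm.get_global_func(name, allow_missing=True), returning None when the function is missing. The name check_dynamic_topk_supported and the error message are hypothetical, and the lookup is passed in as a parameter so the sketch runs without TVM installed.

```python
# Assumed registry name of the Thrust-backed sort.
THRUST_SORT = "tvm.contrib.thrust.sort"


def check_dynamic_topk_supported(shape, lookup_global_func):
    """Hypothetical guard: raise if a dynamic-shape topk is compiled without Thrust.

    shape: list with ints for static dims and any other object for symbolic dims.
    lookup_global_func: callable(name) returning None when the function is
    not registered (similar to tvm.get_global_func(name, allow_missing=True)).
    """
    is_dynamic = any(not isinstance(dim, int) for dim in shape)
    if is_dynamic and lookup_global_func(THRUST_SORT) is None:
        raise RuntimeError(
            "Dynamic topk on CUDA requires Thrust; rebuild TVM with Thrust enabled."
        )
```

Static shapes pass through untouched, so users who never compile dynamic topk are not forced to build with Thrust.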

mbrookhart

I think we can get around the unit test error without forcing users to enable Thrust? I'm just requesting changes while we chat about it, and will re-approve once we decide.

mbrookhart · 2020-12-03T20:51:06Z

python/tvm/topi/cuda/sort.py

-    if k > 0:
+    if not isinstance(k, int) or k > 0:
        beg = [0] * ndim
-        end = data.shape[:-1] + [k]
-        out = [strided_slice(o, beg, end) for o in out]
+        end = data.shape[:-1] + [k if isinstance(k, int) else tvm.te.size_var("dim")]
+        strides = [1] * ndim
+        out = [strided_slice(o, beg, end, strides) for o in out]


@kevinthesun, why don't we just repeat this change in the TIR topk above? That would fix the unit test, I think.

I modified the CUDA topk so that the topk test in dyn passes. However, the topk test in test_any, where the data has a dynamic shape, can't pass without Thrust, so I've disabled that test for now.
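The slicing logic from the sort.py diff above can be sketched without TVM. SymbolicDim is a hypothetical stand-in for tvm.te.size_var("dim"), and topk_slice_params is an illustrative helper, not part of TVM. The k <= 0 branch assumes topk's convention that a k below 1 returns all elements, so no slice is needed.

```python
class SymbolicDim:
    """Hypothetical stand-in for a symbolic extent such as tvm.te.size_var("dim")."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"SymbolicDim({self.name!r})"


def topk_slice_params(shape, k):
    """Return (begin, end, strides) for slicing topk outputs along the last axis,
    or None when no slice is needed (a static k <= 0 means "keep everything").

    Mirrors the diff: a non-int (dynamic) k always triggers a slice, and the
    last end value becomes a fresh symbolic dim instead of k itself.
    """
    if isinstance(k, int) and k <= 0:
        return None
    ndim = len(shape)
    begin = [0] * ndim
    end = list(shape[:-1]) + [k if isinstance(k, int) else SymbolicDim("dim")]
    strides = [1] * ndim  # explicit unit strides, as passed to strided_slice in the diff
    return begin, end, strides
```

Passing explicit unit strides keeps the extent computed by the dynamic strided slice exact, since (end - begin) // 1 never rounds.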

zhiics · 2020-12-03T21:23:17Z

I think that without Thrust we have to fix sort. We can probably disable the test for now, come back to work on sorting, and then re-enable the test. This would at least unblock downstream users who run models through Thrust. @mbrookhart @icemelon9 @kevinthesun what do you think?

mbrookhart · 2020-12-03T21:24:38Z

I'm not really sure what's wrong with the tir sort, do we have a regression test/issue we could track?

kevinthesun · 2020-12-03T21:53:11Z

AFAIK the CUDA sort has several issues:

Performance is bad for large workloads.
Can't handle dynamic data shapes well.
Can generate flaky results.

There is no clear path to solving these problems. For now, the best option is to let users turn on Thrust when they want to compile sort-related ops on NVIDIA GPUs.

mbrookhart · 2020-12-03T22:05:10Z

Yeah, the perf of the kernel isn't great, and I see some thread-definition problems that will cause issues with dynamic shapes. Do we have a flaky test we can include? I don't think it's important for this PR, but it might be interesting to tackle later.

zhiics · 2020-12-03T22:12:18Z

@mbrookhart yeah, argwhere is flaky on large inputs if sort is used

mbrookhart · 2020-12-03T22:26:55Z

:/ Odd-even transposition sort should be stable, but something looks very wrong about the threading in this kernel. I'll see if I can edit it to solve these problems at some point in the near-ish future. If this sort somehow isn't stable, that would easily explain the flakiness in argwhere/argsort.
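To illustrate the stability argument, here is a sequential Python sketch of odd-even transposition sort on key/value pairs. It shows the general algorithm, not the TVM CUDA kernel. Each phase compares disjoint adjacent pairs, so on a GPU each pair could map to one thread, with a barrier needed between phases. Swapping only on a strict greater-than keeps equal keys in their original order.

```python
def odd_even_transposition_sort(keys, values):
    """Stable sort of (key, value) pairs by key using odd-even transposition sort.

    Even phases compare pairs (0,1), (2,3), ...; odd phases compare (1,2), (3,4), ...
    Pairs within a phase are disjoint, so they could run in parallel.
    n phases are sufficient to sort n elements.
    """
    keys, values = list(keys), list(values)
    n = len(keys)
    for phase in range(n):
        for i in range(phase % 2, n - 1, 2):
            # Strict comparison: equal keys never swap, which makes the sort stable.
            if keys[i] > keys[i + 1]:
                keys[i], keys[i + 1] = keys[i + 1], keys[i]
                values[i], values[i + 1] = values[i + 1], values[i]
    return keys, values
```

In this sequential version each phase finishes before the next begins. If a GPU version lacks that separation, a thread can read a pair its neighbor is still swapping, which is one plausible way a stable algorithm still yields nondeterministic output.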

zhiics · 2020-12-04T04:51:19Z

Thanks @kevinthesun @mbrookhart @icemelon9

apache#7018) * Fix GPU dynamic Topk * Fix style * Minor fix * Simplfy dynamic checking * Fix lint * More improvements * Disable test any topk

kevinthesun added 2 commits December 2, 2020 23:59

Fix GPU dynamic Topk

b8add61

Fix style

4a5f4a8

kevinthesun mentioned this pull request Dec 3, 2020

[Relay][Topi]Add Sort Op to Relay #6978

Merged

mbrookhart requested changes Dec 3, 2020


include/tvm/topi/transform.h (outdated, resolved)

mbrookhart reviewed Dec 3, 2020


include/tvm/topi/transform.h (outdated, resolved)

kevinthesun added 4 commits December 3, 2020 00:23

Minor fix

7855e49

Simplfy dynamic checking

d30b7a6

Fix lint

ce76572

More improvements

96cdac0

zhiics approved these changes Dec 3, 2020


mbrookhart approved these changes Dec 3, 2020


icemelon approved these changes Dec 3, 2020


mbrookhart requested changes Dec 3, 2020


Disable test any topk

164a664

mbrookhart approved these changes Dec 3, 2020


zhiics merged commit f4c6517 into apache:main Dec 4, 2020

comaniac mentioned this pull request Jan 8, 2021

[TOPI] Treat undefined elements as constants in Array #7232

Merged
