[TOPI] Rewrite GPU argwhere using exclusive scan #7314

masahi · 2021-01-20T10:32:43Z

This PR improves the implementation of GPU argwhere added in #6868, using exclusive scan (see #7303).

The current implementation of argwhere is very inefficient because it uses an atomic to update the write location. Since all threads compete for that single location, this effectively makes it a sequential kernel. Moreover, since the output indices need to be lexicographically sorted, the current implementation involves sorting along each axis.
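To illustrate the two costs described above, here is a minimal sequential Python sketch of the atomic-counter approach. It is not the actual TVM kernel, and `argwhere_atomic_sim` is a hypothetical name. It models each GPU thread claiming an output slot via fetch-and-add on one shared counter, with thread scheduling order modeled as a random shuffle. Because that order is arbitrary, indices land in the output out of order, which is why a sort was needed afterwards.

```python
import random
from typing import List


def argwhere_atomic_sim(cond: List[int], seed: int = 0) -> List[int]:
    """Simulate an atomic-counter argwhere on a 1-D input (hypothetical sketch).

    Each "thread" i with cond[i] != 0 claims a slot by fetch-and-add on a
    single shared counter. Thread scheduling is modeled as a random order,
    so the claimed slots do not follow input order.
    """
    rng = random.Random(seed)
    order = list(range(len(cond)))
    rng.shuffle(order)  # arbitrary thread scheduling order
    counter = 0  # the single shared write location all threads contend for
    out = [0] * sum(1 for c in cond if c != 0)
    for i in order:
        if cond[i] != 0:
            slot = counter  # atomic fetch-and-add: read the old value...
            counter += 1    # ...and increment it
            out[slot] = i
    return out


cond = [0, 3, 0, 0, 5, 1, 0, 2]
raw = argwhere_atomic_sim(cond, seed=1)
print(raw)          # nonzero indices, in (simulated) scheduling order
print(sorted(raw))  # [1, 4, 5, 7]: the extra sort restores lexicographic order
```

On a real GPU the fetch-and-add serializes every thread that finds a nonzero element, and the result still has to be sorted.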

Since argwhere is literally an instance of stream compaction, this is a perfect application of exclusive scan. Now, argwhere simply consists of:

1. A single call to exclusive scan on a boolean flag array to compute the write indices.
2. Compaction using the write indices (just copying the indices of elements whose condition is nonzero).

Both steps are highly parallel operations. Thus, both the atomic and the sort are gone, vastly simplifying the implementation. It also brings a huge speedup, as shown below.
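The two steps above can be sketched in plain Python. This is a sequential model assuming a row-major input flattened to 1-D; the helper names (`exclusive_scan`, `unravel`, `argwhere_scan`) are hypothetical, and this is not TVM's TIR code. Each iteration of the compaction loop writes a distinct output slot, so on a GPU it can map to one independent thread. Because flat row-major order coincides with the lexicographic order of the multi-dimensional indices, the output comes out already sorted.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def exclusive_scan(flags: Sequence[int]) -> List[int]:
    # out[i] = sum(flags[:i]); the first element is always 0.
    if not flags:
        return []
    return [0] + list(accumulate(flags))[:-1]


def unravel(flat: int, shape: Sequence[int]) -> Tuple[int, ...]:
    # Convert a row-major flat index into a multi-dimensional index.
    idx = []
    for dim in reversed(shape):
        idx.append(flat % dim)
        flat //= dim
    return tuple(reversed(idx))


def argwhere_scan(cond_flat: Sequence[float], shape: Sequence[int]) -> List[Tuple[int, ...]]:
    # Step 1: boolean flag array, then exclusive scan to get write indices.
    flags = [1 if c != 0 else 0 for c in cond_flat]
    write_idx = exclusive_scan(flags)
    num_nonzero = write_idx[-1] + flags[-1] if flags else 0
    # Step 2: compaction. Every iteration is independent (one GPU thread each).
    out: List[Tuple[int, ...]] = [()] * num_nonzero
    for i, f in enumerate(flags):
        if f:
            out[write_idx[i]] = unravel(i, shape)
    return out


cond = [[0, 1, 0],
        [2, 0, 3]]
flat = [v for row in cond for v in row]
print(argwhere_scan(flat, (2, 3)))  # [(0, 1), (1, 0), (1, 2)]
```

No atomics and no sort are required: the scan alone determines where each index is written.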

All numbers are in milliseconds.

| Shape | current main | using thrust exclusive scan | using TIR exclusive scan |
|---|---|---|---|
| (128, 65) | 0.240 | 0.024 | 0.132 |
| (200, 500) | 0.571 | 0.029 | 0.184 |
| (1000, 50) | 0.270 | 0.027 | 0.154 |
| (100000,) | 0.205 | 0.027 | 0.182 |
| (500000,) | 0.947 | 0.090 | 0.418 |
| (1000000,) | 1.63 | 0.170 | 0.848 |
| (1000, 1000) | 3.17 | 0.185 | 0.863 |
| (1000, 5000) | 15.69 | 0.976 | 3.689 |
| (100, 100, 1000) | 48.70 | 2.087 | 7.423 |
| (256, 128, 64, 32) | 397.21 | 15.58 | 51.438 |
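Speedup factors can be read off the benchmark table by dividing the current-main time by each scan-based time. The short script below does this with the numbers copied from the table; `TIMINGS` and `speedups` are illustrative names only.

```python
from typing import Dict, Tuple

# Timings in milliseconds, copied from the benchmark table:
# shape -> (current main, thrust exclusive scan, TIR exclusive scan)
TIMINGS: Dict[str, Tuple[float, float, float]] = {
    "(128, 65)": (0.240, 0.024, 0.132),
    "(200, 500)": (0.571, 0.029, 0.184),
    "(1000, 50)": (0.270, 0.027, 0.154),
    "(100000,)": (0.205, 0.027, 0.182),
    "(500000,)": (0.947, 0.090, 0.418),
    "(1000000,)": (1.63, 0.170, 0.848),
    "(1000, 1000)": (3.17, 0.185, 0.863),
    "(1000, 5000)": (15.69, 0.976, 3.689),
    "(100, 100, 1000)": (48.70, 2.087, 7.423),
    "(256, 128, 64, 32)": (397.21, 15.58, 51.438),
}


def speedups() -> Dict[str, Tuple[float, float]]:
    # Speedup = current main time / new time, for the thrust and TIR columns.
    return {
        shape: (base / thrust, base / tir)
        for shape, (base, thrust, tir) in TIMINGS.items()
    }


for shape, (s_thrust, s_tir) in speedups().items():
    print(f"{shape:>20}  thrust: {s_thrust:5.1f}x  TIR: {s_tir:5.1f}x")
```

By this arithmetic, the thrust-based scan is roughly 7.6x to 25.5x faster than current main and the TIR scan roughly 1.1x to 7.7x faster, with the largest gain in both columns on the (256, 128, 64, 32) input.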

Please review @zhiics @Laurawly @mbrookhart @tkonolige @anijain2305 @trevor-m

mbrookhart · 2021-01-20T22:42:40Z

Could we add a column for the performance of the PR without thrust (i.e., TIR exclusive scan)?

mbrookhart

I'd like to include benchmarks without thrust in the PR for posterity, but otherwise this looks great, thanks! I'd wait to merge until @zhiics can review, since he wrote the existing kernel.

masahi · 2021-01-20T23:00:21Z

OK, updated the numbers to include the TIR scan results.

mbrookhart · 2021-01-20T23:02:22Z

👍 Not as fast as thrust, as expected, but it's good to see it's still a performance improvement.

zhiics

Thanks for the improvement.

masahi · 2021-01-21T06:05:57Z

Thanks @mbrookhart @zhiics

* use ex scan to write argwhere
* add doc

mbrookhart approved these changes Jan 20, 2021


zhiics approved these changes Jan 20, 2021


ZihengJiang added the status: review in progress label Jan 20, 2021

masahi added 2 commits January 21, 2021 10:10

use ex scan to write argwhere

4179bf1

add doc

63469a6

masahi force-pushed the argwhere-ex-scan branch from 85a91e9 to 63469a6 January 21, 2021 01:10

masahi merged commit f829403 into apache:main Jan 21, 2021

alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 11, 2021

[TOPI] Rewrite GPU argwhere using exclusive scan (apache#7314)

1133314

* use ex scan to write argwhere
* add doc

electriclilies pushed a commit to electriclilies/tvm that referenced this pull request Feb 18, 2021

[TOPI] Rewrite GPU argwhere using exclusive scan (apache#7314)

0558257

* use ex scan to write argwhere
* add doc

Lokiiiiii pushed a commit to Lokiiiiii/tvm that referenced this pull request Mar 2, 2021

[TOPI] Rewrite GPU argwhere using exclusive scan (apache#7314)

0dfa99f

* use ex scan to write argwhere
* add doc

trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2021

[TOPI] Rewrite GPU argwhere using exclusive scan (apache#7314)

160036f

* use ex scan to write argwhere
* add doc

junrushao mentioned this pull request Nov 1, 2021

Apache TVM v0.8 Release Note Candidate #9416

Closed
