[Relay, TOPI] Add numpy style cumsum op #7334

Merged: 27 commits merged into apache:main on Jan 26, 2021
Conversation

@masahi (Member) commented on Jan 25, 2021

This adds a numpy style cumsum op to Relay/TOPI. The spec is identical to the numpy one, except that there is no promotion of the output dtype from int32 to int64 when the input is an int32 tensor and no output dtype is provided.
https://numpy.org/doc/stable/reference/generated/numpy.cumsum.html

Both CPU and GPU are supported, and the op is especially efficient when thrust is available. I updated the TIR scan IR to support cumsum on tensors of any rank, but the scan is still performed only on the innermost axis, so a transpose is required when the scan axis is not the innermost one.
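For reference, a minimal usage sketch of the new op from Relay is shown below. It assumes a TVM build that includes this PR (and, for the fast GPU path, a build with thrust enabled, e.g. set(USE_THRUST ON) in config.cmake); the executor API shown is the one available around this release and may differ in other TVM versions.

```python
# Minimal usage sketch (assumes a TVM build that includes this PR; the
# executor API shown may differ between TVM versions).
import numpy as np
from tvm import relay

x = relay.var("x", shape=(3, 4), dtype="int32")
y = relay.cumsum(x, axis=1)  # scan along the innermost axis
func = relay.Function([x], y)

data = np.arange(12, dtype="int32").reshape(3, 4)
result = relay.create_executor("graph", target="llvm").evaluate(func)(data)

# Unlike numpy, the output stays int32 when no output dtype is given.
np.testing.assert_array_equal(result.asnumpy(), np.cumsum(data, axis=1, dtype="int32"))
```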

Please review @jwfromm @mbrookhart @zhiics @kevinthesun @junrushao1994 @antinucleon

@masahi marked this pull request as ready for review on January 25, 2021 06:09
Review threads (resolved):
python/tvm/relay/op/transform.py
python/tvm/topi/cuda/scan.py
python/tvm/topi/cumsum.py
src/relay/op/tensor/transform.cc
src/runtime/contrib/thrust/thrust.cu
tests/python/relay/test_op_level3.py
@masahi force-pushed the relay-cumsum branch 2 times, most recently from 0eadcc6 to 7612cfd on January 25, 2021 21:22
@mbrookhart (Contributor) left a comment

A few nitpicks, but overall LGTM

Review threads (resolved):
python/tvm/topi/cuda/scan.py
python/tvm/topi/cumsum.py
@masahi (Member, Author) commented on Jan 26, 2021

@tkonolige @mbrookhart Comments have been addressed. CPU cumsum is now done in parallel over the non-scan axes, and the binop argument is now a function. Thanks for the reviews.
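To illustrate what "the binop argument is now a function" means (the names below are illustrative, not the exact TOPI signature): the scan kernel takes the binary operation as a parameter, so the same code path can implement cumsum or any other associative scan.

```python
# Illustrative-only sketch of a scan parameterized by its binary op; the real
# implementation builds TIR, this just shows the idea in plain Python.
def inclusive_scan(row, binop, identity):
    out, acc = [], identity
    for v in row:
        acc = binop(acc, v)
        out.append(acc)
    return out

print(inclusive_scan([1, 2, 3, 4], lambda a, b: a + b, 0))  # [1, 3, 6, 10] -> cumsum
print(inclusive_scan([1, 2, 3, 4], lambda a, b: a * b, 1))  # [1, 2, 6, 24] -> cumprod
```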

@tkonolige (Contributor) left a comment

Looks good! I left a couple of minor nits about the documentation, but otherwise this is good to go.

Review threads (resolved):
python/tvm/relay/op/transform.py
python/tvm/topi/cuda/scan.py
@masahi merged commit 1e0d356 into apache:main on Jan 26, 2021
@masahi (Member, Author) commented on Jan 26, 2021

Thanks @tkonolige @mbrookhart

@ybai62868 commented

Hi @masahi,
Thanks for your contribution of "cumsum". I want to use this op in an ONNX model, but I run into a problem when compiling an ONNX model containing the "CumSum" op to Relay.
I use "relay.frontend.from_onnx" and add "CumSum": AttrCvt("cumsum",{"axis": axis, "dtype": dtye}), to python/tvm/relay/frontend/onnx.py, but it still does not work.
Thanks for your attention!

@ybai62868 commented

typo -> "CumSum": AttrCvt("cumsum",{"axis": "axis", "dtype": "dtye"}),"

@masahi (Member, Author) commented on Jan 28, 2021

@ybai62868 Please post this to the discuss forum or open an issue, clarifying what exactly you mean by "it still does not work".

@ybai62868 commented

@masahi Hi, what I mean by "it still does not work" is that it reports the message "tvm.error.OpNotImplemented: The following operators are not supported for frontend ONNX: CumSum", even though I add the AttrCvt("cumsum",{"axis": axis, "dtype": dtye}) entry and rename CumSum to cumsum.

@tkonolige (Contributor) commented

@ybai62868 This PR adds cumsum support at the Relay/TOPI level. The work to add support to the ONNX frontend has not been done yet.
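For anyone picking this up, a converter for ONNX CumSum would likely be written as an OnnxOpConverter subclass rather than a plain AttrCvt entry, because ONNX passes the axis as a second input tensor rather than an attribute. The sketch below is hypothetical and not part of this PR; `_impl_v11`, `infer_value`, and the handling of the `exclusive`/`reverse` attributes are assumptions about how such a converter could look, and the class would still need to be registered in the frontend's convert map for the error above to go away.

```python
# Hypothetical sketch (not part of this PR) of an ONNX CumSum converter.
# Assumes the context of python/tvm/relay/frontend/onnx.py, where
# OnnxOpConverter, infer_value, and _op are already available.
class CumSum(OnnxOpConverter):
    @classmethod
    def _impl_v11(cls, inputs, attr, params):
        data = inputs[0]
        axis = inputs[1]  # ONNX provides the axis as a second input tensor
        if axis is not None:
            # Assumed helper: constant-fold the axis input to a Python int.
            axis = int(infer_value(axis, params).asnumpy())
        # The sketch does not handle exclusive/reverse; bail out instead of guessing.
        if attr.get("exclusive", 0) or attr.get("reverse", 0):
            raise NotImplementedError("exclusive/reverse CumSum is not handled in this sketch")
        return _op.cumsum(data, axis=axis)
```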

alexwong pushed a commit to alexwong/tvm that referenced this pull request Feb 11, 2021
* Add cumsum relay/topi op

* relay tests working

* add torch frontend converter

* fix for importing detr

* fix bad merge

* begin cuda cumsum

* support non innermost axis

* support rank higher than 3

* making binop parameter

* fix overflow issue in thrust scan

* generic binop parameter working

* relay test working

* fixed for bool input

* remove pytorch change

* fix pylint

* doc update

* Update python/tvm/topi/cumsum.py

Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>

* Update tests/python/relay/test_op_level3.py

Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>

* add example outputs

* add supported input and output dtype in thrust log

* adding more loop var names

* fix cpplint

* fix missing check for the cuda target in nms thrust sort

* parallelize cpu cumsum

* making binop argument tir function

* update doc for binop

* doc update

Co-authored-by: Tristan Konolige <tristan.konolige@gmail.com>
electriclilies pushed a commit to electriclilies/tvm that referenced this pull request Feb 18, 2021
Lokiiiiii pushed a commit to Lokiiiiii/tvm that referenced this pull request Mar 2, 2021
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Mar 2, 2021