
[Relay, TOPI] Make Softmax op fusible with elemwise ops #8909

Merged 15 commits into apache:main from the softmax-fuse branch on Sep 6, 2021

Conversation

masahi
Member

@masahi masahi commented Sep 2, 2021

Currently, op fusion is not enabled for the softmax op. This has been fine for ImageNet models, where softmax is only used at the end, but transformer models use many softmax ops in the middle of the network.

When the FP16 conversion is applied, softmax is always left in fp32, so there are always many cast(softmax_output, dtype="float16") ops after the conversion. Since softmax and cast cannot be fused, we end up with a lot of inefficient "cast only" kernels: https://gist.github.com/masahi/0d7d96ae88722b616a906cec2054559e#file-transformer-txt-L37-L43
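For illustration only (not code from this PR), a minimal Relay sketch of the leftover pattern, using the standard relay.nn.softmax and relay.cast APIs with an arbitrary input shape:

```python
import tvm
from tvm import relay

# Hypothetical example of what the FP16 conversion leaves behind:
# softmax stays in fp32, followed by a standalone cast to fp16.
x = relay.var("x", shape=(8, 128), dtype="float32")
y = relay.cast(relay.nn.softmax(x), dtype="float16")
mod = tvm.IRModule.from_expr(relay.Function([x], y))
print(mod)  # before this PR, softmax and cast compile to separate kernels
```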

This PR changes the softmax op's fuse pattern so that it can be fused with following elementwise / injective ops, just like conv2d and similar ops. The TOPI schedules have also been updated to take the fused ops into account.
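A rough sketch of the idea, assuming TVM's Python op-registration API and the FuseOps pass (the PR itself updates the C++ op registration and the TOPI schedules); OUT_ELEMWISE_FUSABLE is the same pattern conv2d uses:

```python
import tvm
from tvm import relay
from tvm.relay.op import op as _op

# Sketch only: mark softmax as OUT_ELEMWISE_FUSABLE so FuseOps may fuse
# following elementwise / injective ops (e.g. the cast above) into its kernel.
# level=11 overrides the pattern already registered for nn.softmax.
_op.register_pattern("nn.softmax", _op.OpPattern.OUT_ELEMWISE_FUSABLE, level=11)

# With this pattern, FuseOps groups softmax and the following cast into a
# single fused primitive function instead of two separate kernels.
x = relay.var("x", shape=(8, 128), dtype="float32")
y = relay.cast(relay.nn.softmax(x), dtype="float16")
mod = tvm.IRModule.from_expr(relay.Function([x], y))
mod = relay.transform.InferType()(mod)
mod = relay.transform.FuseOps(fuse_opt_level=2)(mod)
print(mod)
```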

cc @yzhliu @comaniac @AndrewZhaoLuo @mbrookhart

Contributor

@comaniac comaniac left a comment


Overall LGTM.

Review comment on python/tvm/topi/cuda/softmax.py (resolved)
@masahi masahi force-pushed the softmax-fuse branch 4 times, most recently from ecb05db to c4dd35b on September 3, 2021 08:35
@masahi
Member Author

masahi commented Sep 3, 2021

Completely blocked by flaky microTVM tests; the QEMU job failed five times in a row.
My change did break one of the microTVM tests...

@masahi masahi force-pushed the softmax-fuse branch 2 times, most recently from 6afd482 to 89c947b on September 3, 2021 10:53
@masahi masahi merged commit 7eda4a5 into apache:main Sep 6, 2021
ylc pushed a commit to ylc/tvm that referenced this pull request Sep 29, 2021
* Change softmax op pattern to OUT_ELEMWISE_FUSABLE

* Softmax is fused but x86 schedule is suboptimal

* fusion properly done

* Updating GPU schedule for fusion

* update softmax warp shuffle schedule

* fix compute_at

* Bug fix in lower_thread_all_reduce when reduction storage is reused by storage_rewrite

* Temp disable softmax warp reduction schedule when softmax is fused

* Revert "Bug fix in lower_thread_all_reduce when reduction storage is reused by storage_rewrite"

This reverts commit 8aa340e.

* lint fix

* try make diff smaller

* fix tests

* fixed another broken test

* Fix flaky uTVM templating test

* fix equality check on output op

Co-authored-by: masa <masa@pop-os.localdomain>
Co-authored-by: Gavin Uberti <gavin.uberti@gmail.com>
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022