
[TOPI] Update softmax compute and CPU schedule #3680

Merged
merged 7 commits into from Aug 5, 2019

Conversation

@soiferj (Contributor) commented Jul 31, 2019

This change improves performance for softmax by simplifying the computation and writing a schedule that supports better parallelization.

Compute: Currently, exp(input - max) is computed twice: once in the _compute_expsum stage and once in the _normalize stage. This change adds an extra stage that computes this tensor once; it is then reused in both the _compute_expsum and _normalize stages.
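For illustration only, here is a NumPy sketch (not the actual TOPI compute definitions) of the refactoring: the "before" version evaluates exp(input - max) in two places, while the "after" version gives it a dedicated stage that both consumers reuse. The function names are hypothetical.

```python
import numpy as np

def softmax_two_pass(x):
    # Before: exp(x - max) is evaluated both in the sum reduction
    # and again in the normalize stage.
    m = np.max(x, axis=-1, keepdims=True)
    expsum = np.sum(np.exp(x - m), axis=-1, keepdims=True)  # first exp
    return np.exp(x - m) / expsum                           # second exp

def softmax_shared_exp(x):
    # After: a dedicated stage computes exp(x - max) once; the sum
    # reduction and the normalization both reuse it.
    m = np.max(x, axis=-1, keepdims=True)
    e = np.exp(x - m)                      # extra stage, computed once
    return e / np.sum(e, axis=-1, keepdims=True)
```

Both versions are numerically identical; the second simply avoids recomputing the elementwise exp over the whole tensor.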

Schedule: Currently, the schedule only parallelizes the _normalize stage of the computation. This change puts all stages of computation under a common root and parallelizes the outer dimensions.
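As a rough plain-Python analogy of the scheduling idea (this is not TVM schedule code): fuse every dimension except the softmax axis into one outer loop, the common root, and parallelize over it, so all stages (max, exp, sum, normalize) run inside the parallel region instead of only the normalize stage. The function name and thread pool are illustrative assumptions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def softmax_rows_parallel(x, num_threads=4):
    # Fuse all outer dimensions into a single loop over rows
    # (the "common root"), then parallelize that loop.
    flat = x.reshape(-1, x.shape[-1])
    out = np.empty_like(flat)

    def one_row(i):
        # Every stage of the computation happens under the root:
        row = flat[i]
        m = row.max()          # max stage
        e = np.exp(row - m)    # shared exp stage
        out[i] = e / e.sum()   # expsum + normalize stages

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(one_row, range(flat.shape[0])))
    return out.reshape(x.shape)
```

With a shape like (1, 12, 128, 128) and axis=-1, this gives 1×12×128 = 1536 independent outer iterations to distribute across threads, rather than parallelizing only the final stage.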

The following results are with a tensor of shape (1, 12, 128, 128) and axis=-1; this simulates the softmax in BERT base. The CPU is an Intel Xeon E5-2650, and the Relay target string is `llvm -mcpu=core-avx2`.

| TVM_NUM_THREADS | Latency in ms (master branch) | Latency in ms (new branch) |
|---|---|---|
| 1 | 4.7 | 3.0 |
| 2 | 3.8 | 1.8 |
| 4 | 3.3 | 1.0 |
| 8 | 3.1 | 0.74 |
| 16 | 3.2 | 0.55 |
@soiferj (Contributor, Author) commented Jul 31, 2019

@kevinthesun @vinx13 can you please review and add any other reviewers you think are necessary?

I am currently modifying log_softmax, and it seems worthwhile to create a new generic schedule for it, since the inputs to tvm.compute are now different for softmax and log_softmax. What do you think?
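For context on why the `tvm.compute` inputs now differ, a hedged NumPy sketch (not the TOPI code itself): log_softmax's final stage consumes the max and the log of the exp-sum directly, so it never needs the full exp(input - max) tensor that the new softmax compute produces as an intermediate.

```python
import numpy as np

def log_softmax(x):
    # Final stage reads (x - max) and log(expsum) directly; the
    # exp(x - max) tensor is only an input to the reduction, unlike
    # in softmax, where it also feeds the normalize stage.
    m = np.max(x, axis=-1, keepdims=True)
    expsum = np.sum(np.exp(x - m), axis=-1, keepdims=True)
    return (x - m) - np.log(expsum)
```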

jonso4 added 3 commits Jul 31, 2019
@tqchen (Member) commented Aug 1, 2019

Thank you @soiferj, can you check the CI problem?

@soiferj (Contributor, Author) commented Aug 1, 2019

Yeah, I'm taking a look at the CI failure now. It seems to be an issue in the CUDA schedule. I will work on it.

@soiferj (Contributor, Author) commented Aug 2, 2019

The CI issue is fixed.

A contributor left a review comment:

lgtm

@tqchen (Member) commented Aug 3, 2019

@kevinthesun feel free to merge the PR, given you are managing it.

@kevinthesun kevinthesun merged commit ee74d00 into dmlc:master Aug 5, 2019
5 checks passed:

- continuous-integration/jenkins/pr-merge: This commit looks good
- windows_mac_build: Build #20190802.1 succeeded
- windows_mac_build (MacOS_XCode9): succeeded
- windows_mac_build (Windows_VS2017_x64): succeeded
- windows_mac_build (Windows_VS2017_x86): succeeded
@kevinthesun (Contributor) commented Aug 5, 2019

Thank you for contributing!

@soiferj soiferj deleted the soiferj:soiferj/softmaxupdate branch Aug 5, 2019
wweic added a commit to wweic/tvm that referenced this pull request Aug 9, 2019
* Update Softmax compute and CPU schedule

* Add C++ compute

* Fix schedule

* Update CUDA and OpenGL schedules

* Fix log_softmax

* Fix hls and opengl schedules

* Fix CUDA schedule
wweic added a commit to neo-ai/tvm that referenced this pull request Sep 6, 2019
(same commit message as above)