Commit
use legacy unrolled kernel for non-trivial offset calc cases (#71710)
Summary:
This leads to across-the-board improvements on Pascal GPUs and big perf improvements for some broadcasting patterns and datatypes on V100 (along with 3-5% regressions for some other patterns). The most common improving pattern on V100 is half-precision x+bias, which improves by ~5%.

Full V100 results: https://docs.google.com/spreadsheets/d/1K67x-6_TPT9Yt6533NfECEhUyfbqBxLH9M5Z3gymzXE/edit#gid=1218963246
Benchmarking script: https://gist.github.com/ngimel/986ee84a1dd234a0485e99544e0fc8b6

Most importantly, it reduces context size by 40 MB.

Pull Request resolved: pytorch/pytorch#71710

Reviewed By: mruberry
Differential Revision: D33769330
Pulled By: ngimel
fbshipit-source-id: 5a7942261e06003ca79bfa3b071106aab1a8a4bc
(cherry picked from commit f9b51b4)
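For readers unfamiliar with the term, the sketch below illustrates what a "non-trivial offset calculation" looks like for a broadcast such as x+bias. It is a plain-Python illustration of strided indexing in general, not the kernel code changed by this commit; the helper names `linear_to_offset` and `is_trivial` are hypothetical and chosen only for this example.

```python
def linear_to_offset(linear_index, shape, strides):
    """Map a linear index over `shape` to an element offset using `strides`."""
    offset = 0
    # Peel off coordinates from the innermost (last) dimension outward.
    for size, stride in zip(reversed(shape), reversed(strides)):
        linear_index, coord = divmod(linear_index, size)
        offset += coord * stride
    return offset


def is_trivial(shape, strides):
    """True when the layout is contiguous, so offset == linear index."""
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size != 1 and stride != expected:
            return False
        expected *= size
    return True


# Assumed example: x has shape (2, 3) and is contiguous; bias has shape (3,)
# and is broadcast along dim 0, which is modeled here as a stride of 0.
shape = (2, 3)
x_strides = (3, 1)
bias_strides = (0, 1)

x_offsets = [linear_to_offset(i, shape, x_strides) for i in range(6)]
bias_offsets = [linear_to_offset(i, shape, bias_strides) for i in range(6)]
```

For the contiguous operand the offset equals the linear index, so no per-element index arithmetic is needed. The broadcast operand needs the division and modulo per element, which is the kind of case the commit title calls non-trivial.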
Showing 1 changed file with 85 additions and 14 deletions.