-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove unnecessary concatenation using the zipping approach. #1768
Comments
wujingyue
added a commit
that referenced
this issue
Feb 15, 2024
…ats." For #1768. [ghstack-poisoned]
wujingyue
added a commit
that referenced
this issue
Feb 15, 2024
…els splits and cats." For #1768. [ghstack-poisoned]
wujingyue
added a commit
that referenced
this issue
Feb 15, 2024
…ats." For #1768. [ghstack-poisoned]
wujingyue
added a commit
that referenced
this issue
Feb 15, 2024
…els splits and cats." For #1768. [ghstack-poisoned]
wujingyue
added a commit
that referenced
this issue
Feb 15, 2024
…ats." For #1768. [ghstack-poisoned]
wujingyue
added a commit
that referenced
this issue
Feb 15, 2024
…els splits and cats." For #1768. [ghstack-poisoned]
wujingyue
added a commit
that referenced
this issue
Feb 15, 2024
…ats." For #1768. [ghstack-poisoned]
wujingyue
added a commit
that referenced
this issue
Feb 15, 2024
…els splits and cats." For #1768. [ghstack-poisoned]
wujingyue
added a commit
that referenced
this issue
Feb 15, 2024
…ats." For #1768. [ghstack-poisoned]
wujingyue
added a commit
that referenced
this issue
Feb 16, 2024
wujingyue
added a commit
that referenced
this issue
Feb 16, 2024
For #1768. `ghstack land https://github.com/NVIDIA/Fuser/pull/1771` failed for reasons that I don't understand. I'm trying to land it again without `ghstack`. See #1771 for review comments.
This was referenced Feb 17, 2024
wujingyue
added a commit
that referenced
this issue
Feb 20, 2024
With this PR, MoveSplitCatPass can cancel the <split,cat> pair with `permute`s in between and horizontally merge those `permute`s. See code comments for details. For #1768.
wujingyue
added a commit
that referenced
this issue
Feb 26, 2024
wujingyue
added a commit
that referenced
this issue
Feb 26, 2024
This was referenced Feb 26, 2024
wujingyue
added a commit
that referenced
this issue
Feb 27, 2024
wujingyue
added a commit
that referenced
this issue
Feb 28, 2024
wujingyue
added a commit
that referenced
this issue
Mar 1, 2024
wujingyue
added a commit
that referenced
this issue
Mar 1, 2024
This makes it convenient to use an IdModel as a class member without having to pass it through many functions. I examined the NVFUSER_TRACE. FusionKernelRuntime::FusionKernelRuntime is bottlenecked by "Finding valid fusion segment solutions" not pre-segmenter passes. I added the FUSER_PERF_SCOPE for pre-segmenter passes anyway. For #1768
This optimization has been implemented but not turned on by default. The part in nvFuser is turned on unconditionally. The part in Thunder is behind a flag. However, even with this flag on, this optimization won't kick in because Thunder gives cat to a special executor by default. |
Closed
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
A spin-off from #1502 (comment). Created for tracking progress.
Problem
Below is a common pattern in nanoGPT's backprop.
Because nvFuser doesn't take sdpa_backward and therefore sees three unconnected input tensors (dQ, dK, and dV), it has to materialize
dQKV_view
anddQKV_permute
.Solution
TL;DR: change Thunder's cudnnex to feed nvFuser a concatenated tensor that contains dQ, dK and dV, so nvFuser realizes that the existing cat is unnecessary and removes it.
spda_backward
kernel (which outputs one dQKV tensor) followed by a split.slice
s and thecat
and merge allview
andpermute
between them, so the above will become:dQKV_view
anddQKV_permute
will become aliases ofdQKV
. The fusion will boil down to a ReduceSum kernel that sums[B,S,H,D*3]
to[H*D*3]
.The text was updated successfully, but these errors were encountered: