Make wrapIndexOnce check async, avoid DtoH sync on index_put_ #125952
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125952
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures as of commit bb0289b with merge base 96a5698. The following jobs have failed.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Signed-off-by: Edward Z. Yang <ezyang@meta.com> ghstack-source-id: 20ad93d07612a3326ecaebaa3a7b2a5b45c15f54 Pull Request resolved: #125952
We already do that within canDispatchToMaskedFill:

static std::tuple<bool, Tensor> canDispatchToMaskedFill(
    const Tensor& self,
    const torch::List<c10::optional<at::Tensor>>& indices,
    const Tensor& value) {
  if (!(value.numel() == 1 && value.device().is_cpu())) {
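For intuition, here is a minimal Python sketch of the kind of predicate `canDispatchToMaskedFill` evaluates (the function name and parameters below are hypothetical, not the actual ATen logic): the masked-fill fast path applies only when the fill value is a one-element CPU tensor and the indices are boolean masks.

```python
def can_dispatch_to_masked_fill(value_numel, value_on_cpu, indices_are_bool_masks):
    """Hypothetical sketch of the dispatch predicate: index_put_ can be
    rewritten as masked_fill_ only when the fill value is a single-element
    CPU tensor and every index is a boolean mask."""
    if not (value_numel == 1 and value_on_cpu):
        return False
    return indices_are_bool_masks

# A multi-element or GPU-resident value falls back to the generic path.
print(can_dispatch_to_masked_fill(1, True, True))   # True
print(can_dispatch_to_masked_fill(4, True, True))   # False
```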
OK, that's useful, because the internal user tested and they also said this did not fix it.
I'm out of time right now, but here is the repro:
The previous fix was not general enough. Fixes #125952 [ghstack-poisoned]
I put up a fix, but I was not able to test whether it works (my triton version is acting up with the repro). Mind checking if it fixes the issue?
OK, I got a better stacktrace here:
It's this one:
and the reason it doesn't sync in eager is because, dun dun dun, eager used some special API to avoid the range check.
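The `wrapIndexOnce` in the PR title refers to normalizing negative indices into `[0, dim_size)`. A rough Python sketch of that wrapping (purely illustrative, not the CUDA code): the point of the fix is that the accompanying range check is done with a device-side assert rather than a host-side comparison that would force a DtoH sync.

```python
def wrap_index_once(index, dim_size):
    """Hypothetical sketch: map a possibly-negative index into [0, dim_size).
    In the real kernel the out-of-range case is caught by a device-side
    assert; here we just raise on the host for illustration."""
    if index < -dim_size or index >= dim_size:
        raise IndexError(f"index {index} out of range for dim of size {dim_size}")
    return index + dim_size if index < 0 else index

print(wrap_index_once(-1, 4))  # 3
print(wrap_index_once(2, 4))   # 2
```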
Signed-off-by: Edward Z. Yang <ezyang@meta.com> ghstack-source-id: 0fe1f51c815d8d135172d7f6e15bdb22e5339d48 Pull Request resolved: #125952
Updated with a fix that is actually confirmed to work.
Sounds good if it fixes the problem. Could you add a regression test?
at::_assert_async(index.max() < dim_size);
at::_assert_async(index.min() >= -dim_size);
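Conceptually, `_assert_async` enqueues the check on the device instead of reading the predicate back to the host immediately. A toy Python sketch of that deferred-check idea (purely illustrative; the real mechanism is a CUDA kernel that traps on failure, and the class below is invented for this sketch):

```python
class DeferredAsserts:
    """Toy model of asynchronous assertion: predicates are recorded
    without a host read, and only evaluated at an explicit sync point."""
    def __init__(self):
        self._pending = []

    def assert_async(self, predicate):
        # No evaluation here -- analogous to not forcing a DtoH sync.
        self._pending.append(predicate)

    def synchronize(self):
        # Failures only surface when the "stream" is synchronized.
        for predicate in self._pending:
            if not predicate():
                raise AssertionError("deferred assertion failed")
        self._pending.clear()

stream = DeferredAsserts()
index, dim_size = 3, 4
stream.assert_async(lambda: index < dim_size)
stream.assert_async(lambda: index >= -dim_size)
stream.synchronize()  # passes: 3 is within [-4, 4)
```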
I didn't know we had these working in the end. How does compile interpret these?
This is in the CUDA kernel, so compile doesn't really interact with it in a nontrivial way; you just end up hitting it at runtime.
I don't really see how to add a regression test. To check that we're not doing a DtoH sync, we need some way of detecting that such a sync has happened, but there's no facility for programmatically determining this.
@pytorchbot merge -i
I figured one way would be to try to cudagraph the relevant code and check that capture succeeds.
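The idea behind that test strategy, sketched with a toy stand-in (real code would use CUDA graph capture via `torch.cuda.graphs`; the class below is hypothetical): graph capture aborts if the captured region synchronizes, so a successful capture doubles as evidence that no DtoH sync occurred.

```python
class FakeCaptureGuard:
    """Toy stand-in for CUDA graph capture: any synchronize() inside the
    guarded region is treated as an error, mirroring how a real DtoH sync
    would abort graph capture."""
    def __init__(self):
        self.capturing = False

    def __enter__(self):
        self.capturing = True
        return self

    def __exit__(self, *exc):
        self.capturing = False
        return False

    def synchronize(self):
        if self.capturing:
            raise RuntimeError("sync during capture -- op is not capture-safe")

guard = FakeCaptureGuard()
with guard:
    pass  # a capture-safe op issues no synchronize(); capture succeeds

try:
    with guard:
        guard.synchronize()  # a syncing op would abort capture
except RuntimeError as e:
    print("caught:", e)
```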
Merge started. Your change will be merged while ignoring the following 3 checks: Lint / Test collect_env (with_torch), Lint / Test collect_env (without_torch), Lint / Test collect_env (older_python_version). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…h#125952) Internal xref: https://fb.workplace.com/groups/1075192433118967/posts/1427156211255919/ Signed-off-by: Edward Z. Yang <ezyang@meta.com> Pull Request resolved: pytorch#125952 Approved by: https://github.com/lezcano
Stack from ghstack (oldest at bottom):
Internal xref: https://fb.workplace.com/groups/1075192433118967/posts/1427156211255919/
Signed-off-by: Edward Z. Yang ezyang@meta.com