[Inductor] Masked `tl.load` operations should explicitly include `other` if the masked out values are expected to be used
#126535
Labels
module: inductor
oncall: pt2
triaged
🐛 Describe the bug
The expected semantics of a Triton `tl.load` call with a `mask` supplied but no `other` parameter is to leave the masked out values (values where the mask is false) undefined. However, the CUDA backend in Triton explicitly zero-initializes masked out values as part of the predicated load instruction generated by the compiler. It appears that Inductor relies on this behavior (e.g. #126173), which results in undefined behavior from Inductor-generated kernels on other Triton backends. In particular, the Intel backend used an undefined LLVM value instead of zero-initialization when a masked load occurred without `other`.

The suggested solution is to add `other` wherever masked loads are generated in Inductor, but looking at the code I noticed commentary around issues with masked loads when `other` is present (e.g. `pytorch/torch/_inductor/codegen/triton.py`, line 1893 in 55033ab), so care is needed before adding `other` back to these areas. This issue is opened to investigate those.

In the meantime, we have decided to follow the CUDA backend implementation and explicitly zero-initialize in the Intel XPU backend. This may bring some performance benefit in addition to fixing the bug above (though any such benefit may just be a side effect of no longer comparing undefined values; I have not investigated). Our reasoning is that CUDA is the reference backend for Triton, and regardless of what the language spec says, users are going to expect us to follow the CUDA backend semantics, particularly users who started with CUDA (like Inductor and PyTorch). It appears that the AMD Triton backend also zero-initializes, but I don't have hardware to test that.
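To make the failure mode concrete, here is a minimal host-side sketch of the masked-load semantics described above. It is a NumPy emulation, not Triton or Inductor code: `masked_load` and its `garbage` parameter are hypothetical names, and NaN is only a stand-in for whatever a backend happens to leave in the masked-out lanes. It shows why a reduction over a padded block is only well defined when `other` is supplied.

```python
import numpy as np


def masked_load(buffer, offsets, mask, other=None, garbage=np.nan):
    """Emulate tl.load(ptr + offsets, mask=mask, other=other) on the host.

    Hypothetical helper for illustration only. When `other` is None the
    masked-out lanes are undefined in Triton; here they are filled with
    `garbage` to model an arbitrary backend-dependent value.
    """
    fill = garbage if other is None else other
    out = np.full(offsets.shape, fill, dtype=buffer.dtype)
    # Only lanes where the mask is true actually read from memory.
    out[mask] = buffer[offsets[mask]]
    return out


BLOCK = 8                                  # block size larger than the data
data = np.arange(1, 6, dtype=np.float32)   # 5 valid elements: 1..5
offsets = np.arange(BLOCK)
mask = offsets < data.size                 # last 3 lanes are masked out

unsafe = masked_load(data, offsets, mask)             # no `other`
safe = masked_load(data, offsets, mask, other=0.0)    # explicit `other`

# A reduction consumes every lane, including the masked-out ones.
unsafe_sum = float(unsafe.sum())   # depends on the garbage value
safe_sum = float(safe.sum())       # depends only on the valid data
```

In an actual Triton kernel, the corresponding change would be passing `other=0.0` (or the identity element of the reduction in question) to `tl.load` alongside `mask`.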
I am happy to investigate adding `other` to the masked load operations (I found 3 or 4 of them doing a quick survey last night), but wanted to open this issue first for discussion.

Versions
main
cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire