[release/2.4] Backport AOTriton 0.10b to support gfx950 and ROCM 7.0 #2318
|
Jenkins build for d4ca84094906731b10ccfdfacdaa5a8f319848cf commit finished as FAILURE |
d4ca840 to
f7fffcb
Compare
|
Jenkins build for 9ee293482d0827c87561b23bcdd41c310d4ccca6 commit finished as FAILURE |
caffe2/CMakeLists.txt
Outdated
This removal lets a non-root user build PyTorch in editable mode. It is not strictly required for this PR, but it is neat to have.
|
Jenkins build for 5171177128902f894069e25542e81a473f46fce5 commit finished as FAILURE |
Where does USE_ROCM_ATTENTION get defined?
Fixed, legacy code from CK integration patch.
jithunnair-amd
left a comment
@xinyazhang The magnitude of changes to port 0.9.2b support to PyTorch 2.4 is substantial. I can see the benefit of not having to support more than one AOTriton version across multiple PyTorch versions in case more bc-breaking ROCm7.0 changes necessitate generation of a new tarball.
However, given the magnitude of changes, I'd request separating the non-AOTriton-related ROCm7.0-compatibility changes into a different PR.
|
Jenkins build for 478abd95c7f94a68043748e61c774716c284733f commit finished as FAILURE |
478abd9 to
77c5ab6
Compare
Done. Build fix PR: #2325 |
|
Jenkins build for 77c5ab6c7f2d1e2d1a736e8d7f1150aa2ddf7505 commit finished as FAILURE |
|
Waiting for MI300 testing. |
|
Jenkins build for 77c5ab6c7f2d1e2d1a736e8d7f1150aa2ddf7505 commit is in progress |
…g AOTriton from source (pytorch#139432) Pull Request resolved: pytorch#139432 Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily Co-authored-by: Vicky Tsang <vtsang@amd.com>
…te aotriton_version.txt (pytorch#137443)

We do not need `install_aotriton.sh` and `aotriton_version.txt` any more since `aotriton.cmake` now installs the best binary release package as the default option when building pytorch.

This should resolve the issue of needing a pre-installed aotriton package when building PyTorch for ROCm from source, which is not feasible if building PyTorch *outside* a CI docker image. With this change, a user can have a pre-installed AOTriton in their environment, if desired, and have the build pick it up by specifying the `AOTRITON_INSTALLED_PREFIX` env var, or have the build automatically detect and install the compatible version. As a third option, the user can also force AOTriton to build from source instead, using the `AOTRITON_INSTALL_FROM_SOURCE` env var.

Also, with the changes in this PR, the cmake build process handles the tasks of copying aotriton .so and images directory from `torch/lib` to the installation path.

Pull Request resolved: pytorch#137443
Approved by: https://github.com/jithunnair-amd, https://github.com/jeffdaily
Co-authored-by: Jithun Nair <jithun.nair@amd.com>
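The three options described in this commit message can be summarized as a small decision function. This is a minimal sketch in Python, not the logic in `aotriton.cmake`: the function name `choose_aotriton_mode`, the returned labels, the treatment of `"0"` as unset, and the precedence between the two variables are all assumptions made for illustration. Only the two environment variable names come from the commit message.

```python
import os


def choose_aotriton_mode(env=None):
    """Return which AOTriton installation path a build would take.

    Hypothetical helper. The precedence used here (pre-installed prefix
    first, then source build, then binary download) is an assumption and
    is not taken from aotriton.cmake.
    """
    env = os.environ if env is None else env

    prefix = env.get("AOTRITON_INSTALLED_PREFIX", "")
    if prefix:
        # The user points the build at an existing AOTriton installation.
        return ("preinstalled", prefix)

    # Assumption: an empty value or "0" means the variable is not set.
    if env.get("AOTRITON_INSTALL_FROM_SOURCE", "") not in ("", "0"):
        # The user forces AOTriton to be built from source.
        return ("source", None)

    # Default: fetch the compatible binary release package.
    return ("download", None)
```

Passing a plain dictionary instead of reading `os.environ` directly keeps the sketch easy to try out, for example `choose_aotriton_mode({"AOTRITON_INSTALL_FROM_SOURCE": "1"})`.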
We received reports that AOTriton kernels mishandle the bias pointer, which causes NaNs when fine-tuning the llama3.2-11b vision model. This PR fixes the problem. Note: AOTriton 0.8.1b adds head dimension 512 support, which increases the binary size, but this support is considered experimental and is not enabled for now. Pull Request resolved: pytorch#145508 Approved by: https://github.com/jeffdaily
This is backporting the following commit:

[ROCm] Bump AOTriton to 0.9.2b (pytorch#148433)

Notable new features/optimizations for SDPA operators on AMD systems from AOTriton 0.9b:

* Optimize these Non-power-of-two head dimensions: 48, 80, 96, 160, 192, 224. Inputs with these head dimensions do not need padding to power-of-two anymore.
* `is_causal=True` cases are now supported with persistent dynamic algorithm, which requires an atomic tensor but does load balance between different CTAs
* `dropout_p > 0.0` cases now support full 64-bit offsets and use all i64x4 PRNG outputs
* The precise AOTriton shared library version can now be identified with `readelf -p .comment libaotriton_v2.so`
  + However, this does not guarantee the GPU images stored under `aotriton.images` have the same version, since they can be overwritten.
* The newly added fused backward kernel will be used for smaller workloads, due to less kernel invocation overhead.
* Support gfx1201 (RX 9070XT). Need to be enabled at runtime with `TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1`

Pull Request resolved: pytorch#148433
Approved by: https://github.com/jeffdaily
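As a rough illustration of the head-dimension item in this commit message, the following Python sketch models how much padding is avoided. It is not AOTriton code: `OPTIMIZED_HEAD_DIMS` copies the list from the commit message, while `effective_head_dim` and the assumption that every other head dimension is rounded up to the next power of two are hypothetical simplifications.

```python
# Head dimensions listed in the commit message as no longer needing padding.
OPTIMIZED_HEAD_DIMS = frozenset({48, 80, 96, 160, 192, 224})


def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 1 << (n - 1).bit_length()


def effective_head_dim(head_dim, optimized=True):
    """Head dimension a kernel would work with under this simplified model.

    Hypothetical helper: with optimized=False every head dimension is
    padded to a power of two, which models the behavior before the listed
    dimensions were optimized.
    """
    if optimized and head_dim in OPTIMIZED_HEAD_DIMS:
        return head_dim
    return next_power_of_two(head_dim)
```

Under this model a head dimension of 96 stays at 96 with the optimization and is padded to 128 without it, so a quarter of the padded width would have been wasted.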
#2105) Also fixes the URL problem, where the release page does not always match the version string in the file name.
Per request from SWDEV-540108
This reverts commit 2211aace36d46e98a0081e2ea91ef8c16818157c.
…pytorch#136627)

This change fixes the RUNPATH of installed c++ tests so that the linker can find the shared libraries they depend on.

For example, currently:

```bash
venv/lib/python3.10/site-packages/torch $ ./bin/test_lazy
./bin/test_lazy: error while loading shared libraries: libtorch.so: cannot open shared object file: No such file or directory
```

Pull Request resolved: pytorch#136627
Approved by: https://github.com/malfet
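One way to check the effect of such a fix is to inspect the RUNPATH entry of an installed test binary with `readelf -d`. The Python sketch below only parses that output. The helper name `parse_runpath` and the sample text (including the `$ORIGIN/../lib` value) are illustrative assumptions, not output captured from a PyTorch build.

```python
import re

# Matches the RUNPATH (or legacy RPATH) line printed by `readelf -d`.
_RUNPATH_RE = re.compile(
    r"\((?:RUNPATH|RPATH)\)\s+Library r(?:un)?path: \[(.*?)\]"
)


def parse_runpath(readelf_dynamic_output):
    """Return the search-path entries found in `readelf -d` output."""
    entries = []
    for match in _RUNPATH_RE.finditer(readelf_dynamic_output):
        # Entries inside the brackets are separated by colons.
        entries.extend(p for p in match.group(1).split(":") if p)
    return entries


# Illustrative sample only; the RUNPATH value is an assumption.
SAMPLE = """\
 0x0000000000000001 (NEEDED)             Shared library: [libtorch.so]
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN/../lib]
"""

print(parse_runpath(SAMPLE))
```

In practice the input text could come from `subprocess.run(["readelf", "-d", path], capture_output=True, text=True).stdout`, where `path` is the binary to inspect. An empty result would suggest the binary relies on the default library search path only.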
Fixes #ISSUE_NUMBER Pull Request resolved: pytorch#134436 Approved by: https://github.com/r-barnes
This reverts commit b7eaa03dc469a73e4fe10f93fa779180c96c763e.
…ibraries (pytorch#136627)" This reverts commit 84040be83ee4f8850e2384065415c1f8c8e997a5.
|
@pruthvistony tested on MI300X but saw multiple failures on 0.9.2b (recall that release/2.4 has more UTs than release/2.5). A subset of the observed errors: Trying 0.10b to see whether the failures persist. |
77c5ab6 to
714e850
Compare
|
Jenkins build for c3834b3e0166735c256bf998fee0979f8eee8ff4 commit finished as FAILURE |
…OCm#2318) This also enables gfx950 unit tests for ROCM >= 6.5. --------- Co-authored-by: Vicky Tsang <vtsang@amd.com> Co-authored-by: Jithun Nair <jithun.nair@amd.com> Co-authored-by: Mwiza Kunda <mwizak@graphcore.ai> Co-authored-by: cyyever <cyyever@outlook.com>