Use fallback libraries for archs without optimized logic #1862
Fixes ROCm#1757.

Enables architectures that don't have optimized logic files to also produce libraries when `--separate-architectures` or `--lazy-library-loading` is turned on. Previously, both of these flags had to be disabled for rocBLAS to run on architectures like `gfx1010`.

Test plan:

```
cmake -GNinja -B build -S . \
  -DCMAKE_C_COMPILER=hipcc \
  -DCMAKE_CXX_COMPILER=hipcc \
  -DBUILD_CLIENTS_TESTS=OFF \
  -DBUILD_CLIENTS_BENCHMARKS=OFF \
  -DBUILD_CLIENTS_SAMPLES=OFF \
  -DBUILD_TESTING=OFF \
  -DBUILD_WITH_TENSILE=ON \
  -DTensile_PRINT_DEBUG=ON \
  -DTensile_LIBRARY_FORMAT=msgpack \
  -DTensile_CPU_THREADS=14 \
  -DTensile_LAZY_LIBRARY_LOADING=ON \
  -DAMDGPU_TARGETS="..."
```

With `AMDGPU_TARGETS` being one of the following:

- `AMDGPU_TARGETS=gfx1010`
- `AMDGPU_TARGETS=gfx1030;gfx1010`
- `AMDGPU_TARGETS=gfx803;gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102`

In all three cases, `$ROCM_PATH/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat` is produced and all other `*.dat` files remain unchanged.

Signed-off-by: Gavin Zhao <git@gzgz.dev>
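The check described in the test plan could be scripted roughly as follows. This is only a sketch: `check_lazy_lib` is a hypothetical helper (not part of rocBLAS or Tensile), and it assumes the install layout and file naming quoted in the description, `lib/rocblas/library/TensileLibrary_lazy_<arch>.dat`, for plain architecture names such as `gfx1010`. How targets with feature suffixes like `gfx906:xnack-` map to file names is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the lazy-loading Tensile library
# for one architecture exists under a given ROCm install prefix.
# Usage: check_lazy_lib <rocm_path> <arch>
check_lazy_lib() {
    local rocm_path="$1" arch="$2"
    # Path pattern taken from the PR description; assumed to hold per arch.
    local dat="${rocm_path}/lib/rocblas/library/TensileLibrary_lazy_${arch}.dat"
    if [ -f "$dat" ]; then
        echo "found: $dat"
        return 0
    fi
    echo "missing: $dat" >&2
    return 1
}
```

For the second test case, one might run `for arch in gfx1030 gfx1010; do check_lazy_lib "$ROCM_PATH" "$arch"; done` after installing, with `ROCM_PATH` pointing at the install prefix.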
Force-pushed from a8a8335 to 9fa257d.
@AlexBrownAMD @nakajee Hi, can someone take a look? I think this is an important issue, and fixing it would help a lot of AMD users working on ML and AI.
@yoichiyoshida @babakpst @bragadeesh can someone merge this PR?
@nakajee @AlexBrownAMD could someone take a look and merge, please? 😢
We are aware of this PR.
@babakpst @yoichiyoshida @AlexBrownAMD @nakajee could someone please help me get this merged?
External PR review summary:
Reviewed PR with Braga. This change adds fallback libs for alternative archs. The change is small and does not introduce any new IP or dependencies. It should not affect existing libs or their size; a build needs to explicitly request the alternative archs.
@AlexBrownAMD thank you so much <3
Hello guys, I'm trying to use my RX 5700 with llama.cpp, but I'm getting an error about a missing Tensile library.
That's how I found this repo. I tried to install it following the steps here.
The build printed its final message, but when I run llama.cpp again I get the same missing Tensile library error.
@userbox020 This PR introduced a regression so the changes have been reverted. I've been investigating this issue. Please wait until #1757 is closed.
@GZGavinZhao thanks bro, I'm going to follow the work. Do you use Discord or something similar where we can chat less formally? I would like to help in any way I can.