Use fallback libraries for archs without optimized logic #1862

Merged 1 commit on Jan 24, 2024

Contributor

@GZGavinZhao GZGavinZhao commented Jan 11, 2024

Fixes #1757.

Enables architectures that don't have optimized logic files to also produce libraries when --separate-architectures or --lazy-library-loading is turned on. Previously, both flags had to be disabled for rocBLAS to run on architectures like gfx1010.

Test plan:

cmake -GNinja -B build -S . \
    -DCMAKE_C_COMPILER=hipcc \
    -DCMAKE_CXX_COMPILER=hipcc \
    -DBUILD_CLIENTS_TESTS=OFF \
    -DBUILD_CLIENTS_BENCHMARKS=OFF \
    -DBUILD_CLIENTS_SAMPLES=OFF \
    -DBUILD_TESTING=OFF \
    -DBUILD_WITH_TENSILE=ON \
    -DTensile_PRINT_DEBUG=ON \
    -DTensile_LIBRARY_FORMAT=msgpack \
    -DTensile_CPU_THREADS=14 \
    -DTensile_LAZY_LIBRARY_LOADING=ON \
    -DAMDGPU_TARGETS="..."

With AMDGPU_TARGETS being one of the following:

  • AMDGPU_TARGETS=gfx1010
  • AMDGPU_TARGETS=gfx1030;gfx1010
  • AMDGPU_TARGETS=gfx803;gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102

In all three cases, $ROCM_PATH/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat is produced and all other *.dat files remain unchanged.
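The expected outcome of the test plan can be checked with a short script. This is a minimal sketch, not part of the PR: `has_lazy_library` and `snapshot_dat_files` are hypothetical helper names, and the library directory is assumed to be the `$ROCM_PATH/lib/rocblas/library` path mentioned above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of rocBLAS or Tensile).

# Succeed if the lazy-loading Tensile library for the given architecture
# exists in the given library directory.
has_lazy_library() {
    lib_dir="$1"
    arch="$2"
    [ -f "$lib_dir/TensileLibrary_lazy_$arch.dat" ]
}

# Print one checksum line per .dat file, using paths relative to the
# directory, so that snapshots taken before and after a change can be
# compared with diff.
snapshot_dat_files() {
    (
        cd "$1" || exit 1
        for f in *.dat; do
            if [ -f "$f" ]; then
                cksum "$f"
            fi
        done
    )
}

# Usage, assuming ROCM_PATH points at the install prefix of the build:
#   has_lazy_library "$ROCM_PATH/lib/rocblas/library" gfx1010 && echo "present"
#   snapshot_dat_files "$ROCM_PATH/lib/rocblas/library" > after.txt
#   diff before.txt after.txt
```

Under these assumptions, a diff of the two snapshots should show only the added gfx1010 file if all other `*.dat` files are unchanged.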

Signed-off-by: Gavin Zhao <git@gzgz.dev>
@hiepxanh

@AlexBrownAMD @nakajee Hi, can someone take a look? I think this is an important issue, and fixing it would help a lot of AMD users working on ML and AI.

@hiepxanh

@yoichiyoshida @babakpst @bragadeesh can someone merge this PR?

@hiepxanh

@nakajee @AlexBrownAMD could someone take a look and merge, please? 😢

@nakajee
Contributor

nakajee commented Jan 19, 2024

We are aware of this PR.
We have started reviewing this.
Please give us some more time to finalize our decision.

@hiepxanh

@babakpst @yoichiyoshida @AlexBrownAMD @nakajee could someone please help merge this one?

Collaborator

@AlexBrownAMD AlexBrownAMD left a comment


External PR review summary:
Reviewed the PR with Braga. This change adds fallback libraries for alternative architectures. The change is small and does not introduce any new IP or dependencies. It should not affect existing libraries or their size; a build has to request the alternative architectures.

@AlexBrownAMD AlexBrownAMD merged commit efbe0c0 into ROCm:develop Jan 24, 2024
@hiepxanh

@AlexBrownAMD thank you so much <3

@userbox020

Hello, I'm trying to use my RX 5700 with llama.cpp, but I'm getting this error:

rocBLAS error: Cannot read /opt/rocm/lib/rocblas/library/TensileLibrary.dat: No such file or directory
Aborted (core dumped)

That's how I found this repo. I tried to install it as follows:

git clone https://github.com/ROCmSoftwarePlatform/rocBLAS.git

cd rocBLAS

cmake -GNinja -B build -S . \
    -DCMAKE_C_COMPILER=hipcc \
    -DCMAKE_CXX_COMPILER=hipcc \
    -DBUILD_CLIENTS_TESTS=OFF \
    -DBUILD_CLIENTS_BENCHMARKS=OFF \
    -DBUILD_CLIENTS_SAMPLES=OFF \
    -DBUILD_TESTING=OFF \
    -DBUILD_WITH_TENSILE=ON \
    -DTensile_PRINT_DEBUG=ON \
    -DTensile_LIBRARY_FORMAT=msgpack \
    -DTensile_CPU_THREADS=14 \
    -DTensile_LAZY_LIBRARY_LOADING=ON \
    -DAMDGPU_TARGETS=gfx1010

The final message was: -- Build files have been written to: /home/mruserbox/Desktop/RocBlas/rocBLAS/build

But when I run llama.cpp again, I get the same error about the missing Tensile library.
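For context on the commands above: the `-- Build files have been written to` message marks the end of CMake's configure and generate step only, so nothing has been compiled or installed at that point. The following is a rough sketch of the remaining steps using the standard `cmake --build` and `cmake --install` commands. `build_and_install` is a hypothetical wrapper name, and the install prefix depends on how the project was configured. Whether a completed build resolves the missing TensileLibrary.dat error on gfx1010 is not established by this thread.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of rocBLAS): compile and install a tree
# that has already been configured with "cmake -B <dir> -S .".
build_and_install() {
    # Assumption: defaults to "build", the directory passed to -B above.
    build_dir="${1:-build}"
    # Compile the configured targets (with -GNinja this drives ninja).
    cmake --build "$build_dir" || return 1
    # Copy the built artifacts into the configured install prefix.
    # This may need elevated permissions if the prefix is a system directory.
    cmake --install "$build_dir"
}

# Usage, run from the rocBLAS checkout:
#   build_and_install build
```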

@GZGavinZhao
Contributor Author

@userbox020 This PR introduced a regression so the changes have been reverted. I've been investigating this issue. Please wait until #1757 is closed.

@userbox020

@GZGavinZhao Thanks, I'll follow the work. Do you use Discord or something similar where we can chat less formally? I would like to help in any way I can.
