Use fallback libraries for archs without optimized logic #1862

Merged: 1 commit, Jan 24, 2024

Commits on Jan 11, 2024

  1. Use fallback libraries for archs without optimized logic

    Fixes ROCm#1757.
    
    Enables architectures without optimized logic files to produce fallback
    libraries when `--separate-architectures` or `--lazy-library-loading` is
    turned on. Previously, both flags had to be disabled for rocBLAS to run
    on architectures like `gfx1010`.
    
    Test plan:
    ```
    cmake -GNinja -B build -S . \
        -DCMAKE_C_COMPILER=hipcc \
        -DCMAKE_CXX_COMPILER=hipcc \
        -DBUILD_CLIENTS_TESTS=OFF \
        -DBUILD_CLIENTS_BENCHMARKS=OFF \
        -DBUILD_CLIENTS_SAMPLES=OFF \
        -DBUILD_TESTING=OFF \
        -DBUILD_WITH_TENSILE=ON \
        -DTensile_PRINT_DEBUG=ON \
        -DTensile_LIBRARY_FORMAT=msgpack \
        -DTensile_CPU_THREADS=14 \
        -DTensile_LAZY_LIBRARY_LOADING=ON \
        -DAMDGPU_TARGETS="..."
    ```
    With `AMDGPU_TARGETS` set to one of the following:
    - `AMDGPU_TARGETS=gfx1010`
    - `AMDGPU_TARGETS=gfx1030;gfx1010`
    - `AMDGPU_TARGETS=gfx803;gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102`
    
    In all three cases,
    `$ROCM_PATH/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat` is produced
    and all other `*.dat` files remain unchanged.
    
    Signed-off-by: Gavin Zhao <git@gzgz.dev>
    GZGavinZhao committed Jan 11, 2024
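
The check described in the test plan (that a lazy library file exists for each requested architecture) could be scripted roughly as follows. This is a minimal sketch, not part of the commit: `check_lazy_libs` is a hypothetical helper, and it assumes the `TensileLibrary_lazy_<arch>.dat` naming and the `$ROCM_PATH/lib/rocblas/library` directory quoted in the commit message. It does not handle targets with feature suffixes such as `:xnack-`.

```shell
# Hypothetical helper (not part of rocBLAS or Tensile).
# Usage: check_lazy_libs <library-dir> <arch> [<arch> ...]
# Prints one line per architecture and returns non-zero if any file is missing.
check_lazy_libs() {
    lib_dir="$1"
    shift
    missing=0
    for arch in "$@"; do
        # File name pattern assumed from the commit message's test plan.
        file="$lib_dir/TensileLibrary_lazy_${arch}.dat"
        if [ -f "$file" ]; then
            echo "ok: $file"
        else
            echo "missing: $file"
            missing=1
        fi
    done
    return "$missing"
}
```

After installing a build, it might be invoked as `check_lazy_libs "$ROCM_PATH/lib/rocblas/library" gfx1010 gfx1030`.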
    Commit 9fa257d