Tensile won't produce backend libraries for archs without optimized logic files when using --separate-architectures #1757

ulyssesrr · 2023-08-17T18:30:37Z

Issue

Tensile won't produce backend libraries for archs without optimized logic files when using --separate-architectures.

Description

According with #1165 (comment) "gfx1010 has been enabled by default in rocBLAS builds since ROCm 4.3.0." however since rocBLAS does not have optimized logic files for navi10 no library is produced for gfx1010.

$ drun --rm rocm/dev-ubuntu-22.04:5.6-complete
root@ftl:/# ls -1 /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx*
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx803.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat

Expected

Tensile should produce libraries for all requested architectures, using the fallback logic files for archs missing optimized logic files.

Workaround

Building rocBLAS with --merge-architectures --no-lazy-library-loading seems to avoid the issue.

Patch

https://github.com/ulyssesrr/docker-rocm-xtra/blob/3be41a9d79ff4f4324f3f34383b2282529c0c4b7/rocm-xtra-builder-rocblas/patches/Tensile-fix-fallback-arch-build.patch

The text was updated successfully, but these errors were encountered:

smirgol · 2023-10-28T17:33:24Z

Although that's probably not the right place, I really needed to say thank you! I've struggled with that basically since my card has been released and finally I was able to fix it because of you.

Doing compute stuff is just a nightmare with AMD, really.

Fixes ROCm#1757. Enables architectures that don't have optimized logic files to also produce libraries when `--separate-architectures` or `--lazy-library-loading` is turned on. Previously, one must disable both of these two flags in order for rocBLAS to run on architectures like `gfx1010`. Test plan: ``` cmake -GNinja -B build -S . \ -DCMAKE_C_COMPILER=hipcc \ -DCMAKE_CXX_COMPILER=hipcc \ -DBUILD_CLIENTS_TESTS=OFF \ -DBUILD_CLIENTS_BENCHMARKS=OFF \ -DBUILD_CLIENTS_SAMPLES=OFF \ -DBUILD_TESTING=OFF \ -DBUILD_WITH_TENSILE=ON \ -DTensile_PRINT_DEBUG=ON \ -DTensile_LIBRARY_FORMAT=msgpack \ -DTensile_CPU_THREADS=14 \ -DTensile_LAZY_LIBRARY_LOADING=ON \ -DAMDGPU_TARGETS="..." ``` With `AMDGPU_TARGETS` being one of the following - `AMDGPU_TARGETS=gfx1010` - `AMDGPU_TARGETS=gfx1030;gfx1010` - `AMDGPU_TARGETS=gfx803;gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102` In all three cases, `$ROCM_PATH/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat` is produced and all other `*.dat` files remain unchanged. Signed-off-by: Gavin Zhao <git@gzgz.dev>

GZGavinZhao · 2024-01-11T17:52:07Z

#1862 has an updated version of this patch for ROCm >=5.5.

Fixes #1757. Enables architectures that don't have optimized logic files to also produce libraries when `--separate-architectures` or `--lazy-library-loading` is turned on. Previously, one must disable both of these two flags in order for rocBLAS to run on architectures like `gfx1010`.

nakajee · 2024-02-02T16:43:56Z

This change triggered a fail in rocblas test.
We cannot add this change to our release until we solve the issue.

GZGavinZhao · 2024-02-02T16:45:55Z

I actually have a fix for a test failure that I just found out today, but may I get a failure log to ensure that it's the same failure I'm getting?

nakajee · 2024-02-02T16:49:15Z

command line: ./rocblas-test --gtest_output=xml --gtest_color=yes --gtest_filter=*quick*:*pre_checkin*-*known_bug*

Error:

[----------] 32 tests from _/gemm_ex_get_solutions

/var/jenkins_home/workspace/eckin_rocBLAS-internal_develop_2/z344iq86D/rocblas/clients/gtest/../include/blas_ex/testing_gemm_ex_get_solutions.hpp:152: Failure

Value of: status_match(rocblas_status_success, status_)

Actual: false (got rocblas_status_invalid_value instead of rocblas_status_success)

Expected: true

[ FAILED ] _/gemm_ex_get_solutions.blas3_tensile/pre_checkin_gemm_ex_get_solutions_f16_rf16_rf16_rf16_rf32_r_CN_250_250_250_1_250_250_1_250_250, where GetParam() = { function: "gemm_ex_get_solutions", name: "gemm_ex_get_solutions", category: "pre_checkin", known_bug_platforms: "", beta: 1.0, stride_a: 62500, stride_b: 62500, stride_c: 62500, stride_d: 62500, M: 250, N: 250, K: 250, lda: 250, ldb: 250, ldc: 250, ldd: 250, a_type: f16_r, b_type: f16_r, c_type: f16_r, d_type: f16_r, composite_compute_type: invalid, initialization: rand_int, gpu_arch: "", flush_batch_count: 1, transA: 'C', transB: 'N' }

(6 ms)

nakajee · 2024-02-02T16:51:50Z

We are investigating the issue now, but have not found the cause yet.

nakajee · 2024-02-02T16:52:59Z

As long as I tried, this fail does not happen if I revert the fallback change.

GZGavinZhao · 2024-02-02T16:58:14Z

I'm building rocBLAS locally to test. I've been working on ISA compatibility improvements in rocBLAS so my local copy has some modifications. With my current modifications my gfx1032 GPU is passing the test you mentioned, so I'm stashing my changes and building the develop branch right now to check if I can reproduce this failure.

While it's building, could you change the if statement at https://github.com/ROCm/rocBLAS/blob/5211f0dca313c56c2163b8602581242c8cb608f1/library/src/tensile_host.cpp#L991C1-L992C43 from

        if(library)
            *library = host.get_library();

to

        if(adapter)
            *library = host.get_library();

and see if you get a segfault (sigsev)?

nakajee · 2024-02-02T17:02:23Z

Actually. I could not reproduce the fail with my local rocblas build, but it fails on our CI environment.
I do not quite understand your change.
Could we ignore library==NULL case?

GZGavinZhao · 2024-02-02T17:07:40Z

What GPU arch does the CI environment has?

Could we ignore library==NULL case?

No, sometimes library is NULL when execution reaches this if statement. I don't know why library would become NULL, but this has caused several segfaults on me so in my local changes I had to check if library is NULL and assign host.get_library() to library if it is.

nakajee · 2024-02-02T17:16:06Z

The fail above is gfx1101. I do not have gfx1032 environment.

GZGavinZhao · 2024-02-02T17:16:14Z

To clarify, above that line there's a comment:

// If an adapter is found, it is assumed that the library is initialized

If the "library" refers to the library variable, then this doesn't always holds. I have logged the values of adapter and library and sometimes adapter is non-NULL while library is NULL, which will then cause segfaults in runContractionProblem.

GZGavinZhao · 2024-02-02T17:22:11Z

Also does it only fail on Level-3? Or are there also failures with Level-2 and Level-1 operations as well?

nakajee · 2024-02-02T17:25:33Z

I am not familiar with rocblas side, but if get_library_and_adapter() is called from rocblas_initialize(), library seems to be NULL since library is not specified here.

GZGavinZhao · 2024-02-02T17:26:40Z

The fail above is gfx1101. I do not have gfx1032 environment.

I'm confused as to why my change would affect gfx1101, as gfx1101 should have optimized logic files so fallback libraries shouldn't even be compiled? Does the log contain lines like Using fallback for arch: <arch>?

nakajee · 2024-02-02T17:29:13Z

If we do not specify -a option when building rocblas, rocblas picks Tensile library for all architectures including gfx1010 and 1012 added by the fallback change.
For some reason, it affects solution selection (which should not happen).

nakajee · 2024-02-02T17:30:45Z

That is my guess. We still do not understand why it fails.

GZGavinZhao · 2024-02-02T17:39:02Z

rocBLAS still compiling. Will report back when I get to run the tests and reproduce the failure.

GZGavinZhao · 2024-02-02T17:50:44Z

A SIGSEV was triggered, let me debug what went wrong.

Edit: the exact failure also reproduced.

GZGavinZhao · 2024-02-03T03:12:08Z

rocblas_gemm_batched_ex_get_solutions is doing something weird. For a Contraction_l_Ailk_Bjlk_Cijk_Dijk problem, it returned a solution index corresponding to Cijk_Alik_Bljk_HB_MT128x64x16_SN_AMAS3_BL1_BS1_EPS0_GLVWA8_GLVWB8_GRVW8_GSU1_GSUASB_ISA1030_IU1_K1_KLA_LDL1_LRVW8_MMFGLC_NLCA1_NLCB1_PGR1_PLR1_SIA1_SU0_SUM0_SUS0_SVW4_TT8_8_USFGROn1_VAW2_VSn1_VW8_VWB8_WS32_WG16_8_1_WGM1. Other similar problems returned indices in the range of a few 3000~5000, while for this particular problem solution with indices 1 and 2 are returned. Will do more investigation during the weekend.

GZGavinZhao · 2024-02-05T12:52:15Z

Putting some investigation notes here. I will spend more time to dig through this later in this week, but if anyone wants to investigate feel free to build on top of here.

Through my tracing I found that solution selection doesn't seem to be affected. If you print out every single solution found in getAllSolutions, you will see that they all correspond to the correct solution. The problem is that despite the library object being the same (verified by printing the address of library), with the same index library->getSolutionByIndex(index) in runContractionProblem and getAllSolutions return different solutions. This is what baffled me. Basically if you run library->getSolutionByIndex(2) in runContractionProblem and getAllSolutions, they will return different solutions despite calling on the same library object. Will investigate further later in this week to see why this happens.

userbox020 · 2024-02-08T06:13:14Z

hello guys, fist of all thanks for all the hard work you doing to make rx5700 work, im just a hobbist and not even close to be near your league of expertice.
However i would like to ask if meanwhile the problem is solve, can i do something similar to

export ROCM_PATH=/opt/rocm
export HCC_AMDGPU_TARGET=gfx803
export HSA_OVERRIDE_GFX_VERSION=8.3.0

to make work my rx5700 with llamacpp? want to ask frist before do the test and break my ubuntu lol

smirgol · 2024-02-10T18:21:50Z

hello guys, fist of all thanks for all the hard work you doing to make rx5700 work, im just a hobbist and not even close to be near your league of expertice. However i would like to ask if meanwhile the problem is solve, can i do something similar to
export ROCM_PATH=/opt/rocm
export HCC_AMDGPU_TARGET=gfx803
export HSA_OVERRIDE_GFX_VERSION=8.3.0
to make work my rx5700 with llamacpp? want to ask frist before do the test and break my ubuntu lol

I'm not sure any more if that is all that it takes, because I fiddled a LOT to make my now replaced RX 5700 XT work, but I've used these settings:

export PATH="/opt/rocm/bin:$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/rocm/lib/"
export HSA_OVERRIDE_GFX_VERSION=10.1.0
export HCC_AMDGPU_TARGET=gfx1010

In some places I've also used

export PATH="/opt/rocm/bin:$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/rocm/lib/"
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export HCC_AMDGPU_TARGET=gfx1030
ROCM_VERSION=5.6

but in any case I did not use the overrides that you have used, with these low versions / numbers.

That shouldn't break anything, as it is solely related to things that make use of ROCm. Worst case it won't work.
Sorry that I can't give you more hints, I have forgotten most of the things I've tried to make it work and a lot of it was blindly poking at things anyway. But these tensile libraries this issue is about definitely did help at some point.

Edit:
For LLMs I ended up using mainly https://github.com/YellowRoseCx/koboldcpp-rocm for loading the LLMs and then e.g. SillyTavern for the frontend. I had quite some issues with oobabooga back then, but these might have been resolved meanwhile.

cgmb · 2024-02-13T05:43:45Z

The way that separate architectures and lazy loading were implemented was really not ideal. The complexity of building all the necessary data structures during initialization should really be pushed to build time, and there should be no meaningful logic executing during initialization. The initialization could be nothing more than read or mmap and then there would be no need to split the dat files at all, because it would be so fast to load that you could read everything in a fraction of the time the current implementation takes. Or, you could read the parts you needed on-demand. There has been so much complexity introduced into Tensile just to avoid fixing the underlying data representation on disk.

The use of an unindexed key-value pair format like msgpack is the underlying cause of these bugs, because the slow conversion of that data into the Tensile in-memory format drives the introduction of complicated logic to try to be clever about the loading. If a more appropriate data format was used, there would be no need to be clever.

This is not the most helpful comment of mine, because I presume folks here want this bug fixed in less time than it would take to rearchitect the Tensile on-disk data format. The separate-architectures and lazy-loading features just frustrate me. I was there when those features were designed and implemented (by a very close friend of mine who is no longer at AMD), and I told the author this back then too.

Redesign the on-disk data format and you will:

Reduce the number of bugs in Tensile
Improve Tensile initialization performance
Reduce the size and complexity of the Tensile codebase

cgmb · 2024-02-13T05:51:35Z

to make work my rx5700 with llamacpp? want to ask frist before do the test and break my ubuntu lol

Use librocblas-dev and libhipblas-dev from Ubuntu 23.10 or later. Here's an example of how to build and run llama-cpp for any discrete Vega, RDNA 1, RDNA 2, CDNA 1 or CDNA 2 GPU in a docker container: https://gist.github.com/cgmb/be113c04cd740425f637aa33c3e4ea33

It might also work on Polaris, but it might not (since the software for that architecture has a lot of bugs).

userbox020 · 2024-02-16T20:55:13Z

@smirgol one of the contributors of rocblas says that we can compile llamacpp with hipblas and mix old and new gpus

ROCm/rocBLAS#1251 (comment)

userbox020 · 2024-02-16T20:56:29Z

lol i just notice its cgmb, sup bro i just dm you in the other repo chat lol

nakajee · 2024-03-04T21:50:04Z

This change triggered a fail in rocblas test. We cannot add this change to our release until we solve the issue.

Fix for gemm_ex_get_solutions issue has been merged into Tensile and rocBLAS develop branch.
We might be able to try the previous fix for gfx1010 on top of the latest Tensile develop.
I do not have gfx1010 environment to confirm the fix.

Fixes ROCm#1757. Enables architectures that don't have optimized logic files to also produce libraries when `--separate-architectures` or `--lazy-library-loading` is turned on. Previously, one must disable both of these two flags in order for rocBLAS to run on architectures like `gfx1010`. Test plan: ``` cmake -GNinja -B build -S . \ -DCMAKE_C_COMPILER=hipcc \ -DCMAKE_CXX_COMPILER=hipcc \ -DBUILD_CLIENTS_TESTS=OFF \ -DBUILD_CLIENTS_BENCHMARKS=OFF \ -DBUILD_CLIENTS_SAMPLES=OFF \ -DBUILD_TESTING=OFF \ -DBUILD_WITH_TENSILE=ON \ -DTensile_PRINT_DEBUG=ON \ -DTensile_LIBRARY_FORMAT=msgpack \ -DTensile_CPU_THREADS=14 \ -DTensile_LAZY_LIBRARY_LOADING=ON \ -DAMDGPU_TARGETS="..." ``` With `AMDGPU_TARGETS` being one of the following - `AMDGPU_TARGETS=gfx1010` - `AMDGPU_TARGETS=gfx1030;gfx1010` - `AMDGPU_TARGETS=gfx803;gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102` In all three cases, `$ROCM_PATH/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat` is produced and all other `*.dat` files remain unchanged. Signed-off-by: Gavin Zhao <git@gzgz.dev>

GZGavinZhao · 2024-03-05T01:07:27Z

@nakajee I think at the current stage we don't have to test on gfx1010 yet. The first step is to confirm that when compiling any already supported arch with gfx1010 (such as AMDGPU_TARGETS="gfx1010;gfx1030"), all tests pass, as per the directions specified in #1897. Currently I cannot build rocBLAS at head (5937a87d) with ROCm 6.0 because I get the following error message:

# Tensile Create Library
Tensile::WARNING: Did not detect SupportedISA: [(8, 0, 3), (9, 0, 0), (9, 0, 6), (9, 0, 8), (9, 0, 10), (9, 4, 0), (9, 4, 1), (9, 4, 2), (10, 1, 0), (10, 1, 1), (10, 1, 2), (10, 3, 0), (10, 3, 1), (11, 0, 0), (11, 0, 1), (11, 0, 2)]; cannot benchmark assembly kernels.
# Found  hipcc version 6.0.0-0
Tensile::FATAL: Cached asm caps differ from derived asm caps for (9, 0, 10)
CMake Error at build/virtualenv/cmake/TensileConfig.cmake:277 (message):
  Error creating Tensile library: 255
Call Stack (most recent call first):
  library/src/CMakeLists.txt:74 (TensileCreateLibraryFiles)

GZGavinZhao · 2024-03-05T01:08:05Z

If you can help test on your environment that'd be great.

nakajee · 2024-03-05T01:37:40Z

@nakajee I think at the current stage we don't have to test on gfx1010 yet. The first step is to confirm that when compiling any already supported arch with gfx1010 (such as AMDGPU_TARGETS="gfx1010;gfx1030"), all tests pass, as per the directions specified in #1897. Currently I cannot build rocBLAS at head (5937a87d) with ROCm 6.0 because I get the following error message:
# Tensile Create Library
Tensile::WARNING: Did not detect SupportedISA: [(8, 0, 3), (9, 0, 0), (9, 0, 6), (9, 0, 8), (9, 0, 10), (9, 4, 0), (9, 4, 1), (9, 4, 2), (10, 1, 0), (10, 1, 1), (10, 1, 2), (10, 3, 0), (10, 3, 1), (11, 0, 0), (11, 0, 1), (11, 0, 2)]; cannot benchmark assembly kernels.
# Found  hipcc version 6.0.0-0
Tensile::FATAL: Cached asm caps differ from derived asm caps for (9, 0, 10)
CMake Error at build/virtualenv/cmake/TensileConfig.cmake:277 (message):
  Error creating Tensile library: 255
Call Stack (most recent call first):
  library/src/CMakeLists.txt:74 (TensileCreateLibraryFiles)

I will implement some workaround for this fail.
To make it work now,

checkout Tensile with commit id is the same as tensile_tag.txt in rocblas
modify Tensile/Common.py
globalParameters["IgnoreAsmCapCache"] = False
-> change to True
build rocblas with -t [path to Tensile]

nakajee · 2024-03-05T02:51:23Z

Fix for asm cap error.
#1898

GZGavinZhao · 2024-03-05T04:03:17Z

The *gemm_ex_get_solutions* tests that previously failed are now passing. If AMD's CI also passes I think #1897 would be good to go.

Fixes #1757. Enables architectures that don't have optimized logic files to also produce libraries when `--separate-architectures` or `--lazy-library-loading` is turned on. Previously, one must disable both of these two flags in order for rocBLAS to run on architectures like `gfx1010`. Test plan: ``` cmake -GNinja -B build -S . \ -DCMAKE_C_COMPILER=hipcc \ -DCMAKE_CXX_COMPILER=hipcc \ -DBUILD_CLIENTS_TESTS=OFF \ -DBUILD_CLIENTS_BENCHMARKS=OFF \ -DBUILD_CLIENTS_SAMPLES=OFF \ -DBUILD_TESTING=OFF \ -DBUILD_WITH_TENSILE=ON \ -DTensile_PRINT_DEBUG=ON \ -DTensile_LIBRARY_FORMAT=msgpack \ -DTensile_CPU_THREADS=14 \ -DTensile_LAZY_LIBRARY_LOADING=ON \ -DAMDGPU_TARGETS="..." ``` With `AMDGPU_TARGETS` being one of the following - `AMDGPU_TARGETS=gfx1010` - `AMDGPU_TARGETS=gfx1030;gfx1010` - `AMDGPU_TARGETS=gfx803;gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102` In all three cases, `$ROCM_PATH/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat` is produced and all other `*.dat` files remain unchanged. Signed-off-by: Gavin Zhao <git@gzgz.dev>

Patch taken from ROCm/Tensile#1757

Fixes ROCm#1757. Enables architectures that don't have optimized logic files to also produce libraries when `--separate-architectures` or `--lazy-library-loading` is turned on. Previously, one must disable both of these two flags in order for rocBLAS to run on architectures like `gfx1010`. Test plan: ``` cmake -GNinja -B build -S . \ -DCMAKE_C_COMPILER=hipcc \ -DCMAKE_CXX_COMPILER=hipcc \ -DBUILD_CLIENTS_TESTS=OFF \ -DBUILD_CLIENTS_BENCHMARKS=OFF \ -DBUILD_CLIENTS_SAMPLES=OFF \ -DBUILD_TESTING=OFF \ -DBUILD_WITH_TENSILE=ON \ -DTensile_PRINT_DEBUG=ON \ -DTensile_LIBRARY_FORMAT=msgpack \ -DTensile_CPU_THREADS=14 \ -DTensile_LAZY_LIBRARY_LOADING=ON \ -DAMDGPU_TARGETS="..." ``` With `AMDGPU_TARGETS` being one of the following - `AMDGPU_TARGETS=gfx1010` - `AMDGPU_TARGETS=gfx1030;gfx1010` - `AMDGPU_TARGETS=gfx803;gfx900;gfx906:xnack-;gfx908:xnack-;gfx90a:xnack+;gfx90a:xnack-;gfx1010;gfx1012;gfx1030;gfx1100;gfx1101;gfx1102` In all three cases, `$ROCM_PATH/lib/rocblas/library/TensileLibrary_lazy_gfx1010.dat` is produced and all other `*.dat` files remain unchanged. Signed-off-by: Gavin Zhao <git@gzgz.dev>

This was referenced Aug 17, 2023

[Bug]: rocBLAS error: Cannot read TensileLibrary.dat: No such file or directory ROCm/rocBLAS#1339

Open

ROCm Port ggerganov/llama.cpp#1087

Merged

GZGavinZhao mentioned this issue Jan 11, 2024

Use fallback libraries for archs without optimized logic #1862

Merged

hiepxanh mentioned this issue Jan 17, 2024

rocBLAS AMD: Cannot read TensileLibrary.dat because no support GPU (wait PR) Mozilla-Ocho/llamafile#208

Closed

AlexBrownAMD closed this as completed in #1862 Jan 24, 2024

nakajee reopened this Feb 2, 2024

scabros mentioned this issue Feb 14, 2024

Support Radeon RX 5700 XT (gfx1010) ollama/ollama#2503

Open

cgmb mentioned this issue Feb 16, 2024

gfx1031: initial import ROCm/rocBLAS#1251

Closed

GZGavinZhao mentioned this issue Mar 5, 2024

Use fallback libraries for archs without optimized logic (v2) #1897

Merged

nakajee mentioned this issue Mar 5, 2024

Ignore asm cap check for kernel arg preload for rocm6.0 #1898

Merged

nakajee closed this as completed in #1897 Mar 6, 2024

T0mstone added a commit to T0mstone/void-packages that referenced this issue Mar 6, 2024

rocBLAS: Tensile: Fix fallback architecture

dbfd3ec

Patch taken from ROCm/Tensile#1757

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tensile won't produce backend libraries for archs without optimized logic files when using --separate-architectures #1757

Tensile won't produce backend libraries for archs without optimized logic files when using --separate-architectures #1757

ulyssesrr commented Aug 17, 2023 •

edited

smirgol commented Oct 28, 2023

GZGavinZhao commented Jan 11, 2024

nakajee commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024 •

edited

nakajee commented Feb 2, 2024 •

edited

nakajee commented Feb 2, 2024

nakajee commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024

nakajee commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024

nakajee commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024

nakajee commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024

nakajee commented Feb 2, 2024

nakajee commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024 •

edited

GZGavinZhao commented Feb 3, 2024

GZGavinZhao commented Feb 5, 2024

userbox020 commented Feb 8, 2024 •

edited

smirgol commented Feb 10, 2024 •

edited

cgmb commented Feb 13, 2024

cgmb commented Feb 13, 2024 •

edited

userbox020 commented Feb 16, 2024

userbox020 commented Feb 16, 2024

nakajee commented Mar 4, 2024

GZGavinZhao commented Mar 5, 2024

GZGavinZhao commented Mar 5, 2024

nakajee commented Mar 5, 2024

nakajee commented Mar 5, 2024

GZGavinZhao commented Mar 5, 2024

Tensile won't produce backend libraries for archs without optimized logic files when using --separate-architectures #1757

Tensile won't produce backend libraries for archs without optimized logic files when using --separate-architectures #1757

Comments

ulyssesrr commented Aug 17, 2023 • edited

Issue

Description

Expected

Workaround

Patch

smirgol commented Oct 28, 2023

GZGavinZhao commented Jan 11, 2024

nakajee commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024 • edited

nakajee commented Feb 2, 2024 • edited

nakajee commented Feb 2, 2024

nakajee commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024

nakajee commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024

nakajee commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024

nakajee commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024

nakajee commented Feb 2, 2024

nakajee commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024

GZGavinZhao commented Feb 2, 2024 • edited

GZGavinZhao commented Feb 3, 2024

GZGavinZhao commented Feb 5, 2024

userbox020 commented Feb 8, 2024 • edited

smirgol commented Feb 10, 2024 • edited

cgmb commented Feb 13, 2024

cgmb commented Feb 13, 2024 • edited

userbox020 commented Feb 16, 2024

userbox020 commented Feb 16, 2024

nakajee commented Mar 4, 2024

GZGavinZhao commented Mar 5, 2024

GZGavinZhao commented Mar 5, 2024

nakajee commented Mar 5, 2024

nakajee commented Mar 5, 2024

GZGavinZhao commented Mar 5, 2024

ulyssesrr commented Aug 17, 2023 •

edited

GZGavinZhao commented Feb 2, 2024 •

edited

nakajee commented Feb 2, 2024 •

edited

GZGavinZhao commented Feb 2, 2024 •

edited

userbox020 commented Feb 8, 2024 •

edited

smirgol commented Feb 10, 2024 •

edited

cgmb commented Feb 13, 2024 •

edited