[Bug]: rocBLAS error: Cannot read TensileLibrary.dat: No such file or directory #1339

Open
slipperyslipped opened this issue Jul 4, 2023 · 22 comments

@slipperyslipped

Describe the bug

I'm getting some form of this error: either "rocBLAS error: Cannot read /opt/rocm-5.4.0/lib/rocblas/library/TensileLibrary.dat: Illegal seek" or "Cannot read TensileLibrary.dat: No such file or directory".

To Reproduce

rocblas-dev (= 2.46.0.50400-72~22.04)

Steps to reproduce the behavior:

  1. Followed the guide in the AMD docs to install ROCm along with almost all use cases, because I kept getting errors about missing packages.
  2. Built wheel for onnxruntime using docker
  3. Trying to run the python application (roop) and getting this error

Expected behavior

No error?

Log-files

Aborted (core dumped)
(roop) hobi@hobi:~/roop$ ~python run.py --execution-provider rocm --execution-threads
Command '~python' not found, did you mean:
  command 'bpython' from deb bpython (0.22.1-2)
  command 'xpython' from deb xpython (0.12.5-1build1)
Try: sudo apt install <deb name>
(roop) hobi@hobi:~/roop$ python run.py --execution-provider rocm --execution-threads
usage: run.py [-h] [-s SOURCE_PATHS] [-t TARGET_PATHS] [-o OUTPUT_PATH] [--frame-processor {face_swapper,face_enhancer} [{face_swapper,face_enhancer} ...]] [--keep-fps] [--keep-audio] [--keep-frames] [--keep-filenames] [--many-faces]
              [--video-encoder {libx264,libx265,libvpx-vp9}] [--video-quality [0-51]] [--max-memory MAX_MEMORY] [--execution-provider {rocm,cpu} [{rocm,cpu} ...]] [--execution-threads EXECUTION_THREADS] [-v]
run.py: error: argument --execution-threads: expected one argument
(roop) hobi@hobi:~/roop$ python run.py --execution-provider rocm --execution-threads 2
[ROOP.CORE] Creating temp resources...
[ROOP.CORE] Extracting frames...
Applied providers: ['ROCMExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'ROCMExecutionProvider': {'tunable_op_tuning_enable': '0', 'do_copy_in_default_stream': '1', 'miopen_conv_exhaustive_search': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'miopen_conv_use_max_workspace': '1', 'gpu_mem_limit': '18446744073709551615', 'tunable_op_enable': '0', 'gpu_external_alloc': '0', 'device_id': '0'}}
find model: /home/hobi/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['ROCMExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'ROCMExecutionProvider': {'tunable_op_tuning_enable': '0', 'do_copy_in_default_stream': '1', 'miopen_conv_exhaustive_search': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'miopen_conv_use_max_workspace': '1', 'gpu_mem_limit': '18446744073709551615', 'tunable_op_enable': '0', 'gpu_external_alloc': '0', 'device_id': '0'}}
find model: /home/hobi/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['ROCMExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'ROCMExecutionProvider': {'tunable_op_tuning_enable': '0', 'do_copy_in_default_stream': '1', 'miopen_conv_exhaustive_search': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'miopen_conv_use_max_workspace': '1', 'gpu_mem_limit': '18446744073709551615', 'tunable_op_enable': '0', 'gpu_external_alloc': '0', 'device_id': '0'}}
find model: /home/hobi/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['ROCMExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'ROCMExecutionProvider': {'tunable_op_tuning_enable': '0', 'do_copy_in_default_stream': '1', 'miopen_conv_exhaustive_search': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'miopen_conv_use_max_workspace': '1', 'gpu_mem_limit': '18446744073709551615', 'tunable_op_enable': '0', 'gpu_external_alloc': '0', 'device_id': '0'}}
find model: /home/hobi/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['ROCMExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'ROCMExecutionProvider': {'tunable_op_tuning_enable': '0', 'do_copy_in_default_stream': '1', 'miopen_conv_exhaustive_search': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'miopen_conv_use_max_workspace': '1', 'gpu_mem_limit': '18446744073709551615', 'tunable_op_enable': '0', 'gpu_external_alloc': '0', 'device_id': '0'}}
find model: /home/hobi/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)

rocBLAS error: Cannot read /opt/rocm-5.4.0/lib/rocblas/library/TensileLibrary.dat: No such file or directory
Aborted (core dumped)

Environment

Hardware description
CPU Ryzen 5 5600
GPU Radeon RX 6700 XT
Software version
rocm-core v5.4.0.50400-72~22.04
rocblas v2.46.0.50400-72~22.04

Make sure that ROCm is correctly installed. To capture detailed environment information, run the following command:

printf '=== environment\n' > environment.txt &&
printf '\n\n=== date\n' >> environment.txt && date >> environment.txt &&
printf '\n\n=== Linux Kernel\n' >> environment.txt && uname -a  >> environment.txt &&
printf '\n\n=== rocm-smi' >> environment.txt && rocm-smi  >> environment.txt &&
printf '\n\n' >> environment.txt && hipconfig  >> environment.txt &&
printf '\n\n=== rocminfo\n' >> environment.txt && rocminfo  >> environment.txt &&
printf '\n\n=== lspci VGA\n' >> environment.txt && lspci | grep -i vga >> environment.txt

Getting this error:

```
No LSB modules are available.
```

### Additional context
I am super new to machine learning and I am having a nightmare of a time making things work with ROCm. Pretty much at the end of my rope here, guys. Any help would be appreciated. Thank you.
@cgmb
Contributor

cgmb commented Jul 5, 2023

Hi @slipperyslipped. Your GPU uses the gfx1031 instruction set, but the binaries distributed by AMD are not built for that architecture as it is not officially supported. However, the gfx1030 instruction set is identical to the gfx1031 instruction set in all but name. For this reason, there are ways to get the existing binaries running on your GPU.

As a workaround, I would recommend setting the environment variable HSA_OVERRIDE_GFX_VERSION=10.3.0 (i.e., export HSA_OVERRIDE_GFX_VERSION=10.3.0). This will cause your GPU to report that it supports the gfx1030 instruction set, which is included in the AMD-provided binaries. I've confirmed that this works correctly with rocBLAS on the RX 6750 XT. I believe this workaround is generally applicable to any discrete RDNA 2 GPU.
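
A minimal sketch of applying that override before launching the application from the log above (the rocminfo line is only an optional sanity check):

```bash
# Make the GPU report gfx1030 so the AMD-provided rocBLAS binaries find a matching TensileLibrary.
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Optional sanity check: the agent should now be reported as gfx1030.
rocminfo | grep -i gfx

# Re-run the application from the same shell so the variable is inherited.
python run.py --execution-provider rocm --execution-threads 2
```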

@jasber9999

Hi, I was blocked by the same problem,
"rocBLAS error: Cannot read /home/bc250/Desktop/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary.dat: No such file or directory", when launching stable-diffusion-webui.

I am using a gfx1013 device. Can I tell PyTorch not to use rocBLAS for this?

@cgmb
Contributor

cgmb commented Aug 14, 2023

I'm not an expert on PyTorch, but the gfx1013 ISA is a superset of the gfx1010 ISA. You can set export HSA_OVERRIDE_GFX_VERSION=10.1.0 and it will probably work. With that said, it is obviously not an officially supported configuration. You may want to build and run the rocBLAS test suite to check that the library functions correctly on your hardware with that workaround.
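
A rough sketch of building and running the rocBLAS test clients from source under that override, assuming a git checkout (the install.sh flags and the staging path reflect the usual rocBLAS build layout and may differ between releases):

```bash
export HSA_OVERRIDE_GFX_VERSION=10.1.0

git clone https://github.com/ROCm/rocBLAS.git
cd rocBLAS
# -d installs build dependencies, -c also builds the clients (including rocblas-test).
./install.sh -dc

# Run a quick subset of the tests to check that the workaround behaves on this hardware.
./build/release/clients/staging/rocblas-test --gtest_filter='*quick*'
```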

@ulyssesrr

@cgmb gfx1010 produces the same issue:

$ drun --rm rocm/dev-ubuntu-22.04:5.6-complete
root@ftl:/# ls -1 /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx*
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx803.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat

As you can see, there is no TensileLibrary_lazy_gfx1010.dat in the official container. While rocBLAS does build with gfx1010 enabled, Tensile is not producing the library; see ROCm/Tensile#1757.

@cgmb
Contributor

cgmb commented Aug 17, 2023

Thanks @ulyssesrr. That's a great analysis of the problem.

It's perhaps worth noting that the OS-provided rocBLAS packages on Debian 13 (Testing/Trixie) and the upcoming Ubuntu 23.10 (Mantic Minotaur) build Tensile with --merge-architectures --no-lazy-library-loading. For users on RDNA 1 hardware, that may be a good option until the problem is fixed in the AMD releases.

The OS-provided package for rocBLAS on Debian/Ubuntu also automatically handles loading code objects for ISAs that are known to be compatible as I'd suggested earlier in this thread. For this reason, the OS-provided package has much wider hardware compatibility than the AMD-provided package on GFX9 and GFX10 hardware.

I have not tested the OS-provided packages on all hardware platforms, but the tests are also packaged in the OS package librocblas0-tests (which entered Debian Unstable today and should migrate to Trixie next week), so you can run the tests on your own system to determine if it will work on your hardware.

Just mentioning it, since that's probably a useful workaround for some people on hardware that is not officially supported. Even folks on other operating systems could potentially spin up a docker container with an Ubuntu or Debian image and apt install librocblas-dev.
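
For example, a hedged sketch of that container route (standard docker flags for exposing the GPU; the image tag and package names follow the suggestion above and may need adjusting for your setup):

```bash
# Start a Debian testing container with the ROCm devices passed through.
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video debian:trixie bash

# Inside the container: install the OS-provided rocBLAS packages, plus the packaged tests.
apt update
apt install -y librocblas-dev librocblas0-tests
```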

@ulyssesrr

ulyssesrr commented Aug 17, 2023

@cgmb I forgot to mention that the rocBLAS build script on 5.6.0 seems to have an issue where --merge-architectures and --no-lazy-library-loading have no effect; I stumbled on that when trying the workaround.

The rmake.py script treats the cmake flags Tensile_LAZY_LIBRARY_LOADING and Tensile_SEPARATE_ARCHITECTURES as opt-in.
https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/a5ef7c59507e6601a18539f42a088fe63bffaa5a/rmake.py#L371-L374

However, I was getting them enabled by default, so I had to explicitly opt out. I'm guessing it is being done here:
https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/a5ef7c59507e6601a18539f42a088fe63bffaa5a/cmake/build-options.cmake#L75-L76

I didn't debug much, just rolled a patch and went on my way (which I ended up not needing, as I patched the Tensile issue instead):
https://github.com/ulyssesrr/docker-rocm-gfx803/blob/main/rocm-xtra-rocblas-builder/patches/deactivated/rocBLAS-fix_cmake_options.patch

As I didn't debug much, I didn't feel confident enough to open an issue.
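
For anyone hitting the same thing, a sketch of what explicitly opting out at configure time might look like, using the two CMake options named above; whether rmake.py forwards these correctly on 5.6.0 is exactly the open question, so treat this as illustrative only:

```bash
# Configure rocBLAS with the Tensile separate-architecture and lazy-loading options forced off.
cmake -B build -S . \
      -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
      -DTensile_SEPARATE_ARCHITECTURES=OFF \
      -DTensile_LAZY_LIBRARY_LOADING=OFF
cmake --build build --parallel
```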

@2eQTu

2eQTu commented Sep 13, 2023

FYI seeing what seems to be the same TensileLibrary.dat: Illegal seek issue on Ubuntu 22.04 LTS and Radeon Software for Linux 23.20.

GPU is a 7800 XT.

Stack from running a basic PyTorch example under GDB is shown below. I did have to override the gfx version to either 11.0.0 or 11.0.1 for it to see the GPU at all, but I forget which.

rocBLAS error: Cannot read /home/redacted/venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary.dat: Illegal seek

Thread 1 "python3" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352507392) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.
(gdb) where
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352507392) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737352507392) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737352507392, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7c42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7c287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fff4e341ccf in rocblas_abort_once() () from /home/redacted/venv/lib/python3.10/site-packages/torch/lib/librocblas.so
#6  0x00007fff4e341c49 in rocblas_abort () from /home/redacted/venv/lib/python3.10/site-packages/torch/lib/librocblas.so
#7  0x00007fff4dc44633 in (anonymous namespace)::TensileHost::initialize(Tensile::hip::SolutionAdapter&, int) ()
   from /home/redacted/venv/lib/python3.10/site-packages/torch/lib/librocblas.so
#8  0x00007fff4dc33929 in (anonymous namespace)::get_library_and_adapter(std::shared_ptr<Tensile::MasterSolutionLibrary<Tensile::ContractionProblem, Tensile::ContractionSolution> >*, std::shared_ptr<hipDeviceProp_t>*, int) () from /home/redacted/venv/lib/python3.10/site-packages/torch/lib/librocblas.so
#9  0x00007fff4dc46b6c in rocblas_status_ runContractionProblem<float, float, float>(RocblasContractionProblem<float, float, float> const&, rocblas_gemm_algo_, int) ()
   from /home/redacted/venv/lib/python3.10/site-packages/torch/lib/librocblas.so
[SNIP]

@YellowRoseCx

FYI seeing what seems to be the same TensileLibrary.dat: Illegal seek issue on Ubuntu 22.04 LTS and Radeon Software for Linux 23.20.

GPU is a 7800 XT.

[...]

Did you only install the Radeon Software or did you also install ROCm?

@2eQTu

2eQTu commented Sep 14, 2023

@YellowRoseCx Yes, ROCm was installed. But there were some errors, and perhaps there is a version mismatch. I have since reinstalled the whole machine and here is the current state:

Software version
rocm-core 5.7.0.50700-45~22.04
rocblas 3.1.0.50700-45~22.04
uname -r 6.2.0-32-generic
rocminfo [...] gfx1101

Same crash, and the stack looks similar.

Here is a basic log of what I tried this time:

python3 -m venv ptroc561-nightly
cd ptroc561-nightly/
source bin/activate
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/rocm5.6/
python3 -c 'import torch; print(torch.cuda.is_available())'
[TRUE]
git clone https://github.com/pytorch/examples.git
cd examples/mnist
python3 main.py
rocBLAS error: Cannot read /path/to/venv/ptroc561-nightly/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary.dat: Illegal seek
Aborted (core dumped)

This time I opted for the AMDGPU install flow option in the ROCm install guide. Running the installer from the amdgpu-install_5.6.50601-1_all.deb as specified did not result in a system where rocminfo saw a GPU. A newer amdgpu-install_5.7.50700-1_all.deb file I found on the server seemed to work. But the error is still the same as before. No env overrides needed this time, oddly enough.

Note the PyTorch repo is nightly/rocm5.6/. When I tried to substitute nightly/rocm5.7/, it just installed some CUDA flavors. I'm attempting to build PyTorch for ROCm from source on bare metal. We'll see how that goes.
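
One quick way to narrow down this kind of mismatch is to compare the architectures the torch-bundled rocBLAS actually ships Tensile libraries for against what the runtime reports. A small illustrative sketch (the path layout follows the error message above):

```bash
# Which architectures does the rocBLAS bundled in the torch wheel carry Tensile libraries for?
ls "$(python3 -c 'import torch, os; print(os.path.dirname(torch.__file__))')/lib/rocblas/library/" \
  | grep -o 'gfx[0-9a-f]*' | sort -u

# What does the runtime report for the installed GPU?
rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u
```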

@cgmb
Contributor

cgmb commented Sep 14, 2023

GPU is a 7800 XT.

Stack from running a basic PyTorch example under GDB is shown below. I did have to override gfx version to either 11.0.0 or 11.0.1 for it to see GPU at all but I forget which.

The RX 7800 XT (Navi 32) is gfx1101. You likely were overriding the gfx version to 11.0.0. However, that is not safe. The gfx1100 ISA has more registers than the gfx1101 ISA and there are other important differences in the ABI too.

With Navi 21/22/23/24, the gfx version override approach more or less worked, despite not being officially supported. Users can execute code built for Navi 21 on any of those chips, and I don't know of any problems encountered from doing so. The compiler handled each of those ISAs identically. Navi 31/32/33 are not like that. There are known differences between those chips that the compiler is accounting for when it generates code for each architecture.

(This isn't the cause of the specific TensileLibrary.dat error you encountered, but it's a warning that you may encounter other problems even once the Tensile issue is resolved, if you're using that override.)

@2eQTu

2eQTu commented Sep 14, 2023

@cgmb Thanks for the ISA incompatibility heads up for Navi 31/32/33. Good to know.

I had actually just started going through the RDNA 3 ISA doc, but have not noticed any chip-specific differences called out so far. Is there other documentation I should review, or will there eventually be updates highlighting the differences? Since this is off-topic for this issue, is there a better place to follow (or open) an issue with respect to documentation?

@shtirlic

shtirlic commented Nov 26, 2023

JFYI, I got Stable Diffusion (AUTOMATIC1111's stable-diffusion-webui) with ROCm 5.7 working on a Phoenix APU (7840U) by setting the override to 11.0.0:

export HSA_OVERRIDE_GFX_VERSION=11.0.0 

Without this override I got

rocBLAS error: Cannot read /home/shtirlic/stable-diffusion-webui/venv/lib/python3.11/site-packages/torch/lib/rocblas/library/TensileLibrary.dat:
 No such file or directory for GPU arch : gfx1103
 List of available TensileLibrary Files :
"/home/shtirlic/stable-diffusion-webui/venv/lib/python3.11/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/home/shtirlic/stable-diffusion-webui/venv/lib/python3.11/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/home/shtirlic/stable-diffusion-webui/venv/lib/python3.11/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat"
"/home/shtirlic/stable-diffusion-webui/venv/lib/python3.11/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat"
"/home/shtirlic/stable-diffusion-webui/venv/lib/python3.11/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
"/home/shtirlic/stable-diffusion-webui/venv/lib/python3.11/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"

TorreZuk self-assigned this Dec 18, 2023
@geekboood

For other architectures such as gfx1103, I think the right way to use them is to generate a new TensileLibrary.dat file to get optimal performance. Do we have a way to trigger this process?
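
The Tensile libraries are generated when rocBLAS itself is built, so one way to get a library for an extra architecture is to build rocBLAS from source for that target. A hedged sketch (the install.sh flag names are the usual ones, but check ./install.sh --help for your release, and note that Tensile still needs logic files for the target to produce a tuned library):

```bash
git clone https://github.com/ROCm/rocBLAS.git
cd rocBLAS
# -d installs dependencies, -a selects the target architecture, -i installs the built package.
./install.sh -d -a gfx1103 -i
```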

@hiepxanh

@TorreZuk can you take a look or merge it please 😢 ROCm/Tensile#1862? My code won't run without it on an RX 6600 XT.

@TorreZuk
Contributor

@TorreZuk can you take a look or merge it please 😢 ROCm/Tensile#1862? My code won't run without it on an RX 6600 XT.

@hiepxanh Sure, I will push to see if it can get reviewed sooner rather than later.

@NaturalHate

NaturalHate commented Feb 18, 2024

Tried to get my 6650 XT to work with llama.cpp by installing rocm-hip-sdk, and got the same error after what I think was a failure to build properly on first launch:

./mistral-7b-instruct-v0.2.Q5_K_M.llamafile -ngl 999
import_cuda_impl: initializing gpu module...
get_rocm_bin_path: note: amdclang++ not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/amdclang++ does not exist
get_rocm_bin_path: note: hipInfo not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/hipInfo does not exist
get_rocm_bin_path: note: /opt/rocm/bin/hipInfo does not exist
llamafile_log_command: /usr/bin/rocminfo
llamafile_log_command: hipcc -O3 -fPIC -shared -DNDEBUG --offload-arch=gfx1032 -march=native -mtune=native -use_fast_math -DGGML_BUILD=1 -DGGML_SHARED=1 -Wno-return-type -Wno-unused-result -DGGML_USE_HIPBLAS -DGGML_CUDA_MMV_Y=1 -DGGML_MULTIPLATFORM -DGGML_CUDA_DMMV_X=32 -DIGNORE4 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DIGNORE -o /home/*****/.llamafile/ggml-rocm.so.ikigfn /home/*****/.llamafile/ggml-cuda.cu -lhipblas -lrocblas
/home/*****/.llamafile/ggml-cuda.cu:408:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
/home/*****/.llamafile/ggml-cuda.cu:777:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
/home/*****/.llamafile/ggml-cuda.cu:5132:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
    mul_mat_q4_K(
    ^
/home/*****/.llamafile/ggml-cuda.cu:5132:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:5199:1: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
mul_mat_q5_K(
^
/home/*****/.llamafile/ggml-cuda.cu:5199:1: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:5268:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
    mul_mat_q6_K(
    ^
/home/*****/.llamafile/ggml-cuda.cu:5268:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
static __global__ void soft_max_f32(const float * x, const float * y, float * dst, const int ncols_par, const int nrows_y, const float scale) {
                       ^
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
14 warnings generated when compiling for gfx1032.
/home/*****/.llamafile/ggml-cuda.cu:408:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
/home/*****/.llamafile/ggml-cuda.cu:777:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
2 warnings generated when compiling for host.
link_cuda_dso: note: dynamically linking /home/*****/.llamafile/ggml-rocm.so
ggml_cuda_link: welcome to ROCm SDK with hipBLAS
link_cuda_dso: GPU support linked

rocBLAS error: Cannot read /opt/rocm-5.6.1/lib/rocblas/library/TensileLibrary.dat: No such file or directory
Aborted (core dumped)

Launching on the GPU again just gives me the last error now.

@cgmb
Contributor

cgmb commented Feb 19, 2024

@NaturalHate, build for gfx1030 and run with export HSA_OVERRIDE_GFX_VERSION=10.3.0 set in your environment.
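
A sketch of what that looks like with the llamafile from the log above; the cached-module path is taken from that log, and deleting it so the module is rebuilt for gfx1030 is an assumption about how llamafile caches its ROCm build:

```bash
# Report the GPU as gfx1030 so both the JIT-compiled GGML module and rocBLAS match.
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Assumption: remove the previously built gfx1032 module so it is regenerated for gfx1030.
rm -f ~/.llamafile/ggml-rocm.so*

./mistral-7b-instruct-v0.2.Q5_K_M.llamafile -ngl 999
```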

@NaturalHate

@NaturalHate, build for gfx1030 and run with export HSA_OVERRIDE_GFX_VERSION=10.3.0 set in your environment.

If I have to build it myself, then I guess I'll pass.

@hiepxanh

No, I can send it to you. If you use an RX 6600, a lot of people have already built it. Just copy-paste and it runs.

@NaturalHate

I don't. I use a 6650 XT.

@wayneyaoo

@NaturalHate, build for gfx1030 and run with export HSA_OVERRIDE_GFX_VERSION=10.3.0 set in your environment.

@hiepxanh Hey, taking a moment to thank you :) I use an RX 6600 XT and the environment variable saved me!

@NaturalHate I'm no expert on the hardware stuff, but from your error message the architecture is gfx1032. Even if you use a 6650 XT and I'm using a 6600 XT, they might share the same "series" from a software perspective. Maybe that works... Doesn't hurt to try, right?

@hiepxanh

hiepxanh commented Feb 28, 2024

@NaturalHate LostRuins/koboldcpp#441
gfx1032_none_lazy.zip

He gave me this file on koboldcpp; it works. You can try it since it's the same gfx1032 platform.
AMD should embed it, since it's just 1.8 MB :(

@wayneyaoo you're welcome. I dug around a lot and figured I should save others the time; this issue is really frustrating.
