[Bug]: rocBLAS error: Cannot read TensileLibrary.dat: No such file or directory #1339

Open
slipperyslipped opened this issue Jul 4, 2023 · 22 comments

@slipperyslipped

Describe the bug

I'm getting some form of this error: either "rocBLAS error: Cannot read /opt/rocm-5.4.0/lib/rocblas/library/TensileLibrary.dat: Illegal seek" or "Cannot read TensileLibrary.dat: No such file or directory".

To Reproduce

rocblas-dev (= 2.46.0.50400-72~22.04)

Steps to reproduce the behavior:

  1. Followed the guide in the AMD docs to install ROCm along with almost all use cases, because I kept getting errors about missing packages.
  2. Built wheel for onnxruntime using docker
  3. Trying to run the python application (roop) and getting this error

Expected behavior

No error?

Log-files

Aborted (core dumped)
(roop) hobi@hobi:~/roop$ ~python run.py --execution-provider rocm --execution-threads
Command '~python' not found, did you mean:
  command 'bpython' from deb bpython (0.22.1-2)
  command 'xpython' from deb xpython (0.12.5-1build1)
Try: sudo apt install <deb name>
(roop) hobi@hobi:~/roop$ python run.py --execution-provider rocm --execution-threads
usage: run.py [-h] [-s SOURCE_PATHS] [-t TARGET_PATHS] [-o OUTPUT_PATH] [--frame-processor {face_swapper,face_enhancer} [{face_swapper,face_enhancer} ...]] [--keep-fps] [--keep-audio] [--keep-frames] [--keep-filenames] [--many-faces]
              [--video-encoder {libx264,libx265,libvpx-vp9}] [--video-quality [0-51]] [--max-memory MAX_MEMORY] [--execution-provider {rocm,cpu} [{rocm,cpu} ...]] [--execution-threads EXECUTION_THREADS] [-v]
run.py: error: argument --execution-threads: expected one argument
(roop) hobi@hobi:~/roop$ python run.py --execution-provider rocm --execution-threads 2
[ROOP.CORE] Creating temp resources...
[ROOP.CORE] Extracting frames...
Applied providers: ['ROCMExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'ROCMExecutionProvider': {'tunable_op_tuning_enable': '0', 'do_copy_in_default_stream': '1', 'miopen_conv_exhaustive_search': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'miopen_conv_use_max_workspace': '1', 'gpu_mem_limit': '18446744073709551615', 'tunable_op_enable': '0', 'gpu_external_alloc': '0', 'device_id': '0'}}
find model: /home/hobi/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['ROCMExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'ROCMExecutionProvider': {'tunable_op_tuning_enable': '0', 'do_copy_in_default_stream': '1', 'miopen_conv_exhaustive_search': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'miopen_conv_use_max_workspace': '1', 'gpu_mem_limit': '18446744073709551615', 'tunable_op_enable': '0', 'gpu_external_alloc': '0', 'device_id': '0'}}
find model: /home/hobi/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['ROCMExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'ROCMExecutionProvider': {'tunable_op_tuning_enable': '0', 'do_copy_in_default_stream': '1', 'miopen_conv_exhaustive_search': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'miopen_conv_use_max_workspace': '1', 'gpu_mem_limit': '18446744073709551615', 'tunable_op_enable': '0', 'gpu_external_alloc': '0', 'device_id': '0'}}
find model: /home/hobi/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['ROCMExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'ROCMExecutionProvider': {'tunable_op_tuning_enable': '0', 'do_copy_in_default_stream': '1', 'miopen_conv_exhaustive_search': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'miopen_conv_use_max_workspace': '1', 'gpu_mem_limit': '18446744073709551615', 'tunable_op_enable': '0', 'gpu_external_alloc': '0', 'device_id': '0'}}
find model: /home/hobi/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['ROCMExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'ROCMExecutionProvider': {'tunable_op_tuning_enable': '0', 'do_copy_in_default_stream': '1', 'miopen_conv_exhaustive_search': '0', 'arena_extend_strategy': 'kNextPowerOfTwo', 'gpu_external_empty_cache': '0', 'gpu_external_free': '0', 'miopen_conv_use_max_workspace': '1', 'gpu_mem_limit': '18446744073709551615', 'tunable_op_enable': '0', 'gpu_external_alloc': '0', 'device_id': '0'}}
find model: /home/hobi/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)

rocBLAS error: Cannot read /opt/rocm-5.4.0/lib/rocblas/library/TensileLibrary.dat: No such file or directory
Aborted (core dumped)

Environment

Hardware description
CPU Ryzen 5 5600
GPU Radeon RX 6700 XT
Software version
rocm-core v5.4.0.50400-72~22.04
rocblas v2.46.0.50400-72~22.04

Make sure that ROCm is correctly installed. To capture detailed environment information, run the following command:

printf '=== environment\n' > environment.txt &&
printf '\n\n=== date\n' >> environment.txt && date >> environment.txt &&
printf '\n\n=== Linux Kernel\n' >> environment.txt && uname -a  >> environment.txt &&
printf '\n\n=== rocm-smi' >> environment.txt && rocm-smi  >> environment.txt &&
printf '\n\n' >> environment.txt && hipconfig  >> environment.txt &&
printf '\n\n=== rocminfo\n' >> environment.txt && rocminfo  >> environment.txt &&
printf '\n\n=== lspci VGA\n' >> environment.txt && lspci | grep -i vga >> environment.txt

Getting this error:

```
No LSB modules are available.
```

### Additional context
I am super new to machine learning and I am having a nightmare of a time making things work with ROCm. Pretty much at the end of my rope here, guys. Any help would be appreciated. Thank you.
@cgmb
Contributor

cgmb commented Jul 5, 2023

Hi @slipperyslipped. Your GPU uses the gfx1031 instruction set, but the binaries distributed by AMD are not built for that architecture as it is not officially supported. However, the gfx1030 instruction set is identical to the gfx1031 instruction set in all but name. For this reason, there are ways to get the existing binaries running on your GPU.

As a workaround, I would recommend setting the environment variable HSA_OVERRIDE_GFX_VERSION=10.3.0 (i.e., export HSA_OVERRIDE_GFX_VERSION=10.3.0). This will cause your GPU to report that it supports the gfx1030 instruction set, which is included in the AMD-provided binaries. I've confirmed that this works correctly with rocBLAS on the RX 6750 XT. I believe this workaround is generally applicable to any discrete RDNA 2 GPU.
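
A minimal sketch of applying that override before launching the application from the log above (the rocminfo line is only an optional sanity check):

```bash
# Make the GPU report gfx1030 so the AMD-provided rocBLAS binaries find a matching TensileLibrary.
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Optional sanity check: the agent should now be reported as gfx1030.
rocminfo | grep -i gfx

# Re-run the application from the same shell so the variable is inherited.
python run.py --execution-provider rocm --execution-threads 2
```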

@jasber9999

Hi, I was blocked by the same problem,
"rocBLAS error: Cannot read /home/bc250/Desktop/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary.dat: No such file or directory", when launching stable-diffusion-webui.

I am using a gfx1013 device. Can I tell PyTorch not to use rocBLAS for this?

@cgmb
Contributor

cgmb commented Aug 14, 2023

I'm not an expert on PyTorch, but the gfx1013 ISA is a superset of the gfx1010 ISA. You can set export HSA_OVERRIDE_GFX_VERSION=10.1.0 and it will probably work. With that said, it is obviously not an officially supported configuration. You may want to build and run the rocBLAS test suite to check that the library functions correctly on your hardware with that workaround.
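
A rough sketch of building and running the rocBLAS test clients from source under that override, assuming a git checkout (the install.sh flags and the staging path reflect the usual rocBLAS build layout and may differ between releases):

```bash
export HSA_OVERRIDE_GFX_VERSION=10.1.0

git clone https://github.com/ROCm/rocBLAS.git
cd rocBLAS
# -d installs build dependencies, -c also builds the clients (including rocblas-test).
./install.sh -dc

# Run a quick subset of the tests to check that the workaround behaves on this hardware.
./build/release/clients/staging/rocblas-test --gtest_filter='*quick*'
```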

@ulyssesrr

@cgmb gfx1010 produces the same issue:

$ drun --rm rocm/dev-ubuntu-22.04:5.6-complete
root@ftl:/# ls -1 /opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx*
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1101.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx1102.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx803.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat
/opt/rocm/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat

As you can see, there is no TensileLibrary_lazy_gfx1010.dat in the official container. While rocBLAS does build with gfx1010 enabled, Tensile is not producing the library; see ROCm/Tensile#1757.

@cgmb
Contributor

cgmb commented Aug 17, 2023

Thanks @ulyssesrr. That's a great analysis of the problem.

It's perhaps worth noting that the OS-provided rocBLAS packages on Debian 13 (Testing/Trixie) and the upcoming Ubuntu 23.10 (Mantic Minotaur) build Tensile with --merge-architectures --no-lazy-library-loading. For users on RDNA 1 hardware, that may be a good option until the problem is fixed in the AMD releases.

The OS-provided package for rocBLAS on Debian/Ubuntu also automatically handles loading code objects for ISAs that are known to be compatible as I'd suggested earlier in this thread. For this reason, the OS-provided package has much wider hardware compatibility than the AMD-provided package on GFX9 and GFX10 hardware.

I have not tested the OS-provided packages on all hardware platforms, but the tests are also packaged in the OS package librocblas0-tests (which entered Debian Unstable today and should migrate to Trixie next week), so you can run the tests on your own system to determine if it will work on your hardware.

Just mentioning it, since that's probably a useful workaround for some people on hardware that is not officially supported. Even folks on other operating systems could potentially spin up a docker container with an Ubuntu or Debian image and apt install librocblas-dev.
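
For example, a hedged sketch of that container route (standard docker flags for exposing the GPU; the image tag and package names follow the suggestion above and may need adjusting for your setup):

```bash
# Start a Debian testing container with the ROCm devices passed through.
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video debian:trixie bash

# Inside the container: install the OS-provided rocBLAS packages, plus the packaged tests.
apt update
apt install -y librocblas-dev librocblas0-tests
```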

@ulyssesrr

ulyssesrr commented Aug 17, 2023

@cgmb I forgot to mention that the rocBLAS build script on 5.6.0 seems to have an issue where --merge-architectures and --no-lazy-library-loading have no effect; I stumbled on that when trying the workaround.

The rmake.py script treats the cmake flags Tensile_LAZY_LIBRARY_LOADING and Tensile_SEPARATE_ARCHITECTURES as opt-in.
https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/a5ef7c59507e6601a18539f42a088fe63bffaa5a/rmake.py#L371-L374

However, I was getting them enabled by default, so I had to explicitly opt out. I'm guessing it is being done here:
https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/a5ef7c59507e6601a18539f42a088fe63bffaa5a/cmake/build-options.cmake#L75-L76

I didn't debug much, just rolled a patch and went on my way (which I ended up not needing, as I patched the Tensile issue instead):
https://github.com/ulyssesrr/docker-rocm-gfx803/blob/main/rocm-xtra-rocblas-builder/patches/deactivated/rocBLAS-fix_cmake_options.patch

As I didn't debug much, I didn't feel confident enough to open an issue.
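
For anyone hitting the same thing, a sketch of what explicitly opting out at configure time might look like, using the two CMake options named above; whether rmake.py forwards these correctly on 5.6.0 is exactly the open question, so treat this as illustrative only:

```bash
# Configure rocBLAS with the Tensile separate-architecture and lazy-loading options forced off.
cmake -B build -S . \
      -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
      -DTensile_SEPARATE_ARCHITECTURES=OFF \
      -DTensile_LAZY_LIBRARY_LOADING=OFF
cmake --build build --parallel
```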

@2eQTu

2eQTu commented Sep 13, 2023

FYI seeing what seems to be the same TensileLibrary.dat: Illegal seek issue on Ubuntu 22.04 LTS and Radeon Software for Linux 23.20.

GPU is a 7800 XT.

Stack from running a basic PyTorch example under GDB is shown below. I did have to override the gfx version to either 11.0.0 or 11.0.1 for it to see the GPU at all, but I forget which.

rocBLAS error: Cannot read /home/redacted/venv/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary.dat: Illegal seek

Thread 1 "python3" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352507392) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.
(gdb) where
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352507392) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737352507392) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737352507392, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7c42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7c287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fff4e341ccf in rocblas_abort_once() () from /home/redacted/venv/lib/python3.10/site-packages/torch/lib/librocblas.so
#6  0x00007fff4e341c49 in rocblas_abort () from /home/redacted/venv/lib/python3.10/site-packages/torch/lib/librocblas.so
#7  0x00007fff4dc44633 in (anonymous namespace)::TensileHost::initialize(Tensile::hip::SolutionAdapter&, int) ()
   from /home/redacted/venv/lib/python3.10/site-packages/torch/lib/librocblas.so
#8  0x00007fff4dc33929 in (anonymous namespace)::get_library_and_adapter(std::shared_ptr<Tensile::MasterSolutionLibrary<Tensile::ContractionProblem, Tensile::ContractionSolution> >*, std::shared_ptr<hipDeviceProp_t>*, int) () from /home/redacted/venv/lib/python3.10/site-packages/torch/lib/librocblas.so
#9  0x00007fff4dc46b6c in rocblas_status_ runContractionProblem<float, float, float>(RocblasContractionProblem<float, float, float> const&, rocblas_gemm_algo_, int) ()
   from /home/redacted/venv/lib/python3.10/site-packages/torch/lib/librocblas.so
[SNIP]

@YellowRoseCx

FYI seeing what seems to be the same TensileLibrary.dat: Illegal seek issue on Ubuntu 22.04 LTS and Radeon Software for Linux 23.20.

GPU is a 7800 XT.

[...]

Did you only install the Radeon Software or did you also install ROCm?

@2eQTu

2eQTu commented Sep 14, 2023

@YellowRoseCx Yes, ROCm was installed. But there were some errors, and perhaps there is a version mismatch. I have since reinstalled the whole machine and here is the current state:

Software version
rocm-core 5.7.0.50700-45~22.04
rocblas 3.1.0.50700-45~22.04
uname -r 6.2.0-32-generic
rocminfo [...] gfx1101

Same crash, and the stack looks similar.

Here is a basic log of what I tried this time:

python3 -m venv ptroc561-nightly
cd ptroc561-nightly/
source bin/activate
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/rocm5.6/
python3 -c 'import torch; print(torch.cuda.is_available())'
[TRUE]
git clone https://github.com/pytorch/examples.git
cd examples/mnist
python3 main.py
rocBLAS error: Cannot read /path/to/venv/ptroc561-nightly/lib/python3.10/site-packages/torch/lib/rocblas/library/TensileLibrary.dat: Illegal seek
Aborted (core dumped)

This time I opted for the AMDGPU install flow option in the ROCm install guide. Running the installer from the amdgpu-install_5.6.50601-1_all.deb as specified did not result in a system where rocminfo saw a GPU. A newer amdgpu-install_5.7.50700-1_all.deb file I found on the server seemed to work. But the error is still the same as before. No env overrides needed this time, oddly enough.

Note the PyTorch repo is nightly/rocm5.6/. When I tried to substitute nightly/rocm5.7/, it just installed some CUDA flavors. I'm attempting to build PyTorch for ROCm from source on bare metal. We'll see how that goes.
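
One quick way to narrow down this kind of mismatch is to compare the architectures the torch-bundled rocBLAS actually ships Tensile libraries for against what the runtime reports. A small illustrative sketch (the path layout follows the error message above):

```bash
# Which architectures does the rocBLAS bundled in the torch wheel carry Tensile libraries for?
ls "$(python3 -c 'import torch, os; print(os.path.dirname(torch.__file__))')/lib/rocblas/library/" \
  | grep -o 'gfx[0-9a-f]*' | sort -u

# What does the runtime report for the installed GPU?
rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u
```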

@cgmb
Contributor

cgmb commented Sep 14, 2023

GPU is a 7800 XT.

Stack from running a basic PyTorch example under GDB is shown below. I did have to override gfx version to either 11.0.0 or 11.0.1 for it to see GPU at all but I forget which.

The RX 7800 XT (Navi 32) is gfx1101. You likely were overriding the gfx version to 11.0.0. However, that is not safe. The gfx1100 ISA has more registers than the gfx1101 ISA and there are other important differences in the ABI too.

With Navi 21/22/23/24, the gfx version override approach more or less worked, despite not being officially supported. Users can execute code built for Navi 21 on any of those chips, and I don't know of any problems encountered from doing so. The compiler handled each of those ISAs identically. Navi 31/32/33 are not like that. There are known differences between those chips that the compiler is accounting for when it generates code for each architecture.

(This isn't the cause of the specific TensileLibrary.dat error you encountered, but it's a warning that you may encounter other problems even once the Tensile issue is resolved, if you're using that override.)

@2eQTu

2eQTu commented Sep 14, 2023

@cgmb Thanks for the ISA incompatibility heads up for Navi 31/32/33. Good to know.

I had actually just started going through the RDNA 3 ISA doc, but have not noticed any chip-specific differences called out so far. Is there other documentation I should review, or will there eventually be updates highlighting the differences? Since this is off-topic for this issue, is there a better place to follow (or open) an issue with respect to documentation?

@shtirlic

shtirlic commented Nov 26, 2023

JFYI, I got Stable Diffusion (AUTOMATIC1111's stable-diffusion-webui) with ROCm 5.7 working on a Phoenix APU (7840U) by setting the override to 11.0.0:

export HSA_OVERRIDE_GFX_VERSION=11.0.0 

Without this override I got

rocBLAS error: Cannot read /home/shtirlic/stable-diffusion-webui/venv/lib/python3.11/site-packages/torch/lib/rocblas/library/TensileLibrary.dat:
 No such file or directory for GPU arch : gfx1103
 List of available TensileLibrary Files :
"/home/shtirlic/stable-diffusion-webui/venv/lib/python3.11/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1030.dat"
"/home/shtirlic/stable-diffusion-webui/venv/lib/python3.11/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx1100.dat"
"/home/shtirlic/stable-diffusion-webui/venv/lib/python3.11/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx900.dat"
"/home/shtirlic/stable-diffusion-webui/venv/lib/python3.11/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx906.dat"
"/home/shtirlic/stable-diffusion-webui/venv/lib/python3.11/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx908.dat"
"/home/shtirlic/stable-diffusion-webui/venv/lib/python3.11/site-packages/torch/lib/rocblas/library/TensileLibrary_lazy_gfx90a.dat"

TorreZuk self-assigned this Dec 18, 2023
@geekboood

For other architectures such as gfx1103, I think the right way to use them is to generate a new TensileLibrary.dat file to get optimal performance. Do we have a way to trigger this process?
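
The Tensile libraries are generated when rocBLAS itself is built, so one way to get a library for an extra architecture is to build rocBLAS from source for that target. A hedged sketch (the install.sh flag names are the usual ones, but check ./install.sh --help for your release, and note that Tensile still needs logic files for the target to produce a tuned library):

```bash
git clone https://github.com/ROCm/rocBLAS.git
cd rocBLAS
# -d installs dependencies, -a selects the target architecture, -i installs the built package.
./install.sh -d -a gfx1103 -i
```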

@hiepxanh

@TorreZuk can you take a look or merge it please 😢 ROCm/Tensile#1862? My code won't run without it on an RX 6600 XT.

@TorreZuk
Contributor

@TorreZuk can you take a look or merge it please 😢 ROCm/Tensile#1862? My code won't run without it on an RX 6600 XT.

@hiepxanh Sure, I will push to see if it can get reviewed sooner rather than later.

@NaturalHate

NaturalHate commented Feb 18, 2024

Tried to get my 6650 XT to work with llama.cpp by installing rocm-hip-sdk, and got the same error after what I think was a failure to build properly on first launch:

./mistral-7b-instruct-v0.2.Q5_K_M.llamafile -ngl 999
import_cuda_impl: initializing gpu module...
get_rocm_bin_path: note: amdclang++ not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/amdclang++ does not exist
get_rocm_bin_path: note: hipInfo not found on $PATH
get_rocm_bin_path: note: $HIP_PATH/bin/hipInfo does not exist
get_rocm_bin_path: note: /opt/rocm/bin/hipInfo does not exist
llamafile_log_command: /usr/bin/rocminfo
llamafile_log_command: hipcc -O3 -fPIC -shared -DNDEBUG --offload-arch=gfx1032 -march=native -mtune=native -use_fast_math -DGGML_BUILD=1 -DGGML_SHARED=1 -Wno-return-type -Wno-unused-result -DGGML_USE_HIPBLAS -DGGML_CUDA_MMV_Y=1 -DGGML_MULTIPLATFORM -DGGML_CUDA_DMMV_X=32 -DIGNORE4 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DIGNORE -o /home/*****/.llamafile/ggml-rocm.so.ikigfn /home/*****/.llamafile/ggml-cuda.cu -lhipblas -lrocblas
/home/*****/.llamafile/ggml-cuda.cu:408:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
/home/*****/.llamafile/ggml-cuda.cu:777:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
/home/*****/.llamafile/ggml-cuda.cu:5132:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
    mul_mat_q4_K(
    ^
/home/*****/.llamafile/ggml-cuda.cu:5132:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:5199:1: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
mul_mat_q5_K(
^
/home/*****/.llamafile/ggml-cuda.cu:5199:1: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:5268:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
    mul_mat_q6_K(
    ^
/home/*****/.llamafile/ggml-cuda.cu:5268:5: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
static __global__ void soft_max_f32(const float * x, const float * y, float * dst, const int ncols_par, const int nrows_y, const float scale) {
                       ^
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
/home/*****/.llamafile/ggml-cuda.cu:6034:24: warning: loop not unrolled: the optimizer was unable to perform the requested transformation; the transformation might be disabled or specified as part of an unsupported transformation ordering [-Wpass-failed=transform-warning]
14 warnings generated when compiling for gfx1032.
/home/*****/.llamafile/ggml-cuda.cu:408:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
/home/*****/.llamafile/ggml-cuda.cu:777:1: warning: function declared 'noreturn' should not return [-Winvalid-noreturn]
}
^
2 warnings generated when compiling for host.
link_cuda_dso: note: dynamically linking /home/*****/.llamafile/ggml-rocm.so
ggml_cuda_link: welcome to ROCm SDK with hipBLAS
link_cuda_dso: GPU support linked

rocBLAS error: Cannot read /opt/rocm-5.6.1/lib/rocblas/library/TensileLibrary.dat: No such file or directory
Aborted (core dumped)

Launching on the GPU again just gives me the last error now.

@cgmb
Contributor

cgmb commented Feb 19, 2024

@NaturalHate, build for gfx1030 and run with export HSA_OVERRIDE_GFX_VERSION=10.3.0 set in your environment.
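
A sketch of what that looks like with the llamafile from the log above; the cached-module path is taken from that log, and deleting it so the module is rebuilt for gfx1030 is an assumption about how llamafile caches its ROCm build:

```bash
# Report the GPU as gfx1030 so both the JIT-compiled GGML module and rocBLAS match.
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Assumption: remove the previously built gfx1032 module so it is regenerated for gfx1030.
rm -f ~/.llamafile/ggml-rocm.so*

./mistral-7b-instruct-v0.2.Q5_K_M.llamafile -ngl 999
```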

@NaturalHate

@NaturalHate, build for gfx1030 and run with export HSA_OVERRIDE_GFX_VERSION=10.3.0 set in your environment.

If I have to build it myself, then I guess I'll pass.

@hiepxanh

No, I can send it to you. If you use an RX 6600, a lot of people have already built it. Just copy-paste and it runs.

@NaturalHate

I don't. I use a 6650 XT.

@wayneyaoo

@NaturalHate, build for gfx1030 and run with export HSA_OVERRIDE_GFX_VERSION=10.3.0 set in your environment.

@hiepxanh Hey, taking a moment to thank you :) I use an RX 6600 XT and the environment variable saved me!

@NaturalHate I'm no expert on the hardware stuff, but from your error message the architecture is gfx1032. Even if you use a 6650 XT and I'm using a 6600 XT, they might share the same "series" from a software perspective. Maybe that works... Doesn't hurt to try, right?

@hiepxanh

hiepxanh commented Feb 28, 2024

@NaturalHate LostRuins/koboldcpp#441
gfx1032_none_lazy.zip

He gave me this file on koboldcpp; it works. You can try it since it's the same gfx1032 platform.
AMD should embed it, since it's just 1.8 MB :(

@wayneyaoo you're welcome. I dug around a lot and figured I should save others the time; this issue is really frustrating.
