6700XT/6800M Gfx1031 libraries for compilation of Kobold.cpp #441

Open
harish0201 opened this issue Sep 17, 2023 · 32 comments

Comments

@harish0201

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [x] I carefully followed the README.md.
  • [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Select Topic Area

Product Feedback

Body

gfx1031.zip

Hi! I've compiled the gfx1031 Tensile libraries for use with the 6800M; they should also work with the 6700XT, since both use the same ISA.

This is based on the comment here: ggerganov#1087 (comment), following which I generated the (old) non-lazy merged library format. This alleviates the gibberish issue seen when using gfx1030. I had also initially copied gfx1030 to gfx1031 (ref here: ggerganov#1087 (comment)), but the responses from llama.cpp wouldn't make any sense.

Current Behavior

As it stands, copying gfx1030 to gfx1031 sometimes outputs gibberish; the attached libraries should allow sensible, non-gibberish inference.

Environment and Context

I have a laptop with a 6800M (gfx1031) running Windows 10.

@LostRuins
Owner

Just pinging @YellowRoseCx as they might find it useful to know.

@kowierczyk

Doing the same also works for the RX 6600 with gfx1032, but it speaks gibberish too (probably easy to fix).

@Drake-AI

Following this https://www.reddit.com/r/LocalLLaMA/comments/16d1hi0/comment/jzmvbfc/?context=3 from step 8 onward, and then either copying the libraries or using make_pyinst_rocm_hybrid_henk_yellow.bat, works for me with a 6750XT. Without MMQ it gives 6 ms/token processed and 40 ms/token generated, with all layers on the GPU, for a 13B model using the YellowRose fork.
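For a rough sense of scale, a per-token latency in milliseconds converts to throughput as 1000 / latency. A minimal Python sketch using the figures reported above (13B model, all layers on the GPU, no MMQ); the helper name is hypothetical:

```python
def ms_per_token_to_tokens_per_s(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to throughput in tokens/second."""
    if ms_per_token <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / ms_per_token


# Figures reported above for a 13B model fully offloaded to a 6750XT.
prompt_tps = ms_per_token_to_tokens_per_s(6)   # prompt processing
gen_tps = ms_per_token_to_tokens_per_s(40)     # token generation
print(f"prompt: {prompt_tps:.1f} tok/s, generation: {gen_tps:.1f} tok/s")
```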

@kowierczyk

When running cmake --install build\release --prefix "C:\Program Files\AMD\ROCm\5.5" I get: CMake Error: Error processing file: C:/Windows/System32/rocBLAS/build/release/cmake_install.cmake

@YellowRoseCx

Are you cd-ing into the rocBLAS folder before running that?

@Drake-AI

You need to run it in an x64 Native Tools Command Prompt, as admin, INSIDE the rocBLAS directory.

@kowierczyk

Yes, I'm in C:/Windows/System32/rocBLAS. Is that right?
Or is it supposed to be some rocBLAS folder inside ROCm 5.5 in Program Files?

@Drake-AI

git clone https://github.com/ROCmSoftwarePlatform/rocBLAS

cd rocBLAS

Run it inside that directory, i.e. where you cloned the rocBLAS repo.

@kowierczyk

Yes, that's exactly what I did.
Because the default directory is System32, that's where my rocBLAS folder was cloned.

@Drake-AI

But you first clone the repos in a plain cmd window; every time you open an x64 Native Tools Command Prompt, you need to go to the original directory where you cloned the rocBLAS repo the first time. Don't do that in System32; don't do anything in System32.

@kowierczyk

I don't know if I'm explaining it badly or I'm just dumb, but I cloned the repo in cmd, it was cloned into the System32 directory because that's the default, and every time I open the x64 Native Tools Command Prompt I cd into the rocBLAS folder that I cloned.

@Drake-AI

Don't do it in System32; do it in C:\ or another directory, and never do anything in System32. Repeat all the steps, cloning the repos into C:\ or wherever you want, but never inside OS directories.
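A small guard along these lines can catch the System32 mistake before cloning or building. This is a sketch, not part of any repo discussed here: the helper name is hypothetical, and it assumes Windows is installed in C:\Windows (on a real machine you could pass os.environ["SystemRoot"] instead).

```python
from pathlib import PureWindowsPath


def is_inside_windows_dir(path: str, windows_dir: str = r"C:\Windows") -> bool:
    """Return True if `path` is `windows_dir` itself or anything beneath it.

    PureWindowsPath compares case-insensitively, so "c:/windows/system32"
    and "C:\\Windows\\System32" are treated as the same location.
    """
    target = PureWindowsPath(path)
    root = PureWindowsPath(windows_dir)
    return target == root or root in target.parents


for candidate in (r"C:\Windows\System32\rocBLAS", r"D:\rocBLAS", r"C:\src\rocBLAS"):
    verdict = "avoid" if is_inside_windows_dir(candidate) else "ok"
    print(f"{candidate} -> {verdict}")
```

Using PureWindowsPath rather than Path keeps the check purely lexical, so it behaves the same whether or not the folder exists yet.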

@kowierczyk

Well, now I'm getting:
[4/250] Running utility command for TENSILE_LIBRARY_TARGET
FAILED: library/src/CMakeFiles/TENSILE_LIBRARY_TARGET.util
library\src\CMakeFiles\TENSILE_LIBRARY_TARGET.dir\utility.bat 8b1baf77e970c57e
Error copying file (if different) from "D:\rocBLAS\build\release\Tensile\library\TensileLibrary_Type_CC_Contraction_l_Ailk_Bljk_Cijk_Dijk_fallback.dat" to "D:/rocBLAS/build/release/Tensile/library".
Batch file failed at line 3 with errorcode 1
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "D:\rocBLAS\rmake.py", line 445, in <module>
main()
File "D:\rocBLAS\rmake.py", line 438, in main
if run_cmd(exe, opts):
^^^^^^^^^^^^^^^^^^
File "D:\rocBLAS\rmake.py", line 406, in run_cmd
proc = subprocess.run(program, check=True, stderr=subprocess.STDOUT, shell=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1520.0_x64__qbz5n2kfra8p0\Lib\subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'ninja.exe -j 12 all' returned non-zero exit status 1.

@Drake-AI

Check all the steps from here https://www.reddit.com/r/LocalLLaMA/comments/16d1hi0/guide_build_llamacpp_on_windows_with_amd_gpus_and/ and set the system environment variables.

@YellowRoseCx

I've been trying all day, and I think some GPUs just don't have proper Windows support yet. I got gfx1031 to compile when it's the only GPU arch, but as soon as I compile it together with others like gfx1030, it builds successfully but my outputs become garbled nonsense. I even tried building them separately and then putting the files together in the same folder, and that didn't work either.

@Drake-AI

Mine is a gfx1031 (6750XT) and it's working like a charm. I compiled your repo myself and then compiled rocBLAS and Tensile with the patch, but only for gfx1031. I think you don't need to compile rocBLAS and Tensile for GPUs that are already supported, like gfx1030; only for unsupported GPUs.

@YellowRoseCx

YellowRoseCx commented Sep 19, 2023

So when you built the library for gfx1031, did you drag those files into the rocblas folder inside the koboldcpp-rocm folder? When it asked whether you wanted to replace files, did you? Because replacing those files is when it breaks support for me, with files like TensileLibrary_Type_CC_Contraction_l_AlikC_BjlkC_Cijk_Dijk_fallback.dat.

@Drake-AI

Drake-AI commented Sep 19, 2023

Yes, all the files in C:\Program Files\AMD\ROCm\5.5\bin\rocblas\library, inside your repo. Just make sure you don't have the lazy and non-lazy files together. ggerganov#1087 (comment) explains how to create the non-lazy version; if you use non-lazy, you need to remove TensileLibrary_lazy_gfx1031.dat. If you want, I can pack up the compiled files and send them to you.

Update: I tried to compile for gfx1032 and couldn't; it looks like you can't compile for a GPU you don't have.
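The "don't mix lazy and non-lazy" advice above can be checked mechanically. A minimal Python sketch, assuming the file names used in this thread (a non-lazy merged build yields TensileLibrary.dat, a lazy build yields TensileLibrary_lazy_<arch>.dat); the function name is hypothetical, and it only reports the clash rather than deleting anything:

```python
import tempfile
from pathlib import Path
from typing import Optional


def conflicting_lazy_index(library_dir: str, arch: str) -> Optional[Path]:
    """Return the lazy index for `arch` if a non-lazy TensileLibrary.dat sits beside it.

    Assumed naming (taken from this thread):
      non-lazy merged build -> TensileLibrary.dat
      lazy build            -> TensileLibrary_lazy_<arch>.dat
    """
    lib = Path(library_dir)
    non_lazy = lib / "TensileLibrary.dat"
    lazy = lib / f"TensileLibrary_lazy_{arch}.dat"
    if non_lazy.is_file() and lazy.is_file():
        return lazy  # when using the non-lazy files, this is the one to remove
    return None


# Demo on a throwaway folder that mixes both formats for gfx1031.
with tempfile.TemporaryDirectory() as demo:
    for name in ("TensileLibrary.dat", "Kernels.so-000-gfx1031.hsaco",
                 "TensileLibrary_lazy_gfx1031.dat"):
        (Path(demo) / name).touch()
    clash = conflicting_lazy_index(demo, "gfx1031")
    print(clash.name if clash else "no conflict")
```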

@harish0201
Author

Hi,

Sorry, I've been a bit sick the past few days.

Here are the lazy and non-lazy versions of the libraries (might've gotten the names swapped) @YellowRoseCx
lazy_gfx1031.zip
non_lazy_gfx1031.zip

@Drake-AI It sucks that you aren't able to build for a GPU you don't have! But like you said, for compilation you need to have only one of them in the folder, not both.

Maybe we can crowd-source the Tensile libraries in a repo? That way people can pull them down while compiling, or even use them with other applications, given that ROCm can run on Windows.

@Foxlum

Foxlum commented Sep 19, 2023

I have an RX 6600 and some ability to try compiling those Tensile libs, having previously tried to get the WIP MIOpen Windows port compiled (which was an absolute struggle to get even partway through).

@YellowRoseCx

YellowRoseCx commented Sep 19, 2023

How did you make that non-lazy version? When I followed that guy's instructions, lazy library loading was enabled and I always got "lazy" files.

These were my results:

gfx1032;gfx1033;gfx1034;gfx1035 = Compile Errors

gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1011;gfx1012;gfx1030;gfx1031;gfx1100;gfx1101;gfx1102 = Built but not working altogether

For some reason I've only been able to generate lazy files, even after modifying rmake.py to disable lazy library loading.

And when I build multiple kernels at the same time, like:
>python rmake.py -a gfx1010;gfx1030 --no-lazy-library-loading --no-merge-architectures -t C:\Users\YellowRose\rocmbuild\Tensile
I get bad output like: 1 geprüft everybody everybody nobodyς everybody via everybody knows); surely⊕rrrsquitechunscientific article everybody getsislandingfordays everyoneverybodyettapeople, andr they getrustleaving the websiclose girls or aor the first time?

@harish0201
Author

harish0201 commented Sep 20, 2023 via email

@YellowRoseCx

You need to cd into the rocblas folder where your rdeps and rmake files are, in an x64 Native Tools command prompt as admin, and then do:

.\build\release\virtualenv\Scripts\activate.bat
TensileCreateLibrary --architecture YOUR_GPU_ARCHS --code-object-version default --merge-files --library-format msgpack .\library\src\blas3\Tensile\Logic\asm_full C:\SomeOutputFolder HIP

You're going to change the GPU arch and SomeOutputFolder per your needs.
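To script that step, the TensileCreateLibrary command quoted above can be assembled as an argument list. A minimal Python sketch: the helper name is hypothetical, the arch and output folder are the same placeholders as in the instructions, and it deliberately takes one arch per build, since multi-arch builds are what produced garbled output in this thread. It only prints the command; running it still requires the activated build virtualenv.

```python
import subprocess
from typing import List


def tensile_create_library_cmd(
    arch: str,
    output_dir: str,
    logic_dir: str = r".\library\src\blas3\Tensile\Logic\asm_full",
) -> List[str]:
    """Assemble the TensileCreateLibrary call quoted above for a single GPU arch.

    Positional arguments, in order: logic directory, output directory, runtime language.
    """
    return [
        "TensileCreateLibrary",
        "--architecture", arch,
        "--code-object-version", "default",
        "--merge-files",
        "--library-format", "msgpack",
        logic_dir,
        output_dir,
        "HIP",
    ]


cmd = tensile_create_library_cmd("gfx1031", r"C:\SomeOutputFolder")
# Only prints the command line; inside the activated virtualenv you could
# run it with subprocess.run(cmd, check=True).
print(subprocess.list2cmdline(cmd))
```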

I did that and tried it with both lazy and non-lazy files. As soon as I build kernels for more than one GPU, even ones like gfx906 and gfx1030 that ship with Windows ROCm, my output becomes garbled on a 6800XT.

If one of you can try building for multiple targets, you might see what I mean.

I have a gfx1030, gfx900, and gfx1010 I can test with.

@ghost

ghost commented Sep 23, 2023

If I download the current build here and compile it with make LLAMA_HIPBLAS=1 -j4, I also get garbled output on my RX 6650 XT, but on Linux. Sometimes EOS triggers right at the beginning and nothing is output at all; sometimes it fills the specified max tokens by spamming a single word. Something seems to have broken in the latest llama.cpp builds, at least for me. Strangely enough, your ROCm build still works, and older llama.cpp versions work too. So this doesn't seem to affect only Windows; I never had any problems with it under Linux before. ROCm 5.7.0 has exactly the same issue.

@YellowRoseCx

Try using 2 or 3 fewer layers than the maximum the model has.
Fæth on Discord found that using 33/35 layers works for 7B, and 41/43 layers for 13B. There's apparently some issue with the last 2 extra layers that get added.
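Expressed as arithmetic, the workaround is "total layers minus two". A minimal sketch, assuming the result is passed to koboldcpp's --gpulayers option; the helper name is hypothetical, and this only reproduces the reported workaround rather than fixing the underlying issue:

```python
def suggested_gpu_layers(total_layers: int, held_back: int = 2) -> int:
    """Offload every layer except the last `held_back` ones (workaround, not a fix)."""
    if not 0 <= held_back < total_layers:
        raise ValueError("held_back must be between 0 and total_layers - 1")
    return total_layers - held_back


# Layer totals implied by the 33/35 (7B) and 41/43 (13B) figures above.
for size, total in (("7B", 35), ("13B", 43)):
    print(f"{size}: --gpulayers {suggested_gpu_layers(total)}")
```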

@harish0201
Author

I tried that with my laptop's integrated graphics a while ago (it was gfx90c), but ROCm doesn't support APUs, and I wasn't able to compile for others.

@mahdiyari

Works great on 6700 xt on windows 10!

This is how I did it:

  1. Get koboldcpp_rocm_files.zip
  2. pip install customtkinter
  3. Copy TensileLibrary.dat and Kernels.so-000-gfx1031.hsaco into rocblas\library (files from the original post)
  4. python .\koboldcpp.py

I didn't have to replace any files in the rocblas\library folder; the files I added were simply missing.

@YellowRoseCx Would it be possible to include these files in the .exe build?

Thanks all 🙌
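Step 3 of those instructions, together with the note that nothing had to be replaced, can be scripted so that existing files are never overwritten (replacing shipped files is what broke output for YellowRoseCx earlier in the thread). A minimal Python sketch; the function name and the example paths are hypothetical:

```python
import shutil
from pathlib import Path
from typing import Iterable, List

# File names from the original post's gfx1031 archive.
GFX1031_FILES = ("TensileLibrary.dat", "Kernels.so-000-gfx1031.hsaco")


def copy_missing(src_dir: str, library_dir: str,
                 names: Iterable[str] = GFX1031_FILES) -> List[str]:
    """Copy each named file into library_dir unless a file of that name already exists.

    Returns the names that were actually copied.
    """
    src, dst = Path(src_dir), Path(library_dir)
    copied = []
    for name in names:
        target = dst / name
        if target.exists():
            continue  # never overwrite: replacing existing files broke output for some users
        shutil.copy2(src / name, target)
        copied.append(name)
    return copied


# Hypothetical paths; point them at the unzipped archive and at rocblas\library:
# copy_missing(r"C:\Downloads\gfx1031", r"C:\koboldcpp-rocm\rocblas\library")
```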

@YellowRoseCx

See if this works for you: https://github.com/YellowRoseCx/koboldcpp-rocm/releases/tag/v1.56.yr1-ROCm

I'm going to have to test it myself to make sure adding those didn't mess up other cards, though.

@mahdiyari

It does work.

@jasyuiop

jasyuiop commented Feb 1, 2024

EDIT: I wrote in more detail on this issue; #655

With my RX 6600 (gfx1032) I couldn't compile it as "lazy" no matter what I did. I was able to compile it as a "non-lazy merged library", as done here: ggerganov#1087 (comment). I get very good results on Windows; I couldn't see any difference from the speed I get on Linux.

I downloaded the koboldcpp_rocm_files.zip file from https://github.com/YellowRoseCx/koboldcpp-rocm/releases/tag/v1.56.yr0-ROCm. I put the Kernels.so-000-gfx1032.hsaco and TensileLibrary.dat files in the rocblas/library folder (I also put them under "AMD\ROCm\5.5\bin\rocblas\library").

I am attaching the files I compiled for gfx1032.
gfx1032_none_lazy.zip

EDIT: After some experimentation, I think Linux is probably faster. I won't know until I do extensive testing, of course.

@YellowRoseCx

Adding them to KoboldCpp-ROCm 1.57.1.yr1; hopefully everything works as intended xD
Thanks!

@harish0201
Author

Yay! I'm happy that we've found hacky ways around ROCm's weird limitations!
