Here's how to compile and run under MINGW64 from Msys2 #23

Open
greggft opened this issue Aug 6, 2023 · 5 comments

greggft commented Aug 6, 2023

Fixit@DAD MINGW64 ~/LlamaGPTJ-chat
$ mkdir build
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat
$ cd build
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ mkdir models
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ cp ../../MODELS/ggml-vicuna-13b-1.1-q4_2.bin models
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ cmake --fresh .. -DCMAKE_CXX_COMPILER=g++.exe -DCMAKE_C_COMPILER=gcc.exe
-- The C compiler identification is GNU 12.2.0
-- The CXX compiler identification is GNU 12.2.0
System is unknown to cmake, create:
Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Detecting C compiler ABI info
System is unknown to cmake, create:
Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /mingw64/bin/gcc.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
System is unknown to cmake, create:
Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /mingw64/bin/g++.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
System is unknown to cmake, create:
Platform/MINGW64_NT-10.0-19045 to use this system, please post your config file on discourse.cmake.org so it can be added to cmake
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: unknown
-- Unknown architecture
-- Configuring done (25.8s)
-- Generating done (0.5s)
-- Build files have been written to: /home/Fixit/LlamaGPTJ-chat/build
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ cmake --build . --parallel
[ 8%] Building C object gpt4all-backend/llama.cpp/CMakeFiles/ggml.dir/ggml.c.obj
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'quantize_row_q4_':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:781:15: warning: unused variable 'nb' [-Wunused-variable]
  781 |     const int nb = k / QK4_0;
      |               ^

C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'quantize_row_q4_':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1129:27: warning: unused variable 'y' [-Wunused-variable]
 1129 |     block_q4_1 * restrict y = vy;
      |                           ^
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1127:15: warning: unused variable 'nb' [-Wunused-variable]
 1127 |     const int nb = k / QK4_1;
      |               ^

C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'quantize_row_q8_':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:1507:15: warning: unused variable 'nb' [-Wunused-variable]
 1507 |     const int nb = k / QK8_1;
      |               ^

C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi_f32':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9357:15: warning: unused variable 'ne2_ne3' [-Wunused-variable]
 9357 |     const int ne2_ne3 = n/ne1; // ne2*ne3
      |               ^~~~~~~
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi_f16':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9419:15: warning: unused variable 'ne2' [-Wunused-variable]
 9419 |     const int ne2 = src0->ne[2]; // n_head -> this is k
      |               ^~~
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c: In function 'ggml_compute_forward_alibi':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/ggml.c:9468:5: warning: enumeration value 'GGML_TYPE_Q4_3' not handled in switch [-Wswitch]
 9468 |     switch (src0->type) {
      |     ^~~~~~
[ 8%] Built target ggml
[ 16%] Building CXX object gpt4all-backend/llama.cpp/CMakeFiles/llama.dir/llama.cpp.obj
In file included from C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama.cpp:8:
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h: In constructor 'llama_mmap::llama_mmap(llama_file*, bool)':
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h:233:94: note: '#pragma message: warning: You are building for pre-Windows 8; prefetch not supported'
  233 |         #pragma message("warning: You are building for pre-Windows 8; prefetch not supported")
      |                                                                                              ^
C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h:201:47: warning: unused parameter 'prefetch' [-Wunused-parameter]
  201 |     llama_mmap(struct llama_file * file, bool prefetch = true) {
      |                                          ~~~~~^~~~~~~~~~~~~~~
[ 25%] Linking CXX static library libllama.a
[ 25%] Built target llama
[ 33%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.obj
[ 41%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llmodel_c.cpp.obj
[ 50%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/gptj.cpp.obj
[ 58%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/llamamodel.cpp.obj
[ 66%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/mpt.cpp.obj
[ 75%] Building CXX object gpt4all-backend/CMakeFiles/llmodel.dir/utils.cpp.obj
[ 83%] Linking CXX static library libllmodel.a
/mingw64/bin/ar.exe qc libllmodel.a CMakeFiles/llmodel.dir/gptj.cpp.obj CMakeFiles/llmodel.dir/llamamodel.cpp.obj CMakeFiles/llmodel.dir/llama.cpp/examples/common.cpp.obj CMakeFiles/llmodel.dir/llmodel_c.cpp.obj CMakeFiles/llmodel.dir/mpt.cpp.obj CMakeFiles/llmodel.dir/utils.cpp.obj
/mingw64/bin/ranlib.exe libllmodel.a
[ 83%] Built target llmodel
[ 91%] Building CXX object src/CMakeFiles/chat.dir/chat.cpp.obj
[100%] Linking CXX executable ../bin/chat
[100%] Built target chat
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$ bin/chat
LlamaGPTJ-chat (v. 0.3.0)
Your computer supports AVX2
LlamaGPTJ-chat: loading .\models\ggml-vicuna-13b-1.1-q4_2.bin
llama.cpp: loading model from .\models\ggml-vicuna-13b-1.1-q4_2.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 5 (mostly Q4_2)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 73.73 KB
llama_model_load_internal: mem required = 9807.47 MB (+ 1608.00 MB per state)
llama_init_from_file: kv self size = 1600.00 MB
LlamaGPTJ-chat: done loading!

hello
Hello! How can I help you today?
/quit
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$

Took 5 minutes to respond to just saying hello.

UPDATE #1
$ ./bin/chat -m "models/ggml-gpt4all-j-v1.3-groovy.bin" -t 4
LlamaGPTJ-chat (v. 0.3.0)
Your computer supports AVX2
LlamaGPTJ-chat: loading models/ggml-gpt4all-j-v1.3-groovy.bin
gptj_model_load: loading model from 'models/ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 2
gptj_model_load: ggml ctx size = 5401.45 MB
gptj_model_load: kv self size = 896.00 MB
gptj_model_load: .............................................. done
gptj_model_load: model size = 3609.38 MB / num tensors = 285
LlamaGPTJ-chat: done loading!

hello
Hi! How can I assist you today?
/quit
(myenv)
Fixit@DAD MINGW64 ~/LlamaGPTJ-chat/build
$
Just over 2 minutes to respond to my hello

kuvaus (Owner) commented Aug 6, 2023

Nice! You got it running on Windows 7. Edit: I just noticed the "pre-Windows 8" message, so I'm assuming 7.

Looks like you didn't even need the cmake .. -G "MinGW Makefiles" part from the README, but I guess that's because you already had MinGW gcc in (myenv).

The over 2x speed difference between 13B and 7B models is not surprising, but the fact that it takes several minutes is.

If your processor has more than 4 threads, you can set -t to a bigger number. For example, with an 8-core (16-thread) processor I would set it to, say, -t 14 (it's important to leave at least one thread for the OS, otherwise it will slow down a lot). If no -t is specified, the default is 4.

But, with the way these models work, memory will always be the biggest bottleneck. This is because any large language model is (in a way) one big equation that is evaluated all at once for each token. So the entire model has to be accessible in memory for this evaluation.

For the Vicuna 13B model, this warning:

C:/msys64/home/Fixit/LlamaGPTJ-chat/gpt4all-backend/llama.cpp/llama_util.h:201:47: warning: unused parameter 'prefetch' [-Wunused-parameter]
  201 |     llama_mmap(struct llama_file * file, bool prefetch = true) {

might indicate that mmap is not working. If you want to tinker, you can change lines 55 and 59 in gpt4all-backend/llamamodel.cpp from

d_ptr->params.use_mmap   = params.use_mmap;
d_ptr->params.use_mlock  = params.use_mlock;

to

d_ptr->params.use_mmap   = false;
d_ptr->params.use_mlock  = false;

But I'm not sure if it makes any difference. Probably not. The current settings seem to work well for Windows 10 and up.

greggft (Author) commented Aug 6, 2023

Actually, I am running Windows 10:

OS Name Microsoft Windows 10 Pro
Version 10.0.19045 Build 19045
System Manufacturer Hewlett-Packard
System Model HP ZBook 15u G2
Processor Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, 2601 Mhz, 2 Core(s), 4 Logical Processor(s)

It's the most "powerful" piece of computing hardware I own....
Had to switch to my laptop because the OS hard drive just died on my Proxmox server :-(
So all my "test" servers are "dead"; luckily they are on a ZFS partition, but now I've got to find a replacement drive in my box of hard drives....

You can close this out if you wish, but I posted this because without specifying gcc and g++ in the MSYS2 setup the compile fails.
Thanks for a great program to test AIs with; I appreciate it!

kuvaus (Owner) commented Aug 6, 2023

Oh. I misread:

Platform/MINGW64_NT-10.0 does indicate Windows 10. I wonder why it said pre-Windows 8 in the pragma message. Probably some MinGW thing.

Oh, and set -t 2 or -t 3 so that you get 1 thread free for the OS. That should absolutely speed things up a bit!

You can close this out if you wish, but I posted this because without specifying gcc and g++ in the MSYS2 setup the compile fails. Thanks for a great program to test AIs with; I appreciate it!

This is great info for others. Better to leave it up! I didn't know it would not compile without setting -DCMAKE_CXX_COMPILER and -DCMAKE_C_COMPILER.

Thanks a lot for this! :)

pranitsh commented Apr 11, 2024

I came across --config Release, but it didn't fix the speed issue (it even took too long to load the model).

Found the idea in the file below:
gpt4all/gpt4all-bindings/csharp/build_win-mingw.ps1
https://github.com/nomic-ai/gpt4all/blob/1b84a48c47a382dfa432dbf477a7234402a0f76c/gpt4all-bindings/csharp/build_win-mingw.ps1#L4

I'm running

mkdir build
cd build
cmake -G "MinGW Makefiles" .. -DAVX512=ON
cmake --build . --parallel --config Release

I'm not too familiar with CMake. Any suggestions?
Clarification: Is the issue a missing flag? I couldn't find the DLLs in question later in the .sh file in the link above. Any approaches to try for this problem?

kuvaus (Owner) commented Apr 11, 2024

Hi,

Thanks for the link. Interesting.

The project uses static linking, which means the *.dll files are already built into the .exe. This was because I didn't want users to have to worry about copying those DLLs and keeping them at the correct paths.

But if you want to build the dll files, then you can set the flag:
cmake -DBUILD_SHARED_LIBS=ON

and you might also need to edit CMakeLists.txt line 105 to remove the -static references:

set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS}")

If you made the build dir with mkdir build like you did, then the *.dll files should be in build/gpt4all-backend.

I have found that using the gpt4all backend instead of pure llama.cpp is indeed a bit slower.
