I just tried compiling llama-cpp-python with GGML_BACKEND_DL=ON and GGML_CPU_ALL_VARIANTS=ON to make use of the nice dynamic backend loading feature: llama.cpp is built once and the best backend for the current CPU is chosen at runtime. On x86_64, for example, the variant matching the available instruction sets (such as AVX2 or AVX512) is picked, i.e. the best backend for the current microarchitecture level.
Compilation worked for me on Ubuntu 24.04 LTS, and when inspecting the wheel I can see the backend shared libraries such as bin/libggml-cpu-x64.so, libggml-cpu-sse42.so, libggml-cpu-haswell.so, and so on. So far so good.
But when loading a model with llama-cpp-python I get this error: "llama_model_load_from_file_impl: no backends are loaded. hint: use ggml_backend_load() or ggml_backend_load_all() to load a backend before calling this function". These functions are not exposed yet via the bindings.
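As a rough idea of what I mean, here is a minimal sketch of a workaround calling ggml_backend_load_all() directly via ctypes before constructing the model. It assumes the wheel ships a libggml.so under a lib/ directory inside the installed llama_cpp package; the exact file name and location may differ per platform, so treat the paths as placeholders rather than the actual layout.

```python
import ctypes
import pathlib

import llama_cpp

# Assumption: the shared libraries live in llama_cpp/lib inside the installed
# package; on other platforms the suffix would be .dylib or .dll instead.
lib_dir = pathlib.Path(llama_cpp.__file__).parent / "lib"
ggml = ctypes.CDLL(str(lib_dir / "libggml.so"))

# void ggml_backend_load_all(void);
ggml.ggml_backend_load_all.restype = None
ggml.ggml_backend_load_all.argtypes = []

# Discover and load the available backend libraries so that a backend is
# registered before the model is created.
ggml.ggml_backend_load_all()

# "model.gguf" is a placeholder path.
llm = llama_cpp.Llama(model_path="model.gguf")
```

Having something like this exposed as a proper binding (and ideally called automatically before model loading) would avoid every user having to reach into the shared libraries themselves.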
I think exposing these functions would be a really valuable addition. It would make the prebuilt CPU wheels for llama-cpp-python much better: they would no longer be limited to the baseline x86_64 instruction set and could therefore be considerably faster in cases where the wheel cannot be compiled at installation time.