
Expose ggml_backend_load() and ggml_backend_load_all() to make use of builds with GGML_BACKEND_DL=ON and GGML_CPU_ALL_VARIANTS=ON #2069

@uwu-420

Description


I just tried compiling llama-cpp-python with GGML_BACKEND_DL=ON and GGML_CPU_ALL_VARIANTS=ON to make use of the nice dynamic-dispatch feature, where the backend is loaded as a shared library at runtime. This makes it possible to build llama.cpp once and still pick the best backend for the current CPU, e.g. on x86_64 choosing the backend for the best supported microarchitecture level depending on whether instructions like AVX2 or AVX-512 are available.
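For reference, this is roughly how I pass the flags through to the build. It is just a sketch assuming the usual CMAKE_ARGS mechanism that llama-cpp-python supports for forwarding options to CMake; a plain shell invocation with the same environment variable works equally well:

```python
# Sketch: build llama-cpp-python from source with dynamic backend loading.
# Assumes the CMAKE_ARGS environment variable is honored by the build backend.
import os
import subprocess
import sys

env = dict(os.environ)
env["CMAKE_ARGS"] = "-DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON"

# Force a source build so the CMake flags actually take effect.
subprocess.run(
    [
        sys.executable, "-m", "pip", "install",
        "--no-binary", ":all:", "--force-reinstall",
        "llama-cpp-python",
    ],
    env=env,
    check=True,
)
```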

Compiling worked for me on Ubuntu 24.04 LTS, and when inspecting the wheel I see the backend shared libraries such as bin/libggml-cpu-x64.so, libggml-cpu-sse42.so, libggml-cpu-haswell.so and so on. So that part is good already.

But when loading a model with llama-cpp-python I get this error:
llama_model_load_from_file_impl: no backends are loaded. hint: use ggml_backend_load() or ggml_backend_load_all() to load a backend before calling this function

These functions, however, are not exposed yet via the llama-cpp-python bindings.
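For what it's worth, a minimal ctypes sketch of what the exposed functions could look like is below. The library name and the manual CDLL call are assumptions on my part; the real bindings would presumably reuse the shared-library loader that llama_cpp already has:

```python
# Minimal sketch; "libggml.so" and the explicit CDLL call are assumptions,
# the actual bindings would reuse llama-cpp-python's existing library handle.
import ctypes

_ggml = ctypes.CDLL("libggml.so")

# void ggml_backend_load_all(void);
_ggml.ggml_backend_load_all.argtypes = []
_ggml.ggml_backend_load_all.restype = None

# ggml_backend_reg_t ggml_backend_load(const char * path);  (opaque pointer)
_ggml.ggml_backend_load.argtypes = [ctypes.c_char_p]
_ggml.ggml_backend_load.restype = ctypes.c_void_p


def ggml_backend_load_all() -> None:
    """Load all backends ggml can find (it picks the best CPU variant for the host)."""
    _ggml.ggml_backend_load_all()


def ggml_backend_load(path: str) -> int | None:
    """Load a single backend shared library, e.g. libggml-cpu-haswell.so."""
    return _ggml.ggml_backend_load(path.encode("utf-8"))
```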

I think exposing these functions would be a really great thing to add. It would make the CPU wheels for llama-cpp-python much better, because they would no longer be stuck with baseline x86_64 instructions and could thus be considerably more performant in cases where the wheel cannot be compiled at installation time.
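From the user's side, usage could then look something like this (hypothetical, assuming the functions are re-exported at the top level of the llama_cpp package):

```python
# Hypothetical usage once the bindings exist: load the backends before
# constructing the model so llama_model_load_from_file finds a CPU backend.
import llama_cpp

llama_cpp.ggml_backend_load_all()  # loads the best CPU variant for this machine
llm = llama_cpp.Llama(model_path="model.gguf")
```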
