
Support for NUMA #571

@rankaiyx

Description


llama.cpp's NUMA feature does not appear to be supported, which results in significant performance degradation on servers with multiple NUMA nodes.

Additional Context
> NUMA support
>
> --numa: Attempt optimizations that help on some systems with non-uniform memory access. This currently consists of pinning an equal proportion of the threads to the cores on each NUMA node, and disabling prefetch and readahead for mmap. The latter causes mapped pages to be faulted in on first access instead of all at once, and in combination with pinning threads to NUMA nodes, more of the pages end up on the NUMA node where they are used. Note that if the model is already in the system page cache, for example because of a previous run without this option, this will have little effect unless you drop the page cache first. This can be done by rebooting the system or on Linux by writing '3' to '/proc/sys/vm/drop_caches' as root.

Source: https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md
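For illustration, here is a minimal sketch of how this could look once exposed in the high-level Python API. The `numa` keyword argument and its forwarding to llama.cpp's backend initialization are assumptions made for this sketch; they are not part of the current API, which is exactly what this issue is requesting.

```python
# Hypothetical sketch only: proposes a `numa` flag on the Llama class,
# mirroring llama.cpp's --numa option. The keyword does not exist yet;
# its name and plumbing are assumptions, not the library's API.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.gguf",  # placeholder model path
    n_threads=32,   # llama.cpp pins an equal share of threads to each NUMA node
    use_mmap=True,  # --numa also disables prefetch/readahead for mmap'd pages
    numa=True,      # proposed flag corresponding to llama.cpp's --numa
)

out = llm("Q: What does NUMA stand for? A:", max_tokens=32)
print(out["choices"][0]["text"])
```

As noted in the quoted documentation, the effect is only visible when the model is not already resident in the page cache, so benchmarks should drop the cache (or reboot) between runs.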

This downstream project needs it: oobabooga/text-generation-webui#3444
