-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Optimize symbol table #13258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize symbol table #13258
Conversation
CC @xokdvium |
Random tidbit: during eval of a NixOS system, we allocate a lot of common small strings:
|
@tomberek How are these strings allocated? Is |
43c3477
to
1fbe558
Compare
Great work! For reference, some ad-hoc measurements really show an amazing improvement in terms of speed as well as memory usage (compared against current master):
|
Can you expand a bit on how well this would be amenable to multi-threaded evaluation? In #10938 I reduced lock contention for symbol table insertions by sharding the mapping from strings to symbol offsets, and made symbol reads lock-free entirely by making sure that symbols never move in memory. |
Looking at the API https://live.boost.org/doc/libs/develop/libs/unordered/doc/html/unordered/reference/unordered_flat_set.html this does look a drop-in replacement. Boost's implementation doesn't use sharding, but does something more clever it seems: https://bannalia.blogspot.com/2023/07/inside-boostconcurrentflatmap.html. Judging by the benchmarks it seems like its performance is very agreeable and I doubt we could hand-roll something more performant. |
I'm not sure whether this is thread safe. There is no guarantee that the thread, which calls Read locks are also quite expensive. Having a max. size of the symbol table seems to be questionable. Also I'm not sure whether reserving so much virtual memory space on a 32 bit system is a good idea. Sharding could be applied to
|
Builds and tests pass on {x86_64,aarch64}-{darwin,linux} locally as well. Seeing a ~8% memory and CPU usage decrease. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice!
What's more is that this removes a significant number of Values from the GC-managed heap, so this saves not just allocations, but also a bit of time spent in GC.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/a-look-at-nixos-nixpkgs-evaluation-times-over-the-years/65114/9 |
Motivation
The symbol table is an append only data structure and it allocates a lot of small strings. Thus the first optimization is to use a
std::pmr::monotonic_buffer_resource
for string allocations.Then there is no need to store a
std::string_view
in the set because a pointer into theChunkedVector
can be used instead, which frees asize_t
. Furthermore the size ofstd::string
is 32 bytes onx86_64-linux
with libstdc++, but the strings aren't dynamic, i.e. storing a pointer and a size is sufficient.Some
builtins
transform symbols stored in the symbol table, i.e. a value is allocated, which points to a string in the symbol table. The allocation can be avoided, if the value is stored in theChunkedVector
. Onx86_64-linux
the size ofValue
is 24 bytes for storing a pointer to a string, i.e. it adds some overhead.Combining everything into
SymbolValue
32 bytes are stored, while the index into theChunkedVector
is no longer stored in the set, freeing up anotheruint32_t
or 8 bytes due to alignment.The
Value
optimization reduces the max. memory usage fornix search
noticeable, i.e. I measured a 6% reduction (~300 MiB) with this PR and current nixpkgs. A 2%-3% speedup was observed, too.Finally, the symbol table does not require that pointers to set entries are stable, i.e. the cache friendly
boost::unordered_flat_set
can be used, which stores the set entries in-place, so that it has less memory overhead, too.This implementation requires a good hash function, where different bits of the output are distributed uniformly. I don't know whether the hash functions of libstdc++ and libc++ (especially older versions) are good enough, so that I decided to use the hash function provided by boost.
Context
Making the symbol table thread-safe isn't too much effort anymore by using
boost::concurrent_flat_set
instead. The question is, which operations shall be lock free. Makingoperator[]
thread-safe requires some changes ofChunkedVector
.Add 👍 to pull requests you find important.
The Nix maintainer team uses a GitHub project board to schedule and track reviews.