Optimize symbol table #13258


Merged: 2 commits into NixOS:master on Jun 7, 2025

Conversation

NaN-git (Contributor) commented May 24, 2025

Motivation

The symbol table is an append-only data structure that allocates a lot of small strings. Thus the first optimization is to use a std::pmr::monotonic_buffer_resource for string allocations.

Then there is no need to store a std::string_view in the set, because a pointer into the ChunkedVector can be used instead, which frees a size_t. Furthermore, the size of std::string is 32 bytes on x86_64-linux with libstdc++, but the strings never grow, so storing a pointer and a size is sufficient.

Some builtins transform symbols stored in the symbol table, i.e. a value is allocated which points to a string in the symbol table. This allocation can be avoided if the value is stored in the ChunkedVector itself. On x86_64-linux a Value holding a pointer to a string is 24 bytes, so it adds some overhead.
Combining everything into a SymbolValue stores 32 bytes per entry, while the index into the ChunkedVector no longer needs to be stored in the set, freeing another uint32_t (8 bytes after alignment).
The Value optimization reduces the maximum memory usage of nix search noticeably: I measured a 6% reduction (~300 MiB) with this PR on current nixpkgs, along with a 2-3% speedup.

Finally, the symbol table does not require that pointers to set entries are stable, so the cache-friendly boost::unordered_flat_set can be used; it stores the set entries in place and therefore has less memory overhead, too.
This implementation requires a good hash function whose output bits are uniformly distributed. I don't know whether the hash functions of libstdc++ and libc++ (especially older versions) are good enough, so I decided to use the hash function provided by Boost.

Context

Making the symbol table thread-safe wouldn't be much extra effort now: boost::concurrent_flat_set could be used instead. The question is which operations should be lock-free. Making operator[] thread-safe requires some changes to ChunkedVector.


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

NaN-git requested review from roberth and edolstra as code owners May 24, 2025 18:20
Ericson2314 (Member)

CC @xokdvium

tomberek (Contributor)

Random tidbit: during eval of a NixOS system, we allocate a lot of common small strings:

   7561 sym: 1 # 981 = "
   7651 sym: 5 # 591 = users
   8191 sym: 3 # 64 = lib
   9219 sym: 7 # 5307 = systemd
  10940 sym: 6 # 120 = config
  15525 sym: 1 # 28 = 1
  19273 sym: 10 # 5414 = modulePath
  19285 sym: 5 # 171 = order
  19315 sym: 7 # 6171 = control
  19696 sym: 6 # 105 = :anon-
  21718 sym: 8 # 852 = settings
  22510 sym: 8 # 1494 = services
  22585 sym: 5 # 4292 = check
  22668 sym: 12 # 122 = freeformType
  27293 sym: 4 # 3963 = name
  28222 sym: 11 # 2335 = specialArgs
  38879 sym: 0 # 5 =
  40980 sym: 6 # 419 = enable
  45039 sym: 7 # 89 = _module
  53297 sym: 4 # 4926 = args

NaN-git force-pushed the opt-symbol-table branch from 838c84f to 5ad93a1 on May 25, 2025 22:00
NaN-git (Contributor, Author) commented May 25, 2025

@tomberek How are these strings allocated? Is mkString involved or does the PR improve the situation?

NaN-git force-pushed the opt-symbol-table branch from 5ad93a1 to e8912c0 on May 25, 2025 23:25
NaN-git force-pushed the opt-symbol-table branch 2 times, most recently from 43c3477 to 1fbe558 on May 27, 2025 20:07
xokdvium (Contributor)

Great work!

For reference, some ad-hoc measurements really show an amazing improvement in terms of speed as well as memory usage (compared against current master):

Command                                                        Mean [ms]    Min [ms]  Max [ms]  Relative
result/bin/nix eval nixpkgs#hello --no-eval-cache --read-only  348.3 ± 5.1  339.2     356.1     1.00
nix eval nixpkgs#hello --no-eval-cache --read-only             365.2 ± 3.9  359.0     370.6     1.05 ± 0.02

$ command time -v result/bin/nix eval nixpkgs#nixosTests.gnome --no-eval-cache --read-only
Maximum resident set size (kbytes): 800468
$ command time -v nix eval nixpkgs#nixosTests.gnome --no-eval-cache --read-only
Maximum resident set size (kbytes): 871412

edolstra (Member)

Can you expand a bit on how well this would be amenable to multi-threaded evaluation? In #10938 I reduced lock contention for symbol table insertions by sharding the mapping from strings to symbol offsets, and made symbol reads lock-free entirely by making sure that symbols never move in memory.

xokdvium (Contributor)

I reduced lock contention for symbol table insertions by sharding the mapping from strings to symbol offsets, and made symbol reads lock-free entirely by making sure that symbols never move in memory.

Making the symbol table thread-safe isn't too much effort anymore by using boost::concurrent_flat_set instead.

Looking at the API (https://live.boost.org/doc/libs/develop/libs/unordered/doc/html/unordered/reference/unordered_flat_set.html), this does look like a drop-in replacement. Boost's concurrent implementation doesn't use sharding but does something more clever, it seems: https://bannalia.blogspot.com/2023/07/inside-boostconcurrentflatmap.html. Judging by the benchmarks, its performance is very agreeable, and I doubt we could hand-roll something more performant.

NaN-git (Contributor, Author) commented May 27, 2025

Can you expand a bit on how well this would be amenable to multi-threaded evaluation? In #10938 I reduced lock contention for symbol table insertions by sharding the mapping from strings to symbol offsets, and made symbol reads lock-free entirely by making sure that symbols never move in memory.

I'm not sure whether this is thread-safe. There is no guarantee that the thread which calls operator[] actually observes the string written to memory. The function reads arena.size, which implies a memory barrier, but the writes in SymbolTable::create happen after this atomic variable is updated. There seems to be no other synchronization point.

Read locks are also quite expensive.

Having a maximum size for the symbol table seems questionable. Also, I'm not sure whether reserving so much virtual address space on a 32-bit system is a good idea.

Sharding could be applied to ChunkedVector, too, or it could be rewritten to be mostly lock free.

boost::concurrent_flat_set itself should avoid contention; it's SIMD-friendly, and its inserts/lookups are atomic. Then the whole table would be thread-safe if operator[] and SymbolStr(const Key &) are thread-safe.

tomberek (Contributor) commented May 29, 2025

Builds and tests pass on {x86_64,aarch64}-{darwin,linux} locally as well. I'm seeing a ~8% decrease in memory and CPU usage.

roberth (Member) left a comment

Very nice!

What's more, this removes a significant number of Values from the GC-managed heap, so it saves not just allocations but also some of the time spent in GC.

NaN-git force-pushed the opt-symbol-table branch from 1fbe558 to ed4e512 on June 1, 2025 19:01
nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/a-look-at-nixos-nixpkgs-evaluation-times-over-the-years/65114/9

roberth merged commit 1022598 into NixOS:master on Jun 7, 2025
12 checks passed

7 participants