Parse JSON using simdjson #7249
In some use cases, Nix spends a significant amount of time parsing JSON. simdjson is much faster (at least 10x) and decreases real-world eval times by 15-25%, mostly for JSON-heavy code like nix-pypi-fetcher or haskell.nix.
GC_STRNDUP was calling strlen on the source string, but some strings aren't zero-terminated, resulting in pathological slowdowns. Use strnlen to only scan up to n bytes and call malloc manually.
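The bounded-copy idea described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `dupBounded` is a hypothetical name, and plain `std::malloc` stands in for whatever allocator the real change uses. `strnlen` is a POSIX function, so the sketch assumes a POSIX-like platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string.h> // strnlen (POSIX, not part of ISO C++)

// Hypothetical helper: duplicate at most `n` bytes of `src` without ever
// reading past src[n - 1], so `src` does not need to be zero-terminated.
char * dupBounded(const char * src, std::size_t n)
{
    assert(src != nullptr);

    // strnlen stops at the first '\0' or after n bytes, whichever comes
    // first. strlen would keep scanning until it happens to hit a zero byte.
    std::size_t len = strnlen(src, n);

    // Stand-in allocator (assumption): one extra byte for the terminator.
    char * out = static_cast<char *>(std::malloc(len + 1));
    if (!out) return nullptr;

    std::memcpy(out, src, len);
    out[len] = '\0';
    return out;
}
```

The cost of the scan is bounded by `n` rather than by the distance to the next zero byte in memory, which is what avoids the slowdown when the source is a slice of a much larger buffer.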
I'm not really keen on having three JSON implementations in Nix (nlohmann, src/libutil/json.cc and simdjson). But maybe simdjson can replace src/libutil/json.cc?
That's a pretty significant downside, given that we generally want to improve error handling in Nix. Probably for most users, the error messages are more important than being able to deal with gigabytes of JSON. Is this an inherent limitation of simdjson?
If the only issue is that syntax errors are bad, and it is easy to distinguish syntax errors from other (e.g. internal ones), one could first try with
Discussed in the Nix team meeting 2023-02-10. Decision: closed.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2023-02-10-nix-team-meeting-minutes-31/25438/1
Description
This PR changes the json-to-value function to use simdjson, an optimized JSON library.
It additionally adds an ExprJSON expression that generates JSON thunks. These lazily convert the rest of the document from the simdjson representation to Nix values, since allocating Nix values turns out to be expensive. Thunks are generated for complex attrset values.
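The laziness described here can be illustrated with a small, library-free sketch. Everything in it is hypothetical: `LazyValue`, `LazyAttrs` and `makeLazyAttrs` are names made up for illustration, strings stand in for both the simdjson representation and Nix values, and the real ExprJSON implementation may look quite different.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a thunk: the conversion runs at most once,
// and only when the value is first demanded.
struct LazyValue {
    std::function<std::string()> convert; // pending conversion, if any
    bool forced = false;
    std::string value;

    const std::string & force()
    {
        if (!forced) {
            assert(convert);
            value = convert();  // the expensive allocation happens here
            forced = true;
            convert = nullptr;  // release whatever the closure captured
        }
        return value;
    }
};

using LazyAttrs = std::map<std::string, LazyValue>;

// Build an attrset whose attribute values are all unevaluated thunks.
// `conversions` counts how many values were actually converted.
LazyAttrs makeLazyAttrs(const std::map<std::string, std::string> & raw,
                        int & conversions)
{
    LazyAttrs attrs;
    for (const auto & entry : raw) {
        LazyValue lazy;
        lazy.convert = [text = entry.second, &conversions] {
            ++conversions;
            return "converted:" + text;
        };
        attrs.emplace(entry.first, std::move(lazy));
    }
    return attrs;
}
```

With this shape, building the attrset costs one closure per attribute, and the conversion cost is only paid for attributes that are actually forced. For a large attrset of package descriptions where only a few entries are read, most conversions never run.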
Motivation
This mainly saves time on JSON-heavy libraries like haskell.nix and nix-pypi-fetcher, which can be quite slow. The added laziness helps a lot for real-world JSON files that consist of one big attrset of package descriptions.
Downsides
Current state
Benchmarks
Real world performance difference (~30%)
This PR:
Nix 2.11.0:
Artificial testcase (strict) (~35%)
testcase.nix
This PR
Nix 2.11.0
Artificial testcase (lazy) (~95%)
testcase.nix
This PR
Nix 2.11.0
hackage.nix
testcase2.nix