Use serde-transcode to optimize JSON formatting #362

Merged (2 commits) on Apr 12, 2024

@blyxxyz (Collaborator) commented Apr 12, 2024

Followup to #361 (which I meant to look at but forgot).

Instead of parsing the JSON input into a serde_json::Value we can use serde-transcode to serialize straight from the input to the output. On an artificial, very large JSON file I use for testing this ends up five times faster. (Maybe slightly faster than jsonxf was?) I got this trick from jsonxf's serde benchmark.
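
To illustrate why skipping the intermediate value helps, here is a minimal sketch of the same idea using only the standard library. It is not the serde-transcode code from this PR: `reindent` and `newline` are hypothetical names, and the sketch assumes the input is already valid JSON and does no validation. It writes each token as soon as it is read, so memory use stays constant regardless of input size.

```rust
use std::io::{self, Read, Write};

// Hypothetical helper: start a new line indented by two spaces per level.
fn newline<W: Write>(out: &mut W, depth: usize) -> io::Result<()> {
    out.write_all(b"\n")?;
    for _ in 0..depth {
        out.write_all(b"  ")?;
    }
    Ok(())
}

// Re-indent JSON while reading it, without building a tree in memory.
// Assumption: the input is valid JSON; nothing is validated here.
fn reindent<R: Read, W: Write>(input: R, mut output: W) -> io::Result<()> {
    let mut depth: usize = 0;
    let mut in_string = false;
    let mut escaped = false;
    // True between an opening bracket and its first child (or its closer).
    let mut just_opened = false;

    for byte in input.bytes() {
        let b = byte?;
        if in_string {
            // Copy string contents verbatim, tracking escapes so that an
            // escaped quote does not end the string.
            output.write_all(&[b])?;
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            // Drop the original whitespace between tokens.
            b' ' | b'\t' | b'\n' | b'\r' => {}
            b'{' | b'[' => {
                if just_opened {
                    newline(&mut output, depth)?;
                }
                output.write_all(&[b])?;
                depth += 1;
                just_opened = true;
            }
            b'}' | b']' => {
                depth = depth.saturating_sub(1);
                // Keep empty containers on one line: `{}` and `[]`.
                if !just_opened {
                    newline(&mut output, depth)?;
                }
                output.write_all(&[b])?;
                just_opened = false;
            }
            b',' => {
                output.write_all(b",")?;
                newline(&mut output, depth)?;
            }
            b':' => output.write_all(b": ")?,
            _ => {
                // First byte of a scalar or string.
                if just_opened {
                    newline(&mut output, depth)?;
                    just_opened = false;
                }
                if b == b'"' {
                    in_string = true;
                }
                output.write_all(&[b])?;
            }
        }
    }
    Ok(())
}
```

Since `bytes()` issues one `read` call per byte, a real caller would probably want to wrap the input in a `BufReader`. A proper streaming transcoder also validates the input as it goes, which this sketch deliberately leaves out.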

(The rest of the PR looked great. Thanks, @zuisong!)

I also included a tiny theoretical bugfix for decompression that I wrote a while ago but didn't think was worth a PR; see the explanation in 398d567.

A read operation might get an "interrupted" error, in which case the
correct behavior is (usually) to try again as if nothing happened.

When we read a compressed stream we check whether the underlying
reader received an error. But we should only check whether it received
an error for the latest read, so that we can ignore these interrupts
properly.

This is the only place I noticed where interrupts were handled
improperly.

AFAIK this can't happen in reality because we don't install signal
handlers, but it's good practice.

Instead of parsing the JSON input into a complicated heap value before
writing it out we can write it out as we parse it. On a ridiculously
large input this gives me a 5× speedup.
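
The interrupt handling described in the first commit message above can be sketched roughly as follows. This is not the code from 398d567: `ErrorTrackingReader` and `InterruptedOnce` are hypothetical names, and the sketch assumes the error is recorded by a wrapper around the underlying reader. The point is that the recorded error is cleared at the start of every read, so an "interrupted" error that was already retried does not look like a failure later.

```rust
use std::io::{self, ErrorKind, Read};

// Hypothetical wrapper: remembers whether the *latest* read on the
// inner reader failed, so a caller can tell a transport error apart
// from a decoding error.
struct ErrorTrackingReader<R> {
    inner: R,
    last_error: Option<ErrorKind>,
}

impl<R: Read> Read for ErrorTrackingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Reset first: only the most recent read should count. Without
        // this line, a stale Interrupted error would stick around even
        // after the read was retried successfully.
        self.last_error = None;
        let result = self.inner.read(buf);
        if let Err(e) = &result {
            self.last_error = Some(e.kind());
        }
        result
    }
}

// Hypothetical test reader: fails once with Interrupted, then behaves
// like the inner reader.
struct InterruptedOnce<R> {
    inner: R,
    interrupted: bool,
}

impl<R: Read> Read for InterruptedOnce<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if !self.interrupted {
            self.interrupted = true;
            return Err(io::Error::new(ErrorKind::Interrupted, "simulated signal"));
        }
        self.inner.read(buf)
    }
}
```

Reading through the wrapper with `read_to_end`, which retries on `ErrorKind::Interrupted`, should then yield all the data and leave `last_error` as `None`.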

@ducaale (Owner) left a comment

Approved 🎉

@ducaale merged commit 14af5b2 into ducaale:master on Apr 12, 2024
9 checks passed
@blyxxyz deleted the optimize-serde-json branch on April 12, 2024, 21:16