-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for JSON Numbers #25
base: master
Are you sure you want to change the base?
Conversation
37390c3
to
58c5e81
Compare
I've updated the PR to do the following:
This should be a much more satisfactory PR right now. The amount of additional code is much smaller, and mostly in unavoidable places. |
Hi @Alexhuszagh - thanks for taking time in getting a stab at this. I think the idea is good (and surely +1 for the pre-RFC - let's see how it will be received...). Before we get into code structure discussions etc though, I've tried running the bench suite a few times for this branch and master, results seems to be fairly consistent. In particular for mesh.txt and canada.txt (the latter probably the most representative of what the most 'common' floats to be parsed would look like), there seems to be a ~7% performance drop which is outside of noise bounds. There might be bias because the master branch code is heavily hand-tuned towards high performance on these benchmarks (on my machine in particular), but, on the other hand, medians outside of IQR signify that it's not just noise. Would you perhaps care to run these a few times on your machine to see how the numbers look like? If it's significant indeed, we should probably investigate what it's caused by? Because the code is logically identical, but the order of things may be slightly changed. master branch:
this branch:
|
@aldanor Definitely. This might also be due to bad inlining. I'll try changing a few inlines to |
Ok this is definitely slower and inling doesn't seem to work. Getting repeated ~7% performance hits with |
Thanks for testing out :) Yea, I've spent lots and lots of time (the majority other that correctness) in this crate trying to speed this up by making inlining right and avoid every single bound check where possible (hence all the weird types in But I think it should be doable since it's essentially the same logic. |
It's a false positive: it's when both functions are used in the same binary, it messes with the inlining thresholds. I've added a feature gate Here are a few sample benchmarks on my end: Note: All the benchmarks are run on x86_64 Linux, with only tmux and Sublime Text open at a time, with the performance governor set. Master
JSON BranchShows a clear 3% performance hit on the median, and much worse in other cases.
JSON Without
|
It might be possible to use |
I got similar results on x86_64 Windows MSVC (slightly more muted, but similar). I'm almost certain this is due to compiler inlining and weird stuff, and not actually going to have any impact on real-world cases. Master
JSON Without
|
Adds support for #17.
This probably needs significant edits prior to merging, but the benchmarks with the primitive tokenizing (it doesn't use an optimized memchr) still are pretty good (it also has to duplicate a lot of work when parsing, so this isn't really that surprising):
I've added tests to ensure the differences in parsing don't lead to correctness issues, as seen here.
The following methods have been added:
In order to share more code,
parse_long_mantissa
has been changed to accept aDecimal
by-value, and the two internal functions have been added:Any feedback would be great, I'm probably going to need to refactor a bit to increase code re-use. The API can therefore be used like this:
If an invalid character is found, it merely stops parsing early: no warning is given for invalid input, but it will not undergo any unsafe behavior. It assumes the input is valid, and the documentation clearly reflects this: