Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of Parsers.parse(Number, ...) #159

Merged
merged 3 commits into from
Feb 2, 2023

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Feb 1, 2023

This is an inherently type-unstable operation where we don't know exactly what type of thing we're going to parse, even once we get into the typeparser, which isn't usually the case.

This is about 2x faster in the top-level case by switching to a functional style where you pass in an "applicator" function that gets applied to whatever objecdt we end up with. If a user calls the new parsenumber directly, they can get near-native performance with their own applicator.

Perf examples:

Before

julia> @btime Parsers.parse(Number, "9223372036854775808")
  294.762 ns (7 allocations: 240 bytes)
9223372036854775808

julia> @btime Parsers.parse(Number, "1.0")
  370.350 ns (9 allocations: 208 bytes)
1.0

julia> @btime Parsers.parse(Number, "1")
  249.674 ns (4 allocations: 112 bytes)
1

This PR:

julia> @btime Parsers.parse(Number, "1")
  123.481 ns (2 allocations: 48 bytes)
1

julia> @btime Parsers.parse(Number, "1.0")
  218.689 ns (4 allocations: 96 bytes)
1.0

julia> @btime Parsers.parse(Number, "9223372036854775808")
  153.579 ns (3 allocations: 80 bytes)
9223372036854775808

This is an inherently type-unstable operation where we don't know
exactly what type of thing we're going to parse, _even_ once we
get into the `typeparser`, which isn't usually the case.

This is about 2x faster in the top-level case by switching to a functional
style where you pass in an "applicator" function that gets applied to
whatever objecdt we end up with. If a user calls the new `parsenumber`
directly, they can get near-native performance with their own applicator.

Perf examples:

Before
```
julia> @Btime Parsers.parse(Number, "9223372036854775808")
  294.762 ns (7 allocations: 240 bytes)
9223372036854775808

julia> @Btime Parsers.parse(Number, "1.0")
  370.350 ns (9 allocations: 208 bytes)
1.0

julia> @Btime Parsers.parse(Number, "1")
  249.674 ns (4 allocations: 112 bytes)
1
```

This PR:
```
julia> @Btime Parsers.parse(Number, "1")
  123.481 ns (2 allocations: 48 bytes)
1

julia> @Btime Parsers.parse(Number, "1.0")
  218.689 ns (4 allocations: 96 bytes)
1.0

julia> @Btime Parsers.parse(Number, "9223372036854775808")
  153.579 ns (3 allocations: 80 bytes)
9223372036854775808
```
src/floats.jl Outdated
@@ -217,11 +212,20 @@ end
return pos, code, PosLen(pl.pos, pos - pl.pos), x
end

@inbounds function handlef(x, f::F) where {F}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was supposed to be @inline

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh man, thank you for catching this! I was investigating a profile the other day and was baffled why there was a dynamic dispatch in this....

@quinnj
Copy link
Member Author

quinnj commented Feb 1, 2023

Hmmmm, something is going wrong here; we're not getting the full float code inlined anymore. Investigating....

src/floats.jl Show resolved Hide resolved
src/ints.jl Outdated
Comment on lines 115 to 121
x, code, pos = parsedigits(Number, source, pos, len, b, code, opts, digits, neg, startpos)
_, code, pos = parsedigits(Number, source, pos, len, b, code, OPTIONS, Int64(0), neg, startpos, f)
if (x === Inf || x === -Inf) && !specialvalue(code)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x doesn't seem to exist when we get to the if statement

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, ideally, we'd know that we got an Inf here so we could try parsing as BigFloat

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is the remaining wrinkle; I think we can maybe use our code here w/ a special value to know if Inf was set. Let me play around with it in a bit.

@codecov
Copy link

codecov bot commented Feb 1, 2023

Codecov Report

Base: 87.30% // Head: 86.61% // Decreases project coverage by -0.70% ⚠️

Coverage data is based on head (86faadc) compared to base (c54b411).
Patch coverage: 81.31% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #159      +/-   ##
==========================================
- Coverage   87.30%   86.61%   -0.70%     
==========================================
  Files           9        9              
  Lines        1631     1681      +50     
==========================================
+ Hits         1424     1456      +32     
- Misses        207      225      +18     
Impacted Files Coverage Δ
src/utils.jl 78.50% <22.22%> (-2.17%) ⬇️
src/floats.jl 90.19% <83.87%> (-1.72%) ⬇️
src/Parsers.jl 87.22% <100.00%> (+0.21%) ⬆️
src/ints.jl 87.77% <100.00%> (+0.71%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@quinnj quinnj merged commit a57bdb2 into main Feb 2, 2023
@quinnj quinnj deleted the jq-faster-parse-number branch February 2, 2023 04:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants