Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix corner case of very large integer parsing that promotes to BigInt #118

Merged
merged 1 commit into from
Jun 13, 2022

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Jun 13, 2022

Fixes JuliaData/CSV.jl#1007. The issue here is there's
actually a git sha "19129370688824811353f0bcee35b917" where the multithreaded
CSV.File case was trying to detect if this value was a float or not. But attempting
to call Parsers.xparse threw an error, which shouldn't happen from the Parsers.xparse
code path (it should just return an error code in the parsing result). The problem
was it parsed the very large integer 19129370688824811353, then f which it treated
as the exponenent character, then 0 as the exponent. When it went to do the scaling
code, it promoted the integer to BigInt, but the _scale BigInt path didn't
handle 0 exponent form. This is easily handled before we promote to BigInt
since we're really just looking to convert the large integer to the desired type.

Fixes JuliaData/CSV.jl#1007. The issue here is there's
actually a git sha "19129370688824811353f0bcee35b917" where the multithreaded
CSV.File case was trying to detect if this value was a float or not. But attempting
to call `Parsers.xparse` threw an error, which shouldn't happen from the Parsers.xparse
code path (it should just return an error code in the parsing result). The problem
was it parsed the very large integer 19129370688824811353, then `f` which it treated
as the exponenent character, then `0` as the exponent. When it went to do the scaling
code, it promoted the integer to `BigInt`, but the `_scale` BigInt path didn't
handle `0` exponent form. This is easily handled before we promote to `BigInt`
since we're really just looking to convert the large integer to the desired type.
@codecov
Copy link

codecov bot commented Jun 13, 2022

Codecov Report

Merging #118 (23cf4eb) into main (f143231) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #118      +/-   ##
==========================================
+ Coverage   87.65%   87.70%   +0.04%     
==========================================
  Files           9        9              
  Lines        2309     2309              
==========================================
+ Hits         2024     2025       +1     
+ Misses        285      284       -1     
Impacted Files Coverage Δ
src/floats.jl 93.27% <100.00%> (+0.29%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f143231...23cf4eb. Read the comment docs.

@nickrobinson251
Copy link
Collaborator

CSV.File case was trying to detect if this value was a float or not.

What's the CSV codepath that's hitting this? Is it in CSV.detect when calling xparse(Int, ...) or xparse(Float64, ...)?

@quinnj
Copy link
Member Author

quinnj commented Jun 13, 2022

The latter; CSV.detect when calling xparse(Float64, ...).

@quinnj quinnj merged commit ff2ed00 into main Jun 13, 2022
@quinnj quinnj deleted the jq/csv1007 branch June 13, 2022 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CSV.File dies with out-of-bounds error when used in multiple threads
2 participants