Failure to read file with big integers (Int128) #769

Closed
DrChainsaw opened this issue Nov 5, 2020 · 1 comment · Fixed by JuliaData/Parsers.jl#71


@DrChainsaw

The failure seems to depend on the number of columns and rows. Is some type-detection heuristic failing?

julia> using DataFrames, CSV

julia> CSV.write("test.csv", repeat(DataFrame(a=[typemax(Int64) + Int128(1)],b=[0],c=[0],d=[0],e=[0],f=[0],g=[0],h=[0],i=[0]),500)) |> CSV.File |> DataFrame
ERROR: InexactError: check_top_bit(Int64, 9223372036854775808)
Stacktrace:
 [1] throw_inexacterror(::Symbol, ::Type{Int64}, ::UInt64) at .\boot.jl:558
 [2] check_top_bit at .\boot.jl:572 [inlined]
 [3] toInt64 at .\boot.jl:633 [inlined]
 [4] Int64 at .\boot.jl:708 [inlined]
 [5] typeparser at ...\.julia\packages\Parsers\dOIit\src\floats.jl:160 [inlined]
 [6] typeparser at ...\.julia\packages\Parsers\dOIit\src\floats.jl:19 [inlined]
 [7] xparse at ...\.julia\packages\Parsers\dOIit\src\Parsers.jl:254 [inlined]
 [8] detect(::Array{UInt8,1}, ::Int64, ::Int64, ::Parsers.Options{false,false,true,false,Missing,UInt8,Nothing}) at ...\.julia\packages\CSV\MKemC\src\utils.jl:359
 [9] findrowstarts!(::Array{UInt8,1}, ::Int64, ::Parsers.Options{false,false,true,false,Missing,UInt8,Nothing}, ::Array{Int64,1}, ::Int64, ::Array{Type,1}, ::Array{UInt8,1}) at ...\.julia\packages\CSV\MKemC\src\detection.jl:337
 [10] multithreadparse(::Array{Type,1}, ::Array{UInt8,1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Parsers.Options{false,false,true,false,Missing,UInt8,Nothing}, ::Nothing, ::Int64, ::Int64, ::Float64, ::Array{CSV.RefPool,1}, ::Int64, ::Dict{Type,Type}, ::Bool, ::Type{T} where T, ::Nothing, ::Int64, ::Bool) at ...\.julia\packages\CSV\MKemC\src\file.jl:420
 [11] CSV.File(::CSV.Header{false,Parsers.Options{false,false,true,false,Missing,UInt8,Nothing},Array{UInt8,1}}; startingbyteposition::Nothing, endingbyteposition::Nothing, limit::Nothing, threaded::Nothing, typemap::Dict{Type,Type}, tasks::Int64, debug::Bool) at ...\.julia\packages\CSV\MKemC\src\file.jl:258
 [12] CSV.File(::String; header::Int64, normalizenames::Bool, datarow::Int64, skipto::Nothing, footerskip::Int64, transpose::Bool, comment::Nothing, use_mmap::Nothing, ignoreemptylines::Bool, select::Nothing, drop::Nothing, missingstrings::Array{String,1}, missingstring::String, delim::Nothing, ignorerepeated::Bool, quotechar::Char, openquotechar::Nothing, closequotechar::Nothing, escapechar::Char, dateformat::Nothing, dateformats::Nothing, decimal::UInt8, truestrings::Array{String,1}, falsestrings::Array{String,1}, type::Nothing, types::Nothing, typemap::Dict{Type,Type}, categorical::Nothing, pool::Float64, lazystrings::Bool, strict::Bool, silencewarnings::Bool, debug::Bool, parsingdebug::Bool, kw::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at ...\.julia\packages\CSV\MKemC\src\file.jl:217
 [13] CSV.File(::String) at ...\.julia\packages\CSV\MKemC\src\file.jl:216
 [14] |>(::String, ::Type{T} where T) at .\operators.jl:834
 [15] top-level scope at REPL[45]:1
quinnj added a commit to JuliaData/Parsers.jl that referenced this issue Nov 10, 2020
Fixes JuliaData/CSV.jl#769. The issue here is
our `digits` variable for tracking all digits we've parsed in decimal
form is an unsigned integer, but might be converted to a signed integer
in cases where there's no decimal point or a decimal point with no
trailing digits. In those cases, there's an edge case of overflow
because the overflow check was on the _unsigned_ integer instead of the
_signed_ integer, which led to the conversion error. The fix is to
instead change the overflow check to be on the _signed_ integer.
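
For anyone following along, here is a minimal sketch of the edge case described above, using only Base Julia. It is not the actual Parsers.jl code: the names `parsed_digits`, `to_int64_unchecked`, and `to_int64_checked` are hypothetical and exist only for illustration. The value is the one from the report, `typemax(Int64) + 1`, which fits in a `UInt64` but not in an `Int64`.

```julia
# Illustrative sketch only; these names are hypothetical, not Parsers.jl internals.

# The digits accumulated while parsing, held in an unsigned integer.
parsed_digits = UInt64(typemax(Int64)) + UInt64(1)   # 9223372036854775808

# An overflow check against the unsigned range passes...
fits_unsigned = parsed_digits <= typemax(UInt64)

# ...but the value does not fit in the signed range.
fits_signed = parsed_digits <= UInt64(typemax(Int64))

# Converting without a signed-range check throws InexactError
# whenever the top bit is set.
to_int64_unchecked(d::UInt64) = Int64(d)

# Checking against the signed range first lets the caller fall back
# (for example to a wider or floating-point type) instead of throwing.
function to_int64_checked(d::UInt64)
    return d <= UInt64(typemax(Int64)) ? Int64(d) : nothing
end

println("fits in UInt64: ", fits_unsigned)
println("fits in Int64:  ", fits_signed)
println("checked result: ", to_int64_checked(parsed_digits))
```

Calling `to_int64_unchecked(parsed_digits)` raises the same kind of `InexactError` shown in the stack trace, while the checked variant returns `nothing` so the caller can pick another type.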
@quinnj
Member

quinnj commented Nov 10, 2020

Thanks for the report @DrChainsaw! A fix is up: JuliaData/Parsers.jl#71

quinnj added a commit to JuliaData/Parsers.jl that referenced this issue Nov 10, 2020
* Fix signed integer overflow case for float parsing

* simplify