Can't save large file due to conversion to Int32 #20

@under-Peter

https://github.com/MikeInnes/BSON.jl/blob/7ed00f318f39cfa90a929927132ec252338745a6/src/write.jl#L32

Hi!
I'm using BSON to save a DataFrame with some custom types. For a large DataFrame (about 3.2 GB) this fails at the line linked above because, I think, 3.2 GB is larger than typemax(Int32) (2147483647 bytes, about 2.1 GB).

Is BSON unable to save larger files or is the cast to Int32 unnecessary and could also be changed to Int64?
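
To illustrate the limit in question, here is a minimal sketch that reproduces the same kind of error outside BSON. The value of `nbytes` is the one reported in my trace; the variable names are only for illustration.

```julia
# Largest value a signed 32-bit length field can hold.
limit = typemax(Int32)        # 2147483647, roughly 2.1 GB

# Byte count taken from the InexactError in the trace.
nbytes = 2397876411

# Converting a value above the limit throws instead of wrapping around.
overflowed = try
    Int32(nbytes)
    false
catch err
    err isa InexactError
end
```

So any single length above `limit` triggers the failure, regardless of what is being written.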

My exact trace is:

 ERROR: LoadError: InexactError: trunc(Int32, 2397876411)
 Stacktrace:
  [1] throw_inexacterror(::Symbol, ::Any, ::Int64) at ./boot.jl:567
  [2] checked_trunc_sint at ./boot.jl:589 [inlined]
  [3] toInt32 at ./boot.jl:626 [inlined]
  [4] Type at ./boot.jl:716 [inlined]
  [5] bson_doc(::Base.GenericIOBuffer{Array{UInt8,1}}, ::Array{Pair{String,_1} where _1,1}) at [...]/.julia/packages/BSON/kxdIr/src/write.jl:32
  [6] bson_primitive(::Base.GenericIOBuffer{Array{UInt8,1}}, ::Array{Any,1}) at [...]/.julia/packages/BSON/kxdIr/src/write.jl:37
  [7] bson_pair(::Base.GenericIOBuffer{Array{UInt8,1}}, ::String, ::Array{Any,1}) at [...]/.julia/packages/BSON/kxdIr/src/write.jl:22
  [8] bson_doc(::Base.GenericIOBuffer{Array{UInt8,1}}, ::Array{Pair{String,_1} where _1,1}) at [...]/.julia/packages/BSON/kxdIr/src/write.jl:28
  [9] bson_primitive(::Base.GenericIOBuffer{Array{UInt8,1}}, ::Array{Any,1}) at [...]/.julia/packages/BSON/kxdIr/src/write.jl:37
  [10] bson_pair(::Base.GenericIOBuffer{Array{UInt8,1}}, ::Symbol, ::Array{Any,1}) at [...]/.julia/packages/BSON/kxdIr/src/write.jl:22
  [11] bson_doc(::Base.GenericIOBuffer{Array{UInt8,1}}, ::Dict{Symbol,Any}) at [...]/.julia/packages/BSON/kxdIr/src/write.jl:28
  [12] bson_primitive at [...]/.julia/packages/BSON/kxdIr/src/write.jl:36 [inlined]
  [13] bson_pair(::Base.GenericIOBuffer{Array{UInt8,1}}, ::Symbol, ::Dict{Symbol,Any}) at [...]/.julia/packages/BSON/kxdIr/src/write.jl:22
  [14] bson_doc(::IOStream, ::Dict{Symbol,Any}) at [...]/.julia/packages/BSON/kxdIr/src/write.jl:28
  [15] bson_primitive(::IOStream, ::Dict{Symbol,Any}) at [...]/.julia/packages/BSON/kxdIr/src/write.jl:36
  [16] bson(::IOStream, ::Dict{Symbol,DataFrame}) at [...]/.julia/packages/BSON/kxdIr/src/write.jl:81
  [17] #14 at [...]/.julia/packages/BSON/kxdIr/src/write.jl:83 [inlined]
  [18] #open#294(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::getfield(BSON, Symbol("##14#15")){Dict{Symbol,DataFrame}}, ::String, ::Vararg{String,N} where N) at ./iostream.jl:369
  [19] open at ./iostream.jl:367 [inlined]
  [20] bson(::String, ::Dict{Symbol,DataFrame}) at [...]/.julia/packages/BSON/kxdIr/src/write.jl:83
  [21] top-level scope at none:0
  [22] include at ./boot.jl:317 [inlined]
  [23] include_relative(::Module, ::String) at ./loading.jl:1044
  [24] include(::Module, ::String) at ./sysimg.jl:29
  [25] exec_options(::Base.JLOptions) at ./client.jl:231
  [26] _start() at ./client.jl:425
 in expression starting [...]/runheadless.jl:67

edit:
It seems that, in general, both the write and the read code cast the length of the object being written or read to an Int32.
Maybe that's reasonable and I should just split up my DataFrame instead?
Cheers
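
A minimal sketch of the splitting idea, assuming the rows can simply be written in consecutive blocks. `chunk_ranges` is a hypothetical helper, not part of BSON or DataFrames, and the chunk size would have to be chosen so that each piece stays below the Int32 limit.

```julia
# Hypothetical helper: split the row indices 1:nrows into consecutive
# ranges holding at most chunk_rows rows each.
function chunk_ranges(nrows::Integer, chunk_rows::Integer)
    chunk_rows > 0 || throw(ArgumentError("chunk_rows must be positive"))
    return [i:min(i + chunk_rows - 1, nrows) for i in 1:chunk_rows:nrows]
end

ranges = chunk_ranges(10, 4)

# Each range could then be saved to its own file, for example
# (df and the file names are placeholders, not taken from my script):
#
#   for (k, rows) in enumerate(ranges)
#       bson("part_$k.bson", Dict(:df => df[rows, :]))
#   end
```

Loading would then mean reading the parts back and concatenating them, which is extra bookkeeping but avoids any single oversized document.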
