corrupt data for Array{Union{Missing, Float32},1} #163

Closed
anj00 opened this issue Nov 20, 2019 · 4 comments

anj00 commented Nov 20, 2019

I've boiled the problem down to this basic code:

using JLD2, FileIO, Printf

data = Union{Missing, Float32}[80.34f0, 42.17f0, 2.27f0, 20.91f0, 11.11f0, 161.08f0, 18.49f0, 1.55f0, 143.47f0, 2.4f0, 16.36f0, 30.09f0, 11.74f0, 63.28f0, 84.75f0, 14.71f0, 1.3752f0, 8.93f0, 37.2f0, 297.57f0, 69.5f0, 15.04f0, 162.98f0, 21.13f0, missing, 19.24f0, 2.82f0, 4.11f0, 35.42f0, 7.93f0, 2.95f0, 196.25f0, 4.06f0, 3.505f0, 6.27f0, 6.81f0, 4.4f0, 30.42f0, 60.94f0, 20.59f0, 18.09f0, 36.34f0, 171.44f0, 12.82f0, 5.38f0, 1.49f0]

save("data.jld2", Dict("data" => data))
data_jld = load("data.jld2")["data"]
@printf("data size  %5d\n", size(data,1))
@printf("jld2 size  %5d\n", size(data_jld,1))
@printf("data valid %5d\n", size(    data[.!ismissing.(data    ), :], 1))
@printf("jld2 valid %5d\n", size(data_jld[.!ismissing.(data_jld), :], 1))

The data has one missing value, and that seems to trigger the bug. The loaded data has the same size as the original, but a bunch of values are set to missing. This is the output I get:

data size     46
jld2 size     46
data valid    45
jld2 valid    37
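Comparing sizes only shows that values were lost, not which ones. The following is a minimal sketch of a round-trip check. `roundtrip_mismatches` is a hypothetical helper, not part of JLD2 or FileIO, and it assumes both vectors have the same length. It returns the indices where the loaded vector differs from the original. It uses `isequal` rather than `==` because `missing == missing` returns `missing`, whereas `isequal(missing, missing)` is `true`.

```julia
# Hypothetical helper (not part of JLD2/FileIO): list the indices at which a
# round-tripped vector differs from the original, treating missing as equal
# to missing via `isequal`.
function roundtrip_mismatches(original::AbstractVector, restored::AbstractVector)
    length(original) == length(restored) ||
        throw(DimensionMismatch("lengths differ: $(length(original)) vs $(length(restored))"))
    return [i for (i, (a, b)) in enumerate(zip(original, restored)) if !isequal(a, b)]
end

# Small in-memory example: the third element was wrongly turned into missing.
a = Union{Missing, Float32}[1.0f0, missing, 3.0f0]
b = Union{Missing, Float32}[1.0f0, missing, missing]
println(roundtrip_mismatches(a, b))   # prints [3]
```

Applied to the reproduction above, `roundtrip_mismatches(data, data_jld)` would list the positions that came back from the file with a different value than was saved.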

Note that it only happens with Union{Missing, Float32}. Without Missing, or with Float64, it works as expected.

I am running Win10 and Julia 1.2.0, with these relevant package versions:

  [5789e2e9] FileIO v1.0.7        
  [033835bb] JLD2 v0.1.3

Also, in my real code (where similar arrays are columns in a DataFrame), it tends to crash after I save this kind of data several times in a loop. So far I haven't been able to reproduce the crash with a simple example like the one above. I'm attaching the crash log in case it is relevant here:
jld2_crash.txt

alyst mentioned this issue Nov 20, 2019

anj00 commented Dec 2, 2019

The same issue happens with Union{Missing, Int64}:

using JLD2, FileIO, Printf

data = Union{Missing, Int64}[0,0,missing,missing,0,0,0,0,0,0,missing,missing,missing,0,0,0,0]

save("data.jld2", Dict("data" => data))
data_jld = load("data.jld2")["data"]
@printf("data size  %5d\n", size(data,1))
@printf("jld2 size  %5d\n", size(data_jld,1))
@printf("data valid %5d\n", size(    data[.!ismissing.(data    ), :], 1))
@printf("jld2 valid %5d\n", size(data_jld[.!ismissing.(data_jld), :], 1))

In this case, all the data is wiped out:

data size     17
jld2 size     17
data valid    12
jld2 valid     0
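The `size(data[.!ismissing.(data), :], 1)` expressions in these reproductions count the non-missing entries. A more direct equivalent, using only Base, is `count(!ismissing, data)`. The sketch below applies it to the `Union{Missing, Int64}` vector from this comment.

```julia
# The Union{Missing, Int64} vector from the reproduction above.
data = Union{Missing, Int64}[0,0,missing,missing,0,0,0,0,0,0,missing,missing,missing,0,0,0,0]

n_total   = length(data)              # 17
n_valid   = count(!ismissing, data)   # non-missing entries: 12
n_missing = count(ismissing, data)    # missing entries: 5

# Same count as the indexing-based expression used in the report.
@assert n_valid == size(data[.!ismissing.(data), :], 1)
println("total $n_total, valid $n_valid, missing $n_missing")
```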


anj00 commented Apr 28, 2020

The above code works correctly with Win10, Julia 1.4.0, FileIO v1.2.4 and JLD2 v0.1.13.

I wonder whether the bug as a whole is fixed or whether this is just a coincidence. (I have code that works around this bug for my specific data, and it would be great to remove it if the bug is gone.)
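The reporter's own workaround isn't shown in the thread. Saving worked when the element type did not include `Missing`, so one possible approach is to store a plain value vector plus a `Bool` mask, then rebuild the `Union{Missing, T}` vector after loading. Below is a minimal sketch under these assumptions: `split_missing` and `join_missing` are hypothetical helper names, the placeholder written into missing slots (`zero(T)`) is arbitrary, plain isbits vectors are assumed to round-trip correctly, and Julia 1.3 or later is assumed because `nonmissingtype` is exported from Base there.

```julia
# Hypothetical workaround sketch: avoid writing a Union{Missing, T} array by
# splitting it into a plain Vector{T} and a Vector{Bool} mask.
function split_missing(v::AbstractVector)
    T = nonmissingtype(eltype(v))                     # e.g. Float32 for Union{Missing, Float32}
    mask = Bool[ismissing(x) for x in v]              # true where the value is missing
    vals = T[ismissing(x) ? zero(T) : x for x in v]   # zero(T) is an arbitrary placeholder
    return vals, mask
end

# Rebuild the Union{Missing, T} vector from the two plain vectors.
function join_missing(vals::AbstractVector{T}, mask::AbstractVector{Bool}) where {T}
    length(vals) == length(mask) ||
        throw(DimensionMismatch("vals and mask must have the same length"))
    return Union{Missing, T}[m ? missing : x for (x, m) in zip(vals, mask)]
end

v = Union{Missing, Float32}[80.34f0, missing, 2.27f0]
vals, mask = split_missing(v)
# vals and mask are what would be saved instead of v, e.g. with
# save("data.jld2", Dict("vals" => vals, "mask" => mask))
restored = join_missing(vals, mask)
println(isequal(restored, v))   # prints true
```

This trades a small amount of extra storage (one `Bool` per element) for not depending on how the file format encodes `Union{Missing, T}` arrays.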

JonasIsensee (Collaborator) commented

I also cannot reproduce the error on Linux with Julia 1.4.1, FileIO v1.3.0 and JLD2 v0.1.13.

JonasIsensee (Collaborator) commented

Tests were added on master.
