typeassert error preventing load of saved jld file after 0.3.4 updates #198

Closed
disleyland opened this issue Jan 4, 2015 · 9 comments

@disleyland

Hi, this is my first post.

I just upgraded to Julia 0.3.4 and updated my packages, but when I tried to load my .jld file, I got this error:

ERROR: type: jlconvert: in typeassert, expected Dict{Symbol,Int64}, got Dict{Symbol,Union(Real,AbstractArray{Real,1})}

Do I have to downgrade to retrieve the data, save it as CSV, reload, and resave, or is there another way to recover the data?

@timholy
Member

timholy commented Jan 4, 2015

Oh, drat. Not being able to read old .jld files is precisely what we don't want happening.

Let's try to fix the error without forcing you to convert to a different format. Two questions:

  • Is there any more information to that error message? Any more detail you can provide would be extremely helpful; this is probably not yet enough information to go on.
  • If the .jld file is not too big and you don't feel that it has sensitive information, feel free to email it to me and I'll try it directly.

@disleyland
Author

Hi Tim,

The full message is:

while loading In[2], in expression starting on line 56

in jlconvert at /Users/richard/.julia/v0.3/HDF5/src/jld_types.jl:397
in read_scalar at /Users/richard/.julia/v0.3/HDF5/src/JLD.jl:356
in read at /Users/richard/.julia/v0.3/HDF5/src/JLD.jl:328
in read_ref at /Users/richard/.julia/v0.3/HDF5/src/JLD.jl:470
in jlconvert at /Users/richard/.julia/v0.3/HDF5/src/jld_types.jl:397
in read_scalar at /Users/richard/.julia/v0.3/HDF5/src/JLD.jl:356
in read at /Users/richard/.julia/v0.3/HDF5/src/JLD.jl:328
in read at /Users/richard/.julia/v0.3/HDF5/src/JLD.jl:313
in anonymous at /Users/richard/.julia/v0.3/HDF5/src/JLD.jl:972
in jldopen at /Users/richard/.julia/v0.3/HDF5/src/JLD.jl:234
in load at /Users/richard/.julia/v0.3/HDF5/src/JLD.jl:971

Unfortunately, the file is 8 gigabytes: the entire Compustat North America financial accounting dataset for all public companies since 1950. These type problems have happened before and always occur if I save a DataFrame with a complex “Any” type in one of the columns, so I have gotten into the routine of preserving the “original” CSV file and all the scripts used to transform the dataset as a matter of redundancy protection. It just takes AGES to load the CSV file.

But it would be nice to know how to recover data from .jld files when they cannot be opened due to type errors. :)

Richard

@timholy
Member

timholy commented Jan 4, 2015

If you can come up with a small test case (e.g., x = BrokenType(5); @save "test.jld" x; y = load("test.jld")) that reproduces this error, it would greatly increase the odds that I can figure this out.

@simonster
Member

I would guess the culprit is this commit in DataFrames, which changes the type of one of the fields in the DataFrame index from Dict{Symbol,Union(Real, AbstractVector{Real})} to Dict{Symbol,Int}.

Maybe we should make that typeassert call convert, since it's an easy thing to do.
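
A minimal sketch of the difference between a type assertion and `convert`, written in current Julia syntax rather than 0.3 (so `Union{...}` instead of `Union(...)`). The dictionary contents are made up for illustration, and this is not the HDF5/JLD code itself:

```julia
# Stand-in for the field as stored in the old file: a Dict with the wider value type.
old = Dict{Symbol,Union{Real,AbstractVector{Real}}}(:a => 1, :b => 2)

# A type assertion only checks the type; it never converts, so it throws here.
assert_failed = try
    old::Dict{Symbol,Int}
    false
catch err
    err isa TypeError
end

# convert builds a new Dict of the requested type, which works as long as
# every stored value can itself be converted to Int.
new = convert(Dict{Symbol,Int}, old)
```

If one of the values were a vector rather than an integer, the `convert` call would be expected to fail as well, so this only rescues files whose stored values already fit the narrower type.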

@timholy
Member

timholy commented Jan 4, 2015

@disleyland, does it fix it to change that line to

out.$(T.names[i]) = convert(T.types[i], read_ref(file, ref))

@disleyland
Author

Changing the original line:

out.$(T.names[i]) = read_ref(file, ref)::$(T.types[i])

to your new line:

out.$(T.names[i]) = convert(T.types[i], read_ref(file, ref))

produces the error: “ERROR: T not defined”

@timholy
Member

timholy commented Jan 4, 2015

Sorry, I wasn't being careful. I should have said

out.$(T.names[i]) = convert($(T.types[i]), read_ref(file, ref))

(I just tested this, and it passes the tests on Julia 0.3. The tests suddenly seem to be broken on Julia 0.4, but I suspect that's a different issue.)
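
For anyone wondering why the `$(...)` matters: the line appears to sit inside quoted (generated) code, where `T` exists only while the expression is being built, not when it later runs. A minimal sketch of that effect in current Julia syntax, using `Pair{Int,Float64}` and `fieldtype` as illustrative stand-ins for the real `T` and `T.types[i]` (the helper `build_exprs` is hypothetical and not part of HDF5/JLD):

```julia
# T and i are local to this function, so they exist only while the
# expressions are being constructed.
function build_exprs(T, i)
    # No interpolation: the names T and i are stored in the expression
    # and looked up only when it is evaluated later.
    unspliced = :(convert(fieldtype(T, i), 3))
    # With $(...): the field type is computed now and spliced in as a value.
    spliced = :(convert($(fieldtype(T, i)), 3))
    return unspliced, spliced
end

unspliced, spliced = build_exprs(Pair{Int,Float64}, 1)

# The spliced expression carries the type with it, so it evaluates anywhere.
value = eval(spliced)

# The unspliced one refers to a T that no longer exists at evaluation time.
failed = try
    eval(unspliced)
    false
catch err
    err isa UndefVarError
end
```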

@disleyland
Author

Hurray, that worked! Thanks, Tim.

Did 0.3.4 break lots of things with DataFrames? All of a sudden, several scripts I’ve used successfully for ages went kaput today; they can no longer handle NAs and throw convert errors.

For example, this used to work:

by(df, [:YEAR], df -> if length(dropna(df[:ROIC])).>0 percentile(float(dropna(df[:ROIC])),50) else NA end)

but it stopped working, and I had to change the NA to a NaN before I could get it running again:

by(df, [:YEAR], df -> if length(dropna(df[:ROIC])).>0 percentile(float(dropna(df[:ROIC])),50) else NaN end)

This works, but then I had to adjust the rest of the script for the NaN :(

Richard

@timholy
Member

timholy commented Jan 4, 2015

Great news! We'll implement this in HDF5.

I don't really follow DataFrames, so I can't answer your other questions. @simonster knows a lot more. But I'd recommend posting your questions over at DataFrames.
