Use of JLD vs JLD2 #75

Closed
ilopezgp opened this issue Aug 6, 2020 · 5 comments
Labels
enhancement New feature or request

ilopezgp (Contributor) commented Aug 6, 2020

From the JLD2 GitHub repository:

Please use caution. If your tolerance for data loss is low, JLD may be a better choice at this time.

The JLD package seems to produce data structures that are more compatible with HDF5 readers in other programming languages (e.g. Python). I propose switching the package's dependency from JLD2 to JLD.

ilopezgp added the enhancement label Aug 6, 2020

ali-ramadhan (Member) commented:

While JLD2.jl is indeed not being actively developed anymore, we ended up switching from JLD to JLD2 in Oceananigans.jl a long time ago. JLD2 was much faster for us (probably because it's written in pure Julia) and I was able to read JLD2 files in Python without a problem (by treating them as HDF5 with the h5py package).

That said, we mostly save 3D arrays, parameters, and metadata, so I'm not sure whether JLD2 would be the better choice for CalibrateEmulateSample.jl.
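As a rough illustration of reading JLD2 output from Python by treating it as HDF5, here is a minimal sketch using h5py. It assumes the datasets of interest are plain numeric arrays (Julia-specific types such as structs may be stored in forms that need extra handling), and it assumes arrays written from Julia appear in h5py with their dimensions reversed, since Julia is column-major. The helper names `list_datasets` and `read_array` are hypothetical, and the usage block writes a plain HDF5 file as a stand-in for a real JLD2 file.

```python
import os
import tempfile

import h5py
import numpy as np


def list_datasets(path):
    """Map every dataset path in an HDF5 (or HDF5-compatible JLD2) file to its shape."""
    shapes = {}

    def visit(name, obj):
        # visititems calls this for each object below the root;
        # record only datasets and skip groups and other object types.
        if isinstance(obj, h5py.Dataset):
            shapes[name] = obj.shape

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return shapes


def read_array(path, name, julia_order=False):
    """Read one dataset fully into memory as a NumPy array.

    Assumption: arrays written from Julia (column-major) show up with their
    dimensions reversed in h5py; julia_order=True reverses the axes back.
    """
    with h5py.File(path, "r") as f:
        data = np.asarray(f[name][()])
    return data.T if julia_order else data


if __name__ == "__main__":
    # Stand-in for a JLD2 file: a plain HDF5 file written with h5py.
    path = os.path.join(tempfile.mkdtemp(), "example.h5")
    with h5py.File(path, "w") as f:
        f["u"] = np.arange(24.0).reshape(2, 3, 4)
        f.create_group("params")["dt"] = 0.1
    print(list_datasets(path))
    print(read_array(path, "u", julia_order=True).shape)
```

Reversing the axes with `.T` is cheap because NumPy returns a view rather than copying the data.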

ilopezgp (Contributor, Author) commented Aug 6, 2020

If you have had no problems with it, I suppose the authors of JLD2 might just be overly cautious at the moment. I have also just noticed that the most recent PR to JLD2 is from @kpamnany. Maybe he can share his thoughts on this?

kpamnany commented Aug 6, 2020

I can't speak to compatibility with Python; otherwise, I agree with what Ali said. I tried JLD first and ran into build hell. JLD2 works well and the code is pretty clean. The package also just got a new maintainer, I think.

It may not be bulletproof but it works fine. ClimateMachine checkpoints are written in JLD2.

ilopezgp (Contributor, Author) commented Aug 6, 2020

If it is good enough for ClimateMachine, it should be good enough for CalibrateEmulateSample. Since Ali has had no issues importing the files into Python with h5py, I will close this issue.

ilopezgp closed this as completed Aug 6, 2020

glwagner (Member) commented Aug 7, 2020

This is closed, but I wanted to comment on the context behind: "Please use caution. If your tolerance for data loss is low, JLD may be a better choice at this time." A "low tolerance" is relative. If you work in, for example, experimental neuroscience, and you can afford only a couple of experiments a year that involve brain surgery on subjects, your tolerance for data loss is very low indeed.

Our tolerance for data loss is often somewhat higher than that. If we are running simulations that cannot be reproduced for whatever reason, it may indeed make sense to take some care and plan to back up our data in a format like NetCDF. In my experience, I did lose some data saved with JLD2 years ago, but it has not happened for a long time (I think because issues are gradually being fixed, albeit slowly).


4 participants