Add support to open multiple GRIB files as a single Stream / Dataset #15

Open
alexamici opened this Issue Aug 30, 2018 · 6 comments

2 participants
Collaborator

alexamici commented Aug 30, 2018

At low level we use an explicit file path and file offset in several places.

Note that xr.open_mfdataset handles opening and merging of multiple files without any additional support from the low-level driver so this feature is low priority.

@alexamici alexamici self-assigned this Aug 30, 2018

@alexamici alexamici changed the title from "Add open_mfdataset support: open multiple GRIB files as a single Dataset" to "Add support to open multiple GRIB files as a single Stream / Dataset / xr.Dataset (open_mfdataset)" Aug 30, 2018

@alexamici alexamici changed the title from "Add support to open multiple GRIB files as a single Stream / Dataset / xr.Dataset (open_mfdataset)" to "Add support to open multiple GRIB files as a single Stream / Dataset" Sep 11, 2018

aolt commented Sep 18, 2018

I was thinking of using cfgrib to convert a lot of GRIB files into one big xarray dataset and save it all to zarr. I would really benefit from this feature, because it would save me the intermediate step of converting the GRIB files to netCDF before processing them with xarray.
Is there any estimate of when this will be available?

Collaborator Author

alexamici commented Sep 18, 2018

@aolt we intend to prepare a Pull Request to add GRIB support via cfgrib to xarray. If and when this is accepted, you will be able to use the xarray.open_mfdataset API directly.

I have no ETA yet, but becoming a first-class driver in xarray is one of the main targets of the project.

Collaborator Author

alexamici commented Oct 17, 2018

A cfgrib backend has just been included in xarray:

pydata/xarray#2476

With the upcoming v0.11 you will be able to:

>>> ds = xr.open_mfdataset(['file1.grib', 'file2.grib'], engine='cfgrib', concat_dim='step')
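Building on the call above, the GRIB-to-zarr conversion asked about earlier in the thread could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names `list_grib_files` and `grib_to_zarr` are hypothetical, the `*.grib` pattern and the `step` concatenation dimension are placeholders, and `combine="nested"` is assumed to be needed by recent xarray releases (xarray 0.11 accepted `concat_dim` on its own).

```python
from pathlib import Path


def list_grib_files(directory, pattern="*.grib"):
    """Return the GRIB files directly under `directory` in sorted order."""
    return sorted(str(p) for p in Path(directory).glob(pattern))


def grib_to_zarr(paths, store, concat_dim="step"):
    """Open several GRIB files as one dataset and write it to a zarr store."""
    # Imported here so list_grib_files stays usable without xarray installed.
    import xarray as xr

    # Assumption: recent xarray releases want combine="nested" together with
    # concat_dim; the 0.11 call shown in the thread did not need it.
    ds = xr.open_mfdataset(
        paths, engine="cfgrib", combine="nested", concat_dim=concat_dim
    )
    # mode="w" overwrites an existing store at the same location.
    ds.to_zarr(store, mode="w")
    return ds
```

A call such as `grib_to_zarr(list_grib_files("data"), "data.zarr")` would then skip the intermediate netCDF step, provided xarray, cfgrib, ecCodes, dask and zarr are all installed.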

aolt commented Nov 9, 2018

Great! It works fine with small files, but I get a "Memory Error" with many big files. Is it possible to make it work the same way the NetCDF backend does, with "lazy" reads?

>>> xr.__version__
'0.11.0'

$ pip list | grep cfgrib
cfgrib           0.9.3.1

$ python -m cfgrib selfcheck
Found: ecCodes v2.6.0.
Your system is ready.

$ python -V
Python 3.7.0
Collaborator Author

alexamici commented Nov 9, 2018

@aolt the theory was that everything was lazy already... but in practice, yesterday I noticed a really dumb bug that was unconditionally loading the whole dataset into memory at open 🤦‍♂️

The bug is fixed in version 0.9.4, please upgrade and try again.

I'm currently running a mean on 320Gb of GRIB files on 10 dask.distributed nodes, so I'm confident it's working now :)
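To tell whether an environment already has the fix mentioned above, a small version check along these lines may help. This is a sketch, not part of cfgrib: the helper names are hypothetical, and the parser assumes purely numeric version strings such as the `0.9.3.1` reported earlier in the thread (a suffix like `rc1` would raise `ValueError`).

```python
from importlib import metadata

# 0.9.4 is the release named in this thread as containing the fix.
FIXED_VERSION = (0, 9, 4)


def parse_version(text):
    """Turn a purely numeric version such as '0.9.3.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def has_lazy_loading_fix(version_text):
    """Return True when the given cfgrib version is 0.9.4 or newer."""
    return parse_version(version_text) >= FIXED_VERSION


def installed_cfgrib_has_fix():
    """Check the cfgrib installed in the current environment, if any."""
    try:
        return has_lazy_loading_fix(metadata.version("cfgrib"))
    except metadata.PackageNotFoundError:
        # cfgrib is not installed at all.
        return False
```

With the versions quoted in the thread, `has_lazy_loading_fix("0.9.3.1")` is false and `has_lazy_loading_fix("0.9.4")` is true.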

@alexamici alexamici removed their assignment Dec 5, 2018

Collaborator Author

alexamici commented Feb 2, 2019

Although there is some merit in opening several GRIB files as a single cfgrib.Dataset, I'm changing this to wontfix, as xarray.open_mfdataset is what almost everybody really wants.

@alexamici alexamici added wontfix and removed prioriy - low labels Feb 2, 2019

@alexamici alexamici removed the enhancement label Feb 22, 2019
