Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to write a `xarray.Dataset` to a GRIB file. #18

Open
alexamici opened this Issue Sep 16, 2018 · 6 comments

Comments

Projects
None yet
2 participants
@alexamici
Copy link
Collaborator

alexamici commented Sep 16, 2018

Due to the fact that the NetCDF data model is much more free and extensible than the GRIB one, it is not possible to write a generic xarray.Dataset to a GRIB file. The aim for cfgrib is to implement write support for a subset of carefully crafted datasets that fit the GRIB data model.

In particular the only coordinates that we target at the moment are the one returned by opening a GRIB with the cfgrib flavour of cfgrib.open_dataset, namely:

number, time, step, a vertical coordinate (isobaricInhPa, heightAboveGround, surface, etc), and the horizontal coordinates (for example latitude and longitude for a regular_ll grid type).

Note that all passed GRIB_ attributes are used to set keys in the output file, it is left to the user to ensure coherence among them.

Some of the keys are autodetected from the coordinates, namely:

Horizontal coordinates gridTypes:

  • regular: regular_ll and regular_gg
  • not target: projected: lambert, etc (can be controlled with GRIB_ attributes)
  • not target: reduced: reduced_ll and reduced_gg (can be controlled with GRIB_ attributes)

Vertical coordinates typeOfLevel:

  • single level: surface, meanSea, etc.
  • pressure: isobaricInhPa and isobaricInPa
  • other: hybrid

GRIB edition:

  • GRIB2
  • GRIB1

@alexamici alexamici self-assigned this Sep 16, 2018

@iainrussell

This comment has been minimized.

Copy link
Member

iainrussell commented Sep 20, 2018

I get a problem with this GRIB file "User_Guide_Example_Data.grib". It contains 3 vertical levels, each at 2 forecast steps, and when I read it, convert to xarray, then write back to GRIB, all the values are NaN. If I filter just a single step, then it works perfectly - the only differences are those you would expect when converting from GRIB 1 to GRIB 2, e.g. bounding box coords in microdegrees instead of millidegrees, etc.
Here's the code:

import xarray as xr
from cfgrib import xarray_store
from cfgrib import xarray_to_grib
import cfgrib

dst = xarray_store.GribDataStore.from_path("User_Guide_Example_Data.grib")
ds = xr.open_dataset(dst)
print(ds)

cfgrib.to_grib(ds, 'User_Guide_Example_Data_from_cfgrib.grib')#, grib_keys={'centre': 'ecmf'})

And the data is attached.
User_Guide_Example_Data.grib.tar.gz

@alexamici

This comment has been minimized.

Copy link
Collaborator Author

alexamici commented Sep 21, 2018

@iainrussell that's an interesting GRIB file:

$ grib_ls User_Guide_Example_Data.grib 
edition      centre       typeOfLevel  level        dataDate     stepRange    dataType     shortName    packingType  gridType     
1            ecmf         isobaricInhPa  1000         19960428     0            an           t            grid_simple  regular_ll  
1            ecmf         isobaricInhPa  500          19960428     0            an           t            grid_simple  regular_ll  
1            ecmf         isobaricInhPa  100          19960428     0            an           t            grid_simple  regular_ll  
1            ecmf         isobaricInhPa  1000         19960426     48           fc           t            grid_simple  regular_ll  
1            ecmf         isobaricInhPa  500          19960426     48           fc           t            grid_simple  regular_ll  
1            ecmf         isobaricInhPa  100          19960426     48           fc           t            grid_simple  regular_ll  
6 of 6 grib messages in User_Guide_Example_Data.grib

It contains a 48h forecast and the analysis for the date 1996-04-28T12:00:00.

The GRIB file is not a complete hypercube so cfgrib correctly fills the missing fields with np.nan:

>>> import cfgrib
>>> ds = cfgrib.open_dataset('User_Guide_Example_Data.grib/User_Guide_Example_Data.grib')
>>> ds
<xarray.Dataset>
Dimensions:       (air_pressure: 3, latitude: 121, longitude: 240, step: 2, time: 2)
Coordinates:
    number        int64 ...
  * time          (time) datetime64[ns] 1996-04-26T12:00:00 1996-04-28T12:00:00
  * step          (step) timedelta64[ns] 0 days 2 days
  * air_pressure  (air_pressure) float64 1e+03 500.0 100.0
  * latitude      (latitude) float64 90.0 88.5 87.0 85.5 84.0 82.5 81.0 79.5 ...
  * longitude     (longitude) float64 0.0 1.5 3.0 4.5 6.0 7.5 9.0 10.5 12.0 ...
    valid_time    (time, step) datetime64[ns] ...
Data variables:
    t             (time, step, air_pressure, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    history:                 GRIB to CDM+CF via cfgrib-0.8.5.1.dev0/ecCodes-2...
>>> ds.t.sel(time='1996-04-26T12:00:00', air_pressure=1000, step=0)
<xarray.DataArray 't' (latitude: 121, longitude: 240)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
    number        int64 0
    time          datetime64[ns] 1996-04-26T12:00:00
    step          timedelta64[ns] 00:00:00
    air_pressure  float64 1e+03
  * latitude      (latitude) float64 90.0 88.5 87.0 85.5 84.0 82.5 81.0 79.5 ...
  * longitude     (longitude) float64 0.0 1.5 3.0 4.5 6.0 7.5 9.0 10.5 12.0 ...
    valid_time    datetime64[ns] ...
Attributes:
    GRIB_paramId:                             130
    GRIB_shortName:                           t
    ...
    standard_name:                            air_temperature
    long_name:                                Temperature
    units:                                    K

The problem is that np.nan were not handled in cfgrib and ecCodes doesn't support them.

In master now invalid values are set to missingValue before being sent to ecCodes and as an optimisation fields entirely comprised of np.nan are not written to disk at all.

Now the saved GRIB looks like this:

$ grib_ls out.grib 
edition      centre       date         dataType     gridType     stepRange    typeOfLevel  level        shortName    packingType  
2            consensus    19960426     af           regular_ll   48           isobaricInhPa  1000         t            grid_simple 
2            consensus    19960428     af           regular_ll   0            isobaricInhPa  1000         t            grid_simple 
2            consensus    19960426     af           regular_ll   48           isobaricInhPa  500          t            grid_simple 
2            consensus    19960428     af           regular_ll   0            isobaricInhPa  500          t            grid_simple 
2            consensus    19960426     af           regular_ll   48           isobaricInhPa  100          t            grid_simple 
2            consensus    19960428     af           regular_ll   0            isobaricInhPa  100          t            grid_simple 
6 of 6 grib messages in out.grib
@iainrussell

This comment has been minimized.

Copy link
Member

iainrussell commented Sep 21, 2018

@alexamici - thanks for the very quick response! I can confirm that the code in the master branch works perfectly for this file now, and I plotted the fields with metview to check that they look identical (they do)!

@iainrussell

This comment has been minimized.

Copy link
Member

iainrussell commented Sep 24, 2018

Still looking good, I'm now checking more 'awkward' subarea definitions like this one, which plots correctly:
grib-subarea2-compare
I'm using this Metview script to do it quite easily:
compare_grib_plots.py.tar.gz

@alexamici

This comment has been minimized.

Copy link
Collaborator Author

alexamici commented Nov 12, 2018

Note that right now master can re-write to disk most opened GRIB files, because we use the GRIB_ attributes read in input.

Doing an operation on the resulting xr.Dataset but keeping the attributes may make them incoherent. We apply some check and autodetect the values only in the limited set of cases described in the issue text.

We consider write support in Alpha

@alexamici

This comment has been minimized.

Copy link
Collaborator Author

alexamici commented Jan 1, 2019

Note that with 90c8106 very generic GRIB files can be written when setting the all the GRIB_ keys correctly. See #39 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.