Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to write a xarray.Dataset to a GRIB file. #18

Open
6 tasks done
alexamici opened this issue Sep 16, 2018 · 11 comments
Open
6 tasks done

Add support to write a xarray.Dataset to a GRIB file. #18

alexamici opened this issue Sep 16, 2018 · 11 comments
Assignees
Labels
enhancement New feature or request

Comments

@alexamici
Copy link
Contributor

alexamici commented Sep 16, 2018

Due to the fact that the NetCDF data model is much more free and extensible than the GRIB one, it is not possible to write a generic xarray.Dataset to a GRIB file. The aim for cfgrib is to implement write support for a subset of carefully crafted datasets that fit the GRIB data model.

In particular the only coordinates that we target at the moment are the one returned by opening a GRIB with the cfgrib flavour of cfgrib.open_dataset, namely:

number, time, step, a vertical coordinate (isobaricInhPa, heightAboveGround, surface, etc), and the horizontal coordinates (for example latitude and longitude for a regular_ll grid type).

Note that all passed GRIB_ attributes are used to set keys in the output file, it is left to the user to ensure coherence among them.

Some of the keys are autodetected from the coordinates, namely:

Horizontal coordinates gridTypes:

  • regular: regular_ll and regular_gg
  • not target: projected: lambert, etc (can be controlled with GRIB_ attributes)
  • not target: reduced: reduced_ll and reduced_gg (can be controlled with GRIB_ attributes)

Vertical coordinates typeOfLevel:

  • single level: surface, meanSea, etc.
  • pressure: isobaricInhPa and isobaricInPa
  • other: hybrid

GRIB edition:

  • GRIB2
  • GRIB1
@alexamici alexamici added the enhancement New feature or request label Sep 16, 2018
@alexamici alexamici self-assigned this Sep 16, 2018
@iainrussell
Copy link
Member

iainrussell commented Sep 20, 2018

I get a problem with this GRIB file "User_Guide_Example_Data.grib". It contains 3 vertical levels, each at 2 forecast steps, and when I read it, convert to xarray, then write back to GRIB, all the values are NaN. If I filter just a single step, then it works perfectly - the only differences are those you would expect when converting from GRIB 1 to GRIB 2, e.g. bounding box coords in microdegrees instead of millidegrees, etc.
Here's the code:

import xarray as xr
from cfgrib import xarray_store
from cfgrib import xarray_to_grib
import cfgrib

dst = xarray_store.GribDataStore.from_path("User_Guide_Example_Data.grib")
ds = xr.open_dataset(dst)
print(ds)

cfgrib.to_grib(ds, 'User_Guide_Example_Data_from_cfgrib.grib')#, grib_keys={'centre': 'ecmf'})

And the data is attached.
User_Guide_Example_Data.grib.tar.gz

@alexamici
Copy link
Contributor Author

@iainrussell that's an interesting GRIB file:

$ grib_ls User_Guide_Example_Data.grib 
edition      centre       typeOfLevel  level        dataDate     stepRange    dataType     shortName    packingType  gridType     
1            ecmf         isobaricInhPa  1000         19960428     0            an           t            grid_simple  regular_ll  
1            ecmf         isobaricInhPa  500          19960428     0            an           t            grid_simple  regular_ll  
1            ecmf         isobaricInhPa  100          19960428     0            an           t            grid_simple  regular_ll  
1            ecmf         isobaricInhPa  1000         19960426     48           fc           t            grid_simple  regular_ll  
1            ecmf         isobaricInhPa  500          19960426     48           fc           t            grid_simple  regular_ll  
1            ecmf         isobaricInhPa  100          19960426     48           fc           t            grid_simple  regular_ll  
6 of 6 grib messages in User_Guide_Example_Data.grib

It contains a 48h forecast and the analysis for the date 1996-04-28T12:00:00.

The GRIB file is not a complete hypercube so cfgrib correctly fills the missing fields with np.nan:

>>> import cfgrib
>>> ds = cfgrib.open_dataset('User_Guide_Example_Data.grib/User_Guide_Example_Data.grib')
>>> ds
<xarray.Dataset>
Dimensions:       (air_pressure: 3, latitude: 121, longitude: 240, step: 2, time: 2)
Coordinates:
    number        int64 ...
  * time          (time) datetime64[ns] 1996-04-26T12:00:00 1996-04-28T12:00:00
  * step          (step) timedelta64[ns] 0 days 2 days
  * air_pressure  (air_pressure) float64 1e+03 500.0 100.0
  * latitude      (latitude) float64 90.0 88.5 87.0 85.5 84.0 82.5 81.0 79.5 ...
  * longitude     (longitude) float64 0.0 1.5 3.0 4.5 6.0 7.5 9.0 10.5 12.0 ...
    valid_time    (time, step) datetime64[ns] ...
Data variables:
    t             (time, step, air_pressure, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    history:                 GRIB to CDM+CF via cfgrib-0.8.5.1.dev0/ecCodes-2...
>>> ds.t.sel(time='1996-04-26T12:00:00', air_pressure=1000, step=0)
<xarray.DataArray 't' (latitude: 121, longitude: 240)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]], dtype=float32)
Coordinates:
    number        int64 0
    time          datetime64[ns] 1996-04-26T12:00:00
    step          timedelta64[ns] 00:00:00
    air_pressure  float64 1e+03
  * latitude      (latitude) float64 90.0 88.5 87.0 85.5 84.0 82.5 81.0 79.5 ...
  * longitude     (longitude) float64 0.0 1.5 3.0 4.5 6.0 7.5 9.0 10.5 12.0 ...
    valid_time    datetime64[ns] ...
Attributes:
    GRIB_paramId:                             130
    GRIB_shortName:                           t
    ...
    standard_name:                            air_temperature
    long_name:                                Temperature
    units:                                    K

The problem is that np.nan were not handled in cfgrib and ecCodes doesn't support them.

In master now invalid values are set to missingValue before being sent to ecCodes and as an optimisation fields entirely comprised of np.nan are not written to disk at all.

Now the saved GRIB looks like this:

$ grib_ls out.grib 
edition      centre       date         dataType     gridType     stepRange    typeOfLevel  level        shortName    packingType  
2            consensus    19960426     af           regular_ll   48           isobaricInhPa  1000         t            grid_simple 
2            consensus    19960428     af           regular_ll   0            isobaricInhPa  1000         t            grid_simple 
2            consensus    19960426     af           regular_ll   48           isobaricInhPa  500          t            grid_simple 
2            consensus    19960428     af           regular_ll   0            isobaricInhPa  500          t            grid_simple 
2            consensus    19960426     af           regular_ll   48           isobaricInhPa  100          t            grid_simple 
2            consensus    19960428     af           regular_ll   0            isobaricInhPa  100          t            grid_simple 
6 of 6 grib messages in out.grib

@iainrussell
Copy link
Member

@alexamici - thanks for the very quick response! I can confirm that the code in the master branch works perfectly for this file now, and I plotted the fields with metview to check that they look identical (they do)!

@iainrussell
Copy link
Member

Still looking good, I'm now checking more 'awkward' subarea definitions like this one, which plots correctly:
grib-subarea2-compare
I'm using this Metview script to do it quite easily:
compare_grib_plots.py.tar.gz

@alexamici
Copy link
Contributor Author

alexamici commented Nov 12, 2018

Note that right now master can re-write to disk most opened GRIB files, because we use the GRIB_ attributes read in input.

Doing an operation on the resulting xr.Dataset but keeping the attributes may make them incoherent. We apply some check and autodetect the values only in the limited set of cases described in the issue text.

We consider write support in Alpha

@alexamici
Copy link
Contributor Author

Note that with 90c8106 very generic GRIB files can be written when setting the all the GRIB_ keys correctly. See #39 .

@AhmedMIssawi
Copy link

hi, can you help me with this code

import cfgrib
import xarray as xr
import pandas as pd
import sys
import glob
import os

def Grib2files():

folder      = input('Folder PAth:')
val         = input('Variable Name:')
#latitude    = input('latitude:')
longitude   = input('longitude:')

data_conc=[]

files = glob.glob(rf'{folder}/*.grib2')

for file in files:

    data = xr.open_dataset(
        file,
        engine='cfgrib',
        backend_kwargs={'filter_by_keys':{'typeOfLevel': 'hybrid'}})
    
    
    lon  = data.variables['longitude'].values
    lat  = data.variables['latitude'].values
    hyb  = data.variables['hybrid'].values
    time = pd.Timestamp(data.time.values) + pd.to_timedelta(data.step.values,'H')
    
    X = xr.Dataset(
        {'data':data[val].values,
        'lon': lon,
        
        },
        coords = {
                'hyb':hyb,
                'lat':lat,
            },
        )
    
    data_conc.append(X) 
return xr.concat(data_conc, dim=['time'])

data = Grib2files()

I am trying many ways to make this data to 2 d data
hybrid with all latitudes and fixed altitude and slice over time because this data is 1 file per hour and I need to run this for a whole year

@xvxiuwen
Copy link

xvxiuwen commented Jul 8, 2022

I get a problem with this GRIB file "User_Guide_Example_Data.grib". It contains 3 vertical levels, each at 2 forecast steps, and when I read it, convert to xarray, then write back to GRIB, all the values are NaN. If I filter just a single step, then it works perfectly - the only differences are those you would expect when converting from GRIB 1 to GRIB 2, e.g. bounding box coords in microdegrees instead of millidegrees, etc. Here's the code:

import xarray as xr
from cfgrib import xarray_store
from cfgrib import xarray_to_grib
import cfgrib

dst = xarray_store.GribDataStore.from_path("User_Guide_Example_Data.grib")
ds = xr.open_dataset(dst)
print(ds)

cfgrib.to_grib(ds, 'User_Guide_Example_Data_from_cfgrib.grib')#, grib_keys={'centre': 'ecmf'})

And the data is attached. User_Guide_Example_Data.grib.tar.gz

sorry to bother you, but when i run code 'dst = xarray_store.GribDataStore.from_path("User_Guide_Example_Data.grib")',the computer warns there is no attribute 'GribDataStore'. my computer is windows platform and i download xarray and cfgrib in anaconda.

@xvxiuwen
Copy link

xvxiuwen commented Jul 8, 2022

图片

@zxdawn
Copy link

zxdawn commented Dec 8, 2022

Another error:

AttributeError: 'Dataset' object has no attribute 'to_grib'

Update

We should import to_grib using the newest name.

from cfgrib.xarray_to_grib import to_grib
to_grib(ds, 'out.grib')

@dasarkisov
Copy link

dasarkisov commented Jun 4, 2024

Hi! I have an issue with writing grib files. Here I open two grib datasets, sum them up, and write the resulting dataset to grib file; then I open the new grib as dataset.

ds00 = xr.open_dataset('gfs.t00z.pgrb2.0p25.f000', engine='cfgrib')
ds06 = xr.open_dataset('gfs.t06z.pgrb2.0p25.f000', engine='cfgrib')
ds_sum = xr.concat([ds00, ds06], dim='time').sum('time')
to_grib(ds_sum, "ouf.grib")
ds_new = xr.open_dataset('ouf.grib', engine='cfgrib')

During the writing I get the following warning:
FutureWarning: GRIB write support is experimental, DO NOT RELY ON IT! warnings.warn("GRIB write support is experimental, DO NOT RELY ON IT!", FutureWarning)
failed to set key 'endStep:int' to 0
failed to set key 'stepUnits:int' to 1

Here are the three datasets, from left to right: the original one, the resulting one, and the resulting one after writing to grib
image

Here you can see that after the writing:

  • the original coordinate heightAboveGround turns into surface and its value 2.0 - into 0.0.
  • the name of t2m data variable turns into t.

Why do the changes take place during the writing? When I open the newly created grib file with the viewer the temperature represents to be the surface temperature (0 altitude), while originally it is 2m temperature.
Thanks!

UPDATE
Okay, what I've done was adding "heightAboveGround" option in https://github.com/ecmwf/cfgrib/blob/master/cfgrib/xarray_to_grib.py:
image

Now the to_grib() detects the altitude coordinate heightAboveGround and properly writes it to grib file, preserving its value, which is in my case 2.0:
image

The data variable name still gets changed (t2m to t), but that's no big deal. The written grib file now is being properly read via grib viewer, with t data variable represented as 2m-altitude temperature!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants