
ValueError: multiple values for unique key... on GRIB files with more than one value of a key per variable. #2

Closed
bbonenfant opened this issue Jul 25, 2018 · 24 comments
Assignees
Labels
bug Something isn't working

Comments

@bbonenfant

I am able to load the test GRIB file suggested in the README. However, when I try to read a GRIB file such as the one below, I get the following error output.

> import cfgrib
> ds = cfgrib.Dataset.frompath('nam.t00z.awip1200.tm00.grib2')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cfgrib/dataset.py", line 482, in frompath
return cls(stream=messages.Stream(path, mode=mode, message_class=CfMessage), **kwargs)
File "<attrs generated init baa5906ed7dcdc8b722f343b3fe827a76110eccb>", line 7, in __init__
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cfgrib/dataset.py", line 485, in __attrs_post_init__
dims, vars, attrs = build_dataset_components(**self.__dict__)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cfgrib/dataset.py", line 457, in build_dataset_components
var_index, encode_parameter, encode_time, encode_geography, encode_vertical,
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cfgrib/dataset.py", line 372, in build_data_var_components
data_var_attrs = enforce_unique_attributes(index, data_var_attrs_keys)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cfgrib/dataset.py", line 150, in enforce_unique_attributes
raise ValueError("multiple values for unique attribute %r: %r" % (key, values))
ValueError: multiple values for unique attribute 'typeOfLevel': ['hybrid', 'cloudBase', 'unknown', 'cloudTop']

I've tried with both grib1 and grib2 file types and it seems the formatting is incorrect for all the files I've tried. Any suggestions?

@alexamici
Contributor

@bbonenfant we only support GRIB files with a single typeOfLevel for now. Can you filter all messages with a given typeOfLevel, e.g. cloudBase, and then save them to a new GRIB file? That should be simple enough for cfgrib to open.

@alexamici alexamici changed the title Unable to read a real GRIB file. Unable to read a GRIB file with more than one value of typeOfLevel. Jul 25, 2018
@alexamici
Contributor

To be more precise, we support only one typeOfLevel per data variable identified by paramId, so we don't support files that look like this description.
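
The error in the original report comes from a uniqueness check over GRIB keys. Below is a minimal sketch of that check, modelled on the lines of enforce_unique_attributes that are visible in the tracebacks in this thread. The plain-dict `index` and the sample values are assumptions standing in for cfgrib's real index object, so treat this as an illustration rather than the library's code.

```python
def enforce_unique_attributes(index, attributes_keys):
    """Collect one value per key, refusing keys that have several values.

    `index` is assumed to be a plain dict mapping each GRIB key to the list
    of distinct values found in the file (a stand-in for cfgrib's index).
    """
    attributes = {}
    for key in attributes_keys:
        values = index.get(key, [])
        if len(values) > 1:
            raise ValueError("multiple values for unique attribute %r: %r" % (key, values))
        # Placeholder values are not recorded as attributes.
        if values and values[0] not in ("undef", "unknown"):
            attributes["GRIB_" + key] = values[0]
    return attributes


# Hypothetical index contents: one consistent file and one mixed file.
single = {"typeOfLevel": ["cloudBase"], "gridType": ["lambert"]}
mixed = {"typeOfLevel": ["hybrid", "cloudBase", "unknown", "cloudTop"]}

print(enforce_unique_attributes(single, ["typeOfLevel", "gridType"]))
try:
    enforce_unique_attributes(mixed, ["typeOfLevel"])
except ValueError as error:
    print(error)
```

The second call fails with the same message as in the report, because typeOfLevel takes four different values across the messages being combined.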

@alexamici
Contributor

alexamici commented Jul 25, 2018

@bbonenfant I added a filter_by_keys keyword argument to open_dataset in the master branch, so you can perform some basic filtering of the GRIB file before attempting to build the CDM hypercubes. Working with complex files is now cumbersome, but doable.

Some examples:

>>> from cfgrib.xarray_store import open_dataset
>>> open_dataset('nam.t00z.awip1200.tm00.grib2',
...     filter_by_keys={'typeOfLevel': 'cloudBase'})
<xarray.Dataset>
Dimensions:     (x: 614, y: 428)
Coordinates:
    time        datetime64[ns] ...
    step        timedelta64[ns] ...
    cloudBase   int64 ...
    latitude    (y, x) float64 ...
    longitude   (y, x) float64 ...
    valid_time  datetime64[ns] ...
Dimensions without coordinates: x, y
Data variables:
    pres        (y, x) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP 
    GRIB_subCentre:          0
    history:                 GRIB to CDM+CF via cfgrib-0.8.2.dev0/ecCodes-2.7.0
>>> open_dataset('nam.t00z.awip1200.tm00.grib2',
...     filter_by_keys={'typeOfLevel': 'surface', 'stepType': 'instant'})
<xarray.Dataset>
Dimensions:     (x: 614, y: 428)
Coordinates:
    time        datetime64[ns] ...
    step        timedelta64[ns] ...
    surface     int64 ...
    latitude    (y, x) float64 ...
    longitude   (y, x) float64 ...
    valid_time  datetime64[ns] ...
Dimensions without coordinates: x, y
Data variables:
    vis         (y, x) float32 ...
    gust        (y, x) float32 ...
    hindex      (y, x) float32 ...
    sp          (y, x) float32 ...
    orog        (y, x) float32 ...
    t           (y, x) float32 ...
    unknown     (y, x) float32 ...
    sdwe        (y, x) float32 ...
    sde         (y, x) float32 ...
    prate       (y, x) float32 ...
    sr          (y, x) float32 ...
    veg         (y, x) float32 ...
    slt         (y, x) float32 ...
    lsm         (y, x) float32 ...
    ci          (y, x) float32 ...
    al          (y, x) float32 ...
    sst         (y, x) float32 ...
    shtfl       (y, x) float32 ...
    lhtfl       (y, x) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP 
    GRIB_subCentre:          0
    history:                 GRIB to CDM+CF via cfgrib-0.8.2.dev0/ecCodes-2.7.0
>>> open_dataset('nam.t00z.awip1200.tm00.grib2',
...     filter_by_keys={'typeOfLevel': 'isobaricInhPa', 'shortName': 'absv'})
<xarray.Dataset>
Dimensions:       (air_pressure: 5, x: 614, y: 428)
Coordinates:
    time          datetime64[ns] ...
    step          timedelta64[ns] ...
  * air_pressure  (air_pressure) float64 1e+03 850.0 700.0 500.0 250.0
    latitude      (y, x) float64 ...
    longitude     (y, x) float64 ...
    valid_time    datetime64[ns] ...
Dimensions without coordinates: x, y
Data variables:
    absv          (air_pressure, y, x) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP 
    GRIB_subCentre:          0
    history:                 GRIB to CDM+CF via cfgrib-0.8.2.dev0/ecCodes-2.7.0

Not all variables are accessible, yet.
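
Conceptually, filter_by_keys keeps only the messages whose keys match every requested value before any hypercube is built. Here is a rough sketch of that selection over plain dictionaries. The message headers are hypothetical, and the helper names `matches` and `filter_messages` are not part of cfgrib.

```python
def matches(message, filter_by_keys):
    """Return True when the message carries every requested key/value pair."""
    return all(message.get(key) == value for key, value in filter_by_keys.items())


def filter_messages(messages, filter_by_keys):
    """Keep only the messages selected by filter_by_keys."""
    return [message for message in messages if matches(message, filter_by_keys)]


# Hypothetical message headers, loosely modelled on the NAM file above.
messages = [
    {"shortName": "pres", "typeOfLevel": "cloudBase", "stepType": "instant"},
    {"shortName": "t", "typeOfLevel": "surface", "stepType": "instant"},
    {"shortName": "absv", "typeOfLevel": "isobaricInhPa", "stepType": "instant"},
    {"shortName": "t", "typeOfLevel": "isobaricInhPa", "stepType": "instant"},
]

selected = filter_messages(messages, {"typeOfLevel": "isobaricInhPa", "shortName": "absv"})
print([message["shortName"] for message in selected])
```

Adding more keys to the filter narrows the selection further, which is why combinations such as typeOfLevel plus shortName can isolate a single variable.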

@alexamici
Contributor

Other useful filter_by_keys combinations:

>>> open_dataset('nam.t00z.awip1200.tm00.grib2',
...     filter_by_keys={'typeOfLevel': 'heightAboveGround', 'topLevel': 2})
<xarray.Dataset>
Dimensions:            (x: 614, y: 428)
Coordinates:
    time               datetime64[ns] ...
    step               timedelta64[ns] ...
    heightAboveGround  int64 ...
    latitude           (y, x) float64 ...
    longitude          (y, x) float64 ...
    valid_time         datetime64[ns] ...
Dimensions without coordinates: x, y
Data variables:
    t2m                (y, x) float32 ...
    q                  (y, x) float32 ...
    d2m                (y, x) float32 ...
    r2                 (y, x) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP 
    GRIB_subCentre:          0
    history:                 GRIB to CDM+CF via cfgrib-0.8.2.dev0/ecCodes-2.7.0
>>> open_dataset('nam.t00z.awip1200.tm00.grib2',
...     filter_by_keys={'typeOfLevel': 'heightAboveGround', 'topLevel': 10})
<xarray.Dataset>
Dimensions:            (x: 614, y: 428)
Coordinates:
    time               datetime64[ns] ...
    step               timedelta64[ns] ...
    heightAboveGround  int64 ...
    latitude           (y, x) float64 ...
    longitude          (y, x) float64 ...
    valid_time         datetime64[ns] ...
Dimensions without coordinates: x, y
Data variables:
    u10                (y, x) float32 ...
    pt                 (y, x) float32 ...
    q                  (y, x) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP 
    GRIB_subCentre:          0
    history:                 GRIB to CDM+CF via cfgrib-0.8.2.dev0/ecCodes-2.7.0

@bbonenfant
Author

Thank you for such a substantial response to my issue. I look forward to seeing how this project progresses.

@alexamici alexamici changed the title Unable to read a GRIB file with more than one value of typeOfLevel. Unable to read a GRIB file with more than one value of a key per variable. Jul 31, 2018
@dazza-codes

dazza-codes commented Aug 9, 2018

Thanks, also bumped into this while trying to read a GFS grib2 file, e.g.

import cfgrib
ds = cfgrib.Dataset.frompath('gfs_4_20110807_0000_000.grb2')
# snipped traceback
~/anaconda3/lib/python3.6/site-packages/cfgrib/dataset.py in enforce_unique_attributes(index, attributes_keys)
    113         values = index[key]
    114         if len(values) > 1:
--> 115             raise ValueError("multiple values for unique attribute %r: %r" % (key, values))
    116         if values and values[0] not in ('undef', 'unknown'):
    117             attributes['GRIB_' + key] = values[0]

ValueError: multiple values for unique attribute 'typeOfLevel': ['isobaricInhPa', 'tropopause', 'maxWind', 'isothermZero', 'unknown', 'potentialVorticity']

The workaround seems to work, but it hits another snag with this particular data file:

ds = cfgrib.Dataset.frompath('gfs_4_20110807_0000_000.grb2', filter_by_keys={'typeOfLevel': 'isobaricInhPa'})
# snipped traceback
~/anaconda3/lib/python3.6/site-packages/cfgrib/dataset.py in build_dataset_components(stream, encode_parameter, encode_time, encode_vertical, encode_geography, filter_by_keys)
    374         vars = collections.OrderedDict([(short_name, data_var)])
    375         vars.update(coord_vars)
--> 376         dict_merge(dimensions, dims)
    377         dict_merge(variables, vars)
    378     attributes = enforce_unique_attributes(index, GLOBAL_ATTRIBUTES_KEYS)

~/anaconda3/lib/python3.6/site-packages/cfgrib/dataset.py in dict_merge(master, update)
    353         else:
    354             raise ValueError("key present and new value is different: "
--> 355                              "key=%r value=%r new_value=%r" % (key, master[key], value))
    356 
    357 

ValueError: key present and new value is different: key='air_pressure' value=26 new_value=25

It's not easy to tell whether this is a cfgrib problem or the data is non-conforming.

@alexamici
Contributor

alexamici commented Aug 13, 2018

@darrenleeweber from your report I think the GRIB file has two variables with a pressure dimension but on different pressure levels, which we don't support at the moment. This is a different issue from this one and is better tracked separately, see #13.
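
To see why two variables on different pressure levels collide, here is a sketch of a merge that refuses conflicting values. It is reconstructed from the error message and the few dict_merge lines visible in the traceback above; the branches not shown there are assumptions, and the y and x sizes are made up.

```python
def dict_merge(master, update):
    """Merge `update` into `master` in place, refusing conflicting values."""
    for key, value in update.items():
        if key not in master:
            master[key] = value
        elif master[key] == value:
            continue  # same value seen again: nothing to do
        else:
            raise ValueError(
                "key present and new value is different: "
                "key=%r value=%r new_value=%r" % (key, master[key], value)
            )


dimensions = {}
# First variable: 26 pressure levels (grid sizes are hypothetical).
dict_merge(dimensions, {"air_pressure": 26, "y": 361, "x": 720})
try:
    # Second variable: only 25 pressure levels, so the shared dimension conflicts.
    dict_merge(dimensions, {"air_pressure": 25, "y": 361, "x": 720})
except ValueError as error:
    print(error)
```

Because both variables want a dimension named air_pressure but disagree on its length, they cannot share one dataset.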

@alexamici alexamici changed the title Unable to read a GRIB file with more than one value of a key per variable. ValueError: multiple values for unique attribute... on GRIB files with more than one value of a key per variable. Aug 13, 2018
@alexamici alexamici changed the title ValueError: multiple values for unique attribute... on GRIB files with more than one value of a key per variable. ValueError: multiple values for unique key... on GRIB files with more than one value of a key per variable. Sep 22, 2018
@alexamici
Contributor

alexamici commented Sep 30, 2018

In master we now have the experimental cfgrib.open_datasets entry point, which returns a list of xr.Dataset objects built by selecting appropriate filter_by_keys values with a simple heuristic. All the examples of complex GRIB files that I have work and return all the variables, except for variables that trigger #13, which get skipped.

For example:

>>> cfgrib.open_datasets('nam.t00z.awp21100.tm00.grib2')

/src/cfgrib/xarray_store.py:177: FutureWarning: open_datasets is experimental. It may be removed.
  warnings.warn("open_datasets is experimental. It may be removed.", FutureWarning)
skipping variable with paramId==3041 shortName='absv'
Traceback (most recent call last):
  File "/src/cfgrib/dataset.py", line 385, in build_dataset_components
    dict_merge(dimensions, dims)
  File "/src/cfgrib/dataset.py", line 362, in dict_merge
    "key=%r value=%r new_value=%r" % (key, master[key], value))
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='air_pressure' value=19 new_value=5

[<xarray.Dataset>
 Dimensions:       (air_pressure: 19, x: 93, y: 65)
 Coordinates:
     time          datetime64[ns] ...
     step          timedelta64[ns] ...
   * air_pressure  (air_pressure) float64 1e+03 950.0 900.0 ... 200.0 150.0 100.0
     latitude      (y, x) float64 ...
     longitude     (y, x) float64 ...
     valid_time    datetime64[ns] ...
 Dimensions without coordinates: x, y
 Data variables:
     gh            (air_pressure, y, x) float32 ...
     t             (air_pressure, y, x) float32 ...
     r             (air_pressure, y, x) float32 ...
     w             (air_pressure, y, x) float32 ...
     u             (air_pressure, y, x) float32 ...
 Attributes:
     GRIB_edition:            2
     GRIB_centre:             kwbc
     GRIB_centreDescription:  US National Weather Service - NCEP 
     GRIB_subCentre:          0
     history:                 GRIB to CDM+CF via cfgrib-0.8.5.2.dev0/ecCodes-2...,
 <xarray.Dataset>
 Dimensions:     (x: 93, y: 65)
 Coordinates:
     time        datetime64[ns] ...
     step        timedelta64[ns] ...
     cloudBase   int64 ...
     latitude    (y, x) float64 ...
     longitude   (y, x) float64 ...
     valid_time  datetime64[ns] ...
 Dimensions without coordinates: x, y
 Data variables:
     pres        (y, x) float32 ...
     gh          (y, x) float32 ...
 Attributes:
     GRIB_edition:            2
     GRIB_centre:             kwbc
     GRIB_centreDescription:  US National Weather Service - NCEP 
     GRIB_subCentre:          0
     history:                 GRIB to CDM+CF via cfgrib-0.8.5.2.dev0/ecCodes-2...,
 <xarray.Dataset>
 Dimensions:     (x: 93, y: 65)
 Coordinates:
     time        datetime64[ns] ...
     step        timedelta64[ns] ...
     cloudTop    int64 ...
     latitude    (y, x) float64 ...
     longitude   (y, x) float64 ...
     valid_time  datetime64[ns] ...
 Dimensions without coordinates: x, y
 Data variables:
     pres        (y, x) float32 ...
     gh          (y, x) float32 ...
     t           (y, x) float32 ...
 Attributes:
     GRIB_edition:            2
     GRIB_centre:             kwbc
     GRIB_centreDescription:  US National Weather Service - NCEP 
     GRIB_subCentre:          0
     history:                 GRIB to CDM+CF via cfgrib-0.8.5.2.dev0/ecCodes-2...,
 <xarray.Dataset>
 Dimensions:     (x: 93, y: 65)
 Coordinates:
     time        datetime64[ns] ...
     step        timedelta64[ns] ...
     maxWind     int64 ...
     latitude    (y, x) float64 ...
     longitude   (y, x) float64 ...
     valid_time  datetime64[ns] ...
 Dimensions without coordinates: x, y
 Data variables:
     pres        (y, x) float32 ...
     gh          (y, x) float32 ...
     u           (y, x) float32 ...
 Attributes:
     GRIB_edition:            2
     GRIB_centre:             kwbc
     GRIB_centreDescription:  US National Weather Service - NCEP 
     GRIB_subCentre:          0
     history:                 GRIB to CDM+CF via cfgrib-0.8.5.2.dev0/ecCodes-2...,
 <xarray.Dataset>
 Dimensions:       (x: 93, y: 65)
 Coordinates:
     time          datetime64[ns] ...
     step          timedelta64[ns] ...
     isothermZero  int64 ...
     latitude      (y, x) float64 ...
     longitude     (y, x) float64 ...
     valid_time    datetime64[ns] ...
 Dimensions without coordinates: x, y
 Data variables:
     gh            (y, x) float32 ...
     r             (y, x) float32 ...
 Attributes:
     GRIB_edition:            2
     GRIB_centre:             kwbc
     GRIB_centreDescription:  US National Weather Service - NCEP 
     GRIB_subCentre:          0
     history:                 GRIB to CDM+CF via cfgrib-0.8.5.2.dev0/ecCodes-2...]

As soon as I find the time to sync the Advanced usage section of the README I'll publish release 0.8.5.1 with the update.

@rabernat

rabernat commented Oct 1, 2018

Just a quick question about your approach...why do you return a list of xarray datasets? Why not combine them into a single dataset?

@alexamici
Contributor

@rabernat an xarray.Dataset cannot represent a GRIB file that contains more than one hypercube of the same variable. In the example above, look at t, which has both air_pressure and cloudTop as vertical coordinates.

Note that a NetCDF Dataset can represent such a GRIB file according to the NetCDF Data Model, as long as you place different hypercubes in different Groups. The fact that an xarray.Dataset is really a map of one of the NetCDF Groups inside a NetCDF Dataset (a file) is probably a bit confusing, but this is what it is. Note how the group argument of xarray.open_dataset works.

I prefer not to arbitrarily change the variable names (t, t1, ...), and if you keep the same variable name for all hypercubes and try to xarray.merge the datasets, you get:
MergeError: conflicting values for variable 't' on objects to be combined

@rabernat

rabernat commented Oct 1, 2018

Understood. So the different items in this list correspond to datasets with different vertical sampling schemes?

@alexamici
Contributor

alexamici commented Oct 1, 2018

Not just that. You may have a GRIB file containing a variable with one message whose gridType is regular_ll and a following message whose gridType is regular_gg. You will get two datasets in this case as well.

Basically we have a list of GRIB keys that are required to be identical on all messages of a hypercube.
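
One way to picture this rule is to group messages by the values of the keys that must stay constant within a hypercube. The sketch below does that over plain dictionaries. The key list `HYPERCUBE_KEYS` is an assumed subset chosen for illustration, since the actual list cfgrib uses is not given in this thread, and the messages are hypothetical.

```python
from collections import defaultdict

# Assumed subset of keys that must be identical on all messages of a hypercube.
HYPERCUBE_KEYS = ("paramId", "typeOfLevel", "gridType")


def split_into_hypercubes(messages, keys=HYPERCUBE_KEYS):
    """Group messages that share the same values for every key in `keys`."""
    groups = defaultdict(list)
    for message in messages:
        identity = tuple(message[key] for key in keys)
        groups[identity].append(message)
    return dict(groups)


# Hypothetical messages for temperature (paramId 130) on mixed levels and grids.
messages = [
    {"paramId": 130, "typeOfLevel": "isobaricInhPa", "gridType": "regular_ll", "level": 850},
    {"paramId": 130, "typeOfLevel": "isobaricInhPa", "gridType": "regular_ll", "level": 500},
    {"paramId": 130, "typeOfLevel": "cloudTop", "gridType": "regular_ll", "level": 0},
    {"paramId": 130, "typeOfLevel": "isobaricInhPa", "gridType": "regular_gg", "level": 850},
]

hypercubes = split_into_hypercubes(messages)
for identity, members in hypercubes.items():
    print(identity, len(members))
```

Here the same variable ends up in three groups: two pressure levels on the regular_ll grid, one cloudTop field, and one field on the regular_gg grid. Each group would need its own dataset.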

@shoyer
Contributor

shoyer commented Oct 2, 2018

Would it make sense to return something more like a dict of Dataset objects? It's hard to predictably program with lists where the order of entries is arbitrary.

@mdbmdb74

mdbmdb74 commented Oct 2, 2018

I've tried GRIB2 files for MSG CloudMask products (which you can download from the EUMETSAT DataCentre at https://www.eumetsat.int/website/home/Data/DataDelivery/EUMETSATDataCentre/index.html), but I get an error because there is no "step" attribute.

@alexamici
Contributor

@shoyer I see your point, and that would be consistent with the idea that the different xr.Dataset objects correspond to different "NetCDF Groups" in the GRIB file. But the heuristic I use for defining the xr.Dataset objects is a bit arbitrary: a given file is always opened the same way, but you may end up with different datasets if you change the order of the messages in the GRIB file.

At the moment the unique identifier of an xr.Dataset is its filter_by_keys dictionary, which can contain several keys and makes for a very long group name. I'll try to come up with a proposal.
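
As a rough illustration of the dict idea, a filter_by_keys dictionary can be flattened into a deterministic string and used as the mapping key. The `dataset_key` helper and its `key=value/...` format are hypothetical and are not a proposal taken from cfgrib. Strings stand in for the datasets so the sketch runs without any GRIB file.

```python
def dataset_key(filter_by_keys):
    """Build a deterministic, hashable name from a filter_by_keys dict."""
    # Sorting the keys makes the name independent of insertion order.
    return "/".join("%s=%s" % (key, filter_by_keys[key]) for key in sorted(filter_by_keys))


filters = [
    {"typeOfLevel": "surface", "stepType": "instant"},
    {"typeOfLevel": "cloudBase"},
]

# In real use each value would be an xr.Dataset; placeholder strings are used here.
datasets = {dataset_key(f): "dataset for %r" % (f,) for f in filters}
print(sorted(datasets))
```

The names grow with every extra filter key, which is the "very long group name" concern raised above.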

@alexamici
Contributor

@mdbmdb74 woops! Those GRIB files crash cfgrib in several ways!

It should not be too hard to fix, though. We just need to relax the assumptions about which coordinates need to be present. Apparently both forecast_period/step and the vertical coordinate need to be made optional.

@alexamici
Contributor

alexamici commented Oct 2, 2018

@mdbmdb74 current master treats vertical and forecast_period coordinates as optional and can open the EUMETSAT GRIB2 files:

>>> ds = cfgrib.open_dataset('MSG4-SEVI-MSGCLMK-0100-0100-20180930100000.000000000Z-20180930101421-1296606.grb')
No latitudes/longitudes provided by ecCodes for gridType = 'space_view'
>>> ds
<xarray.Dataset>
Dimensions:     (i: 13778944)
Coordinates:
    time        datetime64[ns] ...
    valid_time  datetime64[ns] ...
Dimensions without coordinates: i
Data variables:
    p260537     (i) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             eums
    GRIB_centreDescription:  EUMETSAT Operation Centre
    GRIB_subCentre:          0
    history:                 GRIB to CDM+CF via cfgrib-0.8.5.2.dev0/ecCodes-2...
>>> ds['p260537']
<xarray.DataArray 'p260537' (i: 13778944)>
[13778944 values with dtype=float32]
Coordinates:
    time        datetime64[ns] ...
    valid_time  datetime64[ns] ...
Dimensions without coordinates: i
Attributes:
    GRIB_paramId:                    260537
    GRIB_shortName:                  ~
    GRIB_units:                      Code table 4.217
    GRIB_name:                       Cloud mask
    GRIB_cfVarName:                  p260537
    GRIB_dataType:                   sa
    GRIB_missingValue:               9999
    GRIB_numberOfPoints:             13778944
    GRIB_NV:                         0
    GRIB_stepType:                   instant
    GRIB_gridType:                   space_view
    GRIB_gridDefinitionDescription:  Space view perspective orthographic
    long_name:                       Cloud mask
    units:                           Code table 4.217

Unfortunately ecCodes does not seem to support the space_view gridType (so we cannot represent the values on a 2D grid) nor the cloud mask parameter (units == Code table 4.217?!)

@alexamici
Contributor

I close this issue with the release of version 0.9.0.

@bbonenfant
Author

Sorry to reopen this issue -- feel free to move this somewhere else, but I think I may have found a bug or at least something I do not understand in the implementation of the filter_by_keys argument.

Here is some output I receive when trying to open one of those NAM grib files:

>>> xr.open_dataset('nam.t06z.awip3d00.tm00.grib2',
                    engine='cfgrib',
                    backend_kwargs={
                        'filter_by_keys': {'typeOfLevel': 'isobaricInhPa'},
                        'errors': 'ignore'
                    })

skipping variable: paramId==3041 shortName='absv'
Traceback (most recent call last):
  File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 429, in build_dataset_components
    dict_merge(dimensions, dims)
  File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 407, in dict_merge
    "key=%r value=%r new_value=%r" % (key, master[key], value))
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='isobaricInhPa' value=39 new_value=7
skipping variable: paramId==1 shortName='strf'
Traceback (most recent call last):
  File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 430, in build_dataset_components
    dict_merge(variables, vars)
  File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 407, in dict_merge
    "key=%r value=%r new_value=%r" % (key, master[key], value))
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='isobaricInhPa' value=Variable(dimensions=('isobaricInhPa',), data=array([1000,  975,  950,  925,  900,  875,  850,  825,  800,  775,  750,
        725,  700,  675,  650,  625,  600,  575,  550,  525,  500,  475,
        450,  425,  400,  375,  350,  325,  300,  275,  250,  225,  200,
        175,  150,  125,  100,   75,   50])) new_value=Variable(dimensions=(), data=250)
skipping variable: paramId==3017 shortName='dpt'
Traceback (most recent call last):
  File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 429, in build_dataset_components
    dict_merge(dimensions, dims)
  File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 407, in dict_merge
    "key=%r value=%r new_value=%r" % (key, master[key], value))
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='isobaricInhPa' value=39 new_value=6
skipping variable: paramId==260022 shortName='mconv'
Traceback (most recent call last):
  File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 429, in build_dataset_components
    dict_merge(dimensions, dims)
  File "/Users/bbonenfant/PycharmProjects/wrf2wxl/venv/lib/python3.6/site-packages/cfgrib/dataset.py", line 407, in dict_merge
    "key=%r value=%r new_value=%r" % (key, master[key], value))
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='isobaricInhPa' value=39 new_value=2

<xarray.Dataset>
Dimensions:        (isobaricInhPa: 39, x: 185, y: 129)
Coordinates:
    time           datetime64[ns] ...
    step           timedelta64[ns] ...
  * isobaricInhPa  (isobaricInhPa) int64 1000 975 950 925 900 ... 125 100 75 50
    latitude       (y, x) float64 ...
    longitude      (y, x) float64 ...
    valid_time     datetime64[ns] ...
Dimensions without coordinates: x, y
Data variables:
    gh             (isobaricInhPa, y, x) float32 ...
    t              (isobaricInhPa, y, x) float32 ...
    r              (isobaricInhPa, y, x) float32 ...
    q              (isobaricInhPa, y, x) float32 ...
    w              (isobaricInhPa, y, x) float32 ...
    u              (isobaricInhPa, y, x) float32 ...
    tke            (isobaricInhPa, y, x) float32 ...
    clwmr          (isobaricInhPa, y, x) float32 ...
    cice           (isobaricInhPa, y, x) float32 ...
    snmr           (isobaricInhPa, y, x) float32 ...
    strf           (y, x) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP 
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP 
    history:                 GRIB to CDM+CF via cfgrib-0.9.5.1/ecCodes-2.8.0 ...

You can see that it successfully returns a Dataset, but the variables it returns include the U component of the wind and not the V component. I'm not sure why this is the case, since I've inspected the GRIB file and found nothing apparently wrong with the v-winds. I've also tried this on other NAM GRIB files with similar results (this was the case even in your comment above from Sept. 30).

I am unsure if this is an error on my part or if there is a way around this.
Thank you.

@alexamici
Contributor

alexamici commented Jan 4, 2019

@bbonenfant thanks for your help! I opened a new issue with your comment and am leaving this one closed, as I consider the general problem solved by filter_by_keys.

@berkesenturk

@darrenleeweber I've started learning the GRIB data format and ran into the same issue. Use it like this:

import os

import xarray as xr

# os.chdir expects a directory, so path is assumed to be the folder holding the GRIB file
path = "gfsanl_4_2019101000.g2"
os.chdir(path)
ds = xr.open_dataset(
    'gfs_4_20191010_0000_000.grb2',
    engine='cfgrib',
    backend_kwargs=dict(filter_by_keys={'typeOfLevel': 'hybrid'}),
)
print(ds)

@alexamici
Contributor

The v component of GRIB files that use the MULTI-FIELD feature is read correctly only starting with version 0.9.8.2, see: https://github.com/ecmwf/cfgrib/blob/master/CHANGELOG.rst#0982-2020-05-22

@raybellwaves
Contributor

Sorry to bring this up, but I think it's a usage question about this functionality. I was trying this today to get u10 from GFS via FTP. If you'd like me to ask this somewhere else, please let me know.

(xr 0.16.2 and cfgrib 0.9.8.5)

$ wget ftp://ftp.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.20210212/00/gfs.t00z.pgrb2.0p25.f000
import xarray as xr
ds = xr.open_mfdataset(
    "gfs.t00z.pgrb2.0p25.f000",
    engine="cfgrib",
    backend_kwargs=dict(filter_by_keys={"typeOfLevel": "heightAboveGround"}),
)

output is

skipping variable: paramId==165 shortName='u10'
Traceback (most recent call last):
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 602, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 536, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=(), data=2) new_value=Variable(dimensions=(), data=10)
skipping variable: paramId==166 shortName='v10'
Traceback (most recent call last):
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 602, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 536, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=(), data=2) new_value=Variable(dimensions=(), data=10)
skipping variable: paramId==131 shortName='u'
Traceback (most recent call last):
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 602, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 536, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=(), data=2) new_value=Variable(dimensions=('heightAboveGround',), data=array([20, 30, 40, 50, 80]))
skipping variable: paramId==132 shortName='v'
Traceback (most recent call last):
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 602, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 536, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=(), data=2) new_value=Variable(dimensions=('heightAboveGround',), data=array([20, 30, 40, 50, 80]))
skipping variable: paramId==130 shortName='t'
Traceback (most recent call last):
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 602, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 536, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=(), data=2) new_value=Variable(dimensions=('heightAboveGround',), data=array([ 80, 100]))
skipping variable: paramId==133 shortName='q'
Traceback (most recent call last):
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 602, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 536, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=(), data=2) new_value=Variable(dimensions=(), data=80)
skipping variable: paramId==54 shortName='pres'
Traceback (most recent call last):
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 602, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 536, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=(), data=2) new_value=Variable(dimensions=(), data=80)
skipping variable: paramId==228246 shortName='u100'
Traceback (most recent call last):
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 602, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 536, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=(), data=2) new_value=Variable(dimensions=(), data=100)
skipping variable: paramId==228247 shortName='v100'
Traceback (most recent call last):
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 602, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/cfgrib/dataset.py", line 536, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=(), data=2) new_value=Variable(dimensions=(), data=100)

and see

>>> ds
<xarray.Dataset>
Dimensions:            (latitude: 721, longitude: 1440)
Coordinates:
    time               datetime64[ns] ...
    step               timedelta64[ns] ...
    heightAboveGround  int64 ...
  * latitude           (latitude) float64 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
  * longitude          (longitude) float64 0.0 0.25 0.5 ... 359.2 359.5 359.8
    valid_time         datetime64[ns] ...
Data variables:
    t2m                (latitude, longitude) float32 dask.array<chunksize=(721, 1440), meta=np.ndarray>
    sh2                (latitude, longitude) float32 dask.array<chunksize=(721, 1440), meta=np.ndarray>
    d2m                (latitude, longitude) float32 dask.array<chunksize=(721, 1440), meta=np.ndarray>
    r2                 (latitude, longitude) float32 dask.array<chunksize=(721, 1440), meta=np.ndarray>
    aptmp              (latitude, longitude) float32 dask.array<chunksize=(721, 1440), meta=np.ndarray>
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP 
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP 
    history:                 2021-02-12T14:02:09 GRIB to CDM+CF via cfgrib-0....
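
The tracebacks above show heightAboveGround taking several values (2, 10, 80, 100, ...), so filtering on typeOfLevel alone leaves conflicting levels. One possible approach, sketched here under assumptions, is to add a level filter and optionally a shortName filter. The `level` key is assumed to be accepted by filter_by_keys in your cfgrib version (earlier comments in this thread used topLevel for the same purpose), and both helper names are hypothetical.

```python
def height_filter(height_m, short_name=None):
    """Build backend_kwargs selecting a single heightAboveGround level."""
    # Assumption: 'level' is usable as a filter key, like 'topLevel' earlier in the thread.
    keys = {"typeOfLevel": "heightAboveGround", "level": height_m}
    if short_name is not None:
        keys["shortName"] = short_name
    return {"filter_by_keys": keys}


def open_height_level(path, height_m, short_name=None):
    """Open one height level of a GRIB file with the cfgrib engine."""
    # Imported lazily so height_filter can be used without xarray installed.
    import xarray as xr

    return xr.open_dataset(
        path, engine="cfgrib", backend_kwargs=height_filter(height_m, short_name)
    )


print(height_filter(10, "u10"))
# Example call, not executed here because it needs the downloaded file:
# ds = open_height_level("gfs.t00z.pgrb2.0p25.f000", 10, "u10")
```

If this works on your file, the 10 m wind components would be opened separately from the 2 m fields instead of being skipped.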

@GirmayBerhe

Great! Worked for me.

iainrussell pushed a commit that referenced this issue Mar 2, 2023
* var_encoding options for to_netcdf convertor