Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cfgrib generates NaNs when reading data #313

Closed
clessig opened this issue Sep 8, 2022 · 4 comments
Closed

cfgrib generates NaNs when reading data #313

clessig opened this issue Sep 8, 2022 · 4 comments

Comments

@clessig
Copy link

clessig commented Sep 8, 2022

cfgrib generates NaNs while loading grib files, see code example below. We reproduced this on multiple machines (with slightly different configurations) and verified with grib_get_data and the old python interface that the file is free of NaNs. For your convenience, the grib of the example can be found here: http://graphics.cs.uni-magdeburg.de/misc/reanalysis_geopotential_y1979_m01_pl850.grib. However, we were able to reproduce the problem with various files (although there seems to be a correlation with the large values of geopotential compared to, e.g., vorticity).

(pyenv) [lessig1@jwlogin24 multiformer]$ python
Python 3.9.6 (default, Nov 16 2021, 12:25:13) 
[GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xarray
>>> fname = 'data/era5/pl_levels/850/geopotential/reanalysis_geopotential_y1979_m01_pl850.grib'
>>> ds = xarray.open_dataset( fname, engine='cfgrib')
Ignoring index file 'data/era5/pl_levels/850/geopotential/reanalysis_geopotential_y1979_m01_pl850.grib.923a8.idx' incompatible with GRIB file
>>> ds
<xarray.Dataset>
Dimensions:        (time: 744, latitude: 721, longitude: 1440)
Coordinates:
    number         int64 ...
  * time           (time) datetime64[ns] 1979-01-01 ... 1979-01-31T23:00:00
    step           timedelta64[ns] ...
    isobaricInhPa  float64 ...
  * latitude       (latitude) float64 90.0 89.75 89.5 ... -89.5 -89.75 -90.0
  * longitude      (longitude) float64 0.0 0.25 0.5 0.75 ... 359.2 359.5 359.8
    valid_time     (time) datetime64[ns] ...
Data variables:
    z              (time, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2022-09-08T08:06 GRIB to CDM+CF via cfgrib-0.9.1...
>>> import numpy as np
>>> data = np.array( ds['z'], dtype=np.float32)
>>> np.isnan( data).any()
True
>>> np.argwhere( np.isnan( data))
array([[   3,  123,  688],
       [   3,  140,  702],
       [   3,  638,  890],
       [  21,  121,  668],
       [  21,  142,  678],
       [  21,  631,  988],
       [  77,  599,  491],
       [  77,  606,  398],
       [  97,   55, 1206],
       [  97,   62, 1201],
       [ 140,  592, 1420],
       [ 184,  592,  755],
       [ 289,  139,  674],
       [ 289,  151,  640],
       [ 404,  608,  817],
       [ 445,  164,  671],
       [ 445,  631,  874],
       [ 519,  600,  825],
       [ 519,  641, 1114],
       [ 519,  646, 1080],
       [ 519,  656,  788],
       [ 519,  664, 1312],
       [ 555,  617,  329],
       [ 645,  587,   72],
       [ 729,  607,  189],
       [ 742,  603,  293],
       [ 742,  642,  113]])
@iainrussell
Copy link
Member

Hi @clessig , that URL does not work for me - could you check that it's correct?
Thanks,
Iain

@clessig
Copy link
Author

clessig commented Sep 8, 2022

Hi Iain,

Please try again: http://graphics.cs.uni-magdeburg.de/misc/reanalysis_geopotential_y1979_m01_pl850.grib

Thanks,
Christian

@iainrussell
Copy link
Member

Hi Christian,

Thanks, it worked this time!

So, I had a suspicion about this one, checked the data using Metview, and yes - the data values contain the magic number 9999! Unfortunately this was being used as a missing value indicator (a really bad choice, but it comes from a decision made in another package many years ago). It's easily fixed, but I want to do it properly, and with tests. In the meantime, if you have your own copy of cfgrib, you can make the relevant change. It's on line 560 of dataset.py. Currently the line is:

    missing_value = data_var_attrs.get("missingValue", 9999)

Change it to:

    missing_value = data_var_attrs.get("missingValue", np.finfo(np.float32).max)

Best regards,
Iain

@clessig
Copy link
Author

clessig commented Sep 8, 2022

Hi Iain,

Many thanks for looking into this so quickly. I can find a workaround now.

Best,
Christian

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants