Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in vertical coordinate verification #1124

Closed
dopplershift opened this issue Aug 12, 2019 · 3 comments · Fixed by #1142
Closed

Bug in vertical coordinate verification #1124

dopplershift opened this issue Aug 12, 2019 · 3 comments · Fixed by #1142
Labels
Area: Xarray Pertains to xarray integration Type: Bug Something is not working like it should
Milestone

Comments

@dopplershift
Copy link
Member

Stumbled upon an issue in (vertical) coordinate identification. This DataArray results in vertical attribute is not available (from u.metpy.vertical):

<xarray.DataArray 'u-component_of_wind_isobaric' (isobaric: 31, lat: 101, lon: 161)>
[504091 values with dtype=float32]
Coordinates:
    reftime1  datetime64[ns] 2019-08-11T18:00:00
    time1     datetime64[ns] 2019-08-12T03:00:00
  * lat       (lat) float32 60.0 59.5 59.0 58.5 58.0 ... 11.5 11.0 10.5 10.0
  * lon       (lon) float32 230.0 230.5 231.0 231.5 ... 308.5 309.0 309.5 310.0
    crs       object Projection: latitude_longitude
  * isobaric  (isobaric) float32 100.0 200.0 300.0 ... 95000.0 97500.0 100000.0
Attributes:
    long_name:                      u-component of wind @ Isobaric surface
    units:                          m/s
    abbreviation:                   UGRD
    grid_mapping:                   LatLon_Projection
    Grib_Variable_Id:               VAR_0-2-2_L100
    Grib2_Parameter:                [0 2 2]
    Grib2_Parameter_Discipline:     Meteorological products
    Grib2_Parameter_Category:       Momentum
    Grib2_Parameter_Name:           u-component of wind
    Grib2_Level_Type:               100
    Grib2_Level_Desc:               Isobaric surface
    Grib2_Generating_Process_Type:  Forecast

whereas it works for temperature:

<xarray.DataArray 'Temperature_isobaric' (isobaric4: 34, lat: 101, lon: 161)>
[552874 values with dtype=float32]
Coordinates:
    reftime1   datetime64[ns] 2019-08-11T18:00:00
    time1      datetime64[ns] 2019-08-12T03:00:00
  * isobaric4  (isobaric4) float32 40.0 100.0 200.0 ... 95000.0 97500.0 100000.0
  * lat        (lat) float32 60.0 59.5 59.0 58.5 58.0 ... 11.5 11.0 10.5 10.0
  * lon        (lon) float32 230.0 230.5 231.0 231.5 ... 308.5 309.0 309.5 310.0
    crs        object Projection: latitude_longitude
Attributes:
    long_name:                      Temperature @ Isobaric surface
    units:                          K
    abbreviation:                   TMP
    grid_mapping:                   LatLon_Projection
    Grib_Variable_Id:               VAR_0-0-0_L100
    Grib2_Parameter:                [0 0 0]
    Grib2_Parameter_Discipline:     Meteorological products
    Grib2_Parameter_Category:       Temperature
    Grib2_Parameter_Name:           Temperature
    Grib2_Level_Type:               100
    Grib2_Level_Desc:               Isobaric surface
    Grib2_Generating_Process_Type:  Forecast

These data were obtained using NCSS through siphon:

from datetime import datetime
import metpy.calc as mpcalc
from siphon.catalog import TDSCatalog
import xarray as xr

cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/'
                 'NCEP/GFS/Global_0p5deg/catalog.xml')
best = cat.datasets['Best GFS Half Degree Forecast Time Series']
subset_access = best.subset()
query = subset_access.query()
query.time(datetime.utcnow())
query.variables('Temperature_isobaric', 'Geopotential_height_isobaric',
                'u-component_of_wind_isobaric', 'v-component_of_wind_isobaric',
                'Relative_humidity_isobaric')
query.lonlat_box(west=-130, east=-50, south=10, north=60)
query.accept('netcdf4')
nc = subset_access.get_data(query)
from xarray.backends import NetCDF4DataStore
ds = xr.open_dataset(NetCDF4DataStore(nc))
ds = ds.metpy.parse_cf()
temperature = ds['Temperature_isobaric'][0]
u = ds['u-component_of_wind_isobaric'][0]
temperature.metpy.vertical
u.metpy.vertical

This code works fine on 0.10.2.

@dopplershift dopplershift added Area: Xarray Pertains to xarray integration Type: Bug Something is not working like it should labels Aug 12, 2019
@dopplershift
Copy link
Member Author

So actually the example above (cobbled together from code that was failing), for some reason works--this seems to be a subtle issue. The following is verified to fail:

from datetime import datetime
import metpy.calc as mpcalc
from siphon.catalog import TDSCatalog
import xarray as xr

cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/'
                 'NCEP/GFS/Global_0p5deg/catalog.xml')
best = cat.datasets['Best GFS Half Degree Forecast Time Series']
subset_access = best.subset()
query = subset_access.query()
query.time(datetime.utcnow())
query.variables('Temperature_isobaric', 'Geopotential_height_isobaric',
                'u-component_of_wind_isobaric', 'v-component_of_wind_isobaric',
                'Relative_humidity_isobaric')
query.lonlat_box(west=-130, east=-50, south=10, north=60)
query.accept('netcdf4')
nc = subset_access.get_data(query)
from xarray.backends import NetCDF4DataStore
ds = xr.open_dataset(NetCDF4DataStore(nc))
ds = ds.metpy.parse_cf()
temperature = ds['Temperature_isobaric']

lon = temperature.metpy.x
temperature = temperature[0]

u = ds['u-component_of_wind_isobaric'][0]
temperature.metpy.vertical
u.metpy.vertical

but for some reason the following similar code works:

from datetime import datetime
import metpy.calc as mpcalc
from siphon.catalog import TDSCatalog
import xarray as xr

cat = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/grib/'
                 'NCEP/GFS/Global_0p5deg/catalog.xml')
best = cat.datasets['Best GFS Half Degree Forecast Time Series']
subset_access = best.subset()
query = subset_access.query()
query.time(datetime.utcnow())
query.variables('Temperature_isobaric', 'Geopotential_height_isobaric',
                'u-component_of_wind_isobaric', 'v-component_of_wind_isobaric',
                'Relative_humidity_isobaric')
query.lonlat_box(west=-130, east=-50, south=10, north=60)
query.accept('netcdf4')
nc = subset_access.get_data(query)
from xarray.backends import NetCDF4DataStore
ds = xr.open_dataset(NetCDF4DataStore(nc))
ds = ds.metpy.parse_cf()
temperature = ds['Temperature_isobaric'][0]

lon = temperature.metpy.x

u = ds['u-component_of_wind_isobaric'][0]
temperature.metpy.vertical
u.metpy.vertical

The only difference is where temperature is indexed with [0] for time. This should be interesting...

@jthielen
Copy link
Collaborator

jthielen commented Aug 12, 2019

So, here's what I think is going on...

In the failing example, MetPyDataArrayAccessor._parse_coordinates() is first called through temperature.metpy.x. But, here temperature at this point is just a reference to the 'Temperature_isobaric' variable on ds, so its coordinates are still shared across the dataset. However, it is just the coordinates on 'Temperature_isobaric' that the method sees, so it just associates time with 'time1', vertical with 'isobaric4', and so forth. Since it runs on the DataArray-level, it has no way of dealing with the other coordinates in the dataset. But, it still leaves the _metpy_axis attribute on 'time1', 'lat', and 'lon', so that they are still there when u is pulled out from the dataset, since these coordinates are shared. Thus, when u.metpy.vertical is called, MetPyDataArrayAccessor._parse_coordinates() finds coordinates already parsed, so it doesn't do anything to find the vertical coordinate, causing the error to be raised.

In the working example, the indexed assignment of temperature acts as a copy of that subset of the variable, rather than a reference, so, MetPyDataArrayAccessor._parse_coordinates() running on temperature has no side effects when MetPyDataArrayAccessor._parse_coordinates() runs on u later on...each of them has a "clean" set of coordinates.

Long story short, the cached result check in #1058 isn't holding up to real-world use cases. 😲

Fixing this, however, gets interesting. While I'd be tempted to suggest removing that existing-parsed-coordinates check as a quick fix, that would mean automatically overwriting the previously identified coordinates, which could be very bad if the user manually specified them. What this may take is another refactor to have the coordinate parsing take place on a type-by-type basis and not all grouped together in an all-or-nothing fashion like it currently is?

Side note: to force "clean" coordinates, you can call u.metpy.assign_coordinates(None), which will remove all _metpy_axis attributes on coordinates of u.

@jthielen
Copy link
Collaborator

A note for reference: at one point, I had the idea of caching the coordinate identification information on the accessor class itself to help resolve this, but this definitely won't work (see geoxarray/geoxarray#13, pydata/xarray#3205, pydata/xarray#3268).

So, I think the way forward for this will be checking for coordinate types on a coordinate-by-coordinate basis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Xarray Pertains to xarray integration Type: Bug Something is not working like it should
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants