Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better document xarray integration #1030

Closed
ahuang11 opened this issue May 1, 2019 · 10 comments · Fixed by #1058
Closed

Better document xarray integration #1030

ahuang11 opened this issue May 1, 2019 · 10 comments · Fixed by #1058
Labels
Area: Docs Affects documentation Type: Enhancement Enhancement to existing functionality
Milestone

Comments

@ahuang11
Copy link
Contributor

ahuang11 commented May 1, 2019

I was wondering whether parse_cf should work for xr.DataArray? If so, I'm happy to contribute; just wanted to first know whether or not it was purposefully left out for a reason.

Currently, to get around it, I do

import xarray as xr
import metpy

da = xr.tutorial.open_dataset('air_temperature')['air']
da.to_dataset().metpy.parse_cf()[da.name]

If I don't convert it, I get the following error

import xarray as xr
import metpy

da = xr.tutorial.open_dataset('air_temperature')['air']
da.metpy.parse_cf()

AttributeError                            Traceback (most recent call last)
<ipython-input-38-9ef76dbf2399> in <module>
      3 
      4 ds = xr.tutorial.open_dataset('air_temperature')
----> 5 ds['air'].metpy.parse_cf()

AttributeError: 'MetPyAccessor' object has no attribute 'parse_cf'
@jthielen
Copy link
Collaborator

jthielen commented May 2, 2019

Looking at what parse_cf does, I don't believe it makes sense for it to work on a DataArray alone, because parse_cf requires projection information that would be stored in another variable in a Dataset (in other words, there isn't enough information in just the DataArray for parse_cf to do its job).

However, parse_cf takes an varname argument (see here), so to get just a DataArray in your example, you can do:

import xarray as xr
import metpy

da = xr.tutorial.open_dataset('air_temperature').metpy.parse_cf('air')

Perhaps this needs to be documented more clearly?

@dopplershift
Copy link
Member

I agree with @jthielen. The reason parse_cf only exists on Dataset is because it requires information stored in another variable--that's just part of the CF standard. We could probably use some good narrative documentation as part of the reference guide on all of the things related to the XArray integration.

I'm going to rename this issue and leave it open to remind us to do so.

@dopplershift dopplershift changed the title metpy.parse_cf for xr.DataArrays Better document xarray integration May 2, 2019
@dopplershift dopplershift added Area: Docs Affects documentation Type: Enhancement Enhancement to existing functionality labels May 2, 2019
@ahuang11
Copy link
Contributor Author

ahuang11 commented May 2, 2019

Okay so I guess my point of confusion stems from this:

import metpy
import xarray
ds = xr.tutorial.open_dataset('air_temperature')
ds.attrs = ''
ds = ds.metpy.parse_cf()
ds

<xarray.Dataset>
Dimensions:  (lat: 25, lon: 53, time: 2920)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
    crs      object Projection: latitude_longitude
Data variables:
    air      (time, lat, lon) float32 ...

it successfully parsed out a latitude/longitude projection, even when I stripped the attributes and it only has one variable: air.

Actually, I don't mind not being able to parse the projection; I just like that I can locate the axis name after parsing ds['air'].metpy.find_axis_name('x') which brings me to my next question, was find_axis_name purposefully not exposed on the Dataset level?

@dopplershift
Copy link
Member

Since ds is a dataset, that has no impact on parsing the projection. The parsing is looking for the grid_mapping attribute on the air variable.

find_axis_name is excluded from the Dataset level because axis names only make sense in the context of a variable. Each variable can have different names for axes.

@ahuang11
Copy link
Contributor Author

ahuang11 commented May 3, 2019

Sorry, I'm still not understanding. Where can I find the grid_mapping attribute? If it's on the air variable, then shouldn't parse_cf work on the DataArray as well?

>>> ds['air']
<xarray.DataArray 'air' (time: 2920, lat: 25, lon: 53)>
[3869000 values with dtype=float32]
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    long_name:     4xDaily Air temperature at sigma level 995
    units:         degK
    precision:     2
    GRIB_id:       11
    GRIB_name:     TMP
    var_desc:      Air temperature
    dataset:       NMC Reanalysis
    level_desc:    Surface
    statistic:     Individual Obs
    parent_stat:   Other
    actual_range:  [185.16 322.1 ]

I guess it's able to successfully parse because of the following:

        # Trying to guess whether we should be adding a crs to this variable's coordinates
        # First make sure it's missing CRS but isn't lat/lon itself
        if not self.check_axis(var, 'lat', 'lon') and 'crs' not in var.coords:
            # Look for both lat/lon in the coordinates
            has_lat = has_lon = False
            for coord_var in var.coords.values():
                has_lat = has_lat or self.check_axis(coord_var, 'lat')
                has_lon = has_lon or self.check_axis(coord_var, 'lon')

            # If we found them, create a lat/lon projection as the crs coord
            if has_lat and has_lon:
                var.coords['crs'] = CFProjection({'grid_mapping_name': 'latitude_longitude'})

Maybe break parse_cf down into two (making _generate_coordinate_map public)? One that parses for projection info and one simply parses the coordinate names so it's possible to use find_axis_name?

@dopplershift
Copy link
Member

So it can't work with just the air variable because while the grid_mapping attribute is found there, all grid_mapping has is the name of the variable that contains the projection information. It's encoded this way because the grid mapping is encoded as a set of attributes on a simple integer variable. It's imperfect, but that's the standard. Because we need access to the full list of variables in order to find the grid mapping info, we need access to the parent data set, and xarray data arrays do not keep a reference to their parent dataset, since not all data arrays have a parent dataset. Hence, we need parse_cf, for grid_mapping at least, to live on the Dataset.

I'd need to dig in further to figure out if we can put find_axis_name outside the dataset. @jthielen any thoughts?

@jthielen
Copy link
Collaborator

jthielen commented May 8, 2019

The parsing relating to the coordinates is only dependent on the coordinates for a given variable, so yes, it can be done outside the Dataset. I think that functionality should be able to be moved from the Dataset to the DataArray accessor without too difficult of a refactor.

So, would the following changes make sense to resolve this?

  1. Rename the xarray accessors from MetPyAccessor (for DataArrays) and CFConventionHandler (for Datasets) to something more generic like MetPyDataArray and MetPyDataset, since now our CF handling would be split between the two.
  2. Move the coordinate parsing (what's in parse_cf() and all the private helper methods) to the DataArray accessor (with a new method called parse_coordinates() or something similar?).
  3. Have parse_cf() call the new parse_coordinates() (or whatever it will be called) on the DataArray for the variable currently being parsed.

@dopplershift
Copy link
Member

@jthielen Most of that seems good in general. I think I'd like to have accessor in the class names to make it clear what they are, so maybe MetPyDataArrayAccessor and MetPyDatasetAccessor?

For the DataArray stuff, I'd like to avoid manually calling parse_coordinates() and having that as part of the public API, and instead maybe evaluate it lazily (and cache) as needed when the attributes on the MetPyDataArrayAccessor for the coordinates are used. Does that make sense? My thinking is that we did it that way because we already had a method doing all the work at once. Now that we're splitting, it would seem, to me anyway, that the better hook is on-demand on attribute access.

@jthielen
Copy link
Collaborator

jthielen commented May 9, 2019

Sounds good about the class names!

With the second point, that's a much better idea than requiring the manual call. Since right now all the coordinate access properties/methods on the DataArray accessor go through the _axis method, if I'm understanding your idea correctly, there can just be a check in that method if the coordinates have been parsed yet (check if the _metpy_axis attribute exists on any of the coordinates). If not, it can call a private _parse_coordinates() method to identify the axes. If all that sounds good, I can try getting a PR for this in next week.

@dopplershift
Copy link
Member

That sounds about right to me. A PR would be most welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Docs Affects documentation Type: Enhancement Enhancement to existing functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants