Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Enhanced xarray function compatibility on input/output #1304

Closed
wants to merge 4 commits into from
Closed

WIP: Enhanced xarray function compatibility on input/output #1304

wants to merge 4 commits into from

Conversation

jthielen
Copy link
Collaborator

Description Of Changes

This PR utilizes the wrapper of #1223 across the library where it naturally fit, as well allowing broadcasting of xarray elements in the xarray preprocessing decorator. It also has an attempt at filling in kinematics grid parameters (however, there are still some errors in the code).

Checklist

  • Closes #xxxx
  • Tests added
  • Fully documented

@dopplershift dopplershift added this to the 1.0 milestone Jan 14, 2020
@jthielen
Copy link
Collaborator Author

jthielen commented Feb 3, 2020

@dopplershift I'm just now getting the chance to get back to this, and after running into some ...interesting... issues combining xarray, pint, and MetPy in current versions, I wanted to ask about what the target "xarray data structure" should be in v1.0 to guide the effort here. Should it be

  • A DataArray with an ndarray inside, and units on the DataArray attribute?
  • A DataArray with a pint Quantity inside?
    • Requires newer xarray (at least 0.13) and pint (at least 0.10)
    • The more "future-proof" option (I think)
    • However, tests that this works across the xarray API are still considered WIP (see support for units with pint pydata/xarray#3594)
    • For end-user ease-of-use, may need re-implementation of/dependency on/vendoring of pint-xarray-type features that MetPy is missing

@dopplershift
Copy link
Member

@jthielen 😭 The pain of being at the cutting edge.

First question: Is it possible that both are supported?

I'd like to get to the second, but that seems like a pretty big reach today (feel free to correct me if I'm wrong). I'm ok with updating to the latest versions; I have concerns about how much back-porting might be required.

@jthielen
Copy link
Collaborator Author

jthielen commented Feb 5, 2020

First question: Is it possible that both are supported?

Now that you say that, yes, it should be technically possible to support both, they would just become separate options in the wrap_output_like helper.

However, I think it would be rather confusing for end users to have two different kinds of DataArray output from MetPy functions that they would have to interact with differently. For instance, to apply an offset to, say, potential temperature, the first option would look something like

theta = potential_temperature(pressure, temperature)
theta.metpy.unit_array = theta_0.metpy.unit_array + 5 * units.K

(in general, lots of reliance on MetPy's accessor and explicitly handling magnitudes/units, and easy to mess up if you try multiplying a DataArray by something with a unit)

and the second like

theta = potential_temperature(pressure, temperature) + 5 * units.K

(more natural, can be easily conceptualized as a "Quantity with coordinates on top")

Things could also go quite wrong if these two are mixed outside of the relative safety of MetPy's calculation wrappers (e.g., units being dropped from the units-on-attribute DataArray, but not the Quantity-DataArray).

I'd like to get to the second, but that seems like a pretty big reach today (feel free to correct me if I'm wrong). I'm ok with updating to the latest versions; I have concerns about how much back-porting might be required.

I'm glad to hear the versions aren't an issue! Although, I'm not quite sure of what you exactly mean by "pretty big reach." Is it effort in implementation? Reliance on yet-to-be-finalized upstream functionality? Impacts on documentation/downstream code/training materials? Something else?

Implementation-wise, I really don't think it's that big of a deal. Since (in the philosophy of this PR at least) most of the calculation internals remain operating on Quantities, preferring the second option over the first is a simple switch in how the wrapper reconstructs the DataArray (and may even make using DataArrays internally in the calculations easier down the road). The "pint-xarray-type" quantification/dequantification helpers should also be easy to add to the accessors.

While I've not run into any issues recently with having Quantities-in-DataArrays in my own tests (with the exclusion of coordinates/indexes of course), I'm a bit nervous about relying on the technically-WIP compatibility in production code. However, the alternative (how it is now) definitely feels like an ugly API to commit to.

Switching to just using the second would no doubt require adjustments elsewhere. However, I think they would mostly be clean-ups to the unit-compatibility hacks that are required now? Though, any place where da.data is expected to be an ndarray, or da.units/da.attrs['units'] is expected to be available/needing to be set, however, would need to be changed (I think this is mostly in declaritive plotting, cross sections, and derivatives at the moment). Regardless, this likely an easy-to-moderate difficultly task if we standardize around the second option, but could be more complex if needing to support both options.


I guess a lot of this is rambling to say that:

  • Standardizing on the second option alone seems ideally the best, but feels risky due to WIP upstream support
  • Having both is most adaptable, but also most technically complex, and likely confusing to users
  • Standardizing on the first option alone has inertia from the past going for it, but feels a shame to abandon the recent NEP-18 niceties, and may still be confusing for users.

@dopplershift
Copy link
Member

@jthielen I see what you're saying. I think it's clear that the second is the most desirable option, and I agree having two similar, but different, paths is not a good user experience.

As I understand this, the main challenges we're looking at relying on some WIP support upstream is that things could:

  1. Require us to update our wrapper
  2. End up being buggy

All we really need right now is to commit to an interface for our users. It sounds to me like we have that, it just might be a slightly rough road. Worst case, people need to drop back to numpy arrays if they run into bugs, right?

@jthielen
Copy link
Collaborator Author

jthielen commented Feb 7, 2020

All we really need right now is to commit to an interface for our users. It sounds to me like we have that, it just might be a slightly rough road. Worst case, people need to drop back to numpy arrays if they run into bugs, right?

Sounds good, and I agree. I'll move forward with the Quantity-inside-DataArray option.

@jthielen
Copy link
Collaborator Author

Closing in favor of #1353.

@jthielen jthielen closed this Apr 16, 2020
@dopplershift dopplershift removed this from the 1.0 milestone Apr 17, 2020
jthielen added a commit to jthielen/MetPy that referenced this pull request Jul 27, 2020
jthielen added a commit to jthielen/MetPy that referenced this pull request Sep 3, 2020
jthielen added a commit to jthielen/MetPy that referenced this pull request Sep 16, 2020
jthielen added a commit to jthielen/MetPy that referenced this pull request Sep 23, 2020
jthielen added a commit to jthielen/MetPy that referenced this pull request Sep 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants