Inconsistent arguments for gradient and advection #896

sgdecker · 2018-07-24T16:17:27Z

The following code:

import numpy as np
import metpy.calc as mpcalc
from metpy.units import units

T = np.array(([1, 2, 3],
              [4, 4, 4],
              [1, 3, 7],
              [0, 2, 6])) * units.K
u = np.array(([5, 4, 4],
              [6, 6, 6],
              [6, 7, 6],
              [5, 5, 5])) * units.m/units.s
v = np.array(([4, 4, 4],
              [3, 2, 3],
              [1, 2, 3],
              [0, 1, 1])) * units.m/units.s

dx = np.array(([12, 10],
               [12, 10],
               [10, 8],
               [10, 8])) * units.m
dy = np.array(([6, 5, 5],
               [5, 5, 4],
               [5, 4, 4])) * units.m

np.set_printoptions(precision=4)

dTdy, dTdx = mpcalc.gradient(T, deltas=(dy, dx))
print(dTdx)
print(dTdy)

adv = mpcalc.advection(T, wind=(u, v), deltas=(dx, dy), dim_order='yx')
print(adv)

accurately computes the temperature gradient and temperature advection corresponding to the sample data provided in 'yx' order.

However, notice that doing this calculation properly requires providing the deltas as (dy, dx) in the gradient call, but (dx, dy) in the advection call. I would have expected the order to be the same for both calls. In particular, since the data is in yx order, I would expect the correct call to advection would include deltas=(dy, dx), but currently that is not the case.

A second inconsistency involves the dim_order argument: The gradient function doesn't use it, but omitting it from advection results in a stern warning from MetPy. I would expect:

No warning from MetPy when dim_order is omitted in the advection call, and/or
gradient to allow for the dim_order to be specified as well, interpreted the same way as in advection.

Version Info

Python 3.6.5
0.8.0+32.gb19c052

The text was updated successfully, but these errors were encountered:

sgdecker · 2018-07-25T13:21:18Z

Upon further reflection, I think the cleanest, most intuitive API for these functions would include the following constraints:

Grid storage order is an implementation detail fully encapsulated by the dim_order argument (which, if absent, means 'yx' order is assumed)
The arrays returned by these functions always use the same dim_order as specified for the input (not a change from how things currently are, but included for completeness)
The deltas argument, if supplied, always puts delta_x first, regardless of dim_order
The gradient function always returns the x-derivative first, regardless of dim_order (After all, in the classroom, the gradient is usually written in Cartesian coords as (ds/dx, ds/dy) or ds/dx i + ds/dy j.)

I think these constraints would provide the cleanest interface in a typical workflow where the user reads in some netCDF data, computes some of these diagnostics, and makes some plots. The user would not even need to worry about dim_order in this workflow; he/she would only need to know that things involving delta-x go in first, or come out first (which is the "obvious" order).

Given these criteria, the interface for advection is OK in the code example above (although at some point it makes sense to remove the warning if dim_order is omitted), and it is the interface to gradient that needs work. The proper call should be:
dTdx, dTdy = mpcalc.gradient(T, deltas=(dx, dy)) (with dim_order='yx' optionally included)

dopplershift · 2018-07-25T18:45:47Z

Thank you for the thoughtful comments digging into this, it really helps laying it all out like that. Responses to the list:

Agreed. With the next release (0.9, will be released in August), 'yx' ordering is assumed and it no longer warns if you do not specify it.
Yes
This appears to be how things operate, though the docs (at least on advection) are horribly unclear on that. Definitely need to improve the docs on this.
See below

From my perspective, the fundamental difference between advection and gradient is that advection is a physically-aware calculation, meaning it knows about different dimensions like x and y and u and v wind components, while gradient only knows that it gets N-dimensional arrays and (by default) needs to calculate N finite-difference derivative estimates along each of the N dimensions.

I agree that the semantic differences between advection and gradient are problematic, especially when trying to teach students. But let me throw another wrench into the works here: up until version 0.7, we didn't even have our own gradient function--we relied on numpy's. numpy.gradient has the same semantics as we currently have in MetPy. So if we change the order of metpy.calc.gradient to return (dx, dy, dz), are we creating new confusion?

A lot of this will hopefully get simpler when we completely support xarray, so that dx, etc. arguments can go away. I am 100% in favor of making sure MetPy is consistent and as easy to use as possible.

FYI, one work-around to the gradient confusion is to use first_derivative explicitly for each of the desired dimensions.

sgdecker · 2018-07-27T13:30:22Z

I am confused by your reponse to point 3, since in my example the call to gradient requires supplying the deltas in the opposite order to what I suggest is intuitive. This is in my mind the messiest part of the current situation, where the deltas argument has to be supplied in two different orders in my example code for the correct calculations. But it sounds like that may disappear with full xarray support.

Regarding point 4, maybe gradient should become _gradient (for internal use by other functions), and a new gradient that always returns the x-derivative first should be what appears in the public API. But I agree that it may be better to wait till the xarray support is complete to revisit this.

dopplershift · 2018-07-31T15:52:31Z

Sorry, for (3), deltas are x-first, except for gradient and laplacian.

Yes, xarray should make all of this moot in general, though I'd like things to function even if you're not using it. So we may want to make some changes in the future.

Thanks for all of the thoughtful feedback!

sgdecker · 2019-01-06T21:00:21Z

An added note that laplacian() also has the confusing ordering for the deltas argument. If you think about what's happening under the hood, I guess it makes sense, but fragments like:

# 700-hPa Temperature Advection
tadv_700 = mpcalc.advection(tmpk_700s, (uwnd_700s, vwnd_700s), (dx, dy),
                            dim_order='yx').to_base_units()
# Laplacian of Temperature Advection
lap_tadv_700 = mpcalc.laplacian(tadv_700, deltas=(dy, dx))

just aren't intuitive.

dopplershift added Type: Maintenance Updates and clean ups (but not wrong) Area: Calc Pertains to calculations Type: API Change Changes to how existing functionality works labels Aug 2, 2018

jthielen mentioned this issue Sep 3, 2020

Improved xarray compatibility on function input and output #1490

Merged

6 tasks

dopplershift closed this as completed in #1490 Sep 30, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent arguments for gradient and advection #896

Inconsistent arguments for gradient and advection #896

sgdecker commented Jul 24, 2018

sgdecker commented Jul 25, 2018 •

edited

Loading

dopplershift commented Jul 25, 2018

sgdecker commented Jul 27, 2018

dopplershift commented Jul 31, 2018

sgdecker commented Jan 6, 2019

Inconsistent arguments for gradient and advection #896

Inconsistent arguments for gradient and advection #896

Comments

sgdecker commented Jul 24, 2018

Version Info

sgdecker commented Jul 25, 2018 • edited Loading

dopplershift commented Jul 25, 2018

sgdecker commented Jul 27, 2018

dopplershift commented Jul 31, 2018

sgdecker commented Jan 6, 2019

sgdecker commented Jul 25, 2018 •

edited

Loading