-
Notifications
You must be signed in to change notification settings - Fork 19
ENH: Added conservative interpolation approach #18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2aabdec to
47684e3
Compare
stratify/__init__.py
Outdated
| EXTRAPOLATE_NAN, EXTRAPOLATE_NEAREST, | ||
| EXTRAPOLATE_LINEAR, PyFuncExtrapolator, | ||
| PyFuncInterpolator) | ||
| from ._bounded_vinterp import interpolate as interpolate_conservative |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
F401 '._bounded_vinterp.interpolate as interpolate_conservative' imported but unused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tell flake8 to ignore it with a # noqa type comment in the code. Ref: http://flake8.pycqa.org/en/3.2.1/user/ignoring-errors.html#in-line-ignoring-errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @pelson
47684e3 to
08239e2
Compare
|
Sorry iterating through issues exposed with travis:
|
|
Tests now pass for python3 (I'll run these locally in a py3 environment in future). Ready for review, really this time. Cheers |
stratify/_conservative.pyx
Outdated
| # |------------| : Target | ||
| # |----------| : Delta (src_max - src_min) | ||
| # |----| : Overlap (between src & tgt) | ||
| # weight = overlap / delta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this diagram - it provides the key concept and several definitions very succinctly.
| if z_src.ndim != z_target.ndim: | ||
| msg = ('Expecting source and target levels dimensionality to be ' | ||
| 'identical. {} != {}.') | ||
| raise ValueError(msg.format(z_src.ndim, z_target.ndim)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this (and following assertions) be more transparent as:
('Expecting dimensionality of source levels {} and target levels {} to be '
'identical.')
It's just moving the value of the "thing" to where the definition of the "thing" is.
| cdef np.float64_t delta, weight | ||
| cdef np.ndarray[np.float64_t, ndim = 2] weights | ||
| cdef np.ndarray[np.float64_t, ndim = 1] src_cell, tgt_cell | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be cython convention that I'm not familiar with, but I'd expect these to be "ndim=2" without spaces? Similarly for lines 35 and 36.
| PyFuncInterpolator) | ||
| from ._bounded_vinterp import (interpolate as # noqa: F401 | ||
| interpolate_conservative) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be simplified by renaming _bounded_vinterp.interpolate as _bounded_vinterp.interpolate_conservative?
|
|
||
|
|
||
| def interpolate(z_target, z_src, fz_src, axis=-1): | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've suggested renaming this in another comment. If you opt to retain the existing name, I'd suggest adding that the interpolation is conservative to this docstring?
|
Thanks @cpelley, this looks good to me. There's a few suggestions above - but I want to be clear that they are just suggestions for some potential fine tuning around the edges. |
|
Thanks @hdyson . Made all changes except the exception messages. I felt that it was clearer especially in some cases and so opted for consistency. Example: Here '-' is used to denote the dimensions which are not relevant to that statement. Not so clear if pulled into the sentence itself. |
pelson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. I need to carry on reading from _conservative.pyx.
| invert_transpose = [data_transpose.index(ind) for ind in | ||
| list(range(result.ndim))] | ||
| result = result.transpose(invert_transpose) | ||
| return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments:
- Looks like this could be a class with behaviour (
conservative_interpolation). - Very similar code for the non-bounded case. Would be interesting to pull this all out into common functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
I saw these two points as being connected. I think a refactor to support caching would entail pulling both existing and conservative capability into classes. I didn't want to pre-define a structure before considering this refactor in that wider context of caching (it was easier to use a proxy to the existing API for now). With regards to common functions, as far as I can tell. the existing stratify usage is rather hardcoded to a points based approach right now.
In the refactor to support caching, I hope to have a similar mindset to what I have done here, in that we pull out the code which needn't be cython code into a python module (inc. enduser API), allowing greater flexibility and functionality to be more readily shared between points-based and bounds-based approaches. Are you happy with this mindset?
Cheers
| @@ -0,0 +1,55 @@ | |||
| import numpy as np | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No docstrings 😱
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Phil, just added (see latest commit)
I have taken a numpy doc style approach in the python module but followed the same style forming as the existing cython module for the new conservative cython module. Let me know what you think.
Cheers
pelson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy from a maintainability, plumbing and testing perspective.
I'd be interested in seeing the 0 extrapolation behaviour changed though. Thoughts?
| source_bounds, | ||
| self.data, axis=1) | ||
| target_data = np.ones((4, 7)) | ||
| target_data[:, 0] = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 0 and not some other extrapolation scheme?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels to me like 'nan' (as you say) is more appropriate here. If no source points are contributing to a target cell then there is no data so it should be 'nan'.
I can't think why I chose 0 in the first place and since @hdyson didn't pick up on this either, @hdyson is there a reason that you can remember why 0 seemed to make sense at the time but doesn't now? :)
Cheers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's going back a while. From memory (so pinch of salt and all that), I seem to recall we discussed this on the phone, but I don't remember the reasons for and against 0.
That said, this behaviour is consistent with the approach we implemented for the CMIP6 aerosols (and was in turn copied from an earlier IDL routine), so it could simply be for consistency with legacy applications. Hope that helps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hdyson I'll change to 'NaN' then, we can always post-process the returned array if it proves to be necessary to the UM from such ancillaries.
|
|
||
|
|
||
| if __name__ == '__main__': | ||
| unittest.main() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't see any non-equally spaced coordinate tests.
|
Also, could you add a check for data values being NaN too, please? |
Yep, will do. Cheers |
|
Hey @pelson, thanks for your patience.
|
pelson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
Thanks @cpelley!
| source_bounds, data, | ||
| axis=1) | ||
| target_data = np.array( | ||
| [1/1.5, 1 + ((1/3.)/1), 0.25, 0.25, 0.25, 0.25, 1.])[None] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wowzers. This messes with my mind 😄 .
Looks good though.
|
Thanks for the great review @hdyson. Really appreciated you helping to shape this PR into this really great addition. 👍 |
|
Thanks both, appreciated. |
stratify._conservative.pyx) and a python module for where it doesn't (stratify._bounded_vinterp). Cython code can be further optimised in future (no bounds checking etc.) but intentionally not done here.