All-lazy statistics #2418
Comments
Providing a Lazy-Only-Aggregator is really pretty easy. However, the way the real/lazy split is handled explicitly in existing cube operations is currently rather messy, and properly simplifying all that might be rather more work. N.B. there are just 3 lazy cube-stats operations at present.
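For illustration, a minimal sketch of the kind of lazy-capable aggregator being discussed, assuming iris.analysis.Aggregator's lazy_func keyword; the name CUSTOM_MAX and the np.max/da.max pairing are placeholders, not anything from the issue.

```python
# Minimal sketch (not from the issue): a custom aggregator with both a real
# and a lazy implementation, via iris.analysis.Aggregator's lazy_func keyword.
import numpy as np
import dask.array as da
from iris.analysis import Aggregator

CUSTOM_MAX = Aggregator(
    "maximum",         # cell method recorded on the result
    np.max,            # used when the cube data is realised (numpy)
    lazy_func=da.max,  # used when the cube data is lazy (dask)
)

# e.g. cube.collapsed("time", CUSTOM_MAX) keeps the result lazy if `cube` is lazy.
```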
@pp-mo surely this has been completed by now?
Absolutely not: it would ideally mean that
I believe I'm up for doing this, but it isn't clear which parts remain. @pp-mo, could you produce a list of the things that, when done, would allow us to close this issue?
If this were done, would the operators still accept user-defined non-lazy aggregators?
It would be relatively easy to turn ufunc-type aggregators into lazy ones if it weren't done out of the box (though I'd expect to be able to pass non-lazy aggregators and have iris defer the call to the aggregator until the correct point). Do you have concerns for aggregators similar to
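As a rough, non-iris illustration of the "ufunc-type aggregator" point above: one way to lazify a plain numpy reduction is to rechunk the collapsed axis into a single chunk and map the numpy function over blocks. The helper name and the np.ptp example are hypothetical.

```python
# Sketch only: wrap a numpy-style reduction (func(data, axis=...)) so it runs
# lazily on a dask array.  Rechunk the reduced axis to one chunk, then apply
# the numpy function per block and drop that axis from the output.
import numpy as np
import dask.array as da

def lazify_reduction(np_func):
    def lazy_func(data, axis=-1, **kwargs):
        axis = axis % data.ndim
        # Make the reduced axis a single chunk so each block sees the full axis.
        data = data.rechunk({axis: data.shape[axis]})
        return data.map_blocks(
            np_func, axis=axis, drop_axis=axis, dtype=data.dtype, **kwargs
        )
    return lazy_func

lazy_ptp = lazify_reduction(np.ptp)                 # hypothetical example
arr = da.random.random((4, 1000), chunks=(2, 250))
print(lazy_ptp(arr, axis=1).compute().shape)        # (4,)
```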
It's not something I've done recently, but I have written my own aggregators in the past. Most obvious example is the
Even for the most straightforward cases where a Dask equivalent to a numpy/scipy function exists, there is a convenience in just grabbing the numpy/scipy version and creating an aggregator from it. If a user has a relatively small data set, learning about Dask and its functions seems like unnecessary overhead.
Agreed. Ideally we would be able to take any aggregator and make it lazy (though not out of core / parallelised).
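A sketch of that "lazy but not out of core / parallelised" idea, under the assumption that deferring the whole call with dask.delayed is acceptable; the names here are illustrative, not an existing iris API.

```python
# Sketch only: defer an arbitrary numpy-based aggregator so nothing runs until
# compute(), but the fully realised array is passed to it in one go
# (lazy, not out-of-core, not parallelised).
import numpy as np
import dask
import dask.array as da

def defer_aggregator(np_func, data, axis, **kwargs):
    # dask.delayed computes any dask-array arguments before calling np_func.
    delayed = dask.delayed(np_func)(data, axis=axis, **kwargs)
    out_shape = tuple(s for i, s in enumerate(data.shape) if i != axis)
    return da.from_delayed(delayed, shape=out_shape, dtype=data.dtype)

lazy_data = da.random.random((10, 20), chunks=(5, 20))
result = defer_aggregator(np.median, lazy_data, axis=0)  # still lazy here
print(result.compute().shape)                            # (20,)
```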
Added a summary list of things to do (edited into the top description box).
I believe implementing this would fix #3190. |
Ideally, make all stats calculate via dask, instead of requiring an alternative 'real data' algorithm.

Wishlist:

- aggregated_by should not realise the source cube (Can we keep aggregation input cube lazy? #2928)
- collapse, aggregated_by + rolling_window are the only existing stats methods
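For reference, a quick way to check which of these operations currently keep data lazy (a sketch only: the file path is made up and the behaviour depends on the iris version).

```python
# Sketch: load lazily from netCDF, apply a statistic, and inspect laziness
# before anything is computed.  "air_temperature.nc" is a hypothetical path.
import iris
import iris.analysis

cube = iris.load_cube("air_temperature.nc")
print(cube.has_lazy_data())        # True: netCDF data loads lazily

result = cube.collapsed("time", iris.analysis.MEAN)
print(result.has_lazy_data())      # True where the aggregator provides a lazy_func
```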