All-lazy statistics #2418
Comments
Providing a Lazy-Only-Aggregator is really pretty easy. However, the way the real/lazy split is handled explicitly in existing cube operations is currently rather messy, and properly simplifying all that might be rather more work. N.B. there are just 3 lazy cube-stats operations at present.
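For illustration, a minimal sketch of the kind of lazy-capable aggregator being discussed, assuming iris.analysis.Aggregator's lazy_func keyword; the name CUSTOM_MAX and the np.max/da.max pairing are placeholders, not anything from the issue.

```python
# Minimal sketch (not from the issue): a custom aggregator with both a real
# and a lazy implementation, via iris.analysis.Aggregator's lazy_func keyword.
import numpy as np
import dask.array as da
from iris.analysis import Aggregator

CUSTOM_MAX = Aggregator(
    "maximum",         # cell method recorded on the result
    np.max,            # used when the cube data is realised (numpy)
    lazy_func=da.max,  # used when the cube data is lazy (dask)
)

# e.g. cube.collapsed("time", CUSTOM_MAX) keeps the result lazy if `cube` is lazy.
```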
@pp-mo surely this has been completed by now?
Absolutely not: it would ideally mean that
I believe I'm up for doing this, but it isn't clear which parts remain. @pp-mo, could you produce a list of the things that, when done, would allow us to close this issue?
If this were done, would the operators still accept user-defined non-lazy aggregators?
It would be relatively easy to turn ufunc-type aggregators into lazy ones if it weren't done out of the box (though I'd expect to be able to pass non-lazy aggregators and have iris defer the call to the aggregator until the correct point). Do you have concerns for aggregators similar to
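As a rough, non-iris illustration of the "ufunc-type aggregator" point above: one way to lazify a plain numpy reduction is to rechunk the collapsed axis into a single chunk and map the numpy function over blocks. The helper name and the np.ptp example are hypothetical.

```python
# Sketch only: wrap a numpy-style reduction (func(data, axis=...)) so it runs
# lazily on a dask array.  Rechunk the reduced axis to one chunk, then apply
# the numpy function per block and drop that axis from the output.
import numpy as np
import dask.array as da

def lazify_reduction(np_func):
    def lazy_func(data, axis=-1, **kwargs):
        axis = axis % data.ndim
        # Make the reduced axis a single chunk so each block sees the full axis.
        data = data.rechunk({axis: data.shape[axis]})
        return data.map_blocks(
            np_func, axis=axis, drop_axis=axis, dtype=data.dtype, **kwargs
        )
    return lazy_func

lazy_ptp = lazify_reduction(np.ptp)                 # hypothetical example
arr = da.random.random((4, 1000), chunks=(2, 250))
print(lazy_ptp(arr, axis=1).compute().shape)        # (4,)
```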
It's not something I've done recently, but I have written my own aggregators in the past. Most obvious example is the
Even for the most straightforward cases where a Dask equivalent to a numpy/scipy function exists, there is a convenience in just grabbing the numpy/scipy version and creating an aggregator from it. If a user has a relatively small data set, learning about Dask and its functions seems like unnecessary overhead.
Agreed. Ideally we would be able to take any aggregator and make it lazy (though not out of core / parallelised).
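A sketch of that "lazy but not out of core / parallelised" idea, under the assumption that deferring the whole call with dask.delayed is acceptable; the names here are illustrative, not an existing iris API.

```python
# Sketch only: defer an arbitrary numpy-based aggregator so nothing runs until
# compute(), but the fully realised array is passed to it in one go
# (lazy, not out-of-core, not parallelised).
import numpy as np
import dask
import dask.array as da

def defer_aggregator(np_func, data, axis, **kwargs):
    # dask.delayed computes any dask-array arguments before calling np_func.
    delayed = dask.delayed(np_func)(data, axis=axis, **kwargs)
    out_shape = tuple(s for i, s in enumerate(data.shape) if i != axis)
    return da.from_delayed(delayed, shape=out_shape, dtype=data.dtype)

lazy_data = da.random.random((10, 20), chunks=(5, 20))
result = defer_aggregator(np.median, lazy_data, axis=0)  # still lazy here
print(result.compute().shape)                            # (20,)
```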
Added a summary list of things to do (edited into the top description box).
I believe implementing this would fix #3190. |
Ideally, make all stats calculate via dask, instead of requiring an alternative 'real data' algorithm.

Wishlist:

- aggregated_by should not realise the source cube (Can we keep aggregation input cube lazy? #2928)
- collapse, aggregated_by + rolling_window are the only existing stats methods
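For reference, a quick way to check which of these operations currently keep data lazy (a sketch only: the file path is made up and the behaviour depends on the iris version).

```python
# Sketch: load lazily from netCDF, apply a statistic, and inspect laziness
# before anything is computed.  "air_temperature.nc" is a hypothetical path.
import iris
import iris.analysis

cube = iris.load_cube("air_temperature.nc")
print(cube.has_lazy_data())        # True: netCDF data loads lazily

result = cube.collapsed("time", iris.analysis.MEAN)
print(result.has_lazy_data())      # True where the aggregator provides a lazy_func
```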