Enable custom statistics to return multiple results #3904
Comments
So, if I've understood, you start with a cube that is (time, latitude, longitude), and you want to end up with a cube that is (durations, latitude, longitude), having done your calculation over time at each grid point. The problem is that the standard iris We do have the PercentileAggregator class, which has the capacity to add a "percent" dimension if you want to calculate more than one percentile. So we know that it is possible to add dimensions. That class is hard-coded to calculate percentiles though so, if you wanted to make use of it to calculate some other dimension-adding statistic, I think you'd need to subclass it. It also isn't even listed in the docs. So possibly what we need here is to generalise |
Having said that, this particular statistic presumably needs information from the time coordinate. I think all the existing aggregation calculations only use the cube data. 🤔 |
The threshold exceedance duration can do without information from the time coordinate for the time being. The PercentileAggregator would deliver what I expected, for starters, to be an easy operation. For a later generalization, a more complex combination of metadata is a possibility, but that can wait. |
Perhaps it is easier if the shape of the tuple to be returned is set at the beginning. I.e. it could be the list of linear regression coefficients, or the first 4 moments of a normal distribution, or the list of percentiles as in the PercentileAggregator, or a list of durations in time units. |
@rcomer Fancy taking this on? |
Hey @bjlittle, sorry I think I'd struggle to justify time on this one. My PRs generally fall into two categories:
While this one doesn't look huge, it looks like more than a 5 min job. |
While digging to find something else, I noticed that Here be dragons. |
Note that #3901 also makes changes to the percentile aggregator, so it may be better to wait until that is resolved before starting work on this. Otherwise we could create some nasty code conflicts. |
Hi @berndbecker, sorry for the delay on this - it's both difficult and slightly niche! Is it still something you'd be interested in seeing in Iris? If you think others would also be interested, we encourage you and them to try out the new voting feature. |
Hi Martin,
Nice to hear from you! This feature request fits with others working on threshold exceedance, percentiles, etc.
So much functionality is nearly there that it could be very rewarding, with some effort, to make this happen.
Albeit, for now, I am working on clustering on single-point time series.
Dismantling a cube to a single time series, running the clustering and reassembling a cube from the single-point results is painful and fraught with error.
Having the facility described in #3904 would come in handy here as well.
People are shouting out for something similar here as well:
https://web.yammer.com/main/threads/eyJfdHlwZSI6IlRocmVhZCIsImlkIjoiMTYyODUwMzA0OTkwNDEyOCJ9?search=aggregator&groupScope=eyJfdHlwZSI6Ikdyb3VwIiwiaWQiOiIxMDU5MjUyMCJ9
All the best,
Bernd.
|
@wjbenfold has #4676 to implement an aggregator for number of days of data matching certain criteria (e.g. above a threshold), which I think addresses that Yammer thread. However, it would only handle a single threshold value at a time I think. |
I'm currently intending that it can handle being between two thresholds (or any other criterion you can write as a lambda) but only one condition at a time, yes |
I just changed the title to something a bit more general.
From an efficiency point of view, it is always possible to make multiple statistical cubes, and use the CubeList.realise_data method to efficiently calculate multiple statistics over the same data. |
✨ Feature Request
Make custom statistic return a tuple rather than a scalar.
Mission: store a vector of threshold exceedances of increasing duration at each grid point in lieu of the time domain (much shorter).
In the example
https://scitools-iris.readthedocs.io/en/stable/generated/gallery/general/plot_custom_aggregation.html
a single number is returned at each gridpoint. I am after functionality that returns more than one value for each grid point.
Motivation
Not sure if this is an issue, but I have colleagues who calculated threshold exceedance durations at great pains. Feedback on my request from an AVD surgery also pointed to heightened frustration as to how complicated "this" is. By "this" I mean doing something on a time series stored at each grid point (3-D cube) and retaining a set of numbers, rather than collapsing the time dimension to just one (max, min, mean) number.
I'm always frustrated when something is almost doable but does not quite work and you have to go all the way back and do it with a sledgehammer.
Additional context
I need a push to understand custom statistics better. In the attached example
(run with module load scitools/experimental-current,
python /net/home/h02/frtm/prog/wcssp/wcssp5/scripts/ts_exceedance.py)
I am compiling a threshold exceedance duration, or survival function, for rainfall time series: how many rainy periods lasted longer than 1, 2, ..., 5 days, and so on.
This works for a demonstrator on a single time series.
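To make the single-series statistic concrete, here is a minimal pure-Python sketch of the idea. It is not the attached ts_exceedance.py script; the function names, the sample rainfall values and the 1.0 threshold are all made up for illustration.

```python
from itertools import groupby


def wet_spell_lengths(series, threshold):
    """Return the length of each run of consecutive values above threshold."""
    return [
        sum(1 for _ in run)
        for is_wet, run in groupby(series, key=lambda value: value > threshold)
        if is_wet
    ]


def exceedance_counts(series, threshold, durations):
    """For each duration d, count the spells that lasted longer than d steps."""
    spells = wet_spell_lengths(series, threshold)
    return [sum(1 for length in spells if length > d) for d in durations]


# Hypothetical daily rainfall: wet spells of length 2, 3 and 1.
rain = [0.0, 2.1, 3.4, 0.0, 5.0, 6.2, 7.1, 0.2, 1.5, 0.0]
counts = exceedance_counts(rain, threshold=1.0, durations=[0, 1, 2, 3])
print(counts)  # [3, 2, 1, 0]
```

The result is a vector with one entry per requested duration, which is the "tuple rather than a scalar" that the request is about.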
Next I would like to run the same custom statistic at each grid point as in
https://scitools.org.uk/iris/docs/latest/examples/General/custom_aggregation.html#general-custom-aggregation
But I struggle to understand the shape of the data being passed to the aggregator: what should axis be?
And I have no idea how to store the survivors vector in place of the time series dimension.
But I am convinced it is not really that difficult.
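As a rough illustration of the shape question, the sketch below uses NumPy only and is not an Iris aggregator. It assumes the function is handed the full data array plus the index of the dimension to collapse (as the call function in the linked gallery example is), and it returns an array whose leading dimension is the list of durations. All names are hypothetical, masked data is not handled, and wiring this into Iris (for instance via a subclass of PercentileAggregator, as suggested in the comments) is not shown.

```python
import numpy as np


def exceedance_counts_1d(series, threshold, durations):
    """Count, for each duration d, the spells above threshold longer than d."""
    wet = np.asarray(series) > threshold
    # Pad with False so every spell has both a start edge and an end edge.
    padded = np.concatenate(([False], wet, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    lengths = ends - starts
    return np.array([(lengths > d).sum() for d in durations])


def exceedance_counts_nd(data, axis, threshold, durations):
    """Apply the 1-D statistic along `axis`; durations become dimension 0."""
    if axis < 0:
        axis += data.ndim
    # apply_along_axis replaces `axis` with the shape of the 1-D result.
    result = np.apply_along_axis(
        exceedance_counts_1d, axis, data, threshold, durations
    )
    return np.moveaxis(result, axis, 0)


# Hypothetical (time, latitude, longitude) data: one wet grid point, rest dry.
grid = np.zeros((10, 2, 3))
grid[:, 0, 0] = [0.0, 2.1, 3.4, 0.0, 5.0, 6.2, 7.1, 0.2, 1.5, 0.0]
grid_counts = exceedance_counts_nd(grid, axis=0, threshold=1.0, durations=[0, 1, 2, 3])
print(grid_counts.shape)  # (4, 2, 3)
```

Here a (time, latitude, longitude) input becomes a (durations, latitude, longitude) output. np.apply_along_axis loops in Python over every grid point, so this shows the shapes involved rather than an efficient implementation.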
See here for further details.