Time points for daily climatology are not intuitive #4098

ehogan · 2021-04-14T11:23:17Z

📰 Custom Issue

Hello :)

I have a cube containing 4 full years of daily data. Using aggregated_by, I aggregate the cube using a "day" and "month" coordinate. This produces a daily climatology with time points that start in July 🤯 (the bounds are as expected).

Mathematically, this makes sense, since the mean of, e.g. (Jan 1, 2000), (Jan 1, 2001), (Jan 1, 2002) and (Jan 1, 2003) is in July 2001.

However, scientifically, this doesn't make sense, since a scientist would expect the first time point of a daily climatology created as above to be Jan 1.

The CF conventions state (at the end of paragraph 2 in Section 7.4): "The time coordinates should be values that are representative of the climatological time intervals, such that an application which does not recognise climatological time will nonetheless be able to make a reasonable interpretation.". I'm not sure this is happening in this case.

Would it be better to set the time points in the climatology file such that they are more representative of the climatological time intervals?

The text was updated successfully, but these errors were encountered:

rcomer · 2021-04-14T21:45:48Z

Related: #1422

ehogan · 2021-04-15T06:49:12Z

Thanks @rcomer; I did search for previous issues but apparently not very well! 😜

rcomer · 2021-08-25T17:58:40Z

Hi @ehogan, would the ability to tell it to use the max or min point help? #4295

ehogan · 2021-08-26T08:47:49Z

Thanks @rcomer; our workaround currently involves resetting the time points in the climatology cube with the time points from the first year from the input cube, so using the min point would help! :)

trexfeathers · 2021-08-26T08:50:21Z

I was thinking about this yesterday. Given Coord has the climatological attribute, this seems like a reasonably 'soluble' problem. Not sure if I have enough time though, we'll see.

trexfeathers · 2021-10-13T12:17:35Z

I've taken a brief look this morning, and here is the offending line:

iris/lib/iris/analysis/__init__.py

Line 2275 in 0af0c8d

new_points = new_bounds.mean(-1)

I had originally imagined adding a climatological parameter to Cube.aggregated_by(), but this would need to be passed down through several call 'layers' before it reached the above line. So it might instead be better to add code determining that the precursor coordinate in question looks climatological (covers multiple years but only one point within each year?), and to make sure the user expects the different behaviour via a clear docstring.

rcomer · 2021-10-13T13:13:31Z

covers multiple years but only one point within each year?

I think just "covers multiple years" would do it. I might, for example, start off with monthly means and then aggregate by season to make spring/summer/autumn/winter climatologies, each containing 3 points from each year.

IMO, what is done for aggregated_by points should also be done for collapsed points, so there is another offending line here:

iris/lib/iris/coords.py

Line 2009 in 0af0c8d

points = al.array(bounds.sum(axis=-1) * 0.5, dtype=self.dtype)

rcomer · 2021-10-13T13:34:27Z

Or if you want to make it opt-in but don't want to pass the parameter down the "plumbing", could use a context manager instead?

rcomer · 2021-11-03T10:10:01Z

If the plan is to use the upper or lower bound as the point, it may also be worth considering #3456.

Recent discussion on Met Office Yammer indicates that climatologies loaded from pp-files have the points matching the upper bound, which does cause problems in coord-categorisation. I understand that @lbdreyer may be looking into that issue, but it would be good to make the outcome of this consistent with the outcome of that I think.

rcomer · 2022-03-28T12:55:06Z

See also #4665, which raises the same problem for monthly climatologies.

mokhodge · 2022-03-28T14:30:33Z

I also have the same bug when calculating a monthly standard deviation monthly_std = cube.aggregated_by('month_number', iris.analysis.STD_DEV)

wjbenfold · 2022-03-30T15:41:35Z

In discussion earlier, @rcomer, @jamesp and I concluded that the main blocker to fixing this is a decision as to what the behaviour should be. Broadly, we shouldn't be guessing what the user wants unless they asked us to.

An option would be to ask the user to choose / make them choose. I'm aware that this is similar to forcing them to fix it afterwards, but it should make that process quicker (and we could leave the current default in place as it's probably not a bad common sense choice when we're not working with a climatology).

Would that work for you @ehogan @mokhodge? Or is there better logic that you're aware of that we could use to derive a default that's reasonable in more circumstances?

n.b. #4295 showed an implementation of the above suggestion.

ehogan · 2022-03-31T15:30:29Z

Thanks @wjbenfold, @rcomer, @jamesp! I would be happy to specify the required behaviour when creating the climatology. Would it be possible to include a climatological example in the documentation that demonstrates how to request this behaviour, please? :)

tkcollier · 2022-04-01T13:35:16Z

@wjbenfold , @rcomer , I have found this behaviour to be quite inconvenient, but I do deal with climatologies a lot. It was worse when I was new to Iris because it was only further down the line, when trying to complete another process with my data I realised the coordinate had been effectively reversed (dec was june etc).
It sounds good to put in an option to fix this. Would it also be possible to put in a warning message that let's users know that when they use this function this can happen if they don't choose otherwise?

wjbenfold · 2022-04-01T15:59:36Z

Would it also be possible to put in a warning message that lets users know that when they use this function this can happen if they don't choose otherwise?

Hi @tkcollier, I'm not sure this is a good route to take (though open to arguments to the contrary). When we add more warnings, people become less sensitised to warnings, and for a lot of people (everyone using aggregated_by on something that isn't a climatology) there will be a warning here that doesn't matter to them.

I'd hope that adding an argument to the function that lets you choose how to overcome this (with associated information in the docstring) and an example in the docs (as @ehogan requested above) would allow someone hitting this issue to follow a chain of discovery something along the lines of

"that's odd" -> checks docs for function -> learns how to fix it -> makes fix

without causing warning fatigue in those for whom it isn't relevant?

rcomer · 2022-04-01T16:32:32Z

I think @tkcollier has a good point that you don't want to find out that things are odd after you've done a lot of processing. You could conceivably have developed your code against a data set with an odd number of years - in that case the current behaviour would give you something that looked sensible in your time coordinate. Then you add a year to your data set and only now realise something is wrong.

So I think I'm leaning back toward's @trexfeathers' idea of changing the default behaviour if the operation "looks climatological". It could be configurable as well, of course.

We could add in a warning that is only thrown when the data "looks climatological", which would avoid the problem of clogging up the stderr when the warning is irrelevant. But that's basically admitting that the default behaviour isn't helpful...

mokhodge · 2022-04-04T08:00:16Z

@wjbenfold @rcomer @jamesp Specifying an opition would work for me. If it is monthly climatology then it would be fine in most use cases to just return the month time points e.g. 01, 02, 03... and the same for daily rather then providing the whole DD-MM-YYYY string.

wjbenfold · 2022-04-04T10:25:13Z

if the operation "looks climatological"

@rcomer I'm a little bit worried that guessing this would lead to hard to diagnose errors, but maybe just looking for non-contiguous ranges would do it.

just return the month time points e.g. 01, 02, 03...

@mokhodge I'm not sure what you mean here? Do you mean having a time coordinate that just gives a month but no year (and if so, how would that interact with the bounds, which have day, month and year)?

I went digging in the CF conventions some more (https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#climatological-statistics) and found the examples. Based on example 7.9, it looks like the correct choice for @ehogan's original question is 16/1/2000?

ehogan · 2022-04-04T14:07:54Z

@wjbenfold example 7.9 is a seasonal climatology example; for the daily climatology example, I would expect there to be as many time points as there are days in a year (as defined by the input data), starting from Jan 1.

mokhodge · 2022-04-04T14:54:26Z

@wjbenfold I think what I meant is what @ehogan is describing. so if I give it a list of time points that have say 11 different months then I would want the climatology points to be 11 long, with a timepoint and corresponding .data for each month. If doing monthly climatology then I think the day and year returned are pretty meaningless. Although if CF conventions say it needs to return a day and year then fine but some kind of warning to describe why it has returned a time points as DD-MM-YYYY (and how they are calculated) instead of just MM would be useful as all I want to know for example is what the mean is for each month in the time series and knowing that the first month of the returned climatology corresponds to the first month in the original time series I provided it and so on.

Bonus functionality would be if I gave it a load a timeseries data say for Jan, March, October then it would return the monthly climatology with time points for Jan, March and October so Months 01, 03 and 10 rather than it returning the timepoints just months 1, 2, 3 (or 0, 1, 2 if using python indexing conventions) but actually the data for month 2 would be for March. Because I think the latter is what might happen in iris at the moment. Not sure if that makes sense - let me know.

rcomer · 2022-04-05T13:14:46Z

if the operation "looks climatological"

@wjbenfold I'm a little bit worried that guessing this would lead to hard to diagnose errors, but maybe just looking for non-contiguous ranges would do it.

I wondered about checking contiguity too, but any coordinate that doesn't have bounds is non-contiguous. Also you might average over a long contiguous time series to get an annual climatology. The simplest thing would be to just check whether the upper and lower output bounds are more than a year apart, and declare a climatology in that case. I'm racking my brains to think of an example where that would give a problematic false positive...

@mokhodge If doing monthly climatology then I think the day and year returned are pretty meaningless.

For you use-case, could you just ignore the output time coordinate and use the month number coordinate instead?

wjbenfold · 2022-04-25T11:22:15Z

I've been thinking a bit more about this, and I'm drawn to a climatological argument to Cube.aggregated_by that causes it to set the points based on the point in the first cell that's contributing. It can then set the climatological boolean on any coordinates that have been treated as such whilst it's doing the aggregation.

should also be done for collapsed points

@rcomer would you see this as meaning something needed to change for collapsed points too? I've just about got my head around climatological aggregation but can't tell if you can also do a climatological collapse.

rcomer · 2022-04-25T12:31:38Z

@wjbenfold if I only need a climatology for DJF, then I would just load all the data for December, January and February and collapse across the lot. I would expect the point of the resulting scalar time coordinate to be consistent with having done aggregated_by("season", iris.analysis.MEAN) on a complete data set.

Having said that, if we're adding an optional keyword rather than changing the default behaviour then I don't think the inconsistency would matter as much.

ehogan added the New: Issue label Apr 14, 2021

bjlittle added the Peloton 🚴‍♂️ Target a breakaway issue to be caught and closed by the peloton label May 26, 2021

bjlittle added this to Backlog in Peloton via automation May 26, 2021

pp-mo assigned pp-mo and trexfeathers and unassigned pp-mo Aug 25, 2021

rcomer mentioned this issue Aug 25, 2021

Straw Man: make collapsed points configurable #4295

Closed

rcomer mentioned this issue Oct 13, 2021

Unexpected datetimes obtained when producing monthly mean climatology data #1422

Closed

rcomer mentioned this issue Mar 28, 2022

aggregated_by() - time coords do not match expected output. #4665

Closed

trexfeathers removed their assignment Apr 27, 2022

trexfeathers removed the New: Issue label Apr 27, 2022

wjbenfold self-assigned this Apr 28, 2022

wjbenfold mentioned this issue Apr 29, 2022

More sensible climatological time points #4723

Merged

rcomer closed this as completed in #4723 May 11, 2022

Peloton automation moved this from Backlog to Done May 11, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Time points for daily climatology are not intuitive #4098

Time points for daily climatology are not intuitive #4098

ehogan commented Apr 14, 2021

rcomer commented Apr 14, 2021

ehogan commented Apr 15, 2021

rcomer commented Aug 25, 2021

ehogan commented Aug 26, 2021

trexfeathers commented Aug 26, 2021

trexfeathers commented Oct 13, 2021

rcomer commented Oct 13, 2021

rcomer commented Oct 13, 2021

rcomer commented Nov 3, 2021

rcomer commented Mar 28, 2022

mokhodge commented Mar 28, 2022

wjbenfold commented Mar 30, 2022

ehogan commented Mar 31, 2022

tkcollier commented Apr 1, 2022

wjbenfold commented Apr 1, 2022

rcomer commented Apr 1, 2022

mokhodge commented Apr 4, 2022

wjbenfold commented Apr 4, 2022

ehogan commented Apr 4, 2022

mokhodge commented Apr 4, 2022

rcomer commented Apr 5, 2022

wjbenfold commented Apr 25, 2022

rcomer commented Apr 25, 2022

Time points for daily climatology are not intuitive #4098

Time points for daily climatology are not intuitive #4098

Comments

ehogan commented Apr 14, 2021

📰 Custom Issue

rcomer commented Apr 14, 2021

ehogan commented Apr 15, 2021

rcomer commented Aug 25, 2021

ehogan commented Aug 26, 2021

trexfeathers commented Aug 26, 2021

trexfeathers commented Oct 13, 2021

rcomer commented Oct 13, 2021

rcomer commented Oct 13, 2021

rcomer commented Nov 3, 2021

rcomer commented Mar 28, 2022

mokhodge commented Mar 28, 2022

wjbenfold commented Mar 30, 2022

ehogan commented Mar 31, 2022

tkcollier commented Apr 1, 2022

wjbenfold commented Apr 1, 2022

rcomer commented Apr 1, 2022

mokhodge commented Apr 4, 2022

wjbenfold commented Apr 4, 2022

ehogan commented Apr 4, 2022

mokhodge commented Apr 4, 2022

rcomer commented Apr 5, 2022

wjbenfold commented Apr 25, 2022

rcomer commented Apr 25, 2022