Time points for daily climatology are not intuitive #4098
Comments
Related: #1422 |
Thanks @rcomer; I did search for previous issues but apparently not very well! 😜 |
Thanks @rcomer; our workaround currently involves resetting the time points in the climatology cube with the time points from the first year from the input cube, so using the min point would help! :) |
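The idea behind this workaround can be sketched in plain Python, without Iris. This is only an illustration: the date range and the name `climatology_points` are assumptions made for the example and are not part of any Iris API. It groups four years of daily time points by (month, day) and takes the earliest member of each group as the representative point.

```python
from collections import defaultdict
from datetime import date, timedelta

# Four full years of daily time points (2000-01-01 to 2003-12-31).
start, end = date(2000, 1, 1), date(2003, 12, 31)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Group by (month, day), mimicking aggregation over "month" and "day" categories.
groups = defaultdict(list)
for d in days:
    groups[(d.month, d.day)].append(d)

# Representative point per group: the earliest member, i.e. the first year
# in which that calendar day occurs.
climatology_points = {key: min(members) for key, members in groups.items()}

print(climatology_points[(1, 1)])    # 2000-01-01
print(climatology_points[(12, 31)])  # 2000-12-31
print(len(climatology_points))       # 366 (2000 is a leap year)
```

With the minimum as the representative point, the climatology's time axis runs from Jan 1 to Dec 31 of the first year.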
I was thinking about this yesterday. Given |
I've taken a brief look this morning, and here is the offending line: iris/lib/iris/analysis/__init__.py Line 2275 in 0af0c8d
I had originally imagined adding a |
I think just "covers multiple years" would do it. I might, for example, start off with monthly means and then aggregate by season to make spring/summer/autumn/winter climatologies, each containing 3 points from each year. IMO, what is done for Line 2009 in 0af0c8d
|
Or if you want to make it opt-in but don't want to pass the parameter down the "plumbing", could use a context manager instead? |
If the plan is to use the upper or lower bound as the point, it may also be worth considering #3456. Recent discussion on Met Office Yammer indicates that climatologies loaded from pp-files have the points matching the upper bound, which does cause problems in coord-categorisation. I understand that @lbdreyer may be looking into that issue, but it would be good to make the outcome of this consistent with the outcome of that I think. |
See also #4665, which raises the same problem for monthly climatologies. |
I also have the same bug when calculating a monthly standard deviation |
In discussion earlier, @rcomer, @jamesp and I concluded that the main blocker to fixing this is a decision as to what the behaviour should be. Broadly, we shouldn't be guessing what the user wants unless they asked us to. An option would be to ask the user to choose / make them choose. I'm aware that this is similar to forcing them to fix it afterwards, but it should make that process quicker (and we could leave the current default in place as it's probably not a bad common sense choice when we're not working with a climatology). Would that work for you @ehogan @mokhodge? Or is there better logic that you're aware of that we could use to derive a default that's reasonable in more circumstances? n.b. #4295 showed an implementation of the above suggestion. |
Thanks @wjbenfold, @rcomer, @jamesp! I would be happy to specify the required behaviour when creating the climatology. Would it be possible to include a climatological example in the documentation that demonstrates how to request this behaviour, please? :) |
@wjbenfold, @rcomer, I have found this behaviour to be quite inconvenient, but I do deal with climatologies a lot. It was worse when I was new to Iris, because it was only further down the line, when trying to complete another process with my data, that I realised the coordinate had been effectively reversed (Dec was June, etc.). |
Hi @tkcollier, I'm not sure this is a good route to take (though open to arguments to the contrary). When we add more warnings, people become less sensitised to warnings, and for a lot of people (everyone using I'd hope that adding an argument to the function that lets you choose how to overcome this (with associated information in the docstring) and an example in the docs (as @ehogan requested above) would allow someone hitting this issue to follow a chain of discovery something along the lines of "that's odd" -> checks docs for function -> learns how to fix it -> makes fix without causing warning fatigue in those for whom it isn't relevant? |
I think @tkcollier has a good point that you don't want to find out that things are odd after you've done a lot of processing. You could conceivably have developed your code against a data set with an odd number of years - in that case the current behaviour would give you something that looked sensible in your time coordinate. Then you add a year to your data set and only now realise something is wrong. So I think I'm leaning back towards @trexfeathers' idea of changing the default behaviour if the operation "looks climatological". It could be configurable as well, of course. We could add in a warning that is only thrown when the data "looks climatological", which would avoid the problem of clogging up the stderr when the warning is irrelevant. But that's basically admitting that the default behaviour isn't helpful... |
@wjbenfold @rcomer @jamesp Specifying an option would work for me. If it is a monthly climatology then it would be fine in most use cases to just return the month time points, e.g. 01, 02, 03..., and the same for daily, rather than providing the whole DD-MM-YYYY string. |
@rcomer I'm a little bit worried that guessing this would lead to hard to diagnose errors, but maybe just looking for non-contiguous ranges would do it.
@mokhodge I'm not sure what you mean here? Do you mean having a time coordinate that just gives a month but no year (and if so, how would that interact with the bounds, which have day, month and year)? I went digging in the CF conventions some more (https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#climatological-statistics) and found the examples. Based on example 7.9, it looks like the correct choice for @ehogan's original question is 16/1/2000? |
@wjbenfold example 7.9 is a seasonal climatology example; for the daily climatology example, I would expect there to be as many time points as there are days in a year (as defined by the input data), starting from Jan 1. |
@wjbenfold I think what I meant is what @ehogan is describing. So if I give it a list of time points that covers, say, 11 different months, then I would want the climatology points to be 11 long, with a time point and corresponding .data for each month. If doing a monthly climatology, then I think the day and year returned are pretty meaningless. If the CF conventions say it needs to return a day and year then fine, but some kind of warning describing why it has returned time points as DD-MM-YYYY (and how they are calculated) instead of just MM would be useful. All I want to know, for example, is what the mean is for each month in the time series, knowing that the first month of the returned climatology corresponds to the first month in the original time series I provided, and so on. Bonus functionality would be if I gave it a load of time series data for, say, Jan, March and October, then it would return the monthly climatology with time points for Jan, March and October (so months 01, 03 and 10), rather than returning time points for months 1, 2, 3 (or 0, 1, 2 if using Python indexing conventions) where the data for month 2 would actually be for March. I think the latter is what might happen in Iris at the moment. Not sure if that makes sense - let me know. |
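The "bonus" behaviour described above, where the climatology is labelled by the months actually present in the input, can be illustrated in plain Python. The sample values and the name `monthly_climatology` are made up for the illustration; this is not how Iris labels its output.

```python
from collections import defaultdict
from statistics import mean

# (year, month, value) samples covering only January, March and October.
samples = [
    (2000, 1, 2.0), (2000, 3, 8.0), (2000, 10, 11.0),
    (2001, 1, 4.0), (2001, 3, 10.0), (2001, 10, 13.0),
]

# Collect the values for each calendar month, ignoring the year.
by_month = defaultdict(list)
for _year, month, value in samples:
    by_month[month].append(value)

# Key each climatological mean by its real month number, not its position.
monthly_climatology = {
    month: mean(values) for month, values in sorted(by_month.items())
}
print(monthly_climatology)  # {1: 3.0, 3: 9.0, 10: 12.0}
```

Because the result is keyed by month number rather than by position, the entry for March stays labelled 3 even though February is absent from the input.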
I wondered about checking contiguity too, but any coordinate that doesn't have bounds is non-contiguous. Also you might average over a long contiguous time series to get an annual climatology. The simplest thing would be to just check whether the upper and lower output bounds are more than a year apart, and declare a climatology in that case. I'm racking my brains to think of an example where that would give a problematic false positive...
For your use-case, could you just ignore the output time coordinate and use the month number coordinate instead? |
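The "bounds more than a year apart" check suggested above could be sketched roughly as follows. The function name `looks_climatological` and the 366-day threshold are assumptions made for the illustration (a Gregorian-style calendar is assumed; a 360-day calendar would need a different threshold).

```python
from datetime import datetime, timedelta

ONE_YEAR = timedelta(days=366)  # assumption: longest Gregorian year as threshold


def looks_climatological(lower_bound, upper_bound):
    """Guess whether an aggregated cell spans more than one year."""
    return (upper_bound - lower_bound) > ONE_YEAR


# "Jan 1" cell aggregated over 2000-2003: bounds span about three years.
print(looks_climatological(datetime(2000, 1, 1), datetime(2003, 1, 2)))  # True

# Ordinary monthly mean within a single year.
print(looks_climatological(datetime(2000, 1, 1), datetime(2000, 2, 1)))  # False

# Annual mean of a single (leap) year: exactly 366 days, not more.
print(looks_climatological(datetime(2000, 1, 1), datetime(2001, 1, 1)))  # False
```

Note that a long contiguous average (for example a ten-year mean collapsed to one point) would also return True here, which is the case the comment above treats as an annual climatology.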
I've been thinking a bit more about this, and I'm drawn to a
@rcomer would you see this as meaning something needed to change for |
@wjbenfold if I only need a climatology for DJF, then I would just load all the data for December, January and February and collapse across the lot. I would expect the point of the resulting scalar time coordinate to be consistent with having done Having said that, if we're adding an optional keyword rather than changing the default behaviour then I don't think the inconsistency would matter as much. |
📰 Custom Issue
Hello :)
I have a cube containing 4 full years of daily data. Using `aggregated_by`, I aggregate the cube using a "day" and "month" coordinate. This produces a daily climatology with time points that start in July 🤯 (the bounds are as expected).
Mathematically, this makes sense, since the mean of, e.g. (Jan 1, 2000), (Jan 1, 2001), (Jan 1, 2002) and (Jan 1, 2003) is in July 2001.
However, scientifically, this doesn't make sense, since a scientist would expect the first time point of a daily climatology created as above to be Jan 1.
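The arithmetic can be checked with a few lines of plain Python (standard library only, no Iris involved): averaging the four Jan 1 dates gives a point in early July 2001, whereas the earliest of them is the Jan 1 a scientist would expect.

```python
from datetime import datetime, timedelta

# Jan 1 of four consecutive years, as in the example above.
dates = [datetime(year, 1, 1) for year in range(2000, 2004)]

# Average the offsets from a reference date, then convert back to a date.
reference = dates[0]
offsets = [(d - reference).total_seconds() for d in dates]
mean_date = reference + timedelta(seconds=sum(offsets) / len(offsets))

print(mean_date)   # 2001-07-02 06:00:00
print(min(dates))  # 2000-01-01 00:00:00
```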
The CF conventions state (at the end of paragraph 2 in Section 7.4): "The time coordinates should be values that are representative of the climatological time intervals, such that an application which does not recognise climatological time will nonetheless be able to make a reasonable interpretation.". I'm not sure this is happening in this case.
Would it be better to set the time points in the climatology file such that they are more representative of the climatological time intervals?