Interpretation of negative years in the units attribute #298

Closed
peterkuma opened this issue Sep 16, 2020 · 85 comments
Labels
enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format

Comments

@peterkuma

peterkuma commented Sep 16, 2020

I have encountered an issue with using a negative year in the units attribute of time variables in NetCDF files. The interpretation in cftime (Python) is that year zero does not exist, while in other software such as Panoply the interpretation is that year zero exists. This affects how the time variable is read and displayed, and effectively causes a one-year difference between the implementations. The CF Conventions (Section 4.4, Time Coordinate) do not explicitly state how negative years should be treated, except for stating that year 0 has a special meaning. By contrast, ISO 8601 seems to favour including year 0.

In particular, this issue comes up when using Julian date in NetCDF files, which has a reference time of 1 January 4713 BCE, 12:00 UTC. As of now it is impossible to use it in NetCDF files and get consistent results in Python (through the netCDF4 package) and Panoply.

I suppose there are multiple possible solutions to the problem. Either all implementations start using the same method of counting negative years (and it would be helpful if the CF Conventions made this unambiguous), or information about the year numbering convention would have to be included in the NetCDF file, such as a new attribute or an indicator in the units or calendar attributes.

Related issue in cftime: #200.

@peterkuma peterkuma added the defect Conventions text meaning not as intended, misleading, unclear, has typos, format or language errors label Sep 16, 2020
@Dave-Allured
Contributor

@peterkuma, CF has never standardized the meaning of year zero or negative years. As you have discovered, different software developers have implemented various and conflicting interpretations.

Best practice is to encode all timekeeping so that zero and negative are never encountered in either recorded times or in the reference time in the units attribute. Furthermore, when recording modern real world times, use only years later than 1582 in both positions, to avoid the Julian/Gregorian crossover. Please see CF section Time Coordinate for more details and caveats.

For most applications, I recommend a reference date of January 1, hour zero, of the first year of your data domain, or some other round year number close to but earlier than the start.

It seems like you have an application that is well aware of correct real world dates and times when the data are originally being recorded. Is there something that would prevent this application from encoding times in this unambiguous way, relative to a modern reference time?

@peterkuma
Author

@Dave-Allured, thank you for your response. As you say, the issue can be avoided by using a reference time after 1582, which is fine for end users who can choose the reference time. However, I think the situation for developers of generic software that uses NetCDF is still unresolved, because they have no concrete guidance on how to use these reference times. My proposal would be to add text to the CF Conventions saying how negative years should be interpreted, even if it is something as short as:

"When the year of the reference time is negative, year 0 is not counted in calculations involving this reference time."

(or a similar statement), appended to the second paragraph of Section 4.4. Time Coordinate.

To answer your question, I write relatively generic software for use with climate observations and climate modeling, which doesn't know what time range is going to be used by the user. I prefer to use Julian date everywhere in my code because it makes it easy to perform any calculations when all time variables have the same reference time. It would be great if NetCDF had good support for this use case.

@Dave-Allured
Contributor

@peterkuma, I share your interest in standardizing the treatment of zero and negative years. However, I am afraid your use case may not be appropriate for this task. My experience so far is that almost all climate-related obs and model data sets that might use CF encoding are in the domain of only positive year numbers. I speculate that most climate model makers deliberately avoided zero and negative years because of this uncertainty. I might be wrong, I have not checked lately.

You are writing generic software for climate obs and modeling. If you plan to use a fixed reference time for internal software purposes, then I suggest 1 January +0001 00:00, rather than the astronomical calendar base that you mentioned, to reduce problems. Take appropriate care with the Julian/Gregorian discontinuity.

If that is not satisfactory, then I am glad to keep discussing a CF amendment.

@martinjuckes
Contributor

martinjuckes commented Sep 23, 2020

@Dave-Allured : I think this does deserve some further discussion because the study of climate in the distant past is a very important part of climate science, even if it is small in scale compared to the study of present day climate. Our problem is that the distant past is more important for our community than it is for many of the people who work on standards for dates and times.

The NetCDF4 library may have resolved this for us, unless we make an active decision to depart from the treatment of negative reference times in NetCDF4. For example, if I create a file from the following CDL using ncgen (from version 4.6 of the NetCDF library):

netcdf time_ex01 {
dimensions:
	time = 3 ;
variables:
	double time(time) ;
		time:standard_name = "time" ;
		time:units = "days since -0001-01-01" ;
		time:calendar = "standard" ;

// global attributes:
	:Conventions = "CF-1.7" ;
data:
time = 0, 365, 731 ;
}

and then run ncdump -t, the resulting time values are given as: time = "-0001-01-01", "0000-01-01", "0001-01-01" ;. i.e. NetCDF4 interprets 0001-01-01 as being two years after -0001-01-01. Unfortunately, this is not well documented in the NetCDF User Guide, but it is unambiguous. I think this takes precedence over the cftime library, because we are trying to build on top of the NetCDF data model ... but there may be other interpretations of that.
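The day counts in this example can be checked by hand. Below is a minimal Python sketch, assuming astronomical year numbering (year 0 exists, so 0 is 1 BC) and assuming these early dates fall in the Julian part of the mixed standard calendar. `julian_day_number` is a hypothetical helper using the usual integer formula for the proleptic Julian calendar; it is not part of netCDF-C or ncdump.

```python
def julian_day_number(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date in the proleptic Julian calendar.

    `year` uses astronomical numbering: 0 is 1 BC, -1 is 2 BC, and so on.
    Intended for years >= -4799, so that the shifted year stays non-negative.
    """
    a = (14 - month) // 12          # 1 for January/February, else 0
    y = year + 4800 - a             # shift so the year count is non-negative
    m = month + 12 * a - 3          # March = 0, ..., February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


# Reference date from the CDL example: "days since -0001-01-01".
ref = julian_day_number(-1, 1, 1)
offsets = [julian_day_number(y, 1, 1) - ref for y in (-1, 0, 1)]
print(offsets)                         # [0, 365, 731]
print(julian_day_number(-4712, 1, 1))  # 0: epoch of the Julian Day count
```

Under this reading, year 0 is a Julian leap year, which is why the third value is 731 rather than 730. The last line also shows that the Julian Day epoch mentioned at the top of the issue is year -4712 in this numbering.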

@JonathanGregory
Contributor

This has been discussed in CF before but without conclusion. It would certainly be useful to adopt a convention for it, because there are use-cases, as @peterkuma demonstrates. It's not a problem with the reference year itself, but with the definition of the calendar. Given the lengthy debates about what "calendar" means when we were discussing leap-seconds, I use that word with some nervousness! I mean by "calendar" the set of valid dates (DD-MM-YYYY), which is implied by the choice of the calendar attribute in CF.

In the standard calendar, as we all know, there is no year between 1 AD (CE) and 1 BC (BCE). I suppose it's because year 0 doesn't exist that COARDS chose year 0 to indicate climatological time. (CF supports that convention for compatibility with COARDS, which only deals with the real-world standard calendar.) I'm interested to see what @martinjuckes reports about NetCDF-4. If you accept 0 as a valid year number, it means you have to write 2 BC as year -1, 3 BC as year -2, etc. That seems rather confusing to me, and likely to lead to mistakes. However, it seems that this is what is done for the proleptic Julian calendar, which is used in astronomy. Wikipedia says, "year 1 of the Julian Period was 4713 BC (−4712)." It seems that there is a year 0 in that calendar. Is that correct? For model calendars, I guess that year zero probably does exist, because it's an inconvenience to arithmetic if you leave it out!
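To illustrate the arithmetic point: once BC labels are mapped onto an integer sequence that includes year 0 (1 BC is 0, 2 BC is -1, 4713 BC is -4712), elapsed years reduce to a subtraction, whereas subtracting the historical labels directly (2 + 2) overcounts by one. A small sketch; the function names are hypothetical:

```python
def to_astronomical(year: int, era: str) -> int:
    """Convert a historical year label (e.g. 2, "BC") to a signed year.

    Astronomical numbering keeps a year 0: 1 BC -> 0, 2 BC -> -1.
    """
    if year < 1:
        raise ValueError("historical year labels start at 1")
    if era == "AD":
        return year
    if era == "BC":
        return 1 - year
    raise ValueError(f"unknown era {era!r}")


def elapsed_years(start: tuple[int, str], end: tuple[int, str]) -> int:
    """Whole years from the start of `start` to the start of `end`."""
    return to_astronomical(*end) - to_astronomical(*start)


print(to_astronomical(2, "BC"))             # -1
print(to_astronomical(4713, "BC"))          # -4712
print(elapsed_years((2, "BC"), (2, "AD")))  # 3, not 4
```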

If we decide there isn't a well-defined best answer, and there are divergent use-cases, we could define different CF calendars with and without year zero.

@dopplershift

I'm pretty sure that netCDF-C's support of dates is really just for minimal convenience and not intended to be any kind of standard. I'll poke @WardF and @lesserwhirls to chime in here...

@JonathanGregory
Contributor

It might be reasonable to define the standard calendar as not permitting years less than 1. Whether or not year 0 exists is a choice for the proleptic calendars, I think.

@martinjuckes
Contributor

I initially liked Jonathan's idea of introducing a new proleptic calendar(s) to make it explicit when people care about the interpretation of negative years in the reference time, but, after thinking over the point discussed below, I would prefer to use a qualifier, e.g. proleptic_gregorian cardinal.

If you take an etymological approach, I think AD 1 refers, in effect, to "year 1", i.e. it is a reference to a time period, not a point in time. Similarly, 1 BC is "year 1 before". From that perspective, it is natural that there is no AD 0 or 0 BC. UTC adopts the convention that 2020-01-01 00:00:00 refers to the start of AD 2020, but says nothing about negative times.

The question, I think, is not "Is there a year zero in the calendar?" but rather "How do we encode a given calendar year in the time stamp?".

Given that a positive YYYY in the datestamp is universally understood to refer to AD YYYY, arguments can be made for interpreting -YYYY either as an extension backwards that treats the years as a sequence of integers, or as YYYY BC (so that -1 means 1 BC). This is an encoding choice. For this reason, I would not modify the calendar, but instead add an optional qualifier to identify the convention for encoding BC years in the timestamp.

I'm not sure whether there is a good mnemonic term, but I suggest proleptic_gregorian cardinal to refer to use of cardinals, -1, 0, 1 rather than ordinals, 2nd before, 1st before, 1st after.
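A sketch of what such a qualifier would control, assuming a hypothetical `encoding` argument with the values "cardinal" (the name suggested above) and "ordinal" (an illustrative name for the no-year-zero reading). Both map the year written in the timestamp onto astronomical numbering:

```python
def decode_year(signed_year: int, encoding: str = "cardinal") -> int:
    """Map the year written in a timestamp to an astronomical year.

    "cardinal": years form an ordinary integer sequence (-1, 0, 1), so the
        timestamp year is already astronomical and 0 means 1 BC.
    "ordinal": negative years count backwards with no year 0, so -1 means
        1 BC (astronomical 0) and 0 cannot be encoded.
    """
    if encoding == "cardinal":
        return signed_year
    if encoding == "ordinal":
        if signed_year == 0:
            raise ValueError("year 0 does not exist in the ordinal encoding")
        return signed_year + 1 if signed_year < 0 else signed_year
    raise ValueError(f"unknown encoding {encoding!r}")


print(decode_year(-1, "cardinal"))  # -1, i.e. 2 BC
print(decode_year(-1, "ordinal"))   # 0, i.e. 1 BC
```

Positive years decode identically under both encodings; only the BC side shifts by one, which is exactly the discrepancy reported at the start of this issue.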

@lesserwhirls
Contributor

netCDF-C support for dates/times/calendars is limited to the ncdump utility, and I believe only as a convenience (nctime0.h isn't installed as part of the library). It has been extended over time to be more flexible with respect to both UDUNITS as well as to better operate with different calendars. I would strongly recommend that its behavior with respect to time be viewed as any other netCDF based client lost in the fog of the various real and imaginary calendars, and not be viewed as a sort of reference implementation (or even guidance) regarding date/time/calendar handling for netCDF files generically.

@peterkuma
Author

I really appreciate that others have joined the discussion. From my understanding the ISO 8601 standard should be considered as a mapping to the underlying calendar, i.e. year 0 and -1 in ISO 8601 are mapped to year 1 BCE and 2 BCE, respectively. In that sense, there is no conflict between ISO 8601 and the calendar, even though it is slightly confusing. The ISO 8601 document itself is probably a better source of information than Wikipedia. In section B.1 (Date and time representations) of the ISO 8601:2004 version it has an example:

Basic format | Extended format | Explanation
−00020412 | −0002-04-12 | Expanded; four digits to represent the year. The twelfth of April in the second year before the year [0000]

Other than that it is relatively short of good explanation about how negative years should be treated. Two other relevant fragments in the document are:

  • "NOTE In the proleptic Gregorian calendar, the calendar year [0000] is a leap year." (Section 3.2.1 The Gregorian calendar)

  • "By mutual agreement of the partners in information interchange, it is permitted to expand the component identifying the calendar year, which is otherwise limited to four digits. This enables reference to dates and times in calendar years outside the range supported by complete representations, i.e. before the start of the year [0000] or after the end of the year [9999]." (Section 3.5 Expansion)
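The note that year [0000] is a leap year follows from applying the ordinary Gregorian rule to astronomical year numbers. A short sketch (the function name is illustrative); Python's `%` returns a non-negative remainder for a positive divisor, so zero and negative years need no special case:

```python
def is_leap_proleptic_gregorian(year: int) -> bool:
    """Gregorian leap-year rule applied to astronomical years (0 = 1 BC)."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


print(is_leap_proleptic_gregorian(0))     # True: ISO 8601's year [0000]
print(is_leap_proleptic_gregorian(-1))    # False (2 BC)
print(is_leap_proleptic_gregorian(-4))    # True (5 BC)
print(is_leap_proleptic_gregorian(-100))  # False
print(is_leap_proleptic_gregorian(-400))  # True
```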

The newer version of the ISO standard from 2019 defines extensions in ISO 8601-2:2019:

  • "In ISO 8601-1, a negative calendar year cannot be expressed for the years -0001 through -9999. This subclause removes that restriction. ... When a negative calendar year represents a date, the negative value is to represent the number of years prior to year zero (0) (year zero is expressed as '0000' in implicit form and '0Y' in explicit form)." (Section 4.4.1.2 Calendar year)

  • "The calendar year before year one (1) is represented as follows:
    Explicit:
    [yearE]["B"]
    EXAMPLE 1 '1YB' is the first year before year one (1), equivalent to the effect of '0Y'.
    EXAMPLE 2 '12YB' is the twelfth year before year one (1), equivalent to the effect of '-11Y'."
    (Section 4.4.1.6 Calendar year before year one (1))

Therefore, it looks like both ways of counting/not counting year 0 are supported by the standard, and they are distinguished by adding "YB" to the year number without leading zeros in the "explicit form".
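The two explicit-form examples quoted above can be checked with a small parser. This is a sketch covering only the plain `<n>Y` and `<n>YB` forms quoted here, not the full ISO 8601-2 grammar; the function name is hypothetical:

```python
import re

_EXPLICIT_YEAR = re.compile(r"^(-?\d+)Y(B?)$")


def explicit_year_to_signed(text: str) -> int:
    """Convert an explicit year such as '0Y', '-11Y' or '12YB' to a signed year.

    '<n>Y' is already a signed year with a year zero; '<n>YB' counts years
    before year one, so '1YB' is year 0 and '12YB' is year -11.
    """
    match = _EXPLICIT_YEAR.match(text)
    if match is None:
        raise ValueError(f"not an explicit year: {text!r}")
    value = int(match.group(1))
    if match.group(2):  # "B" suffix: the n-th year before year one
        if value < 1:
            raise ValueError("'YB' needs a positive year count")
        return 1 - value
    return value


print(explicit_year_to_signed("1YB"))   # 0, same as '0Y'
print(explicit_year_to_signed("12YB"))  # -11, same as '-11Y'
```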

I think it would be desirable to follow ISO 8601 (but I am not sure about the "YB" form because it may be too complicated to parse all variations of the format), unless there are historical reasons not to, such as how the handling of negative years has been implemented in udunits. I will try to do a short survey of how existing NetCDF libraries implement zero and negative years.

I have been in contact with the developer of Panoply (Robert B. Schmunk, @rschmunk - I hope that is his GitHub username), who said that handling of dates in Panoply follows standard Java classes, which is according to the ISO 8601 standard of including year 0 in the calculations.

@Dave-Allured, I wouldn't be trying to only solve my use case here. I know I could work around it easily by choosing a reference time with a positive year. I am quite interested in solving the issue for all users of NetCDF, as much as I can contribute to the discussion. It looks like there is enough interest from others too.

@martinjuckes
Contributor

@peterkuma thanks for the detail, I was looking at ISO 8601-2014 (accessed in 2018) ... and the treatment of years -9999 to 0 has clearly been considerably enhanced in the 2019 version, especially in the extension (8601-2). I've just downloaded a new version.

The NetCDF Java library has quite extensive calendar support, but the only thing I could find in the NUG that our convention references was a link to the CDL functionality, which I've illustrated above. I agree completely with @lesserwhirls that the NetCDF libraries, and libraries in general, should not be treated as a standard or convention, but we have to consider the consequences of recommending anything that conflicts with a widely used library.

Using the new explicit ISO 8601 form for dates may help to make the distinction between the two mappings clearer, with 1YB equivalent to 1 BC and 0000-01-01 as 1 year before 0001-01-01. A complete date and time looks like this: 1985Y4M12DT23H20M30S, which would require some extra parsing code. There is, however, a lot of flexibility in the ISO standard, as you might expect. I have doubts about an approach which would require users to deal with the full range of options.
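As an indication of the extra parsing involved, here is a sketch that accepts only the single complete explicit form quoted above. Note that "M" means month before the "T" and minute after it, so the parser depends on position. The regular expression is an assumption covering this one pattern, not the ISO 8601-2 grammar:

```python
import re

# Only the single pattern quoted above; ISO 8601-2 permits many more variants.
_EXPLICIT_DATETIME = re.compile(
    r"^(-?\d+)Y(\d+)M(\d+)D"    # date part: year, month, day
    r"T(\d+)H(\d+)M(\d+)S$"     # time part: hour, minute, second
)


def parse_explicit_datetime(text: str) -> tuple[int, int, int, int, int, int]:
    match = _EXPLICIT_DATETIME.match(text)
    if match is None:
        raise ValueError(f"unsupported explicit date-time: {text!r}")
    return tuple(int(group) for group in match.groups())


print(parse_explicit_datetime("1985Y4M12DT23H20M30S"))  # (1985, 4, 12, 23, 20, 30)
```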

@JonathanGregory
Copy link
Contributor

Thank you, @peterkuma, for the useful information, and to @martinjuckes for his correction to the question - not "Is there a year zero in the calendar?" but rather "How do we encode a given calendar year in the time stamp?". I think these are linked in the CF convention. To be precise, I think we could say that the CF calendar attribute indicates both the set of valid dates in the calendar (YYYY-MM-DD) and the specification for mapping a valid date-time to a unique number (i.e. the encoding of date-time as a time coordinate). Though I appreciate Martin's point that these are two concepts, which is why he prefers a qualifier, since you must have both and there are few possible combinations it feels more robust to me to keep an indivisible attribute for them.

Therefore I suggest that for the default/gregorian/standard calendar, we recommend that the reference date should not be earlier than year 1, and that no date earlier than year 1 should be encoded in this calendar, to avoid the ambiguity. That means the CF-checker would produce a warning (but it's not an error). We shouldn't retrospectively define what negative years mean because we don't know the intention of existing data. I suggest that we define two new calendars:

  • gregorian_zero, in which year 0 means 1 BC, and any year is allowed in the reference date.

  • gregorian_nozero, in which year -1 means 1 BC, year 0 is not allowed in the reference date, and dates in year 0 cannot be encoded. These latter two would give errors.

Any other calendar where the same ambiguity exists could have the same treatment, if there are use-cases which need them. This would be for Julian and proleptic Gregorian calendars, since the other two are model calendars in which I think we can assume year 0 is valid.

@Dave-Allured
Copy link
Contributor

Well I see that my feeble attempts to avoid the year zero issue have failed. I think that the focus in this conversation is good, and I hope we can reach a clear resolution for all six CF defined calendars. Thanks everyone so far, for your research and attention to detail.

I support the ISO 8601 "expanded representation" approach for the interpretation of year numbering. ISO 8601 deals not specifically with calendar definitions, but rather with how to construct string representations. This is relevant for the reference time in the CF units string. As reported by @peterkuma, this representation puts year numbers on a mathematically normal integer time axis that includes negative years and year zero.

@martinjuckes, I would prefer to stay with the familiar year-month-day string syntax, and avoid the new explicit ISO 8601 form that adds new designators such as "Y" and "YB". I think it will be sufficient to simply add explicit documentation for the proper CF treatment of negative and zero year numbers in the units string.

I favor adding constraints for two of the CF calendars. This is a different way of avoiding the year zero issue, specifically an attempt to keep most common applications "safe" from crossover problems. The calendar named "Gregorian" should be restricted to only dates from 1582 October 15 forward. This would apply to both the reference date and all encoded dates. This should be fully compatible with existing data sets that have paid any attention to stated best practices for many years now. Likewise, the calendar named "Julian" should be restricted to only 0001 January 1 forward.

As a result, the need to clarify negative and zero years would be reduced to only the remaining four CF calendars: 360_day, 365_day, 366_day, and proleptic_gregorian.

@JonathanGregory
Copy link
Contributor

Dear @Dave-Allured et al.

We must have had a similar discussion some time ago - it feels familiar! While I appreciate wanting to avoid the Julian-Gregorian transition, I don't think we should disallow the default/standard calendar before 1582. This calendar has always been clearly defined as the mixed Julian/Gregorian calendar; it's the real-world calendar, and we can't exclude a need for real-world time axes which cross the transition.

The Gregorian calendar is undefined before 1582. Possibly we could redefine gregorian in the way Dave suggests (not allowing encoded dates or reference dates before 1582). That would give it a different meaning from default/standard in future data. This change could be a pitfall for interpreting any existing data which says calendar="gregorian" for time coordinates before 1582, but I think gregorian is less likely to have been used than standard or default since it is truly not Gregorian for such times!

However, in view of this point of Dave's, I'd like to change my proposal for new calendar names to:

  • mixed_withzero, in which year 0 means 1 BC, and any year is allowed.

  • mixed_nozero, in which year -1 means 1 BC, year 0 is not allowed in the reference date, and dates in year 0 cannot be encoded.

For years>0, both of these calendars are the same as the default/standard calendar. In that calendar, years<1 should be deprecated.

For julian and proleptic_gregorian, years before 1 should be deprecated, and we could define _withzero and _nozero variants correspondingly if they are needed.

For noleap=365_day, all_leap=366_day and 360_day, I think we could assume that year zero and negative years are allowed. The current definitions describe them as "Gregorian" calendars, which isn't really a useful statement! I would redefine them as calendars in which months have the same lengths in every year. In noleap, the month lengths are as for a non-leap year of the Gregorian calendar, in all_leap, they are as for a leap year, and in 360_day all months have 30 days.
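Under the redefinition proposed above, date arithmetic in these calendars is independent of the year number, so allowing year 0 and negative years costs nothing. A minimal sketch, assuming the month lengths described above; `day_number` and `days_between` are hypothetical helpers keyed by the CF calendar names noleap, all_leap and 360_day:

```python
MONTH_LENGTHS = {
    "noleap": [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],    # 365_day
    "all_leap": [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31],  # 366_day
    "360_day": [30] * 12,
}


def day_number(year: int, month: int, day: int, calendar: str) -> int:
    """Days from 0000-01-01 to the given date in a fixed-length calendar.

    Every year has the same months, so year 0 and negative years need no
    special treatment: the year simply scales by the year length.
    """
    lengths = MONTH_LENGTHS[calendar]
    if not 1 <= month <= 12 or not 1 <= day <= lengths[month - 1]:
        raise ValueError(f"invalid date {year}-{month}-{day} in {calendar}")
    return year * sum(lengths) + sum(lengths[: month - 1]) + day - 1


def days_between(start, end, calendar: str) -> int:
    return day_number(*end, calendar) - day_number(*start, calendar)


print(days_between((-1, 1, 1), (1, 1, 1), "360_day"))  # 720
print(days_between((-1, 1, 1), (1, 1, 1), "noleap"))   # 730
```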

Best wishes

Jonathan

@martinjuckes
Copy link
Contributor

Dear @Dave-Allured , @JonathanGregory ,

I agree with Jonathan's point that someone may want to encode real world data from the 16th century (e.g. weather records from 16th century diaries), so we should maintain the existing support for the actual mixed Julian/Gregorian calendar.

For 365_day I agree that the current definition is confusing -- how about "All years are 365 days with months as in a non-leap year of the Gregorian calendar", and a similar statement for 366_day.

I agree with Dave's recommendation to avoid the new ISO 8601-2 explicit form for dates (YB etc). I think we should spell out what we are prepared to accept. E.g. Would we accept the basic form which has no separators within date and time (e.g. 19850412T101530)? If the basic form is allowed, it is necessary, to avoid confusion, that a fixed number of digits be used for the years (4 by default, but it can be expanded as long as the length is agreed between parties exchanging data). If we stick to what they call the extended form (e.g. 1985-04-12T10:15:30) the number of digits in the year can be varied without ambiguity.
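A sketch of a reference-date parser that accepts only the extended format, with a variable-length year and an optional leading minus sign. This is an illustration under stated assumptions, not what udunits or any CF library does: it expects a unit word followed by " since ", and it does not handle the time-zone offset that section 4.4 also permits. Because the basic form has no delimiters, a string such as "19850412" is rejected rather than guessed at.

```python
import re

# Extended format only: "-" between date fields, ":" between time fields.
# The year may have any number of digits and an optional leading minus.
_REFERENCE = re.compile(
    r"^(?P<unit>\w+) since "
    r"(?P<year>-?\d+)-(?P<month>\d{1,2})-(?P<day>\d{1,2})"
    r"(?:[T ](?P<hour>\d{1,2}):(?P<minute>\d{1,2})"
    r"(?::(?P<second>\d{1,2}(?:\.\d*)?))?)?$"
)


def parse_time_units(units: str):
    match = _REFERENCE.match(units.strip())
    if match is None:
        raise ValueError(f"not an extended-format time unit: {units!r}")
    fields = match.groupdict()
    return (
        fields["unit"],
        int(fields["year"]), int(fields["month"]), int(fields["day"]),
        int(fields["hour"] or 0), int(fields["minute"] or 0),
        float(fields["second"] or 0),
    )


print(parse_time_units("days since -4712-01-01 12:00:00"))
# ('days', -4712, 1, 1, 12, 0, 0.0)
```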

There is currently a special meaning attached to reference dates of the form 0-1-1, for backward compatibility with COARDS, in section 7.4. Can we remove this feature from CF-1.9?

@Dave-Allured
Copy link
Contributor

There is currently a special meaning attached to reference dates of the form 0-1-1, for backward compatibility with COARDS, in section 7.4. Can we remove this feature from CF-1.9?

That troublesome usage of 0-1-1 is already deprecated by the wording of 7.4 in all CF versions. That section also limits the special meaning to only the "real-world calendar". So if we can get consensus that dates earlier than year 1 are not valid in the existing standard and julian calendars under CF, I think it will be safe to leave the special meaning as deprecated, rather than removing it.

Section 7.4 also includes the only explicit treatment of the year zero concept in the entire document. Year 0 may be a valid year in non-real-world calendars ... This supports the possibility of explicit treatment of zero and negative years in alternate calendars.

@JonathanGregory
Copy link
Contributor

Dear @martinjuckes and @Dave-Allured

For 365_day I agree that the current definition is confusing -- how about "All years are 365 days with months as in a non-leap year of the Gregorian calendar", and a similar statement for 366_day.

Yes. I agree.

I agree with Dave's recommendation to avoid the new ISO 8601-2 explicit form for dates (YB etc).

I do too.

I think we should spell out what we are prepared to accept.

Yes. Since CF generally supports udunits formats for units, we ought at least to allow what udunits does for time, but I haven't found out from the udunits documentation what formats it accepts. I think it will allow Y[-M[-D [h[:m[:s]]]]], where Y can be a large positive or negative number or zero. (NB udunits itself only handles the real-world calendar, but we use its format for the others.)

I agree with @Dave-Allured that it's OK to continue to allow but deprecate the special use of year 0 in the real-world calendar.

if we can get consensus that dates earlier than year 1 are not valid in the existing standard (or default) and julian calendars under CF ...

I think that years<1 should be deprecated in these calendars, but not disallowed, because of backward compatibility. What do @martinjuckes and others think?

Above (#298 (comment)) I have made proposals for new calendars, going before year 1, and asked whether gregorian should be redefined.

Jonathan

@martinjuckes
Copy link
Contributor

agree with @JonathanGregory on deprecating (rather than disallowing) years < 1 in standard and julian calendars (where I interpret this to refer to the year in the reference time stamp).

I'm not sure about the proposal to redefine gregorian : it is currently defined as mixed Gregorian/Julian which appears OK. I don't have a clear opinion on this.

Concerning what udunits2 supports: the command line tool treats 0-0-0 as equivalent to 1-1-1 and -1-1-1 as being 366 days apart. udunits2 uses a mixed Gregorian/Julian calendar.
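One reading of the udunits2 observation is that -1-1-1 lies 366 days before 1-1-1 (treated here as the reference, since 0-0-0 is said to be equivalent to it). That is what a calendar with no year zero gives, because -1 is then 1 BC, a Julian leap year; with a year zero the same span is 731 days, as in the ncdump example earlier. A sketch under that assumption, using the proleptic Julian leap rule that applies to these early dates in a mixed calendar; the helper names are hypothetical:

```python
def julian_year_length(astronomical_year: int) -> int:
    # Proleptic Julian rule: every 4th year is a leap year, counting year 0.
    return 366 if astronomical_year % 4 == 0 else 365


def astronomical_year(signed_year: int, year_zero: bool) -> int:
    # With a year zero the written year is already astronomical; without
    # one, -1 is 1 BC (astronomical 0) and 0 is not a valid year.
    if year_zero or signed_year > 0:
        return signed_year
    if signed_year == 0:
        raise ValueError("year 0 is not valid without a year zero")
    return signed_year + 1


def days_between_new_years(y0: int, y1: int, year_zero: bool) -> int:
    a = astronomical_year(y0, year_zero)
    b = astronomical_year(y1, year_zero)
    return sum(julian_year_length(y) for y in range(a, b))


print(days_between_new_years(-1, 1, year_zero=True))   # 731, as ncdump reports
print(days_between_new_years(-1, 1, year_zero=False))  # 366
```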

The library does accept arbitrary years and the ISO basic format. This means that 19850101 is equivalent to 1985 or 1985-01-01, so the -MM is not optional when you want to reference years with more than 4 digits. It looks like a rather fragile approach to me, and documentation is lacking. Does anyone know of people wanting to use the ISO basic format, with no delimiters in the date? Could we simplify the specification (and parsing requirements) by insisting on the ISO "extended format", which has - as a delimiter in the date and : in the time?

@JonathanGregory
Copy link
Contributor

Dear @martinjuckes

agree with @JonathanGregory on deprecating (rather than disallowing) years < 1 in standard and julian calendars (where I interpret this to refer to the year in the reference time stamp).

Yes, it would apply to the year in the reference timestamp, and I think it would also mean deprecating any attempt to decode or encode a time before year 1. The CF checker would be able to detect such years in the time coordinate, and should give a warning about it, because their meaning would be unreliable.

I'm not sure about the proposal to redefine gregorian : it is currently defined as mixed Gregorian/Julian which appears OK. I don't have a clear opinion on this.

My suggestion would be to make gregorian different from standard and default, by deprecating times before the change of calendar in 1582 for gregorian (rather than year 1 for the others). If you say it's Gregorian (rather than mixed or proleptic_gregorian), it really should not exist before that calendar was introduced!

Could we simplify the specification (and parsing requirements) by insisting on the ISO "extended format", which has - as a delimiter in the date and : in the time?

Alternatively we could define what it means if you supply a date consisting of more than eight digits and no delimiters. But that would imply a requirement on software to support our interpretation. Maybe we could deprecate it instead of disallowing it.

Jonathan

@larsbarring
Copy link
Contributor

My suggestion would be to make gregorian different from standard and default, by deprecating times before the change of calendar in 1582 for gregorian (rather than year 1 for the others). If you say it's Gregorian (rather than mixed or proleptic_gregorian), it really should not exist before that calendar was introduced!

👍 for this, and in particular for the last sentence.

@Dave-Allured
Copy link
Contributor

Dave-Allured commented Oct 8, 2020

I think it would help to confine this discussion to the requested issue, which is zero and negative years in currently defined calendars. A refinement of the "Gregorian" label is a good topic, but I should not have injected it into this discussion.

Also a full discussion of alternate date formats in the reference string is complicated. Can we please defer that to a future issue, when needed?

I wholeheartedly support new calendar names that are explicit and mathematically well-defined. It is relevant to mention those ideas here. However, can we also put off their resolution to new issues, as needed?

Let's see if we have some consensus so far on the following, acknowledging some previous agreement above.

  • For the reference date in the units string, continue to assume that the year number is easily recognizable and parseable. (There are currently no CF explicit format rules. We assume something like the UDUNITS delimited format shown in section 4.4 Time Coordinate.)

  • Rules will be added to explicitly allow zero and negative year numbers in the reference date. A negative year number will be indicated by a preceding minus sign, with no other adornment.

  • To handle all possible cases, the interpretation of zero and negative year numbers will be explicitly defined for each defined CF calendar type.

  • For the current standard and julian calendars (and synonyms), zero and negative years will be deprecated for multiple reasons.

  • For the current 360_day, 365_day, and 366_day calendars (and synonyms), year numbering will be the complete set of integers, including zero and negative.
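The points above could be captured as a lookup table that a checker consults for zero and negative reference years. This is a hypothetical sketch of the proposed consensus, with proleptic_gregorian left undecided as stated; it is not how the CF checker is actually implemented.

```python
from typing import Optional

# Hypothetical summary of the proposal; not part of any CF release.
YEAR_POLICY = {
    "standard": "deprecated",
    "gregorian": "deprecated",
    "julian": "deprecated",
    "360_day": "allowed",
    "365_day": "allowed",
    "noleap": "allowed",
    "366_day": "allowed",
    "all_leap": "allowed",
    "proleptic_gregorian": "undecided",
}


def check_reference_year(calendar: str, year: int) -> Optional[str]:
    """Return a warning for a zero or negative reference year, or None."""
    if year >= 1:
        return None
    policy = YEAR_POLICY.get(calendar.lower())
    if policy is None:
        return f"unknown calendar {calendar!r}"
    if policy == "deprecated":
        return f"year {year} is deprecated in the {calendar} calendar"
    if policy == "undecided":
        return f"meaning of year {year} is not yet defined for {calendar}"
    return None


print(check_reference_year("julian", -1))  # year -1 is deprecated in the julian calendar
print(check_reference_year("360_day", 0))  # None
```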

Proleptic_gregorian is problematical. Let's talk about that more later.

Agreed so far?

@JonathanGregory
Copy link
Contributor

Dear @Dave-Allured

Thanks for the summary. Yes, I agree with all those bullet points. I would like to add

  • In calendars where negative and zero years are deprecated in the reference date, it is also deprecated to record dates in those years in the variable (even by using a reference date with a positive year).

Jonathan

@Dave-Allured
Copy link
Contributor

@JonathanGregory, I agree with your addition. That was my intention, I just did not fold that into the wording correctly.

@larsbarring
Copy link
Contributor

I have not much to add to this discussion; my earlier comment was only to express my support for the suggestion to make the calendar names/terms clearer and more self-evident.

For what it is worth, @Dave-Allured's summary and @JonathanGregory's addition look good to me.

@Dave-Allured
Copy link
Contributor

I am coming around to favoring a partial ISO 8601-2:2019 approach as described above by @peterkuma. Both ways of either counting or not counting year 0 could be supported with some minimal extension of the reference date notation, as initially suggested above by @martinjuckes. I have a suggested notation that I would like to hold for later.

Let's continue to focus on the primary question of year numbers in the traditional CF format, without any new notation. By ISO 8601-2:2019, and if we agree, year zero and negative years are included.

Now the interpretation for proleptic_gregorian is still undecided. I suggest that the current, unadorned proleptic_gregorian should include year zero and negative years for general scientific usage. I do not know of any data sets that have encoded zero or negative years in a conflicting way with proleptic_gregorian. Also there is precedent for this outside of CF; see the Wikipedia article.

@JonathanGregory, you proposed that years before 1 should be deprecated for proleptic_gregorian. Do you have a specific reason for preferring this?

@martinjuckes
Copy link
Contributor

Hello @Dave-Allured

  • I agree on sticking to the format illustrated in section 4.4 Time Coordinate -- in ISO terminology this means accepting the extended format (which has year, month and day separated by "-") and excluding the ISO "basic format", which has a more compact string.

I also agree on the suggested approach to supporting zero and negative reference years with explicit specifications, and deprecating them in some cases.

I also tend to favour allowing negative years in the proleptic_gregorian calendar, as it appears to be designed for continuity going backwards in time.

@JonathanGregory
Contributor

you proposed that years before 1 should be deprecated for proleptic_gregorian. Do you have a specific reason for preferring this?

Only the supposition that it might not be well-defined what year zero means. If the consensus is that year 0 is a normal year in the proleptic Gregorian calendar, I think that's good. We can allow zero and negative years for this calendar.

The withzero and nozero options are most relevant for the Julian and standard calendar, I suppose.

@marqh
Member

marqh commented Oct 27, 2020

I believe that ISO8601 is as good a definition as we have for the datetime stamp, the Gregorian calendar, and the Proleptic Gregorian calendar.

ISO8601 is explicit in the inclusion of year 0000 and its interpretation.

years prior to 1583 are not automatically allowed by the standard. Instead "values in the range [0000] through [1582] shall only be used by mutual agreement of the partners in information interchange."

CF is a good example of mutual agreement between partners.

An expanded year representation [±YYYYY] is available, again by mutual agreement, and it must be prefixed with a + or − sign.
By convention, 1 BC is labelled +0000 and 2 BC is labelled −0001.
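To make the ISO 8601 mapping concrete, here is a minimal Python sketch; the function names are hypothetical and not part of CF, cftime or any other library. Year 1 is AD 1, year 0 is 1 BC, year −1 is 2 BC, so the Julian Day epoch of 4713 BCE from the original report corresponds to year −4712.

```python
def iso_year_to_label(year: int) -> str:
    """Map an ISO 8601 (astronomical) year number to an AD/BC label."""
    if year >= 1:
        return f"AD {year}"
    # Year 0 is 1 BC, year -1 is 2 BC, and so on.
    return f"{1 - year} BC"


def label_to_iso_year(n: int, era: str) -> int:
    """Inverse mapping: AD n -> n, n BC -> 1 - n (n must be >= 1)."""
    if n < 1:
        raise ValueError("AD/BC year numbers start at 1")
    if era == "AD":
        return n
    if era == "BC":
        return 1 - n
    raise ValueError(f"unknown era {era!r}")


for y in (2, 1, 0, -1, -4712):
    print(y, "->", iso_year_to_label(y))
```

The one-year offset between the two numberings is exactly the discrepancy between cftime and Panoply described at the top of this issue.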

@Dave-Allured
Contributor

@marqh, I am proposing only one idea from ISO8601, the mapping of zero and negative year numbers as you just showed. Year 0 = 1 BC, etc. I am not proposing a full adoption of an ISO8601 format. ISO8601 uses fixed length numbers, whereas the CF date/time stamp allows variable length numbers with delimiters. Also, CF does not use the plus sign.

The delimited system is robust and has served us well for a long time. The CF delimited system accommodates ISO8601 fixed width formats when the standard delimiters including the "T" separator are used. E.g., YYYY-MM-DDTHH:MM:SS is correct under both systems.
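A rough illustration of what "variable length numbers with delimiters" means in practice: the regular expression below is a hypothetical sketch, not the UDUNITS grammar, and it deliberately ignores the optional time zone. It accepts a signed year of any length, one- or two-digit month and day, and either a space or "T" before the time.

```python
import re

# Hypothetical pattern for a delimited CF reference date/time (no time zone).
REF_DATE = re.compile(
    r"^(?P<year>-?\d+)-(?P<month>\d{1,2})-(?P<day>\d{1,2})"
    r"(?:[ T](?P<hour>\d{1,2}):(?P<minute>\d{1,2})"
    r"(?::(?P<second>\d{1,2}(?:\.\d*)?))?)?$"
)


def parse_reference(text: str) -> tuple:
    """Split a delimited date/time string into (Y, M, D, h, m, s)."""
    m = REF_DATE.match(text.strip())
    if m is None:
        raise ValueError(f"not a delimited date/time: {text!r}")
    # Missing optional groups come back as None and default to zero.
    return (
        int(m["year"]),
        int(m["month"]),
        int(m["day"]),
        int(m["hour"] or 0),
        int(m["minute"] or 0),
        float(m["second"] or 0),
    )


print(parse_reference("1992-10-8 15:15:42.5"))
print(parse_reference("2001-01-01T00:00:00"))  # ISO 8601 fixed width also fits
```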

@larsbarring
Contributor

Yes, I agree, Martin's wording is good -- it is clear and succinct. A couple of minor comments:

  • The explanation of the gregorian or standard calendar should state that "Year 1 corresponds to AD 1 in the Julian calendar."
  • The last sentence of the explanation of the proleptic_gregorian calendar would become clearer if written "In the proleptic Gregorian calendar, year 1 corresponds to AD 1 and year 0 corresponds to 1 BC/BCE."

@davidhassell
Contributor

davidhassell commented Jun 14, 2021

Hello,

A lot of suggestions have been made on this issue since the last update to the associated pull request (#315). I'm not sure if consensus has been reached, as the conversation has paused, but is it possible for someone to synthesise the suggestions made here during April 2021 so that the PR can be updated? (pinging @Dave-Allured for extra visibility as the PR owner).

Many thanks,
David

@JonathanGregory
Contributor

Dear @davidhassell et al.

I will produce some text synthesising the comments made since the current pull request was written.

Since then, the preamble on calendars has been modified as a result of the agreement of issue 313 on leap seconds. As a result, I suggest that the new sentence from the pull request of this issue should go in a different place. Below I reproduce the new text from the working draft, because I think it's useful in spelling out what "calendar" means in CF. I have inserted the extra sentence in bold where I'd suggest putting it.

Cheers

Jonathan

4.4.1 Calendar

A date/time is the set of numbers which together identify an instant of time, namely its year, month, day, hour, minute and second, where the second may have a fraction but the others are all integer. A time coordinate value represents a date/time. In order to calculate a time coordinate value from a date/time, or the reverse, one must know the units attribute of the time coordinate variable (containing the time unit of the coordinate values and the reference date/time) and the calendar. The choice of calendar defines the set of dates (year-month-day combinations) which are permitted, and therefore it specifies the number of days between the times of 0:0:0 (midnight) on any two dates. Date/times which are not permitted in a given calendar are prohibited in both the encoded time coordinate values, and in the reference date/time string.

When a time coordinate value is calculated from a date/time, or the reverse, it is assumed that the coordinate value increases by exactly 60 seconds from the start of any minute (identified by year, month, day, hour, minute, all being integers) to the start of the next minute, with no leap seconds, in all CF calendars. This assumption has various consequences when real-world date/times from calendars which do contain leap seconds (such as UTC) are stored in time coordinate variables:

  • Any date/times between the end of the 60th second of the last minute of one hour and the start of the first second of the next hour cannot be represented by time coordinates e.g. 2016-12-31 23:59:60.5 cannot be represented.

  • A time coordinate value must not be interpreted as representing a date/time in the excluded range. For instance, 60 seconds after 23:59 means 00:00 on the next day.

  • A date/time in the excluded range must not be used as a reference date/time e.g. seconds since 2016-12-31 23:59:60 is not a permitted value for units.

  • It is important to realise that a time coordinate value does not necessarily exactly equal the actual length of the interval of time between the reference date/time and the date/time it represents.
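Python's `datetime` behaves the same way as the CF calendars here (it has no leap seconds), so it can illustrate the rules above. This is only an analogy, not a CF reference implementation.

```python
from datetime import datetime, timedelta

# Every minute is exactly 60 seconds long, as in all CF calendars.
t = datetime(2016, 12, 31, 23, 59, 0)
print(t + timedelta(seconds=60))  # 2017-01-01 00:00:00

# A date/time in the excluded range (second 60) cannot be constructed at all.
try:
    datetime(2016, 12, 31, 23, 59, 60)
except ValueError as err:
    print("rejected:", err)

# In real UTC, 61 seconds elapsed between these two instants because of the
# leap second, yet the encoded difference is 60 seconds.
```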

It is recommended that the calendar be specified by the calendar attribute of the time coordinate variable. The values currently defined for calendar are:

[... to be continued]

@JonathanGregory
Contributor

Dear all

I have drafted a new version of the affected parts of the text of Sect 4.4, taking account of the comments made since the pull request was revised, mostly as suggested but not quite, as follows:

  • I included the deprecation of gregorian from issue 319.

  • I suggest we insert the paragraph with rules about years before 1 just before the list of calendars, rather than the earlier place where it appears in the PR. That follows Martin's point that it's easier to deal with the detail in one place. If we put it here, I don't think we need to insert the prohibitions into the definitions of individual calendars.

  • I don't think we need to talk about year 0 = 1 BCE etc. because we don't allow years before 1 in the standard and julian calendars (except for climatology). I don't believe it is correct to say that year 0 in proleptic_gregorian is 1 BCE, because that calendar doesn't have the right year lengths. It is not a "real-world" calendar.

  • As Karl suggested, I avoided using Gregorian to describe other calendars, because they're not.

  • I stated the Julian rule for leap years, and that we use Gregorian month lengths in all calendars except 360_day and none.

  • I included Karl's text for the cell methods section, with small changes (in bold) from his version. I say that it is impossible, rather than forbidden, for the year-0 convention to be used for climatological time in calendars where it's a valid year.

I think the reference to udunits.dat is out-of-date, isn't it? What should it be changed to?

Best wishes

Jonathan

4.4 Time coordinate

Variables representing time must always explicitly include the units attribute; there is no default value. The units attribute takes a string value formatted as per the recommendations in the Udunits package UDUNITS. The following excerpt from the Udunits documentation explains the time unit encoding by example:

The specification seconds since 1992-10-8 15:15:42.5 -6:00 indicates seconds since October 8th, 1992 at 3 hours, 15 minutes and 42.5 seconds in the afternoon in the time zone which is six hours to the west of Coordinated Universal Time (i.e. Mountain Daylight Time). The time zone specification can also be written without a colon using one or two digits (indicating hours) or three or four digits (indicating hours and minutes).

The acceptable units for time are listed in the udunits.dat file. The most commonly used of these strings (and their abbreviations) includes day (d), hour (hr, h), minute (min) and second (sec, s). Plural forms are also acceptable. The reference date/time string (appearing after the identifier since) may include date alone; date and time; or date, time, and time zone. The reference date/time string is required.
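As a quick check of the Udunits example, the reference date/time with its −6:00 offset can be converted to UTC with Python's standard library. This sketch builds the `datetime` by hand rather than parsing the units string.

```python
from datetime import datetime, timedelta, timezone

# Reference from "seconds since 1992-10-8 15:15:42.5 -6:00".
ref = datetime(1992, 10, 8, 15, 15, 42, 500000,
               tzinfo=timezone(timedelta(hours=-6)))
ref_utc = ref.astimezone(timezone.utc)
print(ref_utc.isoformat())  # 1992-10-08T21:15:42.500000+00:00

# A coordinate value of 3600 in these units is one hour after the reference.
print((ref + timedelta(seconds=3600)).astimezone(timezone.utc))
```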

...

4.4.1 Calendar

A date/time is the set of numbers which together identify an instant of time, namely its year, month, day, hour, minute and second, where the second may have a fraction but the others are all integer. A time coordinate value represents a date/time. In order to calculate a time coordinate value from a date/time, or the reverse, one must know the units attribute of the time coordinate variable (containing the time unit of the coordinate values and the reference date/time) and the calendar. The choice of calendar defines the set of dates (year-month-day combinations) which are permitted, and therefore it specifies the number of days between the times of 0:0:0 (midnight) on any two dates. Date/times which are not permitted in a given calendar are prohibited in both the encoded time coordinate values, and in the reference date/time string. It is recommended that the calendar be specified by the calendar attribute of the time coordinate variable.
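A minimal sketch of encoding and decoding with units "days since 1850-01-01", assuming the proleptic_gregorian calendar. Python's `datetime` is proleptic Gregorian but only covers years 1 to 9999, so this stands in for that one calendar in that range; the constant and function names are hypothetical.

```python
from datetime import datetime, timedelta

REFERENCE = datetime(1850, 1, 1)  # units = "days since 1850-01-01"


def encode_days(dt: datetime) -> float:
    """Date/time -> time coordinate value in days since REFERENCE."""
    return (dt - REFERENCE) / timedelta(days=1)


def decode_days(value: float) -> datetime:
    """Time coordinate value -> date/time."""
    return REFERENCE + timedelta(days=value)


print(encode_days(datetime(1850, 3, 1)))  # 59.0
# The calendar matters: in a 360_day calendar the same date would be
# 60 days after the reference (two 30-day months).
```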

...

The values currently defined for calendar are listed below. In all calendars except 360_day and none, the lengths of the months are the same as in the Gregorian calendar for leap years and non-leap years. In the julian and the default standard mixed Gregorian/Julian calendar, dates in years before year 0 (i.e. before 0-1-1 0:0:0) are not allowed, and the year in the reference date/time of the units must not be negative. In these calendars, year zero has a special use to indicate a climatology (see [climatological-statistics]), but this use of year zero is deprecated. In other calendars, years before year 1 are allowed.

standard: Mixed Gregorian/Julian calendar as defined by Udunits. This is the default. A deprecated alternative name for this calendar is gregorian. Year 1 in this calendar is year 1 AD or CE of the Julian calendar.

proleptic_gregorian: A calendar with the Gregorian rules for leap-years extended to dates before 1582-10-15. That is, a year is a leap year if either (i) it is divisible by 4 but not by 100 or (ii) it is divisible by 400.

julian: Julian calendar, in which a year is a leap year if it is divisible by 4, even if it is also divisible by 100.

noleap or 365_day: A calendar with no leap years, i.e. all years are 365 days long.

all_leap or 366_day: A calendar in which every year is a leap year, i.e. all years are 366 days long.

360_day: A calendar in which all years are 360 days and divided into 30-day months.

none: No calendar.
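The leap-year rules in the list above can be summarised in a short Python sketch (hypothetical helpers, not from any CF library). The mixed standard calendar and none are left out because they have no single fixed rule. Python's `%` returns a non-negative remainder for a positive divisor, so the rules also work for zero and negative years; note that year 0 is a leap year in proleptic_gregorian since it is divisible by 400.

```python
def is_leap(year: int, calendar: str) -> bool:
    """Leap-year rule for CF calendars with a fixed rule."""
    if calendar == "proleptic_gregorian":
        return (year % 4 == 0 and year % 100 != 0) or year % 400 == 0
    if calendar == "julian":
        return year % 4 == 0  # even if also divisible by 100
    if calendar in ("noleap", "365_day"):
        return False
    if calendar in ("all_leap", "366_day"):
        return True
    raise ValueError(f"no fixed leap-year rule for calendar {calendar!r}")


def days_in_year(year: int, calendar: str) -> int:
    if calendar == "360_day":
        return 360  # twelve 30-day months
    return 366 if is_leap(year, calendar) else 365


print(days_in_year(1900, "proleptic_gregorian"), days_in_year(1900, "julian"))
```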

The calendar attribute may be set to none in climate experiments that simulate a fixed time of year. The time of year is indicated by the date in the reference date/time of the units attribute. The time coordinates that might apply in a perpetual July experiment are given in the following example.

...

Replace the paragraph in 7.4 beginning "The COARDS standard" with

For compatibility with the COARDS standard, a climatological time coordinate in the default standard and julian calendars may be indicated by setting the time coordinate's units attribute to midnight on 1 January in year 0 (i.e., since 0-1-1). This convention is deprecated because it does not provide any information about the intervals used to compute the climatology, and there may be inconsistencies among software packages in the interpretation of the time coordinates with a reference time of year 0. Use of year 0 for this purpose is impossible in all other calendars, because year 0 is a valid year.
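The rules on years before 1 in this draft could be checked by software roughly as follows. The function is a hypothetical sketch of a validator, not an existing tool; it treats gregorian as the deprecated alias of standard.

```python
# Calendars in which years before 1 are restricted by the draft text.
RESTRICTED = {"standard", "gregorian", "julian"}


def check_reference_year(calendar: str, year: int) -> str:
    """Classify the year of a units reference date/time.

    Returns "ok", or "deprecated" for year 0 (the COARDS climatology
    convention); raises ValueError for a negative year where not permitted.
    """
    if calendar not in RESTRICTED:
        return "ok"  # years before 1 are allowed in all other calendars
    if year < 0:
        raise ValueError(f"negative reference year {year} not allowed in {calendar}")
    if year == 0:
        return "deprecated"
    return "ok"


print(check_reference_year("julian", 0))                 # deprecated
print(check_reference_year("proleptic_gregorian", -100))  # ok
```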

Modify conformance document section 4.4 recommendations:

  • The use of time coordinates in year 0 and reference date/times in year 0 to indicate climatological time is deprecated.

Add to conformance document section 4.4.1 recommendations:

  • A time coordinate variable should have a calendar attribute.

  • The value standard should be used instead of gregorian in the calendar attribute.

@davidhassell
Contributor

Thank you, Jonathan. Your text is very clear to me.

All of the changes required for #319 are now here. Is it OK for #319 to just refer to the PR for this issue, rather than removing these changes from this issue and recreating them in a bespoke PR for #319?

@JonathanGregory
Contributor

Dear @davidhassell

Yes, you're right, this does cover #319. I hadn't thought of that, but it could be convenient if no-one objects to merging them. I didn't redefine the 365_day and 366_day calendars in terms of proleptic Gregorian because I had overlooked that point of yours. I did it in a different way, by stating that months are of Gregorian lengths.

Jonathan

@davidhassell
Contributor

it could be convenient if no-one objects to merging them

It is fine by me.

I did it in a different way, by stating that months are of Gregorian lengths.

Which I think works well!

@larsbarring
Contributor

@JonathanGregory This is all very good, and merging the two seems logical, especially as #319 is an offshoot from this issue. A couple of minor comments:

  • In the definition of the standard calendar the switch-over date could be mentioned. How about:
    "standard: Mixed Gregorian/Julian calendar as defined by Udunits. Dates onwards from 1582-10-15 follow the Gregorian system and earlier dates follow the Julian system. This is the default."
  • Could the deprecation of the gregorian calendar be a bit more prominent? After all, it was a calendar in its own right, even though being equivalent to the standard calendar (or vice versa). How about below the list of calendars writing a sentence (own paragraph) like:
    "The gregorian calendar is deprecated; instead use the equivalent standard calendar."
  • Indeed, the udunits.dat file is outdated, see this comment from 2017. It has been replaced by xml files, which are a bit awkward [for humans] to read. @davidhassell might have a better overview.

@davidhassell
Contributor

Hello Lars,

I like your suggestions.

With regard to Udunits, the time units are distributed across multiple XML files. For example, hour is defined in udunits2-accepted.xml and second is defined in udunits2-base.xml. Perhaps it might be better to say:

"The acceptable units for time are listed in the Udunits database [UDUNITS]."

where [UDUNITS] is the existing reference in the Bibliography section, which provides a version-independent link (http://www.unidata.ucar.edu/software/udunits/) to the Udunits home page. The online viewable database for a particular Udunits version is easily findable from the given link (the latest is https://www.unidata.ucar.edu/software/udunits/udunits-2.2.28/udunits2.html#Database).

Incidentally, what is the correct way to write Udunits? CF uses "Udunits", but Unidata documentation seems to favour "UDUNITS".

Mentioning @semmerson here in case I've missed something.

@JonathanGregory
Contributor

Dear @larsbarring and @davidhassell

Taking Lars's points in reverse.

  • I agree with what David proposes for referring to UDUNITS instead of udunits.dat.

  • It seems better to me not to have an independent entry for the deprecated gregorian, which would make it more prominent than it is now. If someone is reading the CF standard to find out what gregorian means, because they've found it in CF-netCDF data, and they don't immediately notice it in the entry for the standard calendar, they'd easily find it with a text search. If they're intending to write data, they're better off not knowing about gregorian!

  • I agree that the standard calendar of UDUNITS should be explained. Looking again at the CF 1.8, I see that it is explained, but this comes further down, following the text for arbitrary calendars about month_lengths etc. I have reproduced the current text below, for reference. I propose that we delete the existing text and instead insert into the entry for this calendar an abbreviated version rewritten to be more consistent with CF text and less UDUNITS-specific, as follows:

standard: Mixed Gregorian/Julian calendar as defined by [UDUNITS]. This is the default calendar. A deprecated alternative name for this calendar is gregorian. In this calendar, date/times after (and including) 1582-10-15 0:0:0 are in the Gregorian calendar, in which a year is a leap year if either (i) it is divisible by 4 but not by 100 or (ii) it is divisible by 400. Date/times before (and excluding) 1582-10-5 0:0:0 are in the Julian calendar. Year 1 AD or CE in the standard calendar is also year 1 of the julian calendar. In this calendar, 1582-10-15 0:0:0 is exactly 1 day later than 1582-10-4 0:0:0 and the intervening dates are undefined. Therefore it is recommended that date/times in the range from (and including) 1582-10-5 0:0:0 until (but excluding) 1582-10-15 0:0:0 should not be used as reference in units, and that a time coordinate variable should not include any date/times in this range, because their interpretation is unclear. It is also recommended that a reference date/time before the discontinuity should not be used for date/times after the discontinuity, and vice-versa.

proleptic_gregorian: A calendar with the Gregorian rules for leap-years extended to years before 1582. All dates consistent with these rules are allowed, both before and after 1582-10-15 0:0:0.
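The discontinuity in the proposed standard entry can be checked numerically. This sketch uses the well-known integer Julian Day Number formulas for the Gregorian and Julian calendars (not UDUNITS' own code) and switches between them at the changeover; the function names are hypothetical. The floor-division arithmetic assumes year + 4800 is non-negative.

```python
def _shift(year: int, month: int) -> tuple:
    # Count years from March so that any leap day falls at the end of the year.
    a = (14 - month) // 12
    return year + 4800 - a, month + 12 * a - 3


def gregorian_jdn(year: int, month: int, day: int) -> int:
    y, m = _shift(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - y // 100 + y // 400 - 32045


def julian_jdn(year: int, month: int, day: int) -> int:
    y, m = _shift(year, month)
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def standard_day_number(year: int, month: int, day: int) -> int:
    """Day number in the mixed Gregorian/Julian standard calendar."""
    if (year, month, day) >= (1582, 10, 15):
        return gregorian_jdn(year, month, day)
    if (year, month, day) <= (1582, 10, 4):
        return julian_jdn(year, month, day)
    raise ValueError(f"{year}-{month}-{day} is in the undefined range 1582-10-5 to 1582-10-14")


print(standard_day_number(1582, 10, 15) - standard_day_number(1582, 10, 4))  # 1
```

By contrast, in proleptic_gregorian the same two date labels would be 11 days apart, and all the intervening dates are valid.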

In this proposed text, I have been more explicit about the problems with the calendar and the recommendations to avoid them. I have moved the Gregorian leap-year rules from proleptic_gregorian to standard, and I have stated explicitly that the illegal range of dates in standard is OK in proleptic_gregorian, as I believe to be the case. I don't think any of this changes the convention, but I think it's useful to be clear. Is it all OK? It would imply some more things in the conformance document.

The existing text also says, "For time coordinates that do cross the discontinuity the proleptic_gregorian calendar should be used instead." I don't think this recommendation makes sense. You can't use the proleptic_gregorian calendar to represent dates in the Julian calendar. If they are real historical dates, with daily resolution, you have no choice but standard to do it properly. If you don't mind about being inexact, for instance if you only want the 1st of the month, you could use any calendar, including model calendars, to encode the dates; proleptic_gregorian wouldn't be particularly better. Hence I propose we should omit that recommendation.

Cheers

Jonathan

Current text:

The mixed Gregorian/Julian calendar used by Udunits is explained in the following excerpt from the udunits(3) man page:

The udunits(3) package uses a mixed Gregorian/Julian calendar system. Dates prior to 1582-10-15 are assumed to use the Julian calendar, which was introduced by Julius Caesar in 46 BCE and is based on a year that is exactly 365.25 days long. Dates on and after 1582-10-15 are assumed to use the Gregorian calendar, which was introduced on that date and is based on a year that is exactly 365.2425 days long. (A year is actually approximately 365.242198781 days long.) Seemingly strange behavior of the udunits(3) package can result if a user-given time interval includes the changeover date. For example, utCalendar() and utInvCalendar() can be used to show that 1582-10-15 preceded 1582-10-14 by 9 days.

Due to problems caused by the discontinuity in the default mixed Gregorian/Julian calendar, we strongly recommend that this calendar should only be used when the time coordinate does not cross the discontinuity. For time coordinates that do cross the discontinuity the proleptic_gregorian calendar should be used instead.

@semmerson

semmerson commented Jun 22, 2021 via email

@taylor13

Thanks, Jonathan. Seems clear and close now. With these changes we'll be able to close both 298 and 319! Real progress.

@larsbarring
Contributor

Dear Jonathan,
Many thanks, in particular I like the comprehensive explanation of the standard calendar. Many thanks also to @Dave-Allured for initiating this issue and steering the conversation into the good shape that allowed the final lap reaching all the way to this point. Just echoing Karl: real progress!

@JonathanGregory
Contributor

JonathanGregory commented Jun 23, 2021

If everyone is content with the proposed text, I will make a new pull request for this issue. In the pull request, I will add @Dave-Allured to the CF authors in recognition of his raising the issue and the work he has done on it (unless he would prefer not). Jonathan

@davidhassell
Contributor

Thanks @JonathanGregory - I'm happy with your latest proposed text; and thanks @semmerson for putting us right :)

@JonathanGregory
Contributor

JonathanGregory commented Jul 2, 2021

I have prepared #331 for this issue and #319. Since the changes are exactly as discussed and agreed above, I propose that the pull request should be merged on 23rd July, three weeks from today, if no concerns are raised.

@davidhassell
Contributor

Thanks, @JonathanGregory. Your PR looks good to me.

@JonathanGregory
Contributor

@zklaus has made the following comment on the PR

Perhaps this is a good opportunity to update the second udunits.dat link in the following line 217 as well.

I also wonder if we should be explicit about whether CF follows udunits in the definitions of year and month. The way these two paragraphs are written, the reader might come away with the impression that we are only warning of potential misinterpretation by some other software, whereas my understanding so far is that we adopt the udunits definition, albeit begrudgingly. But perhaps my understanding is wrong, or you would prefer to address this in a separate issue?

@JonathanGregory
Contributor

I realise that we agreed to delete the existing excerpt from the UDUNITS that describes the standard calendar, because we have put that information in the description of that calendar. I will modify the PR.

I agree with the point @zklaus makes about the deprecated units. Would the following be OK:

UDUNITS defines a year to be exactly 365.242198781 days (the interval between 2 successive passages of the sun through vernal equinox). It is not a calendar year. UDUNITS defines a month to be exactly year/12, which is not a calendar month. The CF standard follows UDUNITS in the definition of units, but we recommend that year and month should not be used, because of the potential for mistakes and confusion.
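A small numerical illustration of why year and month are risky as time units: with the UDUNITS definitions quoted above, "1 months since 2000-01-01" does not land on 1 February. The sketch assumes a proleptic Gregorian reading of the reference date via Python's `datetime`.

```python
from datetime import datetime, timedelta

# UDUNITS fixed durations (not calendar years or months).
YEAR_DAYS = 365.242198781
MONTH_DAYS = YEAR_DAYS / 12

print(f"1 month = {MONTH_DAYS:.6f} days")  # 1 month = 30.436850 days

# One "month" after 2000-01-01 00:00 is about 10.5 hours into 31 January.
t = datetime(2000, 1, 1) + timedelta(days=MONTH_DAYS)
print(t)
```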

I propose to delete this text: "UDUNITS includes the following definitions for years: a common_year is 365 days, a leap_year is 366 days, a Julian_year is 365.25 days, and a Gregorian_year is 365.2425 days." It is correct, but I don't think it's necessary or helpful.

Although this PR can eliminate udunits.dat from the time coordinate section, the file is mentioned elsewhere in the standard, so I think a separate defect ticket is needed on that subject.

@JonathanGregory
Contributor

I have made the above changes in #331

@davidhassell
Contributor

Thanks, Klaus and Jonathan. This alteration (bd7498d) looks good to me.

@JonathanGregory
Contributor

Thanks, David. If @zklaus and others are also content with the new version, I suppose it can be accepted three weeks from three days ago, which makes 23rd July.

@JonathanGregory
Contributor

There have been no further comments for three weeks and sufficient support has been expressed, so this change is accepted according to the rules. I have merged #331. Thanks to all contributors to the discussion, especially @peterkuma, who raised the issue, and @Dave-Allured, who worked on the text and has been added to the list of authors of the CF convention.
