Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider increase to TwoDigitYearMax #75148

Closed
danmoseley opened this issue Sep 6, 2022 · 11 comments · Fixed by #76848
Closed

Consider increase to TwoDigitYearMax #75148

danmoseley opened this issue Sep 6, 2022 · 11 comments · Fixed by #76848
Labels
area-System.Globalization help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@danmoseley
Copy link
Member

danmoseley commented Sep 6, 2022

TwoDigitYearMax is the century assumption we use when parsing a date string with a two digit year. Currently all our calendars seem to use 29, ie., 29 implies 2029 but 30 implies 1930. This suggests back in Windows 98, Windows chose 29 and that's where we got 29 from. At least back to 2.0, we seem to have been defaulting to 29 if we couldn't read the system setting. We no longer read Windows NLS settings by default, and ICU does not have such data, so the default of 29 applies almost everywhere, unless overridden in code.

so eg --

        var c = CultureInfo.GetCultureInfo("en-US").Calendar;
        Console.WriteLine($"{c.GetType()} calendar has TwoDigitYearMax of {c.TwoDigitYearMax}");
        
        var dt = DateOnly.Parse("10/10/28");
        Console.WriteLine(dt);        
        
        dt = DateOnly.Parse("10/10/35");
        Console.WriteLine(dt);

gives

System.Globalization.GregorianCalendar calendar has TwoDigitYearMax of 2029
10/10/2028
10/10/1935

While Excel still uses 29, Windows now has moved to 49 for at least some calendars. Currently on my Windows box, with the en-US culture and Gregorian calendar, if I run with DOTNET_SYSTEM_GLOBALIZATION_USENLS=1 then I get System.Globalization.GregorianCalendar calendar has TwoDigitYearMax of 2049. Windows allows you to change this setting although it's deep in the UI:

image

Now that we're controlling this value by default on all OS, the value has become much more important than on .NET Framework. We may choose to make a similar decision as we are approaching 2030. The closer we get to 2030 the more likely it will be that if I type "1/1/30" I mean 2030.

Options

  1. Leave as-is. Document that users should explicitly increase TwoDigitYearMax if they parse 2-digit years and the data has a different assumption.
  2. Leave as-is, but on Windows when using ICU, call specially into NLS for the two digit cutoff, since ICU has no value. This makes Windows inconsistent with other OS, by default.
  3. Change to something larger that gives wiggle room, eg 49 (there appears to be no consensus, see below) possibly with a global config switch to revert to 29, similar to ICU/NLS. Doc as a breaking change.

Prior art:

Python uses a cutoff of 68
Javascript uses a cutoff of 49
Java seems to be 99
Golang seems to be 68
Nodatime seems to be 30
Excel uses 29. It respects Windows settings on Windows, but only for user input.
Rust uses 69 in at least one instance - I experimented naively with something like NaiveDate::parse_from_str("70-09-05", "%y-%m-%d")
Perl - not sure, I found one reference to 50, another package (DateTime::Format::Flexible) uses 69
RFC3339 just says the internet should always use 4 digit years.
Perhaps there are other relevant standards/RFC's ?

I don't have Linux/macOS handy. Do they have similar system settings, possibly user modifiable, and separate from ICU? If so, possibly we should read those values on all OS. This would however further increase the possibility of introducing a bug when, eg., developing on macOS and deploying on Linux.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 6, 2022
@ghost
Copy link

ghost commented Sep 6, 2022

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

Issue Details

TwoDigitYearMax is the century assumption we use when parsing a date string with a two digit year. Currently all our calendars seem to use 29. This suggests back in Windows 98, Windows chose 29 and that's where we got 29 from. At least back to 2.0, we seem to have been defaulting to 29 if we couldn't read the system setting. We no longer read Windows NLS settings by default, and ICU does not have such data, so the default of 29 applies almost everywhere, unless overridden in code.

so eg --

        var c = CultureInfo.GetCultureInfo("en-US").Calendar;
        Console.WriteLine($"{c.GetType()} calendar has TwoDigitYearMax of {c.TwoDigitYearMax}");
        
        var dt = DateOnly.Parse("10/10/28");
        Console.WriteLine(dt);        
        
        dt = DateOnly.Parse("10/10/35");
        Console.WriteLine(dt);

gives

System.Globalization.GregorianCalendar calendar has TwoDigitYearMax of 2029
10/10/2028
10/10/1935

While Excel still uses 29, Windows now has moved to 49 for at least some calendars. Currently on my Windows box, with the en-US culture and Gregorian calendar, if I run with DOTNET_SYSTEM_GLOBALIZATION_USENLS=1 then I get System.Globalization.GregorianCalendar calendar has TwoDigitYearMax of 2049

Now that we're controlling this value by default on all OS, the value has become much more important than on .NET Framework. We may choose to make a similar decision as we are approaching 2030. The closer we get to 2030 the more likely it will be that if I type "1/1/30" I mean 2030.

Options

  1. Leave as-is. Document that users should explicitly increase TwoDigitYearMax if they parse 2-digit years and the data has a different assumption.
  2. Leave as-is, but on Windows when using ICU, call specially into NLS for the two digit cutoff, since ICU has no value. This makes Windows inconsistent with other OS, by default.
  3. Change to something larger that gives wiggle room, eg 49 (there appears to be no consensus, see below) possibly with a global config switch to revert to 29, similar to ICU/NLS.

Prior art:

Python uses a cutoff of 68
Javascript uses a cutoff of 49
Java seems to be 99
Golang seems to be 68
Nodatime seems to be 30
Excel uses 29.
RFC3339 just says the internet should always use 4 digit years.
Perhaps there are other relevant standards/RFC's ?

Author: danmoseley
Assignees: -
Labels:

area-System.Globalization

Milestone: -

@tarekgh tarekgh added this to the Future milestone Sep 6, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Sep 6, 2022
@tarekgh tarekgh added the help wanted [up-for-grabs] Good issue for external contributors label Sep 6, 2022
@tarekgh
Copy link
Member

tarekgh commented Sep 6, 2022

My preference here is trying to read the settings when running on Windows. And we change the default values to 49 as it is reasonable number to me. 68 can be ok too. If we go with that, we'll need to document that as a breaking change and update any existing docs as needed.

@Clockwork-Muse
Copy link
Contributor

RPGLE uses a cutoff of 39 (In case we should possibly consider interactions with some "legacy" systems)
DB2 SQL uses the first two digits of the current year
SQL Server uses a cutoff of 49 (by default)
PostgreSQL uses the year closest to 2020 (the rules for some other formats make things more complicated).

@danmoseley
Copy link
Member Author

danmoseley commented Sep 6, 2022

A (weak) argument for 69 is that the start of the Unix epoch is 1970.
Another line of argument is -- what is the furthest in the future that is likely to be stored in a parseable format yet where 2-digits are (in context) unambiguous? Perhaps government bonds. From a web search, I see Canada at least has issued 50 year bonds. They would expire around 2070.
A contrary argument -- modern data really ought to use 4 digit dates, so 2 digit dates are more likely to represent older data.

@cdbullard
Copy link
Contributor

Hi all,
I thought I would give this a shot for updating the value from 2029 to 2049 per the discussion above, but had a couple of questions:

  1. Would this change be limited to files currently listing 2029 as the TwoDigitYearMax value inside of the System.Globalization projects? (Calendar.cs, CalendarData.cs, EastAsianLunisolarCalendar.cs, GregorianCalendar.cs, JulianCalendar.cs)?
  2. Would a change to any unit tests be required? I am not seeing any that would be impacted if the five files listed above would be modified, but want to double-check.

@danmoseley
Copy link
Member Author

@cdbullard you would be welcome, but I'm not sure there's consensus yet on what number to choose. Also on whether we should load the Windows OS value - if the date being parsed is directly from the user (e.g. a UI) then it would make sense, but if the code is processing data from a file one would likely want to have consistent behavior across OS.

@dotnet/area-system-globalization would probably best make this call.

@danmoseley
Copy link
Member Author

danmoseley commented Sep 8, 2022

Would a change to any unit tests be required?

If a change didn't break a unit test, it probably would mean we are missing necessary unit tests and should add them.

@tarekgh
Copy link
Member

tarekgh commented Sep 8, 2022

Before using ICU and when using NLS, Core the .NET and .NET Framework depended on the system settings which can be changed by the users. This means parsing dates with 2 digits years will still be dependent on that. Usually, users parsing dates with 2 digits year and want to ensure the consistent results will set the Calendar.TwoDigitYearMax before parsing, something like:

            CultureInfo ci = (CultureInfo)CultureInfo.InvariantCulture.Clone();
            ci.DateTimeFormat.Calendar.TwoDigitYearMax = 2049;

            Console.WriteLine($"{DateTime.Parse("2/11/30", ci)}");

Windows has already changed the default too and that affected .NET Framework and we didn't hear any complaints. This can tell the breaking change is really scoped. I am inclining to what I said in #75148 (comment).

@tarekgh
Copy link
Member

tarekgh commented Sep 18, 2022

@cdbullard are you still interested in making this change? We can help guide you as needed.

@danmoseley
Copy link
Member Author

@cdbullard ?

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Oct 11, 2022
@cdbullard
Copy link
Contributor

I apologize for the delay! I am happy to give it a shot--I have created a PR (#76848) with some initial changes, but have not yet attempted any unit test updates (was not sure exactly which ones to consider for this change). Please have a look when you get a chance and I'll try to quickly respond to any feedback.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Oct 22, 2022
@tarekgh tarekgh modified the milestones: Future, 8.0.0 Oct 22, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Nov 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Globalization help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants