Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Linux] [CoreClr] Inconsistent results during value formatting #4565

Closed
mviorno opened this issue Oct 9, 2015 · 21 comments
Closed

[Linux] [CoreClr] Inconsistent results during value formatting #4565

mviorno opened this issue Oct 9, 2015 · 21 comments

Comments

@mviorno
Copy link

mviorno commented Oct 9, 2015

I have several tests, passed ok under windows and under linux with mono runtime, but same tests failed with the linux coreclr runtime:

test
CultureInfo culture = new CultureInfo("en-US");
Assert.AreEqual("0.00", 0.0.ToString("n", culture));

result

Expected: 0.00
Actual:   0.000
test
CultureInfo culture = new CultureInfo("en-US");
Assert.AreEqual("0.00", 0.0.ToString("f", culture));

result

Expected: 0.00
Actual:   0.000
test
CultureInfo culture = new CultureInfo("en-US");
Assert.AreEqual("0.00 %", 0.0.ToString("p", culture));

result

Expected: 0.00 %
Actual:   0.000%
test
CultureInfo culture = new CultureInfo("en-US");
Assert.AreEqual("0.0 %", 0.0.ToString("p1", culture));

result

Expected: 0.0 %
Actual:   0.0%
test
double negativeNumber = -38456.123456789123456789;
CultureInfo culture = new CultureInfo("en-US");
Assert.AreEqual("($38,456.1)", negativeNumber.ToString("c1", culture));

result

Expected: ($38,456.1)
Actual:   -$38,456.1
test
CultureInfo culture = new CultureInfo("de-DE");
DateTime date = new DateTime(2015, 3, 9, 15, 43, 28); // 9 Mar 2015

Assert.AreEqual("09.03.2015", date.ToString(culture.DateTimeFormat.ShortDatePattern, culture));

result

Expected: 09.03.2015
Actual:   9.3.2015
test
CultureInfo culture = new CultureInfo("ru-RU");
DateTime date = new DateTime(2015, 3, 9, 15, 43, 28); // 9 Mar 2015

Assert.AreEqual("9 марта 2015 г.", date.ToString(culture.DateTimeFormat.LongDatePattern, culture));

result

Expected: 9 марта 2015 г.
Actual:   Понедельник, 9 марта 2015 г.

Environment

uname -a

Linux debianDNX 3.16.0-4-amd64 dotnet/coreclr#1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04) x86_64 GNU/Linux
dnvm list

Active Version              Runtime Arch OperatingSystem Alias
------ -------              ------- ---- --------------- -----
       1.0.0-beta6          mono         linux/darwin
  *    1.0.0-rc1-15838      coreclr x64  linux           default
       1.0.0-beta8-15618    mono         linux/darwin
       1.0.0-beta8-15530    mono         linux/darwin
       1.0.0-rc1-15838      mono         linux/darwin
       1.0.0-beta8-15618    coreclr x64  linux
       1.0.0-beta7          mono         linux/darwin
@stephentoub
Copy link
Member

cc: @tarekgh, @steveharter

@steveharter
Copy link
Member

For Linux and MacOX this is expected as we are returning raw locale data from the underlying framework which is ICU (International Components for Unicode) which is then then used by coreclr to format numbers, dates, etc. That raw data is different than Windows (and mono). ICU obtains its values from CLDR (Common Locale Data Repository).

There are also differences in that data between releases of ICU\CLDR, just like there's differences between Windows 8.1 and 10 in some of their locale information. Windows has fixed data for common locales but does 'import' some of the less common locale data from CLDR.

Note that ICU\CLDR has two settings for currency - regular and 'accounting' where accounting typically uses parenthesis instead of the negative sign for negative values. However, Windows only one setting for currency. The behavior of coreclr currently uses the 'regular' setting so no parenthesis by default, unlike some locales like en-US which use parenthesis by default.

@tarekgh
Copy link
Member

tarekgh commented Oct 9, 2015

cc @eerhardt

I believe the tests need to be fixed as we can expect different data across platforms. @eerhardt may help with this issue I think.

@eerhardt
Copy link
Member

eerhardt commented Oct 9, 2015

We also use ICU/CLDR to get the short and long date patterns for a culture.

For LongDatePattern, we use the "full" and "long" date format, with the "full" format being the default. For "ru-RU", you can see the full format is EEEE, d MMMM y 'г'. which corresponds to the actual result you are seeing.

For ShortDatePattern, we use the "short" and "medium" formats from CLDR, but we also use the "yMd" skeleton, with the "yMd" skeleton being the default. For "de-DE", you can see the "yMd" skeleton is d.M.y, which matches your actual result.
The reason we are using "yMd" skeleton is that for "en-US", it matches the Windows ShortDatePattern - "M/d/y". The "short" format in CLDR is "M/d/yy" and the "medium" is "MMM d, y". If you look between even between the two cultures: "en" and "de", there is no 1 format in CLDR that corresponds to the data in Windows. "en" matches with "yMd" skeleton and "de" matches with the "medium" format. So no matter which format we'd pick it would match Windows for some cultures, and not match Windows for other cultures.

In short, until Windows and CLDR are in 100% agreement, there will always be differences between the culture data on Windows and Linux. Which means if you are dependent on this data (like your tests are), then you will have to switch depending on what platform you are running on. In our tests, we use RuntimeInformation.IsOSPlatform(OSPlatform.Windows) or Linux or OSX, etc.

This issue is pretty much a dupe of https://github.com/dotnet/corefx/issues/3698.

@akoeplinger
Copy link
Member

@eerhardt FWIW as a native German speaker, having 9.3.2015 as the short date pattern is very weird, 09.03.2015 is definitely the correct one (i.e. the short date format in CLDR, this also matches what a German Windows does).

Maybe it'd be better to default to the short CLDR format for ShortDatePattern?

@tarekgh
Copy link
Member

tarekgh commented Oct 9, 2015

I think we are seeing similar issue with Serbian cultures too dotnet/coreclr#3616. I am wondering why ICU is not picking the CLDR data?

@eerhardt
Copy link
Member

eerhardt commented Oct 9, 2015

(i.e. the short date format in CLDR, this also matches what a German Windows does).

The short date format in CLDR for the de culture is dd.MM.yy, which means it would output 09.03.15, which still wouldn't match Windows 09.03.2015 according to the test above.
Using the short format would mean that for en-US, the pattern would be M/d/yy, which would output 9/3/15, which also doesn't match Windows behavior (or my expected behavior either).

But I agree that using the yMd skeleton as the default for ShortDatePattern just because that's what matches Windows for en-US doesn't seem correct. From a programmatic/theoretical point of view, it would make total sense that LongDatePattern uses 'full' from CLDR and ShortDatePattern uses 'short' from CLDR.

Maybe the right thing to do here is to use short from CLDR and get CLDR to change their data for cultures where the short format isn't correct.

@akoeplinger
Copy link
Member

@eerhardt you're right, I misread the yy (so in fact medium would be correct for German here to match Windows 😭 though you could argue that 09.03.15 is indeed an acceptable short date).

To be honest, using the short pattern seems like the most "correct" approach to me, at least it then matches with what CLDR defines as short.

@eerhardt
Copy link
Member

eerhardt commented Oct 9, 2015

Logged dotnet/coreclr#1736 for this specific issue.

@mviorno
Copy link
Author

mviorno commented Oct 9, 2015

Could you consider my tests from compatibility/consistence point of view? When we (users) develop a software using aspvnext, we expected to get most similar results (better the same results) running against different OS and different runtimes. Both, NET framework and runtime are abstraction layers intended to leverage differences in underlying OS, libraries, hardware etc. That's why it would be great, if tests similar to mine pass ok against different runtimes and OS, without writing any spaghetti code.

@eerhardt
Copy link
Member

eerhardt commented Oct 9, 2015

Could you consider my tests from compatibility/consistence point of view?

See @ellismg's comment on https://github.com/dotnet/corefx/issues/3698.

Windows' locale data deviates from ICU/CLDR's in some cases. There are some cases where Windows uses the CLDR data and they match. But there are cases where it doesn't - for example if the Windows team felt it was "wrong", or if there are geopolitical issues, or if there would be compatibility concerns with earlier versions of Windows, etc.

The only way to ensure the same culture data on other platforms would be to take Windows' culture data and ship it with .NET, which we decided not to do for obvious reasons such as maintainability, updatability, package size, etc.

@mellinoe
Copy link
Contributor

The only way to ensure the same culture data on other platforms would be to take Windows' culture data and ship it with .NET, which we decided not to do for obvious reasons such as maintainability, updatability, package size, etc.

And what about the other way around; match CoreCLR on Windows with CoreCLR on other platforms, rather than with the full CLR on Windows? Not saying it's a good option or even feasible, just curious if it was considered. Is ICU/CLDR available for use on Windows?

@eerhardt
Copy link
Member

ICU is available for use on Windows for both 32 and 64 bit.

I'm not sure if @ellismg considered using ICU on Windows for CoreCLR, but I don't think the approach would be correct. It would create a new dependency on Windows, and some would consider the dependency needless, since culture data already exists on Windows.
Also, using the "default" culture data on a system makes sense. That way all your Windows apps use the same data - whether they use Win32, full .NET, or CoreCLR - and all your Linux app use the same data - whether native built, using CoreCLR, Java, etc.

@tarekgh
Copy link
Member

tarekgh commented Oct 12, 2015

Right, we need to have the culture data consistent with the OS to avoid undesirable behaviors. also as mentioned we don't have to add more dependency when running on Windows.

in general I expect moving forward the data on Windows and data on ICU will converge and the differences will be very little.

@mellinoe
Copy link
Contributor

in general I expect moving forward the data on Windows and data on ICU will converge and the differences will be very little

That sounds good to me. I was just sort of playing "Devil's advocate", because the most obvious way to align behavior would just be to use the same library, and we hadn't mentioned that. It would also probably have bad implications for debugging Universal apps, or other Windows-only things, which would be using the Windows culture data.

@ShawnSteele
Copy link

Not a lot to add, but you might want to look at http://blogs.msdn.com/b/shawnste/archive/2015/08/28/locale-culture-data-churn.aspx
Tests like these are perhaps interesting for a point in time, however the results aren't great for applications to take a dependency on. This kind of information shifts over time (and, as noted here, between platforms and versions of platforms)

@ellismg
Copy link
Contributor

ellismg commented Oct 13, 2015

I'm not sure if @ellismg considered using ICU on Windows for CoreCLR, but I don't think the approach would be correct. It would create a new dependency on Windows, and some would consider the dependency needless, since culture data already exists on Windows.

For folks like @mviorno that need consistent data across platforms, it seems like the correct long term direction is for someone to spend time investing in the following:

  1. The ability to build a version of System.Globalization.Native that statically links against ICU.
  2. The ability to use System.Globalization.Native on Windows, instead of Win32.

I don't think there are technical reasons preventing the following from happening. I suspect a solution looks like setting up a way to our build with FEATURE_COREFX_GLOBALIZATION as true and then work to decouple the "Unix" portions of the code with the "ICU" parts of the code (today FEATURE_COREFX_GLOBALIZATION means use the version of globalization which has less ties to the runtime (it's a fork of the code we use in .NET Native) and then PLATFORM_UNIX means go to ICU).

Having said that, I don't think that we (Microsoft) are willing to invest in this area. As Shawn, Tarek and Eric said, our general philosophy is to use the underlying OS when possible. We also don't want to get in the game of statically linking against our dependencies. However, if there were patches or PR that would make the code more amenable to this, we would definitely consider them, amusing they don't impact existing scenarios. Like the FreeBSD port, I expect this would need to be a largely community supported thing, we would try not to break things, but we wouldn't spend a ton of resources in the area.

If there's someone who is passionate about it, I'm more than happy to provide guidance and code reviews. My e-mail should be easy enough to find (it's in every commit and should be public on GitHub) if someone wants to get in touch.

For now, I'm going to close this issue.

@ellismg ellismg closed this as completed Oct 13, 2015
@effyteva
Copy link

effyteva commented Dec 8, 2018

@ellismg
This seems like a major issue in switching applications from using .NET core on Windows and Linux, as the CultureInfo contain different values per platform.
In our use case, we need to use both Windows and Linux servers, pointing to the same databases.
For instance, the MySQL library tried to read Int64 values from the database.

  1. On windows, it stores negatives values with a minus sign.
  2. On Linux, it stores negatives values with a dash sign.
    We looked into using a custom CultureInfo, but that's also platform dependent, and doesn't support creating a custom CultureInfo.

Is there any workaround for that issue?

@eerhardt
Copy link
Member

eerhardt commented Dec 8, 2018

Have you tried using the InvariantCulture to store the numbers into the database? (Side question: why are you serializing numbers into strings to store them in a DB?)

If you use the InvariantCulture, it will be the same across all machines. Then when parsing the numbers back - use the InvariantCulture as well. Only use the CurrentCulture for displaying to the user.

@tarekgh
Copy link
Member

tarekgh commented Dec 8, 2018

@effyteva The applications cannot assume the globalization data not changing even on the same OS. even now Windows is changing its data to use CLDR. if you want to use consistent data across the OS and OS versions then use invariant culture. We have invariant culture for this purpose.

@effyteva
Copy link

effyteva commented Dec 9, 2018

Thanks guys, updating all our Parse methods to InvariantCulture solved this.
We don't serialize numbers to the DB, not sure where the MySQL issue original from, but it was fixed during the Parse methods update.

briansull referenced this issue in briansull/coreclr Jan 25, 2020
- Pull Request Runtime\#1734
- This change corrects a cut-and paste typo with a previous commit.
- Includes test case Runtime_1104.cs
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 30, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants