en_us.POSIX locale environment for .NET Core on Linux #25678

Closed
alexkeh opened this issue Mar 29, 2018 · 12 comments
Labels
area-System.Globalization question Answer questions and provide assistance, not an issue with source code or documentation.

@alexkeh

alexkeh commented Mar 29, 2018

Does .NET Core support Linux in an en_us.POSIX locale environment?

I ask because this commit adds a regression test for a bug fix, and that test invokes CultureInfo("en-US-POSIX"). However, my team has run into a fundamental problem running .NET Core tests on Linux with the en_us.POSIX locale; we have no issue with the en_us.UTF8 locale. For example, string.Compare() does not work properly for case-insensitive comparison:

  • string.Compare("C03DATE", "C03DATE", true) returns zero, which is expected.
  • string.Compare("c03DATE", "C03DATE", true) returns non-zero, which is not expected: the third parameter is true, so the comparison is case-insensitive. In other locales, this method returns zero.
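
The report above can be turned into a small reproduction. The following is only a sketch: the class and method names are made up for illustration, and whether the "en-US-POSIX" culture can be created at all depends on the globalization data available to the runtime, so the code catches CultureNotFoundException rather than assuming the culture exists.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for reproducing the comparison described in this issue.
public static class PosixCompareRepro
{
    // Case-insensitive, culture-sensitive comparison under the named culture.
    // Throws CultureNotFoundException if the runtime cannot create that culture.
    public static int CompareIgnoreCase(string cultureName, string left, string right)
    {
        var culture = new CultureInfo(cultureName);
        return string.Compare(left, right, true, culture);
    }

    // Runs the issue's example under one culture and reports the raw result.
    // No particular value is assumed for en-US-POSIX; it depends on ICU/CLDR.
    public static string Describe(string cultureName)
    {
        try
        {
            int result = CompareIgnoreCase(cultureName, "c03DATE", "C03DATE");
            return $"{cultureName}: {result}";
        }
        catch (CultureNotFoundException)
        {
            return $"{cultureName}: culture not available on this system";
        }
    }
}
```

Calling `PosixCompareRepro.Describe("en-US-POSIX")` and `PosixCompareRepro.Describe("en-US")` on the same Linux machine and printing both results would show whether the two cultures disagree.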

Thanks to @divega for providing some initial help on this question.

@joshfree
Member

cc: @tarekgh

@tarekgh
Member

tarekgh commented Mar 29, 2018

@alexkeh The behavior of string comparisons under en-US-POSIX is not useful and should be avoided if you are doing linguistic string comparisons. 'c' is not equal to 'C' under en-US-POSIX, even with the case-insensitive option. That is how the collation rules are defined for en-US-POSIX. In other words, en-US-POSIX does not use the expected Unicode collation.

I would recommend changing the default locale to something other than en-US-POSIX to get the desired behavior. You can do that by setting the LC_ALL environment variable, or you can set CultureInfo.CurrentCulture in your code.

You may also want to look at the thread https://marc.info/?l=icu&m=101779972120281&w=2 regarding this behavior.

@tarekgh tarekgh closed this as completed Mar 29, 2018
@mwoo-o

mwoo-o commented May 1, 2018

We wrote .NET Core code to support different platforms (Windows, Linux, and any platform that .NET Standard 2.0 supports). string.Compare() is a very common function, and it should handle the POSIX character set properly if POSIX is supported in .NET Core. Our drivers have been verified on Windows with different character sets, as .NET Core supports these character sets with no problem. We also tested on Linux with different character sets, and everything works except POSIX.

Question:
If .NET Core supports POSIX, shouldn't string.Compare() work the same way as it does for other character sets in .NET Core? We need to determine whether we can officially support our driver on Linux-POSIX.

Does Microsoft officially support POSIX in .NET Core?

@tarekgh
Member

tarekgh commented May 1, 2018

@mwoo-o The issue here is not whether .NET Core supports POSIX; the issue is what the POSIX string comparison behavior is. If you look at the link I sent before, https://marc.info/?l=icu&m=101779972120281&w=2, you'll see:

The purpose of the en_US_POSIX locale in general is to make ICU more locale insensitive in certain circumstances.

This gives the POSIX locale special behavior for string comparison. In other words, en_US_POSIX behavior is really different from en_US behavior. The string comparison behavior is defined by CLDR (Unicode.org) here: https://unicode.org/cldr/trac/browser/trunk/common/collation/en_US_POSIX.xml and this behavior is carried by the ICU library, which the framework depends on to perform string comparison operations. So the framework is not really defining the behavior here.

You can still control the behavior by setting the current culture to whatever locale you want, or even to Invariant if you like.
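
As a rough sketch of that suggestion, an application could pin its culture at startup. The helper names below are hypothetical, and the check that a culture name ending in "POSIX" identifies the POSIX locale is an assumption based on the "en-US-POSIX" name used in this thread; CultureInfo.CurrentCulture, CultureInfo.DefaultThreadCurrentCulture and CultureInfo.InvariantCulture are the actual .NET properties.

```csharp
using System;
using System.Globalization;

// Hypothetical startup helper; the names are illustrative, not part of .NET.
public static class CultureSetup
{
    // Assumption: POSIX cultures can be recognized by a name ending in "POSIX",
    // such as "en-US-POSIX".
    public static bool IsPosixCultureName(string cultureName) =>
        cultureName.EndsWith("POSIX", StringComparison.OrdinalIgnoreCase);

    // Makes the invariant culture the current culture for this thread and the
    // default for threads created later in the process.
    public static void PinInvariantCulture()
    {
        CultureInfo.CurrentCulture = CultureInfo.InvariantCulture;
        CultureInfo.DefaultThreadCurrentCulture = CultureInfo.InvariantCulture;
    }

    // Switches to the invariant culture only when the process started under a
    // POSIX culture. Returns true if the culture was changed.
    public static bool UseInvariantIfPosix()
    {
        if (!IsPosixCultureName(CultureInfo.CurrentCulture.Name))
        {
            return false;
        }
        PinInvariantCulture();
        return true;
    }
}
```

Calling `CultureSetup.UseInvariantIfPosix()` once at the start of the program would leave every other locale untouched. Overriding the user's culture affects formatting and parsing as well as comparison, so this is a trade-off rather than a neutral fix.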

@mwoo-o

mwoo-o commented May 1, 2018

We can't control the locale used by our customers. That means that if we want to support POSIX, we will have to add POSIX-specific handling to our driver. Is this understanding correct?

Questions:
Does Windows support the POSIX locale?
Other than string functions, is there anything else we need to be aware of regarding POSIX behavior?
Any links to information about POSIX behavior would be appreciated.

@tarekgh
Member

tarekgh commented May 1, 2018

Does Windows support POSIX locale?

No

Other than string functions in POSIX, is there something that we need to aware for POSIX behavior?
Any link information for POSIX behaviors will be appreciated.

.NET Core just uses the ICU library. You may consult the ICU library documentation here: http://userguide.icu-project.org/locale

We can't control the locale that are used by the customers.

You can still set the current culture in your code using CultureInfo.CurrentCulture.

So that will mean that if we want to support POSIX, then we will have to make some code differentiation for POSIX locale in our driver. Is this correct understanding?

We discourage users from setting their locale to POSIX because of the issues we talked about here. If the user decides to use the POSIX locale, that is their choice, and they will get the behavior for that locale. It may not be the desired behavior, but it is what the user chose.

@mwoo-o

mwoo-o commented May 2, 2018

@tarekgh
We are still not convinced that this is not a .NET Core issue with the POSIX locale. In Java, String.equalsIgnoreCase(), as well as equals(), compareTo(), and compareToIgnoreCase(), compare the two strings using the case mapping information from the UnicodeData file. This means the behavior is independent of the locale setting. To compare two strings honoring a certain locale, you use the java.text.Collator class.
In .NET, String.Compare() apparently compares two strings honoring the current culture (= locale):
https://msdn.microsoft.com/en-us/library/zkcaxw5y(v=vs.110).aspx
Remarks
The comparison uses the current culture to obtain culture-specific information such as casing rules and the alphabetic order of individual characters. For example, a culture could specify that certain combinations of characters be treated as a single character, or uppercase and lowercase characters be compared in a particular way, or that the sorting order of a character depends on the characters that precede or follow it.
The comparison is performed using word sort rules. For more information about word, string, and ordinal sorts, see System.Globalization.CompareOptions.

@tarekgh
Member

tarekgh commented May 2, 2018

@mwoo-o

The .NET Framework chose to use the current culture for default comparisons. You can perform the comparisons differently if you wish: just pass StringComparison.OrdinalIgnoreCase or StringComparison.InvariantCultureIgnoreCase to the comparison APIs and you'll get what you want:

https://msdn.microsoft.com/en-us/library/t4411bks(v=vs.110).aspx
https://msdn.microsoft.com/en-us/library/c64xh8f9(v=vs.110).aspx
https://msdn.microsoft.com/en-us/library/e6883c06(v=vs.110).aspx

Why can't you do that?
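
Applied to the strings from the original report, that suggestion might look like the sketch below. The wrapper class and method names are invented for illustration; string.Compare, string.Equals and the StringComparison values are the real APIs.

```csharp
using System;

// Hypothetical wrappers showing explicit StringComparison arguments.
public static class IgnoreCaseComparisons
{
    // Ordinal, case-insensitive: does not consult the current culture.
    public static int CompareOrdinal(string a, string b) =>
        string.Compare(a, b, StringComparison.OrdinalIgnoreCase);

    // Linguistic, case-insensitive, using the invariant culture rather than
    // whatever locale the process was started under.
    public static int CompareInvariant(string a, string b) =>
        string.Compare(a, b, StringComparison.InvariantCultureIgnoreCase);

    // When only equality matters, string.Equals states the intent directly.
    public static bool SameIdentifier(string a, string b) =>
        string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
}
```

With these, `IgnoreCaseComparisons.CompareOrdinal("c03DATE", "C03DATE")` and `IgnoreCaseComparisons.CompareInvariant("c03DATE", "C03DATE")` both return 0, and neither depends on the current culture.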

@mwoo-o

mwoo-o commented May 2, 2018

@tarekgh
Are you suggesting that we use
string.Compare(string1, string2, StringComparison.InvariantCultureIgnoreCase)?

In our code, we have a lot of calls like
string.Compare(string1, string2, true)
If we change them to use StringComparison.InvariantCultureIgnoreCase, do you see any impact on other locales that would cause different behavior?

@tarekgh
Member

tarekgh commented May 2, 2018

@mwoo-o

You need to look at each place in your code that does a string comparison and figure out which of the following cases it falls into:

  • Does the comparison need to be performed linguistically according to the current culture (whatever the user chooses as the current culture)? If so, you don't need to pass a StringComparison at all, and we'll use the current culture to perform the comparison. For example, you want the comparison to handle the Turkish I correctly when the user sets the current culture to a Turkish locale. This type of comparison is usually needed for strings used by the UI.
  • Does the comparison need to be ordinal rather than linguistic? If so, pass StringComparison.OrdinalIgnoreCase. This compares each character in the first string to the corresponding character in the second string (handling casing according to the Unicode spec). Examples: parsing XML where you know the element names are usually in English, or comparing file names and paths.
  • Does the comparison need to be performed linguistically but independently of the current culture? If so, pass StringComparison.InvariantCultureIgnoreCase. Note that if the user sets the current locale to Turkish and compares strings containing the Turkish I, the comparison may not behave as the user expects, because you chose not to use the current culture. Invariant comparison is useful when you want consistent behavior regardless of linguistic correctness for the current culture.
  • Does the comparison need to behave according to a specific language? If so, pass a culture for that language to the string comparison method. For example, if you want all comparisons performed according to the English language, pass the en-US culture to the Compare method.
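
The four cases above could be sketched as four small helpers, one per kind of comparison. The class and method names are hypothetical and only label the intent; the string.Compare overloads, StringComparison values and CompareOptions.IgnoreCase are the real APIs. The first helper passes StringComparison.CurrentCultureIgnoreCase explicitly to make the culture dependence visible at the call site.

```csharp
using System;
using System.Globalization;

// Hypothetical helpers: one per comparison case listed above.
public static class ComparisonChoices
{
    // Case 1: linguistic, current culture (for example, strings shown in the UI).
    // The result depends on the locale the user has chosen.
    public static int ForDisplay(string a, string b) =>
        string.Compare(a, b, StringComparison.CurrentCultureIgnoreCase);

    // Case 2: ordinal (XML element names, file names, paths, identifiers).
    public static int ForIdentifiers(string a, string b) =>
        string.Compare(a, b, StringComparison.OrdinalIgnoreCase);

    // Case 3: linguistic but independent of the current culture.
    public static int ForConsistentResults(string a, string b) =>
        string.Compare(a, b, StringComparison.InvariantCultureIgnoreCase);

    // Case 4: linguistic for one specific culture supplied by the caller,
    // such as a CultureInfo created for "en-US".
    public static int ForCulture(string a, string b, CultureInfo culture) =>
        string.Compare(a, b, culture, CompareOptions.IgnoreCase);
}
```

For example, `ComparisonChoices.ForIdentifiers("FILE.TXT", "file.txt")` returns 0 under any locale, while `ComparisonChoices.ForDisplay` on the same pair can differ from locale to locale, which is what the original report ran into under en-US-POSIX.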

You may look at the code sample at https://docs.microsoft.com/en-us/dotnet/api/system.stringcomparison?view=netframework-4.7.1#examples to get an idea of what to expect from each comparison type.

To summarize, you need to know what string comparison behavior you want and choose the right option accordingly. If you have some examples from your code, I can help you figure out which option would be good in each case.

@mwoo-o

mwoo-o commented May 2, 2018

@tarekgh
Thank you very much for the information. We will need to go through 100+ string.Compare() calls. Will let you know if we have questions.

@tarekgh
Member

tarekgh commented Oct 8, 2018

dotnet/docs#8179

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 17, 2020
5 participants