Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect value of CultureInfo.TextInfo.ListSeparator on Unix #536

Closed
AndreyAkinshin opened this issue Dec 5, 2019 · 3 comments
Closed
Labels
area-System.Globalization enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@AndreyAkinshin
Copy link
Member

AndreyAkinshin commented Dec 5, 2019

Consider the following expression:

CultureInfo("ru-RU").TextInfo.ListSeparator

The expected value is ;, we get it on .NET Framework, Mono, and .NET Core on Windows. However, .NET Core on Linux returns  . The problem is in pal_localeStringData.c, see these lines:

case LocaleString_ListSeparator:
// fall through
case LocaleString_ThousandSeparator:
    status = GetLocaleInfoDecimalFormatSymbol(locale, UNUM_GROUPING_SEPARATOR_SYMBOL, value, valueLength);
    break;

As we can see, it always returns LocaleString_ThousandSeparator (CultureInfo.NumberFormat.NumberGroupSeparator) instead of the expected LocaleString_ListSeparator.

If I understand correctly, there is no such thing like "ListSeparator" on Unix. So, there is no a standard function that we can call and get the relevant value. In Mono, the same problem was solved with the help of locale-builder. Could we apply the same approach in .NET Core?

Some additional details can be found in my blog post: https://aakinshin.net/posts/how-listseparator-depends-on-runtime-and-operating-system/

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Globalization untriaged New issue has not been triaged by the area owner labels Dec 5, 2019
AndreyAkinshin added a commit to dotnet/BenchmarkDotNet that referenced this issue Dec 5, 2019
The goal of these refactoring is to unify some of our API and respect the given value of CultureInfo.
The implementation is still not perfect, but it's an important first step.
Highlights:

* Unify ToString APIs
  All of the ToStr(), ToTimeStr(), ToSizeStr() methods were removed.
  Initially, they were introduced as hacked shortcuts to formatting purposes, but today they become misleading in some cases.
  Instead of them, now we always should use ToString().
  Where it makes sense, the proper overloads like ToString(CultureInfo, format) are available.
* Introduce CultureInfo property in Config
  Now it's possible to set the CultureInfo, the given value will be used everywhere including exporters (Fix #1295)
* Remove MultiEncodingString and Encodings in Configs
  The original goal of these classes was providing a way to enable Unicode support in exporter (see #487 for details).
  Unfortunately, it made many APIs overcomplicated because we had to pass Encoding everywhere.
  If we think carefully, we will understand that the only problem that we actually have relates to the terminal that doesn't support Unicode
    (in other places, we can use Unicode symbols without any problems).
  I decided to delete the MultiEncodingString class, use Unicode symbols by default, and patch them only in ConsoleLogger
    (see AsciiHelper for details).
  Now, users can turn on Unicode support in console output via [UnicodeConsoleLogger]
    (see IntroUnicode for details).
  All other exporters are always able to use Unicode.
  Some of them may patch Unicode symbols to achieve better portability (e.g., see HtmlExporter.HtmlLoggerWrapper.Escape)
* Introduce ILogger.Id and ILogger.Priority
  These APIs allow overriding existing loggers by custom.
  For each Id, only the logger with the highest priority will be chosen.
  With the help of this feature, we can override ConsoleLogger.Ascii with ConsoleLogger.Unicode
* Introduce SizeValue
  This structure is a wrapper for a long value that helps to operate with size values (bytes, kilobytes, etc.)
* Rename TimeInterval->TimeValue
  The original name was confusing because an interval typically has start and end.
  In our situation, it similar to TimeSpan but has its own features and use cases.
  I decided to rename it to TimeValue (to make consistent with SizeValue).
* Introduce CultureInfo.GetActualListSeparator()
  TextInfo.ListSeparator shouldn't be used anymore because it returns incorrect value on .NET Core+Unix
    (see dotnet/runtime#536 for details)
AndreyAkinshin added a commit to dotnet/BenchmarkDotNet that referenced this issue Dec 5, 2019
The goal of these refactoring is to unify some of our API and respect the given value of CultureInfo.
The implementation is still not perfect, but it's an important first step.
Highlights:

* Unify ToString APIs
  All of the ToStr(), ToTimeStr(), ToSizeStr() methods were removed.
  Initially, they were introduced as hacked shortcuts to formatting purposes, but today they become misleading in some cases.
  Instead of them, now we always should use ToString().
  Where it makes sense, the proper overloads like ToString(CultureInfo, format) are available.
* Introduce CultureInfo property in Config
  Now it's possible to set the CultureInfo, the given value will be used everywhere including exporters (Fix #1295)
* Remove MultiEncodingString and Encodings in Configs
  The original goal of these classes was providing a way to enable Unicode support in exporter (see #487 for details).
  Unfortunately, it made many APIs overcomplicated because we had to pass Encoding everywhere.
  If we think carefully, we will understand that the only problem that we actually have relates to the terminal that doesn't support Unicode
    (in other places, we can use Unicode symbols without any problems).
  I decided to delete the MultiEncodingString class, use Unicode symbols by default, and patch them only in ConsoleLogger
    (see AsciiHelper for details).
  Now, users can turn on Unicode support in console output via [UnicodeConsoleLogger]
    (see IntroUnicode for details).
  All other exporters are always able to use Unicode.
  Some of them may patch Unicode symbols to achieve better portability (e.g., see HtmlExporter.HtmlLoggerWrapper.Escape)
* Introduce ILogger.Id and ILogger.Priority
  These APIs allow overriding existing loggers by custom.
  For each Id, only the logger with the highest priority will be chosen.
  With the help of this feature, we can override ConsoleLogger.Ascii with ConsoleLogger.Unicode
* Introduce SizeValue
  This structure is a wrapper for a long value that helps to operate with size values (bytes, kilobytes, etc.)
* Rename TimeInterval->TimeValue
  The original name was confusing because an interval typically has start and end.
  In our situation, it similar to TimeSpan but has its own features and use cases.
  I decided to rename it to TimeValue (to make consistent with SizeValue).
* Introduce CultureInfo.GetActualListSeparator()
  TextInfo.ListSeparator shouldn't be used anymore because it returns incorrect value on .NET Core+Unix
    (see dotnet/runtime#536 for details)
AndreyAkinshin added a commit to dotnet/BenchmarkDotNet that referenced this issue Dec 5, 2019
The goal of these refactoring is to unify some of our API and respect the given value of CultureInfo.
The implementation is still not perfect, but it's an important first step.
Highlights:

* Unify ToString APIs
  All of the ToStr(), ToTimeStr(), ToSizeStr() methods were removed.
  Initially, they were introduced as hacked shortcuts to formatting purposes, but today they become misleading in some cases.
  Instead of them, now we always should use ToString().
  Where it makes sense, the proper overloads like ToString(CultureInfo, format) are available.
* Introduce CultureInfo property in Config
  Now it's possible to set the CultureInfo, the given value will be used everywhere including exporters (Fix #1295)
* Remove MultiEncodingString and Encodings in Configs
  The original goal of these classes was providing a way to enable Unicode support in exporter (see #487 for details).
  Unfortunately, it made many APIs overcomplicated because we had to pass Encoding everywhere.
  If we think carefully, we will understand that the only problem that we actually have relates to the terminal that doesn't support Unicode
    (in other places, we can use Unicode symbols without any problems).
  I decided to delete the MultiEncodingString class, use Unicode symbols by default, and patch them only in ConsoleLogger
    (see AsciiHelper for details).
  Now, users can turn on Unicode support in console output via [UnicodeConsoleLogger]
    (see IntroUnicode for details).
  All other exporters are always able to use Unicode.
  Some of them may patch Unicode symbols to achieve better portability (e.g., see HtmlExporter.HtmlLoggerWrapper.Escape)
* Introduce ILogger.Id and ILogger.Priority
  These APIs allow overriding existing loggers by custom.
  For each Id, only the logger with the highest priority will be chosen.
  With the help of this feature, we can override ConsoleLogger.Ascii with ConsoleLogger.Unicode
* Introduce SizeValue
  This structure is a wrapper for a long value that helps to operate with size values (bytes, kilobytes, etc.)
* Rename TimeInterval->TimeValue
  The original name was confusing because an interval typically has start and end.
  In our situation, it similar to TimeSpan but has its own features and use cases.
  I decided to rename it to TimeValue (to make consistent with SizeValue).
* Introduce CultureInfo.GetActualListSeparator()
  TextInfo.ListSeparator shouldn't be used anymore because it returns incorrect value on .NET Core+Unix
    (see dotnet/runtime#536 for details)
adamsitnik pushed a commit to dotnet/BenchmarkDotNet that referenced this issue Dec 12, 2019
* CultureInfo Refactoring

The goal of these refactoring is to unify some of our API and respect the given value of CultureInfo.
The implementation is still not perfect, but it's an important first step.
Highlights:

* Unify ToString APIs
  All of the ToStr(), ToTimeStr(), ToSizeStr() methods were removed.
  Initially, they were introduced as hacked shortcuts to formatting purposes, but today they become misleading in some cases.
  Instead of them, now we always should use ToString().
  Where it makes sense, the proper overloads like ToString(CultureInfo, format) are available.
* Introduce CultureInfo property in Config
  Now it's possible to set the CultureInfo, the given value will be used everywhere including exporters (Fix #1295)
* Remove MultiEncodingString and Encodings in Configs
  The original goal of these classes was providing a way to enable Unicode support in exporter (see #487 for details).
  Unfortunately, it made many APIs overcomplicated because we had to pass Encoding everywhere.
  If we think carefully, we will understand that the only problem that we actually have relates to the terminal that doesn't support Unicode
    (in other places, we can use Unicode symbols without any problems).
  I decided to delete the MultiEncodingString class, use Unicode symbols by default, and patch them only in ConsoleLogger
    (see AsciiHelper for details).
  Now, users can turn on Unicode support in console output via [UnicodeConsoleLogger]
    (see IntroUnicode for details).
  All other exporters are always able to use Unicode.
  Some of them may patch Unicode symbols to achieve better portability (e.g., see HtmlExporter.HtmlLoggerWrapper.Escape)
* Introduce ILogger.Id and ILogger.Priority
  These APIs allow overriding existing loggers by custom.
  For each Id, only the logger with the highest priority will be chosen.
  With the help of this feature, we can override ConsoleLogger.Ascii with ConsoleLogger.Unicode
* Introduce SizeValue
  This structure is a wrapper for a long value that helps to operate with size values (bytes, kilobytes, etc.)
* Introduce CultureInfo.GetActualListSeparator()
  TextInfo.ListSeparator shouldn't be used anymore because it returns incorrect value on .NET Core+Unix
    (see dotnet/runtime#536 for details)

* Improve EncodingAttribute

* Improve PlainExporter

* Improve TostResult

* Improve AsciiHelper
@tarekgh tarekgh added enhancement Product code improvement that does NOT require public API changes/additions and removed untriaged New issue has not been triaged by the area owner labels Jun 17, 2020
@tarekgh tarekgh added this to the Future milestone Jun 17, 2020
@KalleOlaviNiemitalo
Copy link

The same problem was reported in #43795, but now in .NET 5, it applies to Windows as well.

CLDR has data for list patterns, but those are more complex than just a ListSeparator string:

Perhaps .NET could take the "middle" pattern of "and" lists, e.g. {0}, {1}; verify that it starts with {0} and ends with {1}; and use the text in between as the ListSeparator. This would usually result in a string that ends with a space, unlike typical ListSeparator values on .NET Framework.

@tarekgh tarekgh modified the milestones: Future, 6.0.0 Oct 25, 2020
@KalleOlaviNiemitalo
Copy link

Are the list patterns currently available from the ICU API? I did not find anything dedicated in https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/.

@tarekgh
Copy link
Member

tarekgh commented Nov 1, 2020

Closing this one and we are tracking the issue on #43795 as it include more discussions there.

@tarekgh tarekgh closed this as completed Nov 1, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Globalization enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
Development

No branches or pull requests

4 participants