Skip to content

Fix culture creation with undetermined lang tag #115166

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ internal sealed partial class CultureData
/// </summary>
/// <param name="name">The locale name that ICU returns.</param>
/// <param name="extension">The extension part in the original culture name.</param>
/// <param name="originalName">The original input culture name.</param>
/// <param name="collationStart">The index of the collation in the name.</param>
/// <remarks>
/// BCP 47 specifications allow for extensions in the locale name, following the format language-script-region-extensions-collation. However,
Expand All @@ -29,7 +30,7 @@ internal sealed partial class CultureData
/// between names with extensions and those without. For example, we may have a name like en-US and en-US-u-xx. Although .NET doesn't support the extension xx,
/// we still include it in the name to distinguish it from the name without the extension.
/// </remarks>
private static string NormalizeCultureName(string name, ReadOnlySpan<char> extension, out int collationStart)
private static string NormalizeCultureName(string name, ReadOnlySpan<char> extension, string originalName, out int collationStart)
{
Debug.Assert(name is not null);
Debug.Assert(name.Length <= ICU_ULOC_FULLNAME_CAPACITY);
Expand All @@ -39,6 +40,16 @@ private static string NormalizeCultureName(string name, ReadOnlySpan<char> exten
Span<char> buffer = stackalloc char[ICU_ULOC_FULLNAME_CAPACITY];
int bufferIndex = 0;

// Using `Undetermined` language (e.g. `und-US`), the returned name from ICU getName is `_US`. We return back the original language `und`
if (name.Length > 1 && (name[0] == '-' || name[0] == '_') && originalName.StartsWith("und", StringComparison.OrdinalIgnoreCase))
{
buffer[bufferIndex++] = 'u';
buffer[bufferIndex++] = 'n';
buffer[bufferIndex++] = 'd';

changed = true;
}

for (int i = 0; i < name.Length && bufferIndex < ICU_ULOC_FULLNAME_CAPACITY; i++)
{
char c = name[i];
Expand Down Expand Up @@ -141,7 +152,7 @@ private bool InitIcuCultureDataCore()

Debug.Assert(_sWindowsName != null);

_sRealName = NormalizeCultureName(_sWindowsName, indexOfExtensions > 0 ? _sRealName.AsSpan(indexOfExtensions) : ReadOnlySpan<char>.Empty, out int collationStart);
_sRealName = NormalizeCultureName(_sWindowsName, indexOfExtensions > 0 ? _sRealName.AsSpan(indexOfExtensions) : ReadOnlySpan<char>.Empty, _sRealName, out int collationStart);

_iLanguage = LCID;
if (_iLanguage == 0)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -388,6 +388,18 @@ public static IEnumerable<object[]> Ctor_String_TestData()
}
}

[ConditionalTheory(typeof(PlatformDetection), nameof(PlatformDetection.IsIcuGlobalization))]
[InlineData("und-us", "und-US", "und-US")]
[InlineData("und-us_tradnl", "und-US", "und-US_tradnl")]
[InlineData("und-es-u-co-phoneb", "und-ES", "und-ES_phoneb")]
[InlineData("und-es-t-something", "und-ES", "und-ES")]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add a test for und-fonipa? That's a common tag for IPA, referenced on Wikipedia for example: https://en.wikipedia.org/wiki/International_Phonetic_Alphabet#IETF_language_tags

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunetly, Windows currently doesn't allow using this name. You can try the code:

            const uint LOCALE_SNAME = 0x0000005C;
            char[] buffer = new char[256];
            int res = GetLocaleInfoEx("und-fonipa", LOCALE_SNAME, buffer, buffer.Length);
            int errorCode = Marshal.GetLastWin32Error();

            Console.WriteLine($"GetLocaleInfoEx: {res} ... Error Code: {errorCode}");

        [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
        internal static extern unsafe int GetLocaleInfoEx(string lpLocaleName, uint LCType, char[] buffer, int cchData);

and you will see GetLocaleInfoEx: 0 ... Error Code: 87. 87 means The parameter is incorrect.

I would suggest you log issue for Windows through Windows Feedback hub. If Windows fixes that, it will automatically work with .NET.

By the way, if you run on Linux, this should work fine with the current fix here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xadxura may help with this too?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows requires the script tag to be declared unless a suppress-script value is defined for that langauge in BCP-47. The language tag 'und' has no suppress script defined so the script must be declared. One could say taht und-fonipa implicitly declares the script from the variant tag fonipa however we don't allow leftward propagation of script info. This is also why we don't infer Hans from zh-CN. Please use the tag und-Latn-fonipa, which does work with Windows:

image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xadxura I am seeing Windows don't allow und-Latn-fonipa when calling GetLocaleInfoEx. Am I missing something? should und-Latn-fonipa need to get registered on Windows before using this tag?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @tarekgh and @xadxura. Duh, I knew that Windows needed Latn but just plain forgot when asking here. Sorry for the side-track. But it sounds like und-Latn-fonipa still has an open question.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, custom BCP-47 tags must be registered on the system first.

public void CtorUndeterminedLanguageTag(string cultureName, string expectedCultureName, string expectedSortName)
{
CultureInfo culture = new CultureInfo(cultureName);
Assert.Equal(expectedCultureName, culture.Name);
Assert.Equal(expectedSortName, culture.CompareInfo.Name);
}

[Theory]
[MemberData(nameof(Ctor_String_TestData))]
public void Ctor_String(string name, string[] expectedNames)
Expand Down
Loading