String comparisons with the CompareOptions.StringSort value produce incorrect results under .NET 5 and later #102579

Open
daverayment opened this issue May 22, 2024 · 4 comments
Labels
area-System.Globalization documentation Documentation bug or enhancement, does not impact product or test code help wanted [up-for-grabs] Good issue for external contributors
@daverayment

Description

When comparing strings, CompareOptions.StringSort should sort hyphens, apostrophes, and other non-alphanumeric characters before alphanumeric characters. This works in .NET Framework projects. In .NET 5 and later, however, selecting the option makes no difference: the results of sorting with CompareOptions.StringSort are the same as with CompareOptions.None.

Note: I am using the default ICU Unicode processing for .NET 5+ testing.

Reproduction Steps

This code is adapted from the CompareOptions Enum documentation page here. The word list has been copied verbatim.

using System;
using System.Collections.Generic;
using System.Globalization;

public class SamplesCompareOptions
{
	public static void Main()
	{
		var wordList = new List<string> { "cant", "bill's", "coop", "cannot", "billet", "can't", "con", "bills", "co-op" };

		wordList.Sort((x, y) => string.Compare(x, y, CultureInfo.CurrentCulture, CompareOptions.None));
		Console.WriteLine("\nAfter default sort (CompareOptions.None):");
		foreach (string word in wordList)
		{
			Console.WriteLine(word);
		}

		wordList.Sort((x, y) => string.Compare(x, y, CultureInfo.CurrentCulture, CompareOptions.StringSort));
		Console.WriteLine("\nAfter sorting with CompareOptions.StringSort:");
		foreach (string word in wordList)
		{
			Console.WriteLine(word);
		}
	}
}

DotNetFiddle for the code here.

Expected behavior

CompareOptions.StringSort should produce a different ordering from CompareOptions.None for this unordered collection of strings. The results are correct in .NET Framework 4.7.2 and Roslyn 4.8 (both tested on dotnetfiddle.net):

After default sort (CompareOptions.None):
billet
bills
bill's
cannot
cant
can't
con
coop
co-op

After sorting with CompareOptions.StringSort:
bill's
billet
bills
can't
cannot
cant
co-op
con
coop

Actual behavior

In .NET 5 and later, sorting with CompareOptions.StringSort gives incorrect output, producing the same results as CompareOptions.None:

After default sort (CompareOptions.None):
bill's
billet
bills
can't
cannot
cant
co-op
con
coop

After sorting with CompareOptions.StringSort:
bill's
billet
bills
can't
cannot
cant
co-op
con
coop

Regression?

According to testing on dotnetfiddle.net, the correct results were produced in .NET Framework 4.7.2 and Roslyn 4.8. .NET 5 and later produce the incorrect sort order.

Known Workarounds

A potential workaround may be to switch from ICU to NLS, but I have not tested this.
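Before trying that, it may help to confirm which globalization backend the app is actually using. The following is a minimal sketch based on the detection approach described in the .NET globalization and ICU documentation; the class name `GlobalizationModeCheck` and the method name `IsIcuMode` are hypothetical. On Windows, NLS can be requested by setting `System.Globalization.UseNls` to `true` in runtimeconfig.json. Whether that restores the .NET Framework sort order for this word list is an assumption that has not been verified here.

```csharp
using System;
using System.Globalization;

public static class GlobalizationModeCheck
{
    // Returns true when the runtime appears to be using ICU rather than NLS.
    // Heuristic: under ICU, the first four bytes of SortVersion.SortId
    // encode the same value as SortVersion.FullVersion.
    public static bool IsIcuMode()
    {
        SortVersion sortVersion = CultureInfo.InvariantCulture.CompareInfo.Version;
        byte[] bytes = sortVersion.SortId.ToByteArray();
        int version = bytes[3] << 24 | bytes[2] << 16 | bytes[1] << 8 | bytes[0];
        return version != 0 && version == sortVersion.FullVersion;
    }

    public static void Main()
    {
        // Prints which backend the heuristic detects on this runtime.
        Console.WriteLine(IsIcuMode() ? "ICU" : "NLS (or another non-ICU mode)");
    }
}
```

Running this once with the default settings and once with the NLS switch enabled should show whether the switch took effect before the sort results are compared.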

Configuration

My system:

  • .NET 8
  • Windows 11 latest
  • x64

I don't think the issue is specific to my OS or architecture, as the same problem can be seen via dotnetfiddle.

Other information

No response

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 22, 2024
Contributor

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

@tarekgh
Member

tarekgh commented May 22, 2024

In .NET 5.0 and later, we switched to using the ICU library. For more information, please refer to this article.

You may notice some behavioral differences between the legacy NLS (used in .NET Framework) and ICU. In ICU, the StringSort behavior is enabled by default, rendering the StringSort option ineffective. This default setting is why you consistently see the following order:

bill's
billet
bills
can't
cannot
cant
co-op
con
coop

This behavior is explained in the comment in the code here. We do not plan to change this behavior in the future as we adhere to ICU behavior, which aligns with the Unicode Standard.
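One way to observe this on a given runtime is to compare individual word pairs under both options and check whether the results agree. The sketch below is illustrative only: the class `StringSortProbe` and its methods are hypothetical names, and it uses CultureInfo.InvariantCulture instead of the CultureInfo.CurrentCulture from the original repro so that the result does not depend on the machine's culture. If the explanation above holds, `agree` should be true for every pair under ICU, while NLS would be expected to show differences.

```csharp
using System;
using System.Globalization;

public static class StringSortProbe
{
    // Returns the sign (-1, 0, or 1) of a culture-sensitive comparison.
    public static int CompareSign(string x, string y, CompareOptions options)
    {
        return Math.Sign(string.Compare(x, y, CultureInfo.InvariantCulture, options));
    }

    // True when CompareOptions.None and CompareOptions.StringSort
    // order the two strings the same way on the current runtime.
    public static bool OptionsAgree(string x, string y)
    {
        return CompareSign(x, y, CompareOptions.None)
            == CompareSign(x, y, CompareOptions.StringSort);
    }

    public static void Main()
    {
        // Pairs taken from the word list in the issue; each differs only by a symbol.
        string[][] pairs =
        {
            new[] { "co-op", "coop" },
            new[] { "can't", "cant" },
            new[] { "bill's", "bills" },
        };

        foreach (string[] pair in pairs)
        {
            int none = CompareSign(pair[0], pair[1], CompareOptions.None);
            int stringSort = CompareSign(pair[0], pair[1], CompareOptions.StringSort);
            Console.WriteLine(
                $"{pair[0]} vs {pair[1]}: None={none}, StringSort={stringSort}, agree={OptionsAgree(pair[0], pair[1])}");
        }
    }
}
```

Comparing pairs directly avoids any dependence on the starting order of the list, which the original repro carries over from the first sort into the second.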

We may add some information about this specific case to that article.

@tarekgh tarekgh added this to the Future milestone May 22, 2024
@tarekgh tarekgh added documentation Documentation bug or enhancement, does not impact product or test code help wanted [up-for-grabs] Good issue for external contributors and removed untriaged New issue has not been triaged by the area owner labels May 22, 2024
@daverayment
Author

@tarekgh Thank you for the quick response.

Sorry, I do see now that the StringSort option is being applied by default in .NET 5+ rather than not being applied at all.

This still means the CompareOptions documentation is incorrect for .NET 5 and later. The example code says to expect different outputs for None and StringSort options.

I will raise a separate documentation issue for that page and refer back here. Thank you also for suggesting an update to the ICU article to mention the CompareOptions change. That would be very useful, as I read that article myself while trying to troubleshoot.

Thanks again!

@daverayment
Author

I've raised a new documentation issue for the CompareOptions enum page: dotnet/docs#41052

2 participants