StringComparison.OrdinalIgnoreCase does not consistently match ToUpperInvariant #67873

Open
GrabYourPitchforks opened this issue Apr 11, 2022 · 1 comment


Per the following two documents, comparing strings using StringComparison.OrdinalIgnoreCase is explicitly documented as equivalent to calling ToUpperInvariant on each string, then performing an ordinal comparison against the contents.

These statements within the docs imply that ToUpperInvariant and ToLowerInvariant are ordinal case conversions ("simple case mapping"), not linguistic case conversions. However, it looks like we're not consistently following this pattern.

string s1 = "s";
string s2 = "\u017f"; // U+017F LATIN SMALL LETTER LONG S, which uppercase-maps to a normal ASCII "S"
Console.WriteLine(s1.Equals(s2, StringComparison.OrdinalIgnoreCase)); // False
Console.WriteLine(s1.ToUpperInvariant() == s2.ToUpperInvariant()); // True
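To gauge how far the mismatch extends beyond this one character, a rough probe along the following lines could be run. It is only a sketch: the helper name FindDisagreements is hypothetical, it covers BMP code units only, and it assumes that a character and its own ToUpperInvariant result ought to compare equal under OrdinalIgnoreCase if the documented equivalence held. The list it prints will depend on the runtime version and globalization backend, so no output is shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collects BMP characters whose ToUpperInvariant result
// is NOT considered equal to the original under OrdinalIgnoreCase.
static List<(char Original, char Upper)> FindDisagreements()
{
    var hits = new List<(char Original, char Upper)>();
    for (int i = 0; i <= char.MaxValue; i++)
    {
        char c = (char)i;
        if (char.IsSurrogate(c))
        {
            continue; // a lone surrogate is not meaningful text on its own
        }

        string original = c.ToString();
        string upper = original.ToUpperInvariant();

        // If OrdinalIgnoreCase really were "uppercase invariantly, then
        // compare ordinally", these two strings would always compare equal.
        if (!original.Equals(upper, StringComparison.OrdinalIgnoreCase))
        {
            hits.Add((c, upper[0]));
        }
    }
    return hits;
}

foreach (var (source, mapped) in FindDisagreements())
{
    Console.WriteLine($"U+{(int)source:X4} -> U+{(int)mapped:X4}");
}
```

On a runtime that behaves as in the repro above, U+017F would be expected to show up in this list.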

This has collateral impact. For example, recent PRs like #67758 assume that non-ASCII characters cannot case-map to ASCII characters, which is not a guarantee offered by Unicode, but which might be a guarantee we'd be willing to make separately within the runtime by munging the Unicode tables.
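The assumption in question can be checked empirically on a given runtime. The sketch below lists non-ASCII BMP characters that an invariant case conversion maps into the ASCII range; the helper name NonAsciiMappingToAscii is hypothetical and not part of any .NET API. In the Unicode simple case mappings, U+017F uppercases to "S" and U+212A (KELVIN SIGN) lowercases to "k", so such mappings do exist in Unicode itself; what the runtime actually reports may differ by version and platform.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns every non-ASCII BMP character that the given
// case-mapping function sends into the ASCII range (below U+0080).
static List<(char Source, char Mapped)> NonAsciiMappingToAscii(Func<char, char> map)
{
    var hits = new List<(char Source, char Mapped)>();
    for (int i = 0x80; i <= char.MaxValue; i++)
    {
        char c = (char)i;
        if (char.IsSurrogate(c))
        {
            continue; // skip code units that are not standalone characters
        }

        char mapped = map(c);
        if (mapped < 0x80)
        {
            hits.Add((c, mapped));
        }
    }
    return hits;
}

Console.WriteLine("ToUpperInvariant:");
foreach (var (source, mapped) in NonAsciiMappingToAscii(char.ToUpperInvariant))
{
    Console.WriteLine($"  U+{(int)source:X4} -> '{mapped}'");
}

Console.WriteLine("ToLowerInvariant:");
foreach (var (source, mapped) in NonAsciiMappingToAscii(char.ToLowerInvariant))
{
    Console.WriteLine($"  U+{(int)source:X4} -> '{mapped}'");
}
```

An empty list for both directions would mean the "non-ASCII never case-maps to ASCII" assumption holds for the invariant conversions on that runtime; any entry is a counterexample.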

See also #30960 for further discussion on case mapping as a more general Unicode concept.

/cc @tarekgh, who had thoughts on this offline.

dotnet-issue-labeler bot added the untriaged label Apr 11, 2022

ghost commented Apr 11, 2022

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: GrabYourPitchforks
Assignees: -
Labels: area-System.Globalization

Milestone: -

tarekgh removed the untriaged label Apr 11, 2022
tarekgh added this to the Future milestone Apr 11, 2022
tarekgh modified the milestones: Future, 9.0.0 Nov 21, 2023