StringComparison.OrdinalIgnoreCase does not consistently match ToUpperInvariant #67873

GrabYourPitchforks · 2022-04-11T21:14:26Z

Per the following two documents, comparing strings using StringComparison.OrdinalIgnoreCase is explicitly documented as equivalent to calling ToUpperInvariant on each string, then performing an ordinal comparison against the contents.

These statements within the docs imply that ToUpperInvariant and ToLowerInvariant are ordinal case conversions ("simple case mapping"), not linguistic case conversions. However, it looks like we're not consistently following this pattern.

string s1 = "s";
string s2 = "\u017f"; // U+017F Latin Small Letter Long S, which uppercase-maps to a normal ASCII "S"
Console.WriteLine(s1.Equals(s2, StringComparison.OrdinalIgnoreCase)); // False
Console.WriteLine(s1.ToUpperInvariant() == s2.ToUpperInvariant()); // True

This has collateral impact. For example, recent PRs like #67758 assume that non-ASCII characters cannot case-map to ASCII characters, which is not a guarantee offered by Unicode, but which might be a guarantee we'd be willing to make separately within the runtime by munging the Unicode tables.
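To see how far the "non-ASCII never case-maps to ASCII" assumption holds on a given runtime, one option is to scan the BMP and list every non-ASCII char whose invariant upper- or lowercase form lands in the ASCII range. The sketch below is only an illustration: `CaseMapProbe` and `FindNonAsciiToAscii` are hypothetical names, not runtime APIs, and the result can differ by .NET version and by the globalization backend in use, so no expected output is given.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the runtime.
public static class CaseMapProbe
{
    // Returns every non-ASCII BMP char whose invariant upper- or lowercase
    // form is an ASCII char, along with the direction of the mapping.
    public static List<(char Source, char Mapped, string Direction)> FindNonAsciiToAscii()
    {
        var hits = new List<(char Source, char Mapped, string Direction)>();

        for (int i = 0x80; i <= 0xFFFF; i++)
        {
            char c = (char)i;
            if (char.IsSurrogate(c))
            {
                continue; // a lone surrogate has no case mapping of its own
            }

            char upper = char.ToUpperInvariant(c);
            if (upper < 0x80)
            {
                hits.Add((c, upper, "upper"));
            }

            char lower = char.ToLowerInvariant(c);
            if (lower < 0x80)
            {
                hits.Add((c, lower, "lower"));
            }
        }

        return hits;
    }

    public static void Main()
    {
        foreach (var (source, mapped, direction) in FindNonAsciiToAscii())
        {
            Console.WriteLine($"U+{(int)source:X4} -> '{mapped}' ({direction})");
        }
    }
}
```

Any entry this prints is a counterexample to the assumption on that runtime. Note that the scan covers only single UTF-16 code units, so it says nothing about supplementary-plane characters.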

See also #30960 for further discussion on case mapping as a more general Unicode concept.

/cc @tarekgh, who had thoughts on this offline.

ghost · 2022-04-11T21:14:31Z

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

GrabYourPitchforks added the area-System.Globalization label Apr 11, 2022

dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Apr 11, 2022

tarekgh removed the untriaged New issue has not been triaged by the area owner label Apr 11, 2022

tarekgh added this to the Future milestone Apr 11, 2022

tarekgh mentioned this issue Aug 23, 2023

[API Proposal]: Add ToLowerOrdinal() & ToUpperOrdinal() #90999

Open

tarekgh modified the milestones: Future, 9.0.0 Nov 21, 2023
