API Proposal: StringComparer.IsWellKnownComparer #50059

GrabYourPitchforks · 2021-03-22T20:38:06Z

Scenario

When serializing Dictionary<string, ...>, HashSet<string>, and similar collection types, the serializer may need to know what comparer is in use so that it can include that information in the payload. This is normally provided by ISerializable.GetObjectData, but that interface is intended for serializers that embed full type information inside the payload - a practice we strongly discourage.

Even though APIs like Dictionary<,>.Comparer and HashSet<>.Comparer are public, there's no good way to inspect the returned instance and see what comparison is being used under the covers. This API proposal provides a way to perform this inspection without relying on BinaryFormatter or related infrastructure.

Proposed API

namespace System
{
    public abstract class StringComparer
    {
        // new proposed APIs

        public static bool IsWellKnownOrdinalComparer(IEqualityComparer<string?>? comparer, out bool ignoreCase);
        public static bool IsWellKnownCultureAwareComparer(IEqualityComparer<string?>? comparer, [NotNullWhen(true)] out CompareInfo? compareInfo, out CompareOptions compareOptions);
    }
}

Usage

Dictionary<string, Foo> dict = GetDictionary();
StringComparer? reconstructedComparer = null;

if (StringComparer.IsWellKnownOrdinalComparer(dict.Comparer, out bool ignoreCase))
{
    if (ignoreCase)
    {
        Console.WriteLine("Using StringComparer.OrdinalIgnoreCase.");
        reconstructedComparer = StringComparer.OrdinalIgnoreCase;
    }
    else
    {
        Console.WriteLine("Using StringComparer.Ordinal.");
        reconstructedComparer = StringComparer.Ordinal;
    }
}
else if (StringComparer.IsWellKnownCultureAwareComparer(dict.Comparer, out CompareInfo compareInfo, out CompareOptions compareOptions))
{
    Console.WriteLine($"Using culture-aware comparer for culture '{compareInfo.Name}'.");
    reconstructedComparer = compareInfo.GetStringComparer(compareOptions);
}

if (reconstructedComparer is null)
{
    Console.WriteLine("Unknown comparer.");
}

Behavior and discussion

Between the returned ignoreCase, compareInfo, and compareOptions values, enough information is provided to reconstruct an equivalent StringComparer instance. It is possible that both of the IsWellKnownComparer APIs will return false; e.g., if a completely custom comparer is in use, or if a new comparer type is added in a future release. Callers must be resilient against these possibilities and should implement a graceful fallback mechanism.
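As a rough sketch of how a serializer might use those values, the snippet below turns a comparer into a small descriptor string and back. The `Describe` and `Reconstruct` helpers and the `"O:…"` / `"C:…"` descriptor format are hypothetical, invented here for illustration; only the two `IsWellKnown…` methods come from this proposal. Note that `Reconstruct` calls `CompareInfo.GetCompareInfo(string)`, so it assumes a trusted payload.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical descriptor format (not part of the proposal):
//   "O:0" / "O:1"             ordinal, with the ignoreCase flag
//   "C:<culture>:<options>"   culture-aware, options stored as an integer
//   null                      unknown comparer, caller must fall back
static string? Describe(IEqualityComparer<string?>? comparer)
{
    if (StringComparer.IsWellKnownOrdinalComparer(comparer, out bool ignoreCase))
    {
        return ignoreCase ? "O:1" : "O:0";
    }

    if (StringComparer.IsWellKnownCultureAwareComparer(comparer, out CompareInfo? info, out CompareOptions options))
    {
        return $"C:{info.Name}:{(int)options}";
    }

    return null; // custom comparer, or a comparer type added in a future release
}

// Assumes the descriptor comes from a trusted source.
static StringComparer? Reconstruct(string? descriptor)
{
    if (descriptor is null)
    {
        return null;
    }

    string[] parts = descriptor.Split(':');
    switch (parts[0])
    {
        case "O":
            return parts[1] == "1" ? StringComparer.OrdinalIgnoreCase : StringComparer.Ordinal;
        case "C":
            var options = (CompareOptions)int.Parse(parts[2], CultureInfo.InvariantCulture);
            return CompareInfo.GetCompareInfo(parts[1]).GetStringComparer(options);
        default:
            return null;
    }
}

var original = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase) { ["key"] = 1 };
string? descriptor = Describe(original.Comparer);

// A null comparer makes the collection fall back to its default behavior.
var copy = new Dictionary<string, int>(original, Reconstruct(descriptor));

Console.WriteLine(descriptor);              // O:1
Console.WriteLine(copy.ContainsKey("KEY")); // True
```

Returning `null` for an unrecognized comparer leaves the fallback decision (fail, warn, or use a default) with the caller.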

It is possible for a call to IsWellKnownOrdinalComparer to return true, even if the instance being queried isn't the exact same object reference as either StringComparer.Ordinal or StringComparer.OrdinalIgnoreCase. For example, the singleton EqualityComparer<string>.Default is a well-known ordinal comparer, even though it is not an instance of StringComparer. There are additionally some edge cases (primarily involving BinaryFormatter or DataContractSerializer) where the runtime will instantiate a new ordinal StringComparer rather than use the singleton StringComparer.Ordinal. The proposed IsWellKnownOrdinalComparer method only describes whether the queried instance's behavior is indistinguishable from one of those built-in singleton instances.
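To illustrate the point about `EqualityComparer<string>.Default`, here is a minimal sketch that queries the comparer of a dictionary created without an explicit comparer. It assumes a runtime where the proposed method is available; the variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;

// No comparer passed: the dictionary reports EqualityComparer<string>.Default.
var dict = new Dictionary<string, int>();

bool isKnownOrdinal = StringComparer.IsWellKnownOrdinalComparer(dict.Comparer, out bool ignoreCase);
bool isStringComparer = dict.Comparer is StringComparer;

Console.WriteLine($"well-known ordinal: {isKnownOrdinal}, ignoreCase: {ignoreCase}");
Console.WriteLine($"is a StringComparer instance: {isStringComparer}");

// The behavior matches StringComparer.Ordinal, so that is what a caller would
// pass to the constructor when rebuilding an equivalent collection.
var rebuilt = new Dictionary<string, int>(isKnownOrdinal && !ignoreCase
    ? StringComparer.Ordinal
    : null);
```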

Technically, EqualityComparer<string>.Default is still distinguishable from StringComparer.Ordinal, as EqualityComparer<string?>.Default.GetHashCode((string?)null) will return 0, and StringComparer.Ordinal.GetHashCode((string?)null) will throw an exception. However, the typical use case for GetHashCode(T) is to generate a hash code for a non-null key for insertion into a bucketed collection. So it's good enough for our purposes and we'll just treat them as indistinguishable to make life easier for our callers.
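The one observable difference described above can be reproduced with a short sketch like the following (variable names are illustrative):

```csharp
using System;
using System.Collections.Generic;

// The default equality comparer hashes null to 0.
int defaultHash = EqualityComparer<string?>.Default.GetHashCode(null!);

// StringComparer.Ordinal rejects null instead.
bool ordinalThrew = false;
try
{
    StringComparer.Ordinal.GetHashCode((string)null!);
}
catch (ArgumentNullException)
{
    ordinalThrew = true;
}

Console.WriteLine($"default hash for null: {defaultHash}, ordinal threw: {ordinalThrew}");
```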

The IsWellKnownOrdinalComparer method will return false when given null, EqualityComparer<object>.Default, or similar, as these are object comparers, not string comparers. Returning false here also avoids the slippery slope of special-casing EqualityComparer<TInterface>.Default for every TInterface that string implements.
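A minimal sketch of the negative cases, assuming a runtime where both proposed methods are available:

```csharp
using System;
using System.Collections.Generic;

// null is not a string comparer.
bool forNull = StringComparer.IsWellKnownOrdinalComparer(null, out _);

// EqualityComparer<object>.Default converts to IEqualityComparer<string> through
// interface contravariance, but it is an object comparer and is not recognized.
bool forObjectComparer = StringComparer.IsWellKnownOrdinalComparer(EqualityComparer<object>.Default, out _);

// An ordinal comparer is not reported as culture-aware.
bool ordinalAsCultureAware = StringComparer.IsWellKnownCultureAwareComparer(StringComparer.Ordinal, out _, out _);

Console.WriteLine($"{forNull} {forObjectComparer} {ordinalAsCultureAware}");
```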

Security notes

Deserializers should use caution when reading comparer information from any serialized payload. Nominally, the application itself retains jurisdiction over the behavior of the collection. That is, the application calls the collection ctor with an appropriate comparer matched to the business logic, and the deserializer populates the collection based on the contents of the incoming payload. Allowing the deserialized payload to control the active comparer may result in the creation and population of a collection whose behavior is mismatched to the app's expected business logic. This could cause unexpected behaviors or security holes within the receiving application. Deserializers should only honor any incoming comparer information if the payload is coming from a trusted source.

Additionally, applications should avoid calling CompareInfo.GetCompareInfo(string) with untrusted inputs. This can result in resource exhaustion (a DoS attack). If an application absolutely must read culture information from an incoming payload, it should compare the culture string (e.g., "en-US") against an allow list before creating a CompareInfo around that string. (This same guideline applies to CultureInfo more generally.)
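One way to apply the allow-list guidance is sketched below. The `allowedCultures` set and the `TryGetAllowedCompareInfo` helper are hypothetical names chosen for illustration, and the listed cultures are placeholders for whatever the application actually supports.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical allow list; "" is the invariant culture.
var allowedCultures = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "", "en-US", "fr-FR" };

// Returns null for anything not on the allow list, so CompareInfo.GetCompareInfo
// is never called with an attacker-chosen culture name.
CompareInfo? TryGetAllowedCompareInfo(string? cultureName)
{
    if (cultureName is null || !allowedCultures.Contains(cultureName))
    {
        return null;
    }

    return CompareInfo.GetCompareInfo(cultureName);
}

CompareInfo? rejected = TryGetAllowedCompareInfo("not-on-the-list");
CompareInfo? invariant = TryGetAllowedCompareInfo("");

Console.WriteLine(rejected is null);  // True
Console.WriteLine(invariant is null); // False
```

A caller that gets null back would fall back to the comparer the application itself chose rather than trusting the payload.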

Alternative designs

I had considered simply returning an instance of the StringComparison enum from this API, but this will not work given Orleans's core scenario of serialization. The linguistic enum values of StringComparison are dependent on the ambient environment (CultureInfo.CurrentCulture), so there's no guarantee that two different machines will see the same behavior when instantiating a collection around a culture-aware comparison.

In an ideal world we'd be able to use discriminated unions (see dotnet/csharplang#113), and the API could be defined as below.

enum class StringComparerInfo
{
    Ordinal(bool ignoreCase),
    CultureAware(CompareInfo compareInfo, CompareOptions compareOptions),
}

namespace System
{
    public abstract class StringComparer
    {
        public static bool IsWellKnownComparer(IEqualityComparer<string?>? comparer, [NotNullWhen(true)] out StringComparerInfo? info);
    }
}

In the absence of this language feature, the cleanest design seemed to be splitting the two queries "are you ordinal?" and "are you linguistic?" across two separate methods.

We could also consider making these new APIs instance methods on StringComparer, as shown below.

namespace System
{
    public abstract class StringComparer
    {
        public bool IsWellKnownOrdinalComparer(out bool ignoreCase);
        public bool IsWellKnownCultureAwareComparer([NotNullWhen(true)] out CompareInfo? compareInfo, out CompareOptions compareOptions);
    }
}

The upside to this is that it doesn't introduce static APIs, which usability studies have shown are difficult for consumers to use. However, it doesn't allow passing EqualityComparer<string>.Default, whose return value is not in the StringComparer type hierarchy. This means that there would be no easy way to retrieve the comparer behavior from new Dictionary<string, ...>().Comparer.

Finally, we could consider making the entire type hierarchy public and requiring callers to check the types themselves. However, for various reasons including performance and compatibility, this would involve making the following types public: OrdinalComparer, CultureAwareComparer, OrdinalCaseSensitiveComparer, OrdinalIgnoreCaseComparer, NonRandomizedStringEqualityComparer, and GenericEqualityComparer<T>; and it would require us to add properties to each of these types to allow for caller inspection. This would significantly increase our API surface and complicate the scenario. Using helper APIs hanging off of StringComparer allows us to paper over this complexity and present a simpler story for our users.

>> Marked partner blocking (/cc @ReubenBond)


jkotas · 2021-03-23T17:04:52Z

The comparers override Equals(object o) to handle this exact scenario. Why can't we just use that?

jkotas · 2021-03-23T17:08:29Z

Ah, ok - Equals(object o) won't allow you to get CompareOptions, etc.

GrabYourPitchforks · 2021-03-23T17:09:45Z

> The comparers override Equals(object o) to handle this exact scenario. Why can't we just use that?

EqualityComparer<string>.Default, StringComparer.Ordinal, and NonRandomizedStringEqualityComparer should ostensibly all compare as "equivalent to the ordinal comparer", but they're not strictly equal (nor should they be, really) since they all have slightly different behavior. There's also the CompareOptions issue that you mentioned.

Edit: Somehow I edited your original comment instead of posting my own response. Brilliant. 😑

Edit x2: I think my wording "this comparer is indistinguishable from" is incorrect. It should really be thought of more along the lines of "if you need to instantiate a collection that replicates this behavior, I'll help you discover which StringComparer you can pass in via the ctor in order to get the collection to behave identically."

terrajobst · 2021-03-26T20:41:07Z

Video

Looks good as proposed

namespace System
{
    public abstract class StringComparer
    {
        public static bool IsWellKnownOrdinalComparer(IEqualityComparer<string?>? comparer,
                                                      out bool ignoreCase);

        public static bool IsWellKnownCultureAwareComparer(IEqualityComparer<string?>? comparer,
                                                           [NotNullWhen(true)] out CompareInfo? compareInfo,
                                                           out CompareOptions compareOptions);
    }
}

GrabYourPitchforks added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime blocking Marks issues that we want to fast track in order to unblock other important work labels Mar 22, 2021

GrabYourPitchforks added this to the 6.0.0 milestone Mar 22, 2021

dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Mar 22, 2021

GrabYourPitchforks added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Mar 22, 2021

GrabYourPitchforks removed the untriaged New issue has not been triaged by the area owner label Mar 23, 2021

terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation blocking Marks issues that we want to fast track in order to unblock other important work labels Mar 26, 2021

GrabYourPitchforks self-assigned this Mar 27, 2021

GrabYourPitchforks mentioned this issue Mar 27, 2021

Add IsWellKnownStringComparer methods #50312

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 27, 2021

GrabYourPitchforks closed this as completed in #50312 Mar 31, 2021

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 31, 2021

ghost locked as resolved and limited conversation to collaborators Apr 30, 2021
