Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Proposal: StringComparer.IsWellKnownComparer #50059

Closed
GrabYourPitchforks opened this issue Mar 22, 2021 · 4 comments · Fixed by #50312
Closed

API Proposal: StringComparer.IsWellKnownComparer #50059

GrabYourPitchforks opened this issue Mar 22, 2021 · 4 comments · Fixed by #50312
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Milestone

Comments

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Mar 22, 2021

Scenario

When serializing Dictionary<string, ...>, HashSet<string>, and similar collection types, the serializer may need to know what comparer is in use so that it can include that information in the payload. This is normally provided by ISerializable.GetObjectData, but that interface is intended for serializers that embed full type information inside the payload - a practice we strongly discourage.

Even though APIs like Dictionary<,>.Comparer and HashSet<>.Comparer are public, there's no good way to inspect the returned instance and see what comparison is being used under the covers. This API proposal provides a way to perform this inspection without relying on BinaryFormatter or related infrastructure.

Proposed API

namespace System
{
    public abstract class StringComparer
    {
        // new proposed APIs

        public static bool IsWellKnownOrdinalComparer(IEqualityComparer<string?>? comparer, out bool ignoreCase);
        public static bool IsWellKnownCultureAwareComparer(IEqualityComparer<string?>? comparer, [NotNullWhen(true)] out CompareInfo? compareInfo, out CompareOptions compareOptions);
    }
}

Usage

Dictionary<string, Foo> dict = GetDictionary();
StringComparer? reconstructedComparer = null;

if (StringComparer.IsWellKnownOrdinalComparer(dict.Comparer, out bool ignoreCase))
{
    if (ignoreCase)
    {
        Console.WriteLine("Using StringComparer.OrdinalIgnoreCase.");
        reconstructedComparer = StringComparer.OrdinalIgnoreCase;
    }
    else
    {
        Console.WriteLine("Using StringComparer.Ordinal.");
        reconstructedComparer = StringComparer.Ordinal;
    }
}
else if (StringComparer.IsWellKnownCultureAwareComparer(dict.Comparer, out CompareInfo compareInfo, out CompareOptions compareOptions))
{
    Console.WriteLine($"Using culture-aware comparer for culture '{compareInfo.Name}'.");
    reconstructedComparer = compareInfo.GetStringComparer(compareOptions);
}

if (reconstructedComparer is null)
{
    Console.WriteLine("Unknown comparer.");
}

Behavior and discussion

Between the returned ignoreCase, compareInfo, and compareOptions values, enough information is provided to reconstruct an equivalent StringComparer instance. It is possible that both of the IsWellKnownComparer APIs will return false; e.g., if a completely custom comparer is in use, or if a new comparer type is added in a future release. Callers must be resilient against these possibilities and should implement a graceful fallback mechanism.

It is possible for a call to IsWellKnownOrdinalComparer to return true, even if the instance being queried isn't the exact same object reference as either StringComparer.Ordinal or StringComparer.OrdinalIgnoreCase. For example, the singleton EqualityComparer<string>.Default is a well-known ordinal comparer, even though it is not an instance of StringComparer. There are additionally some edge cases (primarily involving BinaryFormatter or DataContractSerializer) where the runtime will instantiate a new ordinal StringComparer rather than use the singleton StringComparer.Ordinal. The proposed IsWellKnownOrdinalComparer method only describes whether the queried instance's behavior is indistinguishable from one of those built-in singleton instances.

Technically, EqualityComparer<string>.Default is still distinguishable from StringComparer.Ordinal, as EqualityComparer<string?>.Default.GetHashCode((string?)null) will return 0, and StringComparer.Ordinal.GetHashCode((string?)null) will throw an exception. However, the typical use case for GetHashCode(T) is to generate a hash code for a non-null key for insertion into a bucketed collection. So it's good enough for our purposes and we'll just treat them as indistinguishable to make life easier for our callers.

The IsWellKnownOrdinalComparer method will return false when given null, EqualityComparer<object>.Default, or similar, as these are object comparers, not string comparers. It also avoids the slippery slope problem of trying to special-case allowing EqualityComparer<TInterface>.Default for any TInterface that string implements.

Security notes

Deserializers should use caution when reading comparer information from any serialized payload. Nominally, the application itself retains jurisdiction over the behavior of the collection. That is, the application calls the collection ctor with an appropriate comparer matched to the business logic, and the deserializer populates the collection based on the contents of the incoming payload. Allowing the deserialized payload to control the active comparer may result in the creation and population of a collection whose behavior is mismatched to the app's expected business logic. This could cause unexpected behaviors or security holes within the receiving application. Deserializers should only honor any incoming comparer information if the payload is coming from a trusted source.

Additionally, applications should avoid calling CompareInfo.GetCompareInfo(string) with untrusted inputs. This can result in resource exhaustion (a DoS attack). If an application absolutely must read culture information from an incoming payload, it should compare the culture string (e.g., "en-US") against an allow list before creating a CompareInfo around that string. (This same guideline applies to CultureInfo more generally.)

Alternative designs

I had considered simply returning an instance of the StringComparison enum from this API, but this will not work given Orleans's core scenario of serialization. The linguistic enum values of StringComparison are dependent on the ambient environment (CultureInfo.CurrentCulture), so there's no guarantee that two different machines will see the same behavior when instantiating a collection around a culture-aware comparison.

In an ideal world we'd be able to use discriminated unions (see dotnet/csharplang#113), and the API could be defined as below.

enum class StringComparerInfo
{
    Ordinal(bool ignoreCase),
    CultureAware(CompareInfo compareInfo, CompareOptions compareOptions),
}

namespace System
{
    public abstract class StringComparer
    {
        public static bool IsWellKnownComparer(IEqualityComparer<string?>? comparer, [NotNullWhen(true)] out StringComparerInfo? info);
    }
}

In the absence of this language feature, the cleanest design seemed to be splitting the two queries "are you ordinal?" and "are you linguistic?" across two separate methods.

Finally, we could consider making these new APIs instance methods on StringComparer, as shown below.

namespace System
{
    public abstract class StringComparer
    {
        public bool IsWellKnownOrdinalComparer(out bool ignoreCase);
        public bool IsWellKnownCultureAwareComparer([NotNullWhen(true)] out CompareInfo? compareInfo, out CompareOptions compareOptions);
    }
}

The upside to this is that it doesn't introduce a static API, which usability studies have shown are difficult for consumers to use. However, it doesn't allow passing EqualityComparer<string>.Default, whose return value is not in the StringComparer type hierarchy. This means that there would be no easy way to retrieve the comparer behavior from new Dictionary<string, ...>().Comparer.

Finally, we could consider making the entire type hierarchy public and requiring callers to check the types themselves. However, for various reasons including performance and compatibility, this would involve making the following types public: OrdinalComparer, CultureAwareComparer, OrdinalCaseSensitiveComparer, OrdinalIgnoreCaseComparer, NonRandomizedStringEqualityComparer, and GenericEqualityComparer<T>; and it would require us to add properties to each of these types to allow for caller inspection. This would significantly increase our API surface and complicate the scenario. Using helper APIs hanging off of StringComparison allows us to paper over this complexity and present a simpler story for our users.

>> Marked partner blocking (/cc @ReubenBond)

@GrabYourPitchforks GrabYourPitchforks added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime blocking Marks issues that we want to fast track in order to unblock other important work labels Mar 22, 2021
@GrabYourPitchforks GrabYourPitchforks added this to the 6.0.0 milestone Mar 22, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Mar 22, 2021
@GrabYourPitchforks GrabYourPitchforks added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Mar 22, 2021
@jkotas
Copy link
Member

jkotas commented Mar 23, 2021

The compares override Equals(object o) to handle this exact scenario. Why can't we just use that?

@jkotas
Copy link
Member

jkotas commented Mar 23, 2021

Ah, ok - Equals(object o) won't allow you to get CompareOptions, etc.

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented Mar 23, 2021

The compares override Equals(object o) to handle this exact scenario. Why can't we just use that?

EqualityComparer<string>.Default, StringComparer.Ordinal, and NonRandomizedStringEqualityComparer should ostensibly all compare as "equivalent to the ordinal comparer", but they're not strictly equal (nor should they be, really) since they all have slightly different behavior. There's also the CompareOptions issue that you mentioned.

Edit: Somehow I edited your original comment instead of posting my own response. Brilliant. 😑

Edit x2: I think my wording "this comparer is indistinguishable from" is incorrect. It should really be thought of more along the lines of "if you need to instantiate a collection that replicates this behavior, I'll help you discover which StringComparer you can pass in via the ctor in order to get the collection to behave identically."

@GrabYourPitchforks GrabYourPitchforks removed the untriaged New issue has not been triaged by the area owner label Mar 23, 2021
@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation blocking Marks issues that we want to fast track in order to unblock other important work labels Mar 26, 2021
@terrajobst
Copy link
Member

terrajobst commented Mar 26, 2021

Video

  • Looks good as proposed
namespace System
{
    public abstract class StringComparer
    {
        public static bool IsWellKnownOrdinalComparer(IEqualityComparer<string?>? comparer,
                                                      out bool ignoreCase);

        public static bool IsWellKnownCultureAwareComparer(IEqualityComparer<string?>? comparer,
                                                           [NotNullWhen(true)] out CompareInfo? compareInfo,
                                                           out CompareOptions compareOptions);
    }
}

@GrabYourPitchforks GrabYourPitchforks self-assigned this Mar 27, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 27, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 31, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 30, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants