New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex collections should implement generic collection interfaces #271

Closed
justinvp opened this Issue Dec 16, 2014 · 51 comments

Comments

@justinvp
Copy link
Collaborator

justinvp commented Dec 16, 2014

CaptureCollection, GroupCollection, and MatchCollection currently only implement the non-generic ICollection interface. These collections should implement the generic collection interfaces to better interoperate with more modern APIs, such as LINQ. Since these collections are already indexable, they should implement IList<T> and IReadOnlyList<T>, as well as the non-generic IList (to be consistent with the generic interfaces).

Rationale and Usage

This is certainly a nice-to-have, but it is a long-standing request that developers still ask about. Implementing the generic interfaces will allow these collections to be used more easily with LINQ and interoperate better with more modern framework and library APIs.

For example, to use these collections with LINQ right now you have to know about and remember to use Enumerable.Cast<TSource>() to cast the non-generic IEnumerable into an IEnumerable<T>:

var captures =
    from capture in match.Groups.Cast<Group>().Last().Captures.Cast<Capture>()
    select capture.Value;

With these changes you'd no longer have to do that:

var captures =
    from capture in match.Groups.Last().Captures
    select capture.Value;

Plus, in the above example, you'd get a performance improvement when using Enumerable.Last<TSource>.() as its implementation has a fast-path for collections that implement IList<T>.

Proposed API

// DebuggerDisplay and DebuggerTypeProxy added
[DebuggerDisplay("Count = {Count}")]
[DebuggerTypeProxy(typeof(RegexCollectionDebuggerProxy<Capture>))]
// Previously only implemented ICollection
public class CaptureCollection : IList<Capture>, IReadOnlyList<Capture>, IList
{
    // Existing members
    public int Count { get; }
    public Capture this[int i] { get; }
    public IEnumerator GetEnumerator();
    object ICollection.SyncRoot { get; }
    bool ICollection.IsSynchronized { get; }
    void ICollection.CopyTo(Array array, int arrayIndex);

    // Proposed members
    public void CopyTo(Capture[] array, int arrayIndex);
    IEnumerator<Capture> IEnumerable<Capture>.GetEnumerator();
    int IList<Capture>.IndexOf(Capture item);
    void IList<Capture>.Insert(int index, Capture item);
    void IList<Capture>.RemoveAt(int index);
    Capture IList<Capture>.this[int index] { get; set; }
    void ICollection<Capture>.Add(Capture item);
    void ICollection<Capture>.Clear();
    bool ICollection<Capture>.Contains(Capture item);
    bool ICollection<Capture>.IsReadOnly { get; }
    bool ICollection<Capture>.Remove(Capture item);
    int IList.Add(object value);
    void IList.Clear();
    bool IList.Contains(object value);
    int IList.IndexOf(object value);
    IList.Insert(int index, object value);
    bool IList.IsFixedSize { get; }
    bool IList.IsReadOnly { get; }
    void IList.Remove(object value);
    void IList.RemoveAt(int index);
    object IList.this[int index] { get; set; }
}

// DebuggerDisplay and DebuggerTypeProxy added
[DebuggerDisplay("Count = {Count}")]
[DebuggerTypeProxy(typeof(RegexCollectionDebuggerProxy<Group>))]
// Previously only implemented ICollection
public class GroupCollection : IList<Group>, IReadOnlyList<Group>, IList
{
    // Existing members
    public int Count { get; }
    public Group this[int groupnum] { get; }
    public Group this[String groupname] { get; }
    public IEnumerator GetEnumerator();
    object ICollection.SyncRoot { get; }
    bool ICollection.IsSynchronized { get; }
    void ICollection.CopyTo(Array array, int arrayIndex);

    // Proposed members
    public void CopyTo(Group[] array, int arrayIndex);
    IEnumerator<Group> IEnumerable<Group>.GetEnumerator();
    int IList<Group>.IndexOf(Group item);
    void IList<Group>.Insert(int index, Group item);
    void IList<Group>.RemoveAt(int index);
    Group IList<Group>.this[int index] { get; set; }
    void ICollection<Group>.Add(Group item);
    void ICollection<Group>.Clear();
    bool ICollection<Group>.Contains(Group item);
    bool ICollection<Group>.IsReadOnly { get; }
    bool ICollection<Group>.Remove(Group item);
    int IList.Add(object value);
    void IList.Clear();
    bool IList.Contains(object value);
    int IList.IndexOf(object value);
    IList.Insert(int index, object value);
    bool IList.IsFixedSize { get; }
    bool IList.IsReadOnly { get; }
    void IList.Remove(object value);
    void IList.RemoveAt(int index);
    object IList.this[int index] { get; set; }
}

// DebuggerDisplay and DebuggerTypeProxy added
[DebuggerDisplay("Count = {Count}")]
[DebuggerTypeProxy(typeof(RegexCollectionDebuggerProxy<Match>))]
// Previously only implemented ICollection
public class MatchCollection : IList<Match>, IReadOnlyList<Match>, IList
{
    // Existing members
    public int Count { get; }
    public virtual Match this[int i] { get; }
    public IEnumerator GetEnumerator();
    object ICollection.SyncRoot { get; }
    bool ICollection.IsSynchronized { get; }
    void ICollection.CopyTo(Array array, int arrayIndex);

    // Proposed members
    public void CopyTo(Match[] array, int arrayIndex);
    IEnumerator<Match> IEnumerable<Match>.GetEnumerator();
    int IList<Match>.IndexOf(Match item);
    void IList<Match>.Insert(int index, Match item);
    void IList<Match>.RemoveAt(int index);
    Match IList<Match>.this[int index] { get; set; }
    void ICollection<Match>.Add(Match item);
    void ICollection<Match>.Clear();
    bool ICollection<Match>.Contains(Match item);
    bool ICollection<Match>.IsReadOnly { get; }
    bool ICollection<Match>.Remove(Match item);
    int IList.Add(object value);
    void IList.Clear();
    bool IList.Contains(object value);
    int IList.IndexOf(object value);
    IList.Insert(int index, object value);
    bool IList.IsFixedSize { get; }
    bool IList.IsReadOnly { get; }
    void IList.Remove(object value);
    void IList.RemoveAt(int index);
    object IList.this[int index] { get; set; }
}

Details

  • There was some discussion as to whether only the read-only interfaces should be implemented, or both the read-only and mutable interfaces. The consensus is to implement both the read-only and mutable interfaces. This is consistent with other collections in the framework. The mutable interfaces are implemented as read-only: mutable members are implemented explicitly and throw NotSupportedException (like ReadOnlyCollection<T>).
  • There was an open question as to whether the non-generic IList should be implemented as well. These collections are indexable and if IList<T> and IReadOnlyList<T> are being implemented, IList should be implemented as well. This does add several more members, but they are all implemented explicitly so they don't add any new public members to intellisense, and the implementations are very straightforward.
  • ICollection<T>.CopyTo is implemented implicitly (public).
  • All other new members are implemented explicitly (non-public):
    • Mutable members are implemented explicitly because these collections are read-only and the mutable members throw NotSupportedException (like ReadOnlyCollection<T>).
    • IList members are implemented explicitly to hide non-generic members from intellisense.
    • IList<T>.IndexOf and ICollection<T>.Contains are implemented explicitly because these methods aren't very useful for these collections and should not be visible in intellisense by default. They're not useful because an implementation using EqualityComparer<T>.Default (consistent with other collections) will search the collection using reference equality due to the fact that Capture, Group, and Match do not implement IEquatable<T> and do not override Equals() and GetHashCode(). Further, these types do not have public constructors -- they are created internally by the regex engine, making it very unlikely that you'd want to search for an item in a collection "A" that was obtained from collection "B".
    • IEnumerable<T>.GetEnumerator() must be implemented explicitly because the non-generic IEnumerable.GetEnumerator() is already implemented implicitly and we can't overload on return type. This also precludes returning a struct Enumerator (for better enumeration performance) because changing the return type of the existing method would be a binary breaking change. As a result, you'll still have to specify the type when using foreach (e.g. foreach (Capture capture in captures)); you won't be able to use var (e.g. foreach (var capture in captures)), unfortunately.

Open Questions

  • Should GroupCollection implement IDictionary<string, Group>, IReadOnlyDictionary<string, Group>, and IDictionary? GroupCollection already has a string indexer. Is it worth implementing the dictionary interfaces as part of this? Personally, I'm leaning toward "no" because there isn't a compelling scenario for the dictionary interfaces, and they can always be added in the future when needed.

Pull Request

A PR with the proposed changes is available: #1756

Updates

  • Edited this description to make it more of a speclet, based on the discussion below and the proposed API Review process.
  • Some improvements based on feedback from @sharwell.
  • Fixed existing members.
  • Added IList. These collections are indexable and it would be strange if IList<T> and IReadOnlyList<T> were implemented alongside ICollection but without IList.
  • Added DebuggerDisplay and DebuggerTypeProxy attributes.
  • Made ICollection<T>.CopyTo implicit (public).
@joshfree

This comment has been minimized.

Copy link
Member

joshfree commented Dec 16, 2014

Hey Justin!
We'd be happy to work with you on this API addition in January (things are slowing down right now with the upcoming holidays. 🎄 ).

/cc @terrajobst @KrzysztofCwalina

@sharwell

This comment has been minimized.

Copy link
Member

sharwell commented Dec 17, 2014

Love the idea. I'm not sure how strict the backwards compatibility policy is for this particular library, but I'm going to assume it's pretty high considering how widespread use of this library is. Based on that, I believe the path with the highest chance of success is adding the following interfaces:

Type New Interface(s)
CaptureCollection IReadOnlyList<Capture>
MatchCollection IReadOnlyList<Match>
GroupCollection IReadOnlyList<Group>, IReadOnlyDictionary<string, Group>

In particular, the above intentionally avoid adding ICollection<T>, specifically due to the fact that ICollection<T> does not extend ICollection and therefore runs a risk of causing problems with overload resolution for some users.

Also note that it won't be possible to use the struct-style enumerators like you see for List<T>, etc., because this would also be a breaking change.

@justinvp

This comment has been minimized.

Copy link
Collaborator Author

justinvp commented Dec 17, 2014

@sharwell what overloads from ICollection<T> are you concerned about? All members from ICollection<T> would be implemented explicitly.

public class CaptureCollection : ICollection, ICollection<Capture>, IReadOnlyCollection<Capture>
{
    // Existing...
    public Capture this[int i] { get; }
    public int Count { get; }
    public IEnumerator GetEnumerator();
    Object ICollection.SyncRoot { get; }
    bool ICollection.IsSynchronized { get; }
    void ICollection.CopyTo(Array array, int arrayIndex);

    // Proposed...
    bool ICollection<Capture>.IsReadOnly { get; }
    void ICollection<Capture>.Add(Capture item);
    void ICollection<Capture>.Clear();
    bool ICollection<Capture>.Contains(Capture item);
    void ICollection<Capture>.CopyTo(Capture[] array, int arrayIndex);
    bool ICollection<Capture>.Remove(Capture item);
    IEnumerator<Capture> IEnumerable<Capture>.GetEnumerator();
}
@sharwell

This comment has been minimized.

Copy link
Member

sharwell commented Dec 17, 2014

@justinvp It's not the methods of ICollection<T> I'm worried about. Consider the following two extension methods:

public static void Foo(this ICollection collection)
{
}

public static void Foo<T>(this ICollection<T> collection)
{
}

Currently, if you call captureCollection.Foo(), you'll call the first overload. If you update CaptureCollection to implement ICollection<T>, the code will no longer compile because there is no best match. While a similar scenario is still possible, implementing IReadOnlyList<T> instead of ICollection<T> has two benefits over the above:

  1. These are read-only collections, so the new interfaces are a more accurate representation of the available behavior.
  2. IReadOnlyCollection<T>, which is the interface capable of causing a problem like the above, is less likely to result in an overload conflict even if the only reason is the ~10 year age difference between it and ICollection<T>. In addition, code designed to work with IReadOnlyCollection<T> is often focused on the read-only aspect of these interfaces so is less likely to appear side by side with non-generic, read-write interfaces.
@justinvp

This comment has been minimized.

Copy link
Collaborator Author

justinvp commented Dec 17, 2014

@sharwell Fair enough :)

sharwell added a commit to sharwell/corefx that referenced this issue Dec 17, 2014

Update regular expressions collections to implement IReadOnlyList<T> (f…
…ixes dotnet#271)

* CaptureCollection implements IReadOnlyList<Capture>
* GroupCollection implements IReadOnlyList<Group>
* MatchCollection implements IReadOnlyList<Match>
@justinvp

This comment has been minimized.

Copy link
Collaborator Author

justinvp commented Dec 17, 2014

How common are such overloads in practice?

public static void Foo(this ICollection collection)
{
}

public static void Foo<T>(this ICollection<T> collection)
{
}

These would already be problematic as they are, because you would already run into the "ambiguous call" issue when used with the most commonly used collections (T[], List<T>, Collection<T>, ReadOnlyCollection<T>, etc.), forcing you to always disambiguate the calls.

I realize that for existing collections, such calls would already be disambiguated in order for the the code to compile, and that by implementing ICollection<T> on the regex collections, it would require someone to update their code to disambiguate the calls when upgrading to a new version of the API.

Would this be a binary breaking change or just a potential source breaking change?

By not implementing ICollection<T> (or IList<T>), we're sacrificing being able to easily make use of these collections in places that do operate on an ICollection<T> (or IList<T>) -- places that haven't been updated to use IReadOnlyList<T>.

@sharwell

This comment has been minimized.

Copy link
Member

sharwell commented Dec 17, 2014

That may be true, but you do get all the functionality that operates on IEnumerable<T>. I think that covers all the major cases I've run into personally.

Since I'm not on the team (just a random [hopeful] contributor), I wanted to start with what seemed to be a path of least resistance. Do you have a motivating example for why it should implement additional interfaces?

@justinvp

This comment has been minimized.

Copy link
Collaborator Author

justinvp commented Dec 17, 2014

I agree with you on IEnumerable<T>. I think that alone is a huge improvement!

However, there are places that make use of ICollection<T> and IList<T>, that haven't been updated (yet) to take advantage of IReadOnlyCollection<T> and IReadOnlyList<T>.

For example, LINQ's Enumerable.Last<TSource>() has a fast path for IList<T>:

        public static TSource Last<TSource>(this IEnumerable<TSource> source) {
            if (source == null) throw Error.ArgumentNull("source");
            IList<TSource> list = source as IList<TSource>;
            if (list != null) {
                int count = list.Count;
                if (count > 0) return list[count - 1];
            }
            else {
                using (IEnumerator<TSource> e = source.GetEnumerator()) {
                    if (e.MoveNext()) {
                        TSource result;
                        do {
                            result = e.Current;
                        } while (e.MoveNext());
                        return result;
                    }
                }
            }
            throw Error.NoElements();
        }

Enumerable.Last<TSource>() should probably be updated to check for IReadOnlyList<T>, but I'm sure there are other places that do similar things with ICollection<T> and IList<T> that haven't been updated for the newer read-only collection interfaces.

@davkean

This comment has been minimized.

Copy link
Member

davkean commented Dec 17, 2014

@sharwell This is a compile-time break. However, unless it is proven to be prevalent, is not we consider a breaking change from our stand point (theoretically, any addition could be a breaking change.)

If we do this, I'd like make sure there's no serialization impact - I think there is none[1], but I'd like someone to make explicitly sure there isn't; XML serializer doesn't require an opt in by the type and we have we've broken a few consumers in the past (and backed them out before we shipped).

[1] In this case, I think the XML serializer needs a public "Add" before it will attempt to serialize this, but please confirm.

@sharwell

This comment has been minimized.

Copy link
Member

sharwell commented Dec 17, 2014

Even if IList<T> or ICollection<T> were allowed to be added, I lean against them here mostly because each of these is a read-only collection. Implementing these interfaces would be reminiscent of ReadOnlyCollection<T>, where a large number of members needed to be added only to throw NotSupportedException in the end.


Asterisks make bad footnotes in Markdown¹.

¹ Alt+0185 (number pad only) makes for a good one²
² Alt+0178 is another³
³ So is Alt+0179

😄

@davkean

This comment has been minimized.

Copy link
Member

davkean commented Dec 17, 2014

If we had a time machine and added IReadOnlyList<T> and IReadOnlyCollection<T> back in v1 (and had the mutable versions implement them) I would agree with you. But unfortunately, we've not yet developed such technology (or someone developed it, and then used it to prevent it from being developed), but I digress.

From our API review standpoint however, it is perfectly fine to opt into ICollection<T>/IList<T> for the sole reason of interoperability. Even today, a consumer of these interfaces needs to make a choice; use stronger semantics and clear intentions I get from IReadOnlyCollection<T> but accept that not everyone can interop with me, or accept the weaker semantics by taking ICollection<T> but be able to interop with more things.

@sharwell

This comment has been minimized.

Copy link
Member

sharwell commented Dec 17, 2014

So your vote is for implementing both IList<T> and IReadOnlyList<T> here?

Either way #277 should be a good starting point. 😉

@terrajobst

This comment has been minimized.

Copy link
Member

terrajobst commented Dec 19, 2014

We treat IReadOnlyXxx as views of collections that could be mutable, read-only, or immutable. See this article for more details.

As far the serialization impact goes: I don't think implementing the read-only interfaces make a difference. AFAIK making a type implement IEnumerable or IEnumerable<T> would impact serailization but this change shouldn't.

EDIT

My point is that we should implement both, the read-only interface as well as mutable-interfaces. That's the same thing we did for immutable and it's important for interop with existing APIs.

@davkean

This comment has been minimized.

Copy link
Member

davkean commented Dec 19, 2014

@terrajobst Yep, but my comment was around implementing the mutable versions as well as the readonly versions.

@terrajobst

This comment has been minimized.

Copy link
Member

terrajobst commented Dec 19, 2014

Agreed. I've clarified my comment above.

@sharwell

This comment has been minimized.

Copy link
Member

sharwell commented Dec 19, 2014

@terrajobst Your edit made a huge difference.

@davkean and @terrajobst : Since I haven't worked with you directly, it can be tough to understand from the wording up until that edit when you are simply pointing out a historical observation, and when you are for/against taking a particular action. I took a conservative viewpoint because I've found it to be generally more acceptable for arbitrary 3rd party projects I contribute to. Since you were both for addressing the issue with a more comprehensive solution, it helps to be more clear about it.

@davkean

This comment has been minimized.

Copy link
Member

davkean commented Dec 19, 2014

@sharwell Sorry, these sorts of things will come up in the API review/speclet review, where it will be a little clearer; feedback is usually given in the form of "you must do this", "you must not do that" and "you should consider this". Very similar style to the Framework Design Guidelines themselves.

@sharwell

This comment has been minimized.

Copy link
Member

sharwell commented Dec 19, 2014

If it makes you feel any better, it wasn't long ago (a few months maybe?) that I was about the worst at this aspect. 😯

@justinvp

This comment has been minimized.

Copy link
Collaborator Author

justinvp commented Dec 19, 2014

I edited the description for this issue to make it more of a speclet, based on the discussion so far and the proposed API Review process.

@justinvp

This comment has been minimized.

Copy link
Collaborator Author

justinvp commented Dec 19, 2014

@sharwell In the original description I had said I would work on the implementation for this. I was planning to submit a PR after the issue had been reviewed (per the Contributing guide), but you jumped in with an initial PR. I guess there needs to be a better way of claiming "dibs" on a contribution? Not sure how we'll proceed with this after the issue is tagged "Accepting PRs".

@sharwell

This comment has been minimized.

Copy link
Member

sharwell commented Dec 19, 2014

@justinvp I missed that part. I closed my pull request so it's all yours. I'll reopen it only if you find you aren't going to have time to get it done and post a message back here saying so.

@sharwell

This comment has been minimized.

Copy link
Member

sharwell commented Dec 19, 2014

@justinvp IMO your spec reads well and is comprehensive. The only thing I can even think to change is to clarify that the new explicitly-implemented members are due to overload conflicts (GetEnumerator()) or because they represent methods that modify a collection, but these collections are read-only.

You might also state that methods which change the collection throw NotSupportedException in the same manner as ReadOnlyCollection<T> (though this might be considered obvious).

@justinvp

This comment has been minimized.

Copy link
Collaborator Author

justinvp commented Dec 19, 2014

@sharwell Thanks! I updated the description based on your feedback.

@karelz karelz modified the milestone: 1.0.0-rtm Dec 3, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment