New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should ToArray methods be allowed to return Array.Empty<T>()? #2363

Closed
stephentoub opened this Issue Jul 15, 2015 · 13 comments

Comments

@stephentoub
Member

stephentoub commented Jul 15, 2015

We've been trying to use Array.Empty<T> in most places where we would otherwise construct a new T[0]. Some places we've explicitly tried to stay away from changing are ToArray methods, in particular because of an assumption that there's an expection that these methods always produce a new array, e.g. with MSDN documentation that states things like "Creates an array", "Copies the elements to a new array", etc.

The question is: should we relax that and instead use Array.Empty<T> in ToArray methods as well?

Several months back, before the corefx libraries were on GitHub, I'd done a sweep through them doing hundreds of empty array replacements. In general I stayed away from changing anything that looked like it was risky, e.g. ToArray methods, places where the arrays were obviously used for reference equality, etc. I apparently was a little over aggressive with regards to my own guidelines, and ended up changing some of the non-generic collection's ToArray methods to use Array.Empty, though.

Personally I'm tempted to say that ToArray methods should be able to use Array.Empty, and that we should update the generic collections and LINQ to do so, but that since the non-generic collections exist primarily for compatibility reasons, we should revert the changes to those ToArray methods.

Thoughts?

@JonHanna

This comment has been minimized.

Show comment
Hide comment
@JonHanna

JonHanna Jul 15, 2015

Collaborator

I was half-way to writing this exact issue last night when a crying baby took priority, and I say yes.

I don't think reference uniqueness is something that should be inferred from any API method's result unless:

  1. The object is mutable.
  2. The documentation explicitly says that it is unique (rather than just language that implies it like "creates").

In any case of an immutable reference type, I think using a pre-existing object is not just a reasonable optimisation to take, but a reasonable optimisation to expect that other code may or may not take. (And by extension therefore, it is unreasonable to expect the same policy from version to version and build a dependency on it).

I'd note that there have been some changes about just what returns string.Empty over past frameworks, and while they seem to have confused some people in a general "why is that?" level, not stymied any.

Collaborator

JonHanna commented Jul 15, 2015

I was half-way to writing this exact issue last night when a crying baby took priority, and I say yes.

I don't think reference uniqueness is something that should be inferred from any API method's result unless:

  1. The object is mutable.
  2. The documentation explicitly says that it is unique (rather than just language that implies it like "creates").

In any case of an immutable reference type, I think using a pre-existing object is not just a reasonable optimisation to take, but a reasonable optimisation to expect that other code may or may not take. (And by extension therefore, it is unreasonable to expect the same policy from version to version and build a dependency on it).

I'd note that there have been some changes about just what returns string.Empty over past frameworks, and while they seem to have confused some people in a general "why is that?" level, not stymied any.

@JohanLarsson

This comment has been minimized.

Show comment
Hide comment
@JohanLarsson

JohanLarsson Jul 15, 2015

I vote for performance and updated documentation, starve the GC.

JohanLarsson commented Jul 15, 2015

I vote for performance and updated documentation, starve the GC.

@JonHanna

This comment has been minimized.

Show comment
Hide comment
@JonHanna

JonHanna Jul 15, 2015

Collaborator

cc @James-Ko since they've done the work agreement here would allow for.

Collaborator

JonHanna commented Jul 15, 2015

cc @James-Ko since they've done the work agreement here would allow for.

@rickbrew

This comment has been minimized.

Show comment
Hide comment
@rickbrew

rickbrew Jul 15, 2015

I'm highly in favor of doing this. I've always needed to have a ToArrayEx extension method (lovely name, right?) that does this (it does a few other things too). You'd be surprised how many 0-length arrays are lying around if you don't do anything to consolidate them.

In particular it makes it much cheaper to have something like an immutable tree data structure where each node can have a list of attributes or children, but where many nodes have 0 of those. Instead of scattering null checks everywhere (or some other non-terse pattern), just have a constructor that takes your usual IEnumerable and call ToArray() to copy it. So it's good for performance as well as readability and maintainability. And on a mobile platforms you'll have fewer objects and smaller code size, which is really important.

By baking this into a core extension method, it will propagate and encourage good habits and performance. If someone really needs unique object identity then they can do their own new T[0] or write their own ToUniqueArray() extension method.

It may also be worth taking this pattern further and ensuring that methods such as AsReadOnly() employ a cached 0-length ReadOnlyCollection.

I'd also like to say that this seems like a very good place to test out a policy of "if no one screams then ship it." This is a trivial change to make, and also a trivial one to undo if it really becomes necessary.

rickbrew commented Jul 15, 2015

I'm highly in favor of doing this. I've always needed to have a ToArrayEx extension method (lovely name, right?) that does this (it does a few other things too). You'd be surprised how many 0-length arrays are lying around if you don't do anything to consolidate them.

In particular it makes it much cheaper to have something like an immutable tree data structure where each node can have a list of attributes or children, but where many nodes have 0 of those. Instead of scattering null checks everywhere (or some other non-terse pattern), just have a constructor that takes your usual IEnumerable and call ToArray() to copy it. So it's good for performance as well as readability and maintainability. And on a mobile platforms you'll have fewer objects and smaller code size, which is really important.

By baking this into a core extension method, it will propagate and encourage good habits and performance. If someone really needs unique object identity then they can do their own new T[0] or write their own ToUniqueArray() extension method.

It may also be worth taking this pattern further and ensuring that methods such as AsReadOnly() employ a cached 0-length ReadOnlyCollection.

I'd also like to say that this seems like a very good place to test out a policy of "if no one screams then ship it." This is a trivial change to make, and also a trivial one to undo if it really becomes necessary.

@justinvp

This comment has been minimized.

Show comment
Hide comment
@justinvp

justinvp Jul 15, 2015

Collaborator

👍

Collaborator

justinvp commented Jul 15, 2015

👍

@jamesqo

This comment has been minimized.

Show comment
Hide comment
@jamesqo

jamesqo Jul 15, 2015

Contributor

I do think that method names like ToArray seem to imply that you are creating a new array, but the average person probably wouldn't test for this in code. I rarely call methods like ToArray, ToList, etc., and when I do, I call them only once (no comparisons between two results of ToArray). Also, there doesn't seem to be a good reason to test for reference equality between arrays- they're just containers for other objects. I don't see a practical place where this could happen.


Edit: If this passes, I'm wondering if the same logic could be applied to other To... methods in Enumerable. Should we check for

if (source is T[]) return (T[])source;

in ToArray, and similarly in ToList, to avoid creating a new array if it already is one? To my knowledge lists don't have any side-effects when you enumerate through them, so I'm wondering if that works as well. Obviously this might be less sound than returning Array.Empty for empty arrays, but I still haven't seen two arrays being tested for reference equality.

Contributor

jamesqo commented Jul 15, 2015

I do think that method names like ToArray seem to imply that you are creating a new array, but the average person probably wouldn't test for this in code. I rarely call methods like ToArray, ToList, etc., and when I do, I call them only once (no comparisons between two results of ToArray). Also, there doesn't seem to be a good reason to test for reference equality between arrays- they're just containers for other objects. I don't see a practical place where this could happen.


Edit: If this passes, I'm wondering if the same logic could be applied to other To... methods in Enumerable. Should we check for

if (source is T[]) return (T[])source;

in ToArray, and similarly in ToList, to avoid creating a new array if it already is one? To my knowledge lists don't have any side-effects when you enumerate through them, so I'm wondering if that works as well. Obviously this might be less sound than returning Array.Empty for empty arrays, but I still haven't seen two arrays being tested for reference equality.

@bbowyersmyth

This comment has been minimized.

Show comment
Hide comment
@bbowyersmyth

bbowyersmyth Jul 15, 2015

Contributor

I'm in favour of ToArray returning a cached empty array. I would shy away from making any behaviour changes to the legacy collection types though. Anyone still using them likely has compatibility high on their agenda.

Contributor

bbowyersmyth commented Jul 15, 2015

I'm in favour of ToArray returning a cached empty array. I would shy away from making any behaviour changes to the legacy collection types though. Anyone still using them likely has compatibility high on their agenda.

@rickbrew

This comment has been minimized.

Show comment
Hide comment
@rickbrew

rickbrew Jul 15, 2015

@James-Ko, you can't just return the same array back to the caller if it has 1+ items in it because one of the most important use cases for ToArray() and ToList() is specifically making a copy so you can make changes without modifying the original. You can't change the length of an array, and there's no items you can modify within a zero-length array, therefore all zero-length arrays are interchangeable. Zero-count Lists aren't interchangeable because you can modify them, e.g. by calling Add().

What you're thinking of could be articulated as something like AsArrayOrToArray(): if the input is an array, then give it back to me, otherwise call ToArray and give me its result. It's certainly easy enough to implement but I'm not sure it needs to be in CoreFX because the return value creates two drastically different outcomes depending on what the input is, and it's not clear which one you've got just by inspecting the code locally ("did I get a copy or was this basically just a cast?"). It is a useful method (I have it in my code!) but you do have to be careful and intentional about using it.

rickbrew commented Jul 15, 2015

@James-Ko, you can't just return the same array back to the caller if it has 1+ items in it because one of the most important use cases for ToArray() and ToList() is specifically making a copy so you can make changes without modifying the original. You can't change the length of an array, and there's no items you can modify within a zero-length array, therefore all zero-length arrays are interchangeable. Zero-count Lists aren't interchangeable because you can modify them, e.g. by calling Add().

What you're thinking of could be articulated as something like AsArrayOrToArray(): if the input is an array, then give it back to me, otherwise call ToArray and give me its result. It's certainly easy enough to implement but I'm not sure it needs to be in CoreFX because the return value creates two drastically different outcomes depending on what the input is, and it's not clear which one you've got just by inspecting the code locally ("did I get a copy or was this basically just a cast?"). It is a useful method (I have it in my code!) but you do have to be careful and intentional about using it.

@JonHanna

This comment has been minimized.

Show comment
Hide comment
@JonHanna

JonHanna Jul 16, 2015

Collaborator

Should we check for

if (source is T[]) return (T[])source;

in ToArray, and similarly in ToList,

Definitely not. The optimisation of replacing a new T[0] with Array.Empty<T>() is one of the general class of aliasing different cases of equivalent immutable types together.

Non-empty arrays, lists, and most other collections are not immutable, so such a use would always be a bug. (Internal ToLists can benefit from such code but they put the onus on the calling code not to mutate which can't be enforced by a public API and reduces usefulness if it could).

More generally, it's worth noting that replacing new T[0] with Array.Empty<T>() is not really an optimisation at all; it's almost always a pure improvement because it has up-sides (faster, less memory use) but no down-side (assuming we do agree that expecting reference difference isn't reasonable).

There are some cases though where the use is more of a question of optimisation for one quality over another, or for performance in one case over another.

An optimisation would be one that introduced a _size == 0 case where one had not existed before; costing non-empty cases a slight penalty to gain empty-cases a benefit. (Conversely, you could remove the _size == 0 case from Queue<T>.ToArray() for the opposite trade-off). Such introductions are less no-brainers than where _size == 0 or similar already exists, with the benefit depending not just on the relative improvement, but the relative frequency of empty to non-empty cases.

I think that ToArray() and indeed all other API methods should be free to return Array.Empty<T>() instead of T[0]. I also feel they should be free to switch from Array.Empty<T>() to T[0] unless documented specifically as returning Array.Empty<T>(). Whether they should in a given case is another matter. Queue<T>.ToArray() follows the empty test with another branch and at least one method call, perhaps two, so it's probably worth keeping the size == 0 test there. With a ToArray() that after creating the array went straight into a for loop that terminated on i == _size the trade-offs are less clear-cut.

Collaborator

JonHanna commented Jul 16, 2015

Should we check for

if (source is T[]) return (T[])source;

in ToArray, and similarly in ToList,

Definitely not. The optimisation of replacing a new T[0] with Array.Empty<T>() is one of the general class of aliasing different cases of equivalent immutable types together.

Non-empty arrays, lists, and most other collections are not immutable, so such a use would always be a bug. (Internal ToLists can benefit from such code but they put the onus on the calling code not to mutate which can't be enforced by a public API and reduces usefulness if it could).

More generally, it's worth noting that replacing new T[0] with Array.Empty<T>() is not really an optimisation at all; it's almost always a pure improvement because it has up-sides (faster, less memory use) but no down-side (assuming we do agree that expecting reference difference isn't reasonable).

There are some cases though where the use is more of a question of optimisation for one quality over another, or for performance in one case over another.

An optimisation would be one that introduced a _size == 0 case where one had not existed before; costing non-empty cases a slight penalty to gain empty-cases a benefit. (Conversely, you could remove the _size == 0 case from Queue<T>.ToArray() for the opposite trade-off). Such introductions are less no-brainers than where _size == 0 or similar already exists, with the benefit depending not just on the relative improvement, but the relative frequency of empty to non-empty cases.

I think that ToArray() and indeed all other API methods should be free to return Array.Empty<T>() instead of T[0]. I also feel they should be free to switch from Array.Empty<T>() to T[0] unless documented specifically as returning Array.Empty<T>(). Whether they should in a given case is another matter. Queue<T>.ToArray() follows the empty test with another branch and at least one method call, perhaps two, so it's probably worth keeping the size == 0 test there. With a ToArray() that after creating the array went straight into a for loop that terminated on i == _size the trade-offs are less clear-cut.

@omariom

This comment has been minimized.

Show comment
Hide comment
@omariom

omariom Jul 16, 2015

Contributor

I never assume reference equality of results of any methods I don't control.
👍 to Array.Empty

Contributor

omariom commented Jul 16, 2015

I never assume reference equality of results of any methods I don't control.
👍 to Array.Empty

@stephentoub

This comment has been minimized.

Show comment
Hide comment
@stephentoub

stephentoub Jul 17, 2015

Member

@terrajobst, good topic for next design meeting?

Member

stephentoub commented Jul 17, 2015

@terrajobst, good topic for next design meeting?

@MrJul

This comment has been minimized.

Show comment
Hide comment
@MrJul

MrJul Aug 30, 2015

Considering that even some string constructors return cached objects today, I think that sets a good precedence for allowing BCL methods to return cached immutable objects in general.

ReferenceEquals(new string(new char[0]), String.Empty) is true.

MrJul commented Aug 30, 2015

Considering that even some string constructors return cached objects today, I think that sets a good precedence for allowing BCL methods to return cached immutable objects in general.

ReferenceEquals(new string(new char[0]), String.Empty) is true.

@terrajobst

This comment has been minimized.

Show comment
Hide comment
@terrajobst

terrajobst Jan 21, 2016

Member

We discussed this offline and concluded that we should do the following:

  • We make this change to .NET Core without any quirking. Customers using a newer package will get the new behavior.
  • We bring this change to .NET Framework, but we’ll quirk it. Only customers targeting the latest version will get the new behavior.
Member

terrajobst commented Jan 21, 2016

We discussed this offline and concluded that we should do the following:

  • We make this change to .NET Core without any quirking. Customers using a newer package will get the new behavior.
  • We bring this change to .NET Framework, but we’ll quirk it. Only customers targeting the latest version will get the new behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment