Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Avoid trimming groupings in GroupBy with result selectors #15367

Closed
wants to merge 1 commit into from

Conversation

jamesqo
Copy link
Contributor

@jamesqo jamesqo commented Jan 21, 2017

AFAICS we can pass the grouping as the enumerable rather than the array, which eliminates the need for an expensive resize. It also prevents users from getting at the underlying array of the grouping by casting the enumerable to a TElement[].

Related: https://github.com/dotnet/corefx/issues/8848

@jamesqo
Copy link
Contributor Author

jamesqo commented Jan 21, 2017

Test Innerloop Ubuntu14.04 Release Build and Test

/cc @JonHanna @VSadov

@JonHanna
Copy link
Contributor

At https://github.com/dotnet/corefx/issues/8848#issuecomment-274222936

would prevent someone from casting to TElement[] within the selector

That's a reason for keeping it as-is. It's been possible to do this since 3.5, so there could be dependencies on this.

and violating Grouping's immutability-after-creation

That's a big reason for wishing it was done like this (or at the very least, that something hid the array from resultSelector) but stopping someone from shooting themselves in the foot is still a breaking change if they could have some reason for shooting near their feet.

It's a good change, with a result that is both better in what it exposes and I would imagine faster in most cases (though perhaps slower on repeated iterations, or inputs that just happen to hit a size threshold perfectly). The question is whether that break is too strong a break.

@JonHanna
Copy link
Contributor

If we exposed the _elements to the selector as an IList<T> then anything that cast to that or a base interface would still work.

More reasonably, any operations (such as those in Enumerable itself) that accepted IEnumerable<T> and then had optimised paths for IList<T> would still have that optimised path. It's not uncommon to use .Count() in result selectors, and this change would make selecting that go from O(1) per grouping to O(n). A private implementation of IList<T> wrapping _elements would prevent that.

@jamesqo
Copy link
Contributor Author

jamesqo commented Jan 21, 2017

It's not uncommon to use .Count() in result selectors, and this change would make selecting that go from O(1) per grouping to O(n).

@JonHanna Grouping implements IList<T>, though. https://github.com/dotnet/corefx/blob/master/src/System.Linq/src/System/Linq/Grouping.cs#L64

That's a reason for keeping it as-is. It's been possible to do this since 3.5, so there could be dependencies on this.

Maybe, but someone who is familiar with functions like GroupBy/fluent in Linq is probably more experienced in .NET. I doubt (s)he'd make the mistake of trying to cast the enumerable to an array rather than calling ToArray.

@JonHanna
Copy link
Contributor

Ah. True that it does implement that. Hurrah.

I doubt (s)he'd make the mistake of trying to cast the enumerable to an array rather than calling ToArray.

If they did it, they'd likely done it as an optimisation. Not a safe optimisation, granted.

@jamesqo
Copy link
Contributor Author

jamesqo commented Jan 22, 2017

Self-note: https://github.com/dotnet/corefx/pull/15365/files#diff-87d47ae56131fbbcc56111336ea8e9e3R247 touches on this when using ToHashSet in this pattern, so depending on if/when this is merged that method may have to be changed, either in this PR or over there.

@@ -247,8 +245,7 @@ public IEnumerable<TResult> ApplyResultSelector<TResult>(Func<TKey, IEnumerable<
do
{
g = g._next;
g.Trim();
yield return resultSelector(g._key, g._elements);
yield return resultSelector(g._key, g);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep Trim here. There could be reasons why it was done. Most likely performance of iterating.
Also, while it is unfortunate to expose the array, there is a risk now that someone took dependency on that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another reason might be to separate keys from elements before passing to the selector - that is just to ensure if eventually only elements are retained, they do not hold the keys as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This applies to other cases where Trim is used.
I think we should keep trimming

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, while it is unfortunate to expose the array, there is a risk now that someone took dependency on that.

Is there really? I mean Linq is generally not that high-perf anyways so it's hard for me to imagine someone (purposefully or accidentally) casting the parameter to an array to get better perf.

Another reason might be to separate keys from elements before passing to the selector - that is just to ensure if eventually only elements are retained, they do not hold the keys as well.

Good point. I think we can fix this and still avoid trimming by passing in a new ArraySegment<T>(g._elements, 0, g._count), though.

@stephentoub
Copy link
Member

@VSadov, do you want to continue exploring this or should it be closed?

@VSadov
Copy link
Member

VSadov commented Feb 20, 2017

I am not sure this is worth pursuing. The problem to solve here is not that big.
The solutions proposed have concerns one way or another and there is a compat consideration.

It seems it is better to just leave this as it is

@jamesqo jamesqo deleted the grouping.notrim branch February 22, 2017 04:07
@karelz karelz modified the milestone: 2.0.0 Feb 22, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
7 participants