Ensure the selector gets run during Count. #14435
if (!onlyIfCheap)
{
    int end = _minIndexInclusive + count;
    for (int i = _minIndexInclusive; i != end; ++i)
An issue came up here that I wasn't sure how best to approach:
- `lazyEnumerable.Select(i => i).Skip(1).Count()` runs the selector `lazyEnumerable.Count()` times, because `Select.Skip` on a lazy enumerable isn't specially recognized and the selector gets run on the first item.
- `list.Select(i => i).Skip(1).Count()` is specially recognized, however, and it returns a `SelectListPartitionIterator` which does not run the selector on the first item.

One way to fix this would be to start from `0` instead of `_minIndexInclusive` here. However, if we do that, we break `Skip(1).Select(i => i)`, which also ends up here; patterns like those should definitely not run the selector on the first item.

Ideally, we would somehow have a way to differentiate whether `Skip` or `Select` was called first from within the iterator, and start from `_minIndexInclusive` or `0` accordingly. But then we might need to add an extra field...
cc @JonHanna
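The two cases can be observed by counting selector invocations. This is a minimal sketch written as a C# top-level program; `Lazy`, `lazyRuns`, and `listRuns` are hypothetical names chosen for illustration. How many times each selector runs depends on the runtime version, so no expected run counts are shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical lazy source: an iterator method, so LINQ cannot treat it as an IList<T>.
static IEnumerable<int> Lazy()
{
    for (int i = 0; i < 5; i++) yield return i;
}

int lazyRuns = 0;
int lazyCount = Lazy().Select(i => { lazyRuns++; return i; }).Skip(1).Count();

int listRuns = 0;
var list = new List<int> { 0, 1, 2, 3, 4 };
int listCount = list.Select(i => { listRuns++; return i; }).Skip(1).Count();

// Both counts are 4 (five items, one skipped); the selector run counts may differ.
Console.WriteLine($"lazy: count={lazyCount}, selector runs={lazyRuns}");
Console.WriteLine($"list: count={listCount}, selector runs={listRuns}");
```

Comparing the two "selector runs" values on a given runtime shows whether the list path and the lazy path agree.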
I'm inclined to think that we don't care.
A scenario that was called out as important is someone calling Count() on a Select result specifically to trigger side effects in selectors. (Not a sound practice IMO, but that's another matter). Such a use would be stymied by optimisations that skipped the selectors, and so we avoid such optimisation.
A user who skips something has indicated indifference to that thing. As such I'm inclined to think it doesn't matter whether we run n or n-1 selectors. Indeed, I'm happy running 0 in this case and just calculating what the result of Count() would be.
Others may not be as willing to accept quite so observable a difference from .NET Framework 4.6 behaviour, though. TBH, if this were my PR I'd take the fastest route, but be prepared to back down if I failed to convince on that point.
    Assert.Equal(source.Count(), timesRun);
}

// [Theory]
Currently disabled because the first assert is giving inconsistent results. See the comment above.
@@ -226,6 +255,14 @@ public List<TResult> ToList()

public int GetCount(bool onlyIfCheap)
{
    if (!onlyIfCheap)
The reason for this is obscure and would likely benefit from being commented on. Without context this looks like pointless busy work that should be deleted to improve efficiency.
return _source.Count;
int count = _source.Count;

if (!onlyIfCheap)
Likewise, your reason for doing this isn't obvious from the code alone, so should be commented on. And likely elsewhere, so I won't call out other cases.
ca5307f to 5d629d1
    {
        _selector(item);
    }
}
IIRC, if we just returned `-1` in this case then the calling `Count()` method would do pretty much the above. I would imagine this would be slightly faster, but only slightly (that is just a guess though). Do we need the extra code here?
@JonHanna During #12703, when I optimized `Where.Select` and the issue of running these selectors came up, I originally had an `EnumerableHelpers.Count` function that `Enumerable.Count` would call after checking for Linq interfaces (just like what `ToArray` does today). This was the code I had written for the iterators:
// Leave it to Count to iterate through us
public int GetCount(bool onlyIfCheap) => onlyIfCheap ? -1 : EnumerableHelpers.Count(this);
However, @stephentoub argued against this. See here for context: #12703 (comment). I ended up writing everything inline for `GetCount` in those iterators, so I just employed the same strategy here.
> I would imagine this would be slightly faster, but only slightly

Virtual method calls are pretty expensive; going from 2 -> 3 virtual method calls (`MoveNext` & `Current` to `MoveNext`, `Current` & `MoveNext`) should probably be more than half of a 33% difference. I haven't measured either, but I'm not sure if it would be wise to regress perf here regardless.
Fair enough.
for (int i = 0; i < count; i++)
{
    _selector(_source[i]);
}
As above, could we just return `-1` here and let the caller do this?
I'm passing the buck on the matter you raised, but aside from that, LG2M.
Test Innerloop CentOS7.1 Release Build and Test
Nit: Please change the title; this one won't be useful in git history ...
The change itself - LGTM
Entered https://github.com/dotnet/corefx/issues/14729 to follow up on Skip, or more precisely the behavior of Partition.
Please consider weighing in on the "side issue" if you think it is a good question.
Is there any chance of these changes being rolled back in the future? I came across this PR while researching LINQ performance, and it seems a shame that these optimisations still cannot be enabled because of bad code that relies on Is there scope to add an additional LINQ API to force enumeration, to give codebases that rely on this behaviour an easy out while the general case can benefit? Alternatively, is there any scope for these optimisations to be enabled if LINQ is used with static anonymous functions, which are less likely to have side effects?
@crozone We do not monitor closed PRs/bugs, especially in the dotnet/corefx repo, which is now used only for servicing. I would recommend opening a question on the dotnet/runtime repo and linking back to this PR.
Ensure the selector gets run during Count. Commit migrated from dotnet/corefx@b879dc0
`Select` does not change the count of an enumerable, so previously we made an optimization where if `Count()` was called we would bypass running the selector altogether and iterate directly through the source. This commit undoes that and makes sure we always run the selector if `onlyIfCheap` is false.

Fixes #13910
cc @JonHanna, @stephentoub, @VSadov
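The behaviour described in this commit message can be sketched as follows. This is a simplified stand-in, assuming a list-backed source; the local function `GetCount` and the variable names are hypothetical and only mirror the shape of the iterator method in the diff, not its actual code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for an iterator's GetCount over an IList<T> source.
static int GetCount<TSource, TResult>(
    IList<TSource> source, Func<TSource, TResult> selector, bool onlyIfCheap)
{
    int count = source.Count;
    if (!onlyIfCheap)
    {
        // A full count was requested, so run the selector on every item
        // purely for its side effects; the results are discarded.
        for (int i = 0; i < count; i++) selector(source[i]);
    }
    return count;
}

int runs = 0;
var items = new List<string> { "a", "b", "c" };

int cheap = GetCount(items, s => { runs++; return s.Length; }, onlyIfCheap: true);
int runsAfterCheap = runs;
int full = GetCount(items, s => { runs++; return s.Length; }, onlyIfCheap: false);

Console.WriteLine($"cheap={cheap} (runs={runsAfterCheap}), full={full} (runs={runs})");
```

With `onlyIfCheap` set to true the selector is never invoked; with it set to false the selector runs once per item before the count is returned.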