Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Ensure the selector gets run during Count. #14435

Merged
merged 3 commits into from
Dec 27, 2016
Merged

Conversation

jamesqo
Copy link
Contributor

@jamesqo jamesqo commented Dec 11, 2016

Select does not change the count of an enumerable, so previously we made an optimization where if Count() was called we would bypass running the selector altogether and iterate directly through the source. This commit undoes that and makes sure we always run the selector if onlyIfCheap is false.

Fixes #13910

cc @JonHanna, @stephentoub, @VSadov

if (!onlyIfCheap)
{
int end = _minIndexInclusive + count;
for (int i = _minIndexInclusive; i != end; ++i)
Copy link
Contributor Author

@jamesqo jamesqo Dec 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An issue came up here that I wasn't sure best how to approach:

  • lazyEnumerable.Select(i => i).Skip(1).Count() runs the selector lazyEnumerable.Count() times, because Select.Skip on a lazy enumerable isn't specially recognized and the selector gets run on the first item.

  • list.Select(i => i).Skip(1).Count() is specially recognized, however, and it returns a SelectListPartitionIterator which does not run the selector on the first item.

One way to fix this would be to start from 0 instead of _minIndexInclusive here. However, if we do that, we break Skip(1).Select(i => i) which also ends up here; patterns like those should definitely not run the selector on the first item.

Ideally, we would somehow have a way to differentiate if Skip or Select was called first from within the iterator, and start from _minIndexInclusive or 0 accordingly. But then we might need to add an extra field...

cc @JonHanna

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inclined to think that we don't care.

A scenario that was called out as important is someone calling Count() on a Select result specifically to trigger side effects in selectors. (Not a sound practice IMO, but that's another matter). Such a use would be stymied by optimisations that skipped the selectors, and so we avoid such optimisation.

A user who skips something has indicated indifference to that thing. As such I'm inclined to think it doesn't matter whether we run n or n-1 selectors. Indeed, I'm happy running 0 in this case and just calculating what the result of Count() would be.

Others may not be as willing to go with quite so observable a difference to .Net4.6 Framework behaviour though. TBH if this was my PR I'd be taking the fastest route but prepared to back down if I failed to convince on that point.

Assert.Equal(source.Count(), timesRun);
}

// [Theory]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disabled currently because the first assert is giving inconsistent results. See comment above

@@ -226,6 +255,14 @@ public List<TResult> ToList()

public int GetCount(bool onlyIfCheap)
{
if (!onlyIfCheap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for this is obscure and would likely benefit from being commented on. Without context this looks like pointless busy work that should be deleted to improve efficiency.

return _source.Count;
int count = _source.Count;

if (!onlyIfCheap)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, your reason for doing this isn't obvious from the code alone, so should be commented on. And likely elsewhere, so I won't call out other cases.

{
_selector(item);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, if we just returned -1 in this case then the calling Count() method would do pretty much the above. I would imagine this would be slightly faster, but only slightly (that is just a guess though). Do we need the extra code here?

Copy link
Contributor Author

@jamesqo jamesqo Dec 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JonHanna During #12703 when I had optimized Where.Select and the issue of running these selectors had come up, I had originally had a EnumerableHelpers.Count function that Enumerable.Count would call after checking for Linq interfaces (just like what ToArray does today). This was the code I had written for the iterators

// Leave it to Count to iterate through us
public int GetCount(bool onlyIfCheap) => onlyIfCheap ? -1 : EnumerableHelpers.Count(this);

However, @stephentoub argued against this. See here for context: #12703 (comment) I ended up writing everything inline for GetCount in those iterators. So I just employed the same strategy here.


I would imagine this would be slightly faster, but only slightly

Virtual method calls are pretty expensive; going from 2 -> 3 virtual method calls (MoveNext & Current to MoveNext, Current & MoveNext) should probably be more than half of a 33% difference. I haven't measured either, but I'm not sure if it would be wise to regress perf here regardless.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

for (int i = 0; i < count; i++)
{
_selector(_source[i]);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, could we just return -1 here and let the caller do this?

@JonHanna
Copy link
Contributor

I'm passing the buck on the matter you raised, but aside from that, LG2M.

@JonHanna
Copy link
Contributor

Test Innerloop CentOS7.1 Release Build and Test

@karelz
Copy link
Member

karelz commented Dec 12, 2016

Nit: Please change title, this one won't be useful in git history ...

@jamesqo jamesqo changed the title Fix #13910 and add tests to ensure the selector gets run during Count. Ensure the selector gets run during Count. Dec 19, 2016
@VSadov
Copy link
Member

VSadov commented Dec 27, 2016

@jamesqo @JonHanna - I think we need an issue to discuss whether Skip is allowed to skip selectors.

To me it is not obvious that Select(s).Skip(10) would be ok to run the same as Skip(10).Select(s)

@VSadov
Copy link
Member

VSadov commented Dec 27, 2016

The change itself - LGTM

@VSadov VSadov merged commit b879dc0 into dotnet:master Dec 27, 2016
@VSadov
Copy link
Member

VSadov commented Dec 27, 2016

Enetered https://github.com/dotnet/corefx/issues/14729 to follow up with the Skip, or more precisely the behavior of Partition.

@jamesqo jamesqo deleted the select-count branch December 28, 2016 01:28
@karelz karelz modified the milestone: 2.0.0 Jan 3, 2017
@crokusek
Copy link

Please consider weighing on the "side issue" if you think it is a good question.

https://stackoverflow.com/questions/32000607/is-it-bad-practice-to-purposely-rely-on-linq-side-effects

@crozone
Copy link

crozone commented May 18, 2021

Is there any chance of these changes to be rolled back in the future? I came across this PR while researching LINQ performance and it seems a shame that these opimisations still cannot be enabled because of bad code that relies on .Select() side effects.

Is there scope to add an additional LINQ API to force enumeration to give codebases that rely on this behaviour an easy out, while the general case can benefit?

Alternatively, is there any scope for these optimisations to be enabled if LINQ is used with static anonymous functions that are less likely to have side effects?

@karelz
Copy link
Member

karelz commented May 18, 2021

@crozone We do not monitor closed PRs/bug, esp. in dotnet/corefx repo which is now used only for servicing. I would recommend to open a question on dotnet/runtime repo and link back to this PR.

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Ensure the selector gets run during Count.

Commit migrated from dotnet/corefx@b879dc0
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants