Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure the selector gets run during Count. #14435

Merged
merged 3 commits into from Dec 27, 2016
Merged

Conversation

@jamesqo
Copy link
Contributor

@jamesqo jamesqo commented Dec 11, 2016

Select does not change the count of an enumerable, so previously we made an optimization where if Count() was called we would bypass running the selector altogether and iterate directly through the source. This commit undoes that and makes sure we always run the selector if onlyIfCheap is false.

Fixes #13910

cc @JonHanna, @stephentoub, @VSadov

if (!onlyIfCheap)
{
int end = _minIndexInclusive + count;
for (int i = _minIndexInclusive; i != end; ++i)

This comment has been minimized.

@jamesqo

jamesqo Dec 11, 2016
Author Contributor

An issue came up here that I wasn't sure best how to approach:

  • lazyEnumerable.Select(i => i).Skip(1).Count() runs the selector lazyEnumerable.Count() times, because Select.Skip on a lazy enumerable isn't specially recognized and the selector gets run on the first item.

  • list.Select(i => i).Skip(1).Count() is specially recognized, however, and it returns a SelectListPartitionIterator which does not run the selector on the first item.

One way to fix this would be to start from 0 instead of _minIndexInclusive here. However, if we do that, we break Skip(1).Select(i => i) which also ends up here; patterns like those should definitely not run the selector on the first item.

Ideally, we would somehow have a way to differentiate if Skip or Select was called first from within the iterator, and start from _minIndexInclusive or 0 accordingly. But then we might need to add an extra field...

cc @JonHanna

This comment has been minimized.

@JonHanna

JonHanna Dec 11, 2016
Collaborator

I'm inclined to think that we don't care.

A scenario that was called out as important is someone calling Count() on a Select result specifically to trigger side effects in selectors. (Not a sound practice IMO, but that's another matter). Such a use would be stymied by optimisations that skipped the selectors, and so we avoid such optimisation.

A user who skips something has indicated indifference to that thing. As such I'm inclined to think it doesn't matter whether we run n or n-1 selectors. Indeed, I'm happy running 0 in this case and just calculating what the result of Count() would be.

Others may not be as willing to go with quite so observable a difference to .Net4.6 Framework behaviour though. TBH if this was my PR I'd be taking the fastest route but prepared to back down if I failed to convince on that point.

Assert.Equal(source.Count(), timesRun);
}

// [Theory]

This comment has been minimized.

@jamesqo

jamesqo Dec 11, 2016
Author Contributor

Disabled currently because the first assert is giving inconsistent results. See comment above

@@ -226,6 +255,14 @@ public List<TResult> ToList()

public int GetCount(bool onlyIfCheap)
{
if (!onlyIfCheap)

This comment has been minimized.

@JonHanna

JonHanna Dec 11, 2016
Collaborator

The reason for this is obscure and would likely benefit from being commented on. Without context this looks like pointless busy work that should be deleted to improve efficiency.

return _source.Count;
int count = _source.Count;

if (!onlyIfCheap)

This comment has been minimized.

@JonHanna

JonHanna Dec 11, 2016
Collaborator

Likewise, your reason for doing this isn't obvious from the code alone, so should be commented on. And likely elsewhere, so I won't call out other cases.

@jamesqo jamesqo force-pushed the jamesqo:select-count branch from ca5307f to 5d629d1 Dec 11, 2016
{
_selector(item);
}
}

This comment has been minimized.

@JonHanna

JonHanna Dec 11, 2016
Collaborator

IIRC, if we just returned -1 in this case then the calling Count() method would do pretty much the above. I would imagine this would be slightly faster, but only slightly (that is just a guess though). Do we need the extra code here?

This comment has been minimized.

@jamesqo

jamesqo Dec 12, 2016
Author Contributor

@JonHanna During #12703 when I had optimized Where.Select and the issue of running these selectors had come up, I had originally had a EnumerableHelpers.Count function that Enumerable.Count would call after checking for Linq interfaces (just like what ToArray does today). This was the code I had written for the iterators

// Leave it to Count to iterate through us
public int GetCount(bool onlyIfCheap) => onlyIfCheap ? -1 : EnumerableHelpers.Count(this);

However, @stephentoub argued against this. See here for context: #12703 (comment) I ended up writing everything inline for GetCount in those iterators. So I just employed the same strategy here.


I would imagine this would be slightly faster, but only slightly

Virtual method calls are pretty expensive; going from 2 -> 3 virtual method calls (MoveNext & Current to MoveNext, Current & MoveNext) should probably be more than half of a 33% difference. I haven't measured either, but I'm not sure if it would be wise to regress perf here regardless.

This comment has been minimized.

@JonHanna

JonHanna Dec 12, 2016
Collaborator

Fair enough.

for (int i = 0; i < count; i++)
{
_selector(_source[i]);
}

This comment has been minimized.

@JonHanna

JonHanna Dec 11, 2016
Collaborator

As above, could we just return -1 here and let the caller do this?

@JonHanna
Copy link
Collaborator

@JonHanna JonHanna commented Dec 11, 2016

I'm passing the buck on the matter you raised, but aside from that, LG2M.

@JonHanna
Copy link
Collaborator

@JonHanna JonHanna commented Dec 11, 2016

Test Innerloop CentOS7.1 Release Build and Test

@karelz
Copy link
Member

@karelz karelz commented Dec 12, 2016

Nit: Please change title, this one won't be useful in git history ...

@jamesqo jamesqo changed the title Fix #13910 and add tests to ensure the selector gets run during Count. Ensure the selector gets run during Count. Dec 19, 2016
@VSadov
Copy link
Member

@VSadov VSadov commented Dec 27, 2016

@jamesqo @JonHanna - I think we need an issue to discuss whether Skip is allowed to skip selectors.

To me it is not obvious that Select(s).Skip(10) would be ok to run the same as Skip(10).Select(s)

@VSadov
Copy link
Member

@VSadov VSadov commented Dec 27, 2016

The change itself - LGTM

@VSadov VSadov merged commit b879dc0 into dotnet:master Dec 27, 2016
8 checks passed
8 checks passed
Innerloop CentOS7.1 Debug Build and Test Build finished.
Details
Innerloop CentOS7.1 Release Build and Test Build finished.
Details
Innerloop OSX Debug Build and Test Build finished.
Details
Innerloop OSX Release Build and Test Build finished.
Details
Innerloop Ubuntu14.04 Debug Build and Test Build finished.
Details
Innerloop Ubuntu14.04 Release Build and Test Build finished.
Details
Innerloop Windows_NT Debug Build and Test Build finished.
Details
Innerloop Windows_NT Release Build and Test Build finished.
Details
@VSadov
Copy link
Member

@VSadov VSadov commented Dec 27, 2016

Enetered https://github.com/dotnet/corefx/issues/14729 to follow up with the Skip, or more precisely the behavior of Partition.

@jamesqo jamesqo deleted the jamesqo:select-count branch Dec 28, 2016
@karelz karelz modified the milestone: 2.0.0 Jan 3, 2017
@crokusek
Copy link

@crokusek crokusek commented May 26, 2017

Please consider weighing on the "side issue" if you think it is a good question.

https://stackoverflow.com/questions/32000607/is-it-bad-practice-to-purposely-rely-on-linq-side-effects

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

7 participants
You can’t perform that action at this time.