New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure the selector gets run during Count. #14435

Merged
merged 3 commits into from Dec 27, 2016

Conversation

@jamesqo
Contributor

jamesqo commented Dec 11, 2016

Select does not change the count of an enumerable, so previously we made an optimization where if Count() was called we would bypass running the selector altogether and iterate directly through the source. This commit undoes that and makes sure we always run the selector if onlyIfCheap is false.

Fixes #13910

cc @JonHanna, @stephentoub, @VSadov

if (!onlyIfCheap)
{
int end = _minIndexInclusive + count;
for (int i = _minIndexInclusive; i != end; ++i)

This comment has been minimized.

@jamesqo

jamesqo Dec 11, 2016

Contributor

An issue came up here that I wasn't sure best how to approach:

  • lazyEnumerable.Select(i => i).Skip(1).Count() runs the selector lazyEnumerable.Count() times, because Select.Skip on a lazy enumerable isn't specially recognized and the selector gets run on the first item.

  • list.Select(i => i).Skip(1).Count() is specially recognized, however, and it returns a SelectListPartitionIterator which does not run the selector on the first item.

One way to fix this would be to start from 0 instead of _minIndexInclusive here. However, if we do that, we break Skip(1).Select(i => i) which also ends up here; patterns like those should definitely not run the selector on the first item.

Ideally, we would somehow have a way to differentiate if Skip or Select was called first from within the iterator, and start from _minIndexInclusive or 0 accordingly. But then we might need to add an extra field...

cc @JonHanna

@jamesqo

jamesqo Dec 11, 2016

Contributor

An issue came up here that I wasn't sure best how to approach:

  • lazyEnumerable.Select(i => i).Skip(1).Count() runs the selector lazyEnumerable.Count() times, because Select.Skip on a lazy enumerable isn't specially recognized and the selector gets run on the first item.

  • list.Select(i => i).Skip(1).Count() is specially recognized, however, and it returns a SelectListPartitionIterator which does not run the selector on the first item.

One way to fix this would be to start from 0 instead of _minIndexInclusive here. However, if we do that, we break Skip(1).Select(i => i) which also ends up here; patterns like those should definitely not run the selector on the first item.

Ideally, we would somehow have a way to differentiate if Skip or Select was called first from within the iterator, and start from _minIndexInclusive or 0 accordingly. But then we might need to add an extra field...

cc @JonHanna

This comment has been minimized.

@JonHanna

JonHanna Dec 11, 2016

Collaborator

I'm inclined to think that we don't care.

A scenario that was called out as important is someone calling Count() on a Select result specifically to trigger side effects in selectors. (Not a sound practice IMO, but that's another matter). Such a use would be stymied by optimisations that skipped the selectors, and so we avoid such optimisation.

A user who skips something has indicated indifference to that thing. As such I'm inclined to think it doesn't matter whether we run n or n-1 selectors. Indeed, I'm happy running 0 in this case and just calculating what the result of Count() would be.

Others may not be as willing to go with quite so observable a difference to .Net4.6 Framework behaviour though. TBH if this was my PR I'd be taking the fastest route but prepared to back down if I failed to convince on that point.

@JonHanna

JonHanna Dec 11, 2016

Collaborator

I'm inclined to think that we don't care.

A scenario that was called out as important is someone calling Count() on a Select result specifically to trigger side effects in selectors. (Not a sound practice IMO, but that's another matter). Such a use would be stymied by optimisations that skipped the selectors, and so we avoid such optimisation.

A user who skips something has indicated indifference to that thing. As such I'm inclined to think it doesn't matter whether we run n or n-1 selectors. Indeed, I'm happy running 0 in this case and just calculating what the result of Count() would be.

Others may not be as willing to go with quite so observable a difference to .Net4.6 Framework behaviour though. TBH if this was my PR I'd be taking the fastest route but prepared to back down if I failed to convince on that point.

Assert.Equal(source.Count(), timesRun);
}
// [Theory]

This comment has been minimized.

@jamesqo

jamesqo Dec 11, 2016

Contributor

Disabled currently because the first assert is giving inconsistent results. See comment above

@jamesqo

jamesqo Dec 11, 2016

Contributor

Disabled currently because the first assert is giving inconsistent results. See comment above

{
_selector(item);
}
}

This comment has been minimized.

@JonHanna

JonHanna Dec 11, 2016

Collaborator

IIRC, if we just returned -1 in this case then the calling Count() method would do pretty much the above. I would imagine this would be slightly faster, but only slightly (that is just a guess though). Do we need the extra code here?

@JonHanna

JonHanna Dec 11, 2016

Collaborator

IIRC, if we just returned -1 in this case then the calling Count() method would do pretty much the above. I would imagine this would be slightly faster, but only slightly (that is just a guess though). Do we need the extra code here?

This comment has been minimized.

@jamesqo

jamesqo Dec 12, 2016

Contributor

@JonHanna During #12703 when I had optimized Where.Select and the issue of running these selectors had come up, I had originally had a EnumerableHelpers.Count function that Enumerable.Count would call after checking for Linq interfaces (just like what ToArray does today). This was the code I had written for the iterators

// Leave it to Count to iterate through us
public int GetCount(bool onlyIfCheap) => onlyIfCheap ? -1 : EnumerableHelpers.Count(this);

However, @stephentoub argued against this. See here for context: #12703 (comment) I ended up writing everything inline for GetCount in those iterators. So I just employed the same strategy here.


I would imagine this would be slightly faster, but only slightly

Virtual method calls are pretty expensive; going from 2 -> 3 virtual method calls (MoveNext & Current to MoveNext, Current & MoveNext) should probably be more than half of a 33% difference. I haven't measured either, but I'm not sure if it would be wise to regress perf here regardless.

@jamesqo

jamesqo Dec 12, 2016

Contributor

@JonHanna During #12703 when I had optimized Where.Select and the issue of running these selectors had come up, I had originally had a EnumerableHelpers.Count function that Enumerable.Count would call after checking for Linq interfaces (just like what ToArray does today). This was the code I had written for the iterators

// Leave it to Count to iterate through us
public int GetCount(bool onlyIfCheap) => onlyIfCheap ? -1 : EnumerableHelpers.Count(this);

However, @stephentoub argued against this. See here for context: #12703 (comment) I ended up writing everything inline for GetCount in those iterators. So I just employed the same strategy here.


I would imagine this would be slightly faster, but only slightly

Virtual method calls are pretty expensive; going from 2 -> 3 virtual method calls (MoveNext & Current to MoveNext, Current & MoveNext) should probably be more than half of a 33% difference. I haven't measured either, but I'm not sure if it would be wise to regress perf here regardless.

This comment has been minimized.

@JonHanna

JonHanna Dec 12, 2016

Collaborator

Fair enough.

@JonHanna

JonHanna Dec 12, 2016

Collaborator

Fair enough.

for (int i = 0; i < count; i++)
{
_selector(_source[i]);
}

This comment has been minimized.

@JonHanna

JonHanna Dec 11, 2016

Collaborator

As above, could we just return -1 here and let the caller do this?

@JonHanna

JonHanna Dec 11, 2016

Collaborator

As above, could we just return -1 here and let the caller do this?

@JonHanna

This comment has been minimized.

Show comment
Hide comment
@JonHanna

JonHanna Dec 11, 2016

Collaborator

I'm passing the buck on the matter you raised, but aside from that, LG2M.

Collaborator

JonHanna commented Dec 11, 2016

I'm passing the buck on the matter you raised, but aside from that, LG2M.

@JonHanna

This comment has been minimized.

Show comment
Hide comment
@JonHanna

JonHanna Dec 11, 2016

Collaborator

Test Innerloop CentOS7.1 Release Build and Test

Collaborator

JonHanna commented Dec 11, 2016

Test Innerloop CentOS7.1 Release Build and Test

@karelz

This comment has been minimized.

Show comment
Hide comment
@karelz

karelz Dec 12, 2016

Member

Nit: Please change title, this one won't be useful in git history ...

Member

karelz commented Dec 12, 2016

Nit: Please change title, this one won't be useful in git history ...

@jamesqo jamesqo changed the title from Fix #13910 and add tests to ensure the selector gets run during Count. to Ensure the selector gets run during Count. Dec 19, 2016

@VSadov

This comment has been minimized.

Show comment
Hide comment
@VSadov

VSadov Dec 27, 2016

Member

@jamesqo @JonHanna - I think we need an issue to discuss whether Skip is allowed to skip selectors.

To me it is not obvious that Select(s).Skip(10) would be ok to run the same as Skip(10).Select(s)

Member

VSadov commented Dec 27, 2016

@jamesqo @JonHanna - I think we need an issue to discuss whether Skip is allowed to skip selectors.

To me it is not obvious that Select(s).Skip(10) would be ok to run the same as Skip(10).Select(s)

@VSadov

This comment has been minimized.

Show comment
Hide comment
@VSadov

VSadov Dec 27, 2016

Member

The change itself - LGTM

Member

VSadov commented Dec 27, 2016

The change itself - LGTM

@VSadov VSadov merged commit b879dc0 into dotnet:master Dec 27, 2016

8 checks passed

Innerloop CentOS7.1 Debug Build and Test Build finished.
Details
Innerloop CentOS7.1 Release Build and Test Build finished.
Details
Innerloop OSX Debug Build and Test Build finished.
Details
Innerloop OSX Release Build and Test Build finished.
Details
Innerloop Ubuntu14.04 Debug Build and Test Build finished.
Details
Innerloop Ubuntu14.04 Release Build and Test Build finished.
Details
Innerloop Windows_NT Debug Build and Test Build finished.
Details
Innerloop Windows_NT Release Build and Test Build finished.
Details
@VSadov

This comment has been minimized.

Show comment
Hide comment
@VSadov

VSadov Dec 27, 2016

Member

Enetered #14729 to follow up with the Skip, or more precisely the behavior of Partition.

Member

VSadov commented Dec 27, 2016

Enetered #14729 to follow up with the Skip, or more precisely the behavior of Partition.

@jamesqo jamesqo deleted the jamesqo:select-count branch Dec 28, 2016

@karelz karelz modified the milestone: 2.0.0 Jan 3, 2017

@crokusek

This comment has been minimized.

Show comment
Hide comment
@crokusek

crokusek May 26, 2017

Please consider weighing on the "side issue" if you think it is a good question.

https://stackoverflow.com/questions/32000607/is-it-bad-practice-to-purposely-rely-on-linq-side-effects

crokusek commented May 26, 2017

Please consider weighing on the "side issue" if you think it is a good question.

https://stackoverflow.com/questions/32000607/is-it-bad-practice-to-purposely-rely-on-linq-side-effects

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment