Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Iterators.partition split arrays into views for faster and easier parallelism #33533

Merged
merged 5 commits into from
Nov 1, 2019

Conversation

mbauman
Copy link
Member

@mbauman mbauman commented Oct 11, 2019

I've frequently wanted to use Iterators.partition to subdivide an iteration space into chunks amenable to parallelism, but its implementation — which defaulted to copying all elements of each partition into a new Vector — left some efficiencies to be desired. It did have one special case — wherein a partition of a Vector itself would helpfully use views. This PR makes that the default for all AbstractArrays. It goes further and adds very helpful performance specializations for some ranges and CartesianIndices essential for doing loops over partition(eachindex(...)).

This PR is divided into three commits:

  • The minor change, which makes all AbstractArrays return views
  • The optimization, which speeds up partitions of common ranges and CartesianIndices
  • The use-case, which makes BitArray broadcasts simpler, simdier, and faster. This was the use-case that drove my optimizations, and I put together a series of benchmarks. The short version is that by using partition we can easily split the algorithm into a @simdable section, leading to up to 10x speedups for code like a .== 0:

diff

@mbauman mbauman added minor change Marginal behavior change acceptable for a minor release needs news A NEWS entry is required for this change triage This should be discussed on a triage call broadcast Applying a function over a collection performance Must go faster and removed needs news A NEWS entry is required for this change labels Oct 11, 2019
@JeffBezanson
Copy link
Member

Cool!!

@StefanKarpinski
Copy link
Member

Argument for why this change is an acceptable minor change even though it is technically breaking:

  • since the partitions were previously copies, mutating them was almost always pointless
  • the main case that could break is if someone was hanging onto a partition and then mutating the main array or keeping the partition as an independent array from the original and mutating that

Both of these potential usages seem pretty unlikely. The main usage of this seems like it would be to just look at different partitions of the original array without doing any mutation of the original or the partition. With this change, a new potential use case becomes possible: modifying the original array by operating on partitions. Going back from views to copies would be more breaking since it would make code that used that use case stop working.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Oct 11, 2019

Another nice coincidence: @vtjnash is doing compiler work that will make taking views usually non-allocating and therefore much more efficient in many more cases, so this could get even more efficient by the time 1.4 is actually ready.

@ararslan
Copy link
Member

ararslan commented Oct 11, 2019

FWIW I've been using partition a lot lately and I didn't even realize it wasn't using views (nor did I do any mutation), so I'd second Stefan's comment:

Both of these potential usages seem pretty unlikely

This is great!

@StefanKarpinski
Copy link
Member

Hasn't been officially triaged, but if anyone has any objections, please post them here. I think this seems popular enough based on the reactions to be merged without a triage debate.

Previously only `::Vector` was special-cased to use views. The trade-off here is that we lose the ability to predict the concrete eltype -- since arrays can potentially choose to return something different from `vec` or `view`. Generic iterables still collect their elements into a freshly-allocated `Vector`, like before.
And also recompute ranges instead of using views for partitions of ranges.

Since `Iterators.partition` is so handy for dividing up iteration spaces, it makes sense to optimize this as much as possible. While it is enherently a "linear" operation, it is a batched linear operation that allows us to skip doing all the effective ind2sub work on every single iteration.
base/multidimensional.jl Outdated Show resolved Hide resolved
function iterate(itr::PartitionIterator{<:AbstractRange}, state=1)
state > length(itr.c) && return nothing
r = min(state + itr.n - 1, length(itr.c))
return @inbounds itr.c[state:r], r + 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OT: Are we missing an abstraction here: should we define that view(::AbstractRange, slice::AbstractRange) isa AbstractRange, or would that confuse consumes or view?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's #26872 — looks like it got stalled because it was proposed before we really had a handle on minor changes.

mbauman and others added 2 commits October 31, 2019 15:05
shorter lines, more spaces, and comments (because I had already forgotten how this worked myself)
@mbauman
Copy link
Member Author

mbauman commented Oct 31, 2019

Triage was in favor and I addressed the style review... and while I was at it I added a few more comments because I didn't understand how this worked anymore.

@mbauman mbauman removed the triage This should be discussed on a triage call label Oct 31, 2019
@mbauman mbauman merged commit 3f0e7d6 into master Nov 1, 2019
@mbauman mbauman deleted the mb/fast-partitions branch November 1, 2019 22:30
@Sacha0
Copy link
Member

Sacha0 commented Nov 5, 2019

Very nice! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
broadcast Applying a function over a collection minor change Marginal behavior change acceptable for a minor release performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants