Make eachindex more efficient for linear arrays #10704

mbauman · 2015-03-31T20:41:38Z

This greatly improves the ability to use eachindex in generic code. For arrays with fast linear storage, eachindex now simply returns a UnitRange to linearly index the array. For arrays with slow linear access, this still returns a CartesianRange. In typical code constructs, the resulting elements from the iterators can be used identically. (If not, I would consider it a missing functionality in CartesianIndex.)

As a result, we can simplify AbstractArray cartesian iteration by moving all the complexity into the eachindex iterator.

Cc @timholy, ref the discussion in #4774 (comment)

kmsquire · 2015-03-31T20:51:38Z

base/dict.jl

@@ -713,7 +713,7 @@ function skip_deleted(h::Dict, i)
 end

 start(t::Dict) = skip_deleted(t, 1)
-done(t::Dict, i) = done(t.vals, i)
+done(t::Dict, i) = i > length(t.vals)


I'm curious why this change is necessary...

This changes the iterator state to be a tuple of (IndexIterator, IteratorState) instead of just the next index. Dict was (incorrectly) passing a state to the Array iteration protocol that did not come from the Array's start or next.

Got it, thanks.

JeffBezanson · 2015-03-31T21:05:53Z

+1
This is definitely the way forward!

I worry about the extra complexity in AbstractArray iteration. Maybe we should still use the old, super-simple definitions for Array, in the interest of code size if not performance.

timholy · 2015-03-31T21:42:17Z

I'm completely in favor.

Have you examined performance and/or looked at the generated code to make sure there is no performance loss? Given Jeff's concern, it would be interesting to know whether all the apparent complexity ends up getting inlined out for Arrays. (That said, for people digging into base with @which trying to figure out how it all works, there's no doubt the more general code is harder to understand...so Jeff's point has validity even aside from the generated code.)

JeffBezanson · 2015-03-31T21:56:54Z

I imagine everything can be inlined, and this can be really fast. But iterating over Arrays is so common that this could create significant extra work for type inference and the inliner, even if there is no effect on the final machine code.

timholy · 2015-03-31T22:06:22Z

True indeed. I'm in favor of the idea of reintroducing specialized versions for Arrays.

mbauman · 2015-04-01T02:12:24Z

That makes sense. For linear fast arrays, this inlines perfectly and has no noticeable performance impact, but I agree that we should keep the specialized methods for Array to reduce the code size. I had only been looking at performance for fast linear arrays…

The bad news is that this is not inlined for Linear Slow arrays with a huge performance impact. I suppose that makes sense, though… the previous next methods that this replaces were explicitly inlined. But even with that change I'm seeing a 10x performance regression. I'll have to look closer at this.

StefanKarpinski · 2015-04-01T14:24:19Z

What are "Linear Slow arrays"?

mbauman · 2015-04-01T14:39:52Z

I updated the documentation for eachindex here to try to make this a bit more clear:

Creates an iterable object for visiting each index of an AbstractArray A in an efficient manner. For array types that have opted into fast linear indexing (like Array), this is simply the range 1:length(A). For other array types, this returns a specialized Cartesian range to efficiently index into the array with indices specified for every dimension.

I think that Base.linearindexing should also be documented (albeit still not exported since it is typically only extended but not called). Here's a stab at what it might say:

linearindexing defines how an AbstractArray most efficiently accesses its elements. If Base.linearindexing(A) returns Base.LinearFast(), this means that linear indexing with only one index is an efficient operation. If it instead returns Base.LinearSlow() (by default), this means that the array intrinsically accesses its elements with indices specified for every dimension. Since converting a linear index to multiple indexing subscripts is typically very expensive, this allows generic code to be written in an efficient manner for all array types with trait-based dispatch.

mbauman · 2015-04-01T14:46:25Z

Sprinkling in a few more inline annotations has helped some, but I'm still allocating 128 bytes per loop for iterating over a sparse matrix.

timholy · 2015-04-01T14:48:14Z

It's almost perfect 😄. I would consider rewording the last sentence to something like

...this provides a traits-based mechanism allowing one to write generic code in an efficient manner for all array types.

mbauman · 2015-04-01T18:01:18Z

OK! Getting closer to parity with master here. No more allocations (I think the core issue was having nested tuples). This is now actually a bit faster for loops like for i in eachindex(A). Since this depends upon the iteration protocol of eachindex, that's a hard limit as to how fast we can iterate for a in A… but somehow master is able to iterate 5x faster for small arrays. It is currently cheating, but the perf tests I've been running wrap everything in @inbounds anyways. Edit: performance comparison - sumelt is for a in A, sumeach is for a in eachindex(A), sumfast is what this PR does. Is is Int Small, Fb is Float big.

Now to chase down those test failures...

timholy · 2015-04-01T18:15:59Z

Nice!

There were already a few oddities about our performance, and overall even on master we're a bit slow on small arrays (see the benchmarks in #10507, focusing on Array, ArrayLS, and ArrayLF).

mbauman · 2015-04-01T18:29:31Z

Ah, great suggestion. I ran the full suite (and updated the gist). I'm on-par or better than everything except that wonky SubArray case. In some cases performance here is 10x better. So I'm considering performance fixed.

timholy · 2015-04-01T18:31:18Z

That's awesome! For any other performance oddities, also bear #9080 in mind. It seems to affect all "complex" iterators. (Unless you've solved it??)

mbauman · 2015-04-01T18:32:28Z

julia> @time sumcart_manual(A)
elapsed time: 0.14881743 seconds (184 bytes allocated)
5.000083609636129e7

julia> @time sumcart_iter(A)
elapsed time: 0.145975981 seconds (184 bytes allocated)
5.000083609636129e7

:)

timholy · 2015-04-01T18:37:25Z

Yes, but I'm guessing you're cheating :-). With this PR, you have to switch sumcart_iter to for I in eachindex(size(A)).

mbauman · 2015-04-01T18:39:28Z

Aw, shucks, I didn't notice that. Right you are, this doesn't solve it.

timholy · 2015-04-01T18:44:40Z

You had me hoping there...anyway, just wanted to make sure you knew about that issue---might help interpret any performance oddities.

This greatly improves the ability to use `eachindex` in generic code. For arrays with fast linear storage, `eachindex` now simply returns a `UnitRange` to linearly index the array. For arrays with slow linear access, this still returns a `CartesianRange`. In typical code constructs, the resulting elements from the iterators can be used identically. As a result, we can simplify AbstractArray cartesian iteration by moving all the complexity into the `eachindex` iterator.

mbauman · 2015-04-02T03:49:31Z

Alright! All tests pass locally (including the full subarray suite). I've squashed the commits so this should be good to go (pending CI and a final thumbs up).

Jutho · 2015-04-02T09:14:34Z

base/multidimensional.jl

+# 0-d cartesian ranges are special-cased to iterate once and only once
+start{I<:CartesianIndex{0}}(iter::CartesianRange{I}) = false
+next{I<:CartesianIndex{0}}(iter::CartesianRange{I}, state) = iter.start, true
+done{I<:CartesianIndex{0}}(iter::CartesianRange{I}, state) = state



That's indeed much nicer than the old trick with finished.

Jutho · 2015-04-02T09:16:37Z

Looks great to me!

timholy · 2015-04-02T10:48:21Z

base/abstractarray.jl

-## iteration support for arrays as ranges ##
+## iteration support for arrays ##
+macro _inline_meta()
+    Expr(:meta, :inline)


A comment here about pushmeta! not yet being available in the bootstrap might be handy. Otherwise
there's a (small) risk we might see usages of Base.@_inline_meta popping up in package code from folks who learn by browsing through the source.

timholy · 2015-04-02T10:52:47Z

👍 Thanks for doing this!

Make eachindex more efficient for linear arrays

kmsquire reviewed Mar 31, 2015
View reviewed changes

mbauman force-pushed the mb/eachindex branch from 6f158ca to 39ef7c4 Compare April 1, 2015 02:12

mbauman changed the title ~~Make eachindex more efficient for linear arrays~~ WIP: Make eachindex more efficient for linear arrays Apr 1, 2015

mbauman force-pushed the mb/eachindex branch from 1caa7c5 to 57f9e66 Compare April 2, 2015 03:43

mbauman changed the title ~~WIP: Make eachindex more efficient for linear arrays~~ Make eachindex more efficient for linear arrays Apr 2, 2015

Jutho reviewed Apr 2, 2015
View reviewed changes

timholy reviewed Apr 2, 2015
View reviewed changes

Add a comment about the internal _inline_meta macro

adde889

mbauman added a commit that referenced this pull request Apr 2, 2015

Merge pull request #10704 from mbauman/mb/eachindex

d61c7e4

Make eachindex more efficient for linear arrays

mbauman merged commit d61c7e4 into JuliaLang:master Apr 2, 2015

mbauman deleted the mb/eachindex branch April 2, 2015 13:30

mbauman mentioned this pull request Apr 13, 2015

improved ind2sub/sub2ind #10337

Merged

mbauman mentioned this pull request Jun 15, 2015

RFC: start(arr) vs. 1 for bounds checking, iteration vs. indexing #11713

Closed

mbauman mentioned this pull request Mar 21, 2016

ReshapedArrays, take2 #15449

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make eachindex more efficient for linear arrays #10704

Make eachindex more efficient for linear arrays #10704

mbauman commented Mar 31, 2015

kmsquire Mar 31, 2015

mbauman Mar 31, 2015

kmsquire Mar 31, 2015

JeffBezanson commented Mar 31, 2015

timholy commented Mar 31, 2015

JeffBezanson commented Mar 31, 2015

timholy commented Mar 31, 2015

mbauman commented Apr 1, 2015

StefanKarpinski commented Apr 1, 2015

mbauman commented Apr 1, 2015

mbauman commented Apr 1, 2015

timholy commented Apr 1, 2015

mbauman commented Apr 1, 2015

timholy commented Apr 1, 2015

mbauman commented Apr 1, 2015

timholy commented Apr 1, 2015

mbauman commented Apr 1, 2015

timholy commented Apr 1, 2015

mbauman commented Apr 1, 2015

timholy commented Apr 1, 2015

mbauman commented Apr 2, 2015

Jutho Apr 2, 2015

Jutho commented Apr 2, 2015

timholy Apr 2, 2015

timholy commented Apr 2, 2015

Make eachindex more efficient for linear arrays #10704

Make eachindex more efficient for linear arrays #10704

Conversation

mbauman commented Mar 31, 2015

kmsquire Mar 31, 2015

Choose a reason for hiding this comment

mbauman Mar 31, 2015

Choose a reason for hiding this comment

kmsquire Mar 31, 2015

Choose a reason for hiding this comment

JeffBezanson commented Mar 31, 2015

timholy commented Mar 31, 2015

JeffBezanson commented Mar 31, 2015

timholy commented Mar 31, 2015

mbauman commented Apr 1, 2015

StefanKarpinski commented Apr 1, 2015

mbauman commented Apr 1, 2015

mbauman commented Apr 1, 2015

timholy commented Apr 1, 2015

mbauman commented Apr 1, 2015

timholy commented Apr 1, 2015

mbauman commented Apr 1, 2015

timholy commented Apr 1, 2015

mbauman commented Apr 1, 2015

timholy commented Apr 1, 2015

mbauman commented Apr 1, 2015

timholy commented Apr 1, 2015

mbauman commented Apr 2, 2015

Jutho Apr 2, 2015

Choose a reason for hiding this comment

Jutho commented Apr 2, 2015

timholy Apr 2, 2015

Choose a reason for hiding this comment

timholy commented Apr 2, 2015