Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Give AbstractArrays smart and performant indexing behaviors for free #10525

Merged
merged 5 commits into from
Jun 4, 2015

Conversation

mbauman
Copy link
Sponsor Member

@mbauman mbauman commented Mar 15, 2015

This is still a work in progress, but I'd like to get some feedback on the architecture and design here before applying this same sort of scheme to setindex!.

The basic idea is that the only getindex method defined in base for abstract arrays is getindex(::AbstractArray, I...). And the only methods that an AbstractArray subtype must define are size and just one getindex method:

getindex(::T, ::Int) # if linearindexing(T) == LinearFast()
getindex(::T, ::Int, ::Int, #=...ndims(A) indices...=#) if LinearSlow()

Unfortunately, it is currently impossible to express the latter method for arbitrary dimensionalities, but in practice it's not a big issue: most LinearSlow() arrays have a fixed dimension.

This is achieved through dispatch on an internal _getindex method, which recomputes the indices such that it can call the canonical getindex method that the user must define. If the user has not defined their canonical method, it will fall back to an error method in _getindex. I use similar scheme for unsafe_getindex, with the exception that we can fallback to the safe version if the subtype hasn't defined the canonical unsafe method. This enables fast vector indexing by checking bounds of the index vectors instead of on each element. And once @inbounds is extensible, AbstractArrays will be able to support it by default.

The difficulty with all this redirection is that an extra function call can wreck indexing performance, and it can be hard to avoid. I've had particular difficulty getting good performance with CartesianIndexes, and I still lag in performance there by 20x for big arrays. I think call site inline annotations would be a magic bullet, but there may be other tricks we can use, too. I've not looked into this very carefully yet, though. (Fixed with a more sensible inlining strategy)

TL/DR:

In my cursory performance tests hacked onto Tim's indexing perf suite from his reshape work (more tests are needed), I'm close to matching or outperforming master with Array with only these definitions:

julia> methods(getindex, (Array, Any...))
3-element Array{Any,1}:
 getindex(A::Array{T,N},i::Int) at array.jl:304
 getindex{T<:Real}(A::Array{T,N},I::Range{T<:Real}) at array.jl:347 # Needed for bootstrap
 getindex(A::AbstractArray{T,N},I...) at abstractarray.jl:492

(Of course, in places where we're not quite able to close the gap we can always reinstate the specialized methods. This is just a very useful stress-test of both functionality and performance.)

cc: @timholy

@mbauman
Copy link
Sponsor Member Author

mbauman commented Mar 15, 2015

(I was expecting the tests to fail, but I'm amazed at the variety of ways they did so. At a minimum, this needs #9607 (subsumed), #10337, and #10505).

# Both index_shape and checksize compute sum(I); manually hoist it out
N = sum(I)
dest = similar(src, (N,))
size(dest) == (N,) || throw(DimensionMismatch())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this size check necessary? Haven't you just created dest of that size?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really - similar has. It could give us the wrong answer, and the check here should be very cheap compared to the allocations and many assignments (although I may need to hide it in a function; I haven't profiled yet). In general, my approach has been to not trust the output of similar. Master effectively does the same thing for arrays (although similar and the check are split across method boundaries): multidimensional.jl:219

@timholy
Copy link
Sponsor Member

timholy commented Mar 16, 2015

I'm super-excited about this, and will try to review soon. I just have pretty limited time slots right now.

@johnmyleswhite
Copy link
Member

This would be so awesome if it works out.

sub2ind(dims::(Integer,Integer), i1::Integer) = i1
sub2ind(dims::(Integer,Integer), i1::Integer, i2::Integer) = i1+dims[1]*(i2-1)
sub2ind(dims::(Integer,Integer), i1::Integer, i2::Integer, I::Integer...) = i1+dims[1]*(i2-1+sum(I)-length(I))
sub2ind(dims::(Integer,Integer,Integer...), i1::Integer, i2::Integer, I::Integer...) =
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need an @inline here to ensure good performance for higher dimensions?

Copy link
Sponsor Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for N>5. See #10337 (comment).

@timholy
Copy link
Sponsor Member

timholy commented Mar 17, 2015

This is awesome. In broad terms, I agree this is how indexing should work. You also have a lot of clever insights into how to implement this---it was fun to read your code.

You're basically building this on top of #8227? Perhaps we should just merge that and deal with the consequences later.

Unfortunately, it is currently impossible to express the latter method for arbitrary dimensionalities, but in practice it's not a big issue: most LinearSlow() arrays have a fixed dimension.

I'm not sure I agree with this. ReshapedArrays.jl, Interpolations.jl, and your own AxisArrays.jl (if you pass in a LinearSlow() parent) are good counter-examples. I fear we need to introduce getindexN.

Finally, scalar indexing with non-reals is a genuine possibility, see the remarks re DualNumbers.jl in #9586.

@mbauman
Copy link
Sponsor Member Author

mbauman commented Mar 17, 2015

Thanks!

You're basically building this on top of #8227?

Yes, in spirit. I just needed some way to express getindex without a bounds check; BitArrays and a few other data structures already use this idiom, too. I really want user-extensible @inbounds, too, but after working on this I'm not sure having two independent methods is the best solution. It adds a lot of indirection here, and it'd be nice if it could work for things like sub2ind.

I think your alternative proposal in #7799 (comment) might be more attractive, but that just punts the complexity up the chain to a system I don't know and only a few could implement. The method table would keep track of the methods compiled with and without bounds checks. When compiling without bounds checks, the compiler would simply elide everything within @boundscheck. I don't think @inbounds should propagate, so every performant getindex method would be written getindex(...) = (@boundscheck …; @inbounds …).

@mbauman
Copy link
Sponsor Member Author

mbauman commented Mar 17, 2015

The method table would keep track of the methods compiled with and without bounds checks.

Hunh, that sounds a lot like multiple dispatch. getindex(::Type{BoundsCheckOn}, x, I...)? That would actually solve the issues in #8227 with rewriting function names.

@simonster
Copy link
Member

Using multiple dispatch has some elegance, but it has problems of its own.

The first problem is that it makes defining getindex uglier for cases where performance doesn't matter: if getindex(x, I...) is now getindex(::Type{BoundsCheckOn}, x, I...), then it seems that all existing definitions need to be changed. We could have a deprecation that gets thrown when an old getindex method is called, similar to @vtjnash's deprecation for convert with pointer types, but arguably users shouldn't need to think about bounds checks unless they need to turn them off for performance purposes.

The second problem is that, at least as far as I can tell, there's still a lowering issue. It's not possible to create an @inbounds macro that changes the first argument to getindex without modifying the frontend (or reproducing the lowering code in Julia), since ref is not lowered to getindex in the AST that the macro gets. Using multiple dispatch would solve @JeffBezanson's scoping concerns, but it seems we'd still need to add a new Expr or an argument to refto specify whether bounds checks are on or off.

@timholy
Copy link
Sponsor Member

timholy commented Mar 22, 2015

Following up from #10507 (comment). @JeffBezanson, your input here would be helpful: how can we restrict dispatch to calling a particular method only for a specific number of varargs? I see a couple of options:

  • Introduce getindexN as an indexing function that only gets called when length(indexes) == ndims(A)
  • WIP: ReshapedArrays #10507 (comment)
  • Introduce getindex{T,N}(A::AbstractArray{T,N}, I::NTuple{N}) (forces a tuple creation unless inlined)
  • A new notation, getindex{T,N}(A::AbstractArray{T,N}, indexes...N).

To me the last seems most attractive, followed by the first or third.

@timholy
Copy link
Sponsor Member

timholy commented Mar 23, 2015

One more option worth considering: rather than a tuple, use a CartesianIndex{N}. The main catch: in order to support things like InterpolationArray, the element type should be of any Number. This is a little awkward, given that the main mission for CartesianIndex is as a glorified counter.

@timholy
Copy link
Sponsor Member

timholy commented Mar 25, 2015

Barring further suggestions, I'm going to (time permitting) start playing around with adding a new field to a jl_methlist_t: an index into the tvars field for the parameter specifying the length of the varargs list.
With some parser changes, that should allow us to support the syntax

getindex{T,N}(A::AbstractArray{T,N}, indexes...N)

@mbauman
Copy link
Sponsor Member Author

mbauman commented Mar 25, 2015

That would be awesome!

mbauman added a commit that referenced this pull request Jun 9, 2015
Minor oversight from #10525, this restores the previous behavior where indexing a SubArray by, e.g., [1 2; 3 4], returns an array of the same size with the given linear indices.
vtjnash added a commit that referenced this pull request Jun 10, 2015
fix #11187 (pass struct and tuple objects by stack pointer)
fix #11450 (ccall emission was frobbing the stack)
likely may fix #11026 and may fix #11003 (ref #10525) invalid stack-read on 32-bit

this additionally changes the julia specSig calling convention to pass
non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by
turning off specSig

this additionally moves the alloca calls in ccall (and other places) to
the entry BasicBlock in the function, ensuring that llvm detects them as
static allocations and moves them into the function prologue

this additionally fixes some undefined behavior from changing
a variable's size through a alloca-cast instead of zext/sext/trunc

this additionally prepares for turning back on allocating tuples as vectors,
since the gc now guarantees 16-byte alignment

future work this makes possible:
 - create a function to replace the jlallocobj_func+init_bits_value call pair (to reduce codegen pressure)
 - allow moving pointers sometimes rather than always copying immutable data
 - teach the GC how it can re-use an existing pointer as a box
vtjnash added a commit that referenced this pull request Jun 11, 2015
fix #11187 (pass struct and tuple objects by stack pointer)
fix #11450 (ccall emission was frobbing the stack)
likely may fix #11026 and may fix #11003 (ref #10525) invalid stack-read on 32-bit

this additionally changes the julia specSig calling convention to pass
non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by
turning off specSig

this additionally moves the alloca calls in ccall (and other places) to
the entry BasicBlock in the function, ensuring that llvm detects them as
static allocations and moves them into the function prologue

this additionally fixes some undefined behavior from changing
a variable's size through a alloca-cast instead of zext/sext/trunc

this additionally prepares for turning back on allocating tuples as vectors,
since the gc now guarantees 16-byte alignment

future work this makes possible:
 - create a function to replace the jlallocobj_func+init_bits_value call pair (to reduce codegen pressure)
 - allow moving pointers sometimes rather than always copying immutable data
 - teach the GC how it can re-use an existing pointer as a box
vtjnash added a commit that referenced this pull request Jun 12, 2015
fix #11187 (pass struct and tuple objects by stack pointer)
fix #11450 (ccall emission was frobbing the stack)
likely may fix #11026 and may fix #11003 (ref #10525) invalid stack-read on 32-bit

this additionally changes the julia specSig calling convention to pass
non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by
turning off specSig

this additionally moves the alloca calls in ccall (and other places) to
the entry BasicBlock in the function, ensuring that llvm detects them as
static allocations and moves them into the function prologue

this additionally fixes some undefined behavior from changing
a variable's size through a alloca-cast instead of zext/sext/trunc

this additionally prepares for turning back on allocating tuples as vectors,
since the gc now guarantees 16-byte alignment

future work this makes possible:
 - create a function to replace the jlallocobj_func+init_bits_value call pair (to reduce codegen pressure)
 - allow moving pointers sometimes rather than always copying immutable data
 - teach the GC how it can re-use an existing pointer as a box
vtjnash added a commit that referenced this pull request Jun 16, 2015
fix #11187 (pass struct and tuple objects by stack pointer)
fix #11450 (ccall emission was frobbing the stack)
likely may fix #11026 and may fix #11003 (ref #10525) invalid stack-read on 32-bit

this additionally changes the julia specSig calling convention to pass
non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by
turning off specSig

this additionally moves the alloca calls in ccall (and other places) to
the entry BasicBlock in the function, ensuring that llvm detects them as
static allocations and moves them into the function prologue

this additionally fixes some undefined behavior from changing
a variable's size through a alloca-cast instead of zext/sext/trunc

this additionally prepares for turning back on allocating tuples as vectors,
since the gc now guarantees 16-byte alignment

future work this makes possible:
 - create a function to replace the jlallocobj_func+init_bits_value call pair (to reduce codegen pressure)
 - allow moving pointers sometimes rather than always copying immutable data
 - teach the GC how it can re-use an existing pointer as a box
@mbauman mbauman mentioned this pull request Jun 20, 2015
fcard pushed a commit to fcard/julia that referenced this pull request Jul 8, 2015
fix JuliaLang#11187 (pass struct and tuple objects by stack pointer)
fix JuliaLang#11450 (ccall emission was frobbing the stack)
likely may fix JuliaLang#11026 and may fix JuliaLang#11003 (ref JuliaLang#10525) invalid stack-read on 32-bit

this additionally changes the julia specSig calling convention to pass
non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by
turning off specSig

this additionally moves the alloca calls in ccall (and other places) to
the entry BasicBlock in the function, ensuring that llvm detects them as
static allocations and moves them into the function prologue

this additionally fixes some undefined behavior from changing
a variable's size through a alloca-cast instead of zext/sext/trunc

this additionally prepares for turning back on allocating tuples as vectors,
since the gc now guarantees 16-byte alignment

future work this makes possible:
 - create a function to replace the jlallocobj_func+init_bits_value call pair (to reduce codegen pressure)
 - allow moving pointers sometimes rather than always copying immutable data
 - teach the GC how it can re-use an existing pointer as a box
fcard pushed a commit to fcard/julia that referenced this pull request Jul 8, 2015
fix JuliaLang#11187 (pass struct and tuple objects by stack pointer)
fix JuliaLang#11450 (ccall emission was frobbing the stack)
likely may fix JuliaLang#11026 and may fix JuliaLang#11003 (ref JuliaLang#10525) invalid stack-read on 32-bit

this additionally changes the julia specSig calling convention to pass
non-primitive types by pointer instead of by-value

this additionally fixes a bug in gen_cfunction that could be exposed by
turning off specSig

this additionally moves the alloca calls in ccall (and other places) to
the entry BasicBlock in the function, ensuring that llvm detects them as
static allocations and moves them into the function prologue

this additionally fixes some undefined behavior from changing
a variable's size through a alloca-cast instead of zext/sext/trunc

this additionally prepares for turning back on allocating tuples as vectors,
since the gc now guarantees 16-byte alignment

future work this makes possible:
 - create a function to replace the jlallocobj_func+init_bits_value call pair (to reduce codegen pressure)
 - allow moving pointers sometimes rather than always copying immutable data
 - teach the GC how it can re-use an existing pointer as a box
tomasaschan added a commit to JuliaMath/Interpolations.jl that referenced this pull request Jul 22, 2015
mbauman added a commit that referenced this pull request Jul 24, 2015
Back in #10525, I decided to deprecate `setindex!(A, x, I::Union{Real, Int, AbstractArray}...)` for symmetry since `getindex` only allows vector indices when there's more than one index.

But looking forward, I would really like to work towards APL semantics in 0.5 wherein the sum of the dimensionality of the indices is the dimensionality of the output array.  For example, indexing `A[[1 2; 3 4], 1]` would output a 2-dimensional `2x2` array: `[A[1,1] A[2, 1]; A[3,1] A[4,1]]`.  In which case, we'd add support back in for `setindex!` with array indices for symmetry.  This seems like needless churn - let's just leave things be until 0.5.
timholy added a commit that referenced this pull request Aug 11, 2015
@mbauman mbauman mentioned this pull request Sep 15, 2015
27 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet