Towards array nirvana #7941

timholy · 2014-08-10T10:37:22Z

For 0.4 several of us want to see significant improvements in arrays. I'm opening this meta-issue to help organize the effort. I'll be brief here and link out to issues/packages for more detailed explanations. Please add to these bullet points.

Underlying technologies

The first two are essential, the third is nearly essential.

- Julia native bounds checking and removal (extensible bounds checking removal #7799) (PR: RFC: Safer, extensible ﹫inbounds #8227)
- Staged functions (WIP: Staged functions #7474; splatting fixes: "For caching staged methods, don't restrict the number of varargs to match" #7935)
- Efficient cartesian iteration ~~(WIP: Add Cartesian product iteration. Fixes #1917 #6437)~~ (Efficient cartesian iteration (new version of #6437) #8432)
- Eliminate the performance hit from splatting (update: stagedfunctions can be coaxed into doing this for you)
- Even more efficient tuple storage (for implementing fixed-size arrays)
- A mutable fixed-size buffer type analogous to tuples? (for mutable fixed-size arrays; Clang.jl would probably like this too)
- Make [a, b] non-concatenating (make [a, b] not concatenate #3737; make {a,b} give better-typed arrays [original title: make [a,b] non-concatenating] #2488)

I'm guessing the approach for implementing #7799 will also allow one to specify manual inlining (which is where the idea was originally proposed), which is the main bottleneck for #6437. So that's almost a 2-for-1 deal.

Implementation

The first two are essential.

- ArrayViews (ArrayViews (A systematic array view framework) #5556). ~~In my opinion, we likely just want the ContiguousView part of it.~~
- Work in progress: https://github.com/timholy/ArrayViewsAPL.jl
- Fixed-size arrays (WIP: Add Cartesian product iteration. Fixes #1917 #6437)
- Address SubArray bounds checking (or not): bug or feature? #4044
- More diverse views permitting an AbstractVector index? (These would not be "strided".)
- Move Colon translation out of the parser (RFC: Move Colon translation out of the parser #10331)

Opportunities

- Negated indexing (NegatedIndex type #1032)
- Easier indexing for AbstractArrays (RFC: Give AbstractArrays smart and performant indexing behaviors for free #10525)

ArrayViewsAPL is something I've not yet broadly announced. While the APL in the package's title is in reference to #5949 (for which it can be a test bed to see how we'd like it and how much would break), its real purpose is to exploit stagedfunctions for creating efficient and general view operations. Please see the README for explanations.

timholy · 2014-08-10T10:42:46Z

Another fun ingredient in my own personal array nirvana will be https://github.com/timholy/NamedAxesArrays.jl. More joy from stagedfunctions 😄. But I don't propose that for inclusion in base.

lindahua · 2014-08-10T11:54:37Z

@timholy the contiguous rank of StridedView is important. Consider:

v1 = view(a,:,1:2:7)
v2 = view(v1,:,2)

With the type system in ArrayViews, it can determine statically that v2 is a contiguous view.
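A rough sketch of the same point for readers following along in a current Julia release. It assumes Base's `view` and `strides` rather than the ArrayViews package discussed here, and the 5x8 array is an arbitrary choice. The contiguous single-column view ends up with unit stride and a linear `IndexStyle`, which is decided from its type.

```julia
# Sketch using Base `view` from current Julia, NOT the ArrayViews.jl API.
a = reshape(collect(1.0:40.0), 5, 8)   # arbitrary 5x8 example array

v1 = view(a, :, 1:2:7)   # every other column: strided, not contiguous
v2 = view(v1, :, 2)      # one column of v1, i.e. column 3 of `a`

# Column-major strides: v1 skips a column between neighbours, v2 is unit-stride.
println(strides(v1))     # (1, 10)
println(strides(v2))     # (1,)
println(v2 == a[:, 3])   # true

# The indexing style is a property of the view's type, known at compile time.
println(IndexStyle(typeof(v2)) == IndexLinear())   # true
```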

timholy · 2014-08-10T12:00:30Z

I see, we probably want both then.

lindahua · 2014-08-10T12:02:27Z

The StridedView part is tightly integrated with ContiguousView. Lower-rank slices of a non-contiguous strided view can themselves be contiguous. The ArrayViews type system preserves this information, so that cases like the one above can be determined statically.

timholy · 2014-08-10T12:03:56Z

Although how one decides which one wants is an interesting question---I guess we could return your ArrayViews types when the parent is an array and no slicing is desired, and the ArrayViewsAPL view type otherwise?

Jutho · 2014-08-10T12:07:41Z

Looking forward to 0.4 already...

timholy · 2014-08-10T12:14:54Z

Jutho, you've contributed a lot already, and it would be lovely to have your help implementing some of this!

jtravs · 2014-08-10T12:45:34Z

@timholy ArrayViewsAPL is what I've been dreaming of since discovering Julia! Hopefully sliceview can become the standard indexing semantics. I'll be happy to test and help in any way I can once this lands in master (or is usable without having to jump over lots of branch fences)!

Jutho · 2014-08-10T12:57:45Z

@timholy I’d be happy to help when time permits. Let me know if you had anything specific in mind that I can contribute to.


jiahao · 2014-08-10T13:40:59Z

+1 for named axes arrays

timholy · 2014-08-10T14:57:55Z

@lindahua, I added "Move Colon translation out of the parser" because presumably for ArrayViews that will be a critical step for A[:, i] to work as you're intending.

JeffBezanson · 2014-08-10T17:45:38Z

Great list!

Just to get a broader context, we also want more efficient strings (and possibly BigInts), which also require any-length, immutable homogeneous arrays. Similar enough to tuples to make one think. Instead of mutable fixed-size arrays, we might want to use the "assignable cell" model, where you have a mutable cell that can hold a single value. Then you just store different array values into it at different times. Local mutations of single elements can be optimized.

lindahua · 2014-08-10T20:41:40Z

@timholy Thanks for taking the lead on this.

I don't have a particular preference as to whether to use ArrayViews or ArrayViewsAPL or a hybrid of both, or something afresh, as long as it is efficient enough for common cases.

I think there are two issues here that can be discussed separately.

Semantics (behaviors): we have to specify clearly what certain expressions actually mean. For example:
- should a[i, :] be a (column) vector or a matrix with one row (when i is a scalar)?
- should we perform bounds checking for i upon construction of the view?
- should a[[1,3,7,12]] return a view or a copy?
I think we should focus on making decisions on such questions (and probably a few others) first.
Once we have a clear specification of the behavior, we can then proceed to work on an efficient implementation, which can be based on either ArrayViews, ArrayViewsAPL, or borrow things from both. The goal would be type stability and efficiency (of compilation, construction, and getindex).

lindahua · 2014-08-10T20:46:23Z

I remember this was brought up a while ago in some issue, but it is worth mentioning here. It can be useful to support syntax like:

a[..., i]
# so it means a[:,i] when ndims(a) == 2
# and a[:,:,i] when ndims(a) == 3, etc
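The `...` syntax above is only a proposal in this thread. As a hedged sketch of the same behavior in current Julia, a small helper can splat the right number of colons. `lastdim` is a hypothetical name introduced here for illustration, while `selectdim` is a real Base function that covers the same case.

```julia
# Hypothetical helper (name invented for this sketch): index the last
# dimension with `i`, keeping every earlier dimension whole.
lastdim(a::AbstractArray, i) = view(a, ntuple(_ -> Colon(), ndims(a) - 1)..., i)

m = reshape(collect(1:6), 2, 3)        # 2-d example
t = reshape(collect(1:24), 2, 3, 4)    # 3-d example

println(lastdim(m, 2) == m[:, 2])           # true
println(lastdim(t, 4) == t[:, :, 4])        # true
println(selectdim(t, 3, 4) == t[:, :, 4])   # true
```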

timholy · 2014-08-10T21:13:39Z

Those are good questions. What's funny is that I am not actually sure they all need to be settled first. For example, in ArrayViewsAPL I've implemented both the current sub behavior and the current slice behavior. While they have drastically different implications for user code, the two functions differ by only a couple of lines.

Regarding a[[1,3,7,12]], at least for ArrayViewsAPL, it may just be a question of changing RangeIndex in the type declaration to AbstractVector. Whether we want this is, of course, a good question. I've gone back and forth on that. My personal thinking currently is that we probably want the option for a view of that type, but I wonder if we want that to be accessible only through view(a, [1,3,7,12]).
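To make the view-versus-copy distinction concrete, here is a minimal sketch of how it plays out in a current Julia release (an assumption about later versions, not the 0.4-era code under discussion): indexing with a vector copies, while `view` with the same vector aliases the parent.

```julia
a = collect(10:10:120)    # 12-element example vector
idx = [1, 3, 7, 12]

c = a[idx]                # indexing with a vector makes a copy
v = view(a, idx)          # a non-strided view onto `a`

a[3] = -1                 # mutate the parent afterwards
println(c[2])             # 30  (the copy is unaffected)
println(v[2])             # -1  (the view sees the change)
```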

Bounds-checking upon construction is probably more important to settle early (this is #4044). I now suspect that we may need to have the base types check bounds upon construction. People who want to pull the tricks in #4044 might need to implement an AbstractArray type that doesn't check upon construction but has a getindex that always checks bounds, whether inside @inbounds or not (unless they really like to live dangerously, that is).
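One possible shape for such a type, sketched in current Julia syntax. `UncheckedWindow` is a hypothetical name, not something from Base or from #4044. The constructor validates nothing, and `getindex` performs an explicit check that is deliberately kept outside any `@boundscheck` block.

```julia
# Hypothetical type for illustration only.
struct UncheckedWindow{T,P<:AbstractVector{T}} <: AbstractVector{T}
    parent::P
    offset::Int   # window element i maps to parent[offset + i]
    len::Int
end

# No validation at construction: the window may extend past the parent.
Base.size(w::UncheckedWindow) = (w.len,)

function Base.getindex(w::UncheckedWindow, i::Int)
    # Explicit check, not wrapped in @boundscheck, so it is not elided
    # when a caller uses @inbounds.
    j = w.offset + i
    (1 <= i <= w.len && checkbounds(Bool, w.parent, j)) || throw(BoundsError(w, i))
    return w.parent[j]
end

data = [10, 20, 30, 40]
w = UncheckedWindow(data, -1, 4)   # covers parent indices 0:3; index 0 is invalid

println(w[2])                      # 10

# Accessing the part of the window that lies outside the parent throws.
threw = try
    w[1]
    false
catch err
    err isa BoundsError
end
println(threw)                     # true
```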

Overall, my view is that the real effort here is in the "underlying technologies" section of the list. I'd include the changes to the parser in this. The rest of the core "view" infrastructure (ignoring fixed-size arrays, etc) seems likely to be something one could finish banging out in a couple of days. Writing tests and dealing with the consequences will take rather longer.

JeffBezanson · 2014-08-10T21:17:59Z

We can also consider whether we want more kinds of indexes, such as #1032. This is important since it may ask for a more general approach to Colon than just making it a special case.

timholy · 2014-08-10T21:48:11Z

Good thought. I think stagedfunctions will make it a lot easier to support a diversity of indexes. Prior to stagedfunctions, we'd have had to worry about what to do if argument 1 was a Colon, argument 2 was a NegatedIndex, and argument 3 was a Vector{Int}, and whether we can handle that type-diversity efficiently in a single function. Now we just write one function:

stagedfunction someindexingfunction(A, indexes...)
    # ... optionally do a whole bunch of "common" stuff at the beginning
    if indexes[i] <: Colon
        mungedindexexprs[i] = :(1:size(A, $i))
    elseif indexes[i] <: NegatedIndex
        mungedindexexprs[i] = :(setdiff(1:size(A, $i), indexes[$i]))
    elseif ...
    end
    # ...more common work if you need to...
    # And here's what actually runs:
    :(A[$(mungedindexexprs...)])
end

With a single function we can basically handle anything we want to support, and all the tricky work is done at compile time by straightforward Julia code. You couldn't write something more runtime efficient by hand if you tried. It's a total game-changer.

JeffBezanson · 2014-08-10T21:54:32Z

I share your enthusiasm for staged functions, but not for if statements checking types :)

timholy · 2014-08-10T22:02:22Z

Fair enough. mungedindexexprs[i] = rewriteindex(:A, :indexes, i, indexes[i]).
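For readers on a current Julia release, the dispatch-based variant might look roughly like the sketch below. It assumes `@generated` functions (the successor to `stagedfunction`), a stand-in `NegatedIndex` struct that is hypothetical here (modelled on the proposal in #1032), and a simplified two-argument `rewriteindex` rather than the four-argument form suggested above.

```julia
# Hypothetical stand-in for the NegatedIndex type proposed in #1032.
struct NegatedIndex
    idx::Vector{Int}
end

# One method per index type; each returns an *expression* for dimension d.
rewriteindex(d::Int, ::Type{Colon}) = :(1:size(A, $d))
rewriteindex(d::Int, ::Type{NegatedIndex}) = :(setdiff(1:size(A, $d), indexes[$d].idx))
rewriteindex(d::Int, ::Type) = :(indexes[$d])   # fallback: pass the index through

@generated function someindexingfunction(A::AbstractArray, indexes...)
    # Inside the generator, `indexes` is a tuple of the argument *types*.
    exprs = [rewriteindex(d, indexes[d]) for d in 1:length(indexes)]
    # This expression is what actually runs for this combination of types.
    return :(A[$(exprs...)])
end

A = reshape(collect(1:24), 2, 3, 4)
r = someindexingfunction(A, :, NegatedIndex([2]), [1, 4])

println(size(r))                      # (2, 2, 2)
println(r == A[:, [1, 3], [1, 4]])    # true
```

The type checks move out of an if/elseif chain and into method dispatch on `rewriteindex`, which is the change requested in the preceding comments.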

lindahua · 2014-08-10T23:15:38Z

I will try to rewrite ArrayViews with staged functions and see how much it simplifies the code.

lindahua · 2014-08-10T23:16:51Z

Still, critical information like whether views are contiguous should be encoded by the type itself (this is essential for type stability).

quinnj · 2014-08-11T00:31:37Z

Can we add #2488 and #3737 to the list?

timholy · 2014-08-11T01:02:29Z

I see that "collaborator" tag, so feel free to add them to the bullet list.

StefanKarpinski · 2014-08-11T01:33:12Z

While we're at this array overhaul, can we try to make it so that switching between APL and Matlab style indexing is relatively simple? I.e., just a matter of surface syntax.

timholy · 2014-08-11T01:36:03Z

That was my idea with ArrayViewsAPL: just choose

getindex(A, ...) = subview(A, ...)

or

getindex(A, ...) = sliceview(A, ...)

Seemed the safest way given that we don't yet know what we want.
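The difference between the two semantics is whether a scalar index keeps or drops its dimension. A small sketch in current Julia, where a single `view` function covers both cases (an assumption about later releases; `subview` and `sliceview` above belong to ArrayViewsAPL and are not used here): a length-1 range reproduces the dimension-keeping result, and a scalar reproduces the dimension-dropping one.

```julia
A = reshape(collect(1:15), 3, 5)

kept    = view(A, 1:1, :)   # range index keeps the dimension (Matlab-style result)
dropped = view(A, 1, :)     # scalar index drops it (APL-style result)

println(size(kept))         # (1, 5)
println(size(dropped))      # (5,)
println(vec(kept) == dropped)   # true: same elements, different shape
```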

jakebolewski · 2014-08-11T01:40:51Z

Since these enhancements are going to break most array code anyway, what is the status of {} arrays going forward? The syntax seems like an historical vestige. I feel it would be conceptually simpler to have just one array syntax.

ViralBShah · 2015-02-01T13:20:25Z

I have been following from a distance, but I am wondering if we are any closer now than 3 months ago. Quite a bit has happened with staged functions, sub arrays and cartesian iterators since then, which makes me hopeful that we are closer to replacing array indexing with views. Is it likely we can achieve this in 0.4?

andreasnoack · 2015-02-01T14:34:32Z

Some and hopefully most of the work should be done in #9150. The main issue is to handle Colon within Julia instead of in the parser. I've talked with @jakebolewski about fixing that.

timholy · 2015-02-01T16:21:58Z

In a sense this is already done, other than syntax (meaning A[:,j] gets replaced by slice(A, :, j)).

But it's still a little scary given "deep" bugs in stagedfunctions, #8504 and ~~#8553~~ #8853. If those had been fixed, we surely would have had this functionality months ago.

timholy · 2015-02-01T16:30:12Z

Oh, and I forgot perhaps even the most important one: it would be insane to turn A[:,j] into slice(A, :, j) until #8227 (or alternate implementation) gets merged.

EDIT: Demo:

julia> A = reshape(1:15, 3, 5)
3x5 Array{Int64,2}:
 1  4  7  10  13
 2  5  8  11  14
 3  6  9  12  15

julia> b = slice(A, 1:2, 1)
2-element SubArray{Int64,1,Array{Int64,2},(UnitRange{Int64},Int64),2}:
 1
 2

julia> b[3]
3

julia> b = sub(A, 1, :)
1x5 SubArray{Int64,2,Array{Int64,2},(UnitRange{Int64},Colon),2}:
 1  4  7  10  13

julia> b[2,2]
5
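In the demo above, `b[3]` and `b[2,2]` silently read elements outside the view. For comparison, here is a hedged sketch of the first case in a current Julia release, where (to my knowledge) views do check their own bounds. It uses `view` in place of the 0.4-era `slice`, and `throws_bounds_error` is a helper invented for this example.

```julia
A = reshape(collect(1:15), 3, 5)
b = view(A, 1:2, 1)          # 2-element view of the first column

# Helper for this sketch: true if indexing `b` at `i` raises a BoundsError.
function throws_bounds_error(b, i)
    try
        b[i]
        return false
    catch err
        return err isa BoundsError
    end
end

println(throws_bounds_error(b, 2))   # false: index 2 is inside the view
println(throws_bounds_error(b, 3))   # true: index 3 is outside the view
```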

jiahao · 2015-02-01T19:27:51Z

@timholy I think you meant some other issue besides #8553

timholy · 2015-02-01T19:45:48Z

Indeed I did, thanks for catching. (Corrected above.)

ViralBShah · 2015-04-23T13:08:19Z

It looks like much of this will make its way into 0.4. Should we refactor this issue into the bits that we want in 0.4, and a separate issue for what will be in 0.5?

timholy · 2015-04-23T13:20:52Z

Indeed, the large majority has been in 0.4 for some time---that's what all the checkmarks at the top are for 😄.

If you want to split out separate issues for the remaining items, I'm fine with that, but I don't really see the point.

mschauer · 2015-04-23T13:21:25Z

@lindahua Regarding your remark in the beginning, the issue on ... was #5405

timholy · 2015-04-23T13:23:06Z

#10525 was long overdue for a spot on that list, so I added it.

ViralBShah · 2015-04-23T20:01:48Z

I was just initially puzzled at seeing the 0.5 milestone. We can leave it as it is.

ViralBShah · 2015-04-23T20:02:30Z

The question I really had was - from a 0.4 perspective, which of the unchecked ones do we want to get done, or are we already there?

blakejohnson · 2015-04-23T20:04:09Z

I think we should still aim for extensible bounds checking removal, #7799, in 0.4.

timholy · 2015-04-23T20:27:35Z

I was puzzled about the 0.5 milestone too. I wondered if you had changed it (I didn't). That said, I think we won't get most of the remaining items by 0.4, so bumping the milestone seems reasonable.

Realistically, I expect us to check only one more box: I think we will get "sweet custom array types" (aka #10525, hopefully supported by #10911 if I can get my act together in time). This is a really big step---one that I did not anticipate at the time I opened this issue---and it will be amazing to have. That would also open the way for fixing reshape (#10507), but I'd be a little reluctant to add that kind of functionality at a late date. I just tagged that PR as 0.5 to clarify my current thoughts about it.

Re "return views from indexing": I think the big show-stopper is #7799, since our views don't check bounds (in my opinion, we'd have a catastrophic loss of safety if we switched to returning views before addressing that issue). It's also worth reminding folks that @carlobaldassi has raised a number of concerns specifically about BitArrays that, to my knowledge, we haven't really addressed (see posts higher up in this issue). However, if we can satisfy ourselves about these two issues, then there's a lot that can happen relatively quickly (#10331, and returning views from indexing).

pao · 2015-04-23T20:49:09Z

> I wondered if you had changed it

In case you hadn't seen it, this is in the issue's log, squeezed between comments--it was changed on March 7 by @vtjnash.

StefanKarpinski · 2015-04-23T20:50:34Z

@vtjnash went around and randomized all the milestones :-|

mbauman · 2015-04-23T20:53:26Z

> I think the big show-stopper is #7799, since our views don't check bounds (in my opinion, we'd have a catastrophic loss of safety if we switched to returning views before addressing that issue).

I'm looking at this right now in the context of #10525. I think I may be able to rectify this since it puts unsafe_getindex on firmer ground. It sure would be a lot nicer with #7799, though.

(~~SubArray almost checks bounds right now, though.~~ It does so upon construction, and then it indexes into the parent array with bounds checks on. ~~The trouble is that in a multidimensional indexing expression, one of the indices may be out of bounds and it may return spurious data.~~ Edit: Ah, I see now. It doesn't check bounds in resolving its indices, either. Same solution applies, though.)

carlobaldassi · 2015-04-26T15:36:08Z

> a number of concerns specifically about BitArrays that, to my knowledge, we haven't really addressed (see posts higher up in this issue).

To get back on that, my current thinking is that a lot of methods should be specialized to work on BitArray Views efficiently. It's going to take time -- and to increase test/bitarray.jl execution time considerably -- but in the end I think it's also going to improve performance in most common cases by letting us avoid temporaries (e.g. sum(B[a:b]) and similar).

BTW I'm assuming that in most relevant cases views of views can be flattened, otherwise this is probably not going to work.

The only issue which I don't think we can really solve is efficiently handling views with generic, non-contiguous indexing. However, this is only going to make a real difference when indexing repeatedly into the view, and maybe for nested views.
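A minimal sketch of the `sum(B[a:b])` case mentioned above, assuming Base's `view` from a current Julia release. It only shows that the two forms agree; how fast the view path is depends on which methods are specialized for BitArray views, which is exactly the open question here.

```julia
B = falses(64)
B[10:20] .= true              # 11 set bits

a, b = 8, 24
println(sum(B[a:b]))          # 11; B[a:b] first allocates a temporary BitVector
println(sum(view(B, a:b)))    # 11; same result, computed through a view
```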

prcastro · 2015-06-04T19:22:21Z

I believe the last two items could be closed

mbauman · 2015-09-15T23:42:13Z

Superseded by #13157. If I missed anything from this issue, please add it there.

timholy added this to the 0.4 milestone Aug 10, 2014

timholy mentioned this issue Aug 10, 2014

Topics we can cover at this stage JuliaQuantum/JuliaQuantum.github.io#1

Closed

JeffBezanson modified the milestones: 0.4, 0.4-projects Aug 10, 2014

timholy mentioned this issue Nov 20, 2014

RFC: Safer, extensible ﹫inbounds #8227

Closed

MichaelHatherly mentioned this issue Dec 24, 2014

Documentation Documentation Documentation #9447

Merged

SimonDanisch mentioned this issue Jan 8, 2015

GLPlot.jl and OpenCV.jl interface possibilties? SimonDanisch/GLPlot.jl#18

Open

JeffBezanson mentioned this issue Mar 2, 2015

WIP: redesign of tuples and tuple types #10380

Merged

vtjnash modified the milestones: 0.5, 0.4 Mar 7, 2015

timholy mentioned this issue May 10, 2015

Vectorized code performance #11220

Closed

mbauman mentioned this issue Sep 15, 2015

Arraypocalypse Now and Then #13157

Closed

27 tasks

mbauman closed this as completed Sep 15, 2015

timholy mentioned this issue Jun 26, 2017

range indexing should produce a subarray, not a copy #3701

Closed
