Towards array nirvana #7941

Closed
timholy opened this Issue Aug 10, 2014 · 78 comments

Comments

Projects
None yet
@timholy
Member

timholy commented Aug 10, 2014

For 0.4 several of us want to see significant improvements in arrays. I'm opening this meta-issue to help organize the effort. I'll be brief here and link out to issues/packages for more detailed explanations. Please add to these bullet points.

Underlying technologies

The first two are essential, the third is nearly essential.

  • Julia native bounds checking and removal (#7799) (PR: #8227)
  • Staged functions (#7474, splatting fixes #7935)
  • Efficient cartesian iteration (#6437) (#8432)
  • Eliminate the performance hit from splatting (update: stagedfunctions can be coaxed into doing this for you)
  • Even more efficient tuple storage (for implementing fixed-size arrays)
  • A mutable fixed-size buffer type analogous to tuples? (for mutable fixed-size arrays; Clang.jl would probably like this too)
  • Make [a, b] non-concatenating #3737/#2488

I'm guessing the approach for implementing #7799 will also allow one to specify manual inlining (which is where the idea was originally proposed), which is the main bottleneck for #6437. So that's almost a 2-for-1 deal.

Implementation

The first two are essential.

Opportunities
  • negated indexing, #1032
  • easier indexing for AbstractArrays, #10525

ArrayViewsAPL is something I've not yet broadly announced. While the APL in the package's title is in reference to #5949 (for which it can be a test bed to see how we'd like it and how much would break), its real purpose is to exploit stagedfunctions for creating efficient and general view operations. Please see the README for explanations.

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Aug 10, 2014

Member

Another fun ingredient in my own personal array nirvana will be https://github.com/timholy/NamedAxesArrays.jl. More joy from stagedfunctions 😄. But I don't propose that for inclusion in base.

Member

timholy commented Aug 10, 2014

Another fun ingredient in my own personal array nirvana will be https://github.com/timholy/NamedAxesArrays.jl. More joy from stagedfunctions 😄. But I don't propose that for inclusion in base.

@timholy timholy added this to the 0.4 milestone Aug 10, 2014

@timholy timholy referenced this issue in JuliaQuantum/JuliaQuantum.github.io Aug 10, 2014

Closed

Topics we can cover at this stage #1

@lindahua

This comment has been minimized.

Show comment
Hide comment
@lindahua

lindahua Aug 10, 2014

Member

@timholy the contiguous rank of StridedView is important, consider:

v1 = view(a,:,1:2:7)
v2 = view(v1,:,2)

With the type system in ArrayViews, it can determine statically that v2 is a contiguous view.

Member

lindahua commented Aug 10, 2014

@timholy the contiguous rank of StridedView is important, consider:

v1 = view(a,:,1:2:7)
v2 = view(v1,:,2)

With the type system in ArrayViews, it can determine statically that v2 is a contiguous view.

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Aug 10, 2014

Member

I see, we probably want both then.

Member

timholy commented Aug 10, 2014

I see, we probably want both then.

@lindahua

This comment has been minimized.

Show comment
Hide comment
@lindahua

lindahua Aug 10, 2014

Member

The StridedView part is tightly integrated with ContiguousView. It is possible that lower rank slices of a non-contiguous Strided view is contiguous. The ArrayViews type system preserves this piece of information, so that something above can be determined statically.

Member

lindahua commented Aug 10, 2014

The StridedView part is tightly integrated with ContiguousView. It is possible that lower rank slices of a non-contiguous Strided view is contiguous. The ArrayViews type system preserves this piece of information, so that something above can be determined statically.

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Aug 10, 2014

Member

Although how one decides which one wants is an interesting question---I guess we could return your ArrayView types when the parent is an array and no slicing is desired, and the ArrayViewAPL view type otherwise?

Member

timholy commented Aug 10, 2014

Although how one decides which one wants is an interesting question---I guess we could return your ArrayView types when the parent is an array and no slicing is desired, and the ArrayViewAPL view type otherwise?

@Jutho

This comment has been minimized.

Show comment
Hide comment
@Jutho

Jutho Aug 10, 2014

Contributor

Looking forward to 0.4 already...

Contributor

Jutho commented Aug 10, 2014

Looking forward to 0.4 already...

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Aug 10, 2014

Member

Jutho, you've contributed a lot already, and it would be lovely to have your help implementing some of this!

Member

timholy commented Aug 10, 2014

Jutho, you've contributed a lot already, and it would be lovely to have your help implementing some of this!

@jtravs

This comment has been minimized.

Show comment
Hide comment
@jtravs

jtravs Aug 10, 2014

Contributor

@timholy ArrayViewAPL is what I've been dreaming of since discovering Julia! Hopefully sliceview can become the standard indexing semantics. I'll be happy to test and help in any way I can once this lands in master (or is usable without having to jump over lots of branch fences)!

Contributor

jtravs commented Aug 10, 2014

@timholy ArrayViewAPL is what I've been dreaming of since discovering Julia! Hopefully sliceview can become the standard indexing semantics. I'll be happy to test and help in any way I can once this lands in master (or is usable without having to jump over lots of branch fences)!

@Jutho

This comment has been minimized.

Show comment
Hide comment
@Jutho

Jutho Aug 10, 2014

Contributor

@timholy I’d be happy to help when time permits. Let me know if you had anything specific in mind that I can contribute to.

On 10 Aug 2014, at 14:15, Tim Holy notifications@github.com wrote:

Jutho, you've contributed a lot already, and it would be lovely to have your help implementing some of this!


Reply to this email directly or view it on GitHub.

Contributor

Jutho commented Aug 10, 2014

@timholy I’d be happy to help when time permits. Let me know if you had anything specific in mind that I can contribute to.

On 10 Aug 2014, at 14:15, Tim Holy notifications@github.com wrote:

Jutho, you've contributed a lot already, and it would be lovely to have your help implementing some of this!


Reply to this email directly or view it on GitHub.

@jiahao

This comment has been minimized.

Show comment
Hide comment
@jiahao

jiahao Aug 10, 2014

Member

+1 for named axes arrays

Member

jiahao commented Aug 10, 2014

+1 for named axes arrays

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Aug 10, 2014

Member

@lindahua, I added "Move Colon translation out of the parser" because presumably for ArrayViews that will be a critical step for A[:, i] to work as you're intending.

Member

timholy commented Aug 10, 2014

@lindahua, I added "Move Colon translation out of the parser" because presumably for ArrayViews that will be a critical step for A[:, i] to work as you're intending.

@JeffBezanson

This comment has been minimized.

Show comment
Hide comment
@JeffBezanson

JeffBezanson Aug 10, 2014

Member

Great list!

Just to get a broader context, we also want more efficient strings (and possibly BigInts), which also require any-length, immutable homogeneous arrays. Similar enough to tuples to make one think. Instead of mutable fixed-size arrays, we might want to use the "assignable cell" model, where you have a mutable cell that can hold a single value. Then you just store different array values into it at different times. Local mutations of single elements can be optimized.

Member

JeffBezanson commented Aug 10, 2014

Great list!

Just to get a broader context, we also want more efficient strings (and possibly BigInts), which also require any-length, immutable homogeneous arrays. Similar enough to tuples to make one think. Instead of mutable fixed-size arrays, we might want to use the "assignable cell" model, where you have a mutable cell that can hold a single value. Then you just store different array values into it at different times. Local mutations of single elements can be optimized.

@JeffBezanson JeffBezanson modified the milestones: 0.4, 0.4-projects Aug 10, 2014

@lindahua

This comment has been minimized.

Show comment
Hide comment
@lindahua

lindahua Aug 10, 2014

Member

@timholy Thanks for taking the lead on this.

I don't have particular preference as to whether to use ArrayViews or ArrayViewsAPL or a hybrid of both, or something afresh, as long as it is efficient enough for common cases.

I think there are two issues here can be discussed separately.

  1. Semantics (behaviors): we have to specify clearly what certain expressions actually mean. For example:

    • should a[i, :] be a (column) vector or a matrix with one row (when i is a scalar)?
    • should we perform bounds checking for i upon construction of the view?
    • should a[[1,3,7,12]] returns a view or a copy?

    I think we should focus on making decision for such questions (and probably a few others) first.

  2. Once we have a clear specification of the behavior, we can then proceed to work on an efficient implementation, which can be based on either ArrayViews, ArrayViewsAPL, or borrow things from both. The goal would be type stability and efficiency (of compilation, construction, and getindex).

Member

lindahua commented Aug 10, 2014

@timholy Thanks for taking the lead on this.

I don't have particular preference as to whether to use ArrayViews or ArrayViewsAPL or a hybrid of both, or something afresh, as long as it is efficient enough for common cases.

I think there are two issues here can be discussed separately.

  1. Semantics (behaviors): we have to specify clearly what certain expressions actually mean. For example:

    • should a[i, :] be a (column) vector or a matrix with one row (when i is a scalar)?
    • should we perform bounds checking for i upon construction of the view?
    • should a[[1,3,7,12]] returns a view or a copy?

    I think we should focus on making decision for such questions (and probably a few others) first.

  2. Once we have a clear specification of the behavior, we can then proceed to work on an efficient implementation, which can be based on either ArrayViews, ArrayViewsAPL, or borrow things from both. The goal would be type stability and efficiency (of compilation, construction, and getindex).

@lindahua

This comment has been minimized.

Show comment
Hide comment
@lindahua

lindahua Aug 10, 2014

Member

I remember this has been brought up a while ago in some issue. But it would be useful to mention it here. It can be useful to support syntax like:

a[..., i]
# so it means a[:,i] when ndims(a) == 2
# and a[:,:,i] when ndims(a) == 3, etc
Member

lindahua commented Aug 10, 2014

I remember this has been brought up a while ago in some issue. But it would be useful to mention it here. It can be useful to support syntax like:

a[..., i]
# so it means a[:,i] when ndims(a) == 2
# and a[:,:,i] when ndims(a) == 3, etc
@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Aug 10, 2014

Member

Those are good questions. What's funny is that I am not actually sure they all need to be settled first. For example, in ArrayViewsAPL I've implemented both the current sub behavior and the current slice behavior. While they have drastically different implications for user code, the two functions differ by only a couple of lines.

Regarding a[[1,3,7,12]], at least for ArrayViewsAPL, it may just be a question of changing RangeIndex in the type declaration to AbstractVector. Whether we want this is, of course, a good question. I've gone back and forth on that. My personal thinking currently is that we probably want the option for a view of that type, but I wonder if we want that to be accessible only through view(a, [1,3,7,12]).

Bounds-checking upon construction is probably more important to settle early (this is #4044). I now suspect that we may need to have the base types check bounds upon construction. People who want to pull the tricks in #4044 might need to implement an AbstractArray type that doesn't check upon construction but has a getindex that always checks bounds, whether inside @inbounds or not (unless they really like to live dangerously, that is).

Overall, my view is that the real effort here is in the "underlying technologies" section of the list. I'd include the changes to the parser in this. The rest of the core "view" infrastructure (ignoring fixed-size arrays, etc) seems likely to be something one could finish banging out in a couple of days. Writing tests and dealing with the consequences will take rather longer.

Member

timholy commented Aug 10, 2014

Those are good questions. What's funny is that I am not actually sure they all need to be settled first. For example, in ArrayViewsAPL I've implemented both the current sub behavior and the current slice behavior. While they have drastically different implications for user code, the two functions differ by only a couple of lines.

Regarding a[[1,3,7,12]], at least for ArrayViewsAPL, it may just be a question of changing RangeIndex in the type declaration to AbstractVector. Whether we want this is, of course, a good question. I've gone back and forth on that. My personal thinking currently is that we probably want the option for a view of that type, but I wonder if we want that to be accessible only through view(a, [1,3,7,12]).

Bounds-checking upon construction is probably more important to settle early (this is #4044). I now suspect that we may need to have the base types check bounds upon construction. People who want to pull the tricks in #4044 might need to implement an AbstractArray type that doesn't check upon construction but has a getindex that always checks bounds, whether inside @inbounds or not (unless they really like to live dangerously, that is).

Overall, my view is that the real effort here is in the "underlying technologies" section of the list. I'd include the changes to the parser in this. The rest of the core "view" infrastructure (ignoring fixed-size arrays, etc) seems likely to be something one could finish banging out in a couple of days. Writing tests and dealing with the consequences will take rather longer.

@JeffBezanson

This comment has been minimized.

Show comment
Hide comment
@JeffBezanson

JeffBezanson Aug 10, 2014

Member

We can also consider whether we want more kinds of indexes, such as #1032. This is important since it may ask for a more general approach to Colon than just making it a special case.

Member

JeffBezanson commented Aug 10, 2014

We can also consider whether we want more kinds of indexes, such as #1032. This is important since it may ask for a more general approach to Colon than just making it a special case.

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Aug 10, 2014

Member

Good thought. I think stagedfunctions will make it a lot easier to support a diversity of indexes. Prior to stagedfunctions, we'd have had to worry about what to do if argument 1 was a Colon, argument 2 was a NegatedIndex, and argument 3 was a Vector{Int}, and whether we can handle that type-diversity efficiently in a single function. Now we just write one function:

stagedfunction someindexingfunction(A, indexes...)
    # ... optionally do a whole bunch of "common" stuff at the beginning
    if indexes[i] <: Colon
        mungedindexexprs[i] = :(1:size(A, $i))
    elseif indexes[i] <: NegatedIndex
        mungedindexexprs[i] = :(setdiff(1:size(A, $i), indexes[$i]))
    elseif ...
    end
    # ...more common work if you need to...
    # And here's what actually runs:
    :(A($(mungedindexexprs...))
end

With a single function we can basically handle anything we want to support, and all the tricky work is done at compile time by straightforward Julia code. You couldn't write something more runtime efficient by hand if you tried. It's a total game-changer.

Member

timholy commented Aug 10, 2014

Good thought. I think stagedfunctions will make it a lot easier to support a diversity of indexes. Prior to stagedfunctions, we'd have had to worry about what to do if argument 1 was a Colon, argument 2 was a NegatedIndex, and argument 3 was a Vector{Int}, and whether we can handle that type-diversity efficiently in a single function. Now we just write one function:

stagedfunction someindexingfunction(A, indexes...)
    # ... optionally do a whole bunch of "common" stuff at the beginning
    if indexes[i] <: Colon
        mungedindexexprs[i] = :(1:size(A, $i))
    elseif indexes[i] <: NegatedIndex
        mungedindexexprs[i] = :(setdiff(1:size(A, $i), indexes[$i]))
    elseif ...
    end
    # ...more common work if you need to...
    # And here's what actually runs:
    :(A($(mungedindexexprs...))
end

With a single function we can basically handle anything we want to support, and all the tricky work is done at compile time by straightforward Julia code. You couldn't write something more runtime efficient by hand if you tried. It's a total game-changer.

@JeffBezanson

This comment has been minimized.

Show comment
Hide comment
@JeffBezanson

JeffBezanson Aug 10, 2014

Member

I share your enthusiasm for staged functions, but not for if statements checking types :)

Member

JeffBezanson commented Aug 10, 2014

I share your enthusiasm for staged functions, but not for if statements checking types :)

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Aug 10, 2014

Member

Fair enough. mungedindexexprs[i] = rewriteindex(:A, :indexes, i, indexes[i]).

Member

timholy commented Aug 10, 2014

Fair enough. mungedindexexprs[i] = rewriteindex(:A, :indexes, i, indexes[i]).

@lindahua

This comment has been minimized.

Show comment
Hide comment
@lindahua

lindahua Aug 10, 2014

Member

I will try to rewrite ArrayViews with staged functions and see how much it may simplify codes.

Member

lindahua commented Aug 10, 2014

I will try to rewrite ArrayViews with staged functions and see how much it may simplify codes.

@lindahua

This comment has been minimized.

Show comment
Hide comment
@lindahua

lindahua Aug 10, 2014

Member

Still, critical information like whether views are contiguous should be encoded by the type itself (this is essential for type stability).

Member

lindahua commented Aug 10, 2014

Still, critical information like whether views are contiguous should be encoded by the type itself (this is essential for type stability).

@quinnj

This comment has been minimized.

Show comment
Hide comment
@quinnj

quinnj Aug 11, 2014

Member

Can we add #2488 and #3737 to the list?

Member

quinnj commented Aug 11, 2014

Can we add #2488 and #3737 to the list?

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Aug 11, 2014

Member

I see that "collaborator" tag, so feel free to add them to the bullet list.

Member

timholy commented Aug 11, 2014

I see that "collaborator" tag, so feel free to add them to the bullet list.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Aug 11, 2014

Member

While we're at this array overhaul, can we try to make it so that switching between APL and Matlab style indexing is relatively simple? I.e. Just a matter of surface syntax.

Member

StefanKarpinski commented Aug 11, 2014

While we're at this array overhaul, can we try to make it so that switching between APL and Matlab style indexing is relatively simple? I.e. Just a matter of surface syntax.

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Aug 11, 2014

Member

That was my idea with ArrayViewsAPL: just choose

getindex(A, ...) = subview(A, ...)

or

getindex(A, ...) = sliceview(A, ...)

Seemed the safest way given that we don't yet know what we want.

Member

timholy commented Aug 11, 2014

That was my idea with ArrayViewsAPL: just choose

getindex(A, ...) = subview(A, ...)

or

getindex(A, ...) = sliceview(A, ...)

Seemed the safest way given that we don't yet know what we want.

@jakebolewski

This comment has been minimized.

Show comment
Hide comment
@jakebolewski

jakebolewski Aug 11, 2014

Member

Since these enhancements are going to break most array code anyway, what is the status of {} arrays going forward? The syntax seems like an historical vestige. I feel it would be conceptually simpler to have just one array syntax.

Member

jakebolewski commented Aug 11, 2014

Since these enhancements are going to break most array code anyway, what is the status of {} arrays going forward? The syntax seems like an historical vestige. I feel it would be conceptually simpler to have just one array syntax.

@johnmyleswhite

This comment has been minimized.

Show comment
Hide comment
@johnmyleswhite

johnmyleswhite Aug 11, 2014

Member

+1 for ditching the {} syntax for arrays

Member

johnmyleswhite commented Aug 11, 2014

+1 for ditching the {} syntax for arrays

@quinnj

This comment has been minimized.

Show comment
Hide comment
@quinnj

quinnj Aug 11, 2014

Member

+1. I actually think [] should default to Any[]. It seems like you want an Any array much more often than a None array.

Member

quinnj commented Aug 11, 2014

+1. I actually think [] should default to Any[]. It seems like you want an Any array much more often than a None array.

@jiahao

This comment has been minimized.

Show comment
Hide comment
@jiahao

jiahao Aug 11, 2014

Member

+1
Related: the automatic concatenation of arrays is a real gotcha when trying to construct a list of arrays like [[1, 2, 3], [4, 5, 6]]. Currently using {} in the outermost brackets is pretty much the most convenient way to skirt around the concatenation.

Member

jiahao commented Aug 11, 2014

+1
Related: the automatic concatenation of arrays is a real gotcha when trying to construct a list of arrays like [[1, 2, 3], [4, 5, 6]]. Currently using {} in the outermost brackets is pretty much the most convenient way to skirt around the concatenation.

@jakebolewski

This comment has been minimized.

Show comment
Hide comment
@jakebolewski

jakebolewski Aug 11, 2014

Member

It would be nice to decouple this as then dicts and sets get first class syntax syntax support. Right now dict and array syntax is a bit muddled.

Member

jakebolewski commented Aug 11, 2014

It would be nice to decouple this as then dicts and sets get first class syntax syntax support. Right now dict and array syntax is a bit muddled.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Aug 11, 2014

Member

The only corner case where dict and array syntax overlap is the empty case. We may want to think a bit about using the { } syntax for something else, just because those bracket pairs are so precious.

Member

StefanKarpinski commented Aug 11, 2014

The only corner case where dict and array syntax overlap is the empty case. We may want to think a bit about using the { } syntax for something else, just because those bracket pairs are so precious.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Aug 11, 2014

Member

I agree, however, that we should thoroughly rethink array syntax (concatenation, empty arrays, dicts) and semantics (views, slicing behavior). I'm thinking we should codename 0.4 "phoenix rising" and give it a really dramatic logo.

phoenix-rising

Member

StefanKarpinski commented Aug 11, 2014

I agree, however, that we should thoroughly rethink array syntax (concatenation, empty arrays, dicts) and semantics (views, slicing behavior). I'm thinking we should codename 0.4 "phoenix rising" and give it a really dramatic logo.

phoenix-rising

@JeffBezanson

This comment has been minimized.

Show comment
Hide comment
@JeffBezanson

JeffBezanson Aug 11, 2014

Member

+1 to all of that. The lack of convenient notation for nested arrays (or n-d arrays) is pretty annoying. We could have

  • [[1,2], [3,4]] does not concatenate
  • [a b; c d] still concatenates
  • {a b; c d} still does not concatenate, but constructs better-typed arrays
Member

JeffBezanson commented Aug 11, 2014

+1 to all of that. The lack of convenient notation for nested arrays (or n-d arrays) is pretty annoying. We could have

  • [[1,2], [3,4]] does not concatenate
  • [a b; c d] still concatenates
  • {a b; c d} still does not concatenate, but constructs better-typed arrays
@IainNZ

This comment has been minimized.

Show comment
Hide comment
@IainNZ

IainNZ Aug 11, 2014

Member

That all sounds amazing. Got bit, again, by [[1,2], [3,4]] today

Member

IainNZ commented Aug 11, 2014

That all sounds amazing. Got bit, again, by [[1,2], [3,4]] today

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Aug 11, 2014

Member

What does "better-typed arrays" mean?

Member

StefanKarpinski commented Aug 11, 2014

What does "better-typed arrays" mean?

@JeffBezanson

This comment has been minimized.

Show comment
Hide comment
@JeffBezanson

JeffBezanson Aug 11, 2014

Member

It means with an element type more specific than Any.

Member

JeffBezanson commented Aug 11, 2014

It means with an element type more specific than Any.

@andreasnoack

This comment has been minimized.

Show comment
Hide comment
@andreasnoack

andreasnoack Sep 22, 2014

Member

It has proven to be a bit difficult to go from copying to viewing in base. The problem is that at the time array.jl is loaded in sysimg.jl the necessary functionality for ArrayViewsAPL hasn't been loaded yet. @timholy do you have any updates on this?

Initially, we also tried with ArrayViews, but @jakebolewski and I got stuck because of an error that might be due to a wrong method being called.

Member

andreasnoack commented Sep 22, 2014

It has proven to be a bit difficult to go from copying to viewing in base. The problem is that at the time array.jl is loaded in sysimg.jl the necessary functionality for ArrayViewsAPL hasn't been loaded yet. @timholy do you have any updates on this?

Initially, we also tried with ArrayViews, but @jakebolewski and I got stuck because of an error that might be due to a wrong method being called.

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Sep 22, 2014

Member

@andreasnoack, the solution you suggested (provide enough functionality to bootstrap at the beginning, and then add the general cases later) seems like it will solve the load order problem. But overall I'm a little stuck about what to do with #8235: in the absence of stagedfunctions, the tension between providing enough functionality and having reasonable compilation times is pretty difficult to solve. But in #8432 I came up with a way of "faking" stagedfunctions that could presumably be applied to #8235. Would rather see how #8432 fares before I go to the effort of porting that over, though.

In many ways, #8432 is actually the more important development anyway. The magic of ArrayViews is that it makes linear indexing efficient whenever you're snipping out a contiguous block, but it doesn't help you at all when it's not a contiguous block---in fact ArrayViews are typically a little slower than our existing SubArrays. #8235 is likely to be the best among the non-contiguous views, but it won't be great either, and will be far closer to SubArrays/ArrayViews than it will be to Arrays.

In the absence of making div really, really fast, the only general solution is switching to cartesian indexing. So, once #8432 (or something like it) gets merged, one option is that we could just use our existing SubArrays. There's room to improve them, of course, but the pressure to use a new container type will be much reduced.

Member

timholy commented Sep 22, 2014

@andreasnoack, the solution you suggested (provide enough functionality to bootstrap at the beginning, and then add the general cases later) seems like it will solve the load order problem. But overall I'm a little stuck about what to do with #8235: in the absence of stagedfunctions, the tension between providing enough functionality and having reasonable compilation times is pretty difficult to solve. But in #8432 I came up with a way of "faking" stagedfunctions that could presumably be applied to #8235. Would rather see how #8432 fares before I go to the effort of porting that over, though.

In many ways, #8432 is actually the more important development anyway. The magic of ArrayViews is that it makes linear indexing efficient whenever you're snipping out a contiguous block, but it doesn't help you at all when it's not a contiguous block---in fact ArrayViews are typically a little slower than our existing SubArrays. #8235 is likely to be the best among the non-contiguous views, but it won't be great either, and will be far closer to SubArrays/ArrayViews than it will be to Arrays.

In the absence of making div really, really fast, the only general solution is switching to cartesian indexing. So, once #8432 (or something like it) gets merged, one option is that we could just use our existing SubArrays. There's room to improve them, of course, but the pressure to use a new container type will be much reduced.

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Sep 22, 2014

Member

This didn't directly address the issue of whether we return views or copies from indexing operations. Basically, though, we need to have great view performance before it's feasible to make that switch. If we do the switch and get great performance for A[:,i] but lousy performance for A[2:end-1, i:j], then I don't think we will be happy with having made that switch.

Member

timholy commented Sep 22, 2014

This didn't directly address the issue of whether we return views or copies from indexing operations. Basically, though, we need to have great view performance before it's feasible to make that switch. If we do the switch and get great performance for A[:,i] but lousy performance for A[2:end-1, i:j], then I don't think we will be happy with having made that switch.

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Sep 22, 2014

Member

Without wanting to sound like a broken record, all the hard work here is in the "underlying technologies" section at the top of this post; if those are in place, having a good view container is probably a matter of a couple of days' work. I'm mildly dismayed at how little progress we've made on those underlying technologies---not a single one is checked off yet---although there is reason for hope. @simonster has contributed #8227 (unmerged), which I view as one of the three "must-haves." A second "must have" is implemented in #8432, but that's brand new and so even in the absence of objections it probably needs to sit for at least a week or two before I'd consider hitting merge on it. And stagedfunctions are just hard, so we may need to consider workarounds (and their cost).

Member

timholy commented Sep 22, 2014

Without wanting to sound like a broken record, all the hard work here is in the "underlying technologies" section at the top of this post; if those are in place, having a good view container is probably a matter of a couple of days' work. I'm mildly dismayed at how little progress we've made on those underlying technologies---not a single one is checked off yet---although there is reason for hope. @simonster has contributed #8227 (unmerged), which I view as one of the three "must-haves." A second "must have" is implemented in #8432, but that's brand new and so even in the absence of objections it probably needs to sit for at least a week or two before I'd consider hitting merge on it. And stagedfunctions are just hard, so we may need to consider workarounds (and their cost).

@ViralBShah

This comment has been minimized.

Show comment
Hide comment
@ViralBShah

ViralBShah Feb 1, 2015

Member

I have been following from a distance, but I am wondering if we are any closer now than 3 months ago. Quite a bit has happened with staged functions, sub arrays and cartesian iterators since then, which makes me hopeful that we are closer to replacing array indexing with views. Is it likely we can achieve this in 0.4?

Member

ViralBShah commented Feb 1, 2015

I have been following from a distance, but I am wondering if we are any closer now than 3 months ago. Quite a bit has happened with staged functions, sub arrays and cartesian iterators since then, which makes me hopeful that we are closer to replacing array indexing with views. Is it likely we can achieve this in 0.4?

@andreasnoack

This comment has been minimized.

Show comment
Hide comment
@andreasnoack

andreasnoack Feb 1, 2015

Member

Some and hopefully most of the work should be done in #9150. The main issue is to handle Colon within Julia instead of in the parser. I've talk with @jakebolewski about fixing that.

Member

andreasnoack commented Feb 1, 2015

Some and hopefully most of the work should be done in #9150. The main issue is to handle Colon within Julia instead of in the parser. I've talk with @jakebolewski about fixing that.

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Feb 1, 2015

Member

In a sense this is already done, other than syntax (meaning A[:,j] gets replaced by slice(A, :, j)).

But it's still a little scary given "deep" bugs in stagedfunctions, #8504 and #8553#8853. If those had been fixed, we surely would have had this functionality months ago.

Member

timholy commented Feb 1, 2015

In a sense this is already done, other than syntax (meaning A[:,j] gets replaced by slice(A, :, j)).

But it's still a little scary given "deep" bugs in stagedfunctions, #8504 and #8553#8853. If those had been fixed, we surely would have had this functionality months ago.

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Feb 1, 2015

Member

Oh, and I forgot perhaps even the most important one: it would be insane to turn A[:,j] into slice(A, :, j) until #8227 (or alternate implementation) gets merged.

EDIT: Demo:

julia> A = reshape(1:15, 3, 5)
3x5 Array{Int64,2}:
 1  4  7  10  13
 2  5  8  11  14
 3  6  9  12  15

julia> b = slice(A, 1:2, 1)
2-element SubArray{Int64,1,Array{Int64,2},(UnitRange{Int64},Int64),2}:
 1
 2

julia> b[3]
3

julia> b = sub(A, 1, :)
1x5 SubArray{Int64,2,Array{Int64,2},(UnitRange{Int64},Colon),2}:
 1  4  7  10  13

julia> b[2,2]
5
Member

timholy commented Feb 1, 2015

Oh, and I forgot perhaps even the most important one: it would be insane to turn A[:,j] into slice(A, :, j) until #8227 (or alternate implementation) gets merged.

EDIT: Demo:

julia> A = reshape(1:15, 3, 5)
3x5 Array{Int64,2}:
 1  4  7  10  13
 2  5  8  11  14
 3  6  9  12  15

julia> b = slice(A, 1:2, 1)
2-element SubArray{Int64,1,Array{Int64,2},(UnitRange{Int64},Int64),2}:
 1
 2

julia> b[3]
3

julia> b = sub(A, 1, :)
1x5 SubArray{Int64,2,Array{Int64,2},(UnitRange{Int64},Colon),2}:
 1  4  7  10  13

julia> b[2,2]
5
@jiahao

This comment has been minimized.

Show comment
Hide comment
@jiahao

jiahao Feb 1, 2015

Member

@timholy I think you meant some other issue besides #8553

Member

jiahao commented Feb 1, 2015

@timholy I think you meant some other issue besides #8553

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Feb 1, 2015

Member

Indeed I did, thanks for catching. (Corrected above.)

Member

timholy commented Feb 1, 2015

Indeed I did, thanks for catching. (Corrected above.)

@vtjnash vtjnash modified the milestones: 0.5, 0.4 Mar 7, 2015

@ViralBShah

This comment has been minimized.

Show comment
Hide comment
@ViralBShah

ViralBShah Apr 23, 2015

Member

It looks like much of this will make its way into 0.4. Should we refactor this issue into the bits that we want in 0.4, and a separate issue for what will be in 0.5?

Member

ViralBShah commented Apr 23, 2015

It looks like much of this will make its way into 0.4. Should we refactor this issue into the bits that we want in 0.4, and a separate issue for what will be in 0.5?

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Apr 23, 2015

Member

Indeed, the large majority has been in 0.4 for some time---that's what all the checkmarks at the top are for 😄.

If you want to split out separate issues for the remaining items, I'm fine with that, but I don't really see the point.

Member

timholy commented Apr 23, 2015

Indeed, the large majority has been in 0.4 for some time---that's what all the checkmarks at the top are for 😄.

If you want to split out separate issues for the remaining items, I'm fine with that, but I don't really see the point.

@mschauer

This comment has been minimized.

Show comment
Hide comment
@mschauer

mschauer Apr 23, 2015

Contributor

@lindahua Regarding your remark in the beginning, the issue on ... was #5405

Contributor

mschauer commented Apr 23, 2015

@lindahua Regarding your remark in the beginning, the issue on ... was #5405

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Apr 23, 2015

Member

#10525 was long overdue for a spot on that list, so I added it.

Member

timholy commented Apr 23, 2015

#10525 was long overdue for a spot on that list, so I added it.

@ViralBShah

This comment has been minimized.

Show comment
Hide comment
@ViralBShah

ViralBShah Apr 23, 2015

Member

I was just initially puzzled at seeing the 0.5 milestone. We can leave it as it is.

Member

ViralBShah commented Apr 23, 2015

I was just initially puzzled at seeing the 0.5 milestone. We can leave it as it is.

@ViralBShah

This comment has been minimized.

Show comment
Hide comment
@ViralBShah

ViralBShah Apr 23, 2015

Member

The question I really had was - from a 0.4 perspective, which of the unchecked ones we want to get done, or are we already there?

Member

ViralBShah commented Apr 23, 2015

The question I really had was - from a 0.4 perspective, which of the unchecked ones we want to get done, or are we already there?

@blakejohnson

This comment has been minimized.

Show comment
Hide comment
@blakejohnson

blakejohnson Apr 23, 2015

Contributor

I think we should still aim for extensible bounds checking removal, #7799, in 0.4.

Contributor

blakejohnson commented Apr 23, 2015

I think we should still aim for extensible bounds checking removal, #7799, in 0.4.

@timholy

This comment has been minimized.

Show comment
Hide comment
@timholy

timholy Apr 23, 2015

Member

I was puzzled about the 0.5 milestone too. I wondered if you had changed it (I didn't). That said, I think we won't get most of the remaining items by 0.4, so bumping the milestone seems reasonable.

Realistically, I expect us to check only one more box: I think we will get "sweet custom array types" (aka #10525, hopefully supported by #10911 if I can get my act together in time). This is a really big step---one that I did not anticipate at the time I opened this issue---and it will be amazing to have. That would also open the way for fixing reshape (#10507), but I'd be a little reluctant to add that kind of functionality at a late date. I just tagged that PR as 0.5 to clarify my current thoughts about it.

Re "return views from indexing": I think the big show-stopper is #7799, since our views don't check bounds (in my opinion, we'd have a catastrophic loss of safety if we switched to returning views before addressing that issue). It's also worth reminding folks that @carlobaldassi has raised a number of concerns specifically about BitArrays that, to my knowledge, we haven't really addressed (see posts higher up in this issue). However, if we can satisfy ourselves about these two issues, then there's a lot that can happen relatively quickly (#10331, and returning views from indexing).

Member

timholy commented Apr 23, 2015

I was puzzled about the 0.5 milestone too. I wondered if you had changed it (I didn't). That said, I think we won't get most of the remaining items by 0.4, so bumping the milestone seems reasonable.

Realistically, I expect us to check only one more box: I think we will get "sweet custom array types" (aka #10525, hopefully supported by #10911 if I can get my act together in time). This is a really big step---one that I did not anticipate at the time I opened this issue---and it will be amazing to have. That would also open the way for fixing reshape (#10507), but I'd be a little reluctant to add that kind of functionality at a late date. I just tagged that PR as 0.5 to clarify my current thoughts about it.

Re "return views from indexing": I think the big show-stopper is #7799, since our views don't check bounds (in my opinion, we'd have a catastrophic loss of safety if we switched to returning views before addressing that issue). It's also worth reminding folks that @carlobaldassi has raised a number of concerns specifically about BitArrays that, to my knowledge, we haven't really addressed (see posts higher up in this issue). However, if we can satisfy ourselves about these two issues, then there's a lot that can happen relatively quickly (#10331, and returning views from indexing).

@pao

This comment has been minimized.

Show comment
Hide comment
@pao

pao Apr 23, 2015

Member

I wondered if you had changed it

In case you hadn't seen it, this is in the issue's log, squeezed between comments--it was changed on March 7 by @vtjnash.

Member

pao commented Apr 23, 2015

I wondered if you had changed it

In case you hadn't seen it, this is in the issue's log, squeezed between comments--it was changed on March 7 by @vtjnash.

@StefanKarpinski

This comment has been minimized.

Show comment
Hide comment
@StefanKarpinski

StefanKarpinski Apr 23, 2015

Member

@vtjnash went around and randomized all the milestones :-|

Member

StefanKarpinski commented Apr 23, 2015

@vtjnash went around and randomized all the milestones :-|

@mbauman

This comment has been minimized.

Show comment
Hide comment
@mbauman

mbauman Apr 23, 2015

Member

I think the big show-stopper is #7799, since our views don't check bounds (in my opinion, we'd have a catastrophic loss of safety if we switched to returning views before addressing that issue).

I'm looking at this right now in the context of #10525. I think I may be able to rectify this since it puts unsafe_getindex on firmer ground. It sure would be a lot nicer with #7799, though.

(SubArray almost checks bounds right now, though. It does so upon construction, and then it indexes into the parent array with bounds checks on. The trouble is that in a multidimensional indexing expression, one of the indices may be out of bounds and it may return spurious data. Edit: Ah, I see now. It doesn't check bounds in resolving its indices, either. Same solution applies, though.)

Member

mbauman commented Apr 23, 2015

I think the big show-stopper is #7799, since our views don't check bounds (in my opinion, we'd have a catastrophic loss of safety if we switched to returning views before addressing that issue).

I'm looking at this right now in the context of #10525. I think I may be able to rectify this since it puts unsafe_getindex on firmer ground. It sure would be a lot nicer with #7799, though.

(SubArray almost checks bounds right now, though. It does so upon construction, and then it indexes into the parent array with bounds checks on. The trouble is that in a multidimensional indexing expression, one of the indices may be out of bounds and it may return spurious data. Edit: Ah, I see now. It doesn't check bounds in resolving its indices, either. Same solution applies, though.)

@carlobaldassi

This comment has been minimized.

Show comment
Hide comment
@carlobaldassi

carlobaldassi Apr 26, 2015

Member

a number of concerns specifically about BitArrays that, to my knowledge, we haven't really addressed (see posts higher up in this issue).

To get back on that, my current thinking is that a lot of methods should be specialized to work on BitArray Views efficiently. It's going to take time -- and to increase test/bitarray.jl execution time considerably -- but in the end I think it's also going to improve performance in most common cases by allowing to avoid temporaries (e.g. sum(B[a:b]) and similar).

BTW I'm assuming that in most relevant cases views of views can be flattened, otherwise this is probably not going to work.

The only issue which I don't think we can really solve is treating efficiently views with generic, non contiguous indexing. However, this is only going to make a real difference when indexing repeatedly into the view, and maybe for nested views.

Member

carlobaldassi commented Apr 26, 2015

a number of concerns specifically about BitArrays that, to my knowledge, we haven't really addressed (see posts higher up in this issue).

To get back on that, my current thinking is that a lot of methods should be specialized to work on BitArray Views efficiently. It's going to take time -- and to increase test/bitarray.jl execution time considerably -- but in the end I think it's also going to improve performance in most common cases by allowing to avoid temporaries (e.g. sum(B[a:b]) and similar).

BTW I'm assuming that in most relevant cases views of views can be flattened, otherwise this is probably not going to work.

The only issue which I don't think we can really solve is treating efficiently views with generic, non contiguous indexing. However, this is only going to make a real difference when indexing repeatedly into the view, and maybe for nested views.

@prcastro

This comment has been minimized.

Show comment
Hide comment
@prcastro

prcastro Jun 4, 2015

Contributor

I believe the last two items could be closed

Contributor

prcastro commented Jun 4, 2015

I believe the last two items could be closed

@mbauman mbauman referenced this issue Sep 15, 2015

Closed

Arraypocalypse Now and Then #13157

21 of 27 tasks complete
@mbauman

This comment has been minimized.

Show comment
Hide comment
@mbauman

mbauman Sep 15, 2015

Member

Superseded by #13157. If I missed anything from this issue, please add it there.

Member

mbauman commented Sep 15, 2015

Superseded by #13157. If I missed anything from this issue, please add it there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment