SubArray bounds checking (or not): bug or feature? #4044

Closed
timholy opened this Issue Aug 13, 2013 · 23 comments

Comments

Projects
None yet
6 participants
@timholy
Member

timholy commented Aug 13, 2013

SubArrays currently do not check bounds upon construction:

julia> A = randn(4,8)
4x8 Float64 Array:
 -0.661697  0.150336   -0.979774   0.350783   1.97253   -0.255409   -0.464847   -0.785631
 -0.389022  0.0357723  -2.96341    0.285687   0.446166  -0.0916746  -0.64864     0.476115
 -0.972825  0.037842    0.456445   0.752897  -0.733183   0.320432   -1.00636     2.2454  
  0.964869  0.467261   -0.564566  -2.38089    1.70834    0.0517955  -0.0813409   0.302723

julia> As = sub(A, 1:3, 0:3)
3x4 SubArray of 4x8 Float64 Array:
 #undef  -0.661697  0.150336   -0.979774
 #undef  -0.389022  0.0357723  -2.96341 
 #undef  -0.972825  0.037842    0.456445

julia> As[1,1]
ERROR: BoundsError()
 in getindex at subarray.jl:172

julia> As[1,2]
-0.6616974979624464

This lack of bounds-checking turns out to be incredibly useful for certain applications, and I'm starting to write more and more code that takes advantage of it. However, I also recognize that this may be unintended behavior.

Hence, I'd like to request that we make a decision: is this a bug or a feature? I see three paths:

  • It's a feature; keep it as it is. There are a couple of lines of code that will need fixing. Specifically, in a few places A.indexes[i][j] needs to become first(A.indexes[i]) + (j-1)*step(A.indexes[i]), which is mathematically equivalent except for bounds-checking.
  • It should be an optional feature (introduce an extra parameter into the type definition)
  • It's a bug. We will check bounds on construction, but then we also need an AbstractStridedArray type from which "private" variants can descend.

Left to my own devices I would choose the first. But I'll be happy to implement whichever of these achieves some reasonable facsimile of consensus.

@timholy

This comment has been minimized.

Show comment Hide comment
@timholy

timholy Aug 13, 2013

Member

I should also clarify that the issue of bounds-checking upon construction is separate from the issue of bounds-checking on getindex and setindex!. I think we definitely need bounds-checking on the latter, and currently that's not implemented either. Using the same array as in my demo above:

julia> As[4,2]
0.9648692808934075

That does not depend on the fact that As was created with out-of-bounds indexes:

julia> As = sub(A, 1:3, 1:3)
3x3 SubArray of 4x8 Float64 Array:
 -0.661697  0.150336   -0.979774
 -0.389022  0.0357723  -2.96341 
 -0.972825  0.037842    0.456445

julia> As[4,2]
0.4672608417658797

In other words, getindex currently behaves exactly as I implemented for get in #3814. While I still think that's the correct behavior for get, it's definitely not the right behavior for getindex.

So, to summarize: my recommendation is for no bounds-checking on construction, but have bounds-checking on getindex/setindex!. Whatever we choose to do, there's stuff that needs fixing.

Member

timholy commented Aug 13, 2013

I should also clarify that the issue of bounds-checking upon construction is separate from the issue of bounds-checking on getindex and setindex!. I think we definitely need bounds-checking on the latter, and currently that's not implemented either. Using the same array as in my demo above:

julia> As[4,2]
0.9648692808934075

That does not depend on the fact that As was created with out-of-bounds indexes:

julia> As = sub(A, 1:3, 1:3)
3x3 SubArray of 4x8 Float64 Array:
 -0.661697  0.150336   -0.979774
 -0.389022  0.0357723  -2.96341 
 -0.972825  0.037842    0.456445

julia> As[4,2]
0.4672608417658797

In other words, getindex currently behaves exactly as I implemented for get in #3814. While I still think that's the correct behavior for get, it's definitely not the right behavior for getindex.

So, to summarize: my recommendation is for no bounds-checking on construction, but have bounds-checking on getindex/setindex!. Whatever we choose to do, there's stuff that needs fixing.

@timholy

This comment has been minimized.

Show comment Hide comment
@timholy

timholy Aug 13, 2013

Member

Ah, I've realized that we can't currently add getindex/setindex! bounds-checking to SubArray without a major performance penalty.
#3503 (comment)

Member

timholy commented Aug 13, 2013

Ah, I've realized that we can't currently add getindex/setindex! bounds-checking to SubArray without a major performance penalty.
#3503 (comment)

@JeffBezanson

This comment has been minimized.

Show comment Hide comment
@JeffBezanson

JeffBezanson Aug 13, 2013

Member

The idea of SubArrays that overlap regions outside an array is kind of neat.

Member

JeffBezanson commented Aug 13, 2013

The idea of SubArrays that overlap regions outside an array is kind of neat.

@stevengj

This comment has been minimized.

Show comment Hide comment
@stevengj

stevengj Aug 13, 2013

Member

I see this as a feature, not a bug. It is still "safe" in the sense of throwing an exception of the bounds of the original array are exceeded.

Member

stevengj commented Aug 13, 2013

I see this as a feature, not a bug. It is still "safe" in the sense of throwing an exception of the bounds of the original array are exceeded.

@kmsquire

This comment has been minimized.

Show comment Hide comment
@kmsquire

kmsquire Aug 13, 2013

Member

Can you give an example of where this behavior is useful?

Member

kmsquire commented Aug 13, 2013

Can you give an example of where this behavior is useful?

@timholy

This comment has been minimized.

Show comment Hide comment
@timholy

timholy Aug 13, 2013

Member

Here's an example from what I'm doing right now (this isn't quite accurate, but explaining the precise analysis would take a lot of time...): I have some movies, and I want to track movement of objects during the course of the movie. The movie is a big array, and I track an object by drawing a box around it (a SubArray). What happens when the object moves near the edge of the frame, so that my centered box would stick out beyond the edge? If the SubArray barfs on construction, then I find that I needed a fair bit more code to handle special cases. If it doesn't barf on construction, then I'm finding that it reduces the number of places in the code that I have to explicitly think about boundaries.

So, there's a real-world application showing where it's useful. That said, a person could rationally argue that such facilities don't belong in SubArray. If that's how this discussion works out, I can easily go off in my own direction with my own custom types (it may happen anyway, of course). There are pluses and minuses, for both me and Julia, either way.

Member

timholy commented Aug 13, 2013

Here's an example from what I'm doing right now (this isn't quite accurate, but explaining the precise analysis would take a lot of time...): I have some movies, and I want to track movement of objects during the course of the movie. The movie is a big array, and I track an object by drawing a box around it (a SubArray). What happens when the object moves near the edge of the frame, so that my centered box would stick out beyond the edge? If the SubArray barfs on construction, then I find that I needed a fair bit more code to handle special cases. If it doesn't barf on construction, then I'm finding that it reduces the number of places in the code that I have to explicitly think about boundaries.

So, there's a real-world application showing where it's useful. That said, a person could rationally argue that such facilities don't belong in SubArray. If that's how this discussion works out, I can easily go off in my own direction with my own custom types (it may happen anyway, of course). There are pluses and minuses, for both me and Julia, either way.

@kmsquire

This comment has been minimized.

Show comment Hide comment
@kmsquire

kmsquire Aug 13, 2013

Member

Thanks for the example. How would/do you handle the #undefs in the sub-array? Display on screen seems to work fine, but any access to them throws a BoundsError.

Member

kmsquire commented Aug 13, 2013

Thanks for the example. How would/do you handle the #undefs in the sub-array? Display on screen seems to work fine, but any access to them throws a BoundsError.

@stevengj

This comment has been minimized.

Show comment Hide comment
@stevengj

stevengj Aug 13, 2013

Member

Yes, in general this is useful for defining a SubArray for an interior region that is updated, and the surrounding array elements to be "ghost cells" or "halo cells", which is a common technique to implement boundary conditions in PDEs.

Member

stevengj commented Aug 13, 2013

Yes, in general this is useful for defining a SubArray for an interior region that is updated, and the surrounding array elements to be "ghost cells" or "halo cells", which is a common technique to implement boundary conditions in PDEs.

@timholy

This comment has been minimized.

Show comment Hide comment
@timholy

timholy Aug 14, 2013

Member

Yep, our get command for arrays does a version of this.

The other nice property about not being picky at construction time is that shifts become commutative. Here's a 1d example:

A = rand(15)
rng = 1:4
As = sub(A, rng)

Now:

As = sub(As, rng+1)
As = sub(As, rng-1)

yields no shift, as does

As = sub(As, rng-1)
As = sub(As, rng+1)

However, this second one would give an error if we were picky at construction time.

Member

timholy commented Aug 14, 2013

Yep, our get command for arrays does a version of this.

The other nice property about not being picky at construction time is that shifts become commutative. Here's a 1d example:

A = rand(15)
rng = 1:4
As = sub(A, rng)

Now:

As = sub(As, rng+1)
As = sub(As, rng-1)

yields no shift, as does

As = sub(As, rng-1)
As = sub(As, rng+1)

However, this second one would give an error if we were picky at construction time.

@timholy

This comment has been minimized.

Show comment Hide comment
@timholy

timholy Aug 14, 2013

Member

OK, it sounds like this is a feature. I'll add a brief note pointing to this issue in the subarray.jl code, and fix the sub(A::SubArray, ...) constructor to be consistent with this behavior.

Member

timholy commented Aug 14, 2013

OK, it sounds like this is a feature. I'll add a brief note pointing to this issue in the subarray.jl code, and fix the sub(A::SubArray, ...) constructor to be consistent with this behavior.

@timholy timholy closed this Aug 14, 2013

@ViralBShah

This comment has been minimized.

Show comment Hide comment
@ViralBShah

ViralBShah Aug 14, 2013

Member

I wonder if the code in examples/plife.jl can benefit from this. I do think that we should make this an intentional and well-defined feature, for the same reasons as already discussed.

Member

ViralBShah commented Aug 14, 2013

I wonder if the code in examples/plife.jl can benefit from this. I do think that we should make this an intentional and well-defined feature, for the same reasons as already discussed.

timholy added a commit that referenced this issue Aug 14, 2013

@timholy

This comment has been minimized.

Show comment Hide comment
@timholy

timholy Aug 14, 2013

Member

I do think that we should make this an intentional and well-defined feature, for the same reasons as already discussed.

It's now intentional and well-defined in the sense that I've set up tests that will fail if this behavior changes.

Member

timholy commented Aug 14, 2013

I do think that we should make this an intentional and well-defined feature, for the same reasons as already discussed.

It's now intentional and well-defined in the sense that I've set up tests that will fail if this behavior changes.

@JeffBezanson

This comment has been minimized.

Show comment Hide comment
@JeffBezanson

JeffBezanson Feb 24, 2014

Member

I believe this feature means that a loop from 1:length(subarray) where @inbounds is used is incorrect. That could be a serious problem. We might have to put @boundscheck true in every SubArray indexing method if we want to keep this behavior.

Member

JeffBezanson commented Feb 24, 2014

I believe this feature means that a loop from 1:length(subarray) where @inbounds is used is incorrect. That could be a serious problem. We might have to put @boundscheck true in every SubArray indexing method if we want to keep this behavior.

@timholy

This comment has been minimized.

Show comment Hide comment
@timholy

timholy Feb 24, 2014

Member

I panicked about this a month or two ago, and then convinced myself that it isn't a problem (but that there's another, much smaller, problem). As far as I can tell---and I could easily be wrong---@inbounds doesn't cross layers. That is, @inbounds S[i,j] will avoid the check on whether i and j lie within 1:size(S,1), 1:size(S,2), but does not turn off bounds checking on S.parent[S.indexes[1][i], S.indexes[2][j]].

One might complain about not being able to turn off bounds checking "all the way down" (and I have practical cases of abstraction nesting 4 deep, e.g., a SubArray of an Image of an InterpolatingGrid of an Array), but I suspect it is the lesser of two evils.

Or, is layer-crossing of @inbounds something that will become a concern with better inlining?

Member

timholy commented Feb 24, 2014

I panicked about this a month or two ago, and then convinced myself that it isn't a problem (but that there's another, much smaller, problem). As far as I can tell---and I could easily be wrong---@inbounds doesn't cross layers. That is, @inbounds S[i,j] will avoid the check on whether i and j lie within 1:size(S,1), 1:size(S,2), but does not turn off bounds checking on S.parent[S.indexes[1][i], S.indexes[2][j]].

One might complain about not being able to turn off bounds checking "all the way down" (and I have practical cases of abstraction nesting 4 deep, e.g., a SubArray of an Image of an InterpolatingGrid of an Array), but I suspect it is the lesser of two evils.

Or, is layer-crossing of @inbounds something that will become a concern with better inlining?

@timholy

This comment has been minimized.

Show comment Hide comment
@timholy

timholy Feb 24, 2014

Member

(My example assumed that SubArrays index their parent in a cartesian manner, which they don't currently. But someday they will.)

Member

timholy commented Feb 24, 2014

(My example assumed that SubArrays index their parent in a cartesian manner, which they don't currently. But someday they will.)

@JeffBezanson

This comment has been minimized.

Show comment Hide comment
@JeffBezanson

JeffBezanson Feb 24, 2014

Member

Incorrect --- there is no check for i and j within 1:size(S,1) and 1:size(S,2), because we didn't write one and the compiler only checks bounds for primitives. @inbounds is sloppy, and applies to every inlined array access within. It's really only safe when applied very carefully.

The bug here is real:

julia> A = [1:3];

julia> s = sub(A,1:4)
4-element SubArray{Int64,1,Array{Int64,1},(Range1{Int64},)}:
   1   
   2   
   3   
 #undef

julia> foo(s) = @inbounds s[4]

julia> foo(s)

# no error
Member

JeffBezanson commented Feb 24, 2014

Incorrect --- there is no check for i and j within 1:size(S,1) and 1:size(S,2), because we didn't write one and the compiler only checks bounds for primitives. @inbounds is sloppy, and applies to every inlined array access within. It's really only safe when applied very carefully.

The bug here is real:

julia> A = [1:3];

julia> s = sub(A,1:4)
4-element SubArray{Int64,1,Array{Int64,1},(Range1{Int64},)}:
   1   
   2   
   3   
 #undef

julia> foo(s) = @inbounds s[4]

julia> foo(s)

# no error
@timholy

This comment has been minimized.

Show comment Hide comment
@timholy

timholy Feb 24, 2014

Member

because we didn't write one and the compiler only checks bounds for primitives

Yes, I knew that, I was imagining you brought this up because you were in the middle of trying to fix that problem 😄. Now I see you were visiting @lindahua's ArrayView PR.

But I somehow misdiagnosed the sloppiness problem---I am almost sure I had explicitly tested this very scenario, though perhaps I failed to wrap it inside a function. So yes, this is a big problem.

The issue, as you say, is that there isn't a good way to do this unless the type encodes whether it's safe to be sloppy. Given how @inbounds works, a type that does not guarantee in-bounds at construction time basically has to turn off inlining on getindex and setindex!,

I see two ways forward:

  • (easy) Give this feature up for Julia's default array view type(s), and do bounds-checking upon construction. Anyone who needs this has to roll their own, and insert manual bounds-checking. (We desperately need #3796.)
  • (hard) Let types declare whether it's safe for @inbounds to propagate across inlining steps.

The easy one seems inevitable, and fairly urgent. Sigh.

Member

timholy commented Feb 24, 2014

because we didn't write one and the compiler only checks bounds for primitives

Yes, I knew that, I was imagining you brought this up because you were in the middle of trying to fix that problem 😄. Now I see you were visiting @lindahua's ArrayView PR.

But I somehow misdiagnosed the sloppiness problem---I am almost sure I had explicitly tested this very scenario, though perhaps I failed to wrap it inside a function. So yes, this is a big problem.

The issue, as you say, is that there isn't a good way to do this unless the type encodes whether it's safe to be sloppy. Given how @inbounds works, a type that does not guarantee in-bounds at construction time basically has to turn off inlining on getindex and setindex!,

I see two ways forward:

  • (easy) Give this feature up for Julia's default array view type(s), and do bounds-checking upon construction. Anyone who needs this has to roll their own, and insert manual bounds-checking. (We desperately need #3796.)
  • (hard) Let types declare whether it's safe for @inbounds to propagate across inlining steps.

The easy one seems inevitable, and fairly urgent. Sigh.

@timholy

This comment has been minimized.

Show comment Hide comment
@timholy

timholy Feb 25, 2014

Member

Just had to test it:

julia> @inbounds s[4]
ERROR: BoundsError()
 in getindex at subarray.jl:193

No wonder I thought this wasn't a problem. But of course your analysis is the correct one.

Member

timholy commented Feb 25, 2014

Just had to test it:

julia> @inbounds s[4]
ERROR: BoundsError()
 in getindex at subarray.jl:193

No wonder I thought this wasn't a problem. But of course your analysis is the correct one.

@timholy timholy reopened this Feb 26, 2014

@timholy

This comment has been minimized.

Show comment Hide comment
@timholy

timholy Feb 26, 2014

Member

I don't fully understand how codegen/cgutils bounds-checking works. Is there any way to have it depend on type? I'm wondering if we can expand the array hierarchy, so that somewhere (under DenseArray) we have a PrecheckedArray and an MustCheckArray (those are stupid names, but hopefully you get the gist). Bounds-checking could not be disabled for accesses in an MustCheckArray. We could change SubArray so that it does bounds-checking upon construction, and move it into the PrecheckedArray type. Users who want the functionality discussed above would derive their type from MustCheckArray.

The alternative, I presume, is to write bounds-checking into getindex for the type. That's how it currently works for Ranges. There are well-known negatives (currently) to that approach, but certainly it's better than the alternative of having unsafe types lying around. And long-term will it be as good as the intrinsics?

Member

timholy commented Feb 26, 2014

I don't fully understand how codegen/cgutils bounds-checking works. Is there any way to have it depend on type? I'm wondering if we can expand the array hierarchy, so that somewhere (under DenseArray) we have a PrecheckedArray and an MustCheckArray (those are stupid names, but hopefully you get the gist). Bounds-checking could not be disabled for accesses in an MustCheckArray. We could change SubArray so that it does bounds-checking upon construction, and move it into the PrecheckedArray type. Users who want the functionality discussed above would derive their type from MustCheckArray.

The alternative, I presume, is to write bounds-checking into getindex for the type. That's how it currently works for Ranges. There are well-known negatives (currently) to that approach, but certainly it's better than the alternative of having unsafe types lying around. And long-term will it be as good as the intrinsics?

@lindahua

This comment has been minimized.

Show comment Hide comment
@lindahua

lindahua Jun 4, 2014

Member

Is this still 0.3 blocking? Maybe, we should tackle this together with the array view stuff in the 0.4 cycle?

Member

lindahua commented Jun 4, 2014

Is this still 0.3 blocking? Maybe, we should tackle this together with the array view stuff in the 0.4 cycle?

@stevengj

This comment has been minimized.

Show comment Hide comment
@stevengj

stevengj Jun 4, 2014

Member

I don't think it should block 0.3.

Member

stevengj commented Jun 4, 2014

I don't think it should block 0.3.

@JeffBezanson

This comment has been minimized.

Show comment Hide comment
@JeffBezanson

JeffBezanson Jun 4, 2014

Member

Ok, let's leave this for the array view change.

Member

JeffBezanson commented Jun 4, 2014

Ok, let's leave this for the array view change.

@JeffBezanson JeffBezanson modified the milestones: 0.4, 0.3 Jun 4, 2014

timholy added a commit that referenced this issue Jan 4, 2015

Be consistent about not bounds-checking indexes upon SubArray constru…
…ction

This may change someday (#4044), but for now it's best to not check.
@timholy

This comment has been minimized.

Show comment Hide comment
@timholy

timholy Feb 1, 2015

Member

BTW, here's my personal plan for fixing this in a way that lets us have our cake and eat it too:

  • Add yet another parameter to SubArray, call it BoundsChecked, with values true and false. This is to indicate whether bounds were checked upon construction. One could have more complicated settings for controlling bounds-checking at both construction and usage, but once #8227 gets resolved I think we can do the latter via @inbounds.
  • Give users a way to control that setting upon construction, likely through Val.
  • Make it so that @inbounds is a no-op on a SubArray with BoundsChecked==false (meaning, if you don't check upon construction, you can't disable upon usage).

I think this plan will let us still do the fun things at the top of this issue, would furthermore enable fun stuff like this:

a = rand(5)
b = sub(a, 2:4)  # provides "padding"
c = similar(b)
for i = 1:length(b)
    @inbounds c[i] = b[i-1] - 2*b[i] + b[i+1]
end

and yet resolve the concern that @JeffBezanson raised.

But all this awaits #8227.

Member

timholy commented Feb 1, 2015

BTW, here's my personal plan for fixing this in a way that lets us have our cake and eat it too:

  • Add yet another parameter to SubArray, call it BoundsChecked, with values true and false. This is to indicate whether bounds were checked upon construction. One could have more complicated settings for controlling bounds-checking at both construction and usage, but once #8227 gets resolved I think we can do the latter via @inbounds.
  • Give users a way to control that setting upon construction, likely through Val.
  • Make it so that @inbounds is a no-op on a SubArray with BoundsChecked==false (meaning, if you don't check upon construction, you can't disable upon usage).

I think this plan will let us still do the fun things at the top of this issue, would furthermore enable fun stuff like this:

a = rand(5)
b = sub(a, 2:4)  # provides "padding"
c = similar(b)
for i = 1:length(b)
    @inbounds c[i] = b[i-1] - 2*b[i] + b[i+1]
end

and yet resolve the concern that @JeffBezanson raised.

But all this awaits #8227.

timholy added a commit that referenced this issue Feb 23, 2015

Check bounds upon construction of SubArrays. Fixes #4044, fixes #9924
In places where one wants to live dangerously, one can call
Base.sub_unchecked/Base.slice_unchecked. So in a sense this gives the
benefits of #4044.

@timholy timholy closed this in dcf3ec8 Feb 24, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment