Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for 0-indexed and arbitrary-indexed arrays #16260

Merged
merged 44 commits into from
Jun 14, 2016
Merged

Support for 0-indexed and arbitrary-indexed arrays #16260

merged 44 commits into from
Jun 14, 2016

Conversation

timholy
Copy link
Member

@timholy timholy commented May 8, 2016

This is the one part of ArrayIteration.jl that I think it's reasonable to move to Base for the julia-0.5 release. The core change is the addition of the indsindices function, which should be thought of as a sister function to size but instead returns a tuple containing the in-bounds indexes of an array. The fallback is

indices(A::AbstractArray, d) = 1:size(A,d)
indices{T,N}(A::AbstractArray{T,N}) = ntuple(d->inds(A, d), Val{N})

But this allows one to override indices for specific types:

A = OffsetArray([1 3; 2 4], (-1,2))
@test A[0,3] == 1
@test A[1,3] == 2
@test A[0,4] == 3
@test A[1,4] == 4
julia> A[1,1]
ERROR: BoundsError: attempt to access OAs.OffsetArray{Int64,2,Array{Int64,2}} with inds (0:1,3:4):
 #undef  #undef
 #undef  #undef
  at index [1,1]
 in throw_boundserror(::OAs.OffsetArray{Int64,2,Array{Int64,2}}, ::Tuple{Int64,Int64}) at ./abstractarray.jl:134
 [inlined code] from ./abstractarray.jl:103
 in getindex(::OAs.OffsetArray{Int64,2,Array{Int64,2}}, ::Int64, ::Int64) at /home/tim/src/julia-0.5/test/abstractarray.jl:542
 in eval(::Module, ::Any) at ./boot.jl:230

This adds the definition and uses it in checkbounds. The bigger job will be wiring it into the rest of Base. So I thought I'd post at this stage and see what folks think.

@carnaval
Copy link
Contributor

carnaval commented May 8, 2016

👎 we have to leave something for the hackernews threads to talk about :-)

@tkelman tkelman added the needs docs Documentation for this change is required label May 8, 2016
checkbounds(::Type{Bool}, sz::Integer, i::Real) = 1 <= i <= sz
checkbounds(::Type{Bool}, sz::Integer, ::Colon) = true
function checkbounds(::Type{Bool}, sz::Integer, r::Range)
checkbounds(::Type{Bool}, inds::UnitRange, i) = throw(ArgumentError("unable to check bounds for indices of type $(typeof(i))"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a fan of having an argument with the same name as a closely-related function

@kmsquire
Copy link
Member

kmsquire commented May 8, 2016

👍 from me for the idea.

I'm impressed that it takes so little code to implement this!

Have you measured the performance impact?

@timholy
Copy link
Member Author

timholy commented May 8, 2016

we have to leave something for the hackernews threads to talk about :-)

Yeah, that was my biggest concern too 😄.

More seriously, are you genuinely down on this? There are cases where it's nicer to write an algorithm where indexing is not based on 1. For example, if I compute a quantity as a function of displacement, it's much prettier to write my code in terms of quantity[Δx] even if Δx is negative.

@timholy
Copy link
Member Author

timholy commented May 8, 2016

Have you measured the performance impact?

Not yet, but AFAICT there shouldn't be any to the addition of inds. Of course checkbounds is enormously performance-sensitive, so there's every chance that I've messed something up in this implementation. I thought I'd first see what folks think.

Actually using an offset seems likely to have a very small performance impact, but it should be essentially negligible compared to cache misses.

@vtjnash
Copy link
Member

vtjnash commented May 8, 2016

I would prefer spelling out the name, it's not that much saved to abbreviate it. That said, is this actually distinct from "keys"?

@timholy
Copy link
Member Author

timholy commented May 8, 2016

A secret reason I made it inds was to sidestep the indexes/indices debate. That kind of cowardice may not be healthy in the long run.

I also thought about keys. We could define keys(A, d), but for consistency with Dict, keys(A) should probably return an iterator (e.g., a CartesianRange) rather than an NTuple{N,UnitRange{Int}}. The default iterator for Tuple is to just iterate over the elements, so we can't just replace CartesianRange with NTuple{N,UnitRange{Int}}.

@timholy timholy mentioned this pull request May 8, 2016
@mlubin
Copy link
Member

mlubin commented May 8, 2016

Having this infrastructure in Base will certainly smooth over some issues with JuMP containers. CC @joehuchette @IainNZ

@eschnett
Copy link
Contributor

eschnett commented May 8, 2016

In https://github.com/eschnett/FlexibleArrays.jl , I used lbnd and ubnd to obtain the lower and upper array bounds, respectively. I like having a single function that returns both. This also allows strided arrays if one returns a StepRange. It even generalizes to sparse arrays, where inds can return a list (or iterator) of the indices.

@timholy
Copy link
Member Author

timholy commented May 8, 2016

@eschnett, thanks for the enthusiasm. However, I don't think this can return sparse ind***s:

  • it's valid to address a sparse vector or matrix at non-stored entries, and so you need a way of indicating whether you want all or just the stored ones;
  • for a sparse matrix, the stored entries vary by column, so this API would be insufficient;
  • part 2 of this PR is to change our dimension checks to inds(A, 1) == inds(B, 1) and for that reason they have to return the full range.

Over at ArrayIteration I use stored to indicate that you want to access just the stored elements of an array, so the change you want is definitely on its way.

@eschnett
Copy link
Contributor

eschnett commented May 8, 2016

@timholy My main concern is about arrays with a lower bound different from 1. A secondary concern would be for arrays with a non-unit stride. (This should work find with an inds function; whether this support should go into Base is a different question.)

Irregular sparse arrays are not relevant for me at the moment. However, block-sparse arrays are, where the upper bound on some dimension j depends on the value of an earlier dimension i. I'm just bringing this up to raise general awareness; there are many open questions regarding how these should be handled, or efficiently stored, or traversed, or how much of this needs to be exposed to the user anyway since efficient algorithms will depend on (or require) such a sparsity structure.

@StefanKarpinski
Copy link
Member

I like the idea of being able to write Array(T, -a:a, 0:b, -c:0) and get an array with "strange" indices.

@timholy
Copy link
Member Author

timholy commented May 9, 2016

Without an enormous amount of surgery it can't be Array(T, inds...), but it could be OffsetArray(T, inds...).

@StefanKarpinski
Copy link
Member

Yes, of course, that would be epic surgery. Just saying that it would be a nice Fortran feature graft.

@timholy timholy removed the needs docs Documentation for this change is required label May 9, 2016
@timholy
Copy link
Member Author

timholy commented May 9, 2016

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

nanosoldier commented May 9, 2016

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

Edit by jrevels: Ignore the pending status - like this comment says, the job is actually complete. One of the nodes isn't sending out final statuses for some reason, looking into this now.

@timholy
Copy link
Member Author

timholy commented May 11, 2016

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@jrevels
Copy link
Member

jrevels commented May 11, 2016

See here - I just realized ":master" ended up pointing to your fork's master by default instead of JuliaLang/julia:master. Just fixed this, sorry for the bug. Let's try it now:

@nanosoldier runbenchmarks(ALL, vs = ":master")

Edit: Oh, this PR isn't from a fork, so it should've been fine in the first place. Nevermind then, sorry for the noise.

@timholy
Copy link
Member Author

timholy commented May 11, 2016

I suspect there's still one corner case to go, anyway. I assumed this would get them all, but I should check more thoroughly locally.

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@timholy
Copy link
Member Author

timholy commented May 13, 2016

@nanosoldier runbenchmarks(ALL, vs = ":master")

@timholy
Copy link
Member Author

timholy commented May 13, 2016

@mbauman, this contains yet a different implementation of bounds-checking. Now that the valid range of indices is represented by a UnitRange rather than an Integer, checkbounds(Bool, inds, i) has the unfortunate possibility of being confused for checkbounds(Bool, A, i). So I renamed it checkindex. See what you think.

@timholy
Copy link
Member Author

timholy commented Jun 14, 2016

As you review it, remember JuliaLang/www_old.julialang.org#386 (might make it easier to understand if you read that first).

@StefanKarpinski
Copy link
Member

I was getting worried about making too many "status update" messages

I wouldn't worry about this, @timholy – your SNR is excellent, I wouldn't mind more status updates.

@@ -484,13 +484,15 @@ export
zeta,

# arrays
allocate_for,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:( Maybe this can be combined with similar? If not, at least needs a better name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I guess it could simply be another "branch" of similar. Will fix.

@andreasnoack
Copy link
Member

@timholy The new meaning of Colon broke DistributedArrays.jl here and here so I'm trying to figure out how to think about new Colon.

You have added a deprecation for first so although it's a little weird to require the unexported_first function instead of the exported first it works. However, I need to figure out how to handle intersect(Colon,UnitRange). Do you think that it would it be safe to define

intersect(a::Colon,b::UnitRange) = b

?

@timholy
Copy link
Member Author

timholy commented Jul 5, 2016

I do think that's a reasonable definition to have.

With regards to _first, I was wondering if/when this would come up, so I'm very glad you brought this up. I'm sure you understand the issue: without having the array as one of the arguments, there's no longer any way to evaluate first(:) meaningfully. (In retrospect, maybe we never should have defined this method at all, since last(:) never made sense.) The current strategy is to force folks to supply both the array and the dimension. Without discussion, I was nervous about calling it first(A, d, :), so I made it a new non-exported function. But if anyone has better ideas, I'm all ears. I wonder if this should actually be a method of indices, i.e.,

indices(A, d, inds) = inds
indices(A, d, ::Colon) = indices(A, d)

Thoughts?

@timholy
Copy link
Member Author

timholy commented Jul 5, 2016

I guess one problem with that definition is that we have size(A, 2, 4), which returns a Tuple{Int,Int}; it would make sense to keep indices parallel. Given your definition above, it could be intersect(indices(A,d), inds), and that would be fast for Colon, but not necessarily fast for other choices.

@andreasnoack
Copy link
Member

Right now, I can barely figure out how to fix setindex!(Array, SubDArray, indx). Too many indices. Source, destination, global, and local. Figuring out how to handle a case with custom index ranges on top of this is extreme sport for the brain but probably something that suits your taste. If I acquire any insights during the process I'll share them here.

@timholy
Copy link
Member Author

timholy commented Jul 6, 2016

I feel your pain: index gymnastics are never fun, though it's better in Julia since we've developed composable elementary operations. I find that the main trick is "look no deeper than you absolutely have to," although perhaps with distributed arrays there's no escaping looking deep. Sounds like a case where developing easy-to-think-about elementary operations might make life easier.

But this particular issue is "just" an API question. One thing to note is that this would become trivial with something like #15750: we'd have first(::Colon) = ifirst (or whatever we'd decide to call it). As it is, I've scoured through our array functions and can't find anything relevant. So my next question is this: what do people use first(:) for? AFAICT, it's really to compute an "offset". So perhaps we should define

indexoffset(indx) = first(indx)-1
indexoffset(::Colon) = 0

Even more generally (but less simply), we could have

reindex(indx, refindx) = indx
reindex(::Colon, refindx) = refindx

so what used to be first(:) would become first(reindex(:, indices(A,d)).

mfasi pushed a commit to mfasi/julia that referenced this pull request Sep 5, 2016
Due to a change in the behaviour of `mapslices` (JuliaLang#16260), `median(X,k)` would mutate the underlying array. Fixes JuliaLang#17153.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.