Skip to content

Add code that gives an nice Array interface to a vector of arrays #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Apr 26, 2017
Merged

Add code that gives an nice Array interface to a vector of arrays #2

merged 11 commits into from
Apr 26, 2017

Conversation

gabrielgellner
Copy link
Contributor

@gabrielgellner gabrielgellner commented Apr 11, 2017

Let me know what you need. My gitfu is weak.

@coveralls
Copy link

coveralls commented Apr 11, 2017

Coverage Status

Coverage increased (+9.0%) to 77.273% when pulling e02800f on gabrielgellner:vector_of_array into 0c3431d on ChrisRackauckas:master.

@codecov-io
Copy link

codecov-io commented Apr 11, 2017

Codecov Report

Merging #2 into master will increase coverage by 14.9%.
The diff coverage is 92%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master       #2      +/-   ##
==========================================
+ Coverage   62.36%   77.27%   +14.9%     
==========================================
  Files           2        2              
  Lines          93       66      -27     
==========================================
- Hits           58       51       -7     
+ Misses         35       15      -20
Impacted Files Coverage Δ
src/RecursiveArrayTools.jl 68.29% <ø> (ø)
src/vector_of_array.jl 92% <92%> (ø)
src/array_partition.jl
src/utils.jl

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9547070...3807a3c. Read the comment docs.

@ChrisRackauckas
Copy link
Member

ChrisRackauckas commented Apr 13, 2017

Would it be difficult to check what the cost of the ragged flag is? I am hoping it's essentially zero, in which case that's awesome! If so, then let's make a is_ragged(x), that way the DiffEq solution can overload that to look towards sol.u.

What happens if you just put this on a Vector{Float64}? Curious. I don't think we'll need this, but I wonder what it does.

@gabrielgellner
Copy link
Contributor Author

Yeah I wanted to benchmark this. Do you have any examples of how I might best do this? Should I just make a couple of different sized arrays and check out the overhead of doing indexing (the part that scares me the most)?

Base.sizehint!{T, N}(S::AbstractVectorOfArray{T, N}, i) = sizehint!(S.data, i)

function Base.push!{T, N}(S::AbstractVectorOfArray{T, N}, new_item::AbstractArray)
if S.dims[1:(end - 1)] == size(new_item)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if S.ragged && S.dims[1] == size(new_item)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait actually, this is for not ragged? I don't quite understand what's going on here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are updating the dims attribute if it matches the current shape of the other elements (we drop the last element as this is the length of the containing vector). Otherwise we make it ragged. But you show a bug, I need to check that it has already become ragged in which case I don't need to do anymore checking. Will fix this up.

# Based on code from M. Bauman Stackexchange answer + Gitter discussion
type VectorOfArray{T, N, A} <: AbstractVectorOfArray{T, N}
data::A # A <: AbstractVector{<: AbstractArray{T, N - 1}}
dims::NTuple{N, Int}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a tuple if it's changing with each push?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better as a Vector{Int}?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I will make that change, I think I originally did this just to make it more Array like, but it clearly doesn't make sense as it is not static.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only issue is can I make the field readonly? Might cause strange issues if the user mutates the dim field.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry about that. If a user does that, they deserve problems.

@ChrisRackauckas
Copy link
Member

It would be good to know the numbers for the overhead of indexing, but since there really isn't much we can do about it (by design) it's moreso for understanding when someone should convert to a matrix/tensor.

There are two things here. Speed of building and speed of indexing. Speed of building is first. I plan to swap out the current array that's used in each solver with a VectorOfArray (currently it's a Vector{Array{...}}). The solvers get values and then iteratively push!. So push! is crucial to stay at the same speed.

But then next is indexing. The solvers and event handlers only actually use the linear index, and that should stay the same speed because that's just passing through to indexing the vector.

The more general indexing is only used on the user side. It's a convenience, and so is the conversion to a matrix function (maybe we should have it just be full instead of this new function?). It would be good to know what exactly the difference is when doing operations like this, but since it's not in the main routine we have more leeway here.

@gabrielgellner
Copy link
Contributor Author

The timings your want make a lot of sense. I will get this rigged up, as it will be really good to know.

@ChrisRackauckas
Copy link
Member

I took a good look at this. I'm not sure that the extra fields are really worthwhile. The ragged field is only used once:

S.ragged && throw(BoundsError("A ragged VectorOfArray does not support Cartesian indexing"))

But that's unnecessary: in many cases a Cartesian index still makes sense. For example, with this you can't do a[1,:] to get the first component through time, which will make sense for any (standard indices) ragged array. It think it would be better to just let the data throw the index out of bounds if the user hits something that it can't handle. Because of this, I think ragged checks and saving of S.ragged can all go, since that's the only place it's actually used.

But that also means you can just take dims as size(S) == size(S.data), since that was just to support the ragged stuff. So I think you only need S.data.

And then once you do this, the overhead it's a zero-overhead abstraction too. It would still be interesting to test the cost in indexing, but that would be a structural cost and not something you could improve at all.

Even after you trim all of this back, I don't think you hit any of the functionality in the tests, which shows that it wasn't essential anyways.

I rebased you to the current RecursiveArrayTools.jl since a bit has changed here.

@gabrielgellner
Copy link
Contributor Author

Okay this Friday I will look over this. I really hope that removing ragged and dims makes everything awesome, which makes sense! I never liked having the ragged check inside what would be a very commonly used indexing operation (at the user side).

@gabrielgellner
Copy link
Contributor Author

When you did your test did you remove the subtyping from AbstractArray? I am getting a bunch of issues from the dims not really being what we would want, since size(a.data) gives the container vector dimension, but is not really what we would want for something like [[1, 2], [3, 4]] which should be like a matrix.

@ChrisRackauckas
Copy link
Member

When you did your test did you remove the subtyping from AbstractArray? I am getting a bunch of issues from the dims not really being what we would want, since size(a.data) gives the container vector dimension, but is not really what we would want for something like [[1, 2], [3, 4]] which should be like a matrix.

Why not just define dims? When it's subtyped as AbstractArray{T,N}, is that actually inferable? I would think the computation on the N would make it type unstable to create these when it's subtyped as AbstractArray, and since you're overloading the indexing functions anyways, it may not make sense to subtype that.

@ChrisRackauckas
Copy link
Member

@gabrielgellner will you be available tomorrow? I would like to work through this and finish this up over the weekend.

@gabrielgellner
Copy link
Contributor Author

Saturday is my wife's birthday. But I plan to power through this on Sunday if that works.

@ChrisRackauckas
Copy link
Member

Sounds good.

Remove ragged field, and go back to simple AbstractArray interface.
@boundscheck checkbounds(S, I...) # is this needed?
S.data[I[end]][Base.front(I)...]
@inline function Base.getindex{T, N}(VA::VectorOfArray{T, N}, I::Vararg{Int, N})
@boundscheck checkbounds(VA, I...) # is this needed?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed

#@assert all(size(vec[1]) == size(v) for v in vec)
VectorOfArray(vec, (size(vec[1])..., length(vec)))
end
VectorOfArray{T, N}(vec::AbstractVector{T}, dims::NTuple{N}) = VectorOfArray{eltype(T), N, typeof(vec)}(vec)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this pure? Can this be inferred?

@ChrisRackauckas
Copy link
Member

type VectorOfArray{T, N, A} <: AbstractArray{T, N}
    data::A
end
VectorOfArray{T, N}(vec::AbstractVector{T}, dims::NTuple{N}) = VectorOfArray{eltype(T), N, typeof(vec)}(vec)
VectorOfArray(vec::AbstractVector) = VectorOfArray(vec, (size(vec[1])..., length(vec)))

A = rand(4)
@code_warntype VectorOfArray(A)


Variables:
  #self#::Type{VectorOfArray}
  vec::Array{Float64,1}

Body:
  begin 
      (Base.arrayref)(vec::Array{Float64,1},1)::Float64
      return $(Expr(:new, VectorOfArray{Float64,1,Array{Float64,1}}, :(vec)))
  end::VectorOfArray{Float64,1,Array{Float64,1}}

Great! That infers well. I think you cracked the code for how to make it type-inferably <:AbstractArray!

@ChrisRackauckas
Copy link
Member

ChrisRackauckas commented Apr 26, 2017

This is looking really good. I think we are close to done.

TODO: should we redefine length to be over the VA.data? Currently it is the number of total elements

What do you think? From a quick search over DiffEq, it seems that re-defining length wouldn't be too disruptive (it was defined before on sol as the length of the underlying vector). What do you think is more useful? I think it might help if you're iterating over the linear index, because then the indices are 1:length(sol).

Other thing: this should all be written on some AbstractVectorOfArray instead of on the concrete type. We should probably standardize this all to use .u. Then the DiffEqVectorOfArray <: AbstractVectorOfArray with .u and .t would immediately work. The next question would be: should DESolution <: AbstractVectorOfArray as well? Or should it just pass indices?

@gabrielgellner
Copy link
Contributor Author

My gut feeling is that it should be over the container vector, as like you suggest I think of the length part as being associated with the linear index, whereas the size is over the Cartesian view. In many ways I see what we are doing as having two parts:

  1. We are making a nice growable array, like you did originally that has a convenient getindex overload
  2. We are making iteration, linear indexing, etc as being over the collection of subarrays

I really think these two things together are super convenient and nice.

Also happy to make this all an Abstract type, though I don't have a great feeling over whether the diffeq solution should be a subtype. I guess it is likely a VectorOfArray with extra fields so it does make conceptual sense to me.

@gabrielgellner
Copy link
Contributor Author

One other thing I wanted feedback on. So when the array is not ragged we get sweet printing, just like it was a dense array. Once it is ragged the printing becomes meaningless. As a result I was planning on making our own print, that just gives that standard array of array view. Make sense?

@ChrisRackauckas
Copy link
Member

That makes sense to me.

@gabrielgellner
Copy link
Contributor Author

Okay I added the length behavior, and moved to use Abstract base type.

@ChrisRackauckas
Copy link
Member

So just the show methods now?

@gabrielgellner
Copy link
Contributor Author

Yup. Working on that now, and then should be ready to merge.

@ChrisRackauckas
Copy link
Member

I think the show methods can come in another PR. I like short PRs better anyways. I changed data to u and will merge when the tests are complete, then test what happens if I subtype DESolution as an AbstractVectorOfArray. I think that will work just fine, and thus will expand that whole solution interface. I'm not sure show should be overrided on the abstract type because of this, so I'd just put it on the concrete type.

@gabrielgellner
Copy link
Contributor Author

Nice. I will keep looking into how the display machinery works, and get this ready for a future PR. Once this is merged what do you need from me next? Should I start working on adding an interp output PR?

@ChrisRackauckas
Copy link
Member

Should I start working on adding an interp output PR?

I think that's the way to go next.

@ChrisRackauckas ChrisRackauckas merged commit 9a95543 into SciML:master Apr 26, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants