Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Match setindex!, push!, and append! behavior of AbstractArray Interface #184

Closed
wants to merge 10 commits into from

Conversation

goretkin
Copy link
Contributor

See #182

There are a few TODOs with respect to efficiency, but otherwise this does not break any tests in this test suite, although it does change the behavior and so this could be a breaking change.

If someone defines for their own type a method Base.convert(::Type{MyType}, ::NamedTuple), StructArrays.jl will disregard it, and it will in my opinion deviate from the AbstractArray.

One possible way to rectify this is to introduce a wrapper StructArrays.Structural. Anyone who add methods to that would be committing type piracy, and so this could be a sentinel value for indicating "do the structural (not nominal) thing".

This means a test like

@test append!(StructArray([1im]), [(re = 111, im = 222)]) == StructArray([1im, 111 + 222im])

would be written (now) as

@test append!(StructArray([1im]), [Structural((re = 111, im = 222))]) == StructArray([1im, 111 + 222im])

and it would perhaps be worthwhile to define methods such that the following worked too:

@test append!(StructArray([1im]), Structural([(re = 111, im = 222)])) == StructArray([1im, 111 + 222im])

@piever
Copy link
Collaborator

piever commented Apr 15, 2021

I probably need to think harder about this, but the original design of this package was very much "table" oriented, so it made sense that one could support

s = StructArray(rand(ComplexF64, 10))
push!(s, (re = 1.0, im = 1.0)

IMO, we either choose that we support push!(s, true) or push!(s, (re = 1.0, im = 1.0)). Having some mechanism to choose seems overly complex to me.

My main concerning with the promotion is approach is that it would fail to promote fields. For example

julia> struct A{T}
           x::T
       end

julia> s = A(1)
A{Int64}(1)

julia> convert(A{Float64}, s)
ERROR: MethodError: Cannot `convert` an object of type
  A{Int64} to an object of type
  A{Float64}
Closest candidates are:
  convert(::Type{T}, ::T) where T at essentials.jl:205
  A{T}(::Any) where T at REPL[10]:2
Stacktrace:
 [1] top-level scope
   @ REPL[12]:1

So if I have a StructArray{A{Float64}} and I try to pass a A{Int} the proposed mechanism would error. I'm not sure that this is a good trade-off: IMO, being able to promote columns is more helpful. I imagine that the other scenario (promote the whole struct but not columns) is mostly helpful when working with a StructArray{Complex{T}}

@goretkin
Copy link
Contributor Author

That concern makes sense. Note that this is consistent with e.g. Base.Array and probably every <:AbstractArray should behave the same, otherwise you cannot write code that is generic over any AbstractArray.

julia> struct A{T}
       x::T
       end

julia> push!([A{Float64}(1.1)], A{Int}(2))
ERROR: MethodError: Cannot `convert` an object of type 
  A{Int64} to an object of type 
  A{Float64}

IMO, we either choose that we support push!(s, true) or push!(s, (re = 1.0, im = 1.0)). Having some mechanism to choose seems overly complex to me.

And so concretely, if StructArray <: AbstractArray, it seems like you are forced to support push!(s, true). But I'm actually surprised not to see convert mentioned in the documentation for setindex! or push!, so perhaps I am mistaken about the interface. It's also the case afaik with `setindex!(<:AbstractDict, ...):

julia> d = Dict{Int, Complex{Float64}}()
Dict{Int64, ComplexF64}()

julia> d[3.0] = true
true

julia> d
Dict{Int64, ComplexF64} with 1 entry:
  3 => 1.0+0.0im

At the same time, it seems useful to support push!(s, (re = 1.0, im = 1.0)) and I think it could be more generally useful to have something like Structural to allow that to work for any AbstractArray, not just StructArray.

@goretkin
Copy link
Contributor Author

but the original design of this package was very much "table" oriented,

Gotcha. I would love to simply swap out an array-of-structs layout like Array{Complex{Float64}} for a struct-of-arrays layout like StructArray{Complex{Float64}} with identical semantics, and just to test if one data layout has better performance than the other. This might be incompatible with the table view.

In fact, perhaps my perspective is exactly opposite. This package gives a way to "wrap" the first argument of push! so that it behaves like a table. My suggestion is to wrap (no scare quotes) the second argument.

That would separate two things: 1. the memory layout, 2. the "behavior". The second point reminds me of https://github.com/JuliaObjects/ConstructionBase.jl and is kind of the essence of my ugly kludge here:

https://github.com/JuliaArrays/StructArrays.jl/pull/184/files#diff-7716d668639137523ed95cf3f9b626f6e4924cce6d754e160085b353092d43c5R365-R370

@piever
Copy link
Collaborator

piever commented Apr 15, 2021

That would separate two things: 1. the memory layout, 2. the "behavior". The second point reminds me of https://github.com/JuliaObjects/ConstructionBase.jl

Unrelated, but ConstructionsBase seems super useful for things that I do here. I should definitely check it out, thanks for mentioning it!

Regarding the "convert or not" discussion, it's a valid point that one should be able to separate memory layout and behavior. We should also see how this plays with the widening machinery in collect_structarray https://github.com/JuliaArrays/StructArrays.jl/blob/master/src/collect.jl#L107.

The problem there is that if one collects an iterable of named tuples and the type of the values of the named tuple changes (for example because of missing data) the promoted type becomes too general, so that is a scenario where StructArrays needs to differ from base. For example

julia> Base.promote_typejoin(typeof((a=1,)), typeof((a=missing,)))
NamedTuple{(:a,), T} where T<:Tuple

whereas the "field by field" Base.promote_typejoin would give NamedTuple{(:a,), Tuple{Union{Int, Missing}}}.

@goretkin
Copy link
Contributor Author

goretkin commented Apr 15, 2021

whereas the "field by field" Base.promote_typejoin would give NamedTuple{(:a,), Tuple{Union{Int, Missing}}}.

Ah, hm. Here's what I dug up:

JuliaLang/julia#25924
JuliaLang/julia#31077

[EDIT, not sure, actually] It seems like Missing is special-cased here: https://github.com/JuliaLang/julia/blob/691cf74e23a92fbfd6ec656f1bcd7bdf07075695/base/promotion.jl#L141-L152

The behavior is

julia> function commutator(f, A, B)
           Tuple{f(A, B)}, f(Tuple{A}, Tuple{B})
       end
commutator (generic function with 2 methods)

julia> commutator(Base.promote_typejoin, Int64, Missing)
(Tuple{Union{Missing, Int64}}, Tuple{Any})

julia> commutator(Base.promote_typejoin, Int64, String)
(Tuple{Any}, Tuple{Any})

@goretkin
Copy link
Contributor Author

so that is a scenario where StructArrays needs to differ from base. For example

I was trying to identify where the behavior diverges with e.g. Base.Array. Can you give two calls, one with Array and one with StructArray where e.g. the eltype is different (Any and Union{Missing, Int64})?

@piever
Copy link
Collaborator

piever commented Apr 16, 2021

Sure, this is probably the simplest example:

julia> using StructArrays

julia> iter = (i for i in ((a=1,), (a=missing,)))
Base.Generator{Tuple{NamedTuple{(:a,), Tuple{Int64}}, NamedTuple{(:a,), Tuple{Missing}}}, typeof(identity)}(identity, ((a = 1,), (a = missing,)))

julia> StructArray(iter)
2-element StructArray(::Vector{Union{Missing, Int64}}) with eltype NamedTuple{(:a,), Tuple{Union{Missing, Int64}}}:
 NamedTuple{(:a,), Tuple{Union{Missing, Int64}}}((1,))
 NamedTuple{(:a,), Tuple{Union{Missing, Int64}}}((missing,))

julia> collect(iter)
2-element Vector{NamedTuple{(:a,), T} where T<:Tuple}:
 (a = 1,)
 (a = missing,)

@goretkin
Copy link
Contributor Author

I think that is just some StructArray-specific behavior, and does not represent an inconsistency with the AbstractArray interface.

julia> StructArray(i for i in 1:5)
0-element StructArray() with eltype Int64 with indices 1:0

julia> StructArray((;i) for i in 1:5)
5-element StructArray(::Vector{Int64}) with eltype NamedTuple{(:i,), Tuple{Int64}}:
 (i = 1,)
 (i = 2,)
 (i = 3,)
 (i = 4,)
 (i = 5,)

julia> Vector(i for i in 1:5)
ERROR: MethodError: no method matching (Vector{T} where T)(::Base.Generator{UnitRange{Int64}, typeof(identity)})
[...]

julia> Array((;i) for i in 1:5)
ERROR: MethodError: no method matching Array(::Base.Generator{UnitRange{Int64}, var"#5#6"})
[...]

@piever
Copy link
Collaborator

piever commented Apr 18, 2021

I'm slowly convincing myself that yes, we should convert on setindex!, push!, and append!. I like the approach of using a stricter signature by default and converting explicitly to eltype(s) otherwise. append! is the tricky one, because we shouldn't allocate a new array just to convert the eltype, I think folding push! is a reasonable fallback. Alternatively, you could recycle the machinery here.

I am a bit perplexed about the Structural "hack". Maybe we should just embrace that this is a breaking change? The test suite should be reasonably comprehensive, how many tests fail if you remove the Structural bit from the PR?

I'm becoming convinced that in the end it's only important that one can push! or setindex! using named tuples if the array itself contained named tuples to start with (that's what you'd use for a table), so just complying with julia Base could be the right way forward. It looks like TypedTables does what you suggest and they are a data manipulation framework, so my initial skepticism is a bit hard to defend...

@fbanning fbanning mentioned this pull request Apr 21, 2021
@goretkin
Copy link
Contributor Author

goretkin commented Apr 22, 2021

append! is the tricky one, because we shouldn't allocate a new array just to convert the eltype, I think folding push! is a reasonable fallback. Alternatively, you could recycle the machinery here.

I agree that we should not allocate a new array. append! should probably use resize! to avoid triggering multiple re-allocations from repeated push!s. The machinery you mention, I'm not sure how convert fits into it, though.

I am a bit perplexed about the Structural "hack". Maybe we should just embrace that this is a breaking change? The test suite should be reasonably comprehensive, how many tests fail if you remove the Structural bit from the PR?

A few tests did break, but not many. I can take a closer look. I tend to agree with embracing the breaking change. The Structural wrapper might still be a good idea to allow a user to access this behavior: https://github.com/JuliaArrays/StructArrays.jl/pull/184/files#diff-7716d668639137523ed95cf3f9b626f6e4924cce6d754e160085b353092d43c5R365-R370 (which would ideally be written more efficiently, and possible should rely on another package like ConstructionBase.jl). I do think having easy access to that behavior is very convenient, but that applies beyond StructArrays.jl specifically, and more broadly.

I am not sure if this is directly applicable, but the "kind" of struct for which this Structural wrapper could be convenient is: https://juliadata.github.io/StructTypes.jl/stable/#DataTypes

In any case, I would remove maybe_structural, and require the user to wrap an element in Structural explicitly.

timholy added a commit that referenced this pull request May 8, 2022
Fixes #131
Closes #184

Co-authored-by: Gustavo Goretkin <gustavo.goretkin@gmail.com>
@timholy
Copy link
Member

timholy commented May 8, 2022

I'm slowly convincing myself that yes, we should convert on setindex!, push!, and append!.

Glad to hear that! Coming to this package now, rather than having followed it from its inception, my expectations are that it should be an AbstractArray with SOA layout, and its heritage as a "Tables" implementation is a surprise. In any case of conflict between the two, I would rather see it act like an AbstractArray.

@goretkin, I've continued this effort in #227

@timholy timholy closed this in 9d357fc May 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants