Match `setindex!`, `push!`, and `append!` behavior of `AbstractArray` Interface #184

goretkin · 2021-04-15T06:32:26Z

There are a few TODOs with respect to efficiency, but otherwise this does not break any tests in this test suite, although it does change the behavior and so this could be a breaking change.

If someone defines a method Base.convert(::Type{MyType}, ::NamedTuple) for their own type, StructArrays.jl will disregard it, which in my opinion deviates from the AbstractArray interface.

One possible way to rectify this is to introduce a wrapper StructArrays.Structural. Anyone who adds methods to that would be committing type piracy, and so this could be a sentinel value for indicating "do the structural (not nominal) thing".

This means a test like

@test append!(StructArray([1im]), [(re = 111, im = 222)]) == StructArray([1im, 111 + 222im])

would be written (now) as

@test append!(StructArray([1im]), [Structural((re = 111, im = 222))]) == StructArray([1im, 111 + 222im])

and it would perhaps be worthwhile to define methods such that the following worked too:

@test append!(StructArray([1im]), Structural([(re = 111, im = 222)])) == StructArray([1im, 111 + 222im])
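
The wrapper idea can be sketched without StructArrays at all. The following is only an illustration under assumptions: `Structural` here is a local stand-in and `from_structural` is a hypothetical helper name, neither is the PR's actual implementation. It builds a value of the target type by matching field names rather than by calling `convert`.

```julia
# Hypothetical stand-in for the wrapper discussed above (not the PR's code).
struct Structural{T}
    value::T
end

# Hypothetical helper: construct a `T` from a wrapped named tuple by
# looking up each field of `T` by name ("structural", not "nominal").
function from_structural(::Type{T}, s::Structural{<:NamedTuple}) where {T}
    nt = s.value
    args = map(name -> getfield(nt, name), fieldnames(T))
    return T(args...)
end

z = from_structural(Complex{Int}, Structural((re = 111, im = 222)))

# Field order in the named tuple does not matter, only the names do.
w = from_structural(Complex{Int}, Structural((im = 222, re = 111)))
```

Because the wrapper type is owned by the package, a user-defined `convert(::Type{MyType}, ::NamedTuple)` would be left alone for unwrapped values.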

piever · 2021-04-15T15:07:26Z

I probably need to think harder about this, but the original design of this package was very much "table" oriented, so it made sense that one could support

s = StructArray(rand(ComplexF64, 10))
push!(s, (re = 1.0, im = 1.0))

IMO, we either choose that we support push!(s, true) or push!(s, (re = 1.0, im = 1.0)). Having some mechanism to choose seems overly complex to me.

My main concern with the promotion approach is that it would fail to promote fields. For example:

julia> struct A{T}
           x::T
       end

julia> s = A(1)
A{Int64}(1)

julia> convert(A{Float64}, s)
ERROR: MethodError: Cannot `convert` an object of type
  A{Int64} to an object of type
  A{Float64}
Closest candidates are:
  convert(::Type{T}, ::T) where T at essentials.jl:205
  A{T}(::Any) where T at REPL[10]:2
Stacktrace:
 [1] top-level scope
   @ REPL[12]:1

So if I have a StructArray{A{Float64}} and I try to pass an A{Int}, the proposed mechanism would error. I'm not sure that this is a good trade-off: IMO, being able to promote columns is more helpful. I imagine that the other scenario (promote the whole struct but not columns) is mostly helpful when working with a StructArray{Complex{T}}.
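
For contrast, "promoting columns" amounts to converting each field separately. A minimal sketch, assuming a hypothetical helper named `convert_fields` (it is not part of StructArrays or Base):

```julia
struct A{T}
    x::T
end

# Hypothetical helper: convert `obj` to `T` one field at a time, using the
# declared field types of `T` as conversion targets.
function convert_fields(::Type{T}, obj) where {T}
    args = ntuple(i -> convert(fieldtype(T, i), getfield(obj, i)), fieldcount(T))
    return T(args...)
end

a = convert_fields(A{Float64}, A(1))
```

Here the `Int` field is converted to `Float64` even though no `convert(A{Float64}, ::A{Int})` method exists.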

goretkin · 2021-04-15T15:25:32Z

That concern makes sense. Note that this is consistent with e.g. Base.Array and probably every <:AbstractArray should behave the same, otherwise you cannot write code that is generic over any AbstractArray.

julia> struct A{T}
       x::T
       end

julia> push!([A{Float64}(1.1)], A{Int}(2))
ERROR: MethodError: Cannot `convert` an object of type 
  A{Int64} to an object of type 
  A{Float64}

> IMO, we either choose that we support push!(s, true) or push!(s, (re = 1.0, im = 1.0)). Having some mechanism to choose seems overly complex to me.

And so concretely, if StructArray <: AbstractArray, it seems like you are forced to support push!(s, true). But I'm actually surprised not to see convert mentioned in the documentation for setindex! or push!, so perhaps I am mistaken about the interface. It's also the case afaik with `setindex!(<:AbstractDict, ...)`:

julia> d = Dict{Int, Complex{Float64}}()
Dict{Int64, ComplexF64}()

julia> d[3.0] = true
true

julia> d
Dict{Int64, ComplexF64} with 1 entry:
  3 => 1.0+0.0im
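
The same conversion can be seen with a plain `Vector`, which is the behavior a `StructArray{ComplexF64}` would need to match. This is a small check using only Base:

```julia
v = ComplexF64[1im]

# `push!` converts the new element to `eltype(v)` before storing it.
push!(v, true)

# `setindex!` converts as well.
v[1] = 2
```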

At the same time, it seems useful to support push!(s, (re = 1.0, im = 1.0)) and I think it could be more generally useful to have something like Structural to allow that to work for any AbstractArray, not just StructArray.

goretkin · 2021-04-15T15:45:12Z

> but the original design of this package was very much "table" oriented,

Gotcha. I would love to simply swap out an array-of-structs layout like Array{Complex{Float64}} for a struct-of-arrays layout like StructArray{Complex{Float64}} with identical semantics, and just to test if one data layout has better performance than the other. This might be incompatible with the table view.

In fact, perhaps my perspective is exactly opposite. This package gives a way to "wrap" the first argument of push! so that it behaves like a table. My suggestion is to wrap (no scare quotes) the second argument.

That would separate two things: 1. the memory layout, 2. the "behavior". The second point reminds me of https://github.com/JuliaObjects/ConstructionBase.jl and is kind of the essence of my ugly kludge here:

https://github.com/JuliaArrays/StructArrays.jl/pull/184/files#diff-7716d668639137523ed95cf3f9b626f6e4924cce6d754e160085b353092d43c5R365-R370

piever · 2021-04-15T16:56:01Z

> That would separate two things: 1. the memory layout, 2. the "behavior". The second point reminds me of https://github.com/JuliaObjects/ConstructionBase.jl

Unrelated, but ConstructionBase seems super useful for things that I do here. I should definitely check it out, thanks for mentioning it!

Regarding the "convert or not" discussion, it's a valid point that one should be able to separate memory layout and behavior. We should also see how this plays with the widening machinery in collect_structarray https://github.com/JuliaArrays/StructArrays.jl/blob/master/src/collect.jl#L107.

The problem there is that if one collects an iterable of named tuples and the type of the values of the named tuple changes (for example, because of missing data), the promoted type becomes too general, so that is a scenario where StructArrays needs to differ from Base. For example:

julia> Base.promote_typejoin(typeof((a=1,)), typeof((a=missing,)))
NamedTuple{(:a,), T} where T<:Tuple

whereas the "field by field" Base.promote_typejoin would give NamedTuple{(:a,), Tuple{Union{Int, Missing}}}.
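
A rough sketch of what "field by field" could mean, assuming a hypothetical helper named `fieldwise_typejoin` (this is not the code in collect.jl, and `Base.promote_typejoin` is an unexported Base function):

```julia
# Hypothetical helper: join two NamedTuple types that share the same field
# names by applying `Base.promote_typejoin` to each pair of field types.
function fieldwise_typejoin(::Type{NamedTuple{names, S}},
                            ::Type{NamedTuple{names, T}}) where {names, S<:Tuple, T<:Tuple}
    joined = map(Base.promote_typejoin, fieldtypes(S), fieldtypes(T))
    return NamedTuple{names, Tuple{joined...}}
end

R = fieldwise_typejoin(typeof((a = 1,)), typeof((a = missing,)))
```

The result keeps the field type as a small `Union` instead of widening the whole named tuple type.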

goretkin · 2021-04-15T17:39:35Z

> whereas the "field by field" Base.promote_typejoin would give NamedTuple{(:a,), Tuple{Union{Int, Missing}}}.

Ah, hm. Here's what I dug up:

JuliaLang/julia#25924
JuliaLang/julia#31077

[EDIT, not sure, actually] It seems like Missing is special-cased here: https://github.com/JuliaLang/julia/blob/691cf74e23a92fbfd6ec656f1bcd7bdf07075695/base/promotion.jl#L141-L152

The behavior is

julia> function commutator(f, A, B)
           Tuple{f(A, B)}, f(Tuple{A}, Tuple{B})
       end
commutator (generic function with 2 methods)

julia> commutator(Base.promote_typejoin, Int64, Missing)
(Tuple{Union{Missing, Int64}}, Tuple{Any})

julia> commutator(Base.promote_typejoin, Int64, String)
(Tuple{Any}, Tuple{Any})

goretkin · 2021-04-15T17:58:23Z

> so that is a scenario where StructArrays needs to differ from base. For example

I was trying to identify where the behavior diverges with e.g. Base.Array. Can you give two calls, one with Array and one with StructArray where e.g. the eltype is different (Any and Union{Missing, Int64})?

piever · 2021-04-16T11:54:10Z

Sure, this is probably the simplest example:

julia> using StructArrays

julia> iter = (i for i in ((a=1,), (a=missing,)))
Base.Generator{Tuple{NamedTuple{(:a,), Tuple{Int64}}, NamedTuple{(:a,), Tuple{Missing}}}, typeof(identity)}(identity, ((a = 1,), (a = missing,)))

julia> StructArray(iter)
2-element StructArray(::Vector{Union{Missing, Int64}}) with eltype NamedTuple{(:a,), Tuple{Union{Missing, Int64}}}:
 NamedTuple{(:a,), Tuple{Union{Missing, Int64}}}((1,))
 NamedTuple{(:a,), Tuple{Union{Missing, Int64}}}((missing,))

julia> collect(iter)
2-element Vector{NamedTuple{(:a,), T} where T<:Tuple}:
 (a = 1,)
 (a = missing,)

goretkin · 2021-04-16T13:40:54Z

I think that is just some StructArray-specific behavior, and does not represent an inconsistency with the AbstractArray interface.

julia> StructArray(i for i in 1:5)
0-element StructArray() with eltype Int64 with indices 1:0

julia> StructArray((;i) for i in 1:5)
5-element StructArray(::Vector{Int64}) with eltype NamedTuple{(:i,), Tuple{Int64}}:
 (i = 1,)
 (i = 2,)
 (i = 3,)
 (i = 4,)
 (i = 5,)

julia> Vector(i for i in 1:5)
ERROR: MethodError: no method matching (Vector{T} where T)(::Base.Generator{UnitRange{Int64}, typeof(identity)})
[...]

julia> Array((;i) for i in 1:5)
ERROR: MethodError: no method matching Array(::Base.Generator{UnitRange{Int64}, var"#5#6"})
[...]

piever · 2021-04-18T14:55:56Z

I'm slowly convincing myself that yes, we should convert on setindex!, push!, and append!. I like the approach of using a stricter signature by default and converting explicitly to eltype(s) otherwise. append! is the tricky one, because we shouldn't allocate a new array just to convert the eltype; I think folding push! is a reasonable fallback. Alternatively, you could recycle the machinery here.

I am a bit perplexed about the Structural "hack". Maybe we should just embrace that this is a breaking change? The test suite should be reasonably comprehensive, how many tests fail if you remove the Structural bit from the PR?

I'm becoming convinced that in the end it's only important that one can push! or setindex! using named tuples if the array itself contained named tuples to start with (that's what you'd use for a table), so just complying with Julia Base could be the right way forward. It looks like TypedTables does what you suggest, and they are a data manipulation framework, so my initial skepticism is a bit hard to defend...

goretkin · 2021-04-22T20:54:34Z

> append! is the tricky one, because we shouldn't allocate a new array just to convert the eltype, I think folding push! is a reasonable fallback. Alternatively, you could recycle the machinery here.

I agree that we should not allocate a new array. append! should probably use resize! to avoid triggering multiple re-allocations from repeated push!s. The machinery you mention, I'm not sure how convert fits into it, though.
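
As a rough illustration of the resize!-based idea on a plain `Vector` (the name `append_converting!` is hypothetical, and a StructArray version would resize each column instead):

```julia
# Hypothetical sketch: grow the destination once, then let `setindex!`
# convert each incoming element to `eltype(dest)`.
function append_converting!(dest::Vector, src::AbstractVector)
    n = length(dest)
    resize!(dest, n + length(src))   # a single resize instead of repeated push!
    for (i, x) in enumerate(src)
        dest[n + i] = x              # converts `x` to eltype(dest)
    end
    return dest
end

v = append_converting!(ComplexF64[1im], [1, 2])
```

No intermediate array with the converted eltype is allocated; only the destination grows.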

> I am a bit perplexed about the Structural "hack". Maybe we should just embrace that this is a breaking change? The test suite should be reasonably comprehensive, how many tests fail if you remove the Structural bit from the PR?

A few tests did break, but not many. I can take a closer look. I tend to agree with embracing the breaking change. The Structural wrapper might still be a good idea to allow a user to access this behavior: https://github.com/JuliaArrays/StructArrays.jl/pull/184/files#diff-7716d668639137523ed95cf3f9b626f6e4924cce6d754e160085b353092d43c5R365-R370 (which would ideally be written more efficiently, and possibly should rely on another package like ConstructionBase.jl). I do think having easy access to that behavior is very convenient, but that applies beyond StructArrays.jl specifically, and more broadly.

I am not sure if this is directly applicable, but the "kind" of struct for which this Structural wrapper could be convenient is: https://juliadata.github.io/StructTypes.jl/stable/#DataTypes

In any case, I would remove maybe_structural, and require the user to wrap an element in Structural explicitly.

timholy · 2022-05-08T13:08:23Z

> I'm slowly convincing myself that yes, we should convert on setindex!, push!, and append!.

Glad to hear that! Coming to this package now, rather than having followed it from its inception, my expectations are that it should be an AbstractArray with SOA layout, and its heritage as a "Tables" implementation is a surprise. In any case of conflict between the two, I would rather see it act like an AbstractArray.

@goretkin, I've continued this effort in #227

goretkin added 10 commits April 15, 2021 00:51

Add test for issue JuliaArrays#182

826320f

Also test append!

8f6960b

Require exact eltype for some definitions

4f6f601

Introduce a method ambiguity

4b44e87

Remove method ambiguity

e861e58

but now missing definition of linear indexing that probably relied on an `Base.AbstractArray` fallback.

Fix setindex!

3d38d41

More fixes, mark some broken tests

ea2d4ae

Do not fallback on table / iterator definition

1c5752f

i.e. `Base.append!(s::StructVector, rows)`

Introduce trait for types that deviate from Base

4668ab3

Fixes remaining broken tests.

Use wrapper instead of traits

6db25a1

goretkin mentioned this pull request Apr 15, 2021

Document setindex! / push! calling convert JuliaLang/julia#40498

Open

fbanning mentioned this pull request Apr 21, 2021

setindex and promotion #131

Closed

timholy added a commit that referenced this pull request May 8, 2022

Address push! and append!

9d49cc6

Fixes #131 Closes #184 Co-authored-by: Gustavo Goretkin <gustavo.goretkin@gmail.com>

timholy mentioned this pull request May 8, 2022

setindex!: convert to eltype (fixes #216) #227

Merged

timholy closed this in 9d357fc May 19, 2022
