
vcat with different AbstractVectors #2326

Open
tshort opened this issue Feb 16, 2013 · 11 comments
Labels
domain:arrays

Comments

@tshort
Contributor

tshort commented Feb 16, 2013

Currently, vcat on AbstractArrays uses the first argument to determine the type of AbstractArray to use for the result. A nice feature would be to have some sort of promotion that determines the result. In the example below, it would be nice to have a way to define that both of them return a DataArray (from the DataFrames package).

julia> [DataVector[1,3], [1,2]]
4-element Int64 DataArray:
 1
 3
 1
 2

julia> [[1,2],DataVector[1,3]]
4-element Int64 Array:
 1
 2
 1
 3
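In modern Julia syntax, the promotion this issue asks for could be sketched with a hypothetical `result_container` helper that picks the output container from both argument types rather than only the first. The helper name and the `TaggedVector` stand-in for DataVector below are illustrative, not part of any package:

```julia
# A toy wrapper standing in for DataVector; only the minimal AbstractArray
# interface is implemented.
struct TaggedVector{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::TaggedVector) = size(v.data)
Base.getindex(v::TaggedVector, i::Int) = v.data[i]

# Hypothetical container-promotion rules: plain vectors stay plain, but any
# mix involving a TaggedVector produces a TaggedVector.
result_container(::Type{<:AbstractVector}, ::Type{<:AbstractVector}) = Vector
result_container(::Type{<:TaggedVector}, ::Type{<:AbstractVector}) = TaggedVector
result_container(::Type{<:AbstractVector}, ::Type{<:TaggedVector}) = TaggedVector
result_container(::Type{<:TaggedVector}, ::Type{<:TaggedVector}) = TaggedVector

# A vcat variant that consults both arguments instead of only the first.
promoted_vcat(a::AbstractVector, b::AbstractVector) =
    result_container(typeof(a), typeof(b))(vcat(collect(a), collect(b)))
```

With these rules, `promoted_vcat([1,2], TaggedVector([1,3]))` produces a TaggedVector in either argument order, which is the symmetric behavior the issue asks for.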
@JeffBezanson
Sponsor Member

Ouch this is a tricky one.

@tshort
Contributor Author

tshort commented Mar 4, 2013

It's not a super high priority for DataFrames since we defined a local function to replace vcat for this rbind issue.

Something like this may also help another DataFrame issue: making [1,3,5,NA] return a DataArray rather than an Array. I thought there was a separate issue on that, but I couldn't find it.

@ViralBShah
Member

Is this something that we still need to worry about? It doesn't seem like there is much we can do at this stage. An error is at least better, I would think, as is the case right now.

julia> [DataVector[1,3], [1,2]]
ERROR: no method convert(Type{DataArray{T,1}}, Int64)

@RainerHeintzmann

I think there is a solution to this problem. The core of it is that the "similar" function is called only for one of the arguments. This can be fixed by:

  1. Defining (overloading) a series of "similar" functions which accept a type rather than an actual array:
Base.similar{T}(::Type{Array{T,1}}, dims::Dims) = Array(T, dims)
Base.similar{T}(::Type{Array{T,2}}, dims::Dims) = Array(T, dims)
Base.similar{T,N}(::Type{Array{T,N}}, dims::Dims) = Array(T, dims)
Base.similar{T}(::Type{Array{T,1}}, ET, dims::Dims) = Array(ET, dims)
Base.similar{T}(::Type{Array{T,2}}, ET, dims::Dims) = Array(ET, dims)
Base.similar{T,N}(::Type{Array{T,N}}, ET, dims::Dims) = Array(ET, dims)
  2. Replacing the appropriate line in the hcat definition:
    B = similar(full(A[1]), nrows, ncols)

with

    B = similar(promote_type(typeof(A)...), (nrows, ncols))  
  3. Having promote rules for the appropriate data type (in my case it was my own derived Img class):
Base.promote_type{T,N}(::Type{Img{T,N}},::Type{Img{T,N}}) = Img{T,N}
Base.promote_type{T,N}(::Type{None},::Type{Img{T,N}}) = Img{T,N}
Base.promote_type{T,N,S}(::Type{S},::Type{Img{T,N}}) = Img{T,N}

Base.promote_type{T,N}(::Type{Img{T,N}},::Type{Img{T,N}}) = Img{T,N}
Base.promote_type{T,N}(::Type{Img{T,N}},::Type{None}) = Img{T,N}
Base.promote_type{T,N,S}(::Type{Img{T,N}},::Type{S}) = Img{T,N}

I know the latter is a bit of a hack, as one should usually use the promote_rule mechanism and possibly also only for the promotion in cooperation with array classes (not any type as shown here). But somehow I did not get the "promote_rule" mechanism to work properly for the generic types.
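For reference, the intended mechanism is to define `promote_rule` in one direction only and let `promote_type` compute the symmetric result. A minimal sketch in modern syntax, where the `ImgT` type is a placeholder for a wrapper like Img:

```julia
# Placeholder wrapper type; only the type parameter matters here.
struct ImgT{T} end

# Define promote_rule one way only; promote_type tries both orders and
# combines the results, so the reverse direction is not needed.
Base.promote_rule(::Type{ImgT{T}}, ::Type{S}) where {T,S<:Number} =
    ImgT{promote_type(T, S)}
```

With this rule, both `promote_type(ImgT{Int}, Float64)` and `promote_type(Float64, ImgT{Int})` resolve to `ImgT{Float64}` without any second definition, which is the symmetry the overloaded `promote_type` methods above try to achieve by hand.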

This then also allowed me to encapsulate an array in my Img class so that Array functions such as "+" automatically work. In array.jl this required similar changes in the loops which generate the "+" routines, so that the promotion rules are used automatically.
Here is the whole example, which demonstrates how to create a wrapped array class such that "+" and the like handle it correctly.
Issues still to do:
- Also make the .+ and .* functions work, which use the broadcast mechanism
- Apply this trick to all other appropriate functions (more concatenations) in array.jl
- Test whether performance may be compromised by this mechanism
- Make a macro, @Encapsulate, which automatically creates the necessary code for encapsulating array types
- Clean up the code in existing wrapped classes (Images? DataFrames?) so that redefinitions of the standard array operations are only necessary when the meaning should really change
- Overload the show function for the array class (can anybody help here? "show" seems not to be used in the REPL for arrays, so which is the correct function to specialize?)

Here is example code that demonstrates how to create a "derived" array which (so far partly) behaves the way it should. Note that the program below starts with changes to the Julia core, which one could discuss implementing permanently (after speed tests).

# Changes to make to array.jl
# This is an overload such that similar also works with types as input
Base.similar{T}(::Type{Array{T,1}}, dims::Dims) = Array(T, dims)
Base.similar{T}(::Type{Array{T,2}}, dims::Dims) = Array(T, dims)
Base.similar{T,N}(::Type{Array{T,N}}, dims::Dims) = Array(T, dims)
Base.similar{T}(::Type{Array{T,1}}, ET, dims::Dims) = Array(ET, dims)
Base.similar{T}(::Type{Array{T,2}}, ET, dims::Dims) = Array(ET, dims)
Base.similar{T,N}(::Type{Array{T,N}}, ET, dims::Dims) = Array(ET, dims)

import Base.promote_array_type  # needed for the definitions below

# The code below is what needs to be changed in array.jl
# only the "A" was replaced with promote_type(typeof(A),typeof(B)) in "similar"
for f in (:+, :-, :div, :mod, :&, :|, :$)
    @eval begin
        function ($f){S,T}(A::StridedArray{S}, B::StridedArray{T})
            F = similar(promote_type(typeof(A),typeof(B)),promote_type(S,T), promote_shape(size(A),size(B)))  # HERE is the change
            for i=1:length(A)
                @inbounds F[i] = ($f)(A[i], B[i])
            end
            return F
        end
        # interaction with Ranges
        function ($f){S,T<:Real}(A::StridedArray{S}, B::Range{T})
            F = similar(typeof(A),promote_type(S,T), promote_shape(size(A),size(B)))
            i = 1
            for b in B
                @inbounds F[i] = ($f)(A[i], b)
                i += 1
            end
            return F
        end
        function ($f){S<:Real,T}(A::Range{S}, B::StridedArray{T})
            F = similar(typeof(B),promote_type(S,T), promote_shape(size(A),size(B)))
            i = 1
            for a in A
                @inbounds F[i] = ($f)(a, B[i])
                i += 1
            end
            return F
        end
    end
end
for f in (:.+, :.-, :.*, :./, :.\, :.%, :div, :mod, :rem, :&, :|, :$)
    @eval begin
        function ($f){T}(A::Number, B::StridedArray{T})
            F = similar(typeof(B),promote_array_type(typeof(A),T),size(B))   # HERE is the CHANGE
            for i=1:length(B)
                @inbounds F[i] = ($f)(A, B[i])
            end
            return F
        end
        function ($f){T}(A::StridedArray{T}, B::Number)
            F = similar(typeof(A),promote_array_type(typeof(B),T),size(A))  # HERE is the CHANGE
            for i=1:length(A)
                @inbounds F[i] = ($f)(A[i], B)
            end
            return F
        end
    end
end
#--------------------------------
#   Now some changes to abstractarray.jl

function hcat{T}(A::AbstractVecOrMat{T}...)
    nargs = length(A)
    nrows = size(A[1], 1)
    ncols = 0
    dense = true
    for j = 1:nargs
        Aj = A[j]
        dense &= isa(Aj,Array)
        nd = ndims(Aj)
        ncols += (nd==2 ? size(Aj,2) : 1)
        if size(Aj, 1) != nrows; error("number of rows must match"); end
    end
    B = similar(promote_type(typeof(A)...), (nrows, ncols))   #  HERE WAS THE CHANGE!
    pos = 1
    if dense
        for k=1:nargs
            Ak = A[k]
            n = length(Ak)
            copy!(B, pos, Ak, 1, n)
            pos += n
        end
    else
        for k=1:nargs
            Ak = A[k]
            p1 = pos+(isa(Ak,AbstractMatrix) ? size(Ak, 2) : 1)-1
            B[:, pos:p1] = Ak
            pos = p1+1
        end
    end
    return B
end



# -------------- end of redefinition
a=[4 5 6 75 34.2];

type Img{T,N} <: DenseArray{T,N}  # If you derive from DenseArray, then at least the plus works after defining it. However promotion seems to be ignored
    data::DenseArray{T,N}
    Img(x::DenseArray{T,N}) = new(x)
end
Img{T,N}(x::DenseArray{T,N}) = Img{T,N}(x)  # outer constructor (see Point example in the documentation)
Img{T,N}(x::Type{T},dims::NTuple{N,Int64}) = Img{T,N}(Array(T,dims));  # to be compatible with the generators as in array.jl


b=Img(a);
b.data

Base.ndims{T,N}(animg::Img{T,N}) = Base.ndims(animg.data)
Base.size{T,N}(animg::Img{T,N}) = Base.size(animg.data)

Base.promote_type{T,N}(::Type{Img{T,N}},::Type{Img{T,N}}) = Img{T,N}
Base.promote_type{T,N}(::Type{None},::Type{Img{T,N}}) = Img{T,N}
Base.promote_type{T,N,S}(::Type{S},::Type{Img{T,N}}) = Img{T,N}

Base.promote_type{T,N}(::Type{Img{T,N}},::Type{Img{T,N}}) = Img{T,N}
Base.promote_type{T,N}(::Type{Img{T,N}},::Type{None}) = Img{T,N}
Base.promote_type{T,N,S}(::Type{Img{T,N}},::Type{S}) = Img{T,N}

Base.broadcast(f::Function, A1::Img, As...) = Img(broadcast(f,A1,As...)); # broadcast!(f,Img(Base.promote_eltype(A1,As...),Base.broadcast_shape(A1,As...)), A1, As...); # Base.similar(animg.data,varargs...)

Base.getindex{T,N}(animg::Img{T,N}, anind::Real) = Base.getindex(animg.data, anind)
Base.getindex{T,N}(animg::Img{T,N}, anind::DenseArray{T,N}) = Base.getindex(animg.data, anind)
Base.getindex{T,N}(animg::Img{T,N}, anind::AbstractArray{T,N}) = Base.getindex(animg.data, anind)
Base.getindex{T,N}(animg::Img{T,N}, varargs...) = Base.getindex(animg.data, varargs...)
Base.setindex!{T,N}(animg::Img{T,N}, varargs...) = Base.setindex!(animg.data, varargs...)

# ----------------
# Now similar definitions for the new type:
Base.similar{T,N}(::Img{T,N}, ET, dims::Dims) = Img(ET, dims)
# new nomenclature
Base.similar{T}(::Type{Img{T,1}}, dims::Dims) = Img(T, dims)
Base.similar{T}(::Type{Img{T,2}}, dims::Dims) = Img(T, dims)
Base.similar{T,N}(::Type{Img{T,N}}, dims::Dims) = Img(T, dims)
Base.similar{T}(::Type{Img{T,1}}, ET, dims::Dims) = Img(ET, dims)
Base.similar{T}(::Type{Img{T,2}}, ET, dims::Dims) = Img(ET, dims)
Base.similar{T,N}(::Type{Img{T,N}}, ET, dims::Dims) = Img(ET, dims)
# (x::Type{T},dims::NTuple{N,Int64})

Base.convert{T,N}(::Type{Img{T,N}}, x::Array{T,N}) = Img{T,N}(x)
Base.convert{T,N}(::Type{Array{T,N}}, animg::Img{T,N}) = Base.convert(Array{T,N},animg.data)   

b
c=b+b;
a+b
b+a

[a b]
[b a]

@mcg1969

mcg1969 commented Aug 6, 2014

I just ran into some of these issues and posted about it over on julia-users.

I'm going to post a separate issue about hcat/vcat, but in brief, I have an abstract array of objects and I do not want to allow AbstractArray's cat function to do concatenation on them. The solution I have, for now, is to reach into the non-exported namespace and override Base.cat_t:

Base.cat_t{T}( catdim::Integer, ::Type{Scalar{T}}, X::Base.AbstractArray... ) = cat_cvxa( catdim, X )

In abstractarray.jl, cat calls this function after using promote_eltype to determine the final array type.

cat(catdim::Integer, A::AbstractArray...) =
    cat_t(catdim, promote_eltype(A...), A...)

By overloading it as I have, I can intercept every call to cat_t whose elements are promoted up to my type. I really like the fact that cat is not doing any sort of conversion for me; I get to see all of the inputs in their original form, and do conversion only if/when I need to.

So far, so good, but this isn't quite enough when we're talking about different storage strategies. For example, what if we're using hcat with a mixture of dense and sparse matrices? It would be nice if the sparse matrix class could, say, do a quick estimate of the final number of nonzeros; and if the result is worth storing in a sparse matrix, it can "take over" the cat_t call; otherwise, the DenseArray output could be the default.

It's important to facilitate the overriding of default behavior here. The current implementations of hcat, vcat, and cat in abstractarray.jl do a lot of single-element access and other types of access that would not be optimal for certain storage strategies.

The one complication in the promote_array strategy is that the right promotion might be operation dependent. So promote_array may need to accept as an argument some sort of input that specifies the operation being performed: concatenation, summing, reduction, etc.
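That operation-dependent idea could look roughly like the following. The `promote_container` name and its operation argument are hypothetical, not Base API; for concatenation, a sparse type inspects nonzero counts and cedes to a dense result when the output would be mostly full:

```julia
using SparseArrays

# Dense fallback for any operation and container mix.
promote_container(op, A, B) = Matrix

# For concatenation of two sparse matrices, keep the result sparse only if
# it would actually be sparse (here: under half the entries nonzero).
function promote_container(op::typeof(vcat), A::SparseMatrixCSC, B::SparseMatrixCSC)
    nnz(A) + nnz(B) < (length(A) + length(B)) ÷ 2 ? SparseMatrixCSC : Matrix
end
```

A concatenation routine would then call `promote_container(vcat, A, B)` to choose the output storage before allocating, while summation or reduction could install their own methods with different heuristics.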

@madeleineudell

This has been a major problem in Convex.jl; we've figured out a solution, but it's brittle, tricky, and computationally expensive. We want to be able to define vcat in a special way when at least one argument is a Convex.jl AbstractExpr without breaking vcat globally.

jump-dev/Convex.jl#145

@Sacha0
Member

Sacha0 commented Sep 14, 2016

Leaving a bread crumb to #17660 (comment), which could help in the near term.

@dlfivefifty
Contributor

Could a design similar to broadcast's Base.Broadcast.promote_containertype be useful here?
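For comparison, here is how broadcast's container promotion works in current Julia via `BroadcastStyle` and `result_style`, the successors of 0.6's `promote_containertype`; a cat-specific analogue could follow the same two-argument promotion pattern. `ImgStyle` below is an illustrative custom style, not an existing type:

```julia
# A custom broadcast style for a hypothetical 2-d image type.
struct ImgStyle <: Base.Broadcast.AbstractArrayStyle{2} end
ImgStyle(::Val{2}) = ImgStyle()  # required so styles can adjust dimensionality

# When an Img-styled array meets a plain array, the Img style wins.
Base.Broadcast.BroadcastStyle(s::ImgStyle, ::Base.Broadcast.DefaultArrayStyle{2}) = s
```

Broadcast resolves the style of a mixed expression by calling `Base.Broadcast.result_style` pairwise over the arguments' styles, so a single one-directional rule like the one above is enough for the custom container to "win" the promotion, regardless of argument order.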

@nalimilan
Member

@dlfivefifty I think so, see #20815.

@dlfivefifty
Contributor

Also, I think there should be a vcatto!(dest, src) or similar, à la copyto!. I'm about to add this to LazyLinearAlgebra.jl.
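Such a destination-passing form could be sketched as follows (vcatto! is the function proposed in this comment, not an existing Base or LinearAlgebra function):

```julia
# Concatenate the sources into a preallocated destination vector, returning
# the destination, in the style of copyto!.
function vcatto!(dest::AbstractVector, srcs::AbstractVector...)
    sum(length, srcs; init=0) == length(dest) ||
        throw(DimensionMismatch("destination length does not match sources"))
    pos = 1
    for s in srcs
        copyto!(dest, pos, s, 1, length(s))
        pos += length(s)
    end
    return dest
end
```

The point of the dest-first form is that the caller controls the output container and allocation, sidestepping the container-promotion question entirely, e.g. `vcatto!(zeros(Int, 4), [1,2], [3,4])`.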

@vtjnash
Sponsor Member

vtjnash commented Apr 11, 2023

Partially fixed by #19305?

vtjnash added a commit to JuliaSparse/SparseArrays.jl that referenced this issue Apr 11, 2023
This is type-piracy, but we cannot change that (until
JuliaLang/julia#2326), so at least do not make these method
intersections unnecessarily slow and complicated for everyone who does not
care about SparseArrays and does not load it, and unreliable for
everyone who does load it.
vtjnash added a commit that referenced this issue Apr 11, 2023
This used to make a lot of references to design issues with the
SparseArrays package (#2326 /
#20815), which result in a nonsensical dispatch arrangement and contribute to a slow loading experience due to the nonsense Unions that must be checked by subtyping.
vtjnash added a commit that referenced this issue Apr 11, 2023
ViralBShah pushed a commit to JuliaSparse/SparseArrays.jl that referenced this issue Apr 20, 2023
vtjnash added a commit that referenced this issue Apr 20, 2023
vtjnash added a commit that referenced this issue Jul 12, 2023
vtjnash added a commit that referenced this issue Jul 12, 2023
vtjnash added a commit that referenced this issue Jul 12, 2023
vtjnash added a commit that referenced this issue Jul 12, 2023
vtjnash added a commit that referenced this issue Jul 13, 2023
vtjnash added a commit that referenced this issue Jul 13, 2023
KristofferC pushed a commit that referenced this issue Jul 18, 2023

(cherry picked from commit 5a922fa)