
vcat with different AbstractVectors #2326

Open
tshort opened this issue Feb 16, 2013 · 11 comments
Labels
domain:arrays

Comments

@tshort
Contributor

tshort commented Feb 16, 2013

Currently, vcat on AbstractArrays uses the first argument to determine the type of AbstractArray to use for the result. A nice feature would be to have some sort of promotion that determines the result. In the example below, it would be nice to have a way to define that both of them return a DataArray (from the DataFrames package).

julia> [DataVector[1,3], [1,2]]
4-element Int64 DataArray:
 1
 3
 1
 2

julia> [[1,2],DataVector[1,3]]
4-element Int64 Array:
 1
 2
 1
 3
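In modern Julia syntax, the promotion this issue asks for could be sketched with a hypothetical `result_container` helper that picks the output container from both argument types rather than only the first. The helper name and the `TaggedVector` stand-in for DataVector below are illustrative, not part of any package:

```julia
# A toy wrapper standing in for DataVector; only the minimal AbstractArray
# interface is implemented.
struct TaggedVector{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::TaggedVector) = size(v.data)
Base.getindex(v::TaggedVector, i::Int) = v.data[i]

# Hypothetical container-promotion rules: plain vectors stay plain, but any
# mix involving a TaggedVector produces a TaggedVector.
result_container(::Type{<:AbstractVector}, ::Type{<:AbstractVector}) = Vector
result_container(::Type{<:TaggedVector}, ::Type{<:AbstractVector}) = TaggedVector
result_container(::Type{<:AbstractVector}, ::Type{<:TaggedVector}) = TaggedVector
result_container(::Type{<:TaggedVector}, ::Type{<:TaggedVector}) = TaggedVector

# A vcat variant that consults both arguments instead of only the first.
promoted_vcat(a::AbstractVector, b::AbstractVector) =
    result_container(typeof(a), typeof(b))(vcat(collect(a), collect(b)))
```

With these rules, `promoted_vcat([1,2], TaggedVector([1,3]))` produces a TaggedVector in either argument order, which is the symmetric behavior the issue asks for.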
@JeffBezanson
Sponsor Member

Ouch this is a tricky one.

@tshort
Contributor Author

tshort commented Mar 4, 2013

It's not a super high priority for DataFrames since we defined a local function to replace vcat for this rbind issue.

Something like this may also help another DataFrame issue: making [1,3,5,NA] return a DataArray rather than an Array. I thought there was a separate issue on that, but I couldn't find it.

@ViralBShah
Member

Is this something that we still need to worry about? It doesn't seem like there is much we can do at this stage. An error is at least better, I would think, as is the case right now.

julia> [DataVector[1,3], [1,2]]
ERROR: no method convert(Type{DataArray{T,1}}, Int64)

@RainerHeintzmann

I think there is a solution to this problem. The core of it is that the "similar" function is called only for one of the arguments. This can be fixed by:

  1. Defining (overloading) a series of "similar" functions which accept a type rather than an actual array:
Base.similar{T}(::Type{Array{T,1}}, dims::Dims) = Array(T, dims)
Base.similar{T}(::Type{Array{T,2}}, dims::Dims) = Array(T, dims)
Base.similar{T,N}(::Type{Array{T,N}}, dims::Dims) = Array(T, dims)
Base.similar{T}(::Type{Array{T,1}}, ET, dims::Dims) = Array(ET, dims)
Base.similar{T}(::Type{Array{T,2}}, ET, dims::Dims) = Array(ET, dims)
Base.similar{T,N}(::Type{Array{T,N}}, ET, dims::Dims) = Array(ET, dims)
  2. Replacing the appropriate line in the hcat definition:
    B = similar(full(A[1]), nrows, ncols)

with

    B = similar(promote_type(typeof(A)...), (nrows, ncols))  
  3. Having promote rules for the appropriate data type (in my case it was my own derived Img class):
Base.promote_type{T,N}(::Type{Img{T,N}},::Type{Img{T,N}}) = Img{T,N}
Base.promote_type{T,N}(::Type{None},::Type{Img{T,N}}) = Img{T,N}
Base.promote_type{T,N,S}(::Type{S},::Type{Img{T,N}}) = Img{T,N}

Base.promote_type{T,N}(::Type{Img{T,N}},::Type{Img{T,N}}) = Img{T,N}
Base.promote_type{T,N}(::Type{Img{T,N}},::Type{None}) = Img{T,N}
Base.promote_type{T,N,S}(::Type{Img{T,N}},::Type{S}) = Img{T,N}

I know the latter is a bit of a hack, as one should usually use the promote_rule mechanism and possibly also only for the promotion in cooperation with array classes (not any type as shown here). But somehow I did not get the "promote_rule" mechanism to work properly for the generic types.
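For reference, the intended mechanism is to define `promote_rule` in one direction only and let `promote_type` compute the symmetric result. A minimal sketch in modern syntax, where the `ImgT` type is a placeholder for a wrapper like Img:

```julia
# Placeholder wrapper type; only the type parameter matters here.
struct ImgT{T} end

# Define promote_rule one way only; promote_type tries both orders and
# combines the results, so the reverse direction is not needed.
Base.promote_rule(::Type{ImgT{T}}, ::Type{S}) where {T,S<:Number} =
    ImgT{promote_type(T, S)}
```

With this rule, both `promote_type(ImgT{Int}, Float64)` and `promote_type(Float64, ImgT{Int})` resolve to `ImgT{Float64}` without any second definition, which is the symmetry the overloaded `promote_type` methods above try to achieve by hand.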

This then also allowed me to encapsulate an array in my Img class so that Array functions such as "+" automatically work. In array.jl this required similar changes in the loops which generate the "+" routines, so that the promotion rules are used automatically.
Here is the whole example, which demonstrates how to create a wrapped array class such that "+" and the like handle it correctly.
Issues still to do:
- Also make the .+ and .* functions work, which use the broadcast mechanism
- Apply this trick to all other appropriate functions (more concatenations) in array.jl
- Test whether performance may be compromised by this mechanism
- Make a macro, @Encapsulate, which automatically creates the necessary code for encapsulating array types
- Clean up the code in existing wrapped classes (Images? DataFrames?) so that redefinitions of the standard array operations are only necessary when the meaning should really change
- Overload the show function for the array class (can anybody help here? "show" seems not to be used in the REPL for arrays, so which is the correct function to specialize?)

Here is example code that demonstrates how to create a "derived" array which (so far partly) behaves the way it should. Note that the program below starts with changes to the Julia core, which one could discuss implementing permanently (after speed tests).

# Changes to make to array.jl
# This is an overload such that similar also works with types as input
Base.similar{T}(::Type{Array{T,1}}, dims::Dims) = Array(T, dims)
Base.similar{T}(::Type{Array{T,2}}, dims::Dims) = Array(T, dims)
Base.similar{T,N}(::Type{Array{T,N}}, dims::Dims) = Array(T, dims)
Base.similar{T}(::Type{Array{T,1}}, ET, dims::Dims) = Array(ET, dims)
Base.similar{T}(::Type{Array{T,2}}, ET, dims::Dims) = Array(ET, dims)
Base.similar{T,N}(::Type{Array{T,N}}, ET, dims::Dims) = Array(ET, dims)

import Base.promote_array_type  # needed for the definitions below

# The code below is what needs to be changed in array.jl
# only the "A" was replaced with promote_type(typeof(A),typeof(B)) in "similar"
for f in (:+, :-, :div, :mod, :&, :|, :$)
    @eval begin
        function ($f){S,T}(A::StridedArray{S}, B::StridedArray{T})
            F = similar(promote_type(typeof(A),typeof(B)),promote_type(S,T), promote_shape(size(A),size(B)))  # HERE is the change
            for i=1:length(A)
                @inbounds F[i] = ($f)(A[i], B[i])
            end
            return F
        end
        # interaction with Ranges
        function ($f){S,T<:Real}(A::StridedArray{S}, B::Range{T})
            F = similar(typeof(A),promote_type(S,T), promote_shape(size(A),size(B)))
            i = 1
            for b in B
                @inbounds F[i] = ($f)(A[i], b)
                i += 1
            end
            return F
        end
        function ($f){S<:Real,T}(A::Range{S}, B::StridedArray{T})
            F = similar(typeof(B),promote_type(S,T), promote_shape(size(A),size(B)))
            i = 1
            for a in A
                @inbounds F[i] = ($f)(a, B[i])
                i += 1
            end
            return F
        end
    end
end
for f in (:.+, :.-, :.*, :./, :.\, :.%, :div, :mod, :rem, :&, :|, :$)
    @eval begin
        function ($f){T}(A::Number, B::StridedArray{T})
            F = similar(typeof(B),promote_array_type(typeof(A),T),size(B))   # HERE is the CHANGE
            for i=1:length(B)
                @inbounds F[i] = ($f)(A, B[i])
            end
            return F
        end
        function ($f){T}(A::StridedArray{T}, B::Number)
            F = similar(typeof(A),promote_array_type(typeof(B),T),size(A))  # HERE is the CHANGE
            for i=1:length(A)
                @inbounds F[i] = ($f)(A[i], B)
            end
            return F
        end
    end
end
#--------------------------------
#   Now some changes to abstractarray.jl

function hcat{T}(A::AbstractVecOrMat{T}...)
    nargs = length(A)
    nrows = size(A[1], 1)
    ncols = 0
    dense = true
    for j = 1:nargs
        Aj = A[j]
        dense &= isa(Aj,Array)
        nd = ndims(Aj)
        ncols += (nd==2 ? size(Aj,2) : 1)
        if size(Aj, 1) != nrows; error("number of rows must match"); end
    end
    B = similar(promote_type(typeof(A)...), (nrows, ncols))   #  HERE WAS THE CHANGE!
    pos = 1
    if dense
        for k=1:nargs
            Ak = A[k]
            n = length(Ak)
            copy!(B, pos, Ak, 1, n)
            pos += n
        end
    else
        for k=1:nargs
            Ak = A[k]
            p1 = pos+(isa(Ak,AbstractMatrix) ? size(Ak, 2) : 1)-1
            B[:, pos:p1] = Ak
            pos = p1+1
        end
    end
    return B
end



# -------------- end of redefinition
a=[4 5 6 75 34.2];

type Img{T,N} <: DenseArray{T,N}  # If you derive from DenseArray, then at least the plus works after defining it. However promotion seems to be ignored
    data::DenseArray{T,N}
    Img(x::DenseArray{T,N}) = new(x)
end
Img{T,N}(x::DenseArray{T,N}) = Img{T,N}(x)  # outer constructor (see Point example in the documentation)
Img{T,N}(x::Type{T},dims::NTuple{N,Int64}) = Img{T,N}(Array(T,dims));  # to be compatible with the generators as in array.jl


b=Img(a);
b.data

Base.ndims{T,N}(animg::Img{T,N}) = Base.ndims(animg.data)
Base.size{T,N}(animg::Img{T,N}) = Base.size(animg.data)

Base.promote_type{T,N}(::Type{Img{T,N}},::Type{Img{T,N}}) = Img{T,N}
Base.promote_type{T,N}(::Type{None},::Type{Img{T,N}}) = Img{T,N}
Base.promote_type{T,N,S}(::Type{S},::Type{Img{T,N}}) = Img{T,N}

Base.promote_type{T,N}(::Type{Img{T,N}},::Type{Img{T,N}}) = Img{T,N}
Base.promote_type{T,N}(::Type{Img{T,N}},::Type{None}) = Img{T,N}
Base.promote_type{T,N,S}(::Type{Img{T,N}},::Type{S}) = Img{T,N}

Base.broadcast(f::Function, A1::Img, As...) = Img(broadcast(f,A1,As...)); # broadcast!(f,Img(Base.promote_eltype(A1,As...),Base.broadcast_shape(A1,As...)), A1, As...); # Base.similar(animg.data,varargs...)

Base.getindex{T,N}(animg::Img{T,N}, anind::Real) = Base.getindex(animg.data, anind)
Base.getindex{T,N}(animg::Img{T,N}, anind::DenseArray{T,N}) = Base.getindex(animg.data, anind)
Base.getindex{T,N}(animg::Img{T,N}, anind::AbstractArray{T,N}) = Base.getindex(animg.data, anind)
Base.getindex{T,N}(animg::Img{T,N}, varargs...) = Base.getindex(animg.data, varargs...)
Base.setindex!{T,N}(animg::Img{T,N}, varargs...) = Base.setindex!(animg.data, varargs...)

# ----------------
# Now similar definitions for the new type:
Base.similar{T,N}(::Img{T,N}, ET, dims::Dims) = Img(ET, dims)
# new nomenclature
Base.similar{T}(::Type{Img{T,1}}, dims::Dims) = Img(T, dims)
Base.similar{T}(::Type{Img{T,2}}, dims::Dims) = Img(T, dims)
Base.similar{T,N}(::Type{Img{T,N}}, dims::Dims) = Img(T, dims)
Base.similar{T}(::Type{Img{T,1}}, ET, dims::Dims) = Img(ET, dims)
Base.similar{T}(::Type{Img{T,2}}, ET, dims::Dims) = Img(ET, dims)
Base.similar{T,N}(::Type{Img{T,N}}, ET, dims::Dims) = Img(ET, dims)
# (x::Type{T},dims::NTuple{N,Int64})

Base.convert{T,N}(::Type{Img{T,N}}, x::Array{T,N}) = Img{T,N}(x)
Base.convert{T,N}(::Type{Array{T,N}}, animg::Img{T,N}) = Base.convert(Array{T,N},animg.data)   

b
c=b+b;
a+b
b+a

[a b]
[b a]

@mcg1969

mcg1969 commented Aug 6, 2014

I just ran into some of these issues and posted about it over on julia-users.

I'm going to post a separate issue about hcat/vcat, but in brief, I have an abstract array of objects and I do not want to allow AbstractArray's cat function to do concatenation on them. The solution I have, for now, is to reach into the non-exported namespace and override Base.cat_t:

Base.cat_t{T}( catdim::Integer, ::Type{Scalar{T}}, X::Base.AbstractArray... ) = cat_cvxa( catdim, X )

In abstractarray.jl, cat calls this function after using promote_eltype to determine the final array type.

cat(catdim::Integer, A::AbstractArray...) =
    cat_t(catdim, promote_eltype(A...), A...)

By overloading it as I have, I can intercept every call to cat_t whose elements are promoted up to my type. I really like the fact that cat is not doing any sort of conversion for me; I get to see all of the inputs in their original form, and do conversion only if/when I need to.

So far, so good, but this isn't quite enough when we're talking about different storage strategies. For example, what if we're using hcat with a mixture of dense and sparse matrices? It would be nice if the sparse matrix class could, say, do a quick estimate of the final number of nonzeros; and if the result is worth storing in a sparse matrix, it can "take over" the cat_t call; otherwise, the DenseArray output could be the default.

It's important to facilitate the overriding of default behavior here. The current implementations of hcat, vcat, and cat in abstractarray.jl do a lot of single-element access and other types of access that would not be optimal for certain storage strategies.

The one complication in the promote_array strategy is that the right promotion might be operation dependent. So promote_array may need to accept as an argument some sort of input that specifies the operation being performed: concatenation, summing, reduction, etc.
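That operation-dependent idea could look roughly like the following. The `promote_container` name and its operation argument are hypothetical, not Base API; for concatenation, a sparse type inspects nonzero counts and cedes to a dense result when the output would be mostly full:

```julia
using SparseArrays

# Dense fallback for any operation and container mix.
promote_container(op, A, B) = Matrix

# For concatenation of two sparse matrices, keep the result sparse only if
# it would actually be sparse (here: under half the entries nonzero).
function promote_container(op::typeof(vcat), A::SparseMatrixCSC, B::SparseMatrixCSC)
    nnz(A) + nnz(B) < (length(A) + length(B)) ÷ 2 ? SparseMatrixCSC : Matrix
end
```

A concatenation routine would then call `promote_container(vcat, A, B)` to choose the output storage before allocating, while summation or reduction could install their own methods with different heuristics.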

@madeleineudell

This has been a major problem in Convex.jl; we've figured out a solution, but it's brittle, tricky, and computationally expensive. We want to be able to define vcat in a special way when at least one argument is a Convex.jl AbstractExpr without breaking vcat globally.

jump-dev/Convex.jl#145

@Sacha0
Member

Sacha0 commented Sep 14, 2016

Leaving a bread crumb to #17660 (comment), which could help in the near term.

@dlfivefifty
Contributor

Could a design similar to broadcast's Base.Broadcast.promote_containertype be useful here?
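For comparison, here is how broadcast's container promotion works in current Julia via `BroadcastStyle` and `result_style`, the successors of 0.6's `promote_containertype`; a cat-specific analogue could follow the same two-argument promotion pattern. `ImgStyle` below is an illustrative custom style, not an existing type:

```julia
# A custom broadcast style for a hypothetical 2-d image type.
struct ImgStyle <: Base.Broadcast.AbstractArrayStyle{2} end
ImgStyle(::Val{2}) = ImgStyle()  # required so styles can adjust dimensionality

# When an Img-styled array meets a plain array, the Img style wins.
Base.Broadcast.BroadcastStyle(s::ImgStyle, ::Base.Broadcast.DefaultArrayStyle{2}) = s
```

Broadcast resolves the style of a mixed expression by calling `Base.Broadcast.result_style` pairwise over the arguments' styles, so a single one-directional rule like the one above is enough for the custom container to "win" the promotion, regardless of argument order.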

@nalimilan
Member

@dlfivefifty I think so, see #20815.

@dlfivefifty
Contributor

Also, I think there should be a vcatto!(dest, src) or similar, à la copyto!. I'm about to add this to LazyLinearAlgebra.jl.
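Such a destination-passing form could be sketched as follows (vcatto! is the function proposed in this comment, not an existing Base or LinearAlgebra function):

```julia
# Concatenate the sources into a preallocated destination vector, returning
# the destination, in the style of copyto!.
function vcatto!(dest::AbstractVector, srcs::AbstractVector...)
    sum(length, srcs; init=0) == length(dest) ||
        throw(DimensionMismatch("destination length does not match sources"))
    pos = 1
    for s in srcs
        copyto!(dest, pos, s, 1, length(s))
        pos += length(s)
    end
    return dest
end
```

The point of the dest-first form is that the caller controls the output container and allocation, sidestepping the container-promotion question entirely, e.g. `vcatto!(zeros(Int, 4), [1,2], [3,4])`.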

@vtjnash
Sponsor Member

vtjnash commented Apr 11, 2023

Partially fixed by #19305?

vtjnash added a commit to JuliaSparse/SparseArrays.jl that referenced this issue Apr 11, 2023
This is type-piracy, but we cannot change that (until
JuliaLang/julia#2326), so at least do not make these method
intersections unnecessarily slow and complicated for everyone who does not
care about SparseArrays and does not load it, and unreliable for
everyone who does load it.
vtjnash added a commit that referenced this issue Apr 11, 2023
This used to make a lot of references to design issues with the
SparseArrays package (#2326 /
#20815), which result in a nonsensical dispatch arrangement and contribute to a slow loading experience due to the nonsense Unions that must be checked by subtyping.
vtjnash added a commit that referenced this issue Apr 11, 2023
ViralBShah pushed a commit to JuliaSparse/SparseArrays.jl that referenced this issue Apr 20, 2023
vtjnash added a commit that referenced this issue Apr 20, 2023
vtjnash added a commit that referenced this issue Jul 12, 2023
vtjnash added a commit that referenced this issue Jul 12, 2023
vtjnash added a commit that referenced this issue Jul 12, 2023
vtjnash added a commit that referenced this issue Jul 12, 2023
vtjnash added a commit that referenced this issue Jul 13, 2023
vtjnash added a commit that referenced this issue Jul 13, 2023
KristofferC pushed a commit that referenced this issue Jul 18, 2023

(cherry picked from commit 5a922fa)