Skip to content
This repository has been archived by the owner on May 4, 2019. It is now read-only.

vcat between Arrays and DataArrays #130

Open
tshort opened this issue Nov 29, 2014 · 6 comments
Open

vcat between Arrays and DataArrays #130

tshort opened this issue Nov 29, 2014 · 6 comments

Comments

@tshort
Copy link

tshort commented Nov 29, 2014

The following seems broken:

julia> a = @data([1,NA])
2-element DataArray{Int64,1}:
 1  
  NA

julia> b = [3., 4.]
2-element Array{Float64,1}:
 3.0
 4.0

julia> [a,b]
4-element DataArray{Float64,1}:
 1.0
  NA
 3.0
 4.0

julia> [b,a]
ERROR: `convert` has no method matching convert(::Type{Float64}, ::NAtype)
 in setindex! at array.jl:307
 in setindex! at array.jl:345
 in cat_t at abstractarray.jl:730
 in vcat at abstractarray.jl:736

Both forms call vcat defined in abstractarray.jl. Without the NA, both forms work. Maybe the new Nullable approach will fix this.

@simonster
Copy link
Member

This is because vcat just picks the type of the first array. I think we'll need to overwrite vcat(::AbstractArray...) to return a DataArray/NullableArray if one of the passed arrays is a DataArray/NullableArray.

This also raises a concern. If you have:

a::NullableVector{Float64}
b::Vector{Float64}

Then is it okay if [b, a][1] returns a Nullable and not a Float64?

@tshort
Copy link
Author

tshort commented Dec 23, 2014

Promoting AbstractArrays might get tricky, especially if you include SubArrays, PooledDataArrays, and other yet-to-be-defined AbstractArrays.

@garborg
Copy link
Member

garborg commented Dec 23, 2014

I just made vcat(dfs) do basic container type promotion in this commit. (Not something that could reuse vcat(das) because of the filling-in-missing-columns step.) Anyway, it felt like what we'd ideally have was a function in Base that returned the DataType of the just the container returned by similar.

@tshort
Copy link
Author

tshort commented Feb 10, 2015

Of possible interest is a proposed update to hcat/vcat in base:

JuliaLang/julia#10155

I can't tell if it includes container promotion.

@Jutho
Copy link

Jutho commented Feb 11, 2015

So far, it doesn't. It still depends on a call to similar for one of the arguments. However, the redesigned code certainly makes it easier to change this behavior, and makes it easier to extend/redefine cat behaviour in packages. However, I am not sure how to proceed with container promotion, in te end you need to be able to infer a concrete type that you can easily instantiate with given size and element type.

@nalimilan
Copy link
Member

I feel like we need a standard function in base, say promote_container_type or promote_array_type, which would give you the type of the container you need to 1) create to combine two containers, or 2) to combine elements of different types together.

Example of 1: vcat(::NullableVector{Float64}, ::Vector{Float64}) should create a NullableArray{Float64}
Example of 2: vcat(::Nullable{Float64}, ::Float64) should create a NullableArray{Float64}

This is not needed only in concatenation functions (hence the need for an exported function in Base): for example, to write a recode function which would accept any AbstractArray, and return another AbstractArray with some values replaced with others according to a series of Pair arguments. If the input is Int[1, 2] and you replace 2 with NA/Nullable{Int}(), the result has to be a DataArray{Int}/NullableArray{Int}.

This looks pretty simple to me: just have as a fallback for AbstractArray like this:

promote_array_type{T1 <: AbstractArray, T2 <: AbstractArray}(x::Type{T1}, y::Type{T2}) = Array{promote_type(eltype(x), eltype(y))}

And then special cases, e.g. for DataArray:

promote_array_type{T1 <: AbstractArray, T2 <: DataArray}(x::Type{T1}, y::Type{T2}) = DataArray{promote_type(eltype(x), eltype(y))}

How does that sound? Am I missing something?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants