Add broadcasting of AbstractDataFrame #1840

Merged: 30 commits, Jun 23, 2019
Changes from 7 commits
Commits (30):
8bb061d  add broadcasting of AbstractDataFrame (bkamins, Jun 8, 2019)
6a6e954  switch to Tables.allocatecolumn (bkamins, Jun 8, 2019)
6f151ac  one more fix (bkamins, Jun 8, 2019)
628229b  revert similar and fix tests (bkamins, Jun 8, 2019)
8af3a0f  Apply suggestions from code review (bkamins, Jun 9, 2019)
c9a98bd  corrections after code review (bkamins, Jun 9, 2019)
5b4cc36  fix typo (bkamins, Jun 9, 2019)
24566e4  fix broadcasting assignment bug (bkamins, Jun 20, 2019)
2b82f79  fix SubDataFrame case (bkamins, Jun 20, 2019)
3810f75  add unaliasing of data frame against data frame (bkamins, Jun 20, 2019)
8e335f1  small fixes in legacy code (bkamins, Jun 20, 2019)
80a131e  optimized broadcasting (bkamins, Jun 20, 2019)
82e53a6  correct unaliasing (bkamins, Jun 20, 2019)
87206a2  small performance optimization (bkamins, Jun 20, 2019)
fba7cef  performance improvements (bkamins, Jun 20, 2019)
0e63fb8  add more broadcasting tests (bkamins, Jun 20, 2019)
bb4862b  more tests (bkamins, Jun 20, 2019)
5b8d2ec  Merge branch 'master' into new_dataframe_broadcasting (bkamins, Jun 21, 2019)
699cb6b  Merge branch 'master' into new_dataframe_broadcasting (bkamins, Jun 21, 2019)
6780b26  even more tests (bkamins, Jun 21, 2019)
432a530  getcolbc cleanup (bkamins, Jun 21, 2019)
3fdf733  fix after a code review (bkamins, Jun 21, 2019)
a67a3f5  unalias optimizations (bkamins, Jun 21, 2019)
5784100  more tests for common cases (bkamins, Jun 21, 2019)
b1813e1  improve helper signature (bkamins, Jun 21, 2019)
6a87c42  minor improvements (bkamins, Jun 22, 2019)
27c730a  minor improvements 2 (bkamins, Jun 22, 2019)
9a68d30  Apply suggestions from code review (bkamins, Jun 23, 2019)
5d3ec40  fixes after code review (bkamins, Jun 23, 2019)
1f46086  Fix indentation (nalimilan, Jun 23, 2019)
6 changes: 6 additions & 0 deletions docs/src/lib/indexing.md
@@ -46,6 +46,7 @@ For performance reasons, accessing, via `getindex` or `view`, a single `row` and
* `df[col]` -> the vector contained in column `col`;
* `df[cols]` -> a freshly allocated `DataFrame` containing the copies of vectors contained in columns `cols`;
* `df[row, col]` -> the value contained in row `row` of column `col`, the same as `df[col][row]`;
* `df[CartesianIndex(row, col)]` -> the same as `df[row,col]`;
* `df[row, cols]` -> a `DataFrameRow` with parent `df` if `cols` is a colon and `df[cols]` otherwise;
* `df[rows, col]` -> a copy of the vector `df[col]` with only the entries corresponding to `rows` selected, the same as `df[col][rows]`;
* `df[rows, cols]` -> a `DataFrame` containing copies of columns `cols` with only the entries corresponding to `rows` selected.
@@ -82,6 +83,11 @@ Under construction

## Broadcasting

The following broadcasting rules apply to `AbstractDataFrame` objects:
* In broadcasting, an `AbstractDataFrame` behaves like a two-dimensional collection compatible with matrices.
* If an `AbstractDataFrame` takes part in broadcasting, the result is always a `DataFrame`.
* If multiple `AbstractDataFrame` objects take part in broadcasting, they must have identical column names.

Values can be assigned to `AbstractDataFrame` and `DataFrameRow` objects using the `.=` operator.
In such an operation an `AbstractDataFrame` is treated as two-dimensional and a `DataFrameRow` as one-dimensional.
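
The shape rules above can be illustrated without loading DataFrames at all. The sketch below uses a plain `Matrix` as a stand-in for the data frame, on the assumption (taken from the first rule) that a data frame takes part in broadcasting like any other two-dimensional collection. A one-dimensional view of a matrix row loosely plays the role of a `DataFrameRow`. All variable names are illustrative only.

```julia
# A plain Matrix stands in for the data frame (assumption for illustration);
# DataFrames itself is not loaded.
m = reshape(1.5:15.5, (3, 5))       # 3 rows, 5 columns

plus_scalar = m .+ 1                # the scalar is applied to every cell
plus_rows   = m .+ [10, 20, 30]     # length-3 vector: one value per row
plus_cols   = m .+ [1 2 3 4 5]      # 1x5 matrix: one value per column

# Broadcast assignment with `.=` overwrites every cell of a 2-D target in place.
target = zeros(3, 5)
target .= m .* 2

# A view of a single row is one-dimensional, loosely analogous to a DataFrameRow.
row = view(target, 1, :)
row .= 0.0
```

With a real data frame the same expressions would, per the rules above, produce a `DataFrame` that keeps the column names of the input.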

63 changes: 63 additions & 0 deletions src/other/broadcasting.jl
@@ -1,3 +1,66 @@
### Broadcasting

Base.getindex(df::AbstractDataFrame, idx::CartesianIndex{2}) = df[idx[1], idx[2]]
Base.setindex!(df::AbstractDataFrame, val, idx::CartesianIndex{2}) =
    (df[idx[1], idx[2]] = val)

Base.broadcastable(df::AbstractDataFrame) = df

struct DataFrameStyle <: Base.Broadcast.BroadcastStyle end

Base.Broadcast.BroadcastStyle(::Type{<:AbstractDataFrame}) =
    DataFrameStyle()

Base.Broadcast.BroadcastStyle(::DataFrameStyle, ::Base.Broadcast.BroadcastStyle) = DataFrameStyle()
Base.Broadcast.BroadcastStyle(::Base.Broadcast.BroadcastStyle, ::DataFrameStyle) = DataFrameStyle()
Base.Broadcast.BroadcastStyle(::DataFrameStyle, ::DataFrameStyle) = DataFrameStyle()

# Fill `res` from row `pos` onward with the values of column `col` of `bc`,
# switching to a wider vector when a value does not fit element type `T`.
function copyto_widen!(res::AbstractVector{T},
                       bc::Base.Broadcast.Broadcasted{DataFrameStyle},
                       pos, col) where T
    for i in pos:length(axes(bc)[1])
        val = bc[CartesianIndex(i, col)]
        S = typeof(val)
        if S <: T || promote_type(S, T) <: T
            res[i] = val
        else
            newres = similar(Vector{promote_type(S, T)}, length(res))
            copyto!(newres, 1, res, 1, i-1)
            newres[i] = val
            # continue in the same column `col` (not a hard-coded column 2)
            return copyto_widen!(newres, bc, i + 1, col)
        end
    end
    return res
end
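
The widening strategy used by `copyto_widen!` can be sketched on its own, outside the broadcasting machinery. In the sketch below `collect_widen` and `fill_widen!` are hypothetical helper names, not part of DataFrames, and a plain function `f(i)` stands in for `bc[CartesianIndex(i, col)]`. It is meant only to show the technique: start with the type of the first value and reallocate to `promote_type(S, T)` whenever a later value does not fit.

```julia
# Hypothetical helpers, not part of DataFrames: collect f(1), ..., f(n) into a
# vector, widening the element type whenever a value does not fit.
function fill_widen!(res::AbstractVector{T}, f, pos) where T
    for i in pos:length(res)
        val = f(i)
        S = typeof(val)
        if S <: T || promote_type(S, T) <: T
            res[i] = val                      # fits, possibly after conversion
        else
            # Allocate a wider vector, copy what was filled so far, go on there.
            newres = Vector{promote_type(S, T)}(undef, length(res))
            copyto!(newres, 1, res, 1, i - 1)
            newres[i] = val
            return fill_widen!(newres, f, i + 1)
        end
    end
    return res
end

function collect_widen(f, n::Integer)
    n == 0 && return Any[]                    # mirrors the empty-column case
    first_val = f(1)
    res = Vector{typeof(first_val)}(undef, n)
    res[1] = first_val
    return fill_widen!(res, f, 2)
end

vals = Any[1, 2.0, big(3)]
widened = collect_widen(i -> vals[i], 3)                     # Int, then Float64, then BigFloat
with_missing = collect_widen(i -> i == 2 ? missing : i, 3)   # Int, then Union{Missing, Int}
```

The first call follows the same `Any[1, 2.0, big(3)]` input used for column `y` in the tests of this PR, where the expected element type after broadcasting is `BigFloat`.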

function Base.copy(bc::Base.Broadcast.Broadcasted{DataFrameStyle})
    bcf = Base.Broadcast.flatten(bc)
    colnames = unique([_names(df) for df in bcf.args if df isa AbstractDataFrame])
    if length(colnames) != 1
        wrongnames = setdiff(union(colnames...), intersect(colnames...))
        msg = join(wrongnames, ", ", " and ")
        throw(ArgumentError("Column names in broadcasted data frames must match. " *
                            "Non matching column names are $msg"))
    end
    nrows = length(axes(bc)[1])
    df = DataFrame()
    for i in axes(bc)[2]
        if nrows == 0
            col = Any[]
        else
            v1 = bc[CartesianIndex(1, i)]
            startcol = similar(Vector{typeof(v1)}, nrows)
            startcol[1] = v1
            col = copyto_widen!(startcol, bc, 2, i)
        end
        df[colnames[1][i]] = col
    end
    return df
end
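
The column-name check at the top of `Base.copy` can be tried in isolation. The sketch below is a stand-alone approximation: `mismatched_names` is a hypothetical helper, and plain vectors of `Symbol`s stand in for the `_names(df)` results of each data frame in the broadcast.

```julia
# Hypothetical stand-alone helper; `names_per_frame` holds one vector of column
# names per data frame taking part in the broadcast.
function mismatched_names(names_per_frame::Vector{Vector{Symbol}})
    distinct = unique(names_per_frame)
    length(distinct) <= 1 && return Symbol[]
    # Names present in some frames but not in all of them.
    return setdiff(union(distinct...), intersect(distinct...))
end

same      = mismatched_names([[:x1, :x2], [:x1, :x2]])
differing = mismatched_names([[:x1, :x2, :x5], [:x1, :x2, :y]])
msg       = join(differing, ", ", " and ")

# Same names in a different order: two distinct name vectors, empty difference.
reordered = mismatched_names([[:a, :b], [:b, :a]])
```

One consequence worth noting: when two frames hold the same names in a different order, `unique` still reports two distinct name vectors while the set difference is empty, so the error raised by the code above would list no offending names.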


### Broadcasting assignment

struct LazyNewColDataFrame
    df::DataFrame
    col::Symbol
85 changes: 84 additions & 1 deletion test/broadcasting.jl
@@ -1,11 +1,94 @@
module TestBroadcasting

using Test, DataFrames
using Test, DataFrames, PooledArrays

const ≅ = isequal

refdf = DataFrame(reshape(1.5:15.5, (3,5)))

@testset "broadcasting of AbstractDataFrame objects" begin
    for df in (copy(refdf), view(copy(refdf), :, :))
        @test identity.(df) == refdf
        @test identity.(df) !== df
        @test (x->x).(df) == refdf
        @test (x->x).(df) !== df
        @test (df .+ df) ./ 2 == refdf
        @test (df .+ df) ./ 2 !== df
        @test df .+ Matrix(df) == 2 .* df
        @test Matrix(df) .+ df == 2 .* df
        @test (Matrix(df) .+ df .== 2 .* df) == DataFrame(trues(size(df)), names(df))
        @test df .+ 1 == df .+ ones(size(df))
        @test df .+ axes(df, 1) == DataFrame(Matrix(df) .+ axes(df, 1), names(df))
        @test df .+ permutedims(axes(df, 2)) == DataFrame(Matrix(df) .+ permutedims(axes(df, 2)), names(df))
    end

    df1 = copy(refdf)
    df2 = view(copy(refdf), :, :)
    @test (df1 .+ df2) ./ 2 == refdf
    @test (df1 .- df2) == DataFrame(zeros(size(refdf)))
    @test (df1 .* df2) == refdf .^ 2
    @test (df1 ./ df2) == DataFrame(ones(size(refdf)))
end

@testset "broadcasting of AbstractDataFrame objects errors" begin
    df = copy(refdf)
    dfv = view(df, :, 2:ncol(df))

    @test_throws DimensionMismatch df .+ dfv
    @test_throws DimensionMismatch df .+ df[2:end, :]

    @test_throws DimensionMismatch df .+ [1, 2]
    @test_throws DimensionMismatch df .+ [1 2]
    @test_throws DimensionMismatch df .+ rand(2,2)
    @test_throws DimensionMismatch dfv .+ [1, 2]
    @test_throws DimensionMismatch dfv .+ [1 2]
    @test_throws DimensionMismatch dfv .+ rand(2,2)

    df2 = copy(df)
    names!(df2, [:x1, :x2, :x3, :x4, :y])
    @test_throws ArgumentError df .+ df2
    @test_throws ArgumentError df .+ 1 .+ df2
end
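
The `DimensionMismatch` cases in this testset follow the ordinary shape rules of Julia broadcasting. As a rough sketch, assuming a plain 3x5 `Matrix` as a stand-in for `refdf` (DataFrames is not loaded), the hypothetical helper `shape_mismatch` below reports which operands are rejected:

```julia
m = reshape(1.5:15.5, (3, 5))    # stand-in for the 3x5 reference data frame

# Hypothetical helper: true if broadcasting `m .+ other` fails with a
# DimensionMismatch, false if it succeeds.
function shape_mismatch(m, other)
    try
        m .+ other
        return false
    catch err
        err isa DimensionMismatch || rethrow()
        return true
    end
end

results = (
    short_vector = shape_mismatch(m, [1, 2]),        # 2 rows vs 3 rows
    short_row    = shape_mismatch(m, [1 2]),         # 2 columns vs 5 columns
    small_matrix = shape_mismatch(m, rand(2, 2)),    # both dimensions differ
    fewer_rows   = shape_mismatch(m, m[2:end, :]),   # 2 rows vs 3 rows
    fitting      = shape_mismatch(m, [1, 2, 3]),     # one value per row: accepted
)
```

The `ArgumentError` cases at the end of the testset are different in kind: the shapes agree there, and the failure comes from the column-name check rather than from the axes.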

@testset "broadcasting of AbstractDataFrame objects corner cases" begin
    df = DataFrame(c11 = categorical(["a", "b"]), c12 = categorical([missing, "b"]), c13 = categorical(["a", missing]),
                   c21 = categorical([1, 2]), c22 = categorical([missing, 2]), c23 = categorical([1, missing]),
                   p11 = PooledArray(["a", "b"]), p12 = PooledArray([missing, "b"]), p13 = PooledArray(["a", missing]),
                   p21 = PooledArray([1, 2]), p22 = PooledArray([missing, 2]), p23 = PooledArray([1, missing]),
                   b1 = [true, false], b2 = [missing, false], b3 = [true, missing],
                   f1 = [1.0, 2.0], f2 = [missing, 2.0], f3 = [1.0, missing],
                   s1 = ["a", "b"], s2 = [missing, "b"], s3 = ["a", missing])

    df2 = DataFrame(c11 = categorical(["a", "b"]), c12 = [nothing, "b"], c13 = ["a", nothing],
                    c21 = categorical([1, 2]), c22 = [nothing, 2], c23 = [1, nothing],
                    p11 = ["a", "b"], p12 = [nothing, "b"], p13 = ["a", nothing],
                    p21 = [1, 2], p22 = [nothing, 2], p23 = [1, nothing],
                    b1 = [true, false], b2 = [nothing, false], b3 = [true, nothing],
                    f1 = [1.0, 2.0], f2 = [nothing, 2.0], f3 = [1.0, nothing],
                    s1 = ["a", "b"], s2 = [nothing, "b"], s3 = ["a", nothing])

    @test df ≅ identity.(df)
    @test df ≅ (x->x).(df)
    df3 = coalesce.(df, nothing)
    @test df2 == df3
    @test eltypes(df2) == eltypes(df3)
    for i in axes(df, 2)
        @test typeof(df2[i]) == typeof(df3[i])
    end
    df4 = (x -> df[1,1]).(df)
    @test names(df4) == names(df)
    @test all(isa.(eachcol(df4), Ref(CategoricalArray)))
    @test all(eachcol(df4) .== Ref(categorical(["a", "a"])))

    df5 = DataFrame(x = Any[1, 2, 3], y = Any[1, 2.0, big(3)])
    @test identity.(df5) == df5
    @test (x->x).(df5) == df5
    @test df5 .+ 1 == DataFrame(Matrix(df5) .+ 1, names(df5))
    @test eltypes(identity.(df5)) == [Int, BigFloat]
    @test eltypes((x->x).(df5)) == [Int, BigFloat]
    @test eltypes(df5 .+ 1) == [Int, BigFloat]
end

@testset "normal data frame and data frame row in broadcasted assignment - one column" begin
    df = copy(refdf)
    df[1] .+= 1