Allow multicolumn transformations for AbstractDataFrame (#2461)
bkamins committed Oct 9, 2020
1 parent 9c98e09 commit 4ec8009
Showing 7 changed files with 829 additions and 228 deletions.
4 changes: 4 additions & 0 deletions NEWS.md
@@ -2,6 +2,10 @@

 ## Breaking changes

+* the rules for transformations passed to `select`/`select!`, `transform`/`transform!`,
+  and `combine` have been made more flexible; in particular now it is allowed to
+  return multiple columns from a transformation function
+  [#2461](https://github.com/JuliaData/DataFrames.jl/pull/2461)
 * CategoricalArrays.jl is no longer reexported: call `using CategoricalArrays`
   to use it [#2404](https://github.com/JuliaData/DataFrames.jl/pull/2404).
   In the same vein, the `categorical` and `categorical!` functions
3 changes: 2 additions & 1 deletion docs/src/lib/types.md
@@ -55,7 +55,8 @@ The `ByRow` type is a special type used for selection operations to signal that
 to each element (row) of the selection.

 The `AsTable` type is a special type used for selection operations to signal that the columns selected by a wrapped
-selector should be passed as a `NamedTuple` to the function.
+selector should be passed as a `NamedTuple` to the function or to signal that it is requested
+to expand the return value of a transformation into multiple columns.

 ## [The design of handling of columns of a `DataFrame`](@id man-columnhandling)

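The two roles of `AsTable` described in this hunk can be sketched as follows. This is a minimal illustration, assuming a DataFrames.jl release that includes this change; the data frame and the column names `total`, `double`, and `square` are made up for the example and do not come from the commit.

```julia
using DataFrames

# Hypothetical sample data, not taken from the commit.
df = DataFrame(x1=[1, 2], x2=[3, 4], y=[5, 6])

# AsTable on the source side: each row of the selected columns
# is passed to the function as a NamedTuple.
row_sums = select(df, AsTable(:) => ByRow(nt -> nt.x1 + nt.x2 + nt.y) => :total)

# AsTable on the destination side: the NamedTuples returned by the
# function are expanded into one column per field.
stats = select(df, :x1 => ByRow(v -> (double=2 * v, square=v^2)) => AsTable)
```

In the second call the output column names come from the fields of the returned `NamedTuple`, so no explicit target names are needed.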
8 changes: 8 additions & 0 deletions docs/src/man/getting_started.md
@@ -632,6 +632,14 @@ julia> select(df, :x2, :x2 => ByRow(sqrt)) # transform columns by row
 ├─────┼───────┼─────────┤
 │ 1   │ 3     │ 1.73205 │
 │ 2   │ 4     │ 2.0     │
+
+julia> select(df, AsTable(:) => ByRow(extrema) => [:lo, :hi]) # return multiple columns
+2×2 DataFrame
+│ Row │ lo    │ hi    │
+│     │ Int64 │ Int64 │
+├─────┼───────┼───────┤
+│ 1   │ 1     │ 5     │
+│ 2   │ 2     │ 6     │
 ```

 It is important to note that `select` always returns a data frame,
708 changes: 522 additions & 186 deletions src/abstractdataframe/selection.jl

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion src/groupeddataframe/splitapplycombine.jl
@@ -502,7 +502,7 @@ function _combine_prepare(gd::GroupedDataFrame,
     for p in cs
         if p === nrow
             push!(cs_vec, nrow => :nrow)
-        elseif p isa AbstractVector{<:Pair}
+        elseif p isa AbstractVecOrMat{<:Pair}
             append!(cs_vec, p)
         else
             push!(cs_vec, p)
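Widening the check from `AbstractVector{<:Pair}` to `AbstractVecOrMat{<:Pair}` means a matrix of pairs is also flattened into the list of operations. A minimal sketch of how such a matrix might arise, assuming a DataFrames.jl release that includes this change; the data and the names `gdf`, `ops`, and `res` are illustrative only:

```julia
using DataFrames

# Hypothetical sample data, not taken from the commit.
df = DataFrame(g=[1, 1, 2], x=[1, 2, 3], y=[4, 5, 6])
gdf = groupby(df, :g)

# Broadcasting a column vector of names against a row vector of functions
# yields a 2×2 Matrix of Pairs, one pair per (column, function) combination.
ops = [:x, :y] .=> [sum maximum]

# The matrix is passed as a single argument; each pair in it is applied
# as a separate transformation.
res = combine(gdf, ops)
```

With the default naming rules the result should carry one column per pair (`x_sum`, `y_sum`, `x_maximum`, `y_maximum`) next to the grouping column `g`.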
14 changes: 6 additions & 8 deletions test/grouping.jl
@@ -1989,8 +1989,10 @@ end
           [df DataFrame(x_function=[(-1,), (-2,) ,(-3,) ,(-4,) ,(-5,)],
                         y_function=[(-6,), (-7,) ,(-8,) ,(-9,) ,(-10,)])]

-    @test_throws ArgumentError combine(gdf, AsTable([:x, :y]) => ByRow(identity))
-    @test_throws ArgumentError combine(gdf, AsTable([:x, :y]) => ByRow(x -> df[1, :]))
+    @test combine(gdf, AsTable([:x, :y]) => ByRow(identity)) ==
+        DataFrame(g=[1,1,1,2,2], x_y_identity=ByRow(identity)((x=1:5, y=6:10)))
+    @test combine(gdf, AsTable([:x, :y]) => ByRow(x -> df[1, :])) ==
+        DataFrame(g=[1,1,1,2,2], x_y_function=fill(df[1, :], 5))
 end

 @testset "test correctness of ungrouping" begin
@@ -2710,12 +2712,8 @@ end
     @test isequal_typed(combine(df, :x => (x -> 1:2) => :y), DataFrame(y=1:2))
     @test isequal_typed(combine(df, :x => (x -> x isa Vector{Int} ? "a" : 'a') => :y),
                         DataFrame(y="a"))
-
-    # in the future this should be DataFrame(nrow=0)
-    @test_throws ArgumentError combine(nrow, df)
-
-    # in the future this should be DataFrame(a=1,b=2)
-    @test_throws ArgumentError combine(sdf -> DataFrame(a=1,b=2), df)
+    @test combine(nrow, df) == DataFrame(nrow=0)
+    @test combine(sdf -> DataFrame(a=1,b=2), df) == DataFrame(a=1,b=2)
 end

 @testset "disallowed tuple column selector" begin
