Add row-wise macros #267

Merged
merged 16 commits into from
Jul 25, 2021
10 changes: 7 additions & 3 deletions docs/src/index.md
@@ -13,6 +13,8 @@ In addition, DataFramesMeta provides

* `@orderby`, for sorting data frames
* `@subset` and `@subset!`, for keeping rows of a data frame matching a given condition
* Row-wise versions of the above macros in the form of `@rtransform`, `@rtransform!`,
`@rselect`, `@rselect!`, `@rorderby`, `@rsubset`, and `@rsubset!`.
* `@by`, for grouping and combining a data frame in a single step
* `@with`, for working with the columns of a data frame with high performance and
convenient syntax
@@ -272,15 +274,17 @@ df2 = @eachrow df begin
end
```

## Row-wise transformations with `@byrow`
## Row-wise transformations with `@byrow` and `@rtransform`/`@rselect`/etc.

`@byrow` provides a convenient syntax to apply operations by-row,
without having to vectorize manually.
without having to vectorize manually. Additionally, the macros
`@rtransform`, `@rtransform!`, `@rselect`, `@rselect!`,
`@rorderby`, `@rsubset`, and `@rsubset!` use `@byrow` by default.
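
As a rough illustration of the equivalence described here, the sketch below compares an explicit `@byrow` call with its row-wise counterpart. It assumes a DataFramesMeta.jl release that includes these macros and the `:col = expr` assignment syntax; the data frame and column names are made up for the example.

```julia
using DataFrames, DataFramesMeta

# Toy data; the column names are illustrative only.
df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# Explicit row-wise flag on a regular macro.
explicit = @transform df @byrow :c = :a * :b

# Row-wise macro: the expression is applied per row without writing @byrow.
rowwise = @rtransform df :c = :a * :b

# Row-wise filtering: the condition is evaluated for each row.
kept = @rsubset df :a > 1
```

Both transform calls should produce the same `:c` column, since the row-wise macro only saves typing the flag.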

DataFrames.jl provides the function wrapper `ByRow`. `ByRow(f)(x, y)`
is roughly equivalent to `f.(x, y)`. DataFramesMeta.jl allows users
to construct expressions using the `ByRow` function wrapper with the
syntax `@byrow`.
syntax `@byrow` or the row-wise macros `@rtransform`, etc.
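
The "roughly equivalent" claim can be checked directly with DataFrames.jl alone. This is a small sketch with a made-up function `add` and toy columns, not an excerpt from the package documentation.

```julia
using DataFrames

# Toy data and a toy function; both are illustrative only.
df = DataFrame(a = [1, 2, 3], b = [10, 20, 30])
add(x, y) = x + y

# Calling the wrapper on whole columns applies `add` element by element.
via_byrow = ByRow(add)(df.a, df.b)

# Plain broadcasting gives the same values here.
via_broadcast = add.(df.a, df.b)

# The same wrapper used inside a DataFrames.jl transformation pair.
out = transform(df, [:a, :b] => ByRow(add) => :c)
```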

`@byrow` is not a "real" macro and cannot be used outside of
DataFramesMeta.jl macros. However, its behavior within DataFramesMeta.jl
5 changes: 3 additions & 2 deletions src/DataFramesMeta.jl
@@ -10,10 +10,11 @@ using MacroTools

# Basics:
export @with,
@subset, @subset!,
@orderby,
@subset, @subset!, @rsubset, @rsubset!,
@orderby, @rorderby,
@by, @combine,
@transform, @select, @transform!, @select!,
@rtransform, @rselect, @rtransform!, @rselect!,
@eachrow, @eachrow!,
@byrow,
@based_on, @where # deprecated
130 changes: 130 additions & 0 deletions src/macros.jl
@@ -577,6 +577,26 @@ macro subset(x, args...)
    esc(subset_helper(x, args...))
end

function rsubset_helper(x, args...)
    exprs, outer_flags = create_args_vector(args...; wrap_byrow=true)
    t = (fun_to_vec(ex; no_dest=true, outer_flags=outer_flags) for ex in exprs)
    quote
        $subset($x, $(t...); skipmissing=true)
    end
end


"""
    @rsubset(d, i...)

Row-wise version of `@subset`, i.e. all operations use `@byrow` by
default. See [`@subset`](@ref) for details.
"""
macro rsubset(x, args...)
    esc(rsubset_helper(x, args...))
end


"""
@subset(x, args...)

@@ -595,6 +615,15 @@ function subset!_helper(x, args...)
    end
end

function rsubset!_helper(x, args...)
Member

Would there be a way to avoid duplicating these helper functions, for example by passing subset or subset! to a more generic one?

Collaborator Author

Yes, that's what DataFrameMacros does. I will do this in a future PR.

Member

Why not do it now? :-p

Collaborator Author

Okay, I've simplified the `@byrow` check, but it's hard to fold everything into one coherent function:

  1. The ugly `t = (fun_to_vec(ex; no_dest=true, outer_flags=outer_flags) for ex in exprs)` can't be removed yet because we still need to support some deprecated functionality in `@combine`.
  2. There are lots of exceptions. For example, `@subset` requires a keyword argument, and `@orderby` doesn't even call a DataFrames function. Writing a function for these cases would be just as much work as writing out the expression.
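
For illustration only, here is a minimal sketch of the kind of generic helper discussed in this thread. The name `generic_helper` is hypothetical and not part of this PR. It takes already-processed transformation expressions, so it does not cover the `fun_to_vec` step or the `@orderby` special case mentioned above.

```julia
# Hypothetical helper (not part of this PR): build the expression for a call
# such as `subset(df, t1, t2; skipmissing=true)` from its parts.
function generic_helper(target, x, transforms; kwargs...)
    # Turn each keyword argument into a `:kw` expression node.
    kws = [Expr(:kw, k, v) for (k, v) in kwargs]
    if isempty(kws)
        return Expr(:call, target, x, transforms...)
    end
    # Keyword arguments written after `;` live in a `:parameters` node
    # placed directly after the function name.
    return Expr(:call, target, Expr(:parameters, kws...), x, transforms...)
end

# One helper covers a call with a keyword argument and a call without one.
ex_subset = generic_helper(:subset, :df, [:t1, :t2]; skipmissing=true)
ex_select = generic_helper(:select!, :df, [:t1])
```

Passing the target function as an argument removes the duplication between the in-place and copying variants, but each macro would still need its own preprocessing before calling such a helper.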

    exprs, outer_flags = create_args_vector(args...; wrap_byrow=true)
    t = (fun_to_vec(ex; no_dest=true, outer_flags=outer_flags) for ex in exprs)
    quote
        $subset!($x, $(t...); skipmissing=true)
    end
end


"""
@subset!(d, i...)

@@ -737,6 +766,17 @@ macro subset!(x, args...)
end


"""
    @rsubset!(d, i...)

Row-wise version of `@subset!`, i.e. all operations use `@byrow` by
default. See [`@subset!`](@ref) for details.
"""
macro rsubset!(x, args...)
    esc(rsubset!_helper(x, args...))
end


##############################################################################
##
## @orderby
@@ -899,6 +939,24 @@ macro orderby(d, args...)
    esc(orderby_helper(d, args...))
end

function rorderby_helper(x, args...)
    exprs, outer_flags = create_args_vector(args...; wrap_byrow=true)
    t = (fun_to_vec(ex; gensym_names=true, outer_flags=outer_flags) for ex in exprs)
    quote
        $DataFramesMeta.orderby($x, $(t...))
    end
end

"""
    @rorderby(d, args...)

Row-wise version of `@orderby`, i.e. all operations use `@byrow` by
default. See [`@orderby`](@ref) for details.
"""
macro rorderby(d, args...)
    esc(rorderby_helper(d, args...))
end


##############################################################################
##
@@ -1017,6 +1075,24 @@ macro transform(x, args...)
    esc(transform_helper(x, args...))
end

function rtransform_helper(x, args...)
    exprs, outer_flags = create_args_vector(args...; wrap_byrow=true)

    t = (fun_to_vec(ex; gensym_names=false, outer_flags=outer_flags) for ex in exprs)
    quote
        $DataFrames.transform($x, $(t...))
    end
end

"""
    @rtransform(x, args...)

Row-wise version of `@transform`, i.e. all operations use `@byrow` by
default. See [`@transform`](@ref) for details.
"""
macro rtransform(x, args...)
    esc(rtransform_helper(x, args...))
end

##############################################################################
##
@@ -1113,6 +1189,23 @@ macro transform!(x, args...)
    esc(transform!_helper(x, args...))
end

function rtransform!_helper(x, args...)
    exprs, outer_flags = create_args_vector(args...; wrap_byrow=true)

    t = (fun_to_vec(ex; gensym_names=false, outer_flags=outer_flags) for ex in exprs)
    quote
        $DataFrames.transform!($x, $(t...))
    end
end

"""
    @rtransform!(x, args...)

Row-wise version of `@transform!`, i.e. all operations use `@byrow` by
default. See [`@transform!`](@ref) for details.
"""
macro rtransform!(x, args...)
    esc(rtransform!_helper(x, args...))
end

##############################################################################
##
@@ -1227,6 +1320,25 @@ macro select(x, args...)
    esc(select_helper(x, args...))
end

function rselect_helper(x, args...)
    exprs, outer_flags = create_args_vector(args...; wrap_byrow=true)

    t = (fun_to_vec(ex; gensym_names=false, outer_flags=outer_flags) for ex in exprs)
    quote
        $DataFrames.select($x, $(t...))
    end
end

"""
    @rselect(x, args...)

Row-wise version of `@select`, i.e. all operations use `@byrow` by
default. See [`@select`](@ref) for details.
"""
macro rselect(x, args...)
    esc(rselect_helper(x, args...))
end


##############################################################################
##
@@ -1336,6 +1448,24 @@ macro select!(x, args...)
    esc(select!_helper(x, args...))
end

function rselect!_helper(x, args...)
    exprs, outer_flags = create_args_vector(args...; wrap_byrow=true)

    t = (fun_to_vec(ex; gensym_names=false, outer_flags=outer_flags) for ex in exprs)
    quote
        $DataFrames.select!($x, $(t...))
    end
end

"""
    @rselect!(x, args...)

Row-wise version of `@select!`, i.e. all operations use `@byrow` by
default. See [`@select!`](@ref) for details.
"""
macro rselect!(x, args...)
    esc(rselect!_helper(x, args...))
end

##############################################################################
##
12 changes: 10 additions & 2 deletions src/parsing.jl
@@ -388,14 +388,22 @@ If a `:block` expression, return the `args` of
the block as an array. If a simple expression,
wrap the expression in a one-element vector.
"""
function create_args_vector(arg)
function create_args_vector(arg; wrap_byrow::Bool=false)
    arg, outer_flags = extract_macro_flags(MacroTools.unblock(arg))

    if wrap_byrow
        if outer_flags[Symbol("@byrow")][]
            throw(ArgumentError("Redundant @byrow calls"))
        end

        outer_flags[Symbol("@byrow")][] = true
    end

    if arg isa Expr && arg.head == :block
        x = MacroTools.rmlines(arg).args
    else
        x = Any[arg]
    end

    return x, outer_flags
end
end
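
A minimal, self-contained sketch of the flag check above. The function name `force_byrow!` is hypothetical, and a plain `Dict` of `Ref{Bool}` values stands in for the real `outer_flags` object, whose exact type is not shown in this diff.

```julia
# Simplified stand-in for the check: `flags` maps a flag name to a `Ref{Bool}`,
# loosely mirroring how `outer_flags` is indexed in the diff.
function force_byrow!(flags)
    key = Symbol("@byrow")
    if flags[key][]
        # The caller already asked for @byrow, so forcing it again is redundant.
        throw(ArgumentError("Redundant @byrow calls"))
    end
    flags[key][] = true
    return flags
end

# Flag not set yet: the helper switches it on.
fresh = Dict(Symbol("@byrow") => Ref(false))
force_byrow!(fresh)

# Flag already set: the helper throws, and we capture the error for inspection.
already_set = Dict(Symbol("@byrow") => Ref(true))
err = try
    force_byrow!(already_set)
    nothing
catch e
    e
end
```

Under this logic, a row-wise macro that is also given an explicit outer `@byrow` flag raises the `ArgumentError` instead of silently accepting it.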