updates for 0.6 #73

Merged
merged 18 commits into from
Jul 11, 2017
29 changes: 14 additions & 15 deletions README.md
@@ -1,6 +1,5 @@
# DataFramesMeta.jl

[![DataFramesMeta](http://pkg.julialang.org/badges/DataFramesMeta_0.4.svg)](http://pkg.julialang.org/?pkg=DataFramesMeta&ver=0.4)
[![DataFramesMeta](http://pkg.julialang.org/badges/DataFramesMeta_0.5.svg)](http://pkg.julialang.org/?pkg=DataFramesMeta?pkg=DataFramesMeta&ver=0.5)
[![DataFramesMeta](http://pkg.julialang.org/badges/DataFramesMeta_0.6.svg)](http://pkg.julialang.org/?pkg=DataFramesMeta?pkg=DataFramesMeta&ver=0.6)
[![Coveralls](https://coveralls.io/repos/github/JuliaStats/DataFramesMeta.jl/badge.svg?branch=master)](https://coveralls.io/github/JuliaStats/DataFramesMeta.jl?branch=master)
@@ -16,7 +15,7 @@ These macros improve performance and provide more convenient syntax.

`@with` allows DataFrame columns to be referenced as symbols like
`:colX` in expressions. If an expression is wrapped in `^(expr)`,
`expr` gets passed through untouched. If an expression is wrapped in
`expr` gets passed through untouched. If an expression is wrapped in
`_I_(expr)`, the column is referenced by the variable `expr` rather than
a symbol. Here are some examples:

@@ -132,7 +131,7 @@ df2 = @byrow! df begin
@newcol colX::Array{Float64}
@newcol colY::DataArray{Int}
:colX = :B == 2 ? pi * :A : :B
if :A > 1
if :A > 1
:colY = :A * :B
end
end
@@ -226,7 +225,7 @@ The following operations are now included:
groups based on the given criteria. Returns a GroupedDataFrame.

- `DataFrame(g)` -- Convert groups back to a DataFrame with the same
group orderings.
group orderings.

- `@based_on(g, z = mean(:a))` -- Summarize results within groups.
Returns a DataFrame.
@@ -242,7 +241,7 @@ GroupedDataFrame. You can also iterate over GroupedDataFrames.

The most general split-apply-combine approach is based on `map`.
`map(fun, g)` returns a GroupApplied object with keys and vals. This
can be used with `combine`.
can be used with `combine`.


# Performance
@@ -265,7 +264,7 @@ abstract type; each concrete composite type inherits from this. The advantages
of this approach are:

* You can access single columns directly using `df.colA`. This is type stable,
so code should be faster. (There is still the function boundary to worry
so code should be faster. (There is still the function boundary to worry
about.)

* All indexing operations can be done currently.
@@ -276,19 +275,19 @@ Some downsides include:
a `CompositeDataFrame` may waste memory.

* You cannot change the structure of a `CompositeDataFrame` once created.
It is nearly like an immutable object. For example to add a column, you need
It is nearly like an immutable object. For example to add a column, you need
to do something like:

```julia
transform(df, newcol = df.colA + 5)
```

An advantage of this is that the API becomes more functional. All
manipulations of the `CompositeDataFrame` return a new object.
Normally, this doesn't create much more memory.

To create a CompositeDataFrame, use `CompositeDataFrame`:

```julia
n = 10
d = CompositeDataFrame(a = 1:n, b = rand(10), c = DataArray(rand(1:3, n)))
@@ -300,7 +299,7 @@ appropriate.

You can also name the type of the `CompositeDataFrame` by including that as the
first symbol:

```julia
n = 10
d = CompositeDataFrame(:MyDF, a = 1:n, b = rand(n), c = DataArray(rand(1:3, n)))
@@ -320,7 +319,7 @@ MyDF(n::Integer) = MyDF(zeros(Int, n), zeros(n), DataArray(zeros(n)))
d = MyDF(10)
```

Note that a `CompositeDataFrame` is type stable with field access like `df.colA`
Note that a `CompositeDataFrame` is type stable with field access like `df.colA`
but not with `getindex` indexing like `df[:colA]`. `df[:colA]` works, but it is
not type stable.

@@ -337,13 +336,13 @@ y = [x.a * x.b for x in eachrow(d)]

In the example above, the call to `CompositeDataFrame` creates the type `MyDF`
that holds the composite data frame and another type `MyDFRow` that is used by
`row` and `eachrow`.
`row` and `eachrow`.

# Package Maintenance

[Tom Short](https://github.com/tshort) is the lead maintainer. Any of the
[JuliaStats collaborators](https://github.com/orgs/JuliaStats/teams/collaborators)
also have write access and can accept pull requests.
[Tom Short](https://github.com/tshort) is the lead maintainer. Any of the
[JuliaStats collaborators](https://github.com/orgs/JuliaStats/teams/collaborators)
also have write access and can accept pull requests.

Pull requests are welcome. Pull requests should include updated tests. If
functionality is changed, docstrings should be added or updated. Generally,
44 changes: 25 additions & 19 deletions src/DataFramesMeta.jl
@@ -194,12 +194,13 @@ where(d::AbstractDataFrame, arg) = d[arg, :]
where(d::AbstractDataFrame, f::Function) = d[f(d), :]
where(g::GroupedDataFrame, f::Function) = g[Bool[f(x) for x in g]]

and(x, y) =
function and(x, y)
if VERSION < v"0.6.0-"
:($x & $y)
else
:($x .& $y)
end
end
Member

It would be better to move the conditional outside of the function definition, i.e.

if VERSION < v"0.6.0-"
    and(x, y) = :($x & $y)
else
    and(x, y) = :($x .& $y)
end

Otherwise the version check is performed every time the function is called.
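
A minimal sketch of what the layout suggested above produces, assuming only Base Julia. The two condition expressions are made up for illustration and stand in for the arguments a macro such as `@where` would receive:

```julia
# The branch is chosen once, when this code is loaded, not on every call.
if VERSION < v"0.6.0-"
    and(x, y) = :($x & $y)
else
    and(x, y) = :($x .& $y)
end

# Illustrative conditions (hypothetical), quoted the way a macro would see them.
conds = (:(:x .> 1), :(:y .== 3))

# reduce folds the conditions pairwise into one combined expression.
combined = reduce(and, conds)
```

On Julia 0.6 or later the result is a single expression joining both conditions with the elementwise `.&` operator, which is the kind of expression `where_helper` then passes on.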


function where_helper(d, args...)
:($where($d, $(with_anonymous(reduce(and, args)))))
@@ -243,9 +244,8 @@ julia> @where(df, :x .> x)
julia> @where(df, :x .> x, :y .== 3)
0×2 DataFrames.DataFrame

julia> d = DataFrame(
n = 1:20,
x = [3, 3, 3, 3, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 2, 3, 1, 1, 2]);
julia> d = DataFrame(n = 1:20, x = [3, 3, 3, 3, 1, 1, 1, 2, 1, 1,
2, 1, 1, 2, 2, 2, 3, 1, 1, 2]);

julia> g = groupby(d, :x);

@@ -320,9 +320,11 @@ orderbyconstructor(d) = x -> x

function orderby_helper(d, args...)
_D = gensym()
:(let $_D = $d
$orderby($_D, $(with_anonymous(:($orderbyconstructor($_D)($(args...))))))
end)
quote
let $_D = $d
$orderby($_D, $(with_anonymous(:($orderbyconstructor($_D)($(args...))))))
end
end
end

"""
@@ -340,13 +342,12 @@ Sort by criteria. Normally used to sort groups in GroupedDataFrames.
```jldoctest
julia> using DataFrames, DataFramesMeta

julia> d = DataFrame(
n = 1:20,
x = [3, 3, 3, 3, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 2, 3, 1, 1, 2]);
julia> d = DataFrame(n = 1:20, x = [3, 3, 3, 3, 1, 1, 1, 2, 1, 1,
2, 1, 1, 2, 2, 2, 3, 1, 1, 2]);

julia> g = groupby(d, :x);

julia> @macroexpand @orderby(g, mean(:n))
julia> @orderby(g, mean(:n))
DataFrames.GroupedDataFrame 3 groups with keys: Symbol[:x]
First Group:
5×2 DataFrames.SubDataFrame{Array{Int64,1}}
@@ -407,9 +408,11 @@ function transform(g::GroupedDataFrame; kwargs...)
end

function transform_helper(x, args...)
:( $transform($x, $(map(args) do kw
Expr(:kw, kw.args[1], with_anonymous(kw.args[2]))
end...) ) )
quote
$transform($x, $(map(args) do kw
Expr(:kw, kw.args[1], with_anonymous(kw.args[2]))
end...) )
end
end

"""
@@ -537,7 +540,7 @@ end

function by_helper(x, what, args...)
:($by($x, $what,
$(with_anonymous(:($DataFrame($(map(replace_equals_with_kw, args)...)))))))
$(with_anonymous(:($DataFrame($(map(replace_equals_with_kw, args)...)))))))
end

"""
@@ -639,19 +642,22 @@ end

expandargs(x) = x
expandargs(q::QuoteNode) = Expr(:kw, q.value, q)
expandargs(e::Expr) =
function expandargs(e::Expr)
if e.head == :quote
Expr(:kw, e.args[1], e)
else
replace_equals_with_kw(e)
end
end

function select_helper(x, args...)
DF = gensym()
select_args = with_helper(DF, :($select($DF, $(map(expandargs, args)...))))
:(let $DF = $x
$(with_helper(DF, :($select($DF, $(map(expandargs, args)...)))))
end)
quote
let $DF = $x
$(with_helper(DF, :($select($DF, $(map(expandargs, args)...)))))
end
end
end

"""
34 changes: 21 additions & 13 deletions src/byrow.jl
@@ -28,12 +28,11 @@ byrow_replace(x) = x

function byrow_find_newcols(e::Expr, newcol_decl)
if e.head == :macrocall && e.args[1] == Symbol("@newcol")
ea =
if VERSION < v"0.7-"
e.args[2]
else
e.args[3]
end
if VERSION < v"0.7-"
Member

It would be good to be more exact with the version cutoff used. What changed in Base such that the number of arguments in the expression increased? If you can find the PR that introduced that change, you can use contrib/commit-name.sh in the Julia repo to get the exact version that corresponds to that change.

Member

@ararslan ararslan Jul 7, 2017

Also since this is in a function (but in this case the conditional is less easily separable), the condition should have an @static to evaluate the branch once at parse time.

Member

Probably JuliaLang/julia@3c3ced4? That's 0.7.0-DEV.481.

Contributor Author

Actually I think it was this: JuliaLang/julia#21746

Contributor Author

Try :(@newcol Array{Int}).args in 0.7

Member

That would be 0.7.0-DEV.357 then.
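
The points raised in this thread can be sketched together as follows. This is an illustration only: `first_macro_arg` is a hypothetical helper, not a function in DataFramesMeta, and the cutoff is the `0.7.0-DEV.357` value proposed above. Quoting `@newcol` does not expand it, so the macro does not need to be defined to run this.

```julia
# Hypothetical helper: return the first user-supplied argument of a
# macro-call expression. From the cutoff onward, args[2] holds source
# location information, so the user arguments start at args[3].
function first_macro_arg(e::Expr)
    @assert e.head == :macrocall
    # @static resolves the version check when the function definition is
    # expanded, so only the selected branch remains in the function body.
    @static if VERSION < v"0.7.0-DEV.357"
        e.args[2]
    else
        e.args[3]
    end
end

# Quoting leaves the macro call unexpanded, so it can be inspected directly.
ex = :(@newcol colX::Array{Float64})
decl = first_macro_arg(ex)
```

Inspecting `ex.args` on the running Julia version shows which layout applies.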

ea = e.args[2]
else
ea = e.args[3]
end
# expression to assign a new column to df
return (nothing, Any[Expr(:kw, ea.args[1], Expr(:call, ea.args[2], :_N))])
else
@@ -79,8 +78,8 @@ as in `@with`. Note that the scope within `@byrow!` is a hard scope.
with eltype `Int`. Note that the returned `AbstractDataFrame` includes these new
columns, but the original `d` is not affected. This feature makes it easier to
use `byrow!` for data transformations. `_N` is introduced to represent the
length of the dataframe, `_D` represents the dataframe including added columns,
and `row` represents the current row.
length of the dataframe, `_D` represents the `dataframe` including added columns,
and `row` represents the index of the current row.

### Arguments

@@ -99,11 +98,20 @@ julia> using DataFrames, DataFramesMeta
julia> df = DataFrame(A = 1:3, B = [2, 1, 2]);

julia> let x = 0
@byrow!(df, if :A + :B == 3; x += 1 end) # This doesn't work without the let
@byrow! df begin
if :A + :B == 3
x += 1
end
end # This doesn't work without the let
x
end
2

julia> @byrow! df if :A > :B; :A = 0 end
julia> @byrow! df begin
if :A > :B
:A = 0
end
end
3×2 DataFrames.DataFrame
│ Row │ A │ B │
├─────┼───┼───┤
@@ -112,9 +120,9 @@ julia> @byrow! df if :A > :B; :A = 0 end
│ 3 │ 0 │ 2 │

julia> df2 = @byrow! df begin
@newcol colX::Array{Float64}
:colX = :B == 2 ? pi * :A : :B
end
@newcol colX::Array{Float64}
:colX = :B == 2 ? pi * :A : :B
end
3×3 DataFrames.DataFrame
│ Row │ A │ B │ colX │
├─────┼───┼───┼─────────┤
7 changes: 4 additions & 3 deletions src/compositedataframe.jl
@@ -38,7 +38,8 @@ row() = nothing

"""
CompositeDataFrame(columns::Vector{Any}, cnames::Vector{Symbol}; inmodule = DataFramesMeta)
Member

Wrap lines at 92 chars.

CompositeDataFrame(columns::Vector{Any}, cnames::Vector{Symbol}, typename::Symbol; inmodule = DataFramesMeta)
CompositeDataFrame(columns::Vector{Any}, cnames::Vector{Symbol}, typename::Symbol;
inmodule = DataFramesMeta)
Member

Should be aligned with column.

CompositeDataFrame(; inmodule = DataFramesMeta, kwargs...)
CompositeDataFrame(typename::Symbol; inmodule = DataFramesMeta, kwargs...)

@@ -57,8 +58,8 @@ This uses `eval` to create a new type within the module specified by the
* `typename` : the optional name of the type created
* `kwargs` : the key gives the column names, and the value is the column contents
* `inmodule = DataFramesMeta` : a keyword argument to specify what module you
want to define the type in. Consider passing `current_module()` or
`@__MODULE__` depending on your julia version.
want to define the type in. Consider passing `current_module()`
(`VERSION < v"0.7-"`) or `@__MODULE__` depending on your Julia version.

### Examples

14 changes: 7 additions & 7 deletions src/linqmacro.jl
@@ -32,8 +32,8 @@ julia> using DataFrames, DataFramesMeta
julia> n = 100;

julia> df = DataFrame(a = rand(1:3, n),
b = ["a","b","c","d"][rand(1:4, n)],
x = rand(n));
b = ["a","b","c","d"][rand(1:4, n)],
x = rand(n));

julia> x1 = @linq transform(where(df, :a .> 2, :b .!= "c"), y = 10 * :x);

@@ -48,11 +48,11 @@ julia> @linq select(orderby(x1, :b, -:meanX), var = :b, :meanX, :meanY)
│ 3 │ "d" │ 0.568289 │ 5.68289 │

julia> @linq df |>
transform(y = 10 * :x) |>
where(:a .> 2) |>
by(:b, meanX = mean(:x), meanY = mean(:y)) |>
orderby(:meanX) |>
select(:meanX, :meanY, var = :b)
transform(y = 10 * :x) |>
where(:a .> 2) |>
by(:b, meanX = mean(:x), meanY = mean(:y)) |>
orderby(:meanX) |>
select(:meanX, :meanY, var = :b)
4×3 DataFrames.DataFrame
│ Row │ meanX │ meanY │ var │
├─────┼──────────┼─────────┼─────┤
4 changes: 2 additions & 2 deletions test/dict.jl
@@ -34,8 +34,8 @@ d2 = @select(d, :y, z = :y + :s, :e)
@test DataFramesMeta.with_helper(:df, :(f.(1))) == :(f.(1))
@test DataFramesMeta.with_helper(:df, :(f.(b + c))) == :(f.(b + c))
@test DataFramesMeta.with_helper(:df, Expr(:., :a, QuoteNode(:b))) == Expr(:., :a, QuoteNode(:b))
@test DataFramesMeta.select_helper(:df, 1).args[1].args[2].args[3] == 1
@test DataFramesMeta.select_helper(:df, QuoteNode(:a)).args[1].args[2].args[2].args[2].args[2].args[3].args[1] == :a
@test DataFramesMeta.select_helper(:df, 1).args[2].args[1].args[2].args[3] == 1
@test DataFramesMeta.select_helper(:df, QuoteNode(:a)).args[2].args[1].args[2].args[2].args[2].args[2].args[3].args[1] == :a
@test DataFramesMeta.expandargs(QuoteNode(:a)) == Expr(:kw, :a, QuoteNode(:a))
@test DataFramesMeta.byrow_find_newcols(:(;), Any[]) == (Any[], Any[])

Expand Down