Discussion regarding the use of broadcasting . #165

Closed
xiaodaigh opened this issue Aug 31, 2020 · 8 comments
@xiaodaigh
Contributor

I like chaining operations together when using DataFramesMeta, but I find that I have to put dots everywhere to broadcast. The @. macro doesn't seem to compose well with Pipe.jl and DataFrames.jl.

using DataFrames, DataFramesMeta, Pipe
@pipe bureau_bal |>
        @where(_, :STATUS .!= "C") |>
        @where(_, :MONTHS_BALANCE .> -11) |>
        @transform(_, STATUS = replacex1.(:STATUS)) |>
        groupby(_, :SK_ID_BUREAU)

Been asked to put it here for discussion. Any ideas?

@pdeffebach
Collaborator

The @. macro works just fine when placed on the other side of the equals sign

df = DataFrame(rand(2,2))

julia> @transform(df, y = @. :x1 + :x2)
2×3 DataFrame
│ Row │ x1       │ x2       │ y       │
│     │ Float64  │ Float64  │ Float64 │
├─────┼──────────┼──────────┼─────────┤
│ 1   │ 0.862046 │ 0.728847 │ 1.59089 │
│ 2   │ 0.834787 │ 0.231523 │ 1.06631 │

julia> @where(df, @. :x1 > :x2)
2×2 DataFrame
│ Row │ x1       │ x2       │
│     │ Float64  │ Float64  │
├─────┼──────────┼──────────┤
│ 1   │ 0.862046 │ 0.728847 │
│ 2   │ 0.834787 │ 0.231523 │

It can't be used before @transform because DataFrames are broadcastable so this would be inconsistent with Base.

It also can't be used before the = sign because we are creating new columns, not performing broadcasted assignment. Maybe it's worth adding this feature for overwriting existing columns. This might be complicated, though; implementing broadcasting is in general a bit complex.

We will see how easy this is to do after we use DataFrames.transform as the backend.
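To illustrate the point about the = sign, here is a minimal sketch in plain Julia (no DataFrames involved; the variable names are made up for illustration). On the right-hand side, @. only dots the function calls. When it wraps an assignment, it also turns = into .=, which writes into an existing array rather than creating a new one.

```julia
a = [1.0, 2.0]
b = [3.0, 4.0]

# @. on the right-hand side: equivalent to a .+ b, allocates a new vector.
y = @. a + b

# @. around the assignment: equivalent to buf .= a .+ b,
# so buf must already exist and is overwritten in place.
buf = zeros(2)
@. buf = a + b
```

This is why a broadcasted assignment only makes sense for a column that already exists.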

@xiaodaigh
Contributor Author

Could we have a macro like

@broadcast true begin
    @transform(...)
end

so that everything inside the block is broadcast automatically?

@pdeffebach
Collaborator

I think we would use a @row macro instead,

@transform(df, @row y = :x + :z)

I am also thinking of having separate functions for the two behaviors, something like @genvec and @genrow where the only difference is whether or not functions are wrapped in ByRow.
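For reference, the two behaviors can be sketched directly with DataFrames.transform, assuming a DataFrames.jl version that provides ByRow. The data frame and column names below are illustrative only, and this is not the proposed DataFramesMeta implementation.

```julia
using DataFrames

df = DataFrame(x = [1, 2, 3], z = [10, 20, 30])

# Column semantics: the function receives whole vectors,
# so it has to broadcast explicitly.
by_col = transform(df, [:x, :z] => ((x, z) -> x .+ z) => :y)

# Row semantics: ByRow wraps a scalar function and applies it row by row.
by_row = transform(df, [:x, :z] => ByRow(+) => :y)
```

Both calls produce the same y column here; the only difference is whether the function is wrapped in ByRow.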

@xiaodaigh
Contributor Author

This doesn't work

@where(df, @. :x1 > :x2,  :x1 > :x3)
@where(df, @.(:x1 > :x2), @.(:x1 > :x3))

> @transform(df, @row y = :x + :z)

Would this be slower than column semantics?

@pdeffebach
Collaborator

Please post the error message, because it works for me.

julia> df = DataFrame(rand(100, 2));

julia> @where(df, @.(:x1 > .6), @.(:x2 < .5));

In general @transform probably wouldn't be that much slower. The internal implementation relies on Tables.namedtupleiterator, so it's looping through named tuples and will probably be quite fast.

The benefit of moving to the new backend is that we will have to care much less about performance in DataFramesMeta. As long as ByRow is fast DataFramesMeta will be fast.

@xiaodaigh
Contributor Author

xiaodaigh commented Sep 15, 2020

You are right. It works. I suspect something in my package environment messed it up.

@pdeffebach pdeffebach added this to the 1.X milestone Mar 7, 2021
@pdeffebach
Collaborator

Marking as 1.X because I still want to add a @ByRow macro, since broadcasting doesn't cover all scenarios.

@pdeffebach
Collaborator

Closed with the addition of @rtransform etc. in #267
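A minimal sketch of the row-wise macros, assuming a DataFramesMeta release that includes @rtransform and @rsubset and uses the :y = ... syntax for new columns. The data frame is made up for illustration.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(x = [1, 2, 3], z = [10, 20, 30])

# Row-wise transform: the expression is written for a single row, no dots needed.
out = @rtransform(df, :y = :x + :z)

# Row-wise filtering: a row is kept only if every condition holds.
kept = @rsubset(df, :x > 1, :z < 30)
```

With these macros, the expressions in a chained pipeline can be written in scalar form instead of dotting every call.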
