Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cumsum function in mutate does not return values as expected on grouped dataframe #105

Closed
yahrMason opened this issue Jul 13, 2023 · 4 comments

Comments

@yahrMason
Copy link

I am looking to implement a cumulative sum over a column, by group. I am using the following Tidier.jl code, but it does not return what I expect.

using DataFrames, Tidier

df = DataFrame(
    grp = ["one", "one", "one", "two", "two", "two"],
    t = [1,2,3,1,2,3],
    x = [4,5,6,8,1,2]
)

df2 = @chain df begin
    @group_by(grp)
    @mutate(x_cumsum = cumsum(x))
    @ungroup
end

Returns a dataframe that looks like this (apologies for horrid formatting)

Row │ grp t x x_cumsum
│ String Int64 Int64 Array…
───┼────────────────────────────────
1 │ one 1 4 fill(4)
2 │ one 2 5 fill(5)
3 │ one 3 6 fill(6)
4 │ two 1 8 fill(8)
5 │ two 2 1 fill(1)
6 │ two 3 2 fill(2)

@yahrMason yahrMason changed the title cumsum function does not return values as expected cumsum function in mutate does not return values as expected on grouped dataframe Jul 13, 2023
@kdpsingh
Copy link
Member

I think this is an auto-vectorization issue.

Can you try adding a tilde in front of cumsum() like this?

df2 = @chain df begin
    @group_by(grp)
    @mutate(x_cumsum = ~cumsum(x))
    @ungroup
end

If that works, I can update the code to remove the need for a tilde. Let me know. Thanks for reporting!

@yahrMason
Copy link
Author

I can confirm that adding the tilde in front of the cumsum now works as intended. Thanks!

@kdpsingh
Copy link
Member

Great, we'll get this fixed soon so that it will work both with and without the tilde.

What's happening is that cumsum() is being auto-vectorized to cumsum.() by Tidier. We have an array of exclusions that include which functions not to vectorize.

I'll add it to that list of exclusions, and in the longer term, the plan is to make that list modifiable by end-users so they can register other functions that they don't want to see auto-vectorized by Tidier in their workflow.

@kdpsingh
Copy link
Member

This is now fixed on the GitHub version of Tidier. cumsum(), cumprod(), and accumulate() no longer require a tilde. Will get this update on the registry shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants