dcast.data.table should be able to handle multiple aggregation functions #716

Closed
arunsrinivasan opened this issue Jul 3, 2014 · 5 comments

@arunsrinivasan
Member

Aggregation is not cheap (esp. on large data) and it'd be great to have multiple aggregations done in the same cast function call. Roughly I'm thinking it should be something like:

dcast.data.table(DT, formula, fun.aggregate = 
       list( prefix1 = mean(., na.rm=TRUE), 
             prefix2 = sum(.),  # or simply sum
             prefix3 = function(.) length(unique(.)) ))

where prefix* would be the string that'd be attached to the cast columns.

@arunsrinivasan
Member Author

Implemented in commit fc753c2 in 1.9.8. Will merge and add news later.

@arunsrinivasan
Member Author

  • Simplify the fun.aggregate argument
  • Allow value.var to be a list instead
  • Update documentation

Final setting should be more or less:

require(data.table)
DT = data.table(x=1:5, y=paste("v", 1:5, sep=""), 
                        v1=6:10, v2=11:15, 
                        k1=letters[1:5], k2=letters[6:10])

# current syntax if we'd like to sum v1,v2 cols and paste k1,k2 cols
# writing it separately for clarity
DT.m = melt(DT, id=1:2, measure=list(3:4, 5:6))
paste_ = function(x) paste(x, collapse="")
dcast.data.table(DT.m, x ~ y, fun.aggregate = 
    funs(.(sum, vars="value1"), .(paste_, vars="value2")), value.var=c("value1", "value2"))

which is a lot of "bla" and redundancy. Instead it could be:

dcast.data.table(DT.m, x ~ y, fun.aggregate = 
    list(sum, function(x) paste(x, collapse="")), value.var=list("value1", "value2"))

Idea: If value.var is a character vector or a length-1 list, then fun.aggregate is applied to all of those columns. Otherwise, if value.var is a list and length(value.var) equals the number of functions, each function is applied to the corresponding element of value.var.
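
To make the pairing rule above concrete, here is a rough base-R sketch. The helper pair_funs_vars is hypothetical (it is not part of data.table and is not how dcast is implemented); it only shows which function would be matched with which column under the rule as stated.

```r
# Hypothetical helper, NOT part of data.table: returns which function (by
# position in `funs`) would be applied to which column under the stated rule.
pair_funs_vars <- function(funs, value.var) {
  if (is.function(funs)) funs <- list(funs)
  if (is.character(value.var) || length(value.var) == 1L) {
    # Character vector or length-1 list: every function gets every column.
    vars <- unlist(value.var, use.names = FALSE)
    data.frame(fun = rep(seq_along(funs), each = length(vars)),
               var = rep(vars, times = length(funs)),
               stringsAsFactors = FALSE)
  } else if (is.list(value.var) && length(value.var) == length(funs)) {
    # One list element per function: function i gets the columns in value.var[[i]].
    data.frame(fun = rep(seq_along(funs), times = lengths(value.var)),
               var = unlist(value.var, use.names = FALSE),
               stringsAsFactors = FALSE)
  } else {
    stop("value.var must be a character vector, a length-1 list, ",
         "or a list with one element per function")
  }
}

paste_ <- function(x) paste(x, collapse = "")
all_pairs <- pair_funs_vars(list(sum, paste_), c("value1", "value2"))
one_each  <- pair_funs_vars(list(sum, paste_), list("value1", "value2"))
```

With the character vector, both functions are paired with both columns (four pairs); with the two-element list, sum is paired only with value1 and paste_ only with value2.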

@arunsrinivasan arunsrinivasan modified the milestones: v1.9.6, v1.9.8 Mar 5, 2015
@arunsrinivasan
Member Author

Done.

dcast.data.table(DT.m, x ~ y, fun.aggregate = 
    list(sum, function(x) paste(x, collapse="")), value.var=list("value1", "value2"))
#    x v1_sum_value1 v2_sum_value1 v3_sum_value1 v4_sum_value1 v5_sum_value1 v1_function_value2
# 1: 1            17             0             0             0             0                 af
# 2: 2             0            19             0             0             0                   
# 3: 3             0             0            21             0             0                   
# 4: 4             0             0             0            23             0                   
# 5: 5             0             0             0             0            25                   
#    v2_function_value2 v3_function_value2 v4_function_value2 v5_function_value2
# 1:                                                                            
# 2:                 bg                                                         
# 3:                                    ch                                      
# 4:                                                       di                   
# 5:                                                                          ej

@franknarf1
Contributor

Any chance of adding an option to have the columns in the result alternate, like v1_sum_value1, then v1_sum_value2, etc.? I used the multi-arg value.var over on SO and couldn't find a nice way to reorder the columns like that: http://stackoverflow.com/a/32570910/1191259

@UweBlock
Contributor

I would like to second @franknarf1. Here is another case on SO where an option to reorder the columns in an alternating way would have been handy: http://stackoverflow.com/a/43650705/3817004
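
For illustration, one possible workaround for the alternating order requested above, sketched in base R. The helper alternate_cols is hypothetical, and it assumes the cast column names start with the y level followed by an underscore, as in the output posted earlier in this thread (so the y levels themselves must not contain an underscore).

```r
# Hypothetical helper, not part of data.table: computes a column order in
# which all cast columns for the same y level sit next to each other.
alternate_cols <- function(nms, id.cols) {
  cast <- setdiff(nms, id.cols)
  prefix <- sub("_.*$", "", cast)   # text before the first underscore
  # Sort by first appearance of each prefix; the second key keeps the
  # original order within a prefix.
  ord <- order(match(prefix, unique(prefix)), seq_along(cast))
  c(id.cols, cast[ord])
}

# Column names as they appear in the result shown earlier in the thread.
nms <- c("x",
         paste0("v", 1:5, "_sum_value1"),
         paste0("v", 1:5, "_function_value2"))
new_order <- alternate_cols(nms, "x")
```

The resulting character vector could then be passed to data.table's setcolorder(), e.g. setcolorder(res, new_order), where res stands for the cast result.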
