Key breaks `by` functionality #1704

Closed
DavidArenburg opened this Issue May 16, 2016 · 3 comments

Member

DavidArenburg commented May 16, 2016

I'm not entirely sure what causes this, so here is the smallest working example I could find:

library(data.table) # Tested on v 1.9.7
dt <-  data.table( origin = c("A", "A", "A", "A", "A", "A", "B", "B", "A", "A", "C", "C", "B", "B", "B", "B", "B", "C", "C", "B", "A", "C", "C", "C", "C", "C", "A", "A", "C", "C", "B", "B"),
                   destination = c("A", "A", "A", "A", "B", "B", "A", "A", "C", "C", "A", "A", "B", "B", "B", "C", "C", "B", "B", "A", "B", "C", "C", "C", "A", "A", "C", "C", "B", "B", "C", "C"),
                   points_in_dest = c(5, 5, 5, 5, 4, 4, 5, 5, 3, 3, 5, 5, 4, 4, 4, 3, 3, 4, 4, 5, 4, 3, 3, 3, 5,5, 3, 3, 4, 4, 3, 3),
                   depart_time = c(7, 8, 16, 18, 7, 8, 16, 18, 7, 8, 16, 18, 7, 8, 16, 7, 8, 16, 18, 8, 16, 7, 8, 18, 7, 8, 16, 18, 7, 8, 16, 18),   
                   travel_time = c(0, 0, 0, 0, 70, 10, 70, 10, 10, 10, 70, 70, 0, 0, 0, 70, 10, 10, 70, 70, 10, 0, 0, 0, 10, 70, 10, 70, 10, 70, 70, 10) )

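# Flag early departures (condition1) and late departures (condition2) with short travel times
dt[depart_time <= 8  & travel_time < 60, condition1 := TRUE]
dt[depart_time >= 16 & travel_time < 60, condition2 := TRUE]

setkey(dt, origin, destination)
# Join the deduplicated condition1 rows to the deduplicated condition2 rows,
# matching each route with its reverse
res <- unique(dt[(condition1)])[unique(dt[(condition2)]),
                                on = c(destination = "origin", origin = "destination"),
                                nomatch = 0L]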
res[, .(points = sum(points_in_dest)),  keyby = origin]
#    origin points
#1:      A      5
#2:      A      4
#3:      B      4
#4:      B      3
#5:      C      5
#6:      C      4
#7:      C      3

As you can see, `by` did not group as intended: each origin appears in several rows instead of being aggregated into one. It looks like a keying problem, because removing the key from res fixes it:

setattr(res, "sorted", NULL)
res[, .(points = sum(points_in_dest)), keyby = origin]
#    origin points
#1:      A      9
#2:      B      7
#3:      C     12

Alternatively, coercing origin to a factor also works:

res[, .(points = sum(points_in_dest)), keyby = factor(origin)]
#    factor points
#1:      A      9
#2:      B      7
#3:      C     12
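
Both workarounds are consistent with res carrying a "sorted" attribute that no longer matches its actual row order, though that is only my guess at the cause. Below is a minimal sketch of how one might check for such a mismatch. The helper `key_matches_order` and the `toy` table are hypothetical names made up for illustration, not part of data.table, and the stale key is set by hand here rather than produced by the join above.

```r
library(data.table)

# Hypothetical helper (not part of data.table): TRUE when the columns named
# in the "sorted" attribute really are in ascending order.
key_matches_order <- function(x) {
  key_cols <- attr(x, "sorted", exact = TRUE)
  if (is.null(key_cols)) return(TRUE)  # no key, so nothing to contradict
  cols <- unname(as.list(x)[key_cols])
  # radix ordering is stable and uses the C locale
  ord <- do.call(order, c(cols, list(method = "radix")))
  identical(ord, seq_len(nrow(x)))
}

toy <- data.table(origin = c("B", "A", "C", "A"), points = c(4, 5, 3, 4))
setattr(toy, "sorted", "origin")   # deliberately stale key, for illustration only
stale <- key_matches_order(toy)    # the rows are not sorted by origin

setattr(toy, "sorted", NULL)       # same workaround as above: drop the key
fixed <- toy[, .(points = sum(points)), keyby = origin]
```

Running `key_matches_order(res)` on the join result should show whether the key on res is stale in the same way.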

This example was taken from this Stack Overflow question: http://stackoverflow.com/questions/37239649/aggregate-data-table-based-on-condition-in-another-row

@arunsrinivasan arunsrinivasan added this to the v1.9.8 milestone May 16, 2016

Member

arunsrinivasan commented May 16, 2016

Very nice example. Will fix. Thanks.

Contributor

MichaelChirico commented May 16, 2016

gotta say, that's a creative way to spell functionality!

Member

DavidArenburg commented May 16, 2016

Fixed....

@arunsrinivasan arunsrinivasan self-assigned this Jul 21, 2016

arunsrinivasan added a commit that referenced this issue Jul 21, 2016
