[R-Forge #2696] change data.table by-without-by syntax to require a "by" #371
Submitted by: Eduard Antonyan; Assigned to: Nobody; R-Forge link
This request stems from the following SO thread:
Old-timers, and of course the author of the package, are probably so used to this behavior that the following examples may not seem unusual to them. I'll do my best to show the progression of results that someone relatively new to the package would expect (I've been using it for about a month now, and love it so far), and how the current syntax breaks those expectations and forces a newcomer into extensive investigation to figure out what's going on.
    d = data.table(a = c(1,1,1,2,2,3,4), b = c(1,1,3,4,5,6,7), c = 1:7, key = "a")
    t = data.table(a = c(1,2), key = "a")
    z = data.table(a = 3, key = "a")

    # first, the set up - getting to know data.table
    # i,j,by syntax and running a few commands
    d[6]
    #   a b c
    #3: 3 6 6
    d[6, a]
    #[1] 3
    d[6, b]
    #[1] 6
    d[a <= 2]
    #   a b c
    #1: 1 1 1
    #1: 1 1 2
    #1: 1 3 3
    #2: 2 4 4
    #2: 2 5 5
    d[a <= 2, sum(c)]
    #[1] 15
    d[a <= 2, sum(c), by = a]
    #   a V1
    #1: 1  6
    #2: 2  9

    # ok, so with the above set-up, let's do some merges and see what the results are
    # (together with what I contend the results *should* be with that syntax)
    d[z]
    #   a b c
    #3: 3 6 6
    d[z, a]
    #   a a
    #1: 3 3
    # "should" be
    #[1] 3
    # to get the above result, one "should" type instead
    d[z, a, by = a]

    d[z, b]
    #   a b
    #1: 3 6
    # "should" be
    #[1] 6

    d[t]
    # prints same output as d[a <= 2]

    d[t, sum(c)]
    # prints same output as d[a <= 2, sum(c), by = a]
    # "should" print same output as d[a <= 2, sum(c)]

    d[t, sum(c), by = a]
    # complains and prints same output as above ("should" not complain, and should silently do the by-without-by, for speed reasons, internally)

    d[t, sum(c), by = b]
    # no complaints and does exactly what one would expect, i.e. same as d[a <= 2, sum(c), by = b]
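To make the two intents concrete, here is a minimal sketch that separates the join from the aggregation by chaining. This way the result does not depend on how `d[t, j]` is interpreted. It assumes the data.table package is installed and rebuilds `d` and `t` from the example above. The names `joined`, `total` and `per_group` are mine, for illustration only.

```r
library(data.table)

# Same tables as in the example above
d <- data.table(a = c(1,1,1,2,2,3,4), b = c(1,1,3,4,5,6,7), c = 1:7, key = "a")
t <- data.table(a = c(1,2), key = "a")

# Step 1: keyed join only, no j expression.
# This keeps the five rows of d whose a is 1 or 2.
joined <- d[t]

# Step 2a: a single sum over all joined rows,
# i.e. what I'd expect d[t, sum(c)] to mean.
total <- joined[, sum(c)]

# Step 2b: one sum per value of a, requested with an explicit by.
per_group <- joined[, sum(c), by = a]

print(total)
print(per_group)
```

Chaining costs an intermediate table, so it is not a substitute for the internal by-without-by optimization. It only shows which of the two results each spelling is meant to produce.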
I can see how this may not seem obviously off to someone who's been relying on the current behavior for a while, but please believe me when I say that for someone who's just getting to know the package, the current behavior makes no sense. Yes, it's documented in no fewer than 3 FAQ points (which seems to indicate that this syntax is a stumbling block not just for me), but that doesn't make it any less unintuitive.
The above completely breaks the reading of
Let me be very clear - I love