Unexpected results when combining `on` with `keyby` or `by` #1943
Interesting. The problem in part 1 was in v1.9.6 as well, so it doesn't seem to be a regression.

    > d1[.(x="b"),by=y,max(z),on="x"] # v1.9.6
       y V1
    1: 1  4
    2: 1  7

Works in v1.9.6 with a keyed join instead:

    > setkey(d1,x)
    > d1["b",keyby=y,max(z)]
       y V1
    1: 0  7
    2: 1  4
    > d1["b",by=y,max(z)]
       y V1
    1: 1  4
    2: 0  7

And for completeness, just to check, part 2 gives
I was working my way through the `data.table` vignette "Secondary indices and auto indexing". To make it easier to compare original data and results, I replaced the non-minimal "flights" data with much smaller datasets throughout.

It worked fine until section 2f "Aggregation using `by`", where `on` is combined with `keyby`. There we find code to "Get the maximum departure delay for each month corresponding to `origin = "JFK"`. Order the result by `month`":

When I tried the equivalent code on two different data sets, two different issues appeared:
1. Label of the `keyby` and `by` variable identical for different levels

Translating the code from the vignette: Get the maximum `z` for each `y` corresponding to `x = "b"`. Order the result by `y`.

The label of the `keyby` variable `y` is erroneously `1` for the `0` level as well.

When using `by` instead of `keyby` together with `on`, the labels are also wrong:

Just to verify that the corresponding code without `on` works fine:

2. (a) Label of the `keyby` (or `by`) variable and the resulting value do not match. (b) Result not sorted by the `keyby` variable.

Again, the equivalent desired outcome: get the maximum `hp` for each `vs` corresponding to `am = 0`. Order the result by `vs`.

First, just look at the data corresponding to `am = 0` to make the desired result easier to spot:

When combining `keyby` and `on`, the result is not sorted and the labels don't match the values:

When combining `by` and `on`, the labels don't match the values:

Subsetting without `on` works fine:

So far I have not been able to discern any particular pattern in the data which generates these results (e.g. any particular order of `on` and/or `keyby` variables in the original and/or subset data).

Can you spot any mistakes in my code or is there something strange going on here?
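For anyone who wants to poke at part 1, here is a minimal sketch of the comparison described above. The contents of `d1` below are hypothetical stand-in data; only the column names `x`, `y`, `z` and the shape of the query come from this thread. The helper `same_result` is also just an illustrative name. The idea is to compute a reference answer with a plain logical subset and then check whether the `on` variants give the same labels and values.

```r
library(data.table)

# Hypothetical stand-in data: only the column names x, y, z come from the thread.
d1 <- data.table(
  x = c("a", "b", "b", "b", "a", "b"),
  y = c(0L, 1L, 0L, 1L, 1L, 0L),
  z = c(9L, 4L, 7L, 2L, 8L, 5L)
)

# Reference answer without `on`: plain logical subset, then grouped max.
ref <- d1[x == "b", .(V1 = max(z)), keyby = y]

# Same question asked through `on`, once with keyby and once with by.
res_keyby <- d1[.(x = "b"), .(V1 = max(z)), keyby = y, on = "x"]
res_by    <- d1[.(x = "b"), .(V1 = max(z)), by = y, on = "x"]

# Illustrative helper: compare group labels and values column by column.
same_result <- function(a, b) identical(a$y, b$y) && identical(a$V1, b$V1)

same_keyby <- same_result(ref, res_keyby)
# `by` keeps order of first appearance, so sort by y before comparing.
same_by <- same_result(ref, res_by[order(y)])

print(packageVersion("data.table"))
print(c(keyby = same_keyby, by = same_by))
```

If either flag comes back `FALSE` on a given `data.table` version, that would correspond to the label mismatch described in part 1; if both are `TRUE`, the `on` results agree with the plain subset for this particular toy data.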
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
data.table_1.9.8
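The column names `hp`, `vs` and `am` match R's built-in `mtcars` data. Assuming that is the data set behind part 2 (the post does not say so explicitly), a sketch like the following computes the expected maxima with base R and then checks the `on` + `keyby` result for both symptoms: labels that do not match their values, and a result that is not sorted by the `keyby` variable. It also prints the installed version, so the outcome can be compared against the session info above.

```r
library(data.table)

# Assumption: part 2 uses the built-in mtcars data (columns hp, vs, am).
dt <- as.data.table(mtcars)

# Expected answer from base R: max hp per vs among rows with am == 0.
sub <- mtcars[mtcars$am == 0, ]
expected <- tapply(sub$hp, sub$vs, max)  # named by vs level ("0", "1")

# Query under test: subset through `on`, aggregate with keyby.
res <- dt[.(am = 0), .(V1 = max(hp)), keyby = vs, on = "am"]

# (a) Do labels and values match? Look up each returned vs label in `expected`.
values_ok <- all(res$V1 == expected[as.character(res$vs)])

# (b) Is the result sorted by the keyby variable?
sorted_ok <- !is.unsorted(res$vs)

print(packageVersion("data.table"))
print(c(values_ok = values_ok, sorted_ok = sorted_ok))
```

Swapping `keyby = vs` for `by = vs` gives the corresponding check for the `by` case; there only `values_ok` is meaningful, since `by` does not promise sorted output.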