Empty data.table produced with .SD when grouping by all columns #3262

st-pasha · 2019-01-09T19:00:01Z

> DT = data.table(A=c(1,2,1,2,1,2), B=c(1,2,1,1,2,2))
> DT[, .SD, by=.(A)]  # As expected
   A B
1: 1 1
2: 1 1
3: 1 2
4: 2 2
5: 2 1
6: 2 2
> DT[, .SD, by=.(A, B)]  # not expected
Empty data.table (0 rows) of 2 cols: A,B

Likewise,

> DT[, .SD, by=.(A+B)]  # not expected
Empty data.table (0 rows) of 1 col: A

Henrik-P · 2019-01-09T19:30:40Z

If I understand this correctly, it may seem consistent with the help text:

.SD is a data.table containing the Subset of x's Data for each group, excluding any columns used in by

...with the part "excluding any columns used in by" being critical here. Thus, I don't think it's grouping by multiple columns per se that causes the empty data set, but grouping by all columns:

d <- data.table(x = 1, y = 2, z = 3, w = 4)

d[ , names(.SD), by = .(x)]$V1
# [1] "y" "z" "w"

d[ , names(.SD), by = .(x, y)]$V1
# [1] "z" "w"

d[ , names(.SD), by = .(x, y, z)]$V1
# [1] "w"

d[ , names(.SD), by = .(x, y, z, w)]$V1
# character(0)

Possibly related issue: Columns appearing in the function in by= disappers in j #1427

st-pasha · 2019-01-09T20:54:54Z

@Henrik-P You're right that this is closely related to #1427.
And you're right that .SD becomes empty when all columns are used up in the groupby.
Still, I feel the final result is incorrect: the columns used in by are supposed to be implicitly added to the front of the j result even when that j is an empty data.table.
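A minimal sketch of the result described above, assuming the data.table package is installed and reusing DT from the original report. It avoids the empty .SD by collecting row numbers with .I per group; the column name `row` is chosen here purely for illustration, and this is a workaround sketch, not how data.table itself would implement a fix.

```r
library(data.table)

DT <- data.table(A = c(1, 2, 1, 2, 1, 2), B = c(1, 2, 1, 1, 2, 2))

# .N does not depend on .SD, so the grouping columns survive:
# one row per distinct (A, B) pair, in order of first appearance.
counts <- DT[, .N, by = .(A, B)]

# Collect the original row numbers per group (`row` is an illustrative name),
# then index DT with them to get every row back, ordered by group.
idx <- DT[, .(row = .I), by = .(A, B)]
expected <- DT[idx$row]
```

Here `expected` keeps all six rows with columns A and B, which is the shape the grouped .SD call would need to return if the by columns were prepended to an empty j result.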

r2evans · 2020-07-28T15:40:48Z

I think there should be a distinction between "excluding any columns used in by" and "0 columns". I think it's perfectly valid to have 0 columns and some rows. This is actually not unique to .SD:

data.frame(a=1:5)[,0]
# data frame with 0 columns and 5 rows
data.table(a=1:5)[,0]
# Null data.table (0 rows and 0 cols)

jangorecki · 2020-07-28T18:08:42Z

Whether it is valid to have rows and 0 columns depends on how you internally define the structure of your data.
Databases do not allow rows and 0 columns at the same time.
Having rows but no columns is more the expected behaviour of a matrix/array, not of a (db) table, which a data.frame corresponds to closely. There was a good discussion about that behaviour already.
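For reference, a small base-R sketch (no data.table involved) of the "rows but no columns" shape under discussion. The object names `df0` and `m0` are illustrative only.

```r
# Base R only: both a data.frame and a matrix can carry rows without columns.
df0 <- data.frame(a = 1:5)[, 0]    # select zero columns, keep the row count
m0  <- matrix(nrow = 5, ncol = 0)  # a 5 x 0 matrix

dim(df0)
dim(m0)
```

Both objects report five rows and zero columns, so the row count is stored independently of any column.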

r2evans · 2020-07-28T19:36:46Z

I'll look for the previous discussions, I'm not surprised they are around (but I didn't find them).

st-pasha added the bug label Jan 9, 2019

mattdowle removed the bug label Jan 10, 2019

mattdowle changed the title ~~Empty data.table produced with .SD when grouping by multiple columns~~ Empty data.table produced with .SD when grouping by all columns Jan 10, 2019

mattdowle mentioned this issue Jan 11, 2019

by= ignored if there is no j expression #3263

Closed

ColeMiller1 mentioned this issue Jun 23, 2020

Columns appearing in the function in by= disappers in j #1427

Open
