data.table not compatible with ggplot when it was generated from empty data.table #4597

Closed
matthewgson opened this issue Jul 11, 2020 · 2 comments · Fixed by #4609

matthewgson commented Jul 11, 2020

Hi,

I think I found a case where a data.table does not work with ggplot2.
Here's an example:

library(data.table)
library(ggplot2)

dt = data.table() # start from a null (empty) data.table
dt[, x := seq(1,100)]
dt[, y := seq(100,1)]
head(dt)
   x   y
1: 1 100
2: 2  99
3: 3  98
4: 4  97
5: 5  96
6: 6  95

# generate ggplot

ggplot(dt, aes(x,y)) + geom_point()

Error in `$<-.data.frame`(x, name, value) : 
  replacement has 1 row, data has 0

A workaround is:

dt %>% setDF %>% setDT # needs magrittr's %>%; converts to data.frame and back

ggplot(dt, aes(x,y)) + geom_point() # now it works

which doesn't seem convenient.

I'm not certain whether this comes from data.table or ggplot2, since plot(dt) works fine, but I guessed it was on the data.table side because setDT somehow resolves the issue. Or was it naive of me to build a table up from an empty data.table?

ColeMiller1 commented Jul 11, 2020

Thanks for the report - I can reproduce. Even more minimal, the issue occurs without any aes:

library(data.table)
library(ggplot2)

dt = data.table()
dt[, x := seq(1,100)]
dt[, y := seq(100,1)]

ggplot(dt)
#> Error in `$<-.data.frame`(x, name, value): replacement has 1 row, data has 0

It appears that the main issue is that our dt does not get row names after assignment. We can manually work around this by setting the attribute:

row.names(dt)
# character(0)
setattr(dt, "row.names", 1:100)
ggplot(dt, aes(x, y)) + geom_point()
## success!
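
The manual fix above can be wrapped so the row count is not hard-coded. This is only a sketch: `repair_row_names` is a hypothetical helper name, not part of data.table, and it assumes that comparing the length of the "row.names" attribute with `nrow(dt)` is enough to detect the mismatch described here.

```r
library(data.table)

# Hypothetical helper (not a data.table function): reset the "row.names"
# attribute by reference when it disagrees with the number of rows.
repair_row_names <- function(dt) {
  stopifnot(is.data.table(dt))
  n <- nrow(dt)  # data.table derives this from the length of the first column
  if (length(attr(dt, "row.names")) != n) {
    setattr(dt, "row.names", seq_len(n))
  }
  invisible(dt)
}

dt <- data.table()
dt[, x := seq(1, 100)]
dt[, y := seq(100, 1)]

repair_row_names(dt)
length(attr(dt, "row.names")) == nrow(dt)
```

On a data.table whose row names are already consistent the helper does nothing, so calling it before plotting should be harmless.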

It seems like we need to either prevent users from starting from an empty data.table, or update row.names on assignment whenever the number of rows assigned is greater than the current number of rows of dt. A null.data.table seems to be the main way to hit this issue, but here are some other edge cases:

## 0 row data table with assignment
dt = data.table(x = integer())
dt[, y := 1:2][]
#Empty data.table (0 rows and 2 cols): x,y

## 1 row data.table with assignment
dt = data.table(x = 1L)
dt[, y := 1:2]

#Error in `[.data.table`(dt, , `:=`(y, 1:2)) : 
#  Supplied 2 items to be assigned to 1 items of column 'y'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

## 2 row data.table with zero-length assignment
dt = data.table(x = 1:2)
dt[, y := integer()][]

##    x  y
## 1: 1 NA
## 2: 2 NA

@ColeMiller1
Contributor

It looks like allowing a null.data.table to be updated is intended:

data.table/src/assign.c

Lines 321 to 323 in a347623

const int nrow = LENGTH(dt) ? length(VECTOR_ELT(dt,0)) :
(isNewList(values) && length(values) ? length(VECTOR_ELT(values,0)) : length(values));
// ^ when null data.table the new nrow becomes the fist column added

The solution seems to be to set row names in assign when dt is a null.data.table. So my next step is figuring out how to do that in C.
