Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-Forge #5252] rbindlist should support expression columns #546

Closed
arunsrinivasan opened this issue Jun 8, 2014 · 10 comments · Fixed by #3811 · May be fixed by #4196
Closed

[R-Forge #5252] rbindlist should support expression columns #546

arunsrinivasan opened this issue Jun 8, 2014 · 10 comments · Fixed by #3811 · May be fixed by #4196
Assignees
Labels
dev feature request non-atomic column e.g. list columns, S4 vector columns
Milestone

Comments

@arunsrinivasan
Copy link
Member

Submitted by: Matt Dowle; Assigned to: Nobody; R-Forge link

From Jan Górecki on email :

library(data.table)
# desired solution
a <- data.table(c1 = 1, c2 = 'asd', c3 = expression(as.character(Sys.time())))
b <- data.table(c1 = 3, c2 = 'qwe', c3 = expression(as.character(Sys.time()+5)))
dt <- rbind(a,b)
#Error in rbindlist(allargs) : Unsupported column type 'expression'
dt <- rbindlist(list(a,b))
#Error in rbindlist(list(a, b)) : Unsupported column type 'expression'

# current workaround
a <- data.table(c1 = 1, c2 = 'asd', c3 =
list(expression(as.character(Sys.time()))))
b <- data.table(c1 = 3, c2 = 'qwe', c3 =
list(expression(as.character(Sys.time()+5))))
dt <- rbind(a,b)
dt <- rbindlist(list(a,b))
# both ok, but the desired solution would be:
dt[,list(c1,c2,eval(c3))]
# and it will not produce expected results because of previous workaround
# another workaround must be done:
dt[,list(c1,c2,lapply(c3,eval))]
@jangorecki
Copy link
Member

could be handled by more generic solution described in #1302

@MichaelChirico
Copy link
Member

Looked into this a bit. Actually not straightforward because SIZEOF(EXPRSXP) = 0 -- I'm not sure how to deal with such object. Actually the error message is different than in the original post:

a <- data.table(c1 = 1, c2 = 'asd', c3 = expression(as.character(Sys.time())))
b <- data.table(c1 = 3, c2 = 'qwe', c3 = expression(as.character(Sys.time()+5)))
dt <- rbind(a,b)

Error in rbindlist(l, use.names, fill, idcol) :
(list) object cannot be coerced to type 'logical'

I guess that looks a bit puzzling, but it's because EXPRSXP is sort-of-list -- we can use VECTOR_ELT on it and its elements are LANGSXP. And because SIZEOF(EXPRSXP)=0, maxType is LGLSXP, so coercion is attempted from EXPRSXP->LGLSXP.

If we change the default maxType to be EXPRSXP, we'll get a different error, this time from assign.c:memrecycle which doesn't support EXPRSXP, and further it's not clear to me how to fix memrecycle to support EXPRSXP.

Any ideas @jangorecki? Or should we wontfix?

@jangorecki
Copy link
Member

jangorecki commented Sep 2, 2019

I think we agree on API of such complex structure, in was discussed in #1302 and related PR.
They have to be wrapped in a list, so c-like operation (which is what rbind/rbindlist is meant to do for corresponding columns) works for them.
Below example shows it is actually consistent.

a <- data.table(c1 = 1, c2 = 'asd', c3 = expression(as.character(Sys.time())))
b <- data.table(c1 = 3, c2 = 'qwe', c3 = expression(as.character(Sys.time()+5)))
c(a$c3, b$c3)
#expression(as.character(Sys.time()), as.character(Sys.time() + 5))
# this is not what we wanted!
c(list(a$c3), list(b$c3))
# this one is!

so the proper way is to prohibit to create expression column that is not wrapped into list column.

@MichaelChirico
Copy link
Member

Playing around with this now a bit more & seems possible & not too difficult actually.

@jangorecki
Copy link
Member

jangorecki commented Dec 22, 2019

This issue has not been resolved by #3981.
Expected output from the first post in this issue is

data.table(c1 = c(1, 3), c2 = c("asd", "qwe"), c3 = list(
  expression(as.character(Sys.time())), expression(as.character(Sys.time() + 5))
))

while the PR #3811 produce the following output

data.table(c1 = c(1, 3), c2 = c("asd", "qwe"), c3 = expression(
    as.character(Sys.time()), as.character(Sys.time() + 5)
))

It was reported by me in PR which has been merged without further discussion.

@jangorecki jangorecki reopened this Dec 22, 2019
@mattdowle
Copy link
Member

mattdowle commented Dec 22, 2019

All I remember seeing was that you thought it wouldn't be needed by anyone (#3811 (review)). I saw it wasn't marked WIP, was passing tests, so I merged it to make progress and out of the way. @MichaelChirico, are you able to see a solution?

@sritchie73
Copy link
Contributor

I think I have a working solution:

  • In src/init.c instead of initializing the typeorder array to 0, initialize it to 7 (or any other number > typeorder[VECSXP]
  • In src/rbindlist.c add an if statement that checks whether TYPEORDER(thisType) > TYPEORDER(VECSXP), and if so sets maxType to VECSXP

This means any type not listed in the typeorder in src/init.c (e.g EXPRSXP) will be coerced into a list.

As an aside, I think we could safely reduce the sizes and typeorder arrays from length 100 to 32 as there can only be 32 possible SEXPTYPES unless they increase the number of bits in the sxpinfo header (currently 5 bits reserved for SEXPTYPE)

@sritchie73
Copy link
Contributor

Actually it looks like coerceVector(EXPRSXP, VECSXP) does not handle the conversion correctly, it returns a list of language types instead of expression types:

> sapply(rbind(A,B)$c3, typeof)
[1] "language" "language"
> rbind(A,B)$c3
[[1]]
as.character(Sys.time())

[[2]]
as.character(Sys.time() + 5)

Target behaviour:

> sapply(c(list(A$c3), list(B$c3)), typeof)
[1] "expression" "expression"
> c(list(A$c3), list(B$c3))
[[1]]
expression(as.character(Sys.time()))

[[2]]
expression(as.character(Sys.time() + 5))

@sritchie73
Copy link
Contributor

sritchie73 commented Jan 23, 2020

I'm confused by the requested behaviour

Currently, rbindlist is consistent with base R:

> a <- data.table(c1 = 1, c2 = 'asd', c3 = expression(as.character(Sys.time())))
> b <- data.table(c1 = 3, c2 = 'qwe', c3 = expression(as.character(Sys.time()+5)))
> dt <- rbind(a,b)
> all.equal(dt$c3, c(a$c3, b$c3))
[1] TRUE

This issue requests that the expression columns be coerced to lists of expressions, but this only makes sense if all input data.tables have nrow=1.

For example, if someone has a data.table with nrow > 1, then any expression column will have been defined using the base R behaviour illustrated above:

> c = data.table(c1 = c(1,3), c2 = c('asd', 'qwe'), c3=c(expression(as.character(Sys.time())), expression(as.character(Sys.time()+5))))
> c
   c1  c2                                                              c3
1:  1 asd expression(as.character(Sys.time()), as.character(Sys.time() +
2:  3 qwe                                                             5))

If we change the behaviour of rbindlist, what is the expected behaviour when binding this data.table? Should we keep the current behaviour, coercing to a composite expression, or put each element in a list as is the requested behaviour?

@jangorecki
Copy link
Member

jangorecki commented Mar 27, 2020

After reading expression manual more carefully I spotted that it is well defined there as

R expression vector is a list of calls, symbols etc

And then we can also observe:

vector("expression", 3)
#expression(NULL, NULL, NULL)

Therefore it make sense to not wrap expression in extra list column, but process it as a vector, so c directly.
This is only valid for expression, but not for language or symbols which should be still wrapped into list. or eventually into expression instead. IMO we don't want to maintain wrapping into expression, lets leave it to users, as they may actually prefer to have them inside the list objects instead.
Current unit test covers it well, so issue can be closed. Thanks Michael!

@jangorecki jangorecki added the non-atomic column e.g. list columns, S4 vector columns label Apr 5, 2020
@jangorecki jangorecki modified the milestones: 1.12.11, 1.12.9 May 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dev feature request non-atomic column e.g. list columns, S4 vector columns
Projects
None yet
5 participants