Create an efficient unnest function #2146

Open
statquant opened this issue May 7, 2017 · 6 comments · May be fixed by #4156
Labels
feature request top request One of our most-requested issues

Comments

@statquant

statquant commented May 7, 2017

Similar to tidyr::unnest, data.table would benefit from a fast unnest function.
I found

Is there something canonical?
If not, that's a FR!

@MichaelChirico
Member

MichaelChirico commented Jun 2, 2017

Also:

https://stackoverflow.com/questions/44336733/
https://stackoverflow.com/q/48831637/3576984


I'd also suggest making such a function more flexible than unnest. Building on the example in the linked SO question, consider:

dt1 <- data.table(
  colA=   c('A1','A2','A3'), 
  colB=list('B1',c('B2a','B2b'),'B3'),
  colC=list(c('C1a', 'C1b'),'C2','C3'), 
  colD=   c('D1','D2','D3')
)

tidyr::unnest(dt1) throws an error, but a "cross-join" within each row whose list columns have mismatched lengths is probably the appropriate result.
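
One way to read the "cross-join" idea is sketched below with existing data.table tools. This is only an illustration, not a proposed API: it assumes that colA and colD together identify each row uniquely (so grouping by them is the same as grouping by row), and the result name `res` is made up for the example.

```r
library(data.table)

dt1 <- data.table(
  colA = c('A1', 'A2', 'A3'),
  colB = list('B1', c('B2a', 'B2b'), 'B3'),
  colC = list(c('C1a', 'C1b'), 'C2', 'C3'),
  colD = c('D1', 'D2', 'D3')
)

# Assumption: (colA, colD) is unique per row, so each group holds one row.
# Within a group, colB and colC are length-1 lists; CJ() expands their
# contents into every combination, keeping the input order (sorted = FALSE).
res <- dt1[, CJ(colB = colB[[1L]], colC = colC[[1L]], sorted = FALSE),
           by = .(colA, colD)]
print(res)
```

Row A1 expands to two rows (one per element of colC), row A2 to two rows (one per element of colB), and row A3 stays a single row.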

@arunsrinivasan
Member

Yes, agreed, this would be useful. As I wrote under #2159, I remember implementing unwrap() some time ago, and I prefer it to unnest, if that's fine with everyone. Marking #2159 as a duplicate.

@franknarf1
Contributor

Some other possible extensions / convenience features from this SO q: https://stackoverflow.com/q/56981960

# example with a list of nested DFs
library(data.table)
DT <- data.table(colA=   c('A1','A2','A3'), 
    colB=list(data.frame(), 
        data.frame(colsubB1=c('B2a','B2b'),colsubB2=c('B2c', 'B2d')), 
        data.frame(colsubB1=c('A3a','A3b'),colsubB2=c('A3c', 'A3d'))),
    colC=   c('C1','C2','C3'), 
    colD=   c('D1','D2','D3')
)

DT[, lens := sapply(colB, nrow)]

#    colA         colB colC colD lens
# 1:   A1 <data.frame>   C1   D1    0
# 2:   A2 <data.frame>   C2   D2    2
# 3:   A3 <data.frame>   C3   D3    2


# desired output...
# preserves length == 0 elements as a row filled with NA (instead of dropping them)
# puts the unnested columns in place of colB (instead of appending them at the end)

#    colA colsubB1 colsubB2 colC colD lens
# 1:   A1     <NA>     <NA>   C1   D1    0
# 2:   A2      B2a      B2c   C2   D2    2
# 3:   A2      B2b      B2d   C2   D2    2
# 4:   A3      A3a      A3c   C3   D3    2
# 5:   A3      A3b      A3d   C3   D3    2

# versus tidyr::unnest
tidyr::unnest(DT)
#    colA colC colD lens colsubB1 colsubB2
# 1:   A2   C2   D2    2      B2a      B2c
# 2:   A2   C2   D2    2      B2b      B2d
# 3:   A3   C3   D3    2      A3a      A3c
# 4:   A3   C3   D3    2      A3b      A3d
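
For comparison, the desired output above can be approximated with current data.table functions. The following is a rough sketch under stated assumptions, not a canonical implementation: the helper names `lens`, `idx`, `inner`, `pos` and `out` are invented for the example, and it assumes every non-empty nested frame shares the same columns.

```r
library(data.table)

DT <- data.table(
  colA = c('A1', 'A2', 'A3'),
  colB = list(
    data.frame(),
    data.frame(colsubB1 = c('B2a', 'B2b'), colsubB2 = c('B2c', 'B2d')),
    data.frame(colsubB1 = c('A3a', 'A3b'), colsubB2 = c('A3c', 'A3d'))),
  colC = c('C1', 'C2', 'C3'),
  colD = c('D1', 'D2', 'D3')
)
DT[, lens := sapply(colB, nrow)]

lens <- DT$lens
# Repeat each outer row once per nested row; empty elements are kept once.
idx <- rep(seq_len(nrow(DT)), pmax(lens, 1L))
# Stack the nested frames; the empty one contributes no rows.
inner <- rbindlist(DT$colB, fill = TRUE)
# Map each output row to its row in `inner`; NA marks an empty element.
pos <- rep(NA_integer_, length(idx))
pos[rep(lens > 0L, pmax(lens, 1L))] <- seq_len(nrow(inner))
# An NA index yields an all-NA row, which fills the empty element.
out <- cbind(DT[idx, !"colB"], inner[pos])
# Move the unnested columns to where colB used to be (right after colA).
setcolorder(out, c("colA", names(inner)))
print(out)
```

The explicit `pos` vector avoids relying on how rbindlist numbers empty list elements; the final setcolorder call hard-codes that colB was the second column.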

@r2evans

r2evans commented Sep 24, 2020

As a modification of @franknarf1's code, the empty row can be preserved with keep_empty = TRUE:

tidyr::unnest(DT, colB, keep_empty = TRUE)
# # A tibble: 5 x 5
#   colA  colsubB1 colsubB2 colC  colD 
#   <chr> <chr>    <chr>    <chr> <chr>
# 1 A1    <NA>     <NA>     C1    D1   
# 2 A2    B2a      B2c      C2    D2   
# 3 A2    B2b      B2d      C2    D2   
# 4 A3    A3a      A3c      C3    D3   
# 5 A3    A3b      A3d      C3    D3   

@mattdowle mattdowle modified the milestones: 1.13.1, 1.13.3 Oct 17, 2020
@mattdowle mattdowle removed this from the 1.14.1 milestone Aug 28, 2021
@aourednik

This seems to work quite efficiently (my test with 1 million rows of similar structure took about a minute)

library(data.table)
dt <- data.table(A=c(1,2,3), B=list(c("A","B","C"),"D",c("E","F")))
# one output row per element of B; the unnested column is named V1
dt.flat <- dt[, unlist(B), by=A]
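
A grouping-free variant of the same idea is sketched here. It is an assumption that this helps on large inputs (it has not been benchmarked in this thread); the idea is simply to repeat A by the element lengths of B rather than evaluating j once per group. The name `flat` is made up for the example.

```r
library(data.table)

dt <- data.table(A = c(1, 2, 3), B = list(c("A", "B", "C"), "D", c("E", "F")))

# lengths(B) gives the number of elements in each list cell, so A is
# repeated to line up with the flattened values of B.
flat <- dt[, .(A = rep(A, lengths(B)), B = unlist(B))]
print(flat)
```

Unlike the by=A version, this does not require A to be unique, and the unnested column keeps the name B instead of V1.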

@MichaelChirico MichaelChirico added the top request One of our most-requested issues label Apr 14, 2024
@m-muecke

For reference, here is the unnest implementation in mlr3misc: https://mlr3misc.mlr-org.com/reference/unnest.html
