x[, .(shift(b)), keyby = a] returns list type (should be int) #5939

Closed
shrektan opened this issue Feb 18, 2024 · 7 comments · Fixed by #5950

@shrektan
Member

shrektan commented Feb 18, 2024

After upgrading to the latest CRAN version of data.table (1.15.0), an expression like x[, .(shift(b)), keyby = a] returns a list column for V1, which is unexpected and inconsistent with the other forms shown below. In previous versions, it always returned <int>.

library(data.table)
x = data.table(a = c(rep(1, 5), rep(2, 5)), b = 1:10)
x[, shift(b), keyby = a]
#> Key: <a>
#>         a    V1
#>     <num> <int>
#>  1:     1    NA
#>  2:     1     1
#>  3:     1     2
#>  4:     1     3
#>  5:     1     4
#>  6:     2    NA
#>  7:     2     6
#>  8:     2     7
#>  9:     2     8
#> 10:     2     9
x[, .(shift(b)), keyby = a]
#> Key: <a>
#>         a     V1
#>     <num> <list> <------------ ### it should be <int> ###
#>  1:     1     NA
#>  2:     1      1
#>  3:     1      2
#>  4:     1      3
#>  5:     1      4
#>  6:     2     NA
#>  7:     2      6
#>  8:     2      7
#>  9:     2      8
#> 10:     2      9
x[, .(shift(b), b), keyby = a]
#> Key: <a>
#>         a    V1     b
#>     <num> <int> <int>
#>  1:     1    NA     1
#>  2:     1     1     2
#>  3:     1     2     3
#>  4:     1     3     4
#>  5:     1     4     5
#>  6:     2    NA     6
#>  7:     2     6     7
#>  8:     2     7     8
#>  9:     2     8     9
#> 10:     2     9    10

Created on 2024-02-18 with reprex v2.0.2
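
To tell whether an installed version is affected, it may be easier to inspect the column type directly than to read the printed header. This is a minimal sketch that reuses the reprex above; the variable name `res` is only illustrative.

```r
library(data.table)

x = data.table(a = c(rep(1, 5), rep(2, 5)), b = 1:10)

# Wrapping shift(b) in .() is the form that triggers the report
res = x[, .(shift(b)), keyby = a]

# Reported behaviour: "list" on 1.15.0, "integer" on earlier versions
print(typeof(res$V1))
```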

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31)
#>  os       macOS Sonoma 14.0
#>  system   aarch64, darwin20
#>  ui       X11
#>  language en
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Asia/Shanghai
#>  date     2024-02-18
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
#>  data.table  * 1.15.0  2024-01-30 [1] CRAN (R 4.3.1)
#>  digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
#>  knitr         1.44    2023-09-11 [1] CRAN (R 4.3.0)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.3.0)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.0)
#>  rstudioapi    0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  styler        1.10.2  2023-08-29 [1] CRAN (R 4.3.0)
#>  vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
#>  xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@jangorecki
Member

This should be included in the coming patch release.

@jangorecki jangorecki added this to the 1.15.2 milestone Feb 18, 2024
@MichaelChirico
Member

Bisected to #5205 as the first commit where this becomes a list, so as a workaround you can disable GForce (in any of various ways, e.g. #5942).
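
One way to disable GForce is sketched below. It assumes the `datatable.optimize` option, where levels below 2 are documented to skip GForce; this may not be the approach #5942 refers to. The names `old` and `res` are illustrative.

```r
library(data.table)

x = data.table(a = c(rep(1, 5), rep(2, 5)), b = 1:10)

# Assumption: optimization level 1 keeps basic optimizations but turns GForce off
old = options(datatable.optimize = 1L)

# With GForce off, j is evaluated per group in the usual way
res = x[, .(shift(b)), keyby = a]

# Restore the previous setting so other code is unaffected
options(old)

print(class(res$V1))
```

Setting the option globally slows down other grouped aggregations that benefit from GForce, so restoring the old value right after the affected call, as above, is probably the safer pattern.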

@MichaelChirico
Member

> This should be included in the coming patch release

We should absolutely fix this, but it's not going to get us removed from CRAN, so if no one has time for a fix right now, it could be part of the 1.15.4 patch release instead.

@renkun-ken
Member

I encountered this problem too, with the following example:

library(data.table)

dt <- expand.grid(
  date = 1:10,
  time = 1:10,
  id = 1:5
)

setDT(dt, key = c("date", "time", "id"))
dt[, x := runif(.N)]
dt[time == 1, x1 := shift(x, type = "lead"), by = id]

Previously, this produced

Key: <date, time, id>
Index: <time>
      date  time    id         x        x1
     <int> <int> <int>     <num>     <num>
  1:     1     1     1 0.1217285 0.2041882
  2:     1     1     2 0.2499479 0.3632787
  3:     1     1     3 0.9992822 0.3553096
  4:     1     1     4 0.9851168 0.1797294
  5:     1     1     5 0.2273191 0.2967562
 ---                                      
496:    10    10     1 0.2503363        NA
497:    10    10     2 0.2226453        NA
498:    10    10     3 0.3320792        NA
499:    10    10     4 0.3255982        NA
500:    10    10     5 0.1976716        NA

but now it produces

> dt
Key: <date, time, id>
Index: <time>
      date  time    id          x                                                                    x1
     <int> <int> <int>      <num>                                                                <list>
  1:     1     1     1 0.74905819 0.59769795,0.30605044,0.74046541,0.08694826,0.24060779,0.89059279,...
  2:     1     1     2 0.51506472 0.59769795,0.30605044,0.74046541,0.08694826,0.24060779,0.89059279,...
  3:     1     1     3 0.82395315 0.59769795,0.30605044,0.74046541,0.08694826,0.24060779,0.89059279,...
  4:     1     1     4 0.81986857 0.59769795,0.30605044,0.74046541,0.08694826,0.24060779,0.89059279,...
  5:     1     1     5 0.78196531 0.59769795,0.30605044,0.74046541,0.08694826,0.24060779,0.89059279,...
 ---                                                                                                   
496:    10    10     1 0.53337552                                                                      
497:    10    10     2 0.03719923                                                                      
498:    10    10     3 0.73477434                                                                      
499:    10    10     4 0.03107033                                                                      
500:    10    10     5 0.72256881                               

@MichaelChirico
Member

Can you check if #5950 solves your use case?

@renkun-ken
Member

> Can you check if #5950 solves your use case?

Just tried it; still not working in my case.

@ben-schwen
Member

ben-schwen commented Feb 24, 2024

> Apparently, gshift cannot work with subsetting, since it always creates vectors of length `length(x)` rather than the length of `irows`.

Originally posted by @ben-schwen in #5950 (comment)

This becomes clearer when using @renkun-ken's example without the assignment by `:=`.
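
A sketch of what that comparison might look like, reusing @renkun-ken's setup. The names `res` and `n_expected` are illustrative, and no output is shown because it depends on the installed version.

```r
library(data.table)

dt <- expand.grid(date = 1:10, time = 1:10, id = 1:5)
setDT(dt, key = c("date", "time", "id"))
dt[, x := runif(.N)]

# Same query as in the report, but returning a result instead of assigning with :=
res <- dt[time == 1, shift(x, type = "lead"), by = id]

# The subset selected by i has this many rows; a correct grouped result
# should have the same number of rows in total
n_expected <- dt[time == 1, .N]

print(nrow(res))
print(n_expected)
```

If the row counts differ, that is consistent with the shifted vectors being built from the full column rather than from the rows selected by `i`.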
