Setcolororder scrambling the dataset #6171

Open
eacabbi opened this issue Jun 8, 2024 · 10 comments
Comments

@eacabbi

eacabbi commented Jun 8, 2024

Recently, after the latest developer update, data.table started behaving strangely. Among the issues I encountered, I noticed in my code that setcolorder no longer works as intended.

setcolorder(dataset,c("x","y","z"))
did not reorder the columns of the dataset into the given order; it only renamed the initial columns of the dataset without moving their contents. As a result, the column that was second in the dataset before the call is named "y" afterwards but does not have the right contents.
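
One way to test the symptom described above (names changed, contents left in place) is to snapshot the table before the call and compare every column by name afterwards. This is a minimal sketch on toy data: `columns_intact` is a hypothetical helper, not part of data.table, and the three columns are made up for illustration.

```r
library(data.table)

# Hypothetical helper: for each column name in `before`, TRUE if the column
# stored under that name in `after` has identical contents.
columns_intact <- function(before, after) {
  vapply(names(before),
         function(nm) identical(before[[nm]], after[[nm]]),
         logical(1))
}

# Toy data, not the reporter's dataset
DT <- data.table(x = 1:3, y = c("a", "b", "c"), z = c(TRUE, FALSE, TRUE))
snapshot <- copy(DT)                # deep copy taken before reordering
setcolorder(DT, c("z", "x", "y"))   # reorders DT by reference

intact <- columns_intact(snapshot, DT)
print(names(DT))
print(intact)
```

If any entry of `intact` is FALSE on the real dataset, the names and contents have been separated, and the names of the affected columns would be a useful thing to report.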

@TimothyWillard

Could you provide a reproducible example? I'm unable to recreate what I understand the issue to be with version 1.15.4:

library(data.table)
DT = data.table(
  'abc'=letters,
  'def'=LETTERS,
  'ghi'=1L:26L
)
str(DT)
#> Classes 'data.table' and 'data.frame':   26 obs. of  3 variables:
#>  $ abc: chr  "a" "b" "c" "d" ...
#>  $ def: chr  "A" "B" "C" "D" ...
#>  $ ghi: int  1 2 3 4 5 6 7 8 9 10 ...
#>  - attr(*, ".internal.selfref")=<externalptr>
setcolorder(DT, c('def', 'ghi', 'abc'))
str(DT)
#> Classes 'data.table' and 'data.frame':   26 obs. of  3 variables:
#>  $ def: chr  "A" "B" "C" "D" ...
#>  $ ghi: int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ abc: chr  "a" "b" "c" "d" ...
#>  - attr(*, ".internal.selfref")=<externalptr>

Created on 2024-06-08 with reprex v2.1.0

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       macOS Sonoma 14.5
#>  system   x86_64, darwin23.2.0
#>  ui       unknown
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2024-06-08
#>  pandoc   3.2 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.3)
#>  data.table  * 1.15.4  2024-03-30 [1] CRAN (R 4.3.3)
#>  digest        0.6.35  2024-03-11 [1] CRAN (R 4.3.3)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.3)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.3)
#>  fs            1.6.4   2024-04-25 [1] CRAN (R 4.3.3)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.3)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.3.3)
#>  knitr         1.46    2024-04-06 [1] CRAN (R 4.3.3)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.3)
#>  reprex        2.1.0   2024-01-11 [1] CRAN (R 4.3.3)
#>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.3)
#>  rmarkdown     2.26    2024-03-05 [1] CRAN (R 4.3.3)
#>  rstudioapi    0.16.0  2024-03-24 [1] CRAN (R 4.3.3)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.3)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.3.3)
#>  xfun          0.43    2024-03-25 [1] CRAN (R 4.3.3)
#>  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.3)
#> 
#>  [1] /usr/local/lib/R/4.3/site-library
#>  [2] /usr/local/Cellar/r/4.3.3/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

@eacabbi
Author

eacabbi commented Jun 8, 2024

ok that's strange. I can confirm that your example runs fine on my computer, and still I went back, double-checked my data, and can confirm that for my dataset setcolorder truly scrambles the columns...

`─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.4.0 (2024-04-24)
os macOS Sonoma 14.5
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Madrid
date 2024-06-08
pandoc NA

─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
! package * version date (UTC) lib source
bdsmatrix 1.3-7 2024-03-02 [1] CRAN (R 4.4.0)
bit 4.0.5 2022-11-15 [1] CRAN (R 4.4.0)
bit64 4.0.5 2020-08-30 [1] CRAN (R 4.4.0)
cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.0)
cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.4.0)
cli 3.6.2 2023-12-11 [1] CRAN (R 4.4.0)
collapse 2.0.14 2024-05-24 [1] CRAN (R 4.4.0)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.4.0)
crayon 1.5.2 2022-09-29 [1] CRAN (R 4.4.0)
V data.table * 1.15.99 2024-03-30 [1] CRAN (R 4.4.0) (on disk 1.15.4)
devtools * 2.4.5 2022-10-11 [1] CRAN (R 4.4.0)
digest 0.6.35 2024-03-11 [1] CRAN (R 4.4.0)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.4.0)
dreamerr 1.4.0 2023-12-21 [1] CRAN (R 4.4.0)
DT * 0.33 2024-04-04 [1] CRAN (R 4.4.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.4.0)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.0)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
fixest 0.12.1 2024-05-18 [1] https://fastverse.r-universe.dev (R 4.4.0)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.4.0)
Formula 1.2-5 2023-02-24 [1] CRAN (R 4.4.0)
fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0)
V fst * 0.9.8 2024-06-08 [1] Github (fstpackage/fst@6f9ec28) (on disk 0.9.9)
fstcore * 0.9.18 2023-12-02 [1] CRAN (R 4.4.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.0)
ggplot2 * 3.5.1 2024-04-23 [1] CRAN (R 4.4.0)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.0)
gtable 0.3.5 2024-04-22 [1] CRAN (R 4.4.0)
haven * 2.5.4 2023-11-30 [1] CRAN (R 4.4.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.4.0)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.0)
httpuv 1.6.15 2024-03-26 [1] CRAN (R 4.4.0)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.4.0)
later 1.3.2 2023-12-06 [1] CRAN (R 4.4.0)
lattice 0.22-6 2024-03-20 [1] CRAN (R 4.4.0)
lfe 3.0-0 2024-02-29 [1] CRAN (R 4.4.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
lmtest 0.9-40 2022-03-21 [1] CRAN (R 4.4.0)
lpdensity 2.4 2023-01-21 [1] CRAN (R 4.4.0)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.4.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
MASS 7.3-60.2 2024-04-24 [1] local
Matrix 1.7-0 2024-03-22 [1] CRAN (R 4.4.0)
maxLik 1.5-2.1 2024-03-24 [1] CRAN (R 4.4.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.0)
mime 0.12 2021-09-28 [1] CRAN (R 4.4.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0)
miscTools 0.6-28 2023-05-03 [1] CRAN (R 4.4.0)
munsell 0.5.1 2024-04-01 [1] CRAN (R 4.4.0)
nlme 3.1-165 2024-06-06 [1] CRAN (R 4.4.0)
numDeriv 2016.8-1.1 2019-06-06 [1] CRAN (R 4.4.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.0)
pkgbuild 1.4.4 2024-03-17 [1] CRAN (R 4.4.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
pkgload 1.3.4 2024-01-16 [1] CRAN (R 4.4.0)
plm * 2.6-4 2024-04-01 [1] CRAN (R 4.4.0)
profvis 0.3.8 2023-05-02 [1] CRAN (R 4.4.0)
promises 1.3.0 2024-04-05 [1] CRAN (R 4.4.0)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.4.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0)
rbibutils 2.2.16 2023-10-25 [1] CRAN (R 4.4.0)
Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.4.0)
rddensity * 2.5 2024-01-22 [1] CRAN (R 4.4.0)
Rdpack 2.6 2023-11-08 [1] CRAN (R 4.4.0)
rdrobust * 2.2 2023-11-03 [1] CRAN (R 4.4.0)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.4.0)
readxl * 1.4.3 2023-07-06 [1] CRAN (R 4.4.0)
remotes 2.5.0 2024-03-17 [1] CRAN (R 4.4.0)
rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.0)
sandwich 3.1-0 2023-12-11 [1] CRAN (R 4.4.0)
scales 1.3.0 2023-11-28 [1] CRAN (R 4.4.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
shiny 1.8.1.1 2024-04-02 [1] CRAN (R 4.4.0)
stringi 1.8.4 2024-05-06 [1] CRAN (R 4.4.0)
stringmagic 1.1.2 2024-04-30 [1] CRAN (R 4.4.0)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.4.0)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.4.0)
tidyr * 1.3.1 2024-01-24 [1] CRAN (R 4.4.0)
tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.4.0)
timechange 0.3.0 2024-01-18 [1] CRAN (R 4.4.0)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.4.0)
urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.4.0)
usethis * 2.2.3 2024-02-19 [1] CRAN (R 4.4.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.0)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
vroom 1.6.5 2023-12-05 [1] CRAN (R 4.4.0)
withr 3.0.0 2024-01-16 [1] CRAN (R 4.4.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.4.0)
zoo 1.8-12 2023-04-13 [1] CRAN (R 4.4.0)

[1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library`

Any idea on something else I can provide? Unfortunately I cannot move the data...

@TimothyWillard

  1. Could you provide a subset of the data or a dataset that has a similar str?
  2. The code that is causing the issue would be helpful, maybe it's not the setcolorder call that's causing a problem?
  3. sessionInfo()

@TimothyWillard

TimothyWillard commented Jun 8, 2024

The line `V data.table * 1.15.99 2024-03-30 [1] CRAN (R 4.4.0) (on disk 1.15.4)` suggests that the data.table being used here was installed from source, maybe? Trying with the current master branch (a5e2bca):

library(data.table)
DT = data.table(
  'abc'=letters,
  'def'=LETTERS,
  'ghi'=1L:26L
)
str(DT)
#> Classes 'data.table' and 'data.frame':   26 obs. of  3 variables:
#>  $ abc: chr  "a" "b" "c" "d" ...
#>  $ def: chr  "A" "B" "C" "D" ...
#>  $ ghi: int  1 2 3 4 5 6 7 8 9 10 ...
#>  - attr(*, ".internal.selfref")=<externalptr>
setcolorder(DT, c('def', 'ghi', 'abc'))
str(DT)
#> Classes 'data.table' and 'data.frame':   26 obs. of  3 variables:
#>  $ def: chr  "A" "B" "C" "D" ...
#>  $ ghi: int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ abc: chr  "a" "b" "c" "d" ...
#>  - attr(*, ".internal.selfref")=<externalptr>

Created on 2024-06-08 with reprex v2.1.0

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       macOS Sonoma 14.5
#>  system   x86_64, darwin23.2.0
#>  ui       unknown
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2024-06-08
#>  pandoc   3.2 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.3.3)
#>  data.table  * 1.15.99 2024-06-08 [1] local
#>  digest        0.6.35  2024-03-11 [1] CRAN (R 4.3.3)
#>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.3.3)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.3)
#>  fs            1.6.4   2024-04-25 [1] CRAN (R 4.3.3)
#>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.3)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.3.3)
#>  knitr         1.46    2024-04-06 [1] CRAN (R 4.3.3)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.3)
#>  reprex        2.1.0   2024-01-11 [1] CRAN (R 4.3.3)
#>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.3.3)
#>  rmarkdown     2.26    2024-03-05 [1] CRAN (R 4.3.3)
#>  rstudioapi    0.16.0  2024-03-24 [1] CRAN (R 4.3.3)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.3)
#>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.3.3)
#>  xfun          0.43    2024-03-25 [1] CRAN (R 4.3.3)
#>  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.3.3)
#> 
#>  [1] /usr/local/lib/R/4.3/site-library
#>  [2] /usr/local/Cellar/r/4.3.3/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

@TysonStanley
Member

Setting options(datatable.verbose = TRUE) could help diagnose as well.
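
A minimal sketch of how the verbose option could be switched on around just the suspect call and then restored. The table is the toy example from earlier in the thread, and no particular verbose output is assumed for `setcolorder`.

```r
library(data.table)

DT <- data.table(abc = letters, def = LETTERS, ghi = 1:26)

# options() returns the previous values, so they can be restored afterwards
old_opts <- options(datatable.verbose = TRUE)
setcolorder(DT, c("def", "ghi", "abc"))
options(old_opts)

print(names(DT))
```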

@tdhock
Member

tdhock commented Jun 8, 2024

The original post says "after the last developer update", meaning GitHub master? Could this be related to #6068?

@eacabbi
Author

eacabbi commented Jun 9, 2024

sorry for the annoying stuff, it's not simple to really reproduce it here (the dataset has a lot of variables).
What I can say is that

  • yes, compiled from source
  • it is setcolorder, I have worked around the rest of the code and really, the issue happens trying to reorder columns.
  • I tried to only select a subset of variables and then use setcolorder to reorder the columns: everything worked fine there.
  • setting to verbose did not tell me anything useful, especially considering my previous point.

So it must be something about the fact that I have many many columns, and some of them maybe create a problem. It is not obvious to me what that might be, I am experimenting a bit to see whether I figure it out. If I understand anything more I will let you know.

Thanks!

@tdhock
Member

tdhock commented Jun 9, 2024

would be useful if you could create a data set with 1 row and many many columns that reproduces your issue.
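
As a starting point, a synthetic one-row, many-column table could look like the sketch below. The column count, the `col1`, `col2`, ... names, and the reversed order are all arbitrary assumptions; the real column names and the real `setcolorder` call would need to be substituted to actually reproduce the issue.

```r
library(data.table)

n_cols <- 1000L   # arbitrary; adjust towards the width of the real dataset
col_names <- paste0("col", seq_len(n_cols))

# One row; the value stored in each column is that column's index,
# so a mismatch between name and contents is easy to spot.
wide <- as.data.table(setNames(as.list(seq_len(n_cols)), col_names))

new_order <- rev(col_names)
setcolorder(wide, new_order)

# Names should follow the requested order, and values should move with them
names_ok  <- identical(names(wide), new_order)
values_ok <- identical(unlist(wide, use.names = FALSE), rev(seq_len(n_cols)))
print(c(names_ok = names_ok, values_ok = values_ok))
```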

@MichaelChirico
Member

MichaelChirico commented Jun 9, 2024

Yes, can you reproduce this issue by doing the following?

# ... other code ...
dataset <- dataset[0]
setcolorder(dataset, ...) # the same setcolorder() call

If so, hopefully you're comfortable sharing at least your column names.

Another suggestion: anonymize the data like so:

anonymized_data <- dataset |>
  lapply(\(x) vector(typeof(x), length(x))) |>
  setDT()
setcolorder(anonymized_data, ...)

Some more care could be taken to reproduce common types like factor/Date/POSIXct, but hopefully this gives you a good idea of how to proceed.
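
A rough sketch of what that extra care might look like. `anonymize_column` is a hypothetical helper, and the example table is made up. It assumes the column attributes themselves (class, time zone) are not sensitive. Factor levels are dropped because they may contain real values, so an ordered factor comes back as a plain factor.

```r
library(data.table)

# Hypothetical helper: blank out the values of a column while keeping its class.
anonymize_column <- function(x) {
  if (is.factor(x)) {
    # keep the factor class but drop the (possibly sensitive) levels
    return(factor(rep(NA_character_, length(x))))
  }
  out <- vector(typeof(x), length(x))
  # carry over class-defining attributes, e.g. Date, or POSIXct with its tzone
  attributes(out) <- attributes(x)
  out
}

# Made-up example data standing in for the real dataset
dataset <- data.table(
  id   = 1:3,
  when = as.Date("2024-06-08") + 0:2,
  grp  = factor(c("a", "b", "a"))
)

anonymized_data <- as.data.table(lapply(dataset, anonymize_column))
str(anonymized_data)
```

The same `setcolorder()` call can then be run on `anonymized_data` to see whether the problem depends on column types and names rather than on the values.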

@TysonStanley
Member

Hi @eacabbi any updates on this? I think Michael had a good suggestion for how to share a more reproducible example if it's possible to share very minimal information.
