setkey changing data (not just sorting) #2540
Comments
Thank you for the good report. I can reproduce it and will investigate and solve the issue ASAP.
Dear Patrick, Cheers,
I have found a case where setkey can actually change the underlying rows of data (more than just sorting). It is as if the structure that indexes the rows for each vector is out of sync.
The case happens when:
Please see the reproducible example:
library(data.table)

# set up some dummy data
a <- c('A', 'B', 'D', 'C')
b <- as.numeric(c(20160101, 20160131, 20160102))
ab <- CJ(a = a, b = b, sorted = FALSE)
c <- as.numeric(c(20170101, 20170131, 20170102))
ab2 <- CJ(a = a, b = c, sorted = FALSE)
ab <- rbindlist(list(ab, ab2))

# set up the test data.table that will give us strange results
test <- data.table(a = ab$a)
# this must be the issue?
test[, c('astart', 'aend') := as.integer(ab$b)]

# once we set the keys, some unique records are removed and some are duplicated
setkey(test, a, astart, aend)

# duplicated data
ab[(a == "A") & (b == 20160101)]         # there was one row
test[(a == "A") & (astart == 20160101)]  # now there are two rows?

# some of the rows have been removed
test[(a == "A") & (astart == 20170101)]  # now there are no rows where a == "A"?
ab[(a == "A") & (b == 20170101)]         # there was one row
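A possible workaround, sketched under an assumption that this thread does not confirm: the single multi-column `:=` call may leave `astart` and `aend` pointing at the same underlying vector, so the in-place reordering done by `setkey` hits that vector twice. If that is the cause, assigning each column in its own `:=` call should avoid it. The names `b2` and `before` are only illustrative, and `address()` and `uniqueN()` are the data.table helpers used for the checks.

```r
library(data.table)

# Same dummy data as in the report (b2 is an illustrative name for the 2017 dates)
a  <- c('A', 'B', 'D', 'C')
b  <- as.numeric(c(20160101, 20160131, 20160102))
b2 <- as.numeric(c(20170101, 20170131, 20170102))
ab <- rbindlist(list(CJ(a = a, b = b,  sorted = FALSE),
                     CJ(a = a, b = b2, sorted = FALSE)))

test <- data.table(a = ab$a)

# Workaround sketch (assumption): one `:=` call per column, so each column
# is backed by its own freshly allocated integer vector
test[, astart := as.integer(ab$b)]
test[, aend   := as.integer(ab$b)]

# The two columns should now sit at different memory addresses
address(test$astart) != address(test$aend)

# Unkeyed snapshot to compare against after keying
before <- copy(test)
setkey(test, a, astart, aend)

# Sanity checks: setkey should only reorder rows, not drop or duplicate them
nrow(test) == nrow(before)
uniqueN(test) == uniqueN(before)
nrow(test[a == "A" & astart == 20160101])
```

Running the same `address()` comparison on the table built with the original `c('astart', 'aend') := ...` call would show whether the two columns really do share memory on a given data.table version.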
# Output of sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] shiny_1.0.5 mdo_0.3.3 data.table_1.10.4-3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.14 compiler_3.4.2 bindr_0.1 tools_3.4.2 xts_0.10-0 digest_0.6.12 bit_1.1-12 evaluate_0.10.1 lubridate_1.7.1 jsonlite_1.5 tibble_1.3.4 lattice_0.20-35
[13] ff_2.2-13 pkgconfig_2.0.1 rlang_0.1.4 fastmatch_1.1-0 rstudioapi_0.7 yaml_2.1.15 bindrcpp_0.2 dplyr_0.7.4 stringr_1.2.0 knitr_1.17 htmlwidgets_0.9 rprojroot_1.2
[25] DT_0.2 grid_3.4.2 glue_1.2.0 R6_2.2.2 bookdown_0.5 rmarkdown_1.8 magrittr_1.5 backports_1.1.1 htmltools_0.3.6 rsconnect_0.8.5 assertthat_0.2.0 mime_0.5
[37] xtable_1.8-2 httpuv_1.3.5 stringi_1.1.6 zoo_1.8-0