ALTREP bug after adding >=64th column #4734

Closed
etryn opened this issue Oct 3, 2020 · 4 comments · Fixed by #4756

etryn commented Oct 3, 2020

I found this bug after I freshly installed the newest available versions of R, RStudio, data.table, and dplyr on a new Mac. I tried to run a simple script that worked on my old computer and ran into this bug. It seemed to happen somewhat unpredictably after running several lines of dplyr, with a session restart occasionally fixing it but only temporarily. Once the bug had "hit" a specific object, the error occurred on any interaction with that object afterwards including RStudio View() and printing the object to the console. I'm a novice but I worked with a friend to create this reproducible example which will hopefully work on other computers. In my actual script, the bug did not always occur in the same part of the code.

Minimal reproducible example

library(dplyr)
library(data.table)
exp <-
  structure(list(a = 1L, b = 1L, c = 1L, d = 1L, e = 1L, f = 1L, g = 1L, h = 1L, i = 1L, j = 1L, k = 1L, l = 1L, m = 1L, n = 1L, o = 1L, p = 1L, q = 1L, r = 1L, s = 1L, t = 1L, u = 1L, v = 1L, w = 1L, x = 1L, y = 1L, z = 1L,
                 aa = 1L, ab = 1L, ac = 1L, ad = 1L, ae = 1L, af = 1L, ag = 1L, ah = 1L, ai = 1L, aj = 1L, ak = 1L, al = 1L, am = 1L, an = 1L, ao = 1L, ap = 1L, aq = 1L, ar = 1L, as = 1L, at = 1L, au = 1L, av = 1L, aw = 1L, ax = 1L, ay = 1L, az = 1L,
                 ba = 1L, bb = 1L, bc = 1L, bd = 1L, be = 1L, bf = 1L, bg = 1L, bh = 1L, bi = 1L, bj = 1L, bk = 1L, bl = 1L, bm = 1L), class = "data.frame", row.names = c(NA, -1L))

expadj <- as.data.table(exp)
expadj <- expadj %>% mutate(a = 11) # overwrite an existing column via dplyr (which column does not seem to matter)

expadj$NewColumn <- expadj$b
# fails here with error "Error in setalloccol(y) : can't set ALTREP truelength"

This looks possibly similar to #2990 or #3051. In my case it seems to require at least 64 columns before the dplyr line. Changing the existing column without using dplyr prevents the bug from occurring.
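For comparison, here is a minimal sketch of the non-dplyr path described above, which according to this report does not trigger the error. It assumes a 64-column table built with `setDT(as.list(1:64))` rather than the original `exp` object, and it overwrites the column with data.table's own `:=` operator. This only illustrates the observation; it is not a diagnosis of the cause.

```r
library(data.table)

# Assumption: a one-row table with 64 integer columns named V1..V64,
# standing in for the 65-column `exp` object from the report.
dt <- setDT(as.list(1:64))

# Overwrite an existing column by reference with `:=` instead of dplyr::mutate().
dt[, V1 := 11L]

# The same kind of assignment that failed after the dplyr step in the report.
dt$NewColumn <- dt$V2

# Sanity checks: one column was added and the overwritten value is in place.
stopifnot(ncol(dt) == 65L, dt$V1 == 11L, dt$NewColumn == 2L)
```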

Output of sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.13.0 dplyr_1.0.2      

loaded via a namespace (and not attached):
 [1] crayon_1.3.4     R6_2.4.1         lifecycle_0.2.0  magrittr_1.5    
 [5] pillar_1.4.6     rlang_0.4.7      rstudioapi_0.11  vctrs_0.3.4     
 [9] generics_0.0.2   ellipsis_0.3.1   tools_4.0.2      glue_1.4.2      
[13] purrr_0.3.4      compiler_4.0.2   pkgconfig_2.0.3  tidyselect_1.1.0
[17] tibble_3.0.3    
@jangorecki
Member

Could you provide an example that uses only data.table and base R? Installing dplyr takes a long time, so it is better not to require readers of your report to install it.

@ColeMiller1
Contributor

I can reproduce with R 4.0.2 / data.table 1.13.0 / dplyr 1.0.0 / Windows 10.

I cannot reproduce using base methods. However, here is a more minimal dplyr example:

library(dplyr)
library(data.table)

mutate(setDT(as.list(1:64)), V1 = 11)
##Error in .shallow(x, cols = cols, retain.key = TRUE) : 
##  can't set ALTREP truelength

The traceback shows that the error is raised in print.data.table, which means the object returned by the mutate call is already in a bad state before it is printed. For example, switching to print.data.frame works fine:

print.data.frame(mutate(setDT(as.list(1:64)), V1 = 11))
#>   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21
#> 1 11  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21
#>   V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 V39 V40
#> 1  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
#>   V41 V42 V43 V44 V45 V46 V47 V48 V49 V50 V51 V52 V53 V54 V55 V56 V57 V58 V59
#> 1  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59
#>   V60 V61 V62 V63 V64
#> 1  60  61  62  63  64

So far, the requirements to reproduce are:

  1. A data.table with at least 64 columns
  2. A dplyr::mutate call which modifies an existing column in the data.table.
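The two conditions above can be written down as a small check, which may help when scanning a script for calls that could hit this error. This is only a sketch: `meets_repro_conditions` and `make_wide` are hypothetical helpers, not part of data.table or dplyr, and they encode only the conditions observed so far in this thread, not a confirmed cause.

```r
# Hypothetical helper: TRUE when `x` is a data.table with at least 64 columns
# and `col` names an existing column (i.e. a mutate() on `col` would modify it).
meets_repro_conditions <- function(x, col) {
  inherits(x, "data.table") &&
    ncol(x) >= 64L &&
    col %in% names(x)
}

# Hypothetical helper: build a one-row object with `n` columns V1..Vn that
# carries the data.table class, without needing data.table to be attached.
make_wide <- function(n) {
  structure(
    as.list(seq_len(n)),
    names = paste0("V", seq_len(n)),
    class = c("data.table", "data.frame"),
    row.names = c(NA, -1L)
  )
}

meets_repro_conditions(make_wide(64), "V1")   # 64 columns, existing column
meets_repro_conditions(make_wide(63), "V1")   # too few columns
meets_repro_conditions(make_wide(64), "new")  # adds a column instead of modifying one
```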

@etryn
Author

etryn commented Dec 11, 2020

Just returned to the project where I first encountered this bug, and wanted to say the code now runs perfectly with data.table 1.13.4. Thanks so much for the fix!
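For scripts that may still run on an older installation, a version guard along these lines could flag the problem early. It is a sketch that assumes 1.13.4 (the version reported as working above) is a reasonable threshold; `has_fix` and `min_fixed` are hypothetical names introduced here.

```r
# Assumption: 1.13.4 is the version reported as working in this thread.
min_fixed <- "1.13.4"

# Hypothetical helper: compare an installed version string to the threshold.
has_fix <- function(installed, fixed = min_fixed) {
  package_version(installed) >= package_version(fixed)
}

# Only check the real installation if data.table is available.
if (requireNamespace("data.table", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("data.table"))
  if (!has_fix(installed)) {
    warning("data.table ", installed, " is older than ", min_fixed,
            "; mutate() on wide data.tables may fail with an ALTREP error.")
  }
}
```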
