obscure error in melt #1754

Closed
franknarf1 opened this issue Jun 23, 2016 · 7 comments · Fixed by #3455

@franknarf1
Contributor

I have some more super-wide data and wanted to melt it:

melt(DT, 1:2)
# Error in rbindlist(l, use.names, fill, idcol) : 
#   Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'character'

If I had instead run melt(DT, names(DT)[1:2]), I'd see the same message. Interestingly, when I repeat the command, I get a different error:

# Error in rbindlist(l, use.names, fill, idcol) : 
#   Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'integer'

After that, I never see the 'character' error again. I ran this a few times on my single-row example and managed to crash R. This error isn't a big problem, though; just means I need to approach the problem differently.

@wpmarble

wpmarble commented Dec 4, 2016

EDIT: problem ended up being due to mistakenly assigning the same name to multiple columns.

Tacking on another example of this:

library(dplyr);library(data.table)

synth = fread("http://stanford.edu/~wpmarble/MLAB_data.txt") %>% t %>% as.data.table

colnames(synth) = c("state", "income", "retailprice", "percent_15_19", 
                    "beercons", "smoking88", "smoking80", "smoking75",
                    paste0("smoking", 70:99), "smoking00")

synth.long = melt(synth, 
                  id.vars = c("state", "income", "retailprice", 
                              "percent_15_19", "beercons"), 
                  measure = patterns("^smoking"))
synth.long

# Error in rbindlist(l, use.names, fill, idcol) : 
#  Value of SET_STRING_ELT() must be a 'CHARSXP' not a 'integer'

Different runs will produce a slightly different error; sometimes instead of 'integer' it says 'character' or 'raw'. A few times the last line (printing synth.long) has crashed RStudio.

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12.1 (unknown)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.9.6 dplyr_0.4.3     

loaded via a namespace (and not attached):
[1] R6_2.1.2       assertthat_0.1 magrittr_1.5   parallel_3.3.0 tools_3.3.0   
[6] DBI_0.4-1      Rcpp_0.12.5    chron_2.3-47      
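
The duplicate names mentioned in the EDIT above can be confirmed without touching the data. This is a minimal base-R sketch that only rebuilds the name vector passed to colnames() in this comment; the variable names `cols` and `dups` are introduced here for illustration.

```r
# Rebuild the column names assigned in the example above.
cols <- c("state", "income", "retailprice", "percent_15_19",
          "beercons", "smoking88", "smoking80", "smoking75",
          paste0("smoking", 70:99), "smoking00")

# anyDuplicated() returns the index of the first repeated name, or 0 if none.
has_dups <- anyDuplicated(cols) > 0

# List each repeated name once, in order of its second appearance.
dups <- unique(cols[duplicated(cols)])
print(dups)
# [1] "smoking75" "smoking80" "smoking88"
```

The three names typed by hand overlap with the range generated by paste0("smoking", 70:99), so running the same check on names(DT) before calling melt would likely have flagged the problem early.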

@wligtenberg
Contributor

It is not just that columns have the same name, because that can also work just fine.
I had a bit of trouble making a nice, simple reproducible example.

This works fine:

DT <- setDT(data.frame("Time.point" = seq(0, 6), "Time.(h)" = c(0.0, 0.5, 1.0, 3.0, 5.0, 7.0, 24.0), 
                       "NEW.ME" = runif(7), "NEW.ME" = runif(7), check.names = FALSE))
DT <- data.table::melt(data = DT, c("Time.point", "Time.(h)"), na.rm = TRUE)
DT

This one will either error, or crash R:

DT <- setDT(data.frame("Time.point" = seq(0, 6), "Time.(h)" = c(0.0, 0.5, 1.0, 3.0, 5.0, 7.0, 24.0), 
     "NEW.ME" = runif(7), "NEW.ME" = runif(7), "NEW.ME" = runif(7), "NEW.ME" = runif(7), "NEW.ME" = runif(7), 
     "NEW.ME" = runif(7), "NEW.ME" = runif(7), "NEW.ME" = runif(7), "NEW.MER" = runif(7), "F050" = runif(7), 
     "NEW.MER" = runif(7), "F16-42-123p123C" = runif(7), "F16-42-123p123C" = runif(7), "NEW.MER" = runif(7), 
     "F16-42-123p123C" = runif(7), check.names = FALSE))
DT <- data.table::melt(data = DT, c("Time.point", "Time.(h)"), na.rm = TRUE)
DT

SessionInfo:

R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252
[4] LC_NUMERIC=C                       LC_TIME=Dutch_Netherlands.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.10.0

loaded via a namespace (and not attached):
[1] tools_3.3.2

@MichaelChirico
Member

@wpmarble your example actually segfaults on current dev:

synth = fread("http://stanford.edu/~wpmarble/MLAB_data.txt") %>% t %>% as.data.table
# trying URL 'http://stanford.edu/~wpmarble/MLAB_data.txt'
# Content type 'text/plain' length 15833 bytes (15 KB)
# ==================================================
# downloaded 15 KB
 
colnames(synth) = c("state", "income", "retailprice", "percent_15_19", 
                    "beercons", "smoking88", "smoking80", "smoking75",
                    paste0("smoking", 70:99), "smoking00")

synth.long = melt(synth, 
                  id.vars = c("state", "income", "retailprice", 
                              "percent_15_19", "beercons"), 
                  measure = patterns("^smoking"))
synth.long

#  *** caught segfault ***
# address 0xe000008e, cause 'memory not mapped'

@tdhock
Member

tdhock commented Nov 26, 2017

I am having the same issue: a segfault after calling melt on a data.table with two identical column names. Here is an MRE:

library(data.table)
devtools::session_info()
buggy.dt <- fread("month,Record high,Average high,Daily mean,Average low,Record low,Average precipitation,Average rainfall,Average snowfall,Average precipitation,Average rainy,Average snowy,Mean monthly sunshine hours
Jan,12.8,-5.4,-8.9,-12.4,-33.5,73.6,28.4,45.9,15.8,4.3,13.6,99.2
Feb,15,-3.7,-7.2,-10.6,-33.3,70.9,22.7,46.6,12.8,4,11.1,119.5
Mar,25.9,2.4,-1.2,-4.8,-28.9,80.2,42.2,36.8,13.6,7.4,8.3,158.8
Apr,30.1,11,7,2.9,-17.8,76.9,65.2,11.8,12.5,10.9,3,181.7
May,34.2,19,14.5,10,-5,86.5,86.5,0.4,12.9,12.8,0.14,229.8
Jun,34.5,23.7,19.3,14.9,1.1,87.5,87.5,0,13.8,13.8,0,250.1
Jul,36.1,26.6,22.3,17.9,7.8,106.2,106.2,0,12.3,12.3,0,271.6
Aug,35.6,24.8,20.8,16.7,6.1,100.6,100.6,0,13.4,13.4,0,230.7
Sep,33.5,19.4,15.7,11.9,0,100.8,100.8,0,12.7,12.7,0,174.1")
tall.dt <- melt(buggy.dt, id.vars="month", verbose=TRUE)
print(tall.dt)

The output I get on my system is:

tdhock@recycled:~/projects/temperature-sensor(master)$ R --vanilla < buggy-simple.R 

R version 3.4.2 (2017-09-28) -- "Short Summer"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: i686-pc-linux-gnu (32-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(data.table)
> devtools::session_info()
Session info ------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.2 (2017-09-28)
 system   i686, linux-gnu             
 ui       X11                         
 language en_US                       
 collate  en_US.UTF-8                 
 tz       Canada/Eastern              
 date     2017-11-25                  

Packages ----------------------------------------------------------------------
 package    * version date       source                                
 base       * 3.4.2   2017-11-18 local                                 
 compiler     3.4.2   2017-11-18 local                                 
 data.table * 1.10.5  2017-08-28 Github (Rdatatable/data.table@a869907)
 datasets   * 3.4.2   2017-11-18 local                                 
 devtools     1.13.2  2017-06-02 cran (@1.13.2)                        
 digest       0.6.12  2017-01-27 cran (@0.6.12)                        
 graphics   * 3.4.2   2017-11-18 local                                 
 grDevices  * 3.4.2   2017-11-18 local                                 
 memoise      1.1.0   2017-04-21 cran (@1.1.0)                         
 methods    * 3.4.2   2017-11-18 local                                 
 stats      * 3.4.2   2017-11-18 local                                 
 utils      * 3.4.2   2017-11-18 local                                 
 withr        2.0.0   2017-07-28 cran (@2.0.0)                         
> buggy.dt <- fread("month,Record high,Average high,Daily mean,Average low,Record low,Average precipitation,Average rainfall,Average snowfall,Average precipitation,Average rainy,Average snowy,Mean monthly sunshine hours
+ Jan,12.8,-5.4,-8.9,-12.4,-33.5,73.6,28.4,45.9,15.8,4.3,13.6,99.2
+ Feb,15,-3.7,-7.2,-10.6,-33.3,70.9,22.7,46.6,12.8,4,11.1,119.5
+ Mar,25.9,2.4,-1.2,-4.8,-28.9,80.2,42.2,36.8,13.6,7.4,8.3,158.8
+ Apr,30.1,11,7,2.9,-17.8,76.9,65.2,11.8,12.5,10.9,3,181.7
+ May,34.2,19,14.5,10,-5,86.5,86.5,0.4,12.9,12.8,0.14,229.8
+ Jun,34.5,23.7,19.3,14.9,1.1,87.5,87.5,0,13.8,13.8,0,250.1
+ Jul,36.1,26.6,22.3,17.9,7.8,106.2,106.2,0,12.3,12.3,0,271.6
+ Aug,35.6,24.8,20.8,16.7,6.1,100.6,100.6,0,13.4,13.4,0,230.7
+ Sep,33.5,19.4,15.7,11.9,0,100.8,100.8,0,12.7,12.7,0,174.1")
> tall.dt <- melt(buggy.dt, id.vars="month", verbose=TRUE)
'measure.vars' is missing. Assigning all columns other than 'id.vars' columns as 'measure.vars'.
Assigned 'measure.vars' are [Record high, Average high, Daily mean, Average low, ...].
> print(tall.dt)

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: rbindlist(l, use.names, fill, idcol)
 2: data.table::.rbind.data.table(...)
 3: rbind(deparse.level, ...)
 4: rbind(head(x, topn), tail(x, topn))
 5: print.data.table(tall.dt)
 6: print(tall.dt)
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault
tdhock@recycled:~/projects/temperature-sensor(master)$

I also tried with the current data.table github master (3db6e98) and I observed the same segfault.

@sung

sung commented Oct 30, 2018

Hello, any updates with regard to 'melting' DT with >=2 identical columns?

@MichaelChirico
Member

@sung for now, just rename. Use make.names to help.
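
A minimal sketch of that rename workaround, assuming a small made-up table (the column names `precip` and `rain` are illustrative, not from the reports above). Note that make.names() only de-duplicates when called with unique = TRUE; it also rewrites names to be syntactically valid, so make.unique() may be preferable if the original spelling should be kept.

```r
library(data.table)

# Build a table with a repeated column name, using the same
# data.frame(check.names = FALSE) route as the example earlier in the thread.
DT <- setDT(data.frame(month = c("Jan", "Feb"),
                       precip = c(73.6, 70.9),
                       rain = c(28.4, 22.7),
                       precip = c(15.8, 12.8),
                       check.names = FALSE))

# Rename in place so every column name is unique before melting.
setnames(DT, make.names(names(DT), unique = TRUE))
print(names(DT))
# [1] "month"    "precip"   "rain"     "precip.1"

# With unique names, melt has no repeated measure columns to deal with.
long <- melt(DT, id.vars = "month")
```

After the rename, the second `precip` column becomes `precip.1`, so the melted `variable` column distinguishes the two.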

@sung

sung commented Oct 30, 2018

Thanks, @MichaelChirico, it seems to work for now too:
as.data.table(melt(as.data.frame(DT[, ..columns.with.identical.names])))

@mattdowle mattdowle added this to the 1.12.2 milestone Mar 14, 2019
@mattdowle mattdowle mentioned this issue Mar 14, 2019
8 tasks
mattdowle added a commit that referenced this issue Mar 21, 2019
mattdowle added a commit that referenced this issue Mar 21, 2019