Bug report - RSession Hangs #1470
Comments
I cannot reproduce the hang on Ubuntu with R 3.2.3 and data.table 1.9.7.
Hm, the performance hit seems to be due to optimising
Seems like having a lot of columns in
options(datatable.optimize=0L) # without optimisation
system.time(DT[,.SD,by=st])
# user system elapsed
# 0.481 0.012 0.502
options(datatable.optimize=Inf) # with optimisation
system.time(DT[,.SD,by=st])
# user system elapsed
# 53.125 8.002 61.784
Can't reproduce the session hang.
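Given the timings above, one possible stopgap is to switch the optimisation off only around the slow query and restore the previous setting afterwards. The following is a minimal sketch, not a fix for the underlying problem: the table is a scaled-down stand-in (200 value columns instead of 30000), and the names `old` and `res` are illustrative.

```r
library(data.table)

# Scaled-down stand-in for the table discussed in this thread
# (assumption: 200 value columns rather than 30000, to keep it quick).
st <- rep(state.name, 10)
DT <- data.table(
  st = st,
  matrix(sample.int(1000L, 500L * 200L, replace = TRUE), nrow = 500L, ncol = 200L)
)

# options() returns the previous values, so they can be restored later.
old <- options(datatable.optimize = 0L)

# Run the grouped .SD query without optimisation; `finally` restores the
# previous setting even if the query errors.
res <- tryCatch(DT[, .SD, by = st], finally = options(old))
```

Restoring through `options(old)` keeps the change local, so other queries in the session still run with whatever optimisation level was set before.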
@Jorges1000 is this still a problem in the latest releases?
not sure this line with
Looks like it might be
> mt = rep(rownames(mtcars)[1:25],20)
> st = rep(state.name,10)
> DT = data.table(mt=mt, st=st, matrix(sample(1:(30000L*500),30000*500,replace=T),
nrow=500,ncol=30000), key='mt')
> options(datatable.optimize=0L)
> system.time(DT[,.SD,by=st])
user system elapsed
0.512 0.012 0.367
> options(datatable.optimize=Inf)
> system.time(DT[,.SD,by=st])
user system elapsed
25.083 3.157 28.107
> Rprof()
> system.time(DT[,.SD,by=st])
user system elapsed
24.321 2.708 26.897
> Rprof(NULL)
> summaryRprof()
$by.self
               self.time self.pct total.time total.pct
"[.data.table"     13.88    51.26      27.02     99.78
"dotN"             13.12    48.45      13.12     48.45
"gc"                0.06     0.22       0.06      0.22
"c"                 0.02     0.07       0.02      0.07
@MichaelChirico commented here, that call to
Rsession hangs
When a data.table with a large number of columns is queried using .SD, the query first takes much longer than just creating the DT (from about a minute to nearly 10 minutes). Then, after a while, R keeps running in the background for long periods (5-10 minutes) even when no command has been issued. Activity Monitor shows the rsession process at 100%, and RStudio is unresponsive. Note that the R library is in a custom folder, and that this happens more often when many queries are run on DT. Setting options(datatable.auto.index=FALSE) did not help.
Using the latest versions of RStudio (0.99.489), R (3.2.3), and data.table (1.9.6) under OS X 10.9.5 (Mavericks) on x86_64-apple-darwin13.4.0 (64-bit).
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] data.table_1.9.6 microbenchmark_1.4-2.1
loaded via a namespace (and not attached): Rcpp_0.12.2 digest_0.6.8 MASS_7.3-45 chron_2.3-47 grid_3.2.3 plyr_1.8.3 gtable_0.1.2 magrittr_1.5 scales_0.3.0 ggplot2_1.0.1 stringi_1.0-1 reshape2_1.4.1 proto_0.3-10 tools_3.2.3 stringr_1.0.0 munsell_0.4.2 colorspace_1.2-6
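To check how the cost of the grouped .SD query grows with the number of columns on a given setup, a small timing loop along these lines may help. It is only a sketch: `time_sd_by` is a hypothetical helper, and the column counts are kept far below the 30000 used in the thread so that it finishes quickly.

```r
library(data.table)

# Hypothetical helper: build a table with `ncols` integer value columns and
# time DT[, .SD, by = st] on it, returning elapsed seconds.
time_sd_by <- function(ncols, nrows = 500L) {
  st <- rep(state.name, length.out = nrows)
  vals <- matrix(sample.int(1000L, nrows * ncols, replace = TRUE),
                 nrow = nrows, ncol = ncols)
  DT <- data.table(st = st, vals)
  system.time(DT[, .SD, by = st])[["elapsed"]]
}

# Assumed column counts, chosen small for illustration.
ncols <- c(10L, 100L, 1000L)
timings <- vapply(ncols, time_sd_by, numeric(1))

# One row per column count, with the elapsed time in seconds.
data.frame(ncols = ncols, elapsed = timings)
```

Running it once with options(datatable.optimize=0L) and once with options(datatable.optimize=Inf) would show whether the gap reported above appears on the machine in question.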