Avoiding performance regression from testthat with SnowParam #127

Closed
LTLA opened this issue Nov 6, 2020 · 0 comments

Since testthat version 3.0.0 landed, I have experienced a significant increase in the amount of time taken to run unit tests. For example, scran's unit tests have ballooned out from the already-hefty ~280 seconds to an unacceptable 440 seconds. Some forensics suggest that this is caused by an unfortunate interaction between testthat and BiocParallel.

Reproducing the problem

Create a testthat/ directory in your current working directory. Inside testthat/, create a test-snow.R file with:

test_that("what's going on here?", {
    BPPARAM <- SnowParam(2)
    out <- bplapply(1:10, identity, BPPARAM=BPPARAM)
    expect_identical(1:10, unlist(out))
})

Then, from your current working directory, start a new R session and run:

library(testthat)
library(scran)
library(BiocParallel)
system.time(test_dir("testthat", package="BiocGenerics"))
##    user  system elapsed 
##   3.463   0.329  15.560 

These timings were taken in an R environment running testthat 3.0.0. On a similar environment with testthat 2.3.2, I get:

##    user  system elapsed 
##   0.811   0.160   2.321 

So it's clearly gotten worse. Some clues can be found by going back to our 3.0.0 environment and omitting the scran load:

library(testthat)
library(BiocParallel)
system.time(test_dir("testthat", package="BiocGenerics"))
##    user  system elapsed 
##   0.962   0.052   2.124 

You can see that the mere act of loading scran introduces a several-fold delay, despite the fact that no scran functions are ever used in test-snow.R! This leads us towards the underlying cause...

Diagnosis

The fundamental problem is that testthat 3.0.0 has taken to storing the entire top-level environment in the global options:

https://github.com/r-lib/testthat/blob/d0f78e63534516618c84da9114e20dae111093dc/R/test-files.R#L230

That option then gets swept into global_options when BiocParallel collects the options to export:

if (exportglobals) {
    blacklist <- c(
        "askpass", "asksecret", "buildtools.check",
        "buildtools.with", "pager", "plumber.swagger.url",
        "profvis.print", "restart", "reticulate.repl.hook",
        "reticulate.repl.initialize", "reticulate.repl.teardown",
        "shiny.launch.browser", "terminal.manager", "error"
    )
    global_options <- base::options()
    global_options <- global_options[!names(global_options) %in% blacklist]
}

BiocParallel then attempts to serialize all symbols in the environment to send to the workers, leading to the observed delay. Incidentally, this is the reason for the package="BiocGenerics" argument in the examples above (though any package would probably do here). When package=NULL, the top-level environment is empty as no variables have been defined in the global environment, so no real damage is done. When package="BiocGenerics", the namespace of the package is used as the environment, presumably causing all symbols in the search path to be included for serialization.
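For anyone wanting to repeat this kind of forensics, one rough way to spot a heavy option is to measure how many bytes each entry of options() takes up when serialized. This is only a sketch in base R, not what BiocParallel does internally: it assumes serialized size is a reasonable proxy for the cost of shipping an option to a worker, and the option name demo.heavy and its contents are made up purely for illustration.

```r
# Hypothetical option, invented for this illustration: an environment
# holding a large numeric vector.
heavy_env <- new.env(parent = emptyenv())
assign("payload", numeric(1e6), envir = heavy_env)
old <- options(demo.heavy = heavy_env)

# Serialized size (in bytes) of every current option; NA if an option
# cannot be serialized for some reason.
option_sizes <- vapply(
    options(),
    function(x) {
        tryCatch(
            as.numeric(length(serialize(x, NULL))),
            error = function(e) NA_real_
        )
    },
    numeric(1)
)

# The largest entries are the ones worth a closer look.
print(head(sort(option_sizes, decreasing = TRUE), 5))

# Restore the original options (this removes demo.heavy again).
options(old)
```

Running the same measurement from inside a test file executed by test_dir() should show whether topLevelEnvironment stands out in a given setup.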

Solution

Adding "topLevelEnvironment" to BiocParallel's blacklist fixes the problem. Rerunning the original example with scran loaded now gives:

library(testthat)
library(scran)
library(BiocParallel)
system.time(test_dir("testthat", package="BiocGenerics"))
##    user  system elapsed 
##   0.897   0.150   2.526 
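
The effect of the extra blacklist entry can be checked in isolation with base R alone. The sketch below mimics the filtering step quoted in the diagnosis, using an abbreviated blacklist; setting the option by hand to an empty environment is just a stand-in for what testthat 3.0.0 does, and the variable still_exported is a name chosen for this example.

```r
# Stand-in for testthat 3.0.0: store an environment in the global options.
old <- options(topLevelEnvironment = new.env())

# Abbreviated blacklist with the proposed new entry appended.
blacklist <- c("error", "topLevelEnvironment")

# Same filtering logic as in the snippet quoted in the diagnosis.
global_options <- base::options()
global_options <- global_options[!names(global_options) %in% blacklist]

# FALSE means the environment would no longer be exported to the workers.
still_exported <- "topLevelEnvironment" %in% names(global_options)
print(still_exported)

# Restore the original options.
options(old)
```
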

Session information

R version 4.0.3 Patched (2020-10-31 r79390)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /home/luna/Software/R/R-4-0-branch/lib/libRblas.so
LAPACK: /home/luna/Software/R/R-4-0-branch/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BiocParallel_1.25.0         scran_1.19.0               
 [3] SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0
 [5] Biobase_2.50.0              GenomicRanges_1.42.0       
 [7] GenomeInfoDb_1.26.0         IRanges_2.24.0             
 [9] S4Vectors_0.28.0            BiocGenerics_0.36.0        
[11] MatrixGenerics_1.2.0        matrixStats_0.57.0         
[13] testthat_3.0.0             

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5                compiler_4.0.3           
 [3] bluster_1.0.0             XVector_0.30.0           
 [5] bitops_1.0-6              BiocNeighbors_1.8.0      
 [7] tools_4.0.3               DelayedMatrixStats_1.12.0
 [9] zlibbioc_1.36.0           pkgload_1.1.0            
[11] statmod_1.4.35            lifecycle_0.2.0          
[13] lattice_0.20-41           pkgconfig_2.0.3          
[15] rlang_0.4.8               Matrix_1.2-18            
[17] igraph_1.2.6              rstudioapi_0.11          
[19] cli_2.1.0                 DelayedArray_0.16.0      
[21] GenomeInfoDbData_1.2.4    withr_2.3.0              
[23] desc_1.2.0                rprojroot_1.3-2          
[25] locfit_1.5-9.4            grid_4.0.3               
[27] glue_1.4.2                scuttle_1.0.0            
[29] R6_2.5.0                  snow_0.4-3               
[31] fansi_0.4.1               limma_3.46.0             
[33] irlba_2.3.3               magrittr_1.5             
[35] BiocSingular_1.6.0        edgeR_3.32.0             
[37] ps_1.4.0                  backports_1.2.0          
[39] sparseMatrixStats_1.2.0   assertthat_0.2.1         
[41] beachmat_2.6.0            rsvd_1.0.3               
[43] dqrng_0.2.1               RCurl_1.98-1.2           
[45] crayon_1.3.4             