Zombies when processx (via callr) and parallel are used in the same session. #113

Closed
wlandau opened this issue May 17, 2018 · 10 comments

wlandau commented May 17, 2018

The drake package aims to be a high-performance computing engine, so it includes some small unit tests of its parallel backends. In R 3.5.0, when I run these tests with test_check() or devtools::test(), I see a bunch of zombie processes when I run top -bn1 | grep R$ in a Linux shell. Then, when I quit the R session, I see "Error while shutting down parallel: unable to terminate some child processes". This does not seem to happen if I use devtools::check() or revert to R 3.4.4.

The tests use callr and parallel at various points, though never in the same test. If I run each test in a separate R session, or if I deactivate only the uses of callr, there are no zombies.

So far, I have not been able to create a small reprex. I will keep working on one, but in the meantime, any general advice would help me debug.
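
For reference, a minimal sketch (hypothetical helper, not part of drake; Linux ps assumed) of running the same zombie check from within R instead of the shell:

# Hypothetical: flag defunct ("zombie") children of the current R session,
# mirroring the top -bn1 | grep R$ check described above.
zombie_children <- function() {
  out <- suppressWarnings(
    system2("ps", c("--ppid", Sys.getpid(), "-o", "pid=,stat=,cmd="),
            stdout = TRUE)
  )
  out[grepl("Z", out)]  # ps reports a "Z" in the STAT column for zombies
}
zombie_children()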

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: KDE neon User Edition 5.12

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] MASS_7.3-50         abind_1.4-5         knitr_1.20         
[4] bindrcpp_0.2.2      drake_5.1.3.9001    testthat_2.0.0.9000

loaded via a namespace (and not attached):
 [1] storr_1.1.3        tidyselect_0.2.4   purrr_0.2.4        listenv_0.7.0     
 [5] htmltools_0.3.6    yaml_2.1.19        blob_1.1.1         XML_3.98-1.11     
 [9] rlang_0.2.0        R.oo_1.22.0        pillar_1.2.2       glue_1.2.0        
[13] withr_2.1.2        DBI_1.0.0          R.utils_2.6.0      CodeDepends_0.5-3 
[17] bit64_0.9-7        bindr_0.1.1        stringr_1.3.1      commonmark_1.5    
[21] R.methodsS3_1.7.1  visNetwork_2.0.3   future_1.8.1       htmlwidgets_1.2   
[25] devtools_1.13.5    codetools_0.2-15   evaluate_0.10.1    memoise_1.1.0     
[29] callr_2.0.4        parallel_3.5.0     Rcpp_0.12.16       backports_1.1.2   
[33] formatR_1.5        jsonlite_1.5       bit_1.1-13         digest_0.6.15     
[37] stringi_1.2.2      processx_3.1.0     dplyr_0.7.4        rprojroot_1.3-2   
[41] cli_1.0.0          tools_3.5.0        magrittr_1.5       tibble_1.4.2      
[45] RSQLite_2.1.1      crayon_1.3.4       future.apply_0.2.0 pkgconfig_2.0.1   
[49] xml2_1.2.0         lubridate_1.7.4    assertthat_0.2.0   roxygen2_6.0.1    
[53] R6_2.2.2           globals_0.11.0     igraph_1.2.1       compiler_3.5.0    

Related: r-lib/testthat#757


wlandau commented May 18, 2018

Update: the following is sufficient to produce the zombies I am seeing.

for (i in 1:2){
  parallel::mclapply(1:2, sqrt, mc.cores = 2)
  processx::run("ls")
}

wlandau-lilly added a commit to ropensci/drake that referenced this issue on May 18, 2018:
"Put all the tests involving processx at the end (after all the SIGCHLD stuff that the parallel package uses)."

wlandau commented May 18, 2018

The conditions for zombies seem to be a bit more specific. Zombies spawn here:

parallel::mclapply(1:2, sqrt, mc.cores = 2)
processx::run("ls")
parallel::mclapply(1:2, sqrt, mc.cores = 2)

but not in any of these example sessions:

parallel::mclapply(1:2, sqrt, mc.cores = 2)
processx::run("ls")

processx::run("ls")
parallel::mclapply(1:2, sqrt, mc.cores = 2)

processx::run("ls")
parallel::mclapply(1:2, sqrt, mc.cores = 2)
processx::run("ls")

@gaborcsardi
Member

Looks like you have a workaround, so I'll close this. As I said at the callr issue, fork clusters are not reliable, and should not be used.

I'll think about a solution for the signal handler clashes; right now I only have hacky and dangerous workarounds.


wlandau commented May 27, 2018

Sounds reasonable, thanks!


kforner commented Jan 29, 2020

> As I said at the callr issue, fork clusters are not reliable, and should not be used.

You mean fork clusters as created by mclapply? I've used them for more than 6 years for every purpose, and never had any problem.

But now, after switching to R 3.6, I'm also seeing this "Error while shutting down parallel: unable to terminate some child processes" message. My package uses both parallel and callr, with callr calls inside some mclapply() calls.
Any suggestion or workaround?

Thanks.
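
For concreteness, a hedged sketch (details invented) of the pattern described above, callr calls nested inside forked mclapply() workers:

library(parallel)
res <- mclapply(1:4, function(i) {
  callr::r(function(x) x^2, args = list(i))  # each element runs in a fresh R process via processx
}, mc.cores = 2)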

@gaborcsardi
Member

> You mean fork clusters as created by mclapply? I've used them for more than 6 years for every purpose, and never had any problem.

https://duckduckgo.com/?q=%22fork+without+exec%22&t=canonical&ia=web

Unfortunately there is no workaround, other than avoiding the use of fork clusters and processx together.
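
For comparison, a minimal sketch of one common alternative to fork clusters: a PSOCK cluster, whose workers are separate R processes started with fork followed by exec:

library(parallel)
cl <- makeCluster(2)             # 2 PSOCK workers (fresh R processes)
res <- parLapply(cl, 1:2, sqrt)  # same result as mclapply(1:2, sqrt, mc.cores = 2)
stopCluster(cl)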


kforner commented Jan 29, 2020

> https://duckduckgo.com/?q=%22fork+without+exec%22&t=canonical&ia=web

So the problem seems to be with multi-threaded programs.
R is not multi-threaded. While some C/C++ code called by R could be, we can reasonably assume that once the external function has returned, all threads have been reaped.
Did I miss something?

> Unfortunately there is no workaround, other than avoiding the use of fork clusters and processx together.

The problem is that I have a large codebase using mclapply. What would you recommend using instead?

@gaborcsardi
Member

> R is not multi-threaded.

On some platforms the system libraries are, and they don't support fork without exec. It is also not just a problem with multi-threaded programs. See e.g.
https://stat.ethz.ch/pipermail/r-devel/2020-January/078911.html

> The problem is that I have a large codebase using mclapply. What would you recommend using instead?

First of all, is that "error" really an R error or just a message? If it is the latter, and you are not seeing other issues (e.g. zombies or crashes), then you can just ignore it.

Also, can you please open a new issue for this? Maybe I can look into some hack to notify parallel that its subprocess has finished.


kforner commented Jan 29, 2020

It seems to be just a message, but it also prevents quitting the R console/session normally.
I'll try to get a reproducible example, and then I'll create a new issue.

Thanks!

@gaborcsardi
Member

I think what is happening is that:

  1. parallel starts some processes and defines a SIGCHLD signal handler.
  2. processx is loaded and starts some processes, so it redefines the signal handler.
  3. A parallel subprocess finishes, but parallel does not notice.
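
A hedged reading of how those three steps map onto the minimal reprex from earlier in the thread:

parallel::mclapply(1:2, sqrt, mc.cores = 2)  # 1. parallel installs its SIGCHLD handler
processx::run("ls")                          # 2. loading processx replaces that handler
parallel::mclapply(1:2, sqrt, mc.cores = 2)  # 3. these forked workers exit, parallel never
                                             #    notices, and they linger as zombies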
