"Error in result_fetch(res@ptr, n = n) : database disk image is malformed\n" in parallel computing #207

Closed
luyoutao opened this issue May 23, 2019 · 5 comments

Comments

@luyoutao

Dear author,

Thanks for developing such a useful tool! I run into trouble when using clusterProfiler in parallel. For example,

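library(parallel)
library(clusterProfiler)
library(org.Hs.eg.db)

universe <- keys(org.Hs.eg.db, keytype = "SYMBOL")
# Run 100 enrichment analyses on random gene sets across 32 forked workers.
# mclapply() needs a function as its second argument, and the option is mc.cores.
res <- mclapply(1:100, function(i) enrichGO(
    gene = sample(universe, 1000),
    OrgDb = org.Hs.eg.db,
    keyType = "SYMBOL",
    ont = "BP",
    pvalueCutoff = 1,
    qvalueCutoff = 1,
    universe = universe),
mc.cores = 32)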

Some of the returned values look like this:

[1] "Error in result_fetch(res@ptr, n = n) : database disk image is malformed\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<Rcpp::exception in result_fetch(res@ptr, n = n): database disk image is malformed>

The issue happens sporadically, so it is hard to trace or reproduce.
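Because mclapply returns a failed task as a "try-error" object instead of stopping, the affected elements can be located after the fact. A minimal sketch in base R, where the list `res` is simulated data standing in for real mclapply output (an assumption made so the example runs without any annotation packages):

```r
# Simulated results: the second element mimics a task that failed
# with the SQLite error reported above.
res <- list(
    "ok",
    try(stop("database disk image is malformed"), silent = TRUE),
    "ok"
)

# Flag every element that is a "try-error" object
failed <- vapply(res, inherits, logical(1), what = "try-error")

# Indices of the tasks that would need to be rerun
which(failed)
```

With real output, the flagged indices could be rerun serially with lapply.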

Searching online, I found that the problem seems to be specific to SQLite-backed packages used in multi-process routines. Instances include:

A possible cause is that SQLite fails to handle a burst of queries within a short time, which is typical when mc.cores is large. I therefore don't think this is a bug in clusterProfiler, but I would like to know whether there is a possible solution.

Thanks,

@luyoutao
Author

PS. Since SQLite is not a client–server database, my hypothesis might be wrong. Either way, I'm curious whether running enrichGO with a large number of forked processes in parallel is feasible. Thanks.

@luyoutao
Author

PS2. One thing that might be noteworthy: once mclapply is replaced by lapply and enrichGO has run once, mclapply seems to stop producing this error for the rest of the same R session. In other words, it looks like enrichGO does not work well with mclapply in a "cold" run.
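The observation above can be sketched as a serial "warm-up" call before forking. This is only an illustration of the reported behavior, not a confirmed fix; the worker count of 4 is an arbitrary assumption.

```r
library(parallel)
library(clusterProfiler)
library(org.Hs.eg.db)

universe <- keys(org.Hs.eg.db, keytype = "SYMBOL")

# Hypothetical helper wrapping the enrichGO call from the example above
run_enrichment <- function(i) enrichGO(
    gene = sample(universe, 1000),
    OrgDb = org.Hs.eg.db,
    keyType = "SYMBOL",
    ont = "BP",
    pvalueCutoff = 1,
    qvalueCutoff = 1,
    universe = universe)

# Warm-up: one serial run in the parent process before any fork
invisible(run_enrichment(0))

# Parallel runs afterwards (assumed worker count)
res <- mclapply(1:100, run_enrichment, mc.cores = 4)
```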

@Finesim97

Thank you so much for this information; I was planning on parallelizing the llply call in compareCluster. enrichGO seems to cache the "GO_DATA" in an environment object, but I wasn't sure whether it survives forking. If running an analysis before the parallel call fixes the database inconsistency error, it apparently does (or maybe the caching of the SQLite database file happens somewhere else, I don't know). Thanks again.

@GuangchuangYu
Member

You can prepare a GO2Gene data frame from org.Hs.eg.db and use enricher, documented at https://yulab-smu.github.io/clusterProfiler-book/chapter3.html.

This should work with mclapply.
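A rough sketch of that suggestion, assuming the GO-to-gene table is built once in the parent process so the workers never query SQLite. The names `go_map`, `term2gene`, and `run_enricher`, the restriction to the BP ontology, and the worker count are illustrative assumptions, not part of the original advice.

```r
library(parallel)
library(clusterProfiler)
library(org.Hs.eg.db)

universe <- keys(org.Hs.eg.db, keytype = "SYMBOL")

# Query the database once, serially, before forking.
# GOALL includes annotations inherited from child terms.
go_map <- AnnotationDbi::select(
    org.Hs.eg.db,
    keys = universe,
    keytype = "SYMBOL",
    columns = c("GOALL", "ONTOLOGYALL"))

# Keep biological-process terms; %in% treats NA rows as FALSE
go_bp <- go_map[go_map$ONTOLOGYALL %in% "BP", ]

# enricher() expects the term in the first column and the gene in the second
term2gene <- unique(go_bp[, c("GOALL", "SYMBOL")])

# Hypothetical worker: uses only the in-memory data frame
run_enricher <- function(i) enricher(
    gene = sample(universe, 1000),
    TERM2GENE = term2gene,
    universe = universe,
    pvalueCutoff = 1,
    qvalueCutoff = 1)

res <- mclapply(1:100, run_enricher, mc.cores = 4)
```

Since `term2gene` is an ordinary data frame, it is copied into each forked worker and no database connection is shared.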

@grst

grst commented Mar 7, 2023

A solution is presented here:

Instead of using the global org.Hs.eg.db, every worker needs to have its own database connection:

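# Open a private connection to the annotation database inside each worker
db <- AnnotationDbi::loadDb(org.Hs.eg_dbfile())
# Close it when the enclosing worker function exits
on.exit(RSQLite::dbDisconnect(AnnotationDbi::dbconn(db)))
enrichGO(..., OrgDb = db, ...)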
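Wrapped in a complete worker function, the snippet might look like the sketch below. The function name `run_enrichment`, the enrichGO arguments (taken from the example at the top of this issue), and the worker count are assumptions for illustration.

```r
library(parallel)
library(clusterProfiler)
library(org.Hs.eg.db)

universe <- keys(org.Hs.eg.db, keytype = "SYMBOL")

# Hypothetical worker: each call opens and closes its own connection
run_enrichment <- function(i) {
    db <- AnnotationDbi::loadDb(org.Hs.eg_dbfile())
    # on.exit() is registered inside the function, so the connection
    # is closed when this worker call returns or fails
    on.exit(RSQLite::dbDisconnect(AnnotationDbi::dbconn(db)), add = TRUE)
    enrichGO(
        gene = sample(universe, 1000),
        OrgDb = db,
        keyType = "SYMBOL",
        ont = "BP",
        pvalueCutoff = 1,
        qvalueCutoff = 1,
        universe = universe)
}

res <- mclapply(1:100, run_enrichment, mc.cores = 4)

# Count tasks that still came back as errors
sum(vapply(res, inherits, logical(1), what = "try-error"))
```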
