"Error in result_fetch(res@ptr, n = n) : database disk image is malformed\n" in parallel computing #207

luyoutao · 2019-05-23T15:13:31Z

Dear author,

Thanks for developing such a useful tool! I run into trouble when using clusterProfiler in parallel. For example:

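library(parallel)
library(clusterProfiler)
library(org.Hs.eg.db)

universe <- keys(org.Hs.eg.db, keytype = "SYMBOL")
# Run 100 enrichment analyses on random gene sets, spread over 32 forked workers
res <- mclapply(1:100, function(i) enrichGO(
    gene = sample(universe, 1000),
    OrgDb = org.Hs.eg.db,
    keyType = "SYMBOL",
    ont = "BP",
    pvalueCutoff = 1,
    qvalueCutoff = 1,
    universe = universe),
mc.cores = 32)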

Some of the returned values look like this:

[1] "Error in result_fetch(res@ptr, n = n) : database disk image is malformed\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<Rcpp::exception in result_fetch(res@ptr, n = n): database disk image is malformed>

This issue happens sporadically, which makes it hard to trace or reproduce.
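Because mclapply does not stop when a worker fails, but instead stores a "try-error" object like the one above in the result list, the failed runs can at least be located afterwards. A minimal base-R sketch, assuming a Unix-alike (mclapply cannot fork on Windows); `flaky_task` is a hypothetical stand-in for enrichGO that fails on every seventh input:

```r
library(parallel)

# Hypothetical stand-in for a call that fails sporadically in a forked worker
flaky_task <- function(i) {
  if (i %% 7 == 0) stop("database disk image is malformed")
  i * 10
}

# mc.preschedule = FALSE forks one job per element, so an error only marks
# the element that raised it rather than every element in the same chunk
res <- mclapply(1:20, flaky_task, mc.cores = 2, mc.preschedule = FALSE)

# Failed elements come back as "try-error" objects instead of aborting the call
failed <- which(vapply(res, inherits, logical(1), what = "try-error"))
failed
```

With the default prescheduling, mclapply documents that an error marks all values of the affected job as failed, which is why the sketch turns prescheduling off. The indices in `failed` could then be rerun sequentially with lapply.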

Searching the web, I found that the problem seems to be specific to packages with an SQLite backend that are used in multi-process routines. Instances include:

A possible cause is that the SQLite server fails to handle too many queries in a short time, which is typical when mc.cores is large. Therefore, I don't think this is a bug in clusterProfiler, but I would like to know whether there is any possible solution.

Thanks,


luyoutao · 2019-05-24T00:12:06Z

PS. Since SQLite is not a client–server database, my hypothesis might be wrong. Anyway, I'm curious whether it is feasible to run enrichGO in a large number of forked processes in parallel. Thanks.

luyoutao · 2019-05-24T18:03:37Z

PS2. One thing that might be noteworthy: once mclapply is replaced by lapply and enrichGO has run once, mclapply seems to stop producing the error for the rest of the same R session. In other words, enrichGO appears not to work well with mclapply on a "cold" run.
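A minimal sketch of the warm-up workaround described in PS2, assuming the same parameters as the first post; `run_one` is a hypothetical helper name. It only reproduces the reporter's empirical observation and is not a confirmed fix:

```r
library(parallel)
library(clusterProfiler)
library(org.Hs.eg.db)

universe <- keys(org.Hs.eg.db, keytype = "SYMBOL")

# Hypothetical helper: one enrichment run on a random gene set
run_one <- function(i) {
  enrichGO(gene = sample(universe, 1000), OrgDb = org.Hs.eg.db,
           keyType = "SYMBOL", ont = "BP", pvalueCutoff = 1,
           qvalueCutoff = 1, universe = universe)
}

# "Warm" run in the parent process before any fork happens
invisible(run_one(0))

# Forked workers start from the parent's already-warmed state
res <- mclapply(1:100, run_one, mc.cores = 32)
```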

Finesim97 · 2019-05-29T06:29:25Z

Thank you so much for this information; I was planning to parallelize the llply call in compareCluster. enrichGO seems to cache "GO_DATA" in an environment object, and I wasn't sure whether it survives the fork. If running an analysis before the parallel call fixes the database inconsistency error, then it does (or maybe the SQLite database file is cached somewhere else; I don't know). Thanks again.
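The question of whether a cached environment survives forking can be checked in plain R. A minimal sketch of general fork semantics, assuming a Unix-alike; `get_cached`, `warm`, and `cold` are hypothetical names, not clusterProfiler internals. An environment filled in the parent before mclapply is visible to every child, while anything a child caches is lost when it exits:

```r
library(parallel)

# Hypothetical helper: fill env[[key]] on first use, recording which process filled it
get_cached <- function(env, key) {
  if (!exists(key, envir = env, inherits = FALSE)) {
    assign(key, Sys.getpid(), envir = env)
  }
  get(key, envir = env, inherits = FALSE)
}

parent_pid <- Sys.getpid()

# Warm cache: filled in the parent before forking, then inherited by each child
warm <- new.env()
invisible(get_cached(warm, "GO_DATA"))
warm_res <- unlist(mclapply(1:4, function(i) get_cached(warm, "GO_DATA"), mc.cores = 2))

# Cold cache: each child fills its own copy, and those writes never reach the parent
cold <- new.env()
cold_res <- unlist(mclapply(1:4, function(i) get_cached(cold, "GO_DATA"), mc.cores = 2))

all(warm_res == parent_pid)                        # TRUE
any(cold_res == parent_pid)                        # FALSE
exists("GO_DATA", envir = cold, inherits = FALSE)  # FALSE
```

This shows only that a warm-up in the parent is inherited; whether the cached object is what actually prevents the database error remains an open assumption.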

GuangchuangYu · 2019-05-29T07:06:56Z

You can prepare a GO2Gene data frame from org.Hs.eg.db and use enricher, as documented at https://yulab-smu.github.io/clusterProfiler-book/chapter3.html.

This should work with mclapply.
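A minimal sketch of that suggestion, assuming the setup from the first post (BP ontology, SYMBOL keys). Querying the GOALL/ONTOLOGYALL columns, so that ancestor terms are included, is an assumption, and `go_map`, `bp`, and `term2gene` are hypothetical names. The database is queried once in the parent, and the workers only read an in-memory data frame:

```r
library(parallel)
library(AnnotationDbi)
library(org.Hs.eg.db)
library(clusterProfiler)

universe <- keys(org.Hs.eg.db, keytype = "SYMBOL")

# Single query against the SQLite-backed OrgDb, done in the parent process
go_map <- AnnotationDbi::select(org.Hs.eg.db, keys = universe, keytype = "SYMBOL",
                                columns = c("GOALL", "ONTOLOGYALL"))
bp <- go_map[go_map$ONTOLOGYALL %in% "BP", ]  # %in% also drops NA rows
term2gene <- unique(data.frame(term = bp$GOALL, gene = bp$SYMBOL))

# Workers use the plain data frame; none of them touches the database
res <- mclapply(1:100, function(i) {
  enricher(gene = sample(universe, 1000), universe = universe,
           TERM2GENE = term2gene, pvalueCutoff = 1, qvalueCutoff = 1)
}, mc.cores = 32)
```

Without a TERM2NAME table, the results report GO IDs rather than term descriptions.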

grst · 2023-03-07T09:09:05Z

A solution is presented here:

Instead of using the global org.Hs.eg.db, every worker needs to have its own database connection:

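# Run these lines inside the function each worker executes
db <- AnnotationDbi::loadDb(org.Hs.eg_dbfile())
on.exit(RSQLite::dbDisconnect(dbconn(db)))  # close this worker's connection when the function returns
enrichGO(..., OrgDb = db, ...)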
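A minimal sketch of how the per-worker connection might slot into the original mclapply loop, assuming the parameters from the first post. `enrich_with_own_db` is a hypothetical name, and `DBI::dbDisconnect` is used here as the generic disconnect call:

```r
library(parallel)
library(clusterProfiler)
library(org.Hs.eg.db)

universe <- keys(org.Hs.eg.db, keytype = "SYMBOL")

# Hypothetical worker: opens its own OrgDb connection instead of reusing the
# org.Hs.eg.db connection that was opened in the parent and inherited by fork
enrich_with_own_db <- function(i) {
  db <- AnnotationDbi::loadDb(org.Hs.eg_dbfile())
  on.exit(DBI::dbDisconnect(AnnotationDbi::dbconn(db)))  # runs when the worker returns
  enrichGO(gene = sample(universe, 1000), OrgDb = db, keyType = "SYMBOL",
           ont = "BP", pvalueCutoff = 1, qvalueCutoff = 1, universe = universe)
}

res <- mclapply(1:100, enrich_with_own_db, mc.cores = 32)
```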

luyoutao closed this as completed Oct 31, 2019
