
filterAndTrim error messages when multithreading #273

Closed
fanli-gcb opened this issue Jun 27, 2017 · 8 comments

@fanli-gcb
Contributor

I've noticed in some of my automated runs that filterAndTrim was crashing with this error:

filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen=truncation_lengths[[run]], maxN=0, maxEE=c(2,2), truncQ=2, rm.phix=TRUE, compress=TRUE, multithread=TRUE, matchIDs=TRUE, verbose=TRUE)

Error in mcmapply(fastqPairedFilter, mapply(c, fwd, rev, SIMPLIFY = FALSE),  : 
  'names' attribute [28] must be the same length as the vector [22]

I believe this has to do with the multithreading: specifically, if enough threads are requested that the machine runs out of memory, some of the worker processes die silently, which leads to the mcmapply error above. This is easily remedied by setting multithread=ncores, where ncores is something reasonable (16 in my case). Just thought I'd share in case anyone else has run into this, as the error message is not informative.

I don't see any good place to catch this (as it seems internal to mcmapply) but perhaps a note in the tutorial or FAQs/troubleshooting might be helpful.
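A minimal sketch of the multithread=ncores workaround, assuming you want to cap the worker count at a fixed ceiling (16, as above) instead of using every detected core. `choose_ncores` is a hypothetical helper, not part of dada2; the only dada2 behavior relied on is that `multithread` also accepts an integer core count.

```r
library(parallel)

# Hypothetical helper (not part of dada2): pick a bounded number of worker
# processes instead of one per detected core.
choose_ncores <- function(max_cores = 16L) {
  detected <- parallel::detectCores()
  if (is.na(detected)) detected <- 1L  # detectCores() may return NA
  as.integer(max(1L, min(detected, max_cores)))
}

ncores <- choose_ncores(16L)

# The integer is then passed on as the core count, e.g.:
# filterAndTrim(fnFs, filtFs, fnRs, filtRs, truncLen = truncation_lengths[[run]],
#               maxN = 0, maxEE = c(2, 2), truncQ = 2, rm.phix = TRUE,
#               compress = TRUE, multithread = ncores, matchIDs = TRUE,
#               verbose = TRUE)
```

The right ceiling depends on available memory per worker, so 16 is only the value that worked in this report.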

@benjjneb
Owner

Agreed, we should add the potential memory issues to the documentation.

The error reporting out of multithreaded filterAndTrim is a larger problem than that, though, as none of the errors in multithreaded mode are reported to the user in an intelligible fashion. This needs a better solution.

@benjjneb benjjneb added this to the 1.6 milestone Jun 27, 2017
@benjjneb benjjneb changed the title filterAndTrim error when multithreading filterAndTrim error messages when multithreading Jun 27, 2017
@joey711
Collaborator

joey711 commented Jun 27, 2017

A good solution is an informative warning/stop up front, so that the user is (at least somewhat) protected from wasting the time and resources of a run that will hit a memory failure.

One possibility might be to check available memory up front, comparing it against a rule-of-thumb estimate based on the sequence file sizes. In some of our earlier work we found a decent correlation between compressed file size and peak memory required. This could be operationalized into a formal prediction with a reasonable default margin, knowing that the prediction isn't perfect.

Better fail messages are also helpful, of course.
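One way the up-front check suggested above could look, as a sketch only. The multiplier from compressed input size to peak memory is a placeholder (the earlier measurements mentioned above are not reproduced here), and the caller supplies the available memory, since base R has no portable function for querying free system memory. `max_safe_workers` and `check_multithread` are hypothetical names, not dada2 functions.

```r
# Hypothetical rule of thumb: each worker filters one fwd/rev pair and needs
# roughly `bytes_per_compressed_byte` times that pair's compressed size.
# The default multiplier is a placeholder, NOT a measured dada2 value.
max_safe_workers <- function(fwd, rev = NULL, available_bytes,
                             bytes_per_compressed_byte = 10,
                             safety_margin = 0.8) {
  job_bytes <- file.size(fwd)
  if (!is.null(rev)) job_bytes <- job_bytes + file.size(rev)
  if (anyNA(job_bytes)) stop("some input files do not exist")
  per_worker <- bytes_per_compressed_byte * max(job_bytes)
  as.integer(max(1, floor(safety_margin * available_bytes / per_worker)))
}

# Fail fast, before any filtering starts, if the requested (integer) thread
# count looks unsafe under the rule of thumb above.
check_multithread <- function(requested, fwd, rev = NULL, available_bytes, ...) {
  safe <- max_safe_workers(fwd, rev, available_bytes = available_bytes, ...)
  if (requested > safe) {
    stop(sprintf("multithread = %d may exhaust memory; estimated safe maximum is %d",
                 as.integer(requested), safe), call. = FALSE)
  }
  invisible(safe)
}

# Usage with two dummy 1000-byte "compressed" files and 1 MB of free memory:
f <- tempfile(fileext = ".fastq.gz"); writeBin(raw(1000), f)
r <- tempfile(fileext = ".fastq.gz"); writeBin(raw(1000), r)
max_safe_workers(f, r, available_bytes = 1e6)  # 0.8 * 1e6 / (10 * 2000) = 40
```

Because the estimate is deliberately crude, a warning instead of `stop()` may be the friendlier default.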

@xinbaiusc

I also encountered a similar error when running filterAndTrim:
Error in mcmapply(fastqPairedFilter, mapply(c, fwd, rev, SIMPLIFY = FALSE), :
'mc.cores' > 1 is not supported on Windows

Does anyone know why this occurs and how to fix it? For now I have set multithread=FALSE.

@fanli-gcb
Contributor Author

Windows doesn't support forking:
https://stat.ethz.ch/R-manual/R-devel/library/parallel/html/mclapply.html

@xinbaiusc

So does "Windows doesn't support forking" mean I should just set multithread=FALSE? That is the only way I can get my code to run successfully.

@benjjneb
Owner

@xinbaiusc For now you just have to do as you've done and set multithread=FALSE on Windows. As @fanli-gcb pointed out, the filtering multithreading relies on mclapply, which doesn't work on Windows (the multithreading in other commands does not use mclapply and is cross-platform).

As a reminder to myself: this needs to be added to the documentation and tutorial in 1.4, and for 1.6 a graceful fall-back on Windows should be implemented.
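The Windows fall-back mentioned above could be sketched as follows. This is an assumption about one possible design, not dada2 code: `resolve_multithread` is a hypothetical wrapper that downgrades any request for more than one worker to `FALSE` on Windows, with a warning, because the fork-based `mclapply` cannot use more than one core there. The `os_type` argument exists only so the behavior can be exercised on any platform.

```r
# Hypothetical helper (not part of dada2): on Windows, fork-based mclapply
# cannot use more than one core, so fall back to single-threaded filtering.
resolve_multithread <- function(multithread, os_type = .Platform$OS.type) {
  wants_parallel <- isTRUE(multithread) ||
    (is.numeric(multithread) && length(multithread) == 1 && multithread > 1)
  if (os_type == "windows" && wants_parallel) {
    warning("multithreaded filtering uses forking (mclapply), which Windows ",
            "does not support; falling back to multithread = FALSE",
            call. = FALSE)
    return(FALSE)
  }
  multithread
}

# Usage: pass the resolved value on, e.g. filterAndTrim(..., multithread = mt)
mt <- resolve_multithread(TRUE)
```

Warning rather than stopping keeps scripts portable between Linux/macOS and Windows at the cost of slower runs on Windows.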

@vnsriniv

@benjjneb I just started using dada2 on a Windows system and would like to use multithreading. Since forking is not supported on Windows, does anyone have thoughts on using the foreach package to multithread?
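One cross-platform route, sketched here with base R's socket clusters rather than foreach (the same idea applies to foreach with a socket-cluster backend such as doParallel): `makeCluster()` starts fresh R worker processes instead of forking, so it also works on Windows. `run_on_psock` is a hypothetical helper, and the commented dada2 usage assumes one `filterAndTrim()` call per sample with `multithread = FALSE` inside each worker.

```r
library(parallel)

# Hypothetical helper: apply a per-job function on a PSOCK (socket) cluster.
# PSOCK workers are separate R processes started fresh, so no forking is
# needed and this also runs on Windows.
run_on_psock <- function(jobs, per_job, ncores = 2L) {
  cl <- makeCluster(ncores)             # type = "PSOCK" is the default
  on.exit(stopCluster(cl), add = TRUE)  # always shut the workers down
  parLapply(cl, jobs, per_job)
}

# Usage sketch for dada2 (commented out: needs dada2 and real files).
# Each job is c(fwd, filtered fwd, rev, filtered rev) for one sample;
# the dada2:: prefix makes each worker load the package itself.
# jobs <- Map(c, fnFs, filtFs, fnRs, filtRs)
# out <- run_on_psock(jobs, function(p)
#   dada2::filterAndTrim(p[1], p[2], p[3], p[4], maxN = 0, maxEE = c(2, 2),
#                        truncQ = 2, multithread = FALSE),
#   ncores = 4L)

# Self-contained check of the helper itself:
squares <- run_on_psock(1:4, function(x) x^2, ncores = 2L)
```

Socket workers do not share the parent's memory, so anything a job needs (file paths, parameters) must travel inside the job or be exported to the cluster.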

@benjjneb
Owner

Error messages from the individual cores on which errors were encountered are now propagated to standard output by filterAndTrim(..., multithread=TRUE): 4709085
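The general pattern behind surfacing per-core errors can be sketched as below. This is an illustration under assumptions, not the code in commit 4709085: each job runs inside `tryCatch()` so a failure comes back as a condition object, and a `NULL` result is treated as a worker that delivered nothing (the documented `mclapply()` outcome when a forked process is killed, for example for lack of memory). `mclapply_report` is a hypothetical name, and the sketch assumes a successful job never legitimately returns `NULL`.

```r
library(parallel)

# Hypothetical wrapper (not dada2's implementation): run FUN over X with
# mclapply, then stop with one readable message listing every failed job.
mclapply_report <- function(X, FUN, mc.cores = 1L) {
  res <- mclapply(X, function(x) tryCatch(FUN(x), error = function(e) e),
                  mc.cores = mc.cores)
  labels <- if (is.null(names(X))) as.character(seq_along(X)) else names(X)
  if (length(res) != length(X)) {
    stop(sprintf("expected %d results but got %d", length(X), length(res)),
         call. = FALSE)
  }
  # NULL: the worker delivered nothing (e.g. it was killed);
  # "error" object: FUN itself raised an error.
  failed <- vapply(res, function(r) is.null(r) || inherits(r, "error"),
                   logical(1))
  if (any(failed)) {
    why <- vapply(res[failed], function(r) {
      if (is.null(r)) "no result returned (worker may have run out of memory)"
      else conditionMessage(r)
    }, character(1))
    stop("some jobs failed:\n",
         paste0("  ", labels[failed], ": ", why, collapse = "\n"),
         call. = FALSE)
  }
  res
}

# Usage: the second job fails, and the message names it and gives the reason.
msg <- tryCatch(mclapply_report(list(a = 1, b = "x"), function(x) x * 2),
                error = conditionMessage)
cat(msg, "\n")
```

Checking results by position before touching names avoids the opaque `'names' attribute ... must be the same length` error reported at the top of this issue.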

5 participants