Can Spades be made deterministic? #111
Could you please report the cases where SPAdes output is non-deterministic (e.g. the input and parameters, and the observed behavior)? SPAdes is designed to be deterministic modulo the number-of-threads option (so the output for 8 threads could differ from the output for 16 threads), and our tests show that it is indeed so.
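Determinism "modulo the number of threads" is what you get when work is split according to the thread count but the pieces are always combined in a fixed order. The following is a generic Python illustration with a hypothetical `chunked_sum` function, not SPAdes code: floating-point addition is not associative, so different chunkings can give different totals, while repeated runs with the same chunking always agree.

```python
def chunked_sum(values, threads):
    """Sum `values` the way a program with `threads` workers might:
    split into contiguous chunks, sum each chunk, then combine the
    partial sums in fixed chunk order (never in completion order)."""
    size = -(-len(values) // threads)  # ceiling division: items per chunk
    partials = []
    for start in range(0, len(values), size):
        acc = 0.0
        for v in values[start:start + size]:  # plain left-to-right float addition
            acc += v
        partials.append(acc)
    total = 0.0
    for p in partials:  # fixed order => repeatable for a given thread count
        total += p
    return total


values = [1e16, 1.0, -1e16, 1.0]
print(chunked_sum(values, 1))  # 1.0
print(chunked_sum(values, 2))  # 0.0
```

With one chunk, the second 1.0 survives because it is added after the large values cancel; with two chunks, each 1.0 is rounded away against a value of magnitude 1e16.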
@asl You are correct. SPAdes does indeed appear to be deterministic. I must have wrongly attributed to SPAdes some other source of randomness in my overall workflow. I have now run the script below and got back identical md5 sums from all 40 invocations.
Great! Note that there is no need to reorder the contigs: their order in the output is deterministic as well :)
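The script mentioned above is not reproduced in this thread. As a stand-in, here is a minimal Python sketch of such a repeatability check, assuming each SPAdes run wrote to its own output directory (for example hypothetical `run_01`, `run_02`, …). It compares MD5 digests of `contigs.fasta` and `scaffolds.fasta` across runs; log files are deliberately left out on the assumption that they contain timings or paths that change between runs.

```python
import hashlib
from pathlib import Path

# Files to compare; logs are excluded because they may legitimately
# differ between runs (timestamps, paths). Adjust to your pipeline.
FILES = ("contigs.fasta", "scaffolds.fasta")


def md5_of(path):
    """Return the hex MD5 digest of a file, read in 1 MiB blocks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def runs_identical(run_dirs, files=FILES):
    """True if every listed file has the same MD5 in every run directory.

    Files are hashed exactly as written; contigs are not sorted first."""
    fingerprints = set()
    for d in run_dirs:
        fingerprints.add(tuple(md5_of(Path(d) / name) for name in files))
    return len(fingerprints) == 1
```

For example, `runs_identical(sorted(Path(".").glob("run_*")))` returns `True` only if all runs produced byte-identical contigs and scaffolds.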
I am using SPAdes in a clinical strain surveillance application. It would be highly desirable to generate exactly the same outputs from the same inputs and parameters every time I run SPAdes. I need complete repeatability.
In my tests, this does not seem to be the case.
Would you have any pointers on how much work it would take to add a `--deterministic` switch? If I implemented it, would you be interested in a pull request?
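One way such a switch could behave is sketched below in Python. This is not SPAdes code: the `--deterministic` flag, `FIXED_SEED`, `parse_args` and `choose_seed` are all hypothetical. In deterministic mode every random number generator starts from a fixed seed; otherwise the seed is derived from the clock and differs between runs.

```python
import argparse
import random
import time

FIXED_SEED = 48  # hypothetical fixed value; any constant works


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="toy pipeline front end")
    parser.add_argument(
        "--deterministic",
        action="store_true",
        help="seed random number generators with a fixed value instead of the clock",
    )
    return parser.parse_args(argv)


def choose_seed(args):
    """Fixed seed in deterministic mode, clock-derived seed otherwise."""
    if args.deterministic:
        return FIXED_SEED
    return time.time_ns() & 0xFFFFFFFF  # differs from run to run


args = parse_args(["--deterministic"])  # as if invoked with --deterministic
rng = random.Random(choose_seed(args))  # every randomized step would draw from rng
```

A fixed seed alone is not sufficient once several threads draw random numbers, because the order in which they consume values can still vary between runs.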
A quick search for `srand` in the v.3.11.0 code suggests this:

- `srand(48)` …
- … `ext` directory: not clear yet, probably not deterministic. We might have to change both how these tools are called (passing the seed argument) and the code where identical reads are randomly placed.
- `nlopt` is definitely not deterministic. I can see this line: `nlopt_srand_time_default(); /* default is non-deterministic */`. They want different seeds in different threads, and that function generates the seed by combining the time with the thread ID.

… `--deterministic` mode. Otherwise it might take too much work to implement a fixed ordering of work among the threads.
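The per-thread seeding problem described above can be avoided by tying each random stream to a unit of work rather than to a thread or to the clock. The Python sketch below is an illustration only, with hypothetical names (`BASE_SEED`, `place_batch`, `place_all`) and a toy stand-in for placing reads that have several equally good positions. Each batch gets an RNG seeded from a fixed base seed plus the batch index, and `concurrent.futures.Executor.map` returns results in submission order, so the merged output does not depend on which thread finishes first.

```python
import random
from concurrent.futures import ThreadPoolExecutor

BASE_SEED = 48  # hypothetical fixed base seed for a deterministic mode


def place_batch(task_id, reads):
    """Pick a position for each read among its equally good candidates.

    The RNG is seeded from (BASE_SEED, task_id) rather than from the clock
    or the OS thread ID, so a batch makes the same choices no matter which
    worker thread happens to run it."""
    rng = random.Random(BASE_SEED * 1_000_003 + task_id)
    return [(name, rng.choice(candidates)) for name, candidates in reads]


def place_all(batches, threads):
    """Run batches in parallel; Executor.map yields results in submission
    order, so the merged output does not depend on completion order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = pool.map(place_batch, range(len(batches)), batches)
        return [placement for batch in results for placement in batch]
```

Because the batches here are fixed independently of `threads`, this toy gives identical output for any thread count. A program that splits its work according to the thread count would only get the weaker guarantee of identical output for the same number of threads.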