boost math error during EM iteration: Evaluation of function at pole -nan #48

Closed
warrenmcg opened this issue Mar 15, 2016 · 4 comments

Comments

@warrenmcg

Hi Rob,

I've been running another group's samples (single-end, second-strand protocol) through a script that iterates over each sample and runs salmon. I'm using the latest version (0.6.0) with the following command:

salmon quant -i salmon_index --libType SF -r <(gzip -c -d $IN_FILE) -o $OUTPUT --numBootstraps 100 --useVBOpt --useFSPD --geneMap $GENES --biasCorrect -p 59

During the EM iteration step (soon after the 500th round, when salmon recalculates effective lengths), I get this error:

[jointLog] [info] iteration 500, recomputing effective lengths
[jointLog] [info] iteration = 500 | max rel diff. = 64.1299
Exception : [Error in function boost::math::digamma<long double>(long double): Evaluation of function at pole -nan]
salmon quant was invoked improperly.
For usage information, try salmon quant --help
Exiting.

I can't tell whether this is something that can occasionally happen with a non-deterministic algorithm or whether it is never supposed to happen. These particular samples are extremely high depth (roughly 170-190M reads per sample), so that might be the cause, but I don't understand the algorithm well enough to troubleshoot it or to put together a toy dataset that reproduces the error. Rerunning the sample that triggers the error often works.

If I can throw in a feature request here, it would be great to be able to set the seed to make the runs deterministic. Is that possible?

@rob-p
Collaborator

rob-p commented Mar 15, 2016

Hi @warrenmcg,

Thanks for this bug report. This error is caused by the --useVBOpt argument. The telltale sign is that the exception names the digamma function: Exception : [Error in function boost::math::digamma<long double>(long double): Evaluation of function at pole -nan], followed by "salmon quant was invoked improperly." The digamma function is used only in the useVBOpt codepath.

I can say that this behavior is never expected (i.e. it's not the case that, due to the stochastic nature of the algorithm, we sometimes expect to evaluate the digamma function on an argument of -nan). As for the feature request: even if one could set a randomization seed, that would not make salmon purely deterministic unless it were restricted to a single thread. This is because the order in which reads are observed during the online phase of the algorithm can change as a result of the multithreaded read parser (which is OK, since the absolute order of the reads in the file is itself random).
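To illustrate why observation order matters for an online update, here is a minimal toy sketch in Python. It is not salmon's algorithm: the function `online_estimate`, its `decay` parameter, and the three-value input are all hypothetical, chosen only to show that an update with a shrinking step size gives different answers when the same observations arrive in a different order, even with no random numbers involved.

```python
def online_estimate(observations, decay=0.75):
    """Toy online update (hypothetical, not salmon's): the step size shrinks each step."""
    estimate = 0.0
    for step, value in enumerate(observations, start=1):
        # Step size is 1 for the first observation and decreases afterwards.
        step_size = step ** -decay
        estimate += step_size * (value - estimate)
    return estimate


reads = [1.0, 2.0, 3.0]
forward = online_estimate(reads)
backward = online_estimate(list(reversed(reads)))

# Same observations, different arrival order, different final estimate.
print("forward :", forward)
print("backward:", backward)
```

With several parser threads, the arrival order is decided by thread scheduling rather than by a seed, which is why a seed alone would not pin down the result in a setting like this.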

I suspect there is a corner case here in which the argument to the digamma function should be checked for validity before the function is called. The key, of course, is to find a small example that reproduces this behavior so that the bug can be tracked down and fixed. In the meantime, my recommendation is to drop the --useVBOpt option. Salmon will then use the standard (rather than variational Bayesian) EM algorithm, which should yield similarly accurate results.
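The kind of validity check described above can be sketched as follows. This is a Python illustration, not salmon's C++ code and not the Boost API: `digamma_positive` and `checked_digamma` are hypothetical names, and the sketch assumes the arguments are meant to be strictly positive, so it also rejects negative non-integers even though digamma is defined there. The digamma approximation uses the standard recurrence psi(x) = psi(x + 1) - 1/x followed by an asymptotic series.

```python
import math


def digamma_positive(x):
    """Approximate digamma for x > 0 (recurrence, then asymptotic series)."""
    result = 0.0
    # Shift x upward until the asymptotic series is accurate enough.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def checked_digamma(x):
    """Validate the argument first (assumption: only finite x > 0 is acceptable)."""
    if math.isnan(x) or math.isinf(x) or x <= 0.0:
        raise ValueError(f"invalid digamma argument: {x!r}")
    return digamma_positive(x)


print(checked_digamma(1.0))  # close to minus the Euler-Mascheroni constant

try:
    checked_digamma(float("nan"))
except ValueError as err:
    print("rejected:", err)
```

Raising early with the offending value makes the failure point at whatever produced the NaN, rather than surfacing as a pole error from inside the math library.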

@warrenmcg
Author

Well, I'm glad it was easy to troubleshoot. I'm up against a deadline, so I won't be able to devote much time to helping put together a toy dataset for you; in a few weeks, I'll see what I can do. Thank you very much for your help!

@rob-p
Collaborator

rob-p commented Mar 15, 2016

No problem. I'll post back here if I find anything related to this issue in the meantime. Otherwise, you should be just fine running without --useVBOpt (which is actually what I suspect most people do, since the "standard" EM is the default anyway).

@rob-p
Collaborator

rob-p commented May 24, 2017

There is now more stringent checking of the input to the digamma function, and so these issues should be resolved in the current release. Please report back (and re-open this) if you still encounter this issue.

@rob-p closed this as completed May 24, 2017