boost math error during EM iteration: Evaluation of function at pole -nan #48

warrenmcg · 2016-03-15T14:27:43Z

Hi Rob,

I've been running another group's samples (single-end, second-strand protocol), and I have a script that iterates through each sample and runs salmon. I'm running the latest version (0.6.0) with the following arguments:

salmon quant -i salmon_index --libType SF -r <(gzip -c -d $IN_FILE) -o $OUTPUT --numBootstraps 100 --useVBOpt --useFSPD --geneMap $GENES --biasCorrect -p 59

During the EM iteration step (soon after the 500th round, when salmon recalculates effective lengths), I get this error:

[jointLog] [info] iteration 500, recomputing effective lengths
[jointLog] [info] iteration = 500 | max rel diff. = 64.1299
Exception : [Error in function boost::math::digamma<long double>(long double): Evaluation of function at pole -nan]
salmon quant was invoked improperly.
For usage information, try salmon quant --help
Exiting.

I can't tell whether this is an expected occasional occurrence with the non-deterministic algorithm or something that is never supposed to happen. These particular samples are extremely high depth (about 170-190M reads per sample), so that might be the cause, but I don't understand enough of how the algorithm works to troubleshoot it or to put together a toy dataset that reproduces the error. Rerunning the sample that causes the error often works.

If I can throw in a feature request here, it would be great to be able to set the seed to make the runs deterministic. Is that possible?

rob-p · 2016-03-15T14:37:34Z

Hi @warrenmcg,

Thanks for this bug report. This error is caused by the --useVBOpt argument. The telltale sign is that the exception message names the digamma function: Exception : [Error in function boost::math::digamma<long double>(long double): Evaluation of function at pole -nan] salmon quant was invoked improperly. The digamma function is used only in the useVBOpt codepath.

I can say that this behavior is never expected (i.e., the stochastic nature of the algorithm should never lead to evaluating the digamma function on an argument of -nan). As for the seed: even if one could set a randomization seed, salmon would not be purely deterministic unless it were restricted to a single thread. This is because the order in which reads are observed during the online phase of the algorithm can change as a result of the multithreaded read parser (which is OK, since the absolute order of the reads in the file is itself random).
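
To illustrate why read order matters for an online algorithm, here is a minimal sketch. It is not salmon's actual update rule: the function online_estimate and the step size t ** -decay are assumptions chosen only to show that an order-dependent online update gives different answers when the same observations arrive in a different order.

```python
def online_estimate(values, decay=0.75):
    """Running estimate with a decaying step size (hypothetical update rule).

    Each observation pulls the estimate toward itself by a step of t ** -decay.
    For decay != 1 the final value depends on the order of the observations.
    """
    estimate = 0.0
    for t, x in enumerate(values, start=1):
        step = t ** -decay          # step is 1.0 for the first observation
        estimate += step * (x - estimate)
    return estimate


forward = online_estimate([1.0, 2.0, 3.0])
backward = online_estimate([3.0, 2.0, 1.0])
# Same observations, different arrival order, different final estimate.
print(forward, backward)
```

With several parser threads, the arrival order is not fixed from run to run, so a seed alone would not pin down the result of an update like this.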

What I suspect is that there is some corner case here where the argument to the digamma function should be checked for validity before the digamma function is called. The key, of course, is to find a small example that will reproduce this behavior so that the bug can be tracked down and fixed. In the meantime, my recommendation would be to remove the --useVBOpt option to Salmon. In this case, it will use the standard (rather than variational Bayesian) EM algorithm, which should yield similarly accurate results.
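
The kind of validity check described above could look roughly like the following sketch. It is written in Python for brevity and is not salmon's code or the Boost API: digamma here is a hand-rolled approximation (recurrence plus asymptotic series), and safe_digamma is a hypothetical wrapper name.

```python
import math


def digamma(x: float) -> float:
    """Approximate digamma for x > 0 (recurrence, then asymptotic series)."""
    result = 0.0
    # Shift x upward using psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def safe_digamma(x: float) -> float:
    """Reject NaN, infinite, and non-positive arguments before evaluating."""
    # The isfinite test is essential: `nan <= 0.0` is False, so NaN would
    # otherwise slip past the range check and propagate silently.
    if not math.isfinite(x) or x <= 0.0:
        raise ValueError(f"invalid digamma argument: {x!r}")
    return digamma(x)


print(safe_digamma(1.0))  # close to the negative Euler-Mascheroni constant
```

Checking at the call site makes it possible to report which value went bad, rather than surfacing a generic pole-evaluation error from deep inside the math library.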

warrenmcg · 2016-03-15T14:41:52Z

Well, I'm glad it was easy to troubleshoot. I'm crunched for a deadline, so I won't be able to devote much time to helping put together a toy dataset for you; in a few weeks, I'll see what I can do. Thank you very much for your help!

rob-p · 2016-03-15T14:43:38Z

No problem. I'll post back here if I find anything related to this issue in the meantime. Otherwise, you should be just fine running without --useVBOpt (which is actually what I suspect most people do, since the "standard" EM is the default anyway).

rob-p · 2017-05-24T04:56:52Z

There is now more stringent checking of the input to the digamma function, and so these issues should be resolved in the current release. Please report back (and re-open this) if you still encounter this issue.

warrenmcg mentioned this issue Mar 17, 2016

NaNs generated for up to 60% of transcripts with --useFSPD and --biasCorrect turned on #50

Closed

rob-p closed this as completed May 24, 2017
