
Skygrid gets stuck #1030

Closed
maxbiostat opened this issue Sep 26, 2018 · 20 comments

@maxbiostat
Contributor

If one runs this XML with -seed 1 (or any other seed, for that matter), the logPopSizes get stuck pretty much instantly. I have tried with more grid points and it makes no difference, so I chose to use fewer to make debugging easier. I have also recompiled BEAST with FAIL_SILENTLY = false in GMRFSkyrideBlockUpdateOperator, but the problem doesn't seem to be the well-known Newton-Raphson problem.

If anyone has any insight I'd love to hear it. Better yet, maybe you can spot something obvious and stupid I've messed up in the XML. This is simulated data on a simulated tree (constant Ne = 1) using dates from a real data set. Another data set with a different tree and the same dates gives the same "getting stuck" problem.

@carolynzy

Any luck there? I have the same problem here. It seems that the data themselves are the problem: with the same environment and parameters but a slightly different dataset, the problem disappeared. I only have the problem with this specific dataset.
I'm using BEAST 1.10.1.

@maxbiostat
Contributor Author

maxbiostat commented May 30, 2019

@carolynzy , there's a temporary workaround. Go to the XML and replace

		<gmrfGridBlockUpdateOperator scaleFactor="1.0" weight="2">
			<gmrfSkyrideLikelihood idref="skygrid"/>
		</gmrfGridBlockUpdateOperator>
		<scaleOperator scaleFactor="0.75" weight="1">
			<parameter idref="skygrid.precision"/>
		</scaleOperator>

with

		<adaptableVarianceMultivariateNormalOperator weight="3.0" scaleFactor="1.0" initial="1200" burnin="600" beta="0.05" coefficient="1.0" autoOptimize="true" formXtXInverse="false">
			<transform type="none">
				<parameter idref="skygrid.logPopSize"/>
			</transform>
			<transform type="log">
				<parameter idref="skygrid.precision"/>
			</transform>
		</adaptableVarianceMultivariateNormalOperator>

@carolynzy

@maxbiostat Thanks for the hint!
I had just tried working around it via the priors, so it seems the priors I used were not well chosen. I changed the "alpha" and "treeModel rootHeight" priors based on experience, and changed "gmrfGibbsOperator" and "skygrid.precision". After doing this, a test run of 1 million iterations went through without getting stuck. I guess the trick is to find good priors, because just changing the operators didn't work for me. But I'm not 100% sure. Have you tried this?

@maxbiostat
Contributor Author

@carolynzy,
If you just do the change I suggested above, do you still get the "getting stuck" problem?

@carolynzy

@maxbiostat the problem was solved after changing the priors.

@maxbiostat
Contributor Author

@carolynzy Changing the priors should be done only for statistical reasons, never for computational ones. I suggest you change only the operators and see if that solves the problem.

@carolynzy

@maxbiostat I have a question: is this problem a computational one, or might it be a statistical one? I think the problem is caused by constant rejection of proposals from the sampled probability distribution. Am I right? If so, could it be that the prior distribution doesn't fit the real data? Is that possible? I'm naive about this, so please correct me if I'm wrong.
I will try your suggestion and get back later.

@maxbiostat
Contributor Author

@carolynzy , it is most certainly a computational problem. The default priors, while not perfect, are designed to be reasonable for the vast majority of applications.

@GuyBaele
Contributor

I agree with @maxbiostat and do not support this kind of prior tweaking.
When using the adaptive operator, I would suggest not to include the precision parameter and keep using a scaleOperator to estimate it.
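To make this suggestion concrete, here is a minimal sketch of what such an operator block might look like, assembled purely from the two snippets earlier in this thread. The weights and tuning values shown are the ones already used above, carried over for illustration, not tuned recommendations:

```xml
<!-- Adaptive MVN operator on the log pop sizes only; the precision
     parameter is deliberately excluded from the adaptive move. -->
<adaptableVarianceMultivariateNormalOperator weight="3.0" scaleFactor="1.0" initial="1200" burnin="600" beta="0.05" coefficient="1.0" autoOptimize="true" formXtXInverse="false">
	<transform type="none">
		<parameter idref="skygrid.logPopSize"/>
	</transform>
</adaptableVarianceMultivariateNormalOperator>
<!-- Precision kept on its own scaleOperator, as in the original XML. -->
<scaleOperator scaleFactor="0.75" weight="1">
	<parameter idref="skygrid.precision"/>
</scaleOperator>
```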

@mdhall272
Contributor

Did anyone ever work out exactly why the block operator experiences this behaviour? While obviously adjusting the priors shouldn't be done to prevent it, it seems potentially instructive if the priors determine whether it happens.

@maxbiostat
Contributor Author

Hi, @mdhall272 , AFAIK @mandevgill is working out exactly what's going on. If you're interested in other priors for the precision, I had a play around with them here. The PC prior does seem to lead to better behaved chains, mainly because (I think) it demarcates implausible values more strongly.

@carolynzy

@maxbiostat Your suggestion improved the "getting stuck" problem. It can now run through millions of iterations without getting stuck, but convergence takes a really long time; it's still running at the moment.
However, I don't think it's inappropriate to change the priors, as you can see from this tutorial: https://github.com/taming-the-beast/Prior-selection
So if I'm confident in the prior distribution, I think selecting a proper prior could improve the MCMC process.

@mdhall272
Contributor

> Hi, @mdhall272 , AFAIK @mandevgill is working out exactly what's going on. If you're interested in other priors for the precision, I had a play around with them here. The PC prior does seem to lead to better behaved chains, mainly because (I think) it demarcates implausible values more strongly.

To be honest I'm too far removed from the problem these days to be of much help - I did wonder if a fix (apart from using a simpler move) was close though.

@carolynzy

Hi, @maxbiostat , while I'm still running with your suggestions, I found something strange. I uploaded the log file here:
mdr_filterinv_testMaxbiostat_skygrid_cutoff70_2.72e-5.log
When opening it in Tracer, you can see that the posterior and likelihood trace plots are split in the middle: one part has a smaller average and the other a bigger one. The ESS for these two statistics is only 4, while the others have reached dozens to hundreds. What could be the problem here? Is this normal?

@maxbiostat
Contributor Author

@carolynzy ,

First, here is a picture of the trace plot for context to others:

[Screenshot: trace plot showing the posterior and likelihood split between two distinct modes]

I don't know exactly what the problem is and it's hard to tell without the XML. What I can say is that the Markov chain has most likely jumped to a new mode. It is certainly not normal, in the sense that ideally BEAST should be able to quickly traverse tree space and reach higher modes. In real life, however, this doesn't always happen. What I can recommend is that you run multiple independent chains and observe whether the chains ultimately reach a particular mode. If they do, you can set the appropriate burn-in and use the samples only from that region.

@babarlelephant

babarlelephant commented Jun 17, 2022

Same problem here with a small alignment and dimension = 50, and the solution in #1030 (comment) above worked:
skygrid.logpopsize1 (BEAST 1.10.4 and 1.10.5) is now updated by the MCMC, whereas it stayed constant with the BEAUti-generated XML.

@GuyBaele
Contributor

What about using Hamiltonian Monte Carlo? Does the problem still persist?
https://wellcomeopenresearch.org/articles/5-53

@rambaut rambaut added the bug label Apr 27, 2023
@rambaut rambaut added this to the v.1.10.5 milestone Apr 27, 2023
@rambaut
Contributor

rambaut commented Apr 27, 2023

We need to pick a solution for the next release: either the adaptableVarianceMultivariateNormalOperator or Hamiltonian Monte Carlo.

@GuyBaele
Contributor

Most certainly Hamiltonian for the skygrid.

@rambaut
Contributor

rambaut commented Apr 27, 2024

Hamiltonian MC is now implemented in BEAUti, so closing this issue.
