Strange convergence of model m7.1 #7
Comments
Hi Max, I'll have a look in about an hour or so. I do seem to recall Turing has had issues with some of these wide priors in the past, but I will check that. I've also seen some cases where in R a specific seed is used/needed to get the book's results.
In this case, I don't think this is a "lucky seed" problem: I tried several seeds, and the resulting posterior for log_sigma is always very different. Even the shape of the posterior is different :), with the Turing posterior having a much fatter right tail. Maybe the exponentiation influences it somehow, I don't know.
You can find my three versions (Julia, R and numpyro) here: https://github.com/Shmuma/rethinking-2ed-julia/tree/main/ch7-issue |
Hi Max, can you display/check the mu values? E.g. I get with StanSample.jl:
Which corresponds to log_sigma ≈ -1.77 for this run.
Hi Max, I think we're comparing different models, and I am worried about the MvNormal in your model above. I'll compile the models I've tried and send them tomorrow.
Yes, sure, no rush.

In terms of the model, I'm checking model m7.1 from code piece 7.3 (chapter 7).

I'm using MvNormal because during the model fitting we have several values of mu to be fitted to the target values. The alternative would be to iterate over the input batch and use `y[i] ~ Normal(mu[i], exp(log_sigma))` in a loop. I checked both approaches on several models in prior chapters and they produced the same results.

But thanks for the suggestion, I'll check this on this model.
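For what it's worth, the two formulations agree mathematically: a product of independent Normals sharing a scale is exactly a multivariate Normal with a diagonal covariance. A quick numerical check in Python (the data here are made-up stand-ins, not values from the actual model):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.normal(size=5)       # hypothetical observations
mu = rng.normal(size=5)      # hypothetical per-observation means
sigma = 0.7                  # shared scale, playing the role of exp(log_sigma)

# Likelihood written observation-by-observation, as in the loop form
loop_logpdf = sum(stats.norm.logpdf(y[i], mu[i], sigma) for i in range(len(y)))

# Likelihood written once as a diagonal-covariance multivariate normal,
# as in the MvNormal form
mv_logpdf = stats.multivariate_normal.logpdf(
    y, mean=mu, cov=sigma**2 * np.eye(len(y))
)

print(np.isclose(loop_logpdf, mv_logpdf))  # the two joint log-likelihoods agree
```

So any difference between samplers should not come from this reparameterization itself.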
--
wbr, Max Lapan
From R:
A big difference between quap() and ulam() in R. Using StanSample.jl:
If I look at the quap result:
and the StanSample draws:
A similar difference between quap and proper sampling. So maybe the normality assumption behind the quadratic approximation is not appropriate?
From your log_sigma plot it is also clear that this is not normal. I confirmed your result with Turing but haven't yet looked at Turing's quap estimate. It would be interesting to show the density plots superimposed. I do believe there is at least one observation that might have a lot of influence.
Using Turing I get (for proper draws):
Which matches the above results. Trying to get MAP() to work in Turing.
Aaah, got something working:
which is close to the above stan_quap() estimates. For it to work it needs
@goedman Any ideas how to dissect this further? So, Stan and Turing converge to -1.39, but quap and numpyro give -1.70. There is one outlier point in the data; I'm going to try removing it and check the convergence. Beyond these simple experiments I'm totally stuck :(. So the next logical step will be to open an issue in Turing.jl and see what the developers think.
I don't know numpyro, but I would be surprised if the -1.70 were not numpyro's estimate of the MAP rather than the mean of the MCMC draws. The quap (MAP) estimate is often different from the MCMC mean, and in this case both the Stan (in Julia and in R) and the Turing (very differently formulated) models give exactly the same optimized MAP estimate using Optim (and R's equivalent optimizer). This is maybe a very good example of why you don't always want Normal priors (where MAP and MLE are good estimators), particularly for sigma or, later on, covariance matrices. No seed value will make the two values converge. Now, if you were to use data that is more Normally behaved (or manipulate this particular dataset by dropping observations, which is never a good idea), the two values would get closer. That's how I explain this result.
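The mode-versus-mean gap for a right-skewed density is easy to see numerically. A small Python illustration, using a lognormal density purely as a stand-in for a skewed log_sigma-style posterior (none of these numbers come from the actual m7.1 model):

```python
import numpy as np
from scipy import stats

# A right-skewed stand-in density with a fat right tail (illustrative only)
grid = np.linspace(1e-4, 10.0, 200001)
dens = stats.lognorm.pdf(grid, s=0.8)   # s is the shape (sigma of the log)

# What a MAP/quap optimizer targets: the peak of the density
mode = grid[np.argmax(dens)]

# What averaging MCMC draws targets: the mean of the density
mean = np.sum(grid * dens) / np.sum(dens)

print(f"mode={mode:.3f}  mean={mean:.3f}")  # mode < mean for a fat right tail
```

For a symmetric (Normal) posterior the two estimates coincide, which is why the discrepancy only shows up for parameters like sigma.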
Thanks a lot for the explanation!
Hi Rob!
Recently I stumbled on a strange issue related to model m7.1 in the book. The problem is log_sigma, which converges to a different distribution when I use Turing versus numpyro/quap.
The Turing model converges to this:
But the R result is different:
The numpyro version of this model (https://fehiepsi.github.io/rethinking-numpyro/07-ulysses-compass.html) produces results very similar to R's:
The difference is significant enough to produce different LPPD values (my model's is almost two times lower than reported in the book).
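Since lppd is just the per-observation log of the likelihood averaged over posterior draws, a shift in log_sigma feeds into it directly. A sketch of the computation in Python (the data and draws below are hypothetical placeholders, not output from the real model):

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=7)                # hypothetical observations
# Hypothetical posterior draws for (mu, log_sigma); stand-ins for real samples
mu_draws = rng.normal(0.0, 0.1, size=1000)
log_sigma_draws = rng.normal(0.0, 0.1, size=1000)

# log p(y_i | draw_s) for every observation/draw pair, shape (7, 1000)
ll = stats.norm.logpdf(
    y[:, None], mu_draws[None, :], np.exp(log_sigma_draws)[None, :]
)

# lppd_i = log( (1/S) * sum_s p(y_i | draw_s) ), computed stably in log space
lppd = logsumexp(ll, axis=1) - np.log(ll.shape[1])
print(lppd.sum())
```

With this in hand, one can plug in draws from each backend and see exactly how much of the lppd gap is explained by the log_sigma shift.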
If I force log_sigma close to the book's value (by changing priors, for example), everything becomes the same. But after tons of experimentation with different samplers I'm still getting this -1.39 mean :)
As both R and numpyro agree, I have a feeling that this is some subtle Turing problem, but I've run out of ideas as to how that could be :).
As I can see in the comments, you're also having "fun" with chapter 7, so if you have any ideas/suggestions about this issue, they will be very much appreciated.