
Strange convergence of model m7.1 #7

Closed · Shmuma opened this issue Sep 7, 2021 · 15 comments

Shmuma commented Sep 7, 2021

Hi Rob!

Recently I stumbled on a strange issue related to model m7.1 in the book. The problem is log_sigma, which converges to a different distribution when I use Turing versus numpyro/quap.
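For reference, here is a minimal Turing.jl sketch of m7.1 as specified in the book (same likelihood and priors as the quap/Stan versions further down in this thread; d.mass_std and d.brain_std are the standardized columns, and this is a simplified sketch rather than the exact code from my repo):

using Turing

# m7.1: brain_std ~ Normal(mu, exp(log_sigma)), mu = a + b * mass_std
@model function model_m7_1(mass_std, brain_std)
    a ~ Normal(0.5, 1)
    b ~ Normal(0, 10)
    log_sigma ~ Normal(0, 1)
    mu = a .+ b .* mass_std
    for i in eachindex(brain_std)
        brain_std[i] ~ Normal(mu[i], exp(log_sigma))
    end
end

chn = sample(model_m7_1(d.mass_std, d.brain_std), NUTS(), 2000)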

The Turing model converges to this:

[plot: Turing posterior of log_sigma]

But the R result is different:

[plot: R quap posterior of log_sigma]

Very similar results to the R ones are produced by the numpyro version of this model (https://fehiepsi.github.io/rethinking-numpyro/07-ulysses-compass.html):

[plot: numpyro posterior of log_sigma]

The difference is significant enough to produce different LPPD values (mine is almost 2 times lower than reported in the book).

If I force log_sigma to stay close to the book's value (by changing the priors, for example), everything becomes the same. But after tons of experimentation with different samplers I'm still getting this -1.39 mean :)

As R and numpyro agree with each other, I have a feeling that this is some subtle Turing problem, but I've run out of ideas about what it could be :).

As I can see in the comments, you also had "fun" with chapter 7, so if you have any ideas/suggestions about this issue, they would be very much appreciated.

goedman commented Sep 7, 2021

Hi Max, I'll have a look in about an hour or so. I seem to recall Turing has had issues with some of these wide priors in the past, but I will check that. I've also seen cases where in R a specific seed is used/needed to get the book's results.

Shmuma commented Sep 7, 2021

In this case, I don't think this is a "lucky seed" problem; I tried several seeds and the resulting posterior for log_sigma is always very different. Even the shape of the posterior is different :). The Turing posterior has a much fatter right tail. Maybe the exponentiation influences it somehow, I don't know.

Shmuma commented Sep 7, 2021

You can find my three versions (Julia, R and numpyro) here: https://github.com/Shmuma/rethinking-2ed-julia/tree/main/ch7-issue

goedman commented Sep 7, 2021

Hi Max, can you display/check the mu values? E.g., with StanSample.jl I get:

a      0.531499 ± 0.118
bA     0.169704 ± 0.123
sigma  0.169111 ± 0.169

mu.1   0.39922  ± 0.16
mu.2   0.375877 ± 0.172
mu.3   0.360315 ± 0.18
mu.4   0.46925  ± 0.131
mu.5   0.687122 ± 0.154
mu.6   0.772715 ± 0.2
mu.7   0.655998 ± 0.141

This corresponds to log(sigma) ≈ -1.77 for this run.
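(The conversion is just the natural log of the sigma mean in the summary above:)

log(0.169111) ≈ -1.777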

Shmuma commented Sep 7, 2021

For mu I have this distribution:

[plots: posterior distributions of mu]

But mu is a function of a and b, and their distributions match R and numpyro. Only log_sigma differs :)

Could you please share the code? If Stan produces -1.7 for log_sigma, that reinforces my suspicion of some subtle Turing bug.

goedman commented Sep 8, 2021

Hi Max, I think we're comparing different models, and I am worried about the MvNormal in your model above.

I'll compile the models I've tried and send them tomorrow.

Shmuma commented Sep 8, 2021 via email

Shmuma commented Sep 8, 2021

Results with MvNormal replaced by Normal:

[plots: posterior summaries with the Normal likelihood]

So the result with Turing is still the same weird -1.39 value.
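For clarity, this is the kind of swap I mean: the vectorized MvNormal likelihood versus the element-wise Normal one from the sketch above (a sketch only; note MvNormal takes a covariance matrix, hence exp(log_sigma)^2 * I, which needs LinearAlgebra for I):

using Turing, LinearAlgebra

# Same priors and mean function as model_m7_1 above, but with a vectorized
# MvNormal likelihood (isotropic covariance exp(log_sigma)^2 * I) instead of
# the element-wise Normal.
@model function model_m7_1_mv(mass_std, brain_std)
    a ~ Normal(0.5, 1)
    b ~ Normal(0, 10)
    log_sigma ~ Normal(0, 1)
    mu = a .+ b .* mass_std
    brain_std ~ MvNormal(mu, exp(log_sigma)^2 * I)
end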

goedman commented Sep 8, 2021

From R:

m7.1 <- quap(
    alist(
        brain_std ~ dnorm( mu , exp(log_sigma) ),
        mu <- a + b*mass_std,
        a ~ dnorm( 0.5 , 1 ),
        b ~ dnorm( 0 , 10 ),
        log_sigma ~ dnorm( 0 , 1 )
    ), data=d )

> precis(m7.1)
           mean   sd  5.5% 94.5%
a          0.53 0.07  0.42  0.64
b          0.17 0.07  0.05  0.29
log_sigma -1.71 0.29 -2.18 -1.24

> stancode(m7.1u)
data{
    int N;
    vector[N] mass;
    int brain[N];
    vector[N] brain_std;
    vector[N] mass_std;
}
parameters{
    real a;
    real b;
    real log_sigma;
}
model{
    vector[N] mu;
    log_sigma ~ normal( 0 , 1 );
    b ~ normal( 0 , 10 );
    a ~ normal( 0.5 , 1 );
    for ( i in 1:N ) {
        mu[i] = a + b * mass_std[i];
    }
    brain_std ~ normal( mu , exp(log_sigma) );
}

> precis(m7.1u, depth=2)
           mean   sd  5.5% 94.5% n_eff Rhat4
a          0.53 0.12  0.36  0.70   792     1
b          0.17 0.12 -0.02  0.35   912     1
log_sigma -1.37 0.40 -1.94 -0.68   531     1

A big difference between quap() and ulam() in R.

Using StanSample.jl:

stan7_1a = "
data {
 int < lower = 1 > N; 			// Sample size
 vector[N] brain; 				// Outcome
 vector[N] mass; 				// Predictor
}

parameters {
 real a; // Intercept
 real bA; // Slope (regression coefficients)
 real log_sigma; 
}

model {
  vector[N] mu;               	// mu is a vector
  a ~ normal(0.5, 1);         	//Priors
  bA ~ normal(0, 10);
  log_sigma ~ normal(0, 1);
  mu = a + bA * mass;
  brain ~ normal(mu , exp(log_sigma));   // Likelihood
}
";
q7_1as, m7_1as, o7_1as = stan_quap("m7.1as", stan7_1a; data, init);
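(The data and init objects are roughly of this form; this is an illustrative sketch rather than the exact values from my script, with the standardized columns from the data frame d passed in as brain and mass:)

# Illustrative only: shape of the data/init passed to stan_quap above
data = Dict("N" => length(d.brain_std), "brain" => d.brain_std, "mass" => d.mass_std)
init = Dict("a" => 0.5, "bA" => 0.0, "log_sigma" => -1.0)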

If I look at the quap result:

┌───────────┬─────────────────────────────────────────────────────────┐
│     param │    mean     std     5.5%      50%    94.5%    histogram │
├───────────┼─────────────────────────────────────────────────────────┤
│         a │  0.5274  0.1119   0.3496   0.5289   0.7042   ▁▁▁▃▇█▅▂▁▁ │
│         b │  0.1674  0.1236  -0.0292   0.1677   0.3652  ▁▁▁▂▅█▇▃▁▁▁ │
│ log_sigma │ -1.6947  0.4783  -2.4648  -1.6915  -0.9295     ▁▁▅█▆▂▁▁ │
└───────────┴─────────────────────────────────────────────────────────┘

and the StanSample draws:

┌───────────┬────────────────────────────────────────────────────────────┐
│     param │    mean     std     5.5%      50%    94.5%       histogram │
├───────────┼────────────────────────────────────────────────────────────┤
│         a │  0.5293  0.1117   0.3636   0.5304   0.6989  ▁▁▁▁▁▁▂▆█▄▁▁▁▁ │
│         b │  0.1651  0.1222  -0.0238   0.1666   0.3472        ▁▁▁▁█▅▁▁ │
│ log_sigma │ -1.3871  0.3705  -1.9238  -1.4192  -0.7578          ▁██▃▁▁ │
└───────────┴────────────────────────────────────────────────────────────┘
 

A similar difference between quap and proper sampling. So maybe the normality assumption behind the quadratic approximation is not appropriate here?

goedman commented Sep 8, 2021

From your log_sigma plot it is also clear that this is not normal. I confirmed your result with Turing but haven't yet looked at Turing's quap estimate. It would be interesting to show the density plots superimposed. I do believe there is at least one observation that might have a lot of influence.
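Something like this would show it (a sketch; chn is the Turing chain and quap_draws is an illustrative name for a vector of draws from the quadratic approximation):

using StatsPlots

# Superimpose the two log_sigma densities
density(vec(chn[:log_sigma]); label = "Turing NUTS draws")
density!(quap_draws; label = "quap draws")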

goedman commented Sep 8, 2021

Using Turing I get (for proper draws):

julia> include("/Users/rob/Projects/Julia/rethinking-2ed-julia/ch7-issue/ch7.jl");
┌───────┬───────────────────────────────────────────────────────────┐
│ param │    mean     std     5.5%      50%   94.5%       histogram │
├───────┼───────────────────────────────────────────────────────────┤
│     a │   0.528    0.11   0.3646   0.5263   0.696  ▁▁▁▁▁▂▆█▄▁▁▁▁▁ │
│     b │  0.1652  0.1133  -0.0118    0.165  0.3355    ▁▁▁▁▄█▆▂▁▁▁▁ │
│ log_σ │ -1.3932  0.3669  -1.9109  -1.4263  -0.743          ▁██▃▁▁ │
└───────┴───────────────────────────────────────────────────────────┘

Which matches the results above. Now trying to get MAP() to work in Turing.

goedman commented Sep 8, 2021

Aaah, got something working:

julia> map7_1_estimate = optimize(model_m7_1(d.mass_std, d.brain_std), MAP())
ModeResult with maximized lp of -3.65
3-element Named Vector{Float64}
A      │ 
───────┼─────────
:a     │ 0.528543
:b     │ 0.167109
:log_σ │ -1.70671

which is close to the stan_quap() estimates above. For it to work it needs using Optim.

Shmuma commented Sep 10, 2021

@goedman Any ideas on how to dissect this further?

So, Stan and Turing converge to -1.39, but quap and numpyro give -1.70.

In the data there is one outlier point; I'm going to try removing it and check the convergence.
In addition, I'm going to brute-force the seed value in Turing; maybe some seed values will converge to -1.70.

Besides these simple experiments, I'm totally stuck :(. So the next logical step will be to open an issue in Turing.jl and see what the developers think about this.

goedman commented Sep 10, 2021

I don't know numpyro, but I suspect the -1.70 is numpyro's MAP estimate, not the mean of the MCMC draws.

The quap (MAP) estimate is often different from the MCMC mean, and in this case both the Stan models (in Julia and in R) and the Turing model (formulated very differently) give the exact same optimized MAP estimate using Optim (and R's equivalent optimizer).

This is maybe a very good example of why you don't always want Normal priors (where MAP and MLE are good estimators), particularly for sigma or, later on, covariance matrices.
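In code, both summaries come from the same Turing model (a sketch, reusing model_m7_1 and the data frame d from earlier in this thread):

using Turing, Optim

chn = sample(model_m7_1(d.mass_std, d.brain_std), NUTS(), 4000)
mean(chn[:log_sigma])                                 # posterior mean, about -1.39
optimize(model_m7_1(d.mass_std, d.brain_std), MAP())  # mode (quap-like), log_sigma about -1.71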

No seed value will make the two values converge. Now, if you were to use a dataset that is more Normally behaved (or manipulate this particular dataset by dropping observations, never a good idea), the two values would get closer.

That's how I explain this result.

Shmuma commented Sep 11, 2021

Thanks a lot for the explanation!

Shmuma closed this as completed Sep 11, 2021