
Strange convergence of model m7.1 #7

Closed · Shmuma opened this issue Sep 7, 2021 · 15 comments

Shmuma commented Sep 7, 2021

Hi Rob!

Recently I stumbled on a strange issue related to model m7.1 in the book. The problem is log_sigma, which converges to a different distribution when I use Turing versus numpyro/quap.
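For reference, here is a minimal Turing.jl sketch of m7.1 as specified in the book (same likelihood and priors as the quap/Stan versions further down in this thread; d.mass_std and d.brain_std are the standardized columns, and this is a simplified sketch rather than the exact code from my repo):

using Turing

# m7.1: brain_std ~ Normal(mu, exp(log_sigma)), mu = a + b * mass_std
@model function model_m7_1(mass_std, brain_std)
    a ~ Normal(0.5, 1)
    b ~ Normal(0, 10)
    log_sigma ~ Normal(0, 1)
    mu = a .+ b .* mass_std
    for i in eachindex(brain_std)
        brain_std[i] ~ Normal(mu[i], exp(log_sigma))
    end
end

chn = sample(model_m7_1(d.mass_std, d.brain_std), NUTS(), 2000)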

The Turing model converges to this:

[plot: Turing posterior of log_sigma]

But the R result is different:

[plot: R quap posterior of log_sigma]

Very similar results to the R ones are produced by the numpyro version of this model (https://fehiepsi.github.io/rethinking-numpyro/07-ulysses-compass.html):

[plot: numpyro posterior of log_sigma]

The difference is significant enough to produce different LPPD values (mine is almost 2 times lower than reported in the book).

If I force log_sigma to stay close to the book's value (by changing the priors, for example), everything becomes the same. But after tons of experimentation with different samplers I'm still getting this -1.39 mean :)

As R and numpyro agree with each other, I have a feeling that this is some subtle Turing problem, but I've run out of ideas about what it could be :).

As I can see in the comments, you also had "fun" with chapter 7, so if you have any ideas/suggestions about this issue, they would be very much appreciated.

goedman commented Sep 7, 2021

Hi Max, I'll have a look in about an hour or so. I seem to recall Turing has had issues with some of these wide priors in the past, but I will check that. I've also seen cases where in R a specific seed is used/needed to get the book's results.

Shmuma commented Sep 7, 2021

In this case, I don't think this is a "lucky seed" problem; I tried several seeds and the resulting posterior for log_sigma is always very different. Even the shape of the posterior is different :). The Turing posterior has a much fatter right tail. Maybe the exponentiation influences it somehow, I don't know.

Shmuma commented Sep 7, 2021

You can find my three versions (Julia, R and numpyro) here: https://github.com/Shmuma/rethinking-2ed-julia/tree/main/ch7-issue

goedman commented Sep 7, 2021

Hi Max, can you display/check the mu values? E.g., with StanSample.jl I get:

a      0.531499 ± 0.118
bA     0.169704 ± 0.123
sigma  0.169111 ± 0.169

mu.1   0.39922  ± 0.16
mu.2   0.375877 ± 0.172
mu.3   0.360315 ± 0.18
mu.4   0.46925  ± 0.131
mu.5   0.687122 ± 0.154
mu.6   0.772715 ± 0.2
mu.7   0.655998 ± 0.141

This corresponds to log(sigma) ≈ -1.77 for this run.
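(The conversion is just the natural log of the sigma mean in the summary above:)

log(0.169111) ≈ -1.777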

Shmuma commented Sep 7, 2021

For mu I have this distribution:

[plots: posterior distributions of mu]

But mu is a function of a and b, and their distributions match R and numpyro. Only log_sigma differs :)

Could you please share the code? If Stan produces -1.7 for log_sigma, that reinforces my suspicion of some subtle Turing bug.

goedman commented Sep 8, 2021

Hi Max, I think we're comparing different models, and I am worried about the MvNormal in your model above.

I'll compile the models I've tried and send them tomorrow.

Shmuma commented Sep 8, 2021 via email

Shmuma commented Sep 8, 2021

Results with MvNormal replaced by Normal:

[plots: posterior summaries with the Normal likelihood]

So the result with Turing is still the same weird -1.39 value.
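For clarity, this is the kind of swap I mean: the vectorized MvNormal likelihood versus the element-wise Normal one from the sketch above (a sketch only; note MvNormal takes a covariance matrix, hence exp(log_sigma)^2 * I, which needs LinearAlgebra for I):

using Turing, LinearAlgebra

# Same priors and mean function as model_m7_1 above, but with a vectorized
# MvNormal likelihood (isotropic covariance exp(log_sigma)^2 * I) instead of
# the element-wise Normal.
@model function model_m7_1_mv(mass_std, brain_std)
    a ~ Normal(0.5, 1)
    b ~ Normal(0, 10)
    log_sigma ~ Normal(0, 1)
    mu = a .+ b .* mass_std
    brain_std ~ MvNormal(mu, exp(log_sigma)^2 * I)
end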

goedman commented Sep 8, 2021

From R:

m7.1 <- quap(
    alist(
        brain_std ~ dnorm( mu , exp(log_sigma) ),
        mu <- a + b*mass_std,
        a ~ dnorm( 0.5 , 1 ),
        b ~ dnorm( 0 , 10 ),
        log_sigma ~ dnorm( 0 , 1 )
    ), data=d )

> precis(m7.1)
           mean   sd  5.5% 94.5%
a          0.53 0.07  0.42  0.64
b          0.17 0.07  0.05  0.29
log_sigma -1.71 0.29 -2.18 -1.24

> stancode(m7.1u)
data{
    int N;
    vector[N] mass;
    int brain[N];
    vector[N] brain_std;
    vector[N] mass_std;
}
parameters{
    real a;
    real b;
    real log_sigma;
}
model{
    vector[N] mu;
    log_sigma ~ normal( 0 , 1 );
    b ~ normal( 0 , 10 );
    a ~ normal( 0.5 , 1 );
    for ( i in 1:N ) {
        mu[i] = a + b * mass_std[i];
    }
    brain_std ~ normal( mu , exp(log_sigma) );
}

> precis(m7.1u, depth=2)
           mean   sd  5.5% 94.5% n_eff Rhat4
a          0.53 0.12  0.36  0.70   792     1
b          0.17 0.12 -0.02  0.35   912     1
log_sigma -1.37 0.40 -1.94 -0.68   531     1

A big difference between quap() and ulam() in R.

Using StanSample.jl:

stan7_1a = "
data {
 int < lower = 1 > N; 			// Sample size
 vector[N] brain; 				// Outcome
 vector[N] mass; 				// Predictor
}

parameters {
 real a; // Intercept
 real bA; // Slope (regression coefficients)
 real log_sigma; 
}

model {
  vector[N] mu;               	// mu is a vector
  a ~ normal(0.5, 1);         	//Priors
  bA ~ normal(0, 10);
  log_sigma ~ normal(0, 1);
  mu = a + bA * mass;
  brain ~ normal(mu , exp(log_sigma));   // Likelihood
}
";
q7_1as, m7_1as, o7_1as = stan_quap("m7.1as", stan7_1a; data, init);
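(The data and init objects are roughly of this form; this is an illustrative sketch rather than the exact values from my script, with the standardized columns from the data frame d passed in as brain and mass:)

# Illustrative only: shape of the data/init passed to stan_quap above
data = Dict("N" => length(d.brain_std), "brain" => d.brain_std, "mass" => d.mass_std)
init = Dict("a" => 0.5, "bA" => 0.0, "log_sigma" => -1.0)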

If I look at the quap result:

┌───────────┬─────────────────────────────────────────────────────────┐
│     param │    mean     std     5.5%      50%    94.5%    histogram │
├───────────┼─────────────────────────────────────────────────────────┤
│         a │  0.5274  0.1119   0.3496   0.5289   0.7042   ▁▁▁▃▇█▅▂▁▁ │
│         b │  0.1674  0.1236  -0.0292   0.1677   0.3652  ▁▁▁▂▅█▇▃▁▁▁ │
│ log_sigma │ -1.6947  0.4783  -2.4648  -1.6915  -0.9295     ▁▁▅█▆▂▁▁ │
└───────────┴─────────────────────────────────────────────────────────┘

and the StanSample draws:

┌───────────┬────────────────────────────────────────────────────────────┐
│     param │    mean     std     5.5%      50%    94.5%       histogram │
├───────────┼────────────────────────────────────────────────────────────┤
│         a │  0.5293  0.1117   0.3636   0.5304   0.6989  ▁▁▁▁▁▁▂▆█▄▁▁▁▁ │
│         b │  0.1651  0.1222  -0.0238   0.1666   0.3472        ▁▁▁▁█▅▁▁ │
│ log_sigma │ -1.3871  0.3705  -1.9238  -1.4192  -0.7578          ▁██▃▁▁ │
└───────────┴────────────────────────────────────────────────────────────┘
 

A similar difference between quap and proper sampling. So maybe the normality assumption behind the quadratic approximation is not appropriate here?

goedman commented Sep 8, 2021

From your log_sigma plot it is also clear that this is not normal. I confirmed your result with Turing but haven't yet looked at Turing's quap estimate. It would be interesting to show the density plots superimposed. I do believe there is at least one observation that might have a lot of influence.
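Something like this would show it (a sketch; chn is the Turing chain and quap_draws is an illustrative name for a vector of draws from the quadratic approximation):

using StatsPlots

# Superimpose the two log_sigma densities
density(vec(chn[:log_sigma]); label = "Turing NUTS draws")
density!(quap_draws; label = "quap draws")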

goedman commented Sep 8, 2021

Using Turing I get (for proper draws):

julia> include("/Users/rob/Projects/Julia/rethinking-2ed-julia/ch7-issue/ch7.jl");
┌───────┬───────────────────────────────────────────────────────────┐
│ param │    mean     std     5.5%      50%   94.5%       histogram │
├───────┼───────────────────────────────────────────────────────────┤
│     a │   0.528    0.11   0.3646   0.5263   0.696  ▁▁▁▁▁▂▆█▄▁▁▁▁▁ │
│     b │  0.1652  0.1133  -0.0118    0.165  0.3355    ▁▁▁▁▄█▆▂▁▁▁▁ │
│ log_σ │ -1.3932  0.3669  -1.9109  -1.4263  -0.743          ▁██▃▁▁ │
└───────┴───────────────────────────────────────────────────────────┘

Which matches the results above. Now trying to get MAP() to work in Turing.

goedman commented Sep 8, 2021

Aaah, got something working:

julia> map7_1_estimate = optimize(model_m7_1(d.mass_std, d.brain_std), MAP())
ModeResult with maximized lp of -3.65
3-element Named Vector{Float64}
A      │ 
───────┼─────────
:a     │ 0.528543
:b     │ 0.167109
:log_σ │ -1.70671

which is close to the stan_quap() estimates above. For it to work it needs using Optim.

Shmuma commented Sep 10, 2021

@goedman Any ideas on how to dissect this further?

So, Stan and Turing converge to -1.39, but quap and numpyro give -1.70.

In the data there is one outlier point; I'm going to try removing it and check the convergence.
In addition, I'm going to brute-force the seed value in Turing; maybe some seed values will converge to -1.70.

Besides these simple experiments, I'm totally stuck :(. So the next logical step will be to open an issue in Turing.jl and see what the developers think about this.

goedman commented Sep 10, 2021

I don't know numpyro, but I suspect the -1.70 is numpyro's MAP estimate, not the mean of the MCMC draws.

The quap (MAP) estimate is often different from the MCMC mean, and in this case both the Stan models (in Julia and in R) and the Turing model (formulated very differently) give the exact same optimized MAP estimate using Optim (and R's equivalent optimizer).

This is maybe a very good example of why you don't always want Normal priors (where MAP and MLE are good estimators), particularly for sigma or, later on, covariance matrices.
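In code, both summaries come from the same Turing model (a sketch, reusing model_m7_1 and the data frame d from earlier in this thread):

using Turing, Optim

chn = sample(model_m7_1(d.mass_std, d.brain_std), NUTS(), 4000)
mean(chn[:log_sigma])                                 # posterior mean, about -1.39
optimize(model_m7_1(d.mass_std, d.brain_std), MAP())  # mode (quap-like), log_sigma about -1.71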

No seed value will make the two values converge. Now, if you were to use a dataset that is more Normally behaved (or manipulate this particular dataset by dropping observations, never a good idea), the two values would get closer.

That's how I explain this result.

Shmuma commented Sep 11, 2021

Thanks a lot for the explanation!

Shmuma closed this as completed Sep 11, 2021