
Added support for emcee in arviz #279

Closed · ColCarroll opened this issue Sep 29, 2018 · 13 comments

Comments

@ColCarroll (Contributor)

Wanted to let you know we have added support for emcee to the arviz library - see arviz-devs/arviz#302 for a description of the change.

Roughly, the plan is for arviz to replace the plotting and diagnostics in pymc3 (so we do not have to rewrite them for pymc4) by converting data to a common format (netCDF/xarray datasets) and then implementing all the plots and diagnostics on that object. emcee is the third library to gain support (after pymc3 and pystan); we added it after hearing so many astronomers say nice things about it at a conference.

In addition to the plots you can see in the pull request above, you can also calculate things like the effective sample size and the Gelman-Rubin statistic. The summary call below shows a number of these.

Some questions I had -

  • We store sample_stats from pystan and pymc3. Is there a way to access whether a sample was accepted on a per-draw basis in emcee?
  • Are there particular diagnostics or plots you would like to see?
import arviz as az
from arviz.tests import helpers

sampler = helpers.emcee_linear_model(None, 500, 10)  # this is the sampler from http://dfm.io/emcee/current/user/line/

# This converter turns the data into a netCDF object that can be serialized and shared
data = az.from_emcee(sampler, var_names=['ln(f)', 'b', 'm'])

# all arviz functions should work with this object
az.summary(data).to_dataframe()
metric                       ln(f)            b            m
mean                     -0.901523     4.186736    -0.493168
standard deviation        0.083640     0.387793     0.146651
mc error                  0.002052     0.008238     0.003727
hpd 3.00%                -1.061138     3.448212    -0.759357
hpd 97.00%               -0.742526     4.923531    -0.201992
effective samples       1429.248983  1408.863122  1287.373414
gelman-rubin statistic    1.030000     1.030000     1.040000

@dfm (Owner) commented Oct 1, 2018

Hey Colin, thanks for this!

We don't currently store per-draw stats for emcee, but I would like to include it at some point. It probably won't make it into v3.0, but hopefully soon! I'll keep you posted.

The plot that I always struggle to make with pymc3 in a consistent way is a nice corner plot (like your pair_plot, but using corner.py). I haven't looked at arviz in detail, but an interface that could work easily with that would be awesome. (The problem occurs when some of the variables have non-scalar shapes - something you must already have an interface to deal with!)

Finally - it's worth noting that the chains produced by emcee are not independent, so some care must be taken when doing convergence analysis. I don't think that this is necessarily a big problem, but we might want to include some sort of disclaimer so that people don't use the G-R statistic/n_eff without proper caution.
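
For illustration, a minimal sketch of how a corner plot could be made from the thread's emcee example, assuming the sampler object from the snippet above and emcee 3's get_chain accessor; this is not the actual arviz interface, and the label ordering is an assumption:

import corner

# Flatten the ensemble into a single (n_steps * n_walkers, n_dim) array and
# hand it straight to corner.py; `sampler` is assumed to be the
# EnsembleSampler from the snippet above, and the label order is assumed to
# match the sampler's parameter ordering.
flat_samples = sampler.get_chain(flat=True)
fig = corner.corner(flat_samples, labels=['ln(f)', 'b', 'm'])
fig.savefig('corner.png')

Recent versions of corner.py can reportedly also consume an arviz InferenceData object directly, which would sidestep the manual flattening.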

@Gabriel-p

Quick question @dfm : I'm currently using PyMC3's effective_n function to estimate the effective sample size for samplers of both emcee and ptemcee. Is this not allowed? Or is it a reasonable estimate with some caveats?

@dfm (Owner) commented Oct 1, 2018

I haven't tested it thoroughly, but the assumptions are not satisfied. I'd recommend checking out this blog post for more discussion and recommended methods for ensemble methods: https://dfm.io/posts/autocorr/

@Gabriel-p
Copy link

Thank you, Dan. If I understand what is stated in that post, then I should simply estimate the effective sample size as N/tau, where tau is obtained with emcee's (or ptemcee's) own autocorr functions. Is this correct?
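
For concreteness, a minimal sketch of that N/tau estimate using emcee 3's API, assuming an already-run EnsembleSampler (ptemcee exposes a similar autocorrelation utility):

import emcee

# `sampler` is assumed to be an already-run emcee.EnsembleSampler.
chain = sampler.get_chain()                  # shape (n_steps, n_walkers, n_dim)
tau = emcee.autocorr.integrated_time(chain)  # per-parameter autocorrelation time
n_steps, n_walkers, _ = chain.shape
n_eff = n_steps * n_walkers / tau            # N / tau, per parameter

Note that integrated_time raises an error when the chain is short relative to tau (roughly 50 tau by default); passing tol=0 skips that check at the cost of a noisier estimate.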

@ColCarroll (Contributor, Author)

Closing, since this was just informative (though thanks for pointing out that the effective_n calculation is wrong for this sampler!).

@OriolAbril (Contributor)

Hey @dfm, I have been doing some experiments (posted on my blog) comparing autocorrelation/ess values from the algorithm in ArviZ and the one in emcee, and it looks like they both converge to the same value (and the condition on N/tau seems to hold for both).

Could you indicate which assumptions are not satisfied by the Gelman-Rubin algorithm? Maybe it only affects R_hat values? I tried to read the references, but I could not figure out which assumptions the MCMC samples must satisfy for the algorithms to apply; in fact, both of them seem to incorporate empirical/practical cutoffs or approximations without any proof.

Any help or hints would be greatly appreciated. Thanks!

@dfm (Owner) commented Aug 2, 2019

@OriolAbril: That post looks great! All I meant by that is that the walkers in the stretch move are not formally independent so you'll be overly confident about your convergence if you treat them as independent. I'm sure it can still be a useful diagnostic!

@OriolAbril (Contributor)

Yes! I understand this point, which is why I expected the ArviZ version to underestimate the autocorrelation time. The effective sample size would then be overestimated, and we would end up being overconfident about convergence. But in all the examples I have looked into, the opposite happens. That is why I am having trouble understanding what exactly is going on.

@OriolAbril (Contributor)

This may also be relevant to my previous question: https://discourse.mc-stan.org/t/does-effective-sample-size-estimation-require-independent-chains/

@dfm (Owner) commented Aug 6, 2019

Yes - that response is exactly what I was saying! The chains communicate at every step, so they don't satisfy the assumptions in that work. It is interesting that the method works well here, but it's not obvious that it will work well in less trivial experiments...

@OriolAbril (Contributor)

I know; what still confuses me is why both methods converge to the same value and, even worse, why the "Gelman" version overestimates the autocorrelation time instead of underestimating it.

I will try to work out the maths in both approaches to see exactly where they differ, and to understand why my intuition is failing.

@OriolAbril (Contributor)

Both algorithms do the same thing, with only two differences, one of which I think is not relevant. My notation: N indicates draws, M indicates chains/walkers, their lowercase versions are the corresponding loop variables, and c_f^m(tau) indicates the autocovariance of chain/walker m (I can write a proper blog post or LaTeX document and send it to you if you prefer).

The differences are the following:

  1. The normalization of the autocovariance/autocorrelation is different: in emcee, each chain is normalized with c_f^m(tau) / c_f^m(0) and the result is then averaged over the chains m (i.e. autocorrelations are averaged), whereas the other algorithm normalizes with 1 - (W - C_f(tau)) / var_plus (an overconservative version of the autocorrelation); see the sketch after this list.
    • W = N/(N-1) * 1/M * sum^M c_f^m(0) (the average of the within-chain variances)
    • C_f(tau) = 1/M * sum^M c_f^m(tau) (the average of the per-chain autocovariances)
    • var_plus = C_f(0) + B/N, where B is the between-chain variance; thus var_plus >= C_f(0)
  2. As expected, both algorithms truncate the autocorrelation sum, but they choose the truncation threshold differently. I think this difference only has numerical implications, since both try to estimate the same quantity with minimal noise. If you think this is relevant, I can comment on it in more detail.
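
To make the comparison concrete, here is a minimal sketch of the two normalizations following the formulas above; this is my own illustrative numpy code, not the actual emcee or ArviZ implementations, and the truncation from point 2 is omitted:

import numpy as np

def autocov(x):
    # Autocovariance c_f(tau) of a 1-D chain via FFT, with 1/N normalization.
    n = len(x)
    x = x - x.mean()
    f = np.fft.rfft(x, n=2 * n)
    return np.fft.irfft(f * np.conj(f))[:n] / n

def rho_emcee(chains):
    # emcee-style: normalize each walker by its own c_f^m(0), then average
    # the resulting autocorrelations over the M walkers.
    return np.mean([autocov(c) / autocov(c)[0] for c in chains], axis=0)

def rho_gelman(chains):
    # ArviZ/"Gelman"-style: rho(tau) = 1 - (W - C_f(tau)) / var_plus.
    M, N = chains.shape
    C = np.mean([autocov(c) for c in chains], axis=0)  # C_f(tau)
    W = N / (N - 1) * C[0]                             # average within-chain variance
    B = N * chains.mean(axis=1).var(ddof=1)            # between-chain variance
    var_plus = C[0] + B / N
    return 1 - (W - C) / var_plus

# chains: (M, N) array of M walkers with N draws each. Since
# var_plus >= C_f(0), rho_gelman is (approximately) >= rho_emcee pointwise,
# which is why the "Gelman" version yields larger integrated times.
chains = np.random.randn(8, 2000)
print(rho_emcee(chains)[:3])
print(rho_gelman(chains)[:3])

Integrating either curve as tau = 1 + 2 * sum_{t>0} rho(t) (with a suitable truncation) gives the respective autocorrelation time estimates.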

I think the normalization difference has no bearing on whether the chains used for the autocovariance/autocorrelation computation must be independent, so I am more confused than ever. I mean, I have understood why the "Gelman" algorithm gives autocorrelation times larger than or equal to emcee's, but how is the dependence taken into account (if it is taken into account)? Or, equivalently, where is independence required (if it is required)?

Thanks for all your time! I hope I am not too annoying.

@nedahejazi

Hello,
I seriously need help with emcee. I am trying to fit model spectra to observed ones. After working hard on the code, the results are very strange: the chain values of the parameters are the same for each walker, i.e. the initial values do not change over the chain. Do you have any clue? If needed, I can send the code.
