I've been running some tests of some MCMC variations with emcee, and have been using the autocorrelation time to estimate the speedup. I was surprised how many steps it took to get a consistent autocorrelation time from EnsembleSampler.acor. So I tried a different script I had for the autocorrelation time, and got much more stable results. I am pretty sure the chief reason for the difference is that my routine calculates the autocorrelation function for each walker, then takes the mean. In contrast emcee first takes the mean of the walkers, and then computes the autocorrelation of this mean chain. It seems like the latter method is washing out the very fluctuations that the autocorrelation is supposed to measure.
Here is an example that compares the two approaches, using an autoregressive model where the exact autocorrelation time is known. Typical results look like this:
True autocorr length: 20.0084033613
Mean, std, min, max of two methods:
Corr-first approach: 19.7929288976 0.358159093104 18.9992952537 20.8373795818
Mean-first approach: 19.685956288 5.69231566791 11.0983690515 48.3939875026
My own routine for the autocorrelation time has its own drawbacks (not fast, questionable window choice) and I'm not suggesting it as a direct replacement, but it does seem to win easily in terms of precision. I think theoretically the advantage should be a factor sqrt(Nwalkers), though in practice it seems slightly larger.
I don't think the autocorrelation or the number of walkers or steps in this example is too unreasonable, compared to what people often run. The behavior is similar with chains actually generated from EnsembleSampler from the Gaussian posterior function in the intro docs, though in that case we don't know the true value for comparison.
I saw a note somewhere in the emcee repository that the reason for taking the mean is explained in Goodman and Weare, but I didn't follow the reasoning. Whatever the motivation is, I wonder if it justifies the much greater variance in the autocorrelation time. While acor is not the primary thing anyone's interested in, estimating it badly can lead to running the simulation much longer than necessary, or stopping it too soon and thus undersampling the posterior.

"""Tests of autocorrelation length.
"""
import numpy as np
import matplotlib.pyplot as plt
from emcee import ensemble
def autoregressive():
m_walkers = 64
n_steps = 10000
R1 = 0.9048 # length = 20
# n_steps = 10000
# R1 = 0.8 # length = 9
# n_steps = 30000
# R1 = 0.9355 # length = 30
sig_tot = 1.0
sig1 = sig_tot * np.sqrt(1. -R1**2)
truelength = (R1**np.abs(np.arange(-1000,1000))).sum()
# print 'true length: ', truelength
chain = np.zeros((m_walkers, n_steps))
chain[:,0] = sig_tot * np.random.normal(size=m_walkers)
for i in xrange(1,n_steps):
chain[:,i] = R1 * chain[:,i-1] + sig1 * np.random.normal(size=m_walkers)
# print 'mean, expected: ', chain.mean(), 0.
# print 'standard deviation, expected: ', chain.std(), sig_tot
chain.shape = (m_walkers, n_steps, 1) # make last index param dimension (1-d here)
return chain, truelength
def findauto(params):
ns = params.shape[0]
mc = params.shape[1]
nparams = params.shape[2]
assert nparams == 1
ncorr = min([400, (ns-10)])
for k in xrange(nparams):
chain = params[:,:,k]
xsig = chain.std()
chaincorr = np.zeros((ncorr,mc))
for j in xrange(mc):
for i in xrange(ncorr):
nsample = len(chain[i:,j])
chaincorr[i,j] = (chain[:nsample,j] * chain[i:,j]).mean()
chainmean_a = chain[:nsample,j].mean()
chainmean_b = chain[i:,j].mean()
chaincorr[i,j] -= chainmean_a * chainmean_b
corr = chaincorr.mean(axis=1)
corr /= corr[0]
index = np.arange(ncorr)
nclust_cum = 2.*corr.cumsum() - 1.
end = index > 3.*nclust_cum
if end.any():
imax = np.where(end)[0][0]
nclust = nclust_cum[imax-1]
else:
imax = -1
nclust = nclust_cum[-1]
print 'WARNING: autocorr not converged.'
return nclust # hack for my test-1-param usage here
def findautofromemcee(chain):
m_walkers, n_steps, dim = chain.shape
def foofn():
pass
sampler = ensemble.EnsembleSampler(m_walkers, dim, foofn, a=2.0)
sampler._chain = chain
return sampler.acor[0]
def do():
ntrial = 100
# ntrial = 5
corrfirst = np.zeros(ntrial)
meanfirst = np.zeros(ntrial)
for i in xrange(ntrial):
if i % 10 == 0: print i
chain, truelength = autoregressive()
biechain = chain[:,:,0].T[:,:,np.newaxis]
corrfirst[i] = findauto(biechain)
meanfirst[i] = findautofromemcee(chain)
print 'True autocorr length: ', truelength
print 'Mean, std, min, max of two methods:'
print 'Corr-first approach: ', corrfirst.mean(), corrfirst.std(), corrfirst.min(), corrfirst.max()
print 'Mean-first approach: ', meanfirst.mean(), meanfirst.std(), meanfirst.min(), meanfirst.max()
a,b,c = plt.hist(meanfirst, histtype='step', color='b', bins=20, label='Mean-first approach')
a,b,c = plt.hist(corrfirst, histtype='step', color='r', bins=20, label='Corr-first approach')
ylim = np.array(plt.ylim()); plt.plot(np.ones(2)+truelength, ylim, 'g--')
plt.legend()
plt.show()
I've been running some tests of some MCMC variations with emcee, and have been using the autocorrelation time to estimate the speedup. I was surprised how many steps it took to get a consistent autocorrelation time from EnsembleSampler.acor. So I tried a different script I had for the autocorrelation time, and got much more stable results. I am pretty sure the chief reason for the difference is that my routine calculates the autocorrelation function for each walker, then takes the mean. In contrast emcee first takes the mean of the walkers, and then computes the autocorrelation of this mean chain. It seems like the latter method is washing out the very fluctuations that the autocorrelation is supposed to measure.
Here is an example that compares the two approaches, using an autoregressive model where the exact autocorrelation time is known. Typical results look like this:
My own routine for the autocorrelation time has its own drawbacks (not fast, questionable window choice) and I'm not suggesting it as a direct replacement, but it does seem to win easily in terms of precision. I think theoretically the advantage should be a factor sqrt(Nwalkers), though in practice it seems slightly larger.
I don't think the autocorrelation or the number of walkers or steps in this example is too unreasonable, compared to what people often run. The behavior is similar with chains actually generated from EnsembleSampler from the Gaussian posterior function in the intro docs, though in that case we don't know the true value for comparison.
I saw a note somewhere in the emcee repository that the reason for taking the mean is explained in Goodman and Weare, but I didn't follow the reasoning. Whatever the motivation is, I wonder if it justifies the much greater variance in the autocorrelation time. While acor is not the primary thing anyone's interested in, estimating it badly can lead to running the simulation much longer than necessary, or stopping it too soon and thus undersampling the posterior.