Autocorrelation time: is taking the mean the right thing to do? #209

fardal · 2017-01-13T04:16:17Z

I've been running some tests of some MCMC variations with emcee, and have been using the autocorrelation time to estimate the speedup. I was surprised how many steps it took to get a consistent autocorrelation time from EnsembleSampler.acor. So I tried a different script I had for the autocorrelation time, and got much more stable results. I am pretty sure the chief reason for the difference is that my routine calculates the autocorrelation function for each walker, then takes the mean. In contrast emcee first takes the mean of the walkers, and then computes the autocorrelation of this mean chain. It seems like the latter method is washing out the very fluctuations that the autocorrelation is supposed to measure.

Here is an example that compares the two approaches, using an autoregressive model where the exact autocorrelation time is known. Typical results look like this:

True autocorr length:  20.0084033613
Mean, std, min, max of two methods:
Corr-first approach:  19.7929288976 0.358159093104 18.9992952537 20.8373795818
Mean-first approach:  19.685956288 5.69231566791 11.0983690515 48.3939875026

My own routine for the autocorrelation time has its own drawbacks (not fast, questionable window choice) and I'm not suggesting it as a direct replacement, but it does seem to win easily in terms of precision. I think theoretically the advantage should be a factor sqrt(Nwalkers), though in practice it seems slightly larger.

I don't think the autocorrelation or the number of walkers or steps in this example is too unreasonable, compared to what people often run. The behavior is similar with chains actually generated from EnsembleSampler from the Gaussian posterior function in the intro docs, though in that case we don't know the true value for comparison.

I saw a note somewhere in the emcee repository that the reason for taking the mean is explained in Goodman and Weare, but I didn't follow the reasoning. Whatever the motivation is, I wonder if it justifies the much greater variance in the autocorrelation time. While acor is not the primary thing anyone's interested in, estimating it badly can lead to running the simulation much longer than necessary, or stopping it too soon and thus undersampling the posterior.

"""Tests of autocorrelation length.
"""
import numpy as np
import matplotlib.pyplot as plt
from emcee import ensemble


def autoregressive():
    m_walkers = 64

    n_steps = 10000
    R1 = 0.9048    # length = 20

    # n_steps = 10000
    # R1 = 0.8    # length = 9

    # n_steps = 30000
    # R1 = 0.9355  # length = 30

    sig_tot = 1.0
    sig1 = sig_tot * np.sqrt(1. -R1**2)
    truelength = (R1**np.abs(np.arange(-1000,1000))).sum()
    # print 'true length: ', truelength
    chain = np.zeros((m_walkers, n_steps))
    chain[:,0] = sig_tot * np.random.normal(size=m_walkers)
    for i in xrange(1,n_steps):
        chain[:,i] = R1 * chain[:,i-1] + sig1 * np.random.normal(size=m_walkers)
    # print 'mean, expected: ', chain.mean(), 0.
    # print 'standard deviation, expected: ', chain.std(), sig_tot
    chain.shape = (m_walkers, n_steps, 1)  # make last index param dimension (1-d here)
    return chain, truelength


def findauto(params):
    ns = params.shape[0]
    mc = params.shape[1]
    nparams = params.shape[2]
    assert nparams == 1
    ncorr = min([400, (ns-10)])
    for k in xrange(nparams):
        chain = params[:,:,k]
        xsig = chain.std()
        chaincorr = np.zeros((ncorr,mc))
        for j in xrange(mc):
            for i in xrange(ncorr):
                nsample = len(chain[i:,j])
                chaincorr[i,j] = (chain[:nsample,j] * chain[i:,j]).mean()
                chainmean_a = chain[:nsample,j].mean()
                chainmean_b = chain[i:,j].mean()
                chaincorr[i,j] -= chainmean_a * chainmean_b
        corr = chaincorr.mean(axis=1)
        corr /= corr[0]
        index = np.arange(ncorr)
        nclust_cum = 2.*corr.cumsum() - 1.
        end = index > 3.*nclust_cum
        if end.any():
            imax = np.where(end)[0][0]
            nclust = nclust_cum[imax-1]
        else:
            imax = -1
            nclust = nclust_cum[-1]
            print 'WARNING: autocorr not converged.'
            
        return nclust  # hack for my test-1-param usage here

    
def findautofromemcee(chain):
    m_walkers, n_steps, dim = chain.shape
    def foofn():
        pass
    sampler = ensemble.EnsembleSampler(m_walkers, dim, foofn, a=2.0)
    sampler._chain = chain
    return sampler.acor[0]

    
def do():
    ntrial = 100
    # ntrial = 5
    corrfirst = np.zeros(ntrial)
    meanfirst = np.zeros(ntrial)
    for i in xrange(ntrial):
        if i % 10 == 0: print i
        chain, truelength = autoregressive()
        biechain = chain[:,:,0].T[:,:,np.newaxis]
        corrfirst[i] = findauto(biechain)
        meanfirst[i] = findautofromemcee(chain)
    print 'True autocorr length: ', truelength
    print 'Mean, std, min, max of two methods:'
    print 'Corr-first approach: ', corrfirst.mean(), corrfirst.std(), corrfirst.min(), corrfirst.max()
    print 'Mean-first approach: ', meanfirst.mean(), meanfirst.std(), meanfirst.min(), meanfirst.max()

    a,b,c = plt.hist(meanfirst, histtype='step', color='b', bins=20, label='Mean-first approach')
    a,b,c = plt.hist(corrfirst, histtype='step', color='r', bins=20, label='Corr-first approach')
    ylim = np.array(plt.ylim()); plt.plot(np.ones(2)+truelength, ylim, 'g--')
    plt.legend()
    plt.show()

The text was updated successfully, but these errors were encountered:

dfm · 2017-01-13T14:27:19Z

It's an interesting question - thanks for sharing. I don't have a strong opinion about this and this is a really nice demo. I have already put more options for autocorrelation time calculations in the dev version of emcee (dfm/emcee3) and the acor property won't exist anymore because users should know that they're making a choice but I'll make sure that "mean_after" is included as an option. Thanks!

For anyone's future reference, @fardal's method can be computed using emcee as follows:

import numpy as np
from emcee.autocorr import integrated_time
chain = sampler.chain
tau = np.mean([integrated_time(walker) for walker in chain], axis=0)

fardal · 2017-01-13T14:53:36Z

Thanks for looking into this.

I think I'm doing something slightly different from your code snippet - I'm averaging the autocorrelation functions from the different chains together, then computing the autocorrelation time based on that average. For long chains there may not be much of a difference, but for short chains I worry that the window selection will go to hell due to the noise in the individual chains.

I think this method can also be computed with emcee...

dfm · 2017-01-13T19:21:55Z

Oops! Good catch - it was early in the morning in the PNW :-)

Yeah that would also be possible but it would require a bit more work. I'll play around with this a bit. Thanks!

fardal · 2017-01-13T20:52:53Z

Another minor thing: I gathered you used the notes by Sokal to choose the conservative default of c=10. But I think his definition of autocorrelation time is a factor 2 different than yours so maybe it should be 5? Not very important but could prevent some complaints about short chains.

Really this is all part of a grand campaign to get all those people with <1 min likelihood functions to run fewer steps and stop making me look bad 😇

dfm · 2017-01-18T16:15:10Z

Haha. Yeah. I feel you! I guarantee that you're not running your chains to convergence though (no one really does - even those with <1min likelihood costs). I'm fighting the opposite battle of trying to get people to be more careful about convergence! That being says, it looks like you're right about the factor of 2 so I'll run some tests and consider changing the default.

jmatta1 · 2017-10-31T21:35:52Z

@fardal, I rather like the improved variance of your example. I took your example and tweaked it a bit to use FFT methods for calculating the autocovariance function and to handle means that are not the same as zero.

Another difference I noticed between the two methods, in addition to the larger variance of the "average the walkers first" technique is that the "average the walkers first" method seems to generate a skewed gaussian instead of a symmetric one. I found this when, out of curiosity, I was pushing the number of trials absurdly high (using multiprocessing to get some parallelism). I explain a little more in this gist I made which also has the two scripts I used for the tests and plot generation. I this into two scripts so I could run the actual autocorrelation time calculation separately on a big 48-core machine in the cluster we have here.

dfm · 2017-10-31T23:01:15Z

People who find this issue might also enjoy: http://dfm.io/posts/autocorr/

…

On Tue, Oct 31, 2017 at 5:35 PM James T. Matta ***@***.***> wrote: @fardal <https://github.com/fardal>, I rather like the improved variance of your example. I took your example and tweaked it a bit to use FFT methods for calculating the autocovariance function and to handle means that are not the same as zero. Another difference I noticed between the two methods, in addition to the larger variance of the "average the walkers first" technique is that the "average the walkers first" method seems to generate a skewed gaussian instead of a symmetric one. I found this when, out of curiosity, I was pushing the number of trials absurdly high (using multiprocessing to get some parallelism). I had to split the code into two files, I explain a little more in this gist I made <https://gist.github.com/jmatta1/6283bbd504d90aafdb5ff2542e4b7787> which also has the two scripts I used for the tests and plot generation. [image: acor_9] <https://user-images.githubusercontent.com/5047947/32250197-ea3f2b4c-be61-11e7-9372-1f69087347d6.png> — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#209 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAVYSscOpg5YtLD0NhrL5gGRzE-lEKd0ks5sx5K4gaJpZM4LigoH> .

dfm · 2017-11-06T20:45:44Z

Thanks to this issue, a procedure much like @fardal's will be used in the forthcoming 3.0 release.

gmorello · 2018-02-07T17:03:48Z

I have been using the Fardal's method to estimate the autocorrelation time of my chains:
tau = np.mean([integrated_time(walker) for walker in chain], axis=0)

I noted that in some fits with many parameters I need to run very long chains in order not to get the message "chain too short".
For example for 200,000 iterations I may get "chain too short" but then if I run 500,000 iterations I get tau = 650, which is much smaller than the original 200,000 iterations (The same happen if I take a shorter subchain of the longer chain).
I also noted that increasing the number of walkers tends to slightly decrease the value of tau, but I need more iterations not to get the message "chain too short".
Is this behaviour expected?

JohannesBuchner · 2023-12-11T17:26:45Z

Perhaps of interest to readers coming from the 2017 autocorr blog post:

This page: https://johannesbuchner.github.io/autoemcee/mcmc-ensemble-convergence.html
presents a way to compute the R-hat convergence per walker and across ensembles, and should be a better convergence test, robust against the issues identified by Huijser et al for the affine-invariant ensemble sampler implemented by emcee.

dfm closed this as completed Nov 6, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Autocorrelation time: is taking the mean the right thing to do? #209

Autocorrelation time: is taking the mean the right thing to do? #209

fardal commented Jan 13, 2017

dfm commented Jan 13, 2017

fardal commented Jan 13, 2017

dfm commented Jan 13, 2017

fardal commented Jan 13, 2017

dfm commented Jan 18, 2017

jmatta1 commented Oct 31, 2017 •

edited

Loading

dfm commented Oct 31, 2017 via email

dfm commented Nov 6, 2017

gmorello commented Feb 7, 2018

JohannesBuchner commented Dec 11, 2023

Autocorrelation time: is taking the mean the right thing to do? #209

Autocorrelation time: is taking the mean the right thing to do? #209

Comments

fardal commented Jan 13, 2017

dfm commented Jan 13, 2017

fardal commented Jan 13, 2017

dfm commented Jan 13, 2017

fardal commented Jan 13, 2017

dfm commented Jan 18, 2017

jmatta1 commented Oct 31, 2017 • edited Loading

dfm commented Oct 31, 2017 via email

dfm commented Nov 6, 2017

gmorello commented Feb 7, 2018

JohannesBuchner commented Dec 11, 2023

jmatta1 commented Oct 31, 2017 •

edited

Loading