
Autocorrelation time: is taking the mean the right thing to do? #209

Closed
fardal opened this issue Jan 13, 2017 · 10 comments

fardal commented Jan 13, 2017

I've been running tests of some MCMC variations with emcee and have been using the autocorrelation time to estimate the speedup. I was surprised by how many steps it took to get a consistent autocorrelation time from EnsembleSampler.acor, so I tried a different script I had for the autocorrelation time and got much more stable results. I'm fairly sure the chief reason for the difference is that my routine calculates the autocorrelation function for each walker and then takes the mean of those functions, whereas emcee first takes the mean of the walkers and then computes the autocorrelation of that mean chain. The latter method seems to wash out the very fluctuations that the autocorrelation is supposed to measure.

Here is an example that compares the two approaches, using an autoregressive model where the exact autocorrelation time is known. Typical results look like this:

True autocorr length:  20.0084033613
Mean, std, min, max of two methods:
Corr-first approach:  19.7929288976 0.358159093104 18.9992952537 20.8373795818
Mean-first approach:  19.685956288 5.69231566791 11.0983690515 48.3939875026

My own routine for the autocorrelation time has its own drawbacks (it is not fast, and the window choice is questionable), and I'm not suggesting it as a direct replacement, but it does seem to win easily in terms of precision. Theoretically, I think the advantage should be a factor of sqrt(Nwalkers), though in practice it seems slightly larger.

I don't think the autocorrelation time, the number of walkers, or the number of steps in this example is unreasonable compared to what people often run. The behavior is similar with chains actually generated by EnsembleSampler from the Gaussian posterior in the intro docs, though in that case we don't know the true value for comparison.

I saw a note somewhere in the emcee repository that the reason for taking the mean is explained in Goodman and Weare, but I didn't follow the reasoning. Whatever the motivation is, I wonder if it justifies the much greater variance in the autocorrelation time. While acor is not the primary thing anyone's interested in, estimating it badly can lead to running the simulation much longer than necessary, or stopping it too soon and thus undersampling the posterior.

[Figure: histograms of the autocorrelation-time estimates from the two approaches over repeated trials, with the true value marked]

"""Tests of autocorrelation length.
"""
import numpy as np
import matplotlib.pyplot as plt
from emcee import ensemble


def autoregressive():
    m_walkers = 64

    n_steps = 10000
    R1 = 0.9048    # length = 20

    # n_steps = 10000
    # R1 = 0.8    # length = 9

    # n_steps = 30000
    # R1 = 0.9355  # length = 30

    sig_tot = 1.0
    sig1 = sig_tot * np.sqrt(1.0 - R1**2)
    # For an AR(1) process, rho(t) = R1**|t|, so the integrated autocorrelation
    # time is sum_{t=-inf}^{inf} R1**|t| = (1 + R1) / (1 - R1); the truncated
    # sum below is an excellent approximation.
    truelength = (R1**np.abs(np.arange(-1000, 1000))).sum()
    # print('true length: ', truelength)
    chain = np.zeros((m_walkers, n_steps))
    chain[:, 0] = sig_tot * np.random.normal(size=m_walkers)
    for i in range(1, n_steps):
        chain[:, i] = R1 * chain[:, i-1] + sig1 * np.random.normal(size=m_walkers)
    # print('mean, expected: ', chain.mean(), 0.)
    # print('standard deviation, expected: ', chain.std(), sig_tot)
    chain.shape = (m_walkers, n_steps, 1)  # make the last index the parameter dimension (1-d here)
    return chain, truelength


def findauto(params):
    """Corr-first estimate: average the per-walker autocorrelation functions,
    then compute the integrated time from that average."""
    ns = params.shape[0]       # number of steps
    mc = params.shape[1]       # number of walkers
    nparams = params.shape[2]
    assert nparams == 1
    ncorr = min([400, (ns - 10)])
    for k in range(nparams):
        chain = params[:, :, k]
        xsig = chain.std()
        chaincorr = np.zeros((ncorr, mc))
        for j in range(mc):
            for i in range(ncorr):
                nsample = len(chain[i:, j])
                chaincorr[i, j] = (chain[:nsample, j] * chain[i:, j]).mean()
                chainmean_a = chain[:nsample, j].mean()
                chainmean_b = chain[i:, j].mean()
                chaincorr[i, j] -= chainmean_a * chainmean_b
        # Average the autocovariance over walkers, then normalize.
        corr = chaincorr.mean(axis=1)
        corr /= corr[0]
        index = np.arange(ncorr)
        # Running estimate of the integrated time: tau(M) = 1 + 2*sum_{t=1}^{M} rho(t).
        nclust_cum = 2. * corr.cumsum() - 1.
        # Window choice: stop at the first lag M with M > 3*tau(M).
        end = index > 3. * nclust_cum
        if end.any():
            imax = np.where(end)[0][0]
            nclust = nclust_cum[imax - 1]
        else:
            imax = -1
            nclust = nclust_cum[-1]
            print('WARNING: autocorr not converged.')

        return nclust  # hack for my test-1-param usage here

    
def findautofromemcee(chain):
    """Mean-first estimate: emcee 2.x's acor property, which averages the
    walkers into a single chain before computing the autocorrelation."""
    m_walkers, n_steps, dim = chain.shape
    def foofn():
        pass
    sampler = ensemble.EnsembleSampler(m_walkers, dim, foofn, a=2.0)
    sampler._chain = chain  # inject a pre-made chain instead of sampling
    return sampler.acor[0]

    
def do():
    ntrial = 100
    # ntrial = 5
    corrfirst = np.zeros(ntrial)
    meanfirst = np.zeros(ntrial)
    for i in range(ntrial):
        if i % 10 == 0:
            print(i)
        chain, truelength = autoregressive()
        # Reorder to (n_steps, n_walkers, 1) for findauto.
        biechain = chain[:, :, 0].T[:, :, np.newaxis]
        corrfirst[i] = findauto(biechain)
        meanfirst[i] = findautofromemcee(chain)
    print('True autocorr length: ', truelength)
    print('Mean, std, min, max of two methods:')
    print('Corr-first approach: ', corrfirst.mean(), corrfirst.std(), corrfirst.min(), corrfirst.max())
    print('Mean-first approach: ', meanfirst.mean(), meanfirst.std(), meanfirst.min(), meanfirst.max())

    a, b, c = plt.hist(meanfirst, histtype='step', color='b', bins=20, label='Mean-first approach')
    a, b, c = plt.hist(corrfirst, histtype='step', color='r', bins=20, label='Corr-first approach')
    ylim = np.array(plt.ylim())
    plt.plot(np.full(2, truelength), ylim, 'g--')  # mark the true autocorrelation length
    plt.legend()
    plt.show()

dfm commented Jan 13, 2017

It's an interesting question - thanks for sharing. I don't have a strong opinion about this, and it's a really nice demo. I have already put more options for autocorrelation time calculations in the dev version of emcee (dfm/emcee3), and the acor property won't exist anymore because users should know that they're making a choice, but I'll make sure that "mean_after" is included as an option. Thanks!

For anyone's future reference, @fardal's method can be computed using emcee as follows:

import numpy as np
from emcee.autocorr import integrated_time
chain = sampler.chain
tau = np.mean([integrated_time(walker) for walker in chain], axis=0)


fardal commented Jan 13, 2017

Thanks for looking into this.

I think I'm doing something slightly different from your code snippet - I'm averaging the autocorrelation functions from the different chains together, then computing the autocorrelation time based on that average. For long chains there may not be much of a difference, but for short chains I worry that the window selection will go to hell due to the noise in the individual chains.

I think this method can also be computed with emcee...
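(For concreteness, here is a minimal numpy-only sketch of that idea: compute each walker's autocorrelation function, average the functions, and only then choose a window and sum. The FFT zero-padding and the window constant c are choices made for this sketch, not something taken from the thread or from emcee itself.)

import numpy as np

def acf_1d(x):
    # Normalized autocorrelation function of a 1-D series, computed via FFT
    # with zero-padding to avoid circular wrap-around.
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    f = np.fft.rfft(x, n=2 * n)
    acov = np.fft.irfft(f * np.conj(f))[:n]
    return acov / acov[0]

def tau_corr_first(chain, c=5):
    # chain has shape (n_walkers, n_steps): average the per-walker ACFs,
    # then apply a Sokal-style window (smallest M with M >= c * tau(M)).
    rho = np.mean([acf_1d(w) for w in chain], axis=0)
    taus = 2.0 * np.cumsum(rho) - 1.0
    m = np.arange(len(taus)) >= c * taus
    window = np.argmax(m) if m.any() else len(taus) - 1
    return taus[window]

Applied to the AR(1) test above, this should behave like the corr-first numbers, modulo the different window rule.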


dfm commented Jan 13, 2017

Oops! Good catch - it was early in the morning in the PNW :-)

Yeah, that would also be possible, but it would require a bit more work. I'll play around with this a bit. Thanks!


fardal commented Jan 13, 2017

Another minor thing: I gathered you used Sokal's notes to choose the conservative default of c=10. But I think his definition of the autocorrelation time differs from yours by a factor of 2, so maybe it should be 5? Not very important, but it could prevent some complaints about short chains.
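(For reference, and as I understand the two conventions, the factor of two comes from

\tau_\mathrm{emcee} = 1 + 2 \sum_{t=1}^{\infty} \rho(t), \qquad
\tau_\mathrm{Sokal} = \frac{1}{2} \sum_{t=-\infty}^{\infty} \rho(t) = \frac{\tau_\mathrm{emcee}}{2},

so a window requirement of M >= 10 \tau_\mathrm{Sokal} is the same requirement as M >= 5 \tau_\mathrm{emcee}.)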

Really this is all part of a grand campaign to get all those people with <1 min likelihood functions to run fewer steps and stop making me look bad 😇


dfm commented Jan 18, 2017

Haha. Yeah. I feel you! I guarantee that you're not running your chains to convergence though (no one really does - even those with <1 min likelihood costs). I'm fighting the opposite battle of trying to get people to be more careful about convergence! That being said, it looks like you're right about the factor of 2, so I'll run some tests and consider changing the default.


jmatta1 commented Oct 31, 2017

@fardal, I rather like the reduced variance of your approach. I took your example and tweaked it a bit to use FFT methods for calculating the autocovariance function and to handle means that are not zero.

Another difference I noticed between the two methods, in addition to the larger variance of the "average the walkers first" technique, is that the "average the walkers first" method seems to generate a skewed Gaussian distribution of estimates instead of a symmetric one. I found this when, out of curiosity, I was pushing the number of trials absurdly high (using multiprocessing to get some parallelism). I explain a little more in the gist I made, which also has the two scripts I used for the tests and plot generation. I split this into two scripts so I could run the actual autocorrelation time calculation separately on a big 48-core machine in the cluster we have here.

[Figure: histograms of the autocorrelation-time estimates from the two methods over a large number of trials]


dfm commented Oct 31, 2017 via email


dfm commented Nov 6, 2017

Thanks to this issue, a procedure much like @fardal's will be used in the forthcoming 3.0 release.

dfm closed this as completed Nov 6, 2017

gmorello commented Feb 7, 2018

I have been using @fardal's method to estimate the autocorrelation time of my chains:
tau = np.mean([integrated_time(walker) for walker in chain], axis=0)

I noticed that in some fits with many parameters I need to run very long chains in order not to get the message "chain too short". For example, with 200,000 iterations I may get "chain too short", but if I run 500,000 iterations I get tau = 650, which is much smaller than the original 200,000 iterations (the same happens if I take a shorter subchain of the longer chain). I also noticed that increasing the number of walkers tends to slightly decrease the value of tau, but I then need more iterations to avoid the "chain too short" message. Is this behaviour expected?
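(A note for anyone hitting this later: in the emcee 3 interface, which postdates this thread and is an assumption here, integrated_time operates on the whole ensemble at once, averaging the autocorrelation functions over walkers before windowing, and it exposes the length check directly. A minimal sketch, assuming a chain array of shape (n_steps, n_walkers, n_dim):

import numpy as np
from emcee.autocorr import integrated_time

# Assumed emcee >= 3 signature: tol sets how many autocorrelation times the
# chain must span before the estimate is trusted (default 50), and quiet=True
# downgrades the "chain too short" error to a warning, returning the estimate anyway.
tau = integrated_time(chain, c=5, tol=50, quiet=True)

Also note that estimates of tau from chains that are not many times longer than tau itself are generally biased low, which is part of why the length check is conservative.)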

JohannesBuchner commented

Perhaps of interest to readers coming from the 2017 autocorr blog post:

This page presents a way to compute the R-hat convergence per walker and across ensembles: https://johannesbuchner.github.io/autoemcee/mcmc-ensemble-convergence.html. It should be a better convergence test, robust against the issues identified by Huijser et al. for the affine-invariant ensemble sampler implemented by emcee.
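(For readers who want the basic ingredient of such a test, below is a minimal sketch of the standard Gelman-Rubin R-hat for a single scalar parameter, treating each row of an (n_chains, n_steps) array as one chain. The page linked above describes a more careful procedure, so treat this only as the building block.)

import numpy as np

def gelman_rubin_rhat(chains):
    # chains has shape (n_chains, n_steps) for one scalar parameter.
    n_steps = chains.shape[1]
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()          # mean within-chain variance
    B = n_steps * chain_means.var(ddof=1)          # between-chain variance (scaled by n_steps)
    var_hat = (n_steps - 1) / n_steps * W + B / n_steps   # pooled variance estimate
    return np.sqrt(var_hat / W)                    # values near 1 indicate convergence

Since walkers within a single ensemble are not independent chains, the linked test applies this kind of statistic both per walker and across independently run ensembles, as described above.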
