
Allow and Provide Different Estimators for Autocovariance #774

Open
ParadaCarleton opened this issue Mar 21, 2022 · 9 comments

Comments

@ParadaCarleton
Contributor

As mentioned in #273, there are many possible estimators or calculations for the autocovariance function. For example, the autocovariance can be estimated using:

  1. A denominator of n (or n-1 with Bessel's correction) for all lags and the naive algorithm
  2. A denominator of n-k (or n-k-1 with Bessel's correction) for all lags k and the naive algorithm
  3. An approximate algorithm based on the FFT, which can be calculated in O(n log n) time. (I will defer to @devmotion on this, as he implemented this in MCMCChains and knows more about it than me.)
  4. Some other estimator -- for example, Geyer's initial positive sequence estimator finds the smallest lag k where γ[k] + γ[k+1] < 0, then sets γ[k]=0 for all larger k values. (The idea being that negative autocovariance is rare, so negative values reflect noise; observing a negative value implies the noise is greater than the signal.)
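To make the denominator choices (items 1 and 2) and Geyer's truncation rule (item 4) concrete, here is a small sketch. It is written in Python with NumPy purely for illustration; `autocov` and `geyer_truncate` are hypothetical names, not an existing StatsBase API:

```python
import numpy as np

def autocov(x, denom="n"):
    """Naive O(n^2) autocovariance estimates for all lags.

    denom="n"  : divide every lag by n (more biased per lag, but keeps
                 the estimated sequence positive semidefinite)
    denom="n-k": divide lag k by n - k (less biased at each lag)
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma = np.empty(n)
    for k in range(n):
        s = np.dot(xc[: n - k], xc[k:])
        gamma[k] = s / (n if denom == "n" else n - k)
    return gamma

def geyer_truncate(gamma):
    """Geyer-style initial positive sequence rule, as described above:
    find the smallest lag k with gamma[k] + gamma[k+1] < 0 and zero out
    the sequence from k on, treating those values as noise."""
    out = gamma.copy()
    for k in range(len(out) - 1):
        if out[k] + out[k + 1] < 0:
            out[k:] = 0.0
            break
    return out
```

For example, `autocov([1.0, 2.0, 3.0, 4.0], denom="n")[0]` recovers the population variance 1.25, while the `"n-k"` variant scales each lag up by `n / (n - k)`.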

I think the complexity of autocovariance estimation warrants allowing different estimators for autocovariance (like we do for the covariance) and also providing at least a couple of these.

@nalimilan
Member

It would indeed make sense to support some simple variants, like we do for covariance. More complex variants could be supported via custom AbstractCovarianceEstimator types provided by other packages (again, as for covariance).

@devmotion
Member

In MCMCChains/MCMCDiagnosticTools we use a somewhat special setup, due to our application and for performance reasons, so I'm not completely sure how generally useful it is.

The implementation there, and some of the algorithms, are based on Stan and https://arxiv.org/pdf/1903.08008.pdf. We only use the initial positive sequence estimator, which also implies that generally we only need a few lags. Hence, in contrast to what is reported in the literature and done in Stan, the FFT method is usually not the most performant alternative. This is also consistent with the observation in MCMCDiagnostics, and it is one additional reason why it's not the default method (another is that it requires loading additional packages). As suggested by Geyer, we also don't use denominators of n - k (or n - k - 1) in the default and the FFT method, which should give basically the same results. I don't remember the normalization of the variogram estimator from BDA right now, but I'm certain we didn't add any customizations there.
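For readers unfamiliar with the FFT method mentioned here: zero-padding the centered series and multiplying its transform by its own conjugate yields the lag sums for all lags in O(n log n). A generic sketch follows (Python/NumPy for illustration only; this is not the MCMCDiagnosticTools implementation), using a denominator of n at every lag as discussed above:

```python
import numpy as np

def autocov_fft(x):
    """All-lags autocovariance via FFT in O(n log n).

    Zero-padding to a length >= 2n - 1 avoids circular wrap-around, so
    the lag sums agree with the naive O(n^2) sums up to floating-point
    error. Every lag is divided by n.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    m = 1 << (2 * n - 1).bit_length()  # next power of two >= 2n - 1
    f = np.fft.rfft(xc, m)
    # |F|^2 transforms back to the linear autocovariance sums
    acov = np.fft.irfft(f * np.conj(f), m)[:n]
    return acov / n
```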

@ParadaCarleton
Contributor Author

ParadaCarleton commented Mar 23, 2022

> In MCMCChains/MCMCDiagnosticTools we use a somewhat special setup, due to our application and for performance reasons, so I'm not completely sure how generally useful it is.
>
> The implementation there, and some of the algorithms, are based on Stan and https://arxiv.org/pdf/1903.08008.pdf. We only use the initial positive sequence estimator, which also implies that generally we only need a few lags. Hence, in contrast to what is reported in the literature and done in Stan, the FFT method is usually not the most performant alternative. This is also consistent with the observation in MCMCDiagnostics, and it is one additional reason why it's not the default method (another is that it requires loading additional packages). As suggested by Geyer, we also don't use denominators of n - k (or n - k - 1) in the default and the FFT method, which should give basically the same results. I don't remember the normalization of the variogram estimator from BDA right now, but I'm certain we didn't add any customizations there.

The variogram estimator always uses a denominator of n - h at lag h, which makes it much less downward-biased than the sample autocovariance with a denominator of n (which is what we use now).

@devmotion
Member

As said, in the literature the use of n (and not n - k) is recommended for MCMC analysis. Surely not everyone agrees - but the nice thing is that you can just choose whatever you prefer in MCMCChains and MCMCDiagnosticTools 🙂

@ParadaCarleton
Contributor Author

ParadaCarleton commented Mar 25, 2022

> As said, in the literature the use of n (and not n - k) is recommended for MCMC analysis. Surely not everyone agrees - but the nice thing is that you can just choose whatever you prefer in MCMCChains and MCMCDiagnosticTools 🙂

For the sample autocovariance, yes, n is better than n-k; the variogram estimator combined with the Geyer stopping rule is different though, because the differencing in the variogram yields more stable estimates. From BDA3:
[screenshot: variogram-based autocorrelation estimator from BDA3]
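For reference, the BDA3 variogram-based estimator is usually written as follows (reproduced here from memory rather than from the screenshot, so treat the exact indexing as an assumption; m chains of length n with draws ψ_{i,j}):

```latex
% Variogram at lag t over m chains of length n
% (note the n - t in the denominator, the point under discussion):
V_t = \frac{1}{m(n - t)} \sum_{j=1}^{m} \sum_{i=t+1}^{n}
      \left( \psi_{i,j} - \psi_{i-t,j} \right)^2

% Autocorrelation estimate from the variogram and the combined
% variance estimate \widehat{\mathrm{var}}^{+}:
\hat{\rho}_t = 1 - \frac{V_t}{2 \, \widehat{\mathrm{var}}^{+}}
```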

@devmotion
Member

Not sure what you want to say unfortunately. We implement exactly the version from BDA.

@ParadaCarleton
Contributor Author

> Not sure what you want to say unfortunately. We implement exactly the version from BDA.

I'm saying that the version from BDA uses n-k instead of n. Actually, this might explain why (when testing ParetoSmooth) I found it gives significantly different ESS estimates from Stan (as in, noticeably different, not practically different). I'll open an issue.

@devmotion
Member

> I'm saying that the version from BDA uses n-k instead of n.

Sure, but that's what's implemented in MCMCDiagnosticTools. I don't understand where the problem is.

@devmotion
Member

Stan uses the FFT approach, so any other comparison will be flawed.
