# MCMC3.5: Jackknife Resampling

To estimate the errors of observables accurately, jackknife resampling is recommended.

## Binning

Previously, the following program produced very large error bars at low temperature:

In [1]:
using ResumableFunctions
using SparseArrays
using LinearAlgebra
const Jx = 1 / 3 # opposite sign to Motome's
const Jy = 1 / 3
const Jz = 1 / 3
function Metropolis(βF::Float64, βFnew::Float64)::Bool
    βF - βFnew > log(rand())
end
function openhoneycomb(Lx::Int64, Ly::Int64)::Tuple
    N = 2Lx * Ly
    nnx = zip(1 : 2 : (N - 1), 2 : 2 : N)
    nny = Iterators.flatten(zip((1 + 2i) : 2Lx : (2Lx * (Ly - 1)  + 1 + 2i), 2i : 2Lx : (2Lx * (Ly - 1)  + 2i)) for i in 1 : (Lx - 1))
    nnz = zip(1 : 2 : (N - 1), Iterators.flatten(((2Lx + 2) : 2 : N, 2 : 2 : 2Lx)))
    plaquette = Iterators.flatten(zip((Lx * (i - 1) + 1) : (Lx * (i - 1) + Lx - 1), (Lx * (i - 1) + 2) : (Lx * (i - 1) + Lx)) for i in 1 : Ly)
    N, nnx, nny, nnz, plaquette
end
@resumable function measurementflux(method::Function, lattice::Function, β::Float64, Lx::Int64, Ly::Int64)::Float64
    N, nnx, nny, nnz, plaquette = lattice(Lx, Ly)
    iter = Iterators.flatten((Iterators.product(J, nn) for (J, nn) in [(Jx, nnx), (Jy, nny), (Jz, nnz)]))
    h = spzeros(Complex{Float64}, N, N)
    for (J, nn) in iter
        h[nn[1], nn[2]] = 0.5im * J
        h[nn[2], nn[1]] = -0.5im * J
    end
    NNz = collect(nnz)
    Nz = length(NNz)
    η = ones(Int64, Nz)
    βF = 0.0
    hdense = Array(h)
    plaq = collect(plaquette)
    Np = length(plaq)
    while true
        for i in 1 : Nz
            j = rand(1 : Nz)
            hdense[NNz[j][1], NNz[j][2]] = -hdense[NNz[j][1], NNz[j][2]]
            hdense[NNz[j][2], NNz[j][1]] = -hdense[NNz[j][2], NNz[j][1]]
            ev = eigvals(Hermitian(hdense))
            positiveev = Iterators.drop(ev, N >> 1)
            βFnew = -sum((log(2.0 * cosh(β * ϵ / 2.0)) for ϵ in positiveev))
            if method(βF, βFnew)
                η[j] = -η[j]
                βF = βFnew
            else
                hdense[NNz[j][1], NNz[j][2]] = -hdense[NNz[j][1], NNz[j][2]]
                hdense[NNz[j][2], NNz[j][1]] = -hdense[NNz[j][2], NNz[j][1]]
            end
        end
        @yield sum((η[k] * η[l] for (k, l) in plaq)) / Np
    end
end

measurementflux (generic function with 1 method)

One reason is that the returned value, the flux, takes only discrete values.

In [2]:
mcstep = Iterators.drop(measurementflux(Metropolis, openhoneycomb, 100.0, 4, 4), 10000)
foreach(println, Iterators.take(mcstep, 10))

0.0
0.6666666666666666
0.5
0.3333333333333333
0.16666666666666666
0.3333333333333333
0.3333333333333333
0.5
0.5
0.3333333333333333


We have to smooth out these quantized values by binning, i.e. by averaging over blocks of consecutive samples.

In [3]:
using Statistics
Nsample = 10000
Nbin = 10
Nbinsize = Nsample ÷ Nbin
iter = Iterators.partition(Iterators.take(mcstep, Nsample), Nbinsize)
bin = collect(map(mean, iter))

10-element Array{Float64,1}:
 0.30449999999999994
 0.29566666666666663
 0.30600000000000005
 0.2941666666666667 
 0.30616666666666664
 0.301              
 0.2953333333333334 
 0.29749999999999993
 0.2909999999999999 
 0.29850000000000004

Now it works!

In [4]:
m = mean(bin)
s = stdm(bin, m) / sqrt(length(bin))
println("$m ± $s")

0.2989833333333333 ± 0.0016634134917225033


Nbinsize has to be chosen based on the autocorrelation: each bin must be longer than the autocorrelation length. To get accurate results with a smaller Nbinsize at low temperature, we need to implement some global updating algorithm.
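
One way to see how the bin size interacts with the autocorrelation is to recompute the error bar for several bin sizes and look for a plateau. The following is a minimal sketch, not part of the program above: `binned_error` and `ar1` are hypothetical helper names, and the synthetic AR(1) series only stands in for correlated Monte Carlo data.

```julia
using Statistics
using Random

# Standard error of the mean estimated from bins of length `binsize`.
# Trailing samples that do not fill a complete bin are discarded.
function binned_error(x::AbstractVector{<:Real}, binsize::Integer)
    nbin = length(x) ÷ binsize
    bins = [mean(view(x, ((b - 1) * binsize + 1):(b * binsize))) for b in 1:nbin]
    std(bins) / sqrt(nbin)
end

# Synthetic correlated series x[t] = ρ * x[t-1] + noise (illustration only).
function ar1(rng::AbstractRNG, n::Integer, ρ::Real)
    x = zeros(n)
    for t in 2:n
        x[t] = ρ * x[t - 1] + randn(rng)
    end
    x
end

rng = MersenneTwister(1234)
x = ar1(rng, 100_000, 0.9)
for binsize in (1, 10, 100, 1000)
    println("binsize = $binsize: error = $(binned_error(x, binsize))")
end
```

For correlated data the estimated error grows with the bin size and levels off once the bins are longer than the autocorrelation length. The smallest bin size on that plateau is a reasonable choice.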

## Delete-1 jackknife resampling

Delete-1 jackknife resampling is implemented in https://github.com/ararslan/Jackknife.jl. However, its functionality is limited, so I define my own functions for jackknife resampling here.

In [5]:
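# Apply `before` to each sample, average over all samples except the i-th, then apply `after`.
function leaveoneout(before::Function, after::Function, v::AbstractVector)
    ind = eachindex(v)
    map(i -> after(mean(map(before, view(v, filter(!isequal(i), ind))))), ind)
end
# Jackknife estimate: mean of the leave-one-out values.
meanJ(b::Function, a::Function, v::AbstractVector) = mean(leaveoneout(b, a, v))
# Jackknife standard error about a given mean m; sqrt(n - 1) rescales the spread of the leave-one-out values.
stdmJ(b::Function, a::Function, v::AbstractVector, m) = stdm(leaveoneout(b, a, v), m, corrected = false) * sqrt(length(v) - 1)
# Jackknife standard error about the jackknife mean.
stdJ(b::Function, a::Function, v::AbstractVector) = stdmJ(b, a, v, meanJ(b, a, v))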

stdJ (generic function with 1 method)
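
In formulas, the functions above can be read as follows. The notation is mine and only transcribes the code: $v_1, \dots, v_n$ are the samples, $b$ is `before`, $a$ is `after`, $\theta_i$ is the $i$-th entry of `leaveoneout`, $\bar\theta$ is `meanJ`, and $\sigma_J$ is `stdJ`.

```latex
\theta_i = a\!\left( \frac{1}{n-1} \sum_{j \neq i} b(v_j) \right), \qquad
\bar\theta = \frac{1}{n} \sum_{i=1}^{n} \theta_i, \qquad
\sigma_J = \sqrt{ \frac{n-1}{n} \sum_{i=1}^{n} \left( \theta_i - \bar\theta \right)^2 }
```

For $a = b = \mathrm{identity}$ one finds $\theta_i - \bar\theta = -(v_i - \bar v)/(n-1)$, so $\sigma_J^2 = \sum_i (v_i - \bar v)^2 / (n(n-1))$, which is the usual squared standard error of the mean.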

The functions build on Statistics.jl and Jackknife.jl, so please see their documentation for details. I again use measurementEf for the demonstration.

In [6]:
@resumable function measurementEf(method::Function, lattice::Function, β::Float64, Lx::Int64, Ly::Int64)::Vector{Float64}
    N, nnx, nny, nnz, plaquette = lattice(Lx, Ly)
    iter = Iterators.flatten((Iterators.product(J, nn) for (J, nn) in [(Jx, nnx), (Jy, nny), (Jz, nnz)]))
    h = spzeros(Complex{Float64}, N, N)
    for (J, nn) in iter
        h[nn[1], nn[2]] = 0.5im * J
        h[nn[2], nn[1]] = -0.5im * J
    end
    NNz = collect(nnz)
    Nz = length(NNz)
    η = ones(Int64, Nz)
    βF = 0.0
    β₂ = β * 0.5
    hdense = Array(h)
    plaq = collect(plaquette)
    Np = length(plaq)
    ev = zeros(Float64, N)
    while true
        for i in 1 : Nz
            j = rand(1 : Nz)
            hdense[NNz[j][1], NNz[j][2]] = -hdense[NNz[j][1], NNz[j][2]]
            hdense[NNz[j][2], NNz[j][1]] = -hdense[NNz[j][2], NNz[j][1]]
            evnew = eigvals(Hermitian(hdense))
            βFnew = -sum(@. log(exp(β₂ * evnew[(N >> 1 + 1) : end]) + exp(-β₂ * evnew[(N >> 1 + 1) : end])))
            if method(βF, βFnew)
                η[j] = -η[j]
                βF = βFnew
                ev .= evnew
            else
                hdense[NNz[j][1], NNz[j][2]] = -hdense[NNz[j][1], NNz[j][2]]
                hdense[NNz[j][2], NNz[j][1]] = -hdense[NNz[j][2], NNz[j][1]]
            end
        end
        Ef = -sum(@. ev[(N >> 1 + 1) : end] * tanh(β₂ * ev[(N >> 1 + 1) : end] )) * 0.5
        ∂Ef∂β = -sum(@. (ev[(N >> 1 + 1) : end] ^ 2) * (sech(β₂ * ev[(N >> 1 + 1) : end]) ^ 2)) * 0.25
        @yield [Ef, ∂Ef∂β]
    end
end

measurementEf (generic function with 1 method)

It is ok to first assume the bin size to be 1.

In [7]:
const β = 10.0
mcstep2 = Iterators.drop(measurementEf(Metropolis, openhoneycomb, β, 4, 4), 10000)
iter2 = Iterators.take(mcstep2, Nsample)
data = collect(iter2)

10000-element Array{Array{Float64,1},1}:
 [-1.67955, -0.0459012]
 [-1.68042, -0.0454951]
 [-1.68061, -0.0461468]
 [-1.69079, -0.0476151]
 [-1.68974, -0.0471794]
 [-1.66923, -0.0439858]
 [-1.67866, -0.0457279]
 [-1.67923, -0.045783] 
 [-1.68206, -0.045995] 
 [-1.68034, -0.0462027]
 [-1.68725, -0.0469354]
 [-1.68973, -0.0472026]
 [-1.68161, -0.0459624]
 ⋮                     
 [-1.67742, -0.0459742]
 [-1.67879, -0.0455914]
 [-1.67047, -0.0442746]
 [-1.68544, -0.047168] 
 [-1.67892, -0.0456496]
 [-1.67125, -0.0438094]
 [-1.67359, -0.0446046]
 [-1.69354, -0.0483625]
 [-1.69757, -0.0491828]
 [-1.67501, -0.0449956]
 [-1.67029, -0.0442268]
 [-1.67533, -0.0453594]

By setting after = before = identity, meanJ reduces to mean, and stdmJ reduces to the standard error of the mean, i.e. stdm divided by sqrt(length(data)).

In [8]:
m2 = meanJ(identity, identity, data)
s2 = stdmJ(identity, identity, data, m2)
println("Ef = $(m2[1]) ± $(s2[1]), ∂Ef∂β = $(m2[2]) ± $(s2[2])")

Ef = -1.6833785106409933 ± 7.463948517278772e-5, ∂Ef∂β = -0.046464582244579264 ± 1.3244045674343758e-5


This agrees with the standard estimation method for the error bars.

In [9]:
std(data) / sqrt(length(data))

2-element Array{Float64,1}:
 7.463948517357956e-5 
 1.3244045674355459e-5

For such simple mean values (i.e. before = after = identity), jackknife resampling is clearly overkill. However, for quantities that are nonlinear functions of averages, such as the specific heat, jackknife resampling is a very effective way to estimate the error.
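
For reference, the estimator used in the next cell can be read off from the definitions of `TTCv` and `Cv`. Here $\langle \cdot \rangle$ denotes the Monte Carlo average over samples and $E_f$ is the first component returned by measurementEf. This notation is mine and only transcribes the code.

```latex
C_v = \beta^2 \left( \left\langle E_f^2 - \frac{\partial E_f}{\partial \beta} \right\rangle - \langle E_f \rangle^2 \right)
```

The term $\langle E_f \rangle^2$ makes $C_v$ a nonlinear function of the averages, which is why its error cannot simply be read off from the standard deviation of per-sample values.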

In [10]:
TTCv(v::Vector{Float64}) = [v[1] ^ 2 - v[2], v[1]]
Cv(meanTTCv::Vector{Float64}) = (β ^ 2) * (meanTTCv[1] - meanTTCv[2] ^ 2)
m3 = meanJ(TTCv, Cv, data)
s3 = stdmJ(TTCv, Cv, data, m3)
println("Cv = $m3 ± $s3")

Cv = 4.652028720043983 ± 0.001331184883892336
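
The same pattern can be tried on a toy problem where the answer is known. The sketch below is an illustration rather than the code used above: `jackknife_fast`, `moments`, and `variance` are hypothetical names, and the data are standard normal samples whose variance ⟨x²⟩ - ⟨x⟩² is 1. It also shows a possible O(n) variant, in which each leave-one-out mean is obtained by subtracting one sample from the total instead of re-averaging.

```julia
using Statistics
using Random

# Delete-1 jackknife in O(n): each leave-one-out mean is (total - b_i) / (n - 1).
# Returns the jackknife estimate and its standard error.
function jackknife_fast(before::Function, after::Function, v::AbstractVector)
    b = map(before, v)
    n = length(b)
    total = sum(b)
    θ = [after((total - bi) / (n - 1)) for bi in b]
    m = mean(θ)
    m, stdm(θ, m, corrected = false) * sqrt(n - 1)
end

# Toy data: standard normal samples (an assumption for illustration).
rng = MersenneTwister(42)
x = randn(rng, 2000)
moments(xi) = [xi^2, xi]       # per-sample quantities to be averaged
variance(μ) = μ[1] - μ[2]^2    # nonlinear function of the averages
est, err = jackknife_fast(moments, variance, x)
println("variance = $est ± $err")
```

Here `moments` plays the role of TTCv and `variance` the role of Cv. The estimate should agree with 1 within a few error bars.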


## Autocorrelation

The simplest way to estimate the autocorrelation is to vary the bin size and see how the jackknife error estimate changes.

In [11]:
binning = Iterators.partition(data, 2)
bin2 = collect(map(mean, binning))

5000-element Array{Array{Float64,1},1}:
 [-1.67998, -0.0456982]
 [-1.6857, -0.0468809] 
 [-1.67949, -0.0455826]
 [-1.67895, -0.0457555]
 [-1.6812, -0.0460989] 
 [-1.68849, -0.047069] 
 [-1.68463, -0.0465323]
 [-1.67762, -0.0453116]
 [-1.66897, -0.043752] 
 [-1.68925, -0.0473729]
 [-1.68507, -0.0466188]
 [-1.68129, -0.0461204]
 [-1.68117, -0.04608]  
 ⋮                     
 [-1.67879, -0.0455905]
 [-1.69421, -0.0483458]
 [-1.68543, -0.0467656]
 [-1.67631, -0.0448894]
 [-1.68359, -0.0461298]
 [-1.68278, -0.0461704]
 [-1.6781, -0.0457828] 
 [-1.67796, -0.0457213]
 [-1.67508, -0.0447295]
 [-1.68357, -0.0464836]
 [-1.68629, -0.0470892]
 [-1.67281, -0.0447931]

In [12]:
m4 = meanJ(TTCv, Cv, bin2)
s4 = stdmJ(TTCv, Cv, bin2, m4)
println("Cv = $m4 ± $s4")

Cv = 4.649268265920356 ± 0.0013326136745145157


The fact that the bin size barely affects the expectation value (or the error bar) means that the autocorrelation length is about 1-2. Note that at low temperature the bin size strongly affects the result, so it must be chosen large enough. Next, we estimate the autocorrelation length by another method.

In [13]:
using StatsBase
StatsBase.autocor(map(x -> x[1], data))

41-element Array{Float64,1}:
  1.0                  
  0.005785340597454017 
 -0.012701855980102697 
  0.003024499249016354 
  0.007486714860878442 
 -0.011480196186071979 
  0.0049574107906732515
  0.014928839434444563 
  0.02138090308221022  
  0.002366229931924727 
  0.005585001331053724 
  0.005887855285759829 
 -0.025051713680914686 
  ⋮                    
 -0.015333271116125322 
  0.006918569359666667 
 -0.008744459184033335 
 -0.0065239113943678   
 -0.0011707483607242342
  0.004924185140070785 
 -0.018725194266972795 
 -0.006164068938450176 
 -0.02036304331139929  
  0.01569271061380316  
  0.0001249229534574266
 -0.0115216294333615   

The autocorrelation function drops to the noise level already at lag 1. This rapid decay means that the autocorrelation length is at most about 1.
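
The decay can be condensed into a single number, the integrated autocorrelation time. The sketch below uses the common convention τ = 1/2 + Σ ρ(t), under which uncorrelated data give τ = 1/2. The helper names `autocorrelation` and `integrated_time` are hypothetical, the cutoff at the first negative ρ(t) is a simple heuristic I assume here, and white noise stands in for the measured data.

```julia
using Statistics
using Random

# Normalized autocorrelation ρ(t) for lags 0:maxlag, computed directly from the definition.
function autocorrelation(x::AbstractVector{<:Real}, maxlag::Integer)
    n = length(x)
    δ = x .- mean(x)
    c0 = sum(abs2, δ) / n
    [sum(δ[1:(n - t)] .* δ[(1 + t):n]) / (n * c0) for t in 0:maxlag]
end

# Integrated autocorrelation time τ = 1/2 + Σ_{t ≥ 1} ρ(t).
# The sum is cut off at the first lag where ρ(t) becomes negative (heuristic).
function integrated_time(ρ::AbstractVector{<:Real})
    τ = 0.5
    for t in 2:length(ρ)    # ρ[1] is lag 0
        ρ[t] < 0 && break
        τ += ρ[t]
    end
    τ
end

rng = MersenneTwister(7)
white = randn(rng, 50_000)  # uncorrelated samples
ρ = autocorrelation(white, 40)
τ = integrated_time(ρ)
println("τ = $τ")
```

With this convention the naive error bar is underestimated by a factor of about sqrt(2τ), so the bin size should be chosen well above 2τ.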