# MCMC3.5: Jackknife Resampling

To estimate the error of observables accurately, it is recommended to use the Jackknife resampling method.

## Binning

In [1]:
using ResumableFunctions
using SparseArrays
using LinearAlgebra
const Jx = 1 / 3 # opposite sign to Motome's
const Jy = 1 / 3
const Jz = 1 / 3
"""
Metropolis method
"""
function Metropolis(βF::Float64, βFnew::Float64)::Bool
    βF - βFnew > log(rand())
end
"""
Generating a honeycomb lattice with an open boundary condition.
"""
function openhoneycomb(Lx::Int64, Ly::Int64)::Tuple
    N = 2Lx * Ly
    nnx = zip(1 : 2 : (N - 1), 2 : 2 : N)
    nny = Iterators.flatten((zip((1 + 2i) : 2Lx : (2Lx * (Ly - 1)  + 1 + 2i), 2i : 2Lx : (2Lx * (Ly - 1)  + 2i)) for i in 1 : (Lx - 1)))
    nnz = zip(1 : 2 : (N - 1), Iterators.flatten(((2Lx + 2) : 2 : N, 2 : 2 : 2Lx)))
    plaquette = Iterators.flatten(zip((Lx * (i - 1) + 1) : (Lx * (i - 1) + Lx - 1), (Lx * (i - 1) + 2) : (Lx * (i - 1) + Lx)) for i in 1 : Ly)
    N, nnx, nny, nnz, plaquette
end
@resumable function measurementflux(method::Function, lattice::Function, β::Float64, Lx::Int64, Ly::Int64)::Float64
    N, nnx, nny, nnz, plaquette = lattice(Lx, Ly)
    iter = Iterators.flatten((Iterators.product(J, nn) for (J, nn) in [(Jx, nnx), (Jy, nny), (Jz, nnz)]))
    h = spzeros(Complex{Float64}, N, N)
    for (J, nn) in iter
        h[nn[1], nn[2]] = 2.0im * J
        h[nn[2], nn[1]] = -2.0im * J
    end
    NNz = collect(nnz)
    Nz = length(NNz)
    η = ones(Int64, Nz)
    βF = 0.0
    hdense = Array(h)
    plaq = collect(plaquette)
    Np = length(plaq)
    while true
        for i in 1 : Nz
            j = rand(1 : Nz)
            hdense[NNz[j][1], NNz[j][2]] = -hdense[NNz[j][1], NNz[j][2]]
            hdense[NNz[j][2], NNz[j][1]] = -hdense[NNz[j][2], NNz[j][1]]
            ev = eigvals(Hermitian(hdense))
            positiveev = Iterators.drop(ev, N >> 1)
            βFnew = -sum((log(2.0 * cosh(β * ϵ / 2.0)) for ϵ in positiveev))
            if method(βF, βFnew)
                η[j] = -η[j]
                βF = βFnew
            else
                hdense[NNz[j][1], NNz[j][2]] = -hdense[NNz[j][1], NNz[j][2]]
                hdense[NNz[j][2], NNz[j][1]] = -hdense[NNz[j][2], NNz[j][1]]
            end
        end
        @yield sum((η[k] * η[l] for (k, l) in plaq)) / Np
    end
end

measurementflux (generic function with 1 method)

One source of the statistical error is the discreteness of the flux value returned by `measurementflux`.

In [2]:
mcstep = Iterators.drop(measurementflux(Metropolis, openhoneycomb, 25.0, 4, 4), 10000)
foreach(println, Iterators.take(mcstep, 10))

0.5
-0.16666666666666666
0.6666666666666666
0.0
0.6666666666666666
0.6666666666666666
0.16666666666666666
0.3333333333333333
0.6666666666666666
0.8333333333333334


We have to smooth out these quantized values by binning.

In [3]:
using Statistics
Nsample = 10000
Nbin = 10
Nbinsize = Nsample ÷ Nbin
iter = Iterators.partition(Iterators.take(mcstep, Nsample), Nbinsize)
bin = collect(map(mean, iter))

10-element Array{Float64,1}:
 0.2941666666666667 
 0.27316666666666667
 0.2955             
 0.2854999999999999 
 0.3018333333333334 
 0.307              
 0.2776666666666667 
 0.29766666666666663
 0.30616666666666664
 0.2734999999999999 

Now it works!

In [4]:
m = mean(bin)
s = stdm(bin, m) / sqrt(length(bin))
println("$m ± $s")

0.2912166666666666 ± 0.0040959969919452735


`Nbinsize` has to be determined based on the autocorrelation. In order to reduce `Nbin` to get a more accurate result at low temperature, we need to implement some global updating algorithm.
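
One way to choose the bin size in practice is to repeat the binning for several bin sizes and watch the error estimate: it grows while neighbouring bins are still correlated and levels off once they are effectively independent. Below is a minimal sketch of that check, assuming a synthetic AR(1) series as a stand-in for the Monte Carlo output. `binnederror` and `ar1` are hypothetical helper names, not part of this notebook.

```julia
using Statistics
using Random

"""
Standard error of the mean estimated from bins of size `binsize`.
Trailing samples that do not fill a whole bin are discarded.
"""
function binnederror(x::AbstractVector{<:Real}, binsize::Integer)
    nbin = length(x) ÷ binsize
    bins = [mean(view(x, ((b - 1) * binsize + 1):(b * binsize))) for b in 1:nbin]
    std(bins) / sqrt(nbin)
end

"""
Synthetic correlated series x[t] = ρ * x[t - 1] + noise (an AR(1) process),
used here only as a stand-in for correlated Monte Carlo output.
"""
function ar1(rng::AbstractRNG, n::Integer, ρ::Real)
    x = zeros(n)
    for t in 2:n
        x[t] = ρ * x[t - 1] + randn(rng)
    end
    x
end

rng = MersenneTwister(1234)
series = ar1(rng, 100_000, 0.9)
for binsize in (1, 10, 100, 1000)
    println("binsize = $binsize: error = $(binnederror(series, binsize))")
end
```

For strongly correlated data the error at bin size 1 is an underestimate, so the plateau value is the one to quote.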

## Delete-1 jackknife resampling

A simple implementation of delete-1 jackknife resampling is available in https://github.com/ararslan/Jackknife.jl. However, its functionality is limited, so I define new functions for jackknife resampling here.

In [5]:
"""
Compute the leave-one-out estimates of the given samples. `before` is applied to each sample before taking the mean, and `after` is applied to each leave-one-out mean.
"""
function leaveoneout(before::Function, after::Function, v::AbstractVector)
    ind = eachindex(v)
    map(i -> after(mean(map(before, view(v, filter(!isequal(i), ind))))), ind)
end
"""
Calculate an expectation value by jackknife.
"""
meanJ(b::Function, a::Function, v::AbstractVector) = mean(leaveoneout(b, a, v))
"""
Calculate an error bar with given mean by jackknife.
"""
stdmJ(b::Function, a::Function, v::AbstractVector, m) = stdm(leaveoneout(b, a, v), m, corrected = false) * sqrt(length(v) - 1)
"""
Calculate an error bar by jackknife.
"""
stdJ(b::Function, a::Function, v::AbstractVector) = stdmJ(b, a, v, meanJ(b, a, v))

stdJ

The functions are based on Statistics.jl and Jackknife.jl, so please see their documentation for details on how they work. I again use `measurementEf` as a demonstration.
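
For reference, the estimator computed by `leaveoneout`, `meanJ`, and `stdmJ` can be written as follows. The notation is assumed here and does not come from the notebook: $f$ stands for `before`, $g$ for `after`, $x_j$ for the samples, and $n$ for `length(v)`.

$$
\theta_{(i)} = g\!\left(\frac{1}{n-1}\sum_{j \neq i} f(x_j)\right), \qquad
\bar\theta = \frac{1}{n}\sum_{i=1}^{n}\theta_{(i)}, \qquad
\sigma_J = \sqrt{\frac{n-1}{n}\sum_{i=1}^{n}\left(\theta_{(i)} - \bar\theta\right)^2}
$$

In `stdmJ`, `stdm(..., corrected = false)` supplies $\sqrt{\frac{1}{n}\sum_i(\theta_{(i)}-\bar\theta)^2}$, and the factor `sqrt(length(v) - 1)` supplies the remaining $\sqrt{n-1}$. When $f$ and $g$ are both the identity, $\theta_{(i)} - \bar\theta = -(x_i - \bar x)/(n-1)$, so $\sigma_J$ reduces to the usual standard error $s/\sqrt{n}$.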

In [6]:
@resumable function measurementEf(method::Function, lattice::Function, β::Float64, Lx::Int64, Ly::Int64)::Vector{Float64}
    N, nnx, nny, nnz, plaquette = lattice(Lx, Ly)
    iter = Iterators.flatten((Iterators.product(J, nn) for (J, nn) in [(Jx, nnx), (Jy, nny), (Jz, nnz)]))
    h = spzeros(Complex{Float64}, N, N)
    for (J, nn) in iter
        h[nn[1], nn[2]] = 2.0im * J
        h[nn[2], nn[1]] = -2.0im * J
    end
    NNz = collect(nnz)
    Nz = length(NNz)
    η = ones(Int64, Nz)
    βF = 0.0
    β₂ = β * 0.5
    hdense = Array(h)
    ev = zeros(Float64, N)
    while true
        for i in 1 : Nz
            j = rand(1 : Nz)
            hdense[NNz[j][1], NNz[j][2]] = -hdense[NNz[j][1], NNz[j][2]]
            hdense[NNz[j][2], NNz[j][1]] = -hdense[NNz[j][2], NNz[j][1]]
            evnew = eigvals(Hermitian(hdense))
            βFnew = -sum(@. log(exp(β₂ * evnew[(N >> 1 + 1) : end]) + exp(-β₂ * evnew[(N >> 1 + 1) : end])))
            if method(βF, βFnew)
                η[j] = -η[j]
                βF = βFnew
                ev .= evnew
            else
                hdense[NNz[j][1], NNz[j][2]] = -hdense[NNz[j][1], NNz[j][2]]
                hdense[NNz[j][2], NNz[j][1]] = -hdense[NNz[j][2], NNz[j][1]]
            end
        end
        Ef = -sum(@. ev[(N >> 1 + 1) : end] * tanh(β₂ * ev[(N >> 1 + 1) : end] )) * 0.5
        ∂Ef∂β = -sum(@. (ev[(N >> 1 + 1) : end] ^ 2) * (sech(β₂ * ev[(N >> 1 + 1) : end]) ^ 2)) * 0.25
        @yield [Ef, ∂Ef∂β]
    end
end

measurementEf (generic function with 1 method)

It is fine to start by assuming a bin size of 1.

In [7]:
const β = 10.0
mcstep2 = Iterators.drop(measurementEf(Metropolis, openhoneycomb, β, 4, 4), 10000)
iter2 = Iterators.take(mcstep2, Nsample)
data = collect(iter2)

10000-element Array{Array{Float64,1},1}:
 [-7.65199, -0.00681468]
 [-7.71633, -0.0109046] 
 [-7.6112, -0.00884734] 
 [-7.62375, -0.0140109] 
 [-7.66819, -0.00665184]
 [-7.72563, -0.0118945] 
 [-7.69165, -0.0119024] 
 [-7.72551, -0.0172385] 
 [-7.7481, -0.0151541]  
 [-7.61735, -0.00808208]
 [-7.72379, -0.0172919] 
 [-7.64852, -0.012708]  
 [-7.76376, -0.0145314] 
 ⋮                      
 [-7.65265, -0.00731072]
 [-7.63267, -0.0111899] 
 [-7.70305, -0.0115475] 
 [-7.74876, -0.010281]  
 [-7.59937, -0.00871572]
 [-7.72178, -0.011896]  
 [-7.66303, -0.00911828]
 [-7.67054, -0.00882789]
 [-7.75, -0.0154435]    
 [-7.70757, -0.0053772] 
 [-7.70305, -0.0115475] 
 [-7.70387, -0.0108522] 

By setting `after` = `before` = `identity`, `meanJ` reduces to `mean`, and `stdmJ` reduces to the standard error of the mean.

In [8]:
m2 = meanJ(identity, identity, data)
s2 = stdmJ(identity, identity, data, m2)
println("Ef = $(m2[1]) ± $(s2[1]), ∂Ef∂β = $(m2[2]) ± $(s2[2])")

Ef = -7.69119858657557 ± 0.000666412231282982, ∂Ef∂β = -0.010792833024182637 ± 2.615725444754717e-5


This agrees with the standard estimation method for the error bars.

In [9]:
std(data) / sqrt(length(data))

2-element Array{Float64,1}:
 0.0006664122312814925
 2.615725444754927e-5 
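
The agreement is exact up to rounding because each leave-one-out mean equals `(S - x_i) / (n - 1)`, where `S` is the total sum of the samples. The same identity suggests a cheaper variant of `leaveoneout`, which recomputes a full mean for every deleted sample and therefore costs O(n²). The following is a sketch only, assuming that `after` returns a scalar. `leaveoneoutfast` and `stdJfast` are hypothetical names introduced for illustration.

```julia
using Statistics

"""
Leave-one-out values computed from the total sum, so the cost is O(n)
instead of O(n^2). Hypothetical alternative to `leaveoneout`.
"""
function leaveoneoutfast(before::Function, after::Function, v::AbstractVector)
    fv = map(before, v)
    total = sum(fv)
    n = length(fv)
    map(f -> after((total - f) / (n - 1)), fv)
end

"""
Jackknife error bar built on `leaveoneoutfast`, with the same
normalization as `stdmJ`. Assumes `after` returns a scalar.
"""
function stdJfast(before::Function, after::Function, v::AbstractVector)
    θ = leaveoneoutfast(before, after, v)
    n = length(v)
    m = mean(θ)
    sqrt((n - 1) / n * sum(abs2, θ .- m))
end

x = randn(1000)
println(stdJfast(identity, identity, x), " vs ", std(x) / sqrt(length(x)))
```

Subtracting one sample from a large sum can lose precision for very long runs, so the direct version remains the safer reference.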

For such simple mean values, jackknife resampling is clearly overkill. However, it is very effective for estimating the error of quantities like the specific heat, which are nonlinear functions of several averages.

In [10]:
"""
Function applied before mean to calculate heat capacity.
"""
TTCv(v::Vector{Float64}) = [v[1] ^ 2 - v[2], v[1]]
"""
Function applied after mean to calculate heat capacity.
"""
function Cv(β::Float64)::Function
    β² = β ^ 2
    meanTTCv::Vector{Float64} -> β² * (meanTTCv[1] - meanTTCv[2] ^ 2)
end
m3 = meanJ(TTCv, Cv(β), data)
s3 = stdmJ(TTCv, Cv(β), data, m3)
println("Cv = $m3 ± $s3")

Cv = 1.5233441494605942 ± 0.006114923879971831
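
Written out, the quantity that `TTCv` and `Cv` assemble from the samples `[Ef, ∂Ef∂β]` is the following, where $\langle\cdot\rangle$ denotes the Monte Carlo average (notation assumed here):

$$
C_v = \beta^2\left(\langle E_f^2\rangle - \langle E_f\rangle^2 - \left\langle\frac{\partial E_f}{\partial\beta}\right\rangle\right)
$$

`TTCv` maps each sample to $(E_f^2 - \partial E_f/\partial\beta,\; E_f)$ before averaging, and `Cv(β)` combines the two averages afterwards. Because $\langle E_f\rangle^2$ is a nonlinear function of an average, its error cannot be read off from a per-sample standard deviation, which is why the calculation is split into `before` and `after`.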


## Autocorrelation

The simplest way to estimate the autocorrelation is to change the bin size and estimate the error for each choice by jackknife resampling.

In [11]:
binning = Iterators.partition(data, 2)
bin2 = mean.(TTCv, binning)

5000-element Array{Array{Float64,1},1}:
 [59.0562, -7.68416]
 [58.0374, -7.61747]
 [59.2525, -7.69691]
 [59.4371, -7.70858]
 [59.0402, -7.68273]
 [59.0934, -7.68615]
 [60.3524, -7.76779]
 [58.6589, -7.65778]
 [58.7264, -7.66285]
 [60.1119, -7.75262]
 [60.1304, -7.75363]
 [58.4184, -7.64244]
 [59.6763, -7.72421]
 ⋮                  
 [59.3942, -7.70598]
 [57.9105, -7.60882]
 [59.6925, -7.72535]
 [59.5882, -7.71856]
 [59.8915, -7.73805]
 [59.9906, -7.74449]
 [58.4196, -7.64266]
 [59.7011, -7.72591]
 [58.6985, -7.66058]
 [58.7886, -7.66678]
 [59.7449, -7.72878]
 [59.3545, -7.70346]

In [12]:
m4 = meanJ(identity, Cv(β), bin2)
s4 = stdmJ(identity, Cv(β), bin2, m4)
println("Cv = $m4 ± $s4")

Cv = 1.523344144830859 ± 0.006104176605477246


The bin size barely affects the expectation value or the error bar, which means that the autocorrelation length is about 1 step. Note that at low temperature the bin size strongly affects the expectation value, so the bin size must be taken large enough. Here I estimate the autocorrelation length by another method.

In [13]:
using StatsBase
StatsBase.autocor(first.(data))

41-element Array{Float64,1}:
  1.0                  
  0.025491435519586603 
 -0.006579779196292195 
  0.01972258574897958  
  0.009057397719596889 
  0.012365335991191074 
 -0.002196061267901087 
  0.0007310655723478059
 -0.026119823427892427 
  0.0006152457871164597
  0.013781980507166112 
  0.008433945828489973 
  0.01660415867547176  
  ⋮                    
 -0.004763807913707725 
  0.015589687368502034 
  0.010084632767619416 
 -0.006740418152846602 
  0.012665278597385149 
 -0.004961264798734328 
  0.014650798162212458 
  0.003923249971725114 
 -0.008457426685980993 
  0.01704902845265379  
  0.01526012826675791  
 -0.014536009967628142 

The autocorrelation function drops to the noise level already at lag 1, which means that the autocorrelation length is less than 1 step.
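
One way to condense the autocorrelation function into a single number is the integrated autocorrelation time, 1/2 plus the sum of the normalized autocorrelations over positive lags, which equals 1/2 for uncorrelated data. The sketch below uses only the Statistics standard library. `autocorrelation` and `tauint` are hypothetical helpers, and stopping the sum at the first non-positive value is a simple cutoff assumed here, not something this notebook prescribes.

```julia
using Statistics

"""
Normalized autocorrelation ρ(t) of `x` at lag `t` (biased estimator,
normalized so that ρ(0) = 1).
"""
function autocorrelation(x::AbstractVector{<:Real}, t::Integer)
    n = length(x)
    m = mean(x)
    c0 = sum(abs2, x .- m)
    sum((x[i] - m) * (x[i + t] - m) for i in 1:(n - t)) / c0
end

"""
Integrated autocorrelation time 1/2 + Σ ρ(t), summing lags up to `maxlag`
and stopping at the first non-positive ρ(t) (a simple, assumed cutoff rule).
"""
function tauint(x::AbstractVector{<:Real}, maxlag::Integer)
    τ = 0.5
    for t in 1:maxlag
        ρ = autocorrelation(x, t)
        ρ <= 0 && break
        τ += ρ
    end
    τ
end

white = randn(10_000)
println("τ_int for uncorrelated data ≈ $(tauint(white, 40))")
```

With the samples above this could be called as `tauint(first.(data), 40)`. A value close to 0.5 indicates that successive samples are effectively independent.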

## Bootstrap method

~ under construction ~

**Exercise**: implement a Bootstrap method.