# HMC in ND with BFGS or SR1 autotuning of the mass matrix

This notebook implements an autotuning HMC sampler for an N-dimensional distribution, based on BFGS or SR1 updating of the inverse Hessian. SR1 updating cannot be used in the factorised versions because the SR1 update is not guaranteed to be positive definite.

## 0. Import packages

In [None]:
import time
import numpy as np
import matplotlib.pyplot as plt
from importlib import reload

import testfunctions
import samplestatistics

plt.rcParams["font.family"] = "Times"
plt.rcParams.update({'font.size': 50})
plt.rcParams['xtick.major.pad']='12'
plt.rcParams['ytick.major.pad']='12'

## 1. Input

We first define several input parameters, including the model space dimension, the initial inverse mass matrix $\mathbf{M}^{-1}$, the total number of samples, the number of leapfrog timesteps, and the length of the timestep.

In [None]:
import input_parameters
reload(input_parameters)
(test_function, dim, N, Nit, dt, m0, Minv, autotune, ell, update_interval,
 preco, S0_min, plot_interval, dimension1, dimension2,
 m1_min, m1_max, m2_min, m2_max) = input_parameters.input_parameters()

## 2. Classes for quasi-Newton autotuning

For later convenience, we introduce classes that perform quasi-Newton updating of the inverse Hessian $\mathbf{H}^{-1}$, which serves as the inverse mass matrix $\mathbf{M}^{-1}$. Each class updates $\mathbf{M}^{-1}$ whenever a new sample is accepted and then computes the Cholesky decomposition of the updated matrix. The classes are written for arbitrary dimension, but the brute-force Cholesky decomposition is only feasible for low dimensions. (For higher dimensions, the Cholesky decomposition will still yield some output, but it will be neither very accurate nor useful.)

**Word of caution**: There is an experimental component in the BFGS class. In principle, BFGS updates can only be made when $\mathbf{s}_k^T\mathbf{y}_k>0$. However, when this quantity is very small, the resulting Hessian approximation may still be close to singular. Empirically, stability improves when updates are restricted to $\mathbf{s}_k^T\mathbf{y}_k>\gamma$, with some tuning parameter $\gamma>0$. For many examples, $\gamma=2$ works very well; this is the value hard-coded in the `update` method of the `bfgs` class.
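
For reference, the update of the inverse mass matrix performed by the `bfgs` class is the standard inverse-Hessian form of BFGS. Here $\mathbf{s}_k$ is the difference between the new and the stored model vector, $\mathbf{y}_k$ is the corresponding gradient difference, and the index $k$ counts accepted updates (a notational assumption; the code does not carry an explicit index):

$$
\mathbf{M}^{-1}_{k+1} = \left(\mathbf{I}-\rho_k\,\mathbf{s}_k\mathbf{y}_k^T\right)\mathbf{M}^{-1}_k\left(\mathbf{I}-\rho_k\,\mathbf{y}_k\mathbf{s}_k^T\right)+\rho_k\,\mathbf{s}_k\mathbf{s}_k^T,
\qquad
\rho_k=\frac{1}{\mathbf{s}_k^T\mathbf{y}_k}.
$$

Because $\left(\mathbf{I}-\rho_k\,\mathbf{y}_k\mathbf{s}_k^T\right)\mathbf{y}_k=\mathbf{0}$, the updated matrix satisfies the secant equation $\mathbf{M}^{-1}_{k+1}\mathbf{y}_k=\mathbf{s}_k$. If $\mathbf{M}^{-1}_k$ is positive definite and $\mathbf{s}_k^T\mathbf{y}_k>0$, the update is positive definite as well, which is what makes the subsequent Cholesky decomposition possible.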

### 2.1. BFGS updating

In [None]:
class bfgs:
    
    def __init__(self,dim,Minv,m,g):
        """
        Initialise the BFGS iteration.
        
        :param dim: number of model-space dimensions
        :param Minv: initial inverse mass matrix
        :param m: current model vector
        :param g: current gradient
        
        The matrix Minv plays the role of the inverse mass matrix, which ideally is the inverse Hessian, i.e., the covariance matrix.
        """
        
        self.dim=dim
        self.m=m
        self.g=g
        
        # Initial mass matrix.
        self.Minv=Minv
        
        # Initial factorisation.
        LT=np.linalg.cholesky(self.Minv).transpose()
        self.LTinv=np.linalg.inv(LT)
        
        
    def update(self,m,g):
        """
        Update BFGS matrix and perform Cholesky decomposition.
        
        :param m: current model vector
        :param g: current gradient
        """
        
        # Compute differences and update vectors.
        s=m-self.m
        y=g-self.g
        
        # BFGS check.
        check=np.dot(s,y)
        print(check)

        if check>2.0:
        
            self.m=m
            self.g=g
        
            # Compute update of BFGS matrix.
            rho=1.0/np.dot(s,y)
            I=np.identity(self.dim)
            sy=rho*np.tensordot(s,y,axes=0)
            ss=rho*np.tensordot(s,s,axes=0)
            self.Minv=np.matmul(np.matmul((I-sy),self.Minv),(I-sy.transpose()))+ss
        
            # Compute Cholesky decomposition.
            LT=np.linalg.cholesky(self.Minv).transpose()
            self.LTinv=np.linalg.inv(LT)
            
        else: 
            rhoinv=np.dot(s,y)
            print('BFGS check failed (1/rho=%f)' % rhoinv)
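
As a quick sanity check of the update rule, the following standalone sketch repeats the BFGS formula outside the class and verifies the secant equation and positive definiteness for a small quadratic potential. The function name `bfgs_inverse_update`, the matrix `A` and the vectors are illustrative assumptions and are not part of the notebook.

```python
import numpy as np

def bfgs_inverse_update(Minv, s, y, gamma=2.0):
    """Return the BFGS-updated inverse Hessian, or Minv unchanged if s.y <= gamma."""
    sy_dot = np.dot(s, y)
    if sy_dot <= gamma:
        # Curvature check failed: skip the update.
        return Minv
    rho = 1.0 / sy_dot
    I = np.identity(len(s))
    V = I - rho * np.outer(s, y)
    return V @ Minv @ V.T + rho * np.outer(s, s)

# Hypothetical quadratic potential U(m) = 0.5 m^T A m, so that y = A s.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
s = np.array([1.0, 2.0])
y = A @ s

Minv_new = bfgs_inverse_update(np.identity(2), s, y)

# Secant equation: the updated matrix maps y back onto s.
secant_ok = np.allclose(Minv_new @ y, s)

# Positive definiteness: all eigenvalues of the symmetric matrix are positive.
eigenvalues = np.linalg.eigvalsh(Minv_new)
print(secant_ok, eigenvalues.min() > 0.0)
```

With a gradient difference that is too small to pass the threshold, the function returns the input matrix unchanged, which mirrors the `else` branch of `bfgs.update`.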

### 2.2. SR1 updating

In [None]:
class sr1:
    
    def __init__(self,dim,Minv,m,g):
        """
        Initialise the SR1 iteration.
        
        :param dim: number of model-space dimensions
        :param Minv: initial mass matrix inverse 
        :param m: current model vector
        :param g: current gradient
        
        The matrix Minv plays the role of the inverse mass matrix, which ideally is the inverse Hessian, i.e., the covariance matrix.
        """
        
        self.dim=dim
        self.m=m
        self.g=g
        
        # Initial mass matrix.
        self.Minv=Minv
        
        # Initial factorisation.
        LT=np.linalg.cholesky(self.Minv).transpose()
        self.LTinv=np.linalg.inv(LT)
        
    def update(self,m,g):
        """
        Update SR1 matrix and perform Cholesky decomposition.
        
        :param m: current model vector
        :param g: current gradient
        """
        
        # Compute differences and update vectors.
        s=m-self.m
        y=g-self.g
        
        self.m=m
        self.g=g
        
        # Compute update of SR1 matrix.
        Ay=np.dot(self.Minv,y)
        x=s-Ay
        n=np.dot(x,y)
        if np.abs(n)>0.01*np.linalg.norm(x)*np.linalg.norm(y):
            self.Minv=self.Minv+np.tensordot(x,x,axes=0)/n
                    
            # Compute Cholesky decomposition.
            LT=np.linalg.cholesky(self.Minv).transpose()
            self.LTinv=np.linalg.inv(LT)
            
        else: print('check failed')
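
The SR1 update satisfies the secant equation even when the curvature $\mathbf{s}^T\mathbf{y}$ is negative, and in that case the updated matrix can become indefinite, so the Cholesky decomposition fails. The sketch below illustrates this with a hand-picked pair of vectors; the function name `sr1_inverse_update` and all numerical values are illustrative assumptions.

```python
import numpy as np

def sr1_inverse_update(Minv, s, y, r=0.01):
    """Return the SR1-updated inverse Hessian, or Minv unchanged if the safeguard fails."""
    x = s - Minv @ y
    n = np.dot(x, y)
    if np.abs(n) > r * np.linalg.norm(x) * np.linalg.norm(y):
        return Minv + np.outer(x, x) / n
    return Minv

# Hypothetical step with negative curvature: s.y = -1 < 0.
s = np.array([1.0, 0.0])
y = np.array([-1.0, 0.0])

Minv_new = sr1_inverse_update(np.identity(2), s, y)
eigenvalues = np.linalg.eigvalsh(Minv_new)

# np.linalg.cholesky raises LinAlgError for matrices that are not positive definite.
try:
    np.linalg.cholesky(Minv_new)
    cholesky_ok = True
except np.linalg.LinAlgError:
    cholesky_ok = False

print(eigenvalues, cholesky_ok)
```

A BFGS update with the threshold $\gamma$ would simply skip this step, whereas `sr1.update` as written would raise an exception at the Cholesky call. Wrapping that call in a `try`/`except` block and keeping the previous factorisation is one possible safeguard.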

## 4. Leapfrog integrator

For clarity, we define the leapfrog integrator as a separate function. The number of time steps is randomised in each call and lies between half of `Nt` and `Nt`.

In [None]:
def leapfrog(m,p,Nt,dt,Minv,fct,plot=False):
    
    # Plot probability density in the background.
    if plot:
        fct.plotU(dim,dimension1,dimension2,m1_min,m1_max,m2_min,m2_max)
        plt.plot(m[dimension1],m[dimension2],'bo',markersize=15)
    
    # Evaluate initial gradient.
    J=fct.J(m)
    
    # Determine randomised integration length.
    Nti=int(Nt*(1.0-0.5*np.random.rand()))
    
    # Leapfrog integration.
    for k in range(Nti):
        
        if plot: m_old=m.copy()
        
        p=p-0.5*dt*J
        m=m+dt*Minv.dot(p)
        J=fct.J(m)
        p=p-0.5*dt*J
        
        # Plot trajectory segment.
        if plot: 
            if k==0: print('number of time steps: %d' % Nti)
            plt.plot([m_old[dimension1],m[dimension1]],[m_old[dimension2],m[dimension2]],'r',linewidth=3)
            plt.plot(m[dimension1],m[dimension2],'kx')
        
    return m, p
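
Two properties of the leapfrog scheme matter for HMC: it is time-reversible, and it conserves the total energy up to an error of order `dt**2`. The following sketch checks both for a small Gaussian target, using a stripped-down integrator with a fixed number of steps and no plotting. The names `leapfrog_fixed`, `A`, `U` and `grad_U` are illustrative assumptions and do not appear in the notebook.

```python
import numpy as np

def leapfrog_fixed(m, p, n_steps, dt, Minv, grad_U):
    """Leapfrog integration with a fixed number of steps."""
    g = grad_U(m)
    for _ in range(n_steps):
        p = p - 0.5 * dt * g       # half step in momentum
        m = m + dt * Minv.dot(p)   # full step in position
        g = grad_U(m)
        p = p - 0.5 * dt * g       # second half step in momentum
    return m, p

# Hypothetical Gaussian target: U(m) = 0.5 m^T A m.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
U = lambda m: 0.5 * m.dot(A.dot(m))
grad_U = lambda m: A.dot(m)
Minv = np.identity(2)

m0 = np.array([1.0, -1.0])
p0 = np.array([0.3, 0.7])
H0 = U(m0) + 0.5 * p0.dot(Minv.dot(p0))

# Forward integration and resulting energy error.
m1, p1 = leapfrog_fixed(m0, p0, 20, 0.05, Minv, grad_U)
H1 = U(m1) + 0.5 * p1.dot(Minv.dot(p1))

# Reversibility: flip the momentum, integrate again, and recover the start.
m2, p2 = leapfrog_fixed(m1, -p1, 20, 0.05, Minv, grad_U)
print(abs(H1 - H0), np.allclose(m2, m0), np.allclose(-p2, p0))
```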

## 6. HMC initialisations

Before running the actual HMC sampler, we perform several initialisations. These include the test function class, the first random model $\mathbf{m}$, and the corresponding gradient of the potential energy $\mathbf{g}=\nabla U$. With this, we can initialise the BFGS or SR1 class, which takes $\mathbf{m}$ and $\mathbf{g}$ as input.

In [None]:
# Initialisation. =============================================================

# Test function class.
fct=testfunctions.f(dim,test_function)

# Number of accepted models.
accept=0

# Initial model.
m=m0

# Posterior statistics.
s=samplestatistics.stats(dimension1,dimension2,N)
s.get(m,0.0,0)

# Initialise quasi-Newton (BFGS or SR1) matrix.
g=fct.J(m)
if autotune=='BFGS':
    M=bfgs(dim,Minv,m,g)
else:
    M=sr1(dim,Minv,m,g)
    
m11=Minv[dimension1,dimension1]*np.ones(N)
m22=Minv[dimension2,dimension2]*np.ones(N)

In [None]:
fct.plotU(dim,dimension1,dimension2,m1_min,m1_max,m2_min,m2_max)

## 7. Run HMC

We finally run the HMC sampler. In each iteration, we first draw random momenta $\mathbf{p}$ from a normal distribution whose covariance is the mass matrix $\mathbf{M}$. Its inverse $\mathbf{M}^{-1}$ is the quasi-Newton (BFGS or SR1) approximation of the inverse Hessian $\mathbf{H}^{-1}$ of the potential energy $U$.
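
In the sampling cell, the momenta are generated as $\mathbf{p}=\mathbf{L}^{-T}\mathbf{z}$ with $\mathbf{z}\sim\mathcal{N}(\mathbf{0},\mathbf{I})$ and $\mathbf{M}^{-1}=\mathbf{L}\mathbf{L}^T$, so their covariance is $\mathbf{L}^{-T}\mathbf{L}^{-1}=\mathbf{M}$. A minimal sketch that checks this numerically, assuming a hypothetical 2-by-2 inverse mass matrix and NumPy's `default_rng` generator (neither is used in the notebook):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inverse mass matrix (symmetric positive definite).
Minv = np.array([[2.0, 0.3], [0.3, 0.5]])

# Same factorisation as in the quasi-Newton classes.
LT = np.linalg.cholesky(Minv).transpose()
LTinv = np.linalg.inv(LT)

# Implied momentum covariance: LTinv LTinv^T = (L L^T)^{-1} = M.
cov_exact = LTinv @ LTinv.T
M = np.linalg.inv(Minv)

# Empirical check with many draws p = LTinv z, where z ~ N(0, I).
z = rng.standard_normal((2, 200000))
p = LTinv @ z
cov_sample = p @ p.T / z.shape[1]

print(np.allclose(cov_exact, M), np.abs(cov_sample - M).max())
```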

Using the inverse mass matrix, we compute energies and run a leapfrog integration to solve Hamilton's equations. Following this, we compute the energies of the proposed model and evaluate the modified Metropolis rule (in logarithmic form, to avoid overflow or underflow).
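
The logarithmic form of the acceptance test compares $\log u$, with $u$ uniform on $[0,1)$, against $\min(0, H-H_\text{new})$ instead of comparing $u$ against $\exp(H-H_\text{new})$. The sketch below isolates this test; the helper `accept_log` and the energy values are illustrative assumptions.

```python
import numpy as np

def accept_log(H_old, H_new, rng):
    """Metropolis test in logarithmic form: accept if log(u) <= min(0, H_old - H_new)."""
    log_alpha = min(0.0, H_old - H_new)
    return log_alpha >= np.log(rng.random())

rng = np.random.default_rng(1)

# A proposal with lower total energy is always accepted.
always = all(accept_log(10.0, 9.0, rng) for _ in range(1000))

# For a large energy increase, the ratio exp(H_old - H_new) underflows to zero
# in double precision, while the logarithmic comparison stays well defined.
naive_ratio = np.exp(-1000.0)

# For an energy increase of 1, the acceptance rate should approach exp(-1).
rate = np.mean([accept_log(0.0, 1.0, rng) for _ in range(20000)])
print(always, naive_ratio, rate)
```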

In [None]:
accept=0
start=time.time()

# Sampling. ===================================================================
for it in range(N-1):
    
    # Randomly choose momentum.
    p=np.random.randn(dim)
    p=M.LTinv.dot(p)
    
    # Evaluate energies.
    U=fct.U(m)
    K=0.5*np.dot(p,np.dot(M.Minv,p))
    H=U+K
    
    # Check if models and trajectories should be plotted.
    if (not it % plot_interval) and it>0: 
        plot=True
        print('iteration: %d' % it)
    else:
        plot=False
    
    # Run leapfrog iteration.
    m_new,p_new=leapfrog(m,p,Nit,dt,M.Minv,fct,plot)
    if plot:
        filename='OUTPUT/trajectory_'+str(it)+'.png'
        plt.savefig(filename, bbox_inches='tight', format='png')
        plt.show()
    
    # Plot proposed models.
    if plot:
        plt.subplots(1, figsize=(30,10))
        plt.plot(m_new)
        plt.xlabel('model parameter index')
        plt.show()
    
    # Evaluate new energies.
    U_new=fct.U(m_new)
    K_new=0.5*np.dot(p_new,M.Minv.dot(p_new))
    H_new=U_new+K_new
    
    # Evaluate Metropolis rule in logarithmic form.
    alpha=np.minimum(0.0,H-H_new)
    if alpha>=np.log(np.random.rand()):
        # Update model.
        m=m_new
        accept+=1
        # Update quasi-Newton (BFGS or SR1) matrix.
        if (autotune=='BFGS' or autotune=='SR1'):
            g=fct.J(m)
            M.update(m,g)
    
    # Accumulate on-the-fly statistics
    s.get(m,0.0,it+1)
    m11[it+1]=M.Minv[dimension1,dimension1]
    m22[it+1]=M.Minv[dimension2,dimension2]

stop=time.time()
print('acceptance rate: %f (%d of %d samples)' % (float(accept)/float(N),accept,N))
print('elapsed time: %f s' % (stop-start))

## 8. Analyse results

### 8.1. Sample statistics collected on the fly

In [None]:
s.display()

In [None]:
u,v=np.linalg.eig(M.Minv)

In [None]:
plt.subplots(1, figsize=(20,20))
plt.semilogy(np.abs(u))
plt.show()
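
Since the inverse mass matrix is symmetric, its spectrum can also be computed with `np.linalg.eigvalsh`, which returns real eigenvalues in ascending order. Keeping the sign, rather than plotting absolute values, makes a loss of positive definiteness directly visible. A minimal sketch, with a hypothetical 3-by-3 matrix standing in for `M.Minv`:

```python
import numpy as np

# Hypothetical symmetric inverse mass matrix standing in for M.Minv.
Minv_final = np.array([[2.0, 0.3, 0.0],
                       [0.3, 0.5, 0.1],
                       [0.0, 0.1, 1.5]])

# eigvalsh exploits symmetry and returns real eigenvalues in ascending order.
eigenvalues = np.linalg.eigvalsh(Minv_final)

# Smallest eigenvalue decides positive definiteness; the ratio of the
# largest to the smallest eigenvalue is the condition number.
is_positive_definite = eigenvalues[0] > 0.0
condition_number = eigenvalues[-1] / eigenvalues[0]
print(eigenvalues, is_positive_definite, condition_number)
```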

### 8.2. Analysis of the mass matrix

In [None]:
plt.subplots(1, figsize=(20,20))
plt.pcolor(Minv,cmap='Blues')
plt.title('initial inverse mass matrix',pad=20)
plt.colorbar()
plt.show()

plt.subplots(1, figsize=(20,20))
plt.pcolor(M.Minv,cmap='Blues')
plt.title('final inverse mass matrix',pad=20)
plt.colorbar()
plt.show()

plt.subplots(1, figsize=(20,10))
plt.plot(np.diag(M.Minv),'k',linewidth=4)
plt.plot(np.diag(Minv),'r',linewidth=4)
plt.xlabel('index')
plt.title('diagonal of inverse mass matrix (final=black, initial=red)')
plt.grid()
plt.show()

plt.subplots(1, figsize=(20,10))
plt.plot(m11,'k',linewidth=4)
plt.plot(m22,'r',linewidth=4)
plt.xlabel('iteration')
plt.title('diagonal elements (black=parameter1, red=parameter2)')
plt.grid()
plt.show()