# Recovering Gauss coefficients by (local) Metropolis-Hastings sampling

In this notebook, we compute the posterior distribution for the geomagnetic problem using Metropolis-Hastings sampling.

# 0. Python packages and figure embellishments

In [None]:
# Some Python packages.
import magnetic
import numpy as np
import matplotlib.pyplot as plt
import time

# Set some parameters to make plots nicer.
plt.rcParams["font.family"] = "serif"
plt.rcParams.update({'font.size': 25})

# Set specific random seed to make simulations comparable.
np.random.seed(0)

# 1. Input parameters

Observation points at the surface of the Earth.

In [None]:
# Observation points.
theta_obs=np.pi*np.random.rand(20)
phi_obs=2.0*np.pi*np.random.rand(20)

Gauss coefficients to include (to compute artificial and trial data).

In [None]:
# Maximum degree.
ell_max=2

Metropolis-Hastings parameters.

In [None]:
# Proposal radius, step length.
sigma=200.0

# Number of Metropolis-Hastings samples.
N=10000

# 2. Initialisations

Read the Gauss coefficients from the IGRF13 model. These will be used as the ground-truth parameters that we try to estimate.

In [None]:
# Read Gauss coefficients from IGRF13.
g_igrf13,h_igrf13=magnetic.read_coefficients(verbose=False)

To accelerate the evaluation of the forward model, we precompute the Schmidt quasi-normalised associated Legendre functions.

In [None]:
Pnmi=magnetic.Pnmi(theta_obs,ell_max)
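
The internals of `magnetic.Pnmi` are not shown in this notebook. As a rough illustration of what such a precomputation could look like, the sketch below builds a table of Schmidt quasi-normalised associated Legendre functions with a standard recursion. The function name `schmidt_legendre` and the array layout `P[n,m,i]` (degree, order, observation point) are assumptions made for this example and need not match the `magnetic` module.

```python
import numpy as np

def schmidt_legendre(theta, ell_max):
    """Schmidt quasi-normalised P_n^m(cos(theta)) for n, m <= ell_max.

    Hypothetical helper; returns an array P[n, m, i] for colatitudes theta[i].
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    c, s = np.cos(theta), np.sin(theta)
    P = np.zeros((ell_max + 1, ell_max + 1, theta.size))
    P[0, 0] = 1.0

    for n in range(1, ell_max + 1):
        # Sectoral terms (m = n).
        if n == 1:
            P[1, 1] = s
        else:
            P[n, n] = np.sqrt(1.0 - 0.5 / n) * s * P[n - 1, n - 1]

        # Remaining orders (m < n), built from the two lower degrees.
        for m in range(n):
            lower = P[n - 2, m] if n >= 2 else 0.0
            P[n, m] = ((2 * n - 1) * c * P[n - 1, m]
                       - np.sqrt((n - 1) ** 2 - m ** 2) * lower) / np.sqrt(n ** 2 - m ** 2)

    return P

# Example: table for three colatitudes up to degree 2.
P = schmidt_legendre(np.array([0.3, 1.2, 2.5]), 2)
print(P.shape)
```

Because the table depends only on the colatitudes and the maximum degree, it can be computed once and reused for every trial model during sampling.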

Compute artificial observations at the observation points from the IGRF13 Gauss coefficients, and evaluate the same field on a regular grid for plotting.

In [None]:
# Compute the magnetic field values for the observation points.
d_obs=magnetic.B(phi_obs,theta_obs,g_igrf13,h_igrf13,Pnmi,ell_max=ell_max)

# Compute magnetic field for longitude and colatitude arrays.
theta=np.arange(0.0,np.pi,0.05)
phi=np.arange(0.0,2.0*np.pi,0.05)

d_plot=magnetic.B_field(phi,theta,g_igrf13,h_igrf13,ell_max=ell_max)

Plot the ground-truth magnetic field and the observation points.

In [None]:
# Plot radial component of the magnetic field.
# meshgrid returns the grid of its first argument (longitude) first.
lon,colat=np.meshgrid(phi,theta)

plt.subplots(1, figsize=(22,10))
plt.gca().invert_yaxis()
plt.pcolor(180.0*lon/np.pi,180.0*colat/np.pi,d_plot, cmap='Greys')
plt.colorbar()
plt.contour(180.0*lon/np.pi,180.0*colat/np.pi,d_plot, colors='k')
plt.plot(180.0*phi_obs/np.pi,180.0*theta_obs/np.pi,'ro',markersize=10)
plt.grid()
plt.xlabel('longitude [°]',labelpad=15)
plt.ylabel('colatitude [°]',labelpad=15)
plt.title('magnetic field, radial component',pad=20)
plt.show()

# 3. Sampling

Before sampling the posterior distribution, we need to choose an initial position for the random walker. It can be drawn entirely at random or chosen using prior knowledge of plausible parameter values. The performance of the sampler depends on how well the initial position is chosen; it mostly affects the length of the burn-in phase.

In [None]:
# Initial values.
g=np.zeros(np.shape(g_igrf13))
h=np.zeros(np.shape(h_igrf13))

for i in range(1,ell_max+1):
    for j in range(0,i+1):
        
        # Totally random selection of the initial model parameters.
        #g[i,j]=5000.0*np.random.randn()
        #h[i,j]=5000.0*np.random.randn()
        
        # Selection of initial model parameters near the ground-truth values.
        g[i,j]=g_igrf13[i,j]+sigma*np.random.randn()
        h[i,j]=h_igrf13[i,j]+sigma*np.random.randn()

# Evaluate initial probability density.
rho=magnetic.log_posterior(d_obs,phi_obs,theta_obs,g,h,Pnmi,ell_max)

To avoid excessive storage requirements, we store the samples of only two of the model parameters. The corresponding lists and the number of accepted moves are initialised below.

In [None]:
# Initialise number of accepted models.
accept=0

# Initialise arrays for the collection of samples.
s1=[]
s2=[]

Now we perform the actual random walk.

In [None]:
# Initialise proposal vectors.
g_prop=np.zeros(np.shape(g_igrf13))
h_prop=np.zeros(np.shape(h_igrf13))

t1=time.time()

# Iterate.
for it in range(0,N):
    
    for n in range(1,ell_max+1):
        for m in range(0,n+1):
            g_prop[n,m]=g[n,m]+sigma*np.random.randn()
            h_prop[n,m]=h[n,m]+sigma*np.random.randn()
            
    # Evaluate probability of the proposal.
    rho_prop=magnetic.log_posterior(d_obs,phi_obs,theta_obs,g_prop,h_prop,Pnmi,ell_max=ell_max)
    
    # Compute Metropolis ratio.
    r=np.exp(rho_prop-rho)
    
    # Evaluate Metropolis rule
    if r>np.random.rand():
        # Make move to proposed position.
        g=g_prop.copy()
        h=h_prop.copy()
        rho=rho_prop
        # Increase number of accepted models.
        accept+=1
    
    # Collect the samples.
    # Here you may change the model parameters that are being considered.
    s1.append(g[2,1])
    s2.append(g[1,1])
        
t2=time.time()
print(t2-t1)
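
The loop above exponentiates the difference of log-densities before comparing it with a uniform random number. A common variant makes the same decision in the log domain, which avoids overflow when the proposal is much more probable than the current model. The sketch below shows this on a toy problem that is independent of the `magnetic` module: the target `log_target` (a 1-D Gaussian with mean 3) and the helper `metropolis` are hypothetical names introduced only for this illustration.

```python
import numpy as np

def log_target(x):
    # Hypothetical target: unnormalised log-density of a Gaussian N(3, 1).
    return -0.5 * (x - 3.0) ** 2

def metropolis(log_density, x0, sigma, n_samples, rng):
    """Random-walk Metropolis sampler with the acceptance test in the log domain."""
    x = x0
    log_rho = log_density(x)
    samples = np.empty(n_samples)
    accepted = 0

    for it in range(n_samples):
        # Symmetric Gaussian proposal around the current position.
        x_prop = x + sigma * rng.standard_normal()
        log_rho_prop = log_density(x_prop)

        # Accept if log(u) < log(rho_prop / rho); 1 - random() lies in (0, 1],
        # so the logarithm is always finite or zero.
        if np.log(1.0 - rng.random()) < log_rho_prop - log_rho:
            x, log_rho = x_prop, log_rho_prop
            accepted += 1

        # The current position is stored whether or not the move was accepted.
        samples[it] = x

    return samples, accepted / n_samples

rng = np.random.default_rng(0)
samples, rate = metropolis(log_target, x0=0.0, sigma=2.0, n_samples=20000, rng=rng)

# Discard an initial segment before computing statistics.
print('acceptance rate: %f' % rate)
print('sample mean: %f' % samples[2000:].mean())
```

With a symmetric proposal the decision rule is equivalent to comparing `np.exp(rho_prop-rho)` with a uniform random number, so the chain has the same stationary distribution.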

# 4. Output and analysis

Following the sampling, we plot the results and perform some analyses. We start with the acceptance rate.

In [None]:
# Acceptance rate.
print('acceptance rate: %f ' % (accept/N))

Trace plots of the two selected model parameters. For a well-mixed chain they should look like a hairy caterpillar: rapid fluctuations around a stable level, without long-term drift.

In [None]:
# Trace plots.
plt.figure(figsize=(15,8))
plt.plot(s1,'k',linewidth=2)
plt.xlim([0,N])
plt.grid()
plt.xlabel('samples',labelpad=15)
plt.title('trace plot parameter 1',pad=15)
plt.savefig('MH_traceplot_1.pdf')
plt.show()

plt.figure(figsize=(15,8))
plt.plot(s2,'k',linewidth=2)
plt.xlim([0,N])
plt.grid()
plt.xlabel('samples',labelpad=15)
plt.title('trace plot parameter 2',pad=15)
plt.show()

Auto-correlation functions and, derived from them, the effective sample sizes.

In [None]:
# Auto-correlations.
cc1=np.correlate(s1-np.mean(s1),s1-np.mean(s1),'full')/np.sum((s1-np.mean(s1))**2)
cc1=cc1[N-1:]

cc2=np.correlate(s2-np.mean(s2),s2-np.mean(s2),'full')/np.sum((s2-np.mean(s2))**2)
cc2=cc2[N-1:]

# Estimate of the effective sample size (Gelman et al., 2013).
# Sum the auto-correlations from lag 1 onwards (lag 0 is excluded) and stop
# at the first lag where two successive values no longer sum to a positive number.
Neff1=0.0
for i in range(1,N-1):
    if (cc1[i]+cc1[i+1]<=0.0):
        break
    Neff1+=cc1[i]

Neff1=N/(1.0+2.0*Neff1)
print('effective sample size (parameter 1): %f' % Neff1)

Neff2=0.0
for i in range(1,N-1):
    if (cc2[i]+cc2[i+1]<=0.0):
        break
    Neff2+=cc2[i]

Neff2=N/(1.0+2.0*Neff2)
print('effective sample size (parameter 2): %f' % Neff2)

# Plot autocorrelation function.
plt.figure(figsize=(15,8))
plt.plot(cc1[0:N],'k',linewidth=2)
plt.xlabel('lag [samples]',labelpad=15)
plt.xlim([0,N])
plt.title('auto-correlation (parameter 1)',pad=15)
plt.grid()
plt.show()

plt.figure(figsize=(15,8))
plt.plot(cc2[0:N],'k',linewidth=2)
plt.xlabel('lag [samples]',labelpad=15)
plt.xlim([0,N])
plt.title('auto-correlation (parameter 2)',pad=15)
plt.grid()
plt.show()
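
To build some confidence in an effective-sample-size estimate, it can help to apply it to chains whose correlation is known. The sketch below wraps the estimate in a function and tests it on white noise and on an AR(1) process with coefficient 0.9, for which the effective sample size should be roughly $N(1-0.9)/(1+0.9)$. The function names and the test chains are assumptions made for this example; the truncation of the sum follows our reading of the rule in Gelman et al. (2013), where the sum stops once a pair of successive auto-correlations becomes negative.

```python
import numpy as np

def autocorrelation(x):
    """Normalised auto-correlation of x for lags 0, 1, ..., len(x)-1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    cc = np.correlate(x, x, 'full')[x.size - 1:]
    return cc / cc[0]

def effective_sample_size(x):
    """Effective sample size from the truncated sum of auto-correlations."""
    rho = autocorrelation(x)
    n = rho.size
    # Start at lag 1 (lag 0 is excluded), then add pairs of lags
    # until the sum of a pair becomes negative.
    total = rho[1]
    t = 1
    while t + 2 < n and rho[t + 1] + rho[t + 2] >= 0.0:
        total += rho[t + 1] + rho[t + 2]
        t += 2
    return n / (1.0 + 2.0 * total)

# Test chains: uncorrelated noise and a strongly correlated AR(1) process.
rng = np.random.default_rng(0)
n = 10000
phi_ar = 0.9
noise = rng.standard_normal(n)
chain = np.empty(n)
chain[0] = noise[0]
for k in range(1, n):
    chain[k] = phi_ar * chain[k - 1] + noise[k]

ess_white = effective_sample_size(noise)
ess_ar = effective_sample_size(chain)
print('white noise: %f' % ess_white)
print('AR(1): %f (theory: %f)' % (ess_ar, n * (1.0 - phi_ar) / (1.0 + phi_ar)))
```

The same function could be applied to `s1` and `s2` after the burn-in samples have been removed.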

1-D marginals of the two selected model parameters.

In [None]:
plt.figure(figsize=(10,10))
n, bins, patches = plt.hist(s1, 20, density=True, facecolor='k', alpha=1.0)
plt.xlabel('parameter 1',labelpad=15)
plt.title('1-D marginal (parameter 1)',pad=15)
plt.grid()
plt.xlim([0.0,6000.0])
plt.show()

plt.figure(figsize=(10,10))
n, bins, patches = plt.hist(s2, 20, density=True, facecolor='k', alpha=1.0)
plt.xlabel('parameter 2',labelpad=15)
plt.title('1-D marginal (parameter 2)',pad=15)
plt.grid()
plt.show()

2-D marginal of the selected model parameters.

In [None]:
plt.figure(figsize=(10,10))
plt.hist2d(s1, s2, bins=20, density=True, cmap='Greys')
plt.xlabel('parameter 1',labelpad=15)
plt.ylabel('parameter 2',labelpad=15)
plt.title('2-D marginal (parameters 1 and 2)',pad=15)
plt.xlim([1500.0,4500.0])
plt.ylim([-3000.0,0.0])
plt.grid()
plt.show()

# 5. Exercises

**Exercise 1**: Estimate by trial and error a search radius $\sigma$ that maximises the effective sample size. Over how many samples is the sample chain correlated (just roughly)?

**Exercise 2**: With this nearly optimal choice of $\sigma$, investigate the appearance of the posterior marginals as a function of the number of samples. How many samples would you recommend to obtain a 'useful' result?

**Exercise 3**: Keeping $\sigma$ and the total number of samples as above, (**a**) reduce the number of observations to 10 and (**b**) increase it to 50. How does this affect the quality of your inference? How would you measure quality?

**Exercise 4**: In the examples above, the initial model parameter values, i.e., the starting position of the random walker, are chosen somewhat optimistically near the ground-truth values. Investigate the more realistic case where the initial values are chosen completely randomly. What is the effect on the length of the burn-in phase? Modify the code such that the samples of the burn-in phase are ignored in the calculation of the posterior marginals.