# Bayesian analysis of the empirical saturation point (MC version)

This notebook provides an independent implementation of our saturation analysis using brute-force Monte Carlo sampling. It can be used to check, and to generalize, our analysis based on conjugate priors.


In [1]:
import emcee as mc
import numpy as np
import pandas as pd
import numba as nb
from scipy.optimize import minimize
from scipy.stats import invwishart, multivariate_normal, multivariate_t

To test the analysis, let's create some dummy data from the prior we have used before:

In [2]:
prior_params = {"mu": np.array([0.16, -15.9]),
                "Lambda": np.array([[0.01**2, 0], [0, 0.32**2]]),
                "kappa": 1, "nu": 3}
np.random.seed(42)
tmp = multivariate_normal.rvs(mean=prior_params["mu"], cov=prior_params["Lambda"], size=1000)
data = pd.DataFrame(data={"n0": tmp[:, 0], "E0": tmp[:, 1]})

## Bayes' theorem

Our analysis is based on Bayes' theorem. We need to define the prior, posterior, and likelihood. We use the (natural) logarithms of these quantities. The parameters to be inferred are collected in the parameter vector $\theta = (\mu_0, \mu_1; \Sigma_{0,0}, \Sigma_{0,1}, \Sigma_{1,1})$, i.e., the two components of the mean vector followed by the three independent components of the symmetric covariance matrix. To test the implementation, we define the $\theta$ associated with the prior:

In [3]:
theta_prior = (prior_params["mu"][0], prior_params["mu"][1],
               prior_params["Lambda"][0, 0], prior_params["Lambda"][0, 1],
               prior_params["Lambda"][1, 1])

In [56]:
@nb.jit(nopython=True, fastmath=True)
def validate_matrix(mat, raise_error=False, atol_sym=1e-08, atol_eval=1e-06):
    """
    Checks that the 2x2 matrix `mat` is symmetric and positive definite
    :param mat: 2x2 matrix
    :param raise_error: raise error if not symmetric and positive definite
    :param atol_sym: absolute tolerance used for comparison (symmetry)
    :param atol_eval: absolute tolerance used for comparison (positiveness)
    :return: returns boolean result of the validation
    """
    stat_sym = np.abs(mat[0,1]-mat[1,0]) < atol_sym
    #v, _ = np.linalg.eig(mat) # only `eig` supported by numba
    # closed-form eigenvalues of a 2x2 matrix: 0.5 * (trace -/+ sqrt(discriminant))
    eigvals = 0.5 * (mat[0,0] - np.array([1., -1]) * np.sqrt(4.* mat[0,1]*mat[1,0] + (mat[0,0] - mat[1,1])**2) + mat[1,1])
    stat_pos_def = (eigvals > atol_eval).all()
    stat = stat_sym and stat_pos_def
    if not stat and raise_error:
        raise ValueError("2x2 matrix is not symmetric and positive definite.")
    return stat
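
The closed-form eigenvalues used in `validate_matrix` can be cross-checked against NumPy's general-purpose solver. The following is a minimal sketch without `numba`; the helper name `eigvals_2x2` is an assumption made for illustration and is not part of the notebook's code.

```python
import numpy as np

def eigvals_2x2(mat):
    """Closed-form eigenvalues of a 2x2 matrix (same formula as in `validate_matrix`)."""
    trace = mat[0, 0] + mat[1, 1]
    disc = np.sqrt(4.0 * mat[0, 1] * mat[1, 0] + (mat[0, 0] - mat[1, 1]) ** 2)
    # smaller eigenvalue first, larger eigenvalue second
    return 0.5 * (trace - np.array([1.0, -1.0]) * disc)

# covariance matrix built from the prior parameters used in this notebook
cov = np.array([[0.01**2, 0.0], [0.0, 0.32**2]])
closed_form = eigvals_2x2(cov)
reference = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
print(closed_form, reference)

# a symmetric matrix that is not positive definite has a negative eigenvalue
indefinite = eigvals_2x2(np.array([[1.0, 2.0], [2.0, 1.0]]))
print(indefinite)
```

Both approaches should agree to floating-point precision for symmetric input; the closed form is kept in the notebook because it compiles cleanly with `numba`.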

In [57]:
@nb.jit(nopython=True)
def parse_theta_vec(theta):
    """
    converts the array/vector `theta` into a mean value and (symmetric) covariance matrix
    :param theta: contains the components of the mean vector and the non-redundant components of the (symmetric) covariance matrix
    :return: mean vector, full covariance matrix
    """
    mu = np.array([theta[0], theta[1]])
    cov = np.array([[theta[2], theta[3]], [theta[3], theta[4]]])
    return mu, cov

In [58]:
def log_prior(theta):
    """
    computes the log pdf of our chosen prior distribution, the normal-inverse-Wishart distribution
    Note: we can't use here `numba` straightforwardly because it does not support the distribution functions; `numba-scipy` and `numba-stats` address this issue but the inverse Wishart distribution has not been implemented in these libraries
    :param theta: parameter vector
    :return: log pdf associated with the prior
    """
    mu, sigma = parse_theta_vec(theta)
    if not validate_matrix(sigma):
        return -np.inf
    log_norm = multivariate_normal.logpdf(x=mu, mean=prior_params["mu"], cov=sigma/prior_params["kappa"])
    log_invwishart = invwishart.logpdf(x=sigma, df=prior_params["nu"], scale=prior_params["Lambda"])
    return log_norm + log_invwishart

In [59]:
@nb.jit(nopython=True, fastmath=True)
def log_likelihood(theta, sat_data):
    """
    computes the logarithm of the likelihood function, the standard Gaussian (chi-squared) likelihood, up to an additive constant
    :param theta: parameter vector
    :param sat_data: array of shape (number of points, 2) containing the datasets for the saturation density and energy
    :return: logarithm of the likelihood function
    """
    mu, sigma = parse_theta_vec(theta)
    # exclude non-positive-definite covariance matrices
    if not validate_matrix(sigma):
        #print('covariance not valid')
        return -np.inf
    chi2 = 0.
    sigma_inv = np.linalg.inv(sigma)
    for row in sat_data:
        diff = (row - mu)
        chi2 += np.dot(diff, sigma_inv @ diff)
    return -0.5 * (chi2 + len(sat_data) * np.log(np.abs(np.linalg.det(sigma))))

In [50]:
def log_posterior(theta, sat_data):
    """
    computes the (unnormalized) log posterior, which in this particular case is again a normal-inverse-Wishart distribution (conjugacy)
    :param theta: parameter vector
    :param sat_data: array of shape (number of points, 2) containing the datasets for the saturation density and energy
    :return: log pdf associated with the posterior
    """
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf
    return lp + log_likelihood(theta, sat_data)

### Testing our implementation
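
One possible check, sketched here under stated assumptions, compares the Gaussian log-likelihood with `scipy.stats.multivariate_normal.logpdf`. The function `log_likelihood_np` is a hypothetical NumPy-only stand-in for the `numba` version of `log_likelihood`, and the data set is freshly generated for illustration. Because the notebook's likelihood drops the normalization constant, the two results should differ by exactly $-n \log(2\pi)$ for $n$ two-dimensional data points.

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood_np(mu, cov, sat_data):
    """Gaussian log-likelihood without the constant -n*log(2*pi) term."""
    diff = sat_data - mu
    # sum over all data points of diff^T cov^{-1} diff
    chi2 = np.einsum("ni,ij,nj->", diff, np.linalg.inv(cov), diff)
    return -0.5 * (chi2 + len(sat_data) * np.log(np.linalg.det(cov)))

rng = np.random.default_rng(0)
mu = np.array([0.16, -15.9])
cov = np.array([[0.01**2, 0.0], [0.0, 0.32**2]])
sat_data = rng.multivariate_normal(mu, cov, size=50)

ours = log_likelihood_np(mu, cov, sat_data)
ref = multivariate_normal.logpdf(sat_data, mean=mu, cov=cov).sum()
# normalization constant that the notebook's likelihood omits (p = 2 dimensions)
norm_const = -len(sat_data) * np.log(2.0 * np.pi)
print(ours + norm_const, ref)
```

A constant offset does not affect the MLE, the MAP, or the MCMC sampling, so omitting it is harmless for this analysis.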

## Bayesian inference

### Determine initial walker position using MLE or MAP

We need to specify the initial position of the random walkers for Monte Carlo sampling. We use here either the MAP or MLE.

In [27]:
def get_initial_walker_pos(function=log_posterior, method="Nelder-Mead", return_theta=False, **kwargs):
    """
    finds the maximum of the argument `function`; to be used to determine either the maximum a posteriori (MAP) estimate or the maximum likelihood estimate (MLE)
    :param function: either likelihood function or posterior or prior
    :param method: method used for optimization, see scipy documentation
    :param kwargs: options passed to the optimizer, see scipy documentation
    :param return_theta: boolean to request solution vector or (mu, cov)
    :return: mean value and covariance associated with the found maximum or found maximum (if return_theta)
    """
    nll = lambda *args: -function(*args)
    args = () if function == log_prior else (data.to_numpy(),)
    sol = minimize(nll, theta_prior, args=args, method=method, options=kwargs)
    mu, cov = parse_theta_vec(sol.x)
    if not validate_matrix(cov):
        raise ValueError("Covariance matrix not positive definite")
    return sol.x if return_theta else (mu, cov)

In [28]:
map_estimate = get_initial_walker_pos(function=log_posterior, method="Nelder-Mead", maxiter=10000000)
mle_estimate = get_initial_walker_pos(function=log_likelihood, method="SLSQP", maxiter=10000000)

We expect both the MAP and MLE to be close to the parameters from which the dummy data were drawn (the prior's $\mu$ and $\Lambda$), so let's compute the differences of the respective mean values and covariance matrices:

In [29]:
for lbl, est in (("map", map_estimate), ("mle", mle_estimate)):
    print(lbl)
    mu, cov = est
    print(mu-prior_params["mu"])
    print(cov-prior_params["Lambda"])
    print("estimated covariance matrix checks out:", validate_matrix(cov))

map
[0.00057203 0.01061376]
[[ 2.28988501e-06  7.91196393e-06]
 [ 7.91196393e-06 -8.34970776e-03]]
estimated covariance matrix checks out: True
mle
[0.0005698  0.01061878]
[[ 2.90476630e-06  8.08915365e-06]
 [ 8.08915365e-06 -7.80861038e-03]]
estimated covariance matrix checks out: True


Since the prior is only weakly informative, the MLE and MAP are very close to each other, as expected.
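
For a multivariate normal likelihood, the MLE is also known in closed form: the sample mean and the sample covariance normalized by $n$ (not $n-1$). The sketch below can serve as an independent check of the numerical optimizer; the helper name `gaussian_mle` and the regenerated data set are assumptions for illustration, so the numbers will not reproduce the output above exactly.

```python
import numpy as np

def gaussian_mle(sat_data):
    """Closed-form MLE of mean and covariance for multivariate normal data."""
    mu_hat = sat_data.mean(axis=0)
    centered = sat_data - mu_hat
    cov_hat = centered.T @ centered / len(sat_data)  # normalized by n, not n-1
    return mu_hat, cov_hat

rng = np.random.default_rng(42)
true_mu = np.array([0.16, -15.9])
true_cov = np.array([[0.01**2, 0.0], [0.0, 0.32**2]])
sat_data = rng.multivariate_normal(true_mu, true_cov, size=1000)

mu_hat, cov_hat = gaussian_mle(sat_data)
print(mu_hat - true_mu)
print(cov_hat - true_cov)
```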

### Monte Carlo sampling

We use the library `multiprocess` because `multiprocessing` has issues with Jupyter notebooks.

Warning: runtime might be several hours depending on the configuration.

In [12]:
import multiprocess as mp
def run_mcmc_sampler(sat_data, nwalkers=1000, nsteps=50, nsteps_burn=50, log_prob_fn=log_posterior, num_threads=4, ndim=5, offset=1e-2):
    """
    samples the function provided by `log_prob_fn`
    :param sat_data: datasets for saturation point
    :param nwalkers: number of walkers
    :param nsteps: number of MCMC steps after burn-in phase
    :param nsteps_burn: number of MCMC steps used for burn in
    :param log_prob_fn: function to be sampled
    :param num_threads: number of worker processes in the pool (`None` uses all available CPUs)
    :param ndim: dimension of parameter vector `theta` (here: 5)
    :param offset: magnitude of random displacement from walkers' initial positions
    :return: sampler, pos, prob, state (see emcee documentation)
    """
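    if log_prob_fn == log_likelihood:
        # start the walkers at the MLE when sampling the likelihood
        init_pos = get_initial_walker_pos(function=log_prob_fn, method="SLSQP",
                                          return_theta=True)
    else:
        # for both the prior and the posterior, start the walkers at the MAP
        init_pos = get_initial_walker_pos(function=log_posterior, method="Nelder-Mead", return_theta=True)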

    if num_threads is None:
        num_threads = mp.cpu_count()

    args = None if log_prob_fn == log_prior else (sat_data.to_numpy(),)
    with mp.Pool(num_threads) as pool:
        # for multiprocessing with emcee, see https://emcee.readthedocs.io/en/stable/tutorials/parallel/#multiprocessing
        # https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror
        sampler = mc.EnsembleSampler(nwalkers=nwalkers, ndim=ndim,
                                     log_prob_fn=log_prob_fn, args=args, pool=pool)
        pos = init_pos + offset * np.random.randn(nwalkers, ndim)
        if nsteps_burn > 0:
            pos, _, _ = sampler.run_mcmc(initial_state=pos, nsteps=nsteps_burn, progress=True)
            sampler.reset()
        pos, prob, state = sampler.run_mcmc(initial_state=pos, nsteps=nsteps, progress=True)
    return sampler, pos, prob, state
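
Because `offset` displaces all five components of `theta` by the same absolute scale, it is worth checking how many walkers start with a valid covariance matrix. The following is a rough sketch, assuming the walkers start from the prior's parameter vector; the helper name `fraction_valid` is hypothetical and not part of the notebook.

```python
import numpy as np

def fraction_valid(init_theta, offset, nwalkers=1000, seed=0):
    """Fraction of displaced walkers whose covariance matrix is positive definite."""
    rng = np.random.default_rng(seed)
    pos = init_theta + offset * rng.standard_normal((nwalkers, len(init_theta)))
    # rebuild the symmetric 2x2 covariance matrices from theta[2:5]
    cov = np.empty((nwalkers, 2, 2))
    cov[:, 0, 0] = pos[:, 2]
    cov[:, 0, 1] = cov[:, 1, 0] = pos[:, 3]
    cov[:, 1, 1] = pos[:, 4]
    # positive definite if the smallest eigenvalue is positive
    return np.mean(np.linalg.eigvalsh(cov).min(axis=1) > 0.0)

theta_start = np.array([0.16, -15.9, 0.01**2, 0.0, 0.32**2])
frac_default = fraction_valid(theta_start, offset=1e-2)  # default of run_mcmc_sampler
frac_small = fraction_valid(theta_start, offset=1e-6)
print(frac_default, frac_small)
```

With $\Sigma_{0,0} \approx 10^{-4}$, a displacement of order $10^{-2}$ pushes a sizable fraction of walkers to negative variances, where the log probability is $-\infty$; a smaller or component-wise relative displacement avoids this.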

### Testing

We test here that the implemented distribution functions match the analytically known results for the mean values.

We begin the testing with the prior distribution:

In [60]:
sampler, pos, prob, state = run_mcmc_sampler(sat_data=data, log_prob_fn=log_prior, nwalkers=10, nsteps=100000, nsteps_burn=10000)
samples = sampler.flatchain
samples.mean(axis=0)

  lnpdiff = f + nlp - state.log_prob[j]
100%|██████████| 10000/10000 [01:14<00:00, 134.90it/s]
100%|██████████| 100000/100000 [12:22<00:00, 134.65it/s]


array([ 1.60878782e-01, -1.58889536e+01,  1.42354180e-03, -4.85025238e-04,
        3.77786437e-01])

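Our expectation for the mean values from the prior distribution is as follows:
$$
E(\mu) = \mu_0 \;, \qquad
E(\Sigma) = \frac{\Lambda}{\nu - p -1} \quad \mathrm{for} \; \nu > p + 1
$$
where $\mu_0$ here denotes the prior mean vector (`prior_params["mu"]`), $\Lambda$ is the scale matrix of the inverse Wishart distribution, and $p=2$ is the dimension of the $p \times p$ covariance matrix $\Sigma$. Since $\nu = 3 = p + 1$ in our case, the condition $\nu > p + 1$ is not satisfied, and the mean value of the covariance matrix is _undefined_.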
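
The expression for $E(\Sigma)$ can be checked by direct sampling from `scipy.stats.invwishart` whenever the mean exists. The sketch below uses illustrative values ($\nu = 10$ and an arbitrary scale matrix) that are assumptions and differ from the prior of this notebook, for which the mean is undefined.

```python
import numpy as np
from scipy.stats import invwishart

# illustrative values with nu > p + 1, so that the mean exists
Lambda = np.array([[2.0, 0.5], [0.5, 1.0]])
nu, p = 10, 2

# draws has shape (size, p, p); average over the sample axis
draws = invwishart.rvs(df=nu, scale=Lambda, size=20_000, random_state=1)
mc_mean = draws.mean(axis=0)
analytic_mean = Lambda / (nu - p - 1)
print(mc_mean)
print(analytic_mean)
```

The agreement degrades as $\nu$ approaches $p + 1$ from above, because the distribution of $\Sigma$ becomes increasingly heavy-tailed.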

In [13]:
def mean_val_inv_wish(Lambda, nu):
    """
    computes the mean value of the Inverse-Wishart distribution, see https://en.wikipedia.org/wiki/Inverse-Wishart_distribution
    :param Lambda: scale matrix of the Inverse-Wishart distribution
    :param nu: degrees of freedom
    :return: mean value of the associated Inverse-Wishart distribution
    """
    p = len(Lambda)
    return Lambda / (nu - p - 1) if nu > p+1 else np.inf

In [14]:
mean_mean_vec = prior_params["mu"]
mean_cov = mean_val_inv_wish(prior_params["Lambda"], prior_params["nu"])
print(f"mean of mean vector: {mean_mean_vec}")
print(f"mean of covariance matrix: {mean_cov}")

mean of mean vector: [  0.16 -15.9 ]
mean of covariance matrix: inf


The mean vector obtained from the samples matches the one from the prior well.

Next, let's consider the posterior distribution:

In [64]:
sampler, pos, prob, state = run_mcmc_sampler(sat_data=data, log_prob_fn=log_posterior, nwalkers=10, nsteps=100000, nsteps_burn=5000)

  lnpdiff = f + nlp - state.log_prob[j]
100%|██████████| 5000/5000 [00:40<00:00, 122.34it/s]
100%|██████████| 100000/100000 [14:03<00:00, 118.48it/s]


In [65]:
samples = sampler.flatchain
samples.mean(axis=0)

array([ 1.61370948e-01, -1.58900320e+01, -2.27777506e-03,  3.45362686e-04,
        9.51105654e-02])

Thanks to conjugacy, the posterior is again a normal-inverse-Wishart distribution with updated parameters, so our expectation for the mean values is known analytically (see our manuscript for the equations):

In [66]:
ret = dict()
ret["kappa"] = prior_params["kappa"] + len(data)
ret["nu"] = prior_params["nu"] + len(data)
ret["sample_mean"] = data.mean().to_numpy()
ret["mu"] = np.array([prior_params["kappa"], len(data)]) @ np.array([prior_params["mu"], ret["sample_mean"]]) /ret["kappa"]
diff = ret["sample_mean"] - prior_params["mu"]
tmp = data - ret["sample_mean"]
ret["sum_squared_dev"] = tmp.to_numpy().T @ tmp.to_numpy()  # sum of outer products of the centered rows
ret["Lambda"] = prior_params["Lambda"] + ret["sum_squared_dev"] + (prior_params["kappa"] * len(data) / ret["kappa"]) * np.outer(diff, diff)
print("expected mean vector", ret["mu"])
print("expected covariance matrix:", mean_val_inv_wish(ret["Lambda"], ret["nu"]))

expected mean vector [  0.16056925 -15.88939104]
expected covariance matrix: [[1.03011545e-04 8.06869820e-06]
 [8.06869820e-06 9.46951596e-02]]
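
The update rules applied in the cell above can be collected in a small function and verified on a data set that is simple enough to work out by hand. This is a sketch only; the function name `niw_update` and the four-point toy data set are assumptions for illustration.

```python
import numpy as np

def niw_update(mu0, Lambda0, kappa0, nu0, x):
    """Conjugate update of normal-inverse-Wishart parameters given data x of shape (n, p)."""
    n = len(x)
    xbar = x.mean(axis=0)
    centered = x - xbar
    scatter = centered.T @ centered  # sum of squared deviations
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    diff = xbar - mu0
    Lambda_n = Lambda0 + scatter + (kappa0 * n / kappa_n) * np.outer(diff, diff)
    return mu_n, Lambda_n, kappa_n, nu_n

# toy data: corners of a square with sample mean (1, 1) and scatter matrix 4 * identity
x = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
mu_n, Lambda_n, kappa_n, nu_n = niw_update(np.zeros(2), np.eye(2), kappa0=1, nu0=3, x=x)
print(mu_n, kappa_n, nu_n)
print(Lambda_n)
```

For this toy input, the hand calculation gives $\kappa_n = 5$, $\nu_n = 7$, $\mu_n = (0.8, 0.8)$, and $\Lambda_n$ with diagonal elements $5.8$ and off-diagonal elements $0.8$.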


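Note that $\Sigma_{1,1}$ (lower right) is large compared with the other matrix elements, even though the matrix elements above already include the denominator $\nu_n - p - 1$ from the mean value. Does that make sense? It is at least consistent with the variance $0.32^2 \approx 0.102$ used to generate the $E_0$ data. Keep in mind that $\Sigma/\kappa_n$ is used as the covariance matrix in the normal distribution of the normal-inverse-Wishart distribution, but not in the inverse Wishart distribution itself.

The sampled mean vector matches the expected one. For $\Sigma$, however, only the $\Sigma_{1,1}$ component agrees with the expectation. The sampled mean of $\Sigma_{0,0}$ is even negative, which cannot happen for positive-definite samples and hints at walkers that stayed at invalid (possibly initial) positions. Here are the ratios of the matrix elements: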

In [67]:
m, c = parse_theta_vec(samples.mean(axis=0))
c/mean_val_inv_wish(ret["Lambda"], ret["nu"])

array([[-22.11184254,  42.80277646],
       [ 42.80277646,   1.00438677]])

## Useful links

* [Covariances and Pearson correlation](https://cxwangyi.wordpress.com/2010/08/29/pearson-correlation-coefficient-covariance-matrix-and-linear-dependency/)
* [emcee tutorial for fitting (1)](https://prappleizer.github.io/Tutorials/MCMC/MCMC_Tutorial_Solution.html)
* [emcee tutorial for fitting (2)](https://users.obs.carnegiescience.edu/cburns/ipynbs/Emcee.html)
* [emcee tutorial for fitting (3)](https://emcee.readthedocs.io/en/stable/tutorials/line/)
* [multivariate normal distribution](https://online.stat.psu.edu/stat505/book/export/html/636)
* [A Note on Wishart and Inverse Wishart Priors for Covariance Matrix](https://jbds.isdsa.org/public/journals/1/html/v1n2/p2/)