# A Bayesian mixture model approach to quantifying the empirical nuclear saturation point (simplified MC version)

This notebook provides the starting point for an independent implementation of our saturation analysis using brute-force Monte Carlo sampling. It could be used to check and generalize our analysis based on conjugate distributions. This notebook was not used in our analysis.


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import emcee as mc
import numpy as np
import pandas as pd
import numba as nb
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import invwishart, multivariate_normal, multivariate_t
from modules.plot_helpers import plot_confregion_bivariate_t
%matplotlib inline

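To test the analysis, let's create 1000 dummy data points drawn from a bivariate normal distribution whose mean and covariance are the prior parameters `mu` and `Psi` we have used before: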

In [3]:
prior_params = {"mu": np.array([0.16, -15.9]),
                "Psi": np.array([[0.01**2, 0], [0, 0.32**2]]),  # upper triangle will be ignored
                "kappa": 1, "nu": 4}
np.random.seed(42)
tmp = multivariate_normal.rvs(mean=prior_params["mu"], cov=prior_params["Psi"], size=1000)
data = pd.DataFrame(data={"n0": tmp[:, 0], "E0": tmp[:, 1]})

## Bayes' theorem

Our analysis is based on Bayes' theorem, so we need to define the prior, the likelihood, and the posterior; we work with their logarithms throughout. The parameters we infer are collected in the vector $\theta = (\mu_0, \mu_1; \Sigma_{0,0}, \Sigma_{0,1}, \Sigma_{1,1})$, i.e., the two components of the mean vector followed by the three independent entries of the symmetric covariance matrix. To test the implementation, we define the $\theta$ associated with the prior:

In [4]:
theta_prior = (prior_params["mu"][0], prior_params["mu"][1],
               prior_params["Psi"][0, 0], prior_params["Psi"][0, 1],
               prior_params["Psi"][1, 1])
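In terms of these quantities, the log posterior evaluated by the functions below is, up to the $\theta$-independent log evidence $\log p(\mathcal{D})$, the sum of the log likelihood of the $N$ data points $x_i = (n_{0,i}, E_{0,i})$ and the log prior. The prior is the normal-inverse-Wishart density with location $\mu_\mathrm{p}$ (`mu`), scaling $\kappa$ (`kappa`), degrees of freedom $\nu$ (`nu`), and scale matrix $\Psi$ (`Psi`):

$$
\log p(\theta \mid \mathcal{D}) = \log \mathcal{L}(\mathcal{D} \mid \theta) + \log p(\theta) - \log p(\mathcal{D}),
$$

$$
\log \mathcal{L}(\mathcal{D} \mid \theta) = -\frac{1}{2} \sum_{i=1}^{N} (x_i - \mu)^\top \Sigma^{-1} (x_i - \mu) - \frac{N}{2} \log \det \Sigma - N \log(2\pi),
$$

$$
p(\theta) = \mathcal{N}\!\left(\mu \mid \mu_\mathrm{p}, \Sigma/\kappa\right)\, \mathcal{W}^{-1}\!\left(\Sigma \mid \Psi, \nu\right).
$$

Like $\log p(\mathcal{D})$, the constant $-N\log(2\pi)$ does not depend on $\theta$ and therefore affects neither the optimization nor the MCMC sampling.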

In [5]:
@nb.jit(nopython=True, fastmath=True)
def validate_matrix(mat, raise_error=False, atol_sym=1e-08, atol_eval=1e-06):
    """
    Checks that the 2x2 matrix `mat` is symmetric and positive definite
    :param mat: 2x2 matrix
    :param raise_error: raise error if not symmetric and positive definite
    :param atol_sym: absolute tolerance used for comparison (symmetry)
    :param atol_eval: absolute tolerance used for comparison (positiveness)
    :return: returns boolean result of the validation
    """
    stat_sym = np.abs(mat[0,1]-mat[1,0]) < atol_sym
    #v, _ = np.linalg.eig(mat) # only `eig` supported by numba
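    # closed-form eigenvalues of a 2x2 matrix, 0.5 * (a + d -/+ sqrt((a - d)**2 + 4*b*c)), in ascending order
    eigvals = 0.5 * (mat[0,0] - np.array([1., -1]) * np.sqrt(4.* mat[0,1]*mat[1,0] + (mat[0,0] - mat[1,1])**2) + mat[1,1])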
    stat_pos_def = (eigvals > atol_eval).all()
    stat = stat_sym and stat_pos_def
    if not stat and raise_error:
        raise ValueError("Non-symmetric and/or non-positive-definite 2x2 matrix encountered.")
    else:
        return stat
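`validate_matrix` avoids a general eigensolver and computes the two eigenvalues of the 2×2 matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ in closed form, $\lambda_\pm = \tfrac{1}{2}\left[(a + d) \pm \sqrt{(a - d)^2 + 4bc}\right]$. A quick cross-check of that formula against NumPy's symmetric eigensolver, in plain NumPy without `numba` and with a hypothetical helper `eigvals_2x2`, could look like this:

```python
import numpy as np

def eigvals_2x2(mat):
    """Closed-form eigenvalues of a 2x2 matrix (same formula as in `validate_matrix`), ascending."""
    a, b, c, d = mat[0, 0], mat[0, 1], mat[1, 0], mat[1, 1]
    root = np.sqrt((a - d) ** 2 + 4.0 * b * c)
    return 0.5 * (a + d - np.array([1.0, -1.0]) * root)

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=(2, 2))
    sym = x @ x.T  # random symmetric positive semi-definite matrix
    # np.linalg.eigvalsh also returns the eigenvalues in ascending order
    assert np.allclose(eigvals_2x2(sym), np.linalg.eigvalsh(sym))
print("closed-form eigenvalues agree with np.linalg.eigvalsh")
```

For a symmetric matrix ($b = c$) the radicand is non-negative, so both eigenvalues are real and the positivity check in `validate_matrix` is well defined.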

In [6]:
@nb.jit(nopython=True)
def parse_theta_vec(theta):
    """
    converts the array/vector `theta` into a mean value and (symmetric) covariance matrix
    :param theta: encodes mean vector (first two components) and the (symmetric) covariance matrix (last 3 components);
    :return: mean vector, full covariance matrix
    """
    mu = np.array([theta[0], theta[1]])
    cov = np.array([[theta[2], theta[3]], [theta[3], theta[4]]])
     
    # One might think of using the Cholesky decomposition to enforce positive (semi-)definiteness of the returned matrix
    ## cov[0,1]=0 # treat components as lower triangle matrix
    ## cov = cov @ cov.T # use Cholesky decomposition to ensure at least positive semi-definiteness
    # but this causes issues with the MLE and MAP solvers below
    
    return mu, cov
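The commented-out alternative in `parse_theta_vec` interprets the last three components of $\theta$ as a lower-triangular factor $L$ and returns $LL^\top$, which is symmetric positive semi-definite by construction. A plain-NumPy sketch of that idea follows; the name `parse_theta_vec_cholesky` is hypothetical, and with this parametrization $\theta_2, \theta_3, \theta_4$ are no longer the covariance entries themselves:

```python
import numpy as np

def parse_theta_vec_cholesky(theta):
    """Hypothetical variant: theta[2:] holds the entries (L00, L10, L11) of a lower-triangular L."""
    mu = np.array([theta[0], theta[1]])
    chol = np.array([[theta[2], 0.0], [theta[3], theta[4]]])
    cov = chol @ chol.T  # symmetric positive semi-definite for any real theta
    return mu, cov

rng = np.random.default_rng(1)
for _ in range(1000):
    _, cov = parse_theta_vec_cholesky(rng.normal(size=5))
    assert np.allclose(cov, cov.T)
    assert np.linalg.eigvalsh(cov).min() > -1e-12  # non-negative up to rounding
```

One property of this map worth keeping in mind is that it is not one-to-one: flipping the sign of a column of $L$ leaves $LL^\top$ unchanged.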

In [7]:
def log_prior(theta):
    """
    computes the log pdf of our chosen prior distribution, the normal-inverse-Wishart distribution
    Note: `numba` can't be used here straightforwardly because it does not support SciPy's distribution functions; `numba-scipy` and `numba-stats` address this issue, but the inverse-Wishart distribution has not been implemented in these libraries
    :param theta: parameter vector
    :return: log pdf associated with the prior
    """
    mu, sigma = parse_theta_vec(theta)
    if not validate_matrix(sigma):
        return -np.inf
    log_norm = multivariate_normal.logpdf(x=mu, mean=prior_params["mu"], cov=sigma/prior_params["kappa"])
    log_invwishart = invwishart.logpdf(x=sigma, df=prior_params["nu"], scale=prior_params["Psi"])
    return log_norm + log_invwishart

In [8]:
import math
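def fast_log_2D_norm_invwishart(mu, Sigma, delta, gamma, Psi, alpha):
    """
    computes the log pdf of the 2D normal-inverse-Wishart distribution in closed form,
    log N(mu | delta, Sigma/gamma) + log IW(Sigma | Psi, alpha), without calling scipy.stats
    :param mu: mean vector
    :param Sigma: covariance matrix
    :param delta: location parameter (prior mean)
    :param gamma: scaling factor of the covariance of `mu` (kappa)
    :param Psi: scale matrix of the inverse-Wishart distribution
    :param alpha: degrees of freedom of the inverse-Wishart distribution (nu)
    :return: log pdf
    """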
    det_Sigma = -Sigma[0, 1] * Sigma[1, 0] + Sigma[0, 0] * Sigma[1, 1]
    det_Psi = -Psi[0, 1] * Psi[1, 0] + Psi[0, 0] * Psi[1, 1]
    Sigma_inv = np.array([[Sigma[1, 1], -Sigma[0, 1]], [-Sigma[1, 0], Sigma[0, 0]]]) / det_Sigma
    invwishart_term = np.trace(Psi @ Sigma_inv)
    diff = mu - delta
    normal_term = gamma * np.dot(diff, np.dot(Sigma_inv, diff))
    D = 2
    norm_term = D / 2 * np.log(gamma)
    norm_term += alpha / 2 * np.log(det_Psi)
    norm_term -= (alpha + D + 2) / 2 * np.log(det_Sigma)
    norm_term -= D / 2 * np.log(2. * np.pi)
    norm_term -= alpha * D / 2 * np.log(2)
    alpha_half = alpha / 2
    norm_term -= 0.5 * np.log(np.pi) + np.log(math.gamma(alpha_half)) + np.log(math.gamma(alpha_half - 0.5))
    return norm_term - 0.5 * (invwishart_term + normal_term)

# def log_prior(theta):
#     mu, sigma = parse_theta_vec(theta)
#     if not validate_matrix(sigma):
#         return -np.inf
#     return fast_log_2D_norm_invwishart(mu=mu, Sigma=sigma, delta=prior_params["mu"],
#                                        gamma=prior_params["kappa"], Psi=prior_params["Psi"], alpha=prior_params["nu"])
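`fast_log_2D_norm_invwishart` uses the two-dimensional multivariate gamma function, $\log\Gamma_2(a) = \tfrac{1}{2}\log\pi + \log\Gamma(a) + \log\Gamma(a - \tfrac{1}{2})$, and collects the $\det\Sigma$ terms of the normal and the inverse-Wishart factors into the single exponent $-(\alpha + D + 2)/2$. Before switching `log_prior` to this faster version, one could check it against the SciPy-based densities at random points. A sketch follows; the function body mirrors the cell above, with `math.lgamma` in place of `np.log(math.gamma(...))`, and `scipy_log_norm_invwishart` is a hypothetical reference helper:

```python
import math
import numpy as np
from scipy.stats import invwishart, multivariate_normal

def fast_log_2D_norm_invwishart(mu, Sigma, delta, gamma, Psi, alpha):
    # same closed form as in the notebook cell above (D = 2)
    D = 2
    det_Sigma = Sigma[0, 0] * Sigma[1, 1] - Sigma[0, 1] * Sigma[1, 0]
    det_Psi = Psi[0, 0] * Psi[1, 1] - Psi[0, 1] * Psi[1, 0]
    Sigma_inv = np.array([[Sigma[1, 1], -Sigma[0, 1]], [-Sigma[1, 0], Sigma[0, 0]]]) / det_Sigma
    diff = mu - delta
    quad = np.trace(Psi @ Sigma_inv) + gamma * (diff @ Sigma_inv @ diff)
    norm = (D / 2 * np.log(gamma) + alpha / 2 * np.log(det_Psi)
            - (alpha + D + 2) / 2 * np.log(det_Sigma)
            - D / 2 * np.log(2 * np.pi) - alpha * D / 2 * np.log(2)
            - (0.5 * np.log(np.pi) + math.lgamma(alpha / 2) + math.lgamma(alpha / 2 - 0.5)))
    return norm - 0.5 * quad

def scipy_log_norm_invwishart(mu, Sigma, delta, gamma, Psi, alpha):
    # reference: log N(mu | delta, Sigma / gamma) + log IW(Sigma | Psi, alpha)
    return (multivariate_normal.logpdf(mu, mean=delta, cov=Sigma / gamma)
            + invwishart.logpdf(Sigma, df=alpha, scale=Psi))

delta = np.array([0.16, -15.9])
Psi = np.array([[0.01**2, 0.0], [0.0, 0.32**2]])
gamma, alpha = 1.0, 4
rng = np.random.default_rng(2)
for _ in range(100):
    # test points drawn from the distribution itself
    Sigma = invwishart.rvs(df=alpha, scale=Psi, random_state=rng)
    mu = multivariate_normal.rvs(mean=delta, cov=Sigma / gamma, random_state=rng)
    fast = fast_log_2D_norm_invwishart(mu, Sigma, delta, gamma, Psi, alpha)
    ref = scipy_log_norm_invwishart(mu, Sigma, delta, gamma, Psi, alpha)
    assert np.isclose(fast, ref, rtol=1e-7, atol=1e-7)
```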

In [9]:
@nb.jit(nopython=True, fastmath=True)
def log_likelihood(theta, sat_data):
    """
    computes the logarithm of the Gaussian likelihood function (the chi-squared term plus the normalization term),
    up to the additive constant -N*log(2*pi), which does not depend on `theta`
    :param theta: parameter vector
    :param sat_data: array of shape (number of points, 2) containing the datasets for the saturation density and energy
    :return: logarithm of the likelihood function
    """
    mu, sigma = parse_theta_vec(theta)
    # exclude non-positive-definite covariance matrices
    if not validate_matrix(sigma):
        #print('covariance not valid')
        return -np.inf
    chi2 = 0.
    sigma_inv = np.linalg.inv(sigma)
    for row in sat_data:
        diff = (row - mu)
        chi2 += np.dot(diff, sigma_inv @ diff)
    return -0.5 * (chi2 + len(sat_data) * np.log(np.abs(np.linalg.det(sigma))))
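`log_likelihood` differs from the full Gaussian log-likelihood only by the constant $-\tfrac{N D}{2}\log(2\pi)$ with $D = 2$. A vectorized NumPy version (hypothetical name `log_likelihood_vectorized`, taking `mu` and `sigma` directly and not compiled with `numba`) and a check against `scipy.stats.multivariate_normal` might look like this:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood_vectorized(mu, sigma, sat_data):
    """Gaussian log-likelihood up to the constant -N*D/2*log(2*pi), like `log_likelihood`."""
    diff = sat_data - mu  # shape (N, 2)
    chi2 = np.einsum("ni,ij,nj->", diff, np.linalg.inv(sigma), diff)  # sum of all N quadratic forms
    _, logdet = np.linalg.slogdet(sigma)
    return -0.5 * (chi2 + len(sat_data) * logdet)

rng = np.random.default_rng(3)
mu = np.array([0.16, -15.9])
sigma = np.array([[0.01**2, 2e-4], [2e-4, 0.32**2]])
sat_data = rng.multivariate_normal(mu, sigma, size=50)

n, d = sat_data.shape
full = multivariate_normal.logpdf(sat_data, mean=mu, cov=sigma).sum()
dropped_const = -0.5 * n * d * np.log(2 * np.pi)
print(log_likelihood_vectorized(mu, sigma, sat_data) - (full - dropped_const))  # ~0 up to rounding
```

`np.linalg.slogdet` is used instead of `np.log(np.abs(np.linalg.det(...)))` to avoid under- or overflow of the determinant for badly scaled matrices.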

In [10]:
def log_posterior(theta, sat_data):
    """
    computes the log posterior (up to a constant), which is in this particular case also a normal-inverse-Wishart distribution (conjugacy).
    :param theta: parameter vector
    :param sat_data: array of shape (number of points, 2) containing the datasets for the saturation density and energy
    :return: log pdf associated with the posterior
    """
    lp = log_prior(theta)
    if not np.isfinite(lp):
        return -np.inf
    return lp + log_likelihood(theta, sat_data)

## Testing our implementation
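Because the normal-inverse-Wishart prior is conjugate to the Gaussian likelihood, `log_prior(theta) + log_likelihood(theta)` should differ from the analytic posterior log-density only by a $\theta$-independent constant. The analytic posterior follows from the standard conjugate updates $\kappa_N = \kappa + N$, $\nu_N = \nu + N$, $\mu_N = (\kappa\mu_\mathrm{p} + N\bar{x})/\kappa_N$, and $\Psi_N = \Psi + S + \frac{\kappa N}{\kappa_N}(\bar{x} - \mu_\mathrm{p})(\bar{x} - \mu_\mathrm{p})^\top$, where $\bar{x}$ is the sample mean and $S$ the scatter matrix. A self-contained sketch of such a test, with SciPy densities standing in for the notebook functions (all names are hypothetical):

```python
import numpy as np
from scipy.stats import invwishart, multivariate_normal

def log_niw(mu, sigma, loc, kappa, nu, psi):
    """log N(mu | loc, sigma / kappa) + log IW(sigma | psi, nu), evaluated with SciPy"""
    return (multivariate_normal.logpdf(mu, mean=loc, cov=sigma / kappa)
            + invwishart.logpdf(sigma, df=nu, scale=psi))

def niw_posterior_params(sat_data, loc, kappa, nu, psi):
    """standard conjugate update of the normal-inverse-Wishart parameters for Gaussian data"""
    n = len(sat_data)
    xbar = sat_data.mean(axis=0)
    scatter = (sat_data - xbar).T @ (sat_data - xbar)
    kappa_n, nu_n = kappa + n, nu + n
    loc_n = (kappa * loc + n * xbar) / kappa_n
    dev = (xbar - loc)[:, None]
    psi_n = psi + scatter + kappa * n / kappa_n * (dev @ dev.T)
    return {"loc": loc_n, "kappa": kappa_n, "nu": nu_n, "psi": psi_n}

prior = {"loc": np.array([0.16, -15.9]), "kappa": 1.0, "nu": 4,
         "psi": np.array([[0.01**2, 0.0], [0.0, 0.32**2]])}
rng = np.random.default_rng(4)
sat_data = rng.multivariate_normal(prior["loc"], prior["psi"], size=200)
post = niw_posterior_params(sat_data, **prior)

offsets = []
for _ in range(20):
    # random test points drawn from the analytic posterior, where all densities are non-negligible
    sigma = invwishart.rvs(df=post["nu"], scale=post["psi"], random_state=rng)
    mu = multivariate_normal.rvs(mean=post["loc"], cov=sigma / post["kappa"], random_state=rng)
    log_prior_plus_lik = (log_niw(mu, sigma, **prior)
                          + multivariate_normal.logpdf(sat_data, mean=mu, cov=sigma).sum())
    offsets.append(log_prior_plus_lik - log_niw(mu, sigma, **post))

# conjugacy: the offset equals the log evidence log p(D), the same for every (mu, sigma)
print("spread of offsets:", np.ptp(offsets))
```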

## Bayesian inference

### Determine initial walker position using MLE or MAP

Monte Carlo sampling requires initial positions for the random walkers. We start them near either the maximum a posteriori (MAP) estimate or the maximum likelihood estimate (MLE).

In [11]:
def get_initial_walker_pos(function=log_posterior, method="Nelder-Mead", return_theta=False, **kwargs):
    """
    finds the maximum of `function`; used to determine either the maximum a posteriori (MAP) estimate or the maximum likelihood estimate (MLE)
    note: uses the global DataFrame `data` as the dataset and `theta_prior` as the starting point of the optimizer
    :param function: either the likelihood function, the posterior, or the prior
    :param method: method used for optimization, see scipy documentation
    :param return_theta: boolean to request the solution vector instead of (mu, cov)
    :param kwargs: options passed to the optimizer, see scipy documentation
    :return: mean value and covariance matrix associated with the found maximum, or the solution vector (if return_theta)
    """
    nll = lambda *args: -function(*args)
    args = () if function == log_prior else (data.to_numpy(),)
    sol = minimize(nll, theta_prior, args=args, method=method, tol=1e-12, options=kwargs)
    mu, cov = parse_theta_vec(sol.x)
    if not validate_matrix(cov):
        raise ValueError("Covariance matrix not positive definite")
    return sol.x if return_theta else (mu, cov)

In [12]:
map_estimate = get_initial_walker_pos(function=log_posterior, method="Nelder-Mead", maxiter=10000000)
mle_estimate = get_initial_walker_pos(function=log_likelihood, method="SLSQP", maxiter=10000000)

We expect both the MAP and MLE to be close to the (input) prior, so let's compute the difference of the respective mean values and covariance matrices:

In [13]:
for lbl, est in (("map", map_estimate), ("mle", mle_estimate)):
    print(lbl)
    mu, cov = est
    print(mu-prior_params["mu"])
    print(cov-prior_params["Psi"])
    print("estimated covariance matrix checks out:", validate_matrix(cov))

map
[0.00056925 0.01060896]
[[ 2.19399313e-06  8.00470285e-06]
 [ 8.00470285e-06 -8.45639101e-03]]
estimated covariance matrix checks out: True
mle
[0.0005698  0.01061878]
[[ 2.90476621e-06  8.08914970e-06]
 [ 8.08914970e-06 -7.80861025e-03]]
estimated covariance matrix checks out: True


Since the prior is only weakly informative and the dataset is large (1000 points), the MAP and MLE are very close to each other, as expected.
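For this conjugate model, both estimates are also available in closed form. The MLE consists of the sample mean and the biased (1/N) sample covariance. The MAP is the joint mode of the normal-inverse-Wishart posterior, $\mu = \mu_N$ and $\Sigma = \Psi_N/(\nu_N + D + 2)$, which follows from setting to zero the derivative of the log posterior, whose $\det\Sigma$ exponent is $-(\nu_N + D + 2)/2$ (as in `fast_log_2D_norm_invwishart`). The closed-form estimates could serve as a cross-check of, or a cheaper replacement for, the numerical optimization in `get_initial_walker_pos`, and they are expected to agree with the printed MAP and MLE up to the optimizer's tolerance. A sketch that regenerates the dummy data the same way as above:

```python
import numpy as np
from scipy.stats import multivariate_normal

# dummy data generated as in the notebook (same seed, legacy global RNG)
mu0 = np.array([0.16, -15.9])
psi0 = np.array([[0.01**2, 0.0], [0.0, 0.32**2]])
kappa0, nu0 = 1.0, 4
np.random.seed(42)
sat_data = multivariate_normal.rvs(mean=mu0, cov=psi0, size=1000)

n, d = sat_data.shape
xbar = sat_data.mean(axis=0)
scatter = (sat_data - xbar).T @ (sat_data - xbar)

# MLE of a Gaussian: sample mean and biased (1/N) sample covariance
mle_mu, mle_cov = xbar, scatter / n

# MAP: mode of the conjugate normal-inverse-Wishart posterior
kappa_n, nu_n = kappa0 + n, nu0 + n
map_mu = (kappa0 * mu0 + n * xbar) / kappa_n
dev = (xbar - mu0)[:, None]
psi_n = psi0 + scatter + kappa0 * n / kappa_n * (dev @ dev.T)
map_cov = psi_n / (nu_n + d + 2)

for lbl, (mu, cov) in (("map", (map_mu, map_cov)), ("mle", (mle_mu, mle_cov))):
    print(lbl, mu - mu0, cov - psi0, sep="\n")
```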

### Monte Carlo sampling

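We use the library `multiprocess` because the standard library's `multiprocessing` has issues with functions defined interactively in Jupyter notebooks; `multiprocess` is a fork of `multiprocessing` that serializes objects with `dill` instead of `pickle`.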

Warning: runtime might be several hours depending on the configuration.

In [14]:
def perturb_init_pos_walkers(init_pos, nwalkers, enforce_posdef=True, std_perturbation=1e-2, delta_mean=0.02):
    """
    creates initial positions of `nwalkers` by randomly perturbing a given (single) initial position `init_pos` 
    :param init_pos: initial parameter vector theta (e.g., the MAP or MLE)
    :param nwalkers: number of walkers requested
    :param enforce_posdef: enforce that matrix component in the output is positive (semi-)definite (in addition to symmetric)
    :param std_perturbation: standard deviation of the perturbation
    :param delta_mean: mean value of the normal distribution (delta_mean, std_perturbation**2) for the minimum eigenvalues used to obtain the nearest pos. def. matrix in Frobenius norm
    :return: array of shape (nwalkers, len(init_pos)) containing the perturbed initial positions of the random walkers
    """
    # perturbation preserves symmetry but not definiteness of the matrix encoded in `init_pos`
    # each component of the returned array is distributed as Normal(mean=init_pos, std=std_perturbation)
    ret = init_pos + std_perturbation * np.random.randn(nwalkers, len(init_pos)) 
    
    # make matrix encoded in `init_pos` (i.e., theta) positive (semi-)definite after perturbation,
    # by finding the nearest sym. pos. def. matrix in Frobenius norm
    # https://nhigham.com/2021/01/26/what-is-the-nearest-positive-semidefinite-matrix/ (based on Cheng & Higham, 1998)
    if enforce_posdef:
        triu_indices = np.triu_indices(2)
        for row in ret:
            mu, mat = parse_theta_vec(row)
            if not validate_matrix(mat): # leave walker alone if its matrix is already pos. def.
                v, w = np.linalg.eigh(mat)  # symmetric eigensolver: real eigenvalues, orthonormal eigenvectors
                delta = np.abs(delta_mean+std_perturbation * np.random.randn())
                cov = w @ np.diag(np.maximum(v, delta)) @ w.T  # raise eigenvalues below `delta` to `delta`
                validate_matrix(cov, raise_error=True)
                row[2:] = cov[triu_indices]
    return ret
# perturb_init_pos_walkers([1,1,2,3,4], 5)
# perturb_init_pos_walkers([1,1,1,0,0.001], 5, enforce_posdef=True)
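The repair step in `perturb_init_pos_walkers` replaces a symmetric matrix $A = Q\Lambda Q^\top$ by $Q\max(\Lambda, \delta)Q^\top$, i.e., it raises every eigenvalue below $\delta$ to $\delta$. For symmetric $A$, this is the nearest matrix in the Frobenius norm whose eigenvalues are all at least $\delta$. A stand-alone sketch using the symmetric eigensolver `np.linalg.eigh`, which guarantees real eigenvalues and orthonormal eigenvectors (the helper name `clip_to_posdef` is hypothetical):

```python
import numpy as np

def clip_to_posdef(mat, delta):
    """Raise all eigenvalues of the symmetric matrix `mat` that are below `delta` to `delta`."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return eigvecs @ np.diag(np.maximum(eigvals, delta)) @ eigvecs.T

indefinite = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues -1 and 3
repaired = clip_to_posdef(indefinite, delta=0.02)
print(np.linalg.eigvalsh(repaired))  # eigenvalues approximately 0.02 and 3.0

already_pd = np.diag([0.5, 2.0])
print(np.allclose(clip_to_posdef(already_pd, delta=0.02), already_pd))  # unchanged: True
```

Only the eigenvalues below $\delta$ are modified, so matrices that are already sufficiently positive definite pass through unchanged.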

In [15]:
import multiprocess as mp
def run_mcmc_sampler(sat_data, nwalkers=1000, nsteps=50, nsteps_burn=50, log_prob_fn=log_posterior, num_threads=4, offset=1e-2):
    """
    samples the function provided by `log_prob_fn` with emcee's ensemble sampler
    note: the initial walker positions are obtained from the global `data` via `get_initial_walker_pos`
    :param sat_data: pandas DataFrame containing the datasets for the saturation point
    :param nwalkers: number of walkers
    :param nsteps: number of MCMC steps after burn-in phase
    :param nsteps_burn: number of MCMC steps used for burn-in
    :param log_prob_fn: function to be sampled
    :param num_threads: number of worker processes to be used (all available CPU cores if None)
    :param offset: magnitude of random displacement from walkers' initial positions
    :return: sampler, pos, prob, state (see emcee documentation)
    """
    if log_prob_fn == log_likelihood:
        init_pos = get_initial_walker_pos(function=log_prob_fn, method="SLSQP",
                                          return_theta=True)
    else:
        init_pos = get_initial_walker_pos(function=log_posterior, method="Nelder-Mead", return_theta=True)

    if num_threads is None:
        num_threads = mp.cpu_count()

    ndim = len(init_pos)
    args = None if log_prob_fn == log_prior else (sat_data.to_numpy(),)
    with mp.Pool(num_threads) as pool:
        # for multiprocessing with emcee, see https://emcee.readthedocs.io/en/stable/tutorials/parallel/#multiprocessing
        # https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror
        sampler = mc.EnsembleSampler(nwalkers=nwalkers, ndim=ndim,
                                     log_prob_fn=log_prob_fn, args=args, pool=pool)
        
        # let each walker start at a slightly different initial position (random perturbation)
        pos = perturb_init_pos_walkers(init_pos, nwalkers, enforce_posdef=True, std_perturbation=offset)
    
        if nsteps_burn > 0:
            pos, _, _ = sampler.run_mcmc(initial_state=pos, nsteps=nsteps_burn, progress=True)
            sampler.reset()
        pos, prob, state = sampler.run_mcmc(initial_state=pos, nsteps=nsteps, progress=True)
    return sampler, pos, prob, state

### Testing

Here, we test whether samples of the implemented distribution functions reproduce the analytically known mean values and covariance matrices.

We begin the testing with the **prior distribution**:

In [17]:
sampler, pos, prob, state = run_mcmc_sampler(sat_data=data, log_prob_fn=log_prior, nwalkers=10, nsteps=100000, nsteps_burn=5000)
samples_prior = sampler.get_chain(flat=True)  # shape (nsteps * nwalkers, ndim)
# sampler.get_chain()[:, :, 4].mean(axis=0)
samples_prior.mean(axis=0)

100%|██████████| 5000/5000 [00:39<00:00, 125.16it/s]
100%|██████████| 50000/50000 [06:46<00:00, 122.90it/s]


array([ 1.59939427e-01, -1.59030851e+01,  8.62527648e-05, -4.59882518e-05,
        1.04112204e-01])

Next, we can compare the estimated mean vector and covariance matrix with the expected values:
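The expected values follow from the marginal distribution of $\mu$ under the normal-inverse-Wishart prior with location $\mu_\mathrm{p}$, which is a multivariate Student-t distribution with $\nu - D + 1$ degrees of freedom ($D = 2$):

$$
\mu \sim t_{\nu - D + 1}\!\left(\mu_\mathrm{p},\ \frac{\Psi}{\kappa\,(\nu - D + 1)}\right),
\qquad
\mathbb{E}[\mu] = \mu_\mathrm{p},
\qquad
\operatorname{Cov}[\mu] = \frac{\nu - D + 1}{\nu - D - 1}\,\frac{\Psi}{\kappa\,(\nu - D + 1)} = \frac{\Psi}{\kappa\,(\nu - D - 1)}.
$$

For the prior used here ($\nu = 4$, $\kappa = 1$), this gives three degrees of freedom and $\operatorname{Cov}[\mu] = \Psi$.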

In [18]:
samples_prior.mean(axis=0)[:2] - prior_params["mu"]

array([-6.05727478e-05, -3.08505565e-03])

In [21]:
est_cov = np.cov(samples_prior[:,0], samples_prior[:,1])
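D = 2  # dimension of the mean vector
df = prior_params["nu"] - D + 1  # degrees of freedom of the marginal multivariate t distribution of mu
pref = df / (df - 2)  # covariance of a multivariate t distribution = df/(df-2) * scale matrix
est_cov - pref * prior_params["Psi"]/(prior_params["kappa"]*df)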

array([[-1.38596595e-05,  1.02884100e-04],
       [ 1.02884100e-04,  1.97642242e-02]])
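The prior can also be sampled exactly: draw $\Sigma$ from the inverse-Wishart distribution, then $\mu$ from a normal distribution with mean $\mu_\mathrm{p}$ and covariance $\Sigma/\kappa$. Such independent draws provide a cheap reference for the MCMC estimates above. With $\nu = 4$, the inverse-Wishart entries have infinite variance (it is finite only for $\nu > D + 3$), and the marginal of $\mu$ has only three degrees of freedom, so sample covariances converge slowly even for exact draws. A sketch with SciPy and NumPy, not part of the original analysis:

```python
import numpy as np
from scipy.stats import invwishart

# prior hyperparameters as in `prior_params`
mu_p = np.array([0.16, -15.9])
psi = np.array([[0.01**2, 0.0], [0.0, 0.32**2]])
kappa, nu, d = 1.0, 4, 2
n_samples = 200_000

rng = np.random.default_rng(5)
# exact ancestral sampling: Sigma ~ IW(Psi, nu), then mu | Sigma ~ N(mu_p, Sigma / kappa)
sigmas = invwishart.rvs(df=nu, scale=psi, size=n_samples, random_state=rng)  # shape (n_samples, 2, 2)
chol = np.linalg.cholesky(sigmas / kappa)  # batched lower Cholesky factors
mus = mu_p + np.einsum("nij,nj->ni", chol, rng.standard_normal((n_samples, d)))

print("mean of mu - mu_p:", mus.mean(axis=0) - mu_p)
print("Cov[mu] - Psi / (kappa (nu - d - 1)):\n", np.cov(mus.T) - psi / (kappa * (nu - d - 1)))
print("E[Sigma] - Psi / (nu - d - 1):\n", sigmas.mean(axis=0) - psi / (nu - d - 1))
```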

## Useful links

* [Covariances and Pearson correlation](https://cxwangyi.wordpress.com/2010/08/29/pearson-correlation-coefficient-covariance-matrix-and-linear-dependency/)
* [emcee tutorial for fitting (1)](https://prappleizer.github.io/Tutorials/MCMC/MCMC_Tutorial_Solution.html)
* [emcee tutorial for fitting (2)](https://users.obs.carnegiescience.edu/cburns/ipynbs/Emcee.html)
* [emcee tutorial for fitting (3)](https://emcee.readthedocs.io/en/stable/tutorials/line/)
* [multivariate normal distribution](https://online.stat.psu.edu/stat505/book/export/html/636)
* [A Note on Wishart and Inverse Wishart Priors for Covariance Matrix](https://jbds.isdsa.org/public/journals/1/html/v1n2/p2/)