# `scippr` inference module

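This notebook outlines the `scippr` inference procedure, which is based on hierarchical inference. To improve numerical performance, in particular to avoid floating-point underflow when many small probabilities are multiplied, all probabilities are calculated as log probabilities.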

In [None]:
import numpy as np
import astropy.cosmology as cosmology
from scipy.stats import norm
import emcee
from datetime import datetime

## Setting up the parameter space

First we set up the parameter space for the latent variables of supernova type $t$ and redshift $z$.  We consider $T=3$ types $\tau$ and $Z=100$ redshift bins $\Delta_{\zeta}$.

In [None]:
types = ['Ia', 'Ibc', 'II']
n_types = len(types)

n_zs = 100
redshift_bins = np.linspace(0.01, 1.201, num=n_zs, endpoint=True)

We do the same for the latent variable of distance modulus $\mu$. We establish $D=100$ bins $\Delta_{\nu}$ using a fiducial cosmology with $H_{0}=70\ \mathrm{km\,s^{-1}\,Mpc^{-1}}$ and $\Omega_{m, 0}=0.3$; this choice only defines the grid and will not affect the inference.

In [None]:
cosmo = cosmology.FlatLambdaCDM(H0=70, Om0=0.3)

distancemodulus_bins = cosmo.distmod(redshift_bins)
n_mus = n_zs
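For intuition about the grid, `cosmo.distmod()` returns $\mu = 5\log_{10}(d_{L}/10\,\mathrm{pc})$ in magnitudes. The sketch below reimplements this for a flat $\Lambda$CDM cosmology using only numpy and scipy, ignoring radiation (an assumption made for simplicity; it is not a replacement for astropy). The helper name `distmod_flat_lcdm` is hypothetical. It also shows that at fixed $\Omega_{m,0}$, changing $H_{0}$ shifts every $\mu$ by the same constant $-5\log_{10}(H_{0}/70)$, because $d_{L}\propto 1/H_{0}$.

```python
import numpy as np
from scipy.integrate import quad

C_KM_S = 299792.458  # speed of light in km/s

def distmod_flat_lcdm(z, H0=70.0, Om0=0.3):
    """Distance modulus (mag) of a flat LambdaCDM cosmology without radiation; illustrative sketch."""
    def inv_E(zp):
        # 1 / E(z'), where E(z) = H(z) / H0 for flat LambdaCDM without radiation
        return 1.0 / np.sqrt(Om0 * (1.0 + zp) ** 3 + (1.0 - Om0))
    z = np.atleast_1d(z).astype(float)
    # comoving distance in Mpc: (c / H0) * integral_0^z dz' / E(z')
    d_c = np.array([quad(inv_E, 0.0, zi)[0] for zi in z]) * C_KM_S / H0
    d_l = (1.0 + z) * d_c  # luminosity distance in Mpc
    # mu = 5 log10(d_L / 10 pc) = 5 log10(d_L / Mpc) + 25
    return 5.0 * np.log10(d_l) + 25.0

zs = np.linspace(0.01, 1.201, num=100, endpoint=True)
mu_70 = distmod_flat_lcdm(zs, H0=70.0)
mu_72 = distmod_flat_lcdm(zs, H0=72.0)
# at fixed Om0, every mu shifts by the same amount, -5 log10(72 / 70)
print(mu_72[0] - mu_70[0], -5.0 * np.log10(72.0 / 70.0))
```

With $\Omega_{m,0}$ held fixed, inferring $H_{0}$ therefore amounts to inferring a single additive offset in $\mu$ across all redshifts.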

In [None]:
# not sure exactly where to put this function
def mu_binner(mu):
    """
    takes a value of mu and is meant to return a vector of length n_mus with 1 in the bin where mu falls and 0 elsewhere
    placeholder: currently returns all ones, i.e. it places no constraint on mu
    """
    vector = np.ones(n_mus)
    return vector
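The docstring of `mu_binner` describes a one-hot vector. A minimal sketch of that behavior uses `np.digitize`, assuming explicit bin edges (one more edge than bins), since the notebook does not say whether `distancemodulus_bins` holds edges or centers. The name `one_hot_mu_binner`, its `mu_edges` argument, and the toy edges are hypothetical; values outside the grid are clipped into the first or last bin, which is also an assumption.

```python
import numpy as np

def one_hot_mu_binner(mu, mu_edges):
    """
    Hypothetical sketch: return a length-(len(mu_edges) - 1) vector with 1 in the bin
    [mu_edges[i], mu_edges[i+1]) containing mu and 0 elsewhere.
    Values outside the grid are clipped into the first or last bin.
    """
    n_bins = len(mu_edges) - 1
    # np.digitize returns i such that mu_edges[i-1] <= mu < mu_edges[i]
    index = np.clip(np.digitize(mu, mu_edges) - 1, 0, n_bins - 1)
    vector = np.zeros(n_bins)
    vector[index] = 1.0
    return vector

edges = np.linspace(33.0, 45.0, num=101)  # hypothetical edges giving 100 mu bins of width 0.12
v = one_hot_mu_binner(40.05, edges)
print(v.sum(), np.argmax(v))
```

Taking the logarithm of such a vector gives 0 in the occupied bin and $-\infty$ elsewhere, which is the discretized log-$\delta$ function the hyperlikelihood needs.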

## Introducing the log-interim posteriors and interim hyperparameters

We introduce a placeholder for the catalog of log-interim posteriors $\ln[p(t_{n}, z_{n}, \mu_{n} | \underline{\ell}_{n}, \vec{m}_{n}, \vec{\theta}^{*}, \underline{\phi}^{*})]$ over these three latent variables for $N=10$ supernovae $n$. Each log-interim posterior is a three-dimensional log-probability distribution over the latent variables, conditioned on the observed lightcurves $\underline{\ell}_{n}$ and host galaxy photometry $\vec{m}_{n}$ as well as on the interim hyperparameters for the cosmology $\vec{\theta}^{*}$ and for the astrophysics $\underline{\phi}^{*}$. The log-interim posteriors will be described in more detail in a separate mock data generation notebook; here it suffices to note that the catalog is a four-dimensional $N\times T\times Z\times D$ array, which for now is filled with zeros (flat in log space).

In [None]:
n_SNe = 10

lninterimposteriors = np.zeros((n_SNe, n_types, n_zs, n_mus))
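The zero-filled placeholder is unnormalized. If normalized log-interim posteriors are wanted, the normalization can be done entirely in log space. This sketch assumes every grid cell carries equal weight (bin widths are ignored) and uses `scipy.special.logsumexp`; the helper name `normalize_log_probs` is hypothetical.

```python
import numpy as np
from scipy.special import logsumexp

def normalize_log_probs(lnp, axes):
    """Shift a log-probability array so that exp(lnp) sums to 1 over the given axes."""
    return lnp - logsumexp(lnp, axis=axes, keepdims=True)

n_SNe, n_types, n_zs, n_mus = 10, 3, 100, 100
# flat, unnormalized placeholder catalog with the same shape as in the notebook
lninterimposteriors = np.zeros((n_SNe, n_types, n_zs, n_mus))
lnnormed = normalize_log_probs(lninterimposteriors, axes=(1, 2, 3))
# each supernova's posterior now sums to one over its T x Z x D grid
print(np.exp(lnnormed).sum(axis=(1, 2, 3)))
```

For the flat placeholder every entry becomes $-\ln(T\,Z\,D)=-\ln(30000)$.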

The interim posteriors must always come with the interim hyperparameters $\vec{\theta}^{*}$ and $\underline{\phi}^{*}$ used to produce them.

In [None]:
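# flat placeholder redshift distribution for each type, shape (n_types, n_zs); the shape must be passed as a tuple
n_of_z = np.ones((n_types, n_zs))
interimhyperparameters = {'theta': {'H0': 72}, 'phi': np.log(n_of_z)}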
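Here $\underline{\phi}^{*}$ is a $T\times Z$ table of log values. As a sketch of how $\ln[p(t_{n}, z_{n} | \underline{\phi})]$ could be read from such a lookup table, the code below normalizes the table into a probability mass function over $(t, z)$ and looks up the entry for a type name and the nearest redshift grid point. The normalization, the nearest-point rule (treating `redshift_bins` as bin centers), and the helper `lookup_lnp_tz` are all assumptions for illustration.

```python
import numpy as np

types = ['Ia', 'Ibc', 'II']
redshift_bins = np.linspace(0.01, 1.201, num=100, endpoint=True)

# relative rates per (type, redshift bin); flat here, as in the notebook
n_of_z = np.ones((len(types), len(redshift_bins)))
# assumption: normalize so the table is a probability mass function over (t, z), then take the log
phi = np.log(n_of_z / n_of_z.sum())

def lookup_lnp_tz(phi, t, z):
    """Hypothetical helper: ln p(t, z | phi) for type name t and the redshift grid point nearest z."""
    t_index = types.index(t)
    z_index = np.argmin(np.abs(redshift_bins - z))
    return phi[t_index, z_index]

print(lookup_lnp_tz(phi, 'Ia', 0.5))
```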

## Choosing the log-hyperprior probability distribution

As in any Bayesian inference, we must choose a hyperprior distribution over the hyperparameters $\vec{\theta}$ and $\underline{\phi}$ that we wish to estimate. At this stage we will only attempt to infer $\vec{\theta}=H_{0}$. We choose a Gaussian hyperprior distribution over this hyperparameter, i.e. $H_{0}\sim\mathcal{N}(70, 2^{2})$.

In [None]:
priorhyperparameters = {'H0': {'mean':70, 'sigma':2}}

When we evaluate the log-hyperprior probability $\ln[p(\vec{\theta}, \underline{\phi})]$, we will then be evaluating the log-probability of the hyperprior distribution at the given values of the hyperparameters $\vec{\theta}$ and $\underline{\phi}$.  Thus, the log-hyperprior probability is a scalar as expected.

In [None]:
def lnprior(hyperparameters):
    """log-hyperprior; hyperparameters is the parameter vector [H0] passed in by emcee"""
    H0 = hyperparameters[0]
    H0prior = norm.logpdf(H0, loc=priorhyperparameters['H0']['mean'], scale=priorhyperparameters['H0']['sigma'])
    return H0prior
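Written out explicitly, the Gaussian log-hyperprior is $\ln p(H_{0}) = -\frac{1}{2}\left(\frac{H_{0}-70}{2}\right)^{2} - \ln(2\sqrt{2\pi})$. The sketch below checks this closed form against `scipy.stats.norm.logpdf`; the helper name `gaussian_logpdf` is hypothetical. One consequence worth noting: the interim value $H_{0}^{*}=72$ sits one standard deviation from the hyperprior mean, so its log-hyperprior is exactly $0.5$ below the peak.

```python
import numpy as np
from scipy.stats import norm

mean, sigma = 70.0, 2.0  # hyperprior on H0, matching priorhyperparameters

def gaussian_logpdf(x, mean, sigma):
    """ln N(x; mean, sigma^2) written out explicitly."""
    return -0.5 * ((x - mean) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

for H0 in [66.0, 70.0, 72.0]:
    print(H0, gaussian_logpdf(H0, mean, sigma), norm.logpdf(H0, loc=mean, scale=sigma))
```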

## Calculating the log-hyperlikelihood

The log-hyperlikelihood $\ln[p(t_{n}, z_{n}, \mu_{n} | \vec{\theta}, \underline{\phi})] = \ln[p(t_{n}, z_{n} | \underline{\phi})]+\ln[p(\mu_{n} | z_{n}, \vec{\theta})]$ is the sum of two terms that are separable in the hyperparameters. In our parametrization, the first term is read from a lookup table, and the second is the log of a $\delta$ function located at the distance modulus returned by `cosmo.distmod()` at the given redshift, where `cosmo` is defined using the cosmological parameters in $\vec{\theta}$.

In [None]:
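def lnhyperlikelihood(hyperparameters):
    sample_cosmo = cosmology.FlatLambdaCDM(H0=hyperparameters['theta']['H0'], Om0=0.3)
    # delta function in mu for each redshift bin, shape (n_zs, n_mus); log(0) = -inf is intended
    with np.errstate(divide='ignore'):
        log_delta = np.log([mu_binner(mu) for mu in sample_cosmo.distmod(redshift_bins).value])
    # ln p(t, z | phi) + ln p(mu | z, theta), broadcast to shape (n_types, n_zs, n_mus)
    return hyperparameters['phi'][:, :, np.newaxis] + log_delta[np.newaxis, :, :]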
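Because the $\delta$ function in $\mu$ sits at a different distance modulus for each redshift, one way to lay out the two terms is as a $(T, Z)$ table of $\ln[p(t, z | \underline{\phi})]$ plus a $(Z, D)$ array of log-$\delta$ values. This toy sketch, with made-up small sizes and made-up occupied bins, shows how NumPy broadcasting combines them into the $(T, Z, D)$ grid while keeping exactly one finite $\mu$ entry per type and redshift.

```python
import numpy as np

n_types, n_zs, n_mus = 3, 4, 5  # toy sizes instead of 3 x 100 x 100

phi = np.zeros((n_types, n_zs))              # ln p(t, z | phi), shape (T, Z)
log_delta = np.full((n_zs, n_mus), -np.inf)  # ln p(mu | z, theta), shape (Z, D)
log_delta[np.arange(n_zs), [0, 1, 3, 4]] = 0.0  # one occupied mu bin per redshift (toy values)

# (T, Z, 1) + (1, Z, D) broadcasts to (T, Z, D)
lnhyper = phi[:, :, np.newaxis] + log_delta[np.newaxis, :, :]
print(lnhyper.shape)
print(np.isfinite(lnhyper).sum(axis=2))
```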

Before we construct the log-posterior probability function, note that every evaluation of it uses the log-hyperlikelihood evaluated at the interim hyperparameter values, so we compute that constant once now.

In [None]:
lninterimhyperlikelihood = lnhyperlikelihood(interimhyperparameters)

## Constructing the log-posterior probability

The full log-posterior probability takes the following form:
\begin{equation}
\ln[p(\vec{\theta}, \underline{\phi} | \{\underline{\ell}_{n}, \vec{m}_{n}\}_{N})] = \ln[p(\vec{\theta}, \underline{\phi})]+\sum_{n=1}^{N}\ \ln\left[\iiint\ \exp\left[\ln[p(t_{n}, z_{n}, \mu_{n} | \underline{\ell}_{n}, \vec{m}_{n}, \vec{\theta}^{*}, \underline{\phi}^{*})]+\ln[p(t_{n}, z_{n}, \mu_{n} | \vec{\theta}, \underline{\phi})]-\ln[p(t_{n}, z_{n}, \mu_{n} | \vec{\theta}^{*}, \underline{\phi}^{*})]\right]\ d\mu_{n}\ dz_{n}\ dt_{n}\right] + \mathrm{const}
\end{equation}
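In words, the log-posterior is the log-hyperprior plus, for each supernova, the log of the integral of the exponentiated sum of the log-interim posterior, the log-hyperlikelihood, and the negative log-interim hyperlikelihood. Since the type is discrete and $z$ and $\mu$ are binned, the integral becomes a sum over the $T\times Z\times D$ grid. The additive constant does not depend on the hyperparameters and can be dropped for sampling.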

In [None]:
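from scipy.special import logsumexp

def lnprob(theta):
    # emcee passes theta = [H0]; phi is held fixed at its interim value because only H0 is inferred here
    hyperparameters = {'theta': {'H0': theta[0]}, 'phi': interimhyperparameters['phi']}
    # logsumexp over types, redshifts, and distance moduli gives the log of each supernova's integral
    lnintegrals = logsumexp(lninterimposteriors + lnhyperlikelihood(hyperparameters) - lninterimhyperlikelihood, axis=(1, 2, 3))
    return lnprior(theta) + np.sum(lnintegrals)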
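Evaluating the log of a sum of exponentials by exponentiating first, as a direct transcription of the equation would, underflows when the log-integrand is very negative. `scipy.special.logsumexp` computes the same quantity after shifting by the maximum, so it stays finite. The toy log-integrand below (two supernovae on a $2\times 2\times 2$ grid) is made up purely to illustrate the difference.

```python
import numpy as np
from scipy.special import logsumexp

# toy log-integrand: 2 supernovae on a 2 x 2 x 2 grid of (t, z, mu)
lnintegrand = np.full((2, 2, 2, 2), -1000.0)
lnintegrand[1] = -1.0

with np.errstate(divide='ignore'):
    # exp(-1000) underflows to 0, so the first supernova gets log(0) = -inf
    naive = np.log(np.sum(np.exp(lnintegrand), axis=(1, 2, 3)))
# logsumexp subtracts the maximum before exponentiating, avoiding the underflow
stable = logsumexp(lnintegrand, axis=(1, 2, 3))

print(naive)
print(stable)
```

Both approaches agree whenever the naive one does not underflow; the stable version gives $-1000+\ln 8$ for the first supernova instead of $-\infty$.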

## MCMC sampler

In [None]:
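mcrandstep = 10.**-1.  # standard deviation of the Gaussian ball used to initialize the walkers
nthreads = 1  # number of threads (only relevant for the `threads` argument of older emcee versions)
nwalkers = 150  # number of emcee walkers
nsteps = 2000  # number of MCMC iterations after burn-in
ninit = 200  # number of burn-in iterations
output_chain = True  # whether to write burn-in samples to disk
emcee_chain_output = "emcee_chain_testing.dat"  # output file, opened in append mode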

In [None]:
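def run_MCMC(theta, lnprob):
    ndim = len(theta)
    # initialize each walker in a small Gaussian ball around the starting point theta
    init_positions = [np.array(theta) + mcrandstep * np.random.randn(ndim) for i in range(nwalkers)]
    # emcee >= 3 API; parallelism would go through a `pool` argument instead of the deprecated `threads`
    sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob)
    # burn-in: ninit iterations without storing the chain in the sampler
    for i, state in enumerate(sampler.sample(init_positions, iterations=ninit, store=False)):
        position = state.coords
        lnprob_pos = state.log_prob
        if (i+1) % 20 == 0:
            print("Burn-in: {0:.1f}%".format(100 * float(i+1) / ninit))
            print(datetime.now())
        if output_chain and i % 10 == 0:
            # append every 10th step: one row per walker, parameters followed by log-probability
            with open(emcee_chain_output, "a") as f:
                for k in range(position.shape[0]):
                    for j in range(position.shape[1]):
                        f.write("{:+010.5f} ".format(position[k, j]))
                    f.write("{:+010.5f} ".format(lnprob_pos[k]))
                    f.write("\n")
    # production: discard the burn-in samples and run nsteps more iterations from the final positions
    sampler.reset()
    sampler.run_mcmc(position, nsteps)
    return sampler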
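`run_MCMC` writes every 10th burn-in step as whitespace-separated rows: each walker's parameters followed by its log-probability, each formatted with `{:+010.5f}`. Because the file is opened in append mode, rerunning the notebook appends to the same file. The sketch below writes a few rows in that format to a temporary file and reads them back with `np.loadtxt`; the helper `write_chain_rows`, the file name, and the walker values are hypothetical toy inputs, not real sampler output.

```python
import os
import tempfile
import numpy as np

def write_chain_rows(path, position, lnprob_pos):
    """Append rows in the same format run_MCMC uses: parameters then log-probability, per walker."""
    with open(path, "a") as f:
        for k in range(position.shape[0]):
            for j in range(position.shape[1]):
                f.write("{:+010.5f} ".format(position[k, j]))
            f.write("{:+010.5f} ".format(lnprob_pos[k]))
            f.write("\n")

path = os.path.join(tempfile.mkdtemp(), "emcee_chain_demo.dat")  # hypothetical file name
position = np.array([[71.5], [72.25], [69.875]])  # toy walker positions (3 walkers, 1 parameter)
lnprob_pos = np.array([-1.8, -2.1, -1.7])          # toy log-probabilities
write_chain_rows(path, position, lnprob_pos)

# last column is the log-probability, the rest are the parameters (here only H0)
chain = np.loadtxt(path, ndmin=2)
H0_samples, lnprobs = chain[:, :-1], chain[:, -1]
print(H0_samples.ravel(), lnprobs)
```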

In [None]:
# flat placeholder log-interim posteriors, shape (N, T, Z, D)
lninterimposteriors = np.zeros((n_SNe, n_types, n_zs, n_mus))

In [None]:
# starting point for the walkers: the interim value of H0
interimtheta = [72]
lninterimhyperlikelihood = lnhyperlikelihood(interimhyperparameters)

In [None]:
sampler = run_MCMC(interimtheta, lnprob)