# Tutorial: Fitting a Simple Microlensing Model to an OGLE Lightcurve

Here we'll pick up from the introduction in the [OGLE](ogle_lightcurve.ipynb) notebook. Our goal will be to
1. see our simpleminded Metropolis implementation struggle with a real-world problem, and then
2. apply a more efficient implementation or algorithm (using a package) for comparison.

## Data and model

Let's read in the data, as in the last notebook.

In [None]:
exec(open('tbc.py').read()) # define TBC and TBC_above
import numpy as np
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
import scipy.stats as st
%matplotlib inline
import incredible as cr
from corner import corner

In [None]:
TBC()
# dat = np.loadtxt('../ignore/phot.dat') # edit path if needed

In [None]:
t = dat[:,0]
I = dat[:,1]
Ierr = dat[:,2]

t0 = 2450000.
t -= t0

For convenience, we'll organize the data in a dictionary as follows.

In [None]:
data = {'t':t, 'I':I, 'Ierr':Ierr, 't0':t0}

Next, copy over your model evaluation function from the [`ogle_lightcurve`](ogle_lightcurve.ipynb) notebook.
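
If you don't have that notebook to hand, the sketch below shows roughly what such a function might look like. It assumes the standard point-source, point-lens magnification curve, with `p` the minimum impact parameter in units of the Einstein radius; check it against the definition in the `ogle_lightcurve` notebook before relying on it. The name `model_I_sketch` is hypothetical, chosen so that it doesn't overwrite your own `model_I`.

```python
import numpy as np

def model_I_sketch(t, I0, p, tmax, tE):
    """
    Hypothetical stand-in for model_I: point-lens magnification in magnitudes.
    Assumes u(t) = sqrt(p^2 + ((t - tmax)/tE)^2)
    and     A(u) = (u^2 + 2) / (u * sqrt(u^2 + 4)).
    """
    t = np.asarray(t, dtype=float)
    u = np.sqrt(p**2 + ((t - tmax) / tE)**2)       # lens-source separation
    A = (u**2 + 2.0) / (u * np.sqrt(u**2 + 4.0))   # magnification, A >= 1
    return I0 - 2.5 * np.log10(A)                  # brighter means smaller magnitude
```

Far from the peak the magnification tends to 1, so the curve returns to the baseline `I0`; at `t = tmax` the magnitude is at its smallest (brightest) value.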

In [None]:
def model_I(t, I0, p, tmax, tE):
    """
    Return the model lightcurve in magnitude units, I(t)
    """
    TBC()

TBC_above()

Next, sketch the PGM and write out the probability expressions corresponding to the data set and the model given in the [`ogle_lightcurve`](ogle_lightcurve.ipynb) notebook. (We'll think about the priors below.)

> _TBC_
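
As a starting point, one plausible set of expressions is sketched here. It assumes that the measurements are independent and that the quoted uncertainties `Ierr` are Gaussian standard deviations in magnitude; both assumptions should be checked against the model definition in the `ogle_lightcurve` notebook. In this notation, $\theta=(I_0,p,t_\mathrm{max},t_\mathrm{E})$, $I(t_i;\theta)$ is `model_I` evaluated at time $t_i$, $\sigma_i$ is the corresponding entry of `Ierr`, and $P(\cdot)$ denotes a probability density.

$$
\begin{aligned}
\theta &\sim P(I_0)\,P(p)\,P(t_\mathrm{max})\,P(t_\mathrm{E}) \\
I_i \mid \theta &\sim \mathrm{Normal}\left(I(t_i;\theta),\,\sigma_i\right), \quad i=1,\ldots,N \\
P(\theta \mid \{I_i\}) &\propto P(\theta)\,\prod_{i=1}^{N}\mathrm{Normal}\left(I_i \mid I(t_i;\theta),\,\sigma_i\right)
\end{aligned}
$$

In the PGM, the times $t_i$ and uncertainties $\sigma_i$ would be fixed, the four parameters would be free nodes, and each $I_i$ would be an observed node inside a plate over $i$.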

Finally, we need to choose priors. As always, you can experiment with different choices if you think they're justified. But for concreteness, and to enable comparison with a known solution, consider the following as a default.

This seems like a situation where uniform priors are reasonable for all parameters. Note that $p\geq0$ is a physical requirement of the model definition (and $p>0$ is a numerical requirement, to avoid dividing by zero). Bounds for the prior distributions of $I_0$, $t_\mathrm{max}$ and $t_\mathrm{E}$ may not be obvious (strictly) a priori, but could be based on an absolutely minimal use of the data. For example, given that these lightcurves correspond to intervals in which the OGLE pipeline believes it found a microlensing event, it's reasonable to assume that $t_\mathrm{max}$ lies somewhere within the lightcurve, that the width $t_\mathrm{E}$ is less than the duration of the lightcurve, and that $I_0$ lies between the minimum and maximum of the measured $I(t)$ (maybe with an extra buffer of 1-2 magnitudes, if you want). Write down your chosen priors here.

> _TBC_
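
To make the "minimal use of the data" idea concrete, here is a small sketch that turns the suggestions above into uniform-prior limits. The helper name `uniform_prior_bounds` and the upper limit on `p` are assumptions made for illustration; the text above only requires $p>0$.

```python
import numpy as np

def uniform_prior_bounds(t, I, buffer_mag=1.0):
    """
    Hypothetical helper: (lower, upper) limits for uniform priors,
    based only on the time span and magnitude range of the data.
    """
    t = np.asarray(t, dtype=float)
    I = np.asarray(I, dtype=float)
    duration = t.max() - t.min()
    return {
        'I0': (I.min() - buffer_mag, I.max() + buffer_mag),  # measured range plus a buffer
        'p': (0.0, 2.0),             # p > 0 is required; the upper limit is an arbitrary assumption
        'tmax': (t.min(), t.max()),  # the peak lies within the lightcurve
        'tE': (0.0, duration),       # the width is less than the duration of the lightcurve
    }
```

With the arrays defined earlier, this could be called as `uniform_prior_bounds(t, I)`.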

## Model implementation

Implement log-prior, log-likelihood and log-posterior functions. The prototypes are of the same form we've been using, which is hopefully familiar now. For concreteness, and to agree with the argument list of `model_I`, let's call the parameters `I0`, `p`, `tmax` and `tE` in code, and also define a `params` dictionary as usual. The `data` argument will be the dictionary we actually called `data` above.
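
For orientation, the three functions might be structured along the following lines. This is a sketch under stated assumptions, not a reference solution: it assumes independent Gaussian measurement errors, uniform priors with the hypothetical limits in `prior_bounds`, and a point-lens form for the model. All names ending in `_sketch` are invented here so as not to clash with your own functions.

```python
import numpy as np
import scipy.stats as st

def model_I_sketch(t, I0, p, tmax, tE):
    # assumed point-lens form; substitute your own model_I
    u = np.sqrt(p**2 + ((t - tmax) / tE)**2)
    return I0 - 2.5 * np.log10((u**2 + 2.0) / (u * np.sqrt(u**2 + 4.0)))

# hypothetical uniform-prior limits (lower, upper); replace with your own choices
prior_bounds = {'I0': (15.0, 25.0), 'p': (0.0, 2.0),
                'tmax': (7000.0, 8000.0), 'tE': (0.0, 1000.0)}

def log_prior_sketch(**params):
    # sum of independent uniform log-densities; -inf on or outside the limits
    lnp = 0.0
    for name, (lo, hi) in prior_bounds.items():
        if not (lo < params[name] < hi):
            return -np.inf
        lnp -= np.log(hi - lo)
    return lnp

def log_likelihood_sketch(data, **params):
    # independent Gaussian errors in magnitude (an assumption)
    mu = model_I_sketch(data['t'], params['I0'], params['p'], params['tmax'], params['tE'])
    return np.sum(st.norm.logpdf(data['I'], loc=mu, scale=data['Ierr']))

def log_posterior_sketch(data, **params):
    lnp = log_prior_sketch(**params)
    if not np.isfinite(lnp):
        return -np.inf  # skip the likelihood where the prior is zero
    return lnp + log_likelihood_sketch(data, **params)
```

Returning early when the prior is zero avoids evaluating the model at unphysical values such as $p\leq0$.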

In [None]:
TBC() # params = {'I0': ... put in broadly reasonable starting parameters from the previous notebook

In [None]:
def log_prior(**params):
    TBC()
    
TBC_above()

In [None]:
# sanity check
log_prior(**params)

In [None]:
def log_likelihood(data, **params):
    TBC()
    
TBC_above()

In [None]:
# sanity check
log_likelihood(data, **params)

In [None]:
def log_posterior(data, **params):
    TBC()
    
TBC_above()

In [None]:
# sanity check
log_posterior(data, **params)

You can either use your guess as a starting point for a chain, or insert some cells here and use a numerical minimizer to get `params` closer to the best fit.
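
If you go the minimizer route, one option is `scipy.optimize.minimize` applied to the negative log-posterior. The sketch below wraps a dictionary-style log-posterior so that it accepts a parameter array; the helper name `maximize_posterior` and the choice of the Nelder-Mead method are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def maximize_posterior(log_post, data, start, names=('I0', 'p', 'tmax', 'tE')):
    """
    Hypothetical helper: minimize -log_post(data, **params) with Nelder-Mead,
    starting from the dictionary `start`; returns the best-fit dictionary.
    """
    def objective(x):
        val = -log_post(data, **dict(zip(names, x)))
        # return a large finite number where the posterior is zero
        return val if np.isfinite(val) else 1e300

    x0 = np.array([start[k] for k in names], dtype=float)
    result = minimize(objective, x0, method='Nelder-Mead')
    return dict(zip(names, result.x))
```

In this notebook it could be used as `params = maximize_posterior(log_posterior, data, params)`. Nelder-Mead needs no derivatives, which suits a posterior with hard prior edges, but it can stall, so it's worth checking that the log-posterior actually increased.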

## Fitting with simple Metropolis

We'll first try to use your Metropolis implementation from the [AGN photometry tutorials](agn_photometry_metro.ipynb), and see how well that does.

Define a proposal distribution with guesses for step sizes for each parameter, as we did in that notebook.

In [None]:
TBC()
# proposal_distribution = {'I0': ...

You can copy over your `propose` and `step` functions from that notebook also.
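
As a reminder of the structure, here is a minimal sketch of a Metropolis update with a symmetric proposal. It assumes that each entry of the proposal dictionary is a zero-centered frozen `scipy.stats` distribution, and, unlike the prototypes below, it takes the log-posterior function as an extra `log_post` argument so that the example is self-contained. The names `propose_sketch` and `step_sketch` are hypothetical.

```python
import numpy as np
import scipy.stats as st

def propose_sketch(current_params, dists):
    # add an independent random offset to each parameter
    return {k: v + dists[k].rvs() for k, v in current_params.items()}

def step_sketch(data, current_params, current_lnP, proposal_dists, log_post):
    trial_params = propose_sketch(current_params, proposal_dists)
    trial_lnP = log_post(data, **trial_params)
    # symmetric proposal: accept with probability min(1, P_trial / P_current)
    if np.log(np.random.rand()) < trial_lnP - current_lnP:
        return trial_params, trial_lnP
    return current_params, current_lnP

# example proposal with made-up step sizes; tune these for your own chain
example_proposal = {'I0': st.norm(loc=0.0, scale=0.002),
                    'p': st.norm(loc=0.0, scale=0.001),
                    'tmax': st.norm(loc=0.0, scale=0.5),
                    'tE': st.norm(loc=0.0, scale=1.0)}
```

A trial point with zero posterior probability has `trial_lnP = -inf`, so the comparison fails and the chain stays where it is.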

In [None]:
def propose(current_params, dists):
    TBC()
    
def step(data, current_params, current_lnP, proposal_dists):
    TBC()
    
TBC_above()

Let's run a single chain to see how things are working. As before, you might want to go back and adjust the proposal distribution based on what you see below. There is nothing magic about the length of 10000 other than it's a nice round number that should be more than enough when we switch to a more advanced sampler. It should also be long enough for even a struggling sampler to move around at least a bit.

In [None]:
%%time
current_lnP = log_posterior(data, **params)

samples = np.zeros((10000, len(params)))
for i in range(samples.shape[0]):
    params, current_lnP = step(data, params, current_lnP, proposal_distribution)
    samples[i,:] = [params['I0'], params['p'], params['tmax'], params['tE']]

Here we plot the traces as usual.

In [None]:
param_labels = [r'$I_0$', r'$p$', r'$t_{max}$', r'$t_E$']

plt.rcParams['figure.figsize'] = (16.0, 12.0)
fig, ax = plt.subplots(len(param_labels), 1);
cr.plot_traces(samples, ax, labels=param_labels)

Assuming this looks broadly reasonable, we can now run a few more, with dispersed starting points. Remember that these chains don't need to look perfect, since we're also going to use a more advanced sampler below.

In fact, this is a pretty nasty parameter space, at least for my stupid Metropolis sampler, so dispersing starting points within very wide priors is a bad idea. The cell below disperses parameters by something like 5x the standard deviations of your test chain, which will hopefully be ok.

In [None]:
%%time
chains = [np.zeros((10000, len(params))) for j in range(4)]

for newsamples in chains:
    params = {'I0':st.norm.rvs(params['I0'], 5*np.std(samples[:,0])),
              'p':st.norm.rvs(params['p'], 5*np.std(samples[:,1])),
              'tmax':st.norm.rvs(params['tmax'], 5*np.std(samples[:,2])),
              'tE':st.norm.rvs(params['tE'], 5*np.std(samples[:,3]))
             }
    current_lnP = log_posterior(data, **params)
    for i in range(newsamples.shape[0]): # loop over the length of this chain, not of the test chain
        params, current_lnP = step(data, params, current_lnP, proposal_distribution)
        newsamples[i,:] = [params['I0'], params['p'], params['tmax'], params['tE']]
    print("Done with a chain")

As always, our next move is to inspect the trace plots. Remember what we're looking for from the [MCMC Diagnostics notebook](mcmc_diagnostics.ipynb)?

In [None]:
plt.rcParams['figure.figsize'] = (16.0, 12.0)
fig, ax = plt.subplots(len(param_labels), 1);
cr.plot_traces(chains, ax, labels=param_labels, Line2D_kwargs={'markersize':1.0})

Remove burn-in:

In [None]:
TBC() # burn = ...
chains = [chain[burn:,:] for chain in chains]

Have a look at the other diagnostics we covered.

In [None]:
TBC() # compute the Gelman-Rubin criterion for each parameter
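
If you'd rather see the calculation spelled out than call a package, the following sketch computes one common form of the Gelman-Rubin statistic directly with NumPy. The function name is hypothetical, and the exact convention (for example, small-sample correction factors) may differ from the one in the diagnostics notebook.

```python
import numpy as np

def gelman_rubin_sketch(chains):
    """
    Hypothetical helper: Gelman-Rubin R for each parameter, given a list of
    equal-length chains, each an (n_samples, n_params) array.
    """
    x = np.array(chains)                                 # shape (m, n, n_params)
    n = x.shape[1]
    W = np.mean(np.var(x, axis=1, ddof=1), axis=0)       # mean within-chain variance
    B = n * np.var(np.mean(x, axis=1), axis=0, ddof=1)   # between-chain variance
    V = (n - 1.0) / n * W + B / n                        # pooled variance estimate
    return np.sqrt(V / W)
```

Called as `gelman_rubin_sketch(chains)` after removing burn-in, it returns one value per parameter; values close to 1 indicate that the chains agree with one another.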

In [None]:
TBC() # compute the effective number of samples for each parameter
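
For reference, a rough estimate of the effective number of samples can be built from the chain's autocorrelation. The sketch below uses $n/(1+2\sum_k\rho_k)$ and truncates the sum at the first negative autocorrelation, which is a simple heuristic rather than the only possible choice; the function name and the `maxlag` default are assumptions.

```python
import numpy as np

def effective_samples_sketch(chain, maxlag=None):
    """
    Hypothetical helper: n / (1 + 2 * sum of autocorrelations) for each column
    of a single (n_samples, n_params) chain. The sum over lags stops at the
    first negative autocorrelation, or at `maxlag`.
    """
    chain = np.asarray(chain, dtype=float)
    n, npar = chain.shape
    maxlag = n // 4 if maxlag is None else maxlag
    neff = np.zeros(npar)
    for j in range(npar):
        x = chain[:, j] - chain[:, j].mean()
        var = np.dot(x, x) / n
        tau = 1.0
        for lag in range(1, maxlag + 1):
            rho = np.dot(x[:-lag], x[lag:]) / (n * var)  # autocorrelation at this lag
            if rho < 0.0:
                break
            tau += 2.0 * rho
        neff[j] = n / tau
    return neff
```

For several chains, summing the per-chain values (for example, `sum(effective_samples_sketch(c) for c in chains)`) gives an approximate total for each parameter.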

**Checkpoint:** Without too much fiddling, I was able to get good convergence, but not a particularly large $n_\mathrm{eff}$ (hundreds). If your chains are similar, you're in good shape, considering!

In fact, let's also quickly look at the posterior mean of each parameter as a cross check that your solution is broadly sound. Mine are about [19.822, 0.2662, 7434.4, 194.5]. Of course, this is only useful to know if you used the same data and priors.

In [None]:
np.mean(np.concatenate(chains, axis=0), axis=0)

For completeness, let's make a quick triangle plot. If you have as few effectively independent samples as I do, it will be ugly!

In [None]:
corner(np.concatenate(chains, axis=0), labels=param_labels);

## Fit with a better sampler

Now the fun, and more open-ended, part! Fit the same data and model, but using a different sampler. This sampler can be provided by some Python package that you can `pip` or `conda` install. In fact, we encourage this, as it forces you to learn to use software that might be useful in general.

You can use a more efficient Metropolis-Hastings sampler (e.g. with adaptation) or some other sampling method entirely. However, we would discourage treating these things as black boxes, so stick to methods where you have a reasonable idea what's happening under the hood. We refer you to the notes on [More Sampling Methods](../notes/more_samplers.ipynb) and our incomplete list of [sampling packages](../notes/MC_packages.ipynb), though in principle you need not restrict yourself to these.

Once you've installed and figured out how to use one of these things, run several chains as we did above and look at the usual diagnostics. (This assumes that the concept of "multiple chains" makes sense for the method you're using. If not, show and discuss whatever diagnostics make sense for that method.)

Verify that you get essentially the same results as above (modulo the poorer sampling of our simple Metropolis implementation; a visual check is fine for this), and comment on the relative efficiency of the two algorithms.

For compatibility with the remainder of the notebook, store your final list of samples in a single $N\times4$ array called `samples`. For multiple chains arranged as we're used to, like in the previous section, this could be done by `samples = np.concatenate(chains, axis=0)` (after removing burn-in).
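
To illustrate the "Metropolis with adaptation" option without committing to any particular package, here is a hand-rolled sketch. It is a Gaussian random-walk Metropolis sampler that re-estimates its proposal covariance from the chain during an initial adaptation phase and then freezes it, so that the later samples come from an ordinary (non-adaptive) Markov chain. The function name, the initial step sizes and the update interval of 100 steps are all assumptions; a packaged sampler would normally be preferable in practice.

```python
import numpy as np

def adaptive_metropolis_sketch(log_prob, x0, nsteps, nadapt, rng=None):
    """
    Hypothetical sampler: random-walk Metropolis whose proposal covariance is
    re-estimated from the chain during the first `nadapt` steps, then frozen.
    `log_prob` takes a parameter array and returns the log-posterior.
    Returns an (nsteps, ndim) array; discard at least the first `nadapt` rows.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    ndim = x.size
    lnp = log_prob(x)
    cov = np.diag((0.01 * np.abs(x) + 1e-3)**2)  # crude initial step sizes (assumption)
    scale = 2.38**2 / ndim                       # common scaling for random-walk proposals
    chain = np.zeros((nsteps, ndim))
    for i in range(nsteps):
        trial = rng.multivariate_normal(x, scale * cov)
        trial_lnp = log_prob(trial)
        if np.log(rng.uniform()) < trial_lnp - lnp:
            x, lnp = trial, trial_lnp
        chain[i] = x
        # update the proposal covariance every 100 steps, during adaptation only
        if 100 <= i < nadapt and i % 100 == 0:
            cov = np.cov(chain[:i + 1].T) + 1e-10 * np.eye(ndim)
    return chain
```

To use it with the functions in this notebook, the log-posterior would need a thin wrapper, for example `lambda x: log_posterior(data, **dict(zip(['I0', 'p', 'tmax', 'tE'], x)))`. Because the adapted proposal follows the correlations between parameters, it can move much more efficiently than independent per-parameter steps.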

In [None]:
TBC() # all the stuff above

> _TBC comments on the efficiency, results_

## Compare the fitted model to the data

As a simple check of whether the fit is reasonable, the cell below will plot the model curve defined by the posterior mean parameter values on top of the data. We can't possibly claim to be finished without looking at this!

In [None]:
mean_params = np.mean(samples, axis=0)

plt.rcParams['figure.figsize'] = (7.0, 5.0)
plt.errorbar(t, I, yerr=Ierr, fmt='none');
plt.xlabel('HJD - '+str(t0), fontsize=14);
plt.ylabel('I-band magnitude', fontsize=14);
plt.gca().invert_yaxis();
tgrid = np.linspace(t.min(), t.max(), 1000)
plt.plot(tgrid, model_I(tgrid, *mean_params));

## Summary of results

Similarly, we're not done without finding the 1D marginalized best values and 68.3% credible intervals for each parameter. Do so.
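
One simple way to do this is with percentiles of the samples. The sketch below reports the median and the central 68.3% interval of each column; this is one convention among several (a mode with a highest-density interval is another), and the helper name `summarize_1d` is hypothetical.

```python
import numpy as np

def summarize_1d(samples, names=('I0', 'p', 'tmax', 'tE')):
    """
    Hypothetical helper: median and central 68.3% interval of each column
    of `samples`. Returns {name: (median, minus_error, plus_error)}.
    """
    lo, med, hi = np.percentile(samples, [15.85, 50.0, 84.15], axis=0)
    return {k: (med[j], med[j] - lo[j], hi[j] - med[j]) for j, k in enumerate(names)}
```

For a roughly Gaussian marginal posterior, the two errors are nearly equal and can be quoted as a single $\pm$ value.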

In [None]:
TBC()

> $I_0 =$ this $\pm$ that, etc.

**Checkpoint:** For the reference data set and priors, I find $I_0=19.8230\pm0.0017$, $p=0.2663\pm0.0013$, $t_\mathrm{max}=7434.4\pm0.4$ and $t_\mathrm{E}=194.6\pm1.0$.