# Milestoning Calculations

In this notebook, we will use Markovian milestoning to estimate free energies and mean first passage times from short-trajectory data.

In [None]:
%matplotlib ipympl

import numpy as np
import matplotlib.pyplot as plt
from bkit.milestoning import (
    TrajectoryColoring, MarkovianMilestoningEstimator, MilestoneState)

First we load the trajectory data. This is assumed to have already been projected onto some reasonably low-dimensional collective variable ("coarse") space. Here, our coarse space is the subspace spanned by the first two principal components fit in the *define-milestones.ipynb* notebook.

In [None]:
npz = np.load('ptrajs.npz')
trajs = [npz[f] for f in npz.files]
print(f'Loaded {len(trajs)} trajectories.')

Next we load the anchors defined in *define-milestones.ipynb* and use them to construct a `TrajectoryColoring` object. This function/transformer object maintains an internal representation of the Voronoi tessellation generated by the anchor points. Following [TCC2020], we also specify a distance cutoff (here, the maximum distance to the nearest anchor). The region of state space beyond this cutoff is lumped into a single cell labeled `None`.

In [None]:
anchors = np.load('anchors.npy')
cutoff = 1.0
color = TrajectoryColoring(anchors, cutoff=cutoff)
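
To make the role of the anchors and the cutoff concrete, here is a minimal NumPy sketch of nearest-anchor assignment. It is not the `bkit` implementation: the helper `nearest_anchor`, the toy arrays, and the use of `-1` (instead of `None`) for the catch-all cell are assumptions made for illustration.

```python
import numpy as np

def nearest_anchor(points, anchors, cutoff=np.inf):
    """Return the index of the nearest anchor for each point, or -1 beyond the cutoff."""
    points = np.asarray(points, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    # Pairwise Euclidean distances, shape (n_points, n_anchors).
    dists = np.linalg.norm(points[:, None, :] - anchors[None, :, :], axis=-1)
    cells = dists.argmin(axis=1)
    # Points farther than `cutoff` from every anchor go to the catch-all cell.
    cells[dists.min(axis=1) > cutoff] = -1
    return cells

# Hypothetical toy data: two anchors and three points in a 2D coarse space.
toy_anchors = np.array([[0.0, 0.0], [2.0, 0.0]])
toy_points = np.array([[0.1, 0.2], [1.9, -0.3], [1.0, 5.0]])
toy_cells = nearest_anchor(toy_points, toy_anchors, cutoff=1.0)
print(toy_cells)
```

The third point lies more than `cutoff` away from both anchors, so it falls in the catch-all cell rather than in either Voronoi cell.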

Now we "color" each trajectory&mdash;that is, we map it to its milestone <a href="https://ncatlab.org/nlab/show/schedule">schedule</a> $((a_1, t_1),\dots,(a_n, t_n))$, where $a_1,\dots,a_n$ are the successive milestone states of the trajectory, and $t_1,\dots,t_n$ are the corresponding lifetimes.

In [None]:
schedules = [color(traj) for traj in trajs]

Let's take a look at one of the schedules:

In [None]:
print(schedules[0])
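
To illustrate what a schedule encodes, the sketch below converts a sequence of cell labels into (milestone, lifetime) pairs. It is a simplified stand-in, not the `bkit` algorithm: the helper `cells_to_schedule` is hypothetical, plain `frozenset` objects play the role of `MilestoneState`, the segment before the first crossing is dropped, and every change of cell is treated as a crossing of the interface between the two cells.

```python
from itertools import groupby

def cells_to_schedule(cells):
    """Map a sequence of cell labels to a list of (milestone, lifetime) pairs."""
    states = []
    current = None  # no milestone has been crossed yet
    prev = cells[0]
    for cell in cells[1:]:
        if cell != prev:
            # The trajectory crossed the interface between `prev` and `cell`.
            current = frozenset((prev, cell))
        if current is not None:
            states.append(current)
        prev = cell
    # Collapse runs of identical milestone states into (state, lifetime) pairs.
    return [(state, len(list(run))) for state, run in groupby(states)]

# Hypothetical cell sequence for a ten-frame trajectory.
cell_sequence = [0, 0, 1, 1, 1, 0, 0, 1, 2, 2]
toy_schedule = cells_to_schedule(cell_sequence)
print(toy_schedule)
```

Recrossing the same interface does not start a new entry: the trajectory above crosses the interface between cells 0 and 1 three times, but the milestone state only changes when it reaches the interface between cells 1 and 2.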

Here the milestone labels $a_1,\dots,a_n$ are objects of type `MilestoneState`, which inherits from Python's built-in `frozenset`. Each milestone corresponds to an unordered pair of cells in our state space partition.

The lifetimes $t_1,\dots,t_n$ are positive integers, counted in units of the time resolution $\Delta t$ of our trajectory data. Lifetimes of order 1 are a concern, as they indicate that we are bumping up against this time resolution. This may be remedied by decreasing $\Delta t$ (e.g., if the data were obtained by subsampling) or by choosing a coarser partition of state space. Note that in the calculations that follow, we ignore the systematic error resulting from the finiteness of $\Delta t$. (As an alternative, one might try estimating the rate matrix of a Markovian milestoning model by methods such as those described <a href="https://msmtools.readthedocs.io/en/latest/api/generated/msmtools.estimation.rate_matrix.html">here</a>.)

We use the milestone schedule data to estimate the parameters of a Markovian milestoning model (a Markov jump process on the milestone states). This can be done by fitting a `MarkovianMilestoningEstimator`. The parameter `reversible` indicates whether the estimates are required to satisfy detailed balance.

In [None]:
estimator = MarkovianMilestoningEstimator(reversible=True).fit(schedules)

**Remark:** In ordinary milestoning, the jump process on the milestone states is assumed to be a <a href="https://encyclopediaofmath.org/wiki/Semi-Markov_process">semi-Markov process</a>, which means that the sequence of milestones $a_1,a_2,\dots$ is governed by a <a href="https://en.wikipedia.org/wiki/Discrete-time_Markov_chain">discrete-time Markov chain</a>, while the lifetimes may follow arbitrary distributions. Markovian milestoning goes a step further and assumes that this semi-Markov process is in fact a <a href="https://en.wikipedia.org/wiki/Continuous-time_Markov_chain">continuous-time Markov chain</a>, so that the lifetimes are exponentially distributed. Such a process is determined by a "<a href="https://en.wikipedia.org/wiki/Hollow_matrix">hollow</a>" stochastic matrix $K\equiv (K_{ab})$ of jump probabilities and a vector $\boldsymbol{\tau}\equiv(\tau_a)$ of mean lifetimes, or equivalently by a rate matrix $Q$ with elements $Q_{ab} = \tau_a^{-1}(K_{ab} - \delta_{ab})$, where $\delta_{ab}$ is the Kronecker delta.
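
The correspondence between $(K, \boldsymbol{\tau})$ and $Q$ can be checked numerically. The following sketch uses a made-up three-state example (the values of `K` and `tau` are purely illustrative) and plain NumPy rather than any `bkit` functionality.

```python
import numpy as np

# Hypothetical hollow stochastic matrix of jump probabilities.
K = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
# Hypothetical mean lifetimes, one per state.
tau = np.array([2.0, 1.0, 4.0])

# Q_ab = (K_ab - delta_ab) / tau_a: each row a is scaled by 1 / tau_a.
Q = (K - np.eye(len(tau))) / tau[:, None]

# Going back: the diagonal of Q holds -1 / tau_a.
tau_back = -1.0 / np.diag(Q)
K_back = np.eye(len(tau)) + Q * tau_back[:, None]

print(Q)
print(Q.sum(axis=1))  # rows of a rate matrix sum to zero
```

Because $K$ is hollow, the diagonal of $Q$ carries only the lifetimes, which is why the two parametrizations are equivalent.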

The maximum likelihood `MarkovianMilestoningModel` given the data may be accessed via the estimator's `max_likelihood_estimate()` method.

In [None]:
model = estimator.max_likelihood_estimate()
print(model.transition_kernel)
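
For intuition, when detailed balance is not imposed, the maximum likelihood estimate of a continuous-time Markov chain has a simple closed form: jump probabilities are normalized transition counts, and each mean lifetime is the total time spent in a state divided by the number of observed exits from it. The sketch below illustrates this on made-up schedules. The helper `fit_mle` is hypothetical, integer labels stand in for `MilestoneState` objects, and the estimator fitted above with `reversible=True` solves a constrained problem that this sketch does not reproduce.

```python
import numpy as np

def fit_mle(schedules, n_states):
    """Closed-form (non-reversible) MLE of jump probabilities and mean lifetimes."""
    counts = np.zeros((n_states, n_states))
    total_time = np.zeros(n_states)
    for schedule in schedules:
        # Accumulate residence time, including the final (censored) visit.
        for a, t in schedule:
            total_time[a] += t
        # Count transitions between successive milestone states.
        for (a, _), (b, _) in zip(schedule[:-1], schedule[1:]):
            counts[a, b] += 1
    exits = counts.sum(axis=1)
    K = counts / exits[:, None]   # assumes every state is exited at least once
    tau = total_time / exits
    return K, tau

# Hypothetical schedules over three states labeled 0, 1, 2.
toy_schedules = [
    [(0, 3), (1, 2), (0, 5), (1, 4), (2, 6)],
    [(2, 2), (1, 1), (2, 4), (1, 3), (0, 2)],
]
K_hat, tau_hat = fit_mle(toy_schedules, n_states=3)
print(K_hat)
print(tau_hat)
```

Counting the last visit of each schedule toward the residence time but not toward the exits is the standard treatment of right-censored exponential lifetimes.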

Using this model, we can do things like plot the free energy as a function of milestone index:

In [None]:
kT = 0.593  # thermal energy k_B*T at about 298 K, in kcal/mol
energy_unit = 'kcal/mol'
indices = range(model.n_states)

f = -kT * np.log(model.stationary_probability)

fig, ax = plt.subplots()
ax.plot(indices, f)
ax.set_ylabel(f'Free energy ({energy_unit})')
_ = ax.set_xlabel('Milestone index')
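
The stationary probability used above is the distribution $\pi$ satisfying $\pi Q = 0$ and $\sum_a \pi_a = 1$. As a rough sketch of how such a vector can be obtained (independent of how `bkit` computes `stationary_probability`), we can solve these equations by least squares for a made-up three-state rate matrix:

```python
import numpy as np

kT = 0.593  # kcal/mol, same value as in the cell above

# Hypothetical jump probabilities and mean lifetimes.
K = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
tau = np.array([2.0, 1.0, 4.0])
Q = (K - np.eye(len(tau))) / tau[:, None]

# Stack the balance equations Q^T pi = 0 with the normalization sum(pi) = 1.
A = np.vstack([Q.T, np.ones(len(tau))])
b = np.append(np.zeros(len(tau)), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Free energy relative to the most populated state.
free_energy = -kT * np.log(pi)
free_energy -= free_energy.min()
print(pi)
print(free_energy)
```

Shifting by the minimum is optional; only free energy differences between milestones are meaningful.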

Or we can look at the mean first passage time (MFPT) to a set of target milestones:

In [None]:
observation_interval = 1e-10  # time per trajectory frame, expressed in `time_unit`
time_unit = 'ms'
target = {MilestoneState(50, 51)}  # target set of MilestoneStates

mfpt = model.mfpt(target) * observation_interval

fig, ax = plt.subplots()
ax.plot(indices, mfpt)
ax.set_ylabel(f'MFPT ({time_unit})')
_ = ax.set_xlabel('Milestone index')

The maximum likelihood estimate is a single-point estimate&mdash;it does not tell us anything about statistical errors. To estimate statistical errors, we can draw a sample from the posterior probability distribution on the model parameter space.

In [None]:
sample = estimator.posterior_sample(size=1000)
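
To give a feel for what a posterior sample looks like, here is a sketch for the simpler case without detailed balance. Under an assumed improper prior, each row of $K$ follows a Dirichlet distribution with the transition counts as parameters, and each exit rate $1/\tau_a$ follows a gamma distribution with shape equal to the number of exits and rate equal to the total residence time. The helper `sample_posterior` and the toy counts are hypothetical, and the sampler behind `posterior_sample` with `reversible=True` is necessarily different.

```python
import numpy as np

def sample_posterior(counts, total_time, size, seed=None):
    """Draw (K, tau) pairs from a simple non-reversible posterior."""
    rng = np.random.default_rng(seed)
    n = counts.shape[0]
    exits = counts.sum(axis=1)
    Ks = np.zeros((size, n, n))
    taus = np.empty((size, n))
    for a in range(n):
        # Sample jump probabilities only over transitions that were observed.
        support = np.flatnonzero(counts[a])
        Ks[:, a, support] = rng.dirichlet(counts[a, support], size=size)
        # Exit rate ~ Gamma(shape=exits, rate=total_time); tau is its inverse.
        rates = rng.gamma(shape=exits[a], scale=1.0 / total_time[a], size=size)
        taus[:, a] = 1.0 / rates
    return Ks, taus

# Hypothetical sufficient statistics for three states.
toy_counts = np.array([[0.0, 2.0, 0.0],
                       [2.0, 0.0, 2.0],
                       [0.0, 2.0, 0.0]])
toy_total_time = np.array([10.0, 10.0, 12.0])
Ks, taus = sample_posterior(toy_counts, toy_total_time, size=200, seed=0)
print(Ks.mean(axis=0))
print(taus.mean(axis=0))
```

Each draw `(Ks[i], taus[i])` defines one model, and the spread of any derived quantity across draws reflects its statistical uncertainty.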

Taking another look at the free energy:

In [None]:
# Free energy profile of each sampled model (one entry per posterior draw)
fs = [-kT * np.log(m.stationary_probability) for m in sample]
f_mean = np.mean(fs, axis=0)
f_std = np.std(fs, axis=0)

fig, ax = plt.subplots()
ax.fill_between(indices, f_mean-f_std, f_mean+f_std, alpha=0.25)
ax.plot(indices, f_mean)
ax.set_ylabel(f'Free energy ({energy_unit})')
_ = ax.set_xlabel('Milestone index')

... and the MFPT:

In [None]:
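# MFPT profile of each sampled model (one entry per posterior draw)
mfpts = [m.mfpt(target) * observation_interval for m in sample]
mfpt_mean = np.mean(mfpts, axis=0)
mfpt_std = np.std(mfpts, axis=0)
# One-standard-deviation band, clipped at zero because an MFPT cannot be negative
lower = np.clip(mfpt_mean - mfpt_std, 0, None)
upper = mfpt_mean + mfpt_std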

fig, ax = plt.subplots()
ax.fill_between(indices, lower, upper, alpha=0.25)
ax.plot(indices, mfpt_mean)
ax.set_ylabel(f'MFPT ({time_unit})')
_ = ax.set_xlabel('Milestone index')
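
Because MFPTs are positive and their posterior distributions are often skewed, a band of one standard deviation about the mean can dip below zero, which is why the lower bound is clipped above. One possible alternative is a percentile-based band, sketched here on synthetic data. The array `mfpt_draws` is a random stand-in for the sampled MFPTs, not output from the model.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in: 1000 posterior draws of the MFPT for 5 milestones.
mfpt_draws = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 5))

# Median with a central 68% band (16th to 84th percentile).
mfpt_median = np.median(mfpt_draws, axis=0)
band_lower, band_upper = np.percentile(mfpt_draws, [16, 84], axis=0)

print(mfpt_median)
print(band_lower)
print(band_upper)
```

Percentiles of positive draws are themselves positive, so no clipping is needed, and the band is allowed to be asymmetric about the median.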