# Milestoning Calculations

In this notebook, we will use Markovian milestoning to estimate free energies and mean first passage times from short-trajectory data.

In [1]:
%matplotlib ipympl
import bkit.milestoning as milestoning
import matplotlib.pyplot as plt
import numpy as np
import pint
ureg = pint.UnitRegistry()

First we load the trajectory data. The data are assumed to have already been projected onto some reasonably low-dimensional collective variable space $\mathbb{Y}$. In our particular case, $\mathbb{Y}$ is the two-dimensional space spanned by PC1 and PC2.

In [2]:
npz = np.load('ptrajs.npz')
trajs = [npz[f] for f in npz.files]
print(f'Loaded {len(trajs)} trajectories.')

Loaded 12020 trajectories.


We also need to keep track of the time resolution of the trajectory data (the observation interval $\Delta t$).

In [3]:
dt = 0.1 * ureg.ps

In the previous notebook, we defined anchor points in $\mathbb{Y}$. Here we use them to construct a `TrajectoryColoring` object, a coarse graining that maintains an internal representation of the Voronoi partition generated by the anchor points. Following [TCC2020], we also specify a distance cutoff (here, the maximum distance to the nearest anchor point). The region of state space beyond this cutoff is lumped into a single "boundary cell" with index $-1$.

In [4]:
anchors = np.load('anchors.npy')
cutoff = 1.0
color = milestoning.TrajectoryColoring(anchors, cutoff=cutoff)
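
To make the partition concrete, here is a minimal NumPy sketch of a nearest-anchor assignment with a distance cutoff. It is an illustration only and not the `bkit` implementation; the function `assign_cells` and the toy anchors and points are hypothetical.

```python
import numpy as np

def assign_cells(points, anchors, cutoff):
    """Return the index of the nearest anchor for each point, or -1 beyond the cutoff."""
    points = np.asarray(points, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    # Pairwise Euclidean distances, shape (n_points, n_anchors)
    dists = np.linalg.norm(points[:, None, :] - anchors[None, :, :], axis=-1)
    cells = dists.argmin(axis=1)
    # Points farther than the cutoff from every anchor go to the boundary cell
    cells[dists.min(axis=1) > cutoff] = -1
    return cells

# Hypothetical toy data: two anchors and three points in a 2D space
toy_anchors = np.array([[0.0, 0.0], [2.0, 0.0]])
toy_points = np.array([[0.1, 0.2], [1.9, -0.3], [5.0, 5.0]])
cells = assign_cells(toy_points, toy_anchors, cutoff=1.0)
print(cells)
```

The third point lies farther than the cutoff from both anchors, so it is assigned to the boundary cell.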

Using this coarse graining, we map each trajectory to a milestone <a href="https://ncatlab.org/nlab/show/schedule">schedule</a> $[(a_1, t_1),\dots,(a_N, t_N)]$. Here $a_1,\dots,a_N$ are the successive milestones visited, and $t_1,\dots,t_N$ are the corresponding lifetimes (in units of $\Delta t$).

In [None]:
schedules = [color(traj) for traj in trajs]

Note: For those who prefer to think of the coarse graining as a transformer (à la scikit-learn), as opposed to a function, the last line may also be written as `schedules = color.transform(trajs)`.

Let's take a look at one of the schedules:

In [None]:
print(schedules[0])

The milestones $a_n$ are instances of `Milestone`, a class representing milestones indexed by a set (typically an unordered pair) of cell indices. 

The lifetimes $t_n$ are positive integers. Lifetimes of order 1 are a concern, because they indicate that milestone transitions are occurring on the scale of the time resolution $\Delta t$ of the trajectory data. This may be remedied by decreasing $\Delta t$ (e.g., if the data were obtained by subsampling) or by choosing a coarser partition of state space. Note that the calculations that follow simply ignore the systematic error resulting from the finiteness of $\Delta t$. (As an alternative, one might try estimating milestoning rate constants by methods such as those described in the <a href="https://msmtools.readthedocs.io/en/latest/api/generated/msmtools.estimation.rate_matrix.html">msmtools documentation for rate-matrix estimation</a>.)
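
As a rough illustration of what a schedule encodes, the sketch below run-length encodes a per-frame sequence of milestone labels into (label, lifetime) pairs and reports the fraction of lifetimes equal to a single frame. The helper `to_schedule` and the string labels are hypothetical stand-ins; how `bkit` derives the milestone at each frame is not shown here.

```python
from itertools import groupby

def to_schedule(labels):
    """Run-length encode a per-frame label sequence into (label, lifetime) pairs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(labels)]

# Hypothetical per-frame milestone labels for one short trajectory
toy_labels = ['a', 'a', 'a', 'b', 'c', 'c', 'b', 'b', 'b', 'b']
toy_schedule = to_schedule(toy_labels)
print(toy_schedule)

# Fraction of lifetimes that last only one frame (one unit of Δt)
lifetimes = [t for _, t in toy_schedule]
frac_short = sum(t == 1 for t in lifetimes) / len(lifetimes)
print(f'Fraction of lifetimes equal to one frame: {frac_short:.2f}')
```

A large fraction of one-frame lifetimes would suggest that the partition is too fine for the available time resolution.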

We use the coarse-grained trajectory data to estimate the parameters of a Markovian milestoning model (i.e., a Markov jump process whose states are the milestones $a, b, \dots$). This can be done by fitting a `MarkovianMilestoningEstimator`. The Boolean parameter `reversible` indicates whether to constrain the estimator to the space of reversible models, that is, those obeying detailed balance.

In [None]:
estimator = milestoning.MarkovianMilestoningEstimator(reversible=True).fit(schedules)

**Remark:** In ordinary milestoning, the milestoning process (whose realizations are represented here in the form of schedules $[(a_1,t_1),(a_2,t_2),\dots]$) is assumed to be a <a href="https://encyclopediaofmath.org/wiki/Semi-Markov_process">semi-Markov process</a>: the sequence of milestones $a_1,a_2,\dots$ is governed by a <a href="https://en.wikipedia.org/wiki/Discrete-time_Markov_chain">discrete-time Markov chain</a>, while the lifetimes may follow arbitrary distributions. Markovian milestoning goes a step further and assumes that this semi-Markov process is in fact a <a href="https://en.wikipedia.org/wiki/Continuous-time_Markov_chain">continuous-time Markov chain</a>, so that the lifetimes are exponentially distributed. Such a process is characterized completely by a "<a href="https://en.wikipedia.org/wiki/Hollow_matrix">hollow</a>" stochastic matrix $K\equiv (K_{ab})$ of jump probabilities and a vector $\boldsymbol{\tau}\equiv(\tau_a)$ of mean lifetimes, or equivalently by a rate matrix $Q$ with elements $Q_{ab} = \tau_a^{-1}(K_{ab} - \delta_{ab})$.
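
The relation between $(K, \boldsymbol{\tau})$ and $Q$ can be checked numerically. The following sketch uses a hypothetical three-milestone example (the values of `K_toy` and `tau_toy` are made up and not derived from the trajectory data) to build $Q$ and then recover $K$ and $\boldsymbol{\tau}$ from it.

```python
import numpy as np

# Hypothetical three-milestone example
K_toy = np.array([[0.0, 1.0, 0.0],
                  [0.4, 0.0, 0.6],
                  [0.0, 1.0, 0.0]])  # hollow, rows sum to 1
tau_toy = np.array([2.0, 5.0, 4.0])  # mean lifetimes, in units of Δt

# Q_ab = (K_ab - δ_ab) / τ_a
Q_toy = (K_toy - np.eye(3)) / tau_toy[:, None]
print(Q_toy)

# Each row of a rate matrix sums to zero
row_sums = Q_toy.sum(axis=1)

# Invert the relation: τ_a = -1 / Q_aa and K = I + diag(τ) Q
tau_back = -1.0 / np.diag(Q_toy)
K_back = np.eye(3) + tau_back[:, None] * Q_toy
```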

The maximum likelihood `MarkovianMilestoningModel` given the data may be accessed via the estimator's `max_likelihood_estimate()` method.

In [None]:
model = estimator.max_likelihood_estimate()
print(model.transition_kernel)
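
For intuition, the sketch below computes the standard maximum likelihood estimate for an unconstrained (not necessarily reversible) Markov jump process from a set of schedules: jump probabilities are normalized transition counts, and each mean lifetime is the total time spent on a milestone divided by the number of observed exits from it. This is not the reversible estimator fitted above, which solves a constrained problem, and it is not taken from `bkit`. The function `fit_unconstrained`, the integer milestone labels, and the toy schedules are all hypothetical.

```python
import numpy as np

def fit_unconstrained(schedules, n_states):
    """Unconstrained MLE of (K, tau) from schedules with integer state labels.

    Assumes every state has at least one observed exit.
    """
    counts = np.zeros((n_states, n_states))
    time = np.zeros(n_states)
    for schedule in schedules:
        states = [a for a, _ in schedule]
        # Total time on each milestone, including the final (censored) visit
        for a, t in schedule:
            time[a] += t
        # Observed transitions between successive milestones
        for a, b in zip(states[:-1], states[1:]):
            counts[a, b] += 1
    exits = counts.sum(axis=1)
    K = counts / exits[:, None]
    tau = time / exits
    return K, tau

# Hypothetical schedules with lifetimes in units of Δt
toy_schedules = [
    [(0, 3), (1, 2), (0, 1), (1, 4)],
    [(1, 1), (2, 2), (1, 3), (0, 2)],
]
K_hat, tau_hat = fit_unconstrained(toy_schedules, n_states=3)
print(K_hat)
print(tau_hat)
```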

Using this model, we can, for example, plot the free energy $-kT\ln\pi_a$ as a function of milestone index, where $\pi_a$ is the stationary probability of milestone $a$:

In [None]:
kT = 0.593 * ureg.kcal / ureg.mol
f = -kT * np.log(model.stationary_probability)

fig, ax = plt.subplots()
ax.plot(range(model.n_states), f.magnitude)
ax.set_ylabel('Free energy ({:~})'.format(f.units))
_ = ax.set_xlabel('Milestone index')
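
As a sketch of where such a stationary distribution comes from, the example below solves $\pi Q = 0$ with $\sum_a \pi_a = 1$ for a hypothetical three-milestone rate matrix and converts the result to free energies. It assumes that `stationary_probability` refers to the stationary distribution of the continuous-time process, which is not confirmed here; the toy values are made up.

```python
import numpy as np

# Hypothetical three-milestone model
K_toy = np.array([[0.0, 1.0, 0.0],
                  [0.4, 0.0, 0.6],
                  [0.0, 1.0, 0.0]])
tau_toy = np.array([2.0, 5.0, 4.0])
Q_toy = (K_toy - np.eye(3)) / tau_toy[:, None]

# Solve pi Q = 0 together with the normalization sum(pi) = 1 (least squares)
A = np.vstack([Q_toy.T, np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi_toy, *_ = np.linalg.lstsq(A, b, rcond=None)

# Free energy relative to the most populated milestone
kT_toy = 0.593  # kcal/mol
F_toy = -kT_toy * np.log(pi_toy)
F_toy -= F_toy.min()
print(pi_toy, F_toy)
```

Equivalently, $\pi_a$ is proportional to $p_a \tau_a$, where $p$ is the stationary vector of the jump matrix $K$.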

Or we can look at the mean first passage times (MFPTs) to a target milestone (or set of milestones):

In [None]:
target = 50
mfpt = model.mfpt(target) * dt.to(ureg.microseconds)

fig, ax = plt.subplots()
ax.plot(range(model.n_states), mfpt.magnitude)
ax.set_ylabel('MFPT to milestone {} ({:~})'.format(target, mfpt.units))
_ = ax.set_xlabel('Milestone index')

In [None]:
mfpt[2]
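
For a Markov jump process with rate matrix $Q$, the MFPTs $m_a$ to a target milestone satisfy $\sum_b Q_{ab} m_b = -1$ for every $a$ other than the target, with $m = 0$ at the target. The sketch below solves this linear system for a hypothetical three-milestone model; `mfpt_to_target` is an illustrative helper and not the `bkit` method.

```python
import numpy as np

def mfpt_to_target(Q, target):
    """Solve for mean first passage times to a single target state."""
    n = Q.shape[0]
    rest = [a for a in range(n) if a != target]
    m = np.zeros(n)
    # Restrict Q to the non-target states and solve Q_rest m_rest = -1
    m[rest] = np.linalg.solve(Q[np.ix_(rest, rest)], -np.ones(len(rest)))
    return m

# Hypothetical three-milestone model
K_toy = np.array([[0.0, 1.0, 0.0],
                  [0.4, 0.0, 0.6],
                  [0.0, 1.0, 0.0]])
tau_toy = np.array([2.0, 5.0, 4.0])  # in units of Δt
Q_toy = (K_toy - np.eye(3)) / tau_toy[:, None]

m_toy = mfpt_to_target(Q_toy, target=2)
print(m_toy)  # in units of Δt; multiply by Δt to obtain physical times
```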

Note that the maximum-likelihood model, being a single point estimate, does not tell us anything about statistical errors. To estimate statistical errors, we can draw a sample from the posterior probability distribution on the parameter space of (reversible) Markovian milestoning models.

In [None]:
sampled_models = estimator.sample_posterior(n_samples=1000)

In [None]:
# Stack the per-model log-probabilities into an array before attaching units
fs = -kT * np.array([np.log(m.stationary_probability) for m in sampled_models])
f_mean = np.mean(fs, axis=0)
f_std = np.std(fs, axis=0)

fig, ax = plt.subplots()
ax.errorbar(range(model.n_states), f_mean.magnitude, f_std.magnitude)
ax.set_ylabel('Free energy ({:~})'.format(f_mean.units))
_ = ax.set_xlabel('Milestone index')

In [None]:
# Stack the per-model MFPT vectors into an array before attaching units
mfpts = np.array([m.mfpt(target) for m in sampled_models]) * dt.to(ureg.microseconds)
mfpt_mean = np.mean(mfpts, axis=0)
mfpt_std = np.std(mfpts, axis=0)

fig, ax = plt.subplots()
ax.errorbar(range(model.n_states), mfpt_mean.magnitude, mfpt_std.magnitude)
ax.set_ylabel('MFPT to milestone {} ({:~})'.format(target, mfpt_mean.units))
_ = ax.set_xlabel('Milestone index')
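
Posterior distributions of MFPTs can be skewed, in which case a mean and standard deviation may summarize them poorly. One alternative, sketched below, is to report the median together with a percentile-based interval. The array `samples` is synthetic stand-in data (log-normal draws), not output from `sample_posterior`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-model MFPT vectors: 1000 samples x 4 milestones
samples = rng.lognormal(mean=0.0, sigma=0.5, size=(1000, 4))

# Median and central 95% interval for each milestone
lower, median, upper = np.percentile(samples, [2.5, 50, 97.5], axis=0)
print(median)

# Asymmetric error bars in the form expected by ax.errorbar(..., yerr=yerr)
yerr = np.vstack([median - lower, upper - median])
```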