# Coarse-graining with PCCA+

Given a Markov state model describing a reversible Markov chain, i.e., a transition matrix $P\in\mathbb{R}^{n\times n}$ with stationary distribution $\pi$ that obeys the detailed balance condition

$$ \pi_i P_{ij} = \pi_j P_{ji}, $$

its transition matrix can be coarse-grained into metastable sets using the spectral clustering algorithm "Robust Perron Cluster Analysis" (PCCA+) <cite data-cite="nbpcca-roblitz2013fuzzy">(Röblitz, 2013)</cite>. The result of the clustering is a coarse-grained transition matrix

$$ P_\mathrm{macro}\in\mathbb{R}^{m\times m} $$

as well as membership probabilities $M\in\mathbb{R}^{n\times m}$, where row $i$ is a probability distribution over the $m$ macrostates giving the degree to which microstate $i$ belongs to each of them.
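To make detailed balance concrete, here is a small NumPy sketch on a hypothetical four-state chain (the count values are made up for illustration). Row-normalizing a symmetric count matrix always yields a reversible transition matrix, and detailed balance is equivalent to the flux matrix $F_{ij} = \pi_i P_{ij}$ being symmetric.

```python
import numpy as np

# Hypothetical symmetric count matrix: states {0, 1} and {2, 3} form two
# weakly coupled blocks. Row-normalizing a symmetric C gives a reversible P.
C = np.array([[900.0, 90.0, 1.0, 0.0],
              [90.0, 800.0, 0.0, 1.0],
              [1.0, 0.0, 700.0, 100.0],
              [0.0, 1.0, 100.0, 600.0]])
P = C / C.sum(axis=1, keepdims=True)

# Stationary distribution: left eigenvector of P for the eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi /= pi.sum()  # fixes both sign and normalization

# Detailed balance pi_i P_ij = pi_j P_ji  <=>  flux matrix F = diag(pi) P is symmetric.
flux = pi[:, None] * P
print("stationary distribution:", np.round(pi, 4))
print("detailed balance holds:", np.allclose(flux, flux.T))
```

For this construction the stationary distribution is simply the normalized row sums of $C$, which is why the flux matrix equals $C$ divided by its total and is therefore symmetric.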

To illustrate the API (API docs [here](../api/generated/sktime.markov.msm.MarkovStateModel.rst#sktime.markov.msm.MarkovStateModel.pcca) to cluster an existing MSM and [here](../api/generated/sktime.markov.pcca.rst) for a direct and functional interface), let us consider a jump process between two ellipsoids in a two-dimensional space:

In [None]:
import numpy as np
import sktime

data = sktime.data.ellipsoids(laziness=0.97, seed=32).observations(n_steps=5000)

We can visualize the time series as follows. Note that with `laziness=0.97`, the probability of jumping from one ellipsoid to the other is $3\%$ per frame.

In [None]:
import matplotlib as mpl
import matplotlib.pyplot as plt

f, ax = plt.subplots(1, 1, figsize=(12, 5))
ax.scatter(*data.T);
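The $3\%$ figure can be checked with a quick simulation. The sketch below assumes that `laziness` is the probability of staying in the current ellipsoid, so that a jump happens with probability $1 - 0.97$ per frame. It simulates only the hidden two-state jump sequence with NumPy (the seed is arbitrary), not the sktime data generator itself.

```python
import numpy as np

# Assumption: laziness = probability of staying in the current ellipsoid,
# so a jump to the other ellipsoid happens with probability 1 - laziness.
laziness = 0.97
n_steps = 5000
rng = np.random.default_rng(32)

# For two states, "jump or not" is a Bernoulli draw per frame; the current
# state is the parity of the number of jumps so far.
jumps = rng.random(n_steps - 1) < 1.0 - laziness
states = np.concatenate([[0], np.cumsum(jumps) % 2])

jump_fraction = np.mean(states[1:] != states[:-1])
print(f"empirical jump probability per frame: {jump_fraction:.3f}")
```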

To arrive at a Markov state model, let us cluster the data and assign the frames to the cluster centers.

In [None]:
clustering = sktime.clustering.KmeansClustering(n_clusters=20, fixed_seed=13) \
    .fit(data).fetch_model()
assignments = clustering.transform(data)
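`clustering.transform` maps each frame to a cluster center. Assuming it uses the usual k-means rule (closest center in Euclidean distance), the assignment step amounts to the following NumPy sketch; the helper `assign_to_nearest` and the toy centers and frames are hypothetical.

```python
import numpy as np

def assign_to_nearest(frames, centers):
    """Hypothetical helper: index of the closest center (Euclidean) for each frame."""
    # Pairwise squared distances, shape (n_frames, n_centers).
    d2 = ((frames[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.argmin(d2, axis=1)

centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
frames = np.array([[0.2, -0.1], [4.6, 0.3], [0.1, 4.9], [2.0, 0.5]])
print(assign_to_nearest(frames, centers))  # [0 1 2 0]
```

The result is a discrete trajectory: one integer state index per frame, which is exactly the input an MSM estimator needs.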

In [None]:
f, ax = plt.subplots(1, 1, figsize=(12, 5))
ax.scatter(*data.T, c=assignments/np.max(assignments))
ax.scatter(*clustering.cluster_centers.T, marker="*", c='red', label='Cluster centers')
ax.set_title('Cluster center assignments')
ax.legend();

From the assignments we can estimate a Markov state model, which has as many states as there are cluster centers.

In [None]:
msm = sktime.markov.msm.MaximumLikelihoodMSM() \
    .fit(assignments, lagtime=1).fetch_model()

print(f"Number of states: {msm.n_states}")
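Conceptually, an MSM estimator counts transitions $i \to j$ between frames separated by the lag time and normalizes these counts. The sketch below shows only the simplest, non-reversible maximum-likelihood estimate $T_{ij} = C_{ij} / \sum_k C_{ik}$ on a hypothetical discrete trajectory; `MaximumLikelihoodMSM` may additionally enforce reversibility or restrict to a connected set, which this sketch does not do. The helper `count_matrix` is hypothetical.

```python
import numpy as np

def count_matrix(dtraj, n_states, lagtime=1):
    """Hypothetical helper: count transitions dtraj[t] -> dtraj[t + lagtime]."""
    C = np.zeros((n_states, n_states))
    # np.add.at accumulates correctly even when the same (i, j) pair repeats.
    np.add.at(C, (dtraj[:-lagtime], dtraj[lagtime:]), 1.0)
    return C

dtraj = np.array([0, 0, 1, 1, 1, 2, 2, 1, 0, 0])
C = count_matrix(dtraj, n_states=3, lagtime=1)
# Non-reversible MLE; assumes every state has at least one outgoing count.
T = C / C.sum(axis=1, keepdims=True)
print(C)
print(np.round(T, 3))
```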

Plotted as a graph, the transition matrix does not immediately reveal that there are two metastable sets. Applying a connectivity threshold that discards weak transitions makes the two sets visible, since transitions within a metastable set are much more likely than transitions between sets. However, such a threshold cannot always be chosen, and the procedure is not rigorous.

In [None]:
import networkx as nx

fig, axes = plt.subplots(1, 2, figsize=(15, 10))

for k, ax in enumerate(axes):
    # Left panel: all nonzero transitions; right panel: only transitions above 1e-2.
    threshold = 0 if k == 0 else 1e-2
    title = "Transition matrix"
    if k == 1:
        title += f" with connectivity threshold {threshold:.0e}"
    ax.set_title(title)

    # One node per microstate, one directed edge per transition above the threshold.
    G = nx.DiGraph()
    for i in range(msm.n_states):
        G.add_node(i, title=f"{i+1}")
    for i in range(msm.n_states):
        for j in range(msm.n_states):
            if msm.transition_matrix[i, j] > threshold:
                G.add_edge(i, j, title=f"{msm.transition_matrix[i, j]:.3e}")

    pos = nx.fruchterman_reingold_layout(G)
    nx.draw_networkx_nodes(G, pos, ax=ax)
    nx.draw_networkx_labels(G, pos, ax=ax, labels=nx.get_node_attributes(G, 'title'))
    nx.draw_networkx_edges(G, pos, ax=ax, arrowstyle='-|>',
                           connectionstyle='arc3, rad=0.3');
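The effect of the threshold can be reproduced without plotting on a hypothetical four-state chain. The sketch below, built around a hypothetical helper `connected_sets`, keeps only transitions with $P_{ij}$ above the threshold and collects the resulting connected groups of states; once the weak cross-links are removed, the two blocks fall apart. Like the plot, the outcome depends entirely on the chosen threshold.

```python
import numpy as np

def connected_sets(P, threshold):
    """Hypothetical helper: groups of states linked by transitions P_ij > threshold.

    Edge direction is ignored (weakly connected components, found by depth-first search).
    """
    adjacency = (P > threshold) | (P > threshold).T
    n = P.shape[0]
    label = -np.ones(n, dtype=int)  # -1 marks unvisited states
    sets = []
    for start in range(n):
        if label[start] >= 0:
            continue
        label[start] = len(sets)
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in np.flatnonzero(adjacency[i]):
                if label[j] < 0:
                    label[j] = len(sets)
                    stack.append(int(j))
        sets.append(sorted(members))
    return sets

# Hypothetical reversible chain with two weakly coupled blocks {0, 1} and {2, 3}.
C = np.array([[900.0, 90.0, 1.0, 0.0],
              [90.0, 800.0, 0.0, 1.0],
              [1.0, 0.0, 700.0, 100.0],
              [0.0, 1.0, 100.0, 600.0]])
P = C / C.sum(axis=1, keepdims=True)

print(connected_sets(P, threshold=0.0))   # [[0, 1, 2, 3]]
print(connected_sets(P, threshold=1e-2))  # [[0, 1], [2, 3]]
```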

To make it rigorous, let us apply PCCA+ with two metastable sets:

In [None]:
pcca = msm.pcca(n_metastable_sets=2)
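PCCA+ builds the memberships from the dominant eigenvectors of $P$. The sketch below illustrates only that spectral starting point for $m=2$, under stated assumptions: $P$ is reversible, the hypothetical helper `two_set_memberships` maps the second eigenvector linearly onto the simplex using its two most extreme states as vertices (the "inner simplex" initial guess), and the optimization that full PCCA+ performs afterwards is omitted. It is not sktime's implementation.

```python
import numpy as np

def two_set_memberships(P, pi):
    """Hypothetical helper: inner-simplex memberships for m = 2 metastable sets.

    Only the linear starting guess of PCCA+, not the full optimized algorithm.
    """
    d = np.sqrt(pi)
    # For reversible P, S = D^{1/2} P D^{-1/2} is symmetric with the same eigenvalues.
    S = d[:, None] * P / d[None, :]
    S = 0.5 * (S + S.T)                   # remove round-off asymmetry
    evals, evecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    v2 = evecs[:, -2] / d                 # second dominant right eigenvector of P
    lo, hi = np.argmin(v2), np.argmax(v2) # extreme states serve as simplex vertices
    chi1 = (v2 - v2[lo]) / (v2[hi] - v2[lo])
    return np.column_stack([chi1, 1.0 - chi1])

# Hypothetical reversible chain with two weakly coupled blocks {0, 1} and {2, 3}.
C = np.array([[900.0, 90.0, 1.0, 0.0],
              [90.0, 800.0, 0.0, 1.0],
              [1.0, 0.0, 700.0, 100.0],
              [0.0, 1.0, 100.0, 600.0]])
P = C / C.sum(axis=1, keepdims=True)
pi = C.sum(axis=1) / C.sum()

M = two_set_memberships(P, pi)
print(np.round(M, 3))
print("rows sum to one:", np.allclose(M.sum(axis=1), 1.0))
```

The eigenvector's sign is arbitrary, so which block ends up as "set 0" can vary; the grouping of states does not.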

We obtain a coarse-grained transition matrix in which the probability of jumping between the two sets is roughly $3\%$, consistent with the jump probability of the underlying process:

In [None]:
pcca.coarse_grained_transition_matrix
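How can $P_\mathrm{macro}$ be obtained from $P$ and $M$? A minimal sketch, assuming the projection $P_c = (M^\top M)^{-1} M^\top P M$ found in PyEMMA/msmtools-style PCCA+ implementations; we have not verified that sktime computes exactly this. The membership matrix below is hand-made for a hypothetical four-state chain.

```python
import numpy as np

# Hypothetical reversible chain with two weakly coupled blocks {0, 1} and {2, 3}.
C = np.array([[900.0, 90.0, 1.0, 0.0],
              [90.0, 800.0, 0.0, 1.0],
              [1.0, 0.0, 700.0, 100.0],
              [0.0, 1.0, 100.0, 600.0]])
P = C / C.sum(axis=1, keepdims=True)

# Hypothetical fuzzy memberships (rows sum to one).
M = np.array([[1.0, 0.0],
              [0.95, 0.05],
              [0.02, 0.98],
              [0.0, 1.0]])

# Assumed projection P_c = (M^T M)^{-1} M^T P M, computed by solving a
# linear system instead of forming the inverse explicitly.
P_coarse = np.linalg.solve(M.T @ M, M.T @ P @ M)
print(np.round(P_coarse, 4))
print("row sums:", P_coarse.sum(axis=1))
```

Because every row of $M$ sums to one ($M\mathbf{1}=\mathbf{1}$) and $P\mathbf{1}=\mathbf{1}$, the rows of $P_c$ also sum to one: $P_c\mathbf{1} = (M^\top M)^{-1}M^\top P M\mathbf{1} = (M^\top M)^{-1}M^\top M\mathbf{1} = \mathbf{1}$.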

The membership probabilities can be accessed via `pcca.memberships`, a matrix of shape `(n_states, n_metastable_sets)`: each column corresponds to a metastable set, and each row is a probability distribution over the metastable sets for one microstate:

In [None]:
print(f"Memberships: {pcca.memberships.shape}")

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(15, 10))
# Use the same [0, 1] color scale in both panels and in the shared colorbar.
norm = mpl.colors.Normalize(vmin=0, vmax=1)

for i, ax in enumerate(axes):
    ax.set_title(f"Metastable set {i+1} membership probabilities")
    # Color each frame by the membership of its microstate in metastable set i.
    ax.scatter(*data.T, c=pcca.memberships[assignments, i], cmap=plt.cm.Blues, norm=norm)
fig.colorbar(plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.Blues), ax=axes, shrink=.8);
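The expression `pcca.memberships[assignments, i]` uses NumPy fancy indexing to turn per-microstate memberships into per-frame values. A small sketch with hypothetical memberships and a hypothetical discrete trajectory:

```python
import numpy as np

# Hypothetical memberships for 4 microstates and 2 metastable sets (rows sum to one).
memberships = np.array([[1.0, 0.0],
                        [0.95, 0.05],
                        [0.02, 0.98],
                        [0.0, 1.0]])
# Hypothetical discrete trajectory: microstate index of each frame.
assignments = np.array([0, 1, 1, 3, 2, 0])

# Indexing with the trajectory looks up the membership row of each frame's
# microstate, giving one membership vector per frame (n_frames x n_sets).
frame_memberships = memberships[assignments]
print(frame_memberships[:, 0])  # values the left panel is colored by
```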

As expected, the memberships cleanly separate the two ellipsoids.

One also has access to a coarse-grained stationary probability vector:

In [None]:
pcca.coarse_grained_stationary_probability
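A minimal sketch of one natural definition, assuming the weight of each metastable set is the membership-weighted stationary probability $\pi_c = M^\top \pi$; the chain and memberships are hypothetical, and we have not checked that sktime uses exactly this formula.

```python
import numpy as np

# Hypothetical reversible chain; its stationary distribution is the normalized row sums of C.
C = np.array([[900.0, 90.0, 1.0, 0.0],
              [90.0, 800.0, 0.0, 1.0],
              [1.0, 0.0, 700.0, 100.0],
              [0.0, 1.0, 100.0, 600.0]])
pi = C.sum(axis=1) / C.sum()

# Hypothetical fuzzy memberships (rows sum to one).
M = np.array([[1.0, 0.0],
              [0.95, 0.05],
              [0.02, 0.98],
              [0.0, 1.0]])

# Assumed definition: weight of set k = sum_i pi_i * M_ik.
pi_coarse = M.T @ pi
print(np.round(pi_coarse, 4))
print("sums to one:", np.isclose(pi_coarse.sum(), 1.0))
```

Since each row of $M$ sums to one, $\mathbf{1}^\top \pi_c = (M\mathbf{1})^\top \pi = 1$, so $\pi_c$ is again a probability vector.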

PCCA+ also provides the metastable distributions, i.e., for each metastable set the conditional distribution $\mathbb{P}(\text{state}_\text{micro} \mid \text{state}_\text{pcca})$ over the microstates:

In [None]:
print("Metastable distributions shape:", pcca.metastable_distributions.shape)
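If $M_{ik}$ is read as $\mathbb{P}(\text{set } k \mid \text{micro } i)$, Bayes' rule gives $\mathbb{P}(\text{micro } i \mid \text{set } k) = \pi_i M_{ik} / \sum_j \pi_j M_{jk}$. The sketch below computes this for a hypothetical chain and memberships; it is our illustration of the quantity, assumed rather than confirmed to match sktime's computation.

```python
import numpy as np

# Hypothetical reversible chain; its stationary distribution is the normalized row sums of C.
C = np.array([[900.0, 90.0, 1.0, 0.0],
              [90.0, 800.0, 0.0, 1.0],
              [1.0, 0.0, 700.0, 100.0],
              [0.0, 1.0, 100.0, 600.0]])
pi = C.sum(axis=1) / C.sum()

# Hypothetical fuzzy memberships (rows sum to one).
M = np.array([[1.0, 0.0],
              [0.95, 0.05],
              [0.02, 0.98],
              [0.0, 1.0]])

# Bayes' rule: P(micro i | set k) = pi_i M_ik / pi_coarse_k.
pi_coarse = M.T @ pi
B = (pi[:, None] * M).T / pi_coarse[:, None]  # shape (n_sets, n_states)
print(np.round(B, 4))
print("each row sums to one:", np.allclose(B.sum(axis=1), 1.0))
```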

In [None]:
plt.figure(figsize=(10, 5))
plt.plot(np.arange(msm.n_states)+1, pcca.metastable_distributions[0], 'o', label="Metastable set 1")
plt.plot(np.arange(msm.n_states)+1, pcca.metastable_distributions[1], 'x', label="Metastable set 2")
plt.xticks(np.arange(msm.n_states)+1)
plt.title("Metastable distributions")
plt.legend()
plt.xlabel('micro state')
plt.ylabel('probability');

For every microstate, one of the two distribution values is approximately zero while the other is clearly positive. This shows that the coarse-graining separates the microstates well: each microstate belongs almost entirely to a single metastable set.

For visualization purposes, one can obtain the crisp assignments of microstates to macrostates through:

In [None]:
pcca.assignments
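A crisp assignment can be derived from fuzzy memberships by picking, for each microstate, the metastable set with the largest membership. The sketch below assumes this argmax rule, using hypothetical memberships; it shows how fractional information is discarded.

```python
import numpy as np

# Hypothetical fuzzy memberships (rows sum to one).
M = np.array([[1.0, 0.0],
              [0.95, 0.05],
              [0.02, 0.98],
              [0.0, 1.0]])

# Assumed rule: each microstate goes to its most probable metastable set.
crisp = np.argmax(M, axis=1)
print(crisp)  # [0 0 1 1]
# The crisp labels drop the fractional parts, e.g. microstate 1's 0.05 share of set 1.
```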

However, caution is advised: one *cannot* compute any actual quantity of the coarse-grained kinetics from the crisp assignments alone; this requires the fuzzy memberships.