Becksteinlab/PSAnalysisTutorial

Path Similarity Analysis Tutorial

Author: Sean Seyler
Year: 2015
License: GNU General Public License, version 3 (or higher)
Copyright: © 2015 Sean Seyler
Citation: Seyler SL, Kumar A, Thorpe MF, Beckstein O (2015). Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways. PLoS Comput Biol 11(10): e1004568. doi:10.1371/journal.pcbi.1004568

Summary

Path Similarity Analysis (PSA) is a computational framework for the quantitative comparison of macromolecular transition paths [Seyler2015]. This tutorial provides several examples that use PSA to compare closed-to-open transition paths of adenylate kinase (AdK) generated by a selection of transition-sampling algorithms [Seyler2014]. Hierarchical clustering of the resulting distances, displayed as a heat map-dendrogram, serves as a simple but powerful approach to exploratory data analysis.

Background

PSA, or PSAnalysis, is based on measuring the geometric similarity of transition paths in configuration space using the Hausdorff and Fréchet path metrics. PSA builds on MDAnalysis [Michaud-Agrawal2011], which provides a seamless interface between trajectory data and NumPy arrays in Python and allows path comparisons to be performed on arbitrary atom selections. Because MDAnalysis reads simulation trajectories in a format-agnostic way, paths produced by many different computational methods can be compared rapidly. More information about the method can be found in [Seyler2015].
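
To make the two path metrics concrete, here is a minimal NumPy sketch. It is an independent illustration, not the MDAnalysis.analysis.psa code: it assumes each path is an array of shape (n_frames, n_atoms, 3) whose frames are already superimposed, and it uses the RMSD between two frames as the distance between points on the paths. The discrete Fréchet distance is computed with the standard dynamic-programming recursion over couplings of frames.

```python
import numpy as np


def frame_rmsd_matrix(P, Q):
    """RMSD between every frame of path P and every frame of path Q.

    P: (n_P, n_atoms, 3), Q: (n_Q, n_atoms, 3); frames are assumed to be
    pre-aligned, so no fitting is done here. Returns an (n_P, n_Q) array.
    """
    diff = P[:, None, :, :] - Q[None, :, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))


def hausdorff(P, Q):
    """Discrete Hausdorff distance: the largest nearest-neighbor distance."""
    d = frame_rmsd_matrix(P, Q)
    return max(d.min(axis=1).max(), d.min(axis=0).max())


def discrete_frechet(P, Q):
    """Discrete Fréchet distance via the coupling-distance recursion."""
    d = frame_rmsd_matrix(P, Q)
    n, m = d.shape
    ca = np.empty((n, m))
    ca[0, 0] = d[0, 0]
    for i in range(1, n):          # first column: only advance along P
        ca[i, 0] = max(ca[i - 1, 0], d[i, 0])
    for j in range(1, m):          # first row: only advance along Q
        ca[0, j] = max(ca[0, j - 1], d[0, j])
    for i in range(1, n):
        for j in range(1, m):
            # best coupling reaching (i, j), limited by the current separation
            ca[i, j] = max(min(ca[i - 1, j], ca[i - 1, j - 1], ca[i, j - 1]),
                           d[i, j])
    return ca[-1, -1]


# Toy example: a one-atom path along x and a copy shifted by 1 along y.
P = np.array([[[t, 0.0, 0.0]] for t in range(5)])
Q = P + np.array([0.0, 1.0, 0.0])
print(hausdorff(P, Q), discrete_frechet(P, Q))  # 1.0 1.0
```

In this toy case every frame's nearest neighbor is the corresponding frame of the other path, and walking both paths in lockstep never exceeds a separation of 1, so both metrics give 1.0. In general the Fréchet distance is never smaller than the Hausdorff distance, because it additionally respects the ordering of frames along each path.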

Usage

This tutorial demonstrates a straightforward application of PSA to a set of transitions of the enzyme adenylate kinase (AdK) generated by a selection of methods (for more background on this particular example, see [Seyler2014]). Two example Python scripts generate an all-pairs distance comparison between the paths (i.e., all unique pairwise distances). The short version shows how to perform similarity analysis on a set of trajectories that have already been pre-processed for proper (frame-by-frame) structural alignment; the full version additionally demonstrates how the alignment would be performed with the PSA framework prior to similarity analysis. A third script demonstrates Hausdorff pairs analysis, which lets users explore how paths differ from each other as a function of progress and examine, for each pair of paths, the pair of structures responsible for the Hausdorff distance.
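
The all-pairs comparison can be sketched as follows. This is a hedged illustration rather than the tutorial scripts' code: all_pairs_distance_matrix and hausdorff_rmsd are hypothetical helpers, paths are assumed to be pre-aligned (n_frames, n_atoms, 3) arrays, and the Hausdorff distance is built from SciPy's directed_hausdorff. Each unique pair is evaluated once, so N paths need N(N-1)/2 metric evaluations.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import directed_hausdorff


def hausdorff_rmsd(P, Q):
    """Symmetric Hausdorff distance with frame-to-frame RMSD as ground metric.

    Paths are (n_frames, n_atoms, 3) arrays of pre-aligned frames (assumption).
    Flattening a frame to a 3*n_atoms vector makes the Euclidean distance
    equal to sqrt(n_atoms) * RMSD, so that factor is divided back out.
    """
    n_atoms = P.shape[1]
    u = P.reshape(len(P), -1)
    v = Q.reshape(len(Q), -1)
    d = max(directed_hausdorff(u, v)[0], directed_hausdorff(v, u)[0])
    return d / np.sqrt(n_atoms)


def all_pairs_distance_matrix(paths, metric):
    """Evaluate `metric` once per unique pair and fill a symmetric matrix."""
    n = len(paths)
    D = np.zeros((n, n))
    for i, j in combinations(range(n), 2):   # n*(n-1)/2 unique pairs
        D[i, j] = D[j, i] = metric(paths[i], paths[j])
    return D


# Example with random stand-in "paths" of different lengths.
rng = np.random.default_rng(0)
paths = [rng.normal(size=(10 + k, 4, 3)) for k in range(3)]
D = all_pairs_distance_matrix(paths, hausdorff_rmsd)
```

The diagonal stays zero and the lower triangle mirrors the upper one, so only the unique pairs ever need to be computed; any other path metric with the same call signature can be passed in place of hausdorff_rmsd.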

Scripts

Analyses are performed by executing the Python scripts psa_short.py, psa_full.py, or psa_hausdorff-pairs.py, which automatically read trajectories from the methods directory into a PSA object and, in the case of psa_full.py, perform trajectory alignment. psa_short.py and psa_full.py generate discrete Hausdorff and Fréchet distance matrices and produce heat map-dendrograms and annotated heat maps of the distance matrices after Ward hierarchical clustering. psa_hausdorff-pairs.py performs a Hausdorff pairs (nearest-neighbor) analysis and produces two plots showing the nearest-neighbor structures as a function of normalized frame progress for two pairs of paths (DIMS vs DIMS and DIMS vs rTMD-S).
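
The clustering step can be sketched with SciPy, assuming an all-pairs distance matrix D is already available (the four-path matrix below is made up purely for illustration). ward_order is a hypothetical helper: it converts D to SciPy's condensed form, performs Ward linkage, and returns the dendrogram leaf order used to rearrange rows and columns so that similar paths sit next to each other in the heat map.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform


def ward_order(D):
    """Ward-cluster a symmetric distance matrix; return linkage and leaf order.

    D is a square, symmetric matrix with zeros on the diagonal, such as an
    all-pairs Hausdorff or Fréchet matrix.
    """
    condensed = squareform(D, checks=True)        # upper triangle as a vector
    Z = linkage(condensed, method="ward")
    leaves = dendrogram(Z, no_plot=True)["leaves"]  # order only, no figure
    return Z, leaves


# Two obvious groups of "paths": {0, 1} and {2, 3}.
D = np.array([[0.0, 1.0, 10.0, 10.0],
              [1.0, 0.0, 10.0, 10.0],
              [10.0, 10.0, 0.0, 1.0],
              [10.0, 10.0, 1.0, 0.0]])
Z, leaves = ward_order(D)
D_heatmap = D[np.ix_(leaves, leaves)]   # rows/columns in dendrogram order
```

D_heatmap can then be drawn with matplotlib's imshow next to the dendrogram; alternatively, seaborn's clustermap accepts a precomputed linkage through its row_linkage and col_linkage parameters and draws both at once. Whether the tutorial scripts use either route is not assumed here.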

Interactive notebooks

Also provided are Jupyter notebooks (with the .ipynb extension) that let users perform the same analyses as the scripts in an interactive, step-by-step manner.

PairID identifier class

The notebooks contain optional analyses (not in the scripts) demonstrating how to use a convenience class called PairID (provided in pair_id.py). PairID provides an intuitive interface for extracting data generated by PSA; the Jupyter notebook psa_identifier_example.ipynb demonstrates how it is used. All other notebooks also make use of the PairID class.
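
The idea behind such a pair identifier can be sketched as follows. PathPairIndex is a hypothetical class, not the PairID interface from pair_id.py: it assumes paths are ordered method by method with the same number of runs per method, and maps (method, run) labels to rows of the square distance matrix and to positions in SciPy's condensed (upper-triangle) distance vector, where the pair i < j of n paths sits at index n*i - i*(i+1)/2 + (j - i - 1).

```python
class PathPairIndex:
    """Hypothetical helper: map (method, run) labels to distance-matrix indices.

    Assumes paths are stored method by method, with the same number of runs
    for every method.
    """

    def __init__(self, methods, runs_per_method):
        self.methods = list(methods)
        self.runs = runs_per_method
        self.n_paths = len(self.methods) * runs_per_method

    def path_index(self, method, run):
        """Row/column of a (method, run) path in the square distance matrix."""
        return self.methods.index(method) * self.runs + run

    def condensed_index(self, pair1, pair2):
        """Position of the (pair1, pair2) distance in a condensed vector."""
        i, j = sorted((self.path_index(*pair1), self.path_index(*pair2)))
        if i == j:
            raise ValueError("a path has no entry for itself in condensed form")
        n = self.n_paths
        return n * i - i * (i + 1) // 2 + (j - i - 1)


idx = PathPairIndex(["DIMS", "rTMD-S"], runs_per_method=3)
print(idx.condensed_index(("DIMS", 0), ("rTMD-S", 2)))  # 4
```

Working with labels instead of raw integers makes it harder to pick the wrong entry when, for example, all DIMS vs rTMD-S distances are extracted from a large matrix.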

Basic PSA

The psa_short.ipynb notebook goes through the basic steps of PSA:

  1. Prepare and superimpose trajectories appropriately.
  2. Compute Fréchet or Hausdorff distances between all trajectories and generate a clustered distance matrix.
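
Step 1 relies on superimposing frames before any distances are computed. The tutorial does this through the PSA framework in psa_full.py; the NumPy sketch below only illustrates the underlying idea with the Kabsch algorithm (an unweighted least-squares rotation obtained from an SVD), which is a swapped-in textbook technique and not a description of MDAnalysis' alignment code. kabsch_superimpose and superimpose_path are hypothetical helpers operating on (n_atoms, 3) frames and (n_frames, n_atoms, 3) paths.

```python
import numpy as np


def kabsch_superimpose(mobile, reference):
    """Return `mobile` (n_atoms, 3) rotated and translated onto `reference`.

    Minimizes the RMSD with the Kabsch algorithm (SVD of the 3x3 covariance
    matrix); all atoms are weighted equally (assumption).
    """
    mob_com = mobile.mean(axis=0)
    ref_com = reference.mean(axis=0)
    X = mobile - mob_com
    Y = reference - ref_com
    U, _, Vt = np.linalg.svd(X.T @ Y)          # covariance matrix X^T Y
    sign = np.sign(np.linalg.det(Vt.T @ U.T))  # avoid improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    return X @ R.T + ref_com                   # rotate, then move onto ref


def superimpose_path(path, reference):
    """Fit every frame of a (n_frames, n_atoms, 3) path onto one reference."""
    return np.array([kabsch_superimpose(frame, reference) for frame in path])
```

Fitting all frames of all paths onto one common reference structure puts them in a shared frame of reference, so that frame-to-frame RMSDs between different paths reflect conformational differences rather than overall rotation and translation.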

It uses the same data that were used to prepare the comparison of multiple fast transition path sampling methods shown in Figure 6 in [Seyler2015].

Hausdorff-Pair analysis

The psa_hausdorff-pairs.ipynb notebook demonstrates how to extract molecular detail from a path comparison: it yields the two frames (one from each trajectory) that are responsible for the largest difference between the two trajectories, as described in more detail in [Seyler2015]. It then shows how to compare the distance between trajectories along a common order parameter.
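
A nearest-neighbor analysis of this kind can be sketched in NumPy, assuming two pre-aligned paths stored as (n_frames, n_atoms, 3) arrays and frame-to-frame RMSD as the distance. nearest_neighbor_profile is a hypothetical helper, not the notebook's code: for each frame of the first path it reports the normalized progress (0 at the first frame, 1 at the last), the index of the closest frame in the second path and the RMSD to it, and it also returns the Hausdorff pair, i.e., the two frames whose separation equals the Hausdorff distance. Normalized frame progress serves here as a simple stand-in for a common order parameter.

```python
import numpy as np


def nearest_neighbor_profile(P, Q):
    """Nearest-neighbor (Hausdorff pairs) analysis of two pre-aligned paths.

    P, Q: (n_frames, n_atoms, 3). Returns normalized progress along P, the
    nearest frame of Q for each frame of P, the RMSD to it, the Hausdorff
    pair (i, j) and the Hausdorff distance.
    """
    diff = P[:, None] - Q[None, :]
    d = np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))   # (n_P, n_Q) RMSDs
    progress = np.linspace(0.0, 1.0, len(P))
    nn_index = d.argmin(axis=1)
    nn_dist = d.min(axis=1)
    # The Hausdorff distance is the largest nearest-neighbor distance in either
    # direction; record which pair of frames realizes it.
    i_p = nn_dist.argmax()
    col_min = d.min(axis=0)
    j_q = col_min.argmax()
    if nn_dist[i_p] >= col_min[j_q]:
        pair = (int(i_p), int(nn_index[i_p]))
    else:
        pair = (int(d[:, j_q].argmin()), int(j_q))
    return progress, nn_index, nn_dist, pair, float(d[pair])
```

Plotting nn_dist against progress shows where along the transition one path strays furthest from the other; the Hausdorff pair sits at the peak of that curve or of the reverse comparison, and its two frames are the structures worth inspecting in detail.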

Script usage

The scripts can be run directly using, for example,

python psa_short.py

and various settings can be customized, as described below. Furthermore, these scripts can be used as a basis to implement one's own custom analysis.

Customizing the examples

Users can also adjust settings in each script to change, for example, the following:

  • path metric (default: discrete Fréchet [discrete_frechet])
  • linkage algorithm for hierarchical clustering (default: Ward)
  • name and location of the plot (default: df_ward_psa-[short/full].pdf)
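
These settings amount to a small configuration. The sketch below is hypothetical: the dictionary keys, the helper plot_filename, and the abbreviation "h" for the Hausdorff metric are assumptions and may differ from the variable names actually used in psa_short.py and psa_full.py. It checks the linkage against the methods accepted by scipy.cluster.hierarchy.linkage and reproduces the default name pattern df_ward_psa-[short/full].pdf.

```python
# Hypothetical settings block; names are illustrative, not taken from the scripts.
SETTINGS = {
    "metric": "discrete_frechet",   # or "hausdorff"
    "linkage": "ward",              # a scipy.cluster.hierarchy.linkage method
    "version": "short",             # "short" or "full"
}

METRIC_TAGS = {"discrete_frechet": "df", "hausdorff": "h"}   # "h" is assumed
LINKAGES = {"single", "complete", "average", "weighted",
            "centroid", "median", "ward"}


def plot_filename(settings):
    """Build a plot name following the df_ward_psa-[short/full].pdf pattern."""
    metric = settings["metric"]
    method = settings["linkage"]
    if metric not in METRIC_TAGS:
        raise ValueError(f"unknown metric: {metric}")
    if method not in LINKAGES:
        raise ValueError(f"unknown linkage method: {method}")
    return f"{METRIC_TAGS[metric]}_{method}_psa-{settings['version']}.pdf"


print(plot_filename(SETTINGS))   # df_ward_psa-short.pdf
```

Encoding the metric and linkage in the file name keeps plots from different settings from overwriting each other.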

These examples should serve as a sufficient basis for understanding PSA's framework. Some other techniques and analyses using PSA are described in [Seyler2015].

Dependencies

  • MDAnalysis: 0.11.1 or higher
  • pandas: 0.16.2 or higher
  • seaborn: 0.6.0 or higher

Help

If you have questions or problems using the package, ask on the MDAnalysis user mailing list: http://groups.google.com/group/mdnalysis-discussion

Contribution

This tutorial is still under revision. Although it will be updated to reflect changes in the MDAnalysis.analysis.psa module, improvements can always be made and bugs are likely to be present. Users are encouraged to devise their own analyses using the PSA framework. Feedback and issue reports for the tutorial and PSA are welcome and encouraged!

Implementation in MDAnalysis

If you want to write your own code using PSA, use the MDAnalysis.analysis.psa module, which has been part of MDAnalysis since release 0.10.0, and have a look at the documentation of the PSA module. This tutorial requires the PSA implementation in MDAnalysis release 0.11.1 or higher for all features to work properly.

References

[Michaud-Agrawal2011] N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein. MDAnalysis: A toolkit for the analysis of molecular dynamics simulations. J Comput Chem 32:2319–2327, 2011. doi:10.1002/jcc.21787. http://www.mdanalysis.org
[Seyler2014] S. L. Seyler and O. Beckstein. Sampling large conformational transitions: adenylate kinase as a testing ground. Mol Simul 40:855–877, 2014. doi:10.1080/08927022.2014.919497
[Seyler2015] S. L. Seyler, A. Kumar, M. F. Thorpe, and O. Beckstein. Path Similarity Analysis: A Method for Quantifying Macromolecular Pathways. PLoS Comput Biol 11(10):e1004568, 2015. doi:10.1371/journal.pcbi.1004568