
Molecular Dynamics Spectral Clustering Toolkit [MDSCTK]

douradopalmares edited this page Aug 19, 2013 · 3 revisions

The MDSCTK is a suite of tools for performing clustering or dimensionality reduction of molecular dynamics simulations using spectral methods. The toolkit consists of a set of programs and scripts that work in a pipelined fashion, where the output from one program or script becomes the input for another. This approach lets the user process intermediate data, or use only the parts of the pipeline that are useful for a particular task and ignore the rest.

Thanks for considering MDSCTK for your spectral analysis needs!

The development of MDSCTK is mainly funded by academic research grants. To help us fund development, we humbly ask that you cite the MDSCTK papers:

Validating clustering of molecular dynamics simulations using polymer models. J. L. Phillips, M. E. Colvin and S. Newsam, BMC Bioinformatics, 12(1), 445. (2011) doi:10.1186/1471-2105-12-445

Analyzing dynamical simulations of intrinsically disordered proteins using spectral clustering. J. L. Phillips, M. E. Colvin, E. Y. Lau and S. Newsam, Proc. of the 2008 IEEE Int. Conf. on Bioinf. and Biomed. Workshops (pp. 17–24). Philadelphia, PA: IEEE. (2008) doi:10.1109/BIBMW.2008.4686204

Here are some of the things that MDSCTK does well:

  1. Parallel computation of RMSD or vector distance matrices.

  2. Sparse eigendecomposition on Compressed Sparse Column (CSC) matrices.

  3. Simple Phi-Psi angle extraction from backbone coordinates.

  4. Nyström approximation calculations for out-of-sample data sets.
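As an illustration of item 3, the following is a minimal Python sketch of how phi and psi angles can be derived from backbone coordinates. It is not MDSCTK's own code: the function names `dihedral` and `phi_psi` and the dictionary layout of the backbone are assumptions made for this example. It relies only on the standard definitions phi = C(i-1), N(i), CA(i), C(i) and psi = N(i), CA(i), C(i), N(i+1).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project the two outer bonds onto the plane perpendicular to b1.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def phi_psi(backbone):
    """Return (phi, psi) for every interior residue.

    backbone is a hypothetical layout: one dict per residue with
    'N', 'CA' and 'C' keys, each holding an (x, y, z) tuple.
    """
    angles = []
    for i in range(1, len(backbone) - 1):
        prev, cur, nxt = backbone[i - 1], backbone[i], backbone[i + 1]
        phi = dihedral(prev["C"], cur["N"], cur["CA"], cur["C"])
        psi = dihedral(cur["N"], cur["CA"], cur["C"], nxt["N"])
        angles.append((phi, psi))
    return angles
```

The first and last residues are skipped because phi needs the preceding residue's C atom and psi needs the following residue's N atom.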

Please view the README for more information about the software, necessary prerequisites, and installation instructions.

General Information

The MDSCTK provides a clean interface for performing spectral clustering on molecular dynamics trajectories. It assumes some basic experience with GROMACS (http://www.gromacs.org) and a little R (http://www.r-project.org/). It uses the ARPACK (http://www.caam.rice.edu/software/ARPACK/) routines for sparse eigendecomposition, and database-driven methods (Oracle Berkeley DB) for fast computation of large, sparse RMSD distance matrices.
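To make the idea of a sparse distance matrix in Compressed Sparse Column form concrete, here is a small Python sketch. It is illustrative only and does not reproduce MDSCTK's programs or file formats. It uses plain Euclidean distance between vectors instead of RMSD, and it assumes that sparsity comes from keeping only each frame's k nearest neighbours; the function name `knn_distance_csc` and that sparsification rule are assumptions for this example.

```python
import math

def knn_distance_csc(vectors, k):
    """Build an n x n sparse distance matrix in CSC form.

    Column j stores the distances from vector j to its k nearest
    neighbours (excluding itself). Returns (data, indices, indptr):
    data    - the stored distances, column by column
    indices - the row index of each stored distance
    indptr  - column j occupies data[indptr[j]:indptr[j + 1]]
    """
    n = len(vectors)
    data, indices, indptr = [], [], [0]
    for j in range(n):
        dists = [(math.dist(vectors[i], vectors[j]), i)
                 for i in range(n) if i != j]
        nearest = sorted(dists)[:k]
        # CSC convention: row indices ascend within each column.
        for d, i in sorted(nearest, key=lambda pair: pair[1]):
            data.append(d)
            indices.append(i)
        indptr.append(len(data))
    return data, indices, indptr

# Four one-dimensional "frames", keeping one neighbour per column.
data, indices, indptr = knn_distance_csc([(0.0,), (1.0,), (3.0,), (7.0,)], k=1)
```

Storing only k entries per column keeps memory proportional to n·k instead of n², which is what makes sparse eigensolvers such as ARPACK practical for long trajectories. This brute-force loop still computes every pairwise distance, so it shows only the storage layout, not an efficient neighbour search.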

MDSCTK is Open Source software, distributed under the GNU General Public License, and is provided AS-IS, with absolutely NO warranty.

If you want to distribute a modified version or use part of MDSCTK in your own program, remember that the entire modified code must be licensed under the GPL and must be clearly labeled as a derived work. It should not use the name “MDSCTK”, and support questions should be directed to you rather than to the MDSCTK developers.