mdshare

Get access to our MD data files.

This is a downloader for molecular dynamics (MD) data from a public FTP server at FU Berlin. See here for a full list of available datasets and terms of use.

Example

This code downloads a file containing a featurized set of three alanine dipeptide MD trajectories (unless the file already exists locally) and stores its contents, three numpy.ndarray objects (each with shape=[250000, 2] and dtype=numpy.float32), in the list trajs:

import mdshare
import numpy as np

local_filename = mdshare.fetch('alanine-dipeptide-3x250ns-backbone-dihedrals.npz')
with np.load(local_filename) as fh:
    trajs = [fh[key] for key in sorted(fh.keys())]
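
The loading pattern above can be tried without downloading anything. The following is a minimal sketch that uses a small synthetic .npz file as a stand-in: the file name stand-in.npz, the random values, and the five-row arrays are assumptions made for illustration, and only the dtype and the two-column layout follow the description above.

```python
import os
import tempfile

import numpy as np

# Synthetic stand-in for the downloaded file: three small float32 arrays
# with two columns each (the real trajectories have 250000 rows each).
rng = np.random.default_rng(0)
fake_trajs = [
    rng.uniform(-1.0, 1.0, size=(5, 2)).astype(np.float32) for _ in range(3)
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'stand-in.npz')
    # Positional arrays are stored under the keys arr_0, arr_1, arr_2.
    np.savez(path, *fake_trajs)
    # Same loading pattern as in the example above.
    with np.load(path) as fh:
        keys = sorted(fh.keys())
        trajs = [fh[key] for key in keys]

for key, traj in zip(keys, trajs):
    print(key, traj.shape, traj.dtype)
```

Sorting the keys keeps the order of the list deterministic, so that trajs[0] always refers to the same array.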

By default, the mdshare.fetch() function looks for the file in the current directory and downloads it there if necessary (function parameter working_directory='.'). If you instead set this parameter to None ...

local_filename = mdshare.fetch(
    'alanine-dipeptide-3x250ns-backbone-dihedrals.npz',
    working_directory=None)

... the file will be downloaded to a temporary directory. In both cases, the function will return the path to the downloaded file.

Should the requested file already be present in the working_directory, the download is skipped.
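
The behavior described above (reuse an existing file, otherwise download it, and fall back to a temporary directory when working_directory is None) can be sketched in a few lines. This is not the mdshare implementation: fetch_if_missing and fake_download are hypothetical names introduced here, and the download step is replaced by a placeholder that writes a few bytes.

```python
import os
import tempfile


def fetch_if_missing(filename, download, working_directory='.'):
    """Return a local path for filename, downloading only when needed.

    Hypothetical helper for illustration only. download(filename, path)
    is a caller-supplied function that writes the file to path.
    """
    if working_directory is None:
        # None selects a freshly created temporary directory.
        working_directory = tempfile.mkdtemp()
    path = os.path.join(working_directory, filename)
    if not os.path.exists(path):
        download(filename, path)
    return path


calls = []


def fake_download(filename, path):
    # Placeholder for a real download: record the call, write dummy bytes.
    calls.append(filename)
    with open(path, 'wb') as fh:
        fh.write(b'placeholder')


# First call: nothing exists yet, so the placeholder download runs.
first = fetch_if_missing('example.npz', fake_download, working_directory=None)
# Second call in the same directory: the file is found and reused.
second = fetch_if_missing(
    'example.npz', fake_download, working_directory=os.path.dirname(first))
print(first == second, len(calls))
```

In both calls the function returns the path to the local file, and the download function runs only once.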

Using mdshare.catalogue() to view the available files and their sizes ...

mdshare.catalogue()

... produces the output:

Repository: http://ftp.imp.fu-berlin.de/pub/cmb-data/
Files:
alanine-dipeptide-0-250ns-nowater.xtc                  42.9 MB
alanine-dipeptide-1-250ns-nowater.xtc                  42.9 MB
alanine-dipeptide-2-250ns-nowater.xtc                  42.9 MB
alanine-dipeptide-3x250ns-backbone-dihedrals.npz        6.0 MB
alanine-dipeptide-3x250ns-heavy-atom-distances.npz    135.0 MB
[...]
Containers:
mdshare-test.tar.gz                                   193.0 bytes
pyemma-tutorial-livecoms.tar.gz                       123.9 MB

Using mdshare.search(filename_pattern) to select a group of files by name pattern ...

pentapeptide_xtcs = mdshare.search('penta*xtc')
print(pentapeptide_xtcs)

... produces the output:

['pentapeptide-00-500ns-impl-solv.xtc',
 'pentapeptide-01-500ns-impl-solv.xtc',
 'pentapeptide-02-500ns-impl-solv.xtc',
...
 'pentapeptide-22-500ns-impl-solv.xtc',
 'pentapeptide-23-500ns-impl-solv.xtc',
 'pentapeptide-24-500ns-impl-solv.xtc']
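
To get a feel for how a wildcard pattern such as 'penta*xtc' selects file names, the following sketch applies shell-style matching from the Python standard library to a few names taken from the listings above. That mdshare.search() follows the same matching rules is an assumption here, not something this README states.

```python
from fnmatch import fnmatchcase

# A few file names copied from the catalogue and search output above.
names = [
    'alanine-dipeptide-0-250ns-nowater.xtc',
    'alanine-dipeptide-3x250ns-backbone-dihedrals.npz',
    'pentapeptide-00-500ns-impl-solv.xtc',
    'pentapeptide-01-500ns-impl-solv.xtc',
    'pyemma-tutorial-livecoms.tar.gz',
]

# '*' matches any run of characters, so the pattern keeps every name that
# starts with 'penta' and ends with 'xtc'.
pattern = 'penta*xtc'
matches = [name for name in names if fnmatchcase(name, pattern)]
print(matches)
```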