In [1]:
from htmd.ui import *
config(viewer='ngl')


Please cite -- HTMD: High-Throughput Molecular Dynamics for Molecular Discovery
J. Chem. Theory Comput., 2016, 12 (4), pp 1845-1852. 
http://pubs.acs.org/doi/abs/10.1021/acs.jctc.6b00049

You are on the latest HTMD version (1.5.4).



# Adaptive sampling

## Stefan Doerr
Universitat Pompeu Fabra & Acellera

## Generators folder structure

In [1]:
!tree generators | head -20

generators
├── 0
│   ├── input
│   ├── input.coor
│   ├── input.xsc
│   ├── parameters
│   ├── run.sh
│   ├── structure.pdb
│   └── structure.psf
├── ntl9_1ns_0
│   ├── input
│   ├── input.coor
│   ├── input.xsc
│   ├── parameters
│   ├── run.sh
│   ├── structure.pdb
│   └── structure.psf
├── ntl9_1ns_1
│   ├── input
│   ├── input.coor


## Adaptive classes

* AdaptiveMD (free exploration)
* AdaptiveGoal (exploration + exploitation)

## AdaptiveMD

* Set up the queue that will be used for simulations.
* Tell it to store completed trajectories in the `data` folder, since this is where AdaptiveMD expects them by default.

In [4]:
queue = LocalGPUQueue()
queue.datadir = './data'

2016-11-08 21:51:38,202 - htmd.apps.acemdlocal - INFO - Found ACEMD at '/home/ec2-user/miniconda3/bin/acemd'
2016-11-08 21:51:38,213 - htmd.apps.localqueue - INFO - Using GPU devices 0


In [5]:
md = AdaptiveMD()
md.app = queue

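* Set `nmin` (minimum number of running simulations), `nmax` (maximum number of running simulations) and `nepochs` (maximum number of epochs)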

In [6]:
md.nmin = 5
md.nmax = 10
md.nepochs = 30
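As a rough illustration of how `nmin` and `nmax` could interact, the sketch below assumes that the adaptive loop tops the queue up to `nmax` whenever the number of running simulations drops to `nmin` or fewer. The helper `n_to_spawn` is hypothetical and not part of HTMD; the actual scheduling logic may differ.

```python
def n_to_spawn(running, nmin, nmax):
    """Hypothetical helper (not an HTMD function).

    Assumption: new simulations are only launched once the number of
    running simulations has dropped to nmin or fewer, and the queue is
    then filled back up to nmax.
    """
    if running <= nmin:
        return max(nmax - running, 0)
    return 0


# With nmin=5 and nmax=10 as set above
for running in (10, 7, 5, 2):
    print(running, "running ->", n_to_spawn(running, nmin=5, nmax=10), "new")
```

Under this assumption, nothing is spawned while 6 or more simulations are still running, which keeps each epoch's analysis based on a reasonably sized batch of new data.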

* Choose what projection to use for the construction of the Markov model

In [7]:
protsel = 'name CA'
ligsel = '(resname BEN) and ((name C7) or (name C6))'
md.projection = MetricDistance(protsel, ligsel, metric='contacts')
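To make the idea of a contact projection concrete, here is a minimal NumPy sketch of what a protein-ligand contact feature vector could look like for a single frame. It is not how `MetricDistance` is implemented; the function `contact_features`, the toy coordinates and the 8 Å cutoff are all assumptions chosen for illustration.

```python
import numpy as np


def contact_features(coords_a, coords_b, threshold=8.0):
    """Illustrative contact map for one frame (not HTMD code).

    coords_a: (n_a, 3) coordinates of the first selection
    coords_b: (n_b, 3) coordinates of the second selection
    Returns a flat boolean vector with one entry per atom pair,
    True where the pair distance is below the (assumed) threshold.
    """
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < threshold).ravel()


# Toy data: three "CA" atoms and two "ligand" atoms on the x axis
protein_ca = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [20.0, 0.0, 0.0]])
ligand = np.array([[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])

features = contact_features(protein_ca, ligand)
print(features.astype(int))
```

Each frame of a trajectory becomes one such binary vector, and these vectors are the features from which the Markov model is built.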

* Set the `updateperiod` to define how often (in seconds) the adaptive run polls for completed simulations and redoes the analysis

In [None]:
md.updateperiod = 14400  # in seconds: poll every 4 hours
md.run()

## AdaptiveGoal

* Most of its arguments are the same as for AdaptiveMD

In [None]:
ad = AdaptiveGoal()
ad.app = queue
ad.nmin = 10
ad.nmax = 20
ad.nepochs = 5000
ad.generatorspath = '../../generators/'
ad.projection = MetricSelfDistance('protein and name CA')
ad.goalfunc = mygoalfunction  # must be defined first, see "RMSD goal function"

* It additionally requires the `goalfunc` argument, which defines the goal that sampling is steered toward
* Many different goal functions can be defined

## The goal function

The goal function:
* takes as input a `Molecule` object of a simulation, and
* produces as output a score for each frame of that simulation.

The higher the score, the more desirable that frame is for respawning.
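The contract described above can be sketched without HTMD by standing in a plain coordinate array for the `Molecule`. The array layout `(natoms, 3, nframes)`, the function `toy_goal` and the toy data are assumptions of this sketch; the ranking step only illustrates how higher-scoring frames would be preferred.

```python
import numpy as np


def toy_goal(coords):
    """Toy goal function (illustrative, not HTMD code).

    coords: array of shape (natoms, 3, nframes).
    Scores each frame by the negative distance between atoms 0 and 1,
    so frames where the two atoms are closer get a higher score.
    """
    dist = np.linalg.norm(coords[0] - coords[1], axis=0)
    return -dist


# Two atoms, four frames; atom 1 moves along x
coords = np.zeros((2, 3, 4))
coords[1, 0, :] = [4.0, 1.0, 3.0, 2.0]

scores = toy_goal(coords)           # one score per frame
best_first = np.argsort(-scores)    # frame indices, most desirable first
print(scores, best_first)
```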

## RMSD goal function

In [None]:
ref = Molecule('./ntl9_crystal.pdb')
def mygoalfunction(mol):
    rmsd = MetricRmsd(ref, 'protein and name CA').project(mol)
    # We want low RMSD to give high score
    return -rmsd  # or even 1/rmsd
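The two scoring options mentioned in the comment, `-rmsd` and `1/rmsd`, rank frames identically but behave differently near zero. The NumPy sketch below illustrates this on toy data. It computes RMSD without structural superposition, which is a simplification for illustration only, and `rmsd_per_frame` is a hypothetical helper, not an HTMD function.

```python
import numpy as np


def rmsd_per_frame(coords, ref):
    """Per-frame RMSD without alignment (simplified, illustrative).

    coords: (natoms, 3, nframes), ref: (natoms, 3)
    """
    diff = coords - ref[:, :, None]
    return np.sqrt((diff ** 2).sum(axis=1).mean(axis=0))


ref = np.zeros((2, 3))
coords = np.zeros((2, 3, 3))
coords[:, 0, 1] = 1.0   # frame 1: every atom shifted by 1 along x
coords[:, 0, 2] = 2.0   # frame 2: every atom shifted by 2 along x

rmsd = rmsd_per_frame(coords, ref)
neg_score = -rmsd                      # bounded above by 0
with np.errstate(divide="ignore"):
    inv_score = 1.0 / rmsd             # unbounded as rmsd approaches 0
print(rmsd, neg_score, inv_score)
```

Because `1/rmsd` grows without bound for frames very close to the reference, it weights near-native frames far more strongly than `-rmsd` does.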

## Functions with multiple arguments

In [None]:
ref = Molecule('./ntl9_crystal.pdb')

def mygoalfunction(mol, ref):
    rmsd = MetricRmsd(ref, 'protein and name CA').project(mol)
    # We want low RMSD to give high score
    return -rmsd  # or even 1/rmsd

ad.goalfunc = (mygoalfunction, (ref,))
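One way to picture the `(function, (args,))` form is a dispatcher that appends the extra arguments after the molecule. The sketch below is an assumption about how such a tuple could be consumed, not HTMD's actual code; `call_goal` and `offset_score` are hypothetical names.

```python
def call_goal(goalfunc, mol):
    """Hypothetical dispatcher (not HTMD code).

    Accepts either a plain callable or a (callable, extra_args) tuple
    and passes the extra arguments after the molecule.
    """
    if isinstance(goalfunc, tuple):
        func, extra_args = goalfunc
        return func(mol, *extra_args)
    return goalfunc(mol)


def offset_score(value, offset):
    # Stand-in for a goal function taking one extra argument
    return value - offset


single = call_goal(abs, -3)                      # plain callable
with_args = call_goal((offset_score, (10,)), 4)  # callable plus extra args
print(single, with_args)
```

Note the trailing comma in `(ref,)` in the cell above: it is what makes the second element a one-item tuple rather than a parenthesized value.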

## Secondary structure goal function

In [None]:
import numpy as np

def ssGoal(mol, crystal):
    # Secondary structure of the crystal (single frame)
    crystalSS = MetricSecondaryStructure().project(crystal)[0]
    proj = MetricSecondaryStructure().project(mol)
    # Fraction of residues per frame whose SS matches the crystal SS
    ss_score = np.sum(proj == crystalSS, axis=1) / proj.shape[1]
    return ss_score
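The scoring line can be checked on toy data. In the sketch below the integer codes standing for secondary structure classes are made up for illustration; only the arithmetic mirrors the function above.

```python
import numpy as np

# Made-up integer codes for the secondary structure of four residues
crystal_ss = np.array([0, 1, 1, 2])

# Three simulation frames, one row per frame
proj = np.array([
    [0, 1, 1, 2],   # identical to the crystal
    [0, 0, 1, 2],   # one residue differs
    [2, 2, 2, 2],   # only the last residue matches
])

# Fraction of residues per frame whose code matches the crystal
ss_score = np.sum(proj == crystal_ss, axis=1) / proj.shape[1]
print(ss_score)
```

The score ranges from 0 (no residue matches) to 1 (all residues match), so frames closer to the crystal secondary structure are favored for respawning.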

## Contacts goal function

In [None]:
def contactGoal(mol, crystal):
    crystalCO = MetricSelfDistance('protein and name CA', pbc=False,
                                   metric='contacts',
                                   threshold=10).project(crystal)
    proj = MetricSelfDistance('protein and name CA',
                              metric='contacts',
                              threshold=10).project(mol)
    # Fraction of crystal contacts that are seen in each frame
    co_score = np.sum(proj[:, crystalCO] == 1, axis=1)
    # Plain division: in-place `/=` would fail on the integer count array
    co_score = co_score / np.sum(crystalCO)
    return co_score
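The contact score can be verified in the same way on toy boolean arrays. The data below are invented, and the crystal contact map is assumed to be a 1D boolean mask over contact pairs.

```python
import numpy as np

# Which of four possible contacts exist in the crystal (toy data)
crystal_co = np.array([True, False, True, True])

# Contact maps of three simulation frames, one row per frame
proj = np.array([
    [True,  True,  True,  True],    # all crystal contacts formed
    [True,  False, False, True],    # two of three crystal contacts formed
    [False, True,  False, False],   # only a non-crystal contact formed
])

# Count the crystal contacts present in each frame, then normalize
co_score = np.sum(proj[:, crystal_co], axis=1) / np.sum(crystal_co)
print(co_score)
```

Contacts that are absent in the crystal are ignored by the mask, so the third frame scores 0 even though it has one contact formed.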