# Running MD simulations

Running vanilla molecular dynamics (MD) simulations on protein-ligand complexes can provide information about the dynamics of a given ligand inside the binding site and can be used to assess the quality and stability of a proposed binding mode.

Below we demonstrate how to run MD simulations on a protein-ligand complex produced by docking, using the asapdiscovery framework.

We will run this example on an ASAP target, the SARS-CoV-2 nsp3 Mac1 macrodomain, which removes ADP-ribose from viral and host-cell proteins. Removing this post-translational modification dampens the inflammatory and antiviral responses to infection, which facilitates viral replication. For more information on Mac1, see ASAP's [Mac1 targeting-opportunity page](https://asapdiscovery.notion.site/Targeting-Opportunity-SARS-CoV-2-nsp3-Mac1-macrodomain-47af24638b994e8ba786303ec743926e).

### Required modules
To run this example, the following asapdiscovery modules need to be installed:
- `data`
- `docking`
- `modeling`
- `simulation`

To visualize the docking results, the following modules are also needed:
- `dataviz`
- `genetics`

**Note:** this example requires an [OpenEye](https://www.eyesopen.com/) license.

Please refer to the [installation instructions](https://github.com/choderalab/asapdiscovery?tab=readme-ov-file#installation) for more details.  
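Before running anything, it can help to check that these pieces are in place. The sketch below is not part of asapdiscovery: it assumes each module listed above is importable as an `asapdiscovery.<module>` subpackage, and that your OpenEye license is configured through the `OE_LICENSE` environment variable (OpenEye can also find licenses in other ways, so treat that check as a hint only).

```python
import importlib.util
import os

# Assumption: each module listed above is importable as asapdiscovery.<module>
REQUIRED = ["asapdiscovery.data", "asapdiscovery.docking",
            "asapdiscovery.modeling", "asapdiscovery.simulation"]
VISUALIZATION = ["asapdiscovery.dataviz", "asapdiscovery.genetics"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec raises this when a parent package (e.g. asapdiscovery) is absent
            found = False
        if not found:
            missing.append(name)
    return missing


print("missing required modules:", missing_modules(REQUIRED))
print("missing visualization modules:", missing_modules(VISUALIZATION))
# Assumption: the OpenEye license file location is given by OE_LICENSE
print("OE_LICENSE set:", "OE_LICENSE" in os.environ)
```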

```python
# import some dependencies
from asapdiscovery.data.testing.test_resources import fetch_test_file
from asapdiscovery.docking.docking import DockingInputPair
from asapdiscovery.docking.openeye import POSITDocker
from asapdiscovery.data.schema.complex import Complex, PreppedComplex
from asapdiscovery.data.schema.ligand import Ligand
from asapdiscovery.simulation.simulate import VanillaMDSimulator
```

## Docking an arbitrary ligand to Mac1
Let us fetch an example structure of Mac1, hosted in the ASAP testing repository.  For convenience, we will utilize the `fetch_test_file` function (part of the `data` module) to download this file and return its location.

```python
# fetch a PDB file from the test suite, in this case a PDB from the COVID MOONSHOT.
protein_pdb_file = fetch_test_file("SARS2_Mac1A-A1013.pdb")

# print out the location of the PDB file we just downloaded.
print(protein_pdb_file)
```

```
/Users/hugomacdermott/Library/Caches/asapdiscovery_testing/SARS2_Mac1A-A1013.pdb
```


Now we create a `Complex` object from the PDB file (see the tutorial on base-level ASAP abstractions for more information).

```python
# make a complex
mac1_complex = Complex.from_pdb(
    protein_pdb_file,
    ligand_kwargs={"compound_name": "A1013"},
    target_kwargs={"target_name": "SARS2_Mac1A"},
)
```

We need to create a `Ligand` object to dock into the structure. Here we generate the ligand from a SMILES string; ligands can also be created from InChI strings, SDF files, and OEMol instances.

```python
# make the ligand we want to dock: n-heptane, a simple alkane
ligand = Ligand.from_smiles("CCCCCCC", compound_name="alkane")
```

Next, we will run protein preparation.

```python
# prepare our structure
prepped_mac1_complex = PreppedComplex.from_complex(mac1_complex)
# pair it up with the ligand we want to dock
docking_input_pair = DockingInputPair(complex=prepped_mac1_complex, ligand=ligand)
```

```
DPI: 0.12, RFree: 0.28, Resolution: 1.48
Processing BU # 1 with title: ---_LIG, chains AB
```


Now we dock the ligand into the prepared protein using OpenEye POSIT. Here `use_dask` is set to `False`, disabling parallel execution, since it is not needed for a single docking run; `dask` is discussed at the end of this tutorial.

```python
# run OpenEye POSIT docking
docker = POSITDocker(use_omega=False)
results = docker.dock([docking_input_pair], use_dask=False)
```

```python
print(results[0].posed_ligand.tags)
```

```
{'docking-confidence-POSIT': 0.019999999552965164, '_POSIT_method': 'FRED'}
```
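The tags report POSIT's confidence for the pose and the method it used (`FRED` here). A confidence of about 0.02 is low, so it can be worth flagging such poses before spending GPU time on them. The helper below is a hypothetical sketch, not an asapdiscovery function, and its 0.5 cutoff is an arbitrary illustrative value.

```python
def low_confidence(tags: dict, threshold: float = 0.5) -> bool:
    """Hypothetical helper: True if the POSIT confidence in `tags` is below `threshold`.

    The default threshold is illustrative only, not a recommended cutoff.
    """
    return float(tags["docking-confidence-POSIT"]) < threshold


# tags copied from the output above
tags = {"docking-confidence-POSIT": 0.019999999552965164, "_POSIT_method": "FRED"}
print(low_confidence(tags))
```

In the notebook you would pass `results[0].posed_ligand.tags` directly instead of the copied dictionary.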


## Visualize the docked pose
Let us now visualize our results! For more information on visualizations, see the visualization notebook in the [examples directory](https://github.com/choderalab/asapdiscovery/tree/main/examples).

```python
# create an HTML visualizer
from asapdiscovery.dataviz.html_viz import HTMLVisualizer

html_visualizer = HTMLVisualizer(
    target="SARS-CoV-2-Mac1",
    color_method="subpockets",
    align=True,
    output_dir="tutorial_files/running_md_simulations/",
    write_to_disk=True,
)
vizs_from_docked = html_visualizer.visualize(inputs=results, outpaths=["visualise_docked.html"], use_dask=False)
```

```
2024-05-10 11:57:34,904 [INFO] [plipcmd.py:124] plip.plipcmd: Protein-Ligand Interaction Profiler (PLIP) 2.3.0
2024-05-10 11:57:34,904 [INFO] [plipcmd.py:125] plip.plipcmd: brought to you by: PharmAI GmbH (2020-2021) - www.pharm.ai - hello@pharm.ai
2024-05-10 11:57:34,904 [INFO] [plipcmd.py:126] plip.plipcmd: please cite: Adasme,M. et al. PLIP 2021: expanding the scope of the protein-ligand interaction profiler to DNA and RNA. Nucl. Acids Res. (05 May 2021), gkab294. doi: 10.1093/nar/gkab294
2024-05-10 11:57:34,904 [INFO] [plipcmd.py:49] plip.plipcmd: starting analysis of tmp_complex.pdb
2024-05-10 11:57:35,006 [INFO] [plipcmd.py:165] plip.plipcmd: finished analysis, find the result files in /var/folders/f5/0zcc5b7570jc40ws28tqdp740000gn/T/tmp7vmlhaks/
```


```python
from IPython.display import IFrame

IFrame(vizs_from_docked["html_path_pose"][0], 1000, 1000)
```

## Running a single MD simulation

Although the POSIT confidence reported above is low (about 0.02), this pose is sufficient for demonstrating the workflow. Let's use `VanillaMDSimulator` to run simulations of the protein-ligand complex.

`VanillaMDSimulator` has many options for running simulations in different configurations; however, a basic configuration is sufficient for this example. To keep this tutorial fast, the setup below uses a single equilibration step and a single production step, so one simulation takes under a minute on a typical GPU-enabled computer. A run this short only exercises the workflow; it says nothing about the stability of the binding mode.

```python
VanillaMDSimulator?
```

```
Init signature:
VanillaMDSimulator(
    *,
    output_dir: pathlib.Path = 'md',
    debug: bool = False,
    collision_rate: pydantic.types.PositiveFloat = 1,
    openmm_logname: str = 'openmm_log.tsv',
    openmm_platform: asapdiscovery.simulation.simulate.OpenMMPlatform = <OpenMMPlatform.Fastest: 'Fastest'>,
    ...
```

Let us set up the simulator:

```python
md_simulator = VanillaMDSimulator(
    output_dir="tutorial_files/running_md_simulations/",
    equilibration_steps=1,
    num_steps=1,
    reporting_interval=1,
)
```
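With `num_steps=1` the run is only a smoke test. For a real study you would choose `num_steps` from a target simulated time, which requires knowing the integration timestep. The timestep is not visible in the truncated signature above, so the 4 fs below is only an example value, and `reporting_interval` is assumed to be counted in steps; both are assumptions to check against `VanillaMDSimulator?`.

```python
FS_PER_NS = 1_000_000  # 1 ns = 10^6 fs


def steps_for(duration_ns: float, timestep_fs: float) -> int:
    """Number of integration steps needed to cover `duration_ns`."""
    return round(duration_ns * FS_PER_NS / timestep_fs)


def frames_written(num_steps: int, reporting_interval: int) -> int:
    """Frames saved if one frame is written every `reporting_interval` steps (assumed)."""
    return num_steps // reporting_interval


# Example: 1 ns with an assumed 4 fs timestep, saving a frame every 1250 steps (5 ps)
n_steps = steps_for(1.0, timestep_fs=4.0)
print(n_steps, frames_written(n_steps, reporting_interval=1250))
```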


To run the simulation, we pass the docking output from earlier (stored in `results`) to the `simulate` method of the `VanillaMDSimulator` instance. This launches a simulation using the [OpenMM simulation package](https://openmm.org/).

```python
simulation_results = md_simulator.simulate(
    results,
    use_dask=False,
)
```

`simulate` returns a list containing one `SimulationResult` per simulation. Since we ran only a single simulation, its result is at index 0. For example, let us check whether the simulation completed successfully:

```python
simulation_results[0].success
```

```
True
```

The simulator creates a unique output directory for each simulation, and the output paths can be accessed from the `SimulationResult` objects. The returned paths are relative to the directory where the simulation was launched. Note that future releases will add more flexibility with regard to simulation output.
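Because the paths are relative, they only resolve correctly from the launch directory. If you analyse results from a different working directory, a minimal sketch like the one below turns them into absolute paths; `resolve_output` is a hypothetical helper, not part of asapdiscovery, and the launch directory and simulation directory name are placeholders.

```python
from pathlib import PurePosixPath


def resolve_output(path: str, launch_dir: str) -> PurePosixPath:
    """Hypothetical helper: absolute version of `path`, reading relative paths from `launch_dir`."""
    p = PurePosixPath(path)
    return p if p.is_absolute() else PurePosixPath(launch_dir) / p


# "/home/me/project" and "example_sim" are placeholders, not real outputs
traj = resolve_output("tutorial_files/running_md_simulations/example_sim/traj.xtc", "/home/me/project")
print(traj)         # absolute trajectory path
print(traj.parent)  # the per-simulation output directory
```

`PurePosixPath` keeps the example free of filesystem access; on the machine where the simulation ran, `pathlib.Path(str(simulation_results[0].traj_path)).resolve()` called from the launch directory achieves the same.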

```python
print(simulation_results[0].traj_path)
```

```
tutorial_files/running_md_simulations/SARS2_Mac1A-b27f22555232d2d68273612ffce5a119d6e22526d95ce3eb0db9012632bcdaf6+FHHVXLFEHODNRQ-XCZWEQHLNA-M_alkane-IMNFDUFMRHMDMM-UHFFFAOYNA-N/traj.xtc
```


```python
print(simulation_results[0].final_pdb_path)
```

```
tutorial_files/running_md_simulations/SARS2_Mac1A-b27f22555232d2d68273612ffce5a119d6e22526d95ce3eb0db9012632bcdaf6+FHHVXLFEHODNRQ-XCZWEQHLNA-M_alkane-IMNFDUFMRHMDMM-UHFFFAOYNA-N/final.pdb
```
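As mentioned in the introduction, MD can be used to judge whether a binding mode is stable. One common measure is the RMSD of the ligand from its starting pose after each frame has been superposed on the protein. The sketch below is an analysis step outside this tutorial's workflow: it assumes you already have the ligand coordinates as a NumPy array of shape `(n_frames, n_atoms, 3)` from protein-superposed frames, for example obtained with MDTraj's `mdtraj.load` and `Trajectory.superpose`, assuming `final.pdb` provides a matching topology. The RMSD has the same units as the input (nm for MDTraj), and a one-step trajectory like the one above is far too short for it to be meaningful.

```python
import numpy as np


def ligand_rmsd(coords: np.ndarray) -> np.ndarray:
    """Per-frame RMSD of ligand coordinates relative to frame 0.

    coords: array of shape (n_frames, n_atoms, 3), already superposed on the
    protein, so no further fitting is done here (fitting on the ligand itself
    would hide movement of the ligand within the binding site).
    """
    displacement = coords - coords[0]  # broadcast frame 0 over all frames
    return np.sqrt((displacement ** 2).sum(axis=-1).mean(axis=-1))


# Toy example: 3 frames of a 4-atom "ligand" drifting along x by 0, 1 and 2 units
toy = np.zeros((3, 4, 3))
toy[:, :, 0] = np.arange(3)[:, None]
print(ligand_rmsd(toy))
```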


## Running multiple simulations in parallel with dask-cuda

We can pass a list of docking results to `VanillaMDSimulator` and have [`dask-cuda`](https://docs.rapids.ai/api/dask-cuda/nightly/) parallelize the work over the available GPUs.

For this we need a `LocalCUDACluster`, which by default starts one worker per visible GPU. You will only see parallelism if you have more than one GPU (e.g. on an HPC cluster); otherwise, the simulations run sequentially.

**Note:** `dask-cuda` is not available on macOS.
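To estimate how long a batch will take: if each GPU worker runs one simulation at a time (an assumption about how the tasks are scheduled), N simulations on G GPUs need ceil(N / G) rounds. A small hypothetical helper, not part of asapdiscovery or dask-cuda:

```python
import math


def simulation_rounds(n_simulations: int, n_gpus: int) -> int:
    """Hypothetical helper: rounds needed if each of `n_gpus` workers runs one simulation at a time."""
    if n_gpus < 1:
        raise ValueError("need at least one GPU")
    return math.ceil(n_simulations / n_gpus)


# The three simulations run in this section, on machines with 1, 2 and 4 GPUs
for gpus in (1, 2, 4):
    print(gpus, "GPU(s):", simulation_rounds(3, gpus), "round(s)")
```

Multiplying the number of rounds by the duration of a single simulation gives a rough wall-clock estimate, ignoring start-up overhead.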

```python
# create a dask_cuda LocalCUDACluster
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

cluster = LocalCUDACluster()
gpu_client = Client(cluster)
```

```python
# alternatively, asapdiscovery provides a convenience function that does the same
from asapdiscovery.data.util.dask_utils import DaskType, make_dask_client_meta

gpu_client = make_dask_client_meta(DaskType.LOCAL_GPU)
```

Inspecting the signature of the `simulate` method shows that it accepts a `dask` `Client` through its `dask_client` argument:

```python
VanillaMDSimulator.simulate?
```

Let's duplicate the `results` list so that there are several inputs to run in parallel; this gives three simulations of the same docked pose.

```python
results_par = results * 3  # three copies of the same docking result
```

```python
md_simulator = VanillaMDSimulator(
    output_dir="tutorial_files/running_md_simulations/",
    equilibration_steps=1,
    num_steps=1,
    reporting_interval=1,
)

simulation_results_parallel = md_simulator.simulate(
    results_par,
    dask_client=gpu_client,
    failure_mode="skip",  # skip failed simulations rather than raising an error
    use_dask=True,
)
```

The simulation results are stored in a list, where the outputs for each simulation can be accessed in the same way as demonstrated above:

```python
for i, sim_result in enumerate(simulation_results_parallel):
    print(f"simulation {i}: success={sim_result.success}, trajectory={sim_result.traj_path}")
```