# Running a Molecular Dynamics (MD) simulation of a protein-ligand complex

In this notebook we run an MD simulation of benzene bound to T4-lysozyme L99A.

## On the MD protocol

The plain MD protocol runs an MD simulation, either in solvent or in vacuum, of a system such as a small molecule, a protein, or a protein-ligand complex.

## 1. Defining the ChemicalSystem

`ChemicalSystem`s are OpenFE containers that define the components present in a system of interest.
Here, we pass a `SmallMoleculeComponent` for benzene, a `ProteinComponent` generated from a PDB file, and a `SolventComponent`, which holds the information OpenMM’s Modeller class needs to add water and 0.15 M NaCl around the solute when the OpenMM simulation objects are created.

In [1]:
import openfe
from openfe import ChemicalSystem, ProteinComponent, SmallMoleculeComponent, SolventComponent
from openff.units import unit

# Define the ligand we are interested in
ligand = SmallMoleculeComponent.from_sdf_file('assets/benzene.sdf')

# Define the solvent environment and protein structure
solvent = SolventComponent(ion_concentration=0.15 * unit.molar)
protein = ProteinComponent.from_pdb_file('assets/t4_lysozyme.pdb', name='t4-lysozyme')

# create the ChemicalSystem
system = ChemicalSystem({'ligand': ligand, 'protein': protein, 'solvent': solvent}, name=f"{ligand.name}_{protein.name}")
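To get a feel for what 0.15 M NaCl means at simulation scale, the following sketch estimates the number of ion pairs in a cubic water box as concentration times volume times Avogadro's number. The function name `estimate_ion_pairs` and the 7 nm box edge are hypothetical, chosen only for illustration; the count OpenMM arrives at when it builds the actual system may differ.

```python
AVOGADRO = 6.02214076e23  # particles per mole
LITERS_PER_NM3 = 1e-24    # 1 nm^3 = 1e-27 m^3 = 1e-24 L


def estimate_ion_pairs(concentration_molar, box_edge_nm):
    """Rough number of NaCl ion pairs in a cubic box (illustrative only)."""
    volume_liters = box_edge_nm ** 3 * LITERS_PER_NM3
    return round(concentration_molar * volume_liters * AVOGADRO)


# Hypothetical 7 nm cubic box at the 0.15 M concentration used above
print(estimate_ion_pairs(0.15, 7.0))
```

For a box of this assumed size the estimate comes to roughly 31 ion pairs, which shows why the ion count in a typical solvated system is small compared with the number of water molecules.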

## 2. Defining the MD simulation settings

Various parameters determine how the MD simulation is carried out. For maximum flexibility, they are grouped into a series of settings objects that control the following:

1. `simulation_settings`: Parameters controlling the simulation plan, including the number of minimization steps, the length of the NVT and NPT equilibration, and the length of the production MD run.
2. `output_settings`: Parameters controlling the output from the MD simulations, including the file names used to save the system after minimization, NVT and NPT equilibration, and the production run. An output selection (e.g. `not water`) can be defined if only part of the system should be saved. A trajectory write interval determines how frequently frames are written to the output trajectory.
3. `forcefield_settings`: Settings that define the force field for the components, including the general force fields, the small molecule force field, the nonbonded method, and the nonbonded cutoff.
4. `engine_settings`: Parameters determining how the OpenMM engine will execute the simulation. This controls the compute platform used to carry out the simulation.
5. `integrator_settings`: Parameters controlling the LangevinSplittingDynamicsMove integrator used for simulation, as well as the barostat frequency.
6. `partial_charge_settings`: Settings that define which method is used for assigning partial charges.
7. `protocol_repeats`: Defines how many times the MD protocol is run.
8. `solvation_settings`: Parameters controlling the solvent model and the solvent padding.
9. `thermo_settings`: Parameters controlling thermodynamic conditions such as the temperature and the pressure of the system.
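The lengths in `simulation_settings` are given as simulation times, whereas the engine ultimately advances in discrete integrator steps. The sketch below shows that conversion with exact arithmetic. The helper name `n_steps` is hypothetical, and the 4 fs timestep is an assumption, since the timestep value does not appear in the visible part of this notebook's output.

```python
from fractions import Fraction


def n_steps(length_ns, timestep_fs):
    """Convert a simulation length in ns into a whole number of integrator steps."""
    # Fractions avoid floating-point rounding (1 ns = 1,000,000 fs)
    steps = Fraction(str(length_ns)) * 1_000_000 / Fraction(str(timestep_fs))
    if steps.denominator != 1:
        raise ValueError("length is not a whole multiple of the timestep")
    return int(steps)


# 0.01 ns with an assumed 4 fs timestep
print(n_steps(0.01, 4))
```

Under that assumption, each 0.01 ns stage corresponds to 2500 integrator steps, which is short enough for a quick test run.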

The easiest way to access and change settings is to load the default settings, adjust them to your needs, and print the result to inspect it.

In [2]:
from openfe.protocols.openmm_md.plain_md_methods import PlainMDProtocol
from openff.units import unit
from pprint import pprint

# Start from the protocol defaults
settings = PlainMDProtocol.default_settings()
# Use short equilibration and production stages so the example runs quickly
settings.simulation_settings.equilibration_length_nvt = 0.01 * unit.nanosecond
settings.simulation_settings.equilibration_length = 0.01 * unit.nanosecond
settings.simulation_settings.production_length = 0.01 * unit.nanosecond
# Run on the CPU platform
settings.engine_settings.compute_platform = 'CPU'
# Print all settings for inspection
pprint(settings.dict())

{'engine_settings': {'compute_platform': 'CPU'},
 'forcefield_settings': {'constraints': 'hbonds',
                         'forcefields': ['amber/ff14SB.xml',
                                         'amber/tip3p_standard.xml',
                                         'amber/tip3p_HFE_multivalent.xml',
                                         'amber/phosaa10.xml'],
                         'hydrogen_mass': 3.0,
                         'nonbonded_cutoff': <Quantity(1.0, 'nanometer')>,
                         'nonbonded_method': 'PME',
                         'rigid_water': True,
                         'small_molecule_forcefield': 'openff-2.0.0'},
 'integrator_settings': {'barostat_frequency': <Quantity(25, 'timestep')>,
                         'constraint_tolerance': 1e-06,
                         'langevin_collision_rate': <Quantity(1.0, '1 / picosecond')>,
                         'n_restart_attempts': 20,
                         'reassign_velocities': False,
                

## 3. Creating a `Protocol`

The actual simulation is performed by a `Protocol`. 

With the `Settings` inspected and adjusted, we can provide these to the `Protocol`. We'll use an OpenMM-based `PlainMDProtocol`.

In [3]:
# Creating the Protocol
from openfe.protocols.openmm_md.plain_md_methods import PlainMDProtocol
protocol = PlainMDProtocol(settings=settings)

## 4. Creating the `Transformation`
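Once we have the `ChemicalSystem` and the `Protocol`, we can create the `Transformation`. Because a plain MD run does not transform one state into another, the same `ChemicalSystem` is passed as both `stateA` and `stateB`, and `mapping` is set to `None`.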

In [4]:
transformation = openfe.Transformation(
    stateA=system,
    stateB=system,
    mapping=None,
    protocol=protocol,  # use protocol created above
    name=f"{system.name}"
)

## 5. Running the MD simulation

**(a) Using the CLI**

We'll write out the transformation to disk, so that it can be run using the `openfe quickrun` command:

In [5]:
import pathlib
# first we create the directory
transformation_dir = pathlib.Path("md_input")
transformation_dir.mkdir(exist_ok=True)

# then we write out the transformation
transformation.dump(transformation_dir / f"{transformation.name}.json")

You can run the MD simulation from the CLI by using the `openfe quickrun` command. It
takes a transformation JSON as input, and the flags `-o` to give the final
output JSON file and `-d` for the directory where simulation results should be
stored. For example,

```bash
openfe quickrun path/to/transformation.json -o results.json -d working-directory
```

where `path/to/transformation.json` is the path to the file created above (`md_input/benzene_t4-lysozyme.json`).
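If the command is to be launched from a script, the argument list can be assembled in Python first. The sketch below only builds the list and does not run anything. The helper name `quickrun_command`, the `md_results` directory, and the `_results.json` suffix are hypothetical choices; only the `-o` and `-d` flags come from the description above.

```python
import pathlib


def quickrun_command(transformation_json, results_dir):
    """Build the `openfe quickrun` argument list for one transformation file."""
    transformation_json = pathlib.Path(transformation_json)
    results_dir = pathlib.Path(results_dir)
    stem = transformation_json.stem  # file name without the .json suffix
    return [
        "openfe", "quickrun", str(transformation_json),
        "-o", str(results_dir / f"{stem}_results.json"),  # final output JSON
        "-d", str(results_dir / stem),                    # working directory
    ]


cmd = quickrun_command("md_input/benzene_t4-lysozyme.json", "md_results")
print(" ".join(cmd))
```

The resulting list could then be handed to `subprocess.run(cmd, check=True)` to start the simulation.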

**(b) Using the Python API**

Alternatively, the MD simulation can be run by executing the `ProtocolDAG`. The `ProtocolDAG` is created with the `protocol.create()` method, which takes the `ChemicalSystem` as input.

Note: we use the `shared_basedir` and `scratch_basedir` arguments of `execute_DAG` to set the directories where the simulation files are written.

In [8]:
import gufe
import pathlib

# Creating the Protocol
protocol = PlainMDProtocol(settings=settings)
# Creating the Protocol DAG
dag = protocol.create(stateA=system, stateB=system, mapping=None)
workdir = pathlib.Path('./')
# Running the MD simulations
dagres = gufe.protocols.execute_DAG(
    dag,
    shared_basedir=workdir,
    scratch_basedir=workdir,
    keep_shared=True, # set this to True to save the outputs
    n_retries=3
)
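The cell above writes both shared and scratch files into the current directory. One possible alternative, sketched here using only the standard library, is to create dedicated directories first. The helper name `make_basedirs` and the `shared` and `scratch` directory names are hypothetical.

```python
import pathlib


def make_basedirs(root):
    """Create and return separate shared and scratch directories under `root`."""
    root = pathlib.Path(root)
    shared = root / "shared"
    scratch = root / "scratch"
    for directory in (shared, scratch):
        # parents=True also creates `root` if it does not exist yet
        directory.mkdir(parents=True, exist_ok=True)
    return shared, scratch
```

The returned paths could then be passed as `shared_basedir=shared` and `scratch_basedir=scratch` in the `execute_DAG` call, keeping simulation outputs separate from the notebook's own files.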

