# Setting up and running absolute hydration free energy calculations

This tutorial gives a step-by-step process for setting up an absolute hydration free energy (AHFE) simulation campaign using OpenFE. As the worked example, we calculate the absolute hydration free energy of benzene.

In [1]:
%matplotlib inline
import openfe

## 1. Loading the ligand

First we must load the chemical model of the ligand whose hydration free energy we wish to calculate.
In this example it is stored in an SDF file (`.sdf`), a format that can hold multiple molecules.
The file can be read using the `SDMolSupplier` class from RDKit, and each molecule is then passed to OpenFE as a `SmallMoleculeComponent`.

In [2]:
from rdkit import Chem
supp = Chem.SDMolSupplier("../cookbook/assets/benzene.sdf", removeHs=False)
ligands = [openfe.SmallMoleculeComponent.from_rdkit(mol) for mol in supp]
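Since only `ligands[0]` is used later on, it can help to confirm how many records an SDF file holds before loading it. The sketch below is a minimal, standard-library-only check that relies on the SDF convention that every record ends with a line containing only `$$$$`. The function name and the example text are hypothetical and are not part of OpenFE or RDKit.

```python
def count_sdf_records(sdf_text: str) -> int:
    """Count the '$$$$'-terminated records in SDF-formatted text."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


# Hypothetical stand-in text: not valid molecules, only the record
# delimiters matter for this check.
example_sdf = "benzene\n...\nM  END\n$$$$\ntoluene\n...\nM  END\n$$$$\n"

n_records = count_sdf_records(example_sdf)
print(f"records found: {n_records}")
```

For a real file, the text could be obtained with `pathlib.Path(...).read_text()` and the count compared against `len(ligands)`.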

## 2. Creating `ChemicalSystem`s

OpenFE describes complex molecular systems as being composed of `Component`s. For example, each small molecule loaded above is a `SmallMoleculeComponent`. We'll create a `SolventComponent` to describe the solvent.

The `Component`s are joined in a `ChemicalSystem`, which describes all the particles in the simulation.

Note that for AHFE simulations we do not define the vacuum state separately; the protocol creates it from the solvent states.
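The reason a vacuum state is needed at all is the thermodynamic cycle: the ligand is decoupled once in solvent and once in vacuum, and the hydration free energy is the difference between the two legs. The sketch below assumes the usual cycle closure, `dG_hyd = dG_vacuum - dG_solvent`, with both legs defined as decoupling free energies and their uncertainties combined in quadrature. The function name and the numbers are hypothetical, and the sign convention should be checked against the protocol documentation.

```python
import math


def hydration_free_energy(dg_vacuum, dg_solvent, err_vacuum, err_solvent):
    """Close the cycle for two decoupling legs (all values in kcal/mol).

    Assumes both legs are free energies of decoupling the ligand and that
    their errors are independent, so they add in quadrature.
    """
    estimate = dg_vacuum - dg_solvent
    uncertainty = math.sqrt(err_vacuum**2 + err_solvent**2)
    return estimate, uncertainty


# Hypothetical leg values, chosen only to make the arithmetic easy to follow.
estimate, uncertainty = hydration_free_energy(
    dg_vacuum=2.0, dg_solvent=3.0, err_vacuum=0.03, err_solvent=0.04
)
print(f"dG_hyd = {estimate:.2f} +/- {uncertainty:.2f} kcal/mol")
```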

In [3]:
# defaults are water with NaCl at 0.15 M
solvent = openfe.SolventComponent()

In [4]:
# In state A the ligand is fully interacting in the solvent
systemA = openfe.ChemicalSystem({
    'ligand': ligands[0],
    'solvent': solvent,
}, name=ligands[0].name)
# In state B the ligand is fully decoupled in the solvent, therefore we are only defining the solvent here
systemB = openfe.ChemicalSystem({'solvent': solvent})

## 3. Defining the AHFE simulation settings and creating a `Protocol`

Various parameters can be set to determine how the AHFE simulation will take place.

The easiest way to customize protocol settings is to start with the default settings, and modify them. Many settings carry units with them.

In [5]:
from openfe.protocols.openmm_afe import AbsoluteSolvationProtocol

In [6]:
settings = AbsoluteSolvationProtocol.default_settings()

Displaying the default values:

In [7]:
settings.thermo_settings.temperature  # display default value

In [8]:
settings.lambda_settings.lambda_elec # display default value

[0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

In [9]:
settings.solvent_simulation_settings.equilibration_length

Changing default values:

In [10]:
from openff.units import unit

# change the values
settings.thermo_settings.temperature = 300.0 * unit.kelvin
settings.lambda_settings.lambda_elec = [0.0, 0.26, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
settings.solvent_simulation_settings.equilibration_length = 500 * unit.picosecond
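When editing a lambda schedule by hand, a quick sanity check can catch typos before any simulation starts. The sketch below is an illustrative check only: the rules it applies (values within [0, 1], non-decreasing, a fixed number of windows) are assumptions made for this example and are not a description of what OpenFE itself validates. The function name is hypothetical.

```python
def check_lambda_schedule(values, expected_windows=None):
    """Return a list of problems found in a lambda schedule (empty if none)."""
    problems = []
    if any(not 0.0 <= v <= 1.0 for v in values):
        problems.append("values must lie in [0, 1]")
    if any(later < earlier for earlier, later in zip(values, values[1:])):
        problems.append("values must be non-decreasing")
    if expected_windows is not None and len(values) != expected_windows:
        problems.append(f"expected {expected_windows} windows, got {len(values)}")
    return problems


# The modified electrostatics schedule from the cell above (14 windows).
lambda_elec = [0.0, 0.26, 0.5, 0.75] + [1.0] * 10

problems = check_lambda_schedule(lambda_elec, expected_windows=14)
print(problems or "schedule looks consistent")
```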

In [11]:
settings.solvent_simulation_settings.equilibration_length = 10 * unit.picosecond
settings.solvent_simulation_settings.production_length = 500 * unit.picosecond
settings.vacuum_simulation_settings.equilibration_length = 10 * unit.picosecond
settings.vacuum_simulation_settings.production_length = 500 * unit.picosecond
settings.solvent_engine_settings.compute_platform = 'CUDA'
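To get a feel for what these lengths mean in practice, the arithmetic below converts a simulation length into a number of integration steps. It is a plain-Python sketch that assumes a 4 fs timestep and uses integer femtoseconds instead of unit-carrying quantities. The helper name is hypothetical.

```python
def n_steps(length_ps: int, timestep_fs: int) -> int:
    """Number of integration steps needed to cover length_ps picoseconds."""
    length_fs = length_ps * 1000  # 1 ps = 1000 fs
    if length_fs % timestep_fs:
        raise ValueError("length is not a whole number of timesteps")
    return length_fs // timestep_fs


# Lengths set in the cell above, with an assumed 4 fs timestep.
equilibration_steps = n_steps(length_ps=10, timestep_fs=4)
production_steps = n_steps(length_ps=500, timestep_fs=4)
print(equilibration_steps, production_steps)
```

These short lengths keep the tutorial fast; a production campaign would likely use longer simulations.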

Here is a view of all the settings, each of which can be modified in the same way as in the examples above:

In [12]:
from pprint import pprint
pprint(settings.dict())

{'alchemical_settings': {},
 'integrator_settings': {'barostat_frequency': <Quantity(25, 'timestep')>,
                         'constraint_tolerance': 1e-06,
                         'langevin_collision_rate': <Quantity(1.0, '1 / picosecond')>,
                         'n_restart_attempts': 20,
                         'reassign_velocities': False,
                         'remove_com': False,
                         'timestep': <Quantity(4, 'femtosecond')>},
 'lambda_settings': {'lambda_elec': [0.0,
                                     0.26,
                                     0.5,
                                     0.75,
                                     1.0,
                                     1.0,
                                     1.0,
                                     1.0,
                                     1.0,
                                     1.0,
                                     1.0,
                                     1.0,
                            

### Creating the `Protocol`
With the settings inspected and adjusted, we can pass them to the `Protocol`. The `Protocol` defines the procedure for estimating a free energy difference between two chemical systems; the two end states themselves are supplied later.

In [13]:
protocol = AbsoluteSolvationProtocol(settings=settings)

## 4. Running the AHFE simulation using the CLI command `openfe quickrun`

Once we have the `ChemicalSystem`s and the `Protocol`, we can create the `Transformation`.

In [14]:
transformation = openfe.Transformation(
            stateA=systemA,
            stateB=systemB,
            mapping=None,
            protocol=protocol,  # use protocol created above
            name=f"{systemA.name}"
        )

We'll write the transformation to disk so that it can be run using the `openfe quickrun` command:

In [15]:
import pathlib
# first we create the directory
transformation_dir = pathlib.Path("ahfe_json")
transformation_dir.mkdir(exist_ok=True)

# then we write out the transformation
transformation.dump(transformation_dir / f"{transformation.name}.json")

You can run the AHFE simulation from the CLI by using the `openfe quickrun` command. It takes a transformation JSON file as input, along with the flags `-o` for the final output JSON file and `-d` for the directory where simulation results should be stored. For example,

`openfe quickrun path/to/transformation.json -o results.json -d working-directory`

where `path/to/transformation.json` is the path to one of the files created above.
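When a campaign contains several transformations, the command lines can be generated instead of typed by hand. The sketch below only builds and prints the commands; it does not run them. It uses the `-o` and `-d` flags described above, while the helper name, the `results` directory, and the output naming scheme are hypothetical choices made for this example. A temporary directory with a placeholder JSON file stands in for `ahfe_json`.

```python
import pathlib
import shlex
import tempfile


def build_quickrun_commands(transformation_dir, results_dir):
    """Build one `openfe quickrun` argument list per transformation JSON."""
    results_dir = pathlib.Path(results_dir)
    commands = []
    for json_path in sorted(pathlib.Path(transformation_dir).glob("*.json")):
        stem = json_path.stem
        commands.append([
            "openfe", "quickrun", str(json_path),
            "-o", str(results_dir / f"{stem}_results.json"),  # output JSON
            "-d", str(results_dir / stem),  # working directory
        ])
    return commands


with tempfile.TemporaryDirectory() as tmp:
    demo_dir = pathlib.Path(tmp) / "ahfe_json"
    demo_dir.mkdir()
    (demo_dir / "benzene.json").write_text("{}")  # placeholder, not a real transformation
    commands = build_quickrun_commands(demo_dir, "results")

for command in commands:
    print(shlex.join(command))
```

The printed lines could be pasted into a shell script or a cluster submission file.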

## 5. Running the AHFE simulation using the Python API
**Creating the `ProtocolDAG`**

Once we have the two `ChemicalSystem`s and the `Protocol`, we can create the `ProtocolDAG`.

This creates a directed acyclic graph (DAG) of the computational tasks needed to estimate the free energy difference between the two chemical systems.

In [15]:
dag = protocol.create(stateA=systemA, stateB=systemB, mapping=None)

To summarize, this `ProtocolDAG` contains:
- chemical models of both sides of the alchemical transformation in `systemA` and `systemB`
- a description of the exact computational algorithm to use to perform the estimate in `protocol`
- the `mapping` is set to `None` since no atoms are mapped in the AHFE protocol
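To illustrate what executing a DAG of tasks means, the toy example below orders a small dependency graph with the standard library's `graphlib`. The task names and dependencies are hypothetical and do not mirror the units that OpenFE actually creates; the point is only that each task runs after everything it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key depends on the tasks in its set.
toy_dag = {
    "setup_solvent": set(),
    "setup_vacuum": set(),
    "simulate_solvent": {"setup_solvent"},
    "simulate_vacuum": {"setup_vacuum"},
    "analyze": {"simulate_solvent", "simulate_vacuum"},
}

# static_order() yields the tasks so that dependencies always come first.
execution_order = list(TopologicalSorter(toy_dag).static_order())
print(execution_order)
```

Tasks with no dependency on each other, such as the two hypothetical simulation tasks here, are the ones that could run in parallel on separate resources.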

**Executing the simulation**

The DAG contains many individual jobs. We can execute them sequentially in this notebook using the `gufe.protocols.execute_DAG` function.

In a more realistic (expansive) situation we would farm off the individual jobs to an HPC cluster or cloud compute service so they could be executed in parallel.

Note: we use the `shared_basedir` and `scratch_basedir` arguments of `execute_DAG` to set the directory where the simulation files are written.

In [16]:
from gufe.protocols import execute_DAG
import pathlib

In [17]:
# Finally we can run the simulations
path = pathlib.Path('./ahfe')
path.mkdir()

# Execute the DAG
dag_results = execute_DAG(dag, scratch_basedir=path, shared_basedir=path, n_retries=3)



Please cite the following:

        Friedrichs MS, Eastman P, Vaidyanathan V, Houston M, LeGrand S, Beberg AL, Ensign DL, Bruns CM, and Pande VS. Accelerating molecular dynamic simulations on graphics processing unit. J. Comput. Chem. 30:864, 2009. DOI: 10.1002/jcc.21209
        Eastman P and Pande VS. OpenMM: A hardware-independent framework for molecular simulations. Comput. Sci. Eng. 12:34, 2010. DOI: 10.1109/MCSE.2010.27
        Eastman P and Pande VS. Efficient nonbonded interactions for molecular dynamics on a graphics processing unit. J. Comput. Chem. 31:1268, 2010. DOI: 10.1002/jcc.21413
        Eastman P and Pande VS. Constant constraint matrix approximation: A robust, parallelizable constraint method for molecular simulations. J. Chem. Theor. Comput. 6:434, 2010. DOI: 10.1021/ct900463w
        Chodera JD and Shirts MR. Replica exchange and expanded ensemble simulations as Gibbs multistate: Simple improvements for enhanced mixing. J. Chem. Phys., 135:194110, 2011. DOI:10.1063/




## 6. Analysis

Now that we've run our simulations, let's gather the free energies for both phases.

This is done by passing the results of executing the DAG to the `gather()` method of the `Protocol`.
The method takes a **list** of completed DAG results, which caters for cases where simulations have been extended.

In [18]:
# Gather the vacuum and solvent results into a single protocol result
protocol_results = protocol.gather([dag_results])

print(f"AHFE dG: {protocol_results.get_estimate()}, err {protocol_results.get_uncertainty()}")

AHFE dG: -0.8757588273304342 kilocalorie_per_mole, err 0.15236223232996626 kilocalorie_per_mole


In [23]:
# Save the results in a gzip-compressed JSON file
import gzip
import json
import gufe
outdict = {
    "estimate": protocol_results.get_estimate(),
    "uncertainty": protocol_results.get_uncertainty(),
    "protocol_result": protocol_results.to_dict(),
    "unit_results": {
        unit.key: unit.to_keyed_dict()
        for unit in dag_results.protocol_unit_results
    }
}

with gzip.open("ahfe_benzene_json_results.gz", 'wt') as zipfile:
    json.dump(outdict, zipfile, cls=gufe.tokenization.JSON_HANDLER.encoder)
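Reading such a file back follows the same pattern in reverse: open it with `gzip` in text mode and parse it with `json`. The sketch below is a standard-library round trip on a hypothetical payload of plain floats, written to a temporary directory. The real results file contains gufe-encoded objects, so loading it would presumably need the matching gufe decoder passed as `cls`; that step is an assumption and is not shown here.

```python
import gzip
import json
import pathlib
import tempfile

# Hypothetical payload with plain floats (kcal/mol), not real results.
payload = {"estimate": -1.0, "uncertainty": 0.1}

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "example_results.json.gz"

    # Write: text mode ('wt') lets json.dump write str directly.
    with gzip.open(path, "wt") as handle:
        json.dump(payload, handle)

    # Read: text mode ('rt') decompresses and decodes in one step.
    with gzip.open(path, "rt") as handle:
        loaded = json.load(handle)

print(loaded)
```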