# Preparing `AlchemicalNetwork`s

This notebook illustrates how to build `AlchemicalNetwork` objects,
starting from chemical models stored in SDF and PDB files.

An `AlchemicalNetwork` is used to represent an entire network of calculations,
and is composed of many smaller objects:

- An `AlchemicalNetwork` is a directed graph in which
  - each node is a `ChemicalSystem`
    - each containing one or more components, such as `SmallMoleculeComponent` or `ProteinComponent`
      - internally, each `Component` usually wraps an RDKit representation
  - each directed edge is a `Transformation`, containing
    - two `ChemicalSystem`s, the 'A' and 'B' sides
    - zero or more `Mapping` objects relating these two sides
    - a `Protocol` defining the computational method to be applied to the other items

In [1]:
# suppress `numba` warnings, if present
from numba.core.errors import NumbaWarning
import warnings

warnings.simplefilter('ignore', category=NumbaWarning)

In [2]:
import openfe
from gufe import AlchemicalNetwork
from openff.units import unit
from rdkit import Chem

## Define `ChemicalSystem`s for network nodes

We'll start by defining the nodes for our network.
A `ChemicalSystem` is made of one or more `Component`s. These can be a `ProteinComponent`, `SmallMoleculeComponent`, or `SolventComponent`, and potentially others as needed. This design allows for memory-efficient representation of large networks that may have hundreds or thousands of nodes but far fewer distinct proteins, ligands, and so on.
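
To see why composing systems from components keeps large networks compact, here is a minimal sketch that uses plain Python stand-ins rather than the real `gufe` classes. The `Component` and `System` dataclasses and the ligand names are hypothetical placeholders. The only point is that many systems can reference the same few component objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical stand-in for a Component (not the real gufe class)."""
    name: str


@dataclass(frozen=True)
class System:
    """Hypothetical stand-in for a ChemicalSystem: a name plus labelled components."""
    name: str
    components: tuple  # tuple of (label, Component) pairs


protein = Component("protein")
solvent = Component("solvent")
ligands = [Component(f"ligand_{i}") for i in range(13)]

systems = []
for lig in ligands:
    systems.append(System(f"{lig.name}_complex",
                          (("ligand", lig), ("solvent", solvent), ("protein", protein))))
    systems.append(System(f"{lig.name}_water",
                          (("ligand", lig), ("solvent", solvent))))

# Count distinct component objects by identity: every system reuses the same
# protein and solvent objects instead of holding its own copy.
unique_components = {id(c) for s in systems for _, c in s.components}
n_systems = len(systems)
n_unique = len(unique_components)
print(n_systems, "systems share", n_unique, "component objects")
```

In this toy setup the number of systems grows with the number of ligands, while the protein and solvent are each stored once.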

### Reading Ligands

The ligands are concatenated in a single SDF file, which we'll read using RDKit.

Each of the ligands has been pre-docked into the protein and aligned to the common scaffold. It is important to recognize that any processing required to prepare ligand and protein structures for alchemical free energy calculations should be done *before* the steps we are taking here.

In [3]:
ligands = [
    openfe.SmallMoleculeComponent(m) for m in Chem.SDMolSupplier('data/tyk2_ligands.sdf', removeHs=False)
]
ligands

[SmallMoleculeComponent(name=lig_ejm_31),
 SmallMoleculeComponent(name=lig_ejm_42),
 SmallMoleculeComponent(name=lig_ejm_43),
 SmallMoleculeComponent(name=lig_ejm_45),
 SmallMoleculeComponent(name=lig_ejm_46),
 SmallMoleculeComponent(name=lig_ejm_47),
 SmallMoleculeComponent(name=lig_ejm_48),
 SmallMoleculeComponent(name=lig_ejm_50),
 SmallMoleculeComponent(name=lig_ejm_54),
 SmallMoleculeComponent(name=lig_ejm_55),
 SmallMoleculeComponent(name=lig_jmc_23),
 SmallMoleculeComponent(name=lig_jmc_27),
 SmallMoleculeComponent(name=lig_jmc_28)]

### Reading the protein

The protein is supplied as a PDB file, readable via the `ProteinComponent.from_pdb_file` class method.

In [4]:
protein = openfe.ProteinComponent.from_pdb_file('./data/tyk2_protein.pdb', name='tyk2')

protein

ProteinComponent(name=tyk2)

### Defining the solvent

We'll also need at least one `SolventComponent` to encode our choice of solvent and counterions, along with the ion concentration.
The concentration is given with explicit units from `openff.units`, which avoids ambiguity about the intended quantity.

The `SolventComponent` doesn't itself perform any solvation (packing water molecules and ions); that happens just before simulation time, during `Protocol` execution.

In [5]:
solvent = openfe.SolventComponent(positive_ion='Na', 
                                  negative_ion='Cl',
                                  neutralize=True, 
                                  ion_concentration=0.15*unit.molar)
solvent

SolventComponent(name=O, Na+, Cl-)
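
As a rough illustration of what a concentration of 0.15 M implies once solvation actually happens, the sketch below estimates the number of salt ion pairs in a cubic box. The 8 nm box edge is an assumed value chosen only for the arithmetic, and `ion_pairs` is a hypothetical helper. The real box is built during `Protocol` execution, the solute excludes some of the volume, and any neutralizing counterions would come on top of this estimate.

```python
AVOGADRO = 6.02214076e23  # particles per mole (exact SI value)


def ion_pairs(concentration_molar: float, box_edge_nm: float) -> float:
    """Expected number of salt ion pairs in a cubic box at the given molarity."""
    edge_dm = box_edge_nm * 1e-8   # 1 nm = 1e-9 m = 1e-8 dm
    volume_litres = edge_dm ** 3   # 1 dm^3 = 1 L
    return concentration_molar * volume_litres * AVOGADRO


n_pairs = ion_pairs(0.15, 8.0)  # the 8 nm edge is an assumption for illustration
print(f"roughly {n_pairs:.0f} Na+/Cl- pairs")
```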

### Build the `ChemicalSystem`s

We can now construct the `ChemicalSystem`s we want represented in our network. Since we are planning to perform relative binding free energy (RBFE) calculations, we'll define both *complex* and *solvent* variants for each ligand.

Each of the following two cells produces a dictionary mapping a ligand name to the `ChemicalSystem` that contains that ligand:
one dictionary for the complexed ligands and one for the solvated ligands.

In [6]:
complexed = {l.name: openfe.ChemicalSystem(components={'ligand': l,
                                                       'solvent': solvent, 
                                                       'protein': protein}, 
                                           name=f"{l.name}_complex") 
             for l in ligands}
complexed

{'lig_ejm_31': ChemicalSystem(name=lig_ejm_31_complex, components={'ligand': SmallMoleculeComponent(name=lig_ejm_31), 'solvent': SolventComponent(name=O, Na+, Cl-), 'protein': ProteinComponent(name=tyk2)}),
 'lig_ejm_42': ChemicalSystem(name=lig_ejm_42_complex, components={'ligand': SmallMoleculeComponent(name=lig_ejm_42), 'solvent': SolventComponent(name=O, Na+, Cl-), 'protein': ProteinComponent(name=tyk2)}),
 'lig_ejm_43': ChemicalSystem(name=lig_ejm_43_complex, components={'ligand': SmallMoleculeComponent(name=lig_ejm_43), 'solvent': SolventComponent(name=O, Na+, Cl-), 'protein': ProteinComponent(name=tyk2)}),
 'lig_ejm_45': ChemicalSystem(name=lig_ejm_45_complex, components={'ligand': SmallMoleculeComponent(name=lig_ejm_45), 'solvent': SolventComponent(name=O, Na+, Cl-), 'protein': ProteinComponent(name=tyk2)}),
 'lig_ejm_46': ChemicalSystem(name=lig_ejm_46_complex, components={'ligand': SmallMoleculeComponent(name=lig_ejm_46), 'solvent': SolventComponent(name=O, Na+, Cl-), 'protei

In [7]:
solvated = {l.name: openfe.ChemicalSystem(components={'ligand': l, 
                                                      'solvent': solvent}, 
                                          name=f"{l.name}_water") 
            for l in ligands}
solvated

{'lig_ejm_31': ChemicalSystem(name=lig_ejm_31_water, components={'ligand': SmallMoleculeComponent(name=lig_ejm_31), 'solvent': SolventComponent(name=O, Na+, Cl-)}),
 'lig_ejm_42': ChemicalSystem(name=lig_ejm_42_water, components={'ligand': SmallMoleculeComponent(name=lig_ejm_42), 'solvent': SolventComponent(name=O, Na+, Cl-)}),
 'lig_ejm_43': ChemicalSystem(name=lig_ejm_43_water, components={'ligand': SmallMoleculeComponent(name=lig_ejm_43), 'solvent': SolventComponent(name=O, Na+, Cl-)}),
 'lig_ejm_45': ChemicalSystem(name=lig_ejm_45_water, components={'ligand': SmallMoleculeComponent(name=lig_ejm_45), 'solvent': SolventComponent(name=O, Na+, Cl-)}),
 'lig_ejm_46': ChemicalSystem(name=lig_ejm_46_water, components={'ligand': SmallMoleculeComponent(name=lig_ejm_46), 'solvent': SolventComponent(name=O, Na+, Cl-)}),
 'lig_ejm_47': ChemicalSystem(name=lig_ejm_47_water, components={'ligand': SmallMoleculeComponent(name=lig_ejm_47), 'solvent': SolventComponent(name=O, Na+, Cl-)}),
 'lig_ejm_

## Define `Transformation`s between `ChemicalSystem`s as network edges

A `Transformation` is a directed edge between two `ChemicalSystem`s. It includes a `Protocol` parameterized with `Settings`, and optionally a `ComponentMapping`. 

The `Protocol` defines the actual computational method used to evaluate the `Transformation` to yield estimates for the free energy difference between the `ChemicalSystem`s.

The `ComponentMapping` defines the atom mapping(s) between corresponding `Component`s in the two `ChemicalSystem`s. This is often critical for relative binding free energy calculations, since the choice of mapping can heavily influence convergence of the resulting estimates.

### Define the `Protocol` used for `Transformation` evaluation

For this example, we'll use the same `Protocol` for all our `Transformation`s, with identical `Settings` for each.

In [8]:
from openfe.protocols import openmm_rfe


Any given `Protocol` has a `default_settings()` method, which can be used to get the default settings that are specific to that `Protocol`:

In [9]:
protocol_settings = openmm_rfe.RelativeHybridTopologyProtocol.default_settings()
protocol_settings.dict()

{'forcefield_settings': {'constraints': 'hbonds',
  'rigid_water': True,
  'remove_com': False,
  'hydrogen_mass': 4.0,
  'forcefields': ['amber/ff14SB.xml',
   'amber/tip3p_standard.xml',
   'amber/tip3p_HFE_multivalent.xml',
   'amber/phosaa10.xml'],
  'small_molecule_forcefield': 'openff-2.0.0'},
 'thermo_settings': {'temperature': 298.15 <Unit('kelvin')>,
  'pressure': 0.9869232667160129 <Unit('standard_atmosphere')>,
  'ph': None,
  'redox_potential': None},
 'system_settings': {'nonbonded_method': 'PME',
  'nonbonded_cutoff': 1.0 <Unit('nanometer')>},
 'solvation_settings': {'solvent_model': 'tip3p',
  'solvent_padding': 1.2 <Unit('nanometer')>},
 'alchemical_settings': {'lambda_functions': 'default',
  'lambda_windows': 11,
  'unsampled_endstates': False,
  'use_dispersion_correction': False,
  'softcore_LJ_v2': True,
  'softcore_electrostatics': True,
  'softcore_alpha': 0.85,
  'softcore_electrostatics_alpha': 0.3,
  'softcore_sigma_Q': 1.0,
  'interpolate_old_and_new_14s': Fa

These settings can be edited in place, for example:

In [10]:
protocol_settings.thermo_settings.temperature = 299 * unit.kelvin

We can now produce a parameterized `RelativeHybridTopologyProtocol` instance:

In [11]:
protocol = openmm_rfe.RelativeHybridTopologyProtocol(protocol_settings)

### Build the `Transformation`s

We can now construct the `Transformation`s we want represented in our network.

We'll use a predefined set of connections between the `tyk2` ligands loaded above as the basis for our choices here, but you could instead use any network planner to generate the connections.

In [12]:
from openfe.setup import LomapAtomMapper

mapper = LomapAtomMapper(element_change=False)

connections = [("lig_ejm_31", "lig_ejm_50"),
               ("lig_ejm_46", "lig_jmc_23"),
               ("lig_ejm_31", "lig_ejm_55"),
               ("lig_ejm_31", "lig_ejm_48"),
               ("lig_ejm_31", "lig_ejm_54"),
               ("lig_ejm_31", "lig_ejm_47"),
               ("lig_ejm_31", "lig_ejm_46"),
               ("lig_ejm_46", "lig_jmc_27"),
               ("lig_ejm_46", "lig_jmc_28"),
               ("lig_ejm_42", "lig_ejm_43"),
               ("lig_ejm_31", "lig_ejm_42"),
               ("lig_ejm_45", "lig_ejm_55"),]

Since we are planning to perform relative binding free energy (RBFE) calculations, we'll define both *complex* and *solvent* variants for each `Transformation`:

In [13]:
complexed_transformations = []
solvated_transformations = []

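for (ligA_name, ligB_name) in connections:
    # the ligand components are the same objects in the complex and solvent
    # systems, so one mapping can be reused for both legs
    ligA = complexed[ligA_name]['ligand']
    ligB = complexed[ligB_name]['ligand']

    # take the first mapping suggested by the mapper
    mapping = next(mapper.suggest_mappings(ligA, ligB))
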
    complexed_transformations.append(
        openfe.Transformation(stateA=complexed[ligA_name], 
                              stateB=complexed[ligB_name], 
                              mapping={'ligand': mapping},
                              protocol=protocol) 
    )
    solvated_transformations.append(
        openfe.Transformation(stateA=solvated[ligA_name], 
                              stateB=solvated[ligB_name], 
                              mapping={'ligand': mapping},
                              protocol=protocol) 
    )
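
The reason for pairing a *complex* and a *solvent* `Transformation` for every connection is the standard thermodynamic cycle for relative binding free energies. Written with the usual sign convention (an assumption here, since the notebook does not spell it out), the two legs for a ligand pair $A \to B$ combine as:

$$
\Delta\Delta G_{\mathrm{bind}}(A \to B) = \Delta G_{\mathrm{complex}}(A \to B) - \Delta G_{\mathrm{solvent}}(A \to B)
$$

Each entry in `complexed_transformations` is meant to supply the first term and the matching entry in `solvated_transformations` the second.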

## Create the `AlchemicalNetwork`

An `AlchemicalNetwork` is simply the combination of `ChemicalSystem`s (nodes) and `Transformation`s (directed edges) for which we want to evaluate $\Delta G$ values. This data structure functions as a declaration of what you want to compute.

We'll finish here by creating an `AlchemicalNetwork` from the collection of objects we've built so far.

In [14]:
network = AlchemicalNetwork(edges=(solvated_transformations + complexed_transformations), 
                            nodes=(list(solvated.values()) + list(complexed.values())),
                            name="tyk2_relative_benchmark")
network

<AlchemicalNetwork-87d04167d403b27af2292dd1d5a10e70>

That's it! We simply toss in all `Transformation`s (edges) and `ChemicalSystem`s (nodes) we want included in this `AlchemicalNetwork`, and optionally give it a name that means something to us (it need not be unique, but can be used to query for network(s) later).

We could have chosen here to leave the `nodes` argument off, since every `ChemicalSystem` we included was already represented among the `edges`, but we show it here for completeness. In this way, it's possible to include `ChemicalSystem`s in the network that aren't connected via any `Transformation`s to others, though in practice there isn't much utility in this.
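
The relationship between `edges` and `nodes` described above can be sketched with plain sets. This is a toy model, not the `gufe` implementation: systems are represented by strings, and `network_nodes` is a hypothetical helper that takes the union of the edge endpoints and any explicitly supplied nodes.

```python
# Toy stand-ins: systems are plain strings, edges are (stateA, stateB) tuples.
edges = [("ligA_water", "ligB_water"), ("ligA_complex", "ligB_complex")]
explicit_nodes = ["ligA_water", "ligB_water",
                  "ligA_complex", "ligB_complex",
                  "ligC_water"]  # ligC_water takes part in no edge


def network_nodes(edges, nodes=()):
    """Union of explicitly supplied nodes and the endpoints of every edge."""
    implied = {state for edge in edges for state in edge}
    return implied | set(nodes)


implied_only = network_nodes(edges)
with_extra = network_nodes(edges, explicit_nodes)
isolated = with_extra - implied_only
print(sorted(isolated))
```

Only the unconnected system depends on the explicit `nodes` argument; everything that appears in an edge is present either way.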