# Preparing `AlchemicalNetwork`s for use with `fah-alchemy`

`fah-alchemy` is a platform for evaluating the free energy differences between chemical systems in an alchemical network.
This notebook will illustrate how to build alchemical networks suitable for submission to a deployed `fah-alchemy` instance.

`fah-alchemy` works in terms of `gufe` objects; the `gufe` module defines the data model for `AlchemicalNetwork`s and all objects they are composed of. We'll import the classes of objects we'll use in this tutorial here.

In [1]:
# suppress `numba` warnings, if present
from numba.core.errors import NumbaWarning
import warnings

warnings.simplefilter('ignore', category=NumbaWarning)

In [2]:
from gufe import AlchemicalNetwork, Transformation, ChemicalSystem
from gufe.components import ProteinComponent, SmallMoleculeComponent, SolventComponent

from openff.units import unit



## Sample network from `openfe-benchmarks`

We'll use a sample network from `openfe-benchmarks` for demonstration purposes. The sources can be found here: https://github.com/OpenFreeEnergy/openfe-benchmarks

In particular, we'll use the `tyk2` network. We'll extract the ligands manually from the ligand SDF, and the protein target from its PDB.

In [3]:
from importlib import resources
from rdkit import Chem

from openfe_benchmarks import tyk2



In [4]:
tyk2_system = tyk2.get_system()
tyk2_system

<openfe_benchmarks.utils.RBFEBenchmarkSystem at 0x7f0aa4c079d0>

The connections for the network are defined here; we'll use these for building up our own `AlchemicalNetwork`.

In [5]:
tyk2_system.connections

[('lig_ejm_31', 'lig_ejm_50'),
 ('lig_ejm_46', 'lig_jmc_23'),
 ('lig_ejm_31', 'lig_ejm_55'),
 ('lig_ejm_31', 'lig_ejm_48'),
 ('lig_ejm_31', 'lig_ejm_54'),
 ('lig_ejm_31', 'lig_ejm_47'),
 ('lig_ejm_31', 'lig_ejm_46'),
 ('lig_ejm_46', 'lig_jmc_27'),
 ('lig_ejm_46', 'lig_jmc_28'),
 ('lig_ejm_42', 'lig_ejm_43'),
 ('lig_ejm_31', 'lig_ejm_42'),
 ('lig_ejm_45', 'lig_ejm_55')]
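
Before building anything on top of these connections, it can be useful to check the topology they describe. The following is a minimal sketch in plain Python (no `gufe` objects involved) that copies the edge list from the output above, treats it as an undirected graph, and checks with a breadth-first search that every ligand is reachable from every other:

```python
from collections import defaultdict, deque

# Edge list copied from the `tyk2_system.connections` output above.
connections = [
    ('lig_ejm_31', 'lig_ejm_50'),
    ('lig_ejm_46', 'lig_jmc_23'),
    ('lig_ejm_31', 'lig_ejm_55'),
    ('lig_ejm_31', 'lig_ejm_48'),
    ('lig_ejm_31', 'lig_ejm_54'),
    ('lig_ejm_31', 'lig_ejm_47'),
    ('lig_ejm_31', 'lig_ejm_46'),
    ('lig_ejm_46', 'lig_jmc_27'),
    ('lig_ejm_46', 'lig_jmc_28'),
    ('lig_ejm_42', 'lig_ejm_43'),
    ('lig_ejm_31', 'lig_ejm_42'),
    ('lig_ejm_45', 'lig_ejm_55'),
]

# Build an undirected adjacency map: ligand name -> set of neighbours.
neighbors = defaultdict(set)
for a, b in connections:
    neighbors[a].add(b)
    neighbors[b].add(a)

# Breadth-first search starting from an arbitrary ligand.
start = connections[0][0]
seen = {start}
queue = deque([start])
while queue:
    current = queue.popleft()
    for nxt in neighbors[current] - seen:
        seen.add(nxt)
        queue.append(nxt)

n_ligands = len(neighbors)
is_connected = len(seen) == n_ligands
print(n_ligands, len(connections), is_connected)
```

With 13 ligands joined by 12 edges in a single connected piece, this particular network is a tree: every ligand can be related to every other, but there are no redundant cycles.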

## Define `ChemicalSystem`s for network nodes

An `AlchemicalNetwork` features `ChemicalSystem`s as nodes and `Transformation`s as directed edges between nodes. We'll start by defining the nodes for our network.

A `ChemicalSystem` is made of one or more `Component`s. These can be one of `ProteinComponent`, `SmallMoleculeComponent`, or `SolventComponent`, and potentially others as needed. This design allows for memory-efficient representation of large networks that may have hundreds or thousands of nodes but far fewer distinct proteins, ligands, and solvents.
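
To make the memory argument concrete, here is a toy sketch using plain dataclasses. `ToyComponent` and `ToySystem` are hypothetical stand-ins invented for this illustration; they are not the `gufe` classes and say nothing about how `gufe` stores things internally. The point is only that many systems can reference the same protein and solvent objects rather than each holding its own copy:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyComponent:
    """Hypothetical stand-in for a Component (not the gufe class)."""
    name: str


@dataclass(frozen=True)
class ToySystem:
    """Hypothetical stand-in for a ChemicalSystem: a name plus labelled components."""
    name: str
    components: tuple  # tuple of (label, ToyComponent) pairs


# One protein and one solvent, shared by every system.
protein = ToyComponent("tyk2")
solvent = ToyComponent("water")
ligands = [ToyComponent(f"lig_{i}") for i in range(1000)]

systems = [
    ToySystem(
        f"{lig.name}_complex",
        (("ligand", lig), ("protein", protein), ("solvent", solvent)),
    )
    for lig in ligands
]

# Count the distinct component objects referenced across all systems.
unique_components = {id(c) for s in systems for _, c in s.components}
print(len(systems), len(unique_components))
```

In this sketch, a thousand systems reference only 1002 distinct component objects: one per ligand, plus the single shared protein and solvent.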

### Define `Component`s for a given `ChemicalSystem`

Let's start by assembling the ligands. These are defined as `SmallMoleculeComponent`s, and can be initialized with RDKit molecules. 

We'll read a multimolecule SDF from `openfe-benchmarks` and create a `SmallMoleculeComponent` for each ligand in the file:

In [6]:
with resources.path('openfe_benchmarks.data',
                    'tyk2_ligands.sdf') as fn:
    ligands_sdf = Chem.SDMolSupplier(str(fn), removeHs=False)
    ligands = [SmallMoleculeComponent(rdkit_ligand) for rdkit_ligand in ligands_sdf]

ligands

[SmallMoleculeComponent(name=lig_ejm_31),
 SmallMoleculeComponent(name=lig_ejm_42),
 SmallMoleculeComponent(name=lig_ejm_43),
 SmallMoleculeComponent(name=lig_ejm_45),
 SmallMoleculeComponent(name=lig_ejm_46),
 SmallMoleculeComponent(name=lig_ejm_47),
 SmallMoleculeComponent(name=lig_ejm_48),
 SmallMoleculeComponent(name=lig_ejm_50),
 SmallMoleculeComponent(name=lig_ejm_54),
 SmallMoleculeComponent(name=lig_ejm_55),
 SmallMoleculeComponent(name=lig_jmc_23),
 SmallMoleculeComponent(name=lig_jmc_27),
 SmallMoleculeComponent(name=lig_jmc_28)]

We'll also load our protein into a `ProteinComponent`:

In [7]:
with resources.path('openfe_benchmarks.data',
                    'tyk2_protein.pdb') as fn:
    protein = ProteinComponent.from_pdb_file(str(fn), name='tyk2')

protein

ProteinComponent(name=tyk2)

We'll also need at least one `SolventComponent` to encode our choice of solvent and counterions, with concentration:

In [8]:
solvent = SolventComponent(positive_ion='Na', 
                           negative_ion='Cl',
                           neutralize=True, 
                           ion_concentration=0.15*unit.molar)
solvent

SolventComponent(name=O, Na+, Cl-)

The `SolventComponent` doesn't perform any solvation itself (packing water molecules, ions); that happens just before simulation time, during `Protocol` execution.

Each of the ligands has been pre-docked into the protein and aligned to the common scaffold. It is important to recognize that any processing required to prepare ligand and protein structures for alchemical free energy calculations should be done *before* the steps we are taking here.

### Build the `ChemicalSystem`s

We can now construct the `ChemicalSystem`s we want represented in our network. Since we are planning to perform relative binding free energy (RBFE) calculations, we'll define both *complex* and *solvent* variants for each ligand.

In [9]:
complexed = {l.name: ChemicalSystem(components={'ligand': l, 
                                                'solvent': solvent, 
                                                'protein': protein}, 
                                    name=f"{l.name}_complex") 
             for l in ligands}
complexed

{'lig_ejm_31': ChemicalSystem(name=lig_ejm_31_complex, components={'ligand': SmallMoleculeComponent(name=lig_ejm_31), 'solvent': SolventComponent(name=O, Na+, Cl-), 'protein': ProteinComponent(name=tyk2)}),
 'lig_ejm_42': ChemicalSystem(name=lig_ejm_42_complex, components={'ligand': SmallMoleculeComponent(name=lig_ejm_42), 'solvent': SolventComponent(name=O, Na+, Cl-), 'protein': ProteinComponent(name=tyk2)}),
 'lig_ejm_43': ChemicalSystem(name=lig_ejm_43_complex, components={'ligand': SmallMoleculeComponent(name=lig_ejm_43), 'solvent': SolventComponent(name=O, Na+, Cl-), 'protein': ProteinComponent(name=tyk2)}),
 'lig_ejm_45': ChemicalSystem(name=lig_ejm_45_complex, components={'ligand': SmallMoleculeComponent(name=lig_ejm_45), 'solvent': SolventComponent(name=O, Na+, Cl-), 'protein': ProteinComponent(name=tyk2)}),
 'lig_ejm_46': ChemicalSystem(name=lig_ejm_46_complex, components={'ligand': SmallMoleculeComponent(name=lig_ejm_46), 'solvent': SolventComponent(name=O, Na+, Cl-), 'protei

In [10]:
solvated = {l.name: ChemicalSystem(components={'ligand': l, 
                                               'solvent': solvent}, 
                                   name=f"{l.name}_water") 
            for l in ligands}
solvated

{'lig_ejm_31': ChemicalSystem(name=lig_ejm_31_water, components={'ligand': SmallMoleculeComponent(name=lig_ejm_31), 'solvent': SolventComponent(name=O, Na+, Cl-)}),
 'lig_ejm_42': ChemicalSystem(name=lig_ejm_42_water, components={'ligand': SmallMoleculeComponent(name=lig_ejm_42), 'solvent': SolventComponent(name=O, Na+, Cl-)}),
 'lig_ejm_43': ChemicalSystem(name=lig_ejm_43_water, components={'ligand': SmallMoleculeComponent(name=lig_ejm_43), 'solvent': SolventComponent(name=O, Na+, Cl-)}),
 'lig_ejm_45': ChemicalSystem(name=lig_ejm_45_water, components={'ligand': SmallMoleculeComponent(name=lig_ejm_45), 'solvent': SolventComponent(name=O, Na+, Cl-)}),
 'lig_ejm_46': ChemicalSystem(name=lig_ejm_46_water, components={'ligand': SmallMoleculeComponent(name=lig_ejm_46), 'solvent': SolventComponent(name=O, Na+, Cl-)}),
 'lig_ejm_47': ChemicalSystem(name=lig_ejm_47_water, components={'ligand': SmallMoleculeComponent(name=lig_ejm_47), 'solvent': SolventComponent(name=O, Na+, Cl-)}),
 'lig_ejm_

We now have all our network nodes defined. Next, we need to define the `Transformation`s that we wish to perform between them.

## Define `Transformation`s between `ChemicalSystem`s as network edges

A `Transformation` is a directed edge between two `ChemicalSystem`s. It includes a `Protocol` parameterized with `Settings`, and optionally a `ComponentMapping`.

The `Protocol` defines the actual computational method used to evaluate the `Transformation` to yield estimates for the free energy difference between the `ChemicalSystem`s.

The `ComponentMapping` defines the atom mapping(s) between corresponding `Component`s in the two `ChemicalSystem`s. This is often critical for relative binding free energy calculations, since the choice of mapping can heavily influence convergence of the resulting estimates.
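
As a rough mental model, an atom mapping can be pictured as a dictionary from atom indices in one ligand to atom indices in the other. The sketch below is a toy illustration under that assumption; the helper names and the example indices are hypothetical and are not part of the `gufe` API. It checks that a mapping is one-to-one and within range, and lists the atoms that are left unmapped on each side, which are the atoms an alchemical transformation would have to delete or insert:

```python
def is_valid_mapping(mapping, n_atoms_a, n_atoms_b):
    """Check a toy atom mapping {index_in_A: index_in_B}.

    Valid means every index is in range and no two atoms in A
    map onto the same atom in B.
    """
    in_range = all(
        0 <= a < n_atoms_a and 0 <= b < n_atoms_b for a, b in mapping.items()
    )
    one_to_one = len(set(mapping.values())) == len(mapping)
    return in_range and one_to_one


def unique_atoms(mapping, n_atoms_a, n_atoms_b):
    """Return (atoms only in A, atoms only in B) as sorted index lists."""
    only_a = sorted(set(range(n_atoms_a)) - set(mapping))
    only_b = sorted(set(range(n_atoms_b)) - set(mapping.values()))
    return only_a, only_b


# Hypothetical example: molecule A has 5 atoms, molecule B has 6.
mapping = {0: 0, 1: 1, 2: 2, 3: 4}

print(is_valid_mapping(mapping, 5, 6))
print(unique_atoms(mapping, 5, 6))
```

Here atom 4 of A has no partner in B, and atoms 3 and 5 of B have no partner in A. A mapping that leaves fewer atoms unmapped generally means a smaller perturbation, which is one reason the choice of mapping matters for convergence.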

### Define the `Protocol` used for `Transformation` evaluation

For this example, we'll use the same `Protocol` for all our `Transformation`s, with identical `Settings` for each.

In [13]:
from perses.protocols.nonequilibrium_cycling import NonEquilibriumCyclingProtocol

Any given `Protocol` features a `default_settings` method, which can be used to get the default settings that are specific to that `Protocol`:

In [23]:
protocol_settings = NonEquilibriumCyclingProtocol.default_settings()
protocol_settings.dict()

{'lambda_functions': {'lambda_sterics_core': 'lambda',
  'lambda_electrostatics_core': 'lambda',
  'lambda_sterics_insert': 'select(step(lambda - 0.5), 1.0, 2.0 * lambda)',
  'lambda_sterics_delete': 'select(step(lambda - 0.5), 2.0 * (lambda - 0.5), 0.0)',
  'lambda_electrostatics_insert': 'select(step(lambda - 0.5), 2.0 * (lambda - 0.5), 0.0)',
  'lambda_electrostatics_delete': 'select(step(lambda - 0.5), 1.0, 2.0 * lambda)',
  'lambda_bonds': 'lambda',
  'lambda_angles': 'lambda',
  'lambda_torsions': 'lambda'},
 'softcore_LJ_v2': True,
 'interpolate_old_and_new_14s': False,
 'phase': 'vacuum',
 'forcefield_files': ['amber/ff14SB.xml',
  'amber/tip3p_standard.xml',
  'amber/tip3p_HFE_multivalent.xml',
  'amber/phosaa10.xml'],
 'small_molecule_forcefield': 'openff-2.0.0',
 'timestep': 4.0 <Unit('femtosecond')>,
 'neq_splitting': 'V R H O R V',
 'eq_steps': 1000,
 'neq_steps': 100,
 'platform': 'CUDA',
 'save_frequency': 100}
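
The `lambda_functions` entries above are expression strings. As a sketch of how two of them behave, the code below re-implements `step` and `select` in plain Python, assuming they follow OpenMM's custom-expression semantics (`step(x)` is 0 for `x < 0` and 1 otherwise; `select(x, y, z)` is `z` when `x` is 0 and `y` otherwise). That these strings are evaluated with exactly those semantics is an assumption here, not something shown in this notebook:

```python
def step(x):
    """Assumed OpenMM-style step: 0 if x < 0, else 1."""
    return 0.0 if x < 0 else 1.0


def select(x, y, z):
    """Assumed OpenMM-style select: z if x == 0, else y."""
    return z if x == 0 else y


def sterics_insert(lam):
    # 'select(step(lambda - 0.5), 1.0, 2.0 * lambda)'
    return select(step(lam - 0.5), 1.0, 2.0 * lam)


def electrostatics_insert(lam):
    # 'select(step(lambda - 0.5), 2.0 * (lambda - 0.5), 0.0)'
    return select(step(lam - 0.5), 2.0 * (lam - 0.5), 0.0)


# Tabulate both schedules at a few values of the global lambda.
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(lam, sterics_insert(lam), electrostatics_insert(lam))
```

Under these assumptions, the sterics of inserted atoms are switched on over the first half of the schedule and their electrostatics over the second half.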

Note that the settings object is immutable, so attempting to edit a field in place fails:

In [22]:
protocol_settings.save_frequency = 200

TypeError: "NonEqCyclingSettings" is immutable and does not support item assignment
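
The `TypeError` above tells us the settings object is immutable. The usual way to work with immutable objects is to build a modified copy rather than edit in place. The sketch below shows that pattern with a standard-library frozen dataclass; `ToySettings` is a hypothetical stand-in, and whether `NonEqCyclingSettings` offers an equivalent copy-with-changes method has not been checked here:

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class ToySettings:
    """Hypothetical immutable settings object (not the perses class)."""
    eq_steps: int = 1000
    neq_steps: int = 100
    save_frequency: int = 100


defaults = ToySettings()

try:
    defaults.save_frequency = 200  # in-place edit is rejected
except FrozenInstanceError:
    in_place_failed = True
else:
    in_place_failed = False

# Build a modified copy instead; the original is left untouched.
edited = replace(defaults, save_frequency=200)
print(in_place_failed, defaults.save_frequency, edited.save_frequency)
```

The defaults stay intact, which makes it safe to derive several variants from the same starting point.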

We'll construct our full `Settings` for our chosen `NonEquilibriumCyclingProtocol`, which will include the more general `ThermoSettings` and `ForcefieldSettings` as well:

In [24]:
from openff.units import unit
from gufe.settings.models import (
    Settings, 
    ThermoSettings, 
    ForcefieldSettings,
)
from perses.protocols.settings import NonEqCyclingSettings

settings = Settings(
    settings_version=0,
    forcefield_file="foobar.xml", 
    forcefield_settings=ForcefieldSettings(),
    thermo_settings=ThermoSettings(temperature=300*unit.kelvin),
    protocol_settings=protocol_settings,
)

ValidationError: 3 validation errors for ForcefieldSettings
vdW
  field required (type=value_error.missing)
electrostatics
  field required (type=value_error.missing)
gbsa
  field required (type=value_error.missing)

We can now produce a parameterized `NonEquilibriumCyclingProtocol` instance:

In [25]:
protocol = NonEquilibriumCyclingProtocol(settings)

NameError: name 'settings' is not defined

### Build the `Transformation`s

We can now construct the `Transformation`s we want represented in our network. We'll use the predefined connections from the `tyk2` system from above as the basis for our choices here, but you could use any network planner of your choice to generate connections and use those instead.

In [26]:
tyk2_system.connections

[('lig_ejm_31', 'lig_ejm_50'),
 ('lig_ejm_46', 'lig_jmc_23'),
 ('lig_ejm_31', 'lig_ejm_55'),
 ('lig_ejm_31', 'lig_ejm_48'),
 ('lig_ejm_31', 'lig_ejm_54'),
 ('lig_ejm_31', 'lig_ejm_47'),
 ('lig_ejm_31', 'lig_ejm_46'),
 ('lig_ejm_46', 'lig_jmc_27'),
 ('lig_ejm_46', 'lig_jmc_28'),
 ('lig_ejm_42', 'lig_ejm_43'),
 ('lig_ejm_31', 'lig_ejm_42'),
 ('lig_ejm_45', 'lig_ejm_55')]

**TODO: need to add mappings for each edge; these would be included in the `Transformation` creations below.**

Since we are planning to perform relative binding free energy (RBFE) calculations, we'll define both *complex* and *solvent* variants for each `Transformation`:

In [None]:
complexed_transformations = [Transformation(stateA=complexed[edge[0]], 
                                            stateB=complexed[edge[1]], 
                                            protocol=protocol) 
                             for edge in tyk2_system.connections]

In [None]:
solvated_transformations = [Transformation(stateA=solvated[edge[0]], 
                                           stateB=solvated[edge[1]], 
                                           protocol=protocol) 
                            for edge in tyk2_system.connections]
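
The reason each connection needs both a complex and a solvent `Transformation` is the standard thermodynamic cycle for relative binding free energies. For an edge from ligand $A$ to ligand $B$, the two legs combine as:

$$
\Delta\Delta G_{\mathrm{bind}}(A \to B) = \Delta G_{\mathrm{complex}}(A \to B) - \Delta G_{\mathrm{solvent}}(A \to B)
$$

Here $\Delta G_{\mathrm{complex}}$ and $\Delta G_{\mathrm{solvent}}$ are our labels for the free energy differences estimated along the complex and solvent `Transformation`s, respectively. Neither leg alone gives a binding free energy difference; only their difference does.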

## Create the `AlchemicalNetwork`

An `AlchemicalNetwork` is simply the combination of `ChemicalSystem`s (nodes) and `Transformation`s (directed edges) that we want to evaluate $\Delta G$s for. This data structure functions as a declaration of what you want to compute, and is the central object on which systems like `fah-alchemy` operate. 

We'll finish here by creating an `AlchemicalNetwork` from the collection of objects we've built so far.

In [27]:
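# `solvated` and `complexed` are dicts keyed by ligand name; pass their values as nodes
network = AlchemicalNetwork(edges=(solvated_transformations + complexed_transformations), 
                            nodes=(list(solvated.values()) + list(complexed.values())),
                            name="tyk2_relative_benchmark")
network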

NameError: name 'solvent_network' is not defined

That's it! We simply toss in all `Transformation`s (edges) and `ChemicalSystem`s (nodes) we want included in this `AlchemicalNetwork`, and optionally give it a name that means something to us (it need not be unique, but can be used to query for network(s) from `fah-alchemy` later).

We could have chosen here to leave the `nodes` argument off, since every `ChemicalSystem` we included was already represented among the `edges`, but we show it here for completeness. In this way, it's possible to include `ChemicalSystem`s in the network that aren't connected via any `Transformation`s to others, though in practice there isn't much utility in this.
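
The relationship between `edges` and `nodes` described above can be sketched with a toy model in which nodes are plain strings and edges are `(stateA, stateB)` tuples. The `network_nodes` helper is hypothetical and only mimics the behavior described in the text; it is not how `gufe` implements it:

```python
# Toy stand-ins: nodes are strings, edges are (stateA, stateB) tuples.
edges = [("A_water", "B_water"), ("A_complex", "B_complex")]
explicit_nodes = ["C_water"]  # not an endpoint of any edge


def network_nodes(edges, nodes=()):
    """Union of explicitly given nodes and every edge endpoint."""
    implied = {state for edge in edges for state in edge}
    return implied | set(nodes)


# Nodes implied by the edges alone.
print(sorted(network_nodes(edges)))

# Adding an explicit, unconnected node.
print(sorted(network_nodes(edges, explicit_nodes)))
```

The first call recovers all four endpoints without being told about them; the second adds `C_water` as an isolated node.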

### Optional: Run a `Protocol` locally

We can run our parameterized `NonEquilibriumCyclingProtocol` locally as a way to check that things are working as we expect. We'll pick one of the `Transformation`s out of our `AlchemicalNetwork`:

In [None]:
transformation = list(network.edges)[0]

We'll generate a `ProtocolDAG` that encodes the actual operations to perform in order to execute the `Protocol`:

In [None]:
protocoldag = transformation.create()

And we'll run it locally, in-process. This will run each `ProtocolUnit` in the `ProtocolDAG` in series, in dependency order:

In [6]:
from gufe.protocols.protocoldag import execute_DAG

In [None]:
protocoldagresult = execute_DAG(protocoldag)

The above will raise an exception if execution fails at any point.
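
Running units "in series, in dependency order" amounts to walking the DAG in a topological order. The sketch below illustrates the idea with the standard library's `graphlib` (Python 3.9+). The unit names and the dependency structure are hypothetical; this is not how `execute_DAG` is implemented, only the ordering principle it relies on:

```python
from graphlib import TopologicalSorter

# Hypothetical unit names; each key maps to the set of units it depends on.
dependencies = {
    "setup": set(),
    "simulate_0": {"setup"},
    "simulate_1": {"setup"},
    "gather": {"simulate_0", "simulate_1"},
}

results = {}
for unit_name in TopologicalSorter(dependencies).static_order():
    # Every dependency has already run, so its result is available.
    inputs = {dep: results[dep] for dep in dependencies[unit_name]}
    # A real unit would do work here; we just record what it received.
    results[unit_name] = f"{unit_name} ran after {sorted(inputs)}"

order = list(results)
print(order)
```

`setup` always comes first and `gather` last; the two `simulate_*` units have no dependency on each other, so their relative order is not fixed.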

## Submitting to a `fah-alchemy` instance

We'd like to evaluate the `Transformation`s we defined in our `AlchemicalNetwork`, but to do this at scale we'll submit this to a `fah-alchemy` API instance.

Deploying `fah-alchemy` is outside of the scope of this tutorial, and we assume here that there is already an instance in place and network-reachable from the machine running this notebook.

In [4]:
from getpass import getpass

from fah_alchemy import FahAlchemyClient, Scope

Instantiate a `FahAlchemyClient`, giving the URL to the target `FahAlchemyAPI`, as well as your user identifier and key (password):

In [3]:
faclient = FahAlchemyClient("https://api.targetserver.org",
                            "username",
                            getpass())

NameError: name 'FahAlchemyClient' is not defined

We'll then use this client to submit our `AlchemicalNetwork`, indicating the `Scope` (organization, campaign, project) that this network should be submitted under:

In [None]:
faclient.create_network(network, Scope('my_org', 'my_campaign', 'my_project'))

We can then action which `Transformation`s we want computed: