# Running RBFE calculations with OpenFold3 structures

This tutorial gives a step-by-step process to:

**1. Run cofolding on a set of TYK2 ligands**

**2. Prepare the OpenFold3 outputs for free energy calculations**
   - Align complexes
   - Protonate the ligands and protein
   - Output of this step:
     - A single prepared protein structure
     - SDF files for all the ligands in the set
       
**3. Run a network of binding free energy calculations using the SepTop protocol**

## 1. Run cofolding on a set of TYK2 ligands

### Preparing the Input JSON for OpenFold

This step requires assembling two key pieces of information:

- Ligand SMILES strings
- Protein sequence (extracted from the PDB file)

In this workflow, we begin with an SDF file containing the ligands.
We extract the SMILES representation for each ligand and store them in a dictionary alongside their ligand identifiers.

We also identify the largest ligand.
This is important because, in later stages, we will use the protein structure generated during co-folding with the largest ligand. Doing so helps minimize steric clashes between ligand and protein atoms in subsequent calculations.

In [1]:
from rdkit import Chem
from rdkit.Chem.rdMolDescriptors import CalcLabuteASA

ligands_file = "tyk2_ligands.sdf"

ligands_dict = {}
# Store an approximate surface area (Labute ASA) per ligand to later find
# the largest ligand; a shape descriptor such as asphericity would not measure size
sa = []
sppl = Chem.SDMolSupplier(ligands_file, removeHs=True)
for mol in sppl:
    if mol is None:
        # Skip entries that RDKit could not parse
        continue
    smi = Chem.MolToSmiles(mol)
    name = mol.GetProp("_Name")
    ligands_dict[name] = smi
    sa.append(CalcLabuteASA(mol))

In [2]:
largest_ligand = list(ligands_dict.keys())[sa.index(max(sa))]
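
The index-based lookup above relies on `ligands_dict` and `sa` being filled in the same order, with one entry per ligand name. A minimal alternative sketch keeps the size values in a dictionary keyed by ligand name, so the lookup cannot drift out of step. The names and numbers below are hypothetical and only illustrate the idiom.

```python
# Hypothetical size values keyed by ligand name, for illustration only
sizes = {"lig_a": 310.2, "lig_b": 342.8, "lig_c": 327.5}

# max() over the keys, ranked by their size value
largest = max(sizes, key=sizes.get)
print(largest)
```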

In [3]:
ligands_dict

{'lig_ejm_31': 'CC(=O)Nc1cc(NC(=O)c2c(Cl)cccc2Cl)ccn1',
 'lig_ejm_42': 'CCC(=O)Nc1cc(NC(=O)c2c(Cl)cccc2Cl)ccn1',
 'lig_ejm_43': 'CC(C)C(=O)Nc1cc(NC(=O)c2c(Cl)cccc2Cl)ccn1',
 'lig_ejm_46': 'O=C(Nc1ccnc(NC(=O)C2CC2)c1)c1c(Cl)cccc1Cl',
 'lig_ejm_47': 'O=C(Nc1ccnc(NC(=O)C2CCC2)c1)c1c(Cl)cccc1Cl',
 'lig_ejm_48': 'O=C(Nc1ccnc(NC(=O)C2CCCC2)c1)c1c(Cl)cccc1Cl',
 'lig_ejm_50': 'O=C(CO)Nc1cc(NC(=O)c2c(Cl)cccc2Cl)ccn1',
 'lig_jmc_23': 'O=C(Nc1ccnc(NC(=O)[C@H]2C[C@H]2F)c1)c1c(Cl)cccc1Cl',
 'lig_jmc_27': 'O=C(Nc1ccnc(NC(=O)[C@H]2C[C@H]2Cl)c1)c1c(Cl)cccc1Cl',
 'lig_jmc_28': 'C[C@@H]1C[C@@H]1C(=O)Nc1cc(NC(=O)c2c(Cl)cccc2Cl)ccn1'}

We then create the JSON file that serves as the OpenFold input.

In [4]:
import json

# Protein sequence and chain ID
protein_info = {
    "molecule_type": "protein",
    "chain_ids": "A",
    "sequence": "MGSPASDPTVFHKRYLKKIRDLGEGHFGKVSLYCYDPTNDGTGEMVAVKALKADAGPQHRSGWKQEIDILRTLYHEHIIKYKGCCEDAGAASLQLVMEYVPLGSLRDYLPRHSIGLAQLLLFAQQICEGMAYLHAQHYIHRNLAARNVLLDNDRLVKIGDFGLAKAVPEGHEYYRVREDGDSPVFWYAPECLKEYKFYYASDVWSFGVTLYELLTHCDSSQSPPTKFLELIGIAQGQMTVLRLTELLERGERLPRPDKCPAEVYHLMKNCWETEASFRPTFENLIPILKTVHEKYRHHHHHH"
}

# Build the queries dictionary
queries = {}

for name, smiles in ligands_dict.items():
    queries[name] = {
        "chains": [
            protein_info,
            {
                "molecule_type": "ligand",
                "chain_ids": "Z",  
                "smiles": smiles
            }
        ]
    }

# Write to JSON
with open("queries.json", "w") as f:
    json.dump({"queries": queries}, f, indent=4)
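
Before launching a long cofolding run, it can help to confirm that every query has the expected shape: one protein chain and one ligand chain with a SMILES string. The sketch below is a standard-library check only. The helper name `check_queries` and the shortened toy sequence are illustrative assumptions, not part of OpenFold3.

```python
import json


def check_queries(payload):
    """Return the query names that are not exactly one protein plus one ligand with SMILES."""
    bad = []
    for name, query in payload["queries"].items():
        chains = query.get("chains", [])
        kinds = sorted(chain.get("molecule_type", "") for chain in chains)
        has_smiles = all(
            chain.get("smiles")
            for chain in chains
            if chain.get("molecule_type") == "ligand"
        )
        if kinds != ["ligand", "protein"] or not has_smiles:
            bad.append(name)
    return bad


# Toy payload mirroring the structure written above (sequence shortened for illustration)
example = {
    "queries": {
        "lig_ejm_31": {
            "chains": [
                {"molecule_type": "protein", "chain_ids": "A", "sequence": "MGSPASDPT"},
                {"molecule_type": "ligand", "chain_ids": "Z",
                 "smiles": "CC(=O)Nc1cc(NC(=O)c2c(Cl)cccc2Cl)ccn1"},
            ]
        },
        # Deliberately incomplete entry: the ligand chain is missing
        "broken": {
            "chains": [
                {"molecule_type": "protein", "chain_ids": "A", "sequence": "MGS"},
            ]
        },
    }
}

# Round-trip through JSON to mimic reading queries.json from disk
problems = check_queries(json.loads(json.dumps(example)))
print(problems)
```

In practice the payload would come from `json.load(open("queries.json"))`, and an empty list means every query passed the check.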

### Running OpenFold

1. Create an `output_settings.yml` file to specify PDB as the output file format

```
output_writer_settings:
  # change output format to pdb (default: mmcif):
  structure_format: pdb
```

2. Run OpenFold

`run_openfold predict --query_json=queries.json --runner_yaml output_settings.yml`
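
The two steps above can also be scripted. The following is a minimal sketch that writes the settings file and assembles the same command line shown above. It only prints the command; launching it (for example with `subprocess.run`) is left commented out, and it assumes `run_openfold` is available on the `PATH`.

```python
import shlex
from pathlib import Path

# Same content as the output_settings.yml shown above
settings_text = (
    "output_writer_settings:\n"
    "  # change output format to pdb (default: mmcif):\n"
    "  structure_format: pdb\n"
)
settings_path = Path("output_settings.yml")
settings_path.write_text(settings_text)

# Same flags as in the command above, kept as a list to avoid shell quoting issues
command = [
    "run_openfold", "predict",
    "--query_json=queries.json",
    "--runner_yaml", str(settings_path),
]
print(shlex.join(command))

# To launch the run from Python (assumption: run_openfold is installed):
# import subprocess
# subprocess.run(command, check=True)
```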

## 2. Prepare the OpenFold3 outputs for free energy calculations

### Align complexes with MDAnalysis

In [10]:
import glob
import os

import MDAnalysis as mda
from MDAnalysis.analysis import align

# 1) Load reference and mobile complexes
reference_pdb = f"of3_output/{largest_ligand}_seed_42_sample_1_model.pdb"
other_pdb = glob.glob("of3_output/*_seed_42_sample_1_model.pdb")

# Make sure the output directory exists before writing to it
os.makedirs("aligned", exist_ok=True)

ref = mda.Universe(reference_pdb)
ref_prot = ref.select_atoms("protein and backbone")
for pdb in other_pdb:
    # The ligand name is the part of the file name before "_seed"
    ligand = os.path.basename(pdb).split('_seed')[0]
    mob = mda.Universe(pdb)

    # 2) Align mobile → reference using protein backbone (recommended);
    #    alignto transforms the whole mobile universe in place
    mob_prot = mob.select_atoms("protein and backbone")
    align.alignto(mob_prot, ref_prot)

    # Write the aligned complex (protein and ligand)
    aligned_complex = mob.select_atoms("all")
    aligned_complex.write(f'aligned/{ligand}.pdb')

    # For the reference pdb, save the protein alone
    if pdb == reference_pdb:
        full_protein = mob.select_atoms("protein")
        full_protein.write('protein.pdb')
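
The ligand name in the loop above is recovered from the file name by taking everything before `_seed`. A small standalone helper makes that assumption explicit and fails loudly if the naming pattern changes. This is a sketch; the helper name `ligand_name_from_path` is hypothetical, and the second example path is made up to show that extra directory levels do not matter.

```python
from pathlib import PurePosixPath


def ligand_name_from_path(path, marker="_seed"):
    """Return the part of the file name that precedes the first `marker`."""
    stem = PurePosixPath(path).stem
    name, found, _ = stem.partition(marker)
    if not found:
        raise ValueError(f"{path!r} does not contain {marker!r}")
    return name


print(ligand_name_from_path("of3_output/lig_ejm_31_seed_42_sample_1_model.pdb"))
print(ligand_name_from_path("runs/of3_output/lig_jmc_27_seed_42_sample_1_model.pdb"))
```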


### Protonate the protein with Proteins.plus

In this example, we use the web server proteins.plus to add hydrogens to the protein.

- Upload the `protein.pdb` file
- Choose `Protoss Hydrogen prediction` and click `Calculate`
- Download the prepared PDB file and save it as `protein_prepped.pdb`, the file name used in section 3

### Protonate the ligands with OpenEye

In [16]:
import glob
import os

from openeye import oechem, oequacpac

# Make sure the output directory exists before writing to it
os.makedirs("ligands_prepped", exist_ok=True)

pdbs = glob.glob("aligned/*.pdb")
for pdb in pdbs:
    ligand_name = os.path.basename(pdb).split('.')[0]

    # 1) Load the complex
    ifs = oechem.oemolistream(pdb)
    complex_mol = oechem.OEMol()
    oechem.OEReadMolecule(ifs, complex_mol)
    ifs.close()

    # 2) Separate protein and ligand
    ligand = oechem.OEMol()
    protein = oechem.OEMol()
    water = oechem.OEMol()
    other = oechem.OEMol()

    # Split the complex
    oechem.OESplitMolComplex(ligand, protein, water, other, complex_mol)

    # 3) Add hydrogens to the ligand and place them
    hopt = oechem.OEPlaceHydrogensOptions()
    oechem.OEPlaceHydrogens(ligand, hopt)

    # Assign hybridization, aromaticity and formal charges
    oechem.OEAssignHybridization(ligand)
    oechem.OEAssignAromaticFlags(ligand)
    oechem.OEAssignFormalCharges(ligand)

    # Set a reasonable protonation state for the ligand
    oequacpac.OEGetReasonableProtomer(ligand)

    # 4) Write the prepared ligand to an SDF file
    ofs_ligand = oechem.oemolostream(f"ligands_prepped/{ligand_name}.sdf")
    ligand.SetTitle(ligand_name)
    oechem.OEWriteMolecule(ofs_ligand, ligand)
    ofs_ligand.close()

## 3. Run a network of binding free energy calculations using the SepTop protocol

In [17]:
%matplotlib inline
import gzip
import json
import logging
import pathlib
import tempfile
from openff.toolkit import (
    Molecule, RDKitToolkitWrapper, AmberToolsToolkitWrapper
)
from openff.toolkit.utils.toolkit_registry import (
    toolkit_registry_manager, ToolkitRegistry
)
from openff.units import unit
from kartograf.atom_aligner import align_mol_shape
from kartograf import KartografAtomMapper
import gufe
from gufe.tokenization import JSON_HANDLER
import openfe
from openfe.protocols.openmm_md.plain_md_methods import PlainMDProtocol
from openfe.protocols.openmm_septop import SepTopProtocol
from openfe.protocols.openmm_septop import (
    SepTopSolventSetupUnit,
    SepTopComplexSetupUnit,
)
from rdkit import Chem

In [18]:
# Sort the file list so the ligand order is reproducible
ligand_sdfs = sorted(glob.glob('ligands_prepped/*.sdf'))
ligands = []
for sdf in ligand_sdfs:
    ligand = openfe.SmallMoleculeComponent.from_sdf_file(sdf)
    ligands.append(ligand)

In [19]:
protein = "protein_prepped.pdb"

In [20]:
solvent = openfe.SolventComponent()

In [21]:
from openfe.protocols.openmm_utils.omm_settings import OpenFFPartialChargeSettings
from openfe.protocols.openmm_utils.charge_generation import bulk_assign_partial_charges

charge_settings = OpenFFPartialChargeSettings(partial_charge_method="am1bcc", off_toolkit_backend="ambertools")

charged_ligands = bulk_assign_partial_charges(
    molecules=ligands,
    overwrite=False,
    method=charge_settings.partial_charge_method,
    toolkit_backend=charge_settings.off_toolkit_backend,
    generate_n_conformers=charge_settings.number_of_conformers,
    nagl_model=charge_settings.nagl_model,
    processors=1
)

Generating charges: 100%|███████████████████████| 10/10 [03:58<00:00, 23.87s/it]


In [22]:
mapper = openfe.LomapAtomMapper(max3d=1.0, element_change=False)
scorer = openfe.lomap_scorers.default_lomap_score
network_planner = openfe.ligand_network_planning.generate_minimal_spanning_network

In [23]:
ligand_network = network_planner(
    ligands=charged_ligands,
    mappers=[mapper],
    scorer=scorer
)
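
A minimal spanning network connects all N ligands with N - 1 edges, so the ten ligands in this set give nine transformations. The sketch below illustrates the general idea with a greedy, Kruskal-style selection on a toy score table. It is not the openfe implementation: the ligand labels and scores are hypothetical, and it assumes that a higher score marks a better edge.

```python
def spanning_network(nodes, scored_edges):
    """Greedy (Kruskal-style) spanning tree that prefers high-scoring edges."""
    # Union-find structure: every node starts as its own group
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parents up to the group root, shortening the path on the way
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    chosen = []
    # Visit edges from best to worst score
    for a, b, score in sorted(scored_edges, key=lambda edge: edge[2], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # The edge joins two separate groups, so keep it
            parent[root_a] = root_b
            chosen.append((a, b, score))
    return chosen


# Hypothetical ligands and pairwise scores, for illustration only
ligand_names = ["A", "B", "C", "D"]
toy_scores = [
    ("A", "B", 0.9), ("A", "C", 0.4), ("A", "D", 0.2),
    ("B", "C", 0.8), ("B", "D", 0.3), ("C", "D", 0.7),
]
edges = spanning_network(ligand_names, toy_scores)
print(edges)
```

Four ligands yield three edges here, and every ligand is reachable from every other through the chosen edges.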

In [24]:
comp = openfe.ProteinComponent.from_pdb_file(protein, name="tyk2")

In [25]:
settings = SepTopProtocol.default_settings()
settings.protocol_repeats = 1

# Fast settings
settings.complex_solvation_settings.box_shape = 'dodecahedron'
settings.solvent_solvation_settings.box_shape = 'dodecahedron'
settings.complex_simulation_settings.time_per_iteration = 2.5 * unit.ps
settings.solvent_simulation_settings.time_per_iteration = 2.5 * unit.ps
settings.forcefield_settings.nonbonded_cutoff = 0.9 * unit.nanometer
settings.complex_solvation_settings.solvent_padding = 1 * unit.nanometer
settings.solvent_solvation_settings.solvent_padding = 1.5 * unit.nanometer

settings.engine_settings.compute_platform = 'CUDA'

In [26]:
settings

{'alchemical_settings': {},
 'complex_equil_output_settings': {'checkpoint_interval': <Quantity(1.0, 'nanosecond')>,
                                   'checkpoint_storage_filename': 'checkpoint.chk',
                                   'equil_npt_structure': 'equil_npt',
                                   'equil_nvt_structure': None,
                                   'forcefield_cache': 'db.json',
                                   'log_output': 'equil_simulation',
                                   'minimized_structure': 'minimized',
                                   'output_indices': 'all',
                                   'preminimized_structure': 'system',
                                   'production_trajectory_filename': 'equil_production',
                                   'trajectory_write_interval': <Quantity(20.0, 'picosecond')>},
 'complex_equil_simulation_settings': {'equilibration_length': <Quantity(0.1, 'nanosecond')>,
                                       'equilib

### Creating a `Protocol`

The actual simulation is performed by a `Protocol`. We'll use the OpenMM-based Separated Topologies (SepTop) relative binding free energy `Protocol`.

In [27]:
protocol = SepTopProtocol(settings=settings)

### Creating the `AlchemicalNetwork`

The `AlchemicalNetwork` contains all the information needed to run the entire campaign. It consists of one `Transformation` per edge of the ligand network. Each SepTop transformation takes the full system (ligand, protein and solvent) as both end states and is created without an atom mapping (`mapping=None`), so we only need to loop over the edges.

In [28]:
transformations = []
for edge in ligand_network.edges:
    # use the solvent and protein created above
    sysA_dict = {'ligand': edge.componentA,
                 'protein': comp,
                 'solvent': solvent}
    sysB_dict = {'ligand': edge.componentB,
                 'protein': comp,
                 'solvent': solvent}
    
    # we don't have to name objects, but it can make things (like filenames) more convenient
    sysA = openfe.ChemicalSystem(sysA_dict, name=f"{edge.componentA.name}")
    sysB = openfe.ChemicalSystem(sysB_dict, name=f"{edge.componentB.name}")
    
    prefix = "rbfe_"  # prefix is only to exactly reproduce CLI
    
    transformation = openfe.Transformation(
        stateA=sysA,
        stateB=sysB,
        mapping=None,
        protocol=protocol,  # use protocol created above
        name=f"{prefix}{sysA.name}_{sysB.name}"
    )
    transformations.append(transformation)

network = openfe.AlchemicalNetwork(transformations)

### Writing the `AlchemicalNetwork` to disk

We'll write out each transformation to disk, so that they can be run independently using the `openfe quickrun` command:

In [30]:
import pathlib
# first we create the directory
transformation_dir = pathlib.Path("transformations_septop_tyk2")
transformation_dir.mkdir(exist_ok=True)

# then we write out each transformation
for transformation in network.edges:
    transformation.to_json(transformation_dir / f"{transformation.name}.json")

Each of these individual `.json` files contains a `Transformation`, which holds all the information needed to run the calculation. These can be farmed out as individual jobs on an HPC cluster.

You can run a SepTop simulation from the CLI with the `openfe quickrun` command. It takes a transformation JSON as input, along with the flag `-o` for the final output JSON file and `-d` for the directory where simulation results should be stored. For example,

`openfe quickrun path/to/transformation.json -o results.json -d working-directory`

where `path/to/transformation.json` is the path to one of the files created above.
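
To submit every transformation as its own job, one command per JSON file is needed. The sketch below builds those command lines with the standard library, using only the `-o` and `-d` flags described above. The helper name `quickrun_commands` and the `results/` layout are assumptions chosen for illustration, and the demonstration runs on empty placeholder files in a temporary directory.

```python
import shlex
import tempfile
from pathlib import Path


def quickrun_commands(transformation_dir, results_dir):
    """Build one `openfe quickrun` command line per transformation JSON file."""
    commands = []
    for json_file in sorted(Path(transformation_dir).glob("*.json")):
        name = json_file.stem
        commands.append(shlex.join([
            "openfe", "quickrun", str(json_file),
            "-o", str(Path(results_dir) / f"{name}_results.json"),
            "-d", str(Path(results_dir) / name),
        ]))
    return commands


# Demonstration on a throwaway directory holding two empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "transformations_septop_tyk2"
    demo_dir.mkdir()
    for stem in ("rbfe_lig_ejm_31_lig_ejm_42", "rbfe_lig_ejm_42_lig_ejm_43"):
        (demo_dir / f"{stem}.json").touch()
    commands = quickrun_commands(demo_dir, "results")

for command in commands:
    print(command)
```

Each printed line can be placed in a separate job script for the cluster scheduler.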