# Complexes from Covalent Docking (substructure method)

**Important!** Currently, covalent docking cannot be fully configured using the API. We therefore suggest that the docking be set up _via_ Hermes. Once a working configuration has been created, it can be used _via_ the API or the GOLD HPC tools.

This notebook illustrates making complexes from covalent docking results. For a normal GOLD run this could be done straightforwardly in Hermes (`File > Export Complex`) or using the API [Docker.Results.make_complex](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/docking_api.html?highlight=make_complex#ccdc.docking.Docker.Results.make_complex) method.

This notebook has been created because these normal ways of making complexes do not quite work as expected for covalent dockings.

The mechanism GOLD uses for covalent docking requires the linker atom to be present in both the protein target and the ligands (see the companion notebook on [Ligand Preparation for Covalent Docking](01_Ligand_Preparation_for_Covalent_Docking.ipynb) for more details). This is reflected in the solutions, in that the linker atom is present in both the protein and the solution and there is no actual bond between the protein and docked ligand. This is fine for preliminary visualization in Hermes, but means any exported complex is unphysical and not suitable for further computation without modification.

We acknowledge that this situation isn't satisfactory and intend to rectify it in time: this notebook attempts to illustrate a short-term fix for the issue.

It is assumed a covalent docking has been performed using the `gold_substructure.conf` GOLD configuration file in this directory, which uses the substructure file method to identify the covalent warhead fragment in the ligand (see section 5.6.3 'Setting Up Substructure-Based Covalent Links' in the GOLD user guide). If an alternative mechanism was used, this notebook won't work.

It is also assumed the linker is a simple S, O or NH. If these assumptions are false the notebook won't work. 

If the docking system is more complex than is assumed, the code will need modification. Please let me know.

In [None]:
from platform import platform
import sys
import os
from pathlib import Path
import logging
import time
import subprocess
import re

In [None]:
import ccdc

from ccdc.io import MoleculeReader, EntryWriter
from ccdc.docking import Docker
from ccdc.search import MoleculeSubstructure, SubstructureSearch

#### Config

Method by which the ligand linker atom was specified in the docking (_i.e._ using an atom number or a substructure file)...

In [None]:
method = 'substructure'  # 'atom'

Directory from which files will be read and to which they will be written...

In [None]:
working_dir = Path('.')

GOLD conf file; file must exist...

In [None]:
conf_file = working_dir / f'gold_{method}.conf'

File format in which to export protein-ligand complex for a docking solution...

In [None]:
export_format = 'mol2' 

#### Initialization

In [None]:
# Set up and configure a logger...

logger = logging.getLogger(__name__)

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('[%(asctime)s %(levelname)-7s] %(message)s', datefmt='%y-%m-%d %H:%M:%S'))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

In [None]:
# Locate Hermes executable...

csd_dir = Path(ccdc.io.csd_directory()).parent  # CSD System directory
version = csd_dir.name.replace('CSD_', '')  # Release version
discovery_dir = csd_dir.parent / f'Discovery_{version}'  # Corresponding Discovery directory
    
hermes_exe  = str(discovery_dir / 'Hermes' / 'hermes.exe') if platform().startswith('Windows') else str(discovery_dir / 'bin' / 'hermes')  # Linux or MacOS

In [None]:
# Output platform etc. info useful for debugging...

logger.info(f""" 
Platform:                     {platform()}

Python exe:                   {sys.executable}
Python version:               {'.'.join(str(x) for x in sys.version_info[:3])}

CSD version:                  {ccdc.io.csd_version()}
CSD directory:                {ccdc.io.csd_directory()}
API version:                  {ccdc.__version__}

CSDHOME:                      {os.environ.get('CSDHOME', 'Not set')}
CCDC_LICENSING_CONFIGURATION: {os.environ.get('CCDC_LICENSING_CONFIGURATION', 'Not set')}
""")

#### Setup

Covalent docking is not yet handled properly by the Python API. We thus first read the GOLD conf file as a simple text file and extract the records pertaining to covalent docking. 
Note that this is not a general method of parsing GOLD conf files, but it is sufficient in this case.
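
As a rough illustration of what the cell below extracts, the following sketch applies the same regular expression to an in-memory excerpt. The excerpt and all of its values are invented for this example and are not taken from the conf file in this directory. It also shows one limitation of the pattern: values are cut at the first character that is neither a word character nor a dot.

```python
import re

# Hypothetical excerpt of a GOLD conf file; all values are invented for illustration.
sample_conf = """\
covalent_protein_atom_no = 1234
covalent_substructure_filename = substructure.mol2
covalent_substructure_atom_no = 2
autoscale = 1.0
"""

# Same pattern as used in the notebook: a key starting with 'covalent',
# then '=', then a value made of word characters and dots only.
pattern = re.compile(r'^(covalent\w+)\s*=\s*([\w\.]+)')

sample_options = {}
for line in sample_conf.splitlines():
    match = pattern.match(line)
    if match:
        key, value = match.groups()
        sample_options[key] = value

print(sample_options)

# Limitation: a value containing '/' or '-' is truncated at that character.
truncated = pattern.match('covalent_substructure_filename = files/sub-structure.mol2').group(2)
print(truncated)
```

If the substructure file lives in a sub-directory or has a hyphen in its name, the value pattern would presumably need widening.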

In [None]:
with conf_file.open('r') as file:
    
    lines = file.read().split('\n')

pattern = re.compile(r'^(covalent\w+)\s*=\s*([\w\.]+)')

options = {}

for line in lines:
    
    match = pattern.match(line)
    
    if not match: continue

    key, value = match.groups()

    options[key] = value

options

Here we determine the index of the linker atom in the protein from the record in the conf file, which will be the same for all complexes.

Note that, as GOLD uses 1-based indexing (what we are calling 'atom number') but the API uses 0-based indexing, we convert the GOLD atom numbers to 0-based indexes.
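
The conversion is just a subtraction, but a small helper can make the intent explicit and catch obviously invalid values. The sketch below is illustrative only: `gold_number_to_index` is a hypothetical name, not part of the API, and the atom number is invented.

```python
def gold_number_to_index(atom_number: int) -> int:
    """Convert a 1-based GOLD atom number to a 0-based API atom index."""
    if atom_number < 1:
        raise ValueError(f"GOLD atom numbers start at 1, got {atom_number}")
    return atom_number - 1


# Conf-file values are read as strings, so convert to int first (invented value).
example_index = gold_number_to_index(int('1234'))
print(example_index)  # 1233
```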

In [None]:
protein_atom_index = int(options['covalent_protein_atom_no']) - 1

protein_atom_index

We will use the substructure used to define the ligand linker atom in the docking to identify it again in the complexes. 

As it is conceivable the substructure might match groups on the protein, we add constraints such that it can only match a ligand moiety within the complex.

The substructure-searcher will be used for all complexes, so we only need to configure it once here.

In [None]:
with MoleculeReader(options['covalent_substructure_filename']) as reader:
    
    substructure_mol = reader[0]

In [None]:
substructure = MoleculeSubstructure(substructure_mol)

In [None]:
for atom in substructure.atoms:
    
    atom.add_protein_atom_type_constraint('LIGAND')

In [None]:
searcher = SubstructureSearch()

_ = searcher.add_substructure(substructure)

Index of the ligand linker atom within the substructure (see above for a note on indexing)...

In [None]:
substructure_atom_index = int(options['covalent_substructure_atom_no']) - 1

substructure_atom_index

Next we load the conf file properly _via_ the API and hence get a docking [Results](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/docking_api.html?highlight=make_complex#ccdc.docking.Docker.Results) object...

In [None]:
settings = Docker.Settings.from_file(str(conf_file))

docker = Docker(settings=settings)

results = docker.results

len(results.ligands)

#### Example

We will use the first solution as an example...

In [None]:
solution = results.ligands[0]

solution.identifier

Make a complex from the solution...

In [None]:
complexed = results.make_complex(solution) 

complexed.remove_unknown_atoms()  # Remove lone pairs for export

Here we get the protein linker atom in the complex.

It is assumed here that the linker atom is singly-connected, _i.e._ is a typical S or O and is 'bare', without a hydrogen. This assumption is made as it simplifies the processing. The code below checks for it so if you do have hydrogen on your linker atoms (or have a more exotic system where the linker is not a typical S or O) it will fail. It would be possible to take account of this, so let me know if it is necessary.

In [None]:
protein_linker_atom = complexed.atoms[protein_atom_index]

assert len(protein_linker_atom.bonds) == 1, "Error! Protein linker atom does not have exactly one bond!"

logger.info(f"Protein linker atom: {protein_linker_atom.residue_label}/{protein_linker_atom.label} ({protein_linker_atom.index + 1})")

Here we get the ligand linker atom in the complex using a substructure search.

Note again that it is assumed that the linker atom is singly-connected.

In [None]:
matches = searcher.search(complexed)

assert matches, "Error! No ligand substructure match in complex!"

match = matches[0]

ligand_linker_atom = match.match_atoms()[substructure_atom_index]

logger.info(f"Ligand linker atom: {ligand_linker_atom.residue_label}/{ligand_linker_atom.label} ({ligand_linker_atom.index+1})")

Remove any hydrogens on the ligand linker atom (which is possible if the nucleophile is a lysine N, for example). After this, it is assumed that the ligand linker atom is singly-connected and can safely be deleted. If it is _not_ singly-connected (_e.g._ if the nucleophile was a methylated lysine, perhaps: is this likely to happen?), then more elaboration will be required.
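
The sequence of edits carried out in the following cells can be pictured on a toy graph. The sketch below does not use the CCDC API: atoms are plain names in a dictionary, bonds are unordered pairs, and the function name and atom labels are all hypothetical. It only mirrors the logic (strip hydrogens from the ligand copy of the linker, find its one remaining neighbour, delete the copy, bond the protein linker to that neighbour).

```python
def merge_duplicated_linker(elements, bonds, protein_linker, ligand_linker):
    """Return (elements, bonds) for a toy complex with the duplicated linker removed.

    elements: dict mapping atom name -> element symbol
    bonds:    set of frozenset({atom_a, atom_b})
    """
    elements = dict(elements)
    bonds = set(bonds)

    def neighbours(atom):
        return [other for bond in bonds if atom in bond for other in bond if other != atom]

    # 1. Strip any hydrogens from the ligand copy of the linker atom.
    for other in neighbours(ligand_linker):
        if elements[other] == 'H':
            bonds.discard(frozenset({ligand_linker, other}))
            del elements[other]

    # 2. Exactly one (heavy) neighbour must remain.
    remaining = neighbours(ligand_linker)
    if len(remaining) != 1:
        raise ValueError("Ligand linker atom does not have exactly one heavy neighbour")
    ligand_atom = remaining[0]

    # 3. Delete the ligand copy of the linker atom.
    bonds.discard(frozenset({ligand_linker, ligand_atom}))
    del elements[ligand_linker]

    # 4. Bond the protein linker atom to the ligand atom.
    bonds.add(frozenset({protein_linker, ligand_atom}))
    return elements, bonds


# Toy cysteine-like example: SG is the protein linker, S1 its duplicate on the ligand.
toy_elements = {'CB': 'C', 'SG': 'S', 'S1': 'S', 'H1': 'H', 'C1': 'C', 'C2': 'C'}
toy_bonds = {frozenset(pair) for pair in [('CB', 'SG'), ('S1', 'H1'), ('S1', 'C1'), ('C1', 'C2')]}

new_elements, new_bonds = merge_duplicated_linker(toy_elements, toy_bonds, 'SG', 'S1')
print(sorted(new_elements))
print(sorted(tuple(sorted(bond)) for bond in new_bonds))
```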

In [None]:
for bond in ligand_linker_atom.bonds:

    x_atom = [atom for atom in bond.atoms if atom != ligand_linker_atom][0]

    if x_atom.atomic_number == 1:
        
        logger.info(f"Removing H from ligand linker atom of '{solution.identifier}'.")

        complexed.remove_atom(x_atom)

In [None]:
assert len(ligand_linker_atom.bonds) == 1, "Error! Ligand linker atom does not have exactly one bond!"

Get the remaining (heavy) ligand atom attached to the ligand linker atom...

In [None]:
ligand_atom = [atom for atom in ligand_linker_atom.bonds[0].atoms if atom != ligand_linker_atom][0]

Remove the now-superfluous ligand linker atom...

In [None]:
complexed.remove_atom(ligand_linker_atom)

Attach protein linker atom to ligand atom...

In [None]:
complexed.add_bond(1, protein_linker_atom, ligand_atom)

Export the now covalently-bound complex...

In [None]:
complex_file = working_dir / f'complexed_{method}.{export_format}'  # Use the format chosen in the Config section

with EntryWriter(str(complex_file)) as writer:
    
    writer.write(complexed)

Inspection of this complex in Hermes will show that the linker atom is no longer duplicated and that a bond exists between the protein and ligand.

In [None]:
status = subprocess.Popen([hermes_exe, complex_file.as_posix()], creationflags=0x00000008 if platform().startswith('Windows') else 0)  # DETACHED_PROCESS flag is Windows-only

#### Exporting all solutions as complexes

This facility can also be used to export all solutions. In the example below, solutions are exported in descending order of fitness for each input ligand.
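
Because the per-solution steps rely on assertions, a single awkward solution will stop a long export. One possible pattern, sketched here with stand-in functions and no CCDC calls, is to record failures and carry on. The names `export_all` and `fake_export` and the identifiers are hypothetical, and whether skipping is appropriate depends on the use case.

```python
import logging

sketch_logger = logging.getLogger('export_sketch')


def export_all(solutions, export_one):
    """Call export_one(n, solution) for each solution, recording failures instead of stopping."""
    failures = []
    for n, solution in enumerate(solutions, 1):
        try:
            export_one(n, solution)
        except AssertionError as error:
            sketch_logger.warning("Solution %d skipped: %s", n, error)
            failures.append((n, str(error)))
    return failures


# Stand-in for the per-solution processing; it fails for one invented identifier.
def fake_export(n, solution):
    assert solution != 'bad', "No ligand substructure match in complex!"


failures = export_all(['lig_a', 'bad', 'lig_b'], fake_export)
print(failures)
```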

In [None]:
complexes_dir = working_dir / f'complexes_{method}'

complexes_dir.mkdir(exist_ok=True)

In [None]:
for n, solution in enumerate(results.ligands, 1):

    # Make a complex from the solution...

    complexed = results.make_complex(solution) 

    complexed.remove_unknown_atoms()  # Remove lone pairs for export

    # Determine linker atom in protein (from conf file)...
    
    protein_linker_atom = complexed.atoms[protein_atom_index]

    assert len(protein_linker_atom.bonds) == 1, "Error! Protein linker atom does not have exactly one bond!"

    logger.debug(f"Protein linker atom: {protein_linker_atom.residue_label}/{protein_linker_atom.label} ({protein_linker_atom.index+1})")

    # Determine linker atom in ligand using substructure...

    matches = searcher.search(complexed)
    
    assert matches, "Error! No ligand substructure match in complex!"

    match = matches[0]

    ligand_linker_atom = match.match_atoms()[substructure_atom_index]

    logger.debug(f"Ligand linker atom: {ligand_linker_atom.residue_label}/{ligand_linker_atom.label} ({ligand_linker_atom.index+1})")
    
    # Remove any Hs on ligand linker atom such that it is singly-connected...
    
    for bond in ligand_linker_atom.bonds:

        x_atom = [atom for atom in bond.atoms if atom != ligand_linker_atom][0]

        if x_atom.atomic_number == 1:

            complexed.remove_atom(x_atom)
            
    assert len(ligand_linker_atom.bonds) == 1, "Error! Ligand linker atom does not have exactly one bond!"

    # Get the remaining ligand atom attached to the linker atom...

    ligand_atom = [atom for atom in ligand_linker_atom.bonds[0].atoms if atom != ligand_linker_atom][0]

    # Remove the now-superfluous ligand linker atom...

    complexed.remove_atom(ligand_linker_atom)

    # Attach protein linker atom to ligand atom...

    complexed.add_bond(1, protein_linker_atom, ligand_atom)

    # Export complex...

    complex_file = complexes_dir / f'complex_{n:03d}.{export_format}'

    with EntryWriter(str(complex_file)) as writer:

        writer.write(complexed)

    logger.info(f"Solution '{solution.identifier:30}' written to file {complex_file}.")