# Run GOLD using the CSD Python API

This notebook again illustrates running GOLD _via_ the CSD Python API in [interactive](https://downloads.ccdc.cam.ac.uk/documentation/API/descriptive_docs/docking.html#interactive-docking) mode; this time, however, the docking is configured using a pre-prepared GOLD configuration file. Doing it this way, rather than configuring everything _via_ the API, requires a few changes, which are illustrated below.

Note that interactive docking has some quirks that still need to be addressed. For example, if the `save_top_n_solutions` option is set in the input conf file, it is not respected here.

#### GOLD docs
* [User Guide](https://www.ccdc.cam.ac.uk/support-and-resources/ccdcresources/GOLD_User_Guide.pdf)
* [Conf file](https://www.ccdc.cam.ac.uk/support-and-resources/ccdcresources/GOLD_conf_file_user_guide.pdf)

#### Docking API docs
* [Descriptive](https://downloads.ccdc.cam.ac.uk/documentation/API/descriptive_docs/docking.html)
* [Module API](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/docking_api.html)

In [None]:
import logging
import sys
import os
import shutil
from pathlib import Path
from platform import platform
import time
import subprocess

In [None]:
import pandas as pd

In [None]:
import ccdc
from ccdc.io import MoleculeReader, EntryReader, EntryWriter
from ccdc.docking import Docker

### Config

The directory containing the input files for these dockings; directory must exist...

In [None]:
input_dir = Path('input_files').absolute()

GOLD conf file; file must exist...

In [None]:
conf_file = input_dir / 'gold.conf'

Molecules to dock; file must exist...

In [None]:
input_file = input_dir / 'input.sdf'

Output directory (will be created)...

In [None]:
output_dir = Path('output_interactive_conf')

We will set the 'write options' to `MIN_OUT` so output to disk is minimal. See [here](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/docking_api.html?highlight=write_options#ccdc.docking.Docker.Settings.write_options) for available write options, and the GOLD Configuration File User Guide, Chapter 16 for more details. 

In [None]:
write_options = ['MIN_OUT']

### Initialization

In [None]:
logger = logging.getLogger(__name__)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('[%(asctime)s %(levelname)-7s] %(message)s', datefmt='%y-%m-%d %H:%M:%S'))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

In [None]:
logger.info(f"""
Platform:                     {platform()}

Python exe:                   {sys.executable}
Python version:               {'.'.join(str(x) for x in sys.version_info[:3])}

CSD version:                  {ccdc.io.csd_version()}
CSD directory:                {ccdc.io.csd_directory()}
API version:                  {ccdc.__version__}

CSDHOME:                      {os.environ.get('CSDHOME', 'Not set')}
CCDC_LICENSING_CONFIGURATION: {os.environ.get('CCDC_LICENSING_CONFIGURATION', 'Not set')}
""")

In [None]:
discovery_dir = sorted(Path(os.environ['CSDHOME']).parent.parent.glob('Discovery_*'))[-1]

hermes_exe = (discovery_dir / 'Hermes' / 'hermes.exe' if platform().startswith('Windows') else discovery_dir / 'bin' / 'hermes').as_posix()

Create a fresh output directory for the docking run...

In [None]:
if output_dir.exists():
    
    logger.warning(f"The output directory '{output_dir}' exists and will be overwritten.")
    
    shutil.rmtree(output_dir)
    
output_dir.mkdir()

os.chdir(output_dir)
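
The delete-and-recreate step above can be wrapped in a small helper if several notebooks share the pattern. The following is a minimal sketch only: `fresh_directory` is a hypothetical name (not part of the CSD Python API), and the demonstration runs inside a temporary directory so that no real results are touched.

```python
import shutil
import tempfile
from pathlib import Path


def fresh_directory(path):
    """Delete `path` if it exists, then recreate it empty (hypothetical helper)."""
    path = Path(path)
    if path.exists():
        shutil.rmtree(path)  # Remove results from any previous run
    path.mkdir(parents=True)  # parents=True also creates missing parent directories
    return path.resolve()


# Demonstrate inside a throwaway temporary directory
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'output_interactive_conf'
    fresh_directory(target)
    (target / 'stale.txt').write_text('left over from a previous run')
    fresh_directory(target)  # The stale file should be gone after this call
    remaining = sorted(p.name for p in target.iterdir())
```

Unlike the cell above, this sketch does not change the working directory; that step is kept separate so the helper has no side effects beyond the directory itself.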

### Configure docking

This docking run is largely configured using a pre-prepared GOLD conf file, although we need to make some minor modifications as we will be using the `interactive` mode. We do this by instantiating a `Docker.Settings` object from the file and then modifying it _via_ its methods and attributes...

In [None]:
settings = Docker.Settings.from_file(str(conf_file))

We will be using interactive docking here, so we won't use the input ligand file(s) specified in the conf file (although we do actually dock the same input file).

However, we do need to extract the number of docking runs (`ndocks`) specified in the ligand file record...

In [None]:
ndocks = settings.ligand_files[0].ndocks

We then need to clear the ligand files setting (as we will be using interactive docking)...

In [None]:
settings.clear_ligand_files()

Set the number of dockings; for interactive docking, this is passed _via_ the `set_hostname` method...

In [None]:
settings.set_hostname(ndocks=ndocks) 

Change the `output_directory` attribute to the current directory...

In [None]:
settings.output_directory = '.'

Set write options...

In [None]:
settings.write_options = write_options

### Run docking

Here we run GOLD in `interactive` mode...

Note how the list of solutions is built up during the docking run: as each molecule is docked in turn, the `session.dock` method returns a tuple of solutions for that molecule. This list of tuples is then used to write out the solution files using the standard GOLD solution file naming scheme, and is then flattened to build up a table of fitness function components.

In [None]:
# Instantiate a docker...

docker = Docker(settings=settings)

# Start an interactive session...

session = docker.dock(mode='interactive', file_name='api_gold.conf')

session.ligand_preparation = None  # We assume ligand preparation has been done

logger.info(f"GOLD interactive session PID: {session.pid}")

logger.info(f"Starting to dock ligands from input file '{input_file}'.")

solns_by_mol = []  # We will build up a list of tuples of solutions as we dock each mol

with EntryReader(input_file.as_posix()) as reader:

    for n_mol, entry in enumerate(reader, 1):

        mol, name = entry.molecule, entry.identifier

        logger.info(f"Starting ligand '{name}'...")

        solns = session.dock(mol)  # Tuple of solutions for this mol
        
        logger.info(f"... done ({len(solns)} solutions).")
        
        solns_by_mol.append(solns)  # Append tuple to list of solutions

logger.info("Finished.")

The fitness and its components are available _via_ a flattened list of solutions...

In [None]:
solutions = [y for x in solns_by_mol for y in x]  # Flatten list of tuples
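
To make the flattening step concrete, here is a small sketch that uses plain strings as stand-ins for the docked-ligand objects. The names `example_solns_by_mol`, `flat_comprehension` and `flat_chain` are illustrative only. It also shows `itertools.chain.from_iterable`, which gives the same result as the nested comprehension.

```python
from itertools import chain

# Stand-in data: strings take the place of docked-ligand objects.
# One tuple per input molecule; a molecule may yield no solutions at all.
example_solns_by_mol = [('m1_s1', 'm1_s2'), ('m2_s1',), ()]

# Nested comprehension, as used in the cell above
flat_comprehension = [soln for solns in example_solns_by_mol for soln in solns]

# Equivalent standard-library alternative
flat_chain = list(chain.from_iterable(example_solns_by_mol))
```

Both forms preserve the docking order, so the per-molecule grouping can still be recovered from `example_solns_by_mol` if it is needed later.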

In [None]:
scores_df = pd.DataFrame([{'identifier': x.identifier, 'fitness': x.fitness(), **x.scoring_term()} for x in solutions])

scores_df.shape

In [None]:
scores_df.head()
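
The row-building pattern used for `scores_df` relies on dictionary unpacking (`**`) to merge the scoring terms into each row. The sketch below illustrates this without requiring GOLD: `FakeSolution` is a hypothetical stand-in class, and the term names and numbers are invented placeholders rather than real docking output.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSolution:
    """Hypothetical stand-in for a docked ligand; not a CSD Python API class."""
    identifier: str
    score: float
    terms: dict = field(default_factory=dict)

    def fitness(self):
        return self.score

    def scoring_term(self):
        return dict(self.terms)


# Invented placeholder values, for illustration only
example_solutions = [
    FakeSolution('lig1|dock1', 61.2, {'term_a': -3.1, 'term_b': 40.0}),
    FakeSolution('lig1|dock2', 58.7, {'term_a': -2.4, 'term_b': 38.5}),
]

# Same shape as the list passed to pd.DataFrame above: one dict per solution
rows = [
    {'identifier': s.identifier, 'fitness': s.fitness(), **s.scoring_term()}
    for s in example_solutions
]

columns = list(rows[0])  # Keys appear in insertion order
```

Each dictionary becomes one row when passed to `pd.DataFrame`, and the keys become the column names.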

### Visualization

As we have been talking to GOLD over a socket and we specified the write option `MIN_OUT` above, the solutions have not been written to disk at this point. If we wish to visualize them in _e.g._ Hermes, we will need to write them out ourselves.

So, write out the solution files using the standard GOLD solution file naming scheme...

In [None]:
stem, suffix = input_file.stem, input_file.suffix[1:]  # For GOLD standard solution file naming scheme

for n_mol, solns in enumerate(solns_by_mol, 1):
    
    for n_soln, soln in enumerate(solns, 1):

        file_name = f'gold_soln_{stem}_m{n_mol}_{n_soln}.{suffix}'  # GOLD standard solution file naming scheme

        with EntryWriter(file_name) as writer: 

            writer.write(soln)
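
To preview the file names that the loop above produces without running a docking, the naming pattern can be factored into a small generator. This is a sketch only: `solution_file_names` is a hypothetical helper, and it takes the number of solutions per molecule rather than the solutions themselves.

```python
from pathlib import Path


def solution_file_names(input_file, n_solns_per_mol):
    """Yield names following the pattern used in the cell above (hypothetical helper)."""
    input_file = Path(input_file)
    stem, suffix = input_file.stem, input_file.suffix[1:]  # Drop the leading dot
    for n_mol, n_solns in enumerate(n_solns_per_mol, 1):
        for n_soln in range(1, n_solns + 1):
            yield f'gold_soln_{stem}_m{n_mol}_{n_soln}.{suffix}'


# Two solutions for the first molecule, one for the second
names = list(solution_file_names('input_files/input.sdf', [2, 1]))
```

Here `names` contains `gold_soln_input_m1_1.sdf`, `gold_soln_input_m1_2.sdf` and `gold_soln_input_m2_1.sdf`, with both the molecule and solution counters starting at 1.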

Once the solution files have been written, the results of a GOLD run set up and run _via_ the API may be visualized in Hermes by loading the GOLD conf file written by the API...

In [None]:
status = subprocess.Popen([hermes_exe, 'api_gold.conf'])