# Run GOLD using the CSD Python API

This notebook illustrates configuring an API docking run using a pre-prepared GOLD configuration file. The docking is then run in `background` mode. 

Note that the `gold.conf` file used is derived from the one written by GOLD in the previous example. The only difference is that the file paths have been manually altered from the absolute paths written by GOLD to relative paths to make the file portable.
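
The manual path edit described above could also be scripted. The following is a minimal sketch using only the standard library, not part of the CSD Python API: the helper name `relativize_paths` is hypothetical, the two conf lines are illustrative rather than copied from the real file, and paths containing spaces are not handled.

```python
import os
import re

def relativize_paths(conf_text, base_dir):
    """Rewrite every whitespace-delimited absolute path relative to base_dir."""
    def fix(match):
        token = match.group(0)
        # Only tokens that look like absolute paths are rewritten
        return os.path.relpath(token, base_dir) if os.path.isabs(token) else token
    # Substituting token by token leaves spacing and line breaks untouched
    return re.sub(r'\S+', fix, conf_text)

# Illustrative conf lines (assumed, not taken from the actual gold.conf)
example = (
    "protein_datafile = /work/run/input_files/protein.mol2\n"
    "ligand_data_file /work/run/input_files/ligands.sdf 10\n"
)
print(relativize_paths(example, "/work/run"))
```

Relative paths are resolved against the working directory at run time, so a file rewritten this way is only portable if the directory layout travels with it.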

#### GOLD docs
* [User Guide](https://www.ccdc.cam.ac.uk/support-and-resources/ccdcresources/GOLD_User_Guide.pdf)
* [Conf file](https://www.ccdc.cam.ac.uk/support-and-resources/ccdcresources/GOLD_conf_file_user_guide.pdf)

#### Docking API docs
* [Descriptive](https://downloads.ccdc.cam.ac.uk/documentation/API/descriptive_docs/docking.html)
* [Module API](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/docking_api.html)

In [None]:
import logging
import sys
import os
import shutil
from pathlib import Path
from platform import platform
import time
import subprocess

In [None]:
import pandas as pd

In [None]:
import ccdc
from ccdc.io import MoleculeReader, EntryReader, EntryWriter
from ccdc.docking import Docker

### Config

The directory containing the input files for these dockings, which must exist...

In [None]:
input_dir = Path('input_files').absolute()

Pre-prepared GOLD conf file, which must exist...

In [None]:
conf_file = input_dir / 'gold.conf'

In [None]:
# print(conf_file.open().read())
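
Both the input directory and the conf file are required to exist, so it may be worth failing early with a clear message rather than partway through the run. A small sketch of such a check follows; `require_existing` is a hypothetical helper, not something the notebook or the API provides.

```python
from pathlib import Path

def require_existing(*paths):
    """Raise FileNotFoundError listing every path that does not exist."""
    missing = [str(p) for p in paths if not Path(p).exists()]
    if missing:
        raise FileNotFoundError("Missing input(s): " + ", ".join(missing))

# In this notebook the call would be: require_existing(input_dir, conf_file)
```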

Output directory (will be created)...

In [None]:
output_dir = Path('output_background_conf')

### Initialization

Set up logging, record the environment, locate the Hermes executable and create a fresh output directory for the docking run...

In [None]:
logger = logging.getLogger(__name__)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('[%(asctime)s %(levelname)-7s] %(message)s', datefmt='%y-%m-%d %H:%M:%S'))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

In [None]:
logger.info(f"""
Platform:                     {platform()}

Python exe:                   {sys.executable}
Python version:               {'.'.join(str(x) for x in sys.version_info[:3])}

CSD version:                  {ccdc.io.csd_version()}
CSD directory:                {ccdc.io.csd_directory()}
API version:                  {ccdc.__version__}

CSDHOME:                      {os.environ.get('CSDHOME', 'Not set')}
CCDC_LICENSING_CONFIGURATION: {os.environ.get('CCDC_LICENSING_CONFIGURATION', 'Not set')}
""")

In [None]:
discovery_dir = sorted(Path(os.environ['CSDHOME']).parent.parent.glob('Discovery_*'))[-1]

hermes_exe = (discovery_dir / 'Hermes' / 'hermes.exe' if platform().startswith('Windows') else discovery_dir / 'bin' / 'hermes').as_posix()

In [None]:
if output_dir.exists():
    
    logger.warning(f"The output directory '{output_dir}' exists and will be overwritten.")
    
    shutil.rmtree(output_dir)
    
output_dir.mkdir()

os.chdir(output_dir)

### Configure docking

This docking run is configured using a pre-prepared GOLD conf file. We do this by instantiating a `Docker.Settings` object from the file...

In [None]:
settings = Docker.Settings.from_file(str(conf_file))

The protein target is specified by the conf file...

In [None]:
settings.protein_files[0]

The binding site is specified by the conf file...

In [None]:
len(settings.binding_site.residues)

The ligand file and number of dockings are specified by the conf file...

In [None]:
ligand_file = settings.ligand_files[0]

ligand_file.file_name, ligand_file.ndocks

The only change we make is to set the `output_directory` attribute to the current directory, to avoid another directory being created...

In [None]:
settings.output_directory = '.'

### Run docking

Here we run GOLD in `background` mode...

Note that an 'empty' [Results](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/docking_api.html#ccdc.docking.Docker.Results) object is returned immediately and updated incrementally as dockings are completed. This feature is used below to monitor the number of ligands docked. Note, however, that due to output buffering, the progress reported may lag behind the actual state of the run, at least for this small number of ligands. If this is an issue, the `interactive` mode may be preferable.

In [None]:
# Count ligands to dock (used in logging below)...

with EntryReader(ligand_file.file_name) as reader:

    n_to_dock = len([x.identifier for x in reader])

In [None]:
t0 = time.time()

docker = Docker(settings=settings)

results = docker.dock(mode='background', file_name='api_gold.conf')

In [None]:
logger.info(f"Docking in background: PID {results.pid}...")

n_tries, sleep_seconds = 720, 5  # Allow a maximum of one hour

for n in range(1, n_tries+1):
    
    time.sleep(sleep_seconds)
    
    if docker.dock_status() == 0:
        
        logger.info("Docking complete.")
        
        break
        
    n_done = len([x for x in results.docking_log.split('\n') if x.startswith('Completed docking of ligand')])  # Use results object to monitor progress
        
    logger.info(f"{n:03d}) finished {n_done}/{n_to_dock}")
    
else:
    
    logger.warning("Docking did not complete.")  # Logger methods take no `file` argument; the handler decides the stream
    
logger.info(f"GOLD (background) ran in {time.time() - t0:.1f} seconds.")
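
The monitoring loop above combines two ideas: counting completion lines in the docking log, and polling a status check a bounded number of times. The sketch below separates them into two standalone helpers so each can be exercised without GOLD. Both function names are hypothetical, and the sample log text is invented for illustration; only the marker string is taken from the cell above.

```python
import time

def count_completed(log_text, marker="Completed docking of ligand"):
    """Count log lines that start with the completion marker."""
    return sum(line.startswith(marker) for line in log_text.splitlines())

def wait_for(is_done, n_tries, sleep_seconds, on_tick=None):
    """Poll is_done() up to n_tries times; return True if it became true."""
    for n in range(1, n_tries + 1):
        time.sleep(sleep_seconds)
        if is_done():
            return True
        if on_tick is not None:
            on_tick(n)  # e.g. log progress after each unsuccessful check
    return False  # plays the role of the for/else branch above

# Invented log text, for illustration only
sample_log = "Starting run\nCompleted docking of ligand A\nCompleted docking of ligand B\n"
print(count_completed(sample_log))
```

Returning a boolean instead of relying on `for`/`else` makes the timeout case explicit to the caller, which can then decide whether to warn, retry or abort.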

Once the background docking is finished, the completed `results` object is available for inspection as before.

For example, we can check all input ligands are accounted for...

In [None]:
assert len({x.identifier.split('|')[0] for x in results.ligands}) == n_to_dock
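
The assertion relies on each pose identifier starting with the name of its input ligand, followed by `|` and further fields, so that several poses collapse to one name in the set. A sketch of that step in isolation follows, using made-up identifiers (the exact fields after the first `|` are an assumption here):

```python
def ligand_names(pose_identifiers, sep='|'):
    """Return the set of ligand names, i.e. the text before the first separator."""
    return {identifier.split(sep)[0] for identifier in pose_identifiers}

# Hypothetical pose identifiers: two poses of one ligand, one pose of another
poses = ["lig_A|mol2|1|dock1", "lig_A|mol2|1|dock2", "lig_B|mol2|2|dock1"]
print(sorted(ligand_names(poses)))
```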

The fitness and its components are again available...

In [None]:
scores_df = pd.DataFrame([{'identifier': x.identifier, 'fitness': x.fitness(), **x.scoring_term()} for x in results.ligands])

scores_df.shape

In [None]:
scores_df # .head()
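
A common next step is to keep only the best-scoring pose for each ligand. The sketch below does this on plain dictionaries shaped like the rows used to build `scores_df`, assuming that a higher fitness is better and that the ligand name precedes the first `|` in the identifier. The helper name and the sample values are hypothetical.

```python
def best_pose_per_ligand(rows, sep='|'):
    """Map each ligand name to its row with the highest fitness."""
    best = {}
    for row in rows:
        name = row['identifier'].split(sep)[0]
        if name not in best or row['fitness'] > best[name]['fitness']:
            best[name] = row
    return best

# Invented scores, for illustration only
rows = [
    {'identifier': 'lig_A|mol2|1|dock1', 'fitness': 61.2},
    {'identifier': 'lig_A|mol2|1|dock2', 'fitness': 64.8},
    {'identifier': 'lig_B|mol2|2|dock1', 'fitness': 55.0},
]
for name, row in best_pose_per_ligand(rows).items():
    print(name, row['identifier'], row['fitness'])
```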

### Visualization

The results of a GOLD run set up and run _via_ the API may be visualized in Hermes by loading the GOLD conf file written by the API...

In [None]:
status = subprocess.Popen([hermes_exe, 'api_gold.conf'])