```
This script can be used for any purpose without limitation subject to the
conditions at http://www.ccdc.cam.ac.uk/Community/Pages/Licences/v2.aspx

This permission notice and the following statement of attribution must be
included in all copies or substantial portions of this script.

2022-06-01: Made available by the Cambridge Crystallographic Data Centre.

```

# Preparing ligands for GOLD docking

For optimal performance, GOLD requires a good-quality 3D ligand structure as input. The CSD [Molecule API](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/molecule_api.html) and [Conformer API](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/conformer_api.html) can now be used together to generate such structures, and this notebook shows how.

Note that the input structures are assumed to be in the desired charge and tautomeric states already: no protonation/deprotonation or tautomer standardization/enumeration is done here. This is currently out of scope for the [CSD Python API](https://downloads.ccdc.cam.ac.uk/documentation/API/index.html), but we recommend [RDKit](http://www.rdkit.org/docs/GettingStartedInPython.html) to those who wish to investigate it further.

In [1]:
import sys
sys.path.append('../..')
from ccdc_notebook_utilities import create_logger
import os
from pathlib import Path
import re
import csv

In [2]:
import ccdc
from ccdc.molecule import Molecule
from ccdc.entry import Entry
from ccdc.conformer import ConformerGenerator
from ccdc.io import EntryWriter

#### Config

The directory containing the input files for docking; directory must exist...

In [3]:
input_dir = Path('input_files')

CSV file of input structures as SMILES with Names (_N.B._ any other columns will be kept as data items)...

In [4]:
input_csv = input_dir / 'input.csv'

smiles_col, name_col = 'smiles', 'name'  # Columns required in input file
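
To make the expected layout concrete, here is a minimal sketch of file contents that would satisfy these requirements, parsed with the standard `csv` module. The `pIC50` column and the two example rows are made up for illustration; only the `smiles` and `name` column names come from the configuration above.

```python
import csv
import io

# Hypothetical file contents: 'smiles' and 'name' are required, while 'pIC50'
# is an illustrative extra column that would be kept as a data item.
example_csv = """smiles,name,pIC50
CCO,ethanol,4.2
c1ccccc1,benzene,5.0
"""

example_reader = csv.DictReader(io.StringIO(example_csv))

# Check the header for the required columns before reading any rows
missing_cols = [col for col in ('smiles', 'name') if col not in example_reader.fieldnames]

example_records = list(example_reader)

print(missing_cols)
print(example_records[0])
```

Every value, including numeric-looking ones such as `4.2`, is read as a string.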

Output file for this script (which is the _input_ file for GOLD); note that the file extension determines the format...

In [5]:
output_file = input_dir / 'input.sdf' 

#### Initialization

In [6]:
# Get logger and configure if necessary...

logger = create_logger()

In [7]:
# Check that all required files and directories exist...

for directory in [input_dir]:
    assert directory.exists(), f"Error! Required directory '{directory}' not found."

for file in [input_csv]:
    assert file.exists(), f"Error! Required file '{file}' not found."
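
Because `assert` statements are skipped when Python runs with the `-O` flag, a stricter check may be preferable if this code is moved out of a notebook. The following is one possible sketch; `require_paths` is a hypothetical helper and not part of the CSD Python API.

```python
import tempfile
from pathlib import Path


def require_paths(paths):
    """Raise FileNotFoundError naming every path that does not exist."""
    missing = [str(p) for p in map(Path, paths) if not p.exists()]
    if missing:
        raise FileNotFoundError(f"Required path(s) not found: {', '.join(missing)}")


# Usage example with a temporary directory standing in for 'input_files'
with tempfile.TemporaryDirectory() as tmp:
    require_paths([tmp])  # The directory exists, so nothing is raised
    try:
        require_paths([Path(tmp) / 'missing.csv'])
    except FileNotFoundError as error:
        message = str(error)

print(message)
```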

In [8]:
comment = re.compile(r'^\s*#')  # Pattern to match comment lines in CSV files etc.
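
As an illustration of what this pattern accepts, the sketch below applies it to a few made-up strings. Because the pattern is anchored at the start of the string, a `#` that appears later in a value is not treated as a comment marker; this matters for SMILES, where `#` denotes a triple bond.

```python
import re

comment_pattern = re.compile(r'^\s*#')  # Same pattern as above, under a separate name

samples = ['# a comment', '   # indented comment', 'C#N', 'CCO']

# True where the string would be skipped as a comment line
flags = [bool(comment_pattern.match(s)) for s in samples]

print(flags)  # [True, True, False, False]
```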

### Load SMILES input from CSV file and create a 3D input file for GOLD

Recall that the SMILES and Name columns are required. All columns in the input CSV file are written to the output file as SD-format data items, including the SMILES, Name and any data columns that might be present. This is not strictly necessary, but experience suggests it can be convenient in practice.
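
For readers unfamiliar with SD-format data items, the following sketch shows roughly how one CSV record maps onto the data block of an SD file. It is a plain-Python illustration of the file layout only, not what `EntryWriter` does internally, and the exact whitespace the API emits may differ. The helper `sd_data_block` and the example record are hypothetical.

```python
def sd_data_block(record):
    """Format a dict as SD data items: a '> <tag>' header, the value, then a blank line."""
    lines = []
    for tag, value in record.items():
        lines += [f'> <{tag}>', str(value), '']
    return '\n'.join(lines)


# Made-up record, as csv.DictReader might return it for one row
example_record = {'smiles': 'CCO', 'name': 'ethanol', 'pIC50': '4.2'}

print(sd_data_block(example_record))
```

In a real SD file this block follows the atom and bond tables of the structure, and each record ends with a `$$$$` line.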

Initialise a conformer generator: recall that only a single conformer is required, as GOLD performs flexible docking...

In [9]:
conformer_generator = ConformerGenerator()

conformer_generator.settings.max_conformers = 1 

Process the ligands...

In [10]:
logger.info("Starting to process ligands...")

with input_csv.open() as file:
    
    reader = csv.DictReader(file)
    
    assert all(col in reader.fieldnames for col in [smiles_col, name_col]), f"Error! Required column missing from '{input_csv}'."  # Ensure all required columns are present
    
    with EntryWriter(output_file) as writer:  # Recall that the API uses the output file suffix to determine the output format
        
        for index, record in enumerate(record for record in reader if not comment.match(record[reader.fieldnames[0]])):  # Skip commented-out lines
            
            smiles, name = record[smiles_col], record[name_col]  # Required columns
            
            # Convert SMILES to a 0D API molecule (we call this '0D' as this mol object has neither 2D nor 3D coordinates)...
            
            try:

                mol = Molecule.from_string(smiles, format='smiles')

                mol.identifier = name

            except RuntimeError as error:

                logger.warning(f"Failed to make 0D mol for '{name}': {error.args[0]}")
                
                continue
                
            # Convert 0D molecule to 3D by generating a single conformer...
            
            try:

                mol = conformer_generator.generate(mol)[0].molecule

            except RuntimeError as error:
                
                logger.warning(f"Failed to make 3D mol for '{name}': {error.args[0]}")
                
                continue
            
            # Create an API entry object from the molecule, which will allow the saving of SD-format data items...

            entry = Entry.from_molecule(mol, index=index, **record)
            
            # Write to the output file...

            writer.write(entry)
            
            logger.info(f"{index:3d}) completed mol '{name}'.")
            
logger.info("Finished.")