```
This script can be used for any purpose without limitation subject to the
conditions at http://www.ccdc.cam.ac.uk/Community/Pages/Licences/v2.aspx

This permission notice and the following statement of attribution must be
included in all copies or substantial portions of this script.

2022-06-01: Made available by the Cambridge Crystallographic Data Centre.

```

# Searching for FAD in the CrossMiner database _via_ the API

Here we search the CrossMiner database for FAD (Flavin Adenine Dinucleotide) in its Quinone or Semiquinone oxidation state.

These searches can be performed on the structure database released with [CSD-CrossMiner](https://www.ccdc.cam.ac.uk/support-and-resources/ccdcresources/CSD-CrossMiner_User_Guide_2020_1.pdf) alongside its pharmacophore feature database.

In [1]:
from pathlib import Path
import sys
sys.path.append('../..')
from ccdc_notebook_utilities import create_logger

import os
from time import time

import warnings

In [2]:
from functools import reduce
from operator import or_
from collections import defaultdict

In [3]:
from IPython.display import HTML

In [4]:
import ccdc
from ccdc.pharmacophore import Pharmacophore
from ccdc.io import EntryReader
from ccdc.search import SMARTSSubstructure, SubstructureSearch, CombinedSearch

### Configuration

Template URL for a PDBe 3D visualization of a structure (the `{pdb_id}` placeholder is filled in with the PDB code)...

In [5]:
pdbe_url = 'https://www.ebi.ac.uk/pdbe/entry/view3D/{pdb_id}/?view=entry_index&viewer=litemol&assembly=1'  # Template URL
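
As a small sketch of how such a template is meant to be used, `str.format` substitutes the `{pdb_id}` placeholder. The PDB code `1abc` below is a made-up placeholder for illustration, not a hit from this search.

```python
# Same template as in the configuration cell
pdbe_url = 'https://www.ebi.ac.uk/pdbe/entry/view3D/{pdb_id}/?view=entry_index&viewer=litemol&assembly=1'

example_pdb_id = '1abc'  # Hypothetical PDB code, for illustration only

# Fill in the placeholder to obtain a concrete URL
example_url = pdbe_url.format(pdb_id=example_pdb_id)

print(example_url)
```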

### Initialization

In [6]:
logger = create_logger()

#### Open the CrossMiner structure database

As the CrossMiner structure database is quite large and not of interest to all users, it is not installed by default. We therefore check that it is present before attempting to open it for searching...

In [7]:
db_file = Path(Pharmacophore.default_feature_database_location()).parent / 'pdb_crossminer.csdsqlx'  # Use feature-database to locate structure database

assert db_file.exists(), f"Error! The CrossMiner structure database '{db_file.resolve()}' was not found!"
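
An `assert` is skipped when Python runs with the `-O` flag, so outside a notebook an explicit exception may be preferable. The following is a minimal sketch of that alternative; the helper name `require_file` is hypothetical and not part of the CSD Python API, and a temporary file stands in for the database.

```python
from pathlib import Path
import tempfile


def require_file(path):
    """Return the resolved path if it is an existing file, else raise FileNotFoundError."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Required file '{path.resolve()}' was not found!")
    return path.resolve()


# Demonstrate with a temporary file standing in for the structure database
with tempfile.TemporaryDirectory() as tmp_dir:
    present = Path(tmp_dir) / 'present.csdsqlx'
    present.touch()
    found = require_file(present)  # Succeeds: the file exists

    try:
        require_file(Path(tmp_dir) / 'missing.csdsqlx')
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True  # Expected: the file was never created
```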

Open the database...

In [8]:
db = EntryReader(str(db_file))

logger.info(f"Number of entries in CrossMiner Structure database '{db_file.resolve()}': {len(db):,d}")

### Search for FAD

SMARTS for FAD in its Quinone or Semiquinone oxidation states (adapted from the SMILES in the Wikipedia entry for [FAD](https://en.wikipedia.org/wiki/Flavin_adenine_dinucleotide)).

Note that...
* A non-aromatic representation is used for the five-membered ring of the Adenine moiety.
* The phosphate phosphorus is written as aromatic (a lower-case 'p' is used).
* The phosphate oxygens are represented by `[#8]`, _i.e._ 'any oxygen', as this gives the most hits.

More SMARTS could be added for the other oxidation states if necessary.

In [9]:
smarts = [
    'c12cc(C)c(C)cc1N=C3C(=O)NC(=O)N=C3N2CC(O)C(O)C(O)C[#8]p([#8])([#8])[#8]p([#8])([#8])[#8]CC4C(O)C(O)C(O4)N5C=Nc6c5ncnc6N',  # Quinone
    'c12cc(C)c(C)cc1NC=3C(=O)NC(=O)NC=3N2CC(O)C(O)C(O)C[#8]p([#8])([#8])[#8]p([#8])([#8])[#8]CC4C(O)C(O)C(O4)N5C=Nc6c5ncnc6N',  # Semiquinone
]
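
To see where the two patterns differ, the shared leading and trailing text can be stripped off. This is a plain string comparison with no chemistry toolkit involved, sketched here only to show that the two SMARTS differ in one short stretch within the flavin ring system while the rest of the pattern is identical. The helper `split_common` is hypothetical.

```python
import os

quinone = 'c12cc(C)c(C)cc1N=C3C(=O)NC(=O)N=C3N2CC(O)C(O)C(O)C[#8]p([#8])([#8])[#8]p([#8])([#8])[#8]CC4C(O)C(O)C(O4)N5C=Nc6c5ncnc6N'
semiquinone = 'c12cc(C)c(C)cc1NC=3C(=O)NC(=O)NC=3N2CC(O)C(O)C(O)C[#8]p([#8])([#8])[#8]p([#8])([#8])[#8]CC4C(O)C(O)C(O4)N5C=Nc6c5ncnc6N'


def split_common(a, b):
    """Split two strings into (common prefix, middle of a, middle of b, common suffix)."""
    prefix = os.path.commonprefix([a, b])
    rest_a, rest_b = a[len(prefix):], b[len(prefix):]
    # The common suffix is the reversed common prefix of the reversed remainders
    suffix = os.path.commonprefix([rest_a[::-1], rest_b[::-1]])[::-1]
    mid_a = rest_a[:len(rest_a) - len(suffix)]
    mid_b = rest_b[:len(rest_b) - len(suffix)]
    return prefix, mid_a, mid_b, suffix


prefix, mid_quinone, mid_semiquinone, suffix = split_common(quinone, semiquinone)

print(f'Shared start : {prefix}')
print(f'Quinone      : {mid_quinone}')
print(f'Semiquinone  : {mid_semiquinone}')
print(f'Shared end   : {suffix}')
```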

Use a combined search with the `OR` operator to find FAD in either oxidation state. _N.B._ Because the queries are combined with `reduce`, further SMARTS can be added to the list above without modifying this code, which would not be the case if the queries were combined individually...

In [10]:
def make_query(smarts, smarts_type='COFACTOR'):
    """
    Local utility function to make a query object from the SMARTS string of a cofactor.
    """

    query = SubstructureSearch()

    query.settings.max_hits_per_structure = 1  # Report at most one hit per structure

    substructure = SMARTSSubstructure(smarts)

    # Constrain every query atom to the given protein atom type (cofactor atoms by default)
    for atom in substructure.atoms:
        atom.add_protein_atom_type_constraint(smarts_type)

    query.add_substructure(substructure)

    return query

In [11]:
searcher = CombinedSearch(reduce(or_, [make_query(x) for x in smarts]))
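
The `reduce(or_, ...)` idiom folds a list of any length into a single `a | b | c | ...` expression. A minimal sketch on plain Python sets, which support the same `|` operator, shows the pattern independently of the CSD Python API; the sets are made-up stand-ins for query objects.

```python
from functools import reduce
from operator import or_

# Made-up stand-ins for query objects; any type supporting `|` behaves the same way
parts = [{'quinone'}, {'semiquinone'}, {'hydroquinone'}]

combined = reduce(or_, parts)  # Equivalent to parts[0] | parts[1] | parts[2]

single = reduce(or_, parts[:1])  # A one-element list is returned unchanged

print(sorted(combined))
```

Note that `reduce` without an initial value raises a `TypeError` on an empty list, so at least one SMARTS must be present.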

In [None]:
%%time

hits = searcher.search(db) 

In [None]:
len(hits)

Organise hits by PDB ID...

In [None]:
hits_by_pdb_id = defaultdict(list)

for hit in hits:
    
    pdb_id = hit.identifier.split('_')[0]  # Extract PDB code portion from CrossMiner identifier
    
    hits_by_pdb_id[pdb_id].append(hit)
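
The grouping step can be tried without the database. The sketch below applies the same `split('_')[0]` logic to a few made-up identifier strings; the identifiers are hypothetical and only assume, as the code above does, that the PDB code precedes the first underscore.

```python
from collections import defaultdict

# Hypothetical identifiers, for illustration only
example_identifiers = ['1abc_site1', '1abc_site2', '2xyz_site1']

grouped = defaultdict(list)

for identifier in example_identifiers:
    pdb_id = identifier.split('_')[0]  # Text before the first underscore
    grouped[pdb_id].append(identifier)

print(dict(grouped))
```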

In [None]:
len(hits_by_pdb_id)

Examine a hit...

In [None]:
n = 0

pdb_id, hits_for_pdb_id = list(hits_by_pdb_id.items())[n]

print('\n'.join(sorted(x.identifier for x in hits_for_pdb_id)))

We can now create a link to the PDBe entry, which provides the full context including a 3D visualization...

In [None]:
HTML(f'<a href="https://www.ebi.ac.uk/pdbe/entry/pdb/{pdb_id}" target="_blank">{pdb_id}</a>')
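
The same idea extends to several entries at once. The sketch below builds an HTML list of links as a plain string, which could then be passed to `IPython.display.HTML`. The helper `make_links` is hypothetical and the PDB codes are made-up placeholders.

```python
from html import escape


def make_links(pdb_ids, url_template='https://www.ebi.ac.uk/pdbe/entry/pdb/{pdb_id}'):
    """Return an HTML unordered list with one link per PDB code."""
    items = []
    for pdb_id in pdb_ids:
        url = url_template.format(pdb_id=pdb_id)
        # Escape both the URL and the link text before embedding them in HTML
        items.append(f'<li><a href="{escape(url)}" target="_blank">{escape(pdb_id)}</a></li>')
    return '<ul>' + ''.join(items) + '</ul>'


links_html = make_links(['1abc', '2xyz'])  # Hypothetical PDB codes

print(links_html)
```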