# Searching for FAD in the CrossMiner database _via_ the API

Here we search the CrossMiner database for FAD (flavin adenine dinucleotide) in its quinone or semiquinone oxidation states.

These searches can be performed on the structure database released with [CSD-CrossMiner](https://www.ccdc.cam.ac.uk/support-and-resources/ccdcresources/CSD-CrossMiner_User_Guide_2020_1.pdf) alongside its pharmacophore feature database.

In [None]:
from platform import platform
import sys
import os
from pathlib import Path

In [None]:
from functools import reduce
from operator import or_
from collections import defaultdict

In [None]:
from IPython.display import HTML, SVG, IFrame

In [None]:
import ccdc
from ccdc.pharmacophore import Pharmacophore
from ccdc.io import EntryReader
from ccdc.search import SMARTSSubstructure, SubstructureSearch, CombinedSearch

### Configuration

Get the path to the CrossMiner structure database...

In [None]:
data_dir = Path(Pharmacophore.default_feature_database_location()).parent.resolve()

db_file = data_dir / 'pdb_crossminer.csdsqlx'
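The cell above assumes that the structure database file sits next to the feature database. As a hedged sketch, a hypothetical helper `crossminer_db_path` (not part of the CSD Python API) could build the same path and fail early with a clear `FileNotFoundError` if the file is missing, rather than failing later inside the reader:

```python
from pathlib import Path


def crossminer_db_path(feature_db_location, filename='pdb_crossminer.csdsqlx'):
    """Return the CrossMiner structure-database path next to the feature database.

    Hypothetical helper: it assumes, as the notebook does, that the structure
    database lives in the parent directory of the feature database location.
    """
    db_file = Path(feature_db_location).parent.resolve() / filename
    if not db_file.is_file():
        # Fail early with an explicit message instead of inside the reader
        raise FileNotFoundError(f'CrossMiner structure database not found: {db_file}')
    return db_file
```

In the notebook this might be called as `db_file = crossminer_db_path(Pharmacophore.default_feature_database_location())`.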

Template URL for viewing a structure in the PDBe 3D viewer...

In [None]:
pdbe_url = 'https://www.ebi.ac.uk/pdbe/entry/view3D/{pdb_id}/?view=entry_index&viewer=litemol&assembly=1'  # Template URL

### Initialization

In [None]:
print(f"""
Platform:                     {platform()}

Python exe:                   {sys.executable}
Python version:               {'.'.join(str(x) for x in sys.version_info[:3])}

CSD version:                  {ccdc.io.csd_version()}
CSD directory:                {ccdc.io.csd_directory()}
API version:                  {ccdc.__version__}

CSDHOME:                      {os.environ.get('CSDHOME', 'Not set')}
CCDC_LICENSING_CONFIGURATION: {os.environ.get('CCDC_LICENSING_CONFIGURATION', 'Not set')}

CrossMiner database:          {db_file}
""", file=sys.stderr)

Open the CrossMiner structure database...

In [None]:
db = EntryReader(str(db_file))

len(db)

### Search for FAD

SMARTS for FAD in its quinone or semiquinone oxidation states (adapted from the SMILES in the Wikipedia entry for [FAD](https://en.wikipedia.org/wiki/Flavin_adenine_dinucleotide)).

Note that:
* A non-aromatic representation is used for the five-membered ring of the adenine moiety.
* The phosphate phosphorus atoms are written as aromatic (a lower-case `p` is used).
* The phosphate oxygens are represented by `[#8]`, _i.e._ 'any oxygen', as this gives the most hits.

More SMARTS could be added for the other oxidation states if necessary.

In [None]:
smarts = [
    'c12cc(C)c(C)cc1N=C3C(=O)NC(=O)N=C3N2CC(O)C(O)C(O)C[#8]p([#8])([#8])[#8]p([#8])([#8])[#8]CC4C(O)C(O)C(O4)N5C=Nc6c5ncnc6N',  # Quinone
    'c12cc(C)c(C)cc1NC=3C(=O)NC(=O)NC=3N2CC(O)C(O)C(O)C[#8]p([#8])([#8])[#8]p([#8])([#8])[#8]CC4C(O)C(O)C(O4)N5C=Nc6c5ncnc6N',  # Semiquinone
]
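The two patterns above differ only in the isoalloxazine (flavin) ring; the ribityl chain, diphosphate and adenosine part is identical. One way to keep further oxidation states easy to add is to store only the varying ring SMARTS and append the shared tail. This is a sketch, not the notebook's method: the names `FAD_TAIL`, `FLAVIN_CORES`, `smarts_by_state` and `generated_smarts` are illustrative, and the fragments are copied from the cell above, so the result reproduces the same two query strings:

```python
# Shared ribityl-diphosphate-adenosine tail, copied from the SMARTS above
FAD_TAIL = ('CC(O)C(O)C(O)C[#8]p([#8])([#8])[#8]p([#8])([#8])[#8]'
            'CC4C(O)C(O)C(O4)N5C=Nc6c5ncnc6N')

# Isoalloxazine ring for each oxidation state, ending with the ring
# nitrogen that carries the ribityl chain (ring-closure label 2)
FLAVIN_CORES = {
    'Quinone': 'c12cc(C)c(C)cc1N=C3C(=O)NC(=O)N=C3N2',
    'Semiquinone': 'c12cc(C)c(C)cc1NC=3C(=O)NC(=O)NC=3N2',
}

# Full SMARTS per state; dict order is preserved, so the list order matches
smarts_by_state = {state: core + FAD_TAIL for state, core in FLAVIN_CORES.items()}
generated_smarts = list(smarts_by_state.values())
```

Adding another oxidation state would then mean adding one entry to `FLAVIN_CORES`; the SMARTS for that ring would still need to be written and checked separately.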

Use a `CombinedSearch` with the `OR` (`|`) operator to find FAD in either oxidation state. (_N.B._ Because `reduce` is used here, further SMARTS can be added to the list above without modifying this code, which would not be the case if the queries were combined individually.)

In [None]:
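def make_query(smarts, protein_atom_type='COFACTOR'):
    """
    Local utility function to make a substructure query from the SMARTS string of a cofactor.

    Every query atom is constrained to the given protein atom type
    (by default 'COFACTOR'), and at most one hit is returned per structure.
    """
    query = SubstructureSearch()

    # Report at most one match per CrossMiner structure
    query.settings.max_hits_per_structure = 1

    substructure = SMARTSSubstructure(smarts)

    # Only match atoms typed as the given protein atom type (e.g. cofactor atoms)
    for atom in substructure.atoms:
        atom.add_protein_atom_type_constraint(protein_atom_type)

    query.add_substructure(substructure)

    return query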

In [None]:
searcher = CombinedSearch(reduce(or_, [make_query(x) for x in smarts]))
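To illustrate why `reduce(or_, ...)` scales with the list, here is a self-contained sketch using a hypothetical `Query` class (a stand-in, not the CSD API) whose `|` operator only records how its operands are combined. `reduce` folds the list from left to right, so three queries become `((q0 | q1) | q2)`:

```python
from functools import reduce
from operator import or_


class Query:
    """Hypothetical stand-in for a search query that supports `|` (not the CSD API)."""

    def __init__(self, label):
        self.label = label

    def __or__(self, other):
        # Combine two queries into a new one whose label records the OR
        return Query(f'({self.label} | {other.label})')

    def __repr__(self):
        return f'Query({self.label!r})'


queries = [Query('quinone'), Query('semiquinone'), Query('other')]

# reduce applies or_ pairwise, left to right: or_(or_(q0, q1), q2)
combined = reduce(or_, queries)
print(combined.label)  # ((quinone | semiquinone) | other)
```

With a single-element list `reduce` returns that element unchanged; with an empty list (and no initial value) it raises `TypeError`, so at least one SMARTS is needed.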

In [None]:
%%time

hits = searcher.search(db) 
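`%%time` is an IPython cell magic, so it is not available in a plain Python script. A minimal sketch of a comparable wall-clock measurement uses `time.perf_counter`; the helper name `timed` is hypothetical, and unlike `%%time` it does not report CPU time:

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; in a script one might write: hits, seconds = timed(searcher.search, db)
result, seconds = timed(sum, range(1_000_000))
print(f'{result} in {seconds:.3f} s')
```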

In [None]:
len(hits)

Organise hits by PDB ID...

In [None]:
hits_by_pdb_id = defaultdict(list)

for hit in hits:
    
    pdb_id = hit.identifier.split('_')[0]  # Extract PDB code portion from CrossMiner identifier
    
    hits_by_pdb_id[pdb_id].append(hit)
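The grouping relies on the PDB code being the part of the CrossMiner identifier before the first underscore. The self-contained sketch below applies the same `defaultdict(list)` pattern to made-up identifiers in that assumed format. It also shows that keys keep first-seen order (dictionaries preserve insertion order from Python 3.7), which is what indexing `list(hits_by_pdb_id.items())[n]` depends on:

```python
from collections import defaultdict

# Made-up identifiers, assumed to follow '<pdb_id>_<suffix>' as in the cell above
identifiers = ['1abc_1', '2xyz_1', '1abc_2']

by_pdb_id = defaultdict(list)
for identifier in identifiers:
    pdb_id = identifier.split('_')[0]  # text before the first underscore
    by_pdb_id[pdb_id].append(identifier)

# Keys keep first-seen order, so index 0 is the first PDB ID encountered
first_pdb_id, first_hits = list(by_pdb_id.items())[0]
print(first_pdb_id, first_hits)  # 1abc ['1abc_1', '1abc_2']
```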

In [None]:
len(hits_by_pdb_id)

Examine the hits for one PDB entry (change `n` to choose another)...

In [None]:
n = 0

pdb_id, hits_for_pdb_id = list(hits_by_pdb_id.items())[n]

len(hits_for_pdb_id)

In [None]:
url = pdbe_url.format(pdb_id=pdb_id)

HTML(f'<a href="{url}" target="_blank">{pdb_id}</a>')

In [None]:
IFrame(url, 800, 1000)
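The same URL template can also produce a list of links, one per PDB entry with hits, instead of a single link. A minimal sketch with a hypothetical helper `pdbe_links`, using only the standard library; `html.escape` turns the `&` separators in the query string into `&amp;` so the markup stays valid:

```python
from html import escape

# Same template as in the configuration section
pdbe_url = 'https://www.ebi.ac.uk/pdbe/entry/view3D/{pdb_id}/?view=entry_index&viewer=litemol&assembly=1'


def pdbe_links(pdb_ids, template=pdbe_url):
    """Return an HTML <ul> with one PDBe viewer link per PDB ID."""
    items = []
    for pdb_id in pdb_ids:
        url = template.format(pdb_id=pdb_id)
        # Escape the URL and label so '&' and similar characters are valid in HTML
        items.append(f'<li><a href="{escape(url)}" target="_blank">{escape(pdb_id)}</a></li>')
    return '<ul>' + ''.join(items) + '</ul>'


links_html = pdbe_links(['1abc', '2xyz'])  # hypothetical PDB IDs
```

In the notebook, `HTML(pdbe_links(hits_by_pdb_id))` would render one link per key of the dictionary.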