# Background

This notebook runs through a few simple examples of CSD Python API usage to set the stage for the main notebook collection.

Some of the functionality illustrated here, along with the imports _etc._, is recorded in the file `Discovery_Notebook_utils.py` for use by the other notebooks.

In [None]:
from platform import platform
import sys
import os
from pathlib import Path
import logging

In [None]:
import pandas as pd

In [None]:
from IPython.display import HTML

In [None]:
import rdkit
from rdkit.Chem.Draw import IPythonConsole

In [None]:
import ccdc
from ccdc.io import csd_version

In [None]:
from ccdc.io import EntryReader
from ccdc.diagram import DiagramGenerator

### Initialization

Define info useful for debugging...

In [None]:
script_info = f"""
Platform:       {platform()}
Python version: {'.'.join(str(x) for x in sys.version_info[:3])}
Python exe:     {sys.executable}
CSD version:    {csd_version()}
CSDHOME:        {os.environ.get('CSDHOME', '(not set)')}
API version:    {ccdc.__version__}
RDKit version:  {rdkit.__version__}
"""

Set up a logger object, with timestamp _etc._...

In [None]:
logger = logging.getLogger(__name__)
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('[%(asctime)s %(levelname)-7s] %(message)s', datefmt='%y-%m-%d %H:%M:%S'))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

In [None]:
logger.info(script_info)
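
One caveat with the logger set-up above: `logging.getLogger(__name__)` returns the same logger object each time, so re-running the set-up cell attaches a second handler and every message is then printed twice. A minimal sketch of a guard against this follows; the helper name `get_notebook_logger` and the logger name `'csd_demo'` are made up for illustration and are not part of `Discovery_Notebook_utils.py` as far as this notebook shows.

```python
import logging

LOG_FORMAT = '[%(asctime)s %(levelname)-7s] %(message)s'
DATE_FORMAT = '%y-%m-%d %H:%M:%S'


def get_notebook_logger(name, level=logging.INFO):
    """Return a logger carrying exactly one stream handler (hypothetical helper)."""
    logger = logging.getLogger(name)
    # Only attach a handler the first time, so re-running the cell is harmless
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


# Simulate running the set-up cell twice
demo_logger = get_notebook_logger('csd_demo')
demo_logger = get_notebook_logger('csd_demo')
demo_logger.info('This message is emitted once')
```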

### Basic Lookup and CSD Entities

There are three 'entity types' in the CSD database system: Entries, Crystals and Molecules. They manage different types of data relevant to a database entry, and each has an API module dedicated to it. The Entry object is the 'highest level' and contains nested Crystal and Molecule objects.

#### Entry
The [Entry](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/entry_api.html) holds high-level information such as the deposition date and citation. The Entry for a CSD Refcode may be obtained very straightforwardly _via_ an [EntryReader](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/io_api.html?highlight=entryreader#ccdc.io.EntryReader) object...

In [None]:
reader = EntryReader()

In [None]:
refcode = 'ZODZEA'

In [None]:
HTML(f'<a href="https://www.ccdc.cam.ac.uk/structures/Search?ccdcid={refcode}" target="_blank">{refcode}</a>')

In [None]:
entry = reader.entry(refcode)

In [None]:
entry.deposition_date

The Publication field contains a nested Journal object...

In [None]:
publication = entry.publication

publication

In [None]:
publication.authors

In [None]:
journal = entry.publication.journal

journal.full_name, journal.abbreviated_name

#### Crystal

The [Crystal](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/crystal_api.html) holds crystallographic information, such as unit cell parameters. A Crystal object may be retrieved directly from the database using a [CrystalReader](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/io_api.html?highlight=crystalreader#ccdc.io.CrystalReader) object or extracted from an Entry object, as below...

In [None]:
crystal = entry.crystal

In [None]:
crystal.cell_angles

In [None]:
crystal.cell_lengths

In [None]:
crystal.cell_volume
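
As a cross-check on how these three quantities relate, the volume of a general (triclinic) cell can be recomputed from its lengths and angles using the standard formula V = abc·sqrt(1 - cos²α - cos²β - cos²γ + 2·cosα·cosβ·cosγ). The sketch below uses placeholder cell parameters rather than those of ZODZEA, and assumes angles are given in degrees; the function name `cell_volume` is illustrative only.

```python
from math import cos, radians, sqrt


def cell_volume(lengths, angles_deg):
    """Volume of a general unit cell from (a, b, c) and (alpha, beta, gamma) in degrees."""
    a, b, c = lengths
    cos_a, cos_b, cos_g = (cos(radians(angle)) for angle in angles_deg)
    return a * b * c * sqrt(
        1 - cos_a**2 - cos_b**2 - cos_g**2 + 2 * cos_a * cos_b * cos_g
    )


# Placeholder cell parameters (NOT those of ZODZEA)
print(round(cell_volume((10.0, 10.0, 10.0), (90.0, 90.0, 90.0)), 3))   # cubic
print(round(cell_volume((5.0, 6.0, 7.0), (90.0, 110.0, 90.0)), 3))     # monoclinic
```

In the notebook, something like `cell_volume(crystal.cell_lengths, crystal.cell_angles)` should agree with `crystal.cell_volume`, assuming both attributes unpack as three numbers and the angles are in degrees.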

#### Molecule

The [Molecule](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/molecule_api.html) holds chemical information, such as the connection table. It may be retrieved directly from the CSD or read from a variety of file formats using a [MoleculeReader](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/io_api.html?highlight=moleculereader#ccdc.io.MoleculeReader) object. It may also be extracted from an Entry object, as below...

Note that the Python API is primarily intended as a means of programmatically accessing the CSD. It is not currently intended to be a full-featured cheminformatics toolkit, although the Molecule API can be used for a variety of cheminformatics tasks.

In [None]:
mol = entry.molecule

In [None]:
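def make_atom_row(atom):
    """One table row per atom: label, element, type, coordinates and bonded partners."""

    def partner(bond):
        # Each bond lists both of its atoms; pick the one that is not `atom`
        first, second = bond.atoms
        return second if first.label == atom.label else first

    bonds = ' / '.join(f"{partner(bond).label} ({bond.sybyl_type})" for bond in atom.bonds)

    return [atom.label, atom.atomic_number, atom.sybyl_type, *atom.coordinates, bonds]

atoms_df = pd.DataFrame(
        data=[make_atom_row(atom) for atom in mol.atoms],
        columns=['label', 'atomic_number', 'atom_type', 'x', 'y', 'z', 'bonds']
    )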

atoms_df.shape

In [None]:
atoms_df.head()

In [None]:
def make_bond_row(bond):
    """One table row per bond: the two atom labels and the bond type."""

    return [bond.atoms[0].label, bond.atoms[1].label, bond.sybyl_type]

bonds_df = pd.DataFrame(
        data=[make_bond_row(bond) for bond in mol.bonds],
        columns=['atom_1', 'atom_2', 'bond_type']
    )

bonds_df.shape

In [None]:
bonds_df.head()
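
The atom table above stores each atom's bonded partners as a formatted string; the same information can be recovered from the bond table alone. The sketch below builds a neighbour lookup from plain tuples laid out like the rows of `bonds_df`. The labels and bond types are placeholders rather than data from ZODZEA, and `neighbours_by_atom` is a made-up helper name.

```python
from collections import defaultdict

# Placeholder bond records in the same (atom_1, atom_2, bond_type) layout as bonds_df;
# the labels and types are illustrative, not taken from ZODZEA
bond_rows = [
    ('C1', 'C2', '1'),
    ('C2', 'O1', '2'),
    ('C2', 'N1', 'am'),
]


def neighbours_by_atom(rows):
    """Map each atom label to a list of (partner label, bond type) pairs."""
    neighbours = defaultdict(list)
    for atom_1, atom_2, bond_type in rows:
        # Each bond is recorded once but contributes to both of its atoms
        neighbours[atom_1].append((atom_2, bond_type))
        neighbours[atom_2].append((atom_1, bond_type))
    return dict(neighbours)


neighbours = neighbours_by_atom(bond_rows)
print(neighbours['C2'])
```

With the real table, `bonds_df.itertuples(index=False)` could supply the rows.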

The connection table may be written in a variety of formats, which is useful for interoperability with other toolkits such as RDKit (see below).

In [None]:
conn_tab = mol.to_string('sdf') 

print(conn_tab)
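
To make the printed layout easier to read, here is a rough sketch of how the fixed-width fields of a V2000 connection table are arranged: three header lines, a counts line, then one line per atom and one per bond. It assumes the SDF output is in the V2000 flavour and uses a hand-written two-atom record rather than CSD data; `parse_v2000` is an illustrative helper, not part of the CSD Python API or RDKit.

```python
# Hand-written two-atom record for illustration only (not CSD data)
SDF_TEXT = """\
demo fragment
  hand-written

  2  1  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4300    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
M  END
$$$$
"""


def parse_v2000(text):
    """Read atoms and bonds from the first V2000 record in an SDF/MOL string."""
    lines = text.splitlines()
    counts = lines[3]  # Fourth line: atom and bond counts in 3-character fields
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        # x, y, z occupy three 10-character fields; the element symbol follows
        x, y, z = (float(line[i:i + 10]) for i in (0, 10, 20))
        atoms.append((line[31:34].strip(), x, y, z))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # Atom indices in the file are 1-based; the third field is the bond order
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


atoms, bonds = parse_v2000(SDF_TEXT)
print(atoms)
print(bonds)
```

For real work a toolkit parser is the better choice; this is only meant to show what the columns mean.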

In [None]:
# print(mol.to_string('mol2'))

A SMILES string is also available...

In [None]:
mol.smiles

### 2D Depictions

A 2D depiction may be generated using the [Diagram API](https://downloads.ccdc.cam.ac.uk/documentation/API/descriptive_docs/diagram.html?highlight=diagram).

First, set up a CCDC Diagram Generator...

In [None]:
diagram_generator = DiagramGenerator()

diagram_generator.settings.return_type = 'SVG'
diagram_generator.settings.explicit_polar_hydrogens = False
diagram_generator.settings.shrink_symbols = False

Then generate an SVG image and display...

In [None]:
HTML(diagram_generator.image(mol))

Markup may be applied to depictions, such as highlighting substructure matches or labelling certain atoms. Examples of this are shown in other Notebooks.

### RDKit

As noted above, the Python API is primarily intended as a means of programmatically accessing the CSD. It is not currently intended to be a full-featured cheminformatics toolkit, although the Molecule API can be used for a variety of cheminformatics tasks. Where functionality is not present in the Molecule API, we normally use the [RDKit](http://rdkit.org/docs/index.html), as it is freely available, powerful and well-supported.

In [None]:
rdk_mol = rdkit.Chem.MolFromMolBlock(conn_tab)  # See above for source of conn_tab    

rdkit.Chem.MolToSmiles(rdk_mol)

### Plotting

There are many good plotting packages available for Python now, with various strengths and weaknesses. One I find generally useful is [Altair](https://altair-viz.github.io/index.html), because it straightforwardly generates attractive plots that integrate well with Jupyter. Altair is used in various Notebooks in this collection, although other packages could be substituted for it if desired. One nice feature of Altair is that it allows the creation of [interactive](https://altair-viz.github.io/user_guide/interactions.html) plots, although we will not explore that much in these Notebooks.

### 3D Visualisation

The CCDC is working on a suite of web-based tools for visualisation of small molecules and proteins. These tools will be embeddable in notebooks so as to allow interactive visualisation of CSD database entries, PDB entries and small-molecule and macromolecular structures from other sources.

In the meantime, PyMOL may be driven from a Jupyter notebook as shown below.

In [None]:
# Imports required for PyMOL demo...

import subprocess
import xmlrpc.client as xmlrpclib
from time import sleep

**Important!** The PyMOL executable named in `pymol_exe` below must be in your path. If you installed PyMOL using `conda` (as per the instructions in `Setup_CSD_API.ps1`) this should be the case. If not, you will need to set `pymol_exe` to the name of your PyMOL executable, and to either ensure it is in your path or include the full path to the executable in `pymol_exe`.

In [None]:
pymol_exe = 'pymol.exe'
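
Before trying to launch PyMOL, it can be worth checking that the executable really is on the path. A minimal sketch using the standard library's `shutil.which` follows; the helper name `resolve_executable` is made up, and the second candidate name `'pymol'` is an assumption for non-Windows installs.

```python
import shutil


def resolve_executable(candidates):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


# 'pymol.exe' is the name used in this notebook; 'pymol' is an assumed
# alternative for non-Windows installs
found = resolve_executable(['pymol.exe', 'pymol'])
print(found if found else 'PyMOL not found on PATH')
```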

Utility to start PyMOL (somewhat) robustly...

In [None]:
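def start_pymol():
    """Launch PyMOL with its XML-RPC server enabled; return a proxy, or None on failure."""

    subprocess.Popen([pymol_exe, '-R'])  # -R starts PyMOL's XML-RPC server

    pymol = xmlrpclib.ServerProxy('http://localhost:9123')

    for _ in range(10):  # Poll once a second, for up to ten attempts

        try:

            pymol.do('')  # No-op

            return pymol

        except ConnectionRefusedError:

            sleep(1)

    return None  # Failed to start PyMOL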
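
The polling loop in `start_pymol` is an instance of a general 'retry until the server answers' pattern. The sketch below factors it into a small helper and exercises it with a stand-in probe that fails twice before succeeding; `wait_until_ready` and `fake_probe` are hypothetical names, and no PyMOL process is involved.

```python
from time import sleep


def wait_until_ready(probe, attempts=10, delay=1.0, retry_on=(ConnectionRefusedError,)):
    """Call probe() until it stops raising retry_on; return True on success."""
    for _ in range(attempts):
        try:
            probe()
            return True
        except retry_on:
            sleep(delay)  # Give the server a moment before the next attempt
    return False


# Stand-in for a server that refuses the first two connection attempts
calls = {'count': 0}


def fake_probe():
    calls['count'] += 1
    if calls['count'] < 3:
        raise ConnectionRefusedError('server not up yet')


ready = wait_until_ready(fake_probe, attempts=5, delay=0)
print(ready, calls['count'])
```

In `start_pymol`, the probe would correspond to `lambda: pymol.do('')`.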

Write the string representation of the molecule we retrieved above (_i.e._ `conn_tab`) to a temporary file...

In [None]:
mol_file = 'demo.mol'

with open(mol_file, 'w') as file:
    
    file.write(conn_tab + '\n')  # See above for source of conn_tab    

Start PyMOL, load the molecule and configure the visualisation...

In [None]:
pymol = start_pymol()

if pymol:
    
    pymol.load(mol_file)
    
    pymol.do('set stick_radius, 0.1')

In [None]:
os.unlink(mol_file)  # Tidy up
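
As an alternative to creating and deleting `demo.mol` by hand, the standard library's `tempfile` module can manage the clean-up. The sketch below writes placeholder text (standing in for `conn_tab`) into a temporary directory that is removed automatically; it assumes any viewer has finished reading the file before the `with` block exits.

```python
import os
import tempfile

# Placeholder text standing in for conn_tab
demo_text = 'placeholder connection table\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_path = os.path.join(tmp_dir, 'demo.mol')
    with open(tmp_path, 'w') as file:
        file.write(demo_text)
    # ... a viewer would load tmp_path here, before the block exits ...
    existed_inside = os.path.exists(tmp_path)

# The directory and its contents are removed when the with-block exits
print(existed_inside, os.path.exists(tmp_path))
```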