# Molecular Geometry Analysis

[Mogul](https://www.ccdc.cam.ac.uk/support-and-resources/ccdcresources/mogul_2020_1.pdf) uses a knowledge base of intramolecular geometric parameters derived from the CSD to perform geometric analyses on small molecules.
Similar molecular geometry analyses may be performed using the [Conformer API](https://downloads.ccdc.cam.ac.uk/documentation/API/descriptive_docs/molecular_geometry_analysis.html).

In [None]:
%run ../Discovery_Notebook_utils.py

In [None]:
from io import StringIO

In [None]:
from ccdc.conformer import GeometryAnalyser

### Configuration

### Initialization

In [None]:
logger.info(script_info)

### Geometry analysis of a small molecule

First, set up a CCDC [Geometry Analyser](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/conformer_api.html#ccdc.conformer.GeometryAnalyser)...

In [None]:
analyser = GeometryAnalyser()

analyser.settings.generalisation = False  # Use only fully-defined distributions
analyser.settings.ring.analyse = False  # Can be slow, so disable for now

Next, we load a molecule to analyse. This is a local copy of the ligand [4QQ](https://www.ebi.ac.uk/pdbe/entry/pdb/1ett/bound/4QQ) from the PDBe structure [1ETT](https://www.ebi.ac.uk/pdbe/entry/pdb/1ett) (Bovine Thrombin).

In [None]:
ligand_file = '1ett.mol2'

In [None]:
with MoleculeReader(ligand_file) as reader:
    
    molecule = reader[0]

Standardise the molecule to CSD conventions...

_N.B._ This is not always necessary, but it is quick and does no harm for structures taken from outside the CSD ecosystem.

In [None]:
molecule.remove_hydrogens()
molecule.assign_bond_types(which='unknown')
molecule.standardise_aromatic_bonds()  
molecule.standardise_delocalised_bonds()
molecule.add_hydrogens()

In [None]:
SVG(diagram_generator.image(molecule))

Analyse our molecule of interest...

In [None]:
analysed_mol = analyser.analyse_molecule(molecule)

len(analysed_mol.analysed_torsions)  # Number of torsions found

Make a dataframe of the analysis results...

* `value` is the value of the torsion angle in the molecule being analysed.
* `unusual` indicates whether the geometric feature is considered unusual or not.
* `enough_hits` indicates whether there are enough hits in the CSD for a sound judgement.
* `d_min` is the distance to the nearest value in the CSD.
* `local_density` is the percentage of CSD values within 10 degrees of the query value.
* `depiction` is a 2D depiction with the torsion highlighted.
* `object` is the API torsion object, cached here for later reference.
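To make `d_min` and `local_density` concrete, the sketch below recomputes both from a list of observed torsion values. It only illustrates the definitions given above and is not the CSD Python API's implementation. The function names and sample values are hypothetical, and comparing absolute torsion values is an assumption.

```python
def nearest_distance(query, observed):
    """Smallest difference (degrees) between the query and any observed value."""
    # Assumption: torsions are compared by absolute value, i.e. on a 0-180 scale.
    return min(abs(abs(query) - abs(v)) for v in observed)


def local_density_pct(query, observed, window=10.0):
    """Percentage of observed values lying within `window` degrees of the query."""
    hits = sum(1 for v in observed if abs(abs(query) - abs(v)) <= window)
    return 100.0 * hits / len(observed)


# Hypothetical distribution: mostly anti (~180) with a couple of gauche (~60) values
observed = [175.0, 178.5, 180.0, 62.0, 58.5, 171.0]

for query in (120.0, 176.0):
    print(query, nearest_distance(query, observed), local_density_pct(query, observed))
```

A query far from every observed value gives a large nearest distance and a low local density, which is the pattern that would flag a torsion as unusual.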

Local utility to depict a molecule with a torsion highlighted...

In [None]:
def depict_torsion(torsion):

    return diagram_generator.image(molecule, highlight_atoms=[molecule.atoms[x] for x in torsion.atom_indices])

In [None]:
torsions_df = pd.DataFrame(
                [('-'.join(x.atom_labels), x.value, x.unusual, x.enough_hits, x.d_min, x.local_density, depict_torsion(x), x) for x in analysed_mol.analysed_torsions],
                columns=['atom_labels', 'value', 'unusual', 'enough_hits', 'd_min', 'local_density', 'depiction', 'object']
            ).sort_values('d_min', ascending=False).reset_index(drop=True)

torsions_df.shape

For convenience, we will examine further only the subset of torsions considered 'unusual' and with enough hits to be reasonably certain of the result...

In [None]:
unusual_df = torsions_df.query("unusual and enough_hits").drop(['unusual', 'enough_hits'], axis=1).reset_index(drop=True)

unusual_df.shape

In [None]:
show_dataframe(unusual_df.drop('object', axis=1).head(3))  # Top three

### Plotting distributions of CSD values

Plotting a histogram of the CSD values used in the geometry analysis can be a great help in evaluating the result.

We will illustrate plotting with the first unusual torsion found above...

In [None]:
n = 0

torsion = unusual_df.iloc[n]['object']  # Extract the cached API torsion object from dataframe

In [None]:
torsion.value, len(torsion.distribution)  # Value in analysed molecule and number of observed values in the CSD

Create a histogram and display it using Altair...

In [None]:
plot = (
    altair.Chart(pd.DataFrame({'distribution': torsion.distribution}))
        .mark_bar().encode(altair.X('distribution:Q', bin=altair.Bin(extent=[0, 180], step=5.0)), y='count()') +
    altair.Chart(pd.DataFrame({'value': [abs(torsion.value)]}))  # Single-row frame marking the query value
        .mark_rule().encode(x='value:Q')
)

plot
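To make the binning explicit, here is a plain-Python sketch of the kind of 5-degree histogram the chart draws over the range 0 to 180. It is an illustration only: the helper name and sample values are hypothetical, and taking the absolute value of each torsion mirrors the `abs(torsion.value)` used for the marker rather than anything confirmed about how the CSD distribution is stored.

```python
from collections import Counter


def bin_torsions(values, step=5.0, upper=180.0):
    """Count absolute torsion values in fixed-width bins covering [0, upper]."""
    n_bins = int(upper / step)
    counts = Counter()
    for v in values:
        # Clamp the index so that a value of exactly `upper` lands in the last bin.
        index = min(int(abs(v) // step), n_bins - 1)
        counts[index * step] += 1  # Key each bin by its lower edge
    return dict(sorted(counts.items()))


# Hypothetical torsion values in degrees
sample = [-178.0, 180.0, 61.0, 64.9, 65.0, 3.2]
histogram = bin_torsions(sample)
print(histogram)
```

Each key is the lower edge of a bin, so a query value can be compared with the counts in its own and neighbouring bins.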

Now, it would be nice to be able to add the histograms to a table, so they could be inspected alongside the summary data and depiction.

One way to do this is to generate PNGs for the plots and display them _via_ inline HTML. However, I haven't got this working on Windows yet, so the (slightly fragile) solution below just uses the HTML generated by Altair. Note that the plots may disappear; if this happens, redo the 'show dataframe' step. If the depictions appear small, try widening your browser window.

In [None]:
# Local utility to generate a torsion-angle histogram

def torsion_plot(index):
    
    torsion = unusual_df.iloc[index]['object']

    histogram = (
        altair.Chart(pd.DataFrame({'distribution': torsion.distribution}))
            .mark_bar().encode(altair.X('distribution:Q', bin=altair.Bin(extent=[0, 180], step=5.0)), y='count()')
    )
    marker = (
        altair.Chart(pd.DataFrame({'value': [abs(torsion.value)]}))  # Query value as a vertical rule
            .mark_rule().encode(x='value:Q')
    )

    # Size the layered chart as a whole rather than only the rule layer
    plot = (histogram + marker).properties(width=400, height=200)

    with StringIO() as buffer:

        plot.save(buffer, 'html')

        html = buffer.getvalue().replace('vis', f'vis_{index:02d}')  # NB. Make div names unique
                   
    return html
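Part of the fragility comes from replacing every occurrence of `vis`, which would also alter unrelated text such as the word "visible". A more targeted variant, sketched below, renames only the container id and the selector that refers to it. The helper name is hypothetical, and the toy HTML is a stand-in for illustration, not Altair's actual output.

```python
import re


def make_div_unique(html, index):
    """Rename the 'vis' container id without touching other text containing 'vis'."""
    unique = f"vis_{index:02d}"
    # Rename the id attribute of the container element.
    html = re.sub(r'id="vis"', f'id="{unique}"', html)
    # Rename selectors of the form '#vis', but not longer names such as '#visual'.
    html = re.sub(r'#vis\b', f'#{unique}', html)
    return html


# Toy stand-in for the generated page (assumption: not real Altair output)
toy_html = '<div id="vis"></div><script>embed("#vis", {"title": "visible"});</script>'
result = make_div_unique(toy_html, 3)
print(result)
```

Whether this is sufficient depends on the markup that the installed Altair version emits, so it is worth inspecting the generated HTML before relying on it.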

Add the histograms to the table...

In [None]:
unusual_df = unusual_df.assign(torsion_plot = [torsion_plot(index) for index in unusual_df.index.values])

In [None]:
show_dataframe(unusual_df.head())

### Visualisation of 3D structure

It would be even better to add a marked-up 3D depiction to the table to complete the picture, and the CCDC is working on a web-based visualizer that will allow this. When this is ready, an example will be added to a future iteration of this notebook.