# Conformer Generation

This notebook illustrates the use of the [Conformer API](https://downloads.ccdc.cam.ac.uk/documentation/API/descriptive_docs/conformer.html).

The API does not yet read stereochemical information, so it requires a 3D structure as input if stereochemistry is to be handled properly. Thus, if the structures intended as input to the Conformer API are available only as SMILES (or, say, a 2D SDF file), we recommend using [RDKit](http://rdkit.org/) to generate an initial 3D structure, which is then used as input to the conformer generator.
Note that work is underway to rectify this, and the CCDC API is expected to handle stereochemical SMILES correctly in a future release.

It is assumed that the input structures are all in the desired charge and tautomeric states. No protonation/deprotonation or tautomer standardization/enumeration is done here.

#### References

The Conformer API uses CSD-derived conformational distributions to generate conformers for small molecules:
* https://downloads.ccdc.cam.ac.uk/documentation/API/descriptive_docs/conformer.html
* https://downloads.ccdc.cam.ac.uk/documentation/API/modules/conformer_api.html

The associated publication describes the statistical validation performed, which also includes comparisons with PDB structures:
* https://pubs.acs.org/doi/abs/10.1021/acs.jcim.7b00697

In addition, there is a large amount of literature describing various comparisons between CSD, PDB and computed small-molecule conformations. A sample is given below:
* https://journals.iucr.org/d/issues/2017/03/00/ba5249/ba5249.pdf
* https://link.springer.com/article/10.1007/s10822-011-9538-6
* https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5553890/

In [None]:
%run ../Discovery_Notebook_Utils.py

In [None]:
from ccdc.conformer import MoleculeMinimiser, ConformerGenerator
from ccdc.search import SubstructureSearch, SMARTSSubstructure
from ccdc.descriptors import MolecularDescriptors

### Configuration

We will be using Lapatinib as our test molecule...

In [None]:
smiles, name = 'CS(=O)(=O)CCNCc1ccc(o1)c2ccc3c(c2)c(ncn3)Nc4ccc(c(c4)Cl)OCc5cccc(c5)F', 'Lapatinib'

### Initialization

In [None]:
logger.info(script_info)

<a id="mol_prep"></a>

### Molecule Preparation

As noted above, if starting from SMILES, we currently recommend the use of RDKit to generate an initial 3D structure which is then used as input to the Conformer API...

In [None]:
rdk_mol = Chem.MolFromSmiles(smiles)  # Convert SMILES to an RDKit molecule object

rdk_mol.SetProp('_Name', name)  # _Name is a special property that gets recorded in the molfile header, which can be convenient

rdk_mol = Chem.AddHs(rdk_mol)  # Hydrogens are required for 3D structure generation

assert AllChem.EmbedMolecule(rdk_mol) == 0, "Error! RDKit 'EmbedMolecule' failed!"  # Generate a 3D structure using distance geometry

molblock = Chem.MolToMolBlock(rdk_mol)  # Convert RDKit molecule to a string representation (SDF format)

We can then create a CCDC molecule from this starting structure, and standardize the molecular representation to ensure conformance with CSD conventions...

In [None]:
mol = Molecule.from_string(molblock)  # Make CCDC molecule object from the string representation

mol.remove_hydrogens()  # Remove Hydrogens as these will be added using the API below

mol.assign_bond_types(which='unknown')

mol.standardise_delocalised_bonds()

mol.standardise_aromatic_bonds()

mol.add_hydrogens()

Depict the CCDC molecule...

In [None]:
mol2html(mol)

Save for use later...

In [None]:
with MoleculeWriter('lapatinib.mol2') as writer:
    
    writer.write(mol)

### Simple minimization and superimposition

First, we simply minimise our molecule and then superimpose the minimised structure onto the original.

To minimise the molecule, we use a [MoleculeMinimiser](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/conformer_api.html#molecule-minimisation) object...

In [None]:
minimiser = MoleculeMinimiser()

In [None]:
minimised_mol = minimiser.minimise(mol)

Overlay the minimised structure onto the original using all heavy atoms...

In [None]:
overlayed_mol = MolecularDescriptors.overlay(mol, minimised_mol, zip(mol.heavy_atoms, minimised_mol.heavy_atoms))

rmsd = MolecularDescriptors.rmsd(mol, overlayed_mol)

logger.info(f"RMSD: {rmsd:.3f}")
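
The RMSD logged above is the root-mean-square distance between paired atoms after overlay. As a minimal, API-independent sketch of that quantity (the helper name `rmsd_from_coords` and the toy coordinates are illustrative assumptions, and this is not how `MolecularDescriptors.rmsd` is implemented):

```python
import math


def rmsd_from_coords(coords_a, coords_b):
    """Root-mean-square distance between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("Coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("At least one atom pair is required")
    # Sum of squared distances over all paired atoms
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Toy data (hypothetical): three atoms, and the same atoms shifted by 1 along z
original = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]

toy_rmsd = rmsd_from_coords(original, shifted)
print(f"Toy RMSD: {toy_rmsd:.3f}")
```

Because every atom pair in the toy data is exactly one unit apart, the toy RMSD is 1.0.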

Export the structures...

In [None]:
with MoleculeWriter('minimised.mol2') as writer:
    
    writer.write(overlayed_mol)

Visualize the exported structures in PyMOL...

In [None]:
# pymol = start_pymol()

# if pymol:
    
#     pymol.do('set stick_radius, 0.1')
#     pymol.do('set sphere_scale, 0.2')
    
#     pymol.load('lapatinib.mol2')  # See above
#     pymol.load('minimised.mol2')
    
#     pymol.do(f"hide everything, elem H and bound_to elem C")  # Hide non-polar Hydrogens

### Generating Conformers

Here we illustrate the generation of conformers for the molecule used above, using a [ConformerGenerator](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/conformer_api.html#ccdc.conformer.ConformerGenerator). It can be configured _via_ its [settings](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/conformer_api.html#ccdc.conformer.ConformerSettings) attribute.

In [None]:
conformer_generator = ConformerGenerator()

conformer_generator.settings.max_conformers = 20 

The [ConformerHitList](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/conformer_api.html#ccdc.conformer.ConformerHitList) object produced contains attributes relating to the overall performance of the run; for example, whether or not the sampling limit was reached and how many rotamers had no observations in the CSD.

In [None]:
conformers = conformer_generator.generate(mol)

len(conformers), conformers.sampling_limit_reached, conformers.n_rotamers_with_no_observations

#### Analysing the conformers

The individual [ConformerHit](https://downloads.ccdc.cam.ac.uk/documentation/API/modules/conformer_api.html#ccdc.conformer.ConformerHit) objects have a `normalised_score` attribute, which is a value between 0.0 (most probable) and 1.0 (least probable); conformers are listed in probability order (the most probable first). 

A method is also provided to calculate the RMSD of each conformer in the hit list with respect to the input molecule as supplied (the default) or to a minimised version of the input molecule.

In [None]:
conformers_df = pd.DataFrame(
                    data   =[(x.normalised_score,  x.rmsd(),          x.rmsd(wrt='minimised')) for x in conformers],
                    columns=['Normalized Score',  'RMSD (original)', 'RMSD (minimised)']
                )

conformers_df.shape

In [None]:
conformers_df
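
Since a lower `normalised_score` indicates a more probable conformer, one possible follow-up is to keep only hits below a chosen cutoff. The sketch below uses a hypothetical `HitSummary` tuple as a stand-in for values extracted from `ConformerHit` objects; the helper name `select_probable` and the 0.5 cutoff are illustrative assumptions rather than recommended settings.

```python
from typing import NamedTuple


class HitSummary(NamedTuple):
    """Hypothetical stand-in for the values pulled from one conformer hit."""
    index: int
    normalised_score: float  # 0.0 = most probable, 1.0 = least probable
    rmsd_to_input: float


def select_probable(hits, max_score):
    """Keep hits scoring at or below `max_score`, most probable first."""
    kept = (hit for hit in hits if hit.normalised_score <= max_score)
    return sorted(kept, key=lambda hit: hit.normalised_score)


# Toy values (made up for illustration only)
toy_hits = [
    HitSummary(index=0, normalised_score=0.00, rmsd_to_input=1.2),
    HitSummary(index=1, normalised_score=0.35, rmsd_to_input=2.1),
    HitSummary(index=2, normalised_score=0.80, rmsd_to_input=3.4),
]

kept = select_probable(toy_hits, max_score=0.5)
print([hit.index for hit in kept])
```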

### Superimposition of conformers onto a substructure

Conformers can be superimposed using all atoms, exactly as shown above for the minimised structure. However, this is not usually very informative, so instead we will illustrate the superposition of the conformers generated above onto a substructure. This method better highlights the similarities and differences between conformers.
It can also be easily extended to superimpose conformers for different molecules that share a substructure, such as members of a congeneric series (although we will not illustrate this here).


A [substructure search](https://downloads.ccdc.cam.ac.uk/documentation/API/descriptive_docs/substructure_searching.html) is used to tag conformers with the substructure of interest, so we specify a substructure query for the superimposition using a SMARTS string...


In [None]:
query = 'n1cnc(N)c2ccccc12' # 4-amino quinazoline

In [None]:
searcher = SubstructureSearch()

substructure = SMARTSSubstructure(query)

In [None]:
_ = searcher.add_substructure(substructure)

tagged = searcher.search([x.molecule for x in conformers], max_hits_per_structure=1)

We use the first (_i.e._ best-scoring) conformer as the reference...

In [None]:
reference = tagged[0]

ref_molecule, ref_match_atoms = reference.molecule, reference.match_atoms()

Perform superimposition using substructure...

In [None]:
superimposed = [MolecularDescriptors.overlay(ref_molecule, hit.molecule, zip(ref_match_atoms, hit.match_atoms())) for hit in tagged]
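
The idea behind substructure-based superimposition is that the rigid-body fit is computed from the matched atoms only, and the resulting transformation is then applied to every atom. The following is a simplified 2D, pure-Python sketch of that idea (rotation plus translation, no reflection); the function name `fit_on_subset_2d` and the toy coordinates are assumptions for illustration, and `MolecularDescriptors.overlay` itself works in 3D.

```python
import math


def fit_on_subset_2d(reference, mobile, pairs):
    """Fit `mobile` onto `reference` using only the (ref_index, mob_index) pairs."""
    ref_sel = [reference[i] for i, _ in pairs]
    mob_sel = [mobile[j] for _, j in pairs]
    n = len(pairs)

    # Centroids of the matched atoms only
    ref_cx = sum(x for x, _ in ref_sel) / n
    ref_cy = sum(y for _, y in ref_sel) / n
    mob_cx = sum(x for x, _ in mob_sel) / n
    mob_cy = sum(y for _, y in mob_sel) / n

    # Least-squares rotation angle from the centred, matched coordinates
    c = s = 0.0
    for (rx, ry), (mx, my) in zip(ref_sel, mob_sel):
        rx, ry, mx, my = rx - ref_cx, ry - ref_cy, mx - mob_cx, my - mob_cy
        c += rx * mx + ry * my
        s += ry * mx - rx * my
    theta = math.atan2(s, c)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Apply the same rotation and translation to every atom, matched or not
    fitted = []
    for x, y in mobile:
        x, y = x - mob_cx, y - mob_cy
        fitted.append((cos_t * x - sin_t * y + ref_cx,
                       sin_t * x + cos_t * y + ref_cy))
    return fitted


# Toy data: atoms 0-2 form a rigid "core"; atom 3 is a flexible "tail"
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
# Same core rotated by 90 degrees and translated, with the tail placed differently
mobile = [(5.0, 5.0), (5.0, 6.0), (4.0, 5.0), (5.0, 8.0)]

fitted = fit_on_subset_2d(reference, mobile, pairs=[(0, 0), (1, 1), (2, 2)])
core_deviation = max(math.dist(fitted[i], reference[i]) for i in range(3))
tail_deviation = math.dist(fitted[3], reference[3])
print(f"core: {core_deviation:.3f}, tail: {tail_deviation:.3f}")
```

After the fit the core atoms coincide, so any remaining deviation is concentrated in the tail, which is what makes conformer differences easier to see.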

Write superimposed structures...

In [None]:
superimposed_dir = Path('superimposed')

superimposed_dir.mkdir(exist_ok=True)

for n, hit in enumerate(superimposed, 1):
    
    superimposed_file = superimposed_dir / f'conformer_{n:03d}.mol2'
    
    with MoleculeWriter(str(superimposed_file)) as writer:

        writer.write(hit)        

Visualise superimposed structures in PyMOL (_N.B._ can be a bit slow)...

In [None]:
# pymol = start_pymol()

# if pymol:

#     pymol.do('set stick_radius, 0.1')
#     pymol.do('set sphere_scale, 0.2')

#     for superimposed_file in superimposed_dir.glob('conformer_*.mol2'):

#         pymol.load(str(superimposed_file))

#     pymol.do(f"hide everything, elem H and bound_to elem C")

### Conformer generation for multiple molecules

Conformer generation can also be performed on input files containing multiple molecules. As noted above, we assume that the input structures are already in the desired charge and tautomeric state.

As input we will use a MOL2 file (see the Input_for_GOLD notebook in the Docking folder for details of its preparation)...

In [None]:
input_file = 'input.mol2'

We will output the conformers to an SDF file (_N.B._ file format is taken from file extension)...

In [None]:
output_file = 'conformers.sdf'

Generate conformers using the Conformer API...

In [None]:
conformer_generator = ConformerGenerator()

conformer_generator.settings.max_conformers = 20 

conformer_generator.settings.superimpose_conformers_onto_reference = True

In [None]:
%%time

with EntryReader(str(input_file)) as reader:  # Initial 3D structure
        
    with EntryWriter(str(output_file)) as writer: # Conformers

        for entry in reader:

            mol = entry.molecule

            # Standardize molecular representation to ensure conformance with CSD conventions...

            mol.remove_hydrogens()
            mol.assign_bond_types(which='unknown')
            mol.standardise_delocalised_bonds()
            mol.standardise_aromatic_bonds()
            mol.add_hydrogens()

            # Generate conformers...

            conformers = conformer_generator.generate(mol)
            
            # Write conformers to file, along with per-conformer stats...
            
            for conformer in conformers:
                
                attributes = {**entry.attributes, **{x: getattr(conformer, x) for x in ['normalised_score']}}
                
                conformer_entry = Entry.from_molecule(conformer.molecule, **attributes)  # Separate name avoids overwriting the reader's `entry`

                writer.write(conformer_entry)
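
The per-conformer attributes above are built by merging two dictionaries with `{**a, **b}`, in which keys from the right-hand dictionary take precedence. A plain-Python sketch of that merge, using made-up keys and values rather than real entry attributes:

```python
# Hypothetical attributes carried over from the input entry
entry_attributes = {'Name': 'ligand_1', 'normalised_score': 'stale'}

# Hypothetical per-conformer statistics to record alongside them
conformer_stats = {'normalised_score': 0.25}

# Right-hand values win on key clashes; neither input dictionary is modified
merged = {**entry_attributes, **conformer_stats}
print(merged)
```

This is why a `normalised_score` already present on the input would be replaced by the value for the current conformer.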

Inspect the conformers generated in PyMOL...

_N.B._ Use `Movie > Show All States` to show all conformers at once.

In [None]:
# pymol = start_pymol()

# if pymol:
    
#     pymol.do('set stick_radius, 0.1')
    
#     pymol.load(output_file)
    
#     pymol.do(f"hide everything, elem H and bound_to elem C")