# ADN T2 ~ Molecular Descriptors and Fingerprints

In this tutorial we will learn how to use Python together with RDKit to calculate different molecular descriptors, using built-in *rdkit* functions as well as third-party software such as *Mordred*.

As a definition: a **Molecular Descriptor (MD)** is “… the final result of a logical and mathematical procedure that transforms chemical information of a molecule, such as structural features, into useful numbers or the result of standardized experiments.”

Utility:

Descriptors can be used in molecular similarity analysis to compare molecules (and/or their properties), which helps us identify new molecules with desired properties and biological activity.
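To make "comparing molecules" concrete, here is a minimal sketch of the Tanimoto coefficient, a measure commonly used to compare fingerprints. It uses plain Python sets rather than RDKit objects, and the bit positions are hypothetical values chosen for illustration, not real fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two collections of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.0
        return 0.0
    return len(a & b) / len(union)


# Hypothetical on-bit positions for three molecules (illustrative only)
mol_a = {1, 4, 7, 9}
mol_b = {1, 4, 7, 12}
mol_c = {2, 3, 15}

print(tanimoto(mol_a, mol_b))  # 3 shared bits / 5 distinct bits = 0.6
print(tanimoto(mol_a, mol_c))  # no shared bits = 0.0
```

A value of 1.0 means both molecules set exactly the same bits, and 0.0 means they share none.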

## Molecular descriptors
*Similarity can be assessed in many different ways depending on the application (see J. Med. Chem. (2014), 57, 3186-3204):*

* 1D molecular descriptors: Solubility, logP, molecular weight, melting point.
    * Global descriptor: only one value represents the whole molecule
    * Usually do not contain enough information to be applied to machine learning (ML)
    * Can be added to 2D fingerprints to improve molecular encoding for ML
* 2D molecular descriptors: Molecular graphs, paths, fragments, atom environments
    * Detailed representation of individual parts of the molecule
    * Contains many features/bits per molecule called fingerprints
    * Very often used in similarity search and ML
* 3D molecular descriptors: Shape, stereochemistry
    * Less robust than 2D representations because of molecule flexibility (what is the "right" conformation of a molecule?)
* Biological similarity
    * Biological fingerprint, e.g. individual bits represent bioactivity measured against different targets
    * Independent of molecular structure
    * Requires experimental (or predicted) data
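The point above about adding 1D descriptors to 2D fingerprints can be sketched with NumPy. This is only one possible approach, under assumptions: the fingerprint bits and descriptor values below are made-up illustrative numbers, and min-max scaling is just one of several ways to bring descriptors onto the same 0 to 1 scale as the bits.

```python
import numpy as np

# Hypothetical 8-bit fingerprints for three molecules (illustrative only)
fps = np.array([
    [1, 0, 0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 0, 0, 0],
], dtype=float)

# Hypothetical 1D descriptors for the same molecules: logP, molecular weight
desc_1d = np.array([
    [2.1, 156.2],
    [0.4, 138.2],
    [-0.3, 86.2],
])

# Min-max scale each descriptor column to [0, 1].
# Assumes no column is constant, otherwise the range would be zero.
col_min = desc_1d.min(axis=0)
col_range = desc_1d.max(axis=0) - col_min
scaled = (desc_1d - col_min) / col_range

# One feature row per molecule: fingerprint bits followed by scaled descriptors
features = np.hstack([fps, scaled])
print(features.shape)  # (3, 10)
```

Without scaling, a descriptor such as molecular weight would dominate distance-based models simply because its values are much larger than 0 or 1.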

In [8]:
import pandas as pd
import numpy as np
from rdkit import Chem,DataStructs
from rdkit.Chem import AllChem, Descriptors
from rdkit.Chem.Draw import IPythonConsole
from mordred import Calculator, descriptors


In [2]:
file = 'data-test.sdf'
mols = []
# SDMolSupplier yields None for records it cannot parse, so skip those
for m in Chem.SDMolSupplier(file):
    if m is not None:
        mols.append(m)

In [6]:
len(mols)

50

### Molecular descriptors using *RDKit*

In [4]:
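desc = []

# Compute six RDKit descriptors for every molecule
for m in mols:
    desc.append([
        Descriptors.MolLogP(m),         # Wildman-Crippen logP
        Descriptors.HeavyAtomCount(m),  # number of heavy (non-hydrogen) atoms
        Descriptors.MolMR(m),           # Wildman-Crippen molar refractivity
        Descriptors.MolWt(m),           # average molecular weight
        Descriptors.NumHAcceptors(m),   # hydrogen-bond acceptors
        Descriptors.NumHDonors(m),      # hydrogen-bond donors
    ])

# Note: the 'NHB' column holds the hydrogen-bond donor count
columns = ['logp', 'HAC', 'MMR', 'MW', 'NHA', 'NHB']
desc = pd.DataFrame(desc, columns=columns)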

In [5]:
desc

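|   | logp | HAC | MMR | MW | NHA | NHB |
|---|------|-----|-----|----|-----|-----|
| 0 | 2.1436 | 12 | 47.468 | 156.188 | 2 | 0 |
| 1 | 0.3995 | 10 | 35.372 | 138.174 | 4 | 0 |
| 2 | -0.2663 | 6 | 25.5104 | 86.158 | 0 | 1 |
| 3 | -0.4891 | 17 | 68.2684 | 236.403 | 0 | 2 |
| 4 | 2.783 | 14 | 57.044 | 180.21 | 2 | 0 |
| 5 | 2.1612 | 25 | 95.9625 | 344.519 | 2 | 2 |
| 6 | 2.1612 | 25 | 95.9625 | 344.519 | 2 | 2 |
| 7 | -2.9381 | 56 | 186.3958 | 804.88 | 18 | 11 |
| 8 | -5.1139 | 67 | 219.0292 | 967.021 | 23 | 14 |
| 9 | -2.9381 | 56 | 186.3958 | 804.88 | 18 | 11 |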
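For readers without the SDF file at hand, the same six descriptors can be tried on a single SMILES string. The following is a minimal sketch, not part of the original notebook: the helper name `basic_descriptors` is hypothetical, and ethanol (`CCO`) is used only as a small example molecule.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def basic_descriptors(smiles):
    """Return the six descriptors for one SMILES string, or None if it cannot be parsed.

    Hypothetical helper for illustration; the keys mirror the column names used in the table.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return {
        'logp': Descriptors.MolLogP(mol),
        'HAC': Descriptors.HeavyAtomCount(mol),
        'MMR': Descriptors.MolMR(mol),
        'MW': Descriptors.MolWt(mol),
        'NHA': Descriptors.NumHAcceptors(mol),
        'NHB': Descriptors.NumHDonors(mol),  # donor count, named as in the table
    }


ethanol = basic_descriptors('CCO')
# Ethanol has 3 heavy atoms, 1 H-bond acceptor and 1 H-bond donor
print(ethanol['HAC'], ethanol['NHA'], ethanol['NHB'])
```

Returning a dictionary per molecule makes it easy to build a DataFrame later with `pd.DataFrame(list_of_dicts)`.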


In [9]:
# Set up Mordred with all descriptor modules, skipping 3D descriptors
Calc = Calculator(descriptors, ignore_3D=True)