Filtering Molecular Datasets
> Several rule sets exist for estimating how likely a molecule is to exhibit drug-like behaviour. These are rules of thumb: many approved small-molecule drugs violate them.

Lipinski Rule of 5
> Lipinski’s “Rule of 5” was introduced to estimate the oral bioavailability of molecules. Poor absorption is likely if the molecule violates more than one of the following conditions:
- Molecular Weight <= 500 Da
- No. Hydrogen Bond Donors <= 5
- No. Hydrogen Bond Acceptors <= 10
- LogP <= 5

In [4]:
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams



In [3]:
mol = Chem.MolFromSmiles('CC(=O)Nc1ccc(O)cc1')  # e.g. Paracetamol
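# Ro5 descriptors
MW = Descriptors.MolWt(mol)        # average molecular weight (Da)
HBA = Descriptors.NOCount(mol)     # acceptors, counted as N and O atoms
HBD = Descriptors.NHOHCount(mol)   # donors, counted as N-H and O-H groups
LogP = Descriptors.MolLogP(mol)    # Wildman-Crippen logP estimate
# Pass if at most one of the four conditions is violated
conditions = [MW <= 500, HBA <= 10, HBD <= 5, LogP <= 5]
pass_ro5 = conditions.count(True) >= 3
print(pass_ro5)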

True
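The pass/fail check above does not say which conditions failed. A minimal sketch of one way to report that, written in plain Python so it works on descriptor values computed by any means. The names `RO5_LIMITS`, `ro5_violations` and `passes_ro5` are illustrative, not part of RDKit, and the example descriptor values are hypothetical.

```python
# Upper limits taken from the Rule of 5 list above; the dictionary keys are
# illustrative names, not RDKit identifiers.
RO5_LIMITS = {"MW": 500, "HBD": 5, "HBA": 10, "LogP": 5}


def ro5_violations(descriptors):
    """Return the names of the Ro5 conditions that the values exceed."""
    return [name for name, limit in RO5_LIMITS.items()
            if descriptors[name] > limit]


def passes_ro5(descriptors, max_violations=1):
    """Pass when no more than `max_violations` conditions are violated."""
    return len(ro5_violations(descriptors)) <= max_violations


# Hypothetical descriptor values, for illustration only
example = {"MW": 620.0, "HBD": 3, "HBA": 12, "LogP": 4.2}
print(ro5_violations(example))  # ['MW', 'HBA']
print(passes_ro5(example))      # False
```

Keeping the thresholds in one dictionary makes it straightforward to swap in a different rule set without touching the counting logic.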


Filtering Unwanted Substructures
> Pan-Assay Interference Compounds (PAINS) are molecules that bind non-specifically to many targets, which leads to unwanted side effects and to false positives in high-throughput screening assays. Common PAINS motifs include toxoflavin, isothiazolones, hydroxyphenyl hydrazones, curcumin, phenolsulfonamides, rhodanines, enones, quinones, and catechols.

> The Brenk filter removes molecules containing substructures associated with unfavourable pharmacokinetics, toxicity, or reactivity. Examples include sulfates and phosphates (unfavourable pharmacokinetics), nitro groups (mutagenic), and 2-halopyridines and thiols (both reactive).

> The NIH filter defines a list of functional groups with undesirable properties. These are split into reactive functionalities (including Michael acceptors, aldehydes, epoxides, alkyl halides, metals, 2-halopyridines, phosphorus-nitrogen bonds, α-chloroketones and β-lactams) and medicinal chemistry exclusions (including oximes, crown ethers, hydrazines, flavonoids, polyphenols, primary halide sulfates and multiple nitro groups).

In [5]:
mol = Chem.MolFromSmiles('CC1=C(C=C(C=C1)N2C(=O)C(=C(N2)C)N=NC3=CC=CC(=C3O)C4=CC(=CC=C4)C(=O)O)C')  # e.g. Eltrombopag
#PAINS flag
params_pains = FilterCatalogParams()
params_pains.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS_A)

True

In [6]:
catalog_pains = FilterCatalog(params_pains)
flag = catalog_pains.HasMatch(mol)  # Checks if there is a matching PAINS
print("PAINs: ", flag)

PAINs:  True


In [7]:
# Brenk Flag
params_unwanted = FilterCatalogParams()
params_unwanted.AddCatalog(FilterCatalogParams.FilterCatalogs.BRENK)

True

In [8]:
catalog_unwanted = FilterCatalog(params_unwanted)
flag = catalog_unwanted.HasMatch(mol)  # Checks if there is a matching unwanted substructure
print("Brenk: ", flag)

Brenk:  True


In [9]:
# NIH Flag
params_nih = FilterCatalogParams()
params_nih.AddCatalog(FilterCatalogParams.FilterCatalogs.NIH)

True

In [10]:
catalog_nih = FilterCatalog(params_nih)
flag = catalog_nih.HasMatch(mol)  # Checks if there is a matching NIH
print("NIH: ", flag)

NIH:  True
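The three cells above repeat the same pattern for each catalog. As a rough sketch, the pattern can be wrapped in a small helper so that one call returns all three flags. `build_catalog` and `flag_molecule` are hypothetical helper names introduced here; the RDKit calls are the same ones used above.

```python
from rdkit import Chem
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams


def build_catalog(catalog_id):
    """Build a FilterCatalog holding a single built-in filter set."""
    params = FilterCatalogParams()
    params.AddCatalog(catalog_id)
    return FilterCatalog(params)


# Build each catalog once and reuse it for every molecule
catalogs = {
    "PAINS_A": build_catalog(FilterCatalogParams.FilterCatalogs.PAINS_A),
    "Brenk": build_catalog(FilterCatalogParams.FilterCatalogs.BRENK),
    "NIH": build_catalog(FilterCatalogParams.FilterCatalogs.NIH),
}


def flag_molecule(mol):
    """Map each catalog name to True if the molecule matches any entry."""
    return {name: catalog.HasMatch(mol) for name, catalog in catalogs.items()}


# Eltrombopag, the same SMILES as in the cells above
mol = Chem.MolFromSmiles(
    'CC1=C(C=C(C=C1)N2C(=O)C(=C(N2)C)N=NC3=CC=CC(=C3O)C4=CC(=CC=C4)C(=O)O)C'
)
flags = flag_molecule(mol)
print(flags)
```

Building the catalogs once outside the function avoids reconstructing them for every molecule when the helper is applied to a larger dataset.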


All of the available filters can also be applied at once. Each match is returned as a FilterCatalogEntry object, which provides additional information such as the filter set and a description of the unwanted substructure.

In [11]:
mol = Chem.MolFromSmiles('CC1=C(C=C(C=C1)N2C(=O)C(=C(N2)C)N=NC3=CC=CC(=C3O)C4=CC(=CC=C4)C(=O)O)C')  # e.g. Eltrombopag

# ALL Filters
params_all = FilterCatalogParams()
params_all.AddCatalog(FilterCatalogParams.FilterCatalogs.ALL)

True

In [12]:
catalog_all = FilterCatalog(params_all)
print([entry.GetProp('FilterSet') for entry in catalog_all.GetMatches(mol)])

['PAINS_A', 'Brenk', 'NIH', 'ChEMBL23_Dundee', 'ChEMBL23_BMS', 'ChEMBL23_MLSMR', 'ChEMBL23_Inpharmatica', 'ChEMBL23_LINT']


In [13]:
print([entry.GetDescription() for entry in catalog_all.GetMatches(mol)])

['azo_A(324)', 'diazo_group', 'azo_aryl', 'diazo group', 'azo_aryl', 'Azo', 'Filter5_azo', 'acyclic N-,=N and not N bound to carbonyl or sulfone']
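The cells above flag a single molecule. A minimal sketch of how the same calls might be applied to a whole list of SMILES, splitting it into kept and rejected molecules and recording the matched descriptions as the reason for rejection. `filter_smiles` is a hypothetical helper, and treating unparsable SMILES as rejected is an assumption made for this example.

```python
from rdkit import Chem
from rdkit.Chem.FilterCatalog import FilterCatalog, FilterCatalogParams


def filter_smiles(smiles_list, catalog):
    """Split SMILES into those with no catalog match and those rejected.

    Returns (kept, rejected), where rejected maps each SMILES to the list
    of matched filter descriptions.
    """
    kept, rejected = [], {}
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            # Assumption: molecules that fail to parse are rejected
            rejected[smi] = ["unparsable SMILES"]
            continue
        reasons = [entry.GetDescription() for entry in catalog.GetMatches(mol)]
        if reasons:
            rejected[smi] = reasons
        else:
            kept.append(smi)
    return kept, rejected


# PAINS_A catalog, built as in the cells above
params = FilterCatalogParams()
params.AddCatalog(FilterCatalogParams.FilterCatalogs.PAINS_A)
catalog = FilterCatalog(params)

eltrombopag = 'CC1=C(C=C(C=C1)N2C(=O)C(=C(N2)C)N=NC3=CC=CC(=C3O)C4=CC(=CC=C4)C(=O)O)C'
kept, rejected = filter_smiles(['CCO', eltrombopag], catalog)
print(kept)
print(rejected)
```

Passing the catalog in as an argument means the same function can be reused with the Brenk, NIH or combined catalogs.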
