ECFP (extended-connectivity fingerprints), also known as circular fingerprints, are built by applying the Morgan algorithm to a set of user-supplied atom invariants. In this tutorial we will generate fingerprints for one macrocyclic molecule and two non-macrocyclic molecules of similar size and compare them.
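To make the idea of "applying the Morgan algorithm to atom invariants" concrete, here is a minimal sketch in plain Python. It is a toy illustration, not the RDKit implementation: the function name `toy_morgan`, the use of atomic numbers as the only invariant, and the use of Python's built-in `hash` are all assumptions made for this example, and the duplicate-environment removal performed by real ECFP is omitted.

```python
def toy_morgan(invariants, neighbors, radius):
    """Collect environment identifiers up to `radius` bonds (toy version).

    invariants: dict atom index -> integer invariant (here: atomic number)
    neighbors:  dict atom index -> list of bonded atom indices
    """
    ids = dict(invariants)            # layer 0: the raw atom invariants
    collected = set(ids.values())
    for layer in range(1, radius + 1):
        new_ids = {}
        for atom, nbrs in neighbors.items():
            # combine the atom's identifier with its neighbours' identifiers;
            # sorting makes the result independent of neighbour order
            env = (layer, ids[atom]) + tuple(sorted(ids[n] for n in nbrs))
            new_ids[atom] = hash(env)
        ids = new_ids
        collected.update(ids.values())
    return collected

# Heavy atoms of ethanol, C-C-O (hypothetical hand-built input)
invariants = {0: 6, 1: 6, 2: 8}
neighbors = {0: [1], 1: [0, 2], 2: [1]}

fp_r0 = toy_morgan(invariants, neighbors, radius=0)
fp_r1 = toy_morgan(invariants, neighbors, radius=1)
print(len(fp_r0), len(fp_r1))
```

Each extra layer adds identifiers that describe a larger neighbourhood around every atom, which is why the radius has to be chosen when the fingerprint is generated.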

When generating Morgan fingerprints, the radius of the fingerprint must also be provided:

In [29]:
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, Draw, rdDepictor #, rdCoordGen # rdCoordGen requires Python 3.7
from IPython.display import display, HTML
import pandas as pd
import os

# The first canonical SMILES corresponds to macrocyclic compound BACE_149 from D3R GC2018
# The following two SMILES correspond to non-macrocyclic compounds
smiles = ['CCCCNC(=O)[C@H](C)C[C@H](O)[C@@H]1C[C@H](C)CCCCCCC[C@H](NC(=O)OC(C)(C)C)C(=O)N[C@@H](C)C(=O)N1',
          'CCCCNC(=O)C(C)CC(C(CC1CCCCC1)NC(=O)C(C(C)C)NC(=O)CNC(=O)OC(C)(C)C)O',
          'CC(C)C(C(=O)NC(C(C)C)C(=O)OC)NC(=O)CCC(C(CC1CCCCC1)NC(=O)C(C)NC(=O)C(C)N)O'
          ]
ids = ['BACE_149', 'mol2', 'mol3']
df = pd.DataFrame({'mol': [Chem.MolFromSmiles(x) for x in smiles]}, index=ids)


Let's visualize these three molecules.

In [30]:
mols = []
for mol in df['mol'].values:
    mol = Chem.Mol(mol)
    rdDepictor.Compute2DCoords(mol)  # compute 2D coordinates in place for drawing
    # rdCoordGen.AddCoords(mol)  # requires Python 3.7
    # rescale(mol, f=1.4)  # AddCoords seems to produce coordinates that are hard to display, so rescale them
    mols.append(mol)
legends = list(df.index)  # plain list of molecule ids to use as legends
img = Draw.MolsToGridImage(mols, 
                    molsPerRow=len(legends),
                    subImgSize=(300, 300),
                    legends=legends,
                    useSVG=False,   # set to True in Python 3.7
                    )
# display(img)  # try it again in Python 3.7
print(os.getcwd())
# create the output directory if it does not exist, otherwise img.save raises an IOError
if not os.path.isdir("images"):
    os.makedirs("images")
img.save("images/gridmol.png")  # save the image to a file for the time being

/home2/thomas/Documents/tutorials/Multilayer_Perceptron_Keras

In [None]:
m1, m2, m3 = df['mol']  # BACE_149, mol2, mol3
fp1 = AllChem.GetMorganFingerprint(m1,radius=3)
fp2 = AllChem.GetMorganFingerprint(m2,radius=3)
fp3 = AllChem.GetMorganFingerprint(m3,radius=3)
print("The ECFP fingerprint similarity between m1 and m2 is %f" % DataStructs.DiceSimilarity(fp1,fp2))
print("The ECFP fingerprint similarity between m2 and m3 is %f" % DataStructs.DiceSimilarity(fp2,fp3))
print("The ECFP fingerprint similarity between m1 and m3 is %f" % DataStructs.DiceSimilarity(fp1,fp3))
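For count-based fingerprints, the Dice similarity used above can be understood as twice the shared counts divided by the total counts of both fingerprints. The sketch below reproduces that formula in plain Python on hand-made dictionaries; the function name `dice_counts` and the example data are hypothetical, and it is only meant to illustrate the formula, not to replace `DataStructs.DiceSimilarity`.

```python
def dice_counts(a, b):
    """Dice similarity of two count fingerprints given as {identifier: count}."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0  # two empty fingerprints: define the similarity as 0
    # an identifier present in both contributes the smaller of its two counts
    shared = sum(min(count, b[key]) for key, count in a.items() if key in b)
    return 2.0 * shared / total

# Hypothetical fingerprints: identifiers 1 and 3 are shared
fp_a = {1: 2, 2: 1, 3: 1}
fp_b = {1: 1, 3: 1, 4: 2}
print(dice_counts(fp_a, fp_b))  # 2 * (1 + 1) / (4 + 4) = 0.5
```

The value is 1.0 only when both fingerprints contain the same identifiers with the same counts, and 0.0 when they share nothing.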

Morgan fingerprints, like atom pairs and topological torsions, use counts by default, but it’s also possible to calculate them as bit vectors:

In [4]:
fp1 = AllChem.GetMorganFingerprintAsBitVect(m1,radius=3,nBits=4096)
fp2 = AllChem.GetMorganFingerprintAsBitVect(m2,radius=3,nBits=4096)
fp3 = AllChem.GetMorganFingerprintAsBitVect(m3,radius=3,nBits=4096)
print("The ECFP fingerprint similarity between m1 and m2 is %f" % DataStructs.DiceSimilarity(fp1,fp2))
print("The ECFP fingerprint similarity between m2 and m3 is %f" % DataStructs.DiceSimilarity(fp2,fp3))
print("The ECFP fingerprint similarity between m1 and m3 is %f" % DataStructs.DiceSimilarity(fp1,fp3))

The ECFP fingerprint similarity between m1 and m2 is 0.480000
The ECFP fingerprint similarity between m2 and m3 is 0.555556
The ECFP fingerprint similarity between m1 and m3 is 0.248521


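The similarity values can differ slightly between the count-based and bit-vector forms of the fingerprint, because a bit vector records only whether an environment is present and not how often it occurs. They can change further with the nBits parameter, which sets the length of the bit vector and therefore how often different environments collide on the same bit.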
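The effect of the vector length on bit collisions can be illustrated without the RDKit. The sketch below assumes that environment identifiers are folded into the bit vector by taking them modulo the vector length, which is a simplified model; the function name `fold` and the identifier values are made up for the example.

```python
def fold(identifiers, n_bits):
    """Map integer identifiers onto bit positions in a vector of length n_bits."""
    return {identifier % n_bits for identifier in identifiers}

# Five distinct (hypothetical) environment identifiers
identifiers = [3, 11, 19, 1027, 2051]

for n_bits in (8, 1024, 4096):
    bits = fold(identifiers, n_bits)
    # fewer set bits than identifiers means some identifiers collided
    print(n_bits, sorted(bits), len(identifiers) - len(bits), "collisions")
```

With 8 bits all five identifiers land on the same position, with 1024 bits three positions are set, and with 4096 bits every identifier gets its own bit. Fewer collisions means the bit vector discriminates better between molecules, at the cost of a longer vector.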

When comparing ECFP/FCFP fingerprints with the Morgan fingerprints generated by the RDKit, remember that the 4 in ECFP4 corresponds to the diameter of the atom environments considered, while the Morgan fingerprints take a radius parameter. So the examples above, with radius=3, are roughly equivalent to ECFP6; radius=2 would be roughly equivalent to ECFP4. The FCFP analogues additionally require feature-based atom invariants (useFeatures=True in the RDKit).
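The radius-to-diameter relationship is simple enough to write down as a small helper. The function `ecfp_label` below is hypothetical (it is not part of the RDKit) and only encodes the naming convention described above.

```python
def ecfp_label(radius, use_features=False):
    """Return the approximate ECFP/FCFP name for a Morgan fingerprint radius."""
    prefix = "FCFP" if use_features else "ECFP"
    # the number in the name is the diameter, i.e. twice the radius
    return "%s%d" % (prefix, 2 * radius)

print(ecfp_label(2))                     # ECFP4
print(ecfp_label(3))                     # ECFP6
print(ecfp_label(3, use_features=True))  # FCFP6
```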