# Validate our ECIF calculations #

Compare the ECIF::LD descriptors for the same complex as calculated by our script and by the example code from Sánchez-Cruz et al.

**NOTE: This will not run properly with RDKit 2021.09.1 and pandas >1.1.2 due to a name change in the `pandas.io` module (see [this bug report](https://github.com/rdkit/rdkit/issues/3701)). Until the fixed version 2021.09.2 of RDKit reaches the conda repositories, make sure that your version of pandas is not greater than 1.1.2.**

In [1]:
import sys
sys.path.append('..')
import pandas as pd
from ECIF.ecif import *
import calculate_ecif as cf

# Two example structures
protein = "../ECIF/Example_Structures/1a0q_protein.pdb"
ligand = "../ECIF/Example_Structures/1a0q_ligandCD1.sdf"

# Calculate descriptors with their code
# (from https://github.com/DIFACQUIM/ECIF/02_Examples(Descriptors).ipynb)
their_ecif = GetECIF(protein, ligand, distance_cutoff=6.0)
their_ld = GetRDKitDescriptors(ligand)

their_ecif_ld = pd.DataFrame([their_ecif], columns=PossibleECIF).join(pd.DataFrame([their_ld], columns=LigandDescriptors))

# Calculate descriptors with our code
cf.main(complx_name='1A0Q', receptor_file=protein, ligand_files=[ligand], cutoff=6.0, output='../example_data/our_example_descriptors.csv')
our_ecif_ld = pd.read_csv('../example_data/our_example_descriptors.csv', sep=',', usecols=lambda x: x != 'Receptor' and x != 'Ligand')

RDKit ERROR: [15:42:15] Explicit valence for atom # 17 C, 6, is greater than permitted
[15:42:15] Explicit valence for atom # 17 C, 6, is greater than permitted
Traceback (most recent call last):
  File "/home/Luis/.conda/envs/ecif/lib/python3.7/site-packages/rdkit/ML/Descriptors/MoleculeDescriptors.py", line 88, in CalcDescriptors
    res[i] = fn(mol)
  File "/home/Luis/.conda/envs/ecif/lib/python3.7/site-packages/rdkit/Chem/QED.py", line 286, in qed
    qedProperties = properties(mol)
  File "/home/Luis/.conda/envs/ecif/lib/python3.7/site-packages/rdkit/Chem/QED.py", line 249, in properties
    mol = Chem.RemoveHs(mol)
rdkit.Chem.rdchem.AtomValenceException: Explicit valence for atom # 17 C, 6, is greater than permitted
RDKit ERROR: [15:42:15] Explicit valence for atom # 17 C, 6, is greater than permitted
[15:42:15] Explicit valence for atom # 17 C, 6, is greater than permitted
Traceback (most recent call last):
  File "/home/Luis/.conda/envs/ecif/lib/python3.7/site-packages/rdkit/ML

In [2]:
their_ecif_ld.head()

Unnamed: 0,C;4;1;3;0;0-Br;1;1;0;0;0,C;4;1;3;0;0-C;3;3;0;1;1,C;4;1;3;0;0-C;4;1;1;0;0,C;4;1;3;0;0-C;4;1;2;0;0,C;4;1;3;0;0-C;4;1;3;0;0,C;4;1;3;0;0-C;4;2;0;0;0,C;4;1;3;0;0-C;4;2;1;0;0,C;4;1;3;0;0-C;4;2;1;0;1,C;4;1;3;0;0-C;4;2;1;1;1,C;4;1;3;0;0-C;4;2;2;0;0,...,fr_quatN,fr_sulfide,fr_sulfonamd,fr_sulfone,fr_term_acetylene,fr_tetrazole,fr_thiazole,fr_thiocyan,fr_thiophene,fr_urea
0,0,0,0,0,3,0,0,0,15,8,...,0,0,0,0,0,0,0,0,0,0


In [3]:
our_ecif_ld.head()

Unnamed: 0,C;4;1;3;0;0-Br;1;1;0;0;0,C;4;1;3;0;0-C;3;3;0;1;1,C;4;1;3;0;0-C;4;1;1;0;0,C;4;1;3;0;0-C;4;1;2;0;0,C;4;1;3;0;0-C;4;1;3;0;0,C;4;1;3;0;0-C;4;2;0;0;0,C;4;1;3;0;0-C;4;2;1;0;0,C;4;1;3;0;0-C;4;2;1;0;1,C;4;1;3;0;0-C;4;2;1;1;1,C;4;1;3;0;0-C;4;2;2;0;0,...,fr_quatN,fr_sulfide,fr_sulfonamd,fr_sulfone,fr_term_acetylene,fr_tetrazole,fr_thiazole,fr_thiocyan,fr_thiophene,fr_urea
0,0,0,0,0,3,0,0,0,15,8,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,3,0,0,0,15,8,...,0,0,0,0,0,0,0,0,0,0


Comparing the two sets of descriptors, we find that not all values are equal.

In [4]:
our_ecif_ld.equals(their_ecif_ld)

False

Let's see which values differ.

In [5]:
eq = our_ecif_ld.eq(their_ecif_ld)
our_ecif_ld.loc[:, eq.columns[(~eq).all()]]

Unnamed: 0,BertzCT,Chi0n,Chi1,Chi1n,Chi1v,Chi2v,Chi4n,HallKierAlpha,Kappa1,PEOE_VSA14,...,SMR_VSA7,SlogP_VSA12,SlogP_VSA3,SlogP_VSA6,TPSA,EState_VSA10,EState_VSA9,VSA_EState6,MolLogP,MolMR
0,1447.121015,30.843917,18.657313,15.232253,6.827698,4.797017,0.921026,0.15,6.100099,160.044631,...,30.212094,7.248241,23.513708,30.212094,106.61,23.513708,27.456255,-6.787745,3.0926,82.4327
1,1447.121015,30.843917,18.657313,15.232253,6.827698,4.797017,0.921026,0.15,6.100099,160.044631,...,30.212094,7.248241,23.513708,30.212094,106.61,23.513708,27.456255,-6.787745,3.0926,82.4327


In [6]:
their_ecif_ld.loc[:, eq.columns[(~eq).all()]]

Unnamed: 0,BertzCT,Chi0n,Chi1,Chi1n,Chi1v,Chi2v,Chi4n,HallKierAlpha,Kappa1,PEOE_VSA14,...,SMR_VSA7,SlogP_VSA12,SlogP_VSA3,SlogP_VSA6,TPSA,EState_VSA10,EState_VSA9,VSA_EState6,MolLogP,MolMR
0,1447.121015,30.843917,18.657313,15.232253,6.827698,4.797017,0.921026,0.15,6.100099,160.044631,...,30.212094,7.248241,23.513708,30.212094,106.61,23.513708,27.456255,-6.787745,3.0926,82.4327


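Let's see how much they actually differ. As the printed values show, the two calculations agree to about 15 significant digits, so the differences are at the level of floating-point rounding and are negligible.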
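
To put a number on "negligible", the exact comparison can be replaced by a tolerance-based one. The following is a minimal sketch, not part of the original notebook: it uses three value pairs copied from this notebook's output instead of the full data frames, `compare` is a hypothetical helper, and the relative tolerance of `1e-12` is an assumed threshold chosen to be far looser than double-precision rounding.

```python
import math

# Values copied from this notebook's output for three of the differing descriptors.
ours = {
    "BertzCT": 1447.1210152675228,
    "Chi0n": 30.84391693378309,
    "HallKierAlpha": 0.14999999999999994,
}
theirs = {
    "BertzCT": 1447.121015267523,
    "Chi0n": 30.843916933783095,
    "HallKierAlpha": 0.14999999999999997,
}

REL_TOL = 1e-12  # assumed tolerance, not taken from the original notebook


def compare(a, b, rel_tol=REL_TOL):
    """Hypothetical helper: map each name to
    (exactly_equal, close_within_tolerance, relative_difference)."""
    report = {}
    for name in a:
        x, y = a[name], b[name]
        rel_diff = abs(x - y) / max(abs(x), abs(y))
        report[name] = (x == y, math.isclose(x, y, rel_tol=rel_tol), rel_diff)
    return report


report = compare(ours, theirs)
for name, (exact, close, rel_diff) in report.items():
    print(f"{name}: exact={exact}, close={close}, rel_diff={rel_diff:.1e}")
```

On the full data frames, `numpy.isclose` or `pandas.testing.assert_frame_equal` with a relative tolerance could probably play the same role, provided both frames first have the same shape and index.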

In [7]:
for col in eq.columns[(~eq).all()]:
    print(col)
    print(f'Ours  : {our_ecif_ld.loc[0, col]}')
    print(f'Theirs: {their_ecif_ld.loc[0, col]}')
    print()

BertzCT
Ours  : 1447.1210152675228
Theirs: 1447.121015267523

Chi0n
Ours  : 30.84391693378309
Theirs: 30.843916933783095

Chi1
Ours  : 18.657312840881538
Theirs: 18.65731284088154

Chi1n
Ours  : 15.232253127182787
Theirs: 15.232253127182789

Chi1v
Ours  : 6.82769824219312
Theirs: 6.827698242193119

Chi2v
Ours  : 4.797016690840897
Theirs: 4.797016690840896

Chi4n
Ours  : 0.9210263200128379
Theirs: 0.921026320012838

HallKierAlpha
Ours  : 0.14999999999999994
Theirs: 0.14999999999999997

Kappa1
Ours  : 6.100098621183212
Theirs: 6.1000986211832116

PEOE_VSA14
Ours  : 160.04463080100518
Theirs: 160.0446308010052

SMR_VSA10
Ours  : 18.804004649930047
Theirs: 18.804004649930043

SMR_VSA5
Ours  : 44.47549749683414
Theirs: 44.47549749683413

SMR_VSA7
Ours  : 30.212093538316477
Theirs: 30.212093538316473

SlogP_VSA12
Ours  : 7.2482410740100525
Theirs: 7.248241074010053

SlogP_VSA3
Ours  : 23.513708122078125
Theirs: 23.51370812207812

SlogP_VSA6
Ours  : 30.212093538316477
Theirs: 30.2120935383164