<a href="https://colab.research.google.com/github/DIFACQUIM/antiviral_ML/blob/main/5_modelability_index.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Modelability Index (MODI)**
MODI measures the proportion of compounds in a data set whose nearest neighbor in a defined feature space belongs to the same class. We computed MODI values for the ChEMBL data sets using Molecular ACCess System (MACCS) keys (166 bits) and chiral Morgan fingerprints of radius 2 (2048 bits), with the RDKit, NumPy, pandas, and SciPy libraries for Python 3. For each target in the ChEMBL data sets, MODI was calculated using two approaches: (1) including compounds labeled “Mixed” in the overall classification, and (2) excluding them.
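
The definition above can be illustrated with a minimal sketch. It is not the `modi` function used later in this notebook: the names `tanimoto_distance` and `modi_sketch` are hypothetical, fingerprints are represented as plain sets of on-bit indices, and the choice of Tanimoto distance is an assumption made only for illustration.

```python
from typing import Hashable, Sequence, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) distance between two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def modi_sketch(fingerprints: Sequence[Set[int]], labels: Sequence[Hashable]) -> float:
    """Fraction of compounds whose nearest neighbor carries the same class label."""
    n = len(fingerprints)
    if n != len(labels):
        raise ValueError("fingerprints and labels must have the same length")
    if n < 2:
        raise ValueError("at least two compounds are needed to find a neighbor")
    same = 0
    for i in range(n):
        # Nearest neighbor of compound i, excluding i itself.
        # On ties, min() keeps the candidate with the lowest index.
        nearest = min(
            (j for j in range(n) if j != i),
            key=lambda j: tanimoto_distance(fingerprints[i], fingerprints[j]),
        )
        same += labels[nearest] == labels[i]
    return same / n


# Toy data (hypothetical): five compounds described by a few on-bits each
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}, {1, 2, 3, 4}]
labels = ["Active", "Active", "Inactive", "Inactive", "Inactive"]
score = modi_sketch(fps, labels)
print(score)  # 0.4: only compounds 2 and 3 have a same-class nearest neighbor
```

Note that a data set containing a single class always scores 1.0 under this definition, because every neighbor necessarily shares the label.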

This section was implemented by Gabriela Valle-Núñez and supervised by Fernanda I. Saldívar-González.

References: https://github.com/rcbraga/modi

[1] Alexander Golbraikh, Eugene Muratov, Denis Fourches, and Alexander Tropsha. Journal of Chemical Information and Modeling 2014, 54 (1), 1-4. DOI: 10.1021/ci400572x

**Contact:** [gabrielavallenunez@gmail.com](mailto:gabrielavallenunez@gmail.com)


# **1. Prepare the environment**

In [None]:
!pip install rdkit-pypi

Collecting rdkit-pypi
  Downloading rdkit_pypi-2022.9.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.9 kB)
Downloading rdkit_pypi-2022.9.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (29.4 MB)
Installing collected packages: rdkit-pypi
Successfully installed rdkit-pypi-2022.9.5


In [None]:
# Numerical and plotting libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Cheminformatics (RDKit) and distance utilities (SciPy)
import rdkit as rd
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem.PandasTools import LoadSDF
from rdkit.DataStructs import ConvertToNumpyArray
from scipy import spatial as sp

Failed to find the pandas get_adjustment() function to patch
Failed to patch pandas - PandasTools will have limited functionality


In [None]:
import time
tic = time.time()
from IPython.utils import io
import tqdm.notebook
import os, sys, random

total = 100
with tqdm.notebook.tqdm(total=total) as pbar:
    # Capture pip and import output so that only the progress bar is shown
    with io.capture_output() as captured:
        pbar.update(20)

        # Graphic libraries
        !pip install matplotlib
        import matplotlib.pyplot as plt
        %matplotlib inline
        import matplotlib.font_manager as font_manager
        !pip install seaborn
        import seaborn as sns
        pbar.update(40)

        # System libraries and primary tools
        import subprocess
        !pip install cufflinks
        import cufflinks as cf
        import warnings
        warnings.filterwarnings("ignore")
        import math
        from math import pi
        from pathlib import Path
        from tempfile import TemporaryDirectory
        !pip install numpy
        import numpy as np
        !pip install simplejson
        import simplejson as json
        %config Completer.use_jedi = False
        pbar.update(40)
toc = time.time()
print(f"Execution time: {1000 * (toc - tic)} ms")

Execution time: 39256.696701049805 ms


## **1.1. Load the data set**

In [None]:
df = pd.read_csv("/content/drive/MyDrive/antivirals_machine_learning/Files/DataFrames/7_viralcompound_focusedOnResp_complete_VERSION_2.xlsx - Sheet1 (2).csv", engine='python')
df.head(4)

Unnamed: 0.1,Unnamed: 0,molecule_chembl_id,acronym,target,activity_id,assay_chembl_id,assay_description,document_chembl_id,document_journal,ligand_efficiency,...,Overall_Classification,Classification_length,Inhibitor,No_Activity,Unknown,Total,%_Inhibitor,%_No_Activity,%_Unknown,Final_Classification
0,0,CHEMBL222813,IAV,Neuraminidase,3527776 - 6383033 - 3341420 - 13932871 - 13932...,CHEMBL1274484 - CHEMBL1839500 - CHEMBL1167423 ...,Inhibition of Influenza A virus (A/duck/Laos/2...,CHEMBL1134303 - CHEMBL4425127 - CHEMBL3817747 ...,Bioorg Med Chem Lett - Eur J Med Chem - Medche...,"{'bei': '29.48', 'le': '0.58', 'lle': '13.38',...",...,Mixed,2357,179,2,0,181,98.9,1.1,0.0,Inhibitor
1,1,CHEMBL1229,IAV,Neuraminidase,2947997 - 2947996 - 8059690 - 2948009 - 139446...,CHEMBL1041520 - CHEMBL1041523 - CHEMBL1958757 ...,Inhibition of Influenza A virus (A/duck/Laos/2...,CHEMBL4130469 - CHEMBL1134303 - CHEMBL3739372 ...,Bioorg Med Chem Lett - J Nat Prod - Eur J Med ...,"{'bei': '31.36', 'le': '0.61', 'lle': '8.51', ...",...,Mixed,1434,100,6,4,110,90.91,5.45,3.64,Inhibitor
2,2,CHEMBL674,IAV,Neuraminidase,19080701 - 13932882 - 13932879 - 13932877 - 13...,CHEMBL4363628 - CHEMBL3124128 - CHEMBL3124126 ...,Inhibition of Influenza A virus A/Anhui/2005(H...,CHEMBL4364245 - CHEMBL4425127 - CHEMBL3817747 ...,Bioorg Med Chem Lett - Eur J Med Chem - Medche...,"{'bei': '35.02', 'le': '0.68', 'lle': '9.15', ...",...,Mixed,1122,84,2,0,86,97.67,2.33,0.0,Inhibitor
3,3,CHEMBL466246,IAV,Neuraminidase,5086875 - 5086874 - 5086870 - 5086871 - 508777...,CHEMBL1634261 - CHEMBL1634260 - CHEMBL1634256 ...,Inhibition of Influenza A virus (A/Yokohama/67...,CHEMBL1629572 - CHEMBL3232942,Antimicrob Agents Chemother - Bioorg Med Chem,"{'bei': '25.67', 'le': '0.51', 'lle': '11.81',...",...,Inhibitor,572,44,0,0,44,100.0,0.0,0.0,Inhibitor


In [None]:
# Pre-treatment of the data: keep only the essential columns and build a single target column
df.columns = df.columns.str.lower()
df = df[['molecule_chembl_id', "acronym", "target", "canonical_smiles_std", "first_pic50", "final_classification"]].copy()
# Spaces in target names are kept (e.g., "HRV_Capsid protein"); the target list in section 2.1 relies on them
df["unique_target"] = df["acronym"].astype(str) + "_" + df["target"].astype(str)
df = df.drop(columns=["acronym", "target"])
print(df.columns)
print(df.shape)

Index(['molecule_chembl_id', 'canonical_smiles_std', 'first_pic50',
       'final_classification', 'unique_target'],
      dtype='object')
(4521, 5)


In [None]:
# Drop every row in which any column equals 0
df = df[(df != 0).all(axis=1)]
print(df)

     molecule_chembl_id                               canonical_smiles_std  \
0          CHEMBL222813  CC(=O)N[C@H]1[C@H]([C@H](O)[C@H](O)CO)OC(C(=O)...   
1            CHEMBL1229  CCOC(=O)C1=C[C@@H](OC(CC)CC)[C@H](NC(C)=O)[C@@...   
2             CHEMBL674   CCC(CC)O[C@@H]1C=C(C(=O)O)C[C@H](N)[C@H]1NC(C)=O   
3          CHEMBL466246  CO[C@@H]([C@@H]1OC(C(=O)O)=C[C@H](NC(=N)N)[C@H...   
4          CHEMBL467058  CCCCCCCC(=O)OC[C@@H](O)[C@@H](OC)[C@@H]1OC(C(=...   
...                 ...                                                ...   
4516       CHEMBL495228     COC(=O)c1cc([N+](=O)[O-])c(NC(C)=O)cc1OCCC(C)C   
4517       CHEMBL501838             CC(=O)Nc1cc(OC(C)C)c(C(=O)O)cc1NC(=N)N   
4518      CHEMBL5188555  CN(Cc1cn(C2C3=C(OC2(C)C)c2ccccc2C(=O)C3=O)nn1)...   
4519       CHEMBL495226                   COC(=O)c1ccc(NC(C)=O)cc1OC1CCCC1   
4520       CHEMBL133016  CC(NC(=O)[C@H](CC(=O)N(C)C)NC(=O)[C@@H](NC(=O)...   

      first_pic50 final_classification      unique_target  
0  

In [None]:
# Rename columns
df = df.rename(columns={"canonical_smiles_std": "SMILES", "first_pic50": "pIC50"})

df.head()

Unnamed: 0,molecule_chembl_id,SMILES,pIC50,final_classification,unique_target
0,CHEMBL222813,CC(=O)N[C@H]1[C@H]([C@H](O)[C@H](O)CO)OC(C(=O)...,9.79588,Inhibitor,IAV_Neuraminidase
1,CHEMBL1229,CCOC(=O)C1=C[C@@H](OC(CC)CC)[C@H](NC(C)=O)[C@@...,9.79588,Inhibitor,IAV_Neuraminidase
2,CHEMBL674,CCC(CC)O[C@@H]1C=C(C(=O)O)C[C@H](N)[C@H]1NC(C)=O,9.958607,Inhibitor,IAV_Neuraminidase
3,CHEMBL466246,CO[C@@H]([C@@H]1OC(C(=O)O)=C[C@H](NC(=N)N)[C@H...,8.88941,Inhibitor,IAV_Neuraminidase
4,CHEMBL467058,CCCCCCCC(=O)OC[C@@H](O)[C@@H](OC)[C@@H]1OC(C(=...,7.406714,Inhibitor,IAV_Neuraminidase


In [None]:
# Drop compounds whose final classification is "Mixed" or "Unknown"
df = df[~df["final_classification"].isin(["Mixed", "Unknown"])]

df

Unnamed: 0,molecule_chembl_id,SMILES,pIC50,final_classification,unique_target
0,CHEMBL222813,CC(=O)N[C@H]1[C@H]([C@H](O)[C@H](O)CO)OC(C(=O)...,9.795880,Inhibitor,IAV_Neuraminidase
1,CHEMBL1229,CCOC(=O)C1=C[C@@H](OC(CC)CC)[C@H](NC(C)=O)[C@@...,9.795880,Inhibitor,IAV_Neuraminidase
2,CHEMBL674,CCC(CC)O[C@@H]1C=C(C(=O)O)C[C@H](N)[C@H]1NC(C)=O,9.958607,Inhibitor,IAV_Neuraminidase
3,CHEMBL466246,CO[C@@H]([C@@H]1OC(C(=O)O)=C[C@H](NC(=N)N)[C@H...,8.889410,Inhibitor,IAV_Neuraminidase
4,CHEMBL467058,CCCCCCCC(=O)OC[C@@H](O)[C@@H](OC)[C@@H]1OC(C(=...,7.406714,Inhibitor,IAV_Neuraminidase
...,...,...,...,...,...
4515,CHEMBL5188623,CC(C)(C)c1ccc(N(C(=O)c2ccco2)C(C(=O)Nc2ccc(F)c...,5.107349,Inhibitor,SARS-CoV-2_Mpro
4516,CHEMBL495228,COC(=O)c1cc([N+](=O)[O-])c(NC(C)=O)cc1OCCC(C)C,5.528708,Inhibitor,IAV_Neuraminidase
4517,CHEMBL501838,CC(=O)Nc1cc(OC(C)C)c(C(=O)O)cc1NC(=N)N,7.309804,Inhibitor,IAV_Neuraminidase
4518,CHEMBL5188555,CN(Cc1cn(C2C3=C(OC2(C)C)c2ccccc2C(=O)C3=O)nn1)...,5.657577,Inhibitor,SARS-CoV-2_PLP


In [None]:
# Relabel "Inhibitor" as "Active" and "No_Activity" as "Inactive"
df['final_classification'] = df['final_classification'].replace({'Inhibitor': 'Active', 'No_Activity': 'Inactive'})
df

Unnamed: 0,molecule_chembl_id,SMILES,pIC50,final_classification,unique_target
0,CHEMBL222813,CC(=O)N[C@H]1[C@H]([C@H](O)[C@H](O)CO)OC(C(=O)...,9.795880,Active,IAV_Neuraminidase
1,CHEMBL1229,CCOC(=O)C1=C[C@@H](OC(CC)CC)[C@H](NC(C)=O)[C@@...,9.795880,Active,IAV_Neuraminidase
2,CHEMBL674,CCC(CC)O[C@@H]1C=C(C(=O)O)C[C@H](N)[C@H]1NC(C)=O,9.958607,Active,IAV_Neuraminidase
3,CHEMBL466246,CO[C@@H]([C@@H]1OC(C(=O)O)=C[C@H](NC(=N)N)[C@H...,8.889410,Active,IAV_Neuraminidase
4,CHEMBL467058,CCCCCCCC(=O)OC[C@@H](O)[C@@H](OC)[C@@H]1OC(C(=...,7.406714,Active,IAV_Neuraminidase
...,...,...,...,...,...
4515,CHEMBL5188623,CC(C)(C)c1ccc(N(C(=O)c2ccco2)C(C(=O)Nc2ccc(F)c...,5.107349,Active,SARS-CoV-2_Mpro
4516,CHEMBL495228,COC(=O)c1cc([N+](=O)[O-])c(NC(C)=O)cc1OCCC(C)C,5.528708,Active,IAV_Neuraminidase
4517,CHEMBL501838,CC(=O)Nc1cc(OC(C)C)c(C(=O)O)cc1NC(=N)N,7.309804,Active,IAV_Neuraminidase
4518,CHEMBL5188555,CN(Cc1cn(C2C3=C(OC2(C)C)c2ccccc2C(=O)C3=O)nn1)...,5.657577,Active,SARS-CoV-2_PLP


In [None]:
#Reset index
df = df.reset_index(drop=True)
df

Unnamed: 0,molecule_chembl_id,SMILES,pIC50,final_classification,unique_target
0,CHEMBL222813,CC(=O)N[C@H]1[C@H]([C@H](O)[C@H](O)CO)OC(C(=O)...,9.795880,Active,IAV_Neuraminidase
1,CHEMBL1229,CCOC(=O)C1=C[C@@H](OC(CC)CC)[C@H](NC(C)=O)[C@@...,9.795880,Active,IAV_Neuraminidase
2,CHEMBL674,CCC(CC)O[C@@H]1C=C(C(=O)O)C[C@H](N)[C@H]1NC(C)=O,9.958607,Active,IAV_Neuraminidase
3,CHEMBL466246,CO[C@@H]([C@@H]1OC(C(=O)O)=C[C@H](NC(=N)N)[C@H...,8.889410,Active,IAV_Neuraminidase
4,CHEMBL467058,CCCCCCCC(=O)OC[C@@H](O)[C@@H](OC)[C@@H]1OC(C(=...,7.406714,Active,IAV_Neuraminidase
...,...,...,...,...,...
4011,CHEMBL5188623,CC(C)(C)c1ccc(N(C(=O)c2ccco2)C(C(=O)Nc2ccc(F)c...,5.107349,Active,SARS-CoV-2_Mpro
4012,CHEMBL495228,COC(=O)c1cc([N+](=O)[O-])c(NC(C)=O)cc1OCCC(C)C,5.528708,Active,IAV_Neuraminidase
4013,CHEMBL501838,CC(=O)Nc1cc(OC(C)C)c(C(=O)O)cc1NC(=N)N,7.309804,Active,IAV_Neuraminidase
4014,CHEMBL5188555,CN(Cc1cn(C2C3=C(OC2(C)C)c2ccccc2C(=O)C3=O)nn1)...,5.657577,Active,SARS-CoV-2_PLP


# **2. MODI computation**


---



In [None]:
# Prepare MODI's environment
import os
import sys

# Create the 'funcs' directory if it does not exist
os.makedirs('funcs', exist_ok=True)

# Upload 'general.py' (and any module it depends on) from the local machine
from google.colab import files
uploaded = files.upload()

# Move the uploaded files into the 'funcs' directory
for filename in uploaded.keys():
    os.rename(filename, os.path.join('funcs', filename))

# Keep a previously defined working directory; otherwise use the current one
cwd = os.getcwd() if 'cwd' not in globals() else cwd

try:
    from funcs.general import *  # Import the classes and functions needed (e.g., modi)
except ImportError:
    # Fall back to the parent directory if 'funcs' cannot be imported from here
    os.chdir('..')
    cwd = os.getcwd()
    sys.path.insert(0, cwd)
    from funcs.general import *


Saving general.py to general.py
Saving sirms.py to sirms.py


In [None]:
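import sys, os
try:
    if cwd is not None:
        from funcs.general import *
except (NameError, ImportError):
    # 'cwd' is undefined or 'funcs' is not importable: retry from the parent directory
    %cd ..
    cwd = os.getcwd()
    sys.path.insert(0, cwd)
    from funcs.general import *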

## **2.1. MODI computation for each target**
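
Because MODI depends on nearest-neighbor labels, targets with a single class or only a handful of compounds in one class give values that are hard to interpret. The sketch below shows one way to flag such targets before reading their MODI values. The function name `class_balance_report` and the threshold `min_per_class=5` are hypothetical choices for illustration, not part of the original workflow.

```python
from collections import Counter
from typing import Hashable, Sequence


def class_balance_report(labels: Sequence[Hashable], min_per_class: int = 5) -> dict:
    """Summarise class counts and flag sets that are too small or one-sided."""
    counts = Counter(labels)
    smallest = min(counts.values()) if counts else 0
    return {
        "n_compounds": len(labels),
        "n_classes": len(counts),
        "counts": dict(counts),
        # Interpretable only with at least two classes and enough compounds in each
        # (min_per_class is an arbitrary, hypothetical cut-off)
        "interpretable": len(counts) >= 2 and smallest >= min_per_class,
    }


# Hypothetical label lists of different shapes
one_class = ["Active"] * 79
unbalanced = ["Active"] * 38 + ["Inactive"]
balanced = ["Active"] * 733 + ["Inactive"] * 390

for name, labs in [("one_class", one_class), ("unbalanced", unbalanced), ("balanced", balanced)]:
    report = class_balance_report(labs)
    print(name, report["counts"], "interpretable:", report["interpretable"])
```

Within the loop over targets, a call such as `class_balance_report(target_df["final_classification"].tolist())` could be printed next to each MODI value.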


In [None]:
# List of targets to filter
targets = ['IAV_Neuraminidase', 'SARS-CoV-2_Mpro', 'IBV_Neuraminidase',
           'SARS-CoV-2_PLP', 'HRV_Capsid protein', 'IAV_Polymerase (PA)',
           'IAV_RdRp', 'HRV_Protease', 'SARS-CoV_Mpro',
           'HRSV_Fusion glycoprotein F0', 'SARS-CoV-2_RdRp',
           'IAV_M2 proton channel', 'SARS-CoV-2_Spike glycoprotein',
           'SARS-CoV-2_MTase (NSP14)', 'IAV_Hemagglutinin',
           'IAV_Polymerase (PB2)', 'SARS-CoV-2_Helicase (NSP13)',
           'SARS-CoV_Helicase (NSP13)', 'MERS-CoV_PLP',
           'SARS-CoV_Spike glycoprotein', 'HPIV-1_Hemagglutinin-neuraminidase',
           'NiV_gpG', 'HRV_Mpro', 'HCoV-229E_Mpro', 'HRSV_M2 proton channel',
           'HRSV_Protein P', 'HRSV_RdRp', 'HEV-71_Mpro', 'HEV-71_Capsid protein',
           'FCoV_Mpro', 'MERS-CoV_Mpro', 'HCoV-NL63_PLP']

# Loop through each target and perform the operations
for target in targets:
    # Filter the dataframe for the current target
    target_df = df[df["unique_target"] == target].reset_index(drop=True)
    print(f"Target: {target}")
    print("Shape:", target_df.shape)
    print("Columns:", target_df.columns)
    print(target_df.head(2))

    # Calculate MODI with different descriptors, handle errors for small sample sizes
    try:
        MODI_maccs = modi(target_df, 'final_classification', descriptor="maccs")
        print(f"MODI (maccs) for {target} = {MODI_maccs}")
    except IndexError:
        print(f"Due to data sample, couldn't calculate MODI with maccs for {target}")

    try:
        MODI_morgan = modi(target_df, 'final_classification', descriptor="morgan_chiral2")
        print(f"MODI (morgan_chiral2) for {target} = {MODI_morgan}")
    except IndexError:
        print(f"Due to data sample, couldn't calculate MODI with morgan_chiral2 for {target}")

    # Calculate median pIC50
    median_pIC50 = target_df["pIC50"].median()
    print(f"Median pIC50 for {target} = {median_pIC50}")

    # Group by final_classification and describe pIC50
    pIC50_description = target_df.groupby('final_classification')['pIC50'].describe()
    print(f"pIC50 description for {target}:\n", pIC50_description)

    print("\n" + "="*50 + "\n")  # Separator for readability


Target: IAV_Neuraminidase
Shape: (1123, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0       CHEMBL222813  CC(=O)N[C@H]1[C@H]([C@H](O)[C@H](O)CO)OC(C(=O)...   
1         CHEMBL1229  CCOC(=O)C1=C[C@@H](OC(CC)CC)[C@H](NC(C)=O)[C@@...   

     pIC50 final_classification      unique_target  
0  9.79588               Active  IAV_Neuraminidase  
1  9.79588               Active  IAV_Neuraminidase  


MODI (maccs) for IAV_Neuraminidase = 0.8762243989314337
MODI (morgan_chiral2) for IAV_Neuraminidase = 0.9056099732858415
Median pIC50 for IAV_Neuraminidase = 5.721246399
pIC50 description for IAV_Neuraminidase:
                       count      mean       std     min       25%      50%  \
final_classification                                                         
Active                733.0  6.848073  1.260421  5.0000  5.782516  6.60206   
Inactive              390.0  3.775196  1.229119 -3.7404  3.698970  4.02230   

                           75%       max  
final_classification                      
Active                7.853872  9.958607  
Inactive              4.391909  4.950782  


Target: SARS-CoV-2_Mpro
Shape: (815, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL4802135  CC(C)(C)[C@H](NC(=O)C(F)(F)F)C(=O)N1C[C@H]2[C@...   
1      CHEMBL3427166  CC(C)C[C@H](NC(=O

MODI (maccs) for SARS-CoV-2_Mpro = 0.8834355828220859
MODI (morgan_chiral2) for SARS-CoV-2_Mpro = 0.912883435582822
Median pIC50 for SARS-CoV-2_Mpro = 6.346787486
pIC50 description for SARS-CoV-2_Mpro:
                       count      mean       std       min      25%       50%  \
final_classification                                                           
Active                651.0  6.620906  0.981204  5.000000  5.87322  6.542724   
Inactive              164.0  4.259580  0.290167  2.657577  4.00000  4.301030   

                           75%      max  
final_classification                     
Active                7.293283  9.69897  
Inactive              4.459758  4.69897  


Target: IBV_Neuraminidase
Shape: (202, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0       CHEMBL222813  CC(=O)N[C@H]1[C@H]([C@H](O)[C@H](O)CO)OC(C(=O)...   
1        CHEMBL73669  CCCN(CCc1ccccc1)C(

MODI (maccs) for IBV_Neuraminidase = 0.7178217821782178
MODI (morgan_chiral2) for IBV_Neuraminidase = 0.7079207920792079
Median pIC50 for IBV_Neuraminidase = 5.468709007999999
pIC50 description for IBV_Neuraminidase:
                       count      mean       std      min       25%       50%  \
final_classification                                                           
Active                132.0  6.635317  1.265995  5.00000  5.494850  6.437748   
Inactive               70.0  4.010075  0.572335  2.09691  3.589285  4.162868   

                           75%       max  
final_classification                      
Active                7.526560  9.853872  
Inactive              4.472039  4.698970  


Target: SARS-CoV-2_PLP
Shape: (107, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0       CHEMBL549695           Cc1ccc(N)cc1C(=O)N[C@H](C)c1cccc2ccccc12   
1      CHEMBL5191209  C[C@

MODI (maccs) for SARS-CoV-2_PLP = 0.8691588785046729
MODI (morgan_chiral2) for SARS-CoV-2_PLP = 0.8691588785046729
Median pIC50 for SARS-CoV-2_PLP = 5.983802646
pIC50 description for SARS-CoV-2_PLP:
                       count      mean       std      min       25%       50%  \
final_classification                                                           
Active                 94.0  5.976425  0.547423  5.00000  5.628744  6.073333   
Inactive               13.0  4.259696  0.289072  3.69897  4.000000  4.251812   

                           75%       max  
final_classification                      
Active                6.237697  7.200659  
Inactive              4.376751  4.698970  


Target: HRV_Capsid protein
Shape: (18, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0        CHEMBL29609     Cc1cc(CCCOc2c(C)cc(-c3noc(C(F)(F)F)n3)cc2C)on1   
1      CHEMBL4213214  Cn1cc(-c2ccc(C[C

MODI (maccs) for HRV_Capsid protein = 0.8333333333333334
MODI (morgan_chiral2) for HRV_Capsid protein = 0.8888888888888888
Median pIC50 for HRV_Capsid protein = 6.265088992
pIC50 description for HRV_Capsid protein:
                       count      mean       std       min       25%       50%  \
final_classification                                                            
Active                 15.0  6.355052  0.776260  5.060481  5.960409  6.397940   
Inactive                3.0  4.346172  1.279758  3.607303  3.607303  3.607303   

                           75%       max  
final_classification                      
Active                6.650515  8.397940  
Inactive              4.715606  5.823909  


Target: IAV_Polymerase (PA)
Shape: (256, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0       CHEMBL463590  O=C(O)/C(O)=C/C(=O)C1(Cc2ccc(Cl)cc2)CCN(Cc2ccc...   
1      CHEMBL2040554

MODI (maccs) for IAV_Polymerase (PA) = 0.8359375
MODI (morgan_chiral2) for IAV_Polymerase (PA) = 0.875
Median pIC50 for IAV_Polymerase (PA) = 5.400000186
pIC50 description for IAV_Polymerase (PA):
                       count      mean       std       min       25%       50%  \
final_classification                                                            
Active                151.0  6.390499  1.080504  5.013228  5.576421  6.099999   
Inactive              105.0  3.813192  0.449817  3.092051  3.301030  3.698970   

                           75%       max  
final_classification                      
Active                6.910414  9.899974  
Inactive              4.187087  4.692504  


Target: IAV_RdRp
Shape: (143, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL3110332  CSCC[C@H](N)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@H](C...   
1      CHEMBL3792444  Cc1cccc(NC(=O)CSc2n

MODI (maccs) for IAV_RdRp = 0.8951048951048951
MODI (morgan_chiral2) for IAV_RdRp = 0.9300699300699301
Median pIC50 for IAV_RdRp = 3.928117993
pIC50 description for IAV_RdRp:
                       count      mean       std      min       25%       50%  \
final_classification                                                           
Active                 13.0  6.704721  1.326455  5.00000  5.522879  6.638272   
Inactive              130.0  4.028947  0.450292  3.69897  3.698970  3.855429   

                           75%       max  
final_classification                      
Active                7.779892  8.744727  
Inactive              4.318759  7.363312  


Target: HRV_Protease
Shape: (389, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0       CHEMBL305321  CCOC(=O)/C=C/[C@H](CCC(N)=O)NC(=O)C(Cc1ccccc1)...   
1        CHEMBL95007  CC(=O)NC[C@@H](C=O)NC(=O)[C@H](Cc1ccccc

MODI (maccs) for HRV_Protease = 0.8251928020565553
MODI (morgan_chiral2) for HRV_Protease = 0.8483290488431876
Median pIC50 for HRV_Protease = 5.958607315
pIC50 description for HRV_Protease:
                       count      mean       std      min       25%       50%  \
final_classification                                                           
Active                298.0  6.501334  1.038265  5.00000  5.714460  6.301030   
Inactive               91.0  4.083592  0.448703  2.60206  3.823909  4.136677   

                           75%       max  
final_classification                      
Active                7.242237  9.920819  
Inactive              4.414869  4.698970  


Target: SARS-CoV_Mpro
Shape: (197, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                        SMILES     pIC50  \
0        CHEMBL45830        CC(C)C1=Cc2ccc3c(c2C(=O)C1=O)CCCC3(C)C  4.675718   
1       CHEMBL358279  NC(=O)c1ccc2c(c1

MODI (maccs) for SARS-CoV_Mpro = 0.6649746192893401
MODI (morgan_chiral2) for SARS-CoV_Mpro = 0.766497461928934
Median pIC50 for SARS-CoV_Mpro = 4.519993057
pIC50 description for SARS-CoV_Mpro:
                       count      mean       std  min       25%       50%  \
final_classification                                                       
Active                 77.0  5.698559  0.680578  5.0  5.200659  5.408935   
Inactive              120.0  4.109571  0.421155  3.0  3.768307  4.221849   

                           75%      max  
final_classification                     
Active                6.022276  7.30103  
Inactive              4.413194  4.69897  


Target: HRSV_Fusion glycoprotein F0
Shape: (249, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL4174690   Cc1ccc2nc(N3CCCc4ccc(F)cc4C3)cc(NCC3(N)COC3)c2c1   
1      CHEMBL4468815  Cc1ccc2nc(N3CCS(=O)(=O)c4ccccc

MODI (maccs) for HRSV_Fusion glycoprotein F0 = 0.9558232931726908
MODI (morgan_chiral2) for HRSV_Fusion glycoprotein F0 = 0.963855421686747
Median pIC50 for HRSV_Fusion glycoprotein F0 = 8.0
pIC50 description for HRSV_Fusion glycoprotein F0:
                       count     mean       std  min       25%  50%       75%  \
final_classification                                                           
Active                240.0  7.87151  1.129244  5.0  7.049517  8.0  8.799971   
Inactive                9.0  4.23841  0.299208  4.0  4.000000  4.0  4.585027   

                            max  
final_classification             
Active                10.000000  
Inactive               4.657577  


Target: SARS-CoV-2_RdRp
Shape: (46, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL4065616  CCC(CC)COC(=O)[C@H](C)N[P@](=O)(OC[C@H]1O[C@@]...   
1      CHEMBL4857861       CCOc1ccccc1CN(OC)C(=

MODI (maccs) for SARS-CoV-2_RdRp = 0.8695652173913043
MODI (morgan_chiral2) for SARS-CoV-2_RdRp = 0.7391304347826086
Median pIC50 for SARS-CoV-2_RdRp = 5.1640018985000005
pIC50 description for SARS-CoV-2_RdRp:
                       count      mean       std       min       25%       50%  \
final_classification                                                            
Active                 41.0  5.301219  0.310569  5.007889  5.116907  5.193820   
Inactive                5.0  4.432928  0.268874  4.000000  4.355561  4.559091   

                           75%       max  
final_classification                      
Active                5.366532  6.585027  
Inactive              4.569925  4.680062  


Target: IAV_M2 proton channel
Shape: (92, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                          SMILES     pIC50  \
0      CHEMBL3392935   C[C@H](N)C12CC3CC(CC(C3)C1)C2  6.468521   
1      CHEMBL1235691  C[C@@H](N)C12CC3C

MODI (maccs) for IAV_M2 proton channel = 0.8152173913043478
MODI (morgan_chiral2) for IAV_M2 proton channel = 0.8260869565217391
Median pIC50 for IAV_M2 proton channel = 5.451684011999999
pIC50 description for IAV_M2 proton channel:
                       count      mean       std       min       25%       50%  \
final_classification                                                            
Active                 68.0  5.772478  0.561605  5.004365  5.327751  5.676206   
Inactive               24.0  4.334715  0.317038  3.496754  4.071092  4.464206   

                           75%       max  
final_classification                      
Active                6.128613  8.096910  
Inactive              4.572031  4.688246  


Target: SARS-CoV-2_Spike glycoprotein
Shape: (44, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL5088895  CC[C@H](C)[C@H](NC(=O)[C@H](Cc1ccccc1)NC(=O)CN...

MODI (maccs) for SARS-CoV-2_Spike glycoprotein = 0.8636363636363636
MODI (morgan_chiral2) for SARS-CoV-2_Spike glycoprotein = 0.8181818181818182
Median pIC50 for SARS-CoV-2_Spike glycoprotein = 5.4020474075
pIC50 description for SARS-CoV-2_Spike glycoprotein:
                       count      mean       std       min       25%       50%  \
final_classification                                                            
Active                 28.0  6.334940  0.891754  5.137272  5.469571  6.244326   
Inactive               16.0  4.279078  0.290061  3.698970  4.065382  4.189768   

                          75%       max  
final_classification                     
Active                6.99355  7.886057  
Inactive              4.60206  4.602060  


Target: SARS-CoV-2_MTase (NSP14)
Shape: (39, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL1214186  Nc1ncnc2c1ncn2[C@@H]1O[C@H](C[C@@H](N)C

MODI (maccs) for SARS-CoV-2_MTase (NSP14) = 0.9743589743589743
MODI (morgan_chiral2) for SARS-CoV-2_MTase (NSP14) = 0.9743589743589743
Median pIC50 for SARS-CoV-2_MTase (NSP14) = 6.356547324
pIC50 description for SARS-CoV-2_MTase (NSP14):
                       count      mean       std       min       25%       50%  \
final_classification                                                            
Active                 38.0  6.400063  0.759991  5.017729  5.721804  6.409994   
Inactive                1.0  4.000000       NaN  4.000000  4.000000  4.000000   

                           75%       max  
final_classification                      
Active                7.080562  7.721246  
Inactive              4.000000  4.000000  


Target: IAV_Hemagglutinin
Shape: (25, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL4639026  CC(=O)c1cc(C)c(O)c(Cc2[nH]c3ccccc3c2Cc2c(O)c(C...   
1    

MODI (maccs) for IAV_Hemagglutinin = 0.8
MODI (morgan_chiral2) for IAV_Hemagglutinin = 0.72
Median pIC50 for IAV_Hemagglutinin = 5.580044252
pIC50 description for IAV_Hemagglutinin:
                       count      mean       std       min       25%       50%  \
final_classification                                                            
Active                 19.0  7.044793  1.905426  5.107349  5.464686  6.350665   
Inactive                6.0  3.900801  0.752809  2.522879  3.703738  4.170660   

                           75%        max  
final_classification                       
Active                8.033256  11.177178  
Inactive              4.415910   4.494850  


Target: IAV_Polymerase (PB2)
Shape: (79, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL3318007  O=C(O)[C@H]1C2CCC(CC2)[C@@H]1Nc1nc(-c2c[nH]c3n...   
1      CHEMBL4524775  O=C(O)[C@H]

MODI (maccs) for IAV_Polymerase (PB2) = 1.0


MODI (morgan_chiral2) for IAV_Polymerase (PB2) = 1.0
Median pIC50 for IAV_Polymerase (PB2) = 7.619788758
pIC50 description for IAV_Polymerase (PB2):
                       count      mean      std      min       25%       50%  \
final_classification                                                          
Active                 79.0  7.545199  1.13291  5.19382  6.678274  7.619789   

                           75%        max  
final_classification                       
Active                8.522879  10.221849  


Target: SARS-CoV-2_Helicase (NSP13)
Shape: (3, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL1595621               C=CCn1c(S)nnc1CSc1ccccc1[N+](=O)[O-]   
1      CHEMBL5173495  CCCOc1ccc(-c2cc(OCCN3CCN(Cc4ccccc4)CC3)c3c(OC)...   

      pIC50 final_classification                unique_target  
0  7.33724

MODI (maccs) for SARS-CoV-2_Helicase (NSP13) = 0.6666666666666666


MODI (morgan_chiral2) for SARS-CoV-2_Helicase (NSP13) = 0.6666666666666666
Median pIC50 for SARS-CoV-2_Helicase (NSP13) = 4.522878745
pIC50 description for SARS-CoV-2_Helicase (NSP13):
                       count      mean  std       min       25%       50%  \
final_classification                                                       
Active                  1.0  7.337242  NaN  7.337242  7.337242  7.337242   
Inactive                2.0  4.522879  0.0  4.522879  4.522879  4.522879   

                           75%       max  
final_classification                      
Active                7.337242  7.337242  
Inactive              4.522879  4.522879  


Target: SARS-CoV_Helicase (NSP13)
Shape: (9, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                   SMILES     pIC50  \
0       CHEMBL492768  O=C(O)/C(O)=C/C(=O)c1ccc(OCc2ccccc2)cc1  4.399027   
1       C

MODI (maccs) for SARS-CoV_Helicase (NSP13) = 1.0


MODI (morgan_chiral2) for SARS-CoV_Helicase (NSP13) = 1.0
Median pIC50 for SARS-CoV_Helicase (NSP13) = 4.301029996
pIC50 description for SARS-CoV_Helicase (NSP13):
                       count      mean       std      min      25%      50%  \
final_classification                                                         
Inactive                9.0  4.338706  0.082879  4.30103  4.30103  4.30103   

                          75%       max  
final_classification                     
Inactive              4.30103  4.542118  


Target: MERS-CoV_PLP
Shape: (10, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                          SMILES  \
0       CHEMBL222021                               O=Cc1ccc(O)c(O)c1   
1      CHEMBL2030452  Cc1ccc(C)c(CSc2nc(C3CCCCC3)c(C#N)c(=O)[nH]2)c1   

      pIC50 final_classification unique_target  
0  3.657577             Inactive  MERS-CoV

MODI (maccs) for MERS-CoV_PLP = 0.8


MODI (morgan_chiral2) for MERS-CoV_PLP = 0.5
Median pIC50 for MERS-CoV_PLP = 4.632136693
pIC50 description for MERS-CoV_PLP:
                       count      mean       std       min       25%       50%  \
final_classification                                                            
Active                  5.0  5.119186  0.000000  5.119186  5.119186  5.119186   
Inactive                5.0  3.871071  0.174778  3.657577  3.838632  3.850781   

                           75%       max  
final_classification                      
Active                5.119186  5.119186  
Inactive              3.863279  4.145087  


Target: SARS-CoV_Spike glycoprotein
Shape: (15, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL4873011  C[C@@H]1O[C@@H](O[C@H]2[C@H](O[C@H]3CC[C@]4(C)...   
1      CHEMBL4852791  C[C@@H]1O[C@@H](O[C@H]2[

MODI (maccs) for SARS-CoV_Spike glycoprotein = 0.6666666666666666


MODI (morgan_chiral2) for SARS-CoV_Spike glycoprotein = 0.8666666666666667
Median pIC50 for SARS-CoV_Spike glycoprotein = 5.199282922
pIC50 description for SARS-CoV_Spike glycoprotein:
                       count      mean       std       min       25%       50%  \
final_classification                                                            
Active                 13.0  5.378298  0.552422  5.001305  5.110138  5.202732   
Inactive                2.0  4.698970  0.000000  4.698970  4.698970  4.698970   

                          75%      max  
final_classification                    
Active                5.25649  7.00000  
Inactive              4.69897  4.69897  


Target: HPIV-1_Hemagglutinin-neuraminidase
Shape: (36, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0       CHEMBL213310  CC(=O)N[C@H]1[C@H]([C@H](O)[C@H](O)CO)OC

MODI (maccs) for HPIV-1_Hemagglutinin-neuraminidase = 1.0


MODI (morgan_chiral2) for HPIV-1_Hemagglutinin-neuraminidase = 1.0
Median pIC50 for HPIV-1_Hemagglutinin-neuraminidase = 3.823908741
pIC50 description for HPIV-1_Hemagglutinin-neuraminidase:
                       count      mean       std       min       25%       50%  \
final_classification                                                            
Active                  2.0  5.536538  0.435756  5.228413  5.382475  5.536538   
Inactive               34.0  3.478123  0.715203  1.795880  2.823909  3.823909   

                           75%       max  
final_classification                      
Active                5.690601  5.844664  
Inactive              3.823909  4.410050  


Target: NiV_gpG
Shape: (7, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0       CHEMBL214087                          Nc1ccc2oc(Cc3ccccc3)nc2c1   
1

MODI (maccs) for NiV_gpG = 0.8571428571428571


MODI (morgan_chiral2) for NiV_gpG = 0.8571428571428571
Median pIC50 for NiV_gpG = 5.397940009
pIC50 description for NiV_gpG:
                       count      mean       std      min       25%       50%  \
final_classification                                                           
Active                  6.0  5.410238  0.280357  5.09691  5.172168  5.460409   
Inactive                1.0  4.698970       NaN  4.69897  4.698970  4.698970   

                           75%       max  
final_classification                      
Active                5.522879  5.823909  
Inactive              4.698970  4.698970  


Target: HRV_Mpro
Shape: (21, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                            SMILES     pIC50  \
0       CHEMBL222234         O=C(Oc1cncc(Br)c1)c1ccco1  7.096910   
1       CHEMBL466351  O=C(Oc1cncc(Cl)c1)c1ccc2ccccc2c1  6.148742   

  final_classi

MODI (maccs) for HRV_Mpro = 1.0


MODI (morgan_chiral2) for HRV_Mpro = 1.0
Median pIC50 for HRV_Mpro = 7.096910013
pIC50 description for HRV_Mpro:
                       count      mean       std       min       25%      50%  \
final_classification                                                           
Active                 21.0  7.021462  1.247843  5.075721  6.123205  7.09691   

                           75%      max  
final_classification                     
Active                8.154902  8.69897  


Target: HCoV-229E_Mpro
Shape: (11, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL4802135  CC(C)(C)[C@H](NC(=O)C(F)(F)F)C(=O)N1C[C@H]2[C@...   
1      CHEMBL5208243  COC(=O)[C@H](C[C@@H]1CCNC1=O)NC(=O)[C@@H]1[C@@...   

      pIC50 final_classification   unique_target  
0  6.838632               Active  HCoV-229E_Mpro  
1  4.000000            

MODI (maccs) for HCoV-229E_Mpro = 0.45454545454545453


MODI (morgan_chiral2) for HCoV-229E_Mpro = 0.45454545454545453
Median pIC50 for HCoV-229E_Mpro = 6.537602002
pIC50 description for HCoV-229E_Mpro:
                       count      mean       std       min       25%       50%  \
final_classification                                                            
Active                  7.0  7.023623  0.615038  6.330683  6.579602  6.838632   
Inactive                4.0  4.000000  0.000000  4.000000  4.000000  4.000000   

                           75%       max  
final_classification                      
Active                7.475391  7.886057  
Inactive              4.000000  4.000000  


Target: HRSV_M2 proton channel
Shape: (1, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL4091425  Cc1cnc(N2CC3(CCOCC3)C2)c(C(=O)Nc2ccc(C(=O)N3CC...   

     pIC50 final_classificati

Due to data sample, couldn't calculate MODI with maccs for HRSV_M2 proton channel


Due to data sample, couldn't calculate MODI with morgan_chiral2 for HRSV_M2 proton channel
Median pIC50 for HRSV_M2 proton channel = 8.698970004
pIC50 description for HRSV_M2 proton channel:
                       count     mean  std      min      25%      50%      75%  \
final_classification                                                            
Active                  1.0  8.69897  NaN  8.69897  8.69897  8.69897  8.69897   

                          max  
final_classification           
Active                8.69897  


Target: HRSV_Protein P
Shape: (1, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL4091425  Cc1cnc(N2CC3(CCOCC3)C2)c(C(=O)Nc2ccc(C(=O)N3CC...   

     pIC50 final_classification   unique_target  
0  8.69897               Active  HRSV_Protein P  


Due to data sample, couldn't calculate MODI with maccs for HRSV_Protein P


Due to data sample, couldn't calculate MODI with morgan_chiral2 for HRSV_Protein P
Median pIC50 for HRSV_Protein P = 8.698970004
pIC50 description for HRSV_Protein P:
                       count     mean  std      min      25%      50%      75%  \
final_classification                                                            
Active                  1.0  8.69897  NaN  8.69897  8.69897  8.69897  8.69897   

                          max  
final_classification           
Active                8.69897  


Target: HRSV_RdRp
Shape: (1, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL4091425  Cc1cnc(N2CC3(CCOCC3)C2)c(C(=O)Nc2ccc(C(=O)N3CC...   

     pIC50 final_classification unique_target  
0  8.69897               Active     HRSV_RdRp  


Due to data sample, couldn't calculate MODI with maccs for HRSV_RdRp


Due to data sample, couldn't calculate MODI with morgan_chiral2 for HRSV_RdRp
Median pIC50 for HRSV_RdRp = 8.698970004
pIC50 description for HRSV_RdRp:
                       count     mean  std      min      25%      50%      75%  \
final_classification                                                            
Active                  1.0  8.69897  NaN  8.69897  8.69897  8.69897  8.69897   

                          max  
final_classification           
Active                8.69897  


Target: HEV-71_Mpro
Shape: (26, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL3742031  Cc1cc(C(=O)N[C@@H](Cc2ccc(F)cc2)C(=O)N[C@@H](C...   
1      CHEMBL4282584  CCCCNC(=O)O[C@@H](C#N)[C@H](C[C@@H]1CCCNC1=O)N...   

      pIC50 final_classification unique_target  
0  8.397940               Active   HEV-71_Mpro  
1  6.346787       

MODI (maccs) for HEV-71_Mpro = 1.0


MODI (morgan_chiral2) for HEV-71_Mpro = 1.0
Median pIC50 for HEV-71_Mpro = 6.699513552
pIC50 description for HEV-71_Mpro:
                       count      mean       std       min       25%       50%  \
final_classification                                                            
Active                 26.0  6.618389  0.794128  5.206908  5.982546  6.699514   

                          75%      max  
final_classification                    
Active                7.18255  8.39794  


Target: HEV-71_Capsid protein
Shape: (22, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL4202574  O=C1N(CCCCc2cc3cc(-c4ccc(Cl)cc4)ccc3o2)CCN1c1c...   
1      CHEMBL4203068  Nc1cc(N2CCN(CCCCCOc3ccc(-c4ccc(Cl)cc4)cc3)S2(=...   

      pIC50 final_classification          unique_target  
0  5.958607               Active  HEV-71_Capsid pro

MODI (maccs) for HEV-71_Capsid protein = 0.9545454545454546


MODI (morgan_chiral2) for HEV-71_Capsid protein = 0.9545454545454546
Median pIC50 for HEV-71_Capsid protein = 6.2620482375
pIC50 description for HEV-71_Capsid protein:
                       count      mean       std       min  25%       50%  \
final_classification                                                       
Active                 20.0  6.852538  1.252253  5.552842  6.0  6.360532   
Inactive                2.0  4.000000  0.000000  4.000000  4.0  4.000000   

                           75%        max  
final_classification                       
Active                7.110409  10.522879  
Inactive              4.000000   4.000000  


Target: FCoV_Mpro
Shape: (12, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL4202812  CC(C)C[C@H](NC(=O)OC1(Cc2ccccc2)CCN(C(=O)OC(C)...   
1      CHEMBL4203883  CC(C)C[C@H](NC(

MODI (maccs) for FCoV_Mpro = 1.0


MODI (morgan_chiral2) for FCoV_Mpro = 1.0
Median pIC50 for FCoV_Mpro = 5.736830361
pIC50 description for FCoV_Mpro:
                       count      mean       std       min       25%      50%  \
final_classification                                                           
Active                 12.0  5.718192  0.305949  5.173925  5.578825  5.73683   

                           75%      max  
final_classification                     
Active                5.958607  6.09691  


Target: MERS-CoV_Mpro
Shape: (12, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL4202812  CC(C)C[C@H](NC(=O)OC1(Cc2ccccc2)CCN(C(=O)OC(C)...   
1      CHEMBL4203883  CC(C)C[C@H](NC(=O)OC1(Cc2ccccc2)CCN(C(=O)OC(C)...   

      pIC50 final_classification  unique_target  
0  6.096910               Active  MERS-CoV_Mpro  
1  6.154902            

MODI (maccs) for MERS-CoV_Mpro = 1.0


MODI (morgan_chiral2) for MERS-CoV_Mpro = 1.0
Median pIC50 for MERS-CoV_Mpro = 6.15490196
pIC50 description for MERS-CoV_Mpro:
                       count      mean       std       min       25%       50%  \
final_classification                                                            
Active                 12.0  6.025958  0.410596  5.124939  6.084122  6.154902   

                           75%      max  
final_classification                     
Active                6.221849  6.39794  


Target: HCoV-NL63_PLP
Shape: (5, 5)
Columns: Index(['molecule_chembl_id', 'SMILES', 'pIC50', 'final_classification',
       'unique_target'],
      dtype='object')
  molecule_chembl_id                                             SMILES  \
0      CHEMBL3233813  C[C@H](c1cccc2ccccc12)N1CCC(C(=O)NCc2ccc(F)c(F...   
1      CHEMBL3233815  C[C@H](c1cccc2ccccc12)N1CCC(C(=O)NCc2cccc(F)c2...   

      pIC50 final_classification  unique_target  
0  4.356547             Inactive  HCoV-NL63_PLP  
1  4.48148

MODI (maccs) for HCoV-NL63_PLP = 1.0


MODI (morgan_chiral2) for HCoV-NL63_PLP = 1.0
Median pIC50 for HCoV-NL63_PLP = 4.356547324
pIC50 description for HCoV-NL63_PLP:
                       count      mean       std       min       25%       50%  \
final_classification                                                            
Inactive                5.0  4.367244  0.096571  4.229148  4.337242  4.356547   

                           75%       max  
final_classification                      
Inactive              4.431798  4.481486  
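
The three HRSV targets above contain a single compound each, so no nearest neighbor exists and MODI cannot be computed for them. As a rough illustration of the quantity being reported, here is a minimal pure-Python sketch of MODI as described at the top of this notebook: the fraction of compounds whose nearest neighbor belongs to the same class. It is not the `modi` function used in this notebook. The names `tanimoto_distance` and `modi_sketch`, the representation of a fingerprint as a set of on-bit indices, and the use of Tanimoto distance are all assumptions made for illustration.

```python
def tanimoto_distance(a, b):
    """Tanimoto (Jaccard) distance between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def modi_sketch(fingerprints, labels):
    """Fraction of compounds whose nearest neighbor shares their class label."""
    n = len(fingerprints)
    if n < 2:
        # A single compound has no neighbor, so the index is undefined.
        raise ValueError("MODI needs at least two compounds")
    same = 0
    for i in range(n):
        # Nearest neighbor of compound i, excluding itself; ties go to the lowest index.
        j = min(
            (k for k in range(n) if k != i),
            key=lambda k: tanimoto_distance(fingerprints[i], fingerprints[k]),
        )
        same += labels[i] == labels[j]
    return same / n


# Hypothetical toy data: three compounds, one Active and two Inactive.
toy_fps = [{1, 2, 3}, {1, 2, 4}, {1, 2, 4, 5}]
toy_labels = ["Active", "Inactive", "Inactive"]
toy_modi = modi_sketch(toy_fps, toy_labels)
```

On this toy set the Active compound's nearest neighbor is Inactive, so two of the three compounds match and the sketch returns 2/3. Under this definition a set with only one class and at least two compounds always scores 1.0, which is worth keeping in mind when reading the single-class targets above.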




In [None]:
# List of targets to filter
targets = ['IAV_Neuraminidase', 'SARS-CoV-2_Mpro', 'IBV_Neuraminidase',
           'SARS-CoV-2_PLP', 'HRV_Capsid protein', 'IAV_Polymerase (PA)',
           'IAV_RdRp', 'HRV_Protease', 'SARS-CoV_Mpro',
           'HRSV_Fusion glycoprotein F0', 'SARS-CoV-2_RdRp',
           'IAV_M2 proton channel', 'SARS-CoV-2_Spike glycoprotein',
           'SARS-CoV-2_MTase (NSP14)', 'IAV_Hemagglutinin',
           'IAV_Polymerase (PB2)', 'SARS-CoV-2_Helicase (NSP13)',
           'SARS-CoV_Helicase (NSP13)', 'MERS-CoV_PLP',
           'SARS-CoV_Spike glycoprotein', 'HPIV-1_Hemagglutinin-neuraminidase',
           'NiV_gpG', 'HRV_Mpro', 'HCoV-229E_Mpro', 'HRSV_M2 proton channel',
           'HRSV_Protein P', 'HRSV_RdRp', 'HEV-71_Mpro', 'HEV-71_Capsid protein',
           'FCoV_Mpro', 'MERS-CoV_Mpro', 'HCoV-NL63_PLP']

# Initialize an empty list to store the results
results = []

# Loop through each target and perform the operations
for target in targets:
    # Filter the dataframe for the current target
    target_df = df[df["unique_target"] == target].reset_index(drop=True)

    # Initialize dictionary to store results for the current target
    target_results = {
        "unique_target": target,
        "count": target_df.shape[0],
        "pIC50 median": target_df["pIC50"].median(),
        "Active": target_df[target_df['final_classification'] == 'Active'].shape[0],
        "Inactive": target_df[target_df['final_classification'] == 'Inactive'].shape[0]
    }

    # Calculate MODI with each descriptor; an IndexError is expected for
    # targets whose sample is too small (e.g. a single compound)
    for column, descriptor in [("MODI MACCS", "maccs"),
                               ("MODI Morgan Chiral2", "morgan_chiral2")]:
        try:
            target_results[column] = modi(target_df, 'final_classification', descriptor=descriptor)
        except IndexError:
            target_results[column] = "N/A"

    # Append the results for this target to the list
    results.append(target_results)

# Convert the results list to a DataFrame
results_df = pd.DataFrame(results)

# Display the resulting DataFrame
print(results_df)


                         unique_target  count  pIC50 median  Active  Inactive  \
0                    IAV_Neuraminidase   1123      5.721246     733       390   
1                      SARS-CoV-2_Mpro    815      6.346787     651       164   
2                    IBV_Neuraminidase    202      5.468709     132        70   
3                       SARS-CoV-2_PLP    107      5.983803      94        13   
4                   HRV_Capsid protein     18      6.265089      15         3   
5                  IAV_Polymerase (PA)    256      5.400000     151       105   
6                             IAV_RdRp    143      3.928118      13       130   
7                         HRV_Protease    389      5.958607     298        91   
8                        SARS-CoV_Mpro    197      4.519993      77       120   
9          HRSV_Fusion glycoprotein F0    249      8.000000     240         9   
10                     SARS-CoV-2_RdRp     46      5.164002      41         5   
11               IAV_M2 prot

In [None]:
# Save results as CSV and XLSX
results_df.to_csv("MODI_results_14_11_24.csv")
results_df.to_excel("MODI_results_14_11_24.xlsx")
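
When reading the saved summary table, it may help to keep in mind that, if MODI is taken as the overall fraction of compounds whose nearest neighbor has the same class, a class with a single member can never be matched: that compound's nearest neighbor necessarily belongs to another class. The sketch below computes the resulting ceiling from the `Active` and `Inactive` counts. `max_modi_from_counts` is a hypothetical helper, not part of this notebook, and it relies on that assumed definition.

```python
def max_modi_from_counts(class_counts):
    """Upper bound on MODI given the number of compounds in each class.

    Assumes MODI is the overall fraction of compounds whose nearest
    neighbor shares their class. Returns None when fewer than two
    compounds are present, since no nearest neighbor exists.
    """
    counts = [c for c in class_counts if c > 0]  # ignore empty classes
    n_total = sum(counts)
    if n_total < 2:
        return None
    # Each class with exactly one member contributes one guaranteed mismatch.
    singletons = sum(1 for c in counts if c == 1)
    return (n_total - singletons) / n_total


# Class counts taken from the per-target outputs above.
ceiling_mtase = max_modi_from_counts([38, 1])     # SARS-CoV-2_MTase (NSP14)
ceiling_helicase = max_modi_from_counts([1, 2])   # SARS-CoV-2_Helicase (NSP13)
```

These evaluate to 38/39 (about 0.974) and 2/3 (about 0.667), the same values printed above for both fingerprints, which suggests those two targets already sit at the ceiling their class counts allow.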