**PROJECT:** Integration of machine learning, QSAR, and polypharmacology for multitarget drug discovery in neuropsychiatric disorders: Prediction of serotonergic and dopaminergic receptor inhibitors

MSc. Caroline Mensor Folchini (UFPR)

***Code by Alexandre de F. Cobre*** [GitHub](https://github.com/AlexandreCOBRE/code)

# **Calculating molecule fingerprint descriptors**

In [None]:
# Tasks to be performed
## Step 1: Install the Padelpy library
## Step 2: Prepare the fingerprint.xml file
## Step 3: Import the treated dataset
## Step 4: Prepare the data subset for input into padelpy
## Step 5: Calculate fingerprint descriptors
## Step 6: View the calculated descriptors
## Step 7: Save the dataset

### **Step 1: Install the Padelpy library**

In [None]:
! pip install padelpy

Collecting padelpy
  Downloading padelpy-0.1.16-py3-none-any.whl.metadata (7.7 kB)
Downloading padelpy-0.1.16-py3-none-any.whl (20.9 MB)
Installing collected packages: padelpy
Successfully installed padelpy-0.1.16


### **Step 2: Prepare the fingerprint.xml file**

In [None]:
! wget https://github.com/dataprofessor/padel/raw/main/fingerprints_xml.zip
! unzip fingerprints_xml.zip

--2025-01-19 21:28:04--  https://github.com/dataprofessor/padel/raw/main/fingerprints_xml.zip
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/dataprofessor/padel/main/fingerprints_xml.zip [following]
--2025-01-19 21:28:04--  https://raw.githubusercontent.com/dataprofessor/padel/main/fingerprints_xml.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10871 (11K) [application/zip]
Saving to: ‘fingerprints_xml.zip’


2025-01-19 21:28:04 (72.8 MB/s) - ‘fingerprints_xml.zip’ saved [10871/10871]

Archive:  fingerprints_xml.zip
  inflating: AtomPairs2DFingerprintCount.xml  
  inflating: AtomPairs2DFin

### **2.1. Create a list and organize xml files**

In [None]:
import glob

# Collect the fingerprint XML files; sorting keeps their order aligned with lista_FP below
arquivos_xml = sorted(glob.glob("*.xml"))
arquivos_xml

['AtomPairs2DFingerprintCount.xml',
 'AtomPairs2DFingerprinter.xml',
 'EStateFingerprinter.xml',
 'ExtendedFingerprinter.xml',
 'Fingerprinter.xml',
 'GraphOnlyFingerprinter.xml',
 'KlekotaRothFingerprintCount.xml',
 'KlekotaRothFingerprinter.xml',
 'MACCSFingerprinter.xml',
 'PubchemFingerprinter.xml',
 'SubstructureFingerprintCount.xml',
 'SubstructureFingerprinter.xml']

In [None]:
lista_FP = ['AtomPairs2DCount',
 'AtomPairs2D',
 'EState',
 'CDKextended',
 'CDK',
 'CDKgraphonly',
 'KlekotaRothCount',
 'KlekotaRoth',
 'MACCS',
 'PubChem',
 'SubstructureCount',
 'Substructure']

### **Creating a dictionary**

In [None]:

# Pair each short name with its XML file; this relies on both lists being in the same order
fp = dict(zip(lista_FP, arquivos_xml))
fp

{'AtomPairs2DCount': 'AtomPairs2DFingerprintCount.xml',
 'AtomPairs2D': 'AtomPairs2DFingerprinter.xml',
 'EState': 'EStateFingerprinter.xml',
 'CDKextended': 'ExtendedFingerprinter.xml',
 'CDK': 'Fingerprinter.xml',
 'CDKgraphonly': 'GraphOnlyFingerprinter.xml',
 'KlekotaRothCount': 'KlekotaRothFingerprintCount.xml',
 'KlekotaRoth': 'KlekotaRothFingerprinter.xml',
 'MACCS': 'MACCSFingerprinter.xml',
 'PubChem': 'PubchemFingerprinter.xml',
 'SubstructureCount': 'SubstructureFingerprintCount.xml',
 'Substructure': 'SubstructureFingerprinter.xml'}

In [None]:
fp['PubChem']

'PubchemFingerprinter.xml'
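
Because `zip` pairs the two lists purely by position, an extra or missing `.xml` file in the working directory would silently shift every name onto the wrong file. The following is a minimal sketch of a safer variant, assuming the twelve pairs shown in the dictionary above are the intended ones; `FP_FILES` and `build_fp_map` are hypothetical names introduced only for illustration.

```python
# Explicit short-name -> XML-file mapping, copied from the dictionary shown above.
FP_FILES = {
    'AtomPairs2DCount': 'AtomPairs2DFingerprintCount.xml',
    'AtomPairs2D': 'AtomPairs2DFingerprinter.xml',
    'EState': 'EStateFingerprinter.xml',
    'CDKextended': 'ExtendedFingerprinter.xml',
    'CDK': 'Fingerprinter.xml',
    'CDKgraphonly': 'GraphOnlyFingerprinter.xml',
    'KlekotaRothCount': 'KlekotaRothFingerprintCount.xml',
    'KlekotaRoth': 'KlekotaRothFingerprinter.xml',
    'MACCS': 'MACCSFingerprinter.xml',
    'PubChem': 'PubchemFingerprinter.xml',
    'SubstructureCount': 'SubstructureFingerprintCount.xml',
    'Substructure': 'SubstructureFingerprinter.xml',
}


def build_fp_map(xml_files):
    """Return the name -> file mapping after checking that every file is present.

    Hypothetical helper, not part of padelpy.
    """
    missing = sorted(set(FP_FILES.values()) - set(xml_files))
    if missing:
        raise FileNotFoundError(f"Missing fingerprint XML files: {missing}")
    return dict(FP_FILES)


# Stand-in for glob.glob("*.xml"); in the notebook, pass arquivos_xml instead.
available = sorted(FP_FILES.values())
fp_checked = build_fp_map(available)
print(fp_checked['PubChem'])
```

In the notebook itself, `build_fp_map(arquivos_xml)` could replace the `dict(zip(...))` call; unrelated XML files in the directory are then ignored, and a missing one raises an error instead of shifting the pairs.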

### **Step 3: Import the treated dataset**

In [None]:
from google.colab import files
uploaded = files.upload()  # opens a file picker; Colab only

Saving DA_5HT_dataset_3classes.csv to DA_5HT_dataset_3classes.csv


In [None]:
import pandas as pd
df = pd.read_csv("DA_5HT_dataset_3classes.csv")
df

Unnamed: 0.1,Unnamed: 0,molecule_chembl_id,canonical_smiles,bioactivity_class,MW,LogP,NumHDonors,NumHAcceptors,pIC50
0,0,CHEMBL303519,c1cnc(N2CCN(Cc3cccc4c3Cc3ccccc3-4)CC2)nc1,Intermediate,342.446,3.37000,0.0,4.0,5.008774
1,1,CHEMBL292943,COc1ccc(-c2cccc(CN3CCN(c4ncccn4)CC3)c2)cc1,Active,360.461,3.47440,0.0,5.0,7.301030
2,2,CHEMBL61682,Fc1ccc(-c2cncc(CN3CCN(c4ccccc4F)CC3)c2)cc1,Active,365.427,4.34900,0.0,3.0,7.602060
3,3,CHEMBL64487,COc1ccccc1-c1cccc(CN2CCN(c3ncccn3)CC2)c1,Active,360.461,3.47440,0.0,5.0,6.443697
4,4,CHEMBL64597,c1cnc(N2CCN(Cc3cccc(-c4ccsc4)c3)CC2)nc1,Active,336.464,3.52730,0.0,5.0,6.522879
...,...,...,...,...,...,...,...,...,...
5623,5623,CHEMBL4864918,CCCn1c(-c2ccccc2)cc(C(=O)NCCCN2CCN(c3cccc(Cl)c...,Inactive,479.068,5.46902,1.0,4.0,6.563837
5624,5624,CHEMBL5398630,CCCn1c(-c2ccccc2)cc(C(=O)NCCCN2CCN(c3cccc(C)c3...,Active,472.677,5.43246,1.0,4.0,6.647817
5625,5625,CHEMBL3183055,CCCn1c(-c2ccccc2)cc(C(=O)NCCCN2CCN(c3cccc(Cl)c...,Active,513.513,6.12242,1.0,4.0,6.202040
5626,5626,CHEMBL2017291,CCCn1c(-c2ccccc2)cc(C(=O)NCCCN2CCN(c3ccc(Cl)c(...,Inactive,549.974,6.54422,1.0,4.0,6.489455
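
The `Unnamed: 0.1` and `Unnamed: 0` columns in the output above look like row indices written out by earlier `to_csv` calls; they carry no chemical information. A minimal sketch of dropping them after loading, using a two-row stand-in built from the rows shown above (the names `sample_csv`, `df_sample`, and `df_clean` are illustrative only):

```python
import io

import pandas as pd

# Tiny stand-in for DA_5HT_dataset_3classes.csv (two rows copied from the output above).
sample_csv = io.StringIO(
    "Unnamed: 0.1,Unnamed: 0,molecule_chembl_id,canonical_smiles,bioactivity_class\n"
    "0,0,CHEMBL303519,c1cnc(N2CCN(Cc3cccc4c3Cc3ccccc3-4)CC2)nc1,Intermediate\n"
    "1,1,CHEMBL292943,COc1ccc(-c2cccc(CN3CCN(c4ncccn4)CC3)c2)cc1,Active\n"
)
df_sample = pd.read_csv(sample_csv)

# Drop every leftover index column in one step.
unnamed = [col for col in df_sample.columns if col.startswith("Unnamed:")]
df_clean = df_sample.drop(columns=unnamed)
print(list(df_clean.columns))
```

The same two lines (`unnamed = ...` and `.drop(columns=unnamed)`) can be applied to `df` directly after `pd.read_csv`.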


### **Step 4: Prepare the data subset for input into padelpy**

In [None]:
df2 = pd.concat( [df['canonical_smiles'],df['molecule_chembl_id']], axis=1 )
df2.to_csv('molecule.smi', sep='\t', index=False, header=False)
df2

Unnamed: 0,canonical_smiles,molecule_chembl_id
0,c1cnc(N2CCN(Cc3cccc4c3Cc3ccccc3-4)CC2)nc1,CHEMBL303519
1,COc1ccc(-c2cccc(CN3CCN(c4ncccn4)CC3)c2)cc1,CHEMBL292943
2,Fc1ccc(-c2cncc(CN3CCN(c4ccccc4F)CC3)c2)cc1,CHEMBL61682
3,COc1ccccc1-c1cccc(CN2CCN(c3ncccn3)CC2)c1,CHEMBL64487
4,c1cnc(N2CCN(Cc3cccc(-c4ccsc4)c3)CC2)nc1,CHEMBL64597
...,...,...
5623,CCCn1c(-c2ccccc2)cc(C(=O)NCCCN2CCN(c3cccc(Cl)c...,CHEMBL4864918
5624,CCCn1c(-c2ccccc2)cc(C(=O)NCCCN2CCN(c3cccc(C)c3...,CHEMBL5398630
5625,CCCn1c(-c2ccccc2)cc(C(=O)NCCCN2CCN(c3cccc(Cl)c...,CHEMBL3183055
5626,CCCn1c(-c2ccccc2)cc(C(=O)NCCCN2CCN(c3ccc(Cl)c(...,CHEMBL2017291
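
The `molecule.smi` file written above has one molecule per line: the SMILES string, a tab, then the ChEMBL ID, with no header row. A minimal sketch of a format check before handing the file to PaDEL follows; `check_smi_lines` is a hypothetical helper, and repeated IDs are reported rather than rejected, on the assumption that a compound may legitimately appear more than once in a combined dataset.

```python
from collections import Counter


def check_smi_lines(lines):
    """Validate 'SMILES<TAB>name' records and return (names, duplicated_names)."""
    names = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        # Each record needs exactly two non-empty fields.
        if len(fields) != 2 or not all(fields):
            raise ValueError(f"Line {number}: expected 'SMILES<TAB>name', got {line!r}")
        names.append(fields[1])
    counts = Counter(names)
    duplicated = sorted(name for name, count in counts.items() if count > 1)
    return names, duplicated


# Two records copied from the table above, as they would appear in molecule.smi.
sample_lines = [
    "c1cnc(N2CCN(Cc3cccc4c3Cc3ccccc3-4)CC2)nc1\tCHEMBL303519\n",
    "COc1ccc(-c2cccc(CN3CCN(c4ncccn4)CC3)c2)cc1\tCHEMBL292943\n",
]
names, duplicated = check_smi_lines(sample_lines)
print(names, duplicated)
```

In the notebook, the real file could be checked with `check_smi_lines(open('molecule.smi'))`.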


### **Step 5: Calculate fingerprint descriptors**

In [None]:
fp

{'AtomPairs2DCount': 'AtomPairs2DFingerprintCount.xml',
 'AtomPairs2D': 'AtomPairs2DFingerprinter.xml',
 'EState': 'EStateFingerprinter.xml',
 'CDKextended': 'ExtendedFingerprinter.xml',
 'CDK': 'Fingerprinter.xml',
 'CDKgraphonly': 'GraphOnlyFingerprinter.xml',
 'KlekotaRothCount': 'KlekotaRothFingerprintCount.xml',
 'KlekotaRoth': 'KlekotaRothFingerprinter.xml',
 'MACCS': 'MACCSFingerprinter.xml',
 'PubChem': 'PubchemFingerprinter.xml',
 'SubstructureCount': 'SubstructureFingerprintCount.xml',
 'Substructure': 'SubstructureFingerprinter.xml'}

In [None]:
# Calculate the descriptors; PubChem fingerprints are used here
from padelpy import padeldescriptor

fingerprint = 'PubChem'

fingerprint_output_file = f'{fingerprint}.csv'  # 'PubChem.csv'
fingerprint_descriptortypes = fp[fingerprint]   # 'PubchemFingerprinter.xml'

padeldescriptor(mol_dir='molecule.smi',
                d_file=fingerprint_output_file,
                descriptortypes=fingerprint_descriptortypes,
                detectaromaticity=True,
                standardizenitro=True,
                standardizetautomers=True,
                threads=2,
                removesalt=True,
                log=True,
                fingerprints=True)
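
Only the fingerprint name changes when a different descriptor type is wanted, so the call above can be parameterised. The sketch below builds the same keyword arguments for any entry of the name-to-file dictionary without running PaDEL; `padel_args` and `fp_subset` are hypothetical names, and the two-entry dictionary is a stand-in for the full `fp` mapping.

```python
def padel_args(fingerprint, fp_map, mol_file="molecule.smi", threads=2):
    """Build the keyword arguments used in the padeldescriptor call above.

    Hypothetical helper, not part of padelpy.
    """
    if fingerprint not in fp_map:
        raise KeyError(f"Unknown fingerprint {fingerprint!r}; choose from {sorted(fp_map)}")
    return {
        "mol_dir": mol_file,
        "d_file": f"{fingerprint}.csv",          # one output file per fingerprint type
        "descriptortypes": fp_map[fingerprint],  # XML file selecting the fingerprint
        "detectaromaticity": True,
        "standardizenitro": True,
        "standardizetautomers": True,
        "threads": threads,
        "removesalt": True,
        "log": True,
        "fingerprints": True,
    }


# Two entries copied from the fp dictionary, as a stand-in for the full mapping.
fp_subset = {"PubChem": "PubchemFingerprinter.xml", "MACCS": "MACCSFingerprinter.xml"}
for name in fp_subset:
    args = padel_args(name, fp_subset)
    print(name, "->", args["d_file"], "using", args["descriptortypes"])
    # In the notebook: padeldescriptor(**args)
```

Writing each fingerprint type to its own CSV keeps the results separate, so several types can be compared later without recomputing.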

### **Step 6: View the calculated descriptors**

In [None]:
descritores = pd.read_csv(fingerprint_output_file)
descritores

Unnamed: 0,Name,PubchemFP0,PubchemFP1,PubchemFP2,PubchemFP3,PubchemFP4,PubchemFP5,PubchemFP6,PubchemFP7,PubchemFP8,...,PubchemFP871,PubchemFP872,PubchemFP873,PubchemFP874,PubchemFP875,PubchemFP876,PubchemFP877,PubchemFP878,PubchemFP879,PubchemFP880
0,CHEMBL303519,1,1,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,CHEMBL292943,1,1,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,CHEMBL61682,1,1,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,CHEMBL64487,1,1,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,CHEMBL64597,1,1,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5623,CHEMBL4864918,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5624,CHEMBL5398630,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5625,CHEMBL3183055,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5626,CHEMBL2017291,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
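
Before the descriptors are merged with activity labels, it is worth confirming, rather than assuming, that the `Name` column contains every input molecule and in the same order as the dataset. A minimal sketch of such a check, using a hypothetical helper `compare_ids` and three IDs taken from the tables above:

```python
def compare_ids(expected_ids, found_ids):
    """Report IDs missing from or unexpected in the descriptor table, and whether order matches."""
    expected, found = list(expected_ids), list(found_ids)
    expected_set, found_set = set(expected), set(found)
    return {
        "missing": [i for i in expected if i not in found_set],
        "unexpected": [i for i in found if i not in expected_set],
        "same_order": expected == found,
    }


# Illustrative input: the second list drops one molecule and swaps the other two.
dataset_ids = ["CHEMBL303519", "CHEMBL292943", "CHEMBL61682"]
descriptor_ids = ["CHEMBL292943", "CHEMBL303519"]
report = compare_ids(dataset_ids, descriptor_ids)
print(report)
```

In the notebook, the call would be `compare_ids(df['molecule_chembl_id'], descritores['Name'])`; if `same_order` is false, joining on the ID is safer than concatenating the two tables side by side.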


### **Step 7: Save the dataset**

In [None]:
# index=False avoids writing the row index as an extra 'Unnamed: 0' column
descritores.to_csv("DA_5HT_part4.csv", index=False)