italicized text:## **UNIVERSIDADE LÚRIO**


STUDENT: Hamza Age Daudo

PROJECT: VIRTUAL SCREENING THROUGH QSAR-3D ANALYSIS OF POTENTIAL AGENTS USED IN THE FIGHT AGAINST LEPROSY - CLOFAZIMINE ANALOGUES.


INSTALLATIONS AND IMPORTS:

This section should be executed every time this Notebook is reopened.

\1.1. Performing the necessary installations/uninstallations:

In [1]:
!pip install fastapi kaleido python-multipart uvicorn
!pip install chembl_webresource_client

Collecting fastapi
  Downloading fastapi-0.115.6-py3-none-any.whl.metadata (27 kB)
Collecting kaleido
  Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl.metadata (15 kB)
Collecting python-multipart
  Downloading python_multipart-0.0.20-py3-none-any.whl.metadata (1.8 kB)
Collecting uvicorn
  Downloading uvicorn-0.34.0-py3-none-any.whl.metadata (6.5 kB)
Collecting starlette<0.42.0,>=0.40.0 (from fastapi)
  Downloading starlette-0.41.3-py3-none-any.whl.metadata (6.0 kB)
Downloading fastapi-0.115.6-py3-none-any.whl (94 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m94.8/94.8 kB[0m [31m6.6 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl (79.9 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m79.9/79.9 MB[0m [31m8.5 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading python_multipart-0.0.20-py3-none-any.whl (24 kB)
Downloading uvicorn-0.34.0-py3-none-any.whl (62 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━

1.2. Importing necessary libraries:

In [2]:
import pandas as pd
from chembl_webresource_client.new_client import new_client

PART 2:
DATASET SELECTION:

The chosen database is ChEMBL (https://www.ebi.ac.uk/chembl/). It is a database of bioactive, drug-like small molecules, containing 2D structures, calculated properties (e.g., logP, molecular weight, Lipinski parameters, etc.), and abstracted bioactivities (e.g., binding constants, pharmacology, and ADMET data). The data are summarized and curated from primary scientific literature, covering a significant portion of structure-activity relationships (SAR) and modern drug discovery.

2.1 Searching for datasets targeting: "CHEMBL4633"

In [3]:
alvo = new_client.target
pesquisa_alvo = alvo.search('CHEMBL4633')
ds = pd.DataFrame.from_dict(pesquisa_alvo)
ds

Unnamed: 0,cross_references,organism,pref_name,score,species_group_flag,target_chembl_id,target_components,target_type,tax_id
0,[],Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,14.0,False,CHEMBL4633,"[{'accession': 'P22001', 'component_descriptio...",SINGLE PROTEIN,9606
1,[],Homo sapiens,Voltage-gated potassium channel,2.0,False,CHEMBL2362996,"[{'accession': 'P51787', 'component_descriptio...",PROTEIN FAMILY,9606


2.2 Searching for a specific target within the dataset:

In [5]:
# Defining the target to be searched:
alvo = "Voltage-gated potassium channel"

# Checking if any element in pref_name contains this target:
contains_alvo = ds['pref_name'].str.contains(alvo)

# Obtaining the indices of the rows with the defined target:
indices_com_alvo = ds[contains_alvo].index.tolist()

if contains_alvo.any():
    print(f"Pelo menos um elemento contém o termo: {alvo}")
    print(f"Índices das linhas com o termo '{alvo}': {indices_com_alvo}")
else:
    print(f"Nenhum elemento contém o termo: {alvo}")

Pelo menos um elemento contém o termo: Voltage-gated potassium channel
Índices das linhas com o termo 'Voltage-gated potassium channel': [0, 1]


2.3 Converting IC50 values to a standard concentration unit (Nanomolar - nM) and generating a single dataframe:

In order to expand access to bioactivity data, a unit conversion system has been applied to transfer values in M, µM, mM to a standard nM unit.

Note: For bioassays using common concentration units (m/v), such as µg/mL, the molar mass of each compound would be required to make this conversion feasible.

In [10]:
# prompt: select rows from df['target_chembl_id'] if in this list: 'CHEMBL612893','CHEMBL3797017','CHEMBL612644'

ensaios = ds[ds['target_chembl_id'].isin(["CHEMBL4633"])]
ensaios

Unnamed: 0,cross_references,organism,pref_name,score,species_group_flag,target_chembl_id,target_components,target_type,tax_id
0,[],Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,14.0,False,CHEMBL4633,"[{'accession': 'P22001', 'component_descriptio...",SINGLE PROTEIN,9606


In [11]:
indices_com_ensaio = ensaios.index
indices_com_ensaio

Index([0], dtype='int64')

In [12]:
# Creating a list to store individual DataFrames (Required only during the first execution!):

dfs = []


# Iterating over the different indices:

for i in indices_com_ensaio :

    df_nM_i = []

    df_uM_i = []

    df_mM_i = []

    df_M_i = []

    ds_selecionado_i = ds.target_chembl_id[i]


    # Filtering bioactive compounds with IC50 data in nM units for each index:

    atividade = new_client.activity

    resultado_nM = atividade.filter(target_chembl_id=ds_selecionado_i).filter(standard_type="IC50").filter(units="nM")



    # Filtering bioactive compounds with IC50 data in µM units for each index:

    resultado_uM = atividade.filter(target_chembl_id=ds_selecionado_i).filter(standard_type="IC50").filter(units="uM")


    # Filtering bioactive compounds with IC50 data in mM units for each index:

    resultado_mM = atividade.filter(target_chembl_id=ds_selecionado_i).filter(standard_type="IC50").filter(units="mM")



    # Filtering bioactive compounds with IC50 data in M (molar) units for each index:

    resultado_M = atividade.filter(target_chembl_id=ds_selecionado_i).filter(standard_type="IC50").filter(units="M")


    # Creating a DataFrame for each unit:

    df_nM_i = pd.DataFrame.from_dict(resultado_nM)

    df_uM_i = pd.DataFrame.from_dict(resultado_uM)

    df_mM_i = pd.DataFrame.from_dict(resultado_mM)

    df_M_i = pd.DataFrame.from_dict(resultado_M)




    # Converting each DataFrame to a standard unit (Molar - M):

    if not df_nM_i.empty and 'value' in df_nM_i:

        df_nM_i['value'] = df_nM_i['value'].astype(float)

        df_nM_i['value'] *= 1e-9

    else:

        pass




    if not df_uM_i.empty and 'value' in df_uM_i:

        df_uM_i['value'] = df_uM_i['value'].astype(float)

        df_uM_i['value'] *= 1e-6

    else:

        pass




    if not df_mM_i.empty and 'value' in df_mM_i:

        df_mM_i['value'] = df_mM_i['value'].astype(float)

        df_mM_i['value'] *= 1e-3

    else:

        pass




    if not df_M_i.empty and 'value' in df_M_i:

        df_M_i['value'] = df_M_i['value'].astype(float)

    else:

        pass




    # Adding the DataFrames to the list:

    dfs.append(df_nM_i)

    dfs.append(df_uM_i)

    dfs.append(df_mM_i)

    dfs.append(df_M_i)





# Concatenating the individual DataFrames into a single DataFrame:

df_assays = pd.concat(dfs, ignore_index=True)

df_assays['units'] = 'M'




# Displaying the final DataFrame:

display(df_assays)

Unnamed: 0,action_type,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,...,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,,306427,[],CHEMBL821183,Inhibition of voltage-gated potassium channel ...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,8.600000e-08
1,,,475295,[],CHEMBL750215,Concentration inhibiting [125I]ChTX (charybdot...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,2.400000e-07
2,,,476425,[],CHEMBL750215,Concentration inhibiting [125I]ChTX (charybdot...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,4.710000e-07
3,,,477539,[],CHEMBL750215,Concentration inhibiting [125I]ChTX (charybdot...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,5.000000e-06
4,,,478943,[],CHEMBL750215,Concentration inhibiting [125I]ChTX (charybdot...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,1.000000e-05
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
453,"{'action_type': 'BLOCKER', 'description': 'Neg...",,24753720,[],CHEMBL5121923,Inhibition of human Kv1.3 expressed in HEK293 ...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,6.500000e-06
454,"{'action_type': 'BLOCKER', 'description': 'Neg...",,24753721,[],CHEMBL5121923,Inhibition of human Kv1.3 expressed in HEK293 ...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,9.000000e-07
455,"{'action_type': 'BLOCKER', 'description': 'Neg...",,24753722,[],CHEMBL5121923,Inhibition of human Kv1.3 expressed in HEK293 ...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,9.000000e-07
456,"{'action_type': 'BLOCKER', 'description': 'Neg...",,24753723,[],CHEMBL5121923,Inhibition of human Kv1.3 expressed in HEK293 ...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,3.190000e-05


In [13]:
df_assays["value"].isnull().sum()

0

In [14]:
# Assuming your DataFrame is df_assays:
df_assays.dropna(subset=['value'], inplace=True)

In [15]:
df_assays

Unnamed: 0,action_type,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,...,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,,306427,[],CHEMBL821183,Inhibition of voltage-gated potassium channel ...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,8.600000e-08
1,,,475295,[],CHEMBL750215,Concentration inhibiting [125I]ChTX (charybdot...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,2.400000e-07
2,,,476425,[],CHEMBL750215,Concentration inhibiting [125I]ChTX (charybdot...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,4.710000e-07
3,,,477539,[],CHEMBL750215,Concentration inhibiting [125I]ChTX (charybdot...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,5.000000e-06
4,,,478943,[],CHEMBL750215,Concentration inhibiting [125I]ChTX (charybdot...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,1.000000e-05
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
453,"{'action_type': 'BLOCKER', 'description': 'Neg...",,24753720,[],CHEMBL5121923,Inhibition of human Kv1.3 expressed in HEK293 ...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,6.500000e-06
454,"{'action_type': 'BLOCKER', 'description': 'Neg...",,24753721,[],CHEMBL5121923,Inhibition of human Kv1.3 expressed in HEK293 ...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,9.000000e-07
455,"{'action_type': 'BLOCKER', 'description': 'Neg...",,24753722,[],CHEMBL5121923,Inhibition of human Kv1.3 expressed in HEK293 ...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,9.000000e-07
456,"{'action_type': 'BLOCKER', 'description': 'Neg...",,24753723,[],CHEMBL5121923,Inhibition of human Kv1.3 expressed in HEK293 ...,B,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,3.190000e-05


In [16]:
# Calculate the percentage of each category in the 'assay_type' column
assay_type_percentages = df_assays['assay_type'].value_counts(normalize=True) * 100
print(assay_type_percentages)

assay_type
B    60.043668
F    39.956332
Name: proportion, dtype: float64


In [17]:
# Filtering the DataFrame to include only rows where 'assay_type' is 'F'

df_assays_f_only = df_assays[df_assays['assay_type'] == 'F']
df_assays_f_only

Unnamed: 0,action_type,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,...,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
7,,,482633,[],CHEMBL750216,Inhibition of outward potassium currents (IKn)...,F,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,3.160000e-06
9,,,482635,[],CHEMBL750216,Inhibition of outward potassium currents (IKn)...,F,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,1.053000e-06
12,,,486204,[],CHEMBL750216,Inhibition of outward potassium currents (IKn)...,F,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,2.310000e-07
15,,,489711,[],CHEMBL750216,Inhibition of outward potassium currents (IKn)...,F,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,2.160000e-07
17,,,490896,[],CHEMBL750216,Inhibition of outward potassium currents (IKn)...,F,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,4.340000e-07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
244,,,550013,[],CHEMBL705534,Inhibition of Kv1.3 ion channel. Measured in t...,F,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,1.000000e-05
245,,,550014,[],CHEMBL705534,Inhibition of Kv1.3 ion channel. Measured in t...,F,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,3.000000e-05
246,,,551242,[],CHEMBL705534,Inhibition of Kv1.3 ion channel. Measured in t...,F,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,3.000000e-05
250,,,1421374,[],CHEMBL834727,Inhibitory activity against voltage-gated pota...,F,,,BAO_0000190,...,Homo sapiens,Voltage-gated potassium channel subunit Kv1.3,9606,,,IC50,M,UO_0000065,,5.600000e-06


In [18]:
## Assigning the class of compounds: active if IC50 < 1000 nM, inactive if IC50 > 10000 nM, and intermediate if IC50 is between 1000 nM and 100000 nM.
## The variable of interest is always "standard_value".

bioactivity_class = []
for i in df_assays_f_only.standard_value:
    if float(i) >= 10000:
        bioactivity_class.append("Inactive")
    elif float(i) < 1000:
        bioactivity_class.append("Active")
    else:
        bioactivity_class.append("Intermediate")

In [19]:
# Viewing the bioactive compounds
df_assays_f_only.molecule_chembl_id

Unnamed: 0,molecule_chembl_id
7,CHEMBL55791
9,CHEMBL56173
12,CHEMBL53676
15,CHEMBL298622
17,CHEMBL56589
...,...
244,CHEMBL278501
245,CHEMBL265334
246,CHEMBL16428
250,CHEMBL360583


In [20]:
## 7.1. Iterating through the bioactive compounds:
mol_cid = []
for i in df_assays_f_only.molecule_chembl_id:
    mol_cid.append(i)

In [21]:
# Printing the variable mol_cid:
mol_cid

['CHEMBL55791',
 'CHEMBL56173',
 'CHEMBL53676',
 'CHEMBL298622',
 'CHEMBL56589',
 'CHEMBL54934',
 'CHEMBL293719',
 'CHEMBL299668',
 'CHEMBL56590',
 'CHEMBL53841',
 'CHEMBL59482',
 'CHEMBL293083',
 'CHEMBL55202',
 'CHEMBL298608',
 'CHEMBL416313',
 'CHEMBL293721',
 'CHEMBL54846',
 'CHEMBL59308',
 'CHEMBL56261',
 'CHEMBL274227',
 'CHEMBL279370',
 'CHEMBL16308',
 'CHEMBL278660',
 'CHEMBL278660',
 'CHEMBL277598',
 'CHEMBL277598',
 'CHEMBL275836',
 'CHEMBL16806',
 'CHEMBL16806',
 'CHEMBL16917',
 'CHEMBL16792',
 'CHEMBL279358',
 'CHEMBL16449',
 'CHEMBL16432',
 'CHEMBL16432',
 'CHEMBL16446',
 'CHEMBL16446',
 'CHEMBL16848',
 'CHEMBL16848',
 'CHEMBL277952',
 'CHEMBL276543',
 'CHEMBL280102',
 'CHEMBL280102',
 'CHEMBL16706',
 'CHEMBL280052',
 'CHEMBL16110',
 'CHEMBL16450',
 'CHEMBL16581',
 'CHEMBL16581',
 'CHEMBL16804',
 'CHEMBL16804',
 'CHEMBL429234',
 'CHEMBL441507',
 'CHEMBL16945',
 'CHEMBL16945',
 'CHEMBL277287',
 'CHEMBL277287',
 'CHEMBL280079',
 'CHEMBL417995',
 'CHEMBL417995',
 'CHEMBL27706

In [22]:
## 7.2. Iterating over canonical SMILES into a list.
canonical_smiles = []
for i in df_assays.canonical_smiles:
    canonical_smiles.append(i)

In [23]:
## 7.3. Iterating over standard_value into a list.
standard_value = []
for i in df_assays.standard_value:
    standard_value.append(i)

In [24]:
## 7.4. Combining the four variables into the same DataFrame.
dados_tupla = list(zip(mol_cid, canonical_smiles, bioactivity_class, standard_value))
df3 = pd.DataFrame( dados_tupla,  columns=['molecule_chembl_id', 'canonical_smiles', 'bioactivity_class', 'standard_value'])

In [25]:
df3

Unnamed: 0,molecule_chembl_id,canonical_smiles,bioactivity_class,standard_value
0,CHEMBL55791,C=C(C)O[C@H]1[C@@H](OC(C)=O)[C@H]2[C@@](C)(CC[...,Intermediate,86.0
1,CHEMBL56173,CCCCCCNc1cc[n+](-c2ccccc2)c2c(-c3ccccc3)cc(OC)...,Intermediate,240.0
2,CHEMBL53676,CCCC/N=c1\ccn(CCCc2ccccc2)c2cc(Cl)ccc12,Active,471.0
3,CHEMBL298622,CCCCCCOc1ccc(Cc2ccccc2)c2ccccc12,Active,5000.0
4,CHEMBL56589,O=C(O)CC/N=c1\ccn(Cc2ccccc2Cl)c2cc(Cl)ccc12,Active,10000.0
...,...,...,...,...
178,CHEMBL278501,COc1ccccc1CCC1(O)C(C)=C[C@@H](OC(C)=O)[C@@]2(C...,Inactive,142.0
179,CHEMBL265334,CCOc1ccccc1CCC1(O)C(C)=C[C@@H](OC(C)=O)[C@@]2(...,Inactive,34.0
180,CHEMBL16428,CCOc1ccccc1CCC1(O)C(C)=C[C@@H](OC(C)=O)[C@@]2(...,Inactive,80.0
181,CHEMBL360583,COc1cccc(CCC2(O)C(C)=C[C@@H](OC(C)=O)[C@@]3(C)...,Intermediate,24.0


In [26]:

# Saving the DataFrame to a CSV file.

df3.to_csv('Voltage-gated potassium channel subunit Kv1.3 F.csv', index=False)
