## ChEMBL for drug discovery

Mining ChEMBL to find potential drug candidates against the LARP6 protein and assessing its druggability.

In [14]:
! pip install chembl_webresource_client



In [6]:
# Import necessary libraries
import pandas as pd
from chembl_webresource_client.new_client import new_client
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors, Lipinski

### Target search for LARP6

In [29]:
# Search ChEMBL for targets matching 'LARP6'
target = new_client.target
target_query = target.search('LARP6')
targets = pd.DataFrame.from_dict(target_query)
targets

Unnamed: 0,cross_references,organism,pref_name,score,species_group_flag,target_chembl_id,target_components,target_type,tax_id
0,[],Homo sapiens,La-related protein 6,17.0,False,CHEMBL4739702,"[{'accession': 'Q9BRS8', 'component_descriptio...",SINGLE PROTEIN,9606


The search returns a single target for LARP6: the human La-related protein 6 (CHEMBL4739702).

### Select and retrieve bioactivity data for LARP6

In [8]:
selected_target = targets.target_chembl_id[0]
selected_target

'CHEMBL4739702'

### Get bioactivity data

In [9]:
activity = new_client.activity


In [10]:
df = pd.DataFrame.from_dict(activity.filter(target_chembl_id=selected_target))

In [7]:
df

Unnamed: 0,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,...,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,Active,22491650,"[{'comments': None, 'relation': '=', 'result_f...",CHEMBL4689799,Inhibition of LARP6 La module (unknown origin)...,B,,,BAO_0000179,BAO_0000357,...,Homo sapiens,La-related protein 6,9606,,,INH,,,,
1,Active,22491654,"[{'comments': None, 'relation': '=', 'result_f...",CHEMBL4689803,Inhibition of LARP6 La domain (unknown origin)...,B,,,BAO_0000179,BAO_0000357,...,Homo sapiens,La-related protein 6,9606,,,INH,,,,
2,Active,22491655,"[{'comments': None, 'relation': '=', 'result_f...",CHEMBL4689804,Inhibition of LARP6 La domain (unknown origin)...,B,,,BAO_0000179,BAO_0000357,...,Homo sapiens,La-related protein 6,9606,,,INH,,,,
3,Active,22491656,"[{'comments': None, 'relation': '=', 'result_f...",CHEMBL4689805,Inhibition of LARP6 La module (unknown origin)...,B,,,BAO_0000179,BAO_0000357,...,Homo sapiens,La-related protein 6,9606,,,INH,,,,
4,Active,22491660,"[{'comments': None, 'relation': '=', 'result_f...",CHEMBL4689809,Inhibition of LARP6 La module (unknown origin)...,B,,,BAO_0000179,BAO_0000357,...,Homo sapiens,La-related protein 6,9606,,,INH,,,,


Let's check that the records contain only one bioactivity type

In [8]:
df.standard_type.unique()

array(['Inhibition'], dtype=object)

Let's check what potency information is recorded for the compound

In [9]:
df.standard_value

0    None
1    None
2    None
3    None
4    None
Name: standard_value, dtype: object

There is no potency information: `standard_value` is missing for every record.
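The same check can be made explicit by counting how many records carry a numeric potency. The sketch below is a minimal illustration on a toy frame that mirrors the five rows above; the helper name `potency_coverage` is hypothetical and not part of the notebook or of the ChEMBL client.

```python
import pandas as pd

# Toy frame mirroring the five LARP6 records above (standard_value is missing everywhere)
toy = pd.DataFrame({
    "activity_id": [22491650, 22491654, 22491655, 22491656, 22491660],
    "standard_value": [None] * 5,
})

def potency_coverage(frame, column="standard_value"):
    """Return (records with a numeric value, total records) for one column."""
    # ChEMBL returns values as strings or None, so coerce before counting
    values = pd.to_numeric(frame[column], errors="coerce")
    return int(values.notna().sum()), len(frame)

n_with_value, n_total = potency_coverage(toy)
print(f"{n_with_value} of {n_total} records report a numeric standard_value")
```

On the real data this would be called as `potency_coverage(df)`.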

In [41]:
df.to_csv('bioactivity_data.csv', index=False)

### Lipinski descriptors
Let's extract information to calculate Lipinski descriptors

In [13]:

def lipinski(smiles, verbose=False):
    """Return MW, LogP and H-bond donor/acceptor counts for each SMILES string."""
    column_names = ["MW", "LogP", "NumHDonors", "NumHAcceptors"]
    rows = []
    for smi in smiles:
        # Missing or unparseable SMILES yield a NaN row, so rows stay aligned with the input
        mol = Chem.MolFromSmiles(smi) if isinstance(smi, str) else None
        if mol is None:
            if verbose:
                print(f"No descriptors computed for SMILES: {smi!r}")
            rows.append([np.nan] * len(column_names))
            continue
        rows.append([Descriptors.MolWt(mol),
                     Descriptors.MolLogP(mol),
                     Lipinski.NumHDonors(mol),
                     Lipinski.NumHAcceptors(mol)])

    # Building the frame from a list of rows also works for a single molecule
    return pd.DataFrame(rows, columns=column_names, dtype=float)

In [15]:
df_lipinski = lipinski(df.canonical_smiles)
df_lipinski

Unnamed: 0,MW,LogP,NumHDonors,NumHAcceptors
0,633.882,5.30574,2.0,7.0
1,633.882,5.30574,2.0,7.0
2,633.882,5.30574,2.0,7.0
3,633.882,5.30574,2.0,7.0
4,633.882,5.30574,2.0,7.0
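
To read these descriptors against Lipinski's rule of five, the values can be compared with the commonly cited thresholds (MW at most 500, LogP at most 5, at most 5 H-bond donors, at most 10 H-bond acceptors). The sketch below is a minimal illustration in plain Python; the function name `rule_of_five_violations` is hypothetical, and the input values are copied from the table above.

```python
def rule_of_five_violations(mw, logp, h_donors, h_acceptors):
    """List which rule-of-five thresholds a compound exceeds."""
    checks = {
        "MW > 500": mw > 500,
        "LogP > 5": logp > 5,
        "H-bond donors > 5": h_donors > 5,
        "H-bond acceptors > 10": h_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]

# Descriptor values from the table above
violations = rule_of_five_violations(633.882, 5.30574, 2, 7)
print(violations)  # → ['MW > 500', 'LogP > 5']
```

The compound exceeds two of the four thresholds (molecular weight and LogP), which is more than the single violation the rule usually tolerates for orally available drugs.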


Now, let's concatenate both data frames

In [16]:
df_combined = pd.concat([df,df_lipinski], axis=1)
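
`pd.concat(..., axis=1)` aligns rows by index label, not by position. Here both frames carry a default 0..4 index, so the rows line up. The toy sketch below shows what would happen if rows had been dropped from the activity table first; the SMILES and molecular weights are illustrative placeholders, not LARP6 data.

```python
import pandas as pd

activities = pd.DataFrame({
    "activity_id": [1, 2, 3],
    "canonical_smiles": ["CCO", None, "CCN"],  # placeholder SMILES
})
kept = activities.dropna(subset=["canonical_smiles"])  # index labels are now 0 and 2
descriptors = pd.DataFrame({"MW": [46.07, 45.08]})     # index labels are 0 and 1

# Index labels differ, so concat pads with NaN and pairs rows incorrectly
misaligned = pd.concat([kept, descriptors], axis=1)

# Resetting the index first restores positional pairing
aligned = pd.concat([kept.reset_index(drop=True), descriptors], axis=1)

print(len(misaligned), len(aligned))
```

Resetting the index before concatenating is only needed when rows were filtered in between.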


This dataset is not very useful for analysing chemical space, because it contains no IC50 values from which a bioactivity_class could be assigned.
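
For a target that does report IC50 values, a bioactivity class could be assigned along the following lines. This is only a sketch under stated assumptions: the cutoffs of 1,000 nM (active) and 10,000 nM (inactive) are a common convention rather than something defined by ChEMBL or by this notebook, the function names are hypothetical, and the example values are made up.

```python
import math

def bioactivity_class(ic50_nm, active_below=1000.0, inactive_above=10000.0):
    """Label an IC50 (in nM) as active, intermediate or inactive; None if missing."""
    if ic50_nm is None or math.isnan(ic50_nm):
        return None
    if ic50_nm <= active_below:
        return "active"
    if ic50_nm >= inactive_above:
        return "inactive"
    return "intermediate"

def pic50(ic50_nm):
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in mol/L)."""
    return 9.0 - math.log10(ic50_nm)

# Made-up IC50 values in nM, for illustration only
for value in (500.0, 5000.0, 20000.0, None):
    print(value, bioactivity_class(value))
```

None of the LARP6 records can be labelled this way, since every one of them lacks a `standard_value`.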