# Project_Data_Science_Drug

# ChEMBL Database

ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs.

The ChEMBL database contains bioactivity data for compounds tested against drug targets, reported as measures such as Ki, Kd, IC50 and EC50. These data can be filtered and analyzed to build compound screening libraries for lead identification during drug discovery.

ChEMBL version 2 (ChEMBL_02) was launched in January 2010 with 2.4 million bioassay measurements covering 622,824 compounds, among them 24,000 natural products. These were obtained by curating over 34,000 publications across twelve medicinal chemistry journals. ChEMBL's coverage of available bioactivity data has grown to become "the most comprehensive ever seen in a public database." In October 2010, ChEMBL version 8 (ChEMBL_08) was launched with over 2.97 million bioassay measurements covering 636,269 compounds.

ChEMBL_10 saw the addition of the PubChem confirmatory assays, in order to integrate data that is comparable to the type and class of data contained within ChEMBL.

ChEMBLdb can be accessed via a web interface or downloaded by File Transfer Protocol. It is formatted in a manner amenable to computerized data mining, and attempts to standardize activities between different publications, to enable comparative analysis. ChEMBL is also integrated into other large-scale chemistry resources, including PubChem and the ChemSpider system of the Royal Society of Chemistry.
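To illustrate what standardizing activities between publications involves, here is a minimal sketch that converts concentration values reported in different units to nanomolar (nM), so they can be compared directly. The unit table and the `to_nanomolar` helper are hypothetical illustrations; they are not part of ChEMBL or its client library.

```python
import math

# Hypothetical conversion factors: how many nM one unit corresponds to.
NM_PER_UNIT = {
    "M": 1e9,
    "mM": 1e6,
    "uM": 1e3,
    "nM": 1.0,
    "pM": 1e-3,
}


def to_nanomolar(value: float, unit: str) -> float:
    """Return `value` expressed in nM; raises KeyError for an unknown unit."""
    return value * NM_PER_UNIT[unit]


# Hypothetical records: the same potency as three publications might report it.
records = [(7.2, "uM"), (7200.0, "nM"), (0.0072, "mM")]
converted = [to_nanomolar(v, u) for v, u in records]

# After conversion the three values agree (up to floating-point rounding).
same = all(math.isclose(c, converted[0]) for c in converted)
print(converted, same)
```

Raising on an unknown unit, rather than guessing, keeps unconvertible records from silently entering a comparison.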

## Installing libraries

In [1]:
! pip install chembl_webresource_client


## Importing libraries

In [2]:
# Import necessary libraries
import pandas as pd
from chembl_webresource_client.new_client import new_client

# Target protein search

## Coronavirus target search

In [3]:
target = new_client.target
target_query = target.search("coronavirus")
targets = pd.DataFrame.from_dict(target_query)
targets

| | cross_references | organism | pref_name | score | species_group_flag | target_chembl_id | target_components | target_type | tax_id |
|---|---|---|---|---|---|---|---|---|---|
| 0 | [] | Coronavirus | Coronavirus | 17.0 | False | CHEMBL613732 | [] | ORGANISM | 11119 |
| 1 | [] | SARS coronavirus | SARS coronavirus | 15.0 | False | CHEMBL612575 | [] | ORGANISM | 227859 |
| 2 | [] | Feline coronavirus | Feline coronavirus | 15.0 | False | CHEMBL612744 | [] | ORGANISM | 12663 |
| 3 | [] | Human coronavirus 229E | Human coronavirus 229E | 13.0 | False | CHEMBL613837 | [] | ORGANISM | 11137 |
| 4 | [{'xref_id': 'P0C6U8', 'xref_name': None, 'xre... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 10.0 | False | CHEMBL3927 | [{'accession': 'P0C6U8', 'component_descriptio... | SINGLE PROTEIN | 227859 |
| 5 | [] | Middle East respiratory syndrome-related coron... | Middle East respiratory syndrome-related coron... | 9.0 | False | CHEMBL4296578 | [] | ORGANISM | 1335626 |
| 6 | [{'xref_id': 'P0C6X7', 'xref_name': None, 'xre... | SARS coronavirus | Replicase polyprotein 1ab | 4.0 | False | CHEMBL5118 | [{'accession': 'P0C6X7', 'component_descriptio... | SINGLE PROTEIN | 227859 |
| 7 | [] | Severe acute respiratory syndrome coronavirus 2 | Replicase polyprotein 1ab | 4.0 | False | CHEMBL4523582 | [{'accession': 'P0DTD1', 'component_descriptio... | SINGLE PROTEIN | 2697049 |


The search returns 8 results (rows 0 to 7). We will focus on the entries with target_type SINGLE PROTEIN for SARS coronavirus.
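As a minimal sketch, the same selection can be made with a boolean mask instead of by eye. The small `targets` frame below copies four columns from the search output above; in the notebook, the mask would be applied directly to the `targets` DataFrame returned by `target.search`.

```python
import pandas as pd

# Columns copied from the target search output above (same row order).
targets = pd.DataFrame(
    {
        "organism": [
            "Coronavirus",
            "SARS coronavirus",
            "Feline coronavirus",
            "Human coronavirus 229E",
            "SARS coronavirus",
            "Middle East respiratory syndrome-related coron...",
            "SARS coronavirus",
            "Severe acute respiratory syndrome coronavirus 2",
        ],
        "pref_name": [
            "Coronavirus",
            "SARS coronavirus",
            "Feline coronavirus",
            "Human coronavirus 229E",
            "SARS coronavirus 3C-like proteinase",
            "Middle East respiratory syndrome-related coron...",
            "Replicase polyprotein 1ab",
            "Replicase polyprotein 1ab",
        ],
        "target_chembl_id": [
            "CHEMBL613732", "CHEMBL612575", "CHEMBL612744", "CHEMBL613837",
            "CHEMBL3927", "CHEMBL4296578", "CHEMBL5118", "CHEMBL4523582",
        ],
        "target_type": ["ORGANISM"] * 4
        + ["SINGLE PROTEIN", "ORGANISM", "SINGLE PROTEIN", "SINGLE PROTEIN"],
    }
)

# Keep single-protein targets whose organism is SARS coronavirus.
mask = (targets["target_type"] == "SINGLE PROTEIN") & (
    targets["organism"] == "SARS coronavirus"
)
sars_proteins = targets[mask]
print(sars_proteins[["pref_name", "target_chembl_id"]])
```

The SARS-CoV-2 replicase (row 7) is excluded because its organism differs; dropping the organism condition would keep it.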

## Select the target and retrieve data for SARS coronavirus 3C-like proteinase (pref_name)

In [6]:
selected_target = targets.target_chembl_id[4]
selected_target

'CHEMBL3927'
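Selecting the row by position (`[4]`) relies on the search results keeping the same order, which may change as ChEMBL is updated. Below is a sketch of a more defensive lookup by `pref_name`; `select_target_id` is a hypothetical helper, and the small frame copies rows from the search output above.

```python
import pandas as pd


def select_target_id(targets: pd.DataFrame, pref_name: str) -> str:
    """Return the target_chembl_id whose pref_name matches exactly.

    Raises ValueError when there is not exactly one match, instead of
    silently picking the wrong row if the search results change.
    """
    matches = targets.loc[targets["pref_name"] == pref_name, "target_chembl_id"]
    if len(matches) != 1:
        raise ValueError(f"expected one match for {pref_name!r}, found {len(matches)}")
    return matches.iloc[0]


# Excerpt of the search results shown above.
targets = pd.DataFrame(
    {
        "pref_name": [
            "SARS coronavirus",
            "SARS coronavirus 3C-like proteinase",
            "Replicase polyprotein 1ab",
            "Replicase polyprotein 1ab",
        ],
        "target_chembl_id": ["CHEMBL612575", "CHEMBL3927", "CHEMBL5118", "CHEMBL4523582"],
    }
)

selected_target = select_target_id(targets, "SARS coronavirus 3C-like proteinase")
print(selected_target)
```

Looking up "Replicase polyprotein 1ab" the same way raises `ValueError`, because two targets (SARS-CoV and SARS-CoV-2) share that name.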

Next, we retrieve the bioactivity data for SARS coronavirus 3C-like proteinase (CHEMBL3927):

In [9]:
activity = new_client.activity
res = activity.filter(target_chembl_id=selected_target)
pd.DataFrame.from_dict(res)

| | activity_comment | activity_id | activity_properties | assay_chembl_id | assay_description | assay_type | assay_variant_accession | assay_variant_mutation | bao_endpoint | bao_format | ... | target_organism | target_pref_name | target_tax_id | text_value | toid | type | units | uo_units | upper_value | value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | 1480934 | [] | CHEMBL831837 | In vitro percent inhibition against SARS coron... | B | | | BAO_0000201 | BAO_0000357 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | Inhibition | % | UO_0000187 | | 25.0 |
| 1 | | 1480935 | [] | CHEMBL829584 | In vitro inhibitory concentration against SARS... | B | | | BAO_0000190 | BAO_0000357 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | IC50 | uM | UO_0000065 | | 7.2 |
| 2 | | 1480936 | [] | CHEMBL829584 | In vitro inhibitory concentration against SARS... | B | | | BAO_0000190 | BAO_0000357 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | IC50 | uM | UO_0000065 | | 9.4 |
| 3 | | 1481061 | [] | CHEMBL830868 | In vitro inhibitory concentration against SARS... | B | | | BAO_0000190 | BAO_0000357 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | IC50 | uM | UO_0000065 | | 13.5 |
| 4 | | 1481062 | [] | CHEMBL832053 | In vitro percent inhibition against SARS coron... | B | | | BAO_0000201 | BAO_0000357 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | Inhibition | % | UO_0000187 | | 13.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 324 | Dose-dependent effect | 12041527 | [] | CHEMBL2149727 | Inhibition of SARS-CoV 3CLpro expressed in Esc... | B | | | BAO_0000376 | BAO_0000019 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | INH | | | | |
| 325 | Dose-dependent effect | 12041528 | [] | CHEMBL2149727 | Inhibition of SARS-CoV 3CLpro expressed in Esc... | B | | | BAO_0000376 | BAO_0000019 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | INH | | | | |
| 326 | Dose-dependent effect | 12041529 | [] | CHEMBL2149727 | Inhibition of SARS-CoV 3CLpro expressed in Esc... | B | | | BAO_0000376 | BAO_0000019 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | INH | | | | |
| 327 | Dose-dependent effect | 12041530 | [] | CHEMBL2149727 | Inhibition of SARS-CoV 3CLpro expressed in Esc... | B | | | BAO_0000376 | BAO_0000019 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | INH | | | | |
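The output above collapses the middle columns and rows into `...` and cuts long text such as `assay_description`. A sketch using standard pandas display options to show more of the frame (whether printing every column is practical depends on how wide the frame is):

```python
import pandas as pd

# Print every column of wide frames instead of collapsing the middle ones to "...".
pd.set_option("display.max_columns", None)
# Print full cell text instead of truncating long strings.
pd.set_option("display.max_colwidth", None)

# Alternatively, change an option only inside a `with` block; the previous
# value is restored automatically when the block exits.
with pd.option_context("display.max_rows", 10):
    inside = pd.get_option("display.max_rows")
outside = pd.get_option("display.max_rows")
print(inside, outside)
```

`option_context` is the less intrusive choice in a notebook, since global `set_option` calls affect every later cell.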


In [10]:
pd.DataFrame.from_dict(res).standard_type.unique()

array(['Inhibition', 'IC50', 'kinact', 'T1/2', 'Activity', 'Ki'],
      dtype=object)
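To see how many records each type contributes, and to apply the IC50 filter locally to an already downloaded frame instead of querying the API a second time, something like the following sketch should work. The rows are the first five activities shown above, with a few columns only; their `type` values are assumed to match `standard_type`.

```python
import pandas as pd

# First five activity rows from the output above (assumed standard_type values).
activities = pd.DataFrame(
    {
        "activity_id": [1480934, 1480935, 1480936, 1481061, 1481062],
        "standard_type": ["Inhibition", "IC50", "IC50", "IC50", "Inhibition"],
        "value": [25.0, 7.2, 9.4, 13.5, 13.0],
    }
)

# Number of records per activity type.
counts = activities["standard_type"].value_counts()
print(counts)

# Local equivalent of .filter(standard_type="IC50") on data already in memory.
ic50 = activities[activities["standard_type"] == "IC50"]
print(ic50["activity_id"].tolist())
```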

We will keep only the records whose standard_type is "IC50", so that every activity value describes the same kind of measurement and the dataset is uniform.
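Filtering on IC50 makes the records comparable in kind, but it does not by itself guarantee a single unit. A quick check, sketched on the first five IC50 rows of this notebook (activity_id 1480935 to 1481066, `units` and `value` columns only):

```python
import pandas as pd

# First five IC50 rows from this notebook (a few columns only).
ic50 = pd.DataFrame(
    {
        "activity_id": [1480935, 1480936, 1481061, 1481065, 1481066],
        "units": ["uM"] * 5,
        "value": [7.2, 9.4, 13.5, 13.11, 2.0],
    }
)

# Fail loudly if the IC50 values were reported in more than one unit.
units_found = ic50["units"].dropna().unique()
if len(units_found) != 1:
    raise ValueError(f"IC50 values use mixed units: {list(units_found)}")
print(f"All IC50 values are in {units_found[0]}")
```

ChEMBL activity records also carry `standard_value` and `standard_units` columns, intended to hold values converted to common units; running the same check on `standard_units` may be the safer choice for the full dataset.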

In [13]:
res = activity.filter(target_chembl_id=selected_target).filter(standard_type="IC50")
df = pd.DataFrame.from_dict(res)
df

| | activity_comment | activity_id | activity_properties | assay_chembl_id | assay_description | assay_type | assay_variant_accession | assay_variant_mutation | bao_endpoint | bao_format | ... | target_organism | target_pref_name | target_tax_id | text_value | toid | type | units | uo_units | upper_value | value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | 1480935 | [] | CHEMBL829584 | In vitro inhibitory concentration against SARS... | B | | | BAO_0000190 | BAO_0000357 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | IC50 | uM | UO_0000065 | | 7.2 |
| 1 | | 1480936 | [] | CHEMBL829584 | In vitro inhibitory concentration against SARS... | B | | | BAO_0000190 | BAO_0000357 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | IC50 | uM | UO_0000065 | | 9.4 |
| 2 | | 1481061 | [] | CHEMBL830868 | In vitro inhibitory concentration against SARS... | B | | | BAO_0000190 | BAO_0000357 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | IC50 | uM | UO_0000065 | | 13.5 |
| 3 | | 1481065 | [] | CHEMBL829584 | In vitro inhibitory concentration against SARS... | B | | | BAO_0000190 | BAO_0000357 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | IC50 | uM | UO_0000065 | | 13.11 |
| 4 | | 1481066 | [] | CHEMBL829584 | In vitro inhibitory concentration against SARS... | B | | | BAO_0000190 | BAO_0000357 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | IC50 | uM | UO_0000065 | | 2.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 128 | | 12041507 | [] | CHEMBL2150313 | Inhibition of SARS-CoV PLpro expressed in Esch... | B | | | BAO_0000190 | BAO_0000019 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | IC50 | uM | UO_0000065 | | 10.6 |
| 129 | | 12041508 | [] | CHEMBL2150313 | Inhibition of SARS-CoV PLpro expressed in Esch... | B | | | BAO_0000190 | BAO_0000019 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | IC50 | uM | UO_0000065 | | 10.1 |
| 130 | | 12041509 | [] | CHEMBL2150313 | Inhibition of SARS-CoV PLpro expressed in Esch... | B | | | BAO_0000190 | BAO_0000019 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | IC50 | uM | UO_0000065 | | 11.5 |
| 131 | | 12041510 | [] | CHEMBL2150313 | Inhibition of SARS-CoV PLpro expressed in Esch... | B | | | BAO_0000190 | BAO_0000019 | ... | SARS coronavirus | SARS coronavirus 3C-like proteinase | 227859 | | | IC50 | uM | UO_0000065 | | 10.7 |


Finally, we save the IC50 results to a CSV file (SARScov_data.csv):

In [15]:
df.to_csv("SARScov_data.csv",index=False)
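Passing `index=False` stops pandas from writing the row index as an extra column; without it, reading the file back produces an additional column named `Unnamed: 0`. A sketch of a quick round-trip check, using a temporary directory and a small hypothetical frame in place of `df`:

```python
import os
import tempfile

import pandas as pd

# Hypothetical small frame standing in for the IC50 DataFrame `df`.
df = pd.DataFrame(
    {
        "activity_id": [1480935, 1480936, 1481061],
        "units": ["uM", "uM", "uM"],
        "value": [7.2, 9.4, 13.5],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "SARScov_data.csv")
    df.to_csv(path, index=False)  # do not write the 0..n-1 row labels
    reloaded = pd.read_csv(path)

# Same columns and shape as before saving, with no extra index column.
print(reloaded.shape, list(reloaded.columns))
```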