# **Computational Drug Discovery [Part 1]: Download Bioactivity Data**



In **Part 1**, we collect bioactivity data from the ChEMBL database and pre-process it.



---

## **Installing libraries**

Install the ChEMBL web service package so that we can retrieve bioactivity data from the ChEMBL Database.

In [None]:
! pip install chembl_webresource_client

Collecting chembl_webresource_client
  Downloading https://files.pythonhosted.org/packages/e5/c9/a909331598965376ba15a9e5dbe02b5a007172c0726e6cd56424f558b236/chembl-webresource-client-0.10.4.tar.gz (51kB)
Collecting requests-cache>=0.6.0
  Downloading https://files.pythonhosted.org/packages/1d/a1/3062a791c433212c98a7dd2e510913bccd09e8d216d7da273f44736ae7c7/requests_cache-0.6.4-py2.py3-none-any.whl
Collecting url-normalize>=1.4
  Downloading https://files.pythonhosted.org/packages/65/1c/6c6f408be78692fc850006a2b6dea37c2b8592892534e09996e401efc74b/url_normalize-1.4.3-py2.py3-none-any.whl

## **Importing libraries**

In [None]:
# Import necessary libraries
import pandas as pd
from chembl_webresource_client.new_client import new_client

## **Search for Target protein**

### **Target search for BACE1**

In [None]:
# Target search for BACE1
target = new_client.target
target_query = target.search('BACE 1')
targets = pd.DataFrame.from_dict(target_query)
targets

| | cross_references | organism | pref_name | score | species_group_flag | target_chembl_id | target_components | target_type | tax_id |
|---|---|---|---|---|---|---|---|---|---|
| 0 | [] | Homo sapiens | Beta-secretase (BACE) | 15.0 | False | CHEMBL2111390 | [{'accession': 'Q9Y5Z0', 'component_descriptio... | PROTEIN FAMILY | 9606.0 |
| 1 | [{'xref_id': 'Beta-secretase_1', 'xref_name': ... | Homo sapiens | Beta-secretase 1 | 13.0 | False | CHEMBL4822 | [{'accession': 'P56817', 'component_descriptio... | SINGLE PROTEIN | 9606.0 |
| 2 | [{'xref_id': 'P56818', 'xref_name': None, 'xre... | Mus musculus | Beta-secretase 1 | 13.0 | False | CHEMBL4593 | [{'accession': 'P56818', 'component_descriptio... | SINGLE PROTEIN | 10090.0 |
| 3 | [] | Rattus norvegicus | Beta-secretase 1 | 13.0 | False | CHEMBL3259473 | [{'accession': 'P56819', 'component_descriptio... | SINGLE PROTEIN | 10116.0 |
| 4 | [{'xref_id': 'PTGS1', 'xref_name': None, 'xref... | Homo sapiens | Cyclooxygenase-1 | 4.0 | False | CHEMBL221 | [{'accession': 'P23219', 'component_descriptio... | SINGLE PROTEIN | 9606.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3319 | [] | Zika virus | Genome polyprotein | 0.0 | False | CHEMBL4523307 | [{'accession': 'Q32ZE1', 'component_descriptio... | SINGLE PROTEIN | 64320.0 |
| 3320 | [] | Severe acute respiratory syndrome coronavirus 2 | Replicase polyprotein 1ab | 0.0 | False | CHEMBL4523582 | [{'accession': 'P0DTD1', 'component_descriptio... | SINGLE PROTEIN | 2697049.0 |
| 3321 | [] | Yellow fever virus (strain 17D vaccine) (YFV) | Genome polyprotein | 0.0 | False | CHEMBL4523585 | [{'accession': 'P03314', 'component_descriptio... | SINGLE PROTEIN | 11090.0 |
| 3322 | [] | Homo sapiens | Cytochrome P450 | 0.0 | False | CHEMBL4523986 | [{'accession': 'P08684', 'component_descriptio... | PROTEIN FAMILY | 9606.0 |


### **Select and retrieve bioactivity data for *Human BACE1* (index 1 of the search results)**

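We assign the entry at index 1 (the second row, which corresponds to the target protein *Human BACE1*, CHEMBL4822) to the ***selected_target*** variable. The first row (index 0) is the *Beta-secretase (BACE)* protein family, not the single protein.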

In [None]:
selected_target = targets.target_chembl_id[1]
selected_target

'CHEMBL4822'
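Selecting by position (`[1]`) depends on how ChEMBL ranks the search results, which may change between database releases. A minimal sketch of selecting by content instead, run on a toy copy of four rows from the search output above; the filter values (organism, target type, preferred name) are our assumption of what uniquely identifies human BACE1:

```python
import pandas as pd

# Toy copy of four rows from the target-search output above
# (only the columns needed for the filter are kept).
targets = pd.DataFrame({
    "organism": ["Homo sapiens", "Homo sapiens", "Mus musculus", "Rattus norvegicus"],
    "pref_name": ["Beta-secretase (BACE)", "Beta-secretase 1", "Beta-secretase 1", "Beta-secretase 1"],
    "target_chembl_id": ["CHEMBL2111390", "CHEMBL4822", "CHEMBL4593", "CHEMBL3259473"],
    "target_type": ["PROTEIN FAMILY", "SINGLE PROTEIN", "SINGLE PROTEIN", "SINGLE PROTEIN"],
})

# Select by content rather than by row position, so a change in the
# search ranking cannot silently pick a different target.
mask = (
    (targets["organism"] == "Homo sapiens")
    & (targets["target_type"] == "SINGLE PROTEIN")
    & (targets["pref_name"] == "Beta-secretase 1")
)
matches = targets.loc[mask, "target_chembl_id"]

# Fail loudly if the filter is ambiguous or finds nothing.
if len(matches) != 1:
    raise ValueError(f"Expected exactly one target, found {len(matches)}")
selected_target = matches.iloc[0]
print(selected_target)
```

The explicit length check turns an ambiguous search result into an error instead of a silently wrong target.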

Here, we retrieve only the bioactivity data for *Human BACE1* (CHEMBL4822) that are reported as IC50 values (`standard_type="IC50"`).

In [None]:
activity = new_client.activity
res = activity.filter(target_chembl_id=selected_target).filter(standard_type="IC50")

In [None]:
df = pd.DataFrame.from_dict(res)

In [None]:
df

Unnamed: 0,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,78857,[],CHEMBL653511,Inhibitory activity against Beta-secretase 1 w...,B,,,BAO_0000190,BAO_0000357,single protein format,CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@@H](N)CCC(=O...,,,CHEMBL1136466,Bioorg. Med. Chem. Lett.,2003,"{'bei': '6.39', 'le': '0.12', 'lle': '7.82', '...",CHEMBL406146,HGLUVALLEUPNSASPALAGLUPHEOH,CHEMBL406146,6.38,False,http://www.openphacts.org/units/Nanomolar,260010,=,1,True,=,,IC50,nM,,413.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,nM,UO_0000065,,413.0
1,,391560,[],CHEMBL653332,Compound was tested for its inhibitory activit...,B,,,BAO_0000190,BAO_0000357,single protein format,CC(C)C[C@H](NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N...,,,CHEMBL1144938,J. Med. Chem.,2003,"{'bei': '9.74', 'le': '0.19', 'lle': '10.44', ...",CHEMBL78946,,CHEMBL78946,8.70,True,http://www.openphacts.org/units/Nanomolar,274693,=,1,True,=,,IC50,nM,,2.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,0.002
2,,391983,[],CHEMBL653512,Inhibition of human Beta-secretase 1,B,,,BAO_0000190,BAO_0000357,single protein format,CCC(C)C[C@H](NC(=O)[C@H](CC(C)C)NC(C)=O)[C@@H]...,,,CHEMBL1147464,Bioorg. Med. Chem. Lett.,2004,"{'bei': '8.43', 'le': '0.17', 'lle': '3.99', '...",CHEMBL324109,,CHEMBL324109,6.34,False,http://www.openphacts.org/units/Nanomolar,219979,=,1,True,=,,IC50,nM,,460.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,0.46
3,,395858,[],CHEMBL653512,Inhibition of human Beta-secretase 1,B,,,BAO_0000190,BAO_0000357,single protein format,CC(=O)NCC(=O)N[C@@H](Cc1ccccc1)[C@@H](O)CC(=O)...,,,CHEMBL1147464,Bioorg. Med. Chem. Lett.,2004,"{'bei': '6.84', 'le': '0.13', 'lle': '3.09', '...",CHEMBL114147,,CHEMBL114147,5.05,False,http://www.openphacts.org/units/Nanomolar,219988,=,1,True,=,,IC50,nM,,9000.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,9.0
4,,395859,[],CHEMBL653512,Inhibition of human Beta-secretase 1,B,,,BAO_0000190,BAO_0000357,single protein format,CC(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](Cc1ccccc1...,,,CHEMBL1147464,Bioorg. Med. Chem. Lett.,2004,"{'bei': '6.34', 'le': '0.12', 'lle': '1.68', '...",CHEMBL419949,,CHEMBL419949,5.25,False,http://www.openphacts.org/units/Nanomolar,219982,=,1,True,=,,IC50,nM,,5600.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,5.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10151,,19482230,[],CHEMBL4480749,Inhibition of human BACE1 (1 to 460 residues) ...,B,,,BAO_0000190,BAO_0000219,cell-based format,CC(Cc1cc2ccccc2nc1N)C(=O)NC[C@@]12CCCO[C@@H]1C...,,,CHEMBL4480382,MedChemComm,2019,"{'bei': '11.65', 'le': '0.22', 'lle': '1.77', ...",CHEMBL4565226,,CHEMBL4565226,4.47,False,http://www.openphacts.org/units/Nanomolar,3257930,=,1,True,=,,IC50,nM,,34000.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,34.0
10152,,19482231,[],CHEMBL4480749,Inhibition of human BACE1 (1 to 460 residues) ...,B,,,BAO_0000190,BAO_0000219,cell-based format,Nc1nc2ccccc2cc1CCC(=O)N1CC[C@H]2OCCC[C@@]2(Cc2...,,,CHEMBL4480382,MedChemComm,2019,"{'bei': '9.49', 'le': '0.17', 'lle': '-0.31', ...",CHEMBL4520156,,CHEMBL4520156,4.08,False,http://www.openphacts.org/units/Nanomolar,3257925,=,1,True,=,,IC50,nM,,84000.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,84.0
10153,,19482232,[],CHEMBL4480749,Inhibition of human BACE1 (1 to 460 residues) ...,B,,,BAO_0000190,BAO_0000219,cell-based format,Nc1nc2ccccc2cc1CCC(=O)NC[C@@]12CCCO[C@@H]1CCOC2,,,CHEMBL4480382,MedChemComm,2019,"{'bei': '12.28', 'le': '0.23', 'lle': '2.09', ...",CHEMBL4585673,,CHEMBL4585673,4.54,False,http://www.openphacts.org/units/Nanomolar,3257921,=,1,True,=,,IC50,nM,,29000.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,29.0
10154,,19482233,[],CHEMBL4480749,Inhibition of human BACE1 (1 to 460 residues) ...,B,,,BAO_0000190,BAO_0000219,cell-based format,COc1ccc2c(c1)[C@@H](O)[C@@]1(CCN(C(=O)CCc3cc4c...,,,CHEMBL4480382,MedChemComm,2019,"{'bei': '10.20', 'le': '0.19', 'lle': '0.99', ...",CHEMBL4546115,,CHEMBL4546115,4.26,False,http://www.openphacts.org/units/Nanomolar,3257920,=,1,True,=,,IC50,nM,,55000.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,55.0
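The `pchembl_value` column can be sanity-checked against `standard_value`. ChEMBL defines the pChEMBL value as the negative base-10 logarithm of the activity in molar units, so an IC50 given in nM corresponds to 9 - log10(IC50). The sketch below checks this on the first five rows shown above, with a tolerance that allows for the two-decimal rounding in ChEMBL's output:

```python
import numpy as np
import pandas as pd

# standard_value (nM) and pchembl_value copied from the first five rows above.
check = pd.DataFrame({
    "standard_value": [413.0, 2.0, 460.0, 9000.0, 5600.0],
    "pchembl_value": [6.38, 8.70, 6.34, 5.05, 5.25],
})

# Convert nM to M (multiply by 1e-9), then take the negative base-10 log.
check["p_computed"] = -np.log10(check["standard_value"] * 1e-9)

# pChEMBL is reported to two decimals, so allow up to 0.005 of rounding error.
check["matches"] = np.isclose(
    check["p_computed"], check["pchembl_value"], atol=0.005, rtol=0
)
print(check)
```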


We then save the raw bioactivity data to the CSV file **BACE1_01_bioactivity_data_raw.csv**.

In [None]:
df.to_csv('BACE1_01_bioactivity_data_raw.csv', index=False)

## **Handling missing data**
Drop any compound that has a missing value in the **standard_value** or **canonical_smiles** column.

In [None]:
# Keep rows with a standard_value, then (using df2's own mask) rows with a SMILES
df2 = df[df.standard_value.notna()]
df2 = df2[df2.canonical_smiles.notna()]
df2

  


Unnamed: 0,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,78857,[],CHEMBL653511,Inhibitory activity against Beta-secretase 1 w...,B,,,BAO_0000190,BAO_0000357,single protein format,CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@@H](N)CCC(=O...,,,CHEMBL1136466,Bioorg. Med. Chem. Lett.,2003,"{'bei': '6.39', 'le': '0.12', 'lle': '7.82', '...",CHEMBL406146,HGLUVALLEUPNSASPALAGLUPHEOH,CHEMBL406146,6.38,False,http://www.openphacts.org/units/Nanomolar,260010,=,1,True,=,,IC50,nM,,413.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,nM,UO_0000065,,413.0
1,,391560,[],CHEMBL653332,Compound was tested for its inhibitory activit...,B,,,BAO_0000190,BAO_0000357,single protein format,CC(C)C[C@H](NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N...,,,CHEMBL1144938,J. Med. Chem.,2003,"{'bei': '9.74', 'le': '0.19', 'lle': '10.44', ...",CHEMBL78946,,CHEMBL78946,8.70,True,http://www.openphacts.org/units/Nanomolar,274693,=,1,True,=,,IC50,nM,,2.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,0.002
2,,391983,[],CHEMBL653512,Inhibition of human Beta-secretase 1,B,,,BAO_0000190,BAO_0000357,single protein format,CCC(C)C[C@H](NC(=O)[C@H](CC(C)C)NC(C)=O)[C@@H]...,,,CHEMBL1147464,Bioorg. Med. Chem. Lett.,2004,"{'bei': '8.43', 'le': '0.17', 'lle': '3.99', '...",CHEMBL324109,,CHEMBL324109,6.34,False,http://www.openphacts.org/units/Nanomolar,219979,=,1,True,=,,IC50,nM,,460.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,0.46
3,,395858,[],CHEMBL653512,Inhibition of human Beta-secretase 1,B,,,BAO_0000190,BAO_0000357,single protein format,CC(=O)NCC(=O)N[C@@H](Cc1ccccc1)[C@@H](O)CC(=O)...,,,CHEMBL1147464,Bioorg. Med. Chem. Lett.,2004,"{'bei': '6.84', 'le': '0.13', 'lle': '3.09', '...",CHEMBL114147,,CHEMBL114147,5.05,False,http://www.openphacts.org/units/Nanomolar,219988,=,1,True,=,,IC50,nM,,9000.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,9.0
4,,395859,[],CHEMBL653512,Inhibition of human Beta-secretase 1,B,,,BAO_0000190,BAO_0000357,single protein format,CC(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](Cc1ccccc1...,,,CHEMBL1147464,Bioorg. Med. Chem. Lett.,2004,"{'bei': '6.34', 'le': '0.12', 'lle': '1.68', '...",CHEMBL419949,,CHEMBL419949,5.25,False,http://www.openphacts.org/units/Nanomolar,219982,=,1,True,=,,IC50,nM,,5600.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,5.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10151,,19482230,[],CHEMBL4480749,Inhibition of human BACE1 (1 to 460 residues) ...,B,,,BAO_0000190,BAO_0000219,cell-based format,CC(Cc1cc2ccccc2nc1N)C(=O)NC[C@@]12CCCO[C@@H]1C...,,,CHEMBL4480382,MedChemComm,2019,"{'bei': '11.65', 'le': '0.22', 'lle': '1.77', ...",CHEMBL4565226,,CHEMBL4565226,4.47,False,http://www.openphacts.org/units/Nanomolar,3257930,=,1,True,=,,IC50,nM,,34000.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,34.0
10152,,19482231,[],CHEMBL4480749,Inhibition of human BACE1 (1 to 460 residues) ...,B,,,BAO_0000190,BAO_0000219,cell-based format,Nc1nc2ccccc2cc1CCC(=O)N1CC[C@H]2OCCC[C@@]2(Cc2...,,,CHEMBL4480382,MedChemComm,2019,"{'bei': '9.49', 'le': '0.17', 'lle': '-0.31', ...",CHEMBL4520156,,CHEMBL4520156,4.08,False,http://www.openphacts.org/units/Nanomolar,3257925,=,1,True,=,,IC50,nM,,84000.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,84.0
10153,,19482232,[],CHEMBL4480749,Inhibition of human BACE1 (1 to 460 residues) ...,B,,,BAO_0000190,BAO_0000219,cell-based format,Nc1nc2ccccc2cc1CCC(=O)NC[C@@]12CCCO[C@@H]1CCOC2,,,CHEMBL4480382,MedChemComm,2019,"{'bei': '12.28', 'le': '0.23', 'lle': '2.09', ...",CHEMBL4585673,,CHEMBL4585673,4.54,False,http://www.openphacts.org/units/Nanomolar,3257921,=,1,True,=,,IC50,nM,,29000.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,29.0
10154,,19482233,[],CHEMBL4480749,Inhibition of human BACE1 (1 to 460 residues) ...,B,,,BAO_0000190,BAO_0000219,cell-based format,COc1ccc2c(c1)[C@@H](O)[C@@]1(CCN(C(=O)CCc3cc4c...,,,CHEMBL4480382,MedChemComm,2019,"{'bei': '10.20', 'le': '0.19', 'lle': '0.99', ...",CHEMBL4546115,,CHEMBL4546115,4.26,False,http://www.openphacts.org/units/Nanomolar,3257920,=,1,True,=,,IC50,nM,,55000.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,55.0
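The two boolean-mask filters above can also be written as a single `DataFrame.dropna` call restricted to the two columns of interest. A small sketch on hypothetical toy data (the IDs `M1` to `M4` are placeholders, not ChEMBL IDs):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: M2 lacks a SMILES, M3 lacks a standard_value.
df = pd.DataFrame({
    "molecule_chembl_id": ["M1", "M2", "M3", "M4"],
    "canonical_smiles": ["CCO", None, "c1ccccc1", "CCN"],
    "standard_value": [413.0, 2.0, np.nan, 9000.0],
})

# Drop rows where either column is missing; other columns are not checked.
df2 = df.dropna(subset=["standard_value", "canonical_smiles"])
print(df2)
```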


In [None]:
len(df2.canonical_smiles.unique())

7062

In [None]:
df2_nr = df2.drop_duplicates(['canonical_smiles'])
df2_nr

Unnamed: 0,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,78857,[],CHEMBL653511,Inhibitory activity against Beta-secretase 1 w...,B,,,BAO_0000190,BAO_0000357,single protein format,CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@@H](N)CCC(=O...,,,CHEMBL1136466,Bioorg. Med. Chem. Lett.,2003,"{'bei': '6.39', 'le': '0.12', 'lle': '7.82', '...",CHEMBL406146,HGLUVALLEUPNSASPALAGLUPHEOH,CHEMBL406146,6.38,False,http://www.openphacts.org/units/Nanomolar,260010,=,1,True,=,,IC50,nM,,413.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,nM,UO_0000065,,413.0
1,,391560,[],CHEMBL653332,Compound was tested for its inhibitory activit...,B,,,BAO_0000190,BAO_0000357,single protein format,CC(C)C[C@H](NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N...,,,CHEMBL1144938,J. Med. Chem.,2003,"{'bei': '9.74', 'le': '0.19', 'lle': '10.44', ...",CHEMBL78946,,CHEMBL78946,8.70,True,http://www.openphacts.org/units/Nanomolar,274693,=,1,True,=,,IC50,nM,,2.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,0.002
2,,391983,[],CHEMBL653512,Inhibition of human Beta-secretase 1,B,,,BAO_0000190,BAO_0000357,single protein format,CCC(C)C[C@H](NC(=O)[C@H](CC(C)C)NC(C)=O)[C@@H]...,,,CHEMBL1147464,Bioorg. Med. Chem. Lett.,2004,"{'bei': '8.43', 'le': '0.17', 'lle': '3.99', '...",CHEMBL324109,,CHEMBL324109,6.34,False,http://www.openphacts.org/units/Nanomolar,219979,=,1,True,=,,IC50,nM,,460.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,0.46
3,,395858,[],CHEMBL653512,Inhibition of human Beta-secretase 1,B,,,BAO_0000190,BAO_0000357,single protein format,CC(=O)NCC(=O)N[C@@H](Cc1ccccc1)[C@@H](O)CC(=O)...,,,CHEMBL1147464,Bioorg. Med. Chem. Lett.,2004,"{'bei': '6.84', 'le': '0.13', 'lle': '3.09', '...",CHEMBL114147,,CHEMBL114147,5.05,False,http://www.openphacts.org/units/Nanomolar,219988,=,1,True,=,,IC50,nM,,9000.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,9.0
4,,395859,[],CHEMBL653512,Inhibition of human Beta-secretase 1,B,,,BAO_0000190,BAO_0000357,single protein format,CC(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](Cc1ccccc1...,,,CHEMBL1147464,Bioorg. Med. Chem. Lett.,2004,"{'bei': '6.34', 'le': '0.12', 'lle': '1.68', '...",CHEMBL419949,,CHEMBL419949,5.25,False,http://www.openphacts.org/units/Nanomolar,219982,=,1,True,=,,IC50,nM,,5600.0,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,IC50,uM,UO_0000065,,5.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10138,,19482217,[],CHEMBL4480749,Inhibition of human BACE1 (1 to 460 residues) ...,B,,,BAO_0000190,BAO_0000219,cell-based format,CC(Cc1cc2ccccc2nc1N)C(=O)NC[C@@]12CCCO[C@@H]1C...,,,CHEMBL4480382,MedChemComm,2019,"{'bei': '11.68', 'le': '0.22', 'lle': '1.78', ...",CHEMBL4565226,,CHEMBL4565226,4.48,False,http://www.openphacts.org/units/Nanomolar,3257930,=,1,True,=,,IC50,nM,,33113.11,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,pIC50,,UO_0000065,,4.48
10139,,19482218,[],CHEMBL4480749,Inhibition of human BACE1 (1 to 460 residues) ...,B,,,BAO_0000190,BAO_0000219,cell-based format,Nc1nc2ccccc2cc1CCC(=O)N1CC[C@H]2OCCC[C@@]2(Cc2...,,,CHEMBL4480382,MedChemComm,2019,"{'bei': '9.47', 'le': '0.17', 'lle': '-0.32', ...",CHEMBL4520156,,CHEMBL4520156,4.07,False,http://www.openphacts.org/units/Nanomolar,3257925,=,1,True,=,,IC50,nM,,85113.8,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,pIC50,,UO_0000065,,4.07
10140,,19482219,[],CHEMBL4480749,Inhibition of human BACE1 (1 to 460 residues) ...,B,,,BAO_0000190,BAO_0000219,cell-based format,Nc1nc2ccccc2cc1CCC(=O)NC[C@@]12CCCO[C@@H]1CCOC2,,,CHEMBL4480382,MedChemComm,2019,"{'bei': '12.29', 'le': '0.23', 'lle': '2.09', ...",CHEMBL4585673,,CHEMBL4585673,4.54,False,http://www.openphacts.org/units/Nanomolar,3257921,=,1,True,=,,IC50,nM,,28840.32,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,pIC50,,UO_0000065,,4.54
10141,,19482220,[],CHEMBL4480749,Inhibition of human BACE1 (1 to 460 residues) ...,B,,,BAO_0000190,BAO_0000219,cell-based format,COc1ccc2c(c1)[C@@H](O)[C@@]1(CCN(C(=O)CCc3cc4c...,,,CHEMBL4480382,MedChemComm,2019,"{'bei': '10.20', 'le': '0.19', 'lle': '0.99', ...",CHEMBL4546115,,CHEMBL4546115,4.26,False,http://www.openphacts.org/units/Nanomolar,3257920,=,1,True,=,,IC50,nM,,54954.09,CHEMBL4822,Homo sapiens,Beta-secretase 1,9606,,,pIC50,,UO_0000065,,4.26
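`drop_duplicates(['canonical_smiles'])` keeps the first row for each SMILES (its default is `keep='first'`), so when a compound was measured more than once, only the first listed IC50 survives and the row count equals the number of unique SMILES. A sketch on hypothetical toy data showing this, plus one possible alternative (our suggestion, not part of this notebook's workflow) that summarizes repeated measurements by their median:

```python
import pandas as pd

# Hypothetical toy data: "CCO" appears twice with different IC50 values (nM).
df2 = pd.DataFrame({
    "canonical_smiles": ["CCO", "CCN", "CCO", "c1ccccc1"],
    "standard_value": [34000.0, 84000.0, 33113.11, 500.0],
})

# Default keep="first": the later "CCO" measurement (33113.11) is discarded.
df2_nr = df2.drop_duplicates(["canonical_smiles"])

# Rows left == number of unique SMILES.
assert len(df2_nr) == df2["canonical_smiles"].nunique()

# Alternative: keep every measurement and use the median IC50 per SMILES.
df2_median = df2.groupby("canonical_smiles", as_index=False)["standard_value"].median()
print(df2_nr)
print(df2_median)
```

Which rule is appropriate depends on how much the repeated measurements disagree; keep-first is simple but depends on row order.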


## **Data pre-processing of the bioactivity data**

### **Select the 3 columns (molecule_chembl_id, canonical_smiles, standard_value) into a DataFrame**

In [None]:
selection = ['molecule_chembl_id','canonical_smiles','standard_value']
df3 = df2_nr[selection]
df3

| | molecule_chembl_id | canonical_smiles | standard_value |
|---|---|---|---|
| 0 | CHEMBL406146 | `CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@@H](N)CCC(=O...` | 413.0 |
| 1 | CHEMBL78946 | `CC(C)C[C@H](NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N...` | 2.0 |
| 2 | CHEMBL324109 | `CCC(C)C[C@H](NC(=O)[C@H](CC(C)C)NC(C)=O)[C@@H]...` | 460.0 |
| 3 | CHEMBL114147 | `CC(=O)NCC(=O)N[C@@H](Cc1ccccc1)[C@@H](O)CC(=O)...` | 9000.0 |
| 4 | CHEMBL419949 | `CC(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](Cc1ccccc1...` | 5600.0 |
| ... | ... | ... | ... |
| 10138 | CHEMBL4565226 | `CC(Cc1cc2ccccc2nc1N)C(=O)NC[C@@]12CCCO[C@@H]1C...` | 33113.11 |
| 10139 | CHEMBL4520156 | `Nc1nc2ccccc2cc1CCC(=O)N1CC[C@H]2OCCC[C@@]2(Cc2...` | 85113.8 |
| 10140 | CHEMBL4585673 | `Nc1nc2ccccc2cc1CCC(=O)NC[C@@]12CCCO[C@@H]1CCOC2` | 28840.32 |
| 10141 | CHEMBL4546115 | `COc1ccc2c(c1)[C@@H](O)[C@@]1(CCN(C(=O)CCc3cc4c...` | 54954.09 |


Save the DataFrame to a CSV file.

In [None]:
df3.to_csv('BACE1_02_bioactivity_data_preprocessed.csv', index=False)

### **Labeling compounds as either being active, inactive or intermediate**
The bioactivity data are IC50 values in nM. Compounds with an IC50 of at most 1,000 nM are labeled **active**, those with an IC50 of at least 10,000 nM are labeled **inactive**, and those in between (1,000 to 10,000 nM) are labeled **intermediate**.

In [None]:
df4 = pd.read_csv('BACE1_02_bioactivity_data_preprocessed.csv')

In [None]:
bioactivity_threshold = []
for i in df4.standard_value:
  if float(i) >= 10000:
    bioactivity_threshold.append("inactive")
  elif float(i) <= 1000:
    bioactivity_threshold.append("active")
  else:
    bioactivity_threshold.append("intermediate")
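The loop above can also be written in vectorized form with `numpy.select`, which evaluates its conditions in order just like the `if`/`elif` chain, so the boundary values 1,000 and 10,000 nM receive the same labels. A sketch on toy IC50 values:

```python
import numpy as np
import pandas as pd

# Toy IC50 values in nM, including both boundary values.
df4 = pd.DataFrame({"standard_value": [2.0, 1000.0, 5600.0, 10000.0, 54954.09]})

values = df4["standard_value"].astype(float)
# First matching condition wins, mirroring the loop:
# >= 10000 nM -> inactive, <= 1000 nM -> active, otherwise intermediate.
df4["class"] = np.select(
    [values >= 10000, values <= 1000],
    ["inactive", "active"],
    default="intermediate",
)
print(df4)
```

Assigning the result as a column also avoids the separate `pd.concat` step, since the labels are built from the DataFrame's own rows.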

In [None]:
bioactivity_class = pd.Series(bioactivity_threshold, name='class')
df5 = pd.concat([df4, bioactivity_class], axis=1)
df5

| | molecule_chembl_id | canonical_smiles | standard_value | class |
|---|---|---|---|---|
| 0 | CHEMBL406146 | `CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@@H](N)CCC(=O...` | 413.00 | active |
| 1 | CHEMBL78946 | `CC(C)C[C@H](NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N...` | 2.00 | active |
| 2 | CHEMBL324109 | `CCC(C)C[C@H](NC(=O)[C@H](CC(C)C)NC(C)=O)[C@@H]...` | 460.00 | active |
| 3 | CHEMBL114147 | `CC(=O)NCC(=O)N[C@@H](Cc1ccccc1)[C@@H](O)CC(=O)...` | 9000.00 | intermediate |
| 4 | CHEMBL419949 | `CC(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](Cc1ccccc1...` | 5600.00 | intermediate |
| ... | ... | ... | ... | ... |
| 7057 | CHEMBL4565226 | `CC(Cc1cc2ccccc2nc1N)C(=O)NC[C@@]12CCCO[C@@H]1C...` | 33113.11 | inactive |
| 7058 | CHEMBL4520156 | `Nc1nc2ccccc2cc1CCC(=O)N1CC[C@H]2OCCC[C@@]2(Cc2...` | 85113.80 | inactive |
| 7059 | CHEMBL4585673 | `Nc1nc2ccccc2cc1CCC(=O)NC[C@@]12CCCO[C@@H]1CCOC2` | 28840.32 | inactive |
| 7060 | CHEMBL4546115 | `COc1ccc2c(c1)[C@@H](O)[C@@]1(CCN(C(=O)CCc3cc4c...` | 54954.09 | inactive |
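The notebook reads the preprocessed CSV back into `df4` before calling `pd.concat`. Whether or not that was the intent, the round-trip matters: after `drop_duplicates`, the index of `df3` has gaps (its last label is 10141), and `pd.concat(..., axis=1)` aligns rows by index label rather than by position. A sketch on toy data showing the misalignment, with `reset_index(drop=True)` as an in-memory alternative to the CSV round-trip:

```python
import pandas as pd

# Toy frame whose index has gaps, as df3 does after drop_duplicates.
df3 = pd.DataFrame(
    {"standard_value": [413.0, 9000.0, 54954.09]},
    index=[0, 3, 10141],
)
# A freshly built Series gets the default index 0, 1, 2.
labels = pd.Series(["active", "intermediate", "inactive"], name="class")

# Direct concat aligns on labels: the union {0, 1, 2, 3, 10141} gives 5 rows with NaNs.
misaligned = pd.concat([df3, labels], axis=1)

# Resetting the index first lines the rows up by position.
aligned = pd.concat([df3.reset_index(drop=True), labels], axis=1)
print(misaligned)
print(aligned)
```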


Save the DataFrame to a CSV file.

In [None]:
df5.to_csv('BACE1_03_bioactivity_data_curated.csv', index=False)

In [None]:
! zip BACE1.zip *.csv

  adding: BACE1_01_bioactivity_data_raw.csv (deflated 94%)
  adding: BACE1_02_bioactivity_data_preprocessed.csv (deflated 84%)
  adding: BACE1_03_bioactivity_data_curated.csv (deflated 85%)
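`! zip` relies on the `zip` command-line tool being installed, which it was in the environment used here. A rough, portable equivalent using Python's standard-library `zipfile` module, demonstrated in a temporary directory with placeholder CSV contents (the file contents are hypothetical; only the file names match this notebook):

```python
import tempfile
import zipfile
from pathlib import Path

# Temporary working directory with placeholder CSV files.
workdir = Path(tempfile.mkdtemp())
for name in ["BACE1_01_bioactivity_data_raw.csv",
             "BACE1_02_bioactivity_data_preprocessed.csv",
             "BACE1_03_bioactivity_data_curated.csv"]:
    (workdir / name).write_text("molecule_chembl_id,standard_value\nCHEMBL406146,413.0\n")

# Rough equivalent of `zip BACE1.zip *.csv`, using DEFLATE compression.
archive = workdir / "BACE1.zip"
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for csv_path in sorted(workdir.glob("*.csv")):
        # Store only the file name, not the temporary directory path.
        zf.write(csv_path, arcname=csv_path.name)

with zipfile.ZipFile(archive) as zf:
    print(zf.namelist())
```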


In [None]:
! ls -l



---