<a href="https://colab.research.google.com/github/TanmayeeKolli/Drug-Discovery-Model-Ovarian-Cancer/blob/main/Drug_Discovery_for_Ovarian_Cancer_Part_1_bioactivity_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Computational Drug Discovery [Part 1] Downloading Bioactivity Data**
*Tanmayee Kolli*

In this project, I collected data from a database, cleaned and processed it, and built a machine learning model using Random Forest regression to find promising drug candidates for ovarian cancer. I used my background in Python and online resources to analyze my own dataset of compounds.

Reference: [*'Data Professor' YouTube channel*](http://youtube.com/dataprofessor)

In Part 1, I obtained bioactivity data for compounds tested against my target protein.

In Part 2, I computed molecular descriptors of the compounds, which can be used to compare the compounds to each other and eliminate subpar compounds.

In Part 3, I prepared my dataset of molecular descriptors.

In Part 4, I built my machine learning model using the Random Forest algorithm.

In Part 5, I compared several machine learning algorithms to find the best model.

---

## **ChEMBL Database**

To obtain molecular data, I used the [*ChEMBL Database*](https://www.ebi.ac.uk/chembl/), which has bioactivity data for many compounds.

## **Installing libraries**

I installed the ChEMBL web service package so that I could retrieve bioactivity data from the ChEMBL Database.

In [1]:
! pip install chembl_webresource_client

Collecting chembl_webresource_client
  Downloading chembl_webresource_client-0.10.9-py3-none-any.whl.metadata (1.4 kB)
Collecting requests-cache~=1.2 (from chembl_webresource_client)
  Downloading requests_cache-1.2.1-py3-none-any.whl.metadata (9.9 kB)
Collecting cattrs>=22.2 (from requests-cache~=1.2->chembl_webresource_client)
  Downloading cattrs-24.1.1-py3-none-any.whl.metadata (8.4 kB)
Collecting url-normalize>=1.4 (from requests-cache~=1.2->chembl_webresource_client)
  Downloading url_normalize-1.4.3-py2.py3-none-any.whl.metadata (3.1 kB)
Downloading chembl_webresource_client-0.10.9-py3-none-any.whl (55 kB)
Downloading requests_cache-1.2.1-py3-none-any.whl (61 kB)
Downloading cattrs-24.1.1-py3-none-any.whl (66 kB)

## **Importing libraries**

In [2]:
# Import necessary libraries
import pandas as pd
from chembl_webresource_client.new_client import new_client

## **Search for Target protein**

### **Target search for PARP**
PARP (poly [ADP-ribose] polymerase) is a protein that helps repair damaged DNA. PARP inhibitors are a form of therapy that stops PARP from repairing DNA damage in cancer cells. I used the ChEMBL database to find PARP targets and the compounds tested against them, which are potential candidates for PARP inhibition.

In [29]:
# Target search for PARP targets
target = new_client.target
target_query = target.search('PARP')
targets = pd.DataFrame.from_dict(target_query) #creating a data frame from our results after searching for 'PARP'
targets

Unnamed: 0,cross_references,organism,pref_name,score,species_group_flag,target_chembl_id,target_components,target_type,tax_id
0,[],Homo sapiens,"PARP 1, 2 and 3",12.0,False,CHEMBL3390820,"[{'accession': 'P09874', 'component_descriptio...",PROTEIN FAMILY,9606
1,"[{'xref_id': 'PARP4', 'xref_name': None, 'xref...",Homo sapiens,Poly [ADP-ribose] polymerase 4,11.0,False,CHEMBL6142,"[{'accession': 'Q9UKK3', 'component_descriptio...",SINGLE PROTEIN,9606
2,"[{'xref_id': 'P18493', 'xref_name': None, 'xre...",Bos taurus,Poly [ADP-ribose] polymerase 1,10.0,False,CHEMBL5691,"[{'accession': 'P18493', 'component_descriptio...",SINGLE PROTEIN,9913
3,[],Cricetulus griseus,Poly [ADP-ribose] polymerase 1,10.0,False,CHEMBL2321638,"[{'accession': 'Q9R152', 'component_descriptio...",SINGLE PROTEIN,10029
4,[],Homo sapiens,Poly [ADP-ribose] polymerase 6,10.0,False,CHEMBL2380187,"[{'accession': 'Q2NL67', 'component_descriptio...",SINGLE PROTEIN,9606
5,[],Homo sapiens,TCDD-inducible poly [ADP-ribose] polymerase,10.0,False,CHEMBL2380188,"[{'accession': 'Q7Z3E1', 'component_descriptio...",SINGLE PROTEIN,9606
6,[],Homo sapiens,Poly [ADP-ribose] polymerase 11,10.0,False,CHEMBL2380189,"[{'accession': 'Q9NR21', 'component_descriptio...",SINGLE PROTEIN,9606
7,[],Homo sapiens,Poly [ADP-ribose] polymerase 10,10.0,False,CHEMBL2429708,"[{'accession': 'Q53GL7', 'component_descriptio...",SINGLE PROTEIN,9606
8,[],Homo sapiens,Poly [ADP-ribose] polymerase 12,10.0,False,CHEMBL2429709,"[{'accession': 'Q9H0J9', 'component_descriptio...",SINGLE PROTEIN,9606
9,[],Homo sapiens,Poly [ADP-ribose] polymerase 8,10.0,False,CHEMBL3091262,"[{'accession': 'Q8N3A8', 'component_descriptio...",SINGLE PROTEIN,9606


### **Select and retrieve bioactivity data for *Poly [ADP-ribose] polymerase 2* (row index 22)**
Poly [ADP-ribose] polymerase 2 (PARP 2) plays a key role in DNA repair.

I assigned the entry at row index 22 of the search results (which corresponds to the target protein, *PARP 2*) to the ***selected_target*** variable. The table above shows only rows 0 to 9 of the results.

In [4]:
selected_target = targets.target_chembl_id[22]
selected_target

'CHEMBL5366'
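
Selecting a target by a hard-coded row position depends on the order in which the search results come back. As a minimal sketch of a more explicit alternative, the hypothetical helper `select_target` below picks the target by its preferred name and organism. The sample records reuse names and IDs that appear in the outputs of this notebook; the helper itself is my own illustration and is not part of the ChEMBL client.

```python
def select_target(records, pref_name, organism="Homo sapiens"):
    """Return the target_chembl_id of the single record matching name and organism.

    `records` is a list of dicts with 'pref_name', 'organism' and
    'target_chembl_id' keys (hypothetical helper, for illustration only).
    """
    matches = [
        r["target_chembl_id"]
        for r in records
        if r.get("pref_name") == pref_name and r.get("organism") == organism
    ]
    # Fail loudly instead of silently picking the wrong target.
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match, found {len(matches)}")
    return matches[0]


# Sample records built from values shown in this notebook's outputs.
sample_records = [
    {"pref_name": "Poly [ADP-ribose] polymerase 4",
     "organism": "Homo sapiens", "target_chembl_id": "CHEMBL6142"},
    {"pref_name": "Poly [ADP-ribose] polymerase 1",
     "organism": "Bos taurus", "target_chembl_id": "CHEMBL5691"},
    {"pref_name": "Poly [ADP-ribose] polymerase 2",
     "organism": "Homo sapiens", "target_chembl_id": "CHEMBL5366"},
]

selected = select_target(sample_records, "Poly [ADP-ribose] polymerase 2")
print(selected)
```

In the notebook itself, `targets.to_dict("records")` would give a list of records in this shape.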

Here, I retrieved bioactivity data for *PARP 2* (CHEMBL5366), reported as IC$_{50}$ values in nanomolar (nM) units.

When defining the variable 'res' below, I retrieved the activity records for the selected_target, which is PARP 2, and filtered for the standard type IC50, the half-maximal inhibitory concentration. This is the concentration of a compound required to inhibit the activity of the target protein, PARP 2, by 50%.

In [30]:
activity = new_client.activity
# Retrieve the activity records for the selected target (PARP 2), keeping only IC50 measurements.
# IC50 is the concentration of a compound required to inhibit the target's activity by 50%.
res = activity.filter(target_chembl_id=selected_target).filter(standard_type="IC50")
pd.options.display.max_columns = None
pd.options.display.max_rows = None



In [6]:
df = pd.DataFrame.from_dict(res)

In [32]:
df.head(5)

Unnamed: 0,action_type,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,,2370214,[],CHEMBL982812,Inhibition of recombinant PARP2 by flashplate ...,B,,,BAO_0000190,BAO_0000357,single protein format,O=C(c1cc(Cc2n[nH]c(=O)c3ccccc23)ccc1F)N1CCN(C(...,,,CHEMBL1140140,J Med Chem,2008,"{'bei': '20.71', 'le': '0.38', 'lle': '6.65', ...",CHEMBL521686,OLAPARIB,CHEMBL521686,9.0,0,http://www.openphacts.org/units/Nanomolar,746511,=,1,1,=,,IC50,nM,,1.0,CHEMBL5366,Homo sapiens,Poly [ADP-ribose] polymerase 2,9606,,,IC50,uM,UO_0000065,,0.001
1,,,2709345,[],CHEMBL1060806,Inhibition of human PARP2 expressed in baculov...,B,,,BAO_0000190,BAO_0000019,assay format,O=C1NCCc2c1ccc1[nH]cc(CCNC(=O)N3CCNCC3)c21,,,CHEMBL1153049,Bioorg Med Chem Lett,2009,"{'bei': '21.95', 'le': '0.41', 'lle': '6.88', ...",CHEMBL558845,,CHEMBL558845,7.5,0,http://www.openphacts.org/units/Nanomolar,826696,=,1,1,=,,IC50,nM,,32.0,CHEMBL5366,Homo sapiens,Poly [ADP-ribose] polymerase 2,9606,,,IC50,nM,UO_0000065,,32.0
2,,,2709346,[],CHEMBL1060806,Inhibition of human PARP2 expressed in baculov...,B,,,BAO_0000190,BAO_0000019,assay format,O=C1NCCc2c1ccc1[nH]cc(CCNC(=O)N3CCCNCC3)c21,,,CHEMBL1153049,Bioorg Med Chem Lett,2009,"{'bei': '21.86', 'le': '0.41', 'lle': '6.77', ...",CHEMBL560790,,CHEMBL560790,7.77,0,http://www.openphacts.org/units/Nanomolar,826697,=,1,1,=,,IC50,nM,,17.0,CHEMBL5366,Homo sapiens,Poly [ADP-ribose] polymerase 2,9606,,,IC50,nM,UO_0000065,,17.0
3,,,3084121,[],CHEMBL1074654,Inhibition of PARP2,B,,,BAO_0000190,BAO_0000357,single protein format,O=C1NCC2c3c(cccc31)CCN2C(=O)Cc1cccnc1,,,CHEMBL1153250,Bioorg Med Chem Lett,2010,"{'bei': '29.28', 'le': '0.53', 'lle': '7.51', ...",CHEMBL595018,,CHEMBL595018,9.0,0,http://www.openphacts.org/units/Nanomolar,873989,=,1,1,=,,IC50,nM,,1.0,CHEMBL5366,Homo sapiens,Poly [ADP-ribose] polymerase 2,9606,,,IC50,nM,UO_0000065,,1.0
4,,,3084122,[],CHEMBL1074654,Inhibition of PARP2,B,,,BAO_0000190,BAO_0000357,single protein format,O=C1NCC2c3c(cccc31)CCN2C(=O)CCc1cnccn1,,,CHEMBL1153250,Bioorg Med Chem Lett,2010,"{'bei': '27.92', 'le': '0.51', 'lle': '7.72', ...",CHEMBL609002,,CHEMBL609002,9.0,0,http://www.openphacts.org/units/Nanomolar,873992,=,1,1,=,,IC50,nM,,1.0,CHEMBL5366,Homo sapiens,Poly [ADP-ribose] polymerase 2,9606,,,IC50,nM,UO_0000065,,1.0


In [33]:
df.standard_type.unique() #this shows our only standard_type is IC50, which we filtered for above when defining res

array(['IC50'], dtype=object)

The column labeled "standard_value" represents the potency of each compound. It holds the IC50, the concentration needed to inhibit the target's activity by 50%, so the smaller the value, the more potent the compound.
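
Because lower IC50 means higher potency, potency is often expressed on a negative log scale (pIC50), where larger numbers mean more potent compounds. The sketch below converts an IC50 in nM to pIC50, assuming the usual definition pIC50 = -log10(IC50 in mol/L). The function name `ic50_nm_to_pic50` is hypothetical, and this conversion is not performed in this part of the project.

```python
import math


def ic50_nm_to_pic50(ic50_nm):
    """Convert an IC50 given in nM to pIC50 = -log10(IC50 in mol/L)."""
    value = float(ic50_nm)
    if value <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # 1 nM = 1e-9 mol/L, so -log10(value * 1e-9) = 9 - log10(value).
    return 9.0 - math.log10(value)


# standard_value entries taken from the first rows shown in this notebook.
for ic50 in ["1.0", "17.0", "500.0"]:
    print(ic50, "nM ->", round(ic50_nm_to_pic50(ic50), 2))
```

For the rows with 1.0, 17.0 and 500.0 nM, the rounded results agree with the `pchembl_value` column shown in the output above (9.0, 7.77 and 6.3).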

Next, I saved the resulting bioactivity data to the CSV file **bioactivity_data.csv**.

In [9]:
df.to_csv('bioactivity_data.csv', index=False)

## **Copying files to Google Drive**

To organize my files, I connected Colab to my Google Drive so that I could store and save files there.

In [10]:
from google.colab import drive
drive.mount('/content/gdrive/', force_remount=True)


Mounted at /content/gdrive/


Creating a "data" folder in Google Drive (the `File exists` message below means that the path already exists).

In [11]:
! mkdir "/content/gdrive/My Drive/Colab Notebooks/data"

mkdir: cannot create directory ‘/content/gdrive/My Drive/Colab Notebooks/data’: File exists


In [12]:
! cp bioactivity_data.csv "/content/gdrive/My Drive/Colab Notebooks/data"

In [13]:
! ls -l "/content/gdrive/My Drive/Colab Notebooks/data"

-rw------- 1 root root 631490 Sep 17 03:09 '/content/gdrive/My Drive/Colab Notebooks/data'
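
The create-and-copy steps above can also be done from Python in a way that does not complain when the folder already exists. This is a minimal sketch using only the standard library; `copy_into_folder` is a hypothetical helper, and the demo runs in a temporary directory rather than on Google Drive.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into_folder(src, dest_dir):
    """Create dest_dir if needed, copy src into it, and return the new file path."""
    dest = Path(dest_dir)
    # exist_ok=True makes this a no-op when the folder is already there.
    # It still raises FileExistsError if a regular file occupies that path.
    dest.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest))


# Demo in a throwaway directory (stand-in for the Drive path used above).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "bioactivity_data.csv"
    src.write_text("molecule_chembl_id,standard_value\n")
    copied = copy_into_folder(src, Path(tmp) / "data")
    copy_into_folder(src, Path(tmp) / "data")  # running twice does not fail
    copied_name = copied.name
    copied_ok = copied.is_file()

print(copied_name, copied_ok)
```

The `ls -l` output above lists `data` as a single regular file rather than a folder, which suggests the CSV may have been copied to a file named `data`. In that situation `mkdir(exist_ok=True)` would raise `FileExistsError`, making the problem visible.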


Checking the CSV files I have saved so far.

In [14]:
! ls

bioactivity_data.csv  gdrive  sample_data


## **Handling missing data**
I dropped compounds that have a missing value in the **standard_value** column.

In [16]:
df2 = df[df.standard_value.notna()]
df2

Unnamed: 0,action_type,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,,2370214,[],CHEMBL982812,Inhibition of recombinant PARP2 by flashplate ...,B,,,BAO_0000190,BAO_0000357,single protein format,O=C(c1cc(Cc2n[nH]c(=O)c3ccccc23)ccc1F)N1CCN(C(...,,,CHEMBL1140140,J Med Chem,2008,"{'bei': '20.71', 'le': '0.38', 'lle': '6.65', ...",CHEMBL521686,OLAPARIB,CHEMBL521686,9.0,0,http://www.openphacts.org/units/Nanomolar,746511,=,1,1,=,,IC50,nM,,1.0,CHEMBL5366,Homo sapiens,Poly [ADP-ribose] polymerase 2,9606,,,IC50,uM,UO_0000065,,0.001
1,,,2709345,[],CHEMBL1060806,Inhibition of human PARP2 expressed in baculov...,B,,,BAO_0000190,BAO_0000019,assay format,O=C1NCCc2c1ccc1[nH]cc(CCNC(=O)N3CCNCC3)c21,,,CHEMBL1153049,Bioorg Med Chem Lett,2009,"{'bei': '21.95', 'le': '0.41', 'lle': '6.88', ...",CHEMBL558845,,CHEMBL558845,7.5,0,http://www.openphacts.org/units/Nanomolar,826696,=,1,1,=,,IC50,nM,,32.0,CHEMBL5366,Homo sapiens,Poly [ADP-ribose] polymerase 2,9606,,,IC50,nM,UO_0000065,,32.0
2,,,2709346,[],CHEMBL1060806,Inhibition of human PARP2 expressed in baculov...,B,,,BAO_0000190,BAO_0000019,assay format,O=C1NCCc2c1ccc1[nH]cc(CCNC(=O)N3CCCNCC3)c21,,,CHEMBL1153049,Bioorg Med Chem Lett,2009,"{'bei': '21.86', 'le': '0.41', 'lle': '6.77', ...",CHEMBL560790,,CHEMBL560790,7.77,0,http://www.openphacts.org/units/Nanomolar,826697,=,1,1,=,,IC50,nM,,17.0,CHEMBL5366,Homo sapiens,Poly [ADP-ribose] polymerase 2,9606,,,IC50,nM,UO_0000065,,17.0
3,,,3084121,[],CHEMBL1074654,Inhibition of PARP2,B,,,BAO_0000190,BAO_0000357,single protein format,O=C1NCC2c3c(cccc31)CCN2C(=O)Cc1cccnc1,,,CHEMBL1153250,Bioorg Med Chem Lett,2010,"{'bei': '29.28', 'le': '0.53', 'lle': '7.51', ...",CHEMBL595018,,CHEMBL595018,9.0,0,http://www.openphacts.org/units/Nanomolar,873989,=,1,1,=,,IC50,nM,,1.0,CHEMBL5366,Homo sapiens,Poly [ADP-ribose] polymerase 2,9606,,,IC50,nM,UO_0000065,,1.0
4,,,3084122,[],CHEMBL1074654,Inhibition of PARP2,B,,,BAO_0000190,BAO_0000357,single protein format,O=C1NCC2c3c(cccc31)CCN2C(=O)CCc1cnccn1,,,CHEMBL1153250,Bioorg Med Chem Lett,2010,"{'bei': '27.92', 'le': '0.51', 'lle': '7.72', ...",CHEMBL609002,,CHEMBL609002,9.0,0,http://www.openphacts.org/units/Nanomolar,873992,=,1,1,=,,IC50,nM,,1.0,CHEMBL5366,Homo sapiens,Poly [ADP-ribose] polymerase 2,9606,,,IC50,nM,UO_0000065,,1.0
5,,,3270465,[],CHEMBL1115413,Inhibition of human PARP2 by trichloroacetic a...,B,,,BAO_0000190,BAO_0000357,single protein format,NC(=O)c1cccc2cn(-c3ccc([C@@H]4CCCNC4)cc3)nc12,,,CHEMBL1157008,J Med Chem,2009,"{'bei': '27.08', 'le': '0.49', 'lle': '6.09', ...",CHEMBL1094636,NIRAPARIB,CHEMBL1094636,8.68,0,http://www.openphacts.org/units/Nanomolar,905675,=,1,1,=,,IC50,nM,,2.1,CHEMBL5366,Homo sapiens,Poly [ADP-ribose] polymerase 2,9606,,,IC50,nM,UO_0000065,,2.1
6,,,3367434,[],CHEMBL1176209,Inhibition of PARP2,B,,,BAO_0000190,BAO_0000357,single protein format,O=c1[nH]c(CCCN2CC=C(c3ccc(F)cc3)CC2)nc2c(Cl)cc...,,,CHEMBL1177697,J Med Chem,2010,"{'bei': '15.84', 'le': '0.31', 'lle': '1.86', ...",CHEMBL251030,,CHEMBL251030,6.3,0,http://www.openphacts.org/units/Nanomolar,924964,=,1,1,=,,IC50,nM,,500.0,CHEMBL5366,Homo sapiens,Poly [ADP-ribose] polymerase 2,9606,,,IC50,nM,UO_0000065,,500.0
7,,,3367435,[],CHEMBL1176209,Inhibition of PARP2,B,,,BAO_0000190,BAO_0000357,single protein format,N#Cc1ccc(-c2cnc3cccc(C(N)=O)c3n2)cc1,,,CHEMBL1177697,J Med Chem,2010,"{'bei': '29.52', 'le': '0.53', 'lle': '5.83', ...",CHEMBL481603,,CHEMBL481603,8.1,0,http://www.openphacts.org/units/Nanomolar,924966,=,1,1,=,,IC50,nM,,8.0,CHEMBL5366,Homo sapiens,Poly [ADP-ribose] polymerase 2,9606,,,IC50,nM,UO_0000065,,8.0
8,,,10859401,[],CHEMBL2020830,Inhibition of PARP2 autoPARsylation measuring ...,B,,,BAO_0000190,BAO_0000357,single protein format,COc1ccc(-n2c(SCc3nc(-c4ccc(C)cc4)no3)nnc2-c2cc...,,,CHEMBL2016557,J Med Chem,2012,,CHEMBL1552719,,CHEMBL1552719,,0,http://www.openphacts.org/units/Nanomolar,1634061,>,1,1,>,,IC50,nM,,19000.0,CHEMBL5366,Homo sapiens,Poly [ADP-ribose] polymerase 2,9606,,,IC50,uM,UO_0000065,,19.0
9,,,10859410,[],CHEMBL2020830,Inhibition of PARP2 autoPARsylation measuring ...,B,,,BAO_0000190,BAO_0000357,single protein format,Oc1nc(-c2ccc(C(F)(F)F)cc2)nc2c1CSCC2,,,CHEMBL2016557,J Med Chem,2012,"{'bei': '22.01', 'le': '0.45', 'lle': '3.21', ...",CHEMBL1086580,,CHEMBL1086580,6.87,0,http://www.openphacts.org/units/Nanomolar,1634058,=,1,1,=,,IC50,nM,,134.0,CHEMBL5366,Homo sapiens,Poly [ADP-ribose] polymerase 2,9606,,,IC50,uM,UO_0000065,,0.134


## **Data pre-processing of the bioactivity data**

### **Labeling compounds as either being active, inactive or intermediate**
The bioactivity data are IC50 values in nM. Compounds with values of 1,000 nM or less are considered **active**, while those with values of 10,000 nM or more are considered **inactive**. Values between 1,000 and 10,000 nM are labeled **intermediate**.

In [17]:
bioactivity_class = []
for i in df2.standard_value:
  if float(i) >= 10000:
    bioactivity_class.append("inactive")
  elif float(i) <= 1000:
    bioactivity_class.append("active")
  else:
    bioactivity_class.append("intermediate")
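
The same thresholds can be wrapped in a small function, which makes the boundary cases easy to check. This is a sketch that mirrors the loop above; the function name `classify_ic50` and its keyword arguments are my own naming.

```python
def classify_ic50(standard_value_nm, active_max=1000.0, inactive_min=10000.0):
    """Label an IC50 value (in nM) as active, inactive or intermediate.

    Mirrors the loop above: >= 10000 nM is inactive, <= 1000 nM is active,
    and anything strictly in between is intermediate.
    """
    value = float(standard_value_nm)  # values may arrive as strings
    if value >= inactive_min:
        return "inactive"
    if value <= active_max:
        return "active"
    return "intermediate"


# Values at and around the two boundaries.
for v in ["1.0", 1000, 1000.1, 9999.9, 10000, "19000.0"]:
    print(v, "->", classify_ic50(v))
```

With the function in place, the list could be built as `[classify_ic50(v) for v in df2.standard_value]`.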

### **Iterate *molecule_chembl_id* to a list**

Each molecule listed under molecule_chembl_id has a reported IC50 measurement against PARP 2, our protein of interest.


In [18]:
mol_cid = []
for i in df2.molecule_chembl_id:
  mol_cid.append(i)

mol_cid

['CHEMBL521686',
 'CHEMBL558845',
 'CHEMBL560790',
 'CHEMBL595018',
 'CHEMBL609002',
 'CHEMBL1094636',
 'CHEMBL251030',
 'CHEMBL481603',
 'CHEMBL1552719',
 'CHEMBL1086580',
 'CHEMBL1867804',
 'CHEMBL1898239',
 'CHEMBL1086580',
 'CHEMBL562310',
 'CHEMBL562310',
 'CHEMBL2314698',
 'CHEMBL1086580',
 'CHEMBL562310',
 'CHEMBL1086580',
 'CHEMBL2381633',
 'CHEMBL2381946',
 'CHEMBL2381945',
 'CHEMBL2381936',
 'CHEMBL2381958',
 'CHEMBL562310',
 'CHEMBL2419706',
 'CHEMBL2419697',
 'CHEMBL2419716',
 'CHEMBL2419712',
 'CHEMBL2419889',
 'CHEMBL1086580',
 'CHEMBL2419703',
 'CHEMBL2419702',
 'CHEMBL2419701',
 'CHEMBL2419700',
 'CHEMBL2419699',
 'CHEMBL2419698',
 'CHEMBL2419694',
 'CHEMBL2425813',
 'CHEMBL2425812',
 'CHEMBL2425810',
 'CHEMBL2425790',
 'CHEMBL1594868',
 'CHEMBL1086580',
 'CHEMBL2431805',
 'CHEMBL2431867',
 'CHEMBL16861',
 'CHEMBL64780',
 'CHEMBL2431805',
 'CHEMBL2431867',
 'CHEMBL16861',
 'CHEMBL64780',
 'CHEMBL3099718',
 'CHEMBL3099716',
 'CHEMBL3099720',
 'CHEMBL3110100',
 'CHEMBL311
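
Some IDs, such as CHEMBL1086580 and CHEMBL562310, appear more than once in the list above, since each row is one activity record and a compound can have several. The sketch below counts repeated IDs with the standard library; `repeated_ids` is a hypothetical helper and the sample is a short excerpt of the IDs printed above.

```python
from collections import Counter


def repeated_ids(ids):
    """Return a dict of IDs that occur more than once, with their counts."""
    counts = Counter(ids)
    return {cid: n for cid, n in counts.items() if n > 1}


# Short excerpt of the IDs printed above (not the full list).
sample_ids = [
    "CHEMBL521686",
    "CHEMBL1086580",
    "CHEMBL562310",
    "CHEMBL1086580",
    "CHEMBL562310",
]

print(repeated_ids(sample_ids))
```

Running `repeated_ids(mol_cid)` on the full list would show how many compounds have multiple measurements, which may matter when the data are used for modeling later.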

### **Iterate *canonical_smiles* to a list**

The canonical SMILES strings encode the chemical structure of each compound as a line of text.

In [19]:
canonical_smiles = []
for i in df2.canonical_smiles:
  canonical_smiles.append(i)

### **Iterate *standard_value* to a list**

In [20]:
standard_value = []
for i in df2.standard_value:
  standard_value.append(i)

### **Combine the 4 lists into a dataframe**

In [21]:
data_tuples = list(zip(mol_cid, canonical_smiles, bioactivity_class, standard_value))
df3 = pd.DataFrame( data_tuples,  columns=['molecule_chembl_id', 'canonical_smiles', 'bioactivity_class', 'standard_value'])
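
The three loops and the `zip` above can also be expressed as a column selection on the dataframe. The sketch below is one possible alternative, not the approach used in this notebook. `build_preprocessed` is a hypothetical helper, the sample frame reuses two rows shown in the outputs, and the rows labeled HYPOTHETICAL are made up for illustration.

```python
import pandas as pd


def build_preprocessed(df):
    """Select the ID, SMILES and IC50 columns and add a bioactivity_class column."""
    cols = ["molecule_chembl_id", "canonical_smiles", "standard_value"]
    # Drop rows without an IC50 value, as in the missing-data step.
    out = df.loc[df["standard_value"].notna(), cols].copy()
    values = out["standard_value"].astype(float)
    # Same thresholds as the classification loop.
    out["bioactivity_class"] = "intermediate"
    out.loc[values <= 1000, "bioactivity_class"] = "active"
    out.loc[values >= 10000, "bioactivity_class"] = "inactive"
    ordered = ["molecule_chembl_id", "canonical_smiles",
               "bioactivity_class", "standard_value"]
    return out[ordered].reset_index(drop=True)


sample = pd.DataFrame({
    "molecule_chembl_id": ["CHEMBL595018", "CHEMBL1086580",
                           "HYPOTHETICAL_1", "HYPOTHETICAL_2"],
    "canonical_smiles": ["O=C1NCC2c3c(cccc31)CCN2C(=O)Cc1cccnc1",
                         "Oc1nc(-c2ccc(C(F)(F)F)cc2)nc2c1CSCC2",
                         "CCO", "CCO"],
    # HYPOTHETICAL_1 has no value and is dropped; HYPOTHETICAL_2 is inactive.
    "standard_value": ["1.0", "134.0", None, "19000.0"],
})

result = build_preprocessed(sample)
print(result)
```

Working on columns keeps the four fields aligned by row, so there is no risk of the separate lists getting out of step.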

In [22]:
df3

Unnamed: 0,molecule_chembl_id,canonical_smiles,bioactivity_class,standard_value
0,CHEMBL521686,O=C(c1cc(Cc2n[nH]c(=O)c3ccccc23)ccc1F)N1CCN(C(...,active,1.0
1,CHEMBL558845,O=C1NCCc2c1ccc1[nH]cc(CCNC(=O)N3CCNCC3)c21,active,32.0
2,CHEMBL560790,O=C1NCCc2c1ccc1[nH]cc(CCNC(=O)N3CCCNCC3)c21,active,17.0
3,CHEMBL595018,O=C1NCC2c3c(cccc31)CCN2C(=O)Cc1cccnc1,active,1.0
4,CHEMBL609002,O=C1NCC2c3c(cccc31)CCN2C(=O)CCc1cnccn1,active,1.0
5,CHEMBL1094636,NC(=O)c1cccc2cn(-c3ccc([C@@H]4CCCNC4)cc3)nc12,active,2.1
6,CHEMBL251030,O=c1[nH]c(CCCN2CC=C(c3ccc(F)cc3)CC2)nc2c(Cl)cc...,active,500.0
7,CHEMBL481603,N#Cc1ccc(-c2cnc3cccc(C(N)=O)c3n2)cc1,active,8.0
8,CHEMBL1552719,COc1ccc(-n2c(SCc3nc(-c4ccc(C)cc4)no3)nnc2-c2cc...,inactive,19000.0
9,CHEMBL1086580,Oc1nc(-c2ccc(C(F)(F)F)cc2)nc2c1CSCC2,active,134.0


Saving the dataframe to a CSV file

In [34]:
df3.to_csv('bioactivity_preprocessed_data.csv', index=False)

Copying to Google Drive

In [35]:
! cp bioactivity_preprocessed_data.csv "/content/gdrive/My Drive/Colab Notebooks/Drug Discovery Ovarian Cancer project/data"

In [28]:
! ls "/content/gdrive/My Drive/Colab Notebooks/Drug Discovery Ovarian Cancer project/data"

bioactivity_data.csv  bioactivity_preprocessed_data.csv


---