# **DRUG DISCOVERY Using Machine Learning**

# **Installing libraries**

*Installing the ChEMBL web service package so that we can retrieve bioactivity data from the ChEMBL Database.*

---



In [None]:
! pip install chembl_webresource_client

Collecting chembl_webresource_client
  Downloading chembl_webresource_client-0.10.9-py3-none-any.whl.metadata (1.4 kB)
Collecting requests-cache~=1.2 (from chembl_webresource_client)
  Downloading requests_cache-1.2.1-py3-none-any.whl.metadata (9.9 kB)
Collecting cattrs>=22.2 (from requests-cache~=1.2->chembl_webresource_client)
  Downloading cattrs-24.1.2-py3-none-any.whl.metadata (8.4 kB)
Collecting url-normalize>=1.4 (from requests-cache~=1.2->chembl_webresource_client)
  Downloading url_normalize-1.4.3-py2.py3-none-any.whl.metadata (3.1 kB)
Downloading chembl_webresource_client-0.10.9-py3-none-any.whl (55 kB)
Downloading requests_cache-1.2.1-py3-none-any.whl (61 kB)
Downloading cattrs-24.1.2-py3-none-any.whl (66 kB)



---

*Running this command installs the chembl_webresource_client library so that you can use it in your Python programs to access and retrieve data from the ChEMBL database, which is helpful for projects like drug discovery or analyzing bioactivity data.*

---



## **Importing libraries**

In [None]:
# Import necessary libraries
import pandas as pd
from chembl_webresource_client.new_client import new_client

*We import the pandas library to work with tabular data, and the new_client object from the chembl_webresource_client.new_client module.*

*new_client is an interface for interacting with the ChEMBL database, which contains bioactivity and drug-like molecule data. Using this, we can easily query the database for targets, compounds, or activities without having to directly interact with the underlying API.*

# **Search for Target protein**

### **Target search for coronavirus**

In [None]:
# Target search for coronavirus
target = new_client.target # Accessing the target endpoint in the ChEMBL client. This endpoint provides functionality to query and retrieve data about biological targets.
target_query = target.search('coronavirus') # Performs a search for targets related to coronavirus in the ChEMBL database. Here coronavirus is the search keyword
targets = pd.DataFrame.from_dict(target_query) # Convert the results into a Pandas DataFrame for easier analysis.
targets # Display the DataFrame

Unnamed: 0,cross_references,organism,pref_name,score,species_group_flag,target_chembl_id,target_components,target_type,tax_id
0,[],Coronavirus,Coronavirus,17.0,False,CHEMBL613732,[],ORGANISM,11119
1,[],Feline coronavirus,Feline coronavirus,14.0,False,CHEMBL612744,[],ORGANISM,12663
2,[],Murine coronavirus,Murine coronavirus,14.0,False,CHEMBL5209664,[],ORGANISM,694005
3,[],Canine coronavirus,Canine coronavirus,14.0,False,CHEMBL5291668,[],ORGANISM,11153
4,[],Human coronavirus 229E,Human coronavirus 229E,13.0,False,CHEMBL613837,[],ORGANISM,11137
5,[],Human coronavirus OC43,Human coronavirus OC43,13.0,False,CHEMBL5209665,[],ORGANISM,31631
6,"[{'xref_id': 'P0C6U8', 'xref_name': None, 'xre...",SARS coronavirus,SARS coronavirus 3C-like proteinase,10.0,False,CHEMBL3927,"[{'accession': 'P0C6U8', 'component_descriptio...",SINGLE PROTEIN,227859
7,[],Middle East respiratory syndrome-related coron...,Middle East respiratory syndrome-related coron...,9.0,False,CHEMBL4296578,[],ORGANISM,1335626
8,"[{'xref_id': 'P0C6X7', 'xref_name': None, 'xre...",SARS coronavirus,Replicase polyprotein 1ab,4.0,False,CHEMBL5118,"[{'accession': 'P0C6X7', 'component_descriptio...",SINGLE PROTEIN,227859
9,[],Severe acute respiratory syndrome coronavirus 2,Replicase polyprotein 1ab,4.0,False,CHEMBL4523582,"[{'accession': 'P0DTD1', 'component_descriptio...",SINGLE PROTEIN,2697049


Here, a target is either a target protein or a target organism.

# **Selecting and retrieving bioactivity data for SARS coronavirus 3C-like proteinase (seventh entry, index 6)**


*We will assign the seventh entry (which corresponds to the target protein, SARS coronavirus 3C-like proteinase) to the selected_target variable.*

*We use a single-protein target for which we will retrieve bioactivity data. To proceed, we retrieve and store its target_chembl_id, the unique identifier for this protein, so that later queries can refer to it by ID instead of calling it "the single-protein target".*

In [None]:
selected_target = targets.target_chembl_id[6] # Selecting the 7th entry (index 6).
selected_target # Display the details of the chosen target.

'CHEMBL3927'
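
Selecting by position (index 6) depends on the order in which the search results come back, which may differ between database releases. Below is a minimal sketch of selecting the same target by name and type instead. The `targets_demo` frame is a hypothetical stand-in built from three rows of the search output above, and `select_target_id` is an illustrative helper, not part of the ChEMBL client.

```python
import pandas as pd

# Hypothetical stand-in for the `targets` DataFrame, copied from rows of the output above.
targets_demo = pd.DataFrame({
    "pref_name": [
        "Coronavirus",
        "SARS coronavirus 3C-like proteinase",
        "Replicase polyprotein 1ab",
    ],
    "target_type": ["ORGANISM", "SINGLE PROTEIN", "SINGLE PROTEIN"],
    "target_chembl_id": ["CHEMBL613732", "CHEMBL3927", "CHEMBL5118"],
})


def select_target_id(targets, pref_name, target_type="SINGLE PROTEIN"):
    """Return the ChEMBL ID of the single row matching both name and type."""
    mask = (targets["pref_name"] == pref_name) & (targets["target_type"] == target_type)
    matches = targets.loc[mask, "target_chembl_id"]
    if len(matches) != 1:
        # Fail loudly instead of silently picking the wrong target.
        raise ValueError(f"Expected exactly one match, found {len(matches)}")
    return matches.iloc[0]


selected_target_demo = select_target_id(targets_demo, "SARS coronavirus 3C-like proteinase")
print(selected_target_demo)
```

Raising an error when the match is not unique is a design choice for this sketch: it makes a changed search result visible immediately.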

### **Retrieving the bioactivity data for SARS coronavirus 3C-like proteinase (CHEMBL3927), reported as IC50**  
**Retrieving and filtering bioactivity data for a specific target protein from the ChEMBL database, focusing only on activities of type "IC50".**

*We filter on standard_type="IC50" because the ChEMBL database contains bioactivity data measured using various metrics (Ki, Kd, EC50, etc.), each with its own significance and units. IC50 stands for the "half-maximal inhibitory concentration," a standard measure of a compound's potency in inhibiting a specific biological or biochemical function. Restricting the query to a single measurement type keeps the retrieved values consistent and comparable; they are typically reported in nanomolar (nM).*

In [None]:
activity = new_client.activity # query bioactivity data related to specific targets or compounds.
res = activity.filter(target_chembl_id=selected_target).filter(standard_type="IC50") # Keep only records for the selected target whose standard_type is "IC50".

In [None]:
df = pd.DataFrame.from_dict(res) #  Converts the data stored in the dictionary-like structure res into a Pandas DataFrame.
#df
df.head(3)

Unnamed: 0,action_type,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,...,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,,1480935,[],CHEMBL829584,In vitro inhibitory concentration against SARS...,B,,,BAO_0000190,...,SARS coronavirus,SARS coronavirus 3C-like proteinase,227859,,,IC50,uM,UO_0000065,,7.2
1,,,1480936,[],CHEMBL829584,In vitro inhibitory concentration against SARS...,B,,,BAO_0000190,...,SARS coronavirus,SARS coronavirus 3C-like proteinase,227859,,,IC50,uM,UO_0000065,,9.4
2,,,1481061,[],CHEMBL830868,In vitro inhibitory concentration against SARS...,B,,,BAO_0000190,...,SARS coronavirus,SARS coronavirus 3C-like proteinase,227859,,,IC50,uM,UO_0000065,,13.5


If we cannot see all the columns of a DataFrame, it is typically because pandas truncates the display to fit the terminal or notebook width. To view all columns, we can adjust pandas' display options.

**display.max_columns:** Setting it to None shows all columns, regardless of their count.

**display.width:** Sets the display width in characters; setting it to None lets pandas auto-detect the width when running in a terminal.

In [None]:
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)  # Removes truncation based on width
#df
df.head(3)


Unnamed: 0,action_type,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,,1480935,[],CHEMBL829584,In vitro inhibitory concentration against SARS...,B,,,BAO_0000190,BAO_0000357,single protein format,Cc1noc(C)c1CN1C(=O)C(=O)c2cc(C#N)ccc21,,,CHEMBL1139624,Bioorg Med Chem Lett,2005,"{'bei': '18.28', 'le': '0.33', 'lle': '3.25', ...",CHEMBL187579,,CHEMBL187579,5.14,0,http://www.openphacts.org/units/Nanomolar,384103,=,1,1,=,,IC50,nM,,7200.0,CHEMBL3927,SARS coronavirus,SARS coronavirus 3C-like proteinase,227859,,,IC50,uM,UO_0000065,,7.2
1,,,1480936,[],CHEMBL829584,In vitro inhibitory concentration against SARS...,B,,,BAO_0000190,BAO_0000357,single protein format,O=C1C(=O)N(Cc2ccc(F)cc2Cl)c2ccc(I)cc21,,,CHEMBL1139624,Bioorg Med Chem Lett,2005,"{'bei': '12.10', 'le': '0.33', 'lle': '1.22', ...",CHEMBL188487,,CHEMBL188487,5.03,0,http://www.openphacts.org/units/Nanomolar,383984,=,1,1,=,,IC50,nM,,9400.0,CHEMBL3927,SARS coronavirus,SARS coronavirus 3C-like proteinase,227859,,,IC50,uM,UO_0000065,,9.4
2,,,1481061,[],CHEMBL830868,In vitro inhibitory concentration against SARS...,B,,,BAO_0000190,BAO_0000357,single protein format,O=C1C(=O)N(CC2COc3ccccc3O2)c2ccc(I)cc21,,,CHEMBL1139624,Bioorg Med Chem Lett,2005,"{'bei': '11.56', 'le': '0.29', 'lle': '2.21', ...",CHEMBL185698,,CHEMBL185698,4.87,0,http://www.openphacts.org/units/Nanomolar,384106,=,1,1,=,,IC50,nM,,13500.0,CHEMBL3927,SARS coronavirus,SARS coronavirus 3C-like proteinase,227859,,,IC50,uM,UO_0000065,,13.5
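
`pd.set_option` changes the display settings for the rest of the session. If the wider display is only needed for one cell, `pd.option_context` applies the same options temporarily. A minimal sketch, where `wide` is a hypothetical 30-column frame used only for illustration:

```python
import pandas as pd

# Hypothetical wide frame standing in for the 46-column bioactivity table.
wide = pd.DataFrame([list(range(30))], columns=[f"col_{i}" for i in range(30)])

limit_before = pd.get_option("display.max_columns")

# The options are changed only inside the `with` block.
with pd.option_context("display.max_columns", None, "display.width", None):
    limit_inside = pd.get_option("display.max_columns")
    shown = repr(wide)  # text representation with no column limit

limit_after = pd.get_option("display.max_columns")

print(shown)
print("Limit restored:", limit_after == limit_before)
```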


*standard_value holds the IC50 of each compound (here in nM) and is a measure of its potency. The lower the number, the more potent the compound; the higher the number, the weaker the potency. We want it to be as low as possible.*

*In other words, a lower concentration is better: a higher number means that a higher concentration of the drug is required to achieve the same 50 percent inhibition.*
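
As a worked example of how the columns above relate: the reported value of 7.2 uM corresponds to a standard_value of 7200 nM, and the pchembl_value column in the output is consistent with the negative base-10 logarithm of the IC50 expressed in mol/L. The sketch below reproduces those numbers; the function names are illustrative and not part of any library.

```python
import math


def ic50_um_to_nm(value_um):
    """Convert a concentration from micromolar to nanomolar."""
    return value_um * 1000.0


def pic50_from_nm(ic50_nm):
    """Return -log10 of the IC50 in mol/L; a higher pIC50 means a more potent compound."""
    return -math.log10(ic50_nm * 1e-9)


# The three values shown in the first rows of the table above.
for value_um in (7.2, 9.4, 13.5):
    ic50_nm = ic50_um_to_nm(value_um)
    print(f"{value_um} uM = {ic50_nm:.0f} nM, pIC50 = {pic50_from_nm(ic50_nm):.2f}")
```

On the logarithmic scale the direction flips: a lower IC50 gives a higher pIC50.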

# **Exporting the dataframe data to CSV**

We save this bioactivity data to a CSV file named bioactivity_data.csv.

In [None]:
df.to_csv('bioactivity_data.csv', index=False)

## **Using Google Colab with Google Drive Integration for Copying files to Google Drive.**

In [None]:
from google.colab import drive
drive.mount('/content/gdrive/', force_remount=True)

MessageError: Error: credential propagation was unsuccessful

# **Stage 2: Pre-processing our data**

*Pre-processing involves preparing raw data for further analysis, ensuring it is clean, consistent, and ready for modeling or visualization.*

*After this stage, the dataset will be cleaner and ready for visualization or analysis, and we can directly identify compounds with promising bioactivity.*

### **2.1. Handling Missing Data and Invalid data**

***We are going to drop the rows with a missing standard_value or canonical_smiles.***

In [None]:
df = pd.read_csv('bioactivity_data.csv')  # Load the data
df2 = df[df.standard_value.notna()]  # Filter rows where 'standard_value' is not NaN
df2 = df2[df2.canonical_smiles.notna()]  # Filter rows where 'canonical_smiles' is not NaN in df2
df2  # View the resulting filtered DataFrame


Unnamed: 0,action_type,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,bao_label,canonical_smiles,data_validity_comment,data_validity_description,document_chembl_id,document_journal,document_year,ligand_efficiency,molecule_chembl_id,molecule_pref_name,parent_molecule_chembl_id,pchembl_value,potential_duplicate,qudt_units,record_id,relation,src_id,standard_flag,standard_relation,standard_text_value,standard_type,standard_units,standard_upper_value,standard_value,target_chembl_id,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,,1480935,[],CHEMBL829584,In vitro inhibitory concentration against SARS...,B,,,BAO_0000190,BAO_0000357,single protein format,Cc1noc(C)c1CN1C(=O)C(=O)c2cc(C#N)ccc21,,,CHEMBL1139624,Bioorg Med Chem Lett,2005,"{'bei': '18.28', 'le': '0.33', 'lle': '3.25', ...",CHEMBL187579,,CHEMBL187579,5.14,0,http://www.openphacts.org/units/Nanomolar,384103,=,1,1,=,,IC50,nM,,7200.0,CHEMBL3927,SARS coronavirus,SARS coronavirus 3C-like proteinase,227859,,,IC50,uM,UO_0000065,,7.20
1,,,1480936,[],CHEMBL829584,In vitro inhibitory concentration against SARS...,B,,,BAO_0000190,BAO_0000357,single protein format,O=C1C(=O)N(Cc2ccc(F)cc2Cl)c2ccc(I)cc21,,,CHEMBL1139624,Bioorg Med Chem Lett,2005,"{'bei': '12.10', 'le': '0.33', 'lle': '1.22', ...",CHEMBL188487,,CHEMBL188487,5.03,0,http://www.openphacts.org/units/Nanomolar,383984,=,1,1,=,,IC50,nM,,9400.0,CHEMBL3927,SARS coronavirus,SARS coronavirus 3C-like proteinase,227859,,,IC50,uM,UO_0000065,,9.40
2,,,1481061,[],CHEMBL830868,In vitro inhibitory concentration against SARS...,B,,,BAO_0000190,BAO_0000357,single protein format,O=C1C(=O)N(CC2COc3ccccc3O2)c2ccc(I)cc21,,,CHEMBL1139624,Bioorg Med Chem Lett,2005,"{'bei': '11.56', 'le': '0.29', 'lle': '2.21', ...",CHEMBL185698,,CHEMBL185698,4.87,0,http://www.openphacts.org/units/Nanomolar,384106,=,1,1,=,,IC50,nM,,13500.0,CHEMBL3927,SARS coronavirus,SARS coronavirus 3C-like proteinase,227859,,,IC50,uM,UO_0000065,,13.50
3,,,1481065,[],CHEMBL829584,In vitro inhibitory concentration against SARS...,B,,,BAO_0000190,BAO_0000357,single protein format,O=C1C(=O)N(Cc2cc3ccccc3s2)c2ccccc21,,,CHEMBL1139624,Bioorg Med Chem Lett,2005,"{'bei': '16.64', 'le': '0.32', 'lle': '1.25', ...",CHEMBL426082,,CHEMBL426082,4.88,0,http://www.openphacts.org/units/Nanomolar,384075,=,1,1,=,,IC50,nM,,13110.0,CHEMBL3927,SARS coronavirus,SARS coronavirus 3C-like proteinase,227859,,,IC50,uM,UO_0000065,,13.11
4,,,1481066,[],CHEMBL829584,In vitro inhibitory concentration against SARS...,B,,,BAO_0000190,BAO_0000357,single protein format,O=C1C(=O)N(Cc2cc3ccccc3s2)c2c1cccc2[N+](=O)[O-],,,CHEMBL1139624,Bioorg Med Chem Lett,2005,"{'bei': '16.84', 'le': '0.32', 'lle': '2.16', ...",CHEMBL187717,,CHEMBL187717,5.70,0,http://www.openphacts.org/units/Nanomolar,384234,=,1,1,=,,IC50,nM,,2000.0,CHEMBL3927,SARS coronavirus,SARS coronavirus 3C-like proteinase,227859,,,IC50,uM,UO_0000065,,2.00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
128,,,12041507,[],CHEMBL2150313,Inhibition of SARS-CoV PLpro expressed in Esch...,B,,,BAO_0000190,BAO_0000019,assay format,COC(=O)[C@@]1(C)CCCc2c1ccc1c2C(=O)C(=O)c2c(C)c...,,,CHEMBL2146458,Bioorg Med Chem,2012,"{'bei': '14.70', 'le': '0.27', 'lle': '1.57', ...",CHEMBL2146517,METHYL TANSHINONATE,CHEMBL2146517,4.97,0,http://www.openphacts.org/units/Nanomolar,1727226,=,1,1,=,,IC50,nM,,10600.0,CHEMBL3927,SARS coronavirus,SARS coronavirus 3C-like proteinase,227859,,,IC50,uM,UO_0000065,,10.60
129,,,12041508,[],CHEMBL2150313,Inhibition of SARS-CoV PLpro expressed in Esch...,B,,,BAO_0000190,BAO_0000019,assay format,C[C@H]1COC2=C1C(=O)C(=O)c1c2ccc2c1CCCC2(C)C,,,CHEMBL2146458,Bioorg Med Chem,2012,"{'bei': '16.86', 'le': '0.31', 'lle': '1.56', ...",CHEMBL187460,CRYPTOTANSHINONE,CHEMBL187460,5.00,0,http://www.openphacts.org/units/Nanomolar,1727227,=,1,1,=,,IC50,nM,,10100.0,CHEMBL3927,SARS coronavirus,SARS coronavirus 3C-like proteinase,227859,,,IC50,uM,UO_0000065,,10.10
130,,,12041509,[],CHEMBL2150313,Inhibition of SARS-CoV PLpro expressed in Esch...,B,,,BAO_0000190,BAO_0000019,assay format,Cc1coc2c1C(=O)C(=O)c1c-2ccc2c(C)cccc12,,,CHEMBL2146458,Bioorg Med Chem,2012,"{'bei': '17.88', 'le': '0.32', 'lle': '0.84', ...",CHEMBL363535,TANSHINONE I,CHEMBL363535,4.94,0,http://www.openphacts.org/units/Nanomolar,1727228,=,1,1,=,,IC50,nM,,11500.0,CHEMBL3927,SARS coronavirus,SARS coronavirus 3C-like proteinase,227859,,,IC50,uM,UO_0000065,,11.50
131,,,12041510,[],CHEMBL2150313,Inhibition of SARS-CoV PLpro expressed in Esch...,B,,,BAO_0000190,BAO_0000019,assay format,Cc1cccc2c3c(ccc12)C1=C(C(=O)C3=O)[C@@H](C)CO1,,,CHEMBL2146458,Bioorg Med Chem,2012,"{'bei': '17.86', 'le': '0.32', 'lle': '1.68', ...",CHEMBL227075,DIHYDROTANSHINONE I,CHEMBL227075,4.97,0,http://www.openphacts.org/units/Nanomolar,1727229,=,1,1,=,,IC50,nM,,10700.0,CHEMBL3927,SARS coronavirus,SARS coronavirus 3C-like proteinase,227859,,,IC50,uM,UO_0000065,,10.70


Now we check whether any rows with a missing standard_value or canonical_smiles were dropped, by comparing the row counts before and after.

In [None]:
# Before dropping
print("Rows before dropping:", len(df))

# After dropping
df = df.dropna(subset=['standard_value', 'canonical_smiles'])
print("Rows after dropping:", len(df))

Rows before dropping: 133
Rows after dropping: 133


*Here the row count does not decrease, which confirms that there was no missing data in either standard_value or canonical_smiles. Alternatively, we can count the dropped rows directly, as done below.*

In [None]:
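raw = pd.read_csv('bioactivity_data.csv')  # Reload the unfiltered data; df has already had missing rows dropped above
initial_rows = len(raw)
df = raw.dropna(subset=['standard_value', 'canonical_smiles'])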
final_rows = len(df)

print(f"Rows dropped: {initial_rows - final_rows}")

Rows dropped: 0


*The code below checks whether any missing values are left in these columns.*

In [None]:
print(df[['standard_value', 'canonical_smiles']].isnull().sum())

standard_value      0
canonical_smiles    0
dtype: int64
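
The checks above cover missing values; invalid values (for example text in a numeric column, or a non-positive concentration) would pass through `notna()` unnoticed. The following is a minimal sketch on a hypothetical toy frame, not the ChEMBL data. Treating non-positive IC50 values as invalid is an assumption of this example.

```python
import pandas as pd

# Hypothetical toy data with one valid row and four problematic rows.
toy = pd.DataFrame({
    "canonical_smiles": ["CCO", "c1ccccc1", None, "CCN", "CCC"],
    "standard_value": ["7200.0", "not measured", "500.0", "-1", None],
})

# Coerce to numbers; anything that cannot be parsed becomes NaN instead of raising.
toy["standard_value"] = pd.to_numeric(toy["standard_value"], errors="coerce")

# Drop rows with a missing value in either column, then drop non-positive values.
clean = toy.dropna(subset=["standard_value", "canonical_smiles"])
clean = clean[clean["standard_value"] > 0]  # assumption: an IC50 must be positive

print(f"Rows kept: {len(clean)} of {len(toy)}")
```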


## **2.2 : Labeling compounds as either being active, inactive or intermediate**

We will label each compound based on its standard_value, which is the IC50 in nM. Compounds with values of 1,000 nM or less will be considered active, while those with values of 10,000 nM or more will be considered inactive. Values in between 1,000 and 10,000 nM will be referred to as intermediate.

*We do this to simplify further analysis and to make comparison and trend analysis easier. Researchers can prioritize "active" compounds for deeper investigation, as they show stronger bioactivity. Active compounds (IC50 of 1,000 nM or less) are often the most promising candidates for further development, so categorizing compounds this way helps streamline the drug discovery pipeline. Inactive compounds (IC50 of 10,000 nM or more) could be discarded early, saving time and resources.*

In [None]:
bioactivity_class = []
for i in df2.standard_value:
  if float(i) >= 10000:
    bioactivity_class.append("inactive")
  elif float(i) <= 1000:
    bioactivity_class.append("active")
  else:
    bioactivity_class.append("intermediate")
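
The same thresholds can also be applied without an explicit loop. Below is a sketch using `numpy.select`, with the cut-offs taken from the loop above (at least 10,000 nM is inactive, at most 1,000 nM is active). The function name `label_bioactivity` and the `demo` values are illustrative.

```python
import numpy as np
import pandas as pd


def label_bioactivity(ic50_nm):
    """Label IC50 values (in nM) as active, inactive, or intermediate."""
    values = pd.to_numeric(pd.Series(ic50_nm))
    conditions = [
        (values >= 10000).to_numpy(),  # checked first, as in the loop
        (values <= 1000).to_numpy(),
    ]
    choices = ["inactive", "active"]
    labels = np.select(conditions, choices, default="intermediate")
    # Keep the original index so the labels stay aligned with their rows.
    return pd.Series(labels, index=values.index, name="bioactivity_class")


demo = pd.Series([7200.0, 13500.0, 2000.0, 1000.0, 10000.0, 50.0])
print(label_bioactivity(demo).tolist())
```

Because the returned Series carries the index of its input, it can be attached to the source DataFrame without relying on row positions.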

## **2.3 : Bringing all necessary columns at one place**

We combine the three columns (molecule_chembl_id, canonical_smiles, standard_value) and bioactivity_class into one DataFrame. It is hard to inspect all 46 columns of the table, so we collect the columns we are studying into a single DataFrame.

In [None]:
selection = ['molecule_chembl_id', 'canonical_smiles', 'standard_value']
df3 = df2[selection]
df3

Unnamed: 0,molecule_chembl_id,canonical_smiles,standard_value
0,CHEMBL187579,Cc1noc(C)c1CN1C(=O)C(=O)c2cc(C#N)ccc21,7200.0
1,CHEMBL188487,O=C1C(=O)N(Cc2ccc(F)cc2Cl)c2ccc(I)cc21,9400.0
2,CHEMBL185698,O=C1C(=O)N(CC2COc3ccccc3O2)c2ccc(I)cc21,13500.0
3,CHEMBL426082,O=C1C(=O)N(Cc2cc3ccccc3s2)c2ccccc21,13110.0
4,CHEMBL187717,O=C1C(=O)N(Cc2cc3ccccc3s2)c2c1cccc2[N+](=O)[O-],2000.0
...,...,...,...
128,CHEMBL2146517,COC(=O)[C@@]1(C)CCCc2c1ccc1c2C(=O)C(=O)c2c(C)c...,10600.0
129,CHEMBL187460,C[C@H]1COC2=C1C(=O)C(=O)c1c2ccc2c1CCCC2(C)C,10100.0
130,CHEMBL363535,Cc1coc2c1C(=O)C(=O)c1c-2ccc2c(C)cccc12,11500.0
131,CHEMBL227075,Cc1cccc2c3c(ccc12)C1=C(C(=O)C3=O)[C@@H](C)CO1,10700.0


In [None]:
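pd.concat([df3, pd.Series(bioactivity_class, index=df3.index, name='bioactivity_class')], axis=1) # Displays df3 with bioactivity_class as an extra column, aligned on df3's index; df3 itself is not modified.

Unnamed: 0,molecule_chembl_id,canonical_smiles,standard_value,bioactivity_class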
0,CHEMBL187579,Cc1noc(C)c1CN1C(=O)C(=O)c2cc(C#N)ccc21,7200.0,intermediate
1,CHEMBL188487,O=C1C(=O)N(Cc2ccc(F)cc2Cl)c2ccc(I)cc21,9400.0,intermediate
2,CHEMBL185698,O=C1C(=O)N(CC2COc3ccccc3O2)c2ccc(I)cc21,13500.0,inactive
3,CHEMBL426082,O=C1C(=O)N(Cc2cc3ccccc3s2)c2ccccc21,13110.0,inactive
4,CHEMBL187717,O=C1C(=O)N(Cc2cc3ccccc3s2)c2c1cccc2[N+](=O)[O-],2000.0,intermediate
...,...,...,...,...
128,CHEMBL2146517,COC(=O)[C@@]1(C)CCCc2c1ccc1c2C(=O)C(=O)c2c(C)c...,10600.0,inactive
129,CHEMBL187460,C[C@H]1COC2=C1C(=O)C(=O)c1c2ccc2c1CCCC2(C)C,10100.0,inactive
130,CHEMBL363535,Cc1coc2c1C(=O)C(=O)c1c-2ccc2c(C)cccc12,11500.0,inactive
131,CHEMBL227075,Cc1cccc2c3c(ccc12)C1=C(C(=O)C3=O)[C@@H](C)CO1,10700.0,inactive


*The resulting DataFrame will have all the original columns from df3 plus an additional column corresponding to the values in bioactivity_class.*
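
One caveat: `pd.concat(..., axis=1)` aligns rows by index label, not by position. If rows had been dropped earlier, a Series built from a plain list (which gets a fresh 0..n-1 index) would no longer line up. The sketch below illustrates this with two compounds from the table above; the gap at index 1 is hypothetical, as if that row had been filtered out.

```python
import pandas as pd

# Two rows whose index has a gap, as it would after dropping row 1.
filtered = pd.DataFrame(
    {
        "molecule_chembl_id": ["CHEMBL187579", "CHEMBL185698"],
        "standard_value": [7200.0, 13500.0],
    },
    index=[0, 2],
)
labels = ["intermediate", "inactive"]

# The Series gets index 0, 1, so concat produces three rows with NaN holes.
misaligned = pd.concat([filtered, pd.Series(labels, name="bioactivity_class")], axis=1)

# assign() places the list by position, so each label stays with its row.
aligned = filtered.assign(bioactivity_class=labels)

print(misaligned)
print(aligned)
```

In this notebook no rows were dropped, so both approaches give the same result; the positional version is simply the safer default.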


## **Saving pre-processed dataframe to CSV file**

In [None]:
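# Attach the class labels to a copy of df3 before saving, so that the CSV includes the bioactivity_class column.
df4 = df3.assign(bioactivity_class=bioactivity_class)
df4.to_csv('bioactivity_preprocessed_data.csv', index=False)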