# ChEMBL Data Acquisition

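Some notes on the theory:

 - IC50 (half-maximal inhibitory concentration) is reported for *inhibitors*.
 - EC50 (half-maximal effective concentration) is reported for *stimulators* (agonists).

See the [Wikipedia article on EC50](https://en.wikipedia.org/wiki/EC50) for details.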
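
To make the idea of a "half-maximal" concentration concrete, here is a minimal sketch. It assumes a simple one-site inhibition model with a Hill slope of 1. The model, the function name `remaining_activity`, and the IC50 of 20 nM are illustrative assumptions, not values taken from ChEMBL.

```python
def remaining_activity(concentration_nm: float, ic50_nm: float) -> float:
    """Percent of uninhibited activity left at a given inhibitor concentration.

    Assumes a one-site inhibition model with Hill slope 1 (an illustrative
    simplification; real dose-response curves are fitted to assay data).
    """
    return 100.0 / (1.0 + concentration_nm / ic50_nm)


ic50_nm = 20.0  # hypothetical IC50, chosen only for illustration

# Below, at, and above the IC50: activity is exactly halved at the IC50.
for concentration_nm in (2.0, 20.0, 200.0):
    activity = remaining_activity(concentration_nm, ic50_nm)
    print(f"{concentration_nm:6.1f} nM -> {activity:.1f}% activity remaining")
```

Under this model, a lower IC50 means that less compound is needed to halve the target's activity, which is why IC50 is commonly used to rank inhibitors by potency.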

In this tutorial, we will:

1. Connect to the ChEMBL database.
2. Gather *target* data, that is, data on the particular protein that we would like to activate or inhibit.
3. Gather *bioactivity* and *compound* data, and merge these into a single table describing the properties and bioactivities of various compounds.

### 1. Connect to the ChEMBL database

First, we import the ChEMBL web resource client and create one API handle each for targets, compounds (molecules), and bioactivities.

In [1]:
from chembl_webresource_client.new_client import new_client

targets_api = new_client.target
compounds_api = new_client.molecule
bioactivities_api = new_client.activity

Now, we define a variable to store the UniProt ID for the target protein of interest: EGFR kinase (UniProt ID P00533, see [here](https://www.uniprot.org/uniprot/P00533)).

In [3]:
uniprot_id = "P00533"

Next, we search ChEMBL for this target.
The `.only()` method narrows the result returned from ChEMBL so that it includes only the listed columns.

In [4]:
targets = targets_api.get(target_components__accession=uniprot_id).only(
    "target_chembl_id", "organism", "pref_name", "target_type"
)

Now, we convert the records returned from ChEMBL into a pandas `DataFrame`.

In [6]:
import pandas as pd

targets = pd.DataFrame.from_records(targets)
targets

|   | organism | pref_name | target_chembl_id | target_type |
|---|----------|-----------|------------------|-------------|
| 0 | Homo sapiens | Epidermal growth factor receptor erbB1 | CHEMBL203 | SINGLE PROTEIN |
| 1 | Homo sapiens | Epidermal growth factor receptor erbB1 | CHEMBL203 | SINGLE PROTEIN |
| 2 | Homo sapiens | Epidermal growth factor receptor and ErbB2 (HE... | CHEMBL2111431 | PROTEIN FAMILY |
| 3 | Homo sapiens | Epidermal growth factor receptor | CHEMBL2363049 | PROTEIN FAMILY |
| 4 | Homo sapiens | MER intracellular domain/EGFR extracellular do... | CHEMBL3137284 | CHIMERIC PROTEIN |
| 5 | Homo sapiens | Protein cereblon/Epidermal growth factor receptor | CHEMBL4523680 | PROTEIN-PROTEIN INTERACTION |
| 6 | Homo sapiens | EGFR/PPP1CA | CHEMBL4523747 | PROTEIN-PROTEIN INTERACTION |
| 7 | Homo sapiens | VHL/EGFR | CHEMBL4523998 | PROTEIN-PROTEIN INTERACTION |
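
The result contains a duplicated row (indices 0 and 1) and several target types. The following is a minimal offline sketch of one way to tidy such a table with pandas and pick out the single-protein entry. The `records` list is a hand-copied subset of the rows shown here, used as a stand-in for the API response, and the names `targets_demo`, `unique_targets`, `single_proteins`, and `chembl_id` are illustrative assumptions rather than part of the original tutorial.

```python
import pandas as pd

# Offline stand-in for the ChEMBL response: a few rows copied from the table.
records = [
    {"organism": "Homo sapiens",
     "pref_name": "Epidermal growth factor receptor erbB1",
     "target_chembl_id": "CHEMBL203",
     "target_type": "SINGLE PROTEIN"},
    {"organism": "Homo sapiens",
     "pref_name": "Epidermal growth factor receptor erbB1",
     "target_chembl_id": "CHEMBL203",
     "target_type": "SINGLE PROTEIN"},
    {"organism": "Homo sapiens",
     "pref_name": "Epidermal growth factor receptor",
     "target_chembl_id": "CHEMBL2363049",
     "target_type": "PROTEIN FAMILY"},
    {"organism": "Homo sapiens",
     "pref_name": "VHL/EGFR",
     "target_chembl_id": "CHEMBL4523998",
     "target_type": "PROTEIN-PROTEIN INTERACTION"},
]
targets_demo = pd.DataFrame.from_records(records)

# Drop exact duplicate rows and renumber the index from zero.
unique_targets = targets_demo.drop_duplicates().reset_index(drop=True)

# Keep only entries that describe a single protein.
single_proteins = unique_targets[unique_targets["target_type"] == "SINGLE PROTEIN"]

# Take the ChEMBL ID of the first remaining entry.
chembl_id = single_proteins["target_chembl_id"].iloc[0]
print(chembl_id)
```

Filtering on `target_type` is only one possible selection rule. Picking a row by position after inspecting the table would work as well.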
