# ChEMBL Programmatic Access



Link to ChEMBL web services API live documentation Explorer:
https://www.ebi.ac.uk/chembl/api/data/docs

Link to ChEMBL data web service:
https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services

Steps:

    1. Import the necessary libraries: we will use the requests library to make HTTP requests to the ChEMBL API.

    2. Define the base URL for the ChEMBL API: the base URL is used to construct the specific API endpoint for each query.

    3. Specify the ChEMBL ID of the molecule: this ID uniquely identifies the molecule in the ChEMBL database.

    4. Construct the API URL: combine the base URL with the ChEMBL ID to form the complete API URL for the query.

    5. Make the API request: use the requests.get() method to fetch the data from the ChEMBL API.

    6. Handle the API response: check the response status code, which tells us whether the request succeeded. A status code of 200 indicates success (OK), while other status codes may indicate errors. If the request succeeded, parse the JSON response to extract the information.

    7. Display the information.
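
The steps above can be sketched as two small helper functions. This is a minimal sketch and not part of the original notebook: the names `build_molecule_url` and `parse_response` are hypothetical, and the HTTP call itself is left as a comment so that the block runs without network access.

```python
import json

# Step 2: base URL of the ChEMBL API, with a placeholder for the resource path
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data/{:s}"


def build_molecule_url(chembl_id):
    """Steps 3-4: combine the base URL with a molecule ChEMBL ID."""
    return BASE_URL.format(f"molecule/{chembl_id}")


def parse_response(status_code, body):
    """Step 6: decode the JSON body on success, raise an error otherwise."""
    if status_code != 200:
        raise RuntimeError(f"Error in the request ({status_code}): {body}")
    return json.loads(body)


# Step 5 would look like this (needs the requests library and network access):
#   response = requests.get(build_molecule_url("CHEMBL266349"),
#                           headers={"Accept": "application/json"})
#   molecule = parse_response(response.status_code, response.text)

# Offline check with a hand-written body (hypothetical, not real API output)
example = parse_response(200, '{"molecule_chembl_id": "CHEMBL266349"}')
print(build_molecule_url("CHEMBL266349"))
print(example["molecule_chembl_id"])
```

Raising an exception on failure, rather than only printing a message, keeps later cells from running against a variable that was never assigned.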

In [82]:
# libraries
import json  # lets us work with the json format
import requests  # allows Python to make web requests
import pandas as pd # analysis of tabular data

# Define the base URL of the ChEMBL API
base_url = "https://www.ebi.ac.uk/chembl/api/data/{:s}"
# {:s} is a placeholder that str.format() replaces with a string (the resource path) when building a request URL.

## Example 1: Get information about a specific molecule


In [83]:
chembl_id = "CHEMBL266349"
molecule_url = base_url.format(f"molecule/{chembl_id}")

In [84]:
# https://www.ebi.ac.uk/chembl/api/data/molecule/chembl25.json

In [85]:
# Make the API request and fetch the response
response = requests.get(molecule_url)

In [86]:
# Print the content of the response
print("Response content:")
print(response.content)

Response content:


In [87]:
# Send a GET request to the API and retrieve the JSON response
response = requests.get(molecule_url, headers={"Accept": "application/json"})

# Check if the response is successful
if response.status_code == 200:
    # Convert the JSON response into a Python dictionary
    molecule_request = response.json()
else:
    print(f"Error in the request ({response.status_code}): {response.text}")  
  
molecule_request

{'atc_classifications': ['B01AE04'],
 'availability_type': -1,
 'biotherapeutic': None,
 'chebi_par_id': 43966,
 'chemical_probe': 0,
 'chirality': 1,
 'cross_references': [],
 'dosed_ingredient': True,
 'first_approval': 2004,
 'first_in_class': 0,
 'helm_notation': None,
 'indication_class': None,
 'inorganic_flag': 0,
 'max_phase': '4.0',
 'molecule_chembl_id': 'CHEMBL266349',
 'molecule_hierarchy': {'active_chembl_id': 'CHEMBL266349',
  'molecule_chembl_id': 'CHEMBL266349',
  'parent_chembl_id': 'CHEMBL266349'},
 'molecule_properties': {'alogp': '0.81',
  'aromatic_rings': 1,
  'cx_logd': '-1.73',
  'cx_logp': '-1.29',
  'cx_most_apka': '1.77',
  'cx_most_bpka': '11.48',
  'full_molformula': 'C22H31N5O4',
  'full_mwt': '429.52',
  'hba': 5,
  'hba_lipinski': 9,
  'hbd': 5,
  'hbd_lipinski': 6,
  'heavy_atoms': 31,
  'molecular_species': 'ZWITTERION',
  'mw_freebase': '429.52',
  'mw_monoisotopic': '429.2376',
  'np_likeness_score': '-0.45',
  'num_lipinski_ro5_violations': 1,
  'nu

In [88]:
#Inspect the keys in the JSON data:
print(molecule_request.keys())



In [89]:
# Send a GET request to the API and retrieve the JSON response
response = requests.get(molecule_url, headers={"Accept": "application/json"})

# Check if the response is successful
if response.status_code == 200:
    molecule_request = response.json()
    
    # Extract specific keys from the dictionary
    molecule_properties = molecule_request['molecule_properties']
    
    
    # Create a DataFrame from the extracted dictionary
    molecule_table = pd.DataFrame.from_dict(molecule_properties, orient='index', columns=['Value'])
    
else:
    print(f"Error in the request ({response.status_code}): {response.text}")
    
# Print the resulting DataFrame
molecule_table


Unnamed: 0,Value
alogp,0.81
aromatic_rings,1
cx_logd,-1.73
cx_logp,-1.29
cx_most_apka,1.77
cx_most_bpka,11.48
full_molformula,C22H31N5O4
full_mwt,429.52
hba,5
hba_lipinski,9


Review the documentation: https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services

## Example 2: Retrieve all the activities (assays) associated with a specific target (protein) using its ChEMBL ID ##

These assays are conducted to evaluate the biological or pharmacological activity of a molecule against a specific target. The activity measurement provides information about the molecule's ability to bind to or interact with the target, which can be a protein, enzyme, receptor, or other biomolecule.

In [90]:
# ChEMBL ID of the target
chembl_id = "CHEMBL206"
activity_url = base_url.format(f"activity?target_chembl_id__exact={chembl_id}")

# https://www.ebi.ac.uk/chembl/api/data/activity?target_chembl_id__exact=CHEMBL206 - check in the browser

In [91]:
response = requests.get(activity_url, headers={"Accept": "application/json"})
if response.status_code == 200:
    activity_request = response.json()
    activity_table = pd.DataFrame.from_dict(activity_request['activities'])[['molecule_chembl_id', 'type', 'standard_value', 'standard_units']]
    
else:
    print(f"Error in the request ({response.status_code}): {response.text}")
activity_table

Unnamed: 0,molecule_chembl_id,type,standard_value,standard_units
0,CHEMBL431611,IC50,2.5,nM
1,CHEMBL316132,IC50,7.5,nM
2,CHEMBL338926,RBA,0.84,
3,CHEMBL338926,RBA,8.7,
4,CHEMBL127736,RBA,75.0,
5,CHEMBL304552,IC50,3.1,nM
6,CHEMBL85881,IC50,3.9,nM
7,CHEMBL127941,RBA,0.5,
8,CHEMBL127941,RBA,,
9,CHEMBL85536,IC50,7.4,nM


In [92]:
#Inspect the keys in the JSON data:
print(activity_request.keys())

dict_keys(['activities', 'page_meta'])


In [93]:
for activity in activity_request['activities']:
    print(activity.keys())

dict_keys(['action_type', 'activity_comment', 'activity_id', 'activity_properties', 'assay_chembl_id', 'assay_description', 'assay_type', 'assay_variant_accession', 'assay_variant_mutation', 'bao_endpoint', 'bao_format', 'bao_label', 'canonical_smiles', 'data_validity_comment', 'data_validity_description', 'document_chembl_id', 'document_journal', 'document_year', 'ligand_efficiency', 'molecule_chembl_id', 'molecule_pref_name', 'parent_molecule_chembl_id', 'pchembl_value', 'potential_duplicate', 'qudt_units', 'record_id', 'relation', 'src_id', 'standard_flag', 'standard_relation', 'standard_text_value', 'standard_type', 'standard_units', 'standard_upper_value', 'standard_value', 'target_chembl_id', 'target_organism', 'target_pref_name', 'target_tax_id', 'text_value', 'toid', 'type', 'units', 'uo_units', 'upper_value', 'value'])
dict_keys(['action_type', 'activity_comment', 'activity_id', 'activity_properties', 'assay_chembl_id', 'assay_description', 'assay_type', 'assay_variant_accessi

In [94]:
# Order by value (standard_value is still stored as text here, so the order is lexicographic)
activity_table.sort_values(['standard_value'], ascending=[True])

Unnamed: 0,molecule_chembl_id,type,standard_value,standard_units
10,CHEMBL434629,RBA,0.02,
18,CHEMBL127246,RBA,0.1,
11,CHEMBL434629,RBA,0.15,
16,CHEMBL124197,RBA,0.44,
13,CHEMBL124708,RBA,0.45,
7,CHEMBL127941,RBA,0.5,
2,CHEMBL338926,RBA,0.84,
15,CHEMBL267385,IC50,1.0,nM
17,CHEMBL124197,RBA,10.8,
14,CHEMBL124708,RBA,14.8,


In [95]:
# Convert standard_value to float so that the sort is numeric
activity_table['standard_value'] = activity_table['standard_value'].astype('float')
activity_table.sort_values(['standard_value'], ascending=[True])

Unnamed: 0,molecule_chembl_id,type,standard_value,standard_units
10,CHEMBL434629,RBA,0.02,
18,CHEMBL127246,RBA,0.1,
11,CHEMBL434629,RBA,0.15,
16,CHEMBL124197,RBA,0.44,
13,CHEMBL124708,RBA,0.45,
7,CHEMBL127941,RBA,0.5,
2,CHEMBL338926,RBA,0.84,
15,CHEMBL267385,IC50,1.0,nM
0,CHEMBL431611,IC50,2.5,nM
5,CHEMBL304552,IC50,3.1,nM
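
The two sorted tables differ because standard_value arrives as text, and text is compared character by character. A small pure-Python illustration, using three values taken from the tables above (the variable names are only for illustration):

```python
values = ["10.8", "2.5", "1.0"]  # standard_value entries as text

# Text comparison goes character by character, so "10.8" sorts before "2.5"
as_text = sorted(values)

# Converting to float first gives the numeric order
as_numbers = sorted(float(v) for v in values)

print(as_text)     # ['1.0', '10.8', '2.5']
print(as_numbers)  # [1.0, 2.5, 10.8]
```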


### Limit on the number of records returned (&limit=0)

In [96]:
num_rows = len(activity_table)
num_rows

20

In [97]:
activity_url = base_url.format(f"activity?target_chembl_id__exact={chembl_id}&limit=0")

In [98]:
response = requests.get(activity_url, headers={"Accept": "application/json"})
if response.status_code == 200:
    activity_request = response.json()
    activity_table = pd.DataFrame.from_dict(activity_request['activities'])[['molecule_chembl_id', 'type', 'standard_value', 'standard_units']]
else:
    print(f"Error in the request ({response.status_code}): {response.text}")
activity_table

Unnamed: 0,molecule_chembl_id,type,standard_value,standard_units
0,CHEMBL431611,IC50,2.5,nM
1,CHEMBL316132,IC50,7.5,nM
2,CHEMBL338926,RBA,0.84,
3,CHEMBL338926,RBA,8.7,
4,CHEMBL127736,RBA,75.0,
...,...,...,...,...
995,CHEMBL307827,Relative activation,1.0,%
996,CHEMBL312741,Relative activation,4.0,%
997,CHEMBL80314,IC50,2728.0,nM
998,CHEMBL78254,IC50,5000.0,nM
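
A single request may return only part of a large result set (the first call above returned 20 rows). One way to collect everything is to follow the page_meta entry that accompanies each response. The sketch below assumes that page_meta contains a `next` field holding the relative URL of the following page, or None on the last page; this should be checked against the live documentation. `iter_records` and `fetch_json` are hypothetical names, and the fetch function is passed in as an argument so that the loop can be tried without network access.

```python
from urllib.parse import urljoin


def iter_records(first_url, fetch_json, key):
    """Yield every record under `key`, following page_meta['next'] links.

    fetch_json is any callable that maps a URL to the decoded JSON dictionary.
    """
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page[key]
        # Assumption: 'next' is a path relative to the host, or None at the end
        next_path = page["page_meta"].get("next")
        url = urljoin(url, next_path) if next_path else None


# Offline demonstration with two hand-made pages (not real API output)
first = "https://www.ebi.ac.uk/chembl/api/data/activity?limit=2"
fake_pages = {
    first: {
        "activities": [{"activity_id": 1}, {"activity_id": 2}],
        "page_meta": {"next": "/chembl/api/data/activity?limit=2&offset=2"},
    },
    "https://www.ebi.ac.uk/chembl/api/data/activity?limit=2&offset=2": {
        "activities": [{"activity_id": 3}],
        "page_meta": {"next": None},
    },
}
records = list(iter_records(first, fake_pages.__getitem__, "activities"))
print(len(records))  # 3
```

With the requests library, `fetch_json` could be a small function that calls `requests.get(url, headers={"Accept": "application/json"}).json()`.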


In [99]:
# Add filters to the URL
activity_url = base_url.format(f"activity?target_chembl_id__exact={chembl_id}&limit=0&type=IC50")

In [100]:
response = requests.get(activity_url, headers={"Accept": "application/json"})
if response.status_code == 200:
    activity_request = response.json()
    activity_table = pd.DataFrame.from_dict(activity_request['activities'])[['molecule_chembl_id', 'type', 'standard_value', 'standard_units']]
else:
    print(f"Error in the request ({response.status_code}): {response.text}")
activity_table

Unnamed: 0,molecule_chembl_id,type,standard_value,standard_units
0,CHEMBL431611,IC50,2.5,nM
1,CHEMBL316132,IC50,7.5,nM
2,CHEMBL304552,IC50,3.1,nM
3,CHEMBL85881,IC50,3.9,nM
4,CHEMBL85536,IC50,7.4,nM
...,...,...,...,...
995,CHEMBL204922,IC50,159.0,nM
996,CHEMBL381697,IC50,225.0,nM
997,CHEMBL206441,IC50,4160.0,nM
998,CHEMBL206218,IC50,10000.0,nM
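
Filters are ordinary query parameters, so the URL can also be assembled from keyword arguments instead of a hand-written string. A minimal sketch using `urllib.parse.urlencode` from the standard library; `build_query_url` is a hypothetical helper name, not part of the ChEMBL tooling.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data/{:s}"


def build_query_url(resource, **filters):
    """Return the URL for `resource` with `filters` encoded as a query string."""
    return BASE_URL.format(f"{resource}?{urlencode(filters)}")


activity_url = build_query_url(
    "activity", target_chembl_id__exact="CHEMBL206", limit=0, type="IC50"
)
print(activity_url)
```

Building the query this way also escapes characters that are not URL-safe, which matters once a filter value contains spaces or symbols.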


In [101]:
# limit: sets the maximum number of records returned by one request.
# offset (used in the next cell): specifies the starting point, i.e. the number of
# records to skip, when retrieving data.
activity_url = base_url.format(f"activity?target_chembl_id__exact={chembl_id}&limit=10&type=IC50")


response = requests.get(activity_url, headers={"Accept": "application/json"})
if response.status_code == 200:
    activity_request = response.json()
    activity_table = pd.DataFrame.from_dict(activity_request['activities'])[['molecule_chembl_id', 'type', 'standard_value', 'standard_units']]
else:
    print(f"Error in the request ({response.status_code}): {response.text}")

activity_table

Unnamed: 0,molecule_chembl_id,type,standard_value,standard_units
0,CHEMBL431611,IC50,2.5,nM
1,CHEMBL316132,IC50,7.5,nM
2,CHEMBL304552,IC50,3.1,nM
3,CHEMBL85881,IC50,3.9,nM
4,CHEMBL85536,IC50,7.4,nM
5,CHEMBL83451,IC50,490.0,nM
6,CHEMBL267385,IC50,1.0,nM
7,CHEMBL315761,IC50,35.0,nM
8,CHEMBL281499,IC50,4.3,nM
9,CHEMBL25228,IC50,91.0,nM


In [102]:
activity_url = base_url.format(f"activity?target_chembl_id__exact={chembl_id}&offset=8&limit=10&type=IC50")

response = requests.get(activity_url, headers={"Accept": "application/json"})
if response.status_code == 200:
    activity_request = response.json()
    activity_table = pd.DataFrame.from_dict(activity_request['activities'])[['molecule_chembl_id', 'type', 'standard_value', 'standard_units']]
else:
    print(f"Error in the request ({response.status_code}): {response.text}")

activity_table

Unnamed: 0,molecule_chembl_id,type,standard_value,standard_units
0,CHEMBL281499,IC50,4.3,nM
1,CHEMBL25228,IC50,91.0,nM
2,CHEMBL432454,IC50,172.0,nM
3,CHEMBL24950,IC50,35.0,nM
4,CHEMBL419110,IC50,11.0,nM
5,CHEMBL85090,IC50,2.6,nM
6,CHEMBL83060,IC50,542.0,nM
7,CHEMBL85650,IC50,3.0,nM
8,CHEMBL313941,IC50,2.7,nM
9,CHEMBL313825,IC50,7.0,nM
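
With offset=8 and limit=10 the request covers records 8 to 17, so its first two rows repeat the last two rows of the previous table. To page through results without overlap, the offset should advance by the limit each time. A small sketch of that arithmetic, assuming the offset defaults to 0 when it is omitted (`page_offsets` is a hypothetical helper):

```python
def page_offsets(total, limit):
    """Return the offsets needed to cover `total` records in pages of `limit`."""
    return list(range(0, total, limit))


# Record positions covered by the two requests above
first_page = set(range(0, 0 + 10))    # no offset given, limit=10
second_page = set(range(8, 8 + 10))   # offset=8, limit=10
overlap = sorted(first_page & second_page)
print(overlap)               # [8, 9]

# Non-overlapping offsets for 35 records in pages of 10
print(page_offsets(35, 10))  # [0, 10, 20, 30]
```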


**Exercise**. Now that you have found the ChEMBL IDs of some binders, use what we learnt above to retrieve the SMILES of these molecules.

### Example 3: Approved drugs: mechanism


The mechanism endpoint provides information about drug interactions with specific targets (proteins).

In [103]:
mechanism_url = base_url.format(f"mechanism?target_chembl_id__exact={chembl_id}&limit=10")
response = requests.get(mechanism_url, headers={"Accept": "application/json"})
if response.status_code == 200:
    mechanism_request = response.json()
    mechanism_table = pd.DataFrame.from_dict(mechanism_request['mechanisms'])[['molecule_chembl_id', 'target_chembl_id', 'max_phase']]
else:
    print(f"Error in the request ({response.status_code}): {response.text}")

mechanism_table


Unnamed: 0,molecule_chembl_id,target_chembl_id,max_phase
0,CHEMBL1201477,CHEMBL206,4
1,CHEMBL786,CHEMBL206,4
2,CHEMBL1405,CHEMBL206,4
3,CHEMBL135,CHEMBL206,4
4,CHEMBL1200430,CHEMBL206,4
5,CHEMBL1511,CHEMBL206,4
6,CHEMBL3185958,CHEMBL206,4
7,CHEMBL691,CHEMBL206,4
8,CHEMBL1018,CHEMBL206,4
9,CHEMBL1200973,CHEMBL206,4


### Identifying targets for the UniProt accession P03372

UniProt and ChEMBL use different identifiers. Let's start by finding the ChEMBL identifier from the UniProt accession.


In [104]:
target_protein_url = base_url.format("target_component?accession=P03372")
target_components = requests.get(target_protein_url, headers={"Accept":"application/json"}).json()['target_components']


In [105]:
target_components

[{'accession': 'P03372',
  'component_id': 410,
  'component_type': 'PROTEIN',
  'description': 'Estrogen receptor',
  'go_slims': [{'go_id': 'GO:0003674'},
   {'go_id': 'GO:0003677'},
   {'go_id': 'GO:0003682'},
   {'go_id': 'GO:0004930'},
   {'go_id': 'GO:0005515'},
   {'go_id': 'GO:0005516'},
   {'go_id': 'GO:0005575'},
   {'go_id': 'GO:0005634'},
   {'go_id': 'GO:0005654'},
   {'go_id': 'GO:0005737'},
   {'go_id': 'GO:0005794'},
   {'go_id': 'GO:0005829'},
   {'go_id': 'GO:0005886'},
   {'go_id': 'GO:0006629'},
   {'go_id': 'GO:0007165'},
   {'go_id': 'GO:0008134'},
   {'go_id': 'GO:0008150'},
   {'go_id': 'GO:0008283'},
   {'go_id': 'GO:0008289'},
   {'go_id': 'GO:0009058'},
   {'go_id': 'GO:0016020'},
   {'go_id': 'GO:0019899'},
   {'go_id': 'GO:0030154'},
   {'go_id': 'GO:0030234'},
   {'go_id': 'GO:0034641'},
   {'go_id': 'GO:0040007'},
   {'go_id': 'GO:0042802'},
   {'go_id': 'GO:0043085'},
   {'go_id': 'GO:0043167'},
   {'go_id': 'GO:0048856'},
   {'go_id': 'GO:0065003'}],
  

In [106]:
target_components[0]['targets']

[{'target_chembl_id': 'CHEMBL206'},
 {'target_chembl_id': 'CHEMBL2093866'},
 {'target_chembl_id': 'CHEMBL3885521'},
 {'target_chembl_id': 'CHEMBL4523681'},
 {'target_chembl_id': 'CHEMBL4523713'},
 {'target_chembl_id': 'CHEMBL4523721'},
 {'target_chembl_id': 'CHEMBL4523726'},
 {'target_chembl_id': 'CHEMBL4523754'}]

In some cases we find more than one result. This is usually because some drugs target protein families, protein-protein interactions or lab protein constructs. To filter the results, we can query the "target type".


In [107]:
targets_list = ';'.join([i['target_chembl_id'] for i in target_components[0]['targets']])
targets_url = base_url.format("target/set/{:s}".format(targets_list))
targets = requests.get(targets_url, headers={"Accept":"application/json"}).json()

In [109]:
for i in targets['targets']:
    print(i['target_chembl_id'], i['target_type'])

CHEMBL206 SINGLE PROTEIN
CHEMBL2093866 PROTEIN FAMILY
CHEMBL3885521 PROTEIN COMPLEX
CHEMBL4523681 PROTEIN-PROTEIN INTERACTION
CHEMBL4523713 PROTEIN-PROTEIN INTERACTION
CHEMBL4523721 PROTEIN-PROTEIN INTERACTION
CHEMBL4523726 PROTEIN-PROTEIN INTERACTION
CHEMBL4523754 PROTEIN-PROTEIN INTERACTION
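
Once the target type is known, the list can be narrowed with a list comprehension. The sketch below copies a few of the ID and type pairs printed above into a small list (`targets_sample`, a name introduced here for illustration) so that it runs without a network call; in the notebook the same comprehension would be applied to targets['targets'].

```python
# target_chembl_id / target_type pairs copied from the output above
targets_sample = [
    {"target_chembl_id": "CHEMBL206", "target_type": "SINGLE PROTEIN"},
    {"target_chembl_id": "CHEMBL2093866", "target_type": "PROTEIN FAMILY"},
    {"target_chembl_id": "CHEMBL3885521", "target_type": "PROTEIN COMPLEX"},
    {"target_chembl_id": "CHEMBL4523681", "target_type": "PROTEIN-PROTEIN INTERACTION"},
]

# Keep only the entries that describe a single protein
single_protein_ids = [
    t["target_chembl_id"]
    for t in targets_sample
    if t["target_type"] == "SINGLE PROTEIN"
]
print(single_protein_ids)  # ['CHEMBL206']
```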


**Exercise**: Navigate the dictionaries to find out which are the interacting partners.