<a href="https://colab.research.google.com/github/Mohammed-Hassan3/DrugDiscovery/blob/main/Melanoma_DrugDiscovery.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **ChEMBL Database**

The [*ChEMBL Database*](https://www.ebi.ac.uk/chembl/) contains curated bioactivity data.

### Installing libraries

In [None]:
!pip install chembl_webresource_client  # The leading ! tells the notebook to run the line as a shell command.

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting chembl_webresource_client
  Downloading chembl_webresource_client-0.10.8-py3-none-any.whl (55 kB)
Collecting requests-cache~=0.7.0
  Downloading requests_cache-0.7.5-py3-none-any.whl (39 kB)
Collecting url-normalize<2.0,>=1.4
  Downloading url_normalize-1.4.3-py2.py3-none-any.whl (6.8 kB)
Collecting itsdangerous>=2.0.1
  Downloading itsdangerous-2.1.2-py3-none-any.whl (15 kB)
Collecting attrs<22.0,>=21.2
  Downloading attrs-21.4.0-py2.py3-none-any.whl (60 kB)

In [None]:
# Import necessary libraries
import pandas as pd
from chembl_webresource_client.new_client import new_client

### Search for Target protein

In [None]:
# Search ChEMBL targets for the keyword 'melanoma'
target = new_client.target
target_query = target.search('melanoma')
targets = pd.DataFrame.from_dict(target_query)
targets  # The hits mix several target types, e.g. single proteins and cell lines

Unnamed: 0,cross_references,organism,pref_name,score,species_group_flag,target_chembl_id,target_components,target_type,tax_id
0,[],Homo sapiens,Cell surface glycoprotein MUC18,15.0,False,CHEMBL3712863,"[{'accession': 'P43121', 'component_descriptio...",SINGLE PROTEIN,9606.0
1,[],Homo sapiens,Melanoma cells,14.0,False,CHEMBL614126,[],CELL-LINE,9606.0
2,[],Homo sapiens,Melanoma-associated antigen 4,14.0,False,CHEMBL4296022,"[{'accession': 'P43358', 'component_descriptio...",SINGLE PROTEIN,9606.0
3,[],Homo sapiens,Interferon-inducible protein AIM2,14.0,False,CHEMBL4630802,"[{'accession': 'O14862', 'component_descriptio...",SINGLE PROTEIN,9606.0
4,[],Homo sapiens,Melanoma-associated antigen 3,14.0,False,CHEMBL4662941,"[{'accession': 'P43357', 'component_descriptio...",SINGLE PROTEIN,9606.0
5,[],Homo sapiens,CD63 antigen,13.0,False,CHEMBL3713303,"[{'accession': 'P08962', 'component_descriptio...",SINGLE PROTEIN,9606.0
6,[],Homo sapiens,Melanoma cell line,12.0,False,CHEMBL613892,[],CELL-LINE,9606.0
7,[],Homo sapiens,Cereblon/Melanoma-associated antigen D1,12.0,False,CHEMBL4742325,"[{'accession': 'Q96SW2', 'component_descriptio...",PROTEIN-PROTEIN INTERACTION,9606.0
8,[],Homo sapiens,3677 melanoma cell line,11.0,False,CHEMBL612820,[],CELL-LINE,9606.0
9,[],Homo sapiens,BRO melanoma cell line,11.0,False,CHEMBL614665,[],CELL-LINE,9606.0
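
The search returns a mix of target types, so it can help to summarize and filter the hits before picking one. The following is a minimal sketch, not part of the original notebook: `targets_demo` is a hypothetical miniature of the result table, built from three rows shown above, so that the example runs without a network call.

```python
import pandas as pd

# Hypothetical miniature of the search results (three rows copied from the table above).
targets_demo = pd.DataFrame(
    {
        "pref_name": [
            "Cell surface glycoprotein MUC18",
            "Melanoma cells",
            "Melanoma-associated antigen 4",
        ],
        "target_chembl_id": ["CHEMBL3712863", "CHEMBL614126", "CHEMBL4296022"],
        "target_type": ["SINGLE PROTEIN", "CELL-LINE", "SINGLE PROTEIN"],
    }
)

# Count how many hits fall into each target type.
type_counts = targets_demo["target_type"].value_counts()

# Keep only single-protein targets; the original index labels are preserved.
single_proteins = targets_demo[targets_demo["target_type"] == "SINGLE PROTEIN"]

# Look a target up by name instead of by row position.
muc18_id = targets_demo.loc[
    targets_demo["pref_name"] == "Cell surface glycoprotein MUC18",
    "target_chembl_id",
].iloc[0]

print(type_counts.to_dict())
print(muc18_id)
```

Selecting by name rather than by position makes the choice of target robust to changes in the order or number of search results.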



### Select and retrieve the bioactivity data for a target
The expression of the MUC18 antigen on primary human melanoma correlates with a poor prognosis and the emergence of metastatic melanoma ([*Lehmann et al.*](https://pubmed.ncbi.nlm.nih.gov/2602381/)). Cell surface glycoprotein MUC18 is the first entry in the table above (index 0, `CHEMBL3712863`). Note that the cell below selects index 12 instead, which returns `CHEMBL5021`; the records retrieved further down list `CDK-interacting protein 1` as the target name.


In [None]:
selected_target = targets.target_chembl_id[12]
selected_target

'CHEMBL5021'

The half-maximal inhibitory concentration (IC50) measures the potency of a substance in inhibiting a specific biological or biochemical function. It is the concentration of an inhibitory substance (e.g. a drug) needed to inhibit, in vitro, a given biological process or biological component by 50%. A lower IC50 therefore indicates a more potent compound.
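
To make the definition concrete, here is a small sketch that assumes a simple Hill-type dose-response curve. The curve shape, the function name `percent_inhibition`, and the IC50 of 250 nM are illustrative assumptions, not values taken from this dataset. Under this model, a concentration equal to the IC50 gives exactly 50% inhibition.

```python
def percent_inhibition(concentration_nm: float, ic50_nm: float, hill: float = 1.0) -> float:
    """Percent inhibition under an assumed Hill model of the dose-response curve."""
    c = concentration_nm ** hill
    return 100.0 * c / (ic50_nm ** hill + c)


ic50_nm = 250.0  # hypothetical IC50, in nM

at_ic50 = percent_inhibition(ic50_nm, ic50_nm)             # concentration equal to the IC50
tenfold_below = percent_inhibition(ic50_nm / 10, ic50_nm)  # 25 nM
tenfold_above = percent_inhibition(ic50_nm * 10, ic50_nm)  # 2500 nM

print(at_ic50, round(tenfold_below, 1), round(tenfold_above, 1))
```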

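In [None]:
activity = new_client.activity
# Retrieve all activities for the target first to see which measurement types are reported
res_all = activity.filter(target_chembl_id=selected_target)
pd.DataFrame.from_dict(res_all).standard_type.unique()

array(['IC50', 'Ratio', 'Activity'], dtype=object)

In [None]:
# Keep only the IC50 measurements
res = activity.filter(target_chembl_id=selected_target).filter(standard_type='IC50')

In [None]:
df = pd.DataFrame.from_dict(res)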

In [None]:
df.head(3)

Unnamed: 0,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,...,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,1418556,[],CHEMBL833769,Inhibitory concentration against p21 deficient...,F,,,BAO_0000190,BAO_0000019,...,Homo sapiens,CDK-interacting protein 1,9606,,,IC50,uM,UO_0000065,,20.0
1,,1418557,[],CHEMBL832970,Inhibitory concentration against human p21 pro...,F,,,BAO_0000190,BAO_0000019,...,Homo sapiens,CDK-interacting protein 1,9606,,,IC50,uM,UO_0000065,,20.0
2,,1418559,[],CHEMBL833769,Inhibitory concentration against p21 deficient...,F,,,BAO_0000190,BAO_0000019,...,Homo sapiens,CDK-interacting protein 1,9606,,,IC50,uM,UO_0000065,,20.0


In [None]:
df.to_csv('bioactivity_data.csv', index=False)  # Save the raw bioactivity data

### **Handling missing data**
Drop any compound that has a missing value in the **standard_value** column.

In [None]:
df2 = df[df.standard_value.notna()]
df2

Unnamed: 0,activity_comment,activity_id,activity_properties,assay_chembl_id,assay_description,assay_type,assay_variant_accession,assay_variant_mutation,bao_endpoint,bao_format,...,target_organism,target_pref_name,target_tax_id,text_value,toid,type,units,uo_units,upper_value,value
0,,1418556,[],CHEMBL833769,Inhibitory concentration against p21 deficient...,F,,,BAO_0000190,BAO_0000019,...,Homo sapiens,CDK-interacting protein 1,9606,,,IC50,uM,UO_0000065,,20.0
1,,1418557,[],CHEMBL832970,Inhibitory concentration against human p21 pro...,F,,,BAO_0000190,BAO_0000019,...,Homo sapiens,CDK-interacting protein 1,9606,,,IC50,uM,UO_0000065,,20.0
2,,1418559,[],CHEMBL833769,Inhibitory concentration against p21 deficient...,F,,,BAO_0000190,BAO_0000019,...,Homo sapiens,CDK-interacting protein 1,9606,,,IC50,uM,UO_0000065,,20.0
3,,1418560,[],CHEMBL832970,Inhibitory concentration against human p21 pro...,F,,,BAO_0000190,BAO_0000019,...,Homo sapiens,CDK-interacting protein 1,9606,,,IC50,uM,UO_0000065,,20.0
4,,1418759,[],CHEMBL833769,Inhibitory concentration against p21 deficient...,F,,,BAO_0000190,BAO_0000019,...,Homo sapiens,CDK-interacting protein 1,9606,,,IC50,uM,UO_0000065,,2.3
5,,1418760,[],CHEMBL833768,Inhibitory concentration against human p21 def...,F,,,BAO_0000190,BAO_0000019,...,Homo sapiens,CDK-interacting protein 1,9606,,,IC50,uM,UO_0000065,,20.0
6,,1418798,[],CHEMBL833769,Inhibitory concentration against p21 deficient...,F,,,BAO_0000190,BAO_0000019,...,Homo sapiens,CDK-interacting protein 1,9606,,,IC50,uM,UO_0000065,,20.0
7,,1418799,[],CHEMBL832970,Inhibitory concentration against human p21 pro...,F,,,BAO_0000190,BAO_0000019,...,Homo sapiens,CDK-interacting protein 1,9606,,,IC50,uM,UO_0000065,,20.0
8,,1418801,[],CHEMBL833769,Inhibitory concentration against p21 deficient...,F,,,BAO_0000190,BAO_0000019,...,Homo sapiens,CDK-interacting protein 1,9606,,,IC50,uM,UO_0000065,,2.4
9,,1418802,[],CHEMBL832970,Inhibitory concentration against human p21 pro...,F,,,BAO_0000190,BAO_0000019,...,Homo sapiens,CDK-interacting protein 1,9606,,,IC50,uM,UO_0000065,,20.0
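
The filtering step can be checked on a toy table. This is a minimal sketch with hypothetical molecule IDs and values (`activities_demo` is not part of the notebook's data). It shows that the boolean mask with `notna()` and `dropna(subset=...)` select the same rows, and that the original index labels are kept unless they are reset.

```python
import pandas as pd

# Hypothetical records; the second one has no standard_value.
activities_demo = pd.DataFrame(
    {
        "molecule_chembl_id": ["MOL_A", "MOL_B", "MOL_C"],
        "standard_value": ["20000.0", None, "2300.0"],
    }
)

# How many rows would be dropped?
n_missing = int(activities_demo["standard_value"].isna().sum())

# Boolean-mask filtering, as used in the cell above.
kept = activities_demo[activities_demo["standard_value"].notna()]

# Equivalent formulation with dropna restricted to one column.
kept_alt = activities_demo.dropna(subset=["standard_value"])

# Optional: renumber the rows from 0 after filtering.
kept_reset = kept.reset_index(drop=True)

print(n_missing, list(kept.index), list(kept_reset.index))
```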


In [None]:
df2.molecule_chembl_id

0      CHEMBL368648
1      CHEMBL368648
2      CHEMBL178663
3      CHEMBL178663
4      CHEMBL179730
5      CHEMBL179730
6      CHEMBL179692
7      CHEMBL179692
8      CHEMBL179110
9      CHEMBL179110
10     CHEMBL361956
11     CHEMBL361956
12     CHEMBL179581
13     CHEMBL179581
14     CHEMBL178310
15     CHEMBL178310
16     CHEMBL179178
17     CHEMBL179178
18     CHEMBL368233
19     CHEMBL360687
20     CHEMBL360687
21     CHEMBL368233
22     CHEMBL360323
23     CHEMBL360323
24     CHEMBL178149
25     CHEMBL178149
26     CHEMBL425866
27     CHEMBL425866
28     CHEMBL359861
29     CHEMBL359861
30     CHEMBL180572
31     CHEMBL180572
32     CHEMBL178907
33     CHEMBL178907
34     CHEMBL179694
35     CHEMBL179694
36     CHEMBL359939
37     CHEMBL359939
38     CHEMBL179113
39     CHEMBL179113
40     CHEMBL179187
41     CHEMBL179187
42     CHEMBL179920
43     CHEMBL179920
44     CHEMBL179155
45     CHEMBL179155
46     CHEMBL178807
47     CHEMBL178807
48     CHEMBL179087
49     CHEMBL179087


**Data pre-processing of the bioactivity data**
<br>
**Labeling compounds as either being active, inactive or intermediate**
<br>
The bioactivity data are reported as IC50 values. Compounds with values below 1,000 nM are considered **active**, those with values above 10,000 nM are considered **inactive**, and those with values between 1,000 and 10,000 nM are labeled **intermediate**.
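
The thresholds above can be sketched as a small labeling function. This is an illustrative sketch rather than the notebook's own code: the function name `bioactivity_class`, the table `labels_demo`, and its values are hypothetical. It also assumes that the values are in nM and may arrive as strings, so they are converted to numbers first.

```python
import pandas as pd


def bioactivity_class(ic50_nm: float) -> str:
    """Label a compound from its IC50 (in nM) using the thresholds stated above."""
    if ic50_nm < 1000:
        return "active"
    if ic50_nm > 10000:
        return "inactive"
    return "intermediate"  # 1,000 nM <= IC50 <= 10,000 nM


# Hypothetical IC50 values, stored as strings to mimic an API response.
labels_demo = pd.DataFrame(
    {"standard_value": ["20000.0", "2300.0", "850", "1000", "10000"]}
)

# Convert to numbers before comparing against the thresholds.
values_nm = pd.to_numeric(labels_demo["standard_value"])
labels_demo["bioactivity_class"] = values_nm.apply(bioactivity_class)

print(labels_demo)
```

The boundary values 1,000 nM and 10,000 nM fall into the intermediate class here, because the text defines active as strictly below 1,000 nM and inactive as strictly above 10,000 nM.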

In [None]:
# Collect the ChEMBL molecule IDs into a plain Python list
mol_cid = df2.molecule_chembl_id.tolist()

In [None]:
mol_cid

['CHEMBL368648',
 'CHEMBL368648',
 'CHEMBL178663',
 'CHEMBL178663',
 'CHEMBL179730',
 'CHEMBL179730',
 'CHEMBL179692',
 'CHEMBL179692',
 'CHEMBL179110',
 'CHEMBL179110',
 'CHEMBL361956',
 'CHEMBL361956',
 'CHEMBL179581',
 'CHEMBL179581',
 'CHEMBL178310',
 'CHEMBL178310',
 'CHEMBL179178',
 'CHEMBL179178',
 'CHEMBL368233',
 'CHEMBL360687',
 'CHEMBL360687',
 'CHEMBL368233',
 'CHEMBL360323',
 'CHEMBL360323',
 'CHEMBL178149',
 'CHEMBL178149',
 'CHEMBL425866',
 'CHEMBL425866',
 'CHEMBL359861',
 'CHEMBL359861',
 'CHEMBL180572',
 'CHEMBL180572',
 'CHEMBL178907',
 'CHEMBL178907',
 'CHEMBL179694',
 'CHEMBL179694',
 'CHEMBL359939',
 'CHEMBL359939',
 'CHEMBL179113',
 'CHEMBL179113',
 'CHEMBL179187',
 'CHEMBL179187',
 'CHEMBL179920',
 'CHEMBL179920',
 'CHEMBL179155',
 'CHEMBL179155',
 'CHEMBL178807',
 'CHEMBL178807',
 'CHEMBL179087',
 'CHEMBL179087',
 'CHEMBL178995',
 'CHEMBL178995',
 'CHEMBL179916',
 'CHEMBL179916',
 'CHEMBL538355',
 'CHEMBL538355',
 'CHEMBL4226806']

In [None]:
# Collect the canonical SMILES strings into a plain Python list
canonical_smiles = df2.canonical_smiles.tolist()