# Retrosynthesis AI
### Chemoinformatics Training I
Created by: Margaret Liñán MS MPH
<img src="chem.png" />
In this tutorial, you will learn how to use ChemSpider, ChEMBL and PubChem to build proficiency in molecular analysis and virtual screening of small-molecule ligands.

## Section 1 - Extracting Drug Data from ChEMBL 
ChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into effective new drugs. In the following exercises you will use ChEMBL's web resource client to extract ChEMBL IDs, synonyms and indication classes, and to check whether a molecule is a prodrug and whether it is administered orally or parenterally. You will also learn how to reformat the JSON output as a table and how to generate molecular images.

##### Notes
The following exercises were adapted from the ChEMBL web resource client examples on Binder.

##### Resources
<a href="https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services">ChEMBL Data Web Services</a> <br>
<a href="https://bioservices.readthedocs.io/en/master/notebooks.html">Bioservices</a>


#### Find Single Drug attributes from ChEMBL such as Indication, ID, Oral, Parenteral and ProDrug 
    from chembl_webresource_client.new_client import new_client
    import pandas as pd
    molecule = new_client.molecule
    mols = molecule.filter(molecule_synonyms__molecule_synonym__iexact='abacavir').only(['molecule_chembl_id','indication_class','oral','parenteral','prodrug'])

#### Printing the ChEMBL results in different formats
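    # Raw results: a list of dicts (one JSON record per molecule)
    print("ChEMBL Results JSON String")
    print(list(mols))
    print("\n")
    # Flatten the records into a pandas DataFrame for a tabular view
    print("ChEMBL Results Table")
    df_list = pd.json_normalize(list(mols))
    print(df_list)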

Try the code above by copying it into the cell below and running it. Don't forget to start the hardware first by clicking the circuit icon beneath the blue folder at the far upper right.

In [None]:
### Paste code here

### Let's now try the above code in a scenario where we need this information for multiple drugs.

#### Find Multiple Drug attributes from ChEMBL such as Indication, ID, Oral, Parenteral and ProDrug 
    from chembl_webresource_client.new_client import new_client
    import pandas as pd

    molecule = new_client.molecule
    # Look up several molecules at once by passing a list of ChEMBL IDs to the __in filter
    mols = molecule.filter(molecule_chembl_id__in=['CHEMBL25', 'CHEMBL192', 'CHEMBL27']).only(['molecule_chembl_id','indication_class','oral','parenteral','prodrug','molecule_structures'])

#### Printing the ChEMBL results in different formats
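    # Raw results: a list of dicts (one JSON record per molecule)
    print("ChEMBL Results JSON String")
    print(list(mols))
    print("\n")
    # Nested fields such as molecule_structures become dotted column names
    print("ChEMBL Results Table")
    df_list = pd.json_normalize(list(mols))
    print(df_list)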

Try the code above by copying it into the cell below and running it. Don't forget to start the hardware first by clicking the circuit icon beneath the blue folder at the far upper right.

In [None]:
### Paste code here
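
`pd.json_normalize` flattens nested fields such as `molecule_structures` into dotted column names (for example `molecule_structures.canonical_smiles`). The stdlib sketch below mimics that flattening for a single record so you can see where those column names come from. It is an illustration, not pandas' implementation, and `flatten` and the record values are hypothetical placeholders rather than real ChEMBL output.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "", sep: str = ".") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with `sep`.

    Mirrors the column naming pd.json_normalize uses for nested objects;
    lists and other values are kept as-is.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted prefix
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical record shaped like a ChEMBL molecule result (placeholder values).
record = {
    "molecule_chembl_id": "CHEMBL_X",
    "oral": True,
    "molecule_structures": {"canonical_smiles": "CCO", "standard_inchi_key": None},
}
print(flatten(record))
```

Each key of the flattened dict corresponds to one column of the table printed by the exercise above.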

### Here we use the ChEMBL web resource client to generate an image of a drug from its ChEMBL ID.

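    from chembl_webresource_client.new_client import new_client
    from IPython.display import SVG

    image = new_client.image
    image.set_format('svg')  # request the structure depiction as SVG
    # In a notebook the last expression of a cell is rendered, so this displays the image
    SVG(image.get('CHEMBL1201179'))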

<img src="chem_structure.png" />

In [None]:
## Paste code here

### View the full list of attributes for a specific drug by ChEMBL ID

    from bioservices import ChEMBL

    c = ChEMBL()
    # Fetch the drug record for CHEMBL1380
    c.get_drug('CHEMBL1380', limit=10, offset=0, filters=None)


In [None]:
## Paste code here
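
The `limit` and `offset` arguments page through results. To our understanding, ChEMBL's JSON list responses hold the records under a resource key (for example `"drugs"`) next to a `page_meta` object that includes `total_count`. The sketch below collects every page using a caller-supplied `fetch(limit, offset)` function. `fetch_all`, `fake_fetch` and the exact response shape are assumptions for illustration; check them against a real response before relying on them.

```python
from typing import Any, Callable

Page = dict[str, Any]


def fetch_all(fetch: Callable[[int, int], Page], key: str, page_size: int = 20) -> list[Any]:
    """Collect all records by repeatedly calling fetch(limit, offset).

    Assumes each page looks like {key: [...], "page_meta": {"total_count": N, ...}}.
    """
    records: list[Any] = []
    offset = 0
    while True:
        page = fetch(page_size, offset)
        batch = page[key]
        records.extend(batch)
        offset += len(batch)
        total = page["page_meta"]["total_count"]
        # Stop once everything is collected, or if the server returns an empty page
        if not batch or offset >= total:
            return records


# Offline stand-in for a paged service call (hypothetical data).
DATA = [f"item{i}" for i in range(7)]


def fake_fetch(limit: int, offset: int) -> Page:
    return {"drugs": DATA[offset:offset + limit],
            "page_meta": {"total_count": len(DATA), "limit": limit, "offset": offset}}


print(fetch_all(fake_fetch, "drugs", page_size=3))
```

With bioservices, something like `fetch_all(lambda limit, offset: c.get_drug(limit=limit, offset=offset, filters=...), "drugs")` would follow the same pattern; add a filter so the total stays small, since paging through an entire resource means many requests.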

## Section 2 - Protein Targets, Mechanisms of Action, Bioactivity
In these exercises you will learn how to find protein targets by ChEMBL ID and by gene name, and how to look up a drug's mechanism of action and bioactivity.

##### Notes
The following exercises were adapted from the ChEMBL web resource client examples on Binder.

##### Resources
<a href="https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services">ChEMBL Data Web Services</a> <br>
<a href="https://bioservices.readthedocs.io/en/master/notebooks.html">Bioservices</a>


### Protein Targets by ChEMBL ID using ChEMBL web resource and BioServices

    from chembl_webresource_client.new_client import new_client

    target = new_client.target
    chembl_ids = ['CHEMBL3938']
    # Use the __in lookup when filtering by a list of IDs
    res = target.filter(target_chembl_id__in=chembl_ids).only(['target_chembl_id', 'organism', 'pref_name', 'target_type'])
    for i in res:
        print(i)

In [None]:
## Paste code here

    from bioservices import ChEMBL

    c = ChEMBL(verbose=False)

    ## Fetch a single protein target by its ChEMBL ID
    res = c.get_target('CHEMBL3938')
    print(res)

    ## Return Protein Targets that contain the term 'kinase' in the pref_name attribute
    ## (use limit and offset to page through more results)
    kinases = c.get_target(filters="pref_name__contains=kinase")
    print(kinases)

In [None]:
## Paste code here
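
Both clients translate Django-style `field__lookup=value` filters (such as `pref_name__contains=kinase` or `target_synonym__icontains=BRD4`) into query parameters on the ChEMBL REST API. The sketch below builds such a URL with the standard library so you can inspect it or open it in a browser. The base URL and the `<resource>.json` pattern reflect the public API as we understand it, and `chembl_url` is a hypothetical helper, not part of either client.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"  # assumed public ChEMBL REST base


def chembl_url(resource: str, chembl_id: Optional[str] = None, **filters: object) -> str:
    """Build a ChEMBL REST URL: one record by ID, or a filtered list."""
    if chembl_id is not None:
        return f"{BASE_URL}/{resource}/{chembl_id}.json"
    # Keyword arguments keep their order, so the query string is predictable
    query = urlencode(filters)
    return f"{BASE_URL}/{resource}.json" + (f"?{query}" if query else "")


print(chembl_url("target", "CHEMBL3938"))
print(chembl_url("target", pref_name__contains="kinase", limit=5))
```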


### Protein Targets by Gene Name [ChEMBL]

    from chembl_webresource_client.new_client import new_client

    target = new_client.target
    gene_name = 'BRD4'
    # icontains is a case-insensitive substring match, so pass a single string rather than a list
    res = target.filter(target_synonym__icontains=gene_name).only(['organism', 'pref_name', 'target_type'])
    for i in res:
        print(i)

In [None]:
## Paste code here
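
A synonym search such as `target_synonym__icontains='BRD4'` may return targets from several organisms and of several target types. A simple way to narrow the results is to filter the returned dictionaries in Python. `pick_targets` is a hypothetical helper, and the records below are placeholders with the same keys as the query above, not real output.

```python
from typing import Iterable


def pick_targets(records: Iterable[dict], organism: str = "Homo sapiens",
                 target_type: str = "SINGLE PROTEIN") -> list[dict]:
    """Keep only targets from one organism and of one target type."""
    return [r for r in records
            if r.get("organism") == organism and r.get("target_type") == target_type]


# Placeholder records shaped like the .only(['organism', 'pref_name', 'target_type']) output.
records = [
    {"organism": "Homo sapiens", "pref_name": "Target A", "target_type": "SINGLE PROTEIN"},
    {"organism": "Mus musculus", "pref_name": "Target A", "target_type": "SINGLE PROTEIN"},
    {"organism": "Homo sapiens", "pref_name": "Complex B", "target_type": "PROTEIN COMPLEX"},
]
for r in pick_targets(records):
    print(r["pref_name"])
```

Because the client's result object can be iterated as dicts, `pick_targets(res)` should work on the live query in the same way.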


### Mechanisms of Action for FDA-approved Drugs with BioServices

    from bioservices import ChEMBL

    m = ChEMBL(verbose=False)

    ## Return all mechanisms of action recorded for the molecule CHEMBL25.
    ## Note that mechanism data exist for only a subset of ChEMBL IDs.

    mech = m.get_mechanism(filters='molecule_chembl_id__in=CHEMBL25')

    print(mech)

In [None]:
## Paste code here
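
For bioactivity, ChEMBL's `activity` resource (`new_client.activity`) returns measurements with fields such as `standard_type`, `standard_value` and `standard_units`. Potencies are often compared on a negative log scale: pIC50 = -log10(IC50 in mol/L), so an IC50 of 1000 nM gives a pIC50 of 6.0. The offline sketch below performs that conversion; `to_pic50` is a hypothetical helper and the activity values are made up. ChEMBL also reports a precomputed `pchembl_value` for many records, which you may prefer to use directly.

```python
import math
from typing import Optional


def to_pic50(value: Optional[float], units: Optional[str]) -> Optional[float]:
    """Convert an IC50 in nM to pIC50 = -log10(IC50 in mol/L).

    Returns None for missing, non-positive or non-nM values instead of guessing a conversion.
    """
    if value is None or units != "nM" or value <= 0:
        return None
    return 9.0 - math.log10(value)  # 1 nM = 1e-9 M, so -log10(v * 1e-9) = 9 - log10(v)


# Made-up activity records using the ChEMBL-style keys mentioned above.
# Values may arrive as strings, so convert them to float first.
activities = [
    {"standard_type": "IC50", "standard_value": "1000", "standard_units": "nM"},
    {"standard_type": "IC50", "standard_value": "50", "standard_units": "nM"},
    {"standard_type": "IC50", "standard_value": None, "standard_units": "nM"},
]
for a in activities:
    v = float(a["standard_value"]) if a["standard_value"] is not None else None
    print(a["standard_value"], "->", to_pic50(v, a["standard_units"]))
```

Against live data, the same loop could run over a query such as `new_client.activity.filter(molecule_chembl_id='CHEMBL25', standard_type='IC50')`.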

 

## Section 3 - Finding drugs that are similar

In these exercises you will learn how to find similar drugs using either a ChEMBL ID or a SMILES string together with a similarity threshold.

##### Notes
The following exercises were adapted from the ChEMBL web resource client examples on Binder.

##### Resources
<a href="https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services">ChEMBL Data Web Services</a> <br>
 


### Using ChEMBL ID and similarity threshold

    from chembl_webresource_client.new_client import new_client

    similarity = new_client.similarity
    ## similarity=70 is the threshold: the minimum similarity score (as a percentage) a hit must reach
    res = similarity.filter(chembl_id='CHEMBL25', similarity=70).only(['molecule_chembl_id', 'pref_name', 'similarity'])

    for h in res:
        print(h)


In [None]:
## Paste code here
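
To our understanding, ChEMBL scores similarity with the Tanimoto coefficient on structural fingerprints, so `similarity=70` keeps hits scoring at least 0.70. For two fingerprints A and B, written as sets of "on" bit positions, Tanimoto = |A ∩ B| / |A ∪ B|. The toy sketch below uses hand-written bit sets instead of real fingerprints (which a toolkit such as RDKit would generate); the bit positions and molecule names are arbitrary.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient of two fingerprints given as sets of on-bit indices."""
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: undefined, treated as 0 here
    return len(a & b) / len(union)


# Toy fingerprints: arbitrary on-bit positions, not real chemical fingerprints.
query = {1, 4, 9, 12, 20, 33}
candidates = {
    "mol_1": {1, 4, 9, 12, 20, 21},  # shares 5 of 7 distinct bits
    "mol_2": {1, 4, 30, 31},         # shares 2 of 8 distinct bits
}

threshold = 70 / 100  # similarity=70 expressed on the 0-1 scale
for name, fp in candidates.items():
    score = tanimoto(query, fp)
    status = "kept" if score >= threshold else "dropped"
    print(f"{name}: {score:.3f} ({status})")
```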


### Using SMILES and similarity threshold

    from chembl_webresource_client.new_client import new_client

    similarity = new_client.similarity
    # Raw string keeps Python from treating the backslash (a SMILES bond direction) as an escape
    res = similarity.filter(smiles=r"CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-]", similarity=70).only(['molecule_chembl_id', 'similarity'])
    for i in res:
        print(i)

In [None]:
## Paste code here


## Section 4 - Find disease-specific drugs with similar connectivity

Find drugs indicated for a specific disease, then find drugs that share the same connectivity as a given molecule.


##### Notes
The following exercises were adapted from the ChEMBL web resource client examples on Binder.

##### Resources
<a href="https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services">ChEMBL Data Web Services</a> <br>



### Disease-specific drugs

    from chembl_webresource_client.new_client import new_client

    drug_indication = new_client.drug_indication
    molecules = new_client.molecule

    lung_cancer_ind = drug_indication.filter(efo_term__icontains="LUNG CARCINOMA")
    lung_cancer_mols = molecules.filter(
        molecule_chembl_id__in=[x['molecule_chembl_id'] for x in lung_cancer_ind])

    print("The total number of drugs for Lung Cancer: ", len(lung_cancer_mols))
    print("")
    
    ## Remove [0:5] to output all results
    subset = lung_cancer_mols[0:5] 
    print("Here is the subset")
    print("-----------------------------------------------------------------------")
    for i in subset:
        print(i)

In [None]:
## Paste code here
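
One molecule can appear in several `drug_indication` records (for instance, once per matching EFO term), so the list comprehension in the exercise may pass repeated IDs to the `molecule_chembl_id__in` filter. Deduplicating first, while keeping the original order, keeps the request smaller. A minimal sketch; `unique_ids` is a hypothetical helper and the IDs are placeholders:

```python
def unique_ids(records, key="molecule_chembl_id"):
    """Return the values of `key` in first-seen order, without duplicates or empty values."""
    # dict preserves insertion order (Python 3.7+), so fromkeys de-duplicates stably
    return list(dict.fromkeys(r[key] for r in records if r.get(key)))


# Placeholder indication records: the same molecule listed under two EFO terms.
indications = [
    {"molecule_chembl_id": "CHEMBL_A", "efo_term": "lung carcinoma"},
    {"molecule_chembl_id": "CHEMBL_B", "efo_term": "lung carcinoma"},
    {"molecule_chembl_id": "CHEMBL_A", "efo_term": "non-small cell lung carcinoma"},
]
print(unique_ids(indications))  # ['CHEMBL_A', 'CHEMBL_B']
```

In the exercise, this would become `molecules.filter(molecule_chembl_id__in=unique_ids(lung_cancer_ind))`.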


### Drugs with Similar Connectivity

    from chembl_webresource_client.new_client import new_client

    molecule = new_client.molecule
    # CN(C)C(=N)N=C(N)N is metformin; return molecules that share its connectivity
    res = molecule.filter(molecule_structures__canonical_smiles__connectivity='CN(C)C(=N)N=C(N)N').only(['molecule_chembl_id', 'pref_name'])
    for i in res:
        print(i)


In [None]:
## Paste code here
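
The first 14-character block of a standard InChIKey hashes the connectivity (skeleton) layer of the InChI, while the second block covers layers such as stereochemistry and isotopes. Grouping molecules by that first block is one common way to find compounds with the same connectivity. Whether ChEMBL's `connectivity` filter works exactly this way is an assumption here; the sketch only illustrates the idea, and `group_by_connectivity` and all InChIKeys below are hypothetical placeholders.

```python
from collections import defaultdict


def group_by_connectivity(keys: dict[str, str]) -> dict[str, list[str]]:
    """Group molecule names by the first (connectivity) block of their InChIKey."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, inchikey in keys.items():
        skeleton = inchikey.split("-")[0]  # 14-character connectivity hash
        groups[skeleton].append(name)
    return dict(groups)


# Hypothetical InChIKeys: mol_1 and mol_2 share a skeleton but differ in the second block.
keys = {
    "mol_1": "ABCDEFGHIJKLMN-UHFFFAOYSA-N",
    "mol_2": "ABCDEFGHIJKLMN-ZZZZZZZZSA-N",
    "mol_3": "OPQRSTUVWXYZAB-UHFFFAOYSA-N",
}
print(group_by_connectivity(keys))
```

ChEMBL molecule records expose the key as `molecule_structures.standard_inchi_key`, so the same grouping could be applied to real query results.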


<br>
<br>

## [Back](00_getting_started.ipynb)
