# Retrosynthesis AI
### Chemoinformatics Training II
Created by: Margaret Liñán MS MPH
<img src="/work/Media/prots.png" />
In this tutorial, you will learn how to use RDKit and the ZINC database to visualize multiple molecules (drugs, compounds, etc.).

## Section 1 - Extracting FDA Approved Drug Data
The ZINC database is a great resource for drugs and compounds that are specially prepared for virtual screening. ZINC is used by pharma companies, biotech companies, and research universities. You can browse its grouped molecules <a href="http://zinc15.docking.org/substances/subsets/">here</a>. Once there, click the "fda" subset in the Bioactive and Drugs section. On the next screen you can download a mol2 file, as depicted below.

<img src="/work/Media/ZINC.png"/>
<br>

The fda.mol2 file is used in this training module (no need to upload it, as I provide it), and its collection of molecules will be visualized in Section 2.
<br>
<br>
##### Notes
The following exercises were adapted from "Building a Multi-Molecule Mol2 reader for RDKit V2", published on the Cheminformatics Workflows website created by Angel J. Ruiz Moreno.

##### Resources
<a href="https://chem-workflows.com/articles/2020/03/23/building-a-multi-molecule-mol2-reader-for-rdkit-v2/">Building a Multi-Molecule Mol2 reader for RDKit V2</a><br>
<a href="https://zinc.docking.org/">ZINC</a><br>
<a href="https://www.rdkit.org/">RDKit</a>

## Section 2 - Visualizing Molecules

The following molecular ligands are FDA approved according to DrugBank, as posted on the ZINC website. You will use RDKit and py3Dmol to view them.

    import sys
    sys.path.append('/usr/local/lib/python3.7/site-packages/')

In [None]:
## Insert the above code, here

    from rdkit import Chem
    from rdkit.Chem import Draw,AllChem
    from rdkit.Chem.Draw import IPythonConsole
    import py3Dmol
    import pandas as pd

In [None]:
## Insert the above code, here

    ## Prep function: read a multi-molecule mol2 file into a list of RDKit molecules

    def Mol2MolSupplier(file=None, sanitize=True):
        mols = []
        with open(file, 'r') as f:
            doc = f.readlines()

        # Line indices where each molecule record begins
        start = [index for (index, p) in enumerate(doc) if '@<TRIPOS>MOLECULE' in p]
        # Slice end for each record (exclusive): the index just before the next record's tag
        finish = [index - 1 for index in start]
        finish.append(len(doc))

        interval = list(zip(start, finish[1:]))
        for i in interval:
            block = "".join(doc[i[0]:i[1]])
            # MolFromMol2Block returns None when a record cannot be parsed
            m = Chem.MolFromMol2Block(block, sanitize=sanitize)
            mols.append(m)
        return mols

In [None]:
## Insert the above code, here
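
To make the record-splitting idea behind `Mol2MolSupplier` easier to see, here is a minimal sketch that works on plain text and does not need RDKit. The function name `split_mol2_blocks` and the two-record sample are made up for illustration, and the atom lines are abbreviated placeholders rather than valid mol2 records. Unlike the function above, this version keeps every line up to the next `@<TRIPOS>MOLECULE` tag.

```python
MOLECULE_TAG = "@<TRIPOS>MOLECULE"


def split_mol2_blocks(text):
    """Split multi-molecule mol2 text into one string per molecule record."""
    blocks = []
    current = []
    for line in text.splitlines(keepends=True):
        if line.startswith(MOLECULE_TAG) and current:
            # A new record begins: store the one collected so far
            blocks.append("".join(current))
            current = []
        # Skip anything that appears before the first molecule tag
        if current or line.startswith(MOLECULE_TAG):
            current.append(line)
    if current:
        blocks.append("".join(current))
    return blocks


# Made-up two-record input; the atom lines are abbreviated placeholders
sample_text = (
    "# header comment\n"
    "@<TRIPOS>MOLECULE\n"
    "mol_A\n"
    "@<TRIPOS>ATOM\n"
    "1 C1 0.0 0.0 0.0 C.3\n"
    "@<TRIPOS>MOLECULE\n"
    "mol_B\n"
    "@<TRIPOS>ATOM\n"
    "1 O1 0.0 0.0 0.0 O.3\n"
)

blocks = split_mol2_blocks(sample_text)
# In the mol2 format, the line after the MOLECULE tag holds the molecule name
names = [block.splitlines()[1] for block in blocks]
print(len(blocks), names)  # 2 ['mol_A', 'mol_B']
```

With a real file, each block could then be handed to `Chem.MolFromMol2Block`, as the prep function does.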

    ## Specify the fda mol2 file downloaded from ZINC (it's included in this notebook)
    filePath ='fda.mol2'

In [None]:
## Insert the above code, here

    ## Load the fda.mol2 data file and sanitize each molecule
    database=Mol2MolSupplier(filePath,sanitize=True)
    

In [None]:
## Insert the above code, here
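
Records that RDKit cannot parse come back as `None`, so it can help to check how many entries in `database` are usable before building the table. The sketch below is one way to do that; `summarize_parse_results` is a hypothetical helper, and the example list uses strings as stand-ins for RDKit molecules so that it runs without the data file.

```python
def summarize_parse_results(mols):
    """Count parsed and failed entries in a list that may contain None."""
    failed_positions = [i for i, mol in enumerate(mols) if mol is None]
    return {
        "total": len(mols),
        "parsed": len(mols) - len(failed_positions),
        "failed_positions": failed_positions,
    }


# Stand-in list: strings play the role of RDKit molecules, None marks a failed record
example_database = ["mol_0", None, "mol_2", "mol_3", None]
summary = summarize_parse_results(example_database)
print(summary)  # {'total': 5, 'parsed': 3, 'failed_positions': [1, 4]}
```

In the notebook, the same call would be `summarize_parse_results(database)`.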

    ## Create a data table for the molecular dataset
    table=pd.DataFrame()
    index=0
    for mol in database:
        if mol:
            table.loc[index,'Name']=mol.GetProp('_Name')
            table.loc[index,'NumAtoms']=mol.GetNumAtoms()
            table.loc[index,'SMILES']=Chem.MolToSmiles(mol)
            index=index+1

In [None]:
## Insert the above code, here

In [None]:
table.head(10) #The first 10 non None elements in the list 

| | Name | NumAtoms | SMILES |
|---|---|---|---|
| 0 | ZINC000001530427 | 8.0 | `C[C@@H]1O[C@@H]1[P@](=O)([O-])O` |
| 1 | ZINC000003807804 | 25.0 | `Clc1ccccc1C(c1ccccc1)(c1ccccc1)n1ccnc1` |
| 2 | ZINC000000120286 | 19.0 | `Nc1nc(N)c2nc(-c3ccccc3)c(N)nc2n1` |
| 3 | ZINC000000008492 | 11.0 | `Oc1cccc2cccnc12` |
| 4 | ZINC000003607120 | 27.0 | `COc1c(N2CC[NH2+][C@H](C)C2)c(F)cc2c(=O)c(C(=O)...` |
| 5 | ZINC000001612996 | 43.0 | `CCc1c2c(nc3ccc(OC(=O)N4CCC([NH+]5CCCCC5)CC4)cc...` |
| 6 | ZINC000000001673 | 19.0 | `[NH3+][C@@H](Cc1ccc(N(CCCl)CCCl)cc1)C(=O)[O-]` |
| 7 | ZINC000000896546 | 9.0 | `Nc1nc(=O)[nH]cc1F` |
| 8 | ZINC000051133897 | 23.0 | `CN1C(C(=O)Nc2ccccn2)=C([O-])c2ccccc2S1(=O)=O` |
| 9 | ZINC000004658290 | 10.0 | `S=c1[nH]cnc2nc[nH]c12` |



    ## Draw the FDA Approved Molecules as a 2D grid

    no_none = [mol for mol in database if mol]  # None entries can't be drawn, so keep only valid molecules
    for mol in no_none:
        Chem.SanitizeMol(mol)
    Draw.MolsToGridImage(no_none[:14], molsPerRow=7, subImgSize=(150,150), legends=[mol.GetProp('_Name') for mol in no_none[:14]], maxMols=100)

In [None]:
## Insert the above code, here
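
The grid above shows only the first 14 molecules. To page through the rest of `no_none` in groups of the same size, one option is a small chunking helper. This is a sketch: `chunked` is a hypothetical name, and the placeholder list stands in for the RDKit molecules so that the example runs on its own.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Stand-in for no_none: 30 placeholder names instead of RDKit molecules
placeholder_mols = [f"mol_{i}" for i in range(30)]
pages = list(chunked(placeholder_mols, 14))
print([len(page) for page in pages])  # [14, 14, 2]
```

In the notebook, each page from `chunked(no_none, 14)` could be passed to `Draw.MolsToGridImage` with the same `molsPerRow` and `subImgSize` settings used above.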

    ## Draw the FDA Approved Molecules in 3D (entries 1 through 9 of no_none)
    for mol in no_none[1:10]:
        Draw.IPythonConsole.drawMol3D(mol)


In [1]:
## Insert the above code, here

