<a href="https://colab.research.google.com/github/DIFACQUIM/Cursos/blob/main/4_4_Molecular_databases_ZINC.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **ZINC**

---
Made by: Brayan Raziel Cedillo González and Karen Pelcastre

Contact: brayanraziel1997@gmail.com

**Last Update:** March 2025



# Contents
---

>[ZINC](#scrollTo=q0HLLI4xnaam)

>[Contents](#scrollTo=RsoKYOCQ4iSG)

>[Introduction](#scrollTo=xNVoF8Lw3dEd)

>[1. Packages: installation and import](#scrollTo=uRGaeOxOnfou)

>[2. Trials](#scrollTo=zIoc7k6onk2N)

>[3. Exercise.](#scrollTo=BqC377MqnqK5)

>[For more information:](#scrollTo=h914yjpln1XF)



# Introduction
---

ZINC is an open-access molecular database provided by the Irwin and Shoichet laboratories in the Department of Pharmaceutical Chemistry at the University of California, San Francisco (UCSF), and funded by the NIGMS (GM71896). Most of the compounds it contains are commercially available and are used in virtual screening. It holds more than 230 million compounds with three-dimensional structures and more than 750 million commercially available compounds, among which analogues can be searched in less than a minute.

# *1. Packages: installation and import*
---

In [None]:
from IPython.utils import io
import tqdm.notebook
import os, sys, random, subprocess
total = 100
with tqdm.notebook.tqdm(total=total) as pbar:
    with io.capture_output() as captured:
        from platform import python_version
        pbar.update(20)
        # Plotting libraries
        !pip install matplotlib
        import matplotlib.pyplot as plt
        import matplotlib.font_manager as font_manager
        %matplotlib inline
        !pip install seaborn
        import seaborn as sns
        pbar.update(30)
        # System libraries and primary tools
        import os.path
        os.getcwd()
        !pip install pandas
        import pandas as pd
        # Connect to ZINC20 through molbloom
        !pip install molbloom
        import molbloom
        from molbloom import buy
        from molbloom import BloomFilter
        pbar.update(30)
        from tqdm.auto import tqdm
        pbar.update(10)
        # Setup finished: all packages are installed and imported
        pbar.update(10)


In [None]:
# See the available catalogs
molbloom.catalogs()


{'zinc20': 'All ZINC20 (1,006,651,037 mols) from Oct 2021. FPR of 0.003. Requires download',
 'zinc-instock': 'ZINC20 instock (9,227,726 mols). FPR of 0.0003. Requires download',
 'zinc-instock-mini': 'ZINC20 instock (9,227,726 mols). FPR of 0.07. Included in package',
 'surechembl': 'SureChEMBL (22,843,364 mols). FPR of 0.000025. Requires download'}
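
The FPR values listed above are false-positive rates: molbloom stores each catalog as a Bloom filter, which may report a molecule as present when it is not, but does not miss a molecule that was added. The sketch below is a toy pure-Python Bloom filter meant only to illustrate that behaviour. The class name `ToyBloomFilter`, the bit-array size and the hashing scheme are assumptions made for illustration and are not molbloom's implementation.

```python
import hashlib


class ToyBloomFilter:
    """Illustrative Bloom filter (hypothetical, not molbloom's implementation)."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = [False] * n_bits

    def _positions(self, item):
        # Derive n_hashes bit positions by salting a SHA-256 hash with an index.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.n_bits

    def add(self, item):
        # Set every bit position associated with the item.
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        # All positions set   -> "possibly present" (may be a false positive).
        # Any position unset  -> "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


toy = ToyBloomFilter()
for smiles in ["CCCO", "CCO", "c1ccccc1"]:
    toy.add(smiles)

print("CCCO" in toy)  # an item that was added is always reported as present
```

In a filter like this, fewer bits per stored item means more collisions and therefore more false positives, which is consistent with the packaged `zinc-instock-mini` catalog listing a higher FPR than the downloadable `zinc-instock` one.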

# *2. Trials*

In [None]:
# Check whether a single SMILES is present in the default catalog (zinc-instock)
buy('CCCO')
# True

Starting zinc-instock download to cache directory /root/.cache/molbloom
Downloading filter... 100%


True

# *3. Exercise.*
Import the SMILES file "SMILES_trial.xlsx".
This file has a list of SMILES where only the first of them is available in ZINC.


In [None]:
id = pd.read_csv('https://raw.githubusercontent.com/DIFACQUIM/Cursos/main/Datasets/Tabla%20para%20disponibilidad%20ZINC.csv')  # Read the CSV file with the SMILES list
print(f"The dataframe has the following rows and columns: {id.shape}")
id.head(10)  # Show the first 10 rows

The dataframe has the following rows and columns: (605, 1)


Unnamed: 0,SMILES
0,COc1ccc(-c2oc3cc(O)c(OC)c(O)c3c(=O)c2OC)cc1
1,C=C1C(=O)O[C@H]2[C@H]1CC[C@]1(C)[C@@H]2C(C)=CC...
2,COC1=CC(=O)[C@@H]2O[C@]2(C)[C@H]1O
3,COC1=CC(=O)C(O)=C(C)C1=O
4,COC1=C(O)C(=O)C(O)=C(C)C1=O
5,COC1=CC(=O)[C@H](Cl)[C@@](C)(O)[C@H]1O
6,CNC1=CC(=O)C(O)=C(C)C1=O
7,COC1=CC(=O)[C@H](Nc2ccccc2)[C@@](C)(O)[C@H]1O
8,CCCCc1ccc(NC2=C(C)C(=O)C(OC)=CC2=O)cc1
9,COc1cc(-c2cc(=O)c3c(O)c(O)c(OC)cc3o2)ccc1Oc1cc...


In [None]:
# Helper that checks each SMILES against one of the catalogs distributed by the molbloom project: https://pypi.org/project/molbloom/2.0.0/
def get_availability(table, column, catalog):
    """Return the SMILES in table[column] that molbloom reports as present in `catalog`."""
    num = 0  # Number of SMILES found in the catalog
    df = pd.DataFrame()
    for smiles in table[column]:
        # 'vendors' stores a boolean (found / not found), not a list of sellers
        result = buy(smiles, catalog)
        df.loc[smiles, 'vendors'] = result
        if result:
            num += 1
    print(f"Available compounds: {num}, from: {catalog}")
    df['catalog'] = catalog
    # Move the SMILES from the index into a regular column
    df = df.reset_index().rename(columns={'index': 'SMILES'})
    df = df[df['vendors'] == True]  # Comment out this line to also keep the compounds that were not found
    return df
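
As a possible variation on `get_availability`, the sketch below collects the results in a list and builds the DataFrame once, instead of growing it row by row. The function name `availability_table` and the `lookup` parameter are hypothetical. `lookup` is assumed to be any callable with the same `(smiles, catalog)` call shape used for `buy` above, so the example runs with a stand-in function and without downloading a filter.

```python
import pandas as pd


def availability_table(smiles_list, lookup, catalog):
    """Return one row per unique SMILES with a boolean 'found' column.

    `lookup` is assumed to behave like `buy(smiles, catalog)`.
    """
    # Drop duplicated SMILES while keeping the original order.
    unique_smiles = list(dict.fromkeys(smiles_list))
    found = [bool(lookup(s, catalog)) for s in unique_smiles]
    return pd.DataFrame({"SMILES": unique_smiles, "found": found, "catalog": catalog})


# Stand-in for `buy`, used only so the example runs offline (hypothetical data).
def fake_lookup(smiles, catalog):
    return smiles in {"CCCO", "CCO"}


demo = availability_table(["CCCO", "CCO", "CCCO", "ONN1CCCC1"], fake_lookup, "demo-catalog")
print(demo)
print("Found:", int(demo["found"].sum()))
```

With molbloom installed, `buy` could be passed as `lookup`. Keeping the not-found rows and filtering afterwards with `demo[demo["found"]]` gives the same kind of table as the original function.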

In [None]:
a=get_availability(id, column='SMILES', catalog='zinc20')
b=get_availability(id, column='SMILES', catalog='zinc-instock')
c=get_availability(id, column='SMILES', catalog='zinc-instock-mini')
d=get_availability(id, column='SMILES', catalog='surechembl')

Starting zinc20 download to cache directory /root/.cache/molbloom
Downloading filter... 100%
Available compounds: 5, from: zinc20
Available compounds: 154, from: zinc-instock
Available compounds: 183, from: zinc-instock-mini
Starting surechembl download to cache directory /root/.cache/molbloom
Downloading filter... 100%
Available compounds: 142, from: surechembl
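
The gap between the `zinc-instock` and `zinc-instock-mini` counts is roughly what the filters' false-positive rates would suggest. The back-of-the-envelope estimate below uses the FPR reported by `molbloom.catalogs()` and the counts printed in this notebook. It assumes, as a simplification, that every `zinc-instock` hit is a true hit and that each absent compound has the same independent chance of being a false positive.

```python
n_queries = 605      # rows in the input table
hits_instock = 154   # hits reported by zinc-instock
hits_mini = 183      # hits reported by zinc-instock-mini
fpr_mini = 0.07      # FPR of zinc-instock-mini, from molbloom.catalogs()

# Assumption: every zinc-instock hit is a true hit, so the rest are absent.
n_absent = n_queries - hits_instock
expected_false_positives = fpr_mini * n_absent
observed_extra_hits = hits_mini - hits_instock

print(f"Compounds assumed absent:  {n_absent}")
print(f"Expected false positives:  {expected_false_positives:.1f}")
print(f"Observed extra hits:       {observed_extra_hits}")
```

Under these assumptions the estimate is about 32 false positives, close to the 29 extra hits observed, so hits that appear only in `zinc-instock-mini` are best confirmed against the full `zinc-instock` filter.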


In [None]:
# Merge the four result tables on SMILES; the suffixes A-D follow the order of the queries above
df1 = pd.merge(a, b, on='SMILES', how='outer').rename(
    columns={'vendors_x': 'Vendor_A', 'catalog_x': 'Catalog_A',
             'vendors_y': 'Vendor_B', 'catalog_y': 'Catalog_B'})
df2 = pd.merge(c, d, on='SMILES', how='outer').rename(
    columns={'vendors_x': 'Vendor_C', 'catalog_x': 'Catalog_C',
             'vendors_y': 'Vendor_D', 'catalog_y': 'Catalog_D'})
global_df = pd.merge(df1, df2, on='SMILES', how='outer')
print(global_df.shape)
global_df

(217, 9)


Unnamed: 0,SMILES,Vendor_A,Catalog_A,Vendor_B,Catalog_B,Vendor_C,Catalog_C,Vendor_D,Catalog_D
0,C(C)C1C2N3CC(C1)CC2c1[nH]c2c(c1CC3)cccc2,,,,,True,zinc-instock-mini,,
1,C/C=C(/C)C(=O)OCC(O)(COC(C)=O)c1ccc(C)cc1O,,,,,True,zinc-instock-mini,,
2,C/C=C(/C)C(=O)O[C@@H]1Cc2c(ccc3ccc(=O)oc23)OC1...,,,True,zinc-instock,True,zinc-instock-mini,True,surechembl
3,C/C=C(/C)C(=O)O[C@@H]1[C@H](O)c2c(ccc3ccc(=O)o...,,,True,zinc-instock,True,zinc-instock-mini,,
4,C/C=C(/C)C(=O)O[C@H]1c2c(C)coc2C(=O)C2=CCC[C@H...,,,,,True,zinc-instock-mini,,
...,...,...,...,...,...,...,...,...,...
212,Oc1cc(O)cc(O)c1,,,True,zinc-instock,True,zinc-instock-mini,True,surechembl
213,Oc1cc2c(cc1O)[C@@H]1c3ccc(O)c(O)c3OC[C@]1(O)C2,,,True,zinc-instock,True,zinc-instock-mini,True,surechembl
214,Oc1ccc(CCc2cc(O)cc(O)c2)cc1,,,True,zinc-instock,True,zinc-instock-mini,True,surechembl
215,Oc1ccc2c(c1)OC[C@]1(O)Cc3cc(O)c(O)cc3[C@H]21,,,True,zinc-instock,True,zinc-instock-mini,True,surechembl
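
The pairwise merges above can also be written as one long-to-wide reshape, which extends to any number of catalogs without renaming suffixes. The sketch below uses small hypothetical tables in place of `a` and `b` (same `SMILES`, `vendors` and `catalog` columns) and assumes, as in `get_availability`, that each table lists only the compounds found in its catalog.

```python
import pandas as pd

# Hypothetical stand-ins for the per-catalog tables returned by get_availability.
a = pd.DataFrame({"SMILES": ["CCO"], "vendors": [True], "catalog": "zinc20"})
b = pd.DataFrame({"SMILES": ["CCO", "CCCO"], "vendors": [True, True], "catalog": "zinc-instock"})

# Stack the tables: one row per (SMILES, catalog) hit.
long_df = pd.concat([a, b], ignore_index=True)

# Count hits per SMILES and catalog, then turn the counts into booleans.
wide_df = pd.crosstab(long_df["SMILES"], long_df["catalog"]).gt(0).reset_index()
wide_df.columns.name = None  # drop the leftover "catalog" axis label

print(wide_df)
```

Each catalog becomes a single boolean column, so a compound missing from a catalog shows `False` rather than `NaN`.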


---
# For more information:

* https://pypi.org/project/molbloom/2.0.0/
