# Working with Molecules from ZINC and ChEMBL

Graphein provides utilities for retrieving molecules and metadata from [ZINC](https://zinc.docking.org/) and [ChEMBL](https://www.ebi.ac.uk/chembl/).

The ZINC database is a curated collection of commercially available chemical compounds prepared especially for virtual screening.


ChEMBL (also known as ChEMBLdb) is a manually curated chemical database of bioactive molecules with drug-like properties. It is maintained by the European Bioinformatics Institute (EBI), part of the European Molecular Biology Laboratory (EMBL).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/a-r-j/graphein/blob/master/notebooks/molecules_from_zinc_and_chembl.ipynb) [![GitHub](https://img.shields.io/badge/-View%20on%20GitHub-181717?logo=github&logoColor=ffffff)](https://github.com/a-r-j/graphein/blob/master/notebooks/molecules_from_zinc_and_chembl.ipynb)

In [None]:
# Install Graphein if necessary
# !pip install graphein[extras]

In [None]:
import graphein.molecule as gm

## ZINC
### Mapping between SMILES and ZINC IDs

In [None]:
#NBVAL_SKIP
# The backend argument selects which ZINC release to query
gm.get_smiles_from_zinc("ZINC01234567", backend="zinc15")
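
The cells in this notebook use both an 8-digit (`ZINC01234567`) and a 12-digit (`ZINC000001234567`) identifier form. The following is a minimal sketch, assuming the two forms differ only in the zero padding of the numeric part; that assumption should be checked against the ZINC release being queried. `normalise_zinc_id` is a hypothetical helper and is not part of Graphein.

```python
import re

# Matches "ZINC" followed by one or more digits (case-insensitive).
_ZINC_ID_PATTERN = re.compile(r"^ZINC(\d+)$", re.IGNORECASE)


def normalise_zinc_id(zinc_id: str, width: int = 12) -> str:
    """Hypothetical helper (not part of Graphein).

    Re-pads the numeric part of a ZINC-style ID to `width` digits.
    Assumes the short and long ID forms differ only in zero padding.
    """
    match = _ZINC_ID_PATTERN.match(zinc_id.strip())
    if match is None:
        raise ValueError(f"Not a ZINC-style identifier: {zinc_id!r}")
    number = int(match.group(1))
    return f"ZINC{number:0{width}d}"


print(normalise_zinc_id("ZINC01234567"))               # pad to 12 digits
print(normalise_zinc_id("ZINC000001234567", width=8))  # pad to 8 digits
```

Normalising IDs up front may help avoid failed lookups caused by mixing the two forms in one dataset.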

We can also map a SMILES string back to ZINC identifiers.

N.B. We may obtain multiple ZINC IDs. In the example below, we retrieve two ZINC IDs, one for each of the two enantiomers of the specified molecule.

In [None]:
#NBVAL_SKIP
gm.get_zinc_id_from_smile("C[C@H]1CCCCN1CC#CC(O)(c1ccccc1)c1ccccc1")
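
Because one SMILES string can map to several ZINC IDs, batch code usually has to keep a list of IDs per query rather than a single value. The sketch below illustrates one way to do that. `map_smiles_to_ids` and `fake_lookup` are hypothetical names, the IDs are placeholders rather than real ZINC entries, and the sketch assumes the lookup callable returns an iterable of ID strings.

```python
from typing import Callable, Dict, Iterable, List


def map_smiles_to_ids(
    smiles_list: Iterable[str],
    lookup: Callable[[str], Iterable[str]],
) -> Dict[str, List[str]]:
    """Hypothetical helper: collect every ID returned for each SMILES string.

    `lookup` is any callable taking a SMILES string and returning an
    iterable of ID strings (an assumption about the lookup's return type).
    """
    results: Dict[str, List[str]] = {}
    for smiles in smiles_list:
        # De-duplicate and sort so the output is stable between runs.
        results[smiles] = sorted(set(lookup(smiles)))
    return results


def fake_lookup(smiles: str) -> List[str]:
    """Offline stand-in for a remote lookup; placeholder data only."""
    table = {"SMILES_A": ["ID_2", "ID_1", "ID_1"], "SMILES_B": []}
    return table.get(smiles, [])


mapping = map_smiles_to_ids(["SMILES_A", "SMILES_B"], fake_lookup)
print(mapping)
```

If `gm.get_zinc_id_from_smile` does return an iterable of IDs, it could be passed as `lookup` in place of the stand-in.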

### Constructing Graphs
We can also retrieve molecular graphs directly from a ZINC ID.

In [None]:
#NBVAL_SKIP
g = gm.construct_graph(zinc_id="ZINC000001234567")
g.graph["rdmol"]

In [None]:
#NBVAL_SKIP
g = gm.construct_graph(zinc_id="ZINC000001234568")
g.graph["rdmol"]

In [None]:
#NBVAL_SKIP
# Here we retrieve the above molecule, generate a 3D conformer and visualise it
config = gm.MoleculeGraphConfig(edge_construction_functions=[gm.add_atom_bonds])

g = gm.construct_graph(zinc_id="ZINC000001234568", generate_conformer=True, config=config)
gm.plotly_molecular_graph(g)

## ChEMBL

Similarly, we can map between ChEMBL identifiers and SMILES strings. We can also add ChEMBL's rich metadata to the graph.

In [None]:
#NBVAL_SKIP
gm.get_smiles_from_chembl("CHEMBL1234")

In [None]:
#NBVAL_SKIP
gm.get_chembl_id_from_smiles("CC(=O)N(O)CCCCCNC(=O)CCC(=O)N(O)CCCCCNC(=O)CCC(=O)N(O)CCCCCN.CS(=O)(=O)O")

### Constructing Graphs from ChEMBL

In [None]:
#NBVAL_SKIP
g = gm.construct_graph(chembl_id="CHEMBL1234")
g.graph["rdmol"]

### Retrieving Metadata from ChEMBL

We can retrieve metadata from ChEMBL both functionally and via the graph construction config.

In [None]:
#NBVAL_SKIP
# Functionally

g = gm.add_chembl_metadata(g)
print(g.graph["chembl_metadata"])

In [None]:
# Via Config
config = gm.MoleculeGraphConfig(graph_metadata_functions=[gm.add_chembl_metadata])
g = gm.construct_graph(chembl_id="CHEMBL1234", config=config)
g.graph["chembl_metadata"]
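
The functional and config-based routes should give the same result if the config simply applies each listed function to the graph in turn. The toy sketch below illustrates that pattern with standard-library stand-ins. `ToyGraph`, `ToyConfig`, `add_toy_metadata` and `construct_toy_graph` are hypothetical names, and this is not Graphein's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToyGraph:
    """Stand-in for a graph object with a graph-level attribute dict."""
    graph: Dict[str, Any] = field(default_factory=dict)


MetadataFunction = Callable[[ToyGraph], ToyGraph]


@dataclass
class ToyConfig:
    """Stand-in config holding functions to run after construction."""
    graph_metadata_functions: List[MetadataFunction] = field(default_factory=list)


def add_toy_metadata(g: ToyGraph) -> ToyGraph:
    """Attach placeholder metadata under a graph-level key."""
    g.graph["toy_metadata"] = {"source": "placeholder"}
    return g


def construct_toy_graph(name: str, config: ToyConfig) -> ToyGraph:
    """Build a graph, then apply each metadata function in order."""
    g = ToyGraph(graph={"name": name})
    for func in config.graph_metadata_functions:
        g = func(g)
    return g


# Functional route: call the metadata function on an existing graph.
g_functional = add_toy_metadata(ToyGraph(graph={"name": "example"}))

# Config route: let construction apply the function.
toy_config = ToyConfig(graph_metadata_functions=[add_toy_metadata])
g_config = construct_toy_graph("example", toy_config)

print(g_functional.graph == g_config.graph)
```

The config route is convenient when the same annotations should be applied to many molecules, while the functional route suits annotating a graph that already exists.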