# Prerequisites
- Importing libraries
- Molecule objects in RDKit
- SMILES molecular line notation
- Molecular file formats (MDL Mol)

# Learning outcomes
- Become familiar with the use of RDKit to draw 2D molecular structures
- Use both SMILES strings and molecule files as input to create RDKit molecule objects

# Drawing molecules with RDKit

RDKit is a free, open-source library of cheminformatics tools. An overview of the RDKit package, with links to its full documentation, is available at https://www.rdkit.org/docs/Overview.html


The RDKit Python library makes drawing molecules simple. It requires two steps:


1.   Create a molecule object that RDKit can operate on
2.   Use RDKit's `Draw` module (`rdkit.Chem.Draw`) to create the 2D image corresponding to the molecule object

To do this, we must first tell Python to import the required functionality from the RDKit package. Then we are free to build our molecule and draw it.


# SMILES
Here we specify the molecular structure using the Simplified Molecular Input Line Entry System (SMILES) notation.

Inspect, then run the following code to draw the benzene molecule.

In [None]:
from rdkit import Chem
from rdkit.Chem import Draw
m = Chem.MolFromSmiles('c1ccccc1')
Draw.MolToImage(m)

Here, we create a molecule object using the `MolFromSmiles()` function. RDKit parses the SMILES string and builds a molecule object that records the atoms and the bonds between them; the 2D layout is only worked out when the molecule is drawn. The object is stored in the variable `m`.

Once we have the `m` object, we can perform many different operations on it with RDKit, but here we focus on drawing the molecular structure. To do this we use the second import statement to give us access to RDKit's drawing functions.
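
As a brief illustration of the "other operations" mentioned above, the sketch below queries the benzene molecule object for a few basic properties. It also checks for a failed parse, since `MolFromSmiles()` returns `None` instead of raising an error when it cannot read a SMILES string. The error message is our own wording, not something RDKit produces.

```python
from rdkit import Chem

m = Chem.MolFromSmiles('c1ccccc1')

# MolFromSmiles returns None (it does not raise) when parsing fails,
# so it is worth checking before using the object.
if m is None:
    raise ValueError('RDKit could not parse the SMILES string')

print(m.GetNumAtoms())      # counts heavy atoms; hydrogens are implicit
print(m.GetNumBonds())      # number of bonds between those atoms
print(Chem.MolToSmiles(m))  # write the molecule back out as SMILES
```

If a later drawing step fails with an error mentioning `None`, a mistyped SMILES string is the most likely cause.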

The function used in this example, `Draw.MolToImage()`, returns an image of the molecular structure, which the notebook displays because it is the last line of the cell. You can also save the image to a file with the `Draw.MolToFile()` function, for later insertion into, for example, a Word document.
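
A minimal sketch of saving the image with `Draw.MolToFile()` follows. The filename `benzene.png` and the image size are arbitrary choices made for this illustration; the file is written to the current working directory.

```python
import os

from rdkit import Chem
from rdkit.Chem import Draw

m = Chem.MolFromSmiles('c1ccccc1')

# 'benzene.png' is a hypothetical filename chosen for this example
out_path = 'benzene.png'
Draw.MolToFile(m, out_path, size=(300, 300))

# Confirm that a non-empty file was written
print(os.path.getsize(out_path) > 0)
```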

By editing the SMILES string, we can alter the structure contained in the RDKit molecule object, and therefore the output image.

Let's create toluene - you can do this by simply adding another carbon to the SMILES string.

Note, however, that we are adding a capital C. In SMILES notation, an uppercase "C" denotes an aliphatic (non-aromatic) carbon and a lowercase "c" denotes an aromatic carbon. Here the added methyl carbon is sp$^3$, while the ring carbons are aromatic sp$^2$.

In [None]:
from rdkit import Chem
from rdkit.Chem import Draw
m = Chem.MolFromSmiles('c1ccccc1C')
Draw.MolToImage(m)
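
To see the uppercase/lowercase distinction reflected in the molecule object, the following sketch loops over the atoms of toluene and prints whether RDKit has flagged each one as aromatic, along with its hybridization. Atom indices follow the order in which the atoms appear in the SMILES string, so the methyl carbon is the last one listed.

```python
from rdkit import Chem

m = Chem.MolFromSmiles('c1ccccc1C')

# One line per atom: index, element symbol, aromatic flag, hybridization
for atom in m.GetAtoms():
    print(atom.GetIdx(), atom.GetSymbol(),
          atom.GetIsAromatic(), atom.GetHybridization())

# The six ring carbons are aromatic; the methyl carbon is not
n_aromatic = sum(atom.GetIsAromatic() for atom in m.GetAtoms())
print(n_aromatic)
```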

Similarly, we can change our benzene example to give us pyridine by replacing a ring carbon with nitrogen. We use a lowercase "n" because the pyridine nitrogen is also sp$^2$/aromatic.

In [None]:
from rdkit import Chem
from rdkit.Chem import Draw
m = Chem.MolFromSmiles('n1ccccc1')
Draw.MolToImage(m)
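
When comparing several related structures, it can be convenient to draw them side by side. The sketch below does this for the three molecules built so far using `Draw.MolsToGridImage()`. The dictionary name `smiles`, the panel size, and the legends are our own choices for illustration.

```python
from rdkit import Chem
from rdkit.Chem import Draw

# Names and SMILES strings taken from the examples above
smiles = {
    'benzene': 'c1ccccc1',
    'toluene': 'c1ccccc1C',
    'pyridine': 'n1ccccc1',
}
mols = [Chem.MolFromSmiles(s) for s in smiles.values()]

# One panel per molecule, labelled with its name
img = Draw.MolsToGridImage(mols, molsPerRow=3, subImgSize=(200, 200),
                           legends=list(smiles.keys()))
img
```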

For simple molecules, editing the SMILES string by hand is fairly straightforward. This is not really the case for "interesting" molecules. In such cases, one alternative is to use an online source to obtain the SMILES for your molecule.

The PubChem service (https://pubchem.ncbi.nlm.nih.gov/) is particularly useful for this. In the following example, PubChem was searched for "aspirin". The entry for this compound contains the corresponding SMILES string, which was used here.

NOTE: This SMILES string is written in Kekulé form, which shows the ring double bonds explicitly (denoted "=") and writes every atom in uppercase, so the string itself does not mark any atoms as aromatic. RDKit identifies the aromatic ring when it reads the string. You can read more about SMILES grammar at http://opensmiles.org/opensmiles.html

In [None]:
from rdkit import Chem
from rdkit.Chem import Draw
m = Chem.MolFromSmiles('CC(=O)OC1=CC=CC=C1C(=O)O')
Draw.MolToImage(m)
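
As a quick check that the two ways of writing an aromatic ring describe the same molecule, the sketch below reads benzene in both forms and compares the canonical SMILES that RDKit writes back out. The variable names `kekule` and `aromatic` are ours.

```python
from rdkit import Chem

kekule = Chem.MolFromSmiles('C1=CC=CC=C1')   # explicit double bonds
aromatic = Chem.MolFromSmiles('c1ccccc1')    # lowercase aromatic atoms

# Both inputs yield the same canonical SMILES string
same = Chem.MolToSmiles(kekule) == Chem.MolToSmiles(aromatic)
print(same)

# RDKit perceives aromaticity when parsing the Kekulé form
print(all(atom.GetIsAromatic() for atom in kekule.GetAtoms()))
```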

# Molecular structure files

An alternative to SMILES input is to read an external file that describes the elemental makeup of the molecule together with its geometry and bonding.

RDKit can read several molecule file formats, but the next example uses a file in the "MOL" format.

In [None]:
from rdkit import Chem
from rdkit.Chem import Draw
m = Chem.MolFromMolFile('Asprin.mol')
Draw.MolToImage(m)

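NOTE: The orientation of the aspirin molecule is different from the SMILES example. A .mol file stores coordinates for each atom, and RDKit draws the molecule using those coordinates when they are present, whereas for a SMILES string it has to generate a 2D layout itself. The order of the atoms in the .mol file also differs from that in the SMILES string.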
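
The sketch below illustrates one difference between the two input routes without needing a file on disk. It builds aspirin from SMILES, generates 2D coordinates, writes the molecule out as MOL-format text, and reads that text back in. The variable names are ours, and `AllChem.Compute2DCoords()` stands in for whatever program produced the coordinates in a real .mol file.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

from_smiles = Chem.MolFromSmiles('CC(=O)OC1=CC=CC=C1C(=O)O')
# A molecule parsed from SMILES carries no stored coordinates
print(from_smiles.GetNumConformers())

# Work on a copy, generate a 2D layout, and export as MOL-format text
with_coords = Chem.Mol(from_smiles)
AllChem.Compute2DCoords(with_coords)
mol_block = Chem.MolToMolBlock(with_coords)

# Reading the MOL text back gives a molecule that keeps its coordinates
from_block = Chem.MolFromMolBlock(mol_block)
print(from_block.GetNumConformers())
```

Printing `mol_block` shows the layout of the MOL format: a short header, a counts line, then one line per atom (with x, y, z coordinates) and one line per bond.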

# Practice
Search PubChem for the data on paracetamol. Copy the SMILES string and edit/run the following Python code to draw the paracetamol molecule.

Remember to put the SMILES string in quotes.

In [None]:
from rdkit import Chem
from rdkit.Chem import Draw
m = Chem.MolFromSmiles()  # paste the SMILES string, in quotes, inside the parentheses
Draw.MolToImage(m)

# TODO:
- Clarify different SMILES formats (or simply tidy them up to use a single format - probably better for introductory students)
- Add examples using structure files as input instead of SMILES for more complex molecules to show limitations of 2D depiction (then add 3D images?)