Dr Oliviero Andreussi, olivieroandreuss@boisestate.edu

Boise State University, Department of Chemistry and Biochemistry

# Python Tools for Chemistry Applications

Before we start, let us import the main modules that we will need for this lecture. These modules were introduced in the previous lecture. We will also introduce some new modules below, and describe each of them in more detail in the relevant section.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

You should now specify the local path to the folder containing your data files. Remember to put a '/' at the end of the path, and double-check that the path looks right.

In [None]:
base_path = '/content/gdrive/MyDrive/' # this is the default path of your google drive
my_path = 'Colab Notebooks/Test_Files/' # make sure you change this to the correct path of the folder with the files
path = base_path + my_path
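One way to double-check the path is to let Python verify it. The following is only a minimal sketch: the helper name `check_data_folder` is hypothetical (it is not part of Colab or of any module used here), and it simply enforces the trailing '/' and lists the files found in the folder.

```python
from pathlib import Path

def check_data_folder(folder):
    """Return the names of the files in `folder`, or raise if the path looks wrong."""
    if not folder.endswith('/'):
        raise ValueError(f"Path should end with '/': {folder!r}")
    directory = Path(folder)
    if not directory.is_dir():
        raise FileNotFoundError(f"Folder not found: {folder!r}")
    # Sorted list of regular files, so the output is easy to scan
    return sorted(p.name for p in directory.iterdir() if p.is_file())

# In the notebook, after mounting the drive:
# print(check_data_folder(path))
```

If the data files you expect do not appear in the list, the value of `my_path` most likely needs to be corrected.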

## Write Chemical Formulas and Equations in Markdown

You can use a special extension of the formula editor used by Colab in Text cells to write chemical formulas and chemical equations in a nice way. You have to make sure that `$$\require{mhchem}$$` is included for Colab to render the formulas correctly. More information on how to use it can be found in the [mhchem documentation](https://mhchem.github.io/MathJax-mhchem/).

$$\require{mhchem}$$
I really like $\ce{H2O}$, although $\ce{CH3CH2OH}$ is not bad either. 

Is this correct?

$$\ce{H2(g) + O2(g)->H2O(aq)}$$
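A few more mhchem constructs may be useful when writing equations. The lines below are meant as a small sketch of what to type in a Text cell (charges, an equilibrium arrow, and a precipitate marker); the reactions themselves are chosen only as illustrations and are not taken from this lecture.

```latex
$$\require{mhchem}$$
Ions: $\ce{NH4+}$ and $\ce{SO4^2-}$

Equilibrium arrow: $$\ce{N2(g) + 3H2(g) <=> 2NH3(g)}$$

Precipitate (note the space before the v): $$\ce{SO4^2- + Ba^2+ -> BaSO4 v}$$
```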

## Display Molecular Structures

RDKit is a great module developed by Greg Landrum and collaborators to manage molecules and carry out cheminformatics and machine-learning tasks in chemical space. It is not part of the standard Python installation on Colab, so you will need to install it before using it. Luckily, this process is pretty straightforward, although you may have to wait a bit.

In [None]:
!pip install rdkit
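from rdkit import Chem
from rdkit.Chem import Draw, rdChemReactions  # make Chem.Draw and Chem.rdChemReactions available explicitly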

The main way we will handle molecules is via SMILES (Simplified Molecular-Input Line-Entry System), a convention for writing organic molecules as text strings (some rules and examples can be found [here](https://www.daylight.com/dayhtml/doc/theory/theory.smiles.html)).

In [None]:
smile = 'CCO'
molecule = Chem.MolFromSmiles(smile)
molecule

In [None]:
type(molecule)

The object generated by the command is an `rdkit.Chem.rdchem.Mol`, an RDKit object that supports many operations (you can compute descriptors, modify parts of the molecule, query information about it, etc.). In a notebook, typing the name of the object on the last line of a cell displays its chemical structure. RDKit also has a dedicated function that converts a molecule into an image and lets you specify the image size.

In [None]:
smile = 'CC(O)'
Chem.Draw.MolToImage( Chem.MolFromSmiles(smile), size=(150, 150) )

These images can be embedded into a plot, if you would like to visualize which molecule was measured in your experiments.

In [None]:
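file = 'bomb.csv'
# names= supplies the column names, so the first row of the file is read as data
data = pd.read_csv(path + file, names=['HH:MM:SS', 'Temperature', 'Time'])
plt.plot(data['Time'], data['Temperature'], label='Run 1')
plt.xlabel('Time (s)')
plt.ylabel(r'Temperature ($^\circ$C)')  # raw string, so '\c' is not read as an escape sequence
plt.legend()
# Add a second set of axes inside the figure: [left, bottom, width, height] in figure fractions
ax = plt.axes([0.5, 0.2, 0.38, 0.38])
smile = 'O=C(O)c1ccccc1'
im = Chem.Draw.MolToImage(Chem.MolFromSmiles(smile), size=(150, 150))
ax.imshow(im)
ax.axis('off')  # hide the ticks and frame around the molecule
plt.show()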

With a similar function, we can convert a reaction SMILES, written as `reactants>agents>products`, into a chemical reaction.

In [None]:
Chem.rdChemReactions.ReactionFromSmarts('C=CCBr.[Na+].[I-]>CC(=O)C>C=CCI.[Na+].[Br-]', useSmiles=True)
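To make the layout of the reaction string more explicit, here is a minimal sketch that takes it apart with plain string operations. The helper `split_reaction_smiles` is hypothetical and does no chemistry at all: it only relies on '>' separating reactants, agents, and products, and on '.' separating the individual species.

```python
def split_reaction_smiles(reaction):
    """Split 'reactants>agents>products' into three lists of SMILES strings."""
    parts = reaction.split('>')
    if len(parts) != 3:
        raise ValueError("Expected exactly two '>' separators")
    # Within each part, '.' separates the individual species; an empty part gives []
    return tuple(part.split('.') if part else [] for part in parts)

reactants, agents, products = split_reaction_smiles(
    'C=CCBr.[Na+].[I-]>CC(=O)C>C=CCI.[Na+].[Br-]')
print(reactants)  # ['C=CCBr', '[Na+]', '[I-]']
print(agents)     # ['CC(=O)C']
print(products)   # ['C=CCI', '[Na+]', '[Br-]']
```

In this example the middle part, `CC(=O)C` (acetone), is not consumed in the reaction, which is why it sits between the two '>' symbols rather than among the reactants.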

More information on the syntax of RDKit functions can be found in its online documentation, [here](https://rdkit.org/docs/index.html).
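As a brief illustration of the kind of information a `Mol` object can provide, the sketch below wraps a few RDKit calls in a hypothetical helper, `describe_smiles` (the name and the returned dictionary are choices made for this example). It also shows a practical point: `Chem.MolFromSmiles` returns `None` when it cannot parse the string, so it is worth checking the result before using it.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors

def describe_smiles(smiles):
    """Return basic information on a molecule, or None if the SMILES is invalid."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit signals a parsing failure by returning None
        return None
    return {
        'canonical_smiles': Chem.MolToSmiles(mol),
        'heavy_atoms': mol.GetNumAtoms(),  # hydrogens are implicit by default
        'molecular_weight': Descriptors.MolWt(mol),
    }

print(describe_smiles('OCC'))   # ethanol, written starting from the oxygen
print(describe_smiles('C1CC'))  # unclosed ring: RDKit logs an error, result is None
```

Writing ethanol as `OCC` or `CCO` gives the same canonical SMILES, which is useful when comparing molecules that come from different sources.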

## Connect Names-CAS-SMILES with CIRpy

CIRpy is a Python interface to the Chemical Identifier Resolver (CIR), a web service developed by the CADD Group at the NCI/NIH that resolves a chemical identifier into another chemical representation. More documentation can be found [here](https://cirpy.readthedocs.io/en/latest/index.html). As with RDKit, we need to install the module on Colab before using it.

In [None]:
!pip install cirpy
import cirpy
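Because CIR is a web service, every lookup is an HTTP request, so an internet connection is needed and answers may take a moment. The sketch below builds the kind of address that is queried, assuming the URL pattern described in the CIR documentation (`.../chemical/structure/<identifier>/<representation>`). The helper `cir_url` is hypothetical and only constructs the string; it does not contact the server.

```python
from urllib.parse import quote

CIR_BASE_URL = 'https://cactus.nci.nih.gov/chemical/structure'

def cir_url(identifier, representation):
    """Build the CIR web-service URL for a given identifier and output representation."""
    # Identifiers may contain characters such as '=', '(' or spaces, so they are percent-encoded
    return f'{CIR_BASE_URL}/{quote(identifier, safe="")}/{representation}'

print(cir_url('aspirin', 'smiles'))
print(cir_url('O=C(O)c1ccccc1', 'cas'))
```

Pasting the first address into a browser should show the same answer that CIRpy returns, which can be handy when debugging a lookup that fails.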

One of the main utilities of CIRpy is the `resolve()` function. It takes as input any of the common ways of referring to a molecule (its official name, common name, CAS number, SMILES, etc.) and converts it into whatever other representation of that molecule we need. Some examples follow.

Given that we know how to visualize structures from SMILES with RDKit, we could use CIRpy to give us the SMILES of a molecule for which we know the common name,

In [None]:
smile=cirpy.resolve('paracetamol','smiles')
Chem.MolFromSmiles(smile)

we could ask for the CAS number of a molecule

In [None]:
cirpy.resolve('O=C(O)c1ccccc1','cas')

or the IUPAC name

In [None]:
cirpy.resolve('O=C(O)c1ccccc1','iupac_name') # try specifying 'names' instead

or convert a CAS number into a SMILES

In [None]:
smile=cirpy.resolve('61-73-4','smiles')
Chem.MolFromSmiles(smile)
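CAS numbers are easy to mistype, but they carry a built-in check digit: the last digit equals the weighted sum of the other digits (weights 1, 2, 3, ... counted from the right) modulo 10. The following sketch uses a hypothetical helper, `is_valid_cas`, to test a number before sending it to the resolver. It checks only the format and the check digit, not whether the number is actually assigned to a substance.

```python
import re

def is_valid_cas(cas):
    """Check the format and the check digit of a CAS Registry Number such as '61-73-4'."""
    match = re.fullmatch(r'(\d{2,7})-(\d{2})-(\d)', cas)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one, then take the sum modulo 10
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check_digit

print(is_valid_cas('61-73-4'))  # True:  3*1 + 7*2 + 1*3 + 6*4 = 44, and 44 % 10 = 4
print(is_valid_cas('61-73-5'))  # False: the check digit does not match
```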

CIRpy can also be used to access basic properties of a molecule, such as its chemical formula

In [None]:
print(cirpy.resolve('aspirin','formula'))

or the coordinates of its atoms (in many formats useful for different simulation programs)

In [None]:
print(cirpy.resolve('aspirin','pdb'))
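If you want to work with the coordinates numerically, the PDB text has to be parsed. PDB is a fixed-column format, so the sketch below slices each `ATOM`/`HETATM` line at the standard column positions instead of splitting on spaces. The helper `parse_pdb_coordinates` is hypothetical, and it assumes the text returned by the resolver follows the standard PDB column layout.

```python
def parse_pdb_coordinates(pdb_text):
    """Return a list of (atom_name, x, y, z) tuples from ATOM/HETATM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(('ATOM', 'HETATM')):
            name = line[12:16].strip()  # columns 13-16: atom name
            x = float(line[30:38])      # columns 31-38: x coordinate in angstrom
            y = float(line[38:46])      # columns 39-46: y coordinate
            z = float(line[46:54])      # columns 47-54: z coordinate
            atoms.append((name, x, y, z))
    return atoms

# In the notebook (requires a network connection):
# atoms = parse_pdb_coordinates(cirpy.resolve('aspirin', 'pdb'))
# print(len(atoms), atoms[0])
```

The resulting list can be passed to `pd.DataFrame(atoms, columns=['atom', 'x', 'y', 'z'])` if a table is more convenient.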

Like RDKit, CIRpy has its own internal representation of a molecule (in this case, the `cirpy.Molecule` object). We can create a molecule from any of its names, its CAS number, or its SMILES.

In [None]:
molecule=cirpy.Molecule('paracetamol')

Once we have a molecule, we can get a link to its 3D representation

In [None]:
molecule.twirl_url

or get its molecular weight

In [None]:
molecule.mw
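A value like this can be cross-checked by hand from the chemical formula. The sketch below does so for simple formulas such as `C9H8O4`, using a small table of standard atomic weights; the helper `molecular_weight` and the table are written for this example only, and they do not handle parentheses, charges, or elements missing from the table.

```python
import re

# Standard atomic weights (g/mol) for a few common elements; extend as needed
ATOMIC_WEIGHTS = {'H': 1.008, 'C': 12.011, 'N': 14.007, 'O': 15.999, 'S': 32.06}

def molecular_weight(formula):
    """Compute the molecular weight of a simple formula such as 'C9H8O4'."""
    total = 0.0
    # Each match is an element symbol followed by an optional count (missing count means 1)
    for element, count in re.findall(r'([A-Z][a-z]?)(\d*)', formula):
        total += ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
    return total

print(round(molecular_weight('C9H8O4'), 2))   # aspirin: 180.16
print(round(molecular_weight('C8H9NO2'), 2))  # paracetamol
```

Small differences in the last decimal place with respect to other sources are expected, since they depend on the atomic weights adopted.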

In [None]:
# This cell is used to allow Google Colab to install the tools to convert the notebook to a pdf file
# Un-comment the following lines when you are ready to export the pdf 
#!apt-get install texlive texlive-xetex texlive-latex-extra pandoc
#!pip install pypandoc

In [None]:
# Use this command to convert the finished worksheet into a pdf 
# NOTE : you may want to change the path of the file, if you are working in a different folder of the Google Drive
#!jupyter nbconvert --no-input --to PDF "/content/gdrive/MyDrive/Colab Notebooks/Chemistry_Tools.ipynb"