## Recipe: Using the OPSIN API in Python

- Author: [Stuart Chalk](https://orcid.org/0000-0002-0703-7776)
- Reviewer:
- Topic: How to use the OPSIN API to retrieve chemical identifiers based on IUPAC compound names
- Format: Interactive Jupyter Notebook (Python)
- Skills: You should be familiar with
    - [Application Programming Interfaces (APIs)](https://www.ibm.com/topics/api)
    - [IUPAC Naming of Organic Compounds](https://iupac.qmul.ac.uk/BlueBook/)
    - [Chemical Identifiers](https://chem.libretexts.org/Courses/University_of_Arkansas_Little_Rock/ChemInformatics_(2015)%3A_Chem_4399_5399/Text/5_Chemical_Identifiers)
- Learning outcomes: After completing this example you should understand:
    - How to write Python code to request data from a URL (typically an API)
    - How to use a Python variable to build an API request and download data for the value it holds
    - How to access an image file from the OPSIN image API
    - How to use regular expressions (regex) to extract data from strings
- Citation: 'Recipe: Using the OPSIN API in Python', The IUPAC FAIR Chemistry Cookbook, https://iupac.github.io/WFChemCookbook/recipes/opsin.html
- Reuse: This notebook is made available under the IUPAC FAIR Chemistry Cookbook MIT license.

> **_NOTE:_**  This is an interactive recipe! Run it in Binder or Colab (see rocket icon ![icons](../images/icons.png) above)

### Step 1: Import the Python packages needed to run this code

In [None]:
from IPython.display import Image, display  # functions to display images in a Jupyter notebook
import requests                             # package to get data from a URL
import json                                 # package to read/write/display JSON
import re                                   # package to use regular expression (regex) searching

### Step 2: Call the OPSIN data API and get metadata about a compound

In [None]:
# format of API request is 'https://opsin.ch.cam.ac.uk/opsin/<IUPAC compound name>'
path = "https://opsin.ch.cam.ac.uk/opsin/"  # URL path to the OPSIN API
name = "propan-2-one"                       # IUPAC name of a chemical compound, ion or element
reqdata = requests.get(path + name)         # send an HTTP GET request to the OPSIN server
jsondata = reqdata.json()                   # parse the JSON response into a Python dictionary
jsondata.pop('cml', None)                   # remove the (long) CML element, if present, for nicer display
print(json.dumps(jsondata, indent=4))       # print the JSON in a nice format
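
Step 2 builds the request URL by joining the path and the name directly. As a precaution for names that contain spaces or other characters with special meaning in URLs, the name can be percent-encoded first. The sketch below is one way to do this; `build_opsin_url` and `OPSIN_PATH` are hypothetical names introduced for illustration, not part of OPSIN or of this recipe's original code.

```python
from urllib.parse import quote

# Same base URL as the 'path' variable in Step 2
OPSIN_PATH = "https://opsin.ch.cam.ac.uk/opsin/"

def build_opsin_url(name, extension=""):
    """Build an OPSIN request URL for a compound name (hypothetical helper).

    quote() percent-encodes characters that are not safe in a URL path,
    for example a space becomes %20. The optional extension (such as
    ".png") is appended unchanged.
    """
    return OPSIN_PATH + quote(name) + extension

print(build_opsin_url("propan-2-one"))
print(build_opsin_url("acetic acid", ".png"))
```

The returned string can be passed to `requests.get()` in place of `path + name`.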

### Step 3: Call the OPSIN image API to see a picture of the molecule

In [None]:
# format of API request is 'https://opsin.ch.cam.ac.uk/opsin/<IUPAC compound name>.png'
reqimg = requests.get(path + name + ".png") # request the image of the compound
display(Image(reqimg.content))              # display the image
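
If OPSIN cannot interpret a name, the response body may not be an image at all. A minimal sketch of a check, assuming only that a valid PNG file always begins with the standard eight-byte PNG signature; `looks_like_png` is a hypothetical helper name.

```python
# First eight bytes of every PNG file, as defined by the PNG specification
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def looks_like_png(data):
    """Return True if the bytes start with the PNG signature (hypothetical helper)."""
    return data[:8] == PNG_SIGNATURE
```

In the notebook this could guard the display call, for example `if looks_like_png(reqimg.content): display(Image(reqimg.content))`.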

### Step 4: Extract the formula of the substance from a standard InChI string using regex

In [None]:
match = re.findall(r'1S/(.+?)/', jsondata['stdinchi'])  # capture the formula: the text between '1S/' and the next '/'
print(match[0])                                         # print the first (only) match
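
The pattern in Step 4 needs a `/` after the formula, so it finds nothing for an InChI that consists of the formula layer alone (for example `InChI=1S/He`). The sketch below is one possible variant that stops at the next `/` or at the end of the string, and then splits a simple formula into element counts. Both function names are hypothetical, and the sketch assumes a single-component formula and that the first layer really is a formula.

```python
import re

def formula_from_inchi(inchi):
    """Return the first layer after 'InChI=1S/' or None if the string does not match."""
    match = re.match(r"InChI=1S/([^/]+)", inchi)
    return match.group(1) if match else None

def element_counts(formula):
    """Split a simple formula such as 'C3H6O' into {'C': 3, 'H': 6, 'O': 1}.

    Assumes one component: no '.' separators and no leading multipliers.
    """
    pairs = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    return {element: int(count) if count else 1 for element, count in pairs}

acetone = "InChI=1S/C3H6O/c1-3(2)4/h1-2H3"  # standard InChI of propan-2-one
print(formula_from_inchi(acetone))
print(element_counts(formula_from_inchi(acetone)))
```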

### Step 5: Try other compounds by changing the value of the 'name' variable above and rerunning steps 2, 3 and 4
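
Instead of editing `name` by hand each time, several names can be looked up in a loop. This is a rough sketch rather than a tested client: `fetch_opsin_json` and `lookup_names` are hypothetical helpers, the 10 second timeout is an arbitrary choice, and the code assumes that a name OPSIN cannot interpret shows up either as an HTTP error or as a record without a `stdinchi` entry.

```python
OPSIN_PATH = "https://opsin.ch.cam.ac.uk/opsin/"  # same base URL as in Step 2

def fetch_opsin_json(name):
    """Request the OPSIN record for one name (requires network access)."""
    import requests  # imported here so lookup_names can be tried offline
    response = requests.get(OPSIN_PATH + name, timeout=10)
    response.raise_for_status()  # raise an exception on HTTP error codes
    return response.json()

def lookup_names(names, fetch=fetch_opsin_json):
    """Return {name: standard InChI or None} for each name in the list.

    'fetch' can be replaced by any function that maps a name to a dict,
    which makes the loop easy to try without contacting the server.
    """
    results = {}
    for name in names:
        try:
            results[name] = fetch(name).get("stdinchi")
        except Exception:  # network failure, HTTP error or invalid JSON
            results[name] = None
    return results
```

Calling `lookup_names(["propan-2-one", "ethanol", "benzene"])` in the notebook would then return a dictionary keyed by name.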