# Recipe: Accessing the IUPAC Gold Book API in Python

```{dropdown} About this interactive ![icons](../images/rocket.png) recipe
- Author: [Stuart Chalk](https://orcid.org/0000-0002-0703-7776)
- Reviewer:
- Topics: The IUPAC Gold Book, APIs, JSON
- Format: Interactive Jupyter Notebook (Python)
- Scenarios: Retrieve the definition of a chemical concept via code
- Skills: You should be familiar with
    - [Application Programming Interfaces (APIs)](https://www.ibm.com/topics/api)
    - [The JavaScript Object Notation (JSON) file format](https://www.w3schools.com/js/js_json_intro.asp)
    - [Introductory Python](https://www.youtube.com/watch?v=kqtD5dpn9C8)
- Learning outcomes: After completing this example you should understand:
    - What a Python function (def) is
    - How to write Python code to request data from a URL (typically an API)
    - How to use a Python variable to call an API and download data
- Citation: 'Recipe: Accessing the IUPAC Gold Book API in Python', The IUPAC FAIR Chemistry Cookbook, https://iupac.github.io/WFChemCookbook/recipes/goldbook.html
- Reuse: This notebook is made available under a [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.
```

## Step 1: Import needed Python packages
Python has a lot of extra functionality in packages that can be loaded using the 'import' statement

In [None]:
import requests                             # package to get data from a URL
import json                                 # package to read/write/display JSON formatted data
import re                                   # package to use regular expression (regex) searching

## Step 2: Add a Python function
This function removes HTML tags from textual data

In [None]:
# Source: https://medium.com/@jorlugaqui/how-to-strip-html-tags-from-a-string-in-python-7cb81a2bbf44
def remove_html_tags(text):                 # a 'def' is a (defined) function that can be called later
    clean = re.compile('<.*?>')             # sets up a regular expression to search with
    return re.sub(clean, '', text)          # removes the matches to the regular expression
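
As a quick check of what the function does, the sketch below applies it to a made-up title containing italic tags. The sample string is an assumption for illustration, not data taken from the Gold Book.

```python
import re

def remove_html_tags(text):
    """Strip anything that looks like an HTML tag from a string."""
    clean = re.compile('<.*?>')   # non-greedy: match from '<' to the nearest '>'
    return re.sub(clean, '', text)

# Hypothetical title with italic markup, for illustration only
sample_title = "<i>cis-trans</i> isomers"
cleaned_title = remove_html_tags(sample_title)
print(cleaned_title)
```

The `?` in the pattern makes the match non-greedy. Without it, `<.*>` would match from the first `<` to the last `>` in the string and remove the text between the tags as well.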

## Step 3: Download a JSON file
Grab data for all the IUPAC Recommended Terms currently available

In [None]:
allpath = "https://goldbook.iupac.org/terms/index/all/json"  # URL of the IUPAC Gold Book API list of all terms
reqdata = requests.get(allpath)                              # download file in JSON
terms = json.loads(reqdata.content)                          # convert JSON to a Python dictionary
print(str(len(terms['terms']['list'])) + ' terms')           # print the number of terms in the list
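
The code above assumes that the downloaded JSON holds a `terms` object whose `list` entry maps each term code to an object with a `title`. The sketch below reproduces that shape offline with a small hand-written JSON string. The codes and titles in it are placeholders, not real Gold Book records, and real records may carry additional fields.

```python
import json

# Hand-written stand-in for the downloaded file; codes and titles are placeholders
sample_json = '''
{
  "terms": {
    "list": {
      "X00001": {"title": "<i>cis-trans</i> isomers"},
      "X00002": {"title": "example term"}
    }
  }
}
'''

sample_terms = json.loads(sample_json)       # JSON text -> nested Python dictionaries
term_list = sample_terms['terms']['list']    # dictionary keyed by term code
term_count = len(term_list)
print(str(term_count) + ' terms')

for code, term in term_list.items():         # each value is itself a dictionary
    print(code, term['title'])
```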

## Step 4: Search for a term
Search the recommended term list and, if the term is present, get the term's code

In [None]:
searchterm = "cis-trans isomers"                            # the term to be found
searchcode = None                                           # empty variable to contain the searchcode
for code, term in terms['terms']['list'].items():           # iterate over each term in the list (code (str), term (obj))
    cleaned = remove_html_tags(term['title'])               # remove any HTML formatting in the title
    if cleaned == searchterm:                               # check if the term matches the one we want
        searchcode = code                                   # if it does, get the code for the term
        break                                               # we have found the term so we can get out of the for loop
print(searchcode)                                           # IUPAC Gold Book term code (if found)
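
The loop above can also be wrapped in a function so that it is easy to reuse. The version below is a sketch, not part of the original recipe: the name `find_term_code` is hypothetical, the comparison ignores case and surrounding whitespace (a design choice that differs from the exact match used above), and the sample dictionary uses placeholder codes.

```python
import re

def remove_html_tags(text):
    """Strip anything that looks like an HTML tag from a string."""
    return re.sub(r'<.*?>', '', text)

def find_term_code(term_list, searchterm):
    """Return the code of the first term whose cleaned title matches, else None."""
    wanted = searchterm.strip().lower()
    for code, term in term_list.items():
        cleaned = remove_html_tags(term['title']).strip().lower()
        if cleaned == wanted:
            return code          # stop at the first match
    return None                  # nothing matched

# Placeholder data in the same shape as terms['terms']['list']
sample_list = {
    "X00001": {"title": "<i>cis-trans</i> isomers"},
    "X00002": {"title": "example term"},
}

found = find_term_code(sample_list, "Cis-Trans Isomers")
missing = find_term_code(sample_list, "not a term")
print(found, missing)
```

Returning `None` for a missing term lets the calling code decide what to do before it tries to build a URL from the code.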

## Step 5: Use the term code to retrieve its definition
Generate a URL to get data about a term, then print out the term, its code, and its definition

In [None]:
path = "https://goldbook.iupac.org/terms/view/**/json"          # URL path to the IUPAC Gold Book API for a term ('**' stands for the code)
if searchcode is None:                                          # step 4 did not find the term, so there is nothing to request
    print(searchterm + " was not found")                        # report the problem instead of failing on path.replace()
else:
    reqdata = requests.get(path.replace("**", searchcode))      # request data from the Gold Book server
    jsondata = json.loads(reqdata.content)                      # get the downloaded JSON
    print(searchterm + " (" + searchcode + ")")                 # print the title and Gold Book term code
    print(jsondata['term']['definitions'][0]['text'])           # print the definition of the term
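
The URL pattern and the nested lookup used in this step can be separated into two small helpers and tried out without contacting the server. This is a sketch under stated assumptions: `build_term_url` and `first_definition` are hypothetical names, and the sample response is hand-written with placeholder text. It only mirrors the `term` → `definitions` → `text` path that the code above reads.

```python
import json

TERM_URL = "https://goldbook.iupac.org/terms/view/**/json"   # same pattern as in this step

def build_term_url(code):
    """Insert a term code into the URL pattern."""
    if code is None:
        raise ValueError("no term code: the search in step 4 found nothing")
    return TERM_URL.replace("**", code)

def first_definition(jsondata):
    """Return the text of the first definition, or None if there is none."""
    definitions = jsondata.get('term', {}).get('definitions', [])
    if not definitions:
        return None
    return definitions[0].get('text')

# Hand-written stand-in for a server response (placeholder content)
sample_response = json.loads(
    '{"term": {"definitions": [{"text": "placeholder definition text"}]}}'
)

url = build_term_url("X00001")               # "X00001" is a placeholder code
definition = first_definition(sample_response)
print(url)
print(definition)
```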

## Step 6: Try other terms
Change the value of the 'searchterm' variable above and rerun steps 4 and 5
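
When several terms are to be looked up, one option is to build a title-to-code dictionary once and then query it repeatedly, rather than looping over the full list for every term. The sketch below illustrates the idea on placeholder data. The helper name `build_title_index` is hypothetical and not part of the original recipe.

```python
import re

def remove_html_tags(text):
    """Strip anything that looks like an HTML tag from a string."""
    return re.sub(r'<.*?>', '', text)

def build_title_index(term_list):
    """Map each cleaned title to its term code."""
    return {remove_html_tags(term['title']): code
            for code, term in term_list.items()}

# Placeholder data in the same shape as terms['terms']['list']
sample_list = {
    "X00001": {"title": "<i>cis-trans</i> isomers"},
    "X00002": {"title": "example term"},
}

title_index = build_title_index(sample_list)
for searchterm in ["cis-trans isomers", "example term", "not a term"]:
    print(searchterm, '->', title_index.get(searchterm))   # None if absent
```

If two terms share the same cleaned title, the index keeps only the code that appears last in the list, so the step 4 loop remains the safer choice when duplicates matter.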