# CAS Common Chemistry API in Mathematica

by Vishank Patel

These recipe examples were tested on March 31, 2022 using Mathematica 12.3.

**CAS Common Chemistry API Documentation (requires registration):** https://www.cas.org/services/commonchemistry-api

**Attribution:** This tutorial uses the [CAS Common Chemistry](https://commonchemistry.cas.org/) API. Example data shown is licensed under the [CC BY-NC 4.0 license](https://creativecommons.org/licenses/by-nc/4.0/).


## 1. Common Chemistry Record Detail Retrieval

Information about substances in CAS Common Chemistry can be retrieved using the `/detail` API and a CAS RN identifier:

### Setup API Parameters

In [None]:
detailBaseURL = "https://commonchemistry.cas.org/api/detail?";
casrn1 = "10094-36-7" ; (*ethyl cyclohexanepropionate*)

### Request data from CAS Common Chemistry Detail API

In [None]:
casrn1Data = Import[detailBaseURL <> "cas_rn=" <> casrn1, "RawJSON"];
casrn1Data

*Output not shown*

In [None]:
casrn1Data //OutputForm (*Changed to plain text output*)

### Display the Molecule Drawing

In [None]:
MoleculePlot[casrn1Data["smile"]]

### Select some specific data

Get experimental properties:

In [None]:
casrn1Data["experimentalProperties"][[1]]

Get boiling point property

In [None]:
casrn1Data["experimentalProperties"][[1]]["property"]
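
Indexing with `[[1]]` assumes the boiling point is the first entry in `experimentalProperties`. The following is a minimal sketch of selecting an entry by its `name` field instead. `mockRecord` and its values are made up for illustration and only mimic the shape of a `/detail` response; `getProperty` is a hypothetical helper, and the name string "Boiling Point" is an assumption about how the API labels that entry.

```mathematica
(* Hypothetical record mimicking the shape of a /detail response; values are illustrative only *)
mockRecord = <|
   "rn" -> "0000-00-0",
   "experimentalProperties" -> {
     <|"name" -> "Density", "property" -> "1.0 g/cm3"|>,
     <|"name" -> "Boiling Point", "property" -> "100 deg C"|>}
   |>;

(* getProperty (hypothetical helper): return the "property" string of the first
   entry whose "name" matches, or Missing["NotAvailable"] if there is none *)
getProperty[record_Association, propName_String] :=
  With[{entry = SelectFirst[
      Lookup[record, "experimentalProperties", {}],
      #["name"] === propName &]},
   If[MissingQ[entry], Missing["NotAvailable"], entry["property"]]]

getProperty[mockRecord, "Boiling Point"]
getProperty[mockRecord, "Melting Point"]
```

With a real record, `getProperty[casrn1Data, "Boiling Point"]` would be the corresponding call, provided the record uses that name string.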

Get InChIKey

In [None]:
casrn1Data["inchiKey"]

Get canonical SMILES

In [None]:
casrn1Data["canonicalSmile"]

## 2. Common Chemistry API record detail retrieval in a loop

### Setup API parameters

In [None]:
detailBaseURLLoop = "https://commonchemistry.cas.org/api/detail?";
casrnList = {"10094-36-7", "10031-92-2", "10199-61-8", "10036-21-2","1019020-13-3"};

### Request data for each CAS RN and save to a list

In [None]:
casrnData = {};
(*Use the base URL defined for this section and request each CAS RN in turn*)
For[i = 1, i <= Length[casrnList], i++,
 AppendTo[casrnData,
  Import[detailBaseURLLoop <> "cas_rn=" <> casrnList[[i]], "RawJSON"]]
 ]
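
As an alternative to `For` with `AppendTo`, the same retrieval can be written by mapping a function over the list, which avoids managing a loop counter. This is a sketch only: `detailURL` and `fetchDetail` are hypothetical helper names, and the one-second pause is an assumed courtesy delay rather than a documented API requirement.

```mathematica
casrnList = {"10094-36-7", "10031-92-2", "10199-61-8", "10036-21-2", "1019020-13-3"};

(* detailURL (hypothetical helper): build the /detail request URL for one CAS RN *)
detailURL[rn_String] :=
  "https://commonchemistry.cas.org/api/detail?cas_rn=" <> rn

(* fetchDetail (hypothetical helper): pause briefly, then import the JSON record *)
fetchDetail[rn_String] := (Pause[1]; Import[detailURL[rn], "RawJSON"])

(* Map the URL builder over the list; no network access is needed for this step *)
urls = detailURL /@ casrnList
```

Evaluating `fetchDetail /@ casrnList` would then return the list of records in one expression.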

In [None]:
casrnData[[1]]

*Output not shown*

In [None]:
casrnData[[1]] //OutputForm (*Changed to plain text output*)

Display a molecule drawing for each record

In [None]:
plotList = {};
For[i = 1, i <= Length[casrnList], i++,
 AppendTo[plotList, MoleculePlot[casrnData[[i]]["smile"]]]
 ]

In [None]:
plotList

### Select some specific data

Get canonical SMILES

In [None]:
cansmiles = {};
For[i = 1, i <= Length[casrnList], i++,
 AppendTo[cansmiles,
  casrnData[[i]]["canonicalSmile"]]
 ]
cansmiles

Get synonyms

In [None]:
synonymsList = {};
For[i = 1, i <= Length[casrnList], i++,
 AppendTo[synonymsList,
  casrnData[[i]]["synonyms"]]
 ]
synonymsList

Transform synonym "list of lists" to a flat list

In [None]:
Flatten[synonymsList]

### Create a dataset

In [None]:
Table[casrnData[[All, i]], {i, {"uri", "rn", "name"}}] // Dataset
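
If one row per substance is preferred, a list of associations can be passed to `Dataset` directly after keeping only the keys of interest. The sketch below uses `mockRecords`, a made-up stand-in for `casrnData` with illustrative values.

```mathematica
(* Hypothetical records mimicking the shape of /detail responses; values are illustrative only *)
mockRecords = {
   <|"uri" -> "substance/pt/0000000", "rn" -> "0000-00-0",
     "name" -> "Example A", "molecularMass" -> "10.0"|>,
   <|"uri" -> "substance/pt/0000001", "rn" -> "0000-00-1",
     "name" -> "Example B", "molecularMass" -> ""|>
   };

(* KeyTake applied to a list of associations keeps the given keys in each one *)
rows = KeyTake[mockRecords, {"uri", "rn", "name"}];

(* Each association becomes a row and each key a column *)
ds = Dataset[rows]
```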

## 3. Common Chemistry Search

In addition to the `/detail` API, the CAS Common Chemistry API has a `/search` method that allows searching by CAS RN, SMILES, InChI/InChIKey, and name.

### Setup API Parameters

In [None]:
searchBaseURL = "https://commonchemistry.cas.org/api/search?q=";

(*InChIKey for Quinine*)
IK = "InChIKey=LOUPRKONTZGTKE-WZBLMQSHSA-N";

### Request data from CAS Common Chemistry Search API

Search query:

In [None]:
quinineSearchData = Import[searchBaseURL <> IK, "RawJSON"];
quinineSearchData

Note that the CAS Common Chemistry Search API returns only the image data, name, and CAS RN for each hit. To retrieve the full record, we can combine the search with the related detail API:

In [None]:
quinineRN = quinineSearchData["results"][[1, "rn"]]

In [None]:
detailBaseURL = "https://commonchemistry.cas.org/api/detail?";
quinineDetailData = Import[detailBaseURL <> "cas_rn=" <> quinineRN, "RawJSON"];
quinineDetailData

*Output not shown*

In [None]:
quinineDetailData //OutputForm (*Changed to plain text output*)

### Handle multiple results

Setup search query parameters

In [None]:
searchBaseURL = "https://commonchemistry.cas.org/api/search?q=";
smiBD = "C=CC=C"; (*SMILES for butadiene*)

Request data from CAS Common Chemistry Search API

In [None]:
smiSearchData = Import[searchBaseURL <> smiBD, "RawJSON"];
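
SMILES strings can contain characters such as `=` and `#` that have a special meaning in URLs. A hedged sketch of building the query with `URLBuild`, which percent-encodes parameter values, is shown below. `searchEndpoint`, `smilesQuery`, and `searchURL` are names introduced here for illustration, and it is an assumption that the API treats an encoded query the same as the plain one.

```mathematica
searchEndpoint = "https://commonchemistry.cas.org/api/search";
smilesQuery = "C=CC=C"; (* SMILES for butadiene *)

(* URLBuild appends the query parameters and encodes their values *)
searchURL = URLBuild[searchEndpoint, {"q" -> smilesQuery}]

(* Parsing the URL back recovers the original query string *)
URLParse[searchURL, "Query"]
```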

In [None]:
smiSearchData["count"]

Extract out CAS RNs

In [None]:
smicasRNList = smiSearchData["results"][[All, "rn"]]

Now use the detail API to retrieve full records

In [None]:
detailBaseURL = "https://commonchemistry.cas.org/api/detail?";
smiDetailData = {};
For[i = 1, i <= Length[smicasRNList], i++,
 AppendTo[smiDetailData,
  Import[detailBaseURL <> "cas_rn=" <> smicasRNList[[i]],"RawJSON"]];
 Pause[1]  (*Adding a delay between API calls*)
 ]
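
A failed request makes `Import` return `$Failed`, which would then sit in the list next to valid records. One possible way to guard against this is sketched below; `safeImportJSON` is a hypothetical helper, and a nonexistent local file stands in for a failed request so the example runs without network access.

```mathematica
(* safeImportJSON (hypothetical helper): return the parsed association,
   or a Missing marker when the import does not yield one *)
safeImportJSON[url_String] :=
  With[{result = Quiet[Import[url, "RawJSON"]]},
   If[AssociationQ[result], result, Missing["RequestFailed", url]]]

(* A deliberately invalid local path stands in for a failed request *)
failed = safeImportJSON["this-file-does-not-exist.json"]

(* Drop the failed entries before extracting fields *)
records = DeleteMissing[{<|"name" -> "Example"|>, failed}]
```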

Get some specific data such as name from the detail records

In [None]:
names = smiDetailData[[All, "name"]]

### Handle multiple page results

The CAS Common Chemistry API returns 50 results per page, and only the first page is returned by default. If the search returns more than 50 results, the offset option can be added to page through and obtain all results.

Setup Search query parameters

In [None]:
searchBaseURL = "https://commonchemistry.cas.org/api/search?q=";
n = "selen*";

Get results count for CAS Common Chemistry Search

In [None]:
numResults = Import[searchBaseURL <> n, "RawJSON"]["count"]

Request data and save to a list in a loop for each page

In [None]:
nSearchData = {};
(*Request Ceiling[numResults/50] pages, with offsets 0, 50, 100, ...*)
For[i = 0, i < Ceiling[numResults/50], i++,
 pageData = Import[searchBaseURL <> n <> "&offset=" <> ToString[i*50],"RawJSON"];
 AppendTo[nSearchData, pageData];
 Pause[1]; (*Add a delay between API calls*)
 ]
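
The offsets needed to cover a given result count can also be computed up front with `Range`. This is a small sketch; `pageSize` and `pageOffsets` are names introduced here for illustration, and the page size of 50 is the value stated above.

```mathematica
pageSize = 50;

(* pageOffsets (hypothetical helper): one offset per page needed to cover count results *)
pageOffsets[count_Integer?NonNegative] := Range[0, count - 1, pageSize]

pageOffsets[191]  (* {0, 50, 100, 150} *)
pageOffsets[200]  (* {0, 50, 100, 150} *)
pageOffsets[0]    (* {} *)
```

Mapping the request over `pageOffsets[numResults]` requests exactly as many pages as the result count requires.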

In [None]:
Length[nSearchData[[1]]["results"]]
Length[nSearchData[[2]]["results"]]
Length[nSearchData[[3]]["results"]]
Length[nSearchData[[4]]["results"]]

We can index and extract out the first CAS RN like this

In [None]:
nSearchData[[1]]["results"][[1]]["rn"]

Extract out all CAS RNs from the list of lists

In [None]:
nCasRNList = Flatten[nSearchData[[All, "results"]][[All, All, "rn"]]];
nCasRNList //Shallow //OutputForm
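
An equivalent extraction can be written with `Lookup` and `Catenate`, which also tolerates a page that has no `results` key. The sketch uses `mockPages`, a made-up stand-in for `nSearchData` with placeholder CAS RNs.

```mathematica
(* Hypothetical pages mimicking the shape of /search responses; values are placeholders *)
mockPages = {
   <|"count" -> 3, "results" -> {<|"rn" -> "0000-00-0"|>, <|"rn" -> "0000-00-1"|>}|>,
   <|"count" -> 3, "results" -> {<|"rn" -> "0000-00-2"|>}|>
   };

(* Collect the results list of every page, then join the lists into one *)
allResults = Catenate[Lookup[mockPages, "results", {}]];

(* Pull the "rn" value out of each result *)
rns = Lookup[allResults, "rn"]
```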

Now we can loop through each CAS RN and use the detail API to obtain the entire record.
At the time of testing, this queried CAS Common Chemistry 191 times and took about 5 minutes.

In [None]:
detailBaseURL = "https://commonchemistry.cas.org/api/detail?";
nDetailData = {};
For[i = 1, i <= Length[nCasRNList], i++,
 AppendTo[nDetailData,
  Import[detailBaseURL <> "cas_rn=" <> nCasRNList[[i]], "RawJSON"]];
 Pause[1] (*Add a delay between API calls*)
 ]

Extracting out some data such as Molecular Mass:

In [None]:
nDetailData[[All, "molecularMass"]]

As there are many empty strings, we remove them by replacing each with `Nothing`, which drops the element from the list

In [None]:
mmStrings = nDetailData[[All, "molecularMass"]] /. "" -> Nothing

Converting the string elements into real numbers by mapping ToExpression to each element of the list:

In [None]:
mm = ToExpression /@ mmStrings
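
`ToExpression` evaluates whatever string it is given, so it can be worth checking that each string looks like a number first. The following sketch filters with the `NumberString` pattern before converting; `mockMasses` is a made-up list that stands in for the `molecularMass` values.

```mathematica
(* Hypothetical values standing in for the molecularMass strings *)
mockMasses = {"78.96", "", "174.10", "n/a"};

(* Keep only strings that consist entirely of a number *)
numericStrings = Select[mockMasses, StringMatchQ[#, NumberString] &];

(* Convert the validated strings to numbers *)
masses = ToExpression /@ numericStrings
```

Here the empty string and "n/a" are dropped, so only the two numeric entries are converted.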

Finally, we can plot a histogram of the extracted `molecularMass` values from the `selen*` search:

In [None]:
Histogram[mm, AxesLabel -> {"molecularMass", "Count"}]