### Step 1: Download data from the ToxRefDB API

We use the requests module to perform a plain HTTP GET request. The structure of the API is described in the [official documentation](https://toxrefdb-api.cloud.douglasconnect.com/). We check that the status code is 200 (HTTP OK), then decode the returned JSON and extract a list of compounds, each with its CAS number and chemical name.

In [1]:
import requests

r = requests.get('https://toxrefdb-api.cloud.douglasconnect.com/beta/compounds?limit=10000')
print("ToxRefDB Status code: {0}".format(r.status_code))
if r.status_code == 200:
    result = r.json()
    # Keep only the CAS number and chemical name of each compound.
    toxRefCompounds = [{'cas': compound['casNumber'], 'chemicalName': compound['chemicalName']} for compound in result['compounds']]
    # Collect the unique CAS numbers for the set intersection in Step 3.
    toxRefCasNrs = set(compound['cas'] for compound in toxRefCompounds)
    print("{0} compounds ingested from ToxRefDB".format(len(toxRefCasNrs)))

ToxRefDB Status code: 200
980 compounds ingested from ToxRefDB


### Step 2: Download data from the ToxCast API

Same as above, but for the ToxCast API. Note that ToxCast names the fields casn and chnm, where ToxRefDB uses casNumber and chemicalName.

In [2]:
import requests

r = requests.get('https://toxcast-api.cloud.douglasconnect.com/beta/compounds?limit=10000')
print("ToxCast Status code: {0}".format(r.status_code))
if r.status_code == 200:
    result = r.json()
    # Map the ToxCast field names (casn, chnm) onto the keys used in Step 1.
    toxCastCompounds = [{'cas': compound['casn'], 'chemicalName': compound['chnm']} for compound in result['compounds']]
    # Collect the unique CAS numbers for the set intersection in Step 3.
    toxCastCasNrs = set(compound['cas'] for compound in toxCastCompounds)
    print("{0} compounds ingested from ToxCast".format(len(toxCastCasNrs)))

ToxCast Status code: 200
9086 compounds ingested from ToxCast
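
The two cells above differ only in the URL and in the field names of each record. As a minimal sketch, the shared extraction step could be factored into one helper. The function name extract_compounds and the sample payloads below are hypothetical: they are made up for illustration and only imitate the response shapes used above.

```python
def extract_compounds(payload, cas_key, name_key):
    """Map each record under payload['compounds'] to the keys used in this notebook."""
    return [
        {'cas': record[cas_key], 'chemicalName': record[name_key]}
        for record in payload['compounds']
    ]


# Made-up payloads with placeholder values, not real API responses.
toxref_payload = {'compounds': [{'casNumber': '0000-00-0', 'chemicalName': 'Example A'}]}
toxcast_payload = {'compounds': [{'casn': '0000-00-0', 'chnm': 'Example A'}]}

toxref_sample = extract_compounds(toxref_payload, 'casNumber', 'chemicalName')
toxcast_sample = extract_compounds(toxcast_payload, 'casn', 'chnm')

# Both sources now share one record layout, so they can be compared directly.
print(toxref_sample == toxcast_sample)
```

With a helper like this, each download cell would reduce to a request, a status check, and one call on the decoded JSON.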


### Step 3: Compute set intersection, use results

In this simple example, we print the number of compounds that appear in both ToxRefDB and ToxCast, matched by CAS number, and show the names of the first three shared compounds.

In [3]:
sharedCasNrs = toxCastCasNrs.intersection(toxRefCasNrs)

sharedCompounds = [ compound for compound in toxRefCompounds if compound['cas'] in sharedCasNrs ]
print("ToxRefDB and ToxCast share {0} compounds (by CAS number)".format(len(sharedCompounds)))

firstThreeCompoundNames = [compound['chemicalName'] for compound in sharedCompounds[:3]]
print("The first 3 compound names from the set of shared compounds are: {0}".format(", ".join(firstThreeCompoundNames)))

ToxRefDB and ToxCast share 790 compounds (by CAS number)
The first 3 compound names from the set of shared compounds are: Tebuthiuron, Hexythiazox, Triflumizole
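
The same set objects also answer the complementary question of which compounds appear in only one source. The sketch below uses small placeholder sets rather than the downloaded data. The identifiers in them are made up and are not real CAS numbers.

```python
# Placeholder identifiers standing in for toxRefCasNrs and toxCastCasNrs.
ref_cas = {'A-1', 'B-2', 'C-3'}
cast_cas = {'B-2', 'C-3', 'D-4'}

shared = ref_cas & cast_cas      # same result as ref_cas.intersection(cast_cas)
only_ref = ref_cas - cast_cas    # present in the first set only
only_cast = cast_cas - ref_cas   # present in the second set only

print(sorted(shared))     # ['B-2', 'C-3']
print(sorted(only_ref))   # ['A-1']
print(sorted(only_cast))  # ['D-4']
```

Sets are unordered, so the results are sorted before printing to make the output reproducible.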
