In [1]:
import requests
import json
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
URIBASE = 'http://java.epa.gov/chemview/'

# Can we get chemical use classification data?

That is, can ChemView return lists of chemicals classified by use?

First, get the controlled vocabulary of uses.

In [2]:
uri = URIBASE + 'uses'
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)

In [3]:
print(len(j))

49


In [4]:
DataFrame(j)

Unnamed: 0,id,useName
0,3982369,Abrasive
1,3999991,Adhesive and sealants
2,3353384,Adsorbent and absorbent
3,3978578,Agricultural chemicals (non-pesticidal)
4,3374957,Anti-erosion agent
5,3354035,Bleaching agent
6,3209634,Chelating agent
7,4449226,Children's Products
8,128090,Cleaning agent
9,4449119,Commercial
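
Since later requests need a numeric use ID, it can help to index this vocabulary by name. The following is a minimal sketch, not part of the original session: the `build_use_index` helper is hypothetical, and the three sample records are copied from the table above rather than fetched (the real list has 49 entries).

```python
# Sample records copied from the table above; in the notebook, `j` from the
# 'uses' request would take the place of this hand-written list.
uses = [
    {'id': 3982369, 'useName': 'Abrasive'},
    {'id': 3999991, 'useName': 'Adhesive and sealants'},
    {'id': 128090, 'useName': 'Cleaning agent'},
]


def build_use_index(records):
    """Map lower-cased use names to their ChemView use IDs (hypothetical helper)."""
    return {rec['useName'].lower(): rec['id'] for rec in records}


use_index = build_use_index(uses)
cleaning_id = use_index['cleaning agent']
print(cleaning_id)
```

Lower-casing the keys is a design choice to make lookups forgiving about capitalization; it assumes no two use names differ only by case.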


### Getting the "details" on a use... not so useful

In [5]:
uri = URIBASE + 'uses/124470' # "Flame retardant"
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
j

{'id': 124470, 'useName': 'Flame retardant'}

## Can it return a list of chemicals classified with a specific use ID?

Unfortunately, the monster URI that the [documentation](http://java.epa.gov/chemview/resources/ChemView_WebServices.pdf) provides (item 3, p. 3) doesn't really do much, or I am not using it correctly.

In [6]:
uri = URIBASE + 'chemicals/datatable?isTemplateFilter=false&chemicalIds=&snurUseIds=&useIds=124470&groupIds=&categoryIds=&endpointKeys=&synonymIds=&sourceIds='
# &sEcho=4&iColumns=6&sColumns=&iDisplayStart=0&iDisplayLength=10&mDataProp_0=0&mDataProp_1=1\
# &mDataProp_2=2&mDataProp_3=3&mDataProp_4=4&mDataProp_5=5&sSearch=&bRegex=false&sSearch_0=\
# &bRegex_0=false&bSearchable_0=true&sSearch_1=&bRegex_1=false&bSearchable_1=true&sSearch_2=\
# &bRegex_2=false&bSearchable_2=true&sSearch_3=&bRegex_3=false&bSearchable_3=true&sSearch_4=\
# &bRegex_4=false&bSearchable_4=true&sSearch_5=&bRegex_5=false&bSearchable_5=true&iSortCol_0=0\
# &sSortDir_0=asc&iSortingCols=1&bSortable_0=false&bSortable_1=true&bSortable_2=false\
# &bSortable_3=false&bSortable_4=false&bSortable_5=false'
print(uri)
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
j

http://java.epa.gov/chemview/chemicals/datatable?isTemplateFilter=false&chemicalIds=&snurUseIds=&useIds=124470&groupIds=&categoryIds=&endpointKeys=&synonymIds=&sourceIds=


{}
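
A query string this long is easier to vary when it is assembled from a dictionary. The sketch below is an assumption-laden convenience, not something the ChemView documentation prescribes: `datatable_uri` is a hypothetical helper, and the parameter names are simply the ones that appear in the URI above. It only builds the string and makes no request.

```python
from urllib.parse import urlencode

URIBASE = 'http://java.epa.gov/chemview/'


def datatable_uri(**filters):
    """Build a chemicals/datatable URI, leaving unspecified filters empty."""
    # Parameter names and order are taken from the URI used in this notebook.
    params = {
        'isTemplateFilter': 'false',
        'chemicalIds': '',
        'snurUseIds': '',
        'useIds': '',
        'groupIds': '',
        'categoryIds': '',
        'endpointKeys': '',
        'synonymIds': '',
        'sourceIds': '',
    }
    unknown = set(filters) - set(params)
    if unknown:
        raise ValueError('unknown filter(s): ' + ', '.join(sorted(unknown)))
    params.update({key: str(value) for key, value in filters.items()})
    return URIBASE + 'chemicals/datatable?' + urlencode(params)


uri = datatable_uri(useIds=124470)
print(uri)
```

Rejecting unknown filter names catches typos early, which matters here because the service appears to answer unrecognized queries with an empty `{}` rather than an error.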

Trying something different: learn from the URIs that ChemView generates when you do a search and export the results. 
* Searched for chemicals matching the use "Flame retardant" (from the drop-down menu) in all sources.
* In the resulting URI, replaced `mediaType=xls` with `mediaType=json`.

...This doesn't work either.

In [7]:
uri = URIBASE + 'datatable?mediaType=json&useIds=124470&sourceIds=2-5-6-7-3-10-9-8-1-16-4-11-1981377'
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
r.text

'<html><head><title>Apache Tomcat/7.0.59 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 404 - </h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u></u></p><p><b>description</b> <u>The requested resource is not available.</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/7.0.59</h3></body></html>'
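
An HTML error page like this one would make `json.loads` raise, so it may be worth checking the response before parsing it. The sketch below uses a hypothetical `json_or_none` helper that works on any object with `status_code`, `headers`, and `text` attributes, as the objects returned by `requests.get` have. It assumes that successful JSON replies carry a `Content-Type` containing "json", which I have not verified for ChemView. The two stand-in responses are hand-made for illustration.

```python
import json
from types import SimpleNamespace


def json_or_none(response):
    """Return parsed JSON, or None for error statuses and non-JSON bodies."""
    content_type = response.headers.get('Content-Type', '')
    # Assumption: a usable reply has status 200 and a JSON content type.
    if response.status_code != 200 or 'json' not in content_type.lower():
        return None
    try:
        return json.loads(response.text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


# Hand-made stand-ins for responses; no network request is made here.
not_found = SimpleNamespace(
    status_code=404,
    headers={'Content-Type': 'text/html;charset=utf-8'},
    text='<html><body><h1>HTTP Status 404 - </h1></body></html>',
)
found = SimpleNamespace(
    status_code=200,
    headers={'Content-Type': 'application/json'},
    text='{"id": 124470, "useName": "Flame retardant"}',
)

print(json_or_none(not_found))
print(json_or_none(found))
```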

# Looking at 'sources': can we get SNUR info?


In [8]:
uri = URIBASE + 'sources'
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)

In [9]:
sources_df = DataFrame(j)

In [10]:
sources_df

Unnamed: 0,chemicals,endpointCategories,externalFileUrl,id,inputMode,sortId,sourceDesc,sourceId,sourceLink,sourceName,sourceType,templateType
0,"[{'id': 3978785, 'identifier': None, 'template...",[],http://java.epa.gov/oppt_chemical_search/downl...,2,ETL,101,Chemical Test Rule Data,2,http://www.epa.gov/opptintr/chemtest/pubs/view...,Chemical Test Rule Data,Data Submitted to EPA,Endpoint
1,"[{'id': 168984, 'identifier': '8EHQ-0990-1066'...",[],http://java.epa.gov/oppt_chemical_search/downl...,5,ETL,102,Substantial Risk Reports,5,,8E,Data Submitted to EPA,Form
2,"[{'id': 171405, 'identifier': '86920000890', '...",[],http://java.epa.gov/oppt_chemical_search/downl...,6,ETL,103,Health and Safety Studies,6,,8D,Data Submitted to EPA,Form
3,"[{'id': 117590, 'identifier': None, 'templateI...",[],http://java.epa.gov/oppt_chemical_search/downl...,7,External,104,High Production Volume Information System,7,,HPVIS,Data Submitted to EPA,External
4,"[{'id': 5833944, 'identifier': None, 'template...",[],http://java.epa.gov/oppt_chemical_search/downl...,3,ETL,201,Hazard Characterizations,3,http://iaspub.epa.gov/oppthpv/hpv_hc_character...,HC,EPA Assessments,Endpoint
5,"[{'id': 90420, 'identifier': None, 'templateId...",[],http://java.epa.gov/oppt_chemical_search/downl...,10,External,203,Integrated Risk Information System,10,,IRIS,EPA Assessments,External
6,[],[],http://java.epa.gov/oppt_chemical_search/downl...,13,ETL,204,Screening Work Plan Chemicals,13,,SWPC,EPA Assessments,Form
7,"[{'id': 175658, 'identifier': None, 'templateI...",[],http://java.epa.gov/oppt_chemical_search/downl...,9,ETL,205,Design for the Environment Alternative Assessm...,9,http://www.epa.gov/dfe/alternative_assessments...,DFE AA,EPA Assessments,Form
8,"[{'id': 3210975, 'identifier': 'Processing Aid...",[],http://java.epa.gov/oppt_chemical_search/downl...,8,ETL,206,Design for the Environment: Safer Chemical Ing...,8,,DFE SCIL,EPA Assessments,External
9,"[{'id': 3496613, 'identifier': None, 'template...",[],http://java.epa.gov/oppt_chemical_search/downl...,1,ETL,301,Significant New Use Rules,1,http://www.epa.gov/opptintr/existingchemicals/...,SNUR,EPA Actions,Form


In [11]:
# Calculate the number of items in the 'chemicals' field for each source.
sources_df['num_chems'] = sources_df['chemicals'].apply(len)
sources_df[['sourceId', 'sourceDesc', 'num_chems']]

Unnamed: 0,sourceId,sourceDesc,num_chems
0,2,Chemical Test Rule Data,2
1,5,Substantial Risk Reports,2
2,6,Health and Safety Studies,2
3,7,High Production Volume Information System,2
4,3,Hazard Characterizations,2
5,10,Integrated Risk Information System,2
6,13,Screening Work Plan Chemicals,0
7,9,Design for the Environment Alternative Assessm...,2
8,8,Design for the Environment: Safer Chemical Ing...,2
9,1,Significant New Use Rules,2


In [12]:
sources_df.loc[9, :]

chemicals             [{'id': 3496613, 'identifier': None, 'template...
endpointCategories                                                   []
externalFileUrl       http://java.epa.gov/oppt_chemical_search/downl...
id                                                                    1
inputMode                                                           ETL
sortId                                                              301
sourceDesc                                    Significant New Use Rules
sourceId                                                              1
sourceLink            http://www.epa.gov/opptintr/existingchemicals/...
sourceName                                                         SNUR
sourceType                                                  EPA Actions
templateType                                                       Form
num_chems                                                             2
Name: 9, dtype: object

In [13]:
DataFrame(sources_df.loc[9, 'chemicals'])

Unnamed: 0,endpoints,externalLink,id,identifier,synonyms,templateId
0,[],http://java.epa.gov/oppt_chemical_search/downl...,3496613,,"[{'id': 3496612, 'isUnregistered': False, 'sor...",3493162
1,[],http://java.epa.gov/oppt_chemical_search/downl...,3497805,,"[{'id': 3497804, 'isUnregistered': False, 'sor...",3493230


This tells us that if you ask ChemView for information from SNUR sources, you will get information about... **just two chemicals?**

In [18]:
uri = URIBASE + 'chemicals/datatable?isTemplateFilter=false&sourceIds=1' #&chemicalIds=&snurUseIds=&useIds=&groupIds=&categoryIds=&endpointKeys=&synonymIds='
# &sEcho=4&iColumns=6&sColumns=&iDisplayStart=0&iDisplayLength=10&mDataProp_0=0&mDataProp_1=1\
# &mDataProp_2=2&mDataProp_3=3&mDataProp_4=4&mDataProp_5=5&sSearch=&bRegex=false&sSearch_0=\
# &bRegex_0=false&bSearchable_0=true&sSearch_1=&bRegex_1=false&bSearchable_1=true&sSearch_2=\
# &bRegex_2=false&bSearchable_2=true&sSearch_3=&bRegex_3=false&bSearchable_3=true&sSearch_4=\
# &bRegex_4=false&bSearchable_4=true&sSearch_5=&bRegex_5=false&bSearchable_5=true&iSortCol_0=0\
# &sSortDir_0=asc&iSortingCols=1&bSortable_0=false&bSortable_1=true&bSortable_2=false\
# &bSortable_3=false&bSortable_4=false&bSortable_5=false'
print(uri)
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
j

http://java.epa.gov/chemview/chemicals/datatable?isTemplateFilter=false&sourceIds=1


{}

## Try to get SNUR information for a known ID

What if we look up info about one of these chemicals, specifying SNURs as the source?

In [14]:
uri = URIBASE + 'chemicals/3554283?sourceIds=1'
print(uri)
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
j

http://java.epa.gov/chemview/chemicals/3554283?sourceIds=1


{}

That returned nothing.

OK, we also know that chemical ID 3565112 corresponds to PMN Number P-11-0607 and that [ChemView has a record of the SNURs linked to this substance](http://java.epa.gov/chemview?tf=0&ch=P-11-0607&su=2-5-6-7&as=3-10-9-8&ac=1-16&ma=4-11-1981377&tds=0&tdl=10&tas1=1&tas2=asc&tas3=undefined&tss=&modal=detail&modalId=3565112&modalSrc=1)...

In [15]:
uri = URIBASE + 'chemicals/3565112?sourceIds=1&synonymIds='
print(uri)
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
j

http://java.epa.gov/chemview/chemicals/3565112?sourceIds=1&synonymIds=


{'accessionNo': None,
 'casNo': '',
 'epaId': None,
 'id': 3565112,
 'pmnNo': 'P-11-0607',
 'sourceTypes': ['EPA Actions'],
 'sources': [{'chemicals': [{'endpoints': [],
     'externalLink': 'http://java.epa.gov/oppt_chemical_search/download?filename=77_fr_66149_november_2_2012.pdf',
     'id': 3565115,
     'identifier': None,
     'synonyms': [{'chemicalName': 'Polyaromatic Organophosphorus Compound (generic)',
       'id': 3565113,
       'isIupac': False,
       'isRegistry': False,
       'isSystematic': False,
       'isTscaInv': False,
       'isUnregistered': False,
       'isWorkPlan': False,
       'sortOrder': 5},
      {'chemicalName': 'Polyaromatic organophosphorus compound (generic)',
       'id': 3565114,
       'isIupac': True,
       'isRegistry': False,
       'isSystematic': False,
       'isTscaInv': False,
       'isUnregistered': False,
       'isWorkPlan': False,
       'sortOrder': 1}],
     'templateId': 3564983},
    {'endpoints': [],
     'externalLink': 'htt

In [16]:
print(j['sources'][0]['chemicals'][0]['externalLink'])

http://java.epa.gov/oppt_chemical_search/download?filename=77_fr_66149_november_2_2012.pdf


That did return some actual information. The external links about the specific chemicals both point to a PDF of the SNURs published in the Federal Register. We already know that this is not the extent of EPA's public data on these SNURs, so where is it in ChemView?
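
For reference, the external links can be pulled out of the nested response with a short loop over `sources` and then `chemicals`. This is a minimal sketch with a hypothetical `external_links` helper. The `sample` dictionary is a trimmed copy of the first chemical record shown above; the second record is cut off in the output, so it is left out here rather than guessed.

```python
def external_links(chemical_record):
    """Collect distinct externalLink values nested under sources -> chemicals."""
    links = []
    for source in chemical_record.get('sources', []):
        for chem in source.get('chemicals', []):
            link = chem.get('externalLink')
            if link and link not in links:
                links.append(link)
    return links


# Trimmed copy of the response above (first nested chemical record only).
sample = {
    'id': 3565112,
    'pmnNo': 'P-11-0607',
    'sources': [{
        'chemicals': [{
            'endpoints': [],
            'externalLink': 'http://java.epa.gov/oppt_chemical_search/download?filename=77_fr_66149_november_2_2012.pdf',
            'id': 3565115,
            'identifier': None,
            'templateId': 3564983,
        }],
    }],
}

links = external_links(sample)
print(links)
```

Using `.get` with defaults means an empty `{}` response, like the ones seen earlier, yields an empty list instead of a `KeyError`.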

I navigated to the ChemView record for PMN number P-09-0248 and clicked on it to get a summary of the SNUR:
![screenshot](../cv-pmn-view.png)

Below, I copied the link that it gives you when you click "E-mail Url", but added `&mediaType=json`.

In [17]:
uri = 'http://java.epa.gov/chemview?tf=0&ch=P-09-0248&su=2-5-6-7&as=3-10-9-8&ac=1-16&ma=4-11-1981377&tds=0&tdl=10&tas1=1&tas2=asc&tas3=undefined&tss=&modal=template&modalId=3517608&modalSrc=1&modalDetailId=3517610&mediaType=json'
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
j

{}

Apparently these data are not yet available through the API.

Trying something else by tweaking the URL from a different search...

In [22]:
uri = 'http://java.epa.gov/chemview?tf=1&su=2-5-6-7&as=3-10-9-8&ac=1-16&ma=4-11-1981377&tds=0&tdl=10&tas1=1&tas2=asc&tas3=undefined&tss=&modal=template&modalId=103298&modalSrc=3&modalDetailId=5636434&modalVae=0-0-1-0-0&mediaType=json'
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
j

{}