The U.S. Fish and Wildlife Service [7-year Work Plan](https://www.fws.gov/endangered/esa-library/pdf/Listing%207-Year%20Workplan%20Sept%202016.pdf) has been the subject of collaboration between USGS and FWS on the current state of research for the species it covers. As part of this, the Biogeographic Characterization Branch has put together a short review of what our process can say about the current state of data for these species. This is based on the Taxa Information Registry module of the Biogeographic Information System, an intelligent platform we are building to bring together all of our work into a cohesive whole.

This notebook is the result of some previous individual experimentation from members of our staff captured in the GitHub repo where this notebook lives. It runs through all of the processes we've used to assemble a master database for further analysis and reporting.

In [5]:
from pybis import db
from IPython.display import display
from datetime import datetime
import pandas as pd
gc2BaseSQLURL = "https://beta-gc2.datadistillery.org/api/v1/sql/bcb"

These are a couple of new functions I created to support the process of getting data from the spreadsheet into our database for further processing. The lookupState function may be useful elsewhere. It uses the "us" Python package to pull together state names and FIPS codes from the state abbreviations in the source data.

In [2]:
def packageESASpeciesRow(row):
    submittedData = {}
    submittedData["Scientific Name"] = row["Scientific Name (Revised List)"]
    submittedData["Species Record Reference"] = row["ScientificNameLink"] 
    submittedData["Common Name"] = row["Species Name (Common)"]
    submittedData["Grouping"] = row["Grouping"]
    submittedData["Lead FWS Region"] = row["Lead FWS Regional Office"]
    submittedData["Species Range"] = row["Species Range"]
    return submittedData

def lookupState(stateAbbr):
    import us
    try:
        return {"name":us.states.mapping('abbr', 'name')[stateAbbr],"fips":us.states.mapping('abbr', 'fips')[stateAbbr]}
    except KeyError:
        # Not a recognized state or territory abbreviation (for example, "Mexico" or "Canada")
        return None

# Import Spreadsheet
The FWS work plan (linked in the intro) has only common names. Staff in the Ecosystems Mission Area put together a spreadsheet that has all of those species but resolves some things to include the scientific name and a link to the FWS Environmental Conservation Online System (ECOS) record for the species. This is a really helpful starting point in making sure we have the right taxonomy and links to other systems. To process this, I pulled the links out as a separate column in the spreadsheet and loaded it into the same folder as this code. We are loading these data into a MongoDB database we have running on the ESIP Testbed under the DataDistillery project we are continuing to build there.

This code uses Pandas as an expedient way of reading the spreadsheet data from the file into a dataframe.

In [3]:
df = pd.read_excel("FWS ESA Work Plan Species list for CSS.xlsx", sheet_name='Sheet1')
display (df)

| Unnamed: 0 | Grouping | Species Name (Common) | Scientific Name (Revised List) | Lead FWS Regional Office | Species Range | ScientificNameLink |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Amphibians | streamside salamander | Ambystoma barbouri | R4 | AL, KY, OH, TN, WV | https://ecos.fws.gov/ecp/species/9776 |
| 1 | Amphibians | Boreal toad (Eastern population) | Anaxyrus boreas boreas | R6 | CO, ID, NM, NV, UT, WY | https://ecos.fws.gov/ecp/species/1114 |
| 2 | Amphibians | Inyo Mountains slender salamander | Batrachoseps campi | R8 | CA | https://ecos.fws.gov/ecp/species/2095 |
| 3 | Amphibians | lesser slender salamander | Batrachoseps minor | R8 | CA | https://ecos.fws.gov/ecp/species/9277 |
| 4 | Amphibians | relictual slender salamander | Batrachoseps relictus | R8 | CA | https://ecos.fws.gov/ecp/species/7408 |
| 5 | Amphibians | Kern Plateau salamander | Batrachoseps robustus | R8 | CA | https://ecos.fws.gov/ecp/species/9274 |
| 6 | Amphibians | Kern Canyon slender salamander | Batrachoseps simatus | R8 | CA | https://ecos.fws.gov/ecp/species/5736 |
| 7 | Amphibians | Oregon slender salamander | Batrachoseps wrighti | R1 | OR, WA | https://ecos.fws.gov/ecp/species/913 |
| 8 | Amphibians | Arizona toad | Bufo microscaphus microscaphus | R2 | AZ, CA, NM, NV, UT | https://ecos.fws.gov/ecp/species/2077 |
| 9 | Amphibians | hellbender | Cryptobranchus alleganiensis | R3 | AL, AR, GA, IL, IN, KY, MD, MO, MS, NC, NY, OH... | https://ecos.fws.gov/ecp0/profile/speciesProfi... |
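Empty spreadsheet cells come back from Pandas as floating-point `nan`, and the `Species Record Reference` field shows `nan` in some of the records displayed later in this notebook. As a minimal sketch, assuming we would rather store a null than a NaN, a packaged row could be cleaned before insertion like this. The helper name `clean_missing` is hypothetical and is not part of the pipeline in this notebook.

```python
import math

def clean_missing(packaged_row):
    """Return a copy of a packaged row with float NaN values replaced by None.

    Hypothetical helper for illustration; not part of the original pipeline.
    """
    cleaned = {}
    for key, value in packaged_row.items():
        # Only floats can be NaN; strings and other types pass through unchanged
        if isinstance(value, float) and math.isnan(value):
            cleaned[key] = None
        else:
            cleaned[key] = value
    return cleaned

example = {
    "Scientific Name": "Cambarus subterraneus",
    "Species Record Reference": float("nan"),
}
print(clean_missing(example))
```

If something like this were adopted, any later query that filters on NaN references would need to test for null instead.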


This code uses a private package that we keep secure to connect to the MongoDB instance on the DataDistillery platform and establish a connection to a collection for these data.

In [4]:
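# "dd" is the private DataDistillery connection helper described above; it must be imported before this cell runs
bisDB = dd.getDB("bis")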
esaWPSpecies = bisDB["FWS_Work_Plan_Species"]

This code runs through the spreadsheet and puts all of the data into an array for submission to the collection.

In [5]:
fwsESASpeciesList = []

for index,row in df.iterrows():
    speciesRecord = {}
    speciesRecord["Submitted Data"] = packageESASpeciesRow(row)
    speciesRecord["Processing Metadata"] = {"Date Created from Source":datetime.utcnow().isoformat()}
    fwsESASpeciesList.append(speciesRecord)

esaWPSpecies.delete_many({})
esaWPSpecies.insert_many(fwsESASpeciesList)

<pymongo.results.InsertManyResult at 0x10695eca8>

The FWS work plan includes an indication of species range as a list of states and territories. This seems like an interesting point of comparison with other sources of range information that we might want to analyze, and this code block breaks out the state/territory information into a more robust data structure for later processing and adds in FIPS codes.

In [6]:
for record in esaWPSpecies.find():
    fwsRange = {"US States":[],"US State List":[],"Other Places":[]}
    for rangePlace in record["Submitted Data"]["Species Range"].replace(", ",",").split(","):
        rangeState = lookupState(rangePlace)
        if rangeState is None:
            fwsRange["Other Places"].append({rangePlace:{}})
        else:
            fwsRange["US States"].append({rangePlace:rangeState})
            fwsRange["US State List"].append(rangePlace)

    if len(fwsRange["Other Places"]) == 0:
        del fwsRange["Other Places"]

    esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"FWS Range":fwsRange}})
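The range breakout above can be tried without a database connection. This is a minimal sketch, assuming a tiny hand-built lookup table (`STATE_LOOKUP`, hypothetical) in place of the "us" package, and a hypothetical function name `parse_range`. It strips whitespace around each entry rather than replacing ", " so that irregular spacing in the source does not produce unmatched abbreviations.

```python
# Tiny stand-in for the lookup provided by the "us" package (illustration only)
STATE_LOOKUP = {
    "NM": {"name": "New Mexico", "fips": "35"},
    "TX": {"name": "Texas", "fips": "48"},
}

def parse_range(range_text, lookup=STATE_LOOKUP):
    """Split a comma-separated range string into recognized states and other places."""
    fws_range = {"US States": [], "US State List": [], "Other Places": []}
    for place in (part.strip() for part in range_text.split(",")):
        if not place:
            continue  # ignore empty fragments such as a trailing comma
        state = lookup.get(place)
        if state is None:
            fws_range["Other Places"].append({place: {}})
        else:
            fws_range["US States"].append({place: state})
            fws_range["US State List"].append(place)
    # Match the notebook's structure: drop the key when nothing was unmatched
    if not fws_range["Other Places"]:
        del fws_range["Other Places"]
    return fws_range

print(parse_range("NM, Mexico"))
```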

In [133]:
# Cleanup by putting state abbreviations into a comparable list
for record in esaWPSpecies.find({},{"FWS Range.US State List":1}):
    fwsRange = []
    for stateAbbrev in record["FWS Range"]["US State List"]:
        fwsRange.append(lookupState(stateAbbrev)["name"])
    esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"Synthesis.FWS Range List":fwsRange}})

In [7]:
display(esaWPSpecies.find_one())

{'FWS Range': {'US State List': ['AL', 'KY', 'OH', 'TN', 'WV'],
  'US States': [{'AL': {'fips': '01', 'name': 'Alabama'}},
   {'KY': {'fips': '21', 'name': 'Kentucky'}},
   {'OH': {'fips': '39', 'name': 'Ohio'}},
   {'TN': {'fips': '47', 'name': 'Tennessee'}},
   {'WV': {'fips': '54', 'name': 'West Virginia'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.178251'},
 'Submitted Data': {'Common Name': 'streamside salamander',
  'Grouping': 'Amphibians',
  'Lead FWS Region': 'R4',
  'Scientific Name': 'Ambystoma barbouri',
  'Species Range': 'AL, KY, OH, TN, WV',
  'Species Record Reference': 'https://ecos.fws.gov/ecp/species/9776'},
 '_id': ObjectId('5aa001d40601ba00d3c84a0b')}

# ECOS Scrape
The spreadsheet of species on the petition list included links to the FWS Environmental Conservation Online System (ECOS) for most of the species. After working through a few issues trying to connect to the FWS Threatened and Endangered Species System (TESS) API for the species, I found it necessary to scrape the ECOS web pages for some additional information. ECOS has multiple identifiers for species that seem to be used in various parts of the data model, and not all of them are readily available through its APIs. The public web pages seem to assemble a lot of this information from various places through a back-end app of some kind, but I could not find an API that covers everything. To reliably understand and work with the connections to other systems that FWS folks have put together, we needed to parse some information from the human-readable web pages into usable data. This code does that using BeautifulSoup.

In [8]:
import requests
from bis import tess
from bs4 import BeautifulSoup

In [9]:
for record in esaWPSpecies.find({"$and":[{"ECOS Scrape":{"$exists":False}},{"Submitted Data.Species Record Reference":{"$not":{"$eq":float("nan")}}}]}):
    ecosScrape = {}
    ecosScrape["url"] = record["Submitted Data"]["Species Record Reference"]
    
    if ecosScrape["url"].find("spcode") > -1:
        ecosScrape["SPCODE"] = ecosScrape["url"].split("=")[1]
    
    ecosContent = requests.get(ecosScrape["url"]).content
    soup = BeautifulSoup(ecosContent,"lxml")

    title = str(soup.find("title").string)
    if title.find("(") == -1:
        ecosScrape["Scientific Name"] = title.replace("Species Profile for ","").strip()
    else:
        ecosScrape["Common Name"] = title.replace("Species Profile for ","").split("(")[0].strip()
        ecosScrape["Scientific Name"] = title.replace("Species Profile for ","").split("(")[1].replace(")","").strip()
        
    
    itisDiv = soup.find("div", {"class": "taxonomy new-row"})
    if itisDiv is not None:
        itisLink = itisDiv.find("a", href=True)
        # Guard against a taxonomy block that has no link in it
        if itisLink is not None:
            ecosScrape["TSN"] = itisLink["href"].split("&")[1].split("=")[1]
    
    esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"ECOS Scrape":ecosScrape}})
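The two string-parsing steps in the scrape can be isolated and tested on their own. The sketch below makes two assumptions inferred from the code above rather than from any FWS documentation: page titles look like "Species Profile for Common name (Scientific name)", and the ITIS link carries the TSN in a `search_value` query parameter. The function names are hypothetical. Splitting on the last "(" keeps a common name that itself contains parentheses intact.

```python
from urllib.parse import urlparse, parse_qs

TITLE_PREFIX = "Species Profile for "

def parse_profile_title(title):
    """Split an ECOS page title into common and scientific names."""
    text = title.replace(TITLE_PREFIX, "").strip()
    if not text.endswith(")"):
        # No parenthesized part: the title holds only a scientific name
        return {"Scientific Name": text}
    # Split on the LAST "(" so parentheses inside the common name survive
    common, _, scientific = text[:-1].rpartition("(")
    return {"Common Name": common.strip(), "Scientific Name": scientific.strip()}

def tsn_from_itis_link(href):
    """Pull the TSN out of an ITIS report link, or return None if it is absent."""
    query = parse_qs(urlparse(href).query)
    values = query.get("search_value")
    return values[0] if values else None

# Assumed title and link shapes, used here only to exercise the functions
print(parse_profile_title("Species Profile for Streamside salamander (Ambystoma barbouri)"))
print(tsn_from_itis_link("https://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=208204"))
```

Reading the query string by parameter name is less brittle than positional splitting on "&" and "=" if the parameter order ever changes.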

In [10]:
# Show what one of the records looks like at this point with the ECOS scraped information on board.
display(esaWPSpecies.find_one({"ECOS Scrape":{"$exists":True}}))

{'ECOS Scrape': {'Common Name': 'Streamside salamander',
  'Scientific Name': 'Ambystoma barbouri',
  'TSN': '208204',
  'url': 'https://ecos.fws.gov/ecp/species/9776'},
 'FWS Range': {'US State List': ['AL', 'KY', 'OH', 'TN', 'WV'],
  'US States': [{'AL': {'fips': '01', 'name': 'Alabama'}},
   {'KY': {'fips': '21', 'name': 'Kentucky'}},
   {'OH': {'fips': '39', 'name': 'Ohio'}},
   {'TN': {'fips': '47', 'name': 'Tennessee'}},
   {'WV': {'fips': '54', 'name': 'West Virginia'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.178251'},
 'Submitted Data': {'Common Name': 'streamside salamander',
  'Grouping': 'Amphibians',
  'Lead FWS Region': 'R4',
  'Scientific Name': 'Ambystoma barbouri',
  'Species Range': 'AL, KY, OH, TN, WV',
  'Species Record Reference': 'https://ecos.fws.gov/ecp/species/9776'},
 '_id': ObjectId('5aa001d40601ba00d3c84a0b')}

# TESS
The links to ECOS for most species included in the spreadsheet and mentioned above contain the "SPCODE" identifier from that system. This identifier is different from the SPCODE or ENTITY_ID that is available in other parts of ECOS, and there does not appear to be a public API available to key on that identifier. The web links lead to public landing pages for the species that have a collection of useful information that we may look to parse out for analysis later. For now, we use the species scientific name to find the species in ECOS TESS and bring back any of its information for later use.

The presence of an ITIS TSN identifier assigned to an ECOS species record is a pretty solid identifier to use in retrieving data from the TESS system. This code block uses a TSN type query to retrieve as many records as possible back for the data collection.

In [11]:
for record in esaWPSpecies.find({"$and":[{"TESS":{"$exists":False}},{"ECOS Scrape.TSN":{"$exists":True}}]}):
    tessData = tess.queryTESS("TSN",record["ECOS Scrape"]["TSN"])
    if tessData["result"] is not False:
        esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"TESS":tessData}})
    else:
        display(record)

For a few cases, we did not have an ITIS TSN in the data from the ECOS scrape, but we do have an SPCODE identifier in the URL from the link. We can use those to go after TESS data. This code block is meant to run in sequence after trying for TESS data via TSN.

In [12]:
for record in esaWPSpecies.find({"$and":[{"TESS":{"$exists":False}},{"ECOS Scrape.SPCODE":{"$exists":True}}]}):
    tessData = tess.queryTESS("SPCODE",record["ECOS Scrape"]["SPCODE"])
    if tessData["result"] is not False:
        esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"TESS":tessData}})
    else:
        display(record)

In cases where we still don't have any TESS data after trying ITIS TSN and SPCODE identifiers, we can still try to use the scientific name to see if there is anything in the system. If not, then there must be some reason that FWS has not entered information for a particular petition into their core system.

At this point, I also check to see if the scientific name we scraped from a linked ECOS web page matches the scientific name from the FWS pre-listing plan spreadsheet. If it doesn't match, I put a note in the processing metadata indicating that there is an issue we may want to investigate further. Depending on who established a link to ECOS in the spreadsheet, it may just be that we resolved some taxonomic issue with what was originally submitted by a petitioner.

In [13]:
for record in esaWPSpecies.find({"$and":[{"TESS":{"$exists":False}},{"ECOS Scrape.SPCODE":{"$exists":False}},{"ECOS Scrape.Scientific Name":{"$exists":True}}]}):
    if record["Submitted Data"]["Scientific Name"] != record["ECOS Scrape"]["Scientific Name"]:
        processingMetadata = record["Processing Metadata"]
        processingMetadata["ECOS Match Annotation"] = "Scientific name from spreadsheet didn't match with referenced ECOS record"
        esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"Processing Metadata":processingMetadata}})

    tessData = tess.queryTESS("SCINAME",record["ECOS Scrape"]["Scientific Name"])
    if tessData["result"] is not False:
        esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"TESS":tessData}})
    else:
        print("No TESS record found on scientific name search")
        display(record)
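The name comparison in the cell above is an exact string match, and the annotation is written by replacing the whole "Processing Metadata" sub-document. As a minimal sketch of an alternative, the function below builds an update document that sets only the annotation field through a dotted path, and compares names after trimming whitespace and ignoring case. Both the normalization and the function name `build_mismatch_update` are assumptions for illustration, not what the notebook does.

```python
ANNOTATION = "Scientific name from spreadsheet didn't match with referenced ECOS record"

def build_mismatch_update(record):
    """Return a MongoDB update document if the two names differ, else None."""
    submitted = record["Submitted Data"]["Scientific Name"]
    scraped = record["ECOS Scrape"]["Scientific Name"]
    # Assumption: differences in case or surrounding whitespace are not real mismatches
    if submitted.strip().lower() == scraped.strip().lower():
        return None
    # A dotted path touches only this one field inside "Processing Metadata"
    return {"$set": {"Processing Metadata.ECOS Match Annotation": ANNOTATION}}

# Made-up record used only to exercise the function
example = {
    "Submitted Data": {"Scientific Name": "Genus oldname"},
    "ECOS Scrape": {"Scientific Name": "Genus newname"},
}
print(build_mismatch_update(example))
```

The returned document could be passed as the second argument to `update_one` when it is not `None`.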

A number of records did not include any ECOS link to follow and scrape, and we have not been able to retrieve any information from TESS for them. I try the original scientific name supplied in the spreadsheet in a search against the TESS API to see if we find any results. With that, we have exhausted our options for finding a link to TESS without also running ITIS processes to potentially turn up a TSN to search with, so we insert a TESS result for every remaining record, indicating that no result was found if that's the case.

In [14]:
for record in esaWPSpecies.find({"TESS":{"$exists":False}}):
    esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"TESS":tess.queryTESS("SCINAME",record["Submitted Data"]["Scientific Name"])}})
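The four TESS cells above amount to an ordered list of fallbacks. The sketch below collapses them into one function to make the order explicit. It is a simplification, not a drop-in replacement: the query function is passed in as a parameter (`query_tess`, standing in for `tess.queryTESS`), and the function name `find_tess_record` and the sample record are hypothetical. It assumes, as the outputs in this notebook suggest, that a miss comes back as a dictionary with `"result": False`.

```python
def find_tess_record(record, query_tess):
    """Try TSN, then SPCODE, then scientific names; return the first hit or the last miss."""
    scrape = record.get("ECOS Scrape", {})
    attempts = [
        ("TSN", scrape.get("TSN")),
        ("SPCODE", scrape.get("SPCODE")),
        ("SCINAME", scrape.get("Scientific Name")),
        ("SCINAME", record["Submitted Data"]["Scientific Name"]),
    ]
    result = None
    for query_type, criteria in attempts:
        if criteria is None:
            continue  # identifier not available for this record
        result = query_tess(query_type, criteria)
        if result["result"] is not False:
            return result
    return result

def fake_query(query_type, criteria):
    # Stand-in for the real TESS query: only a SPCODE query "finds" anything here
    found = query_type == "SPCODE"
    return {"queryType": query_type, "criteria": criteria, "result": found}

# Made-up identifiers used only to exercise the fallback order
sample = {
    "ECOS Scrape": {"TSN": "000000", "SPCODE": "X000"},
    "Submitted Data": {"Scientific Name": "Genus species"},
}
print(find_tess_record(sample, fake_query))
```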


In [15]:
# Show what one of the records looks like at this point with the TESS data on board.
display(esaWPSpecies.find_one({"TESS.result":True}))

{'ECOS Scrape': {'Common Name': 'Streamside salamander',
  'Scientific Name': 'Ambystoma barbouri',
  'TSN': '208204',
  'url': 'https://ecos.fws.gov/ecp/species/9776'},
 'FWS Range': {'US State List': ['AL', 'KY', 'OH', 'TN', 'WV'],
  'US States': [{'AL': {'fips': '01', 'name': 'Alabama'}},
   {'KY': {'fips': '21', 'name': 'Kentucky'}},
   {'OH': {'fips': '39', 'name': 'Ohio'}},
   {'TN': {'fips': '47', 'name': 'Tennessee'}},
   {'WV': {'fips': '54', 'name': 'West Virginia'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.178251'},
 'Submitted Data': {'Common Name': 'streamside salamander',
  'Grouping': 'Amphibians',
  'Lead FWS Region': 'R4',
  'Scientific Name': 'Ambystoma barbouri',
  'Species Range': 'AL, KY, OH, TN, WV',
  'Species Record Reference': 'https://ecos.fws.gov/ecp/species/9776'},
 'TESS': {'COMNAME': 'Streamside salamander',
  'COUNTRY': '1',
  'DPS': '0',
  'ENTITY_ID': '10742',
  'FAMILY': 'Ambystomatidae',
  'INVNAME': 'salamander, St

# ITIS
One of the interesting data sources that we can tap into is the Integrated Taxonomic Information System, which is managed by part of our group. ITIS is used as the primary taxonomic authority behind the TESS system, and we used previous processes to pull out the ITIS Taxonomic Serial Number (TSN) for many of the species. That is available at this point in both the ECOS Scrape and the TESS data packets.

In [16]:
from bis import itis

First, we check just to see if there were any cases where there is a mismatch between the TSNs that we have onboard the items at this point from the ECOS Scrape and the TESS data retrieval processes. If there is an issue, then we need to go back and evaluate the data further to see where there is a misalignment in the FWS data system.

In [17]:
for record in esaWPSpecies.find({"$and":[{"TESS.TSN":{"$exists":True}},{"ECOS Scrape.TSN":{"$exists":True}}]}):
    if record["ECOS Scrape"]["TSN"] != record["TESS"]["TSN"]:
        display (record)

For any of the species records where we could determine an ITIS TSN from scraping ECOS landing pages and/or querying TESS, we have to assume that someone did the work to line up with the taxonomic reference. Our first code block here goes through and grabs the base record for the supplied TSN. If that ITIS record shows invalid or unaccepted usage, however, we follow the accepted TSN from the record to also add in the valid/accepted ITIS record.

TESS seems to use some idea of a negative number for ITIS TSN to indicate that a link with ITIS taxonomy has not been established. It's not clear what significance the actual number has, but at this point we skip those and only process records with a TSN greater than 0.

There may be cases where ITIS indicates that the specified record by TSN is invalid or unaccepted. It's not clear what may be happening in these cases. It could be an actual disagreement over taxonomy or something about the information being out of date. In any case, we do go ahead and follow ITIS to retrieve the accepted record, but we also store the record directly linked from the FWS data. Reporting on these cases later may lead to fruitful discussion.

In [18]:
for record in esaWPSpecies.find({"$and":[{"ITIS":{"$exists":False}},{"TESS.TSN":{"$exists":True}}]},{"TESS.TSN":1,"Processing Metadata":1}):
    # Reset per record so results from a previous iteration never leak into this one
    itisDocs = []
    itisResults = None

    # TESS uses negative TSNs where no ITIS link exists; skip those
    if int(record["TESS"]["TSN"]) <= 0:
        continue

    itisByTSN = itis.getITISSearchURL(record["TESS"]["TSN"],False,False)
    try:
        itisResults = requests.get(itisByTSN).json()
    except (requests.RequestException, ValueError):
        print ("FAILED QUERY", itisByTSN)
        continue

    if len(itisResults["response"]["docs"]) == 1:
        itisDocs.append(itis.packageITISJSON(itisResults["response"]["docs"][0]))
    else:
        display (itisResults)

    # Follow invalid/unaccepted records to the accepted TSN, keeping both documents
    if len(itisDocs) > 0 and itisDocs[0]["usage"] in ["invalid","unaccepted"]:
        if len(itisDocs[0]["acceptedTSN"]) == 1:
            itisByAcceptedTSN = itis.getITISSearchURL(itisDocs[0]["acceptedTSN"][0],False,False)
            itisResults = requests.get(itisByAcceptedTSN).json()

            if len(itisResults["response"]["docs"]) == 1:
                itisDocs.append(itis.packageITISJSON(itisResults["response"]["docs"][0]))
            else:
                display (itisResults)
        else:
            display (itisResults)

    if len(itisDocs) > 0:
        esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"ITIS":itisDocs}})
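The follow-the-accepted-TSN logic can be separated from the HTTP and database calls so it is easier to test. This is a minimal sketch under stated assumptions: `fetch_doc` is a hypothetical callable that returns a packaged ITIS document (with `usage` and `acceptedTSN` keys, as in the output shown in this notebook) or `None`, and the TSNs in the stub are made up.

```python
def collect_itis_docs(tsn, fetch_doc):
    """Return [linked doc] or [linked doc, accepted doc] for a positive TSN."""
    if int(tsn) <= 0:
        return []  # negative TSNs mark records with no ITIS link
    linked = fetch_doc(tsn)
    if linked is None:
        return []
    docs = [linked]
    accepted = linked.get("acceptedTSN", [])
    # Keep the directly linked record and add the accepted one when it is unambiguous
    if linked.get("usage") in ("invalid", "unaccepted") and len(accepted) == 1:
        accepted_doc = fetch_doc(accepted[0])
        if accepted_doc is not None:
            docs.append(accepted_doc)
    return docs

# Made-up documents standing in for ITIS responses
FAKE_ITIS = {
    "111": {"tsn": "111", "usage": "invalid", "acceptedTSN": ["222"]},
    "222": {"tsn": "222", "usage": "valid"},
}

def fake_fetch(tsn):
    return FAKE_ITIS.get(str(tsn))

print([doc["tsn"] for doc in collect_itis_docs("111", fake_fetch)])
```

Keeping the linked record first in the list preserves what FWS pointed to, which matters for the later reporting on taxonomic disagreements described above.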


In [19]:
# Show what one of the records with more than one ITIS doc looks like
display(esaWPSpecies.find_one({"ITIS.1":{"$exists":True}}))

{'ECOS Scrape': {'Common Name': 'Arizona toad',
  'Scientific Name': 'Bufo microscaphus microscaphus',
  'TSN': '207135',
  'url': 'https://ecos.fws.gov/ecp/species/2077'},
 'FWS Range': {'US State List': ['AZ', 'CA', 'NM', 'NV', 'UT'],
  'US States': [{'AZ': {'fips': '04', 'name': 'Arizona'}},
   {'CA': {'fips': '06', 'name': 'California'}},
   {'NM': {'fips': '35', 'name': 'New Mexico'}},
   {'NV': {'fips': '32', 'name': 'Nevada'}},
   {'UT': {'fips': '49', 'name': 'Utah'}}]},
 'ITIS': [{'acceptedTSN': ['773525'],
   'cacheDate': '2018-03-07T15:22:09.962477',
   'commonnames': [{'language': 'English', 'name': 'Arizona Toad'}],
   'createDate': '1996-06-13 14:51:08',
   'hierarchy': ['Animalia',
    'Bilateria',
    'Deuterostomia',
    'Chordata',
    'Vertebrata',
    'Gnathostomata',
    'Tetrapoda',
    'Amphibia',
    'Anura',
    'Bufonidae',
    'Anaxyrus',
    'Anaxyrus microscaphus'],
   'kingdom': 'Animalia',
   'nameWInd': 'Bufo microscaphus microscaphus',
   'nameWOInd': '

In [22]:
# What do the records look like that are left over; what do we have to work with in finding possible ITIS matches
for record in esaWPSpecies.find({"ITIS":{"$exists":False}}):
    display (record)

{'FWS Range': {'US State List': ['CT',
   'DE',
   'FL',
   'GA',
   'MA',
   'MD',
   'ME',
   'NC',
   'NH',
   'NJ',
   'NY',
   'PA',
   'RI',
   'SC',
   'VA'],
  'US States': [{'CT': {'fips': '09', 'name': 'Connecticut'}},
   {'DE': {'fips': '10', 'name': 'Delaware'}},
   {'FL': {'fips': '12', 'name': 'Florida'}},
   {'GA': {'fips': '13', 'name': 'Georgia'}},
   {'MA': {'fips': '25', 'name': 'Massachusetts'}},
   {'MD': {'fips': '24', 'name': 'Maryland'}},
   {'ME': {'fips': '23', 'name': 'Maine'}},
   {'NC': {'fips': '37', 'name': 'North Carolina'}},
   {'NH': {'fips': '33', 'name': 'New Hampshire'}},
   {'NJ': {'fips': '34', 'name': 'New Jersey'}},
   {'NY': {'fips': '36', 'name': 'New York'}},
   {'PA': {'fips': '42', 'name': 'Pennsylvania'}},
   {'RI': {'fips': '44', 'name': 'Rhode Island'}},
   {'SC': {'fips': '45', 'name': 'South Carolina'}},
   {'VA': {'fips': '51', 'name': 'Virginia'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.182139'},


{'FWS Range': {'US State List': ['OK'],
  'US States': [{'OK': {'fips': '40', 'name': 'Oklahoma'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.191451'},
 'Submitted Data': {'Common Name': 'Delaware County cave crayfish',
  'Grouping': 'Crustaceans',
  'Lead FWS Region': 'R2',
  'Scientific Name': 'Cambarus subterraneus',
  'Species Range': 'OK',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'Cambarus subterraneus',
  'dateCached': '2018-03-07T15:21:43.663762',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84a68')}

{'FWS Range': {'US State List': ['TX'],
  'US States': [{'TX': {'fips': '48', 'name': 'Texas'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.192231'},
 'Submitted Data': {'Common Name': 'Texas troglobitic water slater',
  'Grouping': 'Crustaceans',
  'Lead FWS Region': 'R2',
  'Scientific Name': 'Lirceolus smithii',
  'Species Range': 'TX',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'Lirceolus smithii',
  'dateCached': '2018-03-07T15:21:43.939491',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84a70')}

{'FWS Range': {'US State List': ['MO'],
  'US States': [{'MO': {'fips': '29', 'name': 'Missouri'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.192887'},
 'Submitted Data': {'Common Name': 'Big Creek crayfish',
  'Grouping': 'Crustaceans',
  'Lead FWS Region': 'R3',
  'Scientific Name': 'Orconectes peruncus',
  'Species Range': 'MO',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'Orconectes peruncus',
  'dateCached': '2018-03-07T15:21:44.061496',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84a75')}

{'FWS Range': {'US State List': ['MO'],
  'US States': [{'MO': {'fips': '29', 'name': 'Missouri'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.193196'},
 'Submitted Data': {'Common Name': 'St. Francis River crayfish',
  'Grouping': 'Crustaceans',
  'Lead FWS Region': 'R3',
  'Scientific Name': 'Orconectes quadruncus',
  'Species Range': 'MO',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'Orconectes quadruncus',
  'dateCached': '2018-03-07T15:21:44.200340',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84a76')}

{'FWS Range': {'US State List': ['KY', 'TN'],
  'US States': [{'KY': {'fips': '21', 'name': 'Kentucky'}},
   {'TN': {'fips': '47', 'name': 'Tennessee'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.196246'},
 'Submitted Data': {'Common Name': 'redlips darter (broken out from ashy darter complex)',
  'Grouping': 'Fishes',
  'Lead FWS Region': 'R4',
  'Scientific Name': 'Etheostoma maydeni',
  'Species Range': 'KY, TN',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'Etheostoma maydeni',
  'dateCached': '2018-03-07T15:21:44.317158',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84a92')}

{'FWS Range': {'Other Places': [{'KA': {}}],
  'US State List': ['CO', 'NM', 'OK', 'TX'],
  'US States': [{'CO': {'fips': '08', 'name': 'Colorado'}},
   {'NM': {'fips': '35', 'name': 'New Mexico'}},
   {'OK': {'fips': '40', 'name': 'Oklahoma'}},
   {'TX': {'fips': '48', 'name': 'Texas'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.197295'},
 'Submitted Data': {'Common Name': 'Arkansas River speckled chub',
  'Grouping': 'Fishes',
  'Lead FWS Region': 'R2',
  'Scientific Name': 'Macrhybopsis aestivalis tetranemus',
  'Species Range': 'CO, KA, NM, OK, TX',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'Macrhybopsis aestivalis tetranemus',
  'dateCached': '2018-03-07T15:21:44.437670',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84a9e')}

{'FWS Range': {'US State List': ['TX'],
  'US States': [{'TX': {'fips': '48', 'name': 'Texas'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.199902'},
 'Submitted Data': {'Common Name': 'Navasota false foxglove',
  'Grouping': 'Flowering Plants',
  'Lead FWS Region': 'R2',
  'Scientific Name': 'Agalinis navasotensis',
  'Species Range': 'TX',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'Agalinis navasotensis',
  'dateCached': '2018-03-07T15:21:44.565337',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84ab3')}

{'FWS Range': {'US State List': ['LA', 'MS', 'TX'],
  'US States': [{'LA': {'fips': '22', 'name': 'Louisiana'}},
   {'MS': {'fips': '28', 'name': 'Mississippi'}},
   {'TX': {'fips': '48', 'name': 'Texas'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.200304'},
 'Submitted Data': {'Common Name': 'rough stemmed aster',
  'Grouping': 'Flowering Plants',
  'Lead FWS Region': 'R2',
  'Scientific Name': 'Aster puniceus scabricaulis',
  'Species Range': 'LA, MS, TX',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'Aster puniceus scabricaulis',
  'dateCached': '2018-03-07T15:21:44.679764',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84ab7')}

{'FWS Range': {'Other Places': [{'Mexico': {}}],
  'US State List': ['NM'],
  'US States': [{'NM': {'fips': '35', 'name': 'New Mexico'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.201369'},
 'Submitted Data': {'Common Name': 'glowing Indian paintbrush',
  'Grouping': 'Flowering Plants',
  'Lead FWS Region': 'R2',
  'Scientific Name': 'Castilleja ornata',
  'Species Range': 'NM, Mexico',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'Castilleja ornata',
  'dateCached': '2018-03-07T15:21:44.795030',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84ac2')}

{'FWS Range': {'US State List': ['FL'],
  'US States': [{'FL': {'fips': '12', 'name': 'Florida'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.203308'},
 'Submitted Data': {'Common Name': 'Cape Sable orchid',
  'Grouping': 'Flowering Plants',
  'Lead FWS Region': 'R4',
  'Scientific Name': 'Oncidium undulatum',
  'Species Range': 'FL',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'Oncidium undulatum',
  'dateCached': '2018-03-07T15:21:44.915893',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84ad6')}

{'FWS Range': {'US State List': ['AZ', 'CA', 'NV', 'UT'],
  'US States': [{'AZ': {'fips': '04', 'name': 'Arizona'}},
   {'CA': {'fips': '06', 'name': 'California'}},
   {'NV': {'fips': '32', 'name': 'Nevada'}},
   {'UT': {'fips': '49', 'name': 'Utah'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.204663'},
 'Submitted Data': {'Common Name': 'Joshua tree',
  'Grouping': 'Flowering Plants',
  'Lead FWS Region': 'R8',
  'Scientific Name': 'Yucca brevifolia',
  'Species Range': 'AZ, CA, NV, UT',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'Yucca brevifolia',
  'dateCached': '2018-03-07T15:21:45.033270',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84ae5')}

{'FWS Range': {'US State List': ['AZ',
   'CA',
   'CO',
   'ID',
   'MT',
   'ND',
   'NE',
   'NM',
   'NV',
   'OR',
   'SD',
   'UT',
   'WA',
   'WY'],
  'US States': [{'AZ': {'fips': '04', 'name': 'Arizona'}},
   {'CA': {'fips': '06', 'name': 'California'}},
   {'CO': {'fips': '08', 'name': 'Colorado'}},
   {'ID': {'fips': '16', 'name': 'Idaho'}},
   {'MT': {'fips': '30', 'name': 'Montana'}},
   {'ND': {'fips': '38', 'name': 'North Dakota'}},
   {'NE': {'fips': '31', 'name': 'Nebraska'}},
   {'NM': {'fips': '35', 'name': 'New Mexico'}},
   {'NV': {'fips': '32', 'name': 'Nevada'}},
   {'OR': {'fips': '41', 'name': 'Oregon'}},
   {'SD': {'fips': '46', 'name': 'South Dakota'}},
   {'UT': {'fips': '49', 'name': 'Utah'}},
   {'WA': {'fips': '53', 'name': 'Washington'}},
   {'WY': {'fips': '56', 'name': 'Wyoming'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.205329'},
 'Submitted Data': {'Common Name': 'western bumble bee',
  'Grouping': 'Insects',
  'L

{'FWS Range': {'Other Places': [{'Canada': {}}],
  'US State List': ['AL',
   'AR',
   'CT',
   'DC',
   'DE',
   'FL',
   'GA',
   'IL',
   'IN',
   'KS',
   'KY',
   'LA',
   'MA',
   'MD',
   'MI',
   'NC',
   'NH',
   'NJ',
   'NY',
   'OH',
   'OK',
   'PA',
   'RI',
   'SC',
   'TN',
   'TX',
   'VA',
   'VT',
   'WI',
   'WV'],
  'US States': [{'AL': {'fips': '01', 'name': 'Alabama'}},
   {'AR': {'fips': '05', 'name': 'Arkansas'}},
   {'CT': {'fips': '09', 'name': 'Connecticut'}},
   {'DC': {'fips': '11', 'name': 'District of Columbia'}},
   {'DE': {'fips': '10', 'name': 'Delaware'}},
   {'FL': {'fips': '12', 'name': 'Florida'}},
   {'GA': {'fips': '13', 'name': 'Georgia'}},
   {'IL': {'fips': '17', 'name': 'Illinois'}},
   {'IN': {'fips': '18', 'name': 'Indiana'}},
   {'KS': {'fips': '20', 'name': 'Kansas'}},
   {'KY': {'fips': '21', 'name': 'Kentucky'}},
   {'LA': {'fips': '22', 'name': 'Louisiana'}},
   {'MA': {'fips': '25', 'name': 'Massachusetts'}},
   {'MD': {'fips': '24',

{'FWS Range': {'Other Places': [{'Canada': {}}],
  'US State List': ['NY', 'WI'],
  'US States': [{'NY': {'fips': '36', 'name': 'New York'}},
   {'WI': {'fips': '55', 'name': 'Wisconsin'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.206795'},
 'Submitted Data': {'Common Name': 'bog buck moth',
  'Grouping': 'Insects',
  'Lead FWS Region': 'R5',
  'Scientific Name': 'Hemileuca spp.',
  'Species Range': 'NY, WI, Canada',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'Hemileuca spp.',
  'dateCached': '2018-03-07T15:21:47.192983',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84af9')}

{'FWS Range': {'US State List': ['MI', 'MN', 'ND', 'WI'],
  'US States': [{'MI': {'fips': '26', 'name': 'Michigan'}},
   {'MN': {'fips': '27', 'name': 'Minnesota'}},
   {'ND': {'fips': '38', 'name': 'North Dakota'}},
   {'WI': {'fips': '55', 'name': 'Wisconsin'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.209847'},
 'Submitted Data': {'Common Name': 'northwestern moose',
  'Grouping': 'Mammals',
  'Lead FWS Region': 'R3',
  'Scientific Name': 'Alces alces andersoni',
  'Species Range': 'MI, MN, ND, WI',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'Alces alces andersoni',
  'dateCached': '2018-03-07T15:21:47.953296',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84b19')}

{'FWS Range': {'US State List': ['TX'],
  'US States': [{'TX': {'fips': '48', 'name': 'Texas'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.211900'},
 'Submitted Data': {'Common Name': 'Donrichardsonia macroneuron (unnamed moss)',
  'Grouping': 'Non-Flowering Plants',
  'Lead FWS Region': 'R2',
  'Scientific Name': 'Donrichardsonia macroneuron',
  'Species Range': 'TX',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'Donrichardsonia macroneuron',
  'dateCached': '2018-03-07T15:21:48.076672',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84b24')}

{'FWS Range': {'Other Places': [{'Canada': {}}],
  'US State List': ['CT',
   'DC',
   'DE',
   'FL',
   'GA',
   'IL',
   'IN',
   'MA',
   'MD',
   'ME',
   'MI',
   'NC',
   'NH',
   'NJ',
   'NY',
   'OH',
   'PA',
   'RI',
   'SC',
   'VA',
   'VT',
   'WV'],
  'US States': [{'CT': {'fips': '09', 'name': 'Connecticut'}},
   {'DC': {'fips': '11', 'name': 'District of Columbia'}},
   {'DE': {'fips': '10', 'name': 'Delaware'}},
   {'FL': {'fips': '12', 'name': 'Florida'}},
   {'GA': {'fips': '13', 'name': 'Georgia'}},
   {'IL': {'fips': '17', 'name': 'Illinois'}},
   {'IN': {'fips': '18', 'name': 'Indiana'}},
   {'MA': {'fips': '25', 'name': 'Massachusetts'}},
   {'MD': {'fips': '24', 'name': 'Maryland'}},
   {'ME': {'fips': '23', 'name': 'Maine'}},
   {'MI': {'fips': '26', 'name': 'Michigan'}},
   {'NC': {'fips': '37', 'name': 'North Carolina'}},
   {'NH': {'fips': '33', 'name': 'New Hampshire'}},
   {'NJ': {'fips': '34', 'name': 'New Jersey'}},
   {'NY': {'fips': '36', 'name': 'New

{'FWS Range': {'US State List': ['NM', 'TX'],
  'US States': [{'NM': {'fips': '35', 'name': 'New Mexico'}},
   {'TX': {'fips': '48', 'name': 'Texas'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.214418'},
 'Submitted Data': {'Common Name': 'Rio Grande cooter',
  'Grouping': 'Reptiles',
  'Lead FWS Region': 'R2',
  'Scientific Name': 'Pseudemys gorzugi',
  'Species Range': 'NM, TX',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'Pseudemys gorzugi',
  'dateCached': '2018-03-07T15:21:48.772009',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84b39')}

{'FWS Range': {'US State List': ['CA'],
  'US States': [{'CA': {'fips': '06', 'name': 'California'}}]},
 'Processing Metadata': {'Date Created from Source': '2018-03-07T15:14:28.215253'},
 'Submitted Data': {'Common Name': 'Mojave shoulderband snail',
  'Grouping': 'Snails',
  'Lead FWS Region': 'R8',
  'Scientific Name': 'HELMINTHOGLYPTA GREGGI',
  'Species Range': 'CA',
  'Species Record Reference': nan},
 'TESS': {'criteria': 'HELMINTHOGLYPTA GREGGI',
  'dateCached': '2018-03-07T15:21:48.899186',
  'queryType': 'SCINAME',
  'result': False},
 '_id': ObjectId('5aa001d40601ba00d3c84b42')}

None of these species appears to have very complete information in the FWS databases yet: a scientific name search turned up no TESS information for any of them. We'll try them against ITIS next, though we may need to do a little name cleanup along the way.
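
As a rough illustration of the kind of name cleanup involved, here is a minimal sketch. The `roughCleanName` function is hypothetical and is not the `bis.cleanScientificName` function used in the next cell. It only handles the two issues visible in the records above: all-caps names and a trailing "spp." marker.

```python
def roughCleanName(name):
    # Hypothetical helper for illustration; not part of the bis package
    parts = name.split()
    # Drop markers such as "sp." or "spp." that will not match a taxonomic authority record
    parts = [p for p in parts if p.lower().rstrip(".") not in ("sp", "spp")]
    if len(parts) == 0:
        return ""
    # Capitalize the genus and lowercase the remaining epithets
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])

print(roughCleanName("HELMINTHOGLYPTA GREGGI"))
print(roughCleanName("Hemileuca spp."))
```

A name that is already well formed, such as "Alces alces andersoni", passes through unchanged.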

In [30]:
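from bis import bis
import requests

for record in esaWPSpecies.find({"ITIS":{"$exists":False}}):
    # Reset for every record so results from a previous record never carry over
    itisDocs = []
    itisByName = itis.getITISSearchURL(bis.cleanScientificName(record["Submitted Data"]["Scientific Name"]),False,False)
    try:
        itisResults = requests.get(itisByName).json()
    except Exception:
        # Without a response there is nothing to evaluate, so move on to the next record
        print ("FAILED QUERY", itisByName)
        continue

    if len(itisResults["response"]["docs"]) == 1:
        itisDocs = [itis.packageITISJSON(itisResults["response"]["docs"][0])]
    else:
        display (itisResults)

    # If the single match is not the accepted/valid name, follow it to the accepted TSN
    if len(itisDocs) > 0 and itisDocs[0]["usage"] in ["invalid","unaccepted"]: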
        if len(itisDocs[0]["acceptedTSN"]) == 1:
            itisByAcceptedTSN = itis.getITISSearchURL(itisDocs[0]["acceptedTSN"][0],False,False)
            itisResults = requests.get(itisByAcceptedTSN).json()

            if len(itisResults["response"]["docs"]) == 1:
                itisDocs.append(itis.packageITISJSON(itisResults["response"]["docs"][0]))
            else:
                validITISResultDocs = [d for d in itisResults["response"]["docs"] if d["usage"] in ["accepted","valid"]]
                if len(validITISResultDocs) == 1:
                    itisDocs.append(itis.packageITISJSON(validITISResultDocs[0]))
                else:
                    display (itisResults)
        else:
            display (itisResults)

    if len(itisDocs) > 0:
        esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"ITIS":itisDocs}})
    else:
        display (itisResults)


# BISON
One of the other USGS systems we can look to for information is the occurrence data in BISON. The eventual goal in this section is to search on ITIS TSN where we've nailed that down, and on scientific name where we have no TSN, to retrieve a quick summary of what BISON has to offer. This is based on the code that Abby Benson started but adds a little more to the summary. The summarization is slightly complicated in that some of the originally submitted records may be related to more than one species identifier in BISON.

We are still working on exactly how to go about summarizing BISON data for a given species query, and there are lots of details to work out. For now, I've packaged the same basic query that Abby started in an R script into a Python function here. It will need to be tweaked over time and should become a new core TIR function at some point, as we work out nuances in how the query should operate and how to package the data into a more logical format than what the low-level Solr API provides.

In [57]:
def bisonSummary(query):
    import requests

    queryBase = "https://data.usgs.gov/solr/occurrences/select/?wt=json&rows=0&facet=true&facet.mincount=1&facet.field=basisOfRecord&facet.field=calculatedState&q="
    queryURL = queryBase+query

    bisonSummaryResults = requests.get(queryURL).json()

    summaryData = {"query":bisonSummaryResults["responseHeader"]["params"]["q"]}
    summaryData["Total Occurrence Records"] = bisonSummaryResults["response"]["numFound"]

    facetFields = bisonSummaryResults["facet_counts"]["facet_fields"]

    # Each facet comes back as a flat list alternating value, count, value, count, ...
    basisFacets = facetFields["basisOfRecord"]
    summaryData["Basis of Record"] = [{value:count} for value,count in zip(basisFacets[::2],basisFacets[1::2])]

    stateFacets = facetFields["calculatedState"]
    summaryData["US State Occurrences"] = []
    for value,count in zip(stateFacets[::2],stateFacets[1::2]):
        # An empty value means no state was calculated for those records
        calculatedStateValue = value if len(value) > 0 else "Unknown"
        summaryData["US State Occurrences"].append({calculatedStateValue:count})

    return summaryData
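
The facet handling above depends on the flat list layout of the facet fields in the JSON response, where each value is followed immediately by its count. A minimal sketch of that pairing follows. The `pairFacetCounts` helper is hypothetical, and the sample list is reconstructed from the "Basis of Record" values in the record displayed further down rather than copied from a real API response.

```python
def pairFacetCounts(flatFacetList):
    # Hypothetical helper: turn [value, count, value, count, ...] into [(value, count), ...]
    return list(zip(flatFacetList[::2], flatFacetList[1::2]))

# Sample list in the alternating value/count layout (reconstructed for illustration)
sampleFacets = ["specimen", 5564, "fossil", 1986, "unknown", 66, "observation", 16]
for value, count in pairFacetCounts(sampleFacets):
    print(value, count)
```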


We are still working out what to use in querying BISON. We could use some combination of names, but the BISON API is supposed to be intelligent enough to leverage ITIS in the background to return records for synonyms, homonyms, etc. For now, I simply wrote this to use the originally submitted scientific name, to see what we turn up and start experimenting with how to use the summary information.

The potentially interesting thing to examine here at a very crude level of sophistication is the difference between stated range of a species from the standpoint of the FWS review process or the original petition for listing and the number of states where there appear to be occurrence records. Further examination of the BISON data would be needed to determine whether some of the occurrences should be filtered out based on basis of record (e.g., maybe we don't want to consider fossil records), age of the record, lack of completeness, spatial data quality concerns, or other factors.
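
The comparison described above can be sketched as a set operation between the two state lists. This is only an illustration: `compareStateLists` is a hypothetical helper, the sample values are made up, and the small name-to-abbreviation mapping stands in for a fuller lookup (the "us" package used elsewhere in this notebook could supply one). The FWS range uses postal abbreviations while BISON reports full state names, so one side has to be converted first.

```python
def compareStateLists(fwsStateAbbrs, bisonStateNames, nameToAbbr):
    # Hypothetical helper: nameToAbbr maps full state names to postal abbreviations.
    # Names missing from the mapping (e.g., Canadian provinces) are ignored.
    bisonAbbrs = {nameToAbbr[n] for n in bisonStateNames if n in nameToAbbr}
    fwsAbbrs = set(fwsStateAbbrs)
    return {
        "In Both": sorted(fwsAbbrs & bisonAbbrs),
        "FWS Range Only": sorted(fwsAbbrs - bisonAbbrs),
        "BISON Only": sorted(bisonAbbrs - fwsAbbrs),
    }

# Made-up sample values for illustration only
nameToAbbr = {"Michigan": "MI", "Minnesota": "MN", "North Dakota": "ND", "Wisconsin": "WI", "Ohio": "OH"}
comparison = compareStateLists(["MI", "MN", "ND", "WI"], ["Michigan", "Ohio", "Ontario Canada"], nameToAbbr)
print(comparison)
```

States that show up only on the BISON side are the ones worth a closer look, since they may reflect fossil records, old specimens, or data quality problems rather than current range.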

In [58]:
for record in esaWPSpecies.find({"BISON":{"$exists":False}}):
    esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"BISON":bisonSummary("scientificName:"+record["Submitted Data"]["Scientific Name"])}})
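
One detail that may matter here: in Solr's standard query syntax, a field prefix applies only to the term that immediately follows it, so a multi-word name is normally wrapped in double quotes to be searched as a phrase. The sketch below shows one way to build such a query string. The `buildBisonNameQuery` helper is hypothetical, and whether the BISON index behaves better with a phrase query is an assumption that would need to be checked against actual results.

```python
def buildBisonNameQuery(scientificName):
    # Hypothetical helper: escape backslashes and double quotes, then wrap the name as a phrase
    escaped = scientificName.replace("\\", "\\\\").replace('"', '\\"')
    return 'scientificName:"' + escaped + '"'

print(buildBisonNameQuery("Alces alces andersoni"))
```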

For later comparison with other lists of states, we send a simple list of any state in the BISON data with an occurrence record to a list in the synthesis structure.

In [137]:
import us
for record in esaWPSpecies.find({"BISON":{"$exists":True}},{"BISON.US State Occurrences":1}):
    bisonMasterList = list(set().union(*(d.keys() for d in record["BISON"]["US State Occurrences"])))
    bisonStateList = []
    for state in bisonMasterList:
        if us.states.lookup(state) is not None:
            bisonStateList.append(state)
    esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"Synthesis.States with BISON Occurrence Data":bisonStateList}})

In [59]:
# Show what one of the records with a BISON summary looks like
display(esaWPSpecies.find_one({"BISON":{"$exists":True}}))

{'BISON': {'Basis of Record': [{'specimen': 5564},
   {'fossil': 1986},
   {'unknown': 66},
   {'observation': 16}],
  'Total Occurrence Records': 7632,
  'US State Occurrences': [{'Michigan': 1156},
   {'Ohio': 709},
   {'New York': 575},
   {'Connecticut': 337},
   {'California': 306},
   {'Ontario Canada': 184},
   {'New Jersey': 155},
   {'Vermont': 154},
   {'Massachusetts': 121},
   {'Kentucky': 78},
   {'Maine': 67},
   {'Indiana': 63},
   {'Florida': 56},
   {'Illinois': 51},
   {'Pennsylvania': 48},
   {'Tennessee': 45},
   {'Arkansas': 36},
   {'Texas': 26},
   {'Virginia': 26},
   {'New Hampshire': 23},
   {'Missouri': 18},
   {'Wisconsin': 16},
   {'Nebraska': 15},
   {'Idaho': 12},
   {'South Carolina': 12},
   {'Kansas': 11},
   {'Quebec Canada': 11},
   {'Mississippi': 8},
   {'Colorado': 7},
   {'Louisiana': 5},
   {'Alabama': 4},
   {'Maryland': 4},
   {'North Carolina': 4},
   {'Washington': 4},
   {'West Virginia': 4},
   {'British Columbia Canada': 3},
   {'Georgia'

# Name Check
At this point, we have started with a submitted scientific name and then pulled together several other potential scientific names for the species.

* Name scraped from referenced ECOS web pages
* Name accessed from the TESS system API
* One or more names from ITIS

Before advancing further, I put unique names by which a species may be known together into a list so that we can look for cases where there is some disagreement and need for future analysis or annotation.

In [100]:
for record in esaWPSpecies.find():
    synthesis = {"Scientific Names":[{"Scientific Name":" ".join(record["Submitted Data"]["Scientific Name"].split()),"Source":"Submitted Data"}]}

    if "ECOS Scrape" in record.keys():
        synthesis["Scientific Names"].append({"Scientific Name":" ".join(record["ECOS Scrape"]["Scientific Name"].split()),"Source":"ECOS Scrape"})

    if "SCINAME" in record["TESS"].keys():
        synthesis["Scientific Names"].append({"Scientific Name":" ".join(record["TESS"]["SCINAME"].split()),"Source":"TESS"})

    for itisDoc in record["ITIS"]:
        synthesis["Scientific Names"].append({"Scientific Name":" ".join(itisDoc["nameWInd"].split()),"Source":"ITIS","Usage":itisDoc["usage"]})

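    # dict.fromkeys drops duplicates while keeping first-seen order, so the submitted name stays first
    synthesis["Unique Scientific Names"] = list(dict.fromkeys(d["Scientific Name"] for d in synthesis["Scientific Names"]))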
    
    esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"Synthesis":synthesis}})    
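
To make the idea of a unique names list concrete, here is a minimal standalone sketch. The `uniqueNamesInOrder` helper is hypothetical and the sample names are made up for illustration. It collapses repeated whitespace the same way the cell above does, then removes duplicates while keeping the order in which names were first seen.

```python
def uniqueNamesInOrder(names):
    # Hypothetical helper: collapse repeated whitespace, then drop duplicates in first-seen order
    normalized = [" ".join(n.split()) for n in names]
    return list(dict.fromkeys(normalized))

# Made-up sample: the same name with stray whitespace, plus one differing name
sampleNames = ["Alces alces  andersoni", "Alces alces andersoni", "Alces alces"]
uniqueNames = uniqueNamesInOrder(sampleNames)
print(uniqueNames)

# More than one unique name signals disagreement between sources that needs review
print(len(uniqueNames) > 1)
```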
    

# SGCN
One of the other data systems we maintain is a synthesis of State Species of Greatest Conservation Need. Building on work that Abby contributed, I've pulled the same information together here in a slightly different format to be consistent with other "data packets" we are assembling.

In [60]:
sgcnSynthesis = bisDB["SGCN Synthesis"]

Deviating a little from what Abby started, where she did not have an ITIS TSN to go on, this first process works through the work plan records, grabs the valid ITIS TSN for each, and uses it to search the SGCN Synthesis (which happens to run in the same database infrastructure we are building on here). This should give us a good first cut, after which we'll have to rely on other methods to find additional matches to SGCN data.

In [67]:
for record in esaWPSpecies.find({"SGCN":{"$exists":False}}):
    l_validITIStsn = [i for i in record["ITIS"] if i["usage"] in ["valid","accepted"]]
    
    if len(l_validITIStsn) == 1:
        validITIStsn = l_validITIStsn[0]["tsn"]
    
        sgcnSynthesisRecord = sgcnSynthesis.find_one({"ITIS":{"$elemMatch":{"tsn":validITIStsn}}})

        if sgcnSynthesisRecord is not None:
            sgcnSummary = {"Scientific Name":sgcnSynthesisRecord["_id"]}
            sgcnSummary["Common Name"] = sgcnSynthesisRecord["Common Name"]
            sgcnSummary["Match Method"] = sgcnSynthesisRecord["Match Method"]
            sgcnSummary["Taxonomic Authority ID"] = sgcnSynthesisRecord["Taxonomic Authority ID"]
            sgcnSummary["Taxonomy"] = sgcnSynthesisRecord["Taxonomy"]
            sgcnSummary["State List Summary"] = sgcnSynthesisRecord["Source Data Summary"]

            esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"SGCN":sgcnSummary}})


For the leftovers, this code block tries to find a match on the originally submitted scientific name with the SGCN Synthesis.

In [70]:
for record in esaWPSpecies.find({"SGCN":{"$exists":False}}):
    sgcnSynthesisRecord = sgcnSynthesis.find_one({"_id":record["Submitted Data"]["Scientific Name"]})

    if sgcnSynthesisRecord is not None:
        sgcnSummary = {"Scientific Name":sgcnSynthesisRecord["_id"]}
        sgcnSummary["Common Name"] = sgcnSynthesisRecord["Common Name"]
        sgcnSummary["Match Method"] = sgcnSynthesisRecord["Match Method"]
        sgcnSummary["Taxonomic Authority ID"] = sgcnSynthesisRecord["Taxonomic Authority ID"]
        sgcnSummary["Taxonomy"] = sgcnSynthesisRecord["Taxonomy"]
        sgcnSummary["State List Summary"] = sgcnSynthesisRecord["Source Data Summary"]
        
        esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"SGCN":sgcnSummary}})
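
Both SGCN cells build the same summary structure, so that step could be factored into a single function. The following is a sketch only: `packageSGCNSummary` is a hypothetical helper and the sample record is made up, using only the field names that appear in the cells above.

```python
def packageSGCNSummary(sgcnSynthesisRecord):
    # Hypothetical helper: map SGCN Synthesis fields onto the summary structure used above
    sgcnSummary = {"Scientific Name": sgcnSynthesisRecord["_id"]}
    for field in ["Common Name", "Match Method", "Taxonomic Authority ID", "Taxonomy"]:
        sgcnSummary[field] = sgcnSynthesisRecord[field]
    # The source field is renamed in the summary
    sgcnSummary["State List Summary"] = sgcnSynthesisRecord["Source Data Summary"]
    return sgcnSummary

# Made-up record for illustration only
sampleRecord = {
    "_id": "Example species",
    "Common Name": "example",
    "Match Method": "Exact Match",
    "Taxonomic Authority ID": "example-id",
    "Taxonomy": [],
    "Source Data Summary": {},
}
print(packageSGCNSummary(sampleRecord))
```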


In [71]:
# Show what one of the records with an SGCN summary looks like
display(esaWPSpecies.find_one({"SGCN":{"$exists":True}}))

# NatureServe
The NatureServe species data system provides another potentially useful suite of information for us to assemble and work against. We've used it in other applications of the Taxa Information Registry and have some functions that handle lookup and secure, authorized retrieval of information.

In [104]:
from bis import natureserve
from bis2 import natureserve as ns

Because we previously put together a synthesis of the possible scientific names for a record and laid out a unique names list, we can now work through those unique names looking for matches from the NatureServe system. Because this list is an ordered list, starting with the originally submitted name, that order suits us here. We will break off searching as soon as we find a viable match and return an Element Global ID for further exploration.

In [102]:
for record in esaWPSpecies.find({"NatureServe":{"$exists":False}},{"Synthesis.Unique Scientific Names":1}):
    for scientificName in record["Synthesis"]["Unique Scientific Names"]:
        elementGlobalID = natureserve.queryNatureServeID(scientificName)
        if elementGlobalID is not None:
            searchResults = {"Search Name":scientificName}
            searchResults["Element Global ID"] = elementGlobalID
            esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"NatureServe":{"Search Results":[searchResults]}}})
            break
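
The "stop at the first viable match" pattern in the cell above can be sketched independently of NatureServe. In this illustration, `firstMatch` is a hypothetical helper, `lookup` stands in for any function that returns an identifier or `None` (the role `natureserve.queryNatureServeID` plays above), and the identifier value is made up.

```python
def firstMatch(names, lookup):
    # Hypothetical helper: lookup is any callable returning an identifier or None
    for name in names:
        identifier = lookup(name)
        if identifier is not None:
            # Stop at the first name that resolves
            return {"Search Name": name, "Element Global ID": identifier}
    return None

# Made-up identifier table standing in for the real lookup service
fakeIDs = {"Alces alces": "EXAMPLE-ID-1"}
print(firstMatch(["Alces alces andersoni", "Alces alces"], fakeIDs.get))
```

Because the search stops at the first hit, the order of the names list determines which name is recorded as the match.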

Once we have NatureServe identifiers to work with, we can use a secure API connection to the NatureServe system to retrieve other details for use. There are a number of interesting information elements to work with in the NatureServe species data, and our current process only pulls a few of these out for use.

In [114]:
for record in esaWPSpecies.find({"$and":[{"NatureServe":{"$exists":True}},{"NatureServe.Data":{"$exists":False}}]},{"NatureServe":1}):
    natureServeData = natureserve.packageNatureServeJSON(ns.speciesAPI(),record["NatureServe"]["Search Results"][0]["Element Global ID"])
    if natureServeData["result"]:
        esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"NatureServe.Data":{"Element Global ID":record["NatureServe"]["Search Results"][0]["Element Global ID"],"Data":natureServeData}}})

In [116]:
# Show what one of the records looks like with NatureServe data onboard
display(esaWPSpecies.find_one({"$and":[{"NatureServe":{"$exists":True}},{"NatureServe.Data":{"$exists":True}}]}))

# GAP Species
The Gap Analysis Project is in the process of releasing a full set of 1,719 habitat distribution maps for the terrestrial vertebrates with range in the contiguous US. We are also working on a set of fish distribution models that follow a somewhat different methodology. As these products evolve, we will include methods that summarize this information for use with the work plan species.

In [142]:
for record in esaWPSpecies.find({"GAP":{"$exists":False}},{"Synthesis.Unique Scientific Names":1}):
    for name in record["Synthesis"]["Unique Scientific Names"]:
        # Double any single quotes so a name containing an apostrophe does not break the SQL string
        gapSQL = "SELECT commonname,gap_speciescode,doi FROM gap.gapspecies WHERE scientificname = '"+name.replace("'","''")+"'"
        # Pass the SQL as a parameter so requests handles the URL encoding
        speciesCheck = requests.get(gc2BaseSQLURL,params={"q":gapSQL}).json()
        if len(speciesCheck["features"]) > 0:
            esaWPSpecies.update_one({"_id":record["_id"]},{"$set":{"GAP":speciesCheck["features"][0]["properties"]}})

The next step here is to generate a by-state representation of the range for the GAP species we found, for comparison with the other range-by-state information we have already assembled. That requires a little more work that I started in the Spatial Feature Registry, so I'm not going to get to it quite yet.

In [11]:
fwsSESSA = requests.get("https://ecos.fws.gov/ServCat/Reference/Profile/75903")

In [12]:
ssaSoup = BeautifulSoup(fwsSESSA.content, "html.parser")

In [13]:
print (ssaSoup.prettify())

<!DOCTYPE html>
<html>
 <head>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <!-- IMPORTANT!  Forces IE in Compatibility Mode to interpret this document as the latest IE version -->
  <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
  <link href="https://irmafiles.nps.gov/WebContentV2/FWS/v1_0_0/Images/favicon.ico" rel="shortcut icon" type="image/x-icon"/>
  <!-- Style sheets -->
  <link href="https://irmafiles.nps.gov/WebContentV2/ThirdParty/extjs/ext-4.2.1/resources/css/ext-all.css" rel="stylesheet" type="text/css"/>
  <link href="https://irmafiles.nps.gov/WebContentV2/FWS/v1_0_0/DynamicDriveMenu/ddsmoothmenu.css" rel="stylesheet" type="text/css"/>
  <link href="https://irmafiles.nps.gov/WebContentV2/FWS/v1_0_0/Styles/Site.css" rel="stylesheet" type="text/css"/>
  <link href="https://irmafiles.nps.gov/WebContentV2/FWS/v1_0_0/web/css/ecos-secure/ecos-secure.css" rel="stylesheet" type="text/css"/>
  <link href="https://irmafiles.nps.gov/WebContentV2/FWS