# Show changes made to the SGCN data from submitted to final

To produce the integrated species lists from the State Wildlife Action Plans (SWAP) of 2005 and the Species of Greatest Conservation Need (SGCN) submitted in 2015, the USGS runs a series of data harmonization steps that assemble a complete list, align it with taxonomic authorities, and produce a synthesized national list. In this process, the originally submitted species names are "cleaned" into searchable strings, corrected for misspellings, and finally aligned with a taxonomic authority so that the information can be integrated into the national list.

This notebook presents a different way of looking at the changed/corrected names than what is presented in the current SWAP app. It is entirely driven by a simple query of the data to look for cases where the original submitted name does not match the final accepted name. We will need to modify the application to work against this query and include the following documentation.

The table shows all of the cases where the original submitted species name (Submitted column) was altered in some way to produce the final state/national list of accepted names (Accepted column).

* A value in the "Cleaned" column indicates that some alteration was made to the submitted name in order to make it feasible to run a taxonomic authority search.
* A value in the "Corrected" column indicates that a basic correction was made in spelling (often Latin spelling corrections) in order to make the taxonomic authority search more direct.
* The "TaxonomicID Discovered" column shows the identifier discovered based on the cleaned or corrected name in the relevant taxonomic authority (ITIS or WoRMS).
* The "TaxonomicID Accepted" column shows the identifier that was used to produce the accepted name for the species.
* A difference between the Discovered and Accepted taxonomic IDs indicates a case where the taxonomic authority pointed to a current or corrected taxonomic record for the species.
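
As a rough illustration of how these columns can be read together, the sketch below labels a single row with the kinds of change it shows. The function name `describe_change`, the short dictionary keys, and the label strings are hypothetical and chosen only for this example; the sample values are rows from the table in this notebook. It assumes that a step which changed nothing either leaves its column empty or repeats the submitted name.

```python
def describe_change(row):
    """Return labels for the kinds of change visible in one row (illustrative only)."""
    labels = []
    # Assumption: a Cleaned value identical to the submitted name means no cleaning was needed.
    if row["cleaned"] and row["cleaned"] != row["submitted"]:
        labels.append("cleaned")
    # A non-empty Corrected value indicates a spelling correction.
    if row["corrected"]:
        labels.append("corrected")
    # Differing IDs mean the authority pointed to a different (current) record.
    if row["id_discovered"] != row["id_accepted"]:
        labels.append("redirected")
    # If nothing above applies, the accepted name differs only in the form
    # returned by the authority (for example, capitalization).
    return labels or ["authority name only"]


# Sample rows copied from the table in this notebook.
rows = [
    {"submitted": "Abacion tessalatum", "cleaned": "Abacion tessalatum",
     "corrected": "Abacion tesselatum", "accepted": "Abacion tesselatum",
     "id_discovered": "http://services.itis.gov/?q=tsn:570281",
     "id_accepted": "http://services.itis.gov/?q=tsn:570281"},
    {"submitted": "Abagrotis barnesi", "cleaned": "Abagrotis barnesi",
     "corrected": "", "accepted": "Abagrotis orbis",
     "id_discovered": "http://services.itis.gov/?q=tsn:940756",
     "id_accepted": "http://services.itis.gov/?q=tsn:771360"},
    {"submitted": "Accipiter Cooperii", "cleaned": "Accipiter Cooperii",
     "corrected": "", "accepted": "Accipiter cooperii",
     "id_discovered": "http://services.itis.gov/?q=tsn:175309",
     "id_accepted": "http://services.itis.gov/?q=tsn:175309"},
]

for row in rows:
    print(row["submitted"], "->", row["accepted"], describe_change(row))
```

Keeping the labels in a list lets one row carry several at once, since a name can be both corrected and redirected to a different taxonomic record.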

In [30]:
import requests

gc2BaseURL = "https://gc2.mapcentia.com/api/v1/sql/bcb"

In [31]:
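from html import escape

# Class to render a list of rows as an HTML table in the notebook
class ListTable(list):
    def _repr_html_(self):
        parts = ["<table>"]
        for row in self:
            parts.append("<tr>")
            for col in row:
                # Escape each value so characters such as < or & display as text
                parts.append("<td>{0}</td>".format(escape(str(col))))
            parts.append("</tr>")
        parts.append("</table>")
        return ''.join(parts)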

This query retrieves every case where the final accepted name does not match the original submitted name, along with the properties needed to understand what was changed. The query can be modified to include a "where state=???" clause so that the results can be shown on each state page.

In [32]:
# Select every record whose accepted name differs from the submitted name
sqlGetChangedNames = (
    "SELECT sgcnyear,scientificname_accepted,scientificname_cleaned,scientificname_submitted,"
    "scientificname_corrected,taxonomicauthorityid_discovered,taxonomicauthorityid_accepted "
    "FROM sgcn "
    "WHERE scientificname_submitted <> scientificname_accepted "
    "AND scientificname_accepted <> '' "
    "ORDER BY sgcnyear,scientificname_accepted"
)
# Pass the SQL through params so requests URL-encodes the spaces, quotes, and <> operators
response = requests.get(gc2BaseURL, params={"q": sqlGetChangedNames})
response.raise_for_status()
changedNames = response.json()

print('Number of original names submitted that did not align with final taxonomic authority names: '+str(len(changedNames['features'])))

# Build the display table: a header row followed by one row per changed name
table = ListTable()
table.append(['Year', 'Submitted', 'Cleaned', 'Corrected', 'Accepted', 'TaxonomicID Discovered', 'TaxonomicID Accepted'])
for feature in changedNames['features']:
    props = feature['properties']
    table.append([
        props['sgcnyear'],
        props['scientificname_submitted'],
        props['scientificname_cleaned'],
        props['scientificname_corrected'],
        props['scientificname_accepted'],
        props['taxonomicauthorityid_discovered'],
        props['taxonomicauthorityid_accepted'],
    ])
table

Number of original names submitted that did not align with final taxonomic authority names: 4654


| Year | Submitted | Cleaned | Corrected | Accepted | TaxonomicID Discovered | TaxonomicID Accepted |
| --- | --- | --- | --- | --- | --- | --- |
| 2005 | Abacion tessalatum | Abacion tessalatum | Abacion tesselatum | Abacion tesselatum | http://services.itis.gov/?q=tsn:570281 | http://services.itis.gov/?q=tsn:570281 |
| 2005 | Abagrotis barnesi | Abagrotis barnesi | | Abagrotis orbis | http://services.itis.gov/?q=tsn:940756 | http://services.itis.gov/?q=tsn:771360 |
| 2005 | Acalypta lillianus | Acalypta lillianus | Acalypta lillianis | Acalypta lillianis | http://services.itis.gov/?q=tsn:104186 | http://services.itis.gov/?q=tsn:104186 |
| 2005 | Acantharcus pomotis | Acantharcus pomotis | Acantharchus pomotis | Acantharchus pomotis | http://services.itis.gov/?q=tsn:168095 | http://services.itis.gov/?q=tsn:168095 |
| 2005 | Carduelis hornemanni | Carduelis hornemanni | | Acanthis hornemanni | http://services.itis.gov/?q=tsn:179231 | http://services.itis.gov/?q=tsn:179238 |
| 2005 | Acanthocyclops Columbiensis | Acanthocyclops Columbiensis | | Acanthocyclops columbiensis | http://services.itis.gov/?q=tsn:667129 | http://services.itis.gov/?q=tsn:667129 |
| 2005 | Accipiter Cooperii | Accipiter Cooperii | | Accipiter cooperii | http://services.itis.gov/?q=tsn:175309 | http://services.itis.gov/?q=tsn:175309 |
| 2005 | Accipiter gentilus | Accipiter gentilus | Accipiter gentilis | Accipiter gentilis | http://services.itis.gov/?q=tsn:175300 | http://services.itis.gov/?q=tsn:175300 |
| 2005 | Accipiter gentilus atricapillus | Accipiter gentilus atricapillus | Accipiter gentilis atricapillus | Accipiter gentilis atricapillus | http://services.itis.gov/?q=tsn:175301 | http://services.itis.gov/?q=tsn:175301 |
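
The per-state variant of the query mentioned earlier could be built along these lines. This is only a sketch: `build_changed_names_sql` is a hypothetical helper, and the column name `state` is taken from the "where state=???" note rather than from a confirmed schema, so it may need to be adjusted. The state value itself is supplied by the caller.

```python
def build_changed_names_sql(state=None):
    """Build the changed-names query, optionally limited to one state (hypothetical helper)."""
    conditions = [
        "scientificname_submitted <> scientificname_accepted",
        "scientificname_accepted <> ''",
    ]
    if state is not None:
        # Assumption: the sgcn table has a column named "state"; verify against the real schema.
        # Double any single quotes so the value stays inside the SQL string literal.
        safe_state = state.replace("'", "''")
        conditions.append("state = '{0}'".format(safe_state))
    return (
        "SELECT sgcnyear,scientificname_accepted,scientificname_cleaned,scientificname_submitted,"
        "scientificname_corrected,taxonomicauthorityid_discovered,taxonomicauthorityid_accepted "
        "FROM sgcn "
        "WHERE " + " AND ".join(conditions) + " "
        "ORDER BY sgcnyear,scientificname_accepted"
    )
```

The returned string can be sent in the same way as the national query, for example with `requests.get(gc2BaseURL, params={"q": build_changed_names_sql(state_value)})`, where `state_value` is whatever value the state page supplies.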
