---
syncID: 
title: "Querying Taxonomy Data with NEON API and Python"
description: "Querying the 'taxonomy/' NEON API endpoint with Python and navigating the response"
dateCreated: 2020-04-24
authors: Maxwell J. Burner
contributors: Donal O'Leary
estimatedTime:
packagesLibraries: requests, json, pandas
topics: api
languagesTool: python
dataProduct:
code1: 
tutorialSeries: python-neon-api-series
urlTitle: neon_api_taxonomy
---

In this tutorial we will learn to query the *taxonomy/* endpoint of the NEON API using Python.

<div id="ds-objectives" markdown="1">

### Objectives
After completing this tutorial, you will be able to:

* Query the taxonomy endpoint of the NEON API to obtain taxonomic data
* Search NEON taxonomic data using different criteria
* Use the various options of the taxonomy endpoint to customize the results of a call
* Navigate the data returned by a call to the taxonomy endpoint of the NEON API
* Look up a specific species named in NEON observational data


### Install Python Packages

* **requests**
* **json** (included in the Python standard library, so no separate installation is needed)
* **pandas**



</div>

In this tutorial we will learn to use Python and the *taxonomy/* endpoint of the NEON API to query information from NEON's taxonomic data. 

NEON maintains a great deal of taxonomic data, used in species identification during field observations and laboratory processing of samples. NEON taxonomy data can be obtained through the API, or through an interactive interface called the [Taxon Viewer](http://data.neonscience.org/static/taxon.html). Just as the *locations/* endpoint can provide more context for a location referenced in NEON studies, the *taxonomy/* endpoint can provide additional information on species identified in NEON observational data.




## Making the Request

Unlike other endpoints, the *taxonomy/* endpoint does not take a single target in its URL. Instead, the query uses a number of options, which are specified in the URL string itself. Each option is assigned a value with an equals sign, for example 'family=Pinaceae'; the options are placed after a question mark '?' at the end of the endpoint URL, which signals that a 'query string' follows. Multiple query options are separated by an ampersand '&' in the URL string.

Each call must include exactly one of the following options:
* **taxonTypeCode**, a code that indicates which NEON taxonomy is being queried, such as FISH or BIRD
* One of the major taxonomic ranks from genus through kingdom
* **scientificName**, a specific name of the format genus + specific epithet + (authority); this is used to search for an exact result

In addition, any number of the following options can also be added to modify the results of the query:
* **verbose** takes 'true' for a more detailed response or 'false' for a shorter response
* **offset** takes an integer indicating the number of starting rows of the list of results to skip; the default is 0
* **limit** takes an integer indicating the maximum length of the list returned; the default is 50
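
The option rules above can be sketched with the standard library's `urllib.parse.urlencode`, which joins option names and values with '=' and '&' and escapes special characters. This is only an illustration, not the approach used in the cells that follow (which build the string by hand), and the helper name `build_taxonomy_url` is hypothetical.

```python
from urllib.parse import urlencode

SERVER = 'http://data.neonscience.org/api/v0/'

def build_taxonomy_url(server, **options):
    """Return a taxonomy/ endpoint URL with the given query options.

    Hypothetical helper for illustration; it does not check that the
    options are ones the API accepts.
    """
    return server + 'taxonomy/?' + urlencode(options)

# Same options as the request made in this section
url = build_taxonomy_url(SERVER, family='Pinaceae', offset=11, limit=20, verbose='false')
print(url)
```

The `requests` library can also build the query string itself when the options are passed as a dictionary through the `params` argument of `requests.get`.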

Let's request data on up to 20 members of the Pine family, skipping the first 11, with the short response.

In [None]:
import requests
import json

In [None]:
#Choose values for each option
SERVER = 'http://data.neonscience.org/api/v0/'
FAMILY = 'Pinaceae'
OFFSET = 11
LIMIT = 20
VERBOSE = 'false'

In [None]:
#Create 'options' portion of API call
OPTIONS = '?family={family}&offset={offset}&limit={limit}&verbose={verbose}'.format(
    family = FAMILY,
    offset = OFFSET,
    limit = LIMIT,
    verbose = VERBOSE)

#Print out the completed options string. This is the query string that is appended to the endpoint URL in the taxonomy API call
print(OPTIONS)

In [None]:
#Make request
pine_req = requests.get(SERVER+'taxonomy/'+OPTIONS)
pine_json = pine_req.json()

## Navigating the Response

Unlike most API call responses, the taxonomy JSON has more elements at the uppermost level than just 'data'. The other elements include:

- **count**: how many entries were returned in this response
- **total**: how many entries match the query in total, that is, how many would be returned with an offset of zero and no limit
- **prev**: the API URL that retrieves the previous set of entries matching the same parameters (if offset was not zero)
- **next**: the API URL that retrieves the next set of entries (if the limit parameter caused some entries to be excluded)

The **prev** and **next** URLs can be used to break up a larger API call into several segments: we ask for a smaller set than we actually want, then use the "next" URL to get the next set of entries in a separate call.
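
One way to sketch this segment-by-segment retrieval is a loop that keeps following the "next" URL until there is none. The function name `collect_all_entries` and its `fetch_json` argument are hypothetical, and the sketch assumes that the last page has a missing, null, or empty "next" value.

```python
def collect_all_entries(first_url, fetch_json):
    """Follow 'next' URLs until none remain, gathering every 'data' entry.

    fetch_json is any callable that takes a URL and returns the parsed
    JSON dictionary for that URL.
    """
    entries = []
    url = first_url
    while url:
        page = fetch_json(url)
        entries.extend(page.get('data', []))
        # Assumption: a missing, null, or empty 'next' means this was the last page
        url = page.get('next')
    return entries

# Example use with requests (makes real API calls, so it is left commented out):
# import requests
# all_pines = collect_all_entries(
#     'http://data.neonscience.org/api/v0/taxonomy/?family=Pinaceae&limit=10',
#     lambda url: requests.get(url).json())
```

Passing the fetching step in as a function keeps the loop independent of any one HTTP library and makes it easy to try out on made-up pages.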

In [None]:
#Print out values in the top level of the pine_json taxonomy dictionary, other than the 'data' entry.
for key in pine_json.keys():
    if key != 'data':
        print(key,':',pine_json[key])

Within the '**data**' element is a list with an entry for each taxon returned by the call. Each entry is a dictionary with attributes for:

- The full taxonomy, with a separate attribute for each taxonomic level
- The NEON taxonomy type the data was obtained from (taxonTypeCode)
- The short taxon code used by NEON (taxonID, acceptedTaxonID)
- The author of the scientific name
- The common/vernacular name, if any
- The reference text used (nameAccordingToID)

In [None]:
#Print data for one species
sample = pine_json['data'][7]
for key in sample.keys():
    print("{:28}: {}".format(key, sample[key]))

The "dwc" at the beginning of many attribute names indicates that the terms used for those fields are matched to the terms used by Darwin Core, a standard maintained for sharing biodiversity data. The "gbif" prefix refers to the Global Biodiversity Information Facility.

We can also print vernacular names alongside the scientific names of each species entry. 

In [None]:
for species in pine_json['data']:
    #Convert to str so that an entry with no vernacular name (None) can still be padded
    print("{:19}| {}".format(str(species['dwc:vernacularName']), species['dwc:scientificName']))
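
Some taxa may have no vernacular name. As a sketch, assuming a missing name arrives as `None` (JSON null) or as an empty string, the hypothetical helper `name_pairs` below builds (vernacular, scientific) pairs and substitutes a placeholder for missing names. The two records are illustrative stand-ins, not actual API output.

```python
def name_pairs(records, placeholder='(no common name)'):
    """Return (vernacular, scientific) tuples, filling in missing vernacular names."""
    pairs = []
    for record in records:
        # 'or' falls back to the placeholder for None and for empty strings
        vernacular = record.get('dwc:vernacularName') or placeholder
        pairs.append((vernacular, record.get('dwc:scientificName')))
    return pairs

# Illustrative records only; real entries contain many more attributes
example_records = [
    {'dwc:vernacularName': 'lodgepole pine', 'dwc:scientificName': 'Pinus contorta'},
    {'dwc:vernacularName': None, 'dwc:scientificName': 'Pinus sp.'},
]
for vernacular, scientific in name_pairs(example_records):
    print("{:19}| {}".format(vernacular, scientific))
```

With the response from this tutorial, the call would be `name_pairs(pine_json['data'])`.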

## Using Taxon Type Code

Let's make another API call, using taxonTypeCode this time. We'll look through some of the NEON Fish Taxonomy, but try the verbose description.

In [None]:
#Set options
SERVER = 'http://data.neonscience.org/api/v0/'
TAXONCODE = 'FISH'
OFFSET = 0
LIMIT = 20
VERBOSE = 'true'

In [None]:
#Create 'options' portion of API call
OPTIONS = '?taxonTypeCode={taxoncode}&offset={offset}&limit={limit}&verbose={verbose}'.format(
    taxoncode = TAXONCODE,
    offset = OFFSET,
    limit = LIMIT,
    verbose = VERBOSE)
print(OPTIONS)

In [None]:
#Make request
fish_req = requests.get(SERVER+'taxonomy/'+OPTIONS)
fish_json = fish_req.json()

Choose an arbitrary species and see what data its dictionary contains.

In [None]:
#Print data for one species in the result
sample = fish_json['data'][7]
for key in sample.keys():
    print("{:28}: {}".format(key, sample[key]))

This is a more verbose entry than what we've seen, so there are more attributes, though many lack values. The 'gbif' attributes indicate terms matched to those used by the Global Biodiversity Information Facility.
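
Because many attributes in a verbose entry lack values, it can help to look only at the ones that are filled in. The sketch below assumes that empty attributes arrive as `None` (JSON null) or as empty strings; the helper name `non_empty_attributes` and the example entry, including its 'example:emptyField' key, are made up for illustration.

```python
def non_empty_attributes(entry):
    """Return a copy of a taxon dictionary without its empty attributes."""
    return {key: value for key, value in entry.items()
            if value is not None and value != ''}

# Made-up entry; a real verbose entry has many more attributes
example_entry = {
    'dwc:scientificName': 'Example name',
    'dwc:vernacularName': None,
    'example:emptyField': '',
}
print(non_empty_attributes(example_entry))
```

Applied to this tutorial's data, the call would be `non_empty_attributes(fish_json['data'][7])`.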

In [None]:
#Print common and scientific name for each fish
for species in fish_json['data']:
    print(species['dwc:vernacularName'],'|', species['dwc:scientificName'])

## Finding a Specific Species

Many NEON data products, such as the land bird breeding counts used in a previous tutorial, include species identification data in the form of a species name. We can use the NEON *taxonomy/* endpoint to search for a specific species mentioned in the NEON data. Let's look at the 2018-06 Lower Teakettle Bird Counts again, and get more detail on one of the observed species.

In [None]:
import pandas as pd

In [None]:
#Establish target for API search
SITECODE = 'TEAK'
PRODUCTCODE = 'DP1.10003.001'

In [None]:
#Get data on available files
bird_request = requests.get(SERVER+'data/'+PRODUCTCODE+'/'+SITECODE+'/'+'2018-06')
bird_json = bird_request.json()

In [None]:
#Extract the URL for just the 'basic' package of the 'count' data,
#and read that csv into a pandas DataFrame called 'bird_df'
for file in bird_json['data']['files']:
    if 'count' in file['name'] and 'basic' in file['name']:
        bird_df = pd.read_csv(file['url'])
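
If no file name contains both 'count' and 'basic', the loop above never assigns `bird_df`, and the next cell fails with a NameError. A small search function makes that case explicit. The name `find_file_url` is hypothetical, and the file entries below are made-up stand-ins that only mimic the 'name' and 'url' keys used above.

```python
def find_file_url(files, *keywords):
    """Return the URL of the first file whose name contains every keyword, else None."""
    for file_info in files:
        if all(keyword in file_info['name'] for keyword in keywords):
            return file_info['url']
    return None

# Made-up file listing for illustration
example_files = [
    {'name': 'example.count.expanded.csv', 'url': 'https://example.org/expanded.csv'},
    {'name': 'example.count.basic.csv', 'url': 'https://example.org/basic.csv'},
]
print(find_file_url(example_files, 'count', 'basic'))
```

In this tutorial the call would be `find_file_url(bird_json['data']['files'], 'count', 'basic')`, followed by `pd.read_csv` on the result once it has been checked against `None`.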

In [None]:
#View all columns of the first 5 rows
bird_df.head()

The *unique* method for pandas Series, which include individual columns of DataFrames, returns an array of the distinct values in the series, in order of first appearance.

In [None]:
#Use pandas .unique method to see what species were observed
bird_df['scientificName'].unique()

More information on 'Troglodytes aedon' would be interesting. When using a scientific name in a taxonomy API call, which will be encoded as a URL, we replace any spaces in the name with '%20'; also, remember to capitalize the genus name, but not the species name.

In [None]:
#Make request 
aedon_request = requests.get(SERVER+'taxonomy/'+'?scientificname=Troglodytes%20aedon')
aedon_json = aedon_request.json()
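
Rather than replacing spaces by hand, the encoding step can be sketched with the standard library's `urllib.parse.quote`, which percent-encodes spaces and other characters that are not safe in a URL. The option name `scientificname` is copied from the request above.

```python
from urllib.parse import quote

scientific_name = 'Troglodytes aedon'

# quote() turns the space into %20 and leaves letters unchanged
query = '?scientificname=' + quote(scientific_name)
print(query)
```

The resulting string can be appended to `SERVER+'taxonomy/'` in the same way as the hand-written query string.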

Because only a single result was returned, the count and total entries will both be one, and there will be no URLs for the previous or next batch of entries.

It is important to note that the data element is still treated as a list; it is simply a list with only one element.

In [None]:
#Print elements of JSON other than data
for key in aedon_json.keys():
    if key != 'data':
        print(key,':',aedon_json[key])

#Print elements of species dict in data list
for key in aedon_json['data'][0].keys():
    print(key,':',aedon_json['data'][0][key])
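
Since 'data' is a list even for an exact-name search, code that expects a single species may want to check the list length before indexing into it. The helper below is a hypothetical sketch of that check, and the example response is a made-up stand-in rather than actual API output.

```python
def single_entry(response_json):
    """Return the only entry in response_json['data'], or raise ValueError."""
    entries = response_json.get('data', [])
    if len(entries) != 1:
        raise ValueError('expected exactly one entry, found {}'.format(len(entries)))
    return entries[0]

# Made-up response with the same top-level layout described above
example_response = {
    'count': 1,
    'total': 1,
    'data': [{'dwc:scientificName': 'Troglodytes aedon'}],
}
print(single_entry(example_response)['dwc:scientificName'])
```

With this tutorial's response the call would be `single_entry(aedon_json)`; a misspelled name that matches nothing would then raise an error instead of failing later with an IndexError.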