---
syncID: 
title: "Using the NEON data API in Python"
description: "Use the data endpoint of the NEON API in Python, using the requests package."
dateCreated: 2019-02-14
authors: Bridget Hass
contributors: 
estimatedTime: 30 minutes
packagesLibraries: requests
topics: data-management,rep-sci
languagesTool: python
dataProduct: 
code1: /Python/
tutorialSeries: 
urlTitle: data-api-python
---

## Get packages and set up

This tutorial contains code and instructions for downloading NEON data via the 
API, using the data product `DP1.10098.001 - Woody Plant Vegetation Structure` 
as an example. It follows a similar workflow to the online tutorial 
<a href="https://www.neonscience.org/neon-api-usage" target="_blank">Using the NEON API in R</a>. 
See the R tutorial for further details about the overall API structure and for 
instructions on using other endpoints of the API (locations, taxonomy, etc.).

Required packages:
- `requests`: http://docs.python-requests.org/en/master/
- `urllib` : https://docs.python.org/3.5/library/urllib.html#module-urllib

In [3]:
import requests, os
import urllib.request  # explicit submodule import; urlretrieve (used below) lives here

We can use the `requests` module to see which Veg Structure data are available for all sites. For more details on the anatomy of an API call, refer to https://www.neonscience.org/neon-api-usage. Since we are looking for the DP1.10098.001 product, we append the `products` endpoint and the product code to the NEON *base* API url, http://data.neonscience.org/api/v0/, as follows: 

In [2]:
r = requests.get("http://data.neonscience.org/api/v0/products/DP1.10098.001")
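
The pieces of that url can also be assembled programmatically. The following is a minimal sketch; the names `SERVER` and `product_url` are illustrative and are not part of the NEON API or the `requests` package.

```python
# Base url of the NEON API, as given in the text above.
SERVER = "http://data.neonscience.org/api/v0/"

def product_url(product_code, server=SERVER):
    """Return the url of the products endpoint for one data product.

    product_url is a hypothetical helper written for this example.
    """
    return server.rstrip("/") + "/products/" + product_code

url = product_url("DP1.10098.001")
print(url)
```

The resulting string can be passed to `requests.get` in the same way as the hard-coded url.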

### Info on status codes 
(from https://www.dataquest.io/blog/python-api-tutorial/)

Status codes are returned with every request that is made to a web server. Status codes indicate information about what happened with a request. Here are some codes that are relevant to GET requests:

- 200 : everything went okay, and the result has been returned (if any)
- 301 : the server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
- 401 : the server thinks you're not authenticated. This happens when you don't send the right credentials to access an API.
- 400 : the server thinks you made a bad request. This can happen when you don't send along the right data, among other things.
- 403 : the resource you're trying to access is forbidden -- you don't have the right permissions to see it.
- 404 : the resource you tried to access wasn't found on the server.
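
The codes listed above can be kept in a small lookup table, which is handy when logging the outcome of many requests. This is only a sketch: `STATUS_MEANINGS`, `describe_status` and `is_success` are hypothetical names, and the table covers just the codes in the list, not every HTTP status.

```python
# Descriptions paraphrased from the list above; this is not an exhaustive table.
STATUS_MEANINGS = {
    200: "OK: the result has been returned",
    301: "redirected to a different endpoint",
    400: "bad request",
    401: "not authenticated",
    403: "forbidden",
    404: "not found",
}

def describe_status(status_code):
    """Return a short description for a status code (hypothetical helper)."""
    return STATUS_MEANINGS.get(status_code, "other status code")

def is_success(status_code):
    """True only for the 200 status used throughout this tutorial."""
    return status_code == 200

print(describe_status(404))
```

With a real response `r`, `describe_status(r.status_code)` gives the description. The `requests` package also provides `r.raise_for_status()`, which raises an exception for 4xx and 5xx responses.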

Let's make sure this request was successful by checking the status code:

In [3]:
r.status_code

200

Good news, the request was successful! You can get some additional information about the response by using the `.headers` attribute:

In [4]:
r.headers

{'Date': 'Thu, 10 Jan 2019 18:10:42 GMT', 'Access-Control-Allow-Origin': '*', 'X-Content-Type-Options': 'nosniff', 'Server': 'Apache/2.2.15 (Oracle)', 'Content-Type': 'application/json;charset=UTF-8', 'X-XSS-Protection': '1', 'Set-Cookie': 'JSESSIONID=25759F10A5ABB1BFA887C433AF6B972E.dmz-portal-web-1; Path=/; HttpOnly', 'Transfer-Encoding': 'chunked', 'X-Frame-Options': 'SAMEORIGIN', 'Connection': 'close'}

Finally, to pull out information from the response, use `.json()`, which parses the JSON returned by the API into a Python dictionary. **Note:** you can also use the `.text` attribute, but this returns everything as a single unformatted string. 

In [5]:
r.json()

{'data': {'changeLogs': None,
  'keywords': ['plant productivity',
   'production',
   'carbon cycle',
   'biomass',
   'vegetation',
   'productivity',
   'plants',
   'vegetation structure',
   'tree height',
   'canopy height',
   'woody plants',
   'trees',
   'net primary productivity (NPP)',
   'annual net primary productivity (ANPP)',
   'shrubs',
   'lianas',
   'saplings'],
  'productAbstract': 'This data product contains the quality-controlled, native sampling resolution data from in-situ measurements of live and standing dead woody individuals and shrub groups, from all terrestrial NEON sites with qualifying woody vegetation. The exact measurements collected per individual depend on growth form, and these measurements are focused on enabling biomass and productivity estimation, estimation of shrub volume and biomass, and calibration / validation of multiple NEON airborne remote-sensing data products. In general, comparatively large individuals that are visible to remote-sens
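
To see what `.json()` does with the raw text, the sketch below parses a small JSON string with the standard-library `json` module. The `sample_text` string is a made-up, shortened stand-in for `r.text`, not real API output.

```python
import json

# Hypothetical, shortened stand-in for r.text (the real response is much longer).
sample_text = '{"data": {"productCode": "DP1.10098.001", "keywords": ["biomass", "trees"]}}'

parsed = json.loads(sample_text)  # str -> nested Python dicts and lists
print(type(sample_text).__name__, '->', type(parsed).__name__)

# Once parsed, values can be looked up by key instead of searching a string.
print(parsed['data']['productCode'])

# json.dumps with indent gives a readable, formatted view of the same content.
print(json.dumps(parsed, indent=2))
```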

To display the top-level keys of this dictionary, use the dictionary `.keys()` method:

In [6]:
r.json().keys()

dict_keys(['data'])

We can see that everything is nested under the `'data'` key, so let's look at the keys in the data dictionary: 

In [7]:
r.json()['data'].keys()

dict_keys(['productScienceTeamAbbr', 'changeLogs', 'productDescription', 'productHasExpanded', 'productScienceTeam', 'productStudyDescription', 'productCodePresentation', 'productDesignDescription', 'productCode', 'productStatus', 'productRemarks', 'productName', 'productAbstract', 'siteCodes', 'themes', 'productSensor', 'productCategory', 'productCodeLong', 'keywords', 'specs'])

Or, to print each key on a separate line for easier viewing, run the following:

In [8]:
for key in r.json()['data']:  # iterating over a dictionary yields its keys
    print(key)

productScienceTeamAbbr
changeLogs
productDescription
productHasExpanded
productScienceTeam
productStudyDescription
productCodePresentation
productDesignDescription
productCode
productStatus
productRemarks
productName
productAbstract
siteCodes
themes
productSensor
productCategory
productCodeLong
keywords
specs


We want to extract the `'availableDataUrls'` listed under the `'siteCodes'` key, keeping only those for the site in question, in this case `'ABBY'`. One way to do that is to loop through all the site entries, look for the one whose `'siteCode'` matches the site we want, and then display the list of available data urls for that site: 

In [9]:
site = 'ABBY'
site_codes = r.json()['data']['siteCodes']  # parse the response once and reuse it
for site_info in site_codes:
    if site_info['siteCode'] == site:  # exact match on the site code
        data_urls = site_info['availableDataUrls']
data_urls #display all ABBY DP1.10098.001 data urls

['http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2015-07',
 'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2015-08',
 'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2016-08',
 'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2016-09',
 'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2016-10',
 'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2016-11',
 'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2017-03',
 'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2017-04',
 'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2017-07',
 'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2017-08',
 'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2017-09']
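
The same lookup can be wrapped in a small function. This is a sketch, assuming the response has the `['data']['siteCodes']` layout shown above; `find_site_urls` is a hypothetical helper, and `sample` is a hand-made, shortened stand-in for `r.json()` (the `'XXXX'` entry is invented).

```python
def find_site_urls(product_json, site):
    """Return the availableDataUrls for `site`, or an empty list if absent."""
    for site_info in product_json['data']['siteCodes']:
        if site_info['siteCode'] == site:
            return site_info['availableDataUrls']
    return []

# Hand-made stand-in for r.json(); only the keys used here are included.
sample = {'data': {'siteCodes': [
    {'siteCode': 'XXXX',  # invented site code, for illustration only
     'availableDataUrls': []},
    {'siteCode': 'ABBY',
     'availableDataUrls': [
         'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2015-07',
         'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2017-09']},
]}}

print(find_site_urls(sample, 'ABBY'))
print(find_site_urls(sample, 'NONE'))  # a site with no entry gives an empty list
```

Returning an empty list means a site with no entry yields a well-defined result, whereas a bare loop would leave `data_urls` unassigned.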

If you only want to extract the most recent year's data (in this case 2017), you can further subset these urls by date as follows:

In [10]:
abby_2017_urls = [url for url in data_urls if '2017' in url]
abby_2017_urls

['http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2017-03',
 'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2017-04',
 'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2017-07',
 'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2017-08',
 'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/2017-09']
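
The substring test works here because `2017` appears only in the month component of these urls. A slightly stricter variant, sketched below, compares only the trailing `yyyy-mm` component and also works out the most recent year instead of hard-coding it. The helper names `url_month`, `urls_for_year` and `latest_year` are hypothetical.

```python
def url_month(url):
    """Return the trailing 'yyyy-mm' component of a data url."""
    return url.rstrip('/').split('/')[-1]

def urls_for_year(urls, year):
    """Keep the urls whose month component starts with the given year."""
    return [u for u in urls if url_month(u).startswith(str(year) + '-')]

def latest_year(urls):
    """Return the most recent year that appears in the list of urls."""
    return max(int(url_month(u)[:4]) for u in urls)

base = 'http://data.neonscience.org:80/api/v0/data/DP1.10098.001/ABBY/'
months = ['2015-07', '2016-08', '2017-03', '2017-09']  # subset of the months listed above
sample_urls = [base + m for m in months]

year = latest_year(sample_urls)
print(year, urls_for_year(sample_urls, year))
```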

Now that we have the list of API urls for the months of data we want to download, we can make another request. Let's start with the first url as an example, then loop through all of the urls:

In [11]:
r = requests.get(abby_2017_urls[0])
r.json()

{'data': {'files': [{'crc32': 'dfcc253696bc54002e48ed0587aab7f1',
    'name': 'NEON.D16.ABBY.DP1.10098.001.vst_mappingandtagging.basic.20180508T211326Z.csv',
    'size': '855968',
    'url': 'https://neon-prod-pub-1.s3.data.neonscience.org/NEON.DOM.SITE.DP1.10098.001/PROV/ABBY/20170301T000000--20170401T000000/basic/NEON.D16.ABBY.DP1.10098.001.vst_mappingandtagging.basic.20180508T211326Z.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20190110T181042Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3599&X-Amz-Credential=pub-internal-read%2F20190110%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=d56210cc41a6f87b263e0453c36ccdad3c1f194f64ecfceb049d5eebae451cf2'},
   {'crc32': 'abc4f207b8803db906cd520e580ca7f6',
    'name': 'NEON.D16.ABBY.DP1.10098.001.2017-03.basic.20180508T211326Z.zip',
    'size': '152440',
    'url': 'https://neon-prod-pub-1.s3.data.neonscience.org/NEON.DOM.SITE.DP1.10098.001/PROV/ABBY/20170301T000000--20170401T000000/basic/NEON.D16.ABBY.DP1.10098.001.2017-03.basic.20180508T21

Display the keys of the `'data'` dictionary to see what information is returned for this site and month:

In [12]:
r.json()['data'].keys()

dict_keys(['files', 'siteCode', 'month', 'productCode'])

The download url links are nested under `['data']['files']`:

In [13]:
r.json()['data']['files']

[{'crc32': 'dfcc253696bc54002e48ed0587aab7f1',
  'name': 'NEON.D16.ABBY.DP1.10098.001.vst_mappingandtagging.basic.20180508T211326Z.csv',
  'size': '855968',
  'url': 'https://neon-prod-pub-1.s3.data.neonscience.org/NEON.DOM.SITE.DP1.10098.001/PROV/ABBY/20170301T000000--20170401T000000/basic/NEON.D16.ABBY.DP1.10098.001.vst_mappingandtagging.basic.20180508T211326Z.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20190110T181042Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3599&X-Amz-Credential=pub-internal-read%2F20190110%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=d56210cc41a6f87b263e0453c36ccdad3c1f194f64ecfceb049d5eebae451cf2'},
 {'crc32': 'abc4f207b8803db906cd520e580ca7f6',
  'name': 'NEON.D16.ABBY.DP1.10098.001.2017-03.basic.20180508T211326Z.zip',
  'size': '152440',
  'url': 'https://neon-prod-pub-1.s3.data.neonscience.org/NEON.DOM.SITE.DP1.10098.001/PROV/ABBY/20170301T000000--20170401T000000/basic/NEON.D16.ABBY.DP1.10098.001.2017-03.basic.20180508T211326Z.zip?X-Amz-Algorithm=AWS4-HM
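
Each `'url'` value carries query parameters such as `X-Amz-Date` and `X-Amz-Expires`. In AWS-style signed urls these give the signing time and the validity period in seconds, so the links appear to be temporary and are best requested again rather than stored. The sketch below reads those two parameters with `urllib.parse`; `signed_url_window` is a hypothetical helper, and the example url uses a placeholder host and path with the parameter values copied from the output above.

```python
from datetime import datetime, timedelta
from urllib.parse import urlparse, parse_qs

def signed_url_window(url):
    """Return (signed_at, expires_at) read from a signed url's query string."""
    query = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(query['X-Amz-Date'][0], '%Y%m%dT%H%M%SZ')
    lifetime = timedelta(seconds=int(query['X-Amz-Expires'][0]))
    return signed_at, signed_at + lifetime

# Placeholder host and path; the parameter values are from the output above.
example = ('https://example.org/placeholder.csv'
           '?X-Amz-Algorithm=AWS4-HMAC-SHA256'
           '&X-Amz-Date=20190110T181042Z'
           '&X-Amz-Expires=3599')

signed_at, expires_at = signed_url_window(example)
print('signed at', signed_at, 'and valid until', expires_at)
```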

Now we have all the information we need to download the files for each month in which data are available. First, make a directory to hold the downloaded data:

In [14]:
tos_data_folder = './Data/ABBY_TOS_WoodyVegStructure/' #create a folder in current directory to store TOS data
os.makedirs(tos_data_folder, exist_ok=True) #also creates ./Data/ and does not fail if the folders already exist

Loop through the 2017 data urls and use the `urllib.request.urlretrieve` function to download all the files except the zip file (to avoid redundancy) to the data folder, with a subfolder named after the month: `./Data/ABBY_TOS_WoodyVegStructure/yyyy-mm`:

In [15]:
for url in abby_2017_urls:
    month = url.split('/')[-1]  # last url component, e.g. '2017-03'
    download_folder = tos_data_folder + month + '/'
    os.makedirs(download_folder, exist_ok=True)  # safe to re-run if the folder already exists
    r = requests.get(url)
    files = r.json()['data']['files']
    for f in files:
        if not f['name'].endswith('.zip'):  # skip the zip file to avoid redundant downloads
            print('downloading ' + f['name'] + ' to ' + download_folder)
            urllib.request.urlretrieve(f['url'], download_folder + f['name'])

downloading NEON.D16.ABBY.DP1.10098.001.vst_mappingandtagging.basic.20180508T211326Z.csv to ./Data/ABBY_TOS_WoodyVegStructure/2017-03/
downloading NEON.D16.ABBY.DP1.10098.001.EML.20170314-20170331.20180508T211326Z.xml to ./Data/ABBY_TOS_WoodyVegStructure/2017-03/
downloading NEON.D16.ABBY.DP1.10098.001.vst_apparentindividual.2017-03.basic.20180508T211326Z.csv to ./Data/ABBY_TOS_WoodyVegStructure/2017-03/
downloading NEON.D16.ABBY.DP1.10098.001.vst_perplotperyear.2017-03.basic.20180508T211326Z.csv to ./Data/ABBY_TOS_WoodyVegStructure/2017-03/
downloading NEON.D16.ABBY.DP1.10098.001.readme.20180508T211326Z.txt to ./Data/ABBY_TOS_WoodyVegStructure/2017-03/
downloading NEON.D16.ABBY.DP0.10098.001.validation.20180508T211326Z.csv to ./Data/ABBY_TOS_WoodyVegStructure/2017-03/
downloading NEON.D16.ABBY.DP1.10098.001.variables.20180508T211326Z.csv to ./Data/ABBY_TOS_WoodyVegStructure/2017-03/
downloading NEON.D16.ABBY.DP0.10098.001.validation.20180508T210449Z.csv to ./Data/ABBY_TOS_WoodyVegStru
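
The selection logic in that loop can be separated from the download itself, which makes it easy to check what would be downloaded before any file is fetched. A minimal sketch, assuming the `files` list has the `'name'` and `'url'` keys shown earlier; `plan_downloads` is a hypothetical helper and the urls in `sample_files` are placeholders.

```python
import os

def plan_downloads(files, folder, want_zip=False):
    """Return (url, destination path) pairs for the files to download.

    want_zip=False keeps everything except .zip files;
    want_zip=True keeps only the .zip files.
    """
    plan = []
    for f in files:
        is_zip = f['name'].endswith('.zip')
        if is_zip == want_zip:
            plan.append((f['url'], os.path.join(folder, f['name'])))
    return plan

# File names taken from the output above; the urls are placeholders.
sample_files = [
    {'name': 'NEON.D16.ABBY.DP1.10098.001.vst_mappingandtagging.basic.20180508T211326Z.csv',
     'url': 'https://example.org/a.csv'},
    {'name': 'NEON.D16.ABBY.DP1.10098.001.2017-03.basic.20180508T211326Z.zip',
     'url': 'https://example.org/b.zip'},
]

for url, dest in plan_downloads(sample_files, './Data/ABBY_TOS_WoodyVegStructure/2017-03'):
    print(url, '->', dest)
```

Each pair could then be passed to `urllib.request.urlretrieve(url, dest)`.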

Alternatively, you could loop through all the 2017 data and download only the zip files directly to the `./Data/ABBY_TOS_WoodyVegStructure/` folder (without separating into monthly subfolders) as follows: 

In [16]:
for url in abby_2017_urls:
    r = requests.get(url)
    files = r.json()['data']['files']
    for f in files:
        if f['name'].endswith('.zip'):
            print('downloading ' + f['name'] + ' to ' + tos_data_folder)  # report the folder actually written to
            urllib.request.urlretrieve(f['url'], tos_data_folder + f['name'])

downloading NEON.D16.ABBY.DP1.10098.001.2017-03.basic.20180508T211326Z.zip to ./Data/ABBY_TOS_WoodyVegStructure/
downloading NEON.D16.ABBY.DP1.10098.001.2017-04.basic.20180508T210449Z.zip to ./Data/ABBY_TOS_WoodyVegStructure/
downloading NEON.D16.ABBY.DP1.10098.001.2017-07.basic.20180508T205059Z.zip to ./Data/ABBY_TOS_WoodyVegStructure/
downloading NEON.D16.ABBY.DP1.10098.001.2017-08.basic.20180508T205121Z.zip to ./Data/ABBY_TOS_WoodyVegStructure/
downloading NEON.D16.ABBY.DP1.10098.001.2017-09.basic.20180508T205224Z.zip to ./Data/ABBY_TOS_WoodyVegStructure/
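
After downloading, a quick sanity check is to compare each file's size on disk with the `'size'` value reported by the API. The sketch assumes that `'size'` is the file size in bytes, given as a string as in the output shown earlier; `size_matches` is a hypothetical helper, and the demonstration uses a temporary file rather than real NEON data.

```python
import os
import tempfile

def size_matches(path, expected_size):
    """True if the file at `path` has exactly `expected_size` bytes.

    expected_size may be an int or a numeric string (assumed to be bytes).
    """
    return os.path.getsize(path) == int(expected_size)

# Demonstration with a temporary 10-byte file instead of a downloaded file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.csv')
    with open(path, 'wb') as fh:
        fh.write(b'0123456789')
    complete = size_matches(path, '10')
    truncated = size_matches(path, '855968')
    print(complete, truncated)
```

A mismatch suggests an interrupted download, in which case the file can be requested again.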


## References & Additional Resources

- https://www.neonscience.org/neon-api-usage
- https://www.dataquest.io/blog/python-api-tutorial/