---
syncID: 
title: "Downloading NEON Observation Data with Python"
description: ""
dateCreated: 2020-04-24
authors: Maxwell J. Burner
contributors: 
estimatedTime: 
packagesLibraries: requests, json, pandas
topics: api, data management
languagesTool: python
dataProduct: DP1.10003.001
code1: 
tutorialSeries: python-neon-api-series
urlTitle: python_neon_api_downloading_observational
---

In this tutorial we will learn to download Observational Sampling (OS) data from the NEON API using Python.

<div id="ds-objectives" markdown="1">

### Objectives
After completing this tutorial, you will be able to:

* Navigate a NEON API request from the *data/* endpoint
* Describe the naming conventions of NEON OS data files
* Understand how to download NEON observational data using the Python Pandas library
* Describe the basic components of a Pandas dataframe


### Install Python Packages

* **requests**
* **json** 
* **numpy**
* **pandas**

The **json** module is part of Python's standard library, so it does not need to be installed separately. We will not use NumPy directly in this tutorial. It is listed here because Pandas is built on top of NumPy and requires it.

</div>

In this tutorial we will learn how to download specific NEON data files into Python. We will specifically look at how to use the Pandas package to read in CSV files of observational data.

NEON has three basic categories of data: Observational Sampling (OS), Instrumentation Sampling (IS), and remote sensing data from the Airborne Observation Platform (AOP). The process for requesting data is about the same for all three, but downloading and navigating the data can differ considerably depending on the category. Here we will discuss downloading observational data, as it tends to be the simplest to handle.

## Import Libraries

In addition to the requests and json packages used earlier in this series, we will use the Pandas package to read in the data. Pandas adds dataframe objects to Python, modeled on the data frames of the R programming language; they offer a convenient way to store and manipulate tabular data.

```python
import requests
import json
import pandas as pd
```

## Look up Data Product Availability

Let's look up the availability of breeding landbird point counts (data product DP1.10003.001) at the San Joaquin Experimental Range site (SJER).

```python
SERVER = 'https://data.neonscience.org/api/v0/'
SITECODE = 'SJER'
PRODUCTCODE = 'DP1.10003.001'
```

```python
# Request data on the data products available at the site
site_request = requests.get(SERVER+'sites/'+SITECODE)
site_json = site_request.json()
```

```python
# Determine available dates for landbird point counts at San Joaquin
for product in site_json['data']['dataProducts']:
    if product['dataProductCode'] == PRODUCTCODE:
        print(product['availableMonths'])
```

```
['2017-04', '2018-04', '2019-04']
```

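If you plan to check several products or sites, the loop above can be wrapped in a small function. This is a minimal sketch: the function name `available_months` is hypothetical, and `example_site_json` is a hand-made stand-in containing only the fields the loop reads, not a real API response. With the real data you would call `available_months(site_json, PRODUCTCODE)`.

```python
def available_months(site_json, product_code):
    """Return the list of available months for product_code at a site.

    site_json is the parsed JSON from the sites/ endpoint. An empty list is
    returned if the site does not offer the product.
    """
    for product in site_json["data"]["dataProducts"]:
        if product["dataProductCode"] == product_code:
            return product["availableMonths"]
    return []


# Hand-made stand-in for the parsed sites/SJER response (not real API output)
example_site_json = {
    "data": {
        "dataProducts": [
            {"dataProductCode": "DP1.10003.001",
             "availableMonths": ["2017-04", "2018-04", "2019-04"]},
        ]
    }
}

print(available_months(example_site_json, "DP1.10003.001"))
# ['2017-04', '2018-04', '2019-04']

# A made-up product code that the stand-in site does not offer
print(available_months(example_site_json, "DP1.99999.001"))
# []
```

Returning an empty list instead of printing makes it easy to test whether a month is available before requesting files, e.g. `'2019-04' in available_months(site_json, PRODUCTCODE)`.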

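We can also request information about the data product itself from the *products/* endpoint, for example to confirm its name:

```python
product_request = requests.get(SERVER+'products/'+PRODUCTCODE)
product_json = product_request.json()
```

```python
product_json['data']['productName']
```

```
'Breeding landbird point counts'
```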

## Look up Data Files

Next, request the files for April 2019 from the *data/* endpoint:

```python
# Make request
data_request = requests.get(SERVER+'data/'+PRODUCTCODE+'/'+SITECODE+'/'+'2019-04')
data_json = data_request.json()
```

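The *data/* request above concatenates the server, product code, site code, and month. A small helper can build that URL and reject a malformed month before any request is sent. This is a sketch, not part of the NEON API; the name `data_url` is hypothetical, and it assumes the server string ends with a slash, as `SERVER` does.

```python
import re


def data_url(server, product_code, site_code, year_month):
    """Build a data/ endpoint URL from its parts.

    Mirrors the concatenation used above:
    server + 'data/' + product_code + '/' + site_code + '/' + year_month
    (assumes server already ends with '/').
    """
    # Months in 'availableMonths' look like '2019-04', so reject anything else
    if not re.fullmatch(r"\d{4}-\d{2}", year_month):
        raise ValueError(f"expected a month like '2019-04', got {year_month!r}")
    return f"{server}data/{product_code}/{site_code}/{year_month}"


print(data_url("https://data.neonscience.org/api/v0/", "DP1.10003.001", "SJER", "2019-04"))
# https://data.neonscience.org/api/v0/data/DP1.10003.001/SJER/2019-04
```

With the real API, you might pass the result to `requests.get` and call `raise_for_status()` on the response before calling `.json()`, so that a failed request raises an error instead of producing confusing JSON.
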
```python
# View names of files
for file in data_json['data']['files']:
    print(file['name'])
```

```
NEON.D17.SJER.DP1.10003.001.EML.20190403-20190410.20191205T150154Z.xml
NEON.D17.SJER.DP0.10003.001.validation.20191205T150154Z.csv
NEON.D17.SJER.DP1.10003.001.2019-04.basic.20191205T150154Z.zip
NEON.D17.SJER.DP1.10003.001.variables.20191205T150154Z.csv
NEON.D17.SJER.DP1.10003.001.readme.20191205T150154Z.txt
NEON.D17.SJER.DP1.10003.001.brd_countdata.2019-04.basic.20191205T150154Z.csv
NEON.D17.SJER.DP1.10003.001.brd_perpoint.2019-04.basic.20191205T150154Z.csv
NEON.D17.SJER.DP0.10003.001.validation.20191205T150154Z.csv
NEON.D17.SJER.DP1.10003.001.brd_references.expanded.20191205T150154Z.csv
NEON.D17.SJER.DP1.10003.001.2019-04.expanded.20191205T150154Z.zip
NEON.Bird_Conservancy_of_the_Rockies.brd_personnel.csv
NEON.D17.SJER.DP1.10003.001.brd_perpoint.2019-04.expanded.20191205T150154Z.csv
NEON.D17.SJER.DP1.10003.001.brd_countdata.2019-04.expanded.20191205T150154Z.csv
NEON.D17.SJER.DP1.10003.001.readme.20191205T150154Z.txt
NEON.D17.SJER.DP1.10003.001.EML.20190403-20190410.20191205T150154Z.xm
```

Let's take a closer look at a file name.

```python
print(data_json['data']['files'][6]['name'])
```

```
NEON.D17.SJER.DP1.10003.001.brd_perpoint.2019-04.basic.20191205T150154Z.csv
```


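The format for most NEON data product file names is:

**NEON.[domain number].[site code].[data product ID].[file-specific name].[date and time of file creation].[file extension]**

So the file we singled out belongs to domain 17 (D17), the San Joaquin Experimental Range site (SJER), and the breeding landbird point counts product (DP1.10003.001). Its file-specific name is brd_perpoint.2019-04.basic, and it was created on 2019-12-05 at 15:01:54 UTC (the trailing Z in 20191205T150154Z denotes UTC). The file-specific name indicates that this is the basic version of the per-point (brd_perpoint) table for April 2019. Not every file follows this pattern: NEON.Bird_Conservancy_of_the_Rockies.brd_personnel.csv, for example, has no domain, site, or product ID.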

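The naming convention can be turned into a small parser. This is a sketch under the assumption that names follow the pattern above exactly; the function name `parse_neon_filename` is hypothetical, and names that do not fit (such as the personnel file) return `None` rather than being guessed at.

```python
from datetime import datetime, timezone


def parse_neon_filename(name):
    """Split a NEON data file name into its parts.

    Assumes the pattern
    NEON.[domain].[site].[DPx.xxxxx.xxx].[file-specific name].[timestamp].[extension]
    and returns None for names that do not fit it.
    """
    parts = name.split(".")
    # Need NEON, domain, site, three product-ID pieces, at least one
    # file-specific piece, the timestamp, and the extension
    if len(parts) < 9 or parts[0] != "NEON":
        return None
    try:
        # e.g. '20191205T150154Z'; the trailing Z means UTC
        created = datetime.strptime(parts[-2], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    except ValueError:
        return None
    return {
        "domain": parts[1],
        "site": parts[2],
        "product_id": ".".join(parts[3:6]),
        "file_specific": ".".join(parts[6:-2]),
        "created_utc": created,
        "extension": parts[-1],
    }


info = parse_neon_filename(
    "NEON.D17.SJER.DP1.10003.001.brd_perpoint.2019-04.basic.20191205T150154Z.csv")
print(info["product_id"], info["file_specific"], info["created_utc"])
print(parse_neon_filename("NEON.Bird_Conservancy_of_the_Rockies.brd_personnel.csv"))
# None
```

Looping over `data_json['data']['files']` and calling `parse_neon_filename(file['name'])` on each entry would show which of the listed files fit the pattern.
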
Bird counts and other observational data are usually stored as CSV files. The data for a particular site and month are often available in two different 'download packages': a 'basic' package with the main measurements, and an 'expanded' package that adds further fields or tables. In the file list above, for example, brd_countdata and brd_perpoint appear in both packages, while brd_references appears only in the expanded package. Let's save the URL for the basic count data CSV file.

```python
# Print names and URLs of files with bird count data
for file in data_json['data']['files']:
    if 'countdata' in file['name']:  # show both basic and expanded files
        print(file['name'], file['url'])
        if 'basic' in file['name']:
            bird_count_url = file['url']  # save URL of file with basic bird count data
```


```
NEON.D17.SJER.DP1.10003.001.brd_countdata.2019-04.basic.20191205T150154Z.csv https://neon-prod-pub-1.s3.data.neonscience.org/NEON.DOM.SITE.DP1.10003.001/PROV/SJER/20190401T000000--20190501T000000/basic/NEON.D17.SJER.DP1.10003.001.brd_countdata.2019-04.basic.20191205T150154Z.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20200427T131905Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3600&X-Amz-Credential=pub-internal-read%2F20200427%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Signature=51423568192d8c084e81b9c94cb4b830e7597891a48650c35fe49c4583475693
NEON.D17.SJER.DP1.10003.001.brd_countdata.2019-04.expanded.20191205T150154Z.csv https://neon-prod-pub-1.s3.data.neonscience.org/NEON.DOM.SITE.DP1.10003.001/PROV/SJER/20190401T000000--20190501T000000/expanded/NEON.D17.SJER.DP1.10003.001.brd_countdata.2019-04.expanded.20191205T150154Z.csv?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20200427T131905Z&X-Amz-SignedHeaders=host&X-Amz-Expires=3600&X-Amz-Credential=pub-internal-read%2F20200427%2Fus-west-2%2Fs3%2
```

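The loop above matches substrings such as 'countdata' and 'basic', and if no matching file exists, `bird_count_url` is simply never assigned. A slightly stricter variant, sketched below with the hypothetical helper name `find_file_url`, matches whole dot-separated parts of the name and raises an error when nothing matches. The file names in `example_files` come from the listing above, but the URLs are placeholders. Note also that the real URLs appear to be temporary pre-signed links: the `X-Amz-Expires=3600` parameter in the output above indicates they expire 3600 seconds (one hour) after the request, so if reading a file fails later, request `data_json` again to get fresh URLs.

```python
def find_file_url(files, table, package):
    """Return the URL of the first file whose name contains both
    '.<table>.' and '.<package>.', e.g. table='brd_countdata', package='basic'.

    files is the list stored under data_json['data']['files'].
    """
    for f in files:
        name = f["name"]
        # Surrounding dots ensure we match whole name components
        if f".{table}." in name and f".{package}." in name:
            return f["url"]
    raise LookupError(f"no {package} file found for table {table!r}")


# Real file names from the listing above, with placeholder URLs
example_files = [
    {"name": "NEON.D17.SJER.DP1.10003.001.brd_countdata.2019-04.basic.20191205T150154Z.csv",
     "url": "https://example.org/basic.csv"},
    {"name": "NEON.D17.SJER.DP1.10003.001.brd_countdata.2019-04.expanded.20191205T150154Z.csv",
     "url": "https://example.org/expanded.csv"},
    {"name": "NEON.D17.SJER.DP1.10003.001.brd_references.expanded.20191205T150154Z.csv",
     "url": "https://example.org/references.csv"},
]

print(find_file_url(example_files, "brd_countdata", "basic"))
# https://example.org/basic.csv
```

With the real response, `find_file_url(data_json['data']['files'], 'brd_countdata', 'basic')` would play the role of `bird_count_url`.
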
## Read file into Pandas Dataframe

There are a couple of options for reading CSV files into Python. For files read directly from NEON's data repository, a convenient option is the `read_csv` function from the Pandas package. This function converts the contents of the target file into a Pandas dataframe object, and it can read files from a URL as well as from your machine. (Python's built-in `csv` module can also parse CSV data, but it reads from file objects or other iterables of text lines, so a file on the web would first have to be downloaded, for example with requests.)

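To illustrate the difference, the sketch below parses the same small CSV snippet with the built-in `csv` module and with `pd.read_csv`, using an in-memory `io.StringIO` buffer in place of a file. The snippet is made up for illustration, using two column names and values from this dataset.

```python
import csv
import io

import pandas as pd

# Made-up CSV snippet standing in for a downloaded file
csv_text = "vernacularName,clusterSize\nYellow-rumped Warbler,1.0\nRuby-crowned Kinglet,1.0\n"

# csv.reader accepts any iterable of text lines; every value comes back as a string
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows)
# [['vernacularName', 'clusterSize'], ['Yellow-rumped Warbler', '1.0'], ['Ruby-crowned Kinglet', '1.0']]

# pd.read_csv accepts file-like objects as well as paths and URLs,
# and infers a numeric type for clusterSize
df_text = pd.read_csv(io.StringIO(csv_text))
print(df_text.dtypes)
```

For a file on the web, the `csv` module would need the text downloaded first, e.g. `csv.reader(io.StringIO(requests.get(bird_count_url).text))`, whereas `pd.read_csv(bird_count_url)` handles the download itself.
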
```python
# Read bird count CSV data into a Pandas dataframe
df_bird = pd.read_csv(bird_count_url)
```

A dataframe is a two-dimensional table of data: the columns correspond to the different variables being measured, while the rows correspond to individual entries or measurements (in this case, each detection of a bird or a cluster of birds). A dataframe also has a header containing a label for each column and an index containing a label for each row; both are Pandas `Index` objects, stored in the dataframe's `columns` and `index` attributes.

Pandas stores a dataframe's contents, header, and index separately, and other attributes hold metadata such as the overall dimensions of the dataframe (`shape`) and the data type of each column (`dtypes`).

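The sketch below builds a small dataframe by hand from three of the SJER bird count records (the first three rows of `df_bird`, trimmed to four columns) and prints the attributes described above. The variable name `df_small` is hypothetical.

```python
import pandas as pd

# Three SJER bird count records, trimmed to four columns
df_small = pd.DataFrame({
    "pointID": ["C1", "C1", "C1"],
    "pointCountMinute": [4, 3, 1],
    "vernacularName": ["Yellow-rumped Warbler", "Ruby-crowned Kinglet", "Oak Titmouse"],
    "observerDistance": [56.0, 89.0, 103.0],
})

print(df_small.columns)  # header: an Index of column labels
print(df_small.index)    # row labels: a RangeIndex starting at 0 by default
print(df_small.shape)    # (rows, columns) -> (3, 4)
print(df_small.dtypes)   # one data type per column
```

The same attributes on `df_bird` produce the outputs shown next.
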
```python
# View the column names
df_bird.columns
```

```
Index(['uid', 'namedLocation', 'domainID', 'siteID', 'plotID', 'plotType',
       'pointID', 'startDate', 'eventID', 'pointCountMinute',
       'targetTaxaPresent', 'taxonID', 'scientificName', 'taxonRank',
       'vernacularName', 'observerDistance', 'detectionMethod',
       'visualConfirmation', 'sexOrAge', 'clusterSize', 'clusterCode',
       'identifiedBy'],
      dtype='object')
```

```python
# Print out dimensions of the new dataframe
print('Number of columns: ', df_bird.shape[1])
print('Number of Rows: ', df_bird.shape[0])
```

```
Number of columns:  22
Number of Rows:  2261
```


```python
# Print out names and data types of dataframe columns
print(df_bird.dtypes)
```

```
uid                    object
namedLocation          object
domainID               object
siteID                 object
plotID                 object
plotType               object
pointID                object
startDate              object
eventID                object
pointCountMinute        int64
targetTaxaPresent      object
taxonID                object
scientificName         object
taxonRank              object
vernacularName         object
observerDistance      float64
detectionMethod        object
visualConfirmation     object
sexOrAge               object
clusterSize           float64
clusterCode            object
identifiedBy           object
dtype: object
```


```python
# View first five rows of dataframe using the 'head' method
df_bird.head(5)
```

|  | uid | namedLocation | domainID | siteID | plotID | plotType | pointID | startDate | eventID | pointCountMinute | ... | scientificName | taxonRank | vernacularName | observerDistance | detectionMethod | visualConfirmation | sexOrAge | clusterSize | clusterCode | identifiedBy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | c7a51c83-749e-4660-9717-ec0cbf5b7c97 | SJER_013.birdGrid.brd | D17 | SJER | SJER_013 | distributed | C1 | 2019-04-03T14Z | SJER_013.C1.2019-04-03T06:57-07:00[US/Pacific] | 4 | ... | Setophaga coronata | species | Yellow-rumped Warbler | 56.0 | calling | No | Unknown | 1.0 |  | JTIET |
| 1 | 8788db9f-5a0b-4363-baba-9f49380c3f3a | SJER_013.birdGrid.brd | D17 | SJER | SJER_013 | distributed | C1 | 2019-04-03T14Z | SJER_013.C1.2019-04-03T06:57-07:00[US/Pacific] | 3 | ... | Regulus calendula | species | Ruby-crowned Kinglet | 89.0 | singing | No | Unknown | 1.0 |  | JTIET |
| 2 | 0708534e-5d49-45bc-80ed-d503e9a5b99e | SJER_013.birdGrid.brd | D17 | SJER | SJER_013 | distributed | C1 | 2019-04-03T14Z | SJER_013.C1.2019-04-03T06:57-07:00[US/Pacific] | 1 | ... | Baeolophus inornatus | species | Oak Titmouse | 103.0 | singing | No | Unknown | 1.0 |  | JTIET |
| 3 | 13561db9-14fa-419a-a614-7f0df074d4a2 | SJER_013.birdGrid.brd | D17 | SJER | SJER_013 | distributed | C1 | 2019-04-03T14Z | SJER_013.C1.2019-04-03T06:57-07:00[US/Pacific] | 1 | ... | Melozone crissalis | species | California Towhee | 85.0 | singing | No | Unknown | 1.0 |  | JTIET |
| 4 | f7221fc0-50e6-4019-87a3-9d9a50fa5b18 | SJER_013.birdGrid.brd | D17 | SJER | SJER_013 | distributed | C1 | 2019-04-03T14Z | SJER_013.C1.2019-04-03T06:57-07:00[US/Pacific] | 1 | ... | Regulus calendula | species | Ruby-crowned Kinglet | 85.0 | singing | No | Unknown | 1.0 |  | JTIET |


We can now manipulate this dataframe using the various methods and functions of the Pandas library.
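
As a small example of such manipulation, the sketch below counts records per species and sums `clusterSize` per species on a hand-built copy of the five rows shown by `df_bird.head(5)`, trimmed to two columns. The same two calls should work on `df_bird` itself. Summing `clusterSize` assumes it records the number of individuals in each detection; check the variables file in the download for the exact definition.

```python
import pandas as pd

# The five records shown by df_bird.head(5), trimmed to two columns
sample = pd.DataFrame({
    "vernacularName": ["Yellow-rumped Warbler", "Ruby-crowned Kinglet", "Oak Titmouse",
                       "California Towhee", "Ruby-crowned Kinglet"],
    "clusterSize": [1.0, 1.0, 1.0, 1.0, 1.0],
})

# Number of records per species, most frequent first
counts = sample["vernacularName"].value_counts()
print(counts)

# Total individuals per species, summing clusterSize across records
totals = sample.groupby("vernacularName")["clusterSize"].sum()
print(totals)
```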