---
syncID: 
title: "Introduction to NEON API in Python"
description: "Use the NEON API in Python, via requests package and json package."
dateCreated: 2020-04-24
authors: Maxwell J. Burner
contributors: 
estimatedTime:
packagesLibraries: requests, json, pandas
topics: api
languagesTool: python
dataProduct: DP1.00041.001
code1: 
tutorialSeries: python-neon-api-series
urlTitle: neon_api_instrumentation
---

In this tutorial we will learn how to read NEON Instrumentation Sampling (IS) data using Python.

<div id="ds-objectives" markdown="1">

### Objectives
After completing this tutorial, you will be able to:

* Understand the naming conventions of NEON IS data files
* Use the NEON API to check which IS data product files are available
* Download files that provide context for interpreting IS data
* Download NEON IS data using the Python Pandas package

### Install Python Packages

* **requests**
* **json** 
* **pandas**



</div>

In this tutorial we will learn how to download IS data from the NEON portal using Python. We will cover how to get necessary metadata and context, and how to download the data itself into Pandas dataframes.

In the previous tutorial, we saw an example of how to download NEON data through the NEON API. Specifically, we saw how to query and download observational sampling (OS) data, which are gathered directly by NEON field scientists. NEON also uses automated instruments, which produce Instrumentation Sampling (IS) data. IS data tend to be stored in different formats than OS data.

## Request Data using NEON API

In [10]:
import requests
import json
import pandas as pd

Let's get soil temperature data from NEON's Woodworth site.

In [11]:
SERVER = 'http://data.neonscience.org/api/v0/'
SITECODE = 'WOOD'
PRODUCTCODE = 'DP1.00041.001'

In [12]:
#Get availability
site_request = requests.get(SERVER+'sites/'+SITECODE)
site_json = site_request.json()

for product in site_json['data']['dataProducts']:
    if(product['dataProductCode'] == PRODUCTCODE):
        print(product['availableMonths'])

['2017-07', '2017-08', '2017-09', '2017-10', '2017-11', '2017-12', '2018-01', '2018-02', '2018-03', '2018-04', '2018-05', '2018-06', '2018-07', '2018-08', '2018-09', '2018-10', '2018-11', '2018-12', '2019-01', '2019-02', '2019-03', '2019-04', '2019-05', '2019-06', '2019-07', '2019-08', '2019-09', '2019-10', '2019-11', '2019-12', '2020-01', '2020-02', '2020-03']
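The availability lookup can also be wrapped in a small reusable helper. The following is only a sketch: the function name `available_months` and the sample dictionary are assumptions made up for illustration, shaped like the site response used above so that the example runs without a network connection.

```python
def available_months(site_json, product_code):
    """Return the available months for one data product, or [] if it is absent."""
    for product in site_json['data']['dataProducts']:
        if product['dataProductCode'] == product_code:
            return product['availableMonths']
    return []


# Hypothetical, trimmed-down stand-in for the real site response
sample_site_json = {
    'data': {
        'dataProducts': [
            {'dataProductCode': 'DP1.00041.001',
             'availableMonths': ['2018-07', '2018-08', '2018-09']},
            {'dataProductCode': 'DP1.XXXXX.001',  # placeholder product code
             'availableMonths': ['2018-08']},
        ]
    }
}

months = available_months(sample_site_json, 'DP1.00041.001')
print('2018-08' in months)
```

With the real response, `available_months(site_json, PRODUCTCODE)` would return the same list that the loop above prints.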


In [13]:
#Request available files
data_request = requests.get(SERVER+'data/'+PRODUCTCODE+'/'+SITECODE+'/'+'2018-08')
data_json = data_request.json()

In [14]:
for file in data_json['data']['files'][0:20]:
    print(file['name'])

NEON.D09.WOOD.DP1.00041.001.readme.20190320T153550Z.txt
NEON.D09.WOOD.DP1.00041.001.001.505.001.ST_1_minute.2018-08.basic.20190320T153550Z.csv
NEON.D09.WOOD.DP1.00041.001.004.504.001.ST_1_minute.2018-08.basic.20190320T153550Z.csv
NEON.D09.WOOD.DP1.00041.001.005.509.030.ST_30_minute.2018-08.basic.20190320T153550Z.csv
NEON.D09.WOOD.DP1.00041.001.002.507.030.ST_30_minute.2018-08.basic.20190320T153550Z.csv
NEON.D09.WOOD.DP1.00041.001.001.507.001.ST_1_minute.2018-08.basic.20190320T153550Z.csv
NEON.D09.WOOD.DP1.00041.001.003.507.030.ST_30_minute.2018-08.basic.20190320T153550Z.csv
NEON.D09.WOOD.DP1.00041.001.001.503.001.ST_1_minute.2018-08.basic.20190320T153550Z.csv
NEON.D09.WOOD.DP1.00041.001.004.505.001.ST_1_minute.2018-08.basic.20190320T153550Z.csv
NEON.D09.WOOD.DP1.00041.001.001.506.001.ST_1_minute.2018-08.basic.20190320T153550Z.csv
NEON.D09.WOOD.DP1.00041.001.004.509.001.ST_1_minute.2018-08.basic.20190320T153550Z.csv
NEON.D09.WOOD.DP1.00041.001.004.503.001.ST_1_minute.2018-08.basic.20190

Let's break down the name of one of these files.

In [15]:
print(data_json['data']['files'][7]['name'])

NEON.D09.WOOD.DP1.00041.001.001.503.001.ST_1_minute.2018-08.basic.20190320T153550Z.csv


The format for naming is:

NEON.[domain number].[site code].[data product ID].[soil plot number].[depth].[averaging interval].[data table name].[year]-[month].[data package].[date of file creation].[file extension]

So this file holds data from Domain 09, the Woodworth site (WOOD), for the soil temperature product (DP1.00041.001), collected in soil plot 001 at the third depth (503), averaged over one-minute intervals, for August 2018, in the basic data package, and created on 2019-03-20 at 15:35:50 UTC.
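The naming format above can be turned into a small parser. This is a minimal sketch, not an official NEON utility: the function name `parse_is_filename` and the dictionary keys are made up for illustration, and it assumes a data file name with exactly the fields listed above.

```python
from datetime import datetime


def parse_is_filename(name):
    """Split an IS data file name into its named components."""
    parts = name.split('.')
    # The product ID (e.g. DP1.00041.001) itself contains two dots,
    # so a full data file name splits into 14 pieces.
    if len(parts) != 14:
        raise ValueError('Unexpected number of fields in ' + name)
    return {
        'domain': parts[1],
        'site': parts[2],
        'product': '.'.join(parts[3:6]),
        'soil_plot': parts[6],
        'depth': parts[7],
        'averaging_interval': parts[8],
        'table': parts[9],
        'month': parts[10],
        'package': parts[11],
        'created': datetime.strptime(parts[12], '%Y%m%dT%H%M%SZ'),
        'extension': parts[13],
    }


info = parse_is_filename(
    'NEON.D09.WOOD.DP1.00041.001.001.503.001.ST_1_minute.'
    '2018-08.basic.20190320T153550Z.csv'
)
print(info['site'], info['product'], info['depth'], info['created'])
```

Context files such as the readme have fewer fields, so this sketch deliberately raises a `ValueError` for them rather than guessing.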

## Getting Context

Not all of the files listed in this request are CSV files with recorded data. Some store other information, in tables or in text, used to provide context to the data.

In [16]:
#View names of files that don't contain recorded sensor data
for file in data_json['data']['files']:
    if(not ('basic' in file['name'])):
        if(not ('expanded' in file['name'])): #Avoid csv files of basic or expanded data
            print(file['name'])

NEON.D09.WOOD.DP1.00041.001.readme.20190320T153550Z.txt
NEON.D09.WOOD.DP1.00041.001.EML.20180801-20180901.20190320T153550Z.xml
NEON.D09.WOOD.DP1.00041.001.sensor_positions.20190320T153550Z.csv
NEON.D09.WOOD.DP1.00041.001.variables.20190320T153550Z.csv
NEON.D09.WOOD.DP1.00041.001.readme.20190320T153550Z.txt
NEON.D09.WOOD.DP1.00041.001.variables.20190320T153550Z.csv
NEON.D09.WOOD.DP1.00041.001.sensor_positions.20190320T153550Z.csv
NEON.D09.WOOD.DP1.00041.001.EML.20180801-20180901.20190320T153550Z.xml
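Each context file name appears twice in the listing above. A possible variant of the filter combines the two conditions and skips repeated names. This is a sketch only: the function name `context_file_names` is hypothetical, and the sample list is hand-copied from the file names shown earlier rather than taken from a live request.

```python
def context_file_names(files):
    """Return unique names of files that are not basic/expanded data tables."""
    seen = set()
    names = []
    for file in files:
        name = file['name']
        is_data_table = 'basic' in name or 'expanded' in name
        if not is_data_table and name not in seen:
            seen.add(name)
            names.append(name)  # keep the original order
    return names


# Hand-copied subset of the file listing, with one repeated name
sample_files = [
    {'name': 'NEON.D09.WOOD.DP1.00041.001.readme.20190320T153550Z.txt'},
    {'name': 'NEON.D09.WOOD.DP1.00041.001.001.503.001.ST_1_minute.'
             '2018-08.basic.20190320T153550Z.csv'},
    {'name': 'NEON.D09.WOOD.DP1.00041.001.variables.20190320T153550Z.csv'},
    {'name': 'NEON.D09.WOOD.DP1.00041.001.readme.20190320T153550Z.txt'},
]

for name in context_file_names(sample_files):
    print(name)
```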


These files include "read me" text files, and files with descriptions of the variables being measured. These provide useful context in interpreting and using the data we download. First we take a quick look at the readme file.

In [17]:
#Obtain URLs of the readme text file and the variables CSV file
for file in data_json['data']['files']:
    if('readme' in file['name']):
        readme_url = file['url']
    elif('variables' in file['name']):
        variable_url = file['url']
        
readme_req = requests.get(readme_url)

In [18]:
#Print contents of text file
print(readme_req.text)

This data package been produced by and downloaded from the National Ecological Observatory Network, managed cooperatively by Battelle. These data are provided under the terms of the NEON data policy at http://data.neonscience.org/data-policy. 

DATA PRODUCT INFORMATION
------------------------

ID: NEON.DOM.SITE.DP1.00041.001

Name: Soil temperature

Description: Temperature of the soil at various depth below the soil surface from 2 cm up to 200 cm at non-permafrost sites (up to 300 cm at Alaskan sites). Data are from all five Instrumented Soil Plots per site and presented as 1-minute and 30-minute averages.

NEON Science Team Supplier: TIS

Abstract: Soil temperature is measured at various depths below the soil surface from approximately 2 cm up to 200 cm at non-permafrost sites (up to 300 cm at Alaskan sites). Soil temperature influences the rate of biogeochemical cycling, decomposition, and root and soil biota activity. In addition, soil temperature can impact the hydro
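The loop that collected `readme_url` and `variable_url` keeps the last matching file and leaves the variable undefined if nothing matches. A slightly more defensive lookup might look like the sketch below. The function name `find_file_url` is hypothetical, and the URLs in the sample list are placeholders, not real NEON download links.

```python
def find_file_url(files, keyword):
    """Return the URL of the first file whose name contains keyword."""
    for file in files:
        if keyword in file['name']:
            return file['url']
    raise ValueError('No file name contains ' + repr(keyword))


# Hypothetical entries: names follow the listing above, URLs are placeholders
sample_files = [
    {'name': 'NEON.D09.WOOD.DP1.00041.001.readme.20190320T153550Z.txt',
     'url': 'https://example.org/readme.txt'},
    {'name': 'NEON.D09.WOOD.DP1.00041.001.variables.20190320T153550Z.csv',
     'url': 'https://example.org/variables.csv'},
]

readme_url_sample = find_file_url(sample_files, 'readme')
print(readme_url_sample)
```

Raising an error when no file matches makes a missing readme or variables file visible immediately, instead of surfacing later as an undefined variable.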

Next let's look at the 'variables' CSV file listed above. This contains a table with a row for every variable in the basic and expanded data CSVs, and columns describing each variable.

In [21]:
#Read variables csv into pandas dataframe
df_variables = pd.read_csv(variable_url)


#Print rows for variables in a 1-minute-average table in basic download package
df_variables[(df_variables['table'] == 'ST_1_minute')&(df_variables['downloadPkg'] == 'basic')]

| | table | fieldName | description | dataType | units | downloadPkg |
|---|---|---|---|---|---|---|
| 34 | ST_1_minute | startDateTime | Date and time at which a sampling is initiated | dateTime | | basic |
| 35 | ST_1_minute | endDateTime | Date and time at which a sampling is completed | dateTime | | basic |
| 36 | ST_1_minute | soilTempMean | Arithmetic mean of Soil Temperature | real | celsius | basic |
| 37 | ST_1_minute | soilTempMinimum | Minimum Soil Temperature | real | celsius | basic |
| 38 | ST_1_minute | soilTempMaximum | Maximum Soil Temperature | real | celsius | basic |
| 39 | ST_1_minute | soilTempVariance | Variance in Soil Temperature | real | celsiusSquared | basic |
| 40 | ST_1_minute | soilTempNumPts | Number of points used to calculate the arithme... | real | number | basic |
| 41 | ST_1_minute | soilTempExpUncert | Expanded uncertainty for Soil Temperature | real | celsius | basic |
| 42 | ST_1_minute | soilTempStdErMean | Standard error of the mean for Soil Temperature | real | celsius | basic |
| 66 | ST_1_minute | finalQF | Quality flag indicating whether a data product... | unsigned integer | | basic |
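One way to use the variables table is to build a lookup from field name to units. The sketch below works on a small hand-built DataFrame rather than the downloaded file. The first three rows are copied from the table above, while the `ST_30_minute` row and the name `units_by_field` are assumptions added for illustration.

```python
import pandas as pd

# Hand-built sample; the ST_30_minute row is assumed, not taken from the output above
df_variables_sample = pd.DataFrame({
    'table': ['ST_1_minute', 'ST_1_minute', 'ST_1_minute', 'ST_30_minute'],
    'fieldName': ['soilTempMean', 'soilTempVariance', 'finalQF', 'soilTempMean'],
    'units': ['celsius', 'celsiusSquared', None, 'celsius'],
    'downloadPkg': ['basic', 'basic', 'basic', 'basic'],
})

# Same boolean filter as in the cell above
mask = ((df_variables_sample['table'] == 'ST_1_minute')
        & (df_variables_sample['downloadPkg'] == 'basic'))

# Map each field name to its units, dropping fields that have no units
units_by_field = (df_variables_sample[mask]
                  .set_index('fieldName')['units']
                  .dropna()
                  .to_dict())
print(units_by_field)
```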


## Downloading the Recorded Data

Now that we have context for each variable, let's read in the CSV file whose name we examined earlier. We will again use the Pandas library.

In [22]:
#Read in file
df_soil_1min = pd.read_csv(data_json['data']['files'][7]['url'])

#Display dimensions:
print('Number of columns: ',df_soil_1min.shape[1])
print('Number of rows: ', df_soil_1min.shape[0])

Number of columns:  10
Number of rows:  44640
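The row count is what we would expect if the one-minute table has one row per minute for all of August 2018 with no gaps. A quick arithmetic check, using only the standard library:

```python
import calendar

year, month = 2018, 8

# monthrange returns (weekday of first day, number of days in the month)
days_in_month = calendar.monthrange(year, month)[1]

# One row per minute: days * hours per day * minutes per hour
expected_rows = days_in_month * 24 * 60
print(expected_rows)  # 44640
```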


In [23]:
#Display names and types of columns
df_soil_1min.dtypes

startDateTime         object
endDateTime           object
soilTempMean         float64
soilTempMinimum      float64
soilTempMaximum      float64
soilTempVariance     float64
soilTempNumPts         int64
soilTempExpUncert    float64
soilTempStdErMean    float64
finalQF                int64
dtype: object
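The two date-time columns are read in as plain strings (`object`). If time-based operations are needed, they can be converted with `pd.to_datetime`. The sketch below uses a two-row sample in the same timestamp format as the data file, and the name `df_sample` is made up for illustration.

```python
import pandas as pd

# Two-row sample in the same timestamp format as the data file
df_sample = pd.DataFrame({
    'startDateTime': ['2018-08-01T00:00:00Z', '2018-08-01T00:01:00Z'],
    'soilTempMean': [21.2, 21.194],
})

# Parse the ISO 8601 strings into timezone-aware (UTC) timestamps
df_sample['startDateTime'] = pd.to_datetime(df_sample['startDateTime'], utc=True)

# The difference between consecutive timestamps is the averaging interval
step = df_sample['startDateTime'].iloc[1] - df_sample['startDateTime'].iloc[0]
print(step)
```

The same call could be applied to `df_soil_1min['startDateTime']` and `df_soil_1min['endDateTime']`.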

In [24]:
#Print first ten rows of data
df_soil_1min.head(10)

| | startDateTime | endDateTime | soilTempMean | soilTempMinimum | soilTempMaximum | soilTempVariance | soilTempNumPts | soilTempExpUncert | soilTempStdErMean | finalQF |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-08-01T00:00:00Z | 2018-08-01T00:01:00Z | 21.2 | 21.199 | 21.203 | 3e-06 | 6 | 0.0327 | 0.000643 | 0 |
| 1 | 2018-08-01T00:01:00Z | 2018-08-01T00:02:00Z | 21.194 | 21.191 | 21.198 | 6e-06 | 6 | 0.03274 | 0.000974 | 0 |
| 2 | 2018-08-01T00:02:00Z | 2018-08-01T00:03:00Z | 21.189 | 21.187 | 21.192 | 6e-06 | 6 | 0.03274 | 0.000965 | 0 |
| 3 | 2018-08-01T00:03:00Z | 2018-08-01T00:04:00Z | 21.183 | 21.18 | 21.185 | 4e-06 | 6 | 0.03272 | 0.000799 | 0 |
| 4 | 2018-08-01T00:04:00Z | 2018-08-01T00:05:00Z | 21.178 | 21.175 | 21.18 | 4e-06 | 6 | 0.03272 | 0.000774 | 0 |
| 5 | 2018-08-01T00:05:00Z | 2018-08-01T00:06:00Z | 21.173 | 21.171 | 21.175 | 3e-06 | 6 | 0.0327 | 0.000642 | 0 |
| 6 | 2018-08-01T00:06:00Z | 2018-08-01T00:07:00Z | 21.168 | 21.166 | 21.17 | 3e-06 | 6 | 0.03271 | 0.000713 | 0 |
| 7 | 2018-08-01T00:07:00Z | 2018-08-01T00:08:00Z | 21.162 | 21.16 | 21.164 | 3e-06 | 6 | 0.03271 | 0.000695 | 0 |
| 8 | 2018-08-01T00:08:00Z | 2018-08-01T00:09:00Z | 21.157 | 21.154 | 21.161 | 5e-06 | 6 | 0.03273 | 0.000918 | 0 |
| 9 | 2018-08-01T00:09:00Z | 2018-08-01T00:10:00Z | 21.152 | 21.149 | 21.154 | 3e-06 | 6 | 0.03271 | 0.000757 | 0 |
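Before analysis, it is common to keep only rows that passed quality checks. The sketch below assumes NEON's usual convention that `finalQF` is 0 for a pass and 1 for a fail. The first three rows are copied from the output above, and the last row is made up to show a flagged record.

```python
import pandas as pd

df_qf_sample = pd.DataFrame({
    'soilTempMean': [21.2, 21.194, 21.189, 99.0],  # last value is made up
    'finalQF': [0, 0, 0, 1],                       # last flag is made up
})

# Keep only rows whose final quality flag is 0 (assumed to mean "pass")
df_good = df_qf_sample[df_qf_sample['finalQF'] == 0]
print(len(df_good), 'of', len(df_qf_sample), 'rows kept')
```

The same filter could be applied to `df_soil_1min`.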
