---
syncID: 
title: "Introduction to NEON API in Python"
description: "Use the NEON API in Python, via requests package and json package."
dateCreated: 2020-04-24
authors: Maxwell J. Burner
contributors: Donal O'Leary
estimatedTime:
packagesLibraries: requests, json, pandas
topics: api
languagesTool: python
dataProduct: DP1.00041.001
code1: 
tutorialSeries: python-neon-api-series
urlTitle: neon-api-03-instrumentation-data
---

In this tutorial we will learn about reading in NEON instrument systems (IS) data using Python.

<div id="ds-objectives" markdown="1">

### Objectives
After completing this tutorial, you will be able to:

* Understand naming conventions of NEON IS Data
* Navigate NEON API data on availability of IS data product files
* Download files with context for interpreting IS data
* Download NEON IS data using the Python Pandas package

### Install Python Packages

* **requests**
* **json** 
* **pandas**



</div>

In this tutorial we will learn how to download IS data from the NEON portal using Python. We will cover how to get necessary metadata and context, and how to download the data itself into Pandas dataframes.

In the previous tutorial, we saw an example of how to download NEON data through the NEON API. Specifically, we saw how to query and download observational sampling (OS) data, which is gathered directly by NEON field scientists. NEON also collects data with automated instruments; this is instrument system (IS) data. IS data files tend to follow different naming and labeling conventions than OS data files.

## Request Data Availability using NEON API

In [None]:
import requests
import json
import pandas as pd

Let's get soil temperature data from NEON's Woodworth site. Soil temperature data is measured and recorded automatically by soil temperature probes.

In [None]:
SERVER = 'http://data.neonscience.org/api/v0/'
SITECODE = 'WOOD'
PRODUCTCODE = 'DP1.00041.001'

In [None]:
#Get availability
site_request = requests.get(SERVER+'sites/'+SITECODE)
site_json = site_request.json()

for product in site_json['data']['dataProducts']:
    if(product['dataProductCode'] == PRODUCTCODE):
        print(product['availableMonths'])
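
The lookup above can also be wrapped in a small helper so that a missing product yields an empty list instead of printing nothing. This is only a sketch: the name `available_months` and the `sample_site_json` dictionary are hypothetical, and the dictionary is a hand-made stand-in that mirrors just the keys used above, not a real API response.

```python
def available_months(site_json, product_code):
    """Return the available months for one product at a site, or [] if the product is absent."""
    for product in site_json['data']['dataProducts']:
        if product['dataProductCode'] == product_code:
            return product['availableMonths']
    return []


# Hand-made stand-in for a sites/ response (assumption: real responses carry many more fields).
sample_site_json = {
    'data': {
        'dataProducts': [
            {'dataProductCode': 'DP1.00041.001',
             'availableMonths': ['2018-07', '2018-08']},
        ]
    }
}

months = available_months(sample_site_json, 'DP1.00041.001')
print('2018-08' in months)
```

With a live response, the same call would be `available_months(site_json, PRODUCTCODE)`.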

Because this dataset is collected by automated instruments, it has been available, mostly continuously, since the site was established. Let's get the first 20 data file names available for August 2018.

In [None]:
#Request available files
data_request = requests.get(SERVER+'data/'+PRODUCTCODE+'/'+SITECODE+'/'+'2018-08')
data_json = data_request.json()
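
The request URL above is assembled by string concatenation. As a minimal sketch, the same pattern can be placed in a helper that also rejects a malformed month before any request is sent. The function name `data_url` and the `YYYY-MM` check are assumptions made for illustration; they are not part of the NEON API.

```python
import re

SERVER = 'http://data.neonscience.org/api/v0/'


def data_url(product_code, site_code, month, server=SERVER):
    """Build the data endpoint URL for one product, site, and month (YYYY-MM)."""
    if not re.fullmatch(r'\d{4}-\d{2}', month):
        raise ValueError('month must look like YYYY-MM, got ' + repr(month))
    return server.rstrip('/') + '/data/' + '/'.join([product_code, site_code, month])


url = data_url('DP1.00041.001', 'WOOD', '2018-08')
print(url)
```

The resulting string can be passed to `requests.get` exactly as in the cell above.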

In [None]:
for file in data_json['data']['files'][0:20]:
    print(file['name'])

Let's break down the name of one of these files.

In [None]:
print(data_json['data']['files'][7]['name'])

The format for naming instrumentation data files, specifically soil temperature measurements, is:

NEON.D[Domain Number].[site code].[data product ID].[soil plot number].[depth].[averaging interval].[data table name].[year]-[month].[data package type].[date of file creation].[file extension]

So this file holds Domain 09, Woodworth (WOOD) soil temperature IS data (DP1.00041.001), collected in soil plot 001 at depth 508 and averaged over 30-minute intervals, for August 2018, in the basic data package, created 2019-03-20 at 15:35:55. As with observational data packages, instrumentation data can be downloaded in a basic or an expanded package; the 2018-08.basic portion of the name marks this file as part of the basic package for August 2018.
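
The naming pattern above can be split into labeled fields with a few lines of Python. This is a sketch under stated assumptions: `parse_is_file_name` is a hypothetical helper, and `example_name` is an illustrative name assembled from the fields described above (the exact data table name and timestamp format are assumed), not a name copied from the API output.

```python
def parse_is_file_name(name):
    """Split a soil temperature IS file name into the fields of the naming pattern."""
    parts = name.split('.')
    # The data product ID itself contains two dots, so a full name has 14 parts.
    if len(parts) != 14 or parts[0] != 'NEON':
        raise ValueError('unexpected file name layout: ' + name)
    return {
        'domain': parts[1],
        'site': parts[2],
        'product': '.'.join(parts[3:6]),
        'plot': parts[6],
        'depth': parts[7],
        'interval': parts[8],
        'table': parts[9],
        'month': parts[10],
        'package': parts[11],
        'created': parts[12],
        'extension': parts[13],
    }


# Illustrative name built from the description above (assumed, not taken from the API).
example_name = ('NEON.D09.WOOD.DP1.00041.001.001.508.030.'
                'ST_30_minute.2018-08.basic.20190320T153555Z.csv')
fields = parse_is_file_name(example_name)
for key, value in fields.items():
    print(key, value)
```

Files that do not follow this layout, such as the readme and variables files discussed in the next section, would raise a `ValueError` here.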

## Getting Context

Not all of the files listed in this request are CSV files with recorded data. Some store other information, in tables or in text, used to provide context to the data.

In [None]:
#View names of files that don't contain recorded sensor data
for file in data_json['data']['files']:
    if(not ('basic' in file['name'])):
        if(not ('expanded' in file['name'])): #Avoid csv files of basic or expanded data
            print(file['name'])
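
The nested conditions above can also be written as a single filter. The sketch below assumes, as the loop above does, that data tables are recognizable by 'basic' or 'expanded' in their names; `context_files` is a hypothetical helper and the file names in `sample_files` are shortened, made-up examples.

```python
def context_files(files):
    """Return names of files that belong to neither the basic nor the expanded data tables."""
    data_markers = ('basic', 'expanded')
    return [f['name'] for f in files
            if not any(marker in f['name'] for marker in data_markers)]


# Shortened, made-up file entries used only to exercise the filter.
sample_files = [
    {'name': 'NEON.D09.WOOD.DP1.00041.001.001.508.030.ST_30_minute.2018-08.basic.csv'},
    {'name': 'NEON.D09.WOOD.DP1.00041.001.001.508.001.ST_1_minute.2018-08.expanded.csv'},
    {'name': 'NEON.D09.WOOD.DP1.00041.001.readme.txt'},
    {'name': 'NEON.D09.WOOD.DP1.00041.001.variables.csv'},
]

print(context_files(sample_files))
```

On the live response this would be called as `context_files(data_json['data']['files'])`.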

These files include "read me" text files, and files with descriptions of the variables being measured. These provide useful context in interpreting and using the data we download. First we take a quick look at the readme file.

In [None]:
#Obtain URLs of the readme text file and the variables CSV file
for file in data_json['data']['files']:
    if('readme' in file['name']):
        readme_url = file['url']
    elif('variables' in file['name']):
        variable_url = file['url']
        
readme_req = requests.get(readme_url)

In [None]:
#Print contents of text file
print(readme_req.text)

Next let's look at the 'variables' CSV file listed above. As with observational data products, this contains a table with a row for every variable in the basic and expanded data CSVs, and columns containing various information about each variable.

In [None]:
#Read variables csv into pandas dataframe
df_variables = pd.read_csv(variable_url)


#Filter and show rows for variables in a 1-minute-average table and basic download package
df_variables[(df_variables['table'] == 'ST_1_minute')&(df_variables['downloadPkg'] == 'basic')]

## Downloading the Instrument Data

Now that we have context for each variable, let's read in a CSV file for 1-minute-average soil temperature at WOOD in August 2018. We will again use the Pandas library to enable the use of dataframe objects.

In [None]:
#Check file name and read in file to a data frame
print(data_json['data']['files'][6]['name'])
df_soil_1min = pd.read_csv(data_json['data']['files'][6]['url'])

#Display dimensions:
print('Number of columns: ',df_soil_1min.shape[1])
print('Number of rows: ', df_soil_1min.shape[0])
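
The cell above picks the file by its position in the list, which depends on the order in which the API happens to return files. A more defensive sketch selects the file by fragments of its name instead. Here `pick_file` is a hypothetical helper, and the names and `example.org` URLs in `sample_files` are placeholders rather than real NEON entries.

```python
def pick_file(files, *fragments):
    """Return the first file entry whose name contains every fragment, else raise LookupError."""
    for entry in files:
        if all(fragment in entry['name'] for fragment in fragments):
            return entry
    raise LookupError('no file name contains all of: ' + ', '.join(fragments))


# Placeholder entries (made-up names and URLs) standing in for data_json['data']['files'].
sample_files = [
    {'name': 'NEON.D09.WOOD.DP1.00041.001.001.508.030.ST_30_minute.2018-08.basic.csv',
     'url': 'https://example.org/thirty.csv'},
    {'name': 'NEON.D09.WOOD.DP1.00041.001.001.508.001.ST_1_minute.2018-08.basic.csv',
     'url': 'https://example.org/one.csv'},
]

chosen = pick_file(sample_files, 'ST_1_minute', 'basic')
print(chosen['name'])
```

Because a month usually contains one such file per plot and depth, adding further fragments (for example the plot and depth codes) narrows the match to a single sensor. The returned entry's `'url'` value can then be passed to `pd.read_csv`.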

In [None]:
#Display names and types of columns
df_soil_1min.dtypes

Note that many of the columns are aggregated sample statistics, such as mean, minimum, and maximum. We aren't getting every single recorded soil temperature; as the table name suggests, we are getting summary statistics aggregated over one-minute periods. The first row holds the mean, minimum, and maximum soil temperature for the first minute of recording (bounded by the start and end date-time variables), the second row summarizes the second minute, and so forth. Downloading other files from the available data would provide values aggregated over different time intervals, such as the 30-minute tables.
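
To make the idea of per-minute aggregation concrete, the sketch below groups a handful of raw readings by minute and computes the same kind of summary statistics. The readings are made up for illustration and are not NEON data, and this is not a description of how NEON itself processes sensor output.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Made-up raw readings: (timestamp, temperature). Not NEON data.
raw = [
    (datetime(2018, 8, 1, 0, 0, 10), 20.0),
    (datetime(2018, 8, 1, 0, 0, 40), 20.4),
    (datetime(2018, 8, 1, 0, 1, 10), 20.6),
    (datetime(2018, 8, 1, 0, 1, 40), 21.0),
]

# Group readings by the minute in which they were taken.
by_minute = defaultdict(list)
for timestamp, value in raw:
    by_minute[timestamp.replace(second=0, microsecond=0)].append(value)

# One summary row per minute, in chronological order.
summary = [
    {'start': start, 'mean': mean(values), 'min': min(values), 'max': max(values)}
    for start, values in sorted(by_minute.items())
]
for row in summary:
    print(row)
```

Four raw readings collapse into two summary rows, one per minute, which is the shape of the 1-minute table loaded above.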

In [None]:
#Print first ten rows of data
df_soil_1min.head(10)

Now we can manipulate the data using Pandas and other Python libraries.
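
As one example of such manipulation, the sketch below parses the date-time columns and averages the 1-minute means into hourly means. It runs on a tiny made-up table rather than the downloaded file, and the column names `startDateTime`, `endDateTime`, and `soilTempMean` are assumptions; check them against the output of `df_soil_1min.dtypes` before applying the same steps to the real dataframe.

```python
import io

import pandas as pd

# Tiny made-up table; column names are assumed, values are not NEON data.
csv_text = """startDateTime,endDateTime,soilTempMean
2018-08-01T00:00:00Z,2018-08-01T00:01:00Z,20.0
2018-08-01T00:01:00Z,2018-08-01T00:02:00Z,20.2
2018-08-01T01:00:00Z,2018-08-01T01:01:00Z,19.0
2018-08-01T01:01:00Z,2018-08-01T01:02:00Z,19.4
"""

# Parse the date-time columns while reading, then index rows by interval start.
df_demo = pd.read_csv(io.StringIO(csv_text),
                      parse_dates=['startDateTime', 'endDateTime'])
hourly_mean = (df_demo.set_index('startDateTime')['soilTempMean']
               .resample('60min')
               .mean())
print(hourly_mean)
```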