# Introduction to working with NEON eddy flux data

This data tutorial provides an introduction to working with NEON eddy 
flux data, using the `neonUtilities` R package. If you are new to NEON 
data, we recommend starting with a more general tutorial, such as the 
<a href="https://www.neonscience.org/neonDataStackR" target="_blank">neonUtilities tutorial</a> 
or the <a href="https://www.neonscience.org/download-explore-neon-data" target="_blank">Download and Explore tutorial</a>. 
Some of the functions and techniques described in those tutorials will 
be used here, as well as functions and data formats that are unique to 
the eddy flux system.

This tutorial assumes general familiarity with eddy flux data and 
associated concepts.

## 1. Setup

Start by installing and loading packages and setting options.

In [None]:
devtools::install_github('NEONScience/NEON-utilities/neonUtilities')

In [1]:
# Keep strings as character vectors and suppress warnings for this tutorial
options(stringsAsFactors=FALSE, warn=-1)

library(neonUtilities)

Use the `zipsByProduct()` function from the `neonUtilities` package to 
download flux data from two sites and two months. The transformations 
and functions below will work on any time range and site(s), but two 
sites and two months are enough to show all the available functionality 
while keeping the download small.

Inputs to the `zipsByProduct()` function:

* dpID: DP4.00200.001, the bundled eddy covariance product
* package: basic (the expanded package is not covered in this tutorial)
* site: NIWO = Niwot Ridge and HARV = Harvard Forest
* startdate: 2018-06 (both dates are inclusive)
* enddate: 2018-07 (both dates are inclusive)
* savepath: modify this to something logical on your machine
* check.size: TRUE if you want to see the file size before downloading, otherwise FALSE

The download may take a while, especially if you're on a slow network.

In [None]:
# Change savepath to a folder on your machine
zipsByProduct(dpID="DP4.00200.001", package="basic", 
              site=c("NIWO", "HARV"), 
              startdate="2018-06", enddate="2018-07",
              savepath="/Users/clunch/Desktop", 
              check.size=FALSE)

## 2. Data Levels

There are five levels of data contained in the eddy flux bundle. For full 
details, refer to the <a href="http://data.neonscience.org/api/v0/documents/NEON.DOC.004571vA" target="_blank">NEON algorithm document</a>.

Briefly, the data levels are:

* Level 0' (dp0p): Calibrated raw observations
* Level 1 (dp01): Time-aggregated observations, e.g. 30-minute mean gas concentrations
* Level 2 (dp02): Time-interpolated data, e.g. rate of change of a gas concentration
* Level 3 (dp03): Spatially interpolated data, i.e. vertical profiles
* Level 4 (dp04): Fluxes

The dp0p data are available in the expanded data package and are beyond 
the scope of this tutorial.

The dp02 and dp03 data are used in storage calculations, and the dp04 data 
include both the storage and turbulent components. Since many users will 
want to focus on the net flux data, we'll start there.

## 3. Extract Level 4 data (Fluxes!)

To extract the Level 4 data from the HDF5 files and merge them into a 
single table, we'll use the `stackEddy()` function from the `neonUtilities` 
package.

`stackEddy()` requires two inputs:

* `filepath`: Path to a file or folder, which can be any one of:
    1. A zip file of eddy flux data downloaded from the NEON data portal
    2. A folder of eddy flux data downloaded by the `zipsByProduct()` function
    3. The folder of files resulting from unzipping either of 1 or 2
    4. A single HDF5 file of NEON eddy flux data
* `level`: The data level to extract, one of `dp01`, `dp02`, `dp03`, or `dp04`

Pass the folder that `zipsByProduct()` downloaded to earlier as `filepath`, 
and `dp04` as `level`:

In [2]:
flux <- stackEddy(filepath="/Users/clunch/Desktop/filesToStack00200/",
                 level="dp04")

Extracting data
Stacking data tables by month
Joining data variables
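
The `filepath` used above is the `savepath` from the download step plus a `filesToStack00200` folder. The sketch below rebuilds that path from the download inputs so the two steps stay in sync. It assumes the folder name is always `filesToStack` followed by the five-digit product number from the `dpID`, which is inferred from the paths in this tutorial rather than from the package documentation; `productNum` and `stackPath` are names made up for illustration.

```r
# Inputs copied from the zipsByProduct() call above
savepath <- "/Users/clunch/Desktop"
dpID <- "DP4.00200.001"

# Assumption: the download folder is "filesToStack" plus the five-digit
# product number in the middle of the dpID (characters 5 to 9)
productNum <- substr(dpID, 5, 9)
stackPath <- file.path(savepath, paste0("filesToStack", productNum))

# Show the path, then check that the folder exists before stacking
stackPath
dir.exists(stackPath)
```

If `dir.exists()` returns `FALSE`, the download most likely went to a different `savepath` than the one given here.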


We now have an object called `flux`; it is a named list containing four 
tables: one for each site's data, and `variables` and `objDesc` tables:

In [3]:
names(flux)

Let's look at the contents of one of the site data tables:

In [7]:
head(flux$NIWO)

| timeBgn | timeEnd | data.fluxCo2.nsae.flux | data.fluxCo2.stor.flux | data.fluxCo2.turb.flux | data.fluxH2o.nsae.flux | data.fluxH2o.stor.flux | data.fluxH2o.turb.flux | data.fluxMome.turb.veloFric | data.fluxTemp.nsae.flux | ⋯ | data.foot.stat.veloFric | data.foot.stat.distZaxsMeasDisp | data.foot.stat.distZaxsRgh | data.foot.stat.distZaxsAbl | data.foot.stat.distXaxs90 | data.foot.stat.distXaxsMax | data.foot.stat.distYaxs90 | qfqm.fluxCo2.stor.qfFinl | qfqm.fluxH2o.stor.qfFinl | qfqm.fluxTemp.stor.qfFinl |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-06-01T00:00:00.000Z | 2018-06-01T00:29:59.000Z | 0.1111935 | -0.06191186 | 0.1731053 | 19.401824 | 3.2511265 | 16.150697 | 0.19707045 | 4.1712006 | ⋯ | 0.2 | 8.34 | 0.03221479 | 1000 | 333.6 | 133.44 | 25.02 | 1 | 1 | 0 |
| 2018-06-01T00:30:00.000Z | 2018-06-01T00:59:59.000Z | 0.9328922 | 0.08534117 | 0.847551 | 10.444936 | -1.1768333 | 11.62177 | 0.19699723 | -0.9163691 | ⋯ | 0.2 | 8.34 | 0.33007082 | 1000 | 258.54 | 108.42 | 50.04 | 1 | 1 | 0 |
| 2018-06-01T01:00:00.000Z | 2018-06-01T01:29:59.000Z | 0.4673682 | 0.02177216 | 0.445596 | 5.140617 | -4.3112673 | 9.451884 | 0.06518208 | -2.9814957 | ⋯ | 0.2 | 8.34 | 0.12876068 | 1000 | 308.58 | 125.1 | 58.38 | 1 | 1 | 0 |
| 2018-06-01T01:30:00.000Z | 2018-06-01T01:59:59.000Z | 0.7263614 | 0.24944366 | 0.4769178 | 9.017467 | 0.1980776 | 8.819389 | 0.12964 | -13.3556222 | ⋯ | 0.2 | 8.34 | 0.834 | 1000 | 208.5 | 83.4 | 75.06 | 1 | 1 | 0 |
| 2018-06-01T02:00:00.000Z | 2018-06-01T02:29:59.000Z | 0.4740572 | 0.22524363 | 0.2488136 | 3.180386 | 0.1316297 | 3.048756 | 0.17460706 | -5.3406503 | ⋯ | 0.2 | 8.34 | 0.834 | 1000 | 208.5 | 83.4 | 66.72 | 1 | 1 | 0 |
| 2018-06-01T02:30:00.000Z | 2018-06-01T02:59:59.000Z | 0.8807022 | 0.07078007 | 0.8099221 | 4.398761 | -0.2989443 | 4.697706 | 0.1047797 | -7.2739206 | ⋯ | 0.2 | 8.34 | 0.834 | 1000 | 208.5 | 83.4 | 41.7 | 1 | 1 | 0 |
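
As noted in the Data Levels section, the dp04 data include both the storage and turbulent components. A quick way to see how the three CO2 flux columns relate is to check that the net flux (`nsae`) matches the sum of `stor` and `turb`. The sketch below does this for the first three NIWO rows, with values copied by hand from the output above. The `niwo`, `resid`, and `closes` names and the `1e-6` tolerance are illustrative choices, and the check covers only these rows.

```r
# First three NIWO rows, copied from the head(flux$NIWO) output above
niwo <- data.frame(
  data.fluxCo2.nsae.flux = c(0.1111935, 0.9328922, 0.4673682),
  data.fluxCo2.stor.flux = c(-0.06191186, 0.08534117, 0.02177216),
  data.fluxCo2.turb.flux = c(0.1731053, 0.847551, 0.445596)
)

# Difference between the reported net flux and the sum of its components
niwo$resid <- niwo$data.fluxCo2.nsae.flux -
  (niwo$data.fluxCo2.stor.flux + niwo$data.fluxCo2.turb.flux)

# The printed values are rounded, so compare against a small tolerance
closes <- all(abs(niwo$resid) < 1e-6)
closes
```

With the full download in hand, the same comparison can be run on `flux$NIWO` directly instead of on hand-copied values.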


The `variables` and `objDesc` tables can help you interpret the column 
headers in the data table. Let's look at the `variables` table:

In [8]:
flux$variables

| category | system | variable | stat | units |
|---|---|---|---|---|
| data | fluxCo2 | nsae | | umolCo2 m-2 s-1 |
| data | fluxCo2 | stor | | umolCo2 m-2 s-1 |
| data | fluxCo2 | turb | | umolCo2 m-2 s-1 |
| data | fluxH2o | nsae | | W m-2 |
| data | fluxH2o | stor | | W m-2 |
| data | fluxH2o | turb | | W m-2 |
| data | fluxMome | turb | | m s-1 |
| data | fluxTemp | nsae | | W m-2 |
| data | fluxTemp | stor | | W m-2 |
| data | fluxTemp | turb | | W m-2 |
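
To connect a column header in the data table to its units, one option is to split the column name on `.` and match the pieces against the `variables` table. The sketch below assumes that the first three parts of a column name correspond to `category`, `system`, and `variable`, which is consistent with the headers shown in this tutorial but is not a documented guarantee. `unitsFor()` is a hypothetical helper, not a `neonUtilities` function, and the small `variables` data frame is copied by hand from the output above so the example runs without the download.

```r
# Rows copied from the flux$variables output above (CO2 and momentum only)
variables <- data.frame(
  category = c("data", "data", "data", "data"),
  system   = c("fluxCo2", "fluxCo2", "fluxCo2", "fluxMome"),
  variable = c("nsae", "stor", "turb", "turb"),
  units    = c("umolCo2 m-2 s-1", "umolCo2 m-2 s-1",
               "umolCo2 m-2 s-1", "m s-1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the units for one data-table column name
unitsFor <- function(colName, variables) {
  # Assumption: names follow category.system.variable.<final term>
  parts <- strsplit(colName, ".", fixed = TRUE)[[1]]
  hit <- variables$category == parts[1] &
    variables$system == parts[2] &
    variables$variable == parts[3]
  variables$units[hit]
}

unitsFor("data.fluxCo2.nsae.flux", variables)
unitsFor("data.fluxMome.turb.veloFric", variables)
```

With the stacked data loaded, passing `flux$variables` as the second argument should work the same way, provided the table has these column names.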
