# Overview 

This Jupyter notebook example shows how to use tools from the `flux-data-qaqc` Python package to produce "corrected" energy balance closure ratios by adjusting turbulent energy fluxes, and to conduct other Qa/Qc routines typically needed for eddy covariance climate station data. The data are a daily time series from the FLUXNET 2015 dataset for site *US-AR1*. They are provided with the software package and can be downloaded [here](https://github.com/Open-ET/flux-data-qaqc/blob/master/examples/Basic_usage/FLX_US-AR1_FLUXNET2015_SUBSET_DD_2009-2012_1-3.xlsx).

## What you need to do first

This example can be reproduced by downloading `flux-data-qaqc` from GitHub with git

```bash
git clone https://github.com/Open-ET/flux-data-qaqc.git
```

or by downloading a compressed folder from [GitHub](https://github.com/Open-ET/flux-data-qaqc). Dependencies can be handled either by installing and activating the [provided Conda environment](https://raw.githubusercontent.com/Open-ET/flux-data-qaqc/master/environment.yml) or by using pip. To create the Conda environment, run:

```bash
conda env create -f environment.yml
```

To activate the environment before using the `flux-data-qaqc` package, run:

```bash
conda activate fluxdataqaqc
```

Run the following to install `flux-data-qaqc` in developer (editable) mode. The package will soon be uploaded to PyPI.

```bash
cd flux-data-qaqc
pip install -e .
```

Now you should be able to import `flux-data-qaqc` objects and modules within Python. Test that everything installed correctly by opening a Python interpreter or IDE and running the following:

```Python
>>> import fluxdataqaqc
```
or 
```Python
>>> from fluxdataqaqc import Data, QaQc, Plot
```

**Note:** the software does not currently include a command line interface, so you must use it from Python, specifically Python 3.5 or newer. However, a basic workflow needs only a few (5-10) lines of code, and you can simply follow the templates given here.

In [1]:
%load_ext autoreload
%autoreload 2
from fluxdataqaqc import Data, QaQc, Plot
from bokeh.plotting import figure, show
from bokeh.models.formatters import DatetimeTickFormatter
from bokeh.models import LinearAxis, Range1d
from bokeh.io import output_notebook
output_notebook()

## Create a `Data` object to read in time series data using a config file

In [2]:
config_path = 'fluxnet_config.ini'
d = Data(config_path)

In [3]:
# access all items in the METADATA section of the config file as a list of (key, value) pairs
d.config.items('METADATA') # the DATA section can be accessed the same way

[('climate_file_path', 'FLX_US-AR1_FLUXNET2015_SUBSET_DD_2009-2012_1-3.xlsx'),
 ('site_id', 'US-AR1'),
 ('station_latitude', '36.4267'),
 ('station_longitude', '-99.42'),
 ('station_elevation', '611'),
 ('anemometer_height', '3'),
 ('missing_data_value', '-9999')]

In [4]:
# or get a single value by section and option name
d.config.get('METADATA','station_elevation')

'611'
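The `items` and `get` calls above match the API of Python's standard `configparser.ConfigParser`, so `d.config` appears to be one (an assumption based on those calls alone). The minimal stdlib sketch below uses an INI string that mirrors the METADATA items printed above. It shows why every value comes back as a string and how to convert numeric values:

```python
import configparser

# Hypothetical INI text mirroring the METADATA items printed above;
# the real config file also has a DATA section
CONFIG_TEXT = """
[METADATA]
climate_file_path = FLX_US-AR1_FLUXNET2015_SUBSET_DD_2009-2012_1-3.xlsx
site_id = US-AR1
station_latitude = 36.4267
station_longitude = -99.42
station_elevation = 611
anemometer_height = 3
missing_data_value = -9999
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

# items(section) returns (option, value) pairs; every value is a string
metadata = dict(config.items('METADATA'))

# get returns the raw string, while getfloat/getint convert it
elevation_str = config.get('METADATA', 'station_elevation')
elevation_m = config.getfloat('METADATA', 'station_elevation')
missing_value = config.getint('METADATA', 'missing_data_value')
```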

In [5]:
# path to climate time series input and config files
print(d.climate_file, '\n', d.config_file)

/home/john/flux-data-qaqc/examples/Basic_usage/FLX_US-AR1_FLUXNET2015_SUBSET_DD_2009-2012_1-3.xlsx 
 /home/john/flux-data-qaqc/examples/Basic_usage/fluxnet_config.ini


In [6]:
# view full header of input time series file
d.header

Index(['TIMESTAMP', 'TA_F', 'TA_F_QC', 'SW_IN_POT', 'SW_IN_F', 'SW_IN_F_QC',
       'LW_IN_F', 'LW_IN_F_QC', 'VPD_F', 'VPD_F_QC', 'PA_F', 'PA_F_QC', 'P_F',
       'P_F_QC', 'WS_F', 'WS_F_QC', 'USTAR', 'USTAR_QC', 'NETRAD', 'NETRAD_QC',
       'PPFD_IN', 'PPFD_IN_QC', 'PPFD_OUT', 'PPFD_OUT_QC', 'SW_OUT',
       'SW_OUT_QC', 'LW_OUT', 'LW_OUT_QC', 'CO2_F_MDS', 'CO2_F_MDS_QC',
       'TS_F_MDS_1', 'TS_F_MDS_1_QC', 'SWC_F_MDS_1', 'SWC_F_MDS_1_QC',
       'G_F_MDS', 'G_F_MDS_QC', 'LE_F_MDS', 'LE_F_MDS_QC', 'LE_CORR',
       'LE_CORR_25', 'LE_CORR_75', 'LE_RANDUNC', 'H_F_MDS', 'H_F_MDS_QC',
       'H_CORR', 'H_CORR_25', 'H_CORR_75', 'H_RANDUNC', 'NEE_VUT_REF',
       'NEE_VUT_REF_QC', 'NEE_VUT_REF_RANDUNC', 'NEE_VUT_25', 'NEE_VUT_50',
       'NEE_VUT_75', 'NEE_VUT_25_QC', 'NEE_VUT_50_QC', 'NEE_VUT_75_QC',
       'RECO_NT_VUT_REF', 'RECO_NT_VUT_25', 'RECO_NT_VUT_50', 'RECO_NT_VUT_75',
       'GPP_NT_VUT_REF', 'GPP_NT_VUT_25', 'GPP_NT_VUT_50', 'GPP_NT_VUT_75',
       'RECO_DT_VUT_REF', 'RECO_D

## View the variables and units that `flux-data-qaqc` has read using the `variables` and `units` attributes

The `variables` dictionary is updated with calculated variables in the `QaQc` object (shown below) and is also available in the `Plot` object.

In [7]:
# the keys are the variable names that are internal to flux-data-qaqc
d.variables

{'date': 'TIMESTAMP',
 'year': 'na',
 'month': 'na',
 'day': 'na',
 'Rn': 'NETRAD',
 'G': 'G_F_MDS',
 'LE': 'LE_F_MDS',
 'LE_user_corr': 'LE_CORR',
 'H': 'H_F_MDS',
 'H_user_corr': 'H_CORR',
 'sw_in': 'SW_IN_F',
 'sw_out': 'SW_OUT',
 'sw_pot': 'SW_IN_POT',
 'lw_in': 'LW_IN_F',
 'lw_out': 'LW_OUT',
 'vp': 'na',
 'vpd': 'VPD_F',
 't_avg': 'TA_F',
 'ppt': 'P_F',
 'ws': 'WS_F',
 'Rn_qc_flag': 'NETRAD_QC',
 'G_qc_flag': 'G_F_MDS_QC',
 'LE_qc_flag': 'LE_F_MDS_QC',
 'H_qc_flag': 'H_F_MDS_QC',
 'sw_in_qc_flag': 'SW_IN_F_QC',
 'sw_out_qc_flag': 'SW_OUT_QC',
 'lw_in_qc_flag': 'LW_IN_F_QC',
 'lw_out_qc_flag': 'LW_OUT_QC',
 'vpd_qc_flag': 'VPD_F_QC',
 't_avg_qc_flag': 'TA_F_QC',
 'ppt_qc_flag': 'P_F_QC',
 'ws_qc_flag': 'WS_F_QC'}

In [8]:
# similarly for variable units, keys are internal names
d.units

{'Rn': 'w/m2',
 'G': 'w/m2',
 'LE': 'w/m2',
 'LE_user_corr': 'w/m2',
 'H': 'w/m2',
 'H_user_corr': 'w/m2',
 'sw_in': 'w/m2',
 'sw_out': 'w/m2',
 'sw_pot': 'w/m2',
 'lw_in': 'w/m2',
 'lw_out': 'w/m2',
 'vp': 'na',
 'vpd': 'hPa',
 't_avg': 'C',
 'ppt': 'mm',
 'ws': 'm/s'}

# Access the date-indexed DataFrame using `.df`

* Note: if variables are named in the config file but not found in the header of the input file, they are filled with NaN (null) values in the dataframe.

In [9]:
# note all names of variables from your input file are maintained
d.df.head()

date,TA_F,TA_F_QC,SW_IN_POT,SW_IN_F,SW_IN_F_QC,LW_IN_F,LW_IN_F_QC,VPD_F,VPD_F_QC,P_F,...,LW_OUT,LW_OUT_QC,G_F_MDS,G_F_MDS_QC,LE_F_MDS,LE_F_MDS_QC,LE_CORR,H_F_MDS,H_F_MDS_QC,H_CORR
2009-01-01,2.803,0.0,186.71,123.108,0.0,261.302,0.0,1.919,0.0,0.0,...,,0.0,,1.0,67.1459,0.0,43.8414,20.3876,0.0,13.3116
2009-01-02,2.518,0.0,187.329,121.842,0.0,268.946,0.0,0.992,0.0,0.0,...,,0.0,,1.0,92.8616,0.0,60.9673,32.6505,0.0,21.4364
2009-01-03,5.518,0.0,188.008,124.241,0.0,268.004,0.0,2.795,0.0,0.0,...,,0.0,,1.0,75.8029,0.0,50.3151,20.0569,0.0,13.313
2009-01-04,-3.753,0.0,188.742,113.793,0.0,246.675,0.0,0.892,0.0,0.0,...,,0.0,,1.0,67.1459,0.0,45.0539,20.3876,0.0,13.6798
2009-01-05,-2.214,0.0,189.534,124.332,0.0,244.478,0.0,1.304,0.0,0.0,...,,0.0,,1.0,92.8616,0.0,62.6443,32.6505,0.0,22.026


In [10]:
# if you prefer the flux-data-qaqc naming scheme, rename the columns with inv_map;
# note that QC columns are also renamed, to internal names ending in '_qc_flag'
d.df.rename(columns=d.inv_map).head()

date,t_avg,t_avg_qc_flag,sw_pot,sw_in,sw_in_qc_flag,lw_in,lw_in_qc_flag,vpd,vpd_qc_flag,ppt,...,lw_out,lw_out_qc_flag,G,G_qc_flag,LE,LE_qc_flag,LE_user_corr,H,H_qc_flag,H_user_corr
2009-01-01,2.803,0.0,186.71,123.108,0.0,261.302,0.0,1.919,0.0,0.0,...,,0.0,,1.0,67.1459,0.0,43.8414,20.3876,0.0,13.3116
2009-01-02,2.518,0.0,187.329,121.842,0.0,268.946,0.0,0.992,0.0,0.0,...,,0.0,,1.0,92.8616,0.0,60.9673,32.6505,0.0,21.4364
2009-01-03,5.518,0.0,188.008,124.241,0.0,268.004,0.0,2.795,0.0,0.0,...,,0.0,,1.0,75.8029,0.0,50.3151,20.0569,0.0,13.313
2009-01-04,-3.753,0.0,188.742,113.793,0.0,246.675,0.0,0.892,0.0,0.0,...,,0.0,,1.0,67.1459,0.0,45.0539,20.3876,0.0,13.6798
2009-01-05,-2.214,0.0,189.534,124.332,0.0,244.478,0.0,1.304,0.0,0.0,...,,0.0,,1.0,92.8616,0.0,62.6443,32.6505,0.0,22.026
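Judging from the renamed columns above, `inv_map` behaves like the inverse of `d.variables` (column name to internal name). `DataFrame.rename(columns=...)` leaves any column that is not in the mapping unchanged. The sketch below reproduces the idea with plain dicts. Skipping the `'na'` placeholders is our assumption about how absent variables are handled; the package does not document this here.

```python
# A subset of the d.variables mapping shown above (internal name -> column name)
variables = {
    'date': 'TIMESTAMP',
    'year': 'na',
    'Rn': 'NETRAD',
    'LE': 'LE_F_MDS',
    'LE_user_corr': 'LE_CORR',
    'LE_qc_flag': 'LE_F_MDS_QC',
    'vp': 'na',
}

# Invert the mapping, skipping 'na' placeholders for variables not in the file
inv_map = {col: name for name, col in variables.items() if col != 'na'}

# Like DataFrame.rename(columns=inv_map): unmapped columns keep their names
columns = ['NETRAD', 'LE_F_MDS', 'LE_F_MDS_QC', 'LE_CORR', 'SWC_F_MDS_1']
renamed = [inv_map.get(c, c) for c in columns]
```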


## You can modify the dataframe, assign new columns, or even assign an entirely new dataframe within Python

In [11]:
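# arithmetic on the DataFrame applies to every column, including the QC flags
x = d.df
x += 100
# assign a modified DataFrame back to the Data object
d.df = x
# or modify the attribute directly
d.df *= 5
d.df.head()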

date,TA_F,TA_F_QC,SW_IN_POT,SW_IN_F,SW_IN_F_QC,LW_IN_F,LW_IN_F_QC,VPD_F,VPD_F_QC,P_F,...,LW_OUT,LW_OUT_QC,G_F_MDS,G_F_MDS_QC,LE_F_MDS,LE_F_MDS_QC,LE_CORR,H_F_MDS,H_F_MDS_QC,H_CORR
2009-01-01,514.015,500.0,1433.55,1115.54,500.0,1806.51,500.0,509.595,500.0,500.0,...,,500.0,,505.0,835.7295,500.0,719.207,601.938,500.0,566.558
2009-01-02,512.59,500.0,1436.645,1109.21,500.0,1844.73,500.0,504.96,500.0,500.0,...,,500.0,,505.0,964.308,500.0,804.8365,663.2525,500.0,607.182
2009-01-03,527.59,500.0,1440.04,1121.205,500.0,1840.02,500.0,513.975,500.0,500.0,...,,500.0,,505.0,879.0145,500.0,751.5755,600.2845,500.0,566.565
2009-01-04,481.235,500.0,1443.71,1068.965,500.0,1733.375,500.0,504.46,500.0,500.0,...,,500.0,,505.0,835.7295,500.0,725.2695,601.938,500.0,568.399
2009-01-05,488.93,500.0,1447.67,1121.66,500.0,1722.39,500.0,506.52,500.0,500.0,...,,500.0,,505.0,964.308,500.0,813.2215,663.2525,500.0,610.13


# Apply user provided QC values to filter data

After creating a `Data` object, you can use the `Data.apply_qc_flags` method to optionally filter out data whose QC value is below some threshold (default 0.5).

Specify the names of the QC columns for each climate variable in your input data in the config file, in the same way that you define the names of the climate variables themselves. For example, if the QC column for LE is named 'LE_quality', then your config needs to specify:

```ini
latent_heat_flux_qc = LE_quality
```

Another way to use this functionality without explicitly stating the QC column names is to give each QC column the same name as its climate variable plus a **'_QC'** suffix. Several FLUXNET variables follow this convention, which makes working with FLUXNET datasets a bit more convenient. You could also apply it to your own data if you add a QC column manually. For example, gap-filled latent energy data from FLUXNET is named **'LE_F_MDS'** and its QC column is named **'LE_F_MDS_QC'**. In this case the `Data` instance automatically checks each variable in your data to see whether another column with the **'_QC'** suffix exists in your input data, reading only the header line for efficiency. Note that if you explicitly assign the name of a QC column in your config but also have a column that follows this naming convention, the `Data.apply_qc_flags` method uses the column you explicitly assigned.
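The lookup rule described above can be sketched in plain Python: an explicitly configured QC column is used first, and the `_QC` suffix convention is the fallback. `find_qc_pairs` is a hypothetical helper, not part of `flux-data-qaqc`. It works on a list of header names rather than reading a file.

```python
def find_qc_pairs(header, variable_columns, explicit_qc=None):
    """Pair each variable column with its QC column.

    An explicitly assigned QC column wins; otherwise fall back to a column
    named <variable>_QC if one exists in the header.
    """
    explicit_qc = explicit_qc or {}
    available = set(header)
    pairs = {}
    for col in variable_columns:
        if col not in available:
            continue
        explicit = explicit_qc.get(col)
        if explicit is not None and explicit in available:
            pairs[col] = explicit
        elif col + '_QC' in available:
            pairs[col] = col + '_QC'
    return pairs


header = ['TIMESTAMP', 'LE_F_MDS', 'LE_F_MDS_QC', 'LE_quality', 'H_F_MDS', 'G_F_MDS']
# suffix convention only: H_F_MDS and G_F_MDS have no QC column
auto = find_qc_pairs(header, ['LE_F_MDS', 'H_F_MDS', 'G_F_MDS'])
# an explicit assignment takes precedence over LE_F_MDS_QC
override = find_qc_pairs(header, ['LE_F_MDS', 'H_F_MDS'], explicit_qc={'LE_F_MDS': 'LE_quality'})
```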

In [12]:
# if you want to see the name of your LE (or other variables) in case you forgot
# you can use the variables attribute within flux-data-qaqc objects  
d.variables.get('LE')

'LE_F_MDS'

In [13]:
# the input data has a QC column for the LE variable
[v for v in d.header if 'LE_F_MDS' in v]

['LE_F_MDS', 'LE_F_MDS_QC']

In [14]:
# to see all variables that you have QC for (that follow the convention above)
d.qc_var_pairs

{'NETRAD': 'NETRAD_QC',
 'G_F_MDS': 'G_F_MDS_QC',
 'LE_F_MDS': 'LE_F_MDS_QC',
 'H_F_MDS': 'H_F_MDS_QC',
 'SW_IN_F': 'SW_IN_F_QC',
 'SW_OUT': 'SW_OUT_QC',
 'LW_IN_F': 'LW_IN_F_QC',
 'LW_OUT': 'LW_OUT_QC',
 'VPD_F': 'VPD_F_QC',
 'TA_F': 'TA_F_QC',
 'P_F': 'P_F_QC',
 'WS_F': 'WS_F_QC'}

# Apply the QC flags at multiple thresholds and plot results

In [15]:
# create a fresh Data object because the example above altered all values, including the QC flags
d = Data(config_path)
# filter out data based on various QC flag values (remove values where flag < threshold)
no_qc = d.df.LE_F_MDS.copy()
d.apply_qc_flags(threshold=0.25)
qc_25 = d.df.LE_F_MDS.copy()
d.apply_qc_flags(threshold=0.5)
qc_50 = d.df.LE_F_MDS.copy()
d.apply_qc_flags(threshold=0.75)
qc_75 = d.df.LE_F_MDS.copy()
d.apply_qc_flags(threshold=1)
qc_100 = d.df.LE_F_MDS.copy()

p = figure(x_axis_label='date', y_axis_label='FLUXNET LE with data removed based on QC flag')
p.line(no_qc.index, no_qc, color='red', legend="no QC", line_width=2)
p.line(no_qc.index, qc_25, color='orange', legend="QC=0.25", line_width=2)
p.line(no_qc.index, qc_50, color='green', legend="QC=0.5", line_width=2)
p.line(no_qc.index, qc_75, color='blue', legend="QC=0.75", line_width=2)
p.line(no_qc.index, qc_100, color='black', legend="QC=1.0", line_width=2)
p.x_range=Range1d(d.df.index[0], d.df.index[365])
p.xaxis.formatter = DatetimeTickFormatter(days="%d-%b-%Y")
show(p)
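The filtering rule in the cell above removes values where the flag is below the threshold. It can be sketched on plain lists. `apply_qc_threshold` is a hypothetical helper, not the package's implementation. In FLUXNET2015 daily (DD) files, the QC value of a gap-filled variable is the fraction (0 to 1) of measured or good-quality gap-filled data, so a higher threshold keeps fewer days:

```python
import math


def apply_qc_threshold(values, flags, threshold=0.5):
    """Replace values whose QC flag is below threshold with NaN.

    A NaN flag fails the comparison, so its value is also set to NaN.
    """
    return [v if f >= threshold else math.nan for v, f in zip(values, flags)]


le = [67.1, 92.9, 75.8, 67.1]
qc = [1.0, 0.5, 0.25, 0.0]
filtered = {t: apply_qc_threshold(le, qc, t) for t in (0.25, 0.5, 0.75, 1.0)}
```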

---
# Using the `QaQc` class to correct latent energy and sensible heat

* Note: the methods used for corrections will be documented soon

In [16]:
# read in data fresh and use it to create a QaQc instance
data = Data(config_path)
q = QaQc(data)

## If the input data is not at a daily temporal frequency, it will be resampled

In [17]:
# access the input data's initial temporal frequency, 'D' is daily
q.temporal_freq

'D'

## LE and H are not corrected yet 

In [18]:
q.corrected

False

In [19]:
# data has not changed...
q.df.head()

date,TA_F,TA_F_QC,SW_IN_POT,SW_IN_F,SW_IN_F_QC,LW_IN_F,LW_IN_F_QC,VPD_F,VPD_F_QC,P_F,...,LW_OUT,LW_OUT_QC,G_F_MDS,G_F_MDS_QC,LE_F_MDS,LE_F_MDS_QC,LE_CORR,H_F_MDS,H_F_MDS_QC,H_CORR
2009-01-01,2.803,0.0,186.71,123.108,0.0,261.302,0.0,1.919,0.0,0.0,...,,0.0,,1.0,67.1459,0.0,43.8414,20.3876,0.0,13.3116
2009-01-02,2.518,0.0,187.329,121.842,0.0,268.946,0.0,0.992,0.0,0.0,...,,0.0,,1.0,92.8616,0.0,60.9673,32.6505,0.0,21.4364
2009-01-03,5.518,0.0,188.008,124.241,0.0,268.004,0.0,2.795,0.0,0.0,...,,0.0,,1.0,75.8029,0.0,50.3151,20.0569,0.0,13.313
2009-01-04,-3.753,0.0,188.742,113.793,0.0,246.675,0.0,0.892,0.0,0.0,...,,0.0,,1.0,67.1459,0.0,45.0539,20.3876,0.0,13.6798
2009-01-05,-2.214,0.0,189.534,124.332,0.0,244.478,0.0,1.304,0.0,0.0,...,,0.0,,1.0,92.8616,0.0,62.6443,32.6505,0.0,22.026


In [20]:
# note the original columns
import pprint
pprint.pprint(', '.join(q.df.columns))

('TA_F, TA_F_QC, SW_IN_POT, SW_IN_F, SW_IN_F_QC, LW_IN_F, LW_IN_F_QC, VPD_F, '
 'VPD_F_QC, P_F, P_F_QC, WS_F, WS_F_QC, NETRAD, NETRAD_QC, SW_OUT, SW_OUT_QC, '
 'LW_OUT, LW_OUT_QC, G_F_MDS, G_F_MDS_QC, LE_F_MDS, LE_F_MDS_QC, LE_CORR, '
 'H_F_MDS, H_F_MDS_QC, H_CORR')


In [21]:
q.elevation, q.latitude # necessary for computing clear sky radiation

(611.0, 36.4267)

# Correct energy balance using `flux-data-qaqc` methods

Adjust turbulent heat fluxes (latent and sensible) to improve surface energy balance closure.

### Two methods currently implemented:
1. Energy Balance Ratio method (default), following the [FLUXNET documentation](https://fluxnet.fluxdata.org/data/fluxnet2015-dataset/data-processing/)
2. Bowen Ratio approach (forces closure)

Detailed descriptions of both methods will be available on the online documentation website (coming soon).
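The simplified sketch below works on lists of daily values and makes the difference between the two approaches concrete. It assumes EBR = (LE + H) / (Rn - G) and that LE and H are multiplied by 1 / EBR. This direction is consistent with the output further down, where `ebc_cf` is about 1 / `ebr_5day_clim`. The moving-median window (`half_window`) is only an illustrative stand-in. The FLUXNET procedure and the package's implementation involve additional windowing and filtering steps that are not reproduced here. The Bowen ratio version keeps β = H / LE fixed and forces LE + H = Rn - G on each day.

```python
from statistics import median


def ebr_correct(rn, g, le, h, half_window=3):
    """Simplified EBR correction on lists of daily values.

    Daily EBR = (LE + H) / (Rn - G); LE and H are scaled by 1 / median(EBR)
    over a moving window of +/- half_window days. Real data would need days
    with Rn == G excluded to avoid division by zero.
    """
    n = len(rn)
    ebr = [(le[i] + h[i]) / (rn[i] - g[i]) for i in range(n)]
    le_corr, h_corr = [], []
    for i in range(n):
        window = ebr[max(0, i - half_window): i + half_window + 1]
        cf = 1.0 / median(window)  # energy balance correction factor
        le_corr.append(le[i] * cf)
        h_corr.append(h[i] * cf)
    return le_corr, h_corr


def br_correct(rn, g, le, h):
    """Bowen ratio correction: keep beta = H / LE and force LE + H = Rn - G."""
    le_corr, h_corr = [], []
    for rn_i, g_i, le_i, h_i in zip(rn, g, le, h):
        available = rn_i - g_i
        beta = h_i / le_i  # undefined when LE == 0
        le_c = available / (1.0 + beta)
        le_corr.append(le_c)
        h_corr.append(available - le_c)
    return le_corr, h_corr


rn, g = [200.0, 210.0, 190.0], [20.0, 20.0, 20.0]
le, h = [100.0, 110.0, 90.0], [50.0, 40.0, 45.0]
le_ebr, h_ebr = ebr_correct(rn, g, le, h)
le_br, h_br = br_correct(rn, g, le, h)
```

On this toy series, the Bowen ratio version closes the energy balance exactly on every day. The EBR version applies one factor per window, so individual days can still deviate from closure.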

In [22]:
q.correct_data()
q.corrected

True

We did not specify the correction method, so it defaulted to the Energy Balance Ratio method, 'ebr'.

In [23]:
q.corr_meth

'ebr'

In [24]:
# now we have original data plus adjusted variables, energy balance ratios, and others
pprint.pprint(', '.join(q.df.columns))

('TA_F, TA_F_QC, SW_IN_POT, SW_IN_F, SW_IN_F_QC, LW_IN_F, LW_IN_F_QC, VPD_F, '
 'VPD_F_QC, P_F, P_F_QC, WS_F, WS_F_QC, NETRAD, NETRAD_QC, SW_OUT, SW_OUT_QC, '
 'LW_OUT, LW_OUT_QC, G_F_MDS, G_F_MDS_QC, LE_F_MDS, LE_F_MDS_QC, LE_CORR, '
 'H_F_MDS, H_F_MDS_QC, H_CORR, rso, flux, energy, flux_user_corr, '
 'ebr_user_corr, ebr, flux_corr, ebr_corr, LE_corr, H_corr, ebr_5day_clim, '
 'ebc_cf, et, et_corr, et_user_corr')


In [25]:
q.df.head()

date,TA_F,TA_F_QC,SW_IN_POT,SW_IN_F,SW_IN_F_QC,LW_IN_F,LW_IN_F_QC,VPD_F,VPD_F_QC,P_F,...,ebr,flux_corr,ebr_corr,LE_corr,H_corr,ebr_5day_clim,ebc_cf,et,et_corr,et_user_corr
2009-01-01,2.803,0.0,186.71,123.108,0.0,261.302,0.0,1.919,0.0,0.0,...,,64.6751,1.35343,49.6115,15.0636,1.35343,0.738862,2.325789,1.71844,1.518571
2009-01-02,2.518,0.0,187.329,121.842,0.0,268.946,0.0,0.992,0.0,0.0,...,,93.4846,1.3426,69.1657,24.3189,1.3426,0.744825,3.215657,2.3951,2.111206
2009-01-03,5.518,0.0,188.008,124.241,0.0,268.004,0.0,2.795,0.0,0.0,...,,71.9661,1.33201,56.9085,15.0576,1.33201,0.750743,2.632413,1.97627,1.747296
2009-01-04,-3.753,0.0,188.742,113.793,0.0,246.675,0.0,0.892,0.0,0.0,...,,66.0384,1.32549,50.6573,15.3811,1.32549,0.754436,2.311445,1.74384,1.550945
2009-01-05,-2.214,0.0,189.534,124.332,0.0,244.478,0.0,1.304,0.0,0.0,...,,94.7221,1.32506,70.0812,24.6408,1.32506,0.754685,3.201323,2.41599,2.159608
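The `et`, `et_corr` and `et_user_corr` columns express `LE_F_MDS`, `LE_corr` and `LE_CORR` as a daily water depth. The first rows above are consistent with a conversion that uses a temperature-dependent latent heat of vaporization, λ = 2.501 - 0.00236·T (MJ/kg, T in °C). This formula is inferred from the printed numbers and is not taken from the package documentation. A sketch:

```python
def le_to_et_mm_per_day(le_w_m2, t_avg_c):
    """Convert a daily mean latent heat flux (W/m2) to ET depth (mm/day).

    Assumes lambda = 2.501 - 0.00236 * T (MJ/kg) and that 1 kg of water
    spread over 1 m2 is a 1 mm depth.
    """
    latent_heat_j_per_kg = (2.501 - 0.00236 * t_avg_c) * 1e6
    seconds_per_day = 86400
    return le_w_m2 * seconds_per_day / latent_heat_j_per_kg


et_day1 = le_to_et_mm_per_day(67.1459, 2.803)       # LE_F_MDS and TA_F on 2009-01-01
et_corr_day1 = le_to_et_mm_per_day(49.6115, 2.803)  # LE_corr on 2009-01-01
```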


# Radiative versus turbulent flux, before and after applying the Energy Balance Ratio closure correction

In [26]:
# make copy of data for comparing to Bowen Ratio method
ebr_df = q.df.copy()
# plot
p = figure(x_axis_label='Energy (Rn - G)', y_axis_label='Flux (LE + H)')
p.circle(q.df.energy, q.df.flux, color='red', line_width=2, legend='initial')
p.circle(q.df.energy, q.df.flux_corr, color='blue', line_width=2, legend='corrected')
p.line(range(-30,190), range(-30,190), line_dash='dashed',legend='1:1 Line')
p.legend.location = "top_left"
show(p)

# Use the Bowen Ratio correction routine and compare results

Note that corrected data (LE, H, EBR, ET, etc.) are overwritten in `QaQc.df` when subsequent correction methods are run. Keep a copy of the initially corrected data if you want to compare the results of multiple correction options.

In [27]:
q.correct_data(meth='br')
q.corr_meth

'br'

In [28]:
p = figure(x_axis_label='Energy (Rn - G)', y_axis_label='Flux (LE + H)')
p.circle(q.df.energy, q.df.flux, color='red', line_width=2, legend='initial')
p.circle(q.df.energy, q.df.flux_corr, color='blue', line_width=2, legend='BR corrected')
p.circle(ebr_df.energy, ebr_df.flux_corr, color='black', line_width=2, legend='EBR corrected')
p.line(range(-30,190), range(-30,190), line_dash='dashed',legend='1:1 Line')
p.legend.location = "top_left"
show(p)

## Temporally aggregate to monthly data using sums for ET and P, and means for all others

In [29]:
q.monthly_df.head()

date,flux_user_corr,br,WS_F,ebr,H_CORR,LE_F_MDS,ebr_corr,rso,energy,SW_IN_F,...,TA_F,LW_IN_F,G_F_MDS,H_F_MDS,br_user_corr,SW_IN_POT,et_user_corr,et_corr,P_F,et
2009-01-31,74.692668,0.306513,3.534355,,17.626603,78.233868,,154.20478,,128.720355,...,1.424161,261.216323,,24.236697,0.305757,203.904032,61.197137,,3.879,83.899064
2009-02-28,82.272961,0.308216,3.828571,,19.408454,79.112686,,197.213823,,175.16675,...,6.423714,274.406607,12.974747,24.660911,0.30545,262.85725,61.178247,,12.824,76.989291
2009-03-31,79.708248,0.305254,4.35871,,18.794103,78.513126,,256.339885,,200.775032,...,10.815452,305.189258,12.975887,24.226029,0.305254,341.04129,65.908264,,50.739,84.954167
2009-04-30,82.4097,0.307787,4.482,,19.565137,79.06194,,313.753204,,245.731533,...,13.427233,323.534467,13.440414,24.612837,0.307787,414.747833,65.963432,,129.323,82.989125
2009-05-31,106.351555,0.267686,3.529613,,21.571916,104.509681,,352.136946,,248.396258,...,17.991774,358.490258,7.620833,26.615866,0.267686,464.501226,92.457238,,0.0,113.975303
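The monthly table above is indexed by month-end dates. Per the heading, it sums ET and P and averages everything else. Below is a stdlib sketch of that aggregation rule. The contents of `SUM_VARS` are a hypothetical stand-in for whichever columns the package actually sums.

```python
import calendar
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical set of variables that are summed rather than averaged
SUM_VARS = {'et', 'ppt'}


def to_monthly(records):
    """Aggregate (date, {variable: value}) daily rows to month-end sums/means.

    None values are skipped, similar to how pandas ignores NaN in sum/mean.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for day, values in records:
        last_day = calendar.monthrange(day.year, day.month)[1]
        month_end = date(day.year, day.month, last_day)
        for var, val in values.items():
            if val is not None:
                groups[month_end][var].append(val)
    return {
        month_end: {
            var: sum(vals) if var in SUM_VARS else mean(vals)
            for var, vals in by_var.items()
        }
        for month_end, by_var in sorted(groups.items())
    }


records = [
    (date(2009, 1, 30), {'et': 1.0, 'ppt': 0.0, 't_avg': 2.0}),
    (date(2009, 1, 31), {'et': 2.0, 'ppt': 3.0, 't_avg': 4.0}),
    (date(2009, 2, 1), {'et': 1.5, 'ppt': None, 't_avg': 6.0}),
]
monthly = to_monthly(records)
```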


# Alternatively, create a QaQc instance from a pandas.DataFrame using `QaQc.from_dataframe`

Make sure the main energy balance components are in the dataframe at a daily time step and that the dataframe index is a daily datetime index. The components must be mapped to the following names used by `flux-data-qaqc`: 
* Rn, G, H, LE  

Otherwise you will not be able to run the energy balance correction routine. The example below shows that only these four energy balance components are needed to run it. In this case, the variables we need follow FLUXNET naming conventions: 
* NETRAD, G_F_MDS, H_F_MDS, LE_F_MDS

We therefore create a dictionary that maps the `flux-data-qaqc` names (first list) to the names in our input DataFrame, as shown below.

**Note:** we need to assign the station elevation (m) and latitude (decimal degrees), which are normally read from the config file. In return, this method lets you use arbitrary daily time series data within Python.
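Before calling `QaQc.from_dataframe`, a quick check of the inputs can catch problems early. The mapping goes from internal name to your column name: the keys of `rename_dict` in the next cell are `flux-data-qaqc` names. Both helpers below are hypothetical and not part of the package. `is_daily` is stricter than necessary if your daily index has gaps.

```python
from datetime import date, timedelta

REQUIRED = ('Rn', 'G', 'H', 'LE')  # internal names needed for the correction


def missing_inputs(var_dict, columns):
    """Internal names that are unmapped, or mapped to a column that is absent."""
    return [name for name in REQUIRED if var_dict.get(name) not in columns]


def is_daily(dates):
    """True if consecutive dates are exactly one day apart."""
    return all(b - a == timedelta(days=1) for a, b in zip(dates, dates[1:]))


rename_dict = {'Rn': 'NETRAD', 'G': 'G_F_MDS', 'H': 'H_F_MDS', 'LE': 'LE_F_MDS'}
columns = ['NETRAD', 'G_F_MDS', 'H_F_MDS', 'LE_F_MDS']
dates = [date(2009, 1, 1) + timedelta(days=i) for i in range(5)]
```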

In [30]:
data = Data(config_path)
# using the same dataframe here but this can be any with the correct variable names
df = data.df
# drop all other variables except those needed to demonstrate
df = df.drop(
    [c for c in df.columns if c not in ['NETRAD', 'G_F_MDS', 'H_F_MDS', 'LE_F_MDS']],
    axis=1
)
rename_dict = {
    'Rn' : 'NETRAD', 
    'G' : 'G_F_MDS', 
    'H' : 'H_F_MDS', 
    'LE' : 'LE_F_MDS'
}
q = QaQc.from_dataframe(
    df, 
    site_id='US-AR1', 
    elev_m=611, 
    lat_dec_deg=36.4267, 
    var_dict=rename_dict
)
q.df.head()

date,NETRAD,G_F_MDS,LE_F_MDS,H_F_MDS
2009-01-01,,,67.1459,20.3876
2009-01-02,,,92.8616,32.6505
2009-01-03,,,75.8029,20.0569
2009-01-04,,,67.1459,20.3876
2009-01-05,,,92.8616,32.6505


If you are not sure of the naming conventions of `flux-data-qaqc`, you can find the internal names by viewing the class attribute `Data.variable_names_dict`:

In [31]:
Data.variable_names_dict

{'date': 'datestring_col',
 'year': 'year_col',
 'month': 'month_col',
 'day': 'day_col',
 'Rn': 'net_radiation_col',
 'G': 'ground_flux_col',
 'LE': 'latent_heat_flux_col',
 'LE_user_corr': 'latent_heat_flux_corrected_col',
 'H': 'sensible_heat_flux_col',
 'H_user_corr': 'sensible_heat_flux_corrected_col',
 'sw_in': 'shortwave_in_col',
 'sw_out': 'shortwave_out_col',
 'sw_pot': 'shortwave_pot_col',
 'lw_in': 'longwave_in_col',
 'lw_out': 'longwave_out_col',
 'vp': 'vap_press_col',
 'vpd': 'vap_press_def_col',
 't_avg': 'avg_temp_col',
 'ppt': 'precip_col',
 'ws': 'wind_spd_col'}
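Internal names are case-sensitive: `Rn` and `LE` are capitalized, but `sw_in` and `t_avg` are not. A mistyped key in a `var_dict` therefore matches no internal name. The small check below compares your keys against the internal names copied from the output above. `unknown_internal_names` is a hypothetical helper.

```python
# Keys of Data.variable_names_dict, copied from the output above
INTERNAL_NAMES = {
    'date', 'year', 'month', 'day', 'Rn', 'G', 'LE', 'LE_user_corr', 'H',
    'H_user_corr', 'sw_in', 'sw_out', 'sw_pot', 'lw_in', 'lw_out', 'vp',
    'vpd', 't_avg', 'ppt', 'ws',
}


def unknown_internal_names(var_dict):
    """Keys of var_dict that are not flux-data-qaqc internal names."""
    return sorted(set(var_dict) - INTERNAL_NAMES)


# 'g' should have been 'G'
typo_dict = {'Rn': 'NETRAD', 'g': 'G_F_MDS', 'H': 'H_F_MDS', 'LE': 'LE_F_MDS'}
bad_keys = unknown_internal_names(typo_dict)
```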

In [32]:
# note these names are mapped to your variable names in Data, QaQc, and Plot objects for example:
q.inv_map

{'NETRAD': 'Rn', 'G_F_MDS': 'G', 'H_F_MDS': 'H', 'LE_F_MDS': 'LE'}

## Compare monthly energy balance closure ratio calculated from initial and Energy Balance Ratio corrected LE and H

**Note:** if you access `monthly_df` before correcting the energy balance, the corrections are run automatically and added to the `QaQc` instance, as shown below.
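The automatic correction described in this note is a form of lazy evaluation. The toy class below only illustrates that behaviour and is not the package's source. `LazyQaQc`, `correction_calls` and the placeholder return value are hypothetical. Only the names `corrected`, `corr_meth`, `correct_data` and `monthly_df` mirror attributes used in this notebook, and the `'ebr'` default matches the one shown earlier.

```python
class LazyQaQc:
    """Toy stand-in for the lazy correction behaviour described above."""

    def __init__(self):
        self.corrected = False
        self.corr_meth = None
        self.correction_calls = 0  # hypothetical counter for illustration

    def correct_data(self, meth='ebr'):
        # a real implementation would compute LE_corr, H_corr, ebr_corr, ...
        self.corr_meth = meth
        self.corrected = True
        self.correction_calls += 1

    @property
    def monthly_df(self):
        # run the default correction first if it has not been run yet
        if not self.corrected:
            self.correct_data()
        return {'corr_meth': self.corr_meth}  # placeholder for a monthly table


q = LazyQaQc()
before = q.corrected
summary = q.monthly_df
q.monthly_df  # a second access does not correct again
```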

In [33]:
p = figure(x_axis_label='date', y_axis_label='Energy Balance Ratio')
p.line(q.monthly_df.index, q.monthly_df['ebr'], color='red', legend="Raw", line_width=2)
p.line(q.monthly_df.index, q.monthly_df['ebr_corr'], legend="Corrected", line_width=2)
p.xaxis.formatter = DatetimeTickFormatter(days="%d-%b-%Y")
show(p)

## Save daily and monthly time series of input and computed variables

* By default, output is saved to an *output* directory and file names are prefixed with the site ID, in this case *US-AR1*
* **Note:** the `QaQc.write` method will run energy balance corrections and produce the corrected and raw versions of the energy balance closure ratio at monthly and daily frequencies if they were not previously produced, i.e. if `QaQc.correct_data` and `QaQc.monthly_df` have not yet been called.

In [34]:
q.write()

/home/john/flux-data-qaqc/examples/Basic_usage/output does not exist, creating directory


In [35]:
# view contents of newly created output directory
for f in q.out_dir.glob('*'): print(f)

/home/john/flux-data-qaqc/examples/Basic_usage/output/US-AR1_monthly_data.csv
/home/john/flux-data-qaqc/examples/Basic_usage/output/US-AR1_daily_data.csv


As shown above, the output directory is stored as an absolute, platform-independent `pathlib.Path` in the `out_dir` instance attribute.

In [36]:
q.out_dir

PosixPath('/home/john/flux-data-qaqc/examples/Basic_usage/output')
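The directory handling behind `q.write` can be reproduced with the standard library: an absolute `pathlib.Path`, created if missing, with file names prefixed by the site ID. `prepare_output` is a hypothetical helper that only mimics the message and file names printed above. It does not write any data.

```python
import tempfile
from pathlib import Path


def prepare_output(out_dir, site_id):
    """Resolve and create an output folder; return it with the CSV paths."""
    out_dir = Path(out_dir).resolve()  # absolute path on any OS
    if not out_dir.is_dir():
        print(f'{out_dir} does not exist, creating directory')
        out_dir.mkdir(parents=True, exist_ok=True)
    daily_csv = out_dir / f'{site_id}_daily_data.csv'
    monthly_csv = out_dir / f'{site_id}_monthly_data.csv'
    return out_dir, daily_csv, monthly_csv


# demonstrate in a throwaway temporary directory
with tempfile.TemporaryDirectory() as tmp:
    out_dir, daily_csv, monthly_csv = prepare_output(Path(tmp) / 'output', 'US-AR1')
    created = out_dir.is_dir()
```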

Alternatively you can name your own output directory and this will be passed on to `Plot` routines so that the saved plots go into the same directory:

In [37]:
# write output to a custom folder, then print the folder and all files in it
q.write(out_dir='my_output_folder')
print(q.out_dir,'\n','\n'.join([str(f) for f in q.out_dir.glob('*')]))

/home/john/flux-data-qaqc/examples/Basic_usage/my_output_folder does not exist, creating directory
/home/john/flux-data-qaqc/examples/Basic_usage/my_output_folder 
 /home/john/flux-data-qaqc/examples/Basic_usage/my_output_folder/US-AR1_monthly_data.csv
/home/john/flux-data-qaqc/examples/Basic_usage/my_output_folder/US-AR1_daily_data.csv


# Using the `Plot` class to create multiple comparison and time series plots for QA/QC and data validation

Output cells are currently not shown, pending updates to the `Plot` module.

In [38]:
# create a Plot object
config_path = 'fluxnet_config.ini'
d = Data(config_path)
d.apply_qc_flags(threshold=0.5)
q = QaQc(d)
plt = Plot(q)

In [39]:
# generate plots
plt.generate_plots()


Vapor Pressure graph missing a variable.

Soil Moisture scalable plot missing a variable.

Soil heat flux scalable plot missing a variable.


In [40]:
# access the plot file path
plt.plot_file

PosixPath('/home/john/flux-data-qaqc/examples/Basic_usage/output/US-AR1_plots.html')

## The plots generated by `flux-data-qaqc` are all saved to a single HTML file with [bokeh](https://bokeh.pydata.org/en/latest/) subplots

* If variables needed for a plot are missing, that plot will not appear
* In this case, the 'user_corr' variables are the pre-corrected versions of LE and H provided by the FLUXNET 2015 dataset
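Skipping plots whose inputs are missing follows a simple pattern. Each plot's required variables are checked, and any plot that cannot be drawn is reported. The plot names, variable lists and `plots_to_draw` helper below are hypothetical. Only the message format mirrors the output of `generate_plots` above.

```python
import math


def is_empty(values):
    """True if every value is None or NaN (or there are no values)."""
    return all(v is None or (isinstance(v, float) and math.isnan(v)) for v in values)


def plots_to_draw(data, plot_specs):
    """Return plot names whose required variables exist and are not all null."""
    drawn = []
    for name, required in plot_specs.items():
        if any(var not in data or is_empty(data[var]) for var in required):
            print(f'{name} missing a variable.')
            continue
        drawn.append(name)
    return drawn


data = {'LE': [67.1, 92.9], 'H': [20.4, 32.7], 'vp': [math.nan, math.nan]}
specs = {'Turbulent fluxes': ['LE', 'H'], 'Vapor Pressure graph': ['vp', 'vpd']}
drawn = plots_to_draw(data, specs)
```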

In [41]:
# view the output plots within the Jupyter notebook
from IPython.display import HTML
HTML(filename=plt.plot_file)