# overview

This example shows how to use `flux-data-qaqc` with a custom climate input file that is not from FLUXNET. The only real differences lie in the config file declarations, so the entire workflow from the FLUXNET example notebook works the same way here. That notebook is recommended for general use, whereas this one focuses on the formatting rules for the input data itself.

---
The data used here is provided with the software package and can be downloaded [here](https://github.com/Open-ET/flux-data-qaqc/blob/master/examples/). It comes from a USGS eddy covariance flux tower at the Dixie Valley Dense Vegetation site. Details on the data can be found in this [report](https://pubs.usgs.gov/pp/1805/pdf/pp1805.pdf).

In [1]:
%load_ext autoreload
%autoreload 2
import sys
# currently not installable so import from parent dir
sys.path.append('..')
from fluxdataqaqc.data import Data
from fluxdataqaqc.qaqc import QaQc 
from bokeh.plotting import figure, show
from bokeh.models.formatters import DatetimeTickFormatter
from bokeh.io import output_notebook
output_notebook()

# setting up a config file
---
The config file needed for using `flux-data-qaqc` has two major sections:
1. METADATA
2. DATA

Currently in **METADATA**, the "station_elevation" (expected in meters) and "station_latitude" (decimal degrees) fields are used to calculate clear sky potential solar radiation. The item "missing_data_value" is used to correctly parse missing data in the climate time series. Other metadata is not currently used but may be useful for custom workflows; more on this later.
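As a rough illustration of the **METADATA** fields described above, the sketch below parses an INI-style section with Python's standard `configparser`. The values are copied from the METADATA listing printed further down in this notebook. The exact file layout is an assumption, and this is not the package's own loader.

```python
import configparser

# Values copied from the METADATA listing printed in this notebook;
# the INI layout itself is an assumption about how the file is written.
config_text = """
[METADATA]
climate_file_path = raw_subhour_DVDV_10.xlsx
station_latitude = 39.762511
station_longitude = -117.960100
station_elevation = 1046
anemometer_height = 2.72
missing_data_value = -9999
"""

config = configparser.ConfigParser()
config.read_string(config_text)

# configparser stores everything as strings, so convert numeric fields
elevation_m = config.getfloat('METADATA', 'station_elevation')
latitude_deg = config.getfloat('METADATA', 'station_latitude')
missing_value = config.get('METADATA', 'missing_data_value')

print(elevation_m, latitude_deg, missing_value)
```

Because every value is read as a string, any field used in a calculation (elevation, latitude) needs an explicit conversion such as `getfloat`.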

The **DATA** section of the config file is where you specify climate variables and their units. `flux-data-qaqc` has two major functions. First, it corrects the surface energy balance by adjusting latent energy and sensible heat fluxes. Second, it serves as a robust way to read in different time series data and plot their daily and monthly time series. The latter is under development, but the module can already generate useful interactive plots of arbitrary time series data.

Here is a list of all the "expected" climate variable names in the **DATA** section:

In [21]:
config_path = 'USGS_config.ini'
d = Data(config_path)
for each in d.config.items('DATA'):
    print(each[0])

datestring_col
year_col
month_col
day_col
net_radiation_col
net_radiation_units
ground_flux_col
ground_flux_units
latent_heat_flux_col
latent_heat_flux_units
latent_heat_flux_corrected_col
latent_heat_flux_corrected_units
sensible_heat_flux_col
sensible_heat_flux_units
sensible_heat_flux_corrected_col
sensible_heat_flux_corrected_units
shortwave_in_col
shortwave_in_units
shortwave_out_col
shortwave_out_units
shortwave_pot_col
shortwave_pot_units
longwave_in_col
longwave_in_units
longwave_out_col
longwave_out_units
vap_press_col
vap_press_units
vap_press_def_col
vap_press_def_units
avg_temp_col
avg_temp_units
precip_col
precip_units
wind_spd_col
wind_spd_units


**Note:** Your data may contain none of the expected climate variables, in which case you can specify them all as missing ('na'). However, the result will be an output dataset of null values, and no plots will be produced!
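To make the 'na' convention concrete, here is a minimal sketch that reads a hypothetical **DATA** section and separates the variables that point at a real column from those marked as missing. The key names come from the list above and the column names from this dataset's input file header. The units strings and the overall file layout are assumptions.

```python
import configparser

# Hypothetical DATA section: key names are from the list above, column
# names are from this dataset's header, and the units strings are assumed.
config_text = """
[DATA]
net_radiation_col = Net radiation, W/m2
net_radiation_units = w/m2
ground_flux_col = Soil-heat flux, W/m2
ground_flux_units = w/m2
latent_heat_flux_col = Latent-heat flux, W/m2
latent_heat_flux_units = w/m2
sensible_heat_flux_col = Sensible-heat flux, W/m2
sensible_heat_flux_units = w/m2
precip_col = na
precip_units = na
wind_spd_col = na
wind_spd_units = na
"""

config = configparser.ConfigParser()
config.read_string(config_text)

# Keep only the entries that name a column, then split on the 'na' marker
cols = {k: v for k, v in config.items('DATA') if k.endswith('_col')}
available = sorted(k for k, v in cols.items() if v.lower() != 'na')
missing = sorted(k for k, v in cols.items() if v.lower() == 'na')

print('available:', available)
print('missing:', missing)
```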

## create a ``Data`` object to read in time series data using a config file

In [22]:
d = Data(config_path)
# you can access all metadata and data in the config file as a list
d.config.items('METADATA') # can access the DATA section the same way

[('climate_file_path', 'raw_subhour_DVDV_10.xlsx'),
 ('station_latitude', '39.762511'),
 ('station_longitude', '-117.960100'),
 ('station_elevation', '1046'),
 ('anemometer_height', '2.72'),
 ('missing_data_value', '-9999')]

In [23]:
# or as a dict, e.g. to access specific values by name
d.config.get('METADATA','station_elevation')

'1046'

In [24]:
# path to climate time series input and config files
print(d.climate_file, '\n', d.config_file)

/home/john/flux-data-qaqc/examples/raw_subhour_DVDV_10.xlsx 
 /home/john/flux-data-qaqc/examples/USGS_config.ini


In [25]:
# view full header of input time series file
d.header

Index(['Timestamp', 'ET, in.', 'Net radiation, W/m2', 'Latent-heat flux, W/m2',
       'Sensible-heat flux, W/m2', 'Soil-heat flux, W/m2'],
      dtype='object')

# load date-indexed DataFrame using ``.df``

* note, if there are variables stated in the config file but not found in the header of the input file, they will be filled with NaN (null) values in the dataframe
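The NaN-filling behavior noted above can be pictured with plain pandas. This is only a sketch of the idea using `DataFrame.reindex`, not the package's internal code; the column name `ppt` is a hypothetical stand-in for a variable that is configured but absent from the input file.

```python
import pandas as pd

# Stand-in for the input file: only two of the configured variables exist
raw = pd.DataFrame(
    {'Rn': [-54.024218, -51.077447], 'LE': [0.70761, 0.04837]},
    index=pd.to_datetime(['2009-10-01 00:00', '2009-10-01 00:30']),
)

# Variables named in the config; 'ppt' is a hypothetical missing one
configured = ['Rn', 'LE', 'ppt']

# reindex adds any column that is not present and fills it with NaN
df = raw.reindex(columns=configured)
print(df)
```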

In [8]:
d.df.head()

| date | Rn | LE | H | G |
|---|---|---|---|---|
| 2009-10-01 00:00:00 | -54.024218 | 0.70761 | 0.95511 | -40.423659 |
| 2009-10-01 00:30:00 | -51.077447 | 0.04837 | -1.24935 | -33.353833 |
| 2009-10-01 01:00:00 | -50.994389 | 0.68862 | 1.91101 | -43.179005 |
| 2009-10-01 01:30:00 | -51.350324 | -1.85829 | -15.4944 | -40.862015 |
| 2009-10-01 02:00:00 | -51.066042 | -1.80485 | -19.1357 | -39.809369 |


## you can now modify or assign new data using all tools available in Pandas

---
# using the `QaQc` class to correct latent energy and sensible heat

* note, the method used for corrections will be documented soon

In [9]:
q = QaQc(d)

The input data temporal frequency appears to be less than daily, it will be resampled to daily.


### note that the input data in this example is at 30-minute frequency; when a `QaQc` object is created, the data is resampled to daily
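For readers unfamiliar with resampling, the sketch below shows the general idea with plain pandas on synthetic 30-minute data. Averaging each day is an assumption made for illustration; the aggregation `QaQc` actually applies may differ.

```python
import pandas as pd

# Two days of synthetic 30-minute values (48 records per day)
index = pd.date_range('2009-10-01', periods=96, freq='30min')
df = pd.DataFrame({'Rn': [float(i) for i in range(96)]}, index=index)

# Collapse each calendar day to a single row; the mean is assumed here
daily = df.resample('D').mean()
print(daily)
```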

In [10]:
# data is not corrected yet:
q.corrected

False

In [11]:
# values are not corrected yet, only resampled to daily
q.df.head()

| date | Rn | G | LE | H |
|---|---|---|---|---|
| 2009-10-01 | 98.841967 | -13.229871 | 7.908439 | 72.455098 |
| 2009-10-02 | 89.651032 | -6.220552 | 8.303841 | 58.284511 |
| 2009-10-03 | 95.52948 | -1.697661 | 8.992836 | 64.243684 |
| 2009-10-04 | 91.64082 | -8.383377 | 6.542247 | 69.620512 |
| 2009-10-05 | 65.615026 | -2.021651 | 5.588513 | 39.688535 |


In [13]:
q.elevation, q.latitude # necessary for computing clear sky potential radiation

(611, 36.4267)

# correct energy balance using `flux-data-qaqc` methods

In [15]:
q.correct_data()
q.corrected

True

In [16]:
# now we have original data plus adjusted variables, energy balance ratios, and others
import pprint
pprint.pprint(', '.join(q.df.columns))

('Rn, G, LE, H, energy, flux, bowen_ratio, LE_adj, H_adj, flux_adj, et_reg, '
 'et_adj, ebc_reg, ebc_adj, rso')


In [17]:
q.df.head()

| date | Rn | G | LE | H | energy | flux | bowen_ratio | LE_adj | H_adj | flux_adj | et_reg | et_adj | ebc_reg | ebc_adj | rso |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2009-10-01 | 98.841967 | -13.229871 | 7.908439 | 72.455098 | 112.071838 | 80.363537 | 9.161744 | 11.0288 | 101.043039 | 112.071838 | 0.273316 | 0.381155 | 0.717072 | 1.0 | 243.228821 |
| 2009-10-02 | 89.651032 | -6.220552 | 8.303841 | 58.284511 | 95.871584 | 66.588352 | 7.018982 | 11.95558 | 83.916004 | 95.871584 | 0.286981 | 0.413185 | 0.694558 | 1.0 | 241.169559 |
| 2009-10-03 | 95.52948 | -1.697661 | 8.992836 | 64.243684 | 97.22714 | 73.236521 | 7.143873 | 11.938685 | 85.288455 | 97.22714 | 0.310792 | 0.412601 | 0.753252 | 1.0 | 239.111036 |
| 2009-10-04 | 91.64082 | -8.383377 | 6.542247 | 69.620512 | 100.024197 | 76.162759 | 10.641682 | 8.591903 | 91.432294 | 100.024197 | 0.2261 | 0.296936 | 0.761443 | 1.0 | 237.054038 |
| 2009-10-05 | 65.615026 | -2.021651 | 5.588513 | 39.688535 | 67.636676 | 45.277048 | 7.101805 | 8.348346 | 59.28833 | 67.636676 | 0.193139 | 0.288519 | 0.669416 | 1.0 | 234.999354 |
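The first row of the table above is numerically consistent with a few simple relationships between the columns, worked through in the sketch below. These relationships are inferred from the printed values only; they are not a statement of the package's documented correction method, which may handle gaps, filtering, or averaging differently.

```python
# First daily row of the corrected table above
Rn, G, LE, H = 98.841967, -13.229871, 7.908439, 72.455098

energy = Rn - G           # available energy
flux = LE + H             # sum of turbulent fluxes
bowen_ratio = H / LE
ebc_reg = flux / energy   # energy balance ratio of the raw data

# Assumed adjustment: divide both fluxes by the same ratio so that
# their sum matches the available energy
LE_adj = LE / ebc_reg
H_adj = H / ebc_reg
flux_adj = LE_adj + H_adj
ebc_adj = flux_adj / energy

print(round(energy, 6), round(flux, 6), round(ebc_reg, 6))
print(round(LE_adj, 4), round(H_adj, 4), round(ebc_adj, 6))
```

Scaling both fluxes by one ratio leaves the Bowen ratio unchanged, which is why `H_adj / LE_adj` equals `H / LE` in this sketch.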


In [18]:
# view time series of select variable
p = figure(x_axis_label='date', y_axis_label='net radiation (W/m2)')
p.line(q.df.index, q.df.Rn, line_width=2)
p.xaxis.formatter = DatetimeTickFormatter(days="%d-%b-%Y")
show(p)

## temporally aggregate to monthly data using sums for ET and P, and means for all others

In [19]:
q.monthly_df.head()

| month | LE_adj | bowen_ratio | H_adj | flux | rso | flux_adj | energy | ebc_adj | ebc_reg | LE | H | Rn | G | et_reg | et_adj |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2009-10-31 | 20.425891 | 4.424525 | 60.916322 | 56.217613 | 213.253334 | 81.342213 | 80.497421 | 1.010495 | 0.698378 | 14.148116 | 42.069497 | 74.166506 | -6.330915 | NaN | NaN |
| 2009-11-30 | 6.083669 | 11.336047 | 46.805297 | 34.648079 | 162.869272 | 52.888965 | 52.389511 | 1.009533 | 0.661355 | 3.930365 | 30.717714 | 43.967036 | -8.422475 | NaN | NaN |
| 2009-12-31 | 9.685065 | 3.338348 | 21.958787 | 17.861036 | 140.694255 | 31.49546 | 31.804789 | 0.99494 | 0.561583 | 5.167479 | 12.693557 | 25.094329 | -6.71046 | NaN | NaN |
| 2010-01-31 | 13.924331 | 1.546328 | 22.126359 | 12.14252 | 154.20478 | 35.850646 | 40.067152 | 0.899757 | 0.303054 | 5.093168 | 7.049352 | 35.738479 | -4.328673 | NaN | NaN |
| 2010-02-28 | 33.068004 | 1.337181 | 43.872164 | 50.207264 | 197.213823 | 76.940168 | 76.038656 | 1.011856 | 0.660286 | 22.060393 | 28.14687 | 74.06554 | -1.973116 | NaN | NaN |
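The "sums for ET and P, means for all others" rule can be sketched with plain pandas as follows. The data is synthetic, and selecting ET columns by the `et_` name prefix is an assumption made for this illustration, not necessarily how the package decides.

```python
import pandas as pd

# Synthetic daily data covering October and November 2009
index = pd.date_range('2009-10-01', '2009-11-30', freq='D')
daily = pd.DataFrame({'et_adj': 0.5, 'Rn': 100.0}, index=index)

# Sum ET-like columns and average everything else (prefix rule is assumed)
how = {col: ('sum' if col.startswith('et_') else 'mean')
       for col in daily.columns}

# Group by calendar month and apply the per-column aggregation
monthly = daily.groupby(daily.index.to_period('M')).agg(how)
print(monthly)
```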


## compare the monthly energy balance ratio of the raw and corrected data

In [20]:
p = figure(x_axis_label='date', y_axis_label='energy balance correction ratio')
# legend_label replaces the legend keyword, which was removed in Bokeh 3.0
p.line(q.monthly_df.index, q.monthly_df['ebc_reg'], color='red', legend_label="Raw", line_width=2)
p.line(q.monthly_df.index, q.monthly_df['ebc_adj'], legend_label="Corrected", line_width=2)
p.xaxis.formatter = DatetimeTickFormatter(days="%d-%b-%Y")
show(p)