# overview

This example shows how to use `flux-data-qaqc` with a custom climate input file that is not from FLUXNET. The only real differences lie in the config file declarations, so the entire workflow from the FLUXNET example notebook works the same way here. That notebook is recommended for general use, whereas this one focuses on the formatting rules of the input data itself.

---
The data used here is provided with the software package and can be downloaded [here](https://github.com/Open-ET/flux-data-qaqc/blob/master/examples/). It comes from a USGS eddy covariance flux tower for the Dixie Valley Dense Vegetation site. Details on the data can be found in this [report](https://pubs.usgs.gov/pp/1805/pdf/pp1805.pdf).

In [10]:
%load_ext autoreload
%autoreload 2
from fluxdataqaqc import Data, QaQc, Plot
from bokeh.plotting import figure, show
from bokeh.models.formatters import DatetimeTickFormatter
from bokeh.io import output_notebook
output_notebook()

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# setting up a config file
---
The config file needed for using `flux-data-qaqc` has two major sections:
1. METADATA
2. DATA

Currently in **METADATA**, the "station_elevation" (expected in meters) and "station_latitude" (decimal degrees) fields are used to calculate clear sky potential solar radiation. The item "missing_data_value" is used to correctly parse missing data in the climate time series. Other metadata is not used currently but may be useful for custom workflows, more on this later.
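
To illustrate why elevation and latitude are needed, here is a minimal sketch of one simple clear sky estimate, the FAO-56 approximation. This is an assumption for illustration only: the formulation that `flux-data-qaqc` uses internally may differ, and the function names below are hypothetical.

```python
import math

def extraterrestrial_radiation(lat_deg, doy):
    """Daily extraterrestrial radiation Ra in MJ m-2 day-1 (FAO-56 form)."""
    gsc = 0.0820  # solar constant, MJ m-2 min-1
    phi = math.radians(lat_deg)
    # inverse relative Earth-Sun distance
    dr = 1 + 0.033 * math.cos(2 * math.pi * doy / 365)
    # solar declination, radians
    delta = 0.409 * math.sin(2 * math.pi * doy / 365 - 1.39)
    # sunset hour angle, radians
    ws = math.acos(-math.tan(phi) * math.tan(delta))
    return (24 * 60 / math.pi) * gsc * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )

def clear_sky_radiation(lat_deg, elev_m, doy):
    """Daily mean clear sky solar radiation in W m-2 (simple FAO-56 estimate)."""
    ra = extraterrestrial_radiation(lat_deg, doy)
    rso_mj = (0.75 + 2e-5 * elev_m) * ra  # elevation raises clear sky transmissivity
    return rso_mj * 1e6 / 86400  # MJ m-2 day-1 -> W m-2

# station latitude and elevation as listed in this site's METADATA; day 274 is 1 October 2009
rso = clear_sky_radiation(39.762511, 1046, doy=274)
print(round(rso, 1))
```

Latitude controls day length and sun angle, while elevation slightly increases the fraction of radiation that reaches the surface under clear skies.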

The **DATA** section of the config file is where you specify climate variables and their units. `flux-data-qaqc` has two major functions. First, it corrects surface energy balance by adjusting latent energy and sensible heat fluxes. Second, it serves as a robust way to read in different time series data and plot their daily and monthly time series. The latter is still under development, but the module can already generate useful interactive plots of arbitrary time series data.
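
As a rough illustration of the two sections, the sketch below builds a small config with Python's standard `configparser` module. The key names and the column names come from this notebook; the unit strings (`w/m2`) and the selection of entries are assumptions, so check the config file shipped with the package for the exact values it expects.

```python
import configparser

# abbreviated, illustrative config; unit strings are assumed
config_text = """
[METADATA]
climate_file_path = raw_subhour_DVDV_10.xlsx
station_latitude = 39.762511
station_elevation = 1046
missing_data_value = -9999

[DATA]
datestring_col = Timestamp
net_radiation_col = Net radiation, W/m2
net_radiation_units = w/m2
ground_flux_col = Soil-heat flux, W/m2
ground_flux_units = w/m2
precip_col = na
precip_units = na
"""

config = configparser.ConfigParser()
config.read_string(config_text)

# each variable is declared by a "*_col" entry and a matching "*_units" entry
for name, value in config.items('DATA'):
    print(name, '=', value)
```

Each variable is a pair of entries: the column name as it appears in the input file header, and its units. A variable that is absent from the input file is declared as `na`.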

Here is a list of all the "expected" climate variable names in the **DATA** section:

In [11]:
config_path = 'USGS_config.ini'
d = Data(config_path)
for each in d.config.items('DATA'):
    print(each[0])

datestring_col
year_col
month_col
day_col
net_radiation_col
net_radiation_units
ground_flux_col
ground_flux_units
latent_heat_flux_col
latent_heat_flux_units
latent_heat_flux_corrected_col
latent_heat_flux_corrected_units
sensible_heat_flux_col
sensible_heat_flux_units
sensible_heat_flux_corrected_col
sensible_heat_flux_corrected_units
shortwave_in_col
shortwave_in_units
shortwave_out_col
shortwave_out_units
shortwave_pot_col
shortwave_pot_units
longwave_in_col
longwave_in_units
longwave_out_col
longwave_out_units
vap_press_col
vap_press_units
vap_press_def_col
vap_press_def_units
avg_temp_col
avg_temp_units
precip_col
precip_units
wind_spd_col
wind_spd_units


**Note:** You may specify any or all of the expected climate variables as missing ('na') if they are not in your data. However, if all of them are missing, the output dataset will contain only null values and no plots will be produced!
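
A quick way to see which variables a config declares as missing is to filter the **DATA** items for 'na'. The sketch below uses a small stand-in config built with `configparser`; with a real `Data` object the same loop could run over `d.config.items('DATA')`.

```python
import configparser

# stand-in for a real config file, abbreviated for illustration
config = configparser.ConfigParser()
config.read_string("""
[DATA]
net_radiation_col = Net radiation, W/m2
latent_heat_flux_col = Latent-heat flux, W/m2
avg_temp_col = na
precip_col = na
wind_spd_col = na
""")

# collect the variable names whose column is declared as missing
missing = [
    name for name, value in config.items('DATA')
    if name.endswith('_col') and value.strip().lower() == 'na'
]
present = [
    name for name, value in config.items('DATA')
    if name.endswith('_col') and value.strip().lower() != 'na'
]
print('missing:', missing)
print('present:', present)
```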

## create a ``Data`` object to read in time series data using a config file

In [12]:
d = Data(config_path)
# you can access all metadata and data in the config file as a list
d.config.items('METADATA') # can access the DATA section the same way

[('climate_file_path', 'raw_subhour_DVDV_10.xlsx'),
 ('site_id', 'DVD_10'),
 ('station_latitude', '39.762511'),
 ('station_longitude', '-117.960100'),
 ('station_elevation', '1046'),
 ('anemometer_height', '2.72'),
 ('missing_data_value', '-9999')]

In [13]:
# or as a dict, e.g. to access specific values by name
d.config.get('METADATA','station_elevation')

'1046'
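
Note that the value comes back as a string (`'1046'`), so it needs converting before any arithmetic. The sketch below assumes the config behaves like a standard `configparser.ConfigParser`, which the `.items()` and `.get()` calls above suggest but do not guarantee; `float(d.config.get(...))` works either way.

```python
import configparser

# stand-in config with the METADATA values shown above
config = configparser.ConfigParser()
config.read_string("""
[METADATA]
station_latitude = 39.762511
station_elevation = 1046
missing_data_value = -9999
""")

elev_str = config.get('METADATA', 'station_elevation')    # string
elev = config.getfloat('METADATA', 'station_elevation')   # float
lat = float(config.get('METADATA', 'station_latitude'))   # explicit conversion

print(type(elev_str).__name__, type(elev).__name__, lat)
```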

In [14]:
# path to climate time series input and config files
print(d.climate_file, '\n', d.config_file)

/home/john/flux-data-qaqc/examples/raw_subhour_DVDV_10.xlsx 
 /home/john/flux-data-qaqc/examples/USGS_config.ini


In [15]:
# view full header of input time series file
d.header

Index(['Timestamp', 'ET, in.', 'Net radiation, W/m2', 'Latent-heat flux, W/m2',
       'Sensible-heat flux, W/m2', 'Soil-heat flux, W/m2'],
      dtype='object')

# load date-indexed DataFrame using ``.df``

* note, if there are variables stated in the config file but not found in the header of the input file, they will be filled with NaN (null) values in the dataframe

In [16]:
d.df.head()

| date | Net radiation, W/m2 | Latent-heat flux, W/m2 | Sensible-heat flux, W/m2 | Soil-heat flux, W/m2 |
|---|---|---|---|---|
| 2009-10-01 00:00:00 | -54.024218 | 0.70761 | 0.95511 | -40.423659 |
| 2009-10-01 00:30:00 | -51.077447 | 0.04837 | -1.24935 | -33.353833 |
| 2009-10-01 01:00:00 | -50.994389 | 0.68862 | 1.91101 | -43.179005 |
| 2009-10-01 01:30:00 | -51.350324 | -1.85829 | -15.4944 | -40.862015 |
| 2009-10-01 02:00:00 | -51.066042 | -1.80485 | -19.1357 | -39.809369 |
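
The two null-handling behaviours described in this notebook, parsing the "missing_data_value" flag and filling absent variables with NaN, can be mimicked in plain Pandas. This is a sketch on toy data, not the package's own implementation, and the extra column name is hypothetical.

```python
import numpy as np
import pandas as pd

missing_data_value = -9999  # as declared in the METADATA section

# toy data; the -9999 entry is made up to show the flag being parsed
raw = pd.DataFrame(
    {
        'Net radiation, W/m2': [-54.024218, -9999.0],
        'Latent-heat flux, W/m2': [0.70761, 0.04837],
    },
    index=pd.to_datetime(['2009-10-01 00:00:00', '2009-10-01 00:30:00']),
)
raw.index.name = 'date'

# replace the missing-data flag with NaN
clean = raw.replace(missing_data_value, np.nan)

# add a column that a config names but the file lacks (hypothetical name)
wanted = list(clean.columns) + ['Air temperature, C']
clean = clean.reindex(columns=wanted)

# count of null values per column
print(clean.isna().sum())
```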


## you can now modify or assign new data using all tools available in Pandas

Most examples of using the `QaQc` and `Plot` classes are shown in the [FLUXNET Jupyter notebook](https://github.com/Open-ET/flux-data-qaqc/blob/master/examples/FLUXNET_2015_example.ipynb). The example plot below therefore skips many of the steps explained there and goes straight to plotting corrected energy balance closure ratios.
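
As a small example of modifying the data with Pandas, the sketch below adds a derived column to a toy dataframe that reuses two rows and two column names from the input file. The new column name is hypothetical, and the same kind of assignment would presumably apply to `d.df`.

```python
import pandas as pd

# toy stand-in for d.df, values copied from the first two rows of the input file
df = pd.DataFrame(
    {
        'Net radiation, W/m2': [-54.024218, -51.077447],
        'Soil-heat flux, W/m2': [-40.423659, -33.353833],
    },
    index=pd.to_datetime(['2009-10-01 00:00:00', '2009-10-01 00:30:00']),
)
df.index.name = 'date'

# available energy = net radiation minus soil heat flux (hypothetical column name)
df['Available energy, W/m2'] = (
    df['Net radiation, W/m2'] - df['Soil-heat flux, W/m2']
)
print(df)
```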

Note, the `QaQc` class automatically resamples time series data to daily temporal frequency. The data is loaded into memory and the correction routines are run automatically when you access the monthly time series data, as shown below.

In [17]:
q = QaQc(d)
p = figure(x_axis_label='date', y_axis_label='energy balance closure ratio')
# legend_label replaces the deprecated legend keyword (bokeh >= 1.4)
p.line(q.monthly_df.index, q.monthly_df['ebc_reg'], color='red', legend_label="Raw", line_width=2)
p.line(q.monthly_df.index, q.monthly_df['ebc_adj'], legend_label="Corrected", line_width=2)
p.xaxis.formatter = DatetimeTickFormatter(days="%d-%b-%Y")
show(p)

The input data temporal frequency appears to be less than daily, it will be resampled to daily.
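
For intuition, the resampling step and a closure ratio can be sketched in plain Pandas. The example uses a common definition of the energy balance closure ratio, (LE + H) / (Rn - G); whether `flux-data-qaqc` computes `ebc_reg` in exactly this way is an assumption. The short column names and constant values are made up for illustration.

```python
import pandas as pd

# two days of synthetic half-hourly data with constant fluxes (W/m2)
idx = pd.date_range('2009-10-01', periods=96, freq='30min')
df = pd.DataFrame(
    {'Rn': 100.0, 'G': 20.0, 'LE': 40.0, 'H': 24.0},
    index=idx,
)

# resample sub-daily data to daily means
daily = df.resample('D').mean()

# closure ratio: turbulent fluxes divided by available energy
daily['ebr'] = (daily['LE'] + daily['H']) / (daily['Rn'] - daily['G'])
print(daily)
```

A ratio of 1 would mean the measured turbulent fluxes fully account for the available energy; values below 1 indicate the closure gap that the correction routines aim to reduce.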


# use the `Plot` object to create multiple validation plots

* note, the input data is missing many variables that would otherwise be plotted, including user-provided 'corrected' versions of latent energy and sensible heat flux; these are therefore not shown on the scatter and time series plots of energy balance closure

In [50]:
p = Plot(q)
p.generate_plots()

# view output plots within Jupyter notebook
from IPython.display import HTML
HTML(filename=p.plot_file)


Net Radiation components graph missing a variable.

Measured vs Potential SW graph missing a variable.

Temperature graph missing a variable.

Vapor Pressure graph missing a variable.

Windspeed graph missing a variable.

Precipitation graph missing a variable.
