# Basic Usage

This example demonstrates how to use the `flux-data-qaqc` Python package to manage, analyze, and visualize eddy covariance time series data. This tutorial covers the most important features of the Python API. We recommend reading the [Installation](https://flux-data-qaqc.readthedocs.io/en/latest/install.html#installation) and [Configuration Options](https://flux-data-qaqc.readthedocs.io/en/latest/advanced_config_options.html#configuration-options-and-caveats) tutorials before this one.

**Note:** the software currently has no command line interface, so you must use it from Python. However, a basic workflow takes only a few (5-10) lines of code, and you can follow the templates given here to write custom scripts.

## Description of data

The data for this example come from the "Twitchell Alfalfa" AmeriFlux eddy covariance flux tower site in California. The site is located in alfalfa fields and has a mild Mediterranean climate with hot, dry summers. For more information on this site or to download the data, see the [US-Tw3 site page](https://ameriflux.lbl.gov/sites/siteinfo/US-Tw3).

```python
# Jupyter magics: automatically reload modules that change on disk (notebook only)
%load_ext autoreload
%autoreload 2

from fluxdataqaqc import Data, QaQc, Plot
from bokeh.plotting import figure, show
from bokeh.models.formatters import DatetimeTickFormatter
from bokeh.models import LinearAxis, Range1d
from bokeh.io import output_notebook

# render Bokeh plots inline in the notebook
output_notebook()
```

## Loading input

The loading and management of input climatic data and metadata from a config.ini file is done using the ``fluxdataqaqc.Data`` object. In a nutshell, a ``Data`` object is created from a properly formatted config file (see [Setting up a config file](https://flux-data-qaqc.readthedocs.io/en/latest/advanced_config_options.html?#setting-up-a-config-file)) and has tools for parsing input climate data, averaging input climate time series, accessing/managing metadata, flag-based data filtering, and creating interactive visualizations of input data.  

Creating a ``Data`` object requires only one argument, the path to the config.ini file:

```python
config_path = 'US-Tw3_config.ini'
d = Data(config_path)
```

### Attributes of a ``Data`` object

Below are some of the useful attributes of the ``Data`` object and how they may be used.

The full path to the config.ini file used to create the ``Data`` instance is stored in ``Data.config_file``. Note that it is a system-dependent ``pathlib.Path`` object; e.g., on my Linux machine the path is:

```python
d.config_file
```

```
PosixPath('/home/john/flux-data-qaqc/examples/Basic_usage/US-Tw3_config.ini')
```

On a Windows machine the path will have the appropriate backslashes.
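
Because ``d.config_file`` is a ``pathlib.Path``, the usual ``pathlib`` accessors such as ``.name`` and ``.parent`` behave the same on every platform; only the string form differs. The sketch below uses the standard library's pure path classes so it runs on any OS. The Windows location (``C:/Users/john``) is made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Path components from the Linux example above; the Windows prefix is invented
parts = ("flux-data-qaqc", "examples", "Basic_usage", "US-Tw3_config.ini")

posix_path = PurePosixPath("/home/john", *parts)
windows_path = PureWindowsPath("C:/Users/john", *parts)

print(posix_path)    # /home/john/flux-data-qaqc/examples/Basic_usage/US-Tw3_config.ini
print(windows_path)  # C:\Users\john\flux-data-qaqc\examples\Basic_usage\US-Tw3_config.ini

# Platform-independent accessors give the same answers for both flavours
print(posix_path.name, windows_path.name)                # US-Tw3_config.ini US-Tw3_config.ini
print(posix_path.parent.name, windows_path.parent.name)  # Basic_usage Basic_usage
```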

Similarly, to access the path to the climate time series file:

```python
d.climate_file
```

```
PosixPath('/home/john/flux-data-qaqc/examples/Basic_usage/AMF_US-Tw3_BASE_HH_5-5.csv')
```

The ``Data.config`` attribute is a ``configparser.ConfigParser`` object, which lets you access and modify the metadata and data entries of the config file in several ways. In ``flux-data-qaqc`` it is mainly used to access information about the input data.

```python
# get a list of all entries in the METADATA section of the config.ini
d.config.items('METADATA')  # access the DATA section the same way
```

```
[('climate_file_path', 'AMF_US-Tw3_BASE_HH_5-5.csv'),
 ('station_latitude', '38.1159'),
 ('station_longitude', '-121.6467'),
 ('station_elevation', '-9.0'),
 ('missing_data_value', '-9999'),
 ('skiprows', '2'),
 ('date_parser', '%Y%m%d%H%M'),
 ('site_id', 'US-Tw3'),
 ('country', 'USA'),
 ('doi_contributor_name', 'Dennis Baldocchi'),
 ('doi_contributor_role', 'Author'),
 ('doi_contributor_email', 'baldocchi@berkeley.edu'),
 ('doi_contributor_institution', 'University of California, Berkeley'),
 ('doi_organization', 'California Department of Water Resources'),
 ('doi_organization_role', 'Sponsor'),
 ('flux_measurements_method', 'Eddy Covariance'),
 ('flux_measurements_variable', 'CO2'),
 ('flux_measurements_operations', 'Continuous operation'),
 ('site_name', 'Twitchell Alfalfa'),
 ('igbp', 'CRO'),
 ('igbp_comment',
  'alfalfa is a fast growing leguminous crop raised for animal feed of low stature.  It is planted in rows and typically reaches 60-70 cm in height prior to harvest.'),
 ...
```
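
To experiment with ``ConfigParser`` without a ``Data`` object, you can build one in memory. The sketch below uses a handful of entries copied from the output above (not a complete, valid ``flux-data-qaqc`` config) and shows both reading and modifying values with standard ``configparser`` methods. It only changes this standalone parser, not any config file on disk.

```python
import configparser

# A trimmed-down stand-in for US-Tw3_config.ini (entries from the output above)
ini_text = """
[METADATA]
climate_file_path = AMF_US-Tw3_BASE_HH_5-5.csv
station_latitude = 38.1159
station_longitude = -121.6467
station_elevation = -9.0
missing_data_value = -9999
site_id = US-Tw3
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

# items() returns a list of (option, value) pairs; every value is a string
first_two = config.items("METADATA")[:2]
print(first_two)

# Entries can be changed in memory; ConfigParser.set requires string values
config.set("METADATA", "site_id", "US-Tw3-test")
print(config.get("METADATA", "site_id"))  # US-Tw3-test
```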

A useful method is ``configparser.ConfigParser.get``, which takes a section of the config file and an option name and returns the value as a string:

```python
d.config.get(section='METADATA', option='site_name')
```

```
'Twitchell Alfalfa'
```

```python
# section and option can also be passed positionally
d.config.get('METADATA', 'site_name')
```

```
'Twitchell Alfalfa'
```
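
Because ``get`` always returns a string, numeric entries such as the latitude or the missing data value need converting before use. ``ConfigParser`` has typed getters for this. The sketch below uses a standalone parser filled with a few values copied from the METADATA output above.

```python
import configparser

config = configparser.ConfigParser()
config.read_dict({
    "METADATA": {
        "station_latitude": "38.1159",
        "station_elevation": "-9.0",
        "missing_data_value": "-9999",
        "skiprows": "2",
    }
})

# get() always returns a string ...
lat_str = config.get("METADATA", "station_latitude")
# ... while the typed helpers convert the value for you
lat = config.getfloat("METADATA", "station_latitude")
skiprows = config.getint("METADATA", "skiprows")
na_val = config.getint("METADATA", "missing_data_value")

print(repr(lat_str), lat, skiprows, na_val)  # '38.1159' 38.1159 2 -9999
```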

**Tip**: If you are unsure whether a section or option exists in the config file, use the ``fallback`` keyword argument:

```python
# return 'na' instead of raising an error if the option does not exist
d.config.get('METADATA', 'site name', fallback='na')
```

```
'na'
```
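
For comparison, here is what happens without ``fallback``, along with the explicit membership tests ``has_section`` and ``has_option`` that ``ConfigParser`` also provides. This is a standalone sketch with a one-entry parser, not the tutorial's ``d.config``.

```python
import configparser

config = configparser.ConfigParser()
config.read_dict({"METADATA": {"site_name": "Twitchell Alfalfa"}})

# fallback is returned when the option, or even the whole section, is missing
print(config.get("METADATA", "site name", fallback="na"))         # na
print(config.get("NO_SUCH_SECTION", "site_name", fallback="na"))  # na

# Explicit membership tests, if you prefer to branch yourself
print(config.has_section("METADATA"))              # True
print(config.has_option("METADATA", "site_name"))  # True
print(config.has_option("METADATA", "site name"))  # False

# Without fallback, a missing option raises NoOptionError
try:
    config.get("METADATA", "site name")
    raised = None
except configparser.NoOptionError as err:
    raised = type(err).__name__
print(raised)  # NoOptionError
```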

Some metadata entries are also stored as ``Data`` attributes for easier access because they are used repeatedly later. These include:

* ``site_id``$^*$
* ``elevation``$^*$
* ``latitude``$^*$
* ``longitude``$^*$
* ``na_val``
* ``qc_threshold``
* ``qc_flag``

Entries marked with $^*$ are mandatory **METADATA** entries in the config file; see [Setting up a Config File](https://flux-data-qaqc.readthedocs.io/en/latest/advanced_config_options.html#setting-up-a-config-file) for further explanation.
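
If you edit config files by hand, a quick check for missing mandatory entries can save a failed run. The sketch below is a hypothetical helper (``missing_metadata`` is not part of ``flux-data-qaqc``), and it assumes the mandatory option names are the ones seen in the METADATA output earlier in this tutorial (``site_id``, ``station_elevation``, ``station_latitude``, ``station_longitude``). Check the linked configuration docs for the authoritative names.

```python
import configparser

# ASSUMPTION: option names taken from the METADATA output shown earlier;
# consult "Setting up a Config File" for the authoritative list.
REQUIRED_METADATA = ("site_id", "station_elevation",
                     "station_latitude", "station_longitude")


def missing_metadata(config):
    """Hypothetical helper: list required METADATA options absent from config."""
    return [opt for opt in REQUIRED_METADATA
            if not config.has_option("METADATA", opt)]


config = configparser.ConfigParser()
config.read_dict({"METADATA": {"site_id": "US-Tw3",
                               "station_latitude": "38.1159",
                               "station_longitude": "-121.6467"}})

print(missing_metadata(config))  # ['station_elevation']
```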

The ``Data.header`` attribute lists all columns found in the header row of the input climate time series file:

```python
d.header
```

```
array(['TIMESTAMP_START', 'TIMESTAMP_END', 'CO2', 'H2O', 'CH4', 'FC',
       'FCH4', 'FC_SSITC_TEST', 'FCH4_SSITC_TEST', 'G', 'H', 'LE',
       'H_SSITC_TEST', 'LE_SSITC_TEST', 'WD', 'WS', 'USTAR', 'ZL', 'TAU',
       'MO_LENGTH', 'V_SIGMA', 'W_SIGMA', 'TAU_SSITC_TEST', 'PA', 'RH',
       'TA', 'VPD_PI', 'T_SONIC', 'T_SONIC_SIGMA', 'SWC_1_1_1',
       'SWC_1_2_1', 'TS_1_1_1', 'TS_1_2_1', 'TS_1_3_1', 'TS_1_4_1',
       'TS_1_5_1', 'NETRAD', 'PPFD_DIF', 'PPFD_IN', 'PPFD_OUT', 'SW_IN',
       'SW_OUT', 'LW_IN', 'LW_OUT', 'P', 'FC_PI_F', 'RECO_PI_F',
       'GPP_PI_F', 'H_PI_F', 'LE_PI_F'], dtype='<U15')
```
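
Reading just the header is cheap because only the first few lines of the file are touched. The sketch below is not the package's implementation; it only illustrates the idea with the standard library, using the ``skiprows = 2`` value from the config shown earlier. The two comment lines in the sample text are invented placeholders for the metadata lines that precede the header.

```python
import csv
import io
from itertools import islice


def read_header(handle, skiprows=2):
    """Return the column names on the first row after `skiprows` lines,
    without reading the rest of the file."""
    reader = csv.reader(islice(handle, skiprows, None))
    return next(reader)


# Tiny stand-in for the climate file: two placeholder lines, then the header
sample = io.StringIO(
    "# placeholder metadata line 1\n"
    "# placeholder metadata line 2\n"
    "TIMESTAMP_START,TIMESTAMP_END,CO2,H2O\n"
    "201301010000,201301010030,-9999,-9999\n"
)
header = read_header(sample)
print(header)  # ['TIMESTAMP_START', 'TIMESTAMP_END', 'CO2', 'H2O']
```

With a real file you would pass an open handle instead, e.g. ``with open(d.climate_file) as f: read_header(f)``.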

**Note:** not all header columns are necessarily loaded, only those specified in the config file. Also, creating a ``Data`` object reads only the header line into memory; the time series data is loaded only when ``Data.df`` is accessed, which saves time and memory in workflows that do not need the full data.
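
The deferred-loading pattern described in the note can be illustrated with ``functools.cached_property``. The class below is a hypothetical toy, **not** ``fluxdataqaqc.Data``: its ``records`` property plays the role of ``Data.df`` but returns a list of dicts instead of a pandas DataFrame to stay dependency-free. It keeps only the requested columns, maps the ``-9999`` missing value to ``None``, and parses ``TIMESTAMP_START`` with the ``%Y%m%d%H%M`` format from the config shown earlier.

```python
import csv
import io
from datetime import datetime
from functools import cached_property
from itertools import islice


class LazyTable:
    """Hypothetical illustration of deferred loading; NOT fluxdataqaqc.Data."""

    def __init__(self, text, columns, skiprows=2, na_val="-9999",
                 date_format="%Y%m%d%H%M"):
        self._text = text        # stands in for the path to the climate file
        self.columns = columns   # only these columns are kept
        self.skiprows = skiprows
        self.na_val = na_val
        self.date_format = date_format

    @cached_property
    def records(self):
        """Parse the table on first access; later accesses reuse the result."""
        lines = islice(io.StringIO(self._text), self.skiprows, None)
        rows = []
        for rec in csv.DictReader(lines):
            row = {"date": datetime.strptime(rec["TIMESTAMP_START"], self.date_format)}
            for col in self.columns:
                value = rec[col]
                row[col] = None if value == self.na_val else float(value)
            rows.append(row)
        return rows


sample = (
    "# placeholder metadata line 1\n"
    "# placeholder metadata line 2\n"
    "TIMESTAMP_START,TIMESTAMP_END,H,LE\n"
    "201301010000,201301010030,-9999,10.0\n"
    "201301010030,201301010100,5.5,12.5\n"
)
table = LazyTable(sample, columns=["H", "LE"])
loaded_before = "records" in table.__dict__  # False: nothing parsed yet
records = table.records                      # parsing happens here
loaded_after = "records" in table.__dict__   # True: result is now cached
print(records[0])
```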

This document is under development; more will be added soon.