# HydroGeoSines

## A general data processing workflow

This notebook demonstrates the general data handling capabilities of HydroGeoSines. The standard workflow for loading, processing and analysing data, as well as exporting and visualizing results is demonstrated on a simple example dataset. We show how the Site object and its methods can be used to store data and how the data processing is handled via the Processing object and its methods.

### Import HGS
Currently, HydroGeoSines is not fully implemented as an installable package. Instead, we have to move to the parent directory to import the package.

In [1]:
import os
os.chdir("../../")
print("Current Working Directory " , os.getcwd())

# Load the HGS package
import hydrogeosines as hgs

Current Working Directory  D:\Workspaces\GitHub\HydroGeoSines


In [2]:
# and other packages used in this tutorial
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### The Site object
Typically, we have time series data of groundwater head measurements from several loggers located at a site of interest. Accordingly, we aggregate all our data records into a hgs.Site object. The Site object has a geo-location attribute (set via the geoloc parameter) that holds the longitude, latitude and height of the site. This can later be used to calculate site-specific Earth tide records.

In [3]:
# Create a Site object
example_site = hgs.Site('example', geoloc=[141.762065, -31.065781, 160])
print(example_site)

<hydrogeosines.models.site.Site object at 0x0000026438D51E88>


### Load Data
#### Groundwater head records
The import_csv method of the Site object can be used to import the three standard input categories "GW", "BP" and "ET" (groundwater, barometric pressure, and Earth tides). The hgs package generally works in SI units. If you pass a *unit* argument for your input dataset, the values are converted automatically.

In the present example, a dataset with three groundwater records is loaded. The location names are explicitly set as "Loc_A", "Loc_B" and "Loc_C" using the loc_names parameter, because there are no column headers in the data set (header = None).
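To make the roles of utc_offset and unit more concrete, here is a minimal pandas sketch of the two conversions. It is not the hgs implementation: the helper names `to_utc` and `to_si`, the `TO_METRES` factor table and the input values are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical conversion factors to metres; not taken from the hgs source.
TO_METRES = {"m": 1.0, "cm": 0.01, "mm": 0.001}


def to_utc(local_times, utc_offset_hours):
    """Treat naive timestamps as local time with a fixed offset and return UTC."""
    naive = pd.to_datetime(pd.Series(local_times))
    shifted = naive - pd.Timedelta(hours=utc_offset_hours)
    return shifted.dt.tz_localize("UTC")


def to_si(values, unit):
    """Scale head values given in `unit` to metres."""
    return pd.Series(values, dtype=float) * TO_METRES[unit]


# Illustrative input: local timestamps at UTC+10 and heads in centimetres
utc_times = to_utc(["2001-01-01 00:00:30", "2001-01-01 00:05:30"], utc_offset_hours=10)
heads_m = to_si([701.7, 701.6], unit="cm")

print(utc_times)
print(heads_m.round(3))
```

With a positive offset such as 10, the UTC timestamps fall ten hours before the local ones, which is why records that start at local midnight appear on the previous day in UTC.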

In [4]:
# Load all our data attributed to the Site
example_site.import_csv('tests/data/notebook/GW_record.csv', 
                        input_category=["GW"]*3, 
                        utc_offset=10, unit=["m"]*3,
                        loc_names = ["Loc_A","Loc_B","Loc_C"], header = None,
                        check_dublicates=True) 

A new time series was added ...
No dublicates being found ...


The Site object now has the groundwater records added to its data attribute. It is stored as a Pandas DataFrame with a set of predefined column names:
 - **datetime:** the first column of every input data record must be in a format that can be converted to datetime
 - **category:** the data category (GW, BP or ET)
 - **location:** either inferred from the header or defined by the loc_names parameter of the import method
 - **part:** pre-set to "all". For non-uniform data records, the data set is later split into uniform parts
 - **unit:** unit (SI after import)
 - **value:** the measured value in the given unit
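
Because every measurement is one row in this long format, standard pandas operations are enough to select or reshape records. The following sketch builds a small stand-in DataFrame with the same column names (the values are made up) and shows a selection by category and location as well as a pivot to one column per location.

```python
import pandas as pd

# Stand-in for example_site.data; the rows are made up for this sketch.
records = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2000-12-31 14:00:30", "2000-12-31 14:05:30"] * 2, utc=True
        ),
        "category": ["GW"] * 4,
        "location": ["Loc_A", "Loc_A", "Loc_B", "Loc_B"],
        "part": ["all"] * 4,
        "unit": ["m"] * 4,
        "value": [7.017, 7.017, 6.502, 6.501],
    }
)

# Select one record by category and location
loc_a = records[(records["category"] == "GW") & (records["location"] == "Loc_A")]

# Reshape to wide format: one row per timestamp, one column per location
wide = records.pivot_table(index="datetime", columns="location", values="value")

print(loc_a)
print(wide)
```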

In [4]:
example_site.data.head(3)

|   | datetime | category | location | part | unit | value |
|---|---|---|---|---|---|---|
| 0 | 2000-12-31 14:00:30+00:00 | GW | Loc_A | all | m | 7.017 |
| 1 | 2000-12-31 14:05:30+00:00 | GW | Loc_A | all | m | 7.017 |
| 2 | 2000-12-31 14:10:30+00:00 | GW | Loc_A | all | m | 7.016 |


#### Barometric pressure records
The import of barometric pressure records works like the groundwater head import, except that "BP" is passed to the input_category parameter. With the *how* parameter set to "add", the Site data attribute is updated and the BP record is appended to the previously imported GW data.

In [5]:

example_site.import_csv('tests/data/notebook/BP_record.csv', 
                        input_category="BP", 
                        utc_offset=10, unit="m", 
                        loc_names = "Baro",
                        header = None,
                        how="add", check_dublicates=True) 

A new time series was added ...
No dublicates being found ...
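
Conceptually, adding a record and checking for duplicates can be pictured as a concatenation followed by a duplicate check on the identifying columns. The sketch below illustrates this idea in plain pandas. It is an assumption about the general approach, not the code behind import_csv; `add_records` and `KEY_COLUMNS` are hypothetical names and the values are made up.

```python
import pandas as pd

# Columns assumed to identify a unique measurement in this sketch
KEY_COLUMNS = ["datetime", "category", "location"]


def add_records(existing, new):
    """Append new long-format rows and drop rows that repeat an existing key."""
    combined = pd.concat([existing, new], ignore_index=True)
    n_duplicates = int(combined.duplicated(subset=KEY_COLUMNS).sum())
    combined = combined.drop_duplicates(subset=KEY_COLUMNS, keep="first")
    return combined.reset_index(drop=True), n_duplicates


times = pd.to_datetime(["2000-12-31 14:00:00", "2000-12-31 14:05:00"], utc=True)
gw = pd.DataFrame(
    {"datetime": times, "category": "GW", "location": "Loc_A", "value": [7.017, 7.016]}
)
bp = pd.DataFrame(
    {"datetime": times, "category": "BP", "location": "Baro", "value": [10.21, 10.22]}
)

data, dropped = add_records(gw, bp)               # first import: nothing to drop
data, dropped_again = add_records(data, bp)       # importing the same file twice

print(len(data), dropped, dropped_again)
```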


In [6]:
example_site.data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 174973 entries, 0 to 174972
Data columns (total 6 columns):
 #   Column    Non-Null Count   Dtype              
---  ------    --------------   -----              
 0   datetime  174973 non-null  datetime64[ns, UTC]
 1   category  174973 non-null  object             
 2   location  174973 non-null  object             
 3   part      174973 non-null  object             
 4   unit      174973 non-null  object             
 5   value     87699 non-null   float64            
dtypes: datetime64[ns, UTC](1), float64(1), object(4)
memory usage: 8.0+ MB


### The Processing object
The Processing object provides easy access to the hgs methods for data pre-processing and data analysis. These include methods for calculating barometric efficiencies and corrected groundwater heads, and for extracting harmonic components from records.

In [7]:
# Create a Processing object of example site
process_example = hgs.Processing(example_site)

After instantiating the Processing object, we can simply run the desired method, which returns a new object containing the method results. In this case, we want to compute all time-domain barometric efficiency (BE) estimates offered by the BE_time method.
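
For readers unfamiliar with time-domain BE estimates, the sketch below shows three generic textbook-style estimators on synthetic data: a linear regression, an average of ratios and a median of ratios, all applied to first differences. It does not reproduce the formulas used inside BE_time. The synthetic series, the sign convention (water level falls when pressure rises) and the threshold for near-zero pressure changes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

true_be = 0.4  # assumed value used only to build the synthetic data
bp = np.cumsum(rng.normal(0.0, 0.01, size=2000))            # pressure head [m]
gw = 7.0 - true_be * bp + rng.normal(0.0, 1e-4, size=2000)  # water level [m]

# Work on first differences so that slow trends do not dominate
d_bp = np.diff(bp)
d_gw = np.diff(gw)

# Linear regression: slope of d_gw against d_bp, sign flipped because the
# water level is assumed to fall when pressure rises
slope = np.polyfit(d_bp, d_gw, deg=1)[0]
be_regression = -slope

# Ratio-based estimates; near-zero pressure changes are skipped because
# they would blow up the ratio
mask = np.abs(d_bp) > 1e-3
ratios = -d_gw[mask] / d_bp[mask]
be_average = np.mean(ratios)
be_median = np.median(ratios)

print(round(be_regression, 3), round(be_average, 3), round(be_median, 3))
```

On clean synthetic data all three estimates land close to the value used to generate it. On field data they can differ substantially, which is one reason to compare several methods.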

The BE_time method requires our data to be uniformly sampled. Thus, preprocessing steps are applied to the data of the Site object before the calculation. First, the groundwater head measurements are resampled, interpolated and, if necessary, split into sub-parts of uniform sampling. Next, the BP records are aligned with the GW data. Finally, the barometric efficiencies are calculated for every location and part individually.
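
The resample, interpolate and split steps can be sketched in pandas as follows. This is only an illustration of the idea, not the hgs implementation: the function name `regularize`, the 5-minute grid and the rule "fill runs of missing samples shorter than one hour, split at longer ones" are assumptions, and the alignment of BP with GW is left out.

```python
import numpy as np
import pandas as pd


def regularize(series, freq="5min", max_gap="1h"):
    """Resample to a regular grid, fill short gaps and label the uniform parts."""
    regular = series.resample(freq).mean()

    # Length (in samples) of each run of consecutive missing values
    is_nan = regular.isna()
    run_id = (is_nan != is_nan.shift()).cumsum()
    run_len = is_nan.groupby(run_id).transform("sum")

    # Only runs shorter than max_gap are allowed to be interpolated
    fillable = is_nan & (run_len * pd.Timedelta(freq) < pd.Timedelta(max_gap))
    filled = regular.interpolate(method="time", limit_area="inside")
    result = filled.where(~is_nan | fillable)

    # Each block of consecutive valid samples becomes its own part (1, 2, ...)
    valid = result.notna()
    part = (valid & ~valid.shift(fill_value=False)).cumsum()
    return pd.DataFrame({"value": result, "part": part})[valid]


# Synthetic record with one short gap (10 min) and one long gap (2 h)
index = pd.date_range("2001-01-01", periods=72, freq="5min", tz="UTC")
raw = pd.Series(np.linspace(7.0, 7.071, 72), index=index)
raw = raw.drop(index[10:12])
raw = raw.drop(index[30:54])

regular = regularize(raw)
print(regular.groupby("part").size())
```

The short gap is filled by interpolation, while the long gap splits the record into two parts that can then be processed separately.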

In [8]:
# Test the BE Time methods
BE_results  = process_example.BE_time(method="all")


Processing BE_time method...
9.84 % of the 'GW' data at 'Loc_A_all' was interpolated due to gaps < 3600s!
9.67 % of the 'GW' data at 'Loc_B_all' was interpolated due to gaps < 3600s!
9.76 % of the 'GW' data at 'Loc_C_all' was interpolated due to gaps < 3600s!
Data of the category 'GW' is regularly sampled now!
6.26 % of the 'BP' data at 'Baro_all' was interpolated due to gaps < 3600s!
0.00 % of the 'GW' data at 'Loc_A_1' was interpolated due to gaps < 3600s!
0.00 % of the 'GW' data at 'Loc_B_1' was interpolated due to gaps < 3600s!
0.00 % of the 'GW' data at 'Loc_C_1' was interpolated due to gaps < 3600s!
0.00 % of the 'GW' data at 'Loc_C_2' was interpolated due to gaps < 3600s!
The groundwater (GW) and barometric pressure (BP) data is now aligned. There is now exactly one BP for every GW entry!
Successfully calculated using method 'all' on GW data from '('Loc_A', '1')'!
Successfully calculated using method 'all' on GW data from '('Loc_A', '2')'!
Successfully calculated using method '

BE_results now contains a nested dictionary with the results of the BE_time method. The dictionary keys correspond to the name of the location, its sub-parts and the individual BE methods applied to the data.

In [9]:
print(BE_results)

{'be_time': {'Loc_A': {'2': {'BP': array([-0.00101974, -0.10095468,  0.10503366, ...,  0.        ,
       -0.00305923,  0.        ]), 'GW': array([-0.001,  0.004, -0.004, ..., -0.001, -0.003,  0.006]), 'dt': 7064    2001-03-01 07:20:00+00:00
7065    2001-03-01 07:25:00+00:00
7066    2001-03-01 07:30:00+00:00
7067    2001-03-01 07:35:00+00:00
7068    2001-03-01 07:40:00+00:00
                   ...           
15491   2001-03-30 13:35:00+00:00
15492   2001-03-30 13:40:00+00:00
15493   2001-03-30 13:45:00+00:00
15494   2001-03-30 13:50:00+00:00
15495   2001-03-30 13:55:00+00:00
Name: datetime, Length: 8432, dtype: datetime64[ns, UTC], 'derivative': True, 'clark': 0.10214224288026245, 'davis_and_rasmussen': -0.24966269348758294, 'rahi': 0.3529139626245524, 'rojstaczer': 0.5039950805827401, 'average_of_ratios': 0.068208979341703, 'linear_regression': 0.005787699199442982, 'median_of_ratios': 0.0}}, 'Loc_B': {'2': {'BP': array([-0.00101974, -0.10095468,  0.10503366, ...,  0.        ,
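
Printing the full dictionary is hard to read. As a sketch, the scalar estimates can be flattened into a table by walking the three levels of keys. The dictionary below is a hand-built stand-in that copies only the scalar values of Loc_A, part 2, from the output above and leaves out the arrays; the loop itself makes no assumption beyond the nesting shown there.

```python
import pandas as pd

# Stand-in for BE_results: scalar values copied from the Loc_A output above,
# with the 'BP', 'GW' and 'dt' arrays omitted.
be_results = {
    "be_time": {
        "Loc_A": {
            "2": {
                "derivative": True,
                "clark": 0.10214224288026245,
                "davis_and_rasmussen": -0.24966269348758294,
                "rahi": 0.3529139626245524,
                "rojstaczer": 0.5039950805827401,
                "average_of_ratios": 0.068208979341703,
                "linear_regression": 0.005787699199442982,
                "median_of_ratios": 0.0,
            }
        }
    }
}

rows = []
for location, parts in be_results["be_time"].items():
    for part, result in parts.items():
        for method, value in result.items():
            # Keep only scalar BE estimates; skip flags, arrays and Series
            if isinstance(value, float):
                rows.append(
                    {"location": location, "part": part, "method": method, "BE": value}
                )

table = pd.DataFrame(rows).set_index(["location", "part", "method"])
print(table)
```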
       

#### Filter by groundwater location
Once we have created our Site object containing all our data, we can process certain locations individually using the by_gwloc method.

In [10]:
# Create Processing object for a specific groundwater location of example_site
locations = "Loc_A"
process_loc_A = hgs.Processing(example_site).by_gwloc(locations)
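
In terms of the long-format data, such a filter amounts to keeping the selected GW locations together with the records of the other categories. The pandas sketch below illustrates this under the assumption that BP records are retained so that BE methods can still run; `filter_gw_locations` is a hypothetical helper with made-up values, not part of hgs.

```python
import pandas as pd


def filter_gw_locations(data, locations):
    """Keep the selected GW locations and all non-GW records (e.g. BP)."""
    if isinstance(locations, str):
        locations = [locations]
    keep = (data["category"] != "GW") | data["location"].isin(locations)
    return data[keep].reset_index(drop=True)


# Made-up stand-in for the Site data
data = pd.DataFrame(
    {
        "category": ["GW", "GW", "GW", "BP"],
        "location": ["Loc_A", "Loc_B", "Loc_C", "Baro"],
        "value": [7.017, 6.502, 5.331, 10.21],
    }
)

subset = filter_gw_locations(data, "Loc_A")
print(subset)
```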

#### Add regular data attribute to the processing object
BE_time and other methods require the data to be uniformly sampled. Thus, if multiple methods need access to uniformly sampled data, it can make sense to pre-process the data once using the make_regular() method to reduce the overall processing time.

In [11]:
# Create a Processing object for two groundwater locations of example_site and add a regularly sampled data attribute.
# It is automatically reused by some of the methods, which reduces computation time
locations = ["Loc_A","Loc_B"]
process_loc_AB = hgs.Processing(example_site).by_gwloc(locations).make_regular()

9.84 % of the 'GW' data at 'Loc_A_all' was interpolated due to gaps < 3600s!
9.67 % of the 'GW' data at 'Loc_B_all' was interpolated due to gaps < 3600s!
Data of the category 'GW' is regularly sampled now!
0.02 % of the 'BP' data at 'Baro_all' was interpolated due to gaps < 3600s!
0.00 % of the 'GW' data at 'Loc_A_1' was interpolated due to gaps < 3600s!
0.00 % of the 'GW' data at 'Loc_B_1' was interpolated due to gaps < 3600s!
The groundwater (GW) and barometric pressure (BP) data is now aligned. There is now exactly one BP for every GW entry!


In [12]:
be_results_2 = process_loc_AB.BE_time(method="all")


Processing BE_time method...
Successfully calculated using method 'all' on GW data from '('Loc_A', '1')'!
Successfully calculated using method 'all' on GW data from '('Loc_A', '2')'!
Successfully calculated using method 'all' on GW data from '('Loc_B', '1')'!
Successfully calculated using method 'all' on GW data from '('Loc_B', '2')'!


### The View object