# HydroGeoSines
## A general data processing workflow

This notebook demonstrates the general data handling capabilities of HydroGeoSines. The standard workflow for loading, processing and analysing data, as well as exporting and visualizing results is demonstrated on a simple example dataset. We show how the Site object and its methods can be used to store data and how the data processing is handled via the Processing object and its methods.

### Import HGS
Currently, HydroGeoSines is not fully implemented as an installable package. Instead, we have to move to the parent directory to import the package.

In [1]:
import os
os.chdir("../../")
print("Current Working Directory " , os.getcwd())

# Load the HGS package
import hydrogeosines as hgs

Current Working Directory  /media/daniel/SharedData/Workspaces/GitHub/HydroGeoSines


In [2]:
# and other packages used in this tutorial
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### The Site object
Typically, we have time series data of groundwater head measurements from several loggers that are located at a site of interest. Accordingly, we aggregate all our data records into an hgs.Site object. The Site object has a geo-location attribute (geoloc) that holds information on longitude, latitude and height. This can later be used to calculate site-specific Earth tide records.

In [3]:
# Create a Site object
example_site = hgs.Site('example', geoloc=[141.762065, -31.065781, 160])
print(example_site)

<hydrogeosines.models.site.Site object at 0x7fd5585b9700>


### Load Data
#### Import groundwater head records
The import_csv method of the Site object can be used to import the three standard input categories "GW", "BP" and "ET" (groundwater, barometric pressure, and earth tides). In general, the hgs package is implemented in SI units. By passing a *unit* argument for your input dataset, units are automatically converted. 

In the present example, a dataset with two groundwater records is loaded. The location names are explicitly set as "Loc_A" and "Loc_B" using the loc_names parameter, because there are no column headers in the dataset (header = None).
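To make the role of the utc_offset argument concrete, here is a minimal standard-library sketch of the conversion it implies. It assumes the raw file holds naive local timestamps; the helper name `local_to_utc` and the input timestamp are hypothetical and not part of HGS.

```python
from datetime import datetime, timedelta, timezone

def local_to_utc(stamp: str, utc_offset: float) -> datetime:
    """Interpret a naive local timestamp with a fixed offset (in hours) and return UTC."""
    local_tz = timezone(timedelta(hours=utc_offset))
    local = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=local_tz)
    return local.astimezone(timezone.utc)

# Hypothetical local timestamp recorded at UTC+10
first_utc = local_to_utc("2001-01-01 00:00:30", utc_offset=10)
print(first_utc.isoformat())  # 2000-12-31T14:00:30+00:00
```

Storing everything in UTC keeps records from different loggers comparable, regardless of the local time each logger was set to.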

In [4]:
# Load all our data attributed to the Site
example_site.import_csv('tests/data/notebook/GW_record.csv', 
                        input_category=["GW"]*3, 
                        utc_offset=10, 
                        unit=["m"]*3,
                        loc_names = ["Loc_A","Loc_B"], 
                        header = None,
                        check_duplicates=True) 

A new time series was added ...
No duplicate entries were found.


The Site object now has the groundwater records added to its data attribute. It is stored as a Pandas DataFrame with a set of predefined column names:
 - **datetime:** the first column of every input data record should be a datetime convertible format
 - **category:** the data category (GW, BP or ET)
 - **location:** either inferred from the header or defined by the loc_names parameter of the import method
 - **part:** pre-set to "all". For non-uniform data records, the data set is later split into uniform parts
 - **unit:** unit (SI after import)
 - **value** 
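The long format described above can be reproduced with plain pandas. The following is only a sketch of the idea, not the HGS import code; the wide table and its values are illustrative.

```python
import pandas as pd

# Hypothetical wide table: one datetime column plus one column per logger
wide = pd.DataFrame({
    "datetime": pd.to_datetime(["2000-12-31 14:00:30", "2000-12-31 14:05:30"], utc=True),
    "Loc_A": [7.017, 7.017],
    "Loc_B": [1.377, 1.376],
})

# Stack the logger columns into one "value" column, one row per measurement
long = wide.melt(id_vars="datetime", var_name="location", value_name="value")
long["category"] = "GW"
long["part"] = "all"
long["unit"] = "m"
long = long[["datetime", "category", "location", "part", "unit", "value"]]
print(long)
```

One row per measurement makes it easy to store records with different time stamps and sampling rates in a single DataFrame.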

In [5]:
example_site.data.head(3)

Unnamed: 0,datetime,category,location,part,unit,value
0,2000-12-31 14:00:30+00:00,GW,Loc_A,all,m,7.017
1,2000-12-31 14:05:30+00:00,GW,Loc_A,all,m,7.017
2,2000-12-31 14:10:30+00:00,GW,Loc_A,all,m,7.016


In [6]:
example_site.data.location.unique()

array(['Loc_A', 'Loc_B'], dtype=object)

#### Import barometric pressure records
The import of barometric pressure records is similar to the groundwater head import. Only "BP" needs to be passed as the argument to the *input_category* parameter. With the *how* parameter set to "add", the Site data attribute is updated and the BP record is added to the previously imported GW data.

In [7]:

example_site.import_csv('tests/data/notebook/BP_record.csv', 
                        input_category="BP", 
                        utc_offset=10, 
                        unit="m", 
                        loc_names = "Baro",
                        header = None,
                        how="add", check_duplicates=True) 

A new time series was added ...
No duplicate entries were found.
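Conceptually, adding a record to an existing long-format table amounts to a concatenation followed by a duplicate check. The sketch below shows this with plain pandas; `add_records` is a hypothetical helper, the values are illustrative, and the actual HGS implementation may differ.

```python
import pandas as pd

def add_records(existing, new, check_duplicates=True):
    """Append new long-format records and optionally drop repeated rows."""
    combined = pd.concat([existing, new], ignore_index=True)
    if check_duplicates:
        key = ["datetime", "category", "location"]
        n_dup = int(combined.duplicated(subset=key).sum())
        print(f"{n_dup} duplicate entries found")
        combined = combined.drop_duplicates(subset=key, ignore_index=True)
    return combined

stamp = pd.Timestamp("2000-12-31 14:00:00", tz="UTC")
gw = pd.DataFrame({"datetime": [stamp], "category": ["GW"],
                   "location": ["Loc_A"], "value": [7.017]})
# The BP record deliberately contains the same entry twice
bp = pd.DataFrame({"datetime": [stamp, stamp], "category": ["BP"] * 2,
                   "location": ["Baro"] * 2, "value": [10.31, 10.31]})
site_data = add_records(gw, bp)
print(site_data)
```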


In [8]:
example_site.data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 108063 entries, 0 to 108062
Data columns (total 6 columns):
 #   Column    Non-Null Count   Dtype              
---  ------    --------------   -----              
 0   datetime  108063 non-null  datetime64[ns, UTC]
 1   category  108063 non-null  object             
 2   location  108063 non-null  object             
 3   part      108063 non-null  object             
 4   unit      108063 non-null  object             
 5   value     68173 non-null   float64            
dtypes: datetime64[ns, UTC](1), float64(1), object(4)
memory usage: 4.9+ MB


### The Processing object
The Processing object enables easy access to the hgs methods for data pre-processing and data analysis. These include methods for calculating barometric efficiencies, corrected groundwater heads or extracting harmonic components from records.

In [9]:
# Create a Processing object of example site
process_example = hgs.Processing(example_site)

#### Get comprehensive info on the data loaded into the processing object

In [10]:
process_example.info()

-------------------------------------------------
Summary of dataset:
-------------------------------------------------
Category: GW, Location: Loc_A
Start: 31/12/2000 14:00:30 UTC
Stop:  30/03/2001 13:55:49 UTC
UTC offset: +10.00 h
Sampling: 280-84270 sec (irregular)
Values: 43,810 (19,744 empty)
Unit: m
-------------------------------------------------
Category: GW, Location: Loc_B
Start: 31/12/2000 14:00:30 UTC
Stop:  30/03/2001 13:55:49 UTC
UTC offset: +10.00 h
Sampling: 275-200400 sec (irregular)
Values: 43,810 (20,146 empty)
Unit: m
-------------------------------------------------
-------------------------------------------------
Category: BP, Location: Baro
Start: 31/12/2000 14:00:00 UTC
Stop:  30/03/2001 13:59:59 UTC
UTC offset: +10.00 h
Sampling: 1-952200 sec (irregular)
Values: 20,443 (0 empty)
Unit: m
-------------------------------------------------
-------------------------------------------------


After instantiating the Processing object, we can simply run the desired method, which returns a new object containing the method results. In this case, we want to compute all time domain barometric efficiencies (BE) available in the BE_time() method.

#### Example: Calculate BE using the BE_time() method
The BE_time() method requires our data to be uniformly sampled. Thus, preprocessing steps are applied to the data of the Site object. First, the groundwater head measurements are resampled, interpolated and, if necessary, split into sub-parts of uniform sampling. Then the BP records are aligned with the GW data. Finally, the barometric efficiencies are calculated for every location and part individually.
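To give a feeling for what a time domain BE estimate does, the sketch below computes two simple estimators from paired first differences of GW and BP. This is not the HGS implementation: the helper `be_from_changes`, the synthetic data and the sign convention (heads fall when pressure rises, so BE = -ΔGW/ΔBP) are all assumptions made for illustration.

```python
from statistics import fmean, median

def be_from_changes(d_gw, d_bp):
    """Two simple BE estimates from paired first differences (assumed sign: BE = -dGW/dBP)."""
    # Median of the pointwise ratios, skipping steps without a pressure change
    ratios = [-g / b for g, b in zip(d_gw, d_bp) if b != 0]
    # Least-squares slope of d_gw on d_bp (with intercept), sign-flipped
    mean_bp, mean_gw = fmean(d_bp), fmean(d_gw)
    sxy = sum((b - mean_bp) * (g - mean_gw) for b, g in zip(d_bp, d_gw))
    sxx = sum((b - mean_bp) ** 2 for b in d_bp)
    return {"median_of_ratios": median(ratios), "linear_regression": -sxy / sxx}

# Synthetic, noise-free changes generated with a known BE
true_be = 0.4
d_bp = [0.010, -0.004, 0.002, -0.008, 0.006]
d_gw = [-true_be * b for b in d_bp]
estimates = be_from_changes(d_gw, d_bp)
print(estimates)
```

Both estimators need GW and BP values at identical time stamps, which is why the uniform sampling and alignment steps come first.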

In [11]:
# Test the BE Time methods
BE_results  = process_example.BE_time(method="all")

-------------------------------------------------
Processing BE_time method ...
4.73 % of the 'GW' data at 'Loc_A_all' was interpolated due to gaps < 3600s!
5.08 % of the 'GW' data at 'Loc_B_all' was interpolated due to gaps < 3600s!
Data of the category 'GW' is regularly sampled now!

Start iteration No. 1 ...

----- Loc_A_1 -----
BP record resampled to 1 sample per 300s.

Processing BP gaps ...
0.02 % of the 'BP' data at 'Baro_all' was interpolated due to gaps < 3600s!
... record gaps between 2001-02-27 16:30 and 2001-03-01 07:15 too large for interpolation!

Processing GW gaps ...
... dropping GW and BP entries for which BP record gaps are too big.
0.00 % of the 'GW' data at 'Loc_A_1' was interpolated due to gaps < 3600s!

----- Loc_B_1 -----
BP record resampled to 1 sample per 300s.

Processing BP gaps ...
6.10 % of the 'BP' data at 'Baro_all' was interpolated due to gaps < 3600s!
... record gaps between 2001-02-27 16:30 and 2001-03-01 07:15 too large for interpolation!

Processing

#### The results container
BE_results now contains a nested dictionary for the *BE_time()* method results. The top level of the nested dictionary contains one item for each method that has been applied to the data. The second level contains one item for each location and its sub-parts.

- Each method is stored as an item in the results dictionary with the name of the method as the key:

In [12]:
print(BE_results.keys())

dict_keys(['be_time'])


- The method dictionary items are also dictionaries (i.e. forming a nested dictionary). A separate entry is created for each location and part:

In [13]:
print(BE_results["be_time"].keys())

dict_keys([('Loc_A', '1'), ('Loc_A', '2'), ('Loc_B', '1'), ('Loc_B', '2')])


 - The final method results are stored as a list with 3 entries. The first entry (index 0) contains the method output, the second (1) the input data as a DataFrame with Datetime index, and the third entry (2) is for additional information:

In [14]:
print("Output:\n",BE_results["be_time"]["Loc_A","1"][0],"\n")
print("Input:\n", BE_results["be_time"]["Loc_A","1"][1].head(3),"\n")
print("Info:\n", BE_results["be_time"]["Loc_A","1"][2],"\n")

Output:
 {'clark': 0.14785256869529242, 'davis_and_rasmussen': -0.28441191303176316, 'rahi': 0.38839574348629413, 'rojstaczer': 0.6783000137341513, 'average_of_ratios': 0.03110885146484908, 'linear_regression': 0.02210803548828672, 'median_of_ratios': 0.0} 

Input:
                               GW        BP
datetime                                  
2001-02-03 03:50:00+00:00  0.010 -0.010197
2001-02-03 03:55:00+00:00  0.001 -0.004079
2001-02-03 04:00:00+00:00  0.003 -0.002039 

Info:
 {'derivative': True, 'unit': '-', 'utc_offset': 10} 
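Because the container is an ordinary nested dictionary, it can be flattened with plain loops. The sketch below uses a stand-in dictionary that mimics the structure printed above (three of the Loc_A values are copied from that output, and the input DataFrame is replaced by a placeholder); the variable names are illustrative.

```python
# Stand-in with the same nesting as the results container
results = {
    "be_time": {
        ("Loc_A", "1"): [
            {"clark": 0.14785256869529242,
             "rahi": 0.38839574348629413,
             "rojstaczer": 0.6783000137341513},
            None,  # placeholder for the input DataFrame
            {"derivative": True, "unit": "-", "utc_offset": 10},
        ],
    }
}

rows = []
for method, by_location in results.items():
    for (location, part), (output, _data, info) in by_location.items():
        for name, value in output.items():
            rows.append((method, location, part, name, round(value, 3)))

for row in rows:
    print(row)
```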



#### How to filter data by groundwater location
Once we have created our Site object containing all our data, we can decide to process only a subset of the available locations, using the by_gwloc() method.

In [15]:
# Create Processing object for only one specific groundwater location of example_site
locations = ["Loc_A"]
process_A = hgs.Processing(example_site).by_gwloc(locations)

Filter dataset by location ...


Let's check that the only groundwater location left in our processing object is location A ("Loc_A"), while the BP record ("Baro") is retained:

In [16]:
process_A.site.data.location.unique()

array(['Baro', 'Loc_A'], dtype=object)
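The effect of this filter can be pictured as a boolean mask on the long-format table: keep every non-GW record and only the selected GW locations. This is a sketch with plain pandas and illustrative values, not the HGS code.

```python
import pandas as pd

data = pd.DataFrame({
    "category": ["GW", "GW", "BP"],
    "location": ["Loc_A", "Loc_B", "Baro"],
    "value": [7.017, 1.377, 10.31],
})

locations = ["Loc_A"]
# Keep all non-GW rows, and GW rows only for the requested locations
keep = (data["category"] != "GW") | data["location"].isin(locations)
subset = data[keep]
print(sorted(subset["location"].unique()))  # ['Baro', 'Loc_A']
```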

### Advanced and manual preprocessing
Although the Processing class automatically handles and applies all required and recommended data preprocessing steps for the analysis methods to work, these can also be customized by the user. 

The RegularAndAligned() method consists of two main steps. First, the make_regular() method regularly samples the groundwater data. Second, the BP_align() method aligns the BP entries to the groundwater data. As a result, every groundwater record will have a matching BP measurement for the same point in time.

#### The make_regular() method
The make_regular() method can be accessed directly through the hgs pandas accessor:
```python
example_site.data.hgs.make_regular()
```

It has several parameters with default values:
 - **inter_max:** int = 3600 <br />*This is the maximum interpolated time interval in seconds. Any gap larger than this value will not be interpolated.*
 - **part_min:** int = 20 <br />*The minimum record duration without gaps in days. If there are gaps in the data that cannot be interpolated, the data is split into parts. In this case, every part needs to fulfil the minimum criterion. Otherwise it is dropped from the data.*
 - **method:** str = "backfill" <br />*The Pandas interpolation method to be used. Check out the Pandas documentation for more information on the available methods.*
 - **category** = "GW" <br />*This method was developed for groundwater data, but can in principle be applied to other categories as well.*
 - **spl_freq:** int = None <br />*The method automatically calculates the most common sampling frequency for each location. But the parameter can also be passed to the function as an argument.*
 - **inter_max_total:** int = 10 <br />*The maximum percentage of values that may be interpolated. If this threshold is exceeded, there are too many gaps in the data.*
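The interplay of inter_max and part_min can be sketched with plain pandas: every gap longer than inter_max starts a new part, and parts shorter than part_min days are dropped. The helper `label_parts` and the synthetic time stamps are hypothetical; the HGS routine additionally interpolates the smaller gaps.

```python
import pandas as pd

def label_parts(times, inter_max=3600, part_min=20):
    """Return a part number per time stamp, keeping only parts of at least part_min days."""
    stamps = pd.Series(pd.DatetimeIndex(times))
    elapsed = (stamps - stamps.iloc[0]).dt.total_seconds()
    # A new part starts after every gap longer than inter_max seconds
    part = (elapsed.diff() > inter_max).cumsum() + 1
    grouped = elapsed.groupby(part)
    span_days = (grouped.transform("max") - grouped.transform("min")) / 86400
    return part[span_days >= part_min]

hour = pd.Timedelta(hours=1)
first = pd.date_range("2001-01-01", periods=30 * 24 + 1, freq=hour)  # spans 30 days
second = pd.date_range("2001-02-05", periods=5 * 24, freq=hour)      # spans under 5 days
kept = label_parts(first.append(second))
print(kept.unique(), len(kept))
```

Here the second block is separated by a gap of several days and is shorter than 20 days, so only part 1 survives.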

In [17]:
# select data from site object
data = example_site.data

# Let's check whether there are any NaN in the value column of the groundwater category:
print("There are missing values in the data:", data[data.category == "GW"].value.isna().any())

There are missing values in the data: True


##### Upsample data
Now let's upsample (i.e. interpolate the missing values) our data using the "time" method. Internally, this calls the following function, which is applied individually to all locations of the GW data:
```python
data.hgs.upsample("time")
```

In [18]:
data_resample = data.hgs.make_regular(method='time', inter_max = 3600)
data_resample.hgs.filters.get_gw_data.head(3)

# Parameters not passed above keep their defaults: part_min = 20, category = "GW", spl_freq = None, inter_max_total = 10

4.73 % of the 'GW' data at 'Loc_A_all' was interpolated due to gaps < 3600s!
5.08 % of the 'GW' data at 'Loc_B_all' was interpolated due to gaps < 3600s!
Data of the category 'GW' is regularly sampled now!


Unnamed: 0,datetime,category,location,part,unit,value
0,2001-02-03 03:45:00+00:00,GW,Loc_A,1,m,6.712
1,2001-02-03 03:50:00+00:00,GW,Loc_A,1,m,6.722
2,2001-02-03 03:55:00+00:00,GW,Loc_A,1,m,6.723
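The "time" method refers to the Pandas interpolation method of the same name, which weights the interpolated value by the actual time difference to its neighbours. A minimal sketch with made-up values and irregular time stamps shows the difference to index-based linear interpolation:

```python
import pandas as pd

index = pd.to_datetime(["2001-01-01 00:00", "2001-01-01 00:05", "2001-01-01 00:20"])
series = pd.Series([1.0, float("nan"), 4.0], index=index)

# "time" uses the 5 min / 15 min spacing, "linear" treats the points as equally spaced
by_time = series.interpolate(method="time")
by_position = series.interpolate(method="linear")
print(by_time.iloc[1], by_position.iloc[1])
```

The missing value lies a quarter of the way between its neighbours in time, so the "time" method yields 1.75, whereas "linear" yields the midpoint 2.5.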


##### Custom sampling frequency
Resample data to a sampling frequency of 1 hour (3600 seconds).

**Careful!** The interpolation maximum (inter_max) always has to be equal to or higher than the sampling frequency. Otherwise, your data won't be interpolated correctly.

In [19]:
data_resample = data.hgs.make_regular(inter_max = 5400, spl_freq = 3600)
data_resample.head(3) 

0.00 % of the 'GW' data at 'Loc_A_all' was interpolated due to gaps < 5400s!
0.00 % of the 'GW' data at 'Loc_B_all' was interpolated due to gaps < 5400s!
Data of the category 'GW' is regularly sampled now!


Unnamed: 0,datetime,category,location,part,unit,value
0,2001-02-03 03:00:00+00:00,GW,Loc_A,1,m,6.719
1,2001-02-03 04:00:00+00:00,GW,Loc_A,1,m,6.726083
2,2001-02-03 05:00:00+00:00,GW,Loc_A,1,m,6.729917


Our data has been resampled to one sample per hour. HGS also provides a DataFrame attribute called *spl_freq_groupby* to check for the most common sampling frequency by group (i.e. split by category, location, part and unit). This attribute is also accessed by the make_regular() method and used to configure the resampling.

We can see that all GW data is resampled to **3600**, while the BP data was untouched and still has a sampling frequency of **300**:

In [20]:
spl_freq = data_resample.hgs.spl_freq_groupby
spl_freq

category  location  part  unit
BP        Baro      all   m        300.0
GW        Loc_A     1     m       3600.0
          Loc_B     1     m       3600.0
Name: datetime, dtype: float64
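A comparable summary can be derived with plain pandas by taking the most frequent time step within each group. The sketch below is an assumption about the general idea, not the HGS source; the helper `most_common_interval` and the small dataset are hypothetical.

```python
import pandas as pd

def most_common_interval(df):
    """Most frequent time step in seconds for each category/location/part/unit group."""
    keys = ["category", "location", "part", "unit"]

    def mode_seconds(stamps):
        steps = stamps.sort_values().diff().dt.total_seconds().dropna()
        return steps.mode().iloc[0]

    return df.groupby(keys)["datetime"].apply(mode_seconds)

start = pd.Timestamp("2000-12-31 14:00:00", tz="UTC")

def stamps_at(seconds):
    return [start + pd.Timedelta(seconds=s) for s in seconds]

demo = pd.DataFrame({
    "datetime": stamps_at([0, 300, 600, 1500]) + stamps_at([0, 600, 1200]),
    "category": ["GW"] * 4 + ["BP"] * 3,
    "location": ["Loc_A"] * 4 + ["Baro"] * 3,
    "part": "all",
    "unit": "m",
})
freq = most_common_interval(demo)
print(freq)
```

Using the most common step rather than the mean makes the estimate robust against occasional gaps, such as the 900 s step in the GW group above.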

We can also use the information from this HGS attribute and redefine the sampling frequencies, which can then be passed on to other methods. Let's say we want the sampling frequency of our groundwater records for locations **Loc_A** and **Loc_B** to be **180** and **1500**, respectively:

In [21]:
# get the sampling frequencies of each group
spl_freq = data.hgs.spl_freq_groupby
# redefine the sampling frequencies for the groundwater category
spl_freq["GW"] = [180,1500]
print(spl_freq)

category  location  part  unit
BP        Baro      all   m        300.0
GW        Loc_A     all   m        180.0
          Loc_B     all   m       1500.0
Name: datetime, dtype: float64


In [22]:
# Resample all data
custom_resample = data.hgs.resample_by_group(spl_freq, origin="start")
# Get BP and GW data in separate DataFrames
bp_data = custom_resample.hgs.filters.get_bp_data
gw_data = custom_resample.hgs.filters.get_gw_data

Looking at the groundwater data of location B we can see that it is now resampled to one sample every 25 minutes (1500 seconds). Additionally, the origin was set to the original start time of the record:

In [23]:
gw_data[gw_data.location=="Loc_B"].head(5)

Unnamed: 0,datetime,category,location,part,unit,value
68351,2000-12-31 14:00:30+00:00,GW,Loc_B,all,m,1.376667
68352,2000-12-31 14:25:30+00:00,GW,Loc_B,all,m,1.376
68353,2000-12-31 14:50:30+00:00,GW,Loc_B,all,m,1.374
68354,2000-12-31 15:15:30+00:00,GW,Loc_B,all,m,1.37425
68355,2000-12-31 15:40:30+00:00,GW,Loc_B,all,m,1.3762
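The effect of the origin setting can be reproduced with the Pandas resampler, which accepts the same keyword. The sketch below uses synthetic 5-minute data and assumes that the bins are averaged; it is not taken from the HGS source.

```python
import pandas as pd

index = pd.date_range("2000-12-31 14:00:30", periods=10,
                      freq=pd.Timedelta(seconds=300), tz="UTC")
series = pd.Series(range(10), index=index, dtype=float)

step = pd.Timedelta(seconds=1500)
# Bins anchored at the first time stamp of the record
from_start = series.resample(step, origin="start").mean()
# Pandas default: bins anchored at midnight of the first day
from_midnight = series.resample(step).mean()

print(from_start.index[0], from_midnight.index[0])
```

With origin="start" the first bin begins at 14:00:30, the first time stamp of the record. With the default, the bins are counted from midnight, so the first label falls on 13:45:00.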


Our groundwater data for location A is sampled every 3 minutes (180 seconds). Of course, resampling the data at a frequency higher than the original 300 seconds leaves us with gaps at regular intervals:

In [24]:
gw_data[gw_data.location=="Loc_A"].head(5)

Unnamed: 0,datetime,category,location,part,unit,value
25632,2000-12-31 14:00:30+00:00,GW,Loc_A,all,m,7.017
25633,2000-12-31 14:03:30+00:00,GW,Loc_A,all,m,7.017
25634,2000-12-31 14:06:30+00:00,GW,Loc_A,all,m,
25635,2000-12-31 14:09:30+00:00,GW,Loc_A,all,m,7.016
25636,2000-12-31 14:12:30+00:00,GW,Loc_A,all,m,
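The regular gap pattern follows directly from the two intervals. As a small worked example for an idealised, gap-free 300 s record: the pattern repeats every 900 s (the least common multiple of 180 and 300), which covers five 180 s bins but only three original samples.

```python
from math import gcd

original, target = 300, 180  # seconds

# The bin pattern repeats after the least common multiple of both intervals
cycle = original * target // gcd(original, target)
bins = list(range(0, cycle, target))
# Index of the 180 s bin that each original sample falls into
filled = {t // target for t in range(0, cycle, original)}
empty_fraction = (len(bins) - len(filled)) / len(bins)

print(cycle, len(bins), sorted(filled), empty_fraction)  # 900 5 [0, 1, 3] 0.4
```

Bins 2 and 4 of every cycle stay empty, matching the alternating pattern of missing values in the table above, so about 40 % of the resampled values have to be interpolated.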


#### Interpolate gaps
The make_regular() method of HGS applies a gap filling routine to the data using groupby. It can be accessed via:
```python
hgs.ext.pandas_hgs.HgsAccessor.gap_routine
```

We usually apply this data processing step to only one of our data categories (GW or BP). In our case, we choose to fill the gaps in our resampled GW data. Thus, we select the required sampling frequency information accordingly: 

In [25]:
# drop the BP sampling frequency information
spl_freqs_gw = spl_freq.drop("BP")
spl_freqs_gw

category  location  part  unit
GW        Loc_A     all   m        180.0
          Loc_B     all   m       1500.0
Name: datetime, dtype: float64

First we group our gw_data DataFrame by its columns of Dtype *object*, which are "category", "location", "part" and "unit" by default. HGS provides a simple filtering attribute for this operation:
```python
gw_data.hgs.filters.obj_col
```

Then we apply the gap_routine setting the following parameters:
- **mcf (here spl_freqs_gw):** These are the custom groundwater sampling frequencies of our data. They are used to identify gaps that might be too large for interpolation. This parameter is not strictly necessary, as the gap routine also checks the HGS *spl_freq_groupby* attribute.
- **inter_max_total:** We need to interpolate a lot of data, thus we have to increase our interpolation maximum (here to 50 %).
- **part_min:** The minimum duration of a part in days, reduced here from the default of 20 to 10.
- **method:** Interpolate the value column using the datetime as index. This results in a smooth interpolation between existing data points.

In [26]:
regular = gw_data.groupby(gw_data.hgs.filters.obj_col).apply(hgs.ext.pandas_hgs.HgsAccessor.gap_routine,mcf=spl_freqs_gw,inter_max_total=50,part_min=10,method="time") 
regular = regular.reset_index(drop=True)

42.28 % of the 'GW' data at 'Loc_A_all' was interpolated due to gaps < 3600s!
0.14 % of the 'GW' data at 'Loc_B_all' was interpolated due to gaps < 3600s!


As we can see from the console output, around 42 % of all groundwater data at location A had to be interpolated. Now, let's have a closer look at the data. We can see that location A has been split into three parts:

In [27]:
regular.hgs.filters.loc_names_unique

{('Loc_A', '1'), ('Loc_A', '2'), ('Loc_A', '3'), ('Loc_B', '1')}

In [28]:
#Using the pivot table we can see all parts of the location at once
regular[regular.location=="Loc_A"].hgs.pivot

category,GW,GW,GW
location,Loc_A,Loc_A,Loc_A
part,1,2,3
unit,m,m,m
datetime,Unnamed: 1_level_4,Unnamed: 2_level_4,Unnamed: 3_level_4
2000-12-31 14:00:30+00:00,7.0170,,
2000-12-31 14:03:30+00:00,7.0170,,
2000-12-31 14:06:30+00:00,7.0165,,
2000-12-31 14:09:30+00:00,7.0160,,
2000-12-31 14:12:30+00:00,7.0160,,
...,...,...,...
2001-03-30 13:42:30+00:00,,,6.9545
2001-03-30 13:45:30+00:00,,,6.9540
2001-03-30 13:48:30+00:00,,,6.9510
2001-03-30 13:51:30+00:00,,,6.9540


In [29]:
regular[regular.location=="Loc_B"].head(5)

Unnamed: 0,datetime,category,location,part,unit,value
42128,2001-01-12 15:35:30+00:00,GW,Loc_B,1,m,1.277
42129,2001-01-12 16:00:30+00:00,GW,Loc_B,1,m,1.2748
42130,2001-01-12 16:25:30+00:00,GW,Loc_B,1,m,1.2766
42131,2001-01-12 16:50:30+00:00,GW,Loc_B,1,m,1.274333
42132,2001-01-12 17:15:30+00:00,GW,Loc_B,1,m,1.279


Let's see if there are any NaN values left in the groundwater data, or in other words, whether any records still lack a valid value:

In [30]:
regular.hgs.filters.is_nan

False

Nice! Our data is now regularly sampled and interpolated.

#### The BP_align() method
The BP_align() method can be accessed directly through the hgs pandas accessor:
```python
example_site.data.hgs.BP_align()
```

It has several parameters with default values, all of which can also be found in the make_regular() method:
- **inter_max:** int = 3600
- **method:** str = "backfill"
- **part_min:** int = 20
- **inter_max_total:** int = 10

BP_align() automatically tries to match the sampling frequency of the groundwater records. It does so by individually upsampling or downsampling the BP records for each GW location and its parts. Therefore, no user-defined sampling frequency is available at this step.
Gaps that exceed the inter_max threshold, and thus cannot be interpolated, are used to drop the corresponding entries from the GW record. This generally causes another split into parts. The part_min parameter ensures that only parts larger than the threshold are retained in the data.
In some cases, the BP and GW data cannot be aligned. The main reason usually is that there are too many gaps in the BP record. In this case, try to reduce the part_min parameter or increase the inter_max and inter_max_total parameters.
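The alignment idea can be sketched with plain pandas: resample BP to the GW interval, join both records on their time stamps and drop the entries for which no BP value exists. This is a simplified illustration with made-up values. It assumes averaging when downsampling and skips the interpolation of small BP gaps that HGS performs.

```python
import pandas as pd

hour = pd.Timedelta(hours=1)
gw_index = pd.date_range("2001-02-03 03:00", periods=4, freq=hour, tz="UTC")
gw = pd.Series([6.719, 6.726, 6.730, 6.731], index=gw_index, name="GW")

bp_index = pd.date_range("2001-02-03 03:00", periods=12,
                         freq=pd.Timedelta(minutes=20), tz="UTC")
bp = pd.Series([10.30 + 0.001 * i for i in range(12)], index=bp_index, name="BP")
bp = bp.drop(bp.index[6:9])  # simulate a BP gap covering the hour from 05:00

# Downsample BP to the GW interval, then keep only time stamps with both values
bp_matched = bp.resample(hour).mean()
aligned = pd.concat([gw, bp_matched], axis=1, join="inner").dropna()
print(aligned)
```

The GW entry at 05:00 is dropped because no BP value is available for it, which mirrors the "dropping GW and BP entries" step reported in the console output.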

In [31]:
data_aligned = data_resample.hgs.BP_align(inter_max = 5400)


Start iteration No. 1 ...

----- Loc_A_1 -----
BP record resampled to 1 sample per 3600s.

Processing BP gaps ...
0.00 % of the 'BP' data at 'Baro_all' was interpolated due to gaps < 5400s!
... record gaps between 2001-02-27 17:00 and 2001-03-01 06:00 too large for interpolation!

Processing GW gaps ...
... dropping GW and BP entries for which BP record gaps are too big.
0.00 % of the 'GW' data at 'Loc_A_1' was interpolated due to gaps < 5400s!

----- Loc_B_1 -----
BP record resampled to 1 sample per 3600s.

Processing BP gaps ...
0.00 % of the 'BP' data at 'Baro_all' was interpolated due to gaps < 5400s!
... record gaps between 2001-02-27 17:00 and 2001-03-01 06:00 too large for interpolation!

Processing GW gaps ...
... dropping GW and BP entries for which BP record gaps are too big.
0.00 % of the 'GW' data at 'Loc_B_1' was interpolated due to gaps < 5400s!


We can now check if the GW and BP data are truly aligned:

In [32]:
data_aligned.hgs.check_alignment()

The groundwater (GW) and  BP data is aligned. There is exactly one BP for every GW entry!


True

At this stage, it can be very convenient for inspection to pivot our data and use the datetime as our index:

In [33]:
data_aligned.hgs.pivot

category,BP,GW,GW,GW,GW
location,Baro,Loc_A,Loc_A,Loc_B,Loc_B
part,all,1,2,1,2
unit,m,m,m,m,m
datetime,Unnamed: 1_level_4,Unnamed: 2_level_4,Unnamed: 3_level_4,Unnamed: 4_level_4,Unnamed: 5_level_4
2001-01-12 15:00:00+00:00,10.309615,,,1.277000,
2001-01-12 16:00:00+00:00,10.308085,,,1.275500,
2001-01-12 17:00:00+00:00,10.310635,,,1.279500,
2001-01-12 18:00:00+00:00,10.315733,,,1.280417,
2001-01-12 19:00:00+00:00,10.319302,,,1.281333,
...,...,...,...,...,...
2001-03-30 09:00:00+00:00,10.655818,,6.955917,,1.260750
2001-03-30 10:00:00+00:00,10.660492,,6.957000,,1.261444
2001-03-30 11:00:00+00:00,10.661512,,6.955917,,1.260727
2001-03-30 12:00:00+00:00,10.658452,,6.956667,,1.260750
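A table of this shape can be produced from the long format with the Pandas pivot_table method. The sketch below uses four values taken from the first two rows of the table above and is only meant to illustrate the reshaping, not the HGS accessor itself.

```python
import pandas as pd

long = pd.DataFrame({
    "datetime": pd.to_datetime(["2001-01-12 15:00", "2001-01-12 15:00",
                                "2001-01-12 16:00", "2001-01-12 16:00"], utc=True),
    "category": ["BP", "GW", "BP", "GW"],
    "location": ["Baro", "Loc_B", "Baro", "Loc_B"],
    "part": ["all", "1", "all", "1"],
    "unit": ["m"] * 4,
    "value": [10.309615, 1.277, 10.308085, 1.2755],
})

# One row per time stamp, one column per category/location/part/unit combination
wide = long.pivot_table(index="datetime",
                        columns=["category", "location", "part", "unit"],
                        values="value")
print(wide)
```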


#### Add the data_regular attribute to the processing object
BE_time and other methods require the data to be uniformly sampled. Thus, if multiple methods need access to uniformly sampled data it sometimes makes sense to pre-process the data using the make_regular() method to reduce the overall processing time.

You can also specify additional parameters that are internally passed to the make_regular() and BP_align() methods. A comprehensive explanation of both methods and their parameters was given above.

In [34]:
# Create a Processing object of example site
process_RAA = hgs.Processing(example_site).RegularAndAligned(inter_max=5000, part_min=20,inter_max_total=40)

4.73 % of the 'GW' data at 'Loc_A_all' was interpolated due to gaps < 5000s!
5.08 % of the 'GW' data at 'Loc_B_all' was interpolated due to gaps < 5000s!
Data of the category 'GW' is regularly sampled now!

Start iteration No. 1 ...

----- Loc_A_1 -----
BP record resampled to 1 sample per 300s.

Processing BP gaps ...
0.02 % of the 'BP' data at 'Baro_all' was interpolated due to gaps < 5000s!
... record gaps between 2001-02-27 16:30 and 2001-03-01 07:15 too large for interpolation!

Processing GW gaps ...
... dropping GW and BP entries for which BP record gaps are too big.
0.00 % of the 'GW' data at 'Loc_A_1' was interpolated due to gaps < 5000s!

----- Loc_B_1 -----
BP record resampled to 1 sample per 300s.

Processing BP gaps ...
6.10 % of the 'BP' data at 'Baro_all' was interpolated due to gaps < 5000s!
... record gaps between 2001-02-27 16:30 and 2001-03-01 07:15 too large for interpolation!

Processing GW gaps ...
... dropping GW and BP entries for which BP record gaps are too big

Now we have an attribute with the regularly sampled data added to our processing object which can be accessed manually and will automatically be used by processing methods such as BE_time:

In [35]:
process_RAA.data_regular.head(3)

Unnamed: 0,datetime,category,location,part,unit,value
0,2001-02-03 03:45:00+00:00,GW,Loc_A,1,m,6.712
1,2001-02-03 03:50:00+00:00,GW,Loc_A,1,m,6.722
2,2001-02-03 03:55:00+00:00,GW,Loc_A,1,m,6.723


#### Compare runtime
Time difference between running BE_time() with a precalculated data_regular attribute (created by the RegularAndAligned() method) and without it.

In [36]:
process_example = hgs.Processing(example_site)
# Turn off console output for readability
with hgs.utils.nullify_output(suppress_stdout=True, suppress_stderr=True):
    time1 = %timeit -n1 -r1 -o process_RAA.BE_time(method="all")
    time2 = %timeit -n1 -r1 -o process_example.BE_time(method="all")    

In [37]:
print("With RAA:",time1)
print("Without RAA:",time2)

With RAA: 596 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
Without RAA: 2.29 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
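Outside of IPython, where the %timeit magic is not available, a similar comparison can be made with the standard library. The sketch below times a single call while capturing its console output; `timed_quietly` and `noisy_sum` are hypothetical helpers, and `noisy_sum` merely stands in for a call such as BE_time().

```python
import io
import time
from contextlib import redirect_stdout

def timed_quietly(func, *args, **kwargs):
    """Run func once with stdout captured; return (result, seconds, captured text)."""
    buffer = io.StringIO()
    start = time.perf_counter()
    with redirect_stdout(buffer):
        result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed, buffer.getvalue()

def noisy_sum(n):
    # Stand-in for a processing method that prints progress messages
    print("working ...")
    return sum(range(n))

result, seconds, log = timed_quietly(noisy_sum, 1000)
print(f"{seconds:.6f} s, result = {result}")
```

A single run is only a rough indicator. For a more reliable comparison, repeat the call several times and compare the fastest runs.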


### The View object

... under preparation