### Tutorial on how to use the `resample` method of the s2s `time` module

We start by importing the required libraries and generating an example pandas Series and DataFrame:

In [1]:
import numpy as np
# import xarray as xr
import pandas as pd
import s2s.time

time_index = pd.date_range('20171020', '20211001', freq='15d')
random_data = np.random.random(len(time_index))
example_series = pd.Series(random_data, index=time_index)
example_dataframe = pd.DataFrame(example_series.rename('data1'))
example_dataframe['data2'] = example_dataframe['data1']

example_series

2017-10-20    0.166812
2017-11-04    0.615352
2017-11-19    0.608811
2017-12-04    0.536555
2017-12-19    0.516352
                ...   
2021-07-31    0.581551
2021-08-15    0.622603
2021-08-30    0.789754
2021-09-14    0.462017
2021-09-29    0.733950
Freq: 15D, Length: 97, dtype: float64

The DataFrame looks similar but has two named columns:

In [2]:
example_dataframe

| | data1 | data2 |
|---|---|---|
| 2017-10-20 | 0.166812 | 0.166812 |
| 2017-11-04 | 0.615352 | 0.615352 |
| 2017-11-19 | 0.608811 | 0.608811 |
| 2017-12-04 | 0.536555 | 0.536555 |
| 2017-12-19 | 0.516352 | 0.516352 |
| ... | ... | ... |
| 2021-07-31 | 0.581551 | 0.581551 |
| 2021-08-15 | 0.622603 | 0.622603 |
| 2021-08-30 | 0.789754 | 0.789754 |
| 2021-09-14 | 0.462017 | 0.462017 |


To resample, we first set up an advent calendar with an anchor date and a frequency. Here the anchor date is 15 October, given as `(month, day)`, and each interval is 90 days long:

In [3]:
calendar = s2s.time.AdventCalendar(anchor_date=(10, 15), freq='90d')
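
To make the two calendar parameters concrete, here is a minimal pandas-only sketch of how the intervals that appear in the resampled output of this tutorial can be built by counting back from the anchor date in 90-day steps. It is an illustration, not the s2s implementation: `build_intervals` and its arguments are hypothetical names, and it assumes that only whole intervals that fit within 365 days are kept.

```python
import pandas as pd


def build_intervals(year, anchor_month=10, anchor_day=15, freq_days=90):
    """Hypothetical helper: right-closed intervals counting back from the anchor date."""
    anchor = pd.Timestamp(year=year, month=anchor_month, day=anchor_day)
    step = pd.Timedelta(days=freq_days)
    # Assumption: keep only the whole intervals that fit in one 365-day year.
    n_intervals = pd.Timedelta(days=365) // step
    # Lag 0 ends on the anchor date; each further lag ends one step earlier.
    right_edges = [anchor - lag * step for lag in range(n_intervals)]
    left_edges = [edge - step for edge in right_edges]
    return pd.IntervalIndex.from_arrays(left_edges, right_edges, closed="right")


intervals_2018 = build_intervals(2018)
print(intervals_2018)
```

For the anchor year 2018 this gives four intervals, with lag 0 ending on 2018-10-15 and lag 3 starting on 2017-10-20.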

To resample the data, we pass the example data to the `resample` method of the calendar.

In [4]:
resampled_series = calendar.resample(example_series)
resampled_series

| | anchor_year | lag | interval | mean_data |
|---|---|---|---|---|
| 0 | 2018 | 0 | (2018-07-17, 2018-10-15] | 0.610081 |
| 1 | 2018 | 1 | (2018-04-18, 2018-07-17] | 0.506726 |
| 2 | 2018 | 2 | (2018-01-18, 2018-04-18] | 0.643391 |
| 3 | 2018 | 3 | (2017-10-20, 2018-01-18] | 0.665477 |
| 4 | 2019 | 0 | (2019-07-17, 2019-10-15] | 0.482043 |
| 5 | 2019 | 1 | (2019-04-18, 2019-07-17] | 0.469529 |
| 6 | 2019 | 2 | (2019-01-18, 2019-04-18] | 0.797876 |
| 7 | 2019 | 3 | (2018-10-20, 2019-01-18] | 0.497652 |
| 8 | 2020 | 0 | (2020-07-17, 2020-10-15] | 0.434242 |
| 9 | 2020 | 1 | (2020-04-18, 2020-07-17] | 0.517219 |


As shown above, this generates a new DataFrame with one row per interval. Each row holds the resampled value of the data in that interval (the `mean_data` column), the lag of the interval relative to the anchor date, and the anchor year the interval belongs to.
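
To illustrate what one row of this output represents, the sketch below aggregates a 15-day series into four 90-day intervals by hand, using only pandas. It assumes the aggregation is a plain mean, as the column name `mean_data` suggests, and it is not how s2s computes the result internally. The seeded random data and the name `manual_resample` are introduced here for illustration, so the values differ from those in the tutorial.

```python
import numpy as np
import pandas as pd

# Same kind of data as in the tutorial: one random value every 15 days.
time_index = pd.date_range("2017-10-20", "2021-10-01", freq="15D")
series = pd.Series(np.random.default_rng(0).random(len(time_index)), index=time_index)

# Four 90-day intervals ending on the 2018 anchor date (15 October), lag 0 first.
anchor = pd.Timestamp("2018-10-15")
step = pd.Timedelta(days=90)

rows = []
for lag in range(4):
    right = anchor - lag * step
    left = right - step
    # Intervals are open on the left and closed on the right: (left, right]
    in_interval = (series.index > left) & (series.index <= right)
    rows.append(
        {
            "anchor_year": anchor.year,
            "lag": lag,
            "interval": pd.Interval(left, right, closed="right"),
            "mean_data": series[in_interval].mean(),  # assumed aggregation: mean
        }
    )

manual_resample = pd.DataFrame(rows)
print(manual_resample)
```

With a 15-day sampling step, each 90-day interval contains six samples, so every `mean_data` value here is the average of six points.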

This works the same if you input a pandas DataFrame:

In [5]:
resampled_dataframe = calendar.resample(example_dataframe)
resampled_dataframe

| | anchor_year | lag | interval | data1 | data2 |
|---|---|---|---|---|---|
| 0 | 2018 | 0 | (2018-07-17, 2018-10-15] | 0.610081 | 0.610081 |
| 1 | 2018 | 1 | (2018-04-18, 2018-07-17] | 0.506726 | 0.506726 |
| 2 | 2018 | 2 | (2018-01-18, 2018-04-18] | 0.643391 | 0.643391 |
| 3 | 2018 | 3 | (2017-10-20, 2018-01-18] | 0.665477 | 0.665477 |
| 4 | 2019 | 0 | (2019-07-17, 2019-10-15] | 0.482043 | 0.482043 |
| 5 | 2019 | 1 | (2019-04-18, 2019-07-17] | 0.469529 | 0.469529 |
| 6 | 2019 | 2 | (2019-01-18, 2019-04-18] | 0.797876 | 0.797876 |
| 7 | 2019 | 3 | (2018-10-20, 2019-01-18] | 0.497652 | 0.497652 |
| 8 | 2020 | 0 | (2020-07-17, 2020-10-15] | 0.434242 | 0.434242 |
| 9 | 2020 | 1 | (2020-04-18, 2020-07-17] | 0.517219 | 0.517219 |
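
Because the result is an ordinary pandas DataFrame in long format, it can be reshaped with standard pandas tools. As one possible follow-up step, the sketch below pivots the `data1` column so that each anchor year becomes a row and each lag a column. It rebuilds the first eight rows of the output above by hand instead of calling s2s, and the names `resampled` and `wide` are introduced here for illustration.

```python
import pandas as pd

# First eight rows of the resampled output shown above (column data1 only).
resampled = pd.DataFrame(
    {
        "anchor_year": [2018] * 4 + [2019] * 4,
        "lag": [0, 1, 2, 3] * 2,
        "data1": [
            0.610081, 0.506726, 0.643391, 0.665477,
            0.482043, 0.469529, 0.797876, 0.497652,
        ],
    }
)

# One row per anchor year, one column per lag.
wide = resampled.pivot(index="anchor_year", columns="lag", values="data1")
print(wide)
```

The wide layout puts all lags of one anchor year side by side, which can be convenient when each lag is treated as a separate variable.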
