### Tutorial: using the `resample` method of the s2spy `time` module

We start by importing the required libraries and generating an example pandas Series and DataFrame:

In [1]:
import numpy as np
import xarray as xr
import pandas as pd
import s2spy.time

time_index = pd.date_range('20171020', '20211001', freq='15d')
random_data = np.random.random(len(time_index))
example_series = pd.Series(random_data, index=time_index)
example_dataframe = pd.DataFrame(example_series.rename('data1'))
example_dataframe['data2'] = example_dataframe['data1']

example_series

2017-10-20    0.337139
2017-11-04    0.213494
2017-11-19    0.955686
2017-12-04    0.686485
2017-12-19    0.298034
                ...   
2021-07-31    0.952994
2021-08-15    0.798515
2021-08-30    0.507279
2021-09-14    0.436875
2021-09-29    0.510620
Freq: 15D, Length: 97, dtype: float64

The DataFrame looks similar but has two named columns:

In [2]:
example_dataframe

               data1     data2
2017-10-20  0.337139  0.337139
2017-11-04  0.213494  0.213494
2017-11-19  0.955686  0.955686
2017-12-04  0.686485  0.686485
2017-12-19  0.298034  0.298034
...              ...       ...
2021-07-31  0.952994  0.952994
2021-08-15  0.798515  0.798515
2021-08-30  0.507279  0.507279
2021-09-14  0.436875  0.436875


To resample, we first set up an `AdventCalendar` with an anchor date (here October 15) and an interval length (`freq`, here 90 days):

In [3]:
calendar = s2spy.time.AdventCalendar(anchor_date=(10, 15), freq='90d')
calendar.map_years(2015, 2020).flat

anchor_year  i_interval
2020         0             (2020-07-17, 2020-10-15]
             1             (2020-04-18, 2020-07-17]
             2             (2020-01-19, 2020-04-18]
             3             (2019-10-21, 2020-01-19]
2019         0             (2019-07-17, 2019-10-15]
             1             (2019-04-18, 2019-07-17]
             2             (2019-01-18, 2019-04-18]
             3             (2018-10-20, 2019-01-18]
2018         0             (2018-07-17, 2018-10-15]
             1             (2018-04-18, 2018-07-17]
             2             (2018-01-18, 2018-04-18]
             3             (2017-10-20, 2018-01-18]
2017         0             (2017-07-17, 2017-10-15]
             1             (2017-04-18, 2017-07-17]
             2             (2017-01-18, 2017-04-18]
             3             (2016-10-20, 2017-01-18]
2016         0             (2016-07-17, 2016-10-15]
             1             (2016-04-18, 2016-07-17]
             2             (2016-01-19, 
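
The interval boundaries listed above can be reproduced with plain pandas. The following is a minimal sketch of the arithmetic, not s2spy's own implementation: `anchor_year_intervals` and its `freq_days` parameter are hypothetical names introduced only for illustration, and it assumes that a year holds as many whole intervals as fit in 365 days.

```python
import pandas as pd


def anchor_year_intervals(year, anchor_date=(10, 15), freq_days=90):
    """Hypothetical helper: list one anchor year's intervals, most recent first."""
    anchor = pd.Timestamp(year=year, month=anchor_date[0], day=anchor_date[1])
    step = pd.Timedelta(days=freq_days)
    # Assumption: only whole intervals that fit within one year are kept.
    n_intervals = 365 // freq_days
    # Step backward from the anchor date; each interval is left-open, right-closed.
    return [
        pd.Interval(anchor - (i + 1) * step, anchor - i * step, closed="right")
        for i in range(n_intervals)
    ]


for i_interval, interval in enumerate(anchor_year_intervals(2020)):
    print(i_interval, interval)
```

Stepping back from 2020-10-15 in 90-day steps gives the same boundaries as the four rows for anchor year 2020, with `i_interval` 0 ending on the anchor date itself.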

Next we pass the example data to the `resample` method of the calendar.

In [4]:
resampled_series = calendar.resample(example_series)
resampled_series

   anchor_year  i_interval                  interval  mean_data
0         2018           0  (2018-07-17, 2018-10-15]    0.50445
1         2018           1  (2018-04-18, 2018-07-17]   0.710327
2         2018           2  (2018-01-18, 2018-04-18]   0.470996
3         2018           3  (2017-10-20, 2018-01-18]   0.523299
4         2019           0  (2019-07-17, 2019-10-15]   0.515867
5         2019           1  (2019-04-18, 2019-07-17]   0.530227
6         2019           2  (2019-01-18, 2019-04-18]   0.229032
7         2019           3  (2018-10-20, 2019-01-18]    0.46416
8         2020           0  (2020-07-17, 2020-10-15]   0.487994
9         2020           1  (2020-04-18, 2020-07-17]    0.67227


As shown above, this generates a new DataFrame containing the data averaged over each interval (`mean_data`), along with the interval index (`i_interval`, counted backward from the anchor date) and the anchor year that the interval belongs to.
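
To make the resampling step concrete, here is a rough pandas-only sketch of the idea: for each interval, select the samples that fall in it and average them. It is an illustration under stated assumptions, not the code that `calendar.resample` actually runs; `resample_sketch` and `freq_days` are hypothetical names, and the left-open, right-closed membership test is inferred from the interval notation in the output.

```python
import pandas as pd


def resample_sketch(series, anchor_years, anchor_date=(10, 15), freq_days=90):
    """Hypothetical illustration: average a Series over calendar intervals."""
    step = pd.Timedelta(days=freq_days)
    n_intervals = 365 // freq_days  # assumption: whole intervals per year only
    rows = []
    for year in anchor_years:
        anchor = pd.Timestamp(year=year, month=anchor_date[0], day=anchor_date[1])
        for i in range(n_intervals):
            left, right = anchor - (i + 1) * step, anchor - i * step
            # Left-open, right-closed: a sample exactly on `left` is excluded.
            in_interval = (series.index > left) & (series.index <= right)
            rows.append(
                {
                    "anchor_year": year,
                    "i_interval": i,
                    "interval": pd.Interval(left, right, closed="right"),
                    "mean_data": series[in_interval].mean(),
                }
            )
    return pd.DataFrame(rows)


# Usage: a constant daily series, so every interval mean must equal 1.0.
daily = pd.Series(1.0, index=pd.date_range("2017-10-01", "2018-12-31", freq="D"))
result = resample_sketch(daily, anchor_years=[2018])
print(result)
```

An interval that contains no samples yields `NaN` in this sketch; how the library treats such intervals is not covered here.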

Resampling works the same way for a pandas DataFrame; each column is resampled separately:

In [5]:
resampled_dataframe = calendar.resample(example_dataframe)
resampled_dataframe

   anchor_year  i_interval                  interval     data1     data2
0         2018           0  (2018-07-17, 2018-10-15]   0.50445   0.50445
1         2018           1  (2018-04-18, 2018-07-17]  0.710327  0.710327
2         2018           2  (2018-01-18, 2018-04-18]  0.470996  0.470996
3         2018           3  (2017-10-20, 2018-01-18]  0.523299  0.523299
4         2019           0  (2019-07-17, 2019-10-15]  0.515867  0.515867
5         2019           1  (2019-04-18, 2019-07-17]  0.530227  0.530227
6         2019           2  (2019-01-18, 2019-04-18]  0.229032  0.229032
7         2019           3  (2018-10-20, 2019-01-18]   0.46416   0.46416
8         2020           0  (2020-07-17, 2020-10-15]  0.487994  0.487994
9         2020           1  (2020-04-18, 2020-07-17]   0.67227   0.67227


This works the same for an `xarray` `Dataset`:

In [6]:
import xarray as xr

time_index = pd.date_range('20171020', '20211001', freq='15d')

np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2, len(time_index))
precipitation = 10 * np.random.rand(2, 2, len(time_index))

lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]


ds = xr.Dataset(
    data_vars=dict(
        temperature=(["x", "y", "time"], temperature),
        precipitation=(["x", "y", "time"], precipitation),
    ),
    coords=dict(
        lon=(["x", "y"], lon),
        lat=(["x", "y"], lat),
        time=time_index,
    ),
    attrs=dict(description="Weather related data."),
)
ds

As shown below, the `temperature` and `precipitation` variables can easily be resampled to a new index using the intervals specified by the calendar.

This index dimension has the anchor year and the interval count as coordinates.

Note: both the `temperature` and `precipitation` variables keep their `lat` and `lon` coordinates.

In [7]:
ds_r = calendar.resample(ds)
ds_r
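
For a single interval, the effect on a Dataset can be sketched with standard xarray calls: pick the time steps inside the interval and average over `time`. This is only an illustration of the concept, not how `calendar.resample` is implemented; `ds_demo` is a hypothetical stand-in for `ds` with one variable, and the left-open, right-closed selection is an assumption based on the interval notation used earlier.

```python
import numpy as np
import pandas as pd
import xarray as xr

time_index = pd.date_range("2017-10-20", "2021-10-01", freq="15D")
rng = np.random.default_rng(0)

# Hypothetical stand-in for `ds`, holding only a temperature variable.
ds_demo = xr.Dataset(
    data_vars=dict(
        temperature=(
            ["x", "y", "time"],
            15 + 8 * rng.standard_normal((2, 2, len(time_index))),
        ),
    ),
    coords=dict(
        lon=(["x", "y"], [[-99.83, -99.32], [-99.79, -99.23]]),
        lat=(["x", "y"], [[42.25, 42.21], [42.63, 42.59]]),
        time=time_index,
    ),
)

# One calendar interval: anchor year 2018, i_interval 0.
left, right = pd.Timestamp("2018-07-17"), pd.Timestamp("2018-10-15")
in_interval = (time_index > left) & (time_index <= right)

# Select the matching time steps by position, then average over time.
interval_mean = ds_demo.isel(time=np.flatnonzero(in_interval)).mean("time")
print(interval_mean)
```

Averaging over `time` removes only that dimension, so the two-dimensional `lat` and `lon` coordinates on `x` and `y` remain attached to the result.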