# Resampling data

This notebook will demonstrate how to resample data with Lilio.
Lilio can resample pandas `DataFrame` and `Series` objects, as well as xarray `DataArray` and `Dataset` objects.

We start by importing the required libraries and generating an example pandas Series and DataFrame:

In [1]:
import numpy as np
import xarray as xr
import pandas as pd
import lilio

time_index = pd.date_range('20171020', '20211001', freq='15d')
random_data = np.random.random(len(time_index))
example_series = pd.Series(random_data, index=time_index)
example_dataframe = pd.DataFrame(example_series.rename('data1'))
example_dataframe['data2'] = example_dataframe['data1']

example_series

2017-10-20    0.493865
2017-11-04    0.775964
2017-11-19    0.616071
2017-12-04    0.828319
2017-12-19    0.799355
                ...   
2021-07-31    0.228983
2021-08-15    0.596911
2021-08-30    0.555568
2021-09-14    0.365234
2021-09-29    0.780807
Freq: 15D, Length: 97, dtype: float64

The DataFrame looks similar but has two named columns:

In [2]:
example_dataframe

,data1,data2
2017-10-20,0.493865,0.493865
2017-11-04,0.775964,0.775964
2017-11-19,0.616071,0.616071
2017-12-04,0.828319,0.828319
2017-12-19,0.799355,0.799355
...,...,...
2021-07-31,0.228983,0.228983
2021-08-15,0.596911,0.596911
2021-08-30,0.555568,0.555568
2021-09-14,0.365234,0.365234


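To resample, we need to set up a calendar with an anchor date and an interval length (frequency).
Here we use the `daily_calendar` shorthand.

(Passing `max_lag` and `allow_overlap` is optional, but setting them here lets us demonstrate that resampling works even when intervals overlap. With one target and four precursor intervals of 90 days each, every anchor year spans 450 days, so intervals from consecutive anchor years overlap.)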

In [3]:
calendar = lilio.daily_calendar(
    anchor="10-15",
    freq='90d',
    max_lag=4,
    allow_overlap=True
)
calendar.map_years(2018, 2020)

Calendar(
    anchor='10-15',
    allow_overlap=True,
    mapping=('years', 2018, 2020),
    intervals=[
        Interval(role='target', length='90d', gap='0d'),
        Interval(role='precursor', length='90d', gap='0d'),
        Interval(role='precursor', length='90d', gap='0d'),
        Interval(role='precursor', length='90d', gap='0d'),
        Interval(role='precursor', length='90d', gap='0d')
    ]
)
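To make the layout of these intervals concrete, the sketch below rebuilds their bounds for anchor year 2018 with plain pandas date arithmetic. It is an illustration only, not Lilio's implementation: the names `anchor`, `length`, `n_precursors`, and `intervals` are made up for this example, and it assumes that the target interval starts at the anchor date and that precursors are stacked backwards from it without gaps.

```python
import pandas as pd

# Assumed values, taken from the calendar defined above.
anchor = pd.Timestamp("2018-10-15")
length = pd.Timedelta("90D")
n_precursors = 4

# The target interval (index 1) starts at the anchor date.
intervals = {1: (anchor, anchor + length)}

# Precursor intervals (indices -1, -2, ...) are stacked backwards from the anchor.
for lag in range(1, n_precursors + 1):
    intervals[-lag] = (anchor - lag * length, anchor - (lag - 1) * length)

# Print the half-open intervals, oldest first.
for i in sorted(intervals):
    left, right = intervals[i]
    print(i, f"[{left.date()}, {right.date()})")
```

The earliest precursor starts on 2017-10-20 and the target ends on 2019-01-13. Repeating the calculation for anchor year 2019 gives an earliest precursor starting on 2018-10-20, which falls inside the 2018 target interval. This is the overlap that `allow_overlap=True` permits.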

Next we pass the example data to the `resample` function. This requires a mapped calendar and the input data.


In [4]:
resampled_series = lilio.resample(calendar, example_series)
resampled_series

,anchor_year,i_interval,interval,data,target
0,2018,-4,"[2017-10-20, 2018-01-18)",0.638379,False
1,2018,-3,"[2018-01-18, 2018-04-18)",0.27936,False
2,2018,-2,"[2018-04-18, 2018-07-17)",0.372093,False
3,2018,-1,"[2018-07-17, 2018-10-15)",0.584836,False
4,2018,1,"[2018-10-15, 2019-01-13)",0.53668,True
5,2019,-4,"[2018-10-20, 2019-01-18)",0.533325,False
6,2019,-3,"[2019-01-18, 2019-04-18)",0.688629,False
7,2019,-2,"[2019-04-18, 2019-07-17)",0.609113,False
8,2019,-1,"[2019-07-17, 2019-10-15)",0.515201,False
9,2019,1,"[2019-10-15, 2020-01-13)",0.513284,True


As shown above, this generates a new DataFrame containing the resampled data for each interval, along with the interval index (`i_interval`), the anchor year the interval belongs to (`anchor_year`), and a flag marking target intervals (`target`).
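As a rough illustration of what a single resampled value represents, the following sketch aggregates the samples that fall inside one half-open interval using plain pandas. It assumes the aggregation is a mean, which this notebook does not state explicitly. The `series` here is freshly generated random data, so its values will not match the output above.

```python
import numpy as np
import pandas as pd

# Recreate a series with the same 15-day time index as the example data.
time_index = pd.date_range("2017-10-20", "2021-10-01", freq="15D")
rng = np.random.default_rng(0)
series = pd.Series(rng.random(len(time_index)), index=time_index)

# Bounds of the 2018 target interval: [2018-10-15, 2019-01-13).
left = pd.Timestamp("2018-10-15")
right = pd.Timestamp("2019-01-13")

# Select samples inside the half-open interval and average them
# (the mean is an assumption made for this sketch).
in_interval = (series.index >= left) & (series.index < right)
n_samples = int(in_interval.sum())
manual_mean = series[in_interval].mean()

print(n_samples, manual_mean)
```

With data every 15 days and 90-day intervals, each interval here contains six samples.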

This works the same if you input a pandas DataFrame:

In [5]:
resampled_dataframe = lilio.resample(calendar, example_dataframe)
resampled_dataframe

,anchor_year,i_interval,interval,data1,data2,target
0,2018,-4,"[2017-10-20, 2018-01-18)",0.638379,0.638379,False
1,2018,-3,"[2018-01-18, 2018-04-18)",0.27936,0.27936,False
2,2018,-2,"[2018-04-18, 2018-07-17)",0.372093,0.372093,False
3,2018,-1,"[2018-07-17, 2018-10-15)",0.584836,0.584836,False
4,2018,1,"[2018-10-15, 2019-01-13)",0.53668,0.53668,True
5,2019,-4,"[2018-10-20, 2019-01-18)",0.533325,0.533325,False
6,2019,-3,"[2019-01-18, 2019-04-18)",0.688629,0.688629,False
7,2019,-2,"[2019-04-18, 2019-07-17)",0.609113,0.609113,False
8,2019,-1,"[2019-07-17, 2019-10-15)",0.515201,0.515201,False
9,2019,1,"[2019-10-15, 2020-01-13)",0.513284,0.513284,True


This works the same for an `xarray` `Dataset`:

In [6]:
import xarray as xr

time_index = pd.date_range('20171020', '20211001', freq='15d')

np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2, len(time_index))
precipitation = 10 * np.random.rand(2, 2, len(time_index))

lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]


ds = xr.Dataset(
    data_vars=dict(
        temperature=(["x", "y", "time"], temperature),
        precipitation=(["x", "y", "time"], precipitation),
    ),
    coords=dict(
        lon=(["x", "y"], lon),
        lat=(["x", "y"], lat),
        time=time_index,
    ),
    attrs=dict(description="Weather related data."),
)
ds

As shown below, the `temperature` and `precipitation` variables are resampled onto a new index defined by the calendar's intervals.

The coordinates of this index dimension are the anchor year and the interval count.

Note: both the `temperature` and `precipitation` variables have kept their `lat` and `lon` coordinates.

In [7]:
ds_r = lilio.resample(calendar, ds)
ds_r
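The note about `lat` and `lon` can be checked by hand for a single interval. The sketch below uses xarray alone to average one variable over the 2018 target interval. It is not how Lilio performs the resampling, and it again assumes a mean as the aggregation. The names `mask` and `interval_mean` are chosen for this example only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# A small dataset with the same layout as the example above.
time_index = pd.date_range("2017-10-20", "2021-10-01", freq="15D")
rng = np.random.default_rng(0)
temperature = 15 + 8 * rng.standard_normal((2, 2, len(time_index)))

ds = xr.Dataset(
    data_vars={"temperature": (["x", "y", "time"], temperature)},
    coords={
        "lon": (["x", "y"], [[-99.83, -99.32], [-99.79, -99.23]]),
        "lat": (["x", "y"], [[42.25, 42.21], [42.63, 42.59]]),
        "time": time_index,
    },
)

# Boolean mask for the half-open interval [2018-10-15, 2019-01-13).
t = ds["time"].values
mask = (t >= np.datetime64("2018-10-15")) & (t < np.datetime64("2019-01-13"))

# Average over time within the interval (the mean is an assumption).
interval_mean = ds.isel(time=mask).mean("time")

print(interval_mean)
```

Because `lat` and `lon` depend only on `x` and `y`, reducing over `time` leaves them attached to the result.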