# Resampling Series in Pyleoclim

Resampling a Pyleoclim `Series` object is analogous to the Pandas `resample` method. It provides the same functionality,
plus some additional resampling options useful for paleo work, such as specifying units like `Ga`.

In [None]:
import pyleoclim as pyleo

First, let's load some data to work with. 

In [None]:
ts = pyleo.utils.load_dataset('LR04')
ts

From the metadata provided, we can see that the time span is specified in `ky BP`. However, the time values are not evenly spaced. Before we begin any analysis, we want to resample the data to ensure even spacing.

Let's resample every 5 ka (5,000 years). 

In [None]:
ts.resample('5ka')

Now we have a `SeriesResampler` object. Just like in the [Pandas API](https://pandas.pydata.org/docs/reference/resampling.html), 
the resampler object is merely a stepping stone in the process, not an endpoint. We need to define **how** to aggregate the 
data between our timesteps. 

Because the Pyleoclim resampler is a light wrapper around the Pandas resample API, you can review all the available aggregation options
in the [Pandas docs](https://pandas.pydata.org/docs/reference/resampling.html).

For now, we'll select `mean`. This will give us the average value of the data points within each sample period.


In [None]:
ts5k = ts.resample('5ka').mean()
ts5k
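To see how the choice of aggregation changes the result, here is a minimal sketch using plain Pandas on a small synthetic series (not the LR04 data). It assumes a daily `DatetimeIndex` and a `'5D'` bin width purely for illustration; the same aggregation method names are the ones listed in the Pandas docs linked above.

```python
import pandas as pd

# Synthetic, irregularly spaced observations (illustrative only, not LR04).
index = pd.to_datetime(
    ["2000-01-01", "2000-01-02", "2000-01-04",
     "2000-01-07", "2000-01-08", "2000-01-13"]
)
s = pd.Series([1.0, 3.0, 2.0, 6.0, 4.0, 5.0], index=index)

# The resampler only defines the bins; nothing is computed yet.
resampler = s.resample("5D")

# Each aggregation collapses the points inside a bin to a single value.
summary = pd.DataFrame(
    {
        "mean": resampler.mean(),
        "median": resampler.median(),
        "max": resampler.max(),
    }
)
print(summary)
```

Comparing the columns side by side gives a quick feel for how sensitive each bin is to the aggregation you pick.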

Great! We've successfully resampled to every 5,000 years, but notice that the time values look like they contain rounding errors.
These small offsets appear because we are using the specialized paleo time units (ka), which requires converting between
float and datetime.

If we inspect the index as a Pandas DatetimeIndex instead of floats, we see that the numbers are
actually round dates. 

In [None]:
ts5k.datetime_index

Now let's plot both of these on top of one another to make a 1:1 comparison. 

In [None]:
fig, ax = ts.plot(invert_yaxis=True)  # pass a boolean rather than the string 'True'
ts5k.plot(ax=ax,color='C1')

## Going further

You may be wondering what other time units are available. 

In [None]:
from pyleoclim.utils.tsbase import MATCH_A, MATCH_KA, MATCH_MA, MATCH_GA

In [None]:
print(f'Available time units are: \n{list(MATCH_A)} \n{list(MATCH_KA)} \n{list(MATCH_MA)} \n{list(MATCH_GA)}')

Suppose you have some NaN values in your results that you need to handle. You can interpolate across these using a variety of 
[interpolation methods](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html).

In [None]:
ts_linear_fill = ts5k.pandas_method(lambda x: x.interpolate(method='linear'))
ts_linear_fill
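The sketch below shows, in plain Pandas, where such NaN values can come from and what a linear fill does to them. The data are synthetic and the `'5D'` bin width is an assumption chosen for illustration: a bin that contains no observations aggregates to NaN, and `interpolate(method='linear')` then fills it from its neighbors.

```python
import pandas as pd

# Synthetic series with a gap: no observations fall between Jan 6 and Jan 10.
index = pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-12"])
s = pd.Series([1.0, 3.0, 6.0], index=index)

# The middle 5-day bin is empty, so its mean is NaN.
binned = s.resample("5D").mean()

# Linear interpolation fills the gap, treating the bins as equally spaced.
filled = binned.interpolate(method="linear")

print(binned)
print(filled)
```

Keep in mind that filled values are estimates rather than observations, so long gaps deserve a closer look before any analysis.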

## Interactive Plotting

In [None]:
import holoviews as hv
hv.extension('bokeh')

In [None]:
(
    hv.Curve(list(zip(ts.time, ts.value))).opts(invert_yaxis=True, line_width=3)
    * hv.Curve(list(zip(ts5k.time, ts5k.value))).opts(invert_yaxis=True, line_width=1)
).opts(width=1200, height=800)

From our plot we can make some quick observations. First, during the time period when the sampling was at a higher resolution,
the two datasets match. Second, during the less frequently sampled period we have introduced some error: our highs and lows have lost
some accuracy.

Using this approach we can get a brief visual indication of the accuracy of our resampling method and iterate on different 
methods as needed.
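If you want a number to go with the visual check, one option is to interpolate the resampled series back onto the original time axis and compute a root-mean-square error. The helper below, `resampling_rmse`, is a hypothetical function written for this sketch and is not part of Pyleoclim; it assumes both time axes are increasing and contain no NaN values.

```python
import numpy as np

def resampling_rmse(time, value, time_rs, value_rs):
    """RMSE between the original values and the resampled series,
    linearly interpolated back onto the original time axis.

    Hypothetical helper for illustration; assumes increasing time
    axes and no NaN values.
    """
    time = np.asarray(time, dtype=float)
    value = np.asarray(value, dtype=float)
    time_rs = np.asarray(time_rs, dtype=float)
    value_rs = np.asarray(value_rs, dtype=float)

    # Evaluate the resampled series at the original time points.
    value_back = np.interp(time, time_rs, value_rs)
    return float(np.sqrt(np.mean((value - value_back) ** 2)))

# Tiny synthetic check: the resampled series misses the bump at t=1.
rmse = resampling_rmse([0, 1, 2], [0, 2, 2], [0, 2], [0, 2])
print(rmse)
```

With the series from this notebook, the call would look like `resampling_rmse(ts.time, ts.value, ts5k.time, ts5k.value)`, provided any NaN values in `ts5k` have been filled or dropped first.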