# Resampling Time Series

In [1]:
import pandas as pd
import numpy as np

Sometimes a time series does not have the frequency you need. In that case you have to resample the data so that it sits on a regular index with the desired frequency.

In [2]:
rng = pd.date_range('1/1/2018', periods=72, freq='H')
ts = pd.Series(data=range(len(rng)), index=rng)
ts.head()

2018-01-01 00:00:00    0
2018-01-01 01:00:00    1
2018-01-01 02:00:00    2
2018-01-01 03:00:00    3
2018-01-01 04:00:00    4
Freq: H, dtype: int64

You can use the pandas `.asfreq()` method, which takes the new frequency (here 45 minutes) and the method used to fill the holes in the reindexed series:

In [3]:
converted = ts.asfreq('45min', method='pad')
converted.head()

2018-01-01 00:00:00    0
2018-01-01 00:45:00    0
2018-01-01 01:30:00    1
2018-01-01 02:15:00    2
2018-01-01 03:00:00    3
Freq: 45T, dtype: int64
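To make the "reindexed series" wording concrete, here is a minimal sketch of what `.asfreq()` appears to do in this case: build a regular 45-minute grid between the first and last timestamp and reindex onto it. The names `new_index` and `manual` are illustrative only, and the lowercase `'h'` alias is an assumption about the installed pandas version (older releases also accept `'H'`).

```python
import pandas as pd

# Hourly series mirroring the example above.
rng = pd.date_range('2018-01-01', periods=72, freq='h')
ts = pd.Series(range(len(rng)), index=rng)

# Build the 45-minute grid by hand, from the first to the last original timestamp.
new_index = pd.date_range(ts.index[0], ts.index[-1], freq='45min')

# Reindex onto that grid, forward-filling from the most recent hourly value.
manual = ts.reindex(new_index, method='pad')
converted = ts.asfreq('45min', method='pad')

print(manual.equals(converted))
```

If the two results compare equal, `.asfreq()` can be read as a shortcut for "generate a regular index, then reindex".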

The time series now has a 45-minute index and is therefore longer than the one with hourly frequency:

In [4]:
print(len(ts), len(converted))

72 95
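The length of 95 can be checked by hand: the hourly index spans 71 hours, and the new index contains every 45-minute step that fits into that span plus the starting point. A small sketch of that arithmetic (the variable names are illustrative):

```python
import pandas as pd

rng = pd.date_range('2018-01-01', periods=72, freq='h')

span = rng[-1] - rng[0]            # 71 hours between first and last timestamp
step = pd.Timedelta(minutes=45)

# Number of whole 45-minute steps inside the span, plus the start point itself.
expected_len = span // step + 1
print(expected_len)  # 95
```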


Since the index has more entries, the series must contain more data points as well. The `method` argument specifies how these new data points are filled. The previous example used `pad`, which is the same as `ffill`: each new data point takes the value of the most recent earlier observation. The alternative is `backfill` or `bfill`, which fills with the next valid observation instead (much less common!).
Another option is not to specify a `method` at all. Pandas then inserts NaN at every new timestamp that does not coincide with an original one, which also turns the integer series into a float series:

In [5]:
converted2 = ts.asfreq('45min')
converted2.head()

2018-01-01 00:00:00    0.0
2018-01-01 00:45:00    NaN
2018-01-01 01:30:00    NaN
2018-01-01 02:15:00    NaN
2018-01-01 03:00:00    3.0
Freq: 45T, dtype: float64
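For comparison, the sketch below shows the two other filling options side by side: `method='bfill'` and a constant via the `fill_value` argument of `.asfreq()`. The sentinel `-1` is an arbitrary choice for illustration, not a recommendation.

```python
import pandas as pd

rng = pd.date_range('2018-01-01', periods=72, freq='h')
ts = pd.Series(range(len(rng)), index=rng)

# Backward fill: each new timestamp takes the next available hourly value.
back = ts.asfreq('45min', method='bfill')

# Constant fill: new timestamps get a fixed placeholder instead of NaN.
const = ts.asfreq('45min', fill_value=-1)

print(back.head())
print(const.head())
```

Because no NaN is introduced in either case, both results should keep the integer dtype.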

You can also convert to a lower frequency with the `asfreq()` method:

In [6]:
converted2 = ts.asfreq('3H')
converted2.head()

2018-01-01 00:00:00     0
2018-01-01 03:00:00     3
2018-01-01 06:00:00     6
2018-01-01 09:00:00     9
2018-01-01 12:00:00    12
Freq: 3H, dtype: int64
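For this evenly spaced series, the downsampled result should be the same as simply taking every third row. A quick check, assuming the same hourly `ts` as above:

```python
import pandas as pd

rng = pd.date_range('2018-01-01', periods=72, freq='h')
ts = pd.Series(range(len(rng)), index=rng)

down = ts.asfreq('3h')       # keep only timestamps on the 3-hour grid
every_third = ts.iloc[::3]   # plain positional slicing

print(down.equals(every_third))
```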

What we see is that pandas simply drops the data points in between. However, this is usually not what we want. Usually we would like the data to be aggregated to the new frequency. This is where the `resample` method comes into play:

In [7]:
ts.resample('D').sum()

2018-01-01     276
2018-01-02     852
2018-01-03    1428
Freq: D, dtype: int64
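`sum` is only one possible aggregation. As a sketch, several aggregations can be computed per day in one call with `.agg()`; the choice of `sum`, `mean` and `max` here is purely illustrative.

```python
import pandas as pd

rng = pd.date_range('2018-01-01', periods=72, freq='h')
ts = pd.Series(range(len(rng)), index=rng)

# One row per day, one column per aggregation function.
daily = ts.resample('D').agg(['sum', 'mean', 'max'])
print(daily)
```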

In [8]:
ts.resample('3H').sum().head()

2018-01-01 00:00:00     3
2018-01-01 03:00:00    12
2018-01-01 06:00:00    21
2018-01-01 09:00:00    30
2018-01-01 12:00:00    39
Freq: 3H, dtype: int64
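One way to see how the 3-hour sums above come about is to rebuild them by hand: floor every timestamp to its 3-hour bin start and group by that value. This is only a sketch for intuition under the default binning, not a description of how `resample` is implemented internally.

```python
import pandas as pd

rng = pd.date_range('2018-01-01', periods=72, freq='h')
ts = pd.Series(range(len(rng)), index=rng)

by_resample = ts.resample('3h').sum()

# Manual binning: 00:00, 01:00 and 02:00 all floor to 00:00, and so on.
by_floor = ts.groupby(ts.index.floor('3h')).sum()

print((by_resample.values == by_floor.values).all())
print((by_resample.index == by_floor.index).all())
```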

As you can see in the example above, each bin (here 3 hours wide) is labeled with the timestamp of its first item, and its value is the sum of the 3 data points that fall into the bin, including that first item. With the `label` argument we can instead label each bin with its right edge, i.e. the timestamp at which the next bin starts:

In [9]:
ts.resample('3H', label='right').sum().head()

2018-01-01 03:00:00     3
2018-01-01 06:00:00    12
2018-01-01 09:00:00    21
2018-01-01 12:00:00    30
2018-01-01 15:00:00    39
Freq: 3H, dtype: int64
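Comparing the two results directly suggests that `label='right'` only changes the labels: the aggregated values stay the same, and every label moves forward by one bin width. A short check, assuming the same hourly `ts`:

```python
import pandas as pd

rng = pd.date_range('2018-01-01', periods=72, freq='h')
ts = pd.Series(range(len(rng)), index=rng)

left = ts.resample('3h').sum()                  # default: label with the left bin edge
right = ts.resample('3h', label='right').sum()  # label with the right bin edge

# Same sums in the same order.
print((left.values == right.values).all())

# Labels shifted by exactly one bin width (3 hours).
print((right.index == left.index + pd.Timedelta(hours=3)).all())
```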