In [1]:
import pandas as pd

### Pandas Resample

Resample is an amazing function that converts your time series data to a different frequency (that is, a different time interval). It is most often used to aggregate granular data into larger buckets.

We'll run through these examples:
1. Resampling minute data to 5 minute data
2. Resampling minute data to 5 minute data - changing the "closed" side
3. Resampling minute data to 5 minute data - changing the "label" side
4. Upsampling quarterly data to monthly data with convention: start/end
5. Bonus: Combine closed/label parameters together

First, create a DataFrame with a DatetimeIndex. That's a fancy way of saying that pandas recognizes the index values as points in time.

In [55]:
# Here I'm first creating a date range, then creating a DataFrame with the date range as the index.
index = pd.date_range('2/1/2020', periods=9, freq='T')
df = pd.DataFrame(data=range(9), index=index, columns=['count'])
df

Unnamed: 0,count
2020-02-01 00:00:00,0
2020-02-01 00:01:00,1
2020-02-01 00:02:00,2
2020-02-01 00:03:00,3
2020-02-01 00:04:00,4
2020-02-01 00:05:00,5
2020-02-01 00:06:00,6
2020-02-01 00:07:00,7
2020-02-01 00:08:00,8
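
If your timestamps arrive as plain strings (from a CSV, for example), `.resample()` raises a `TypeError` until the index is converted. Here is a minimal sketch, assuming a small hypothetical frame named `raw` that is not part of the original notebook:

```python
import pandas as pd

# Hypothetical frame whose index holds timestamp *strings*, not datetimes.
raw = pd.DataFrame(
    {'count': range(3)},
    index=['2020-02-01 00:00', '2020-02-01 00:01', '2020-02-01 00:02'],
)
was_datetime = isinstance(raw.index, pd.DatetimeIndex)

# pd.to_datetime parses the strings and returns a DatetimeIndex.
raw.index = pd.to_datetime(raw.index)
is_datetime = isinstance(raw.index, pd.DatetimeIndex)

print(was_datetime, is_datetime)
```

After the conversion the frame can be resampled like the `df` above.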


### 1. Resampling minute data to 5 minute data
First off, we are going to downsample our data from a 1-minute frequency to a 5-minute frequency. It's called 'downsampling' because you're going *down* in the number of samples.


You need to ask yourself:
1. What new frequency do I want?
2. What do I want to do with the data points from the old frequency? Which aggregate function should be applied? This is very similar to the aggregation functions you use with .groupby().

Here I'm setting the frequency to "5T", which means 5 minutes (newer pandas versions prefer the equivalent alias "5min"). Then I'm taking the sum of the data points in each bucket. Notice how:
1. The labels of the new frequency start at 00:00:00. This is known as the 'left' side of the bin.
2. The data point at 00:05:00 is *not included* in the first bucket. This means the bin is 'closed' on the left: the left edge (00:00:00) is included and the right edge (00:05:00) is excluded, so that data point falls into the next bucket. Yes, I know it sounds like 'closed on the left' should mean that the left data point is shut out, but it's the opposite: the closed side is the one that is included.

In [57]:
df.resample('5T').sum()

Unnamed: 0,count
2020-02-01 00:00:00,10
2020-02-01 00:05:00,26
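
Since the choice of aggregate works much like `.groupby()`, several aggregations can be requested at once. This is a small sketch that rebuilds the same 9-row frame. It uses the `'min'` alias instead of `'T'`, on the assumption that your pandas version accepts it (it is the same 1-minute frequency).

```python
import pandas as pd

# Same data as above; 'min' is the same 1-minute frequency as 'T'.
index = pd.date_range('2/1/2020', periods=9, freq='min')
df = pd.DataFrame(data=range(9), index=index, columns=['count'])

# Apply several aggregations to each 5-minute bucket in one call.
summary = df['count'].resample('5min').agg(['sum', 'mean', 'max'])
print(summary)
```

The `sum` column matches the output above (10 and 26), while `mean` and `max` describe the same two buckets in different ways.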


### 2. Resampling minute data to 5 minute data - changing the "closed" side
Now let's change the 'closed' side. Say you wanted to include the 00:05:00 data point in the first bucket. The closed side defaults to the left for most frequencies (month-end, quarter-end, year-end and weekly frequencies default to the right).

Here we set closed='right'. Whoa, we get another label: 23:55:00. This is because the 00:00:00 data point needed somewhere to go. It was included in the 00:00:00 bucket when closed='left', but now that we chose closed='right', that bucket no longer includes its left edge, so the 0 lands in its own bucket.

In [58]:
df.resample('5T', closed='right').sum()

Unnamed: 0,count
2020-01-31 23:55:00,0
2020-02-01 00:00:00,15
2020-02-01 00:05:00,21
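
One way to convince yourself which bucket each row lands in is to compute the bucket keys by hand and compare them with `.resample()`. The sketch below is only an illustration, not how pandas implements it. It also assumes the bins line up with the clock, which holds for 5-minute buckets (`left_keys` and `right_keys` are names made up for this example).

```python
import pandas as pd

index = pd.date_range('2/1/2020', periods=9, freq='min')
df = pd.DataFrame(data=range(9), index=index, columns=['count'])

# closed='left': each timestamp belongs to the bin that starts at its floor.
left_keys = df.index.floor('5min')
# closed='right' (labels still on the left): a timestamp sitting exactly on an
# edge belongs to the bin that *ends* there, so round up and step back one bin.
right_keys = df.index.ceil('5min') - pd.Timedelta(minutes=5)

manual_left = df.groupby(left_keys)['count'].sum()
manual_right = df.groupby(right_keys)['count'].sum()

print(manual_left.tolist(), df.resample('5min')['count'].sum().tolist())
print(manual_right.tolist(), df.resample('5min', closed='right')['count'].sum().tolist())
```

Both lines print matching lists, and the manual version shows the 00:00:00 row being pushed back into the 23:55:00 bucket.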


### 3. Resampling minute data to 5 minute data - changing the "label" side
See how after we downsampled our original DataFrame, the resulting index labels were on the left side of each bin? This is because the label defaults to the left. However, we can change this to the right. Only the labels move; the contents of each bucket stay the same.

In [60]:
df.resample('5T', label='right').sum()

Unnamed: 0,count
2020-02-01 00:05:00,10
2020-02-01 00:10:00,26
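
A quick check, sketched here on the same 9-row frame (again with the `'min'` alias, assuming your pandas version accepts it), shows that `label` only relabels the buckets and does not move any data:

```python
import pandas as pd

index = pd.date_range('2/1/2020', periods=9, freq='min')
df = pd.DataFrame(data=range(9), index=index, columns=['count'])

left = df.resample('5min').sum()                  # default label='left'
right = df.resample('5min', label='right').sum()  # same bins, right-edge labels

# The sums are identical; only the index is shifted by one bin width.
same_values = left['count'].tolist() == right['count'].tolist()
shifted = list(right.index) == list(left.index + pd.Timedelta(minutes=5))
print(same_values, shifted)
```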


### 4. Upsampling quarterly data to monthly data with convention: start/end
So far we have downsampled our data. But what about upsampling? No problem. Since we are 'zooming in' on our data, each old data point now has several new slots it could go into, so we need to tell pandas where to put it.

Let's create another DataFrame of quarters with a *period range*. Think of period ranges as representing intervals of time, while date ranges represent specific points in time.

In [64]:
# Here I'm first creating a period range, then creating a DataFrame with the period range as the index.
index = pd.period_range('1/1/2020', periods=3, freq='Q')
df = pd.DataFrame(data=range(1, 4), index=index, columns=['count'])
df

Unnamed: 0,count
2020Q1,1
2020Q2,2
2020Q3,3
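
To make the interval-versus-point distinction concrete, here is a small sketch comparing a single `Period` with a single `Timestamp` (the variable names `quarter` and `moment` are just for illustration):

```python
import pandas as pd

quarter = pd.Period('2020Q1', freq='Q')   # an interval: all of Q1 2020
moment = pd.Timestamp('2020-02-15')       # a single point in time

# A period has a start and an end; a timestamp is just one instant.
print(quarter.start_time)
print(quarter.end_time)

inside = quarter.start_time <= moment <= quarter.end_time
print(inside)
```

The period spans everything from January 1 through the end of March 31, and the timestamp is one instant inside it.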


Now say I want to turn this quarterly data into monthly data. All we need to do is call .resample() and pass the monthly frequency!

Notice how each value below is placed at the *start* of its quarter, and the remaining months are left empty:

In [62]:
df.resample('M', convention='start').sum()

Unnamed: 0,count
2020-01,1.0
2020-02,
2020-03,
2020-04,2.0
2020-05,
2020-06,
2020-07,3.0
2020-08,
2020-09,


And here each value is placed at the *end* of its quarter:

In [63]:
df.resample('M', convention='end').sum()

Unnamed: 0,count
2020-03,1.0
2020-04,
2020-05,
2020-06,2.0
2020-07,
2020-08,
2020-09,3.0
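
The start/end idea can also be seen without resampling at all, by converting the quarterly periods directly to monthly ones. This is a sketch using `PeriodIndex.asfreq` and its `how` argument. It illustrates where each quarter's value lands and is not a drop-in replacement for the `.resample()` calls above.

```python
import pandas as pd

quarters = pd.period_range('1/1/2020', periods=3, freq='Q')

# Map each quarter to its first or last month.
starts = quarters.asfreq('M', how='start')
ends = quarters.asfreq('M', how='end')

print([str(p) for p in starts])
print([str(p) for p in ends])
```

The first list holds the months that carry a value with convention='start' (January, April, July) and the second the months that carry a value with convention='end' (March, June, September).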


### 5. Bonus: Combine closed/label parameters together

Here I'm going to take 3-minute data and resample it into 7-minute buckets, with both the labels and the closed side on the right of the bins.

In [48]:
index = pd.date_range('2/1/2020', periods=9, freq='3T')
df = pd.DataFrame(data=range(9), index=index, columns=['count'])
df

Unnamed: 0,count
2020-02-01 00:00:00,0
2020-02-01 00:03:00,1
2020-02-01 00:06:00,2
2020-02-01 00:09:00,3
2020-02-01 00:12:00,4
2020-02-01 00:15:00,5
2020-02-01 00:18:00,6
2020-02-01 00:21:00,7
2020-02-01 00:24:00,8


In [53]:
df.resample('7T', label='right', closed='right').sum()

Unnamed: 0,count
2020-02-01 00:00:00,0
2020-02-01 00:07:00,3
2020-02-01 00:14:00,7
2020-02-01 00:21:00,18
2020-02-01 00:28:00,8


Check out how our data is now in 7-minute intervals, with the right edge of each bin included and each label taken from the bin's right edge. Nice.
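
To see how the two parameters interact, all four combinations can be printed side by side. This sketch rebuilds the 3-minute frame (using the `'min'` alias, assuming your pandas version accepts it) and stores each result in a dictionary named `results`, a name chosen only for this example.

```python
import pandas as pd

index = pd.date_range('2/1/2020', periods=9, freq='3min')
df = pd.DataFrame(data=range(9), index=index, columns=['count'])

results = {}
for closed in ('left', 'right'):
    for label in ('left', 'right'):
        # closed decides which edge belongs to the bin; label decides which edge names it.
        out = df.resample('7min', closed=closed, label=label).sum()
        results[(closed, label)] = out
        print(f'closed={closed}, label={label}')
        print(out, end='\n\n')
```

Changing `closed` changes the sums because rows move between buckets. Changing `label` leaves the sums alone and only shifts the index.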