# 11.6 Resampling and Frequency Conversion

In [1]:
import numpy as np
import pandas as pd

Resampling refers to the process of converting a time series from one frequency to
another. Aggregating higher frequency data to lower frequency is called downsampling,
while converting lower frequency to higher frequency is called upsampling. Not
all resampling falls into either of these categories; for example, converting W-WED
(weekly on Wednesday) to W-FRI is neither upsampling nor downsampling.

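resample has an API similar to groupby: you call resample to group the data, then
call an aggregation function on the result.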
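The groupby analogy can be checked directly. The sketch below uses deterministic values (an assumption made so the result is easy to verify, instead of the random series used in these notes) and passes a `pd.offsets.MonthEnd()` object rather than a string alias, since month-end alias spellings differ between pandas versions.

```python
import numpy as np
import pandas as pd

# 100 daily observations with known values (0.0, 1.0, ..., 99.0)
rng = pd.date_range('2000-01-01', periods=100, freq='D')
ts = pd.Series(np.arange(100, dtype=float), index=rng)

# Downsample to month end: each bin is one calendar month
monthly = ts.resample(pd.offsets.MonthEnd()).mean()

# The same aggregation written as an explicit groupby on the calendar month
by_month = ts.groupby(ts.index.to_period('M')).mean()

print(monthly)
print(by_month)
```

Both results hold the same four monthly means; only the index differs (month-end timestamps versus monthly periods).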

In [2]:
rng = pd.date_range('2000-01-01', periods=100, freq='D')

In [3]:
ts = pd.Series(np.random.randn(len(rng)), index=rng)

In [5]:
ts[:5]

2000-01-01    0.782903
2000-01-02   -1.428502
2000-01-03   -0.115439
2000-01-04   -1.061670
2000-01-05    0.864519
Freq: D, dtype: float64

In [6]:
ts.resample('M')

DatetimeIndexResampler [freq=<MonthEnd>, axis=0, closed=right, label=right, convention=start, base=0]

In [9]:
for stamp, group in ts.resample('M'):
    print(stamp)
    print(group)
    print(len(group)); print()

2000-01-31 00:00:00
2000-01-01    0.782903
2000-01-02   -1.428502
2000-01-03   -0.115439
2000-01-04   -1.061670
2000-01-05    0.864519
2000-01-06    0.925758
2000-01-07    0.723171
2000-01-08    0.104631
2000-01-09   -1.043485
2000-01-10   -0.621766
2000-01-11    0.687975
2000-01-12   -0.951005
2000-01-13    0.698133
2000-01-14    0.592596
2000-01-15   -0.433250
2000-01-16   -0.529223
2000-01-17    0.578950
2000-01-18    1.102302
2000-01-19   -0.371038
2000-01-20   -0.231728
2000-01-21    1.723778
2000-01-22    0.197198
2000-01-23   -0.074974
2000-01-24    0.878036
2000-01-25    0.863213
2000-01-26    0.361256
2000-01-27    0.358858
2000-01-28    0.851106
2000-01-29    0.007070
2000-01-30   -0.539576
2000-01-31    0.122517
Freq: D, dtype: float64
31

2000-02-29 00:00:00
2000-02-01    1.071434
2000-02-02    0.172286
2000-02-03    0.173666
2000-02-04   -1.936565
2000-02-05    1.610406
2000-02-06   -0.908205
2000-02-07    0.967312
2000-02-08    0.520465
2000-02-09    0.360740
2000-02-10  

In [10]:
ts.resample('M').mean()

2000-01-31    0.162010
2000-02-29    0.181635
2000-03-31   -0.109297
2000-04-30   -0.460643
Freq: M, dtype: float64

In [15]:
ts.resample('M', kind='period').mean()
# kind='period' labels the result with a PeriodIndex instead of timestamps

2000-01    0.162010
2000-02    0.181635
2000-03   -0.109297
2000-04   -0.460643
Freq: M, dtype: float64
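As far as I know, the kind keyword is deprecated in recent pandas releases (2.2 and later). A minimal sketch of an equivalent that does not rely on it: aggregate first, then convert the resulting index with `to_period`. The deterministic input values are an assumption for illustration.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=100, freq='D')
ts = pd.Series(np.arange(100, dtype=float), index=rng)

# Aggregate to month end, then relabel the timestamps as monthly periods
monthly = ts.resample(pd.offsets.MonthEnd()).mean()
monthly_periods = monthly.to_period('M')

print(monthly_periods)
```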

---

## Downsampling

Aggregating data to a regular, lower frequency is a common time series task. The
data you’re aggregating doesn’t need to have a fixed frequency; the desired frequency
defines bin edges that are used to slice the time series into pieces to aggregate. For
example, to convert to monthly, 'M' or 'BM', you need to chop up the data into one-month
intervals. Each interval is said to be half-open; a data point can belong to only
one interval, and the union of the intervals must make up the whole time frame.
There are a couple of things to think about when using resample to downsample data:

- Which side of each interval is closed
- How to label each aggregated bin, either with the start of the interval or the end
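To make the first choice concrete, the sketch below lists which values land in each five-minute bin for closed='left' and closed='right'. The helper `bin_members` is hypothetical (it is not part of pandas); it simply iterates over the resampler the same way a groupby object is iterated.

```python
import numpy as np
import pandas as pd

# Twelve one-minute observations holding the values 0..11
rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

def bin_members(series, closed):
    """Hypothetical helper: map each bin label to the values that fall in it."""
    return {str(label): group.tolist()
            for label, group in series.resample('5min', closed=closed)}

left_bins = bin_members(ts, 'left')    # bins are [start, end)
right_bins = bin_members(ts, 'right')  # bins are (start, end]

for name, bins in [('left', left_bins), ('right', right_bins)]:
    print('closed =', name)
    for label, values in bins.items():
        print('  ', label, values)
```

With closed='right' the 00:00 observation belongs to the bin that started at 23:55 on the previous day, which is why an extra bin appears at the front.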

In [16]:
rng = pd.date_range('2000-01-01', periods=12, freq='T')

In [17]:
rng

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 00:01:00',
               '2000-01-01 00:02:00', '2000-01-01 00:03:00',
               '2000-01-01 00:04:00', '2000-01-01 00:05:00',
               '2000-01-01 00:06:00', '2000-01-01 00:07:00',
               '2000-01-01 00:08:00', '2000-01-01 00:09:00',
               '2000-01-01 00:10:00', '2000-01-01 00:11:00'],
              dtype='datetime64[ns]', freq='T')

In [18]:
ts = pd.Series(np.arange(12), index=rng)

In [19]:
ts

2000-01-01 00:00:00     0
2000-01-01 00:01:00     1
2000-01-01 00:02:00     2
2000-01-01 00:03:00     3
2000-01-01 00:04:00     4
2000-01-01 00:05:00     5
2000-01-01 00:06:00     6
2000-01-01 00:07:00     7
2000-01-01 00:08:00     8
2000-01-01 00:09:00     9
2000-01-01 00:10:00    10
2000-01-01 00:11:00    11
Freq: T, dtype: int32

In [23]:
ts.resample('5min', closed='right').sum()

1999-12-31 23:55:00     0
2000-01-01 00:00:00    15
2000-01-01 00:05:00    40
2000-01-01 00:10:00    11
Freq: 5T, dtype: int32

The closed argument defines which side of each interval is inclusive. For example, with closed='left' the 00:00 value falls in the 00:00 to 00:05 interval, while with closed='right' it falls in the 23:55 to 00:00 interval.

In [26]:
ts.resample('5min', closed='left').count()

2000-01-01 00:00:00    5
2000-01-01 00:05:00    5
2000-01-01 00:10:00    2
Freq: 5T, dtype: int64

In [27]:
ts.resample('5min', closed='right', label='right').sum()

2000-01-01 00:00:00     0
2000-01-01 00:05:00    15
2000-01-01 00:10:00    40
2000-01-01 00:15:00    11
Freq: 5T, dtype: int32

In [28]:
ts.resample('5T', closed='right', label='right').sum()

2000-01-01 00:00:00     0
2000-01-01 00:05:00    15
2000-01-01 00:10:00    40
2000-01-01 00:15:00    11
Freq: 5T, dtype: int32

In [29]:
ts.resample('5T', closed='right', label='right', loffset='-1S').sum()
# shift the result labels back by 1 second
# (loffset was deprecated in pandas 1.1 and removed in 2.0)

1999-12-31 23:59:59     0
2000-01-01 00:04:59    15
2000-01-01 00:09:59    40
2000-01-01 00:14:59    11
Freq: 5T, dtype: int32
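Since loffset is no longer available in pandas 2.0 and later, the same label shift can be applied to the result's index after aggregating. A minimal sketch, assuming the same twelve-minute series as above:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

# Aggregate first, then move every label back by one second
result = ts.resample('5min', closed='right', label='right').sum()
result.index = result.index - pd.Timedelta(seconds=1)

print(result)
```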

### Open-High-Low-Close (OHLC) resampling

In finance, a popular way to aggregate a time series is to compute four values for each
bucket: the first (open), last (close), maximum (high), and minimum (low) values. By
using the ohlc aggregate function you will obtain a DataFrame having columns containing
these four aggregates, which are efficiently computed in a single sweep of the
data:

In [30]:
ts.resample('5min').ohlc()

                     open  high  low  close
2000-01-01 00:00:00     0     4    0      4
2000-01-01 00:05:00     5     9    5      9
2000-01-01 00:10:00    10    11   10     11
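To see what ohlc computes, the sketch below builds the same four columns by hand with agg and the 'first', 'max', 'min' and 'last' aggregations, then compares the values. This is only an illustration of the definition, not a claim about how pandas implements ohlc internally.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=12, freq='min')
ts = pd.Series(np.arange(12), index=rng)

# Built-in OHLC aggregation
ohlc = ts.resample('5min').ohlc()

# The same four statistics requested one by one, renamed to match
manual = ts.resample('5min').agg(['first', 'max', 'min', 'last'])
manual.columns = ['open', 'high', 'low', 'close']

same_values = bool((ohlc.to_numpy() == manual.to_numpy()).all())
print(ohlc)
print(same_values)
```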


In [40]:
ts

2000-01-01 00:00:00     0
2000-01-01 00:01:00     1
2000-01-01 00:02:00     2
2000-01-01 00:03:00     3
2000-01-01 00:04:00     4
2000-01-01 00:05:00     5
2000-01-01 00:06:00     6
2000-01-01 00:07:00     7
2000-01-01 00:08:00     8
2000-01-01 00:09:00     9
2000-01-01 00:10:00    10
2000-01-01 00:11:00    11
Freq: T, dtype: int32

In [39]:
ts.resample('5T').asfreq()
# asfreq keeps only the values at the bin boundaries; everything in between is dropped

2000-01-01 00:00:00     0
2000-01-01 00:05:00     5
2000-01-01 00:10:00    10
Freq: 5T, dtype: int32

---

## Upsampling and Interpolation

When converting from a low frequency to a higher frequency, no aggregation is needed.

In [31]:
frame = pd.DataFrame(np.random.randn(2, 4),
                     index=pd.date_range('1/1/2000', periods=2,
                     freq='W-WED'),
                     columns=['Colorado', 'Texas', 'New York', 'Ohio'])

In [32]:
frame

            Colorado     Texas  New York      Ohio
2000-01-05 -0.272900 -0.375110  0.273864  1.024186
2000-01-12  0.295637  0.245288  1.357344  0.279202


In [33]:
df_daily = frame.resample('D').asfreq()

In [35]:
df_daily

            Colorado     Texas  New York      Ohio
2000-01-05 -0.272900 -0.375110  0.273864  1.024186
2000-01-06       NaN       NaN       NaN       NaN
2000-01-07       NaN       NaN       NaN       NaN
2000-01-08       NaN       NaN       NaN       NaN
2000-01-09       NaN       NaN       NaN       NaN
2000-01-10       NaN       NaN       NaN       NaN
2000-01-11       NaN       NaN       NaN       NaN
2000-01-12  0.295637  0.245288  1.357344  0.279202


In [42]:
frame.resample('D').ffill()

            Colorado     Texas  New York      Ohio
2000-01-05 -0.272900 -0.375110  0.273864  1.024186
2000-01-06 -0.272900 -0.375110  0.273864  1.024186
2000-01-07 -0.272900 -0.375110  0.273864  1.024186
2000-01-08 -0.272900 -0.375110  0.273864  1.024186
2000-01-09 -0.272900 -0.375110  0.273864  1.024186
2000-01-10 -0.272900 -0.375110  0.273864  1.024186
2000-01-11 -0.272900 -0.375110  0.273864  1.024186
2000-01-12  0.295637  0.245288  1.357344  0.279202


In [43]:
frame.resample('D').ffill(limit=2)

            Colorado     Texas  New York      Ohio
2000-01-05 -0.272900 -0.375110  0.273864  1.024186
2000-01-06 -0.272900 -0.375110  0.273864  1.024186
2000-01-07 -0.272900 -0.375110  0.273864  1.024186
2000-01-08       NaN       NaN       NaN       NaN
2000-01-09       NaN       NaN       NaN       NaN
2000-01-10       NaN       NaN       NaN       NaN
2000-01-11       NaN       NaN       NaN       NaN
2000-01-12  0.295637  0.245288  1.357344  0.279202


In [44]:
frame.resample('W-THU').ffill()

            Colorado     Texas  New York      Ohio
2000-01-06 -0.272900 -0.375110  0.273864  1.024186
2000-01-13  0.295637  0.245288  1.357344  0.279202
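Forward filling is one way to fill the gaps that upsampling creates; linear interpolation is another. The sketch below uses the resampler's interpolate method on a small frame whose values are made up for illustration (they are not the random values shown in these notes), so the interpolated result is easy to read.

```python
import pandas as pd

# Two weekly (Wednesday) observations with illustrative values
frame = pd.DataFrame({'Colorado': [0.0, 7.0], 'Texas': [10.0, 3.0]},
                     index=pd.date_range('2000-01-05', periods=2, freq='W-WED'))

# Upsample to daily frequency and fill the new rows by linear interpolation
daily = frame.resample('D').interpolate()

print(daily)
```

Colorado rises by one per day from 0 to 7 and Texas falls by one per day from 10 to 3, because the default method spaces the filled values evenly between the two known rows.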


---

## Resampling with Periods

In [45]:
frame = pd.DataFrame(np.random.randn(24, 4),
                     index=pd.period_range('1-2000', '12-2001',
                     freq='M'),
                     columns=['Colorado', 'Texas', 'New York', 'Ohio'])

In [47]:
frame[:5]

         Colorado     Texas  New York      Ohio
2000-01 -2.004890 -0.450965  0.006655  0.575437
2000-02 -0.952837  0.634810 -0.135817 -0.270080
2000-03  0.091294 -0.435700 -1.371228 -1.180390
2000-04  1.294715  0.041726 -0.684483 -0.335233
2000-05  0.437808  1.113076 -1.518807  1.501556


In [51]:
annual_frame = frame.resample('A-DEC').mean()

In [49]:
annual_frame

      Colorado     Texas  New York      Ohio
2000 -0.217688  0.240427 -0.419499 -0.126064
2001  0.303314  0.151659  0.177802 -0.124891


In [55]:
# Q-DEC: quarterly, year ending in December
annual_frame.resample('Q-DEC').ffill()

        Colorado     Texas  New York      Ohio
2000Q1 -0.217688  0.240427 -0.419499 -0.126064
2000Q2 -0.217688  0.240427 -0.419499 -0.126064
2000Q3 -0.217688  0.240427 -0.419499 -0.126064
2000Q4 -0.217688  0.240427 -0.419499 -0.126064
2001Q1  0.303314  0.151659  0.177802 -0.124891
2001Q2  0.303314  0.151659  0.177802 -0.124891
2001Q3  0.303314  0.151659  0.177802 -0.124891
2001Q4  0.303314  0.151659  0.177802 -0.124891


In [56]:
annual_frame.resample('Q-DEC', convention='end').ffill()

        Colorado     Texas  New York      Ohio
2000Q4 -0.217688  0.240427 -0.419499 -0.126064
2001Q1 -0.217688  0.240427 -0.419499 -0.126064
2001Q2 -0.217688  0.240427 -0.419499 -0.126064
2001Q3 -0.217688  0.240427 -0.419499 -0.126064
2001Q4  0.303314  0.151659  0.177802 -0.124891


Since periods refer to timespans, the rules about upsampling and downsampling are
more rigid:

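- In downsampling, the source frequency must be a subperiod of the target frequency
  (months nest within years, so monthly to annual works).
- In upsampling, the source frequency must be a superperiod of the target frequency
  (each year splits cleanly into quarters, so annual to quarterly works).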
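The nesting requirement behind these rules can be checked directly: a finer period nests within a coarser frequency only if its first and last day map to the same coarser period. The sketch below uses a hypothetical helper, `nests_within` (not a pandas function), built on `PeriodIndex.asfreq`.

```python
import pandas as pd

def nests_within(periods, coarser_freq):
    """Hypothetical helper: True where a period's start and end fall in the
    same period of the coarser frequency."""
    starts = periods.asfreq(coarser_freq, how='start')
    ends = periods.asfreq(coarser_freq, how='end')
    return starts == ends

# Every month lies inside exactly one calendar quarter
months = pd.period_range('2000-01', '2000-12', freq='M')
months_in_quarters = nests_within(months, 'Q-DEC')

# Weeks (Monday to Sunday) can straddle a month boundary
weeks = pd.period_range('2000-01-03', periods=5, freq='W-SUN')
weeks_in_months = nests_within(weeks, 'M')

print(months_in_quarters.all())
print(weeks_in_months)
```

The last week in the example runs from 2000-01-31 to 2000-02-06, so it does not belong to a single month; that is the kind of mismatch the period rules guard against.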

In [57]:
annual_frame.resample('Q-MAR').ffill()

        Colorado     Texas  New York      Ohio
2000Q4 -0.217688  0.240427 -0.419499 -0.126064
2001Q1 -0.217688  0.240427 -0.419499 -0.126064
2001Q2 -0.217688  0.240427 -0.419499 -0.126064
2001Q3 -0.217688  0.240427 -0.419499 -0.126064
2001Q4  0.303314  0.151659  0.177802 -0.124891
2002Q1  0.303314  0.151659  0.177802 -0.124891
2002Q2  0.303314  0.151659  0.177802 -0.124891
2002Q3  0.303314  0.151659  0.177802 -0.124891
