# Pandas resample

This is a notebook for the Medium article [Pandas resample() tricks you should know for manipulating time-series data](https://bindichen.medium.com/pandas-resample-tricks-you-should-know-for-manipulating-time-series-data-7e9643a7e7f3).

Please check out the article for instructions.

**License**: [BSD 2-Clause](https://opensource.org/licenses/BSD-2-Clause)

In [1]:
import pandas as pd 
import numpy as np

## 1. Downsampling and performing aggregation

In [2]:
df_sales = pd.read_csv(
    'data/sales_data.csv', 
    parse_dates=['date'], 
    index_col=['date']
)

df_sales

| date | num_sold |
|---|---|
| 2017-01-02 09:02:03 | 5 |
| 2017-01-02 09:14:13 | 7 |
| 2017-01-02 09:21:00 | 5 |
| 2017-01-02 09:28:57 | 9 |
| 2017-01-02 09:42:14 | 1 |
| ... | ... |
| 2017-01-02 22:46:36 | 5 |
| 2017-01-02 22:48:08 | 5 |
| 2017-01-02 22:52:19 | 2 |
| 2017-01-02 23:02:25 | 2 |
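`resample()` needs a datetime-like index, or a datetime column passed through `on=`, which is why the cell above uses `parse_dates` and `index_col`. A minimal sketch of both options, using a few hypothetical rows inlined with `io.StringIO` in place of `data/sales_data.csv`:

```python
import io

import pandas as pd

# Hypothetical rows standing in for data/sales_data.csv
csv_text = """date,num_sold
2017-01-02 09:02:03,5
2017-01-02 09:14:13,7
2017-01-02 10:05:00,3
"""

# Option 1: parse 'date' and make it the index -> DatetimeIndex
by_index = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'], index_col='date')
hourly_from_index = by_index.resample('1h').sum()

# Option 2: keep 'date' as a regular column and tell resample() to use it
by_column = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])
hourly_from_column = by_column.resample('1h', on='date').sum()

print(hourly_from_index)
print(hourly_from_column)
```

Option 2 is handy when the timestamp column should stay a column for other operations.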


In [3]:
## Downsample into 2-hour bins and sum each bin
df_sales.resample('2H').sum()

| date | num_sold |
|---|---|
| 2017-01-02 08:00:00 | 37 |
| 2017-01-02 10:00:00 | 66 |
| 2017-01-02 12:00:00 | 81 |
| 2017-01-02 14:00:00 | 50 |
| 2017-01-02 16:00:00 | 64 |
| 2017-01-02 18:00:00 | 66 |
| 2017-01-02 20:00:00 | 44 |
| 2017-01-02 22:00:00 | 45 |
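To see how the 2-hour bins are formed, here is a sketch on five hypothetical timestamps (not the article's data). By default, bins for an hourly rule start at midnight, are closed on the left, and are labelled by their left edge; `label='right'` switches to the right edge. The lowercase `'2h'` alias is used because newer pandas versions deprecate the uppercase `'H'`.

```python
import pandas as pd

# Hypothetical timestamps and counts (not the article's data)
idx = pd.to_datetime([
    '2017-01-02 09:02:03',
    '2017-01-02 09:14:13',
    '2017-01-02 10:21:00',
    '2017-01-02 11:28:57',
    '2017-01-02 12:42:14',
])
sales = pd.DataFrame({'num_sold': [5, 7, 5, 9, 1]}, index=idx)

# Bins [08:00, 10:00), [10:00, 12:00), [12:00, 14:00), labelled by the left edge
left_labelled = sales.resample('2h').sum()

# Same bins, labelled by the right edge instead
right_labelled = sales.resample('2h', label='right').sum()

print(left_labelled)
print(right_labelled)
```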


In [4]:
## Performing multiple aggregations
df_sales.resample('2H').agg(['min', 'max', 'sum'])

| date | num_sold (min) | num_sold (max) | num_sold (sum) |
|---|---|---|---|
| 2017-01-02 08:00:00 | 1 | 9 | 37 |
| 2017-01-02 10:00:00 | 1 | 9 | 66 |
| 2017-01-02 12:00:00 | 1 | 9 | 81 |
| 2017-01-02 14:00:00 | 1 | 9 | 50 |
| 2017-01-02 16:00:00 | 1 | 8 | 64 |
| 2017-01-02 18:00:00 | 1 | 9 | 66 |
| 2017-01-02 20:00:00 | 1 | 9 | 44 |
| 2017-01-02 22:00:00 | 2 | 6 | 45 |
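`agg()` also accepts a dict that maps each column to its own list of functions, which helps once a frame has several columns. The result has two-level columns, as in the output above; a sketch of flattening them into single names, on hypothetical half-hourly data:

```python
import pandas as pd

# Hypothetical half-hourly sales: 09:00, 09:30, ..., 11:30
idx = pd.date_range('2017-01-02 09:00', periods=6, freq='30min')
sales = pd.DataFrame({'num_sold': [5, 7, 1, 9, 2, 4]}, index=idx)

# Dict form: column name -> list of aggregation functions
stats = sales.resample('2h').agg({'num_sold': ['min', 'max', 'sum']})

# Flatten the two-level columns, e.g. ('num_sold', 'min') -> 'num_sold_min'
stats.columns = ['_'.join(col) for col in stats.columns]
print(stats)
```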


## 2. Downsampling with a custom base

In [5]:
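# `base` was deprecated in pandas 1.1 and removed in 2.0; for a '2H' rule, base=1 equals offset='1H'
df_sales.resample('2H', offset='1H').sum()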

| date | num_sold |
|---|---|
| 2017-01-02 09:00:00 | 62 |
| 2017-01-02 11:00:00 | 77 |
| 2017-01-02 13:00:00 | 64 |
| 2017-01-02 15:00:00 | 55 |
| 2017-01-02 17:00:00 | 72 |
| 2017-01-02 19:00:00 | 48 |
| 2017-01-02 21:00:00 | 70 |
| 2017-01-02 23:00:00 | 5 |
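Since `base` no longer exists in pandas 2.x, the same shift is expressed with `offset` (available since pandas 1.1). A sketch on three hypothetical timestamps, also showing `origin='start'`, which anchors the bins at the first timestamp instead of midnight:

```python
import pandas as pd

# Hypothetical timestamps
idx = pd.to_datetime([
    '2017-01-02 09:02:03',
    '2017-01-02 10:15:00',
    '2017-01-02 11:40:00',
])
sales = pd.DataFrame({'num_sold': [5, 7, 2]}, index=idx)

# Replacement for base=1 with a '2h' rule: shift the bin edges by 1 hour,
# giving bins [09:00, 11:00) and [11:00, 13:00)
shifted = sales.resample('2h', offset='1h').sum()

# origin='start' anchors the bins at the first timestamp (09:02:03) instead of midnight
from_first = sales.resample('2h', origin='start').sum()

print(shifted)
print(from_first)
```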


## 3. Upsampling and filling values

In [6]:
# Build a small dataset with an annual PeriodIndex (2012, 2013, 2014)
df = pd.DataFrame(
    {'value': [1, 2, 3]},
    index=pd.period_range(
        '2012-01-01',
        freq='A',
        periods=3
    )
)
df

|  | value |
|---|---|
| 2012 | 1 |
| 2013 | 2 |
| 2014 | 3 |


In [7]:
# df.resample('Q').asfreq() would leave the new quarters as NaN; ffill() repeats each year's value
df.resample('Q').ffill()

|  | value |
|---|---|
| 2012Q1 | 1 |
| 2012Q2 | 1 |
| 2012Q3 | 1 |
| 2012Q4 | 1 |
| 2013Q1 | 2 |
| 2013Q2 | 2 |
| 2013Q3 | 2 |
| 2013Q4 | 2 |
| 2014Q1 | 3 |
| 2014Q2 | 3 |


In [8]:
# bfill() fills each new quarter with the next known value
df.resample('Q').bfill()

|  | value |
|---|---|
| 2012Q1 | 1.0 |
| 2012Q2 | 2.0 |
| 2012Q3 | 2.0 |
| 2012Q4 | 2.0 |
| 2013Q1 | 2.0 |
| 2013Q2 | 3.0 |
| 2013Q3 | 3.0 |
| 2013Q4 | 3.0 |
| 2014Q1 | 3.0 |
| 2014Q2 | NaN |
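The same filling methods work on a `DatetimeIndex`. A sketch on a hypothetical daily series upsampled to 12-hour slots, comparing `asfreq()`, `ffill()`, `bfill()` and linear `interpolate()`, plus `ffill(limit=1)` to cap how many consecutive new slots get filled:

```python
import pandas as pd

# Hypothetical daily series
s = pd.Series([1.0, 3.0, 5.0], index=pd.date_range('2024-01-01', periods=3, freq='D'))

# Upsample to 12-hour slots: 2024-01-01 00:00, 12:00, ..., 2024-01-03 00:00
as_freq = s.resample('12h').asfreq()        # new slots stay NaN
forward = s.resample('12h').ffill()         # carry the previous value forward
backward = s.resample('12h').bfill()        # take the next value backward
linear = s.resample('12h').interpolate()    # straight line between known points

# limit=1: fill at most one consecutive new slot after each known value
capped = s.resample('6h').ffill(limit=1)

print(pd.DataFrame({'asfreq': as_freq, 'ffill': forward,
                    'bfill': backward, 'interpolate': linear}))
print(capped)
```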


## 4. A practical example: filling values

In [9]:
# load sales
df_sales = pd.read_csv('data/sales.csv', parse_dates=['date'], index_col=['date'])
df_sales

| date | num_sold |
|---|---|
| 2018-01-31 | 5 |
| 2018-02-28 | 17 |
| 2018-03-31 | 5 |
| 2018-04-30 | 16 |
| 2018-05-31 | 12 |
| 2018-06-30 | 12 |
| 2018-07-31 | 2 |
| 2018-08-31 | 9 |
| 2018-09-30 | 5 |
| 2018-10-31 | 15 |


In [10]:
# load price
df_price = pd.read_csv('data/price.csv', parse_dates=['date'], index_col=['date'])
df_price

| date | price |
|---|---|
| 2018-01-31 | 16.0 |
| 2018-05-31 | 15.5 |
| 2018-12-31 | 10.0 |


In [12]:
df_price = df_price.resample('M').ffill()
df_price

| date | price |
|---|---|
| 2018-01-31 | 16.0 |
| 2018-02-28 | 16.0 |
| 2018-03-31 | 16.0 |
| 2018-04-30 | 16.0 |
| 2018-05-31 | 15.5 |
| 2018-06-30 | 15.5 |
| 2018-07-31 | 15.5 |
| 2018-08-31 | 15.5 |
| 2018-09-30 | 15.5 |
| 2018-10-31 | 15.5 |


In [13]:
df = pd.concat([df_sales, df_price], axis=1)
df['total_sales'] = df['num_sold'] * df['price']
df

| date | num_sold | price | total_sales |
|---|---|---|---|
| 2018-01-31 | 5 | 16.0 | 80.0 |
| 2018-02-28 | 17 | 16.0 | 272.0 |
| 2018-03-31 | 5 | 16.0 | 80.0 |
| 2018-04-30 | 16 | 16.0 | 256.0 |
| 2018-05-31 | 12 | 15.5 | 186.0 |
| 2018-06-30 | 12 | 15.5 | 186.0 |
| 2018-07-31 | 2 | 15.5 | 31.0 |
| 2018-08-31 | 9 | 15.5 | 139.5 |
| 2018-09-30 | 5 | 15.5 | 77.5 |
| 2018-10-31 | 15 | 15.5 | 232.5 |
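If the sales and price frames do not cover the same months, it can be safer to align the forward-filled prices to the sales dates explicitly rather than rely on `concat` lining them up. A sketch with hypothetical values, passing `pd.offsets.MonthEnd()` instead of the `'M'` string (deprecated in favour of `'ME'` in newer pandas):

```python
import pandas as pd

# Hypothetical monthly sales, indexed by month-end dates
sales = pd.DataFrame(
    {'num_sold': [5, 17, 5, 16]},
    index=pd.to_datetime(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30']),
)

# Hypothetical price changes; the last one falls before the last sales month
price = pd.DataFrame(
    {'price': [16.0, 15.5]},
    index=pd.to_datetime(['2018-01-31', '2018-03-31']),
)

# Upsample prices to month ends (Jan..Mar) and forward-fill the gaps
monthly_price = price.resample(pd.offsets.MonthEnd()).ffill()

# Align to the sales dates; months past the last price row are forward-filled too
aligned_price = monthly_price.reindex(sales.index).ffill()

df = sales.join(aligned_price)
df['total_sales'] = df['num_sold'] * df['price']
print(df)
```

`join` keeps only the sales dates, so price rows beyond the last sales month do not introduce `NaN` rows in `num_sold`, as an outer `concat` would.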
