# TimeSeries - Reindexing and Resampling

We'll use our stocks data again.

In [3]:
import pandas as pd

df = pd.read_pickle("./dataset/stocks.pkl")
df

Name,date,open,high,low,close,volume
A,2013-02-08,45.07,45.35,45.00,45.08,1824755
A,2013-02-11,45.17,45.18,44.45,44.60,2915405
A,2013-02-12,44.81,44.95,44.50,44.62,2373731
A,2013-02-13,44.81,45.24,44.68,44.75,2052338
A,2013-02-14,44.72,44.78,44.36,44.58,3826245
...,...,...,...,...,...,...
ZTS,2018-02-01,76.84,78.27,76.69,77.82,2982259
ZTS,2018-02-02,77.53,78.12,76.73,76.78,2595187
ZTS,2018-02-05,76.64,76.92,73.18,73.83,2962031
ZTS,2018-02-06,72.74,74.56,72.13,73.27,4924323


Our current data contains business days only. What if we want to look up any calendar day in our date range and still get a value? For days without trading, we can fall back to the most recent trading day.

To keep this simple, let's look at AAL again.

In [4]:
aal = df.xs("AAL")
aal

date,open,high,low,close,volume
2013-02-08,15.07,15.12,14.63,14.75,8407500
2013-02-11,14.89,15.01,14.26,14.46,8882000
2013-02-12,14.45,14.51,14.10,14.27,8126000
2013-02-13,14.30,14.94,14.25,14.66,10259500
2013-02-14,14.94,14.96,13.16,13.99,31879900
...,...,...,...,...,...
2018-02-01,54.00,54.64,53.59,53.88,3623078
2018-02-02,53.49,53.99,52.03,52.10,5109361
2018-02-05,51.99,52.39,49.75,49.76,6878284
2018-02-06,49.32,51.50,48.79,51.18,6782480


In [5]:
start, end = aal.index.min(), aal.index.max()
print(start, end)

2013-02-08 00:00:00 2018-02-07 00:00:00


In [6]:
new_index = pd.date_range(start, end)
new_index

DatetimeIndex(['2013-02-08', '2013-02-09', '2013-02-10', '2013-02-11',
               '2013-02-12', '2013-02-13', '2013-02-14', '2013-02-15',
               '2013-02-16', '2013-02-17',
               ...
               '2018-01-29', '2018-01-30', '2018-01-31', '2018-02-01',
               '2018-02-02', '2018-02-03', '2018-02-04', '2018-02-05',
               '2018-02-06', '2018-02-07'],
              dtype='datetime64[ns]', length=1826, freq='D')
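
As a small illustration of what `date_range` adds here, the sketch below compares a calendar-day index with a business-day index over a short window. The five-day window and the variable names are chosen for the example only and do not come from the stocks data.

```python
import pandas as pd

# Hypothetical short window: Friday 2013-02-08 through Tuesday 2013-02-12.
start, end = pd.Timestamp("2013-02-08"), pd.Timestamp("2013-02-12")

calendar_days = pd.date_range(start, end)            # default freq="D": every day
business_days = pd.date_range(start, end, freq="B")  # Monday to Friday only

# Dates present in the calendar index but absent from the business-day index.
weekend = calendar_days.difference(business_days)

print(len(calendar_days), len(business_days))
print(weekend)
```

The dates in `weekend` are the rows that a reindex onto the calendar index has to fill.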

In [7]:
aal2 = aal.reindex(new_index, method="ffill")
aal2

,open,high,low,close,volume
2013-02-08,15.07,15.12,14.63,14.75,8407500
2013-02-09,15.07,15.12,14.63,14.75,8407500
2013-02-10,15.07,15.12,14.63,14.75,8407500
2013-02-11,14.89,15.01,14.26,14.46,8882000
2013-02-12,14.45,14.51,14.10,14.27,8126000
...,...,...,...,...,...
2018-02-03,53.49,53.99,52.03,52.10,5109361
2018-02-04,53.49,53.99,52.03,52.10,5109361
2018-02-05,51.99,52.39,49.75,49.76,6878284
2018-02-06,49.32,51.50,48.79,51.18,6782480
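
The same forward-fill step can be tried on a tiny, made-up series, which makes it easier to see where each filled value comes from. The prices below are invented for the sketch.

```python
import pandas as pd

# Hypothetical toy prices: Friday, Monday, Tuesday (no weekend rows).
prices = pd.Series(
    [10.0, 11.0, 12.0],
    index=pd.to_datetime(["2013-02-08", "2013-02-11", "2013-02-12"]),
)

# Calendar-day index spanning the same range.
calendar = pd.date_range(prices.index.min(), prices.index.max())

# Each new date takes the value of the most recent earlier date.
filled = prices.reindex(calendar, method="ffill")
print(filled)
```

Saturday and Sunday both pick up Friday's value, just as the weekend rows for AAL repeat the Friday row.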


Great, now every date has a row. But what if we wanted to do this for all securities? And let's only fill forward by at most two days, which covers a weekend but not a longer gap.

If we try to use the `level` argument of the `reindex` function, we cannot combine it with `method="ffill"`. Watch:
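
As a quick aside, here is a minimal sketch of how a two-day cap can be expressed with the `tolerance` argument of `reindex`. The series is made up and contains one weekend gap followed by a longer gap, so both cases are visible.

```python
import pandas as pd

# Hypothetical series: a weekend gap (08 -> 11), then a five-day gap (11 -> 16).
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2013-02-08", "2013-02-11", "2013-02-16"]),
)
days = pd.date_range(s.index.min(), s.index.max())

# Forward fill, but only when the source row is at most two days old.
capped = s.reindex(days, method="ffill", tolerance=pd.Timedelta("2 days"))
print(capped)
```

The weekend is filled, while dates more than two days past the last observation stay NaN.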

In [13]:
start, end = df.index.levels[1].min(), df.index.levels[1].max()
print(start, end)

2013-02-08 00:00:00 2018-02-07 00:00:00


In [9]:
date_range = pd.date_range(start, end)

In [10]:
# This doesn't work: no new dates are added
df.reindex(index=date_range, level=1)

Name,date,open,high,low,close,volume
A,2013-02-08,45.07,45.35,45.00,45.08,1824755
A,2013-02-11,45.17,45.18,44.45,44.60,2915405
A,2013-02-12,44.81,44.95,44.50,44.62,2373731
A,2013-02-13,44.81,45.24,44.68,44.75,2052338
A,2013-02-14,44.72,44.78,44.36,44.58,3826245
...,...,...,...,...,...,...
ZTS,2018-02-01,76.84,78.27,76.69,77.82,2982259
ZTS,2018-02-02,77.53,78.12,76.73,76.78,2595187
ZTS,2018-02-05,76.64,76.92,73.18,73.83,2962031
ZTS,2018-02-06,72.74,74.56,72.13,73.27,4924323


In [11]:
# This doesn't work either
# df.reindex(index=date_range, level=1, method="ffill")

# Reindexing on levels is hard
# https://github.com/pandas-dev/pandas/issues/12319

In [12]:
new_index = pd.MultiIndex.from_product([df.index.levels[0], date_range])
df.reindex(index=new_index)

Name,,open,high,low,close,volume
A,2013-02-08,45.07,45.35,45.00,45.08,1824755.0
A,2013-02-09,,,,,
A,2013-02-10,,,,,
A,2013-02-11,45.17,45.18,44.45,44.60,2915405.0
A,2013-02-12,44.81,44.95,44.50,44.62,2373731.0
...,...,...,...,...,...,...
ZTS,2018-02-03,,,,,
ZTS,2018-02-04,,,,,
ZTS,2018-02-05,76.64,76.92,73.18,73.83,2962031.0
ZTS,2018-02-06,72.74,74.56,72.13,73.27,4924323.0


In [13]:
new_index = pd.MultiIndex.from_product([df.index.levels[0], date_range])
df.reindex(index=new_index, method="ffill")

Name,,open,high,low,close,volume
A,2013-02-08,45.07,45.35,45.00,45.08,1824755
A,2013-02-09,45.07,45.35,45.00,45.08,1824755
A,2013-02-10,45.07,45.35,45.00,45.08,1824755
A,2013-02-11,45.17,45.18,44.45,44.60,2915405
A,2013-02-12,44.81,44.95,44.50,44.62,2373731
...,...,...,...,...,...,...
ZTS,2018-02-03,77.53,78.12,76.73,76.78,2595187
ZTS,2018-02-04,77.53,78.12,76.73,76.78,2595187
ZTS,2018-02-05,76.64,76.92,73.18,73.83,2962031
ZTS,2018-02-06,72.74,74.56,72.13,73.27,4924323


In [14]:
# Unstack Name into the columns so the index holds dates only;
# reindex can then fill each security's columns along the date axis.
filled = df.unstack("Name").reindex(date_range, method="ffill", tolerance=pd.Timedelta('2 days'))
# After reindexing, stack Name back into the index and restore the (Name, date) order.
filled = filled.stack("Name").swaplevel().sort_index()
filled

Name,,open,high,low,close,volume
A,2013-02-08,45.07,45.35,45.00,45.08,1824755.0
A,2013-02-09,45.07,45.35,45.00,45.08,1824755.0
A,2013-02-10,45.07,45.35,45.00,45.08,1824755.0
A,2013-02-11,45.17,45.18,44.45,44.60,2915405.0
A,2013-02-12,44.81,44.95,44.50,44.62,2373731.0
...,...,...,...,...,...,...
ZTS,2018-02-03,77.53,78.12,76.73,76.78,2595187.0
ZTS,2018-02-04,77.53,78.12,76.73,76.78,2595187.0
ZTS,2018-02-05,76.64,76.92,73.18,73.83,2962031.0
ZTS,2018-02-06,72.74,74.56,72.13,73.27,4924323.0
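
For comparison, here is a sketch of an alternative that reindexes each security separately and glues the pieces back together with `pd.concat`. It is not the approach used above, just another way to get a per-security fill. The toy frame and its values are invented for the example.

```python
import pandas as pd

# Hypothetical frame: "A" trades on 02-08 and 02-11, "B" only from 02-11.
dates = pd.to_datetime(["2013-02-08", "2013-02-11", "2013-02-11"])
toy = pd.DataFrame(
    {"close": [1.0, 2.0, 9.0]},
    index=pd.MultiIndex.from_arrays([["A", "A", "B"], dates], names=["Name", "date"]),
)
date_range = pd.date_range("2013-02-08", "2013-02-12")

# Reindex each security on its own date index, capped at two days.
parts = {
    name: group.droplevel("Name").reindex(
        date_range, method="ffill", tolerance=pd.Timedelta("2 days")
    )
    for name, group in toy.groupby(level="Name")
}

# The dict keys become the outer "Name" level again.
filled = pd.concat(parts, names=["Name", "date"])
print(filled)
```

Unlike the forward fill across the full MultiIndex, dates before a security's first row stay NaN here, because each security is filled only from its own history.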


So that was *considerably* more work than expected, but hopefully each step makes sense!

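The available values for `method` are:

* `ffill` (or `pad`): Fill forward from the last valid observation.
* `bfill` (or `backfill`): Fill backward from the next valid observation.
* `nearest`: Yup, pick the nearest valid observation.
* `None` (the default): Don't fill; just NaN 'em.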
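
The options can be compared side by side on a small, made-up series with a two-day gap. The values and column labels are chosen for this sketch only.

```python
import pandas as pd

# Hypothetical series with values on Friday and Monday only.
s = pd.Series([1.0, 5.0], index=pd.to_datetime(["2013-02-08", "2013-02-11"]))
days = pd.date_range("2013-02-08", "2013-02-11")

# One column per fill method, all reindexed onto the same calendar days.
comparison = pd.DataFrame(
    {
        "none": s.reindex(days),
        "ffill": s.reindex(days, method="ffill"),
        "bfill": s.reindex(days, method="bfill"),
        "nearest": s.reindex(days, method="nearest"),
    }
)
print(comparison)
```

Saturday is closer to Friday and Sunday is closer to Monday, so `nearest` gives a different answer on each weekend day.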

All of these fill methods will also fail on an index which isn't sorted, because pandas needs a monotonic index to know what "previous" and "next" mean. Always sort your indexes when you set them.
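
A minimal sketch of that failure and the fix, using a made-up series whose dates are deliberately out of order:

```python
import pandas as pd

# Hypothetical series with a shuffled (non-monotonic) date index.
shuffled = pd.Series(
    [2.0, 1.0, 3.0],
    index=pd.to_datetime(["2013-02-11", "2013-02-08", "2013-02-12"]),
)
days = pd.date_range("2013-02-08", "2013-02-12")

raised = False
try:
    shuffled.reindex(days, method="ffill")
except ValueError as err:
    # pandas refuses to fill on an index that is neither increasing nor decreasing.
    raised = True
    print("reindex failed:", err)

# Sorting the index first makes the same call work.
fixed = shuffled.sort_index().reindex(days, method="ffill")
print(fixed)
```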

### Recap
* `date_range`
* `reindex`