# Shifting (Leading and Lagging) Data

"Shifting" refers to moving data backward and forward through time. Both Series and DataFrame have a *shift* method for doing naive shifts forward or backward, leaving the index unmodified:

In [1]:
import pandas as pd
import numpy as np
from pandas import DataFrame, Series
from datetime import datetime
from pandas.tseries.offsets import Hour, Minute

In [7]:
ts = Series(np.arange(4, 8),
            index=pd.date_range('1/1/2000', periods=4, freq='M'))

In [10]:
ts, ts.shift(1), ts.shift(-2)

(2000-01-31    4
 2000-02-29    5
 2000-03-31    6
 2000-04-30    7
 Freq: M, dtype: int32,
 2000-01-31    NaN
 2000-02-29    4.0
 2000-03-31    5.0
 2000-04-30    6.0
 Freq: M, dtype: float64,
 2000-01-31    6.0
 2000-02-29    7.0
 2000-03-31    NaN
 2000-04-30    NaN
 Freq: M, dtype: float64)

A common use of *shift* is computing percent changes in a time series, or in multiple time series stored as DataFrame columns. This is expressed as:

In [11]:
ts / ts.shift(1) - 1

2000-01-31         NaN
2000-02-29    0.250000
2000-03-31    0.200000
2000-04-30    0.166667
Freq: M, dtype: float64
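
As a minimal sketch of the same idea applied to DataFrame columns, the block below computes percent changes by hand and compares them with the built-in `pct_change` method. The column names and prices are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for two made-up columns, indexed by day.
prices = pd.DataFrame(
    {"a": [10.0, 11.0, 12.1, 12.1], "b": [100.0, 90.0, 99.0, 110.0]},
    index=pd.date_range("2000-01-01", periods=4, freq="D"),
)

# Manual percent change: divide each row by the previous row, then subtract 1.
manual = prices / prices.shift(1) - 1

# Built-in equivalent provided by pandas.
builtin = prices.pct_change()

# The first row has no predecessor, so it is NaN in both results.
print(manual)
print(np.allclose(manual.iloc[1:], builtin.iloc[1:]))
```

The shift is applied column by column, so each series is compared only with its own previous value.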

Because naive shifts leave the index unmodified, some data is discarded. If the frequency is known, it can be passed to *shift* to advance the timestamps instead of only the data:

In [14]:
ts.shift(6, freq = 'M')

2000-07-31    4
2000-08-31    5
2000-09-30    6
2000-10-31    7
Freq: M, dtype: int32

Other frequencies can be passed, too, giving you a lot of flexibility in how to lead and lag the data. In the last example below, *T* stands for minutes:

In [15]:
ts.shift(3, freq = 'D'), ts.shift(1, freq = '3D')

(2000-02-03    4
 2000-03-03    5
 2000-04-03    6
 2000-05-03    7
 dtype: int32,
 2000-02-03    4
 2000-03-03    5
 2000-04-03    6
 2000-05-03    7
 dtype: int32)

In [19]:
ts.shift(1, freq='90T')

2000-01-31 01:30:00    4
2000-02-29 01:30:00    5
2000-03-31 01:30:00    6
2000-04-30 01:30:00    7
dtype: int32
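
To make the difference between the two kinds of shift concrete, here is a small sketch on made-up daily data. The names `naive` and `dated` are illustrative only.

```python
import pandas as pd

# Made-up daily series for illustration.
ts = pd.Series([4, 5, 6, 7],
               index=pd.date_range("2000-01-01", periods=4, freq="D"))

# Naive shift: values move down two rows, the index stays put,
# and the last two values fall off the end.
naive = ts.shift(2)

# Shift with a frequency: the index moves forward two days,
# and every value is kept.
dated = ts.shift(2, freq="D")

print(naive)
print(dated)
```

The naive result has missing values at the start and is converted to floating point, while the frequency-based result keeps all four original values under later timestamps.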

## Shifting Dates with Offsets

The pandas date offsets can also be used with *datetime* or *Timestamp* objects:

In [20]:
from pandas.tseries.offsets import Day, MonthEnd

In [25]:
now = datetime(2022,8,29)

In [26]:
now, now + 3 * Day()

(datetime.datetime(2022, 8, 29, 0, 0), Timestamp('2022-09-01 00:00:00'))

If you add an anchored offset like *MonthEnd*, the first increment will *roll forward* a date to the next date according to the frequency rule:

In [27]:
now + MonthEnd()

Timestamp('2022-08-31 00:00:00')

In [28]:
now + MonthEnd(2)

Timestamp('2022-09-30 00:00:00')

Anchored offsets can explicitly “roll” dates forward or backward using their *rollforward* and *rollback* methods, respectively:

In [30]:
offset = MonthEnd()

offset.rollforward(now), offset.rollback(now)

(Timestamp('2022-08-31 00:00:00'), Timestamp('2022-07-31 00:00:00'))
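
One detail worth checking is what happens when the date already falls on the anchor. The sketch below, using dates chosen for illustration, contrasts adding the offset with rolling: addition always moves to a later month end, while *rollforward* and *rollback* leave a date unchanged if it is already a month end.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

offset = MonthEnd()
mid_month = pd.Timestamp("2022-08-29")   # not a month end
month_end = pd.Timestamp("2022-08-31")   # already a month end

# Adding the offset: a mid-month date rolls to the end of its month,
# but a month-end date jumps to the end of the following month.
added_mid = mid_month + offset
added_end = month_end + offset

# Rolling: a date that is already on the anchor is returned as is.
rolled_fwd = offset.rollforward(month_end)
rolled_back = offset.rollback(month_end)

print(added_mid, added_end)
print(rolled_fwd, rolled_back)
```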

A clever use of date offsets is to use these methods with *groupby*:

In [34]:
ts = Series(np.arange(20, 40),
            index=pd.date_range('2020/1/1', periods=20, freq='4D'))

In [35]:
ts.groupby(offset.rollforward).mean()

2020-01-31    23.5
2020-02-29    31.0
2020-03-31    37.0
dtype: float64

Of course, an easier and faster way to do this is to use *resample* (much more on this later):

In [39]:
ts.resample('M').mean()

2020-01-31    23.5
2020-02-29    31.0
2020-03-31    37.0
Freq: M, dtype: float64
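
A quick way to confirm that grouping by *rollforward* and resampling to month end produce the same monthly means is to compute both and compare. This is a sketch under one assumption: the `MonthEnd` offset object is passed to `resample` in place of a frequency string, so the code does not depend on which month-end alias a given pandas version accepts.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Same data as above: 20 values spaced four days apart.
ts = pd.Series(np.arange(20, 40),
               index=pd.date_range("2020-01-01", periods=20, freq="4D"))

offset = MonthEnd()

# Group each timestamp under the month end it rolls forward to.
by_group = ts.groupby(offset.rollforward).mean()

# Resample to month-end bins using the offset object itself.
by_resample = ts.resample(offset).mean()

print(by_group)
print(np.allclose(by_group.values, by_resample.values))
```

The two results carry the same labels and values; the resampled one additionally records the frequency on its index.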