# Pandas shift()

This is a notebook for the Medium article [All the Pandas shift() you should know for data analysis](https://bindichen.medium.com/all-the-pandas-shift-you-should-know-for-data-analysis-791c1692b5e)

Please check out the article for instructions.

**License**: [BSD 2-Clause](https://opensource.org/licenses/BSD-2-Clause)

In [1]:
import pandas as pd 
import numpy as np

## 1. Shifting with `periods`

In [2]:
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [10, 20, 30, 40, 50]
})
df

,A,B
0,1,10
1,2,20
2,3,30
3,4,40
4,5,50


In [3]:
df.shift(1)

,A,B
0,,
1,1.0,10.0
2,2.0,20.0
3,3.0,30.0
4,4.0,40.0


In [4]:
df.shift(-1)

,A,B
0,2.0,20.0
1,3.0,30.0
2,4.0,40.0
3,5.0,50.0
4,,


In [5]:
# Fill the rows vacated by the shift with 0 instead of NaN
df.shift(1, fill_value=0)

,A,B
0,0,0
1,1,10
2,2,20
3,3,30
4,4,40
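
The outputs above suggest a side effect worth checking: without `fill_value` the shifted columns print as floats (`1.0`, `10.0`), while with `fill_value=0` they stay integers. A minimal sketch that inspects the dtypes directly, using the same example frame (the variable names `shifted_nan` and `shifted_zero` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [10, 20, 30, 40, 50]})

# Without fill_value, the vacated row holds NaN, so integer columns become float.
shifted_nan = df.shift(1)

# With an integer fill_value, no NaN is introduced and the columns stay integer.
shifted_zero = df.shift(1, fill_value=0)

print(shifted_nan.dtypes)
print(shifted_zero.dtypes)
```

If integer columns matter downstream, passing a `fill_value` of the same type is one way to avoid the float conversion.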


In [6]:
# Shifting horizontally
df.shift(1, axis=1)

,A,B
0,,1.0
1,,2.0
2,,3.0
3,,4.0
4,,5.0


In [7]:
# Shifting horizontally with a negative number
df.shift(-1, axis=1)

,A,B
0,10.0,
1,20.0,
2,30.0,
3,40.0,
4,50.0,
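
`shift()` is also available on a single column, which may be handy for placing previous and next values next to the original ones. A small sketch under that assumption; the column names `A_prev` and `A_next` are hypothetical and chosen only for this example:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [10, 20, 30, 40, 50]})

# A_prev and A_next are hypothetical column names used for illustration.
out = df.assign(
    A_prev=df["A"].shift(1),   # value from the previous row (NaN in the first row)
    A_next=df["A"].shift(-1),  # value from the next row (NaN in the last row)
)
print(out)
```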


## 2. Shifting time-series data with `freq`

In [8]:
df = pd.DataFrame({
        "A": [1, 2, 3, 4, 5],
        "B": [10, 20, 30, 40, 50]
    },  
    index=pd.date_range("2020-01-01", freq='D', periods=5)
)
df

,A,B
2020-01-01,1,10
2020-01-02,2,20
2020-01-03,3,30
2020-01-04,4,40
2020-01-05,5,50


In [9]:
# This is what happens when shift() is called with periods only: the data moves, the index does not
df.shift(10)

,A,B
2020-01-01,,
2020-01-02,,
2020-01-03,,
2020-01-04,,
2020-01-05,,


In [10]:
# With freq, the index is shifted and the data stays in place
df.shift(freq='10D')

,A,B
2020-01-11,1,10
2020-01-12,2,20
2020-01-13,3,30
2020-01-14,4,40
2020-01-15,5,50


In [11]:
# Alternatively, combine periods with freq (2 periods of 1 day each)
df.shift(periods=2, freq='D')

,A,B
2020-01-03,1,10
2020-01-04,2,20
2020-01-05,3,30
2020-01-06,4,40
2020-01-07,5,50
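
The two `freq` calls above both move the index and leave the values alone. One way to convince yourself of that is to compare against a manual index offset. This is only a sketch: the names `by_freq`, `by_periods`, and `manual` are illustrative, and adding a `pd.Timedelta` to the index is used here as a stand-in for what `shift(freq=...)` appears to do.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5], "B": [10, 20, 30, 40, 50]},
    index=pd.date_range("2020-01-01", freq="D", periods=5),
)

# Two spellings of "move the index forward by 10 days".
by_freq = df.shift(freq="10D")
by_periods = df.shift(periods=10, freq="D")

# Manual comparison: same data, index moved forward by 10 days.
manual = df.copy()
manual.index = df.index + pd.Timedelta(days=10)

print(by_freq.index.equals(manual.index))
print(by_freq.equals(by_periods))
```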


In [53]:
# With freq, passing axis=1 made no difference here: the result matches shift(freq='2D')
df.shift(freq='2D', axis=1)

,A,B
2020-01-03,1,10
2020-01-04,2,20
2020-01-05,3,30
2020-01-06,4,40
2020-01-07,5,50


## 3. A practical example: calculating the difference in consecutive rows

In [12]:
df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", freq='D', periods=5),
    "sales": [22, 30, 32, 25, 42]
})
df

,date,sales
0,2020-01-01,22
1,2020-01-02,30
2,2020-01-03,32
3,2020-01-04,25
4,2020-01-05,42


In [13]:
df.shift(1)

,date,sales
0,NaT,
1,2020-01-01,22.0
2,2020-01-02,30.0
3,2020-01-03,32.0
4,2020-01-04,25.0


In [14]:
df['diff'] = df['sales'] - df.shift(1)['sales']
df

,date,sales,diff
0,2020-01-01,22,
1,2020-01-02,30,8.0
2,2020-01-03,32,2.0
3,2020-01-04,25,-7.0
4,2020-01-05,42,17.0
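
For the consecutive-row case, pandas also provides `Series.diff()`, which should give the same result as subtracting the shifted column. A short sketch comparing the two on the same data (`manual` and `builtin` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2020-01-01", freq="D", periods=5),
    "sales": [22, 30, 32, 25, 42],
})

# Difference computed by hand with shift(), as in the cell above.
manual = df["sales"] - df["sales"].shift(1)

# Built-in first difference (periods=1 by default).
builtin = df["sales"].diff()

print(manual.equals(builtin))
```

The `shift()` form remains useful when the shifted values are needed on their own, or when the operation is something other than a subtraction.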


## 4. A practical example: calculating the 7-day difference for time-series data

In [15]:
df = pd.read_csv(
    'data/time_series.csv', 
    parse_dates=['date'], 
    index_col=['date'],
)

In [16]:
# Notice that the record for "2020-01-08" is missing in the DataFrame.
df

date,sales
2020-01-01,22
2020-01-02,30
2020-01-03,32
2020-01-04,25
2020-01-05,42
2020-01-06,20
2020-01-07,45
2020-01-09,43
2020-01-10,27


In [18]:
# After shifting the index by 7 days, there is a record for "2020-01-08"
df.shift(freq='7D')

date,sales
2020-01-08,22
2020-01-09,30
2020-01-10,32
2020-01-11,25
2020-01-12,42
2020-01-13,20
2020-01-14,45
2020-01-16,43
2020-01-17,27


In [19]:
the_7_days_diff = df['sales'] - df.shift(freq='7D')['sales']
the_7_days_diff.to_frame()

date,sales
2020-01-01,
2020-01-02,
2020-01-03,
2020-01-04,
2020-01-05,
2020-01-06,
2020-01-07,
2020-01-08,
2020-01-09,13.0
2020-01-10,-5.0
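
The row for 2020-01-08 appears in the result even though it is missing from the original data, because the subtraction aligns the two Series on the union of their indexes. The sketch below rebuilds the data inline (as a stand-in for `data/time_series.csv`, using the values shown above) and then uses `reindex` to keep only the original dates. The names `week_ago`, `diff_all`, and `diff_original_dates` are illustrative.

```python
import pandas as pd

# Inline stand-in for data/time_series.csv, using the values shown above.
dates = pd.DatetimeIndex(
    [
        "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05",
        "2020-01-06", "2020-01-07", "2020-01-09", "2020-01-10",
    ],
    name="date",
)
df = pd.DataFrame({"sales": [22, 30, 32, 25, 42, 20, 45, 43, 27]}, index=dates)

# Sales relabelled with the date 7 days later.
week_ago = df["sales"].shift(freq="7D")

# Subtraction aligns on the union of both indexes, so any date present in
# only one of the two Series comes out as NaN.
diff_all = df["sales"] - week_ago

# Keep only the dates that exist in the original data.
diff_original_dates = diff_all.reindex(df.index)
print(diff_original_dates)
```

Whether to keep the extra dates or drop them depends on the analysis; `reindex(df.index)` is just one option.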
