In [1]:
import pandas as pd

# Window Operations

In [2]:
# Sample DataFrame
df = pd.DataFrame({
    "Day": [1, 2, 3, 4, 5, 6],
    "Sales": [100, 120, 90, 150, 170, 160]
})

df

Unnamed: 0,Day,Sales
0,1,100
1,2,120
2,3,90
3,4,150
4,5,170
5,6,160


## 1. What is a Window?

A window is a subset of consecutive rows used to compute a statistic.  

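There are two types:  
1. Rolling - fixed size; the window slides down one row at a time
2. Expanding - growing; the window starts at the first row and gains one row at each step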
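
To make the two window types concrete, here is a small sketch that lists which values fall into each window for the `Sales` column, using plain Python slicing rather than pandas. The names `rolling_windows` and `expanding_windows` are illustrative, and the window size of 3 is an assumption chosen to match the examples that follow.

```python
sales = [100, 120, 90, 150, 170, 160]
size = 3  # assumed window size for this illustration

# Rolling: at most the last `size` values ending at row i (fewer at the start)
rolling_windows = [sales[max(0, i - size + 1): i + 1] for i in range(len(sales))]

# Expanding: every value from the first row up to row i
expanding_windows = [sales[: i + 1] for i in range(len(sales))]

for i, (r, e) in enumerate(zip(rolling_windows, expanding_windows)):
    print(i, r, e)
```

From row 2 onward the rolling window always holds 3 values, while the expanding window keeps growing.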

## 2. Rolling Window - `.rolling()`

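For each row, the window covers that row and the `window - 1` rows before it, and the chosen function is applied to those values. For the mean, the calculation is:  
(row_i + row_(i-1) + ... + row_(i-window+1)) / window  
The first `window - 1` rows do not have enough values to fill the window, so their result is `NaN` by default.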
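
The rolling mean can be checked by hand. The sketch below rebuilds the 3-row rolling mean with an explicit loop and compares it with what `.rolling(window=3).mean()` returns; the variable names `manual` and `window` are only for illustration.

```python
import pandas as pd

sales = pd.Series([100, 120, 90, 150, 170, 160], name="Sales")
window = 3

manual = []
for i in range(len(sales)):
    if i + 1 < window:
        # Not enough rows yet to fill the window
        manual.append(float("nan"))
    else:
        # Sum of the current row and the (window - 1) rows before it, divided by window
        manual.append(sales.iloc[i - window + 1 : i + 1].sum() / window)

manual = pd.Series(manual, name="Sales")

# Raises an AssertionError if the hand-computed values differ from pandas
pd.testing.assert_series_equal(manual, sales.rolling(window=window).mean())
print(manual)
```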

In [5]:
df["Sales_3day_avg"] = df["Sales"].rolling(window=3).mean()
df

Unnamed: 0,Day,Sales,Sales_3day_avg
0,1,100,
1,2,120,
2,3,90,103.333333
3,4,150,120.0
4,5,170,136.666667
5,6,160,160.0


`.rolling(3)` = `.rolling(window=3)`

In [7]:
df["Sales_3day_sum"] = df["Sales"].rolling(3).sum()
df

Unnamed: 0,Day,Sales,Sales_3day_avg,Sales_3day_sum
0,1,100,,
1,2,120,,
2,3,90,103.333333,310.0
3,4,150,120.0,360.0
4,5,170,136.666667,410.0
5,6,160,160.0,480.0


In [16]:
df["Sales"].rolling(window=3, min_periods=1).sum()

0    100.0
1    220.0
2    310.0
3    360.0
4    410.0
5    480.0
Name: Sales, dtype: float64

### Note:
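For window size 3, the first two rows stay `NaN` because they have fewer than 3 values. To avoid that, pass `min_periods=1`: a result is then produced as soon as the window holds at least one value. The first row uses only its own value, the second row combines the first and second values with the specified operation, and from the third row on the full window is used.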
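
A short sketch of how `min_periods` changes the number of leading `NaN` values, using the same `Sales` numbers. The value `min_periods=2` is an extra case added here for comparison and is not part of the cells above.

```python
import pandas as pd

sales = pd.Series([100, 120, 90, 150, 170, 160], name="Sales")

default = sales.rolling(window=3).sum()                 # needs 3 values per window
at_least_1 = sales.rolling(window=3, min_periods=1).sum()  # needs only 1 value
at_least_2 = sales.rolling(window=3, min_periods=2).sum()  # needs 2 values

comparison = pd.DataFrame({
    "default": default,
    "min_periods=1": at_least_1,
    "min_periods=2": at_least_2,
})
print(comparison)
```

Only the first rows differ between the three columns; once the window is full, `min_periods` has no effect.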

## 3. Common Rolling functions

`.mean()`  
`.sum()`  
`.max()`  
`.min()`  
`.std()`  
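
As a quick illustration, the functions listed above can be applied to the same rolling object and collected side by side. This is a sketch on the `Sales` values with an assumed window of 3; the column labels in `stats` are arbitrary.

```python
import pandas as pd

sales = pd.Series([100, 120, 90, 150, 170, 160], name="Sales")
r = sales.rolling(window=3)  # one rolling object, reused for every statistic

stats = pd.DataFrame({
    "mean": r.mean(),
    "sum": r.sum(),
    "max": r.max(),
    "min": r.min(),
    "std": r.std(),  # sample standard deviation of the values in the window
})
print(stats)
```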

## 4. Expanding Window - `.expanding()`

The window starts with only the first row and grows by one row at each step, so every result uses all rows from the start up to the current row.
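
Because an expanding window always reaches back to the first row, its sum is the running total and its mean is the running total divided by the number of rows seen so far. The sketch below checks this against `.cumsum()`; the use of NumPy for the row counts is just a convenience.

```python
import numpy as np
import pandas as pd

sales = pd.Series([100, 120, 90, 150, 170, 160], name="Sales")

exp_sum = sales.expanding().sum()
exp_mean = sales.expanding().mean()

# Running total, and running total divided by how many rows have been seen
cum_sum = sales.cumsum().astype("float64")
cum_mean = cum_sum / np.arange(1, len(sales) + 1)

pd.testing.assert_series_equal(exp_sum, cum_sum)
pd.testing.assert_series_equal(exp_mean, cum_mean)
print(pd.DataFrame({"expanding_sum": exp_sum, "expanding_mean": exp_mean}))
```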

In [19]:
df["Sales_cum_avg"] = df["Sales"].expanding().mean()
df

Unnamed: 0,Day,Sales,Sales_3day_avg,Sales_3day_sum,Sales_cum_avg
0,1,100,,,100.0
1,2,120,,,110.0
2,3,90,103.333333,310.0,103.333333
3,4,150,120.0,360.0,115.0
4,5,170,136.666667,410.0,126.0
5,6,160,160.0,480.0,131.666667


In [20]:
df["Sales_cum_sum"] = df["Sales"].expanding().sum()
df

Unnamed: 0,Day,Sales,Sales_3day_avg,Sales_3day_sum,Sales_cum_avg,Sales_cum_sum
0,1,100,,,100.0,100.0
1,2,120,,,110.0,220.0
2,3,90,103.333333,310.0,103.333333,310.0
3,4,150,120.0,360.0,115.0,460.0
4,5,170,136.666667,410.0,126.0,630.0
5,6,160,160.0,480.0,131.666667,790.0


## 5. Rolling vs Expanding (diff)

|Feature|Rolling|Expanding|
|-|-|-|
|Window Size|Fixed|Growing|
|Use case|Moving avg/sum/...|Cumulative metrics|
|Example|Last 7 days|From day 1 up to the current day|
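
One way to see how the two are related: an expanding window behaves like a rolling window that is as long as the whole series and is allowed to produce a result from the first value. The sketch below checks that on the `Sales` values; it is only an illustration of the relationship, not a suggestion to replace `.expanding()`.

```python
import pandas as pd

sales = pd.Series([100, 120, 90, 150, 170, 160], name="Sales")

expanding_mean = sales.expanding().mean()

# A rolling window covering the full length, allowed to start with 1 value
as_rolling = sales.rolling(window=len(sales), min_periods=1).mean()

pd.testing.assert_series_equal(expanding_mean, as_rolling)
print(as_rolling)
```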

## 6. Time-Series Rolling (IMP note)
If the DataFrame has a datetime index, the window can also be given as a time span (for example `"3D"` for 3 days) instead of a number of rows.

In [27]:
df_ts = df.copy()

In [29]:
df_ts["Date"] = pd.date_range("2023-01-01", periods=6)

df_ts = df_ts.set_index("Date")

In [39]:
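# "3D" is a time-based window: it covers the 3 days ending at the current row

# Lowercase "3d" gave the same result here; uppercase "3D" is the documented alias
df_ts["Sales"].rolling(window="3d").mean()
# df_ts["Sales"].rolling(window="3D").mean()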

Date
2023-01-01    100.000000
2023-01-02    110.000000
2023-01-03    103.333333
2023-01-04    120.000000
2023-01-05    136.666667
2023-01-06    160.000000
Name: Sales, dtype: float64

In [37]:
# Row-based window of 3; matches the "3D" result only because the dates are daily with no gaps
df_ts["Sales"].rolling(window=3, min_periods=0).mean()

Date
2023-01-01    100.000000
2023-01-02    110.000000
2023-01-03    103.333333
2023-01-04    120.000000
2023-01-05    136.666667
2023-01-06    160.000000
Name: Sales, dtype: float64
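
The time-based and row-based results above are identical only because the dates are evenly spaced. The sketch below uses a made-up series with a gap in the dates (the dates and values are hypothetical) to show where the two differ: a `"3D"` window looks back 3 days in time, while `window=3` looks back 3 rows. Time-based windows also produce a value from the first row on, without setting `min_periods`.

```python
import pandas as pd

# Hypothetical data: no rows for 2023-01-03 and 2023-01-04
idx = pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-05", "2023-01-06"])
sales = pd.Series([100, 120, 90, 150], index=idx, name="Sales")

by_time = sales.rolling(window="3D").mean()               # last 3 days
by_rows = sales.rolling(window=3, min_periods=1).mean()   # last 3 rows

print(pd.DataFrame({"by_time": by_time, "by_rows": by_rows}))
```

On 2023-01-05 the 3-day window contains only that day's value, because the previous row is 3 full days earlier and falls outside it, whereas the 3-row window still averages all three rows seen so far.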

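The window string is a number followed by a unit alias: 's' (older: 'S') - seconds, 'min' (older: 'T') - minutes, 'h' (older: 'H') - hours, 'D' - days. For example, a 3-day window can also be written in hours: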

In [50]:
# 3days = 72h 
df_ts["Sales"].rolling(window="72h").mean()

Date
2023-01-01    100.000000
2023-01-02    110.000000
2023-01-03    103.333333
2023-01-04    120.000000
2023-01-05    136.666667
2023-01-06    160.000000
Name: Sales, dtype: float64

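These strings are unit aliases for the `window=` parameter of `.rolling()`, written after a number, e.g. `window="3D"`. Newer pandas versions (2.2 and later) deprecate the uppercase 'S', 'T' and 'H' in favour of 's', 'min' and 'h'.

|String|Meaning|
|-|-|
|'s' (older: 'S')|Seconds|
|'min' (older: 'T')|Minutes|
|'h' (older: 'H')|Hours|
|'D'|Days|
|'W'|Weeks (anchored to a weekday, so `.rolling()` may reject it as a non-fixed frequency; `'7D'` is the safer spelling)|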
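
If the alias spelling is a concern, the window can also be passed as a `pd.Timedelta`, which avoids string aliases altogether. The sketch below first checks that the different spellings of 3 days describe the same time span, then compares a string window with a `Timedelta` window on the same daily data as above; the variable names are illustrative.

```python
import pandas as pd

# The same span written in different units is the same Timedelta
assert pd.Timedelta("3D") == pd.Timedelta("72h") == pd.Timedelta(days=3)

sales = pd.Series(
    [100, 120, 90, 150, 170, 160],
    index=pd.date_range("2023-01-01", periods=6),
    name="Sales",
)

by_string = sales.rolling(window="3D").mean()
by_timedelta = sales.rolling(window=pd.Timedelta(days=3)).mean()

pd.testing.assert_series_equal(by_string, by_timedelta)
print(by_timedelta)
```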

# Summary

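1. Rolling window - Fixed window `.rolling(window=3, min_periods=1)`  
    Parameters:
    1. `window=` - Size of the window, e.g. `3`.
    2. `min_periods=` - Minimum number of values a window needs to produce a result instead of `NaN`, e.g. `1`.
2. Expanding Window - Growing window `.expanding()`
3. Time-Series Rolling  
    When the index is a datetime index, we can use a time-based window such as `window="3D"`.  
    's'-Second, 'min'-Minute, 'h'-Hour, 'D'-Day (older pandas versions also accept 'S', 'T', 'H'). For a week, `'7D'` is the safer spelling.  
    Do not rely on the aliases being case-insensitive: `"3d"` worked in this notebook, but which spellings are accepted depends on the pandas version, so use the documented one.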