# Windowing Operations

Pandas supports windowing operations, which work similarly to `groupby()`:

1. Split the rows into partitions (windows); how the rows are split depends on the type of window.
2. Perform an aggregation over each window, returning one scalar per window.
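
The two steps can be sketched by hand and compared against `rolling()`. This is a minimal illustration with made-up data; the `windows` and `manual` lists exist only for comparison and are not how pandas computes the result internally.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])  # toy data, chosen only for illustration

# Step 1: split into windows holding the current row and up to 2 previous rows.
windows = [s.iloc[max(0, i - 2): i + 1] for i in range(len(s))]

# Step 2: aggregate each window into one scalar.
manual = [float(w.sum()) for w in windows]

# The same computation with the windowing API
# (min_periods=1 so that the incomplete first windows also return a value).
rolled = s.rolling(window=3, min_periods=1).sum()

print(manual)
print(rolled.tolist())
```

Both printed lists should be identical, one scalar per window.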

A windowing operation can be applied to a `Series` or a `DataFrame`. The available window types are:

1. Rolling window (most important)
2. Weighted window
3. Expanding window
4. Exponentially weighted window

There are also some properties that apply to all or only some types of windows.

**NOTE:** The essential topics to understand are the general/specific properties, the rolling window, and the expanding window.

In [100]:
import pandas as pd
import numpy as np

np.random.seed(0)

In [101]:
## Handy functions
from IPython.display import display_html, display, HTML

def display_side_by_side(*args):
    html_str=''
    for df in args:
        html_str+=df.to_html()
    display_html(html_str.replace('table','table style="display:inline"'),raw=True)

def display_several(*args):
    for df in args:
        display(df)

def display_windowed(windowed):
    table_title_html = '<div style="display:inline-block; vertical-align:top; width:15%; margin:1px;"><h5>window {0} (type: {1})</h5>{2}</div>'

    html_str=''
    for i, window in enumerate(windowed):
        if isinstance(window, pd.Series):
            window = window.to_frame()
            html_str+=table_title_html.format(i, "s",window.to_html())
        else:
            html_str+=table_title_html.format(i, "df",window.to_html())
        
    display_html(html_str,raw=True)


## General/Specific Properties

There are some functionalities that you can apply to all (general) or some (specific) types of windows.

**General Properties**
1. You can iterate over the windows with a simple `for` loop on the object returned after the split.
2. All windowing operations support a `min_periods` argument, which sets the minimum number of non-`NaN` values a window must contain to produce a result; otherwise the result is `NaN`. For `rolling()`:
    - default = 1 for time-based windows
    - default = window size for fixed windows
3. All windowing operations support the `aggregate()` (or `agg()`) method, which returns the result of multiple aggregations applied to each window.

**Specific Properties**

| Concept                        | Method                      | Supports time-based windows | Supports chained groupby | Supports table method     | Supports online operations |
|-------------------------------|-----------------------------|----------------------------|--------------------------|---------------------------|----------------------------|
| Rolling window                | rolling                     | Yes                        | Yes                      | Yes                       | No                         |
| Weighted window               | rolling                     | No                         | No                       | No                        | No                         |
| Expanding window              | expanding                   | No                         | Yes                      | Yes                       | No                         |
| Exponentially Weighted window | ewm                         | No                         | Yes                      | No                        | Yes                        |


1. **Time-based windows** are windows defined by an offset (e.g. `"2D"`, meaning 2 days) that splits the data according to a datetime-like index.
2. **Chained groupby** means running a window operation directly on a `GroupBy` object, i.e. a `groupby` operation followed by a window operation, e.g. `df.groupby('A').expanding().sum()`.
3. **Table method** means performing the window operation over the entire `DataFrame` at once (`method='table'`), which requires the Numba engine (`engine='numba'`). This can improve performance, but it is not relevant for learning purposes.
4. **Online operations** process data incrementally as it arrives, in a streaming fashion, instead of keeping the complete dataset in memory. This is useful for large or continuous data streams where storing the entire dataset is not feasible or necessary.
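
As a small sketch of the chained groupby pattern from item 2, the snippet below runs an expanding sum inside each group. The frame `demo` and its column names `key` and `val` are hypothetical and used only for this illustration.

```python
import pandas as pd

# Hypothetical data: a grouping column "key" and a numeric column "val".
demo = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})

# groupby followed directly by a window operation:
# the expanding sum restarts inside each group.
out = demo.groupby("key")["val"].expanding().sum()
print(out)
```

The result is indexed by the group key and the original row label, so each group's running total can be read off separately.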


**NOTE:** For learning purposes, it is only useful to understand `agg()`, `min_periods`, iteration over windows, chained `groupby`, and time-based windows (which are explained later in the `rolling()` section).

**NOTE:** Windowing operations currently only support numeric data (integer and float) and will always return `float64` values.
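
A quick way to check the `float64` behavior is to aggregate an integer `Series`. This is a minimal sketch with toy data; the variable names are arbitrary.

```python
import pandas as pd

ints = pd.Series([1, 2, 3, 4])          # integer input
result = ints.rolling(window=2).sum()   # windowed aggregation

print(ints.dtype)    # dtype of the input
print(result.dtype)  # dtype of the windowed result
```

The conversion to float is also what makes it possible to represent the `NaN` produced by incomplete windows.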

In [102]:
df = pd.DataFrame(
    { "A": range(6),
      "B" : [np.nan, 1, 2, np.nan, np.nan, 3]
     }, 
     index=pd.date_range('2020-01-01', periods=6, freq='1D')
     )
df

Unnamed: 0,A,B
2020-01-01,0,
2020-01-02,1,1.0
2020-01-03,2,2.0
2020-01-04,3,
2020-01-05,4,
2020-01-06,5,3.0


In [103]:
# GENERAL 1. Iterate over windows
for window in df.rolling(window = 3):
    display(window)

Unnamed: 0,A,B
2020-01-01,0,


Unnamed: 0,A,B
2020-01-01,0,
2020-01-02,1,1.0


Unnamed: 0,A,B
2020-01-01,0,
2020-01-02,1,1.0
2020-01-03,2,2.0


Unnamed: 0,A,B
2020-01-02,1,1.0
2020-01-03,2,2.0
2020-01-04,3,


Unnamed: 0,A,B
2020-01-03,2,2.0
2020-01-04,3,
2020-01-05,4,


Unnamed: 0,A,B
2020-01-04,3,
2020-01-05,4,
2020-01-06,5,3.0


In [104]:
# NOTE: display_windowed (defined above) is a helper that displays the windows inline
display_windowed(df.rolling(window = 3))

Unnamed: 0,A,B
2020-01-01,0,

Unnamed: 0,A,B
2020-01-01,0,
2020-01-02,1,1.0

Unnamed: 0,A,B
2020-01-01,0,
2020-01-02,1,1.0
2020-01-03,2,2.0

Unnamed: 0,A,B
2020-01-02,1,1.0
2020-01-03,2,2.0
2020-01-04,3,

Unnamed: 0,A,B
2020-01-03,2,2.0
2020-01-04,3,
2020-01-05,4,

Unnamed: 0,A,B
2020-01-04,3,
2020-01-05,4,
2020-01-06,5,3.0


In [None]:
# GENERAL 2. Using min_periods (FOCUS on column B)
# NOTE: every window shown above has at least 1 non-NaN value in column B,
# except the first one. So every window returns a result for B except the first.
df.rolling(window = 3 , min_periods= 1).sum()


Unnamed: 0,A,B
2020-01-01,0.0,
2020-01-02,1.0,1.0
2020-01-03,3.0,3.0
2020-01-04,6.0,3.0
2020-01-05,9.0,2.0
2020-01-06,12.0,3.0


In [None]:
# NOTE: in column B, windows 0, 1, 4 and 5 have fewer than 2 non-NaN values,
# so the result is NaN for those windows.
df.rolling(window = 3 , min_periods= 2).sum()


Unnamed: 0,A,B
2020-01-01,,
2020-01-02,1.0,
2020-01-03,3.0,3.0
2020-01-04,6.0,3.0
2020-01-05,9.0,
2020-01-06,12.0,


In [None]:
# NOTE: every window has fewer than 3 non-NaN values in column B, so all
# of them return NaN in the result.
df.rolling(window = 3 , min_periods= 3).sum()

Unnamed: 0,A,B
2020-01-01,,
2020-01-02,,
2020-01-03,3.0,
2020-01-04,6.0,
2020-01-05,9.0,
2020-01-06,12.0,


In [None]:
# NOTE: for fixed-size windows, the default min_periods is the window size
# (3 in this case), so the result matches the example above.
df.rolling(window = 3).sum()

Unnamed: 0,A,B
2020-01-01,,
2020-01-02,,
2020-01-03,3.0,
2020-01-04,6.0,
2020-01-05,9.0,
2020-01-06,12.0,


In [None]:
# NOTE: for time-based windows, the default min_periods is 1, so the result
# matches our first min_periods example.
df.rolling(window='3D').sum()


Unnamed: 0,A,B
2020-01-01,0.0,
2020-01-02,1.0,1.0
2020-01-03,3.0,3.0
2020-01-04,6.0,3.0
2020-01-05,9.0,2.0
2020-01-06,12.0,3.0


In [106]:
# GENERAL 3. Use aggregate()/agg() to perform multiple aggregations at once
# (string names are the idiomatic way to select built-in aggregations)
df.rolling(window = 3, min_periods=1).agg(["sum", "mean", "std"])

Unnamed: 0_level_0,A,A,A,B,B,B
Unnamed: 0_level_1,sum,mean,std,sum,mean,std
2020-01-01,0.0,0.0,,,,
2020-01-02,1.0,0.5,0.707107,1.0,1.0,
2020-01-03,3.0,1.0,1.0,3.0,1.5,0.707107
2020-01-04,6.0,2.0,1.0,3.0,1.5,0.707107
2020-01-05,9.0,3.0,1.0,2.0,2.0,
2020-01-06,12.0,4.0,1.0,3.0,3.0,


In [110]:
#SPECIFIC 2: chained group by
df["letter"] = ["C","D","D","D","C","D"]

result = df.groupby("letter").rolling(window=2).sum()
display_side_by_side(df, result)

Unnamed: 0,A,B,letter
2020-01-01,0,,C
2020-01-02,1,1.0,D
2020-01-03,2,2.0,D
2020-01-04,3,,D
2020-01-05,4,,C
2020-01-06,5,3.0,D

Unnamed: 0_level_0,Unnamed: 1_level_0,A,B
letter,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
C,2020-01-01,,
C,2020-01-05,4.0,
D,2020-01-02,,
D,2020-01-03,3.0,3.0
D,2020-01-04,5.0,
D,2020-01-06,8.0,


## Rolling Window

A rolling window, also known as a moving window, supports three types of windows through the function

`rolling(window, center, closed, step)`

1. fixed windows: `window = <integer>`
2. time-based windows based on an offset: `window = <time-based offset>`, which creates variable-size windows. This requires a monotonic datetime-like index.
3. custom windows: `window = <custom_indexer>` (optional; see the last section).

**NOTE:** By default, window `i` is built from the `i`-th row plus the preceding rows (if any) until the window is complete. This is explained in more detail in the `center` parameter section.

Although these three kinds of windows are well defined, some parameters can alter which rows fall into each window. The most important ones are:

1. `center` controls how a window is positioned relative to the `i`-th row.
2. `closed` controls the inclusion of the window endpoints.

**NOTE:** You can also use `step` to control the jump between windows, similar to slicing `[::step]`.
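
The slicing analogy can be checked directly. The sketch below assumes a pandas version that supports the `step` argument (1.5 or later, to my knowledge) and uses toy data.

```python
import pandas as pd

s = pd.Series(range(6))

stepped = s.rolling(window=3, step=2).sum()  # evaluate only every 2nd window
sliced = s.rolling(window=3).sum()[::2]      # evaluate all windows, then slice

print(stepped)
print(stepped.equals(sliced))
```

Both approaches keep the results for rows 0, 2 and 4; `step` simply avoids computing the windows that would be discarded.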

For aggregation, we can use built-in functions (such as `.mean()`) or user-defined functions. The important ones to understand are:

1. `.apply()` for user-defined functions (UDFs)
2. `.cov()` or `.corr()` for binary calculations.

**NOTE:** The examples below use the helper function `display_windowed` to display the windows.

In [73]:
times = ['2020-01-01', '2020-01-03', '2020-01-04', '2020-01-05', '2020-01-29']

df = pd.DataFrame(
    { "A": range(5),
      "B" : np.random.randint(10, size = 5)
     }, 
     index=pd.DatetimeIndex(times)
     )
df

Unnamed: 0,A,B
2020-01-01,0,5
2020-01-03,1,0
2020-01-04,2,3
2020-01-05,3,3
2020-01-29,4,7


In [74]:
# 1. fixed window using integer number
# NOTE: the first windows have fewer than 3 rows because of how the windows
# are built. This is explained later, in the section on the center parameter.
windowed = df.rolling(window=3)
display_windowed(windowed)

Unnamed: 0,A,B
2020-01-01,0,5

Unnamed: 0,A,B
2020-01-01,0,5
2020-01-03,1,0

Unnamed: 0,A,B
2020-01-01,0,5
2020-01-03,1,0
2020-01-04,2,3

Unnamed: 0,A,B
2020-01-03,1,0
2020-01-04,2,3
2020-01-05,3,3

Unnamed: 0,A,B
2020-01-04,2,3
2020-01-05,3,3
2020-01-29,4,7


In [75]:
# 2. time-based window using an offset
# NOTE: this requires a datetime-like index; the data is split into 3-day (3D)
# intervals, generating variable-size windows
windowed = df.rolling(window="3D")
display_windowed(windowed)

Unnamed: 0,A,B
2020-01-01,0,5

Unnamed: 0,A,B
2020-01-01,0,5
2020-01-03,1,0

Unnamed: 0,A,B
2020-01-03,1,0
2020-01-04,2,3

Unnamed: 0,A,B
2020-01-03,1,0
2020-01-04,2,3
2020-01-05,3,3

Unnamed: 0,A,B
2020-01-29,4,7


In [76]:
# step = 1 (default), 2 and 3

windowed = df.rolling(window=3, step=1) #default
display_windowed(windowed)

windowed = df.rolling(window=3, step=2)
display_windowed(windowed)

windowed = df.rolling(window=3, step=3)
display_windowed(windowed)

Unnamed: 0,A,B
2020-01-01,0,5

Unnamed: 0,A,B
2020-01-01,0,5
2020-01-03,1,0

Unnamed: 0,A,B
2020-01-01,0,5
2020-01-03,1,0
2020-01-04,2,3

Unnamed: 0,A,B
2020-01-03,1,0
2020-01-04,2,3
2020-01-05,3,3

Unnamed: 0,A,B
2020-01-04,2,3
2020-01-05,3,3
2020-01-29,4,7


Unnamed: 0,A,B
2020-01-01,0,5

Unnamed: 0,A,B
2020-01-01,0,5
2020-01-03,1,0
2020-01-04,2,3

Unnamed: 0,A,B
2020-01-04,2,3
2020-01-05,3,3
2020-01-29,4,7


Unnamed: 0,A,B
2020-01-01,0,5

Unnamed: 0,A,B
2020-01-03,1,0
2020-01-04,2,3
2020-01-05,3,3


### `center` parameter

The `center` parameter allows us to modify the way windows are created.

1. By default, window `i` ends at the `i`-th row and is filled with the preceding rows until the window is complete. Therefore, we say that the window is ***right-aligned***.
2. When `center = True`, the `i`-th row is placed in the center of the window, which is then completed with the preceding and subsequent rows (in that order of priority). Therefore, we say that the window is ***center-aligned***.

The following picture provides a graphical representation of this procedure:

<img src="./assets/imgs/center_parameter.jpg" width="400"/>

**NOTE:** With the default behavior (right alignment), the first windows contain fewer rows than the specified fixed size (see the code below), because windows are always filled with the preceding rows.

**NOTE**: There is an alternative method for creating an indexer that takes the `i`-th row and the subsequent rows until the window is complete (using `FixedForwardWindowIndexer`). However, this tutorial does not cover that approach.
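
To see the effect of `center` on an aggregation rather than on the windows themselves, the following sketch compares a right-aligned and a center-aligned rolling mean on toy data. The variable names are arbitrary.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4])

right = s.rolling(window=3).mean()                  # window ends at row i
centered = s.rolling(window=3, center=True).mean()  # window is centered on row i

print(pd.DataFrame({"right": right, "centered": centered}))

# For this odd, fixed window size the centered result is the right-aligned
# result moved up by (window - 1) // 2 = 1 row.
print(centered.equals(right.shift(-1)))
```

With the default `min_periods`, the `NaN` values move from the first two rows (right-aligned) to the first and last row (center-aligned).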


In [77]:
df = pd.DataFrame(
    {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
)
df

Unnamed: 0,A
2020-01-01,0
2020-01-02,1
2020-01-03,2
2020-01-04,3
2020-01-05,4


In [78]:
# 1. default right-aligned rolling
# NOTE: the i-th row is always at the bottom of its window
windowed = df.rolling(window=3)
display_windowed(windowed)

Unnamed: 0,A
2020-01-01,0

Unnamed: 0,A
2020-01-01,0
2020-01-02,1

Unnamed: 0,A
2020-01-01,0
2020-01-02,1
2020-01-03,2

Unnamed: 0,A
2020-01-02,1
2020-01-03,2
2020-01-04,3

Unnamed: 0,A
2020-01-03,2
2020-01-04,3
2020-01-05,4


In [79]:
# 2. center = True (center-aligned rolling)
# NOTE: the i-th row is in the center of its window, which is completed
# first with the preceding rows and then with the next ones (in that order of priority)
windowed = df.rolling(window=3, center=True)
display_windowed(windowed)

Unnamed: 0,A
2020-01-01,0
2020-01-02,1

Unnamed: 0,A
2020-01-01,0
2020-01-02,1
2020-01-03,2

Unnamed: 0,A
2020-01-02,1
2020-01-03,2
2020-01-04,3

Unnamed: 0,A
2020-01-03,2
2020-01-04,3
2020-01-05,4

Unnamed: 0,A
2020-01-04,3
2020-01-05,4


In [80]:
# NOTE: center also works with time-based windows
windowed = df.rolling(window="3D", center=False)
display_windowed(windowed)
windowed = df.rolling(window="3D", center=True)
display_windowed(windowed)

Unnamed: 0,A
2020-01-01,0

Unnamed: 0,A
2020-01-01,0
2020-01-02,1

Unnamed: 0,A
2020-01-01,0
2020-01-02,1
2020-01-03,2

Unnamed: 0,A
2020-01-02,1
2020-01-03,2
2020-01-04,3

Unnamed: 0,A
2020-01-03,2
2020-01-04,3
2020-01-05,4


Unnamed: 0,A
2020-01-01,0
2020-01-02,1

Unnamed: 0,A
2020-01-01,0
2020-01-02,1
2020-01-03,2

Unnamed: 0,A
2020-01-02,1
2020-01-03,2
2020-01-04,3

Unnamed: 0,A
2020-01-03,2
2020-01-04,3
2020-01-05,4

Unnamed: 0,A
2020-01-04,3
2020-01-05,4


### Window endpoints and `closed` parameter

The `closed` parameter allows us to include or exclude the endpoints of our windows.

- `closed='right'` includes the right endpoint but excludes the left one (default).
- `closed='left'` includes the left endpoint but excludes the right one.
- `closed='both'` includes both endpoints.
- `closed='neither'` excludes both endpoints.

The following picture shows the endpoints of a window (with fixed size 3, `window = 3`) and the effect of the `closed` parameter. Remember that `closed='right'` is the default behavior, so by default only the right endpoint is included.

<img src="./assets/imgs/window_endpoint.jpg" width="500"/>

**NOTE:** Although the window size is fixed, `both` and `neither` change the number of rows in each window regardless of the fixed size 3. For example, in the image above `both` returns a window of size 4 and `neither` a window of size 2.

**NOTE:** With time-based windows the behavior is the same, but remember that the window size is variable, so windows can be larger or smaller depending on the amount of data in each time interval.
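
The four `closed` options can also be compared through an aggregation. The sketch below sums a toy `Series` with each option; `min_periods=1` is my choice here so that incomplete windows still return a value.

```python
import pandas as pd

s = pd.Series(range(6))

# One column per `closed` option. With min_periods=1 an incomplete window
# still returns a value, while an empty window yields NaN.
sums = pd.DataFrame({
    closed: s.rolling(window=3, closed=closed, min_periods=1).sum()
    for closed in ["right", "left", "both", "neither"]
})
print(sums)
```

Reading the table row by row shows how the endpoints change the rows that contribute to each sum: for row 3, `right` adds rows 1 to 3, `both` adds rows 0 to 3, and `neither` adds only rows 1 and 2.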



In [81]:
df = pd.DataFrame({'A': range(6)})
df

Unnamed: 0,A
0,0
1,1
2,2
3,3
4,4
5,5


In [82]:
# 1. default closed='right'
# NOTE: the example in the picture focuses on window 3
rolling_window = df.rolling(window=3, closed='right')
display_windowed(rolling_window)

Unnamed: 0,A
0,0

Unnamed: 0,A
0,0
1,1

Unnamed: 0,A
0,0
1,1
2,2

Unnamed: 0,A
1,1
2,2
3,3

Unnamed: 0,A
2,2
3,3
4,4

Unnamed: 0,A
3,3
4,4
5,5


In [83]:
# 2. closed='left'
rolling_window = df.rolling(window=3, closed='left')
display_windowed(rolling_window)

Unnamed: 0,A

Unnamed: 0,A
0,0

Unnamed: 0,A
0,0
1,1

Unnamed: 0,A
0,0
1,1
2,2

Unnamed: 0,A
1,1
2,2
3,3

Unnamed: 0,A
2,2
3,3
4,4


In [84]:
# 3. closed='both'
rolling_window = df.rolling(window=3, closed='both')
display_windowed(rolling_window)

Unnamed: 0,A
0,0

Unnamed: 0,A
0,0
1,1

Unnamed: 0,A
0,0
1,1
2,2

Unnamed: 0,A
0,0
1,1
2,2
3,3

Unnamed: 0,A
1,1
2,2
3,3
4,4

Unnamed: 0,A
2,2
3,3
4,4
5,5


In [85]:
# 4. closed='neither'
rolling_window = df.rolling(window=3, closed='neither')
display_windowed(rolling_window)

Unnamed: 0,A

Unnamed: 0,A
0,0

Unnamed: 0,A
0,0
1,1

Unnamed: 0,A
1,1
2,2

Unnamed: 0,A
2,2
3,3

Unnamed: 0,A
3,3
4,4


### User defined function (UDF) with `apply()`

The `apply()` function takes a `func` argument and performs generic rolling computations. The `func` argument should be a **single function that produces a single value** from each column in the window. The `raw` argument specifies whether each window is passed to `func` as a `Series` (`raw=False`, the default) or as an ndarray (`raw=True`).
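
As a minimal sketch of the `raw` argument, the snippet below applies the same user-defined function with both settings. The helper `value_range` is a made-up example that only relies on `.max()` and `.min()`, which exist on both `Series` and ndarray objects.

```python
import pandas as pd

s = pd.Series([9, 3, 5, 2, 4])

def value_range(window):
    """Return max - min of one window (works for Series and ndarray)."""
    return window.max() - window.min()

as_series = s.rolling(window=3).apply(value_range, raw=False)  # func receives a Series
as_array = s.rolling(window=3).apply(value_range, raw=True)    # func receives an ndarray

print(as_array)
print(as_series.equals(as_array))
```

Both calls give the same values. `raw=True` is usually the cheaper option when the function does not need the index of the window.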

In [86]:
df = pd.DataFrame({
    "A": np.random.randint(10, size = 5),
    "B" : range(5)
})
df

Unnamed: 0,A,B
0,9,0
1,3,1
2,5,2
3,2,3
4,4,4


In [87]:
# mean
r1 = df.rolling(window=3).mean()

# weighted mean
weights = [0.3, 0.2, 0.5]
r2 = df.rolling(window=3).apply(lambda x: np.dot(x, weights))

# approximate mean with apply + equal weights
# (the weights sum to 0.99, so the result is slightly below the true mean)
weights = [0.33, 0.33, 0.33]
r3 = df.rolling(window=3).apply(lambda x: np.dot(x, weights))

display_side_by_side(r1, r2, r3)

Unnamed: 0,A,B
0,,
1,,
2,5.666667,1.0
3,3.333333,2.0
4,3.666667,3.0

Unnamed: 0,A,B
0,,
1,,
2,5.8,1.2
3,2.9,2.2
4,3.9,3.2

Unnamed: 0,A,B
0,,
1,,
2,5.61,0.99
3,3.3,1.98
4,3.63,2.97


### Binary window functions (`cov()` and `corr()`)

`cov()` and `corr()` can compute moving window statistics between two `Series`, or any combination of `DataFrame`/`Series` or `DataFrame`/`DataFrame`. Here is the behavior in each case:

1. two `Series`: compute the statistic for the pairing.

2. `DataFrame`/`Series`: compute the statistics for each column of the DataFrame with the passed Series, thus returning a DataFrame.

3. `DataFrame`/`DataFrame`: by default, compute the statistic for matching column names, returning a `DataFrame`. If the keyword argument `pairwise=True` is passed, the statistic is computed for each pair of columns, returning a `DataFrame` with a `MultiIndex`.

**NOTE:** ***Matching indexes***: the `Series` or `DataFrame` passed to `cov()` or `corr()` must share its index with the windowed object, because values are matched by index label (see the image below).

<img src="./assets/imgs/binary_window_func.jpg" width="400"/>

**NOTE:** The given `Series` or `DataFrame` does not need to have the same length as the windowed object.
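
The `pairwise=True` case can be sketched as follows. The frame `demo` is toy data, and `other` is omitted, which (as far as I know) makes pandas pair the columns of the frame with each other.

```python
import pandas as pd

# Toy data: column B is a perfectly decreasing mirror of column A.
demo = pd.DataFrame({"A": [0, 1, 2, 3, 4], "B": [4, 3, 2, 1, 0]})

# pairwise=True: one correlation matrix per window, stacked into a
# DataFrame indexed by (row label, column name).
pair = demo.rolling(window=3).corr(pairwise=True)

print(pair.index.nlevels)  # number of index levels in the result
print(pair.loc[2])         # correlation matrix of the window ending at row 2
```

Selecting a row label with `.loc` returns the full correlation matrix for the window that ends at that row.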

In [88]:
df = pd.DataFrame({
    "A" : range(5),
    "B" : np.random.randint(10, size=5),
    "C" : np.random.randint(10, size=5)
}
)

s = pd.Series(range(7))
df

Unnamed: 0,A,B,C
0,0,7,6
1,1,6,7
2,2,8,7
3,3,8,8
4,4,1,1


In [89]:
# 1. two Series corr
df["A"].rolling(window=3).corr(s)

0    NaN
1    NaN
2    1.0
3    1.0
4    1.0
5    NaN
6    NaN
dtype: float64

In [90]:
# 2. DataFrame - Series corr
# NOTE: the Series s has more rows than df, but it doesn't matter
# because the matching is done by index labels in both the windows and s
df.rolling(window=3).corr(s)

Unnamed: 0,A,B,C
0,,,
1,,,
2,1.0,0.5,0.866025
3,1.0,0.866025,0.866025
4,1.0,-0.866025,-0.792406


In [64]:
# 3. DataFrame - DataFrame corr
df.rolling(window=3).corr(df)

Unnamed: 0,A,B,C
0,,,
1,,,
2,1.0,1.0,1.0
3,1.0,1.0,1.0
4,1.0,1.0,1.0


In [70]:
# NOTE: if the index of the Series changes, no labels match and every result is NaN
s.index = np.arange(1,8)*10

df.rolling(window=3).corr(s)

Unnamed: 0,A,B,C
0,,,
1,,,
2,,,
3,,,
4,,,


## Weighted window (`win_type` argument)

A **weighted window** assigns different weights to the data points within the window based on some criteria or a predefined weight function (e.g. `weighted mean`). These weights influence the contribution of each data point to the final result. The weights can be based on factors like time, importance, or any other user-defined criteria.

Additionally, the `win_type` argument in `.rolling` generates weighted windows that are commonly used in filtering and spectral estimation (based on the SciPy window functions). You can choose different `win_type` values to apply weights in different ways.

This topic is out of the scope of this tutorial; it is only important to understand the concept.

In [91]:
df = pd.DataFrame({
    "A": np.random.randint(10, size = 5),
    "B" : range(5)
})
df

# weighted mean
weights = [0.3, 0.2, 0.5]
df.rolling(window=3).apply(lambda x: np.dot(x, weights))

Unnamed: 0,A,B
0,,
1,,
2,7.3,1.2
3,8.8,2.2
4,6.2,3.2


## Expanding Window

The `expanding()` function creates expanding windows. An expanding window yields the current `i`-th row and all the previous rows.

**NOTE:** `expanding()` is equivalent to `.rolling(window=len(df), min_periods=1)`.

**NOTE:** `expanding()` creates an `Expanding` object that supports several built-in aggregation functions.
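
The relation between `expanding()` and `rolling()` can be verified on a toy `Series`. This is a minimal sketch; it also shows that an expanding sum is a cumulative sum.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

expanding_sum = s.expanding().sum()
rolling_sum = s.rolling(window=len(s), min_periods=1).sum()

print(expanding_sum.equals(rolling_sum))               # same windows, same result
print(expanding_sum.tolist() == s.cumsum().tolist())   # expanding sum == cumulative sum
```

Without `min_periods=1`, the rolling version would return `NaN` until the window is full, which is why the argument matters for the equivalence.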

In [96]:
df = pd.DataFrame(range(5))
df


Unnamed: 0,0
0,0
1,1
2,2
3,3
4,4


In [97]:
windowed = df.expanding()
display_windowed(windowed)

windowed = df.rolling(window= len(df))
display_windowed(windowed)

Unnamed: 0,0
0,0

Unnamed: 0,0
0,0
1,1

Unnamed: 0,0
0,0
1,1
2,2

Unnamed: 0,0
0,0
1,1
2,2
3,3

Unnamed: 0,0
0,0
1,1
2,2
3,3
4,4


Unnamed: 0,0
0,0

Unnamed: 0,0
0,0
1,1

Unnamed: 0,0
0,0
1,1
2,2

Unnamed: 0,0
0,0
1,1
2,2
3,3

Unnamed: 0,0
0,0
1,1
2,2
3,3
4,4


In [98]:
df.expanding(min_periods=1).mean()

Unnamed: 0,0
0,0.0
1,0.5
2,1.0
3,1.5
4,2.0


## Exponentially Weighted Window

The function `.ewm()` creates an exponentially weighted window, which is similar to an expanding window but with *each prior point being exponentially weighted down relative to the current point*.


It supports two variants, controlled by the `adjust` argument. For example, for a weighted moving average

$$y_t=\frac{\sum_{i=0}^t w_i x_{t-i}}{\sum_{i=0}^t w_i}$$


1. `adjust=True`. The weights will be $w_i = (1-\alpha)^i$
2. `adjust=False`. The weights will be $w_i= \begin{cases}\alpha(1-\alpha)^i & \text { if } i<t \\ (1-\alpha)^i & \text { if } i=t\end{cases}$

**NOTE:** $\alpha$ is not explained here because the topic is out of the scope of this tutorial; it is only important to understand the concept. Just remember that you can specify one of `com` (center of mass), `span`, `halflife` or `alpha` to control how the weights decay in an exponentially weighted window.
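
The weighted-average formula and both weight definitions can be evaluated by hand and compared with `.ewm().mean()`. The helper `ewm_mean` below is a hypothetical, unoptimized function written only for this check, and the data and `alpha = 0.5` are arbitrary.

```python
import numpy as np
import pandas as pd

def ewm_mean(x, alpha, adjust):
    """Evaluate the weighted-average formula for every t (illustrative helper)."""
    out = []
    for t in range(len(x)):
        i = np.arange(t + 1)      # i = 0..t, where i = 0 is the current point
        w = (1 - alpha) ** i      # weights for adjust=True
        if not adjust:
            # weights for adjust=False: alpha * (1 - alpha)^i, except for i = t
            w = np.where(i < t, alpha * w, w)
        out.append(np.sum(w * x[t - i]) / np.sum(w))
    return np.array(out)

x = np.array([1.0, 2.0, 3.0])
alpha = 0.5
s = pd.Series(x)

for adjust in (True, False):
    manual = ewm_mean(x, alpha, adjust)
    builtin = s.ewm(alpha=alpha, adjust=adjust).mean().to_numpy()
    print(adjust, np.allclose(manual, builtin))
```

With `adjust=False` the weights sum to 1, so the formula reduces to the recursion $y_t = (1-\alpha)\,y_{t-1} + \alpha\,x_t$ with $y_0 = x_0$.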


## Additional Functionality

This section covers additional, optional functionality for windowing operations. I don't consider it important to review (for an interview) unless you have a specific use for it.

### Custom Indexer (Optional)

It is possible to use `window = <custom_indexer>`. To do that, we have to subclass `BaseIndexer` to create a custom indexer and define a custom method:

 `get_window_bounds(self, num_values, min_periods, center, closed, step)`

This method returns a tuple of two arrays: the first holds the starting indices of the windows and the second the ending indices.

**NOTE:** `num_values`, `min_periods`, `center`, `closed`, and `step` are passed to `get_window_bounds` automatically, so the method must always accept these arguments.

**NOTE:** pandas also ships ready-made indexers such as `VariableOffsetWindowIndexer` and `FixedForwardWindowIndexer`, but they are not addressed in this tutorial.

In [18]:
# Example: using a custom indexer as the window parameter

# The list `use_expanding = [True, False, True, False, True]` marks with True
# the windows that expand from the first row; the other windows are normal
# fixed-size rolling windows.

import numpy as np
from pandas.api.indexers import BaseIndexer

# indicates use expanding for window 0, 2, 4
use_expanding = [True, False, True, False, True]

# create the custom indexer
class CustomIndexer(BaseIndexer):

    def __init__(self, window_size, use_expanding):
        # let BaseIndexer set up its own attributes (including window_size)
        super().__init__(window_size=window_size)
        self.use_expanding = use_expanding
    
    def get_window_bounds(self, num_values, min_periods, center, closed, step):
        start = np.empty(num_values, dtype=np.int64)
        end = np.empty(num_values, dtype=np.int64)
        for i in range(num_values):
            if self.use_expanding[i]:
                start[i] = 0
                end[i] = i + 1
            else:
                start[i] = i
                end[i] = i + self.window_size
        return start, end
    
indexer = CustomIndexer(window_size=1, use_expanding=use_expanding)

# NOTE: windows 0, 2 and 4 are expanding windows and the others are
# ordinary rolling windows of fixed size 1
# (df is the 5-row, date-indexed DataFrame from the rolling window section)
windowed = df.rolling(indexer)
display_windowed(windowed)

Unnamed: 0,A,B
2020-01-01,0,5

Unnamed: 0,A,B
2020-01-03,1,0

Unnamed: 0,A,B
2020-01-01,0,5
2020-01-03,1,0
2020-01-04,2,3

Unnamed: 0,A,B
2020-01-05,3,3

Unnamed: 0,A,B
2020-01-01,0,5
2020-01-03,1,0
2020-01-04,2,3
2020-01-05,3,3
2020-01-29,4,7
