## Use Vectorization instead of Pandas `rolling().apply()` when Possible
### A case study: how to calculate rolling z-scores and correlations

At the end of the last [notebook](https://github.com/coindataschool/pytips/blob/main/pandas/apply/05-pandas-rolling-apply.ipynb),
I mentioned that `rolling().apply()` has two major drawbacks, the first being
that it's slow. In this notebook, I calculate rolling z-scores and correlations
twice, once via `rolling().apply()` and once via vectorization, and compare
their speed. You will see that vectorization is much faster.

In [1]:
import numpy as np
import pandas as pd
import datetime as dt
from defillama2 import DefiLlama
from typing import Union

pd.set_option('display.max_columns', 15)
pd.set_option('display.max_rows', 50)
pd.options.display.float_format = '{:,.4f}'.format

In [2]:
def equal(
    a: Union[pd.DataFrame, pd.Series, np.ndarray], 
    b: Union[pd.DataFrame, pd.Series, np.ndarray],
    threshold=1e-8):
    """ 
    Check if the corresponding values of two data frames or series or numpy arrays are the same.
    """
    return (abs(a - b) > threshold).sum().sum() == 0 # 0 means same values

### Prep Data

In [3]:
dd = {'0xfc5a1a6eb076a2c7ad06ed22c90d7e710e35ad0a':'arbitrum', # GMX on arbitrum
      '0x912CE59144191C1204E64559FE8253a0e49E6548':'arbitrum', # ARB on arbitrum
      '0x82aF49447D8a07e3bd95BD0d56f35241523fBab1':'arbitrum', # WETH (wrapped ETH) on arbitrum
      }

obj = DefiLlama() # create a DefiLlama instance

# get historical daily close prices 
df = obj.get_daily_open_close(dd, start='2023-03-23', end='2023-07-27', kind='close')

# calc daily returns
daily_rets = df.pct_change().dropna()
daily_rets.head()

| date | ARB | GMX | WETH |
|---|---|---|---|
| 2023-03-24 | -0.0459 | -0.0659 | -0.0338 |
| 2023-03-25 | -0.038 | -0.0109 | -0.0042 |
| 2023-03-26 | 0.0407 | -0.0069 | 0.0129 |
| 2023-03-27 | -0.0859 | 0.0392 | -0.033 |
| 2023-03-28 | 0.0403 | 0.0815 | 0.0337 |


In [4]:
# user input 
ndays = 7
mvar = 'WETH'

### Calculate Rolling Z-Scores

In [5]:
# using rolling().apply()
def calc_zscore(xs):
    # xs.iloc[-1] is the current (most recent) point in the window. Using xs alone
    # would return a Series instead of a scalar, and apply() would raise an error.
    return (xs.iloc[-1] - xs.mean()) / xs.std(ddof=1)

%timeit df_zscores_v1 = daily_rets.rolling(ndays).apply(calc_zscore)

97.6 ms ± 4.93 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [6]:
# using rolling() + vectorization
%timeit df_zscores_v2 = (daily_rets - daily_rets.rolling(ndays).mean()) / daily_rets.rolling(ndays).std(ddof=1)

1.36 ms ± 68.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


The vectorized approach is roughly 70x faster than `rolling().apply()` (1.36 ms vs. 97.6 ms)!
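Speed only matters if both versions give the same numbers. Variables assigned inside `%timeit` are not kept in the namespace, so the sketch below recomputes both versions outside of it and compares them. It uses synthetic returns as a stand-in for the DefiLlama data (an assumption, so it runs offline), and the names `rets`, `z_v1`, and `z_v2` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns: an assumption standing in for the DefiLlama prices
rng = np.random.default_rng(0)
idx = pd.date_range("2023-03-24", periods=120, freq="D")
rets = pd.DataFrame(rng.normal(0, 0.03, size=(120, 3)),
                    index=idx, columns=["ARB", "GMX", "WETH"])
ndays = 7

def calc_zscore(xs):
    # xs is a Series holding one window; its last value is the current point
    return (xs.iloc[-1] - xs.mean()) / xs.std(ddof=1)

# Plain assignments (no %timeit) so the results can be compared afterwards
z_v1 = rets.rolling(ndays).apply(calc_zscore)
z_v2 = (rets - rets.rolling(ndays).mean()) / rets.rolling(ndays).std(ddof=1)

# The first ndays - 1 rows are NaN in both frames; compare the remaining rows
same = np.allclose(z_v1.dropna().to_numpy(), z_v2.dropna().to_numpy())
print(same)
```

`np.allclose` is used here rather than exact equality because the two code paths accumulate floating-point error differently.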

### Calculate Rolling Correlations

Let's now calculate the rolling correlations between

- the ARB return series and the ETH return series (the `WETH` column), and
- the GMX return series and the ETH return series.

In [7]:
%timeit pearson_v1 = daily_rets.rolling(ndays).apply(lambda ser: ser.corr(daily_rets[mvar]))

252 ms ± 26.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [8]:
%timeit pearson_v2 = daily_rets.rolling(ndays).corr(daily_rets[mvar])

2.61 ms ± 33.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Calling `.corr()` directly on the rolling object is about 100x faster than calling it inside `apply()`, because the former is vectorized!
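As a sanity check, the two correlation calls can be run outside `%timeit` and compared. This is a minimal sketch on synthetic returns (an assumption replacing the DefiLlama data); `rets`, `pearson_v1`, and `pearson_v2` are illustrative names.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns: an assumption standing in for the DefiLlama prices
rng = np.random.default_rng(1)
idx = pd.date_range("2023-03-24", periods=120, freq="D")
rets = pd.DataFrame(rng.normal(0, 0.03, size=(120, 3)),
                    index=idx, columns=["ARB", "GMX", "WETH"])
ndays = 7
mvar = "WETH"

# apply(): each window arrives as a Series; Series.corr() aligns it with
# rets[mvar] on the shared dates before computing the Pearson correlation
pearson_v1 = rets.rolling(ndays).apply(lambda ser: ser.corr(rets[mvar]))

# Vectorized: rolling correlation of every column against rets[mvar]
pearson_v2 = rets.rolling(ndays).corr(rets[mvar])

# Ignore the leading NaN rows and compare within floating-point tolerance
same = np.allclose(pearson_v1.dropna().to_numpy(),
                   pearson_v2.dropna().to_numpy())
print(same)
```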

### Summary

- Vectorization can be roughly 70x to 100x faster than `rolling().apply(func)`, 
  as measured above.
- Vectorized methods exist for rolling z-scores and correlations. They may 
  also exist for other rolling statistics beyond the simple ones. Search for 
  them before blindly using `rolling().apply()`.
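As one illustration of a statistic "beyond the simple ones", a rolling beta (covariance with a market series divided by the market's variance) can be assembled from the built-in `rolling().cov()` and `rolling().var()`. The choice of rolling beta is mine and is not covered in this notebook, so treat this as a sketch on synthetic returns with hypothetical names (`rets`, `mkt`, `beta_v1`, `beta_v2`).

```python
import numpy as np
import pandas as pd

# Synthetic daily returns: an assumption standing in for real price data
rng = np.random.default_rng(2)
idx = pd.date_range("2023-03-24", periods=120, freq="D")
rets = pd.DataFrame(rng.normal(0, 0.03, size=(120, 3)),
                    index=idx, columns=["ARB", "GMX", "WETH"])
ndays = 7
mkt = rets["WETH"]  # series treated as the "market" in this sketch

# Slow version: compute beta window by window inside apply()
beta_v1 = rets.rolling(ndays).apply(
    lambda ser: ser.cov(mkt) / mkt.loc[ser.index].var()
)

# Vectorized version: rolling covariance divided by rolling variance,
# aligned row by row on the date index
beta_v2 = rets.rolling(ndays).cov(mkt).div(mkt.rolling(ndays).var(), axis=0)

same = np.allclose(beta_v1.dropna().to_numpy(), beta_v2.dropna().to_numpy())
print(same)
```

Both versions use the sample (ddof=1) covariance and variance, which is the pandas default for these methods.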

In the next notebook, I'll provide solutions to the second drawback of 
`rolling().apply()`, namely that it can't take multiple columns as input. Make sure
you star and watch this [repo](https://github.com/coindataschool/pytips/tree/main/pandas/apply) 
to stay informed.

### Good Read

- [All my notebooks on pandas `apply()`](https://coindataschool.substack.com/p/pandas-apply)
- Here's a [dashboard](https://coindataschool-husdlci-main-pfjljd.streamlit.app/) that 
  visualizes 30-day rolling correlations between BTC price and the Hayes USD liquidity condition index. 