# Adding weights to rolling window features
Feature Engineering for Time Series Forecasting

In this notebook we demonstrate how to add weights to rolling window features using Pandas and sktime.

## Data set synopsis
We will work with the hourly electricity demand dataset. It records the electricity demand for the state of Victoria, Australia, from 2002 to the start of 2015.

For instructions on how to download, prepare, and store the dataset, refer to notebook number 4, in the folder "01-Create-Datasets" from this repo.

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

sns.set_context('talk')

from functools import partial

## Load data

In [2]:
data = pd.read_csv('../datasets/victoria_electricity_demand.csv',
                  usecols=['demand', 'temperature', 'date_time'],
                  index_col=['date_time'],
                  parse_dates=['date_time'])

In [3]:
# for this demo we'll use a subset of the data
data = data.loc['2010':]

In [4]:
data.head()

date_time,demand,temperature
2010-01-01 00:00:00,8314.448682,21.525
2010-01-01 01:00:00,8267.187296,22.4
2010-01-01 02:00:00,7394.528444,22.15
2010-01-01 03:00:00,6952.04752,21.8
2010-01-01 04:00:00,6867.199634,20.25


## Computing exponential weights for rolling window using Pandas


Exponential weights are more commonly used with expanding windows, but for completeness we will show how to apply them to rolling windows. The same approach also comes in handy when we want to compute custom metrics over a window.

In [5]:
# create a copy of the data
df = data.copy()

Let's create a function that computes exponential weights for an input window size and `alpha` parameter.

The weights should be: $[(1-\alpha)^{W-1}, \ldots, (1-\alpha)^2, (1-\alpha), 1]$, where $W$ is the window size. The last weight, 1, applies to the most recent observation in the window.
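For reference, here is a sketch of the weighted mean these weights lead to. The indexing is our own notation, not something defined by Pandas: $x_0, \ldots, x_{W-1}$ are the values in the window, ordered from oldest to most recent.

$$
\bar{x}_{w} = \frac{\sum_{i=0}^{W-1} (1-\alpha)^{W-1-i}\, x_i}{\sum_{i=0}^{W-1} (1-\alpha)^{W-1-i}}
$$

For example, with $\alpha = 0.05$ and $W = 3$ the weights are $[0.9025, 0.95, 1]$, which sum to $2.8525$.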

In [6]:
# compute exponential weights
def exp_weights(alpha, window_size):
    weights = np.ones(window_size)    # initial weights
    for ix in range(window_size):
        weights[ix] = (1-alpha)**(window_size-1-ix)
    return weights

In [7]:
# check it works
exp_weights(alpha=0.05, window_size=12)

array([0.56880009, 0.59873694, 0.63024941, 0.66342043, 0.6983373 ,
       0.73509189, 0.77378094, 0.81450625, 0.857375  , 0.9025    ,
       0.95      , 1.        ])

In [8]:
# double check the third-to-last weight, (1-alpha)**2
print((1-0.05)*0.95)

0.9025
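As an extra sanity check, the sketch below compares a weighted mean built from these weights against the last value returned by Pandas' own `ewm(alpha=..., adjust=True).mean()`, which uses the same weighting over all observations it has seen. The helper names `exp_weights_vec` and `weighted_mean_check` and the random test window are assumptions made for illustration; they are not part of the notebook's pipeline.

```python
import numpy as np
import pandas as pd


def exp_weights_vec(alpha, window_size):
    # hypothetical vectorised helper: same weights as exp_weights,
    # oldest observation first, most recent observation last
    return (1 - alpha) ** np.arange(window_size - 1, -1, -1)


def weighted_mean_check(x, alpha):
    # weighted mean of one window using the exponential weights
    weights = exp_weights_vec(alpha, window_size=len(x))
    return (weights * x).sum() / weights.sum()


# synthetic window of 12 values (illustrative data, not the demand series)
rng = np.random.default_rng(0)
window = pd.Series(rng.normal(size=12))

manual = weighted_mean_check(window, alpha=0.05)

# with adjust=True, the last ewm value weights the i-th most recent
# observation by (1 - alpha)**i and normalises by the sum of the weights
pandas_ewm = window.ewm(alpha=0.05, adjust=True).mean().iloc[-1]

print(np.isclose(manual, pandas_ewm))
```

The two values agree because a rolling window of size $W$ with these weights is the same calculation as an adjusted exponentially weighted mean that only sees the last $W$ observations.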


In [9]:
# define our own weighted mean function to pass to agg
def exp_weighted_mean(x, alpha):
    weights = exp_weights(alpha, window_size=len(x))
    return (weights * x).sum() / weights.sum()

exp_weighted_mean = partial(exp_weighted_mean, alpha=0.05)


In [None]:
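# rolling mean and exponentially weighted mean over a one-week window;
# shift forward one hour so each row only uses past values
# (pandas 2.2 and later prefer the alias '1h' over '1H')
result = (df['demand'].rolling(window=24*7)
                      .agg(['mean', exp_weighted_mean])
                      .shift(freq='1H'))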

result = result.add_prefix('demand_window_168_')

result
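The `.shift(freq=...)` step in the cell above moves every window statistic forward by one hour, so the feature at time `t` is computed only from values up to `t-1`. The toy example below is a minimal sketch of that behaviour on made-up data. The names `toy` and `y_window_3_mean` are illustrative, and `pd.Timedelta(hours=1)` is used in place of a frequency string only to stay independent of alias changes between Pandas versions.

```python
import numpy as np
import pandas as pd

one_hour = pd.Timedelta(hours=1)

# toy hourly series with values 0, 1, ..., 5 (illustrative data)
idx = pd.date_range('2010-01-01', periods=6, freq=one_hour)
toy = pd.DataFrame({'y': np.arange(6, dtype=float)}, index=idx)

# rolling mean over 3 hours, then move the timestamps forward by one hour
feat = (toy['y'].rolling(window=3)
                .mean()
                .shift(freq=one_hour)
                .rename('y_window_3_mean'))

# align the shifted feature with the original timestamps
toy = toy.join(feat, how='left')
print(toy)
```

At 03:00 the target `y` is 3, while the feature is the mean of the values at 00:00, 01:00 and 02:00. The first three rows have no feature value because a full past window is not yet available.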

Let's join this back to the original dataframe.

In [None]:
df = df.join(result, how='left')

df

In [None]:
cols = ['demand', 'demand_window_168_mean', 'demand_window_168_exp_weighted_mean']

ax = (df.iloc[-24*7*2:]           # look at the last 2 weeks of data
         .loc[:, cols]           
         .plot(figsize=(14,7)))

ax.set_title('Rolling window mean of electricity demand')
ax.set_ylabel('electricity demand')
ax.set_xlabel('Time')