# Exponential Smoothing

> Forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older. In other words, the more recent the observation the higher the associated weight. 

\- [Forecasting: Principles and Practice](https://otexts.com/fpp2/expsmooth.html) by Rob J Hyndman and George Athanasopoulos



## Exponential smoothing

* Exponential smoothing is a simple but effective technique for smoothing time series.
* Exponential smoothing is a **weighted** average of past observations and is often a better alternative to a simple moving average, which weights every observation in its window equally.
* Exponential smoothing depends on a smoothing coefficient $\alpha$, with $0 < \alpha < 1$.
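
To make the role of $\alpha$ concrete: in simple exponential smoothing the observation $j$ steps in the past receives weight $\alpha (1 - \alpha)^j$. The short sketch below prints these weights for two illustrative values of $\alpha$ (0.2 and 0.8 are arbitrary choices for the example).

```python
import numpy as np

# Weight given to the observation j steps in the past: alpha * (1 - alpha)**j
lags = np.arange(6)

# Illustrative values of alpha, chosen only for this example
weights = {alpha: alpha * (1 - alpha) ** lags for alpha in (0.2, 0.8)}

for alpha, w in weights.items():
    print(f"alpha={alpha}: {np.round(w, 4)}")
```

A large $\alpha$ puts almost all of the weight on the latest observations, while a small $\alpha$ spreads it over a longer history and gives a smoother result.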

### Exercise

* Look up the definition of exponential smoothing.
* Implement exponential smoothing and a moving average for a time series and compare the two on one or more examples (with a plot).
* Bonus: look up and implement double exponential smoothing (a very basic form of forecasting).

### Questions

- How can the forecast horizon be extended?
- How is $l_0$ initialized: with the average of the series?
- What is this kind of weighting, $\alpha x + (1 - \alpha) y$, called?
- How is $\alpha_t$ updated?

# Simple Exponential Smoothing


The forecast at time $t+1$ is equal to a weighted average between the most recent observation $y_t$ and the previous forecast $\hat{y}_t$:

$\hat{y}_{t+1} = \alpha y_t + (1 - \alpha) \hat{y}_t$

## Component form

**Forecast equation**: $\hat{y}_{t+h} = l_t$

**Smoothing equation**: $l_t = \alpha y_t + (1 - \alpha) l_{t-1}$
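
Unrolling the smoothing equation gives $l_T = \sum_{j=0}^{T-1} \alpha (1 - \alpha)^j y_{T-j} + (1 - \alpha)^T l_0$, which is the "weighted average with exponentially decaying weights" from the quote at the top. The sketch below checks this numerically on a toy series; the helper `ses_level`, the data and the values of `alpha` and `l0` are made up for illustration.

```python
import numpy as np

def ses_level(y, alpha, l0):
    """Apply l_t = alpha * y_t + (1 - alpha) * l_{t-1} and return the final level."""
    level = l0
    for obs in y:
        level = alpha * obs + (1 - alpha) * level
    return level

# Toy data and parameters (assumptions for this example only)
y = np.array([3.0, 5.0, 4.0, 6.0])
alpha, l0 = 0.5, 4.0

recursive = ses_level(y, alpha, l0)

# Explicit weighted sum: newest observation first, weights alpha * (1 - alpha)**j
T = len(y)
weights = alpha * (1 - alpha) ** np.arange(T)
explicit = weights @ y[::-1] + (1 - alpha) ** T * l0

print(recursive, explicit)  # both forms give the same level

# The forecast equation is flat: every horizon h gets the last level
forecasts = np.full(3, recursive)
```

Because the forecast equation does not depend on $h$, extending the horizon with simple exponential smoothing just repeats the last level.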


In [1]:
import pandas as pd
import numpy as np
import altair as alt
%matplotlib inline

In [2]:
url='https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv'
full_data = pd.read_csv(url, sep=',')
full_data.drop(['Wind', 'Solar', 'Wind+Solar'], axis=1, inplace=True)
full_data['Date'] = pd.to_datetime(full_data.Date)
full_data.set_index('Date', inplace=True)
data = full_data.loc['2006'].copy()  # select the rows of 2006; row selection by year string needs .loc in recent pandas

In [3]:
data.head()

| Date | Consumption |
|---|---|
| 2006-01-01 | 1069.184 |
| 2006-01-02 | 1380.521 |
| 2006-01-03 | 1442.533 |
| 2006-01-04 | 1457.217 |
| 2006-01-05 | 1477.131 |


In [4]:
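def moving_average(series: np.ndarray, w_size: int):
    """Centered moving average over 2 * w_size + 1 points (NaN near the edges)."""
    series = np.asarray(series, dtype=float)  # also accepts a pandas Series
    o = np.full(series.shape, np.nan)
    for i in range(w_size, len(series) - w_size):
        o[i] = series[i-w_size:i+w_size+1].mean()
    return o

def ses(series: np.ndarray, alpha: float):
    """Simple exponential smoothing; returns levels and one-step-ahead forecasts."""
    assert 0 < alpha < 1
    series = np.asarray(series, dtype=float)  # also accepts a pandas Series
    l_prev = series.mean()  # l_0 is initialised with the series mean
    l = np.zeros_like(series)
    f = np.zeros_like(series)
    for i, y in enumerate(series):
        f[i] = l_prev  # forecast for y, made before y is observed
        l_prev = alpha * y + (1 - alpha) * l_prev
        l[i] = l_prev
    return l, f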
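
As a sanity check, pandas provides an exponentially weighted mean, `Series.ewm(alpha=..., adjust=False).mean()`, whose recursion has the same form as the smoothing equation. It starts from the first observation, so it should agree with a hand-written loop only when the level is initialised with the first value rather than with the series mean used in `ses`. A minimal cross-check on made-up numbers:

```python
import numpy as np
import pandas as pd

# Made-up series and alpha, for illustration only
y = pd.Series([10.0, 12.0, 11.0, 13.0, 12.5])
alpha = 0.8

# Hand-written recursion, with l_0 set to the first observation (an assumption)
level = np.empty(len(y))
l_prev = y.iloc[0]
for i, obs in enumerate(y):
    l_prev = alpha * obs + (1 - alpha) * l_prev
    level[i] = l_prev

# pandas equivalent: adjust=False selects the recursive form
ewm_level = y.ewm(alpha=alpha, adjust=False).mean().to_numpy()

print(np.allclose(level, ewm_level))
```

With a different initial level the two results differ at the start and converge as the influence of $l_0$ decays.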

In [5]:
data['MA'] = moving_average(data.Consumption, 7)
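
Note that `w_size` is the half-width of the window, so `w_size=7` averages over 15 days. The same centered average can be obtained with pandas' built-in `rolling`. The sketch below compares the two on a synthetic series; `centered_ma` is a stand-alone copy of the loop above, and the data is invented for the example.

```python
import numpy as np
import pandas as pd

def centered_ma(values, half_width):
    """Centered moving average over 2 * half_width + 1 points, NaN at the edges."""
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    for i in range(half_width, len(values) - half_width):
        out[i] = values[i - half_width:i + half_width + 1].mean()
    return out

# Synthetic data, for illustration only
s = pd.Series(np.arange(20, dtype=float) ** 2)
w = 3

loop_ma = centered_ma(s, w)
rolling_ma = s.rolling(window=2 * w + 1, center=True).mean().to_numpy()

print(np.allclose(loop_ma, rolling_ma, equal_nan=True))
```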

In [6]:
level, forecast = ses(data.Consumption, 0.8)
data['L'] = level
data['F'] = forecast
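
The value $\alpha = 0.8$ here is picked by hand. FPP instead estimates the smoothing parameter by minimising the sum of squared one-step-ahead errors. The following is only a rough sketch of that idea, using a plain grid search on synthetic data; the helper `ses_sse`, the grid of candidate values and the random-walk series are all assumptions made for this example.

```python
import numpy as np

def ses_sse(y, alpha, l0):
    """Sum of squared one-step-ahead errors of simple exponential smoothing."""
    level, sse = l0, 0.0
    for obs in y:
        sse += (obs - level) ** 2  # level is the forecast made before seeing obs
        level = alpha * obs + (1 - alpha) * level
    return sse

# Synthetic random walk, for illustration only
rng = np.random.default_rng(0)
y = 10 + np.cumsum(rng.normal(size=200))

# Evaluate a coarse grid of candidate alphas and keep the best one
alphas = np.linspace(0.05, 0.95, 19)
sse = [ses_sse(y, a, l0=y[0]) for a in alphas]
best_alpha = alphas[int(np.argmin(sse))]

print(f"best alpha on the grid: {best_alpha:.2f}")
```

A numerical optimiser could replace the grid, and the initial level could be estimated the same way.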

In [7]:
data

| Date | Consumption | MA | L | F |
|---|---|---|---|---|
| 2006-01-01 | 1069.184 | NaN | 1123.329959 | 1339.913797 |
| 2006-01-02 | 1380.521 | NaN | 1329.082792 | 1123.329959 |
| 2006-01-03 | 1442.533 | NaN | 1419.842958 | 1329.082792 |
| 2006-01-04 | 1457.217 | NaN | 1449.742192 | 1419.842958 |
| 2006-01-05 | 1477.131 | NaN | 1471.653238 | 1449.742192 |
| 2006-01-06 | 1403.427 | NaN | 1417.072248 | 1471.653238 |
| 2006-01-07 | 1300.287 | NaN | 1323.644050 | 1417.072248 |
| 2006-01-08 | 1207.985 | 1409.693133 | 1231.116810 | 1323.644050 |
| 2006-01-09 | 1529.323 | 1442.663400 | 1469.681762 | 1231.116810 |
| 2006-01-10 | 1576.911 | 1457.182200 | 1555.465152 | 1469.681762 |


In [8]:
full_data.loc['2007-01-01']

Consumption    1128.843
Name: 2007-01-01 00:00:00, dtype: float64

In [9]:
source = data.drop('L', axis=1).transpose().stack().reset_index().rename(columns={'level_0': 'type', 0: 'value'})

alt.Chart(source).mark_line().encode(
    x='Date',
    y='value',
    color='type',
    strokeDash='type',
).properties(
    width=800,
    height=300
).interactive()

# Double Exponential Smoothing

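Double exponential smoothing (Holt's linear trend method) extends simple exponential smoothing to allow forecasting of data with a trend. In addition to the level $l_t$ it tracks a trend (slope) $b_t$, controlled by a second smoothing coefficient $\beta$.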

**Forecast equation**: $\hat{y}_{t+h} = l_t + h b_t$

**Level equation**: $l_t = \alpha y_t + (1 - \alpha)(l_{t-1} + b_{t-1})$

**Trend equation**: $b_t = \beta (l_t - l_{t-1}) + (1 - \beta) b_{t-1}$
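
The three equations can be checked on a series where the right answer is known. The sketch below runs them on an exactly linear toy series and produces forecasts for several horizons. The function `holt_forecast` and its initialisation ($l_0 = y_1$, $b_0 = y_2 - y_1$) are assumptions made for this example and differ from the initialisation used in the `des` function of this notebook.

```python
import numpy as np

def holt_forecast(y, alpha, beta, horizon):
    """Run the level and trend equations over y, then forecast 1..horizon steps ahead."""
    y = np.asarray(y, dtype=float)
    # Assumed initialisation: first observation and first difference
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    # Forecast equation: l_t + h * b_t for h = 1, ..., horizon
    return level + np.arange(1, horizon + 1) * trend

# Exactly linear toy series y_t = 2 t + 1, so the next values are 21, 23, 25
y = 2.0 * np.arange(10) + 1.0
fc = holt_forecast(y, alpha=0.8, beta=0.1, horizon=3)
print(fc)
```

On a perfectly linear series the level and trend equations reproduce the line, so the forecasts continue it. Unlike simple exponential smoothing, the forecast now changes with the horizon $h$.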

In [10]:
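def des(series: np.ndarray, alpha: float, beta: float, h: int = 1):
    """Double exponential smoothing; returns levels, trends and h-step forecasts."""
    assert 0 < alpha < 1
    assert 0 < beta < 1
    series = np.asarray(series, dtype=float)  # also accepts a pandas Series; makes series[0] positional
    # Initialisation: l_0 is the series mean, b_0 is chosen so that l_0 + b_0 equals the first observation
    l_prev = series.mean()
    print(f'l_0 = {l_prev}')
    b_prev = series[0] - l_prev
    l = np.zeros_like(series)
    b = np.zeros_like(series)
    f = np.zeros_like(series)
    for i, y in enumerate(series):
        f[i] = l_prev + h * b_prev  # forecast made before y is observed (aligned with y when h = 1)
        l[i] = alpha * y + (1 - alpha) * (l_prev + b_prev)
        b[i] = beta * (l[i] - l_prev) + (1 - beta) * b_prev
        l_prev, b_prev = l[i], b[i]
    return l, b, f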

In [11]:
data = full_data[:'2006-12-31'].copy()
level, trend, forecast = des(data.Consumption, alpha=0.8, beta=0.01, h=1)
data['L'] = level
data['T'] = trend
data['F'] = forecast

l_0 = 1339.9137972602741


In [12]:
data

| Date | Consumption | L | T | F |
|---|---|---|---|---|
| 2006-01-01 | 1069.184 | 1069.184000 | -270.729797 | 1069.184000 |
| 2006-01-02 | 1380.521 | 1264.107641 | -266.073263 | 798.454203 |
| 2006-01-03 | 1442.533 | 1353.633276 | -262.517274 | 998.034378 |
| 2006-01-04 | 1457.217 | 1383.996800 | -259.588466 | 1091.116002 |
| 2006-01-05 | 1477.131 | 1406.586467 | -256.766685 | 1124.408334 |
| 2006-01-06 | 1403.427 | 1352.705556 | -254.737827 | 1149.819782 |
| 2006-01-07 | 1300.287 | 1259.823146 | -253.119273 | 1097.967730 |
| 2006-01-08 | 1207.985 | 1167.728775 | -251.509024 | 1006.703873 |
| 2006-01-09 | 1529.323 | 1406.702350 | -246.604198 | 916.219751 |
| 2006-01-10 | 1576.911 | 1493.548431 | -243.269695 | 1160.098153 |


In [13]:
source = data.drop('L', axis=1).transpose().stack().reset_index().rename(columns={'level_0': 'type', 0: 'value'})

alt.Chart(source).mark_line().encode(
    x='Date',
    y='value',
    color='type',
    strokeDash='type',
).properties(
    width=800,
    height=300
).interactive()

In [14]:
m_data = full_data.resample('MS').sum()
level, trend, forecast = des(m_data.Consumption, alpha=0.8, beta=0.1, h=1)
m_data['L'] = level
m_data['T'] = trend
m_data['F'] = forecast

l_0 = 40745.94574576389


In [15]:
m_data

| Date | Consumption | L | T | F |
|---|---|---|---|---|
| 2006-01-01 | 45304.70400 | 45304.704000 | 4558.758254 | 45304.704000 |
| 2006-02-01 | 41078.99300 | 42835.886851 | 3856.000714 | 49863.462254 |
| 2006-03-01 | 43978.12400 | 44520.876713 | 3638.899629 | 46691.887565 |
| 2006-04-01 | 38251.76700 | 40233.368868 | 2846.258881 | 48159.776342 |
| 2006-05-01 | 38858.14300 | 39702.439950 | 2508.540101 | 43079.627750 |
| 2006-06-01 | 37253.45000 | 38244.956010 | 2111.937697 | 42210.980051 |
| 2006-07-01 | 38852.18500 | 39153.126742 | 1991.561001 | 40356.893708 |
| 2006-08-01 | 38476.85200 | 39010.419148 | 1778.134141 | 41144.687742 |
| 2006-09-01 | 39335.09800 | 39625.789058 | 1661.857718 | 40788.553290 |
| 2006-10-01 | 41638.01900 | 41567.944555 | 1689.887496 | 41287.646776 |


In [16]:
source = m_data.drop('L', axis=1).transpose().stack().reset_index().rename(columns={'level_0': 'type', 0: 'value'})

alt.Chart(source).mark_line().encode(
    x='Date',
    y='value',
    color='type',
    strokeDash='type',
).properties(
    width=800,
    height=300
).interactive()