HSE, Applied Time Series Forecasitng , Fall 2023

<font color="green"> Lesson #2: Simple Exponential Smoothing Model </font>

<span style="color:black; font-size: 12pt"></span>

Alexey Romanenko,
<font color="blue">alexromsput@gmail.com</font>

**Key words:** simple exponential smoothing, adaptive exponential smoothing, retail time series

**Your feedback:**  please provide you feedback  <a href="https://forms.gle/EQgXEVQe9PPXUBzm6"> here </a>

In [2]:
import numpy as np
from datetime import datetime, timedelta
import pylab
import pandas as pd

import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
pd.options.plotting.backend = "plotly"

import math
import pandas.tseries.offsets as ofs
import warnings as w
from utils import qualityMAPE, SimpleExponentialSmoothing
from matplotlib import gridspec

from IPython.display import Image

# %matplotlib inline

# Typical TS for SES Model

## TS in Retail

In [5]:
ts = pd.read_csv('https://raw.githubusercontent.com/aromanenko/ATSF/main/data/retail_10ts.csv', parse_dates=['Dates'], dayfirst=True, index_col='Dates')
ts.index.names=['Timestamp']
ts = ts.sort_index() # sort index
ts.head()

Unnamed: 0_level_0,Item: 165,Item: 969,Item: 2653,Item: 2654,Item: 2692,Item: 2695,Item: 2697,Item: 2765,Item: 2767,Item: 2806,Item: 2808
Timestamp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2005-01-11,,2.0,4.0,,,,,,,,
2005-01-12,,5.0,8.0,,,,,,,,
2005-01-13,,2.0,20.0,,,,,,,,
2005-01-14,,42.0,14.0,,,,,,,,
2005-01-15,,,23.0,,,,,,,,


In [4]:
# Interval of ts
ts.loc['2007-01-01':'2007-01-05']

Unnamed: 0_level_0,Item: 165,Item: 969,Item: 2653,Item: 2654,Item: 2692,Item: 2695,Item: 2697,Item: 2765,Item: 2767,Item: 2806,Item: 2808
Timestamp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2007-01-01,,,,,,,,,,,
2007-01-02,,21.0,11.0,,7.0,2.0,,7.0,1.0,3.0,5.0
2007-01-03,,10.0,15.0,,21.0,7.0,,2.0,3.0,0.0,2.0
2007-01-04,,,4.0,,7.0,,,12.0,1.0,,1.0
2007-01-05,,3.0,36.0,,14.0,,,6.0,10.0,4.0,1.0


**Questions**
    - Which charachteristic of TS can you mention so far?
    - Which components of TS can you see?

In [5]:
# fig = plt.figure()
ts.loc['2005-07-01':'2007-12-31', ts.columns[range(3)]].plot().update_layout(height=350, width=1300).show()
# to save the pictures
# plt.savefig('../Lecture_TS_Forecasting/pic/TS_Example.eps', bbox_inches='tight', pad_inches=0, format='eps', dpi=1000)

**Questions**

 - What are key aspects of these retail ts?


In [31]:
# fig = plt.figure()
# gs = gridspec.GridSpec(3, 3)
# for i in range(3):
#     for j in range(3):
#         fig.add_subplot(gs[i,j])
#         ts.loc[:, ts.columns[i+j]].plot()

ts.loc['2005-07-01':'2007-12-31'].plot().update_layout(height=350, width=1300)

**Questions**

 - Any idea how to predict such ts?
 - Can you describe statistical model for such ts?

###### Answer



$$y_{t} = l_t + \color{red}{\varepsilon_t},$$

where $l_t$ $-$ changing slowly level of time series,

$\varepsilon_t~-$ error component (unobserved noise)

Forecasting model:

$$ {\hat y_{t+d}} = \color{\red}{\hat l_t} $$

where $\hat l_t~-$ an estimation of level


## Moving Average

**Rolling window n**
       $$\hat y_{t+d} = \frac{1}{n}\left(y_{t-n+1}+\dots+ {y}_t\right)$$
   
**All points in  $[t-n+1, t]$ has the same weight**
                                   $$w = \frac{1}{n}$$
**Other points has weight**
                                    $$w = 0$$

In [34]:
# Rolling moving (n = 2 и n=28)
item_name = u'Item:  165'
start_period = '2008-01-01'
end_period = '2008-01-31'

# Note: replace NaN before average calculation
ts.loc[start_period:end_period][[item_name]].merge(
      ts[[item_name]].fillna(method='pad').rolling(2).mean().loc[start_period:end_period].rename(columns = {item_name:'MA window=2'}),  # moving average with window = 2
      how='inner', left_index = True, right_index = True
      ).merge(
      ts[[item_name]].fillna(method='pad').rolling(28).mean().loc[start_period:end_period].rename(columns = {item_name:'MA window=28'}),  # moving average with window = 28
      how='inner', left_index = True, right_index = True
      ).plot().update_layout(height=350, width=1300)

**Expanding window**
       $$\hat y_{t+d} = \frac{1}{t}\left(y_{1}+\dots+ {y}_t\right)$$
   
**All time points in $[1, t]$ has the same weight**
                                   $$w = \frac{1}{t}$$   

In [36]:
# Expanding windod
ts.loc[start_period:end_period][[item_name]].merge(
      ts[[item_name]].fillna(method='pad').expanding().mean().loc[start_period:end_period].rename(columns = {item_name:'MA expanding window'}),  # moving average with window = 2
      how='inner', left_index = True, right_index = True
      ).plot().update_layout(height=350, width=1300)

**Questions**

 - What are analytical disadvantages of using movinag average algorithm?

  <!-- - <font color="green">Answers:
    - awkward implementation (we have to remember last window ts points)
    - it is not smooth (weights are not smooth within time) </font> -->

## Exponentially Wighted Moving Average

**Exponentialy diminishing weights**
    $$\hat y_{t+d}= \alpha y_t + \alpha \left(1-\alpha\right)y_{t-1} + \alpha \left(1-\alpha\right)^2y_{t-2}+\dots = \sum_{\tau=1}^t \alpha\cdot (1-\alpha)^{t-\tau}\cdot y_\tau$$
   
**Weight for time point in moment $\tau$**
    $$w_\tau = \alpha\cdot (1-\alpha)^{t-\tau}$$  


In [None]:
# look at timestamp weights
t = 10
alpha_set = [0.001, 0.1, 0.15, 0.2, 0.5, 0.9]
pd.DataFrame(data = [[a*(1-a)**(t-tau) for a in alpha_set] for tau in range(1,t+1,1)], columns = [r'\alpha=' + str(x) for x in alpha_set], index = range(1,t+1,1)).sort_index(ascending = False)

Unnamed: 0,\alpha=0.001,\alpha=0.1,\alpha=0.15,\alpha=0.2,\alpha=0.5,\alpha=0.9
10,0.001,0.1,0.15,0.2,0.5,0.9
9,0.000999,0.09,0.1275,0.16,0.25,0.09
8,0.000998,0.081,0.108375,0.128,0.125,0.009
7,0.000997,0.0729,0.092119,0.1024,0.0625,0.0009
6,0.000996,0.06561,0.078301,0.08192,0.03125,9e-05
5,0.000995,0.059049,0.066556,0.065536,0.015625,9e-06
4,0.000994,0.053144,0.056572,0.052429,0.007812,9e-07
3,0.000993,0.04783,0.048087,0.041943,0.003906,9e-08
2,0.000992,0.043047,0.040874,0.033554,0.001953,9e-09
1,0.000991,0.038742,0.034743,0.026844,0.000977,9e-10


HINT: Approximation of $window$ in MA and $\alpha$ in Exponential Smoothing (only for $alpha<0.1$)

$$window \approx \frac{1}{2\alpha}$$

In [43]:
# Averaging with exponential weights

item_name = u'Item:  165'
start_period = '2008-01-01'
end_period = '2008-01-31'

# Note: replace NaN before average calculation
ts.loc[start_period:end_period][[item_name]].merge(
      ts[[item_name]].fillna(method='pad').ewm(alpha=0.9).mean().loc[start_period:end_period].rename(columns = {item_name:'($EWMA(\\alpha=.9)$'}),  # ewma alpha =0.9
      how='inner', left_index = True, right_index = True
      ).merge(
      ts[[item_name]].fillna(method='pad').ewm(alpha=0.5).mean().loc[start_period:end_period].rename(columns = {item_name:'$EWMA(\\alpha=.5)$'}),  # ewma alpha =0.5
      how='inner', left_index = True, right_index = True
      ).merge(
      ts[[item_name]].fillna(method='pad').ewm(alpha=0.1).mean().loc[start_period:end_period].rename(columns = {item_name:'$EWMA(\\alpha=.1)$'}),  # ewma alpha =0.1
      how='inner', left_index = True, right_index = True
      ).merge(
      ts[[item_name]].fillna(method='pad').ewm(alpha=0.01).mean().loc[start_period:end_period].rename(columns = {item_name:'$EWMA(\\alpha=.01)$'}),  # ewma alpha =0.01
      how='inner', left_index = True, right_index = True
      ).plot().update_layout(height=350, width=1300)

**Parameter $\alpha$ drives the depth of the historical period to be considered!**

$\alpha \uparrow 1 \; \Rightarrow$ EWMA is closer to Moving Average with window = 1,

$\alpha \downarrow 0 \; \Rightarrow$ EWMA is closer to Moving Average with expanding window.

In [None]:
print

In [53]:
# Compare EWMA (alpha = 0.001) and Expanding
alpha_small = 0.001
ts.loc[start_period:end_period][[item_name]].merge(
      ts[[item_name]].fillna(method='pad').expanding().mean().loc[start_period:end_period].rename(columns = {item_name:'MA expanding window'}),  # MA with expanding window
      how='inner', left_index = True, right_index = True
      ).merge(
      ts[[item_name]].fillna(method='pad').ewm(alpha=alpha_small).mean().loc[start_period:end_period].rename(columns = {item_name:'$EWMA(\\alpha={0})$'.format(alpha_small)}),  # ewma with small alpha
      how='inner', left_index = True, right_index = True
      ).plot().update_layout(height=350, width=1300)

In [55]:
# Compare EWMA (alpha = 0.9) and MA (window = 1)
# ts.loc['2008-01-01':'2009-01-10'][u'Item:  165'].plot(label='Raw')

alpha_big = 0.9
ts.loc[start_period:end_period][[item_name]].merge(
      ts[[item_name]].fillna(method='pad').rolling(1).mean().loc[start_period:end_period].rename(columns = {item_name:'MA window=1'}),  # MA with expanding window
      how='inner', left_index = True, right_index = True
      ).merge(
      ts[[item_name]].fillna(method='pad').ewm(alpha=alpha_big).mean().loc[start_period:end_period].rename(columns = {item_name:'$EWMA(\\alpha={0})$'.format(alpha_big)}),  # ewma with small alpha
      how='inner', left_index = True, right_index = True
      ).plot().update_layout(height=350, width=1300)


# Simple Exponential Smoothing

## A bit of Theory

**Sum notation**:
$$\hat{y}_{t+1} = \sum_{\tau=1}^t \alpha\cdot (1-\alpha)^{t-\tau}\cdot y_\tau~~~~~(1)$$

**Recurrent formula notation:**
$$\hat{y}_{t+1} = ~\underbrace{\color{green}\alpha\cdot y_t+ (1-\color{green}\alpha)\cdot \hat{y}_{t} }_{\text{canonical formula}}~= ~\underbrace{\hat y_t + \color{green}\alpha \cdot \color{red}{e_t}}_{\text{error correction formula}}$$
where ${\color{red}{e_t} = y_t - \hat y_t }$


**Question:**
 * What is the problem with this formla (1)?
 * Hint: calculate sum of weights of time series points.


  <!-- - <font color = 'green'>Answer
    - sum or weights is less than 1</font> -->

## SES Realization

In [56]:
# Example of realization

# Simple Exponential Smoothing
# x <array Tx1>- time series,
# h <scalar> - forecasting delay
# Params <dict> - dictionary with
#    alpha <scalar in [0,1]> - smoothing parameter

def SimpleExponentialSmoothing(x, h=1, Params={}):
    T = len(x)
    alpha = Params['alpha']
    FORECAST = [np.NaN]*(T+h)
    if alpha>1:
        w.warn('Alpha can not be more than 1')
        #alpha = 1
        return FORECAST
    if alpha<0:
        w.warn('Alpha can not be less than 0')
        #alpha = 0
        return FORECAST
    # initialization
    y = x[0]
    for cntr in range(T):
        if not math.isnan(x[cntr]):
            if math.isnan(y):
                y=x[cntr]
            y = alpha*x[cntr] + (1-alpha)*y  # = y + alpha*(x[cntr]-y)
            #else do not nothing
        FORECAST[cntr+h] = y
    return FORECAST

In [57]:
# Forecast delay = 1
h = 1
start = ts.index[-1]+timedelta(1)
end = ts.index[-1]+timedelta(h)
rng = pd.date_range(start, end)
frc_ts = pd.DataFrame(index = ts.index.append(rng), columns = ts.columns)

In [58]:
# ES params
Params ={'alpha':0.1}

# generate forecasts for each Item
for cntr in ts.columns:
    frc_ts[cntr] = SimpleExponentialSmoothing(ts[cntr], h, Params)

In [None]:
# fig = plt.figure(figsize=(25,7))
# ax1=fig.add_subplot(111)
# qlt_array=['SSE', 'MSE', 'RMSE', 'MedianAE', 'MAPE', 'MACAPE']
# for i in range(6):
#     # Quality = [np.NaN]*len(ts.columns)
#     plt.subplot(2,3,i+1)
#     Quality, _ = eval('quality'+qlt_array[i])(ts, frc_ts)
#     Quality.plot(kind='bar')
#     plt.title(qlt_array[i], y=0.9)

In [70]:
frc_ts.index[:]

DatetimeIndex(['2005-01-11', '2005-01-12', '2005-01-13', '2005-01-14',
               '2005-01-15', '2005-01-16', '2005-01-17', '2005-01-18',
               '2005-01-19', '2005-01-20',
               ...
               '2009-02-22', '2009-02-23', '2009-02-24', '2009-02-25',
               '2009-02-26', '2009-02-27', '2009-02-28', '2009-03-01',
               '2009-03-02', '2009-03-03'],
              dtype='datetime64[ns]', length=1513, freq=None)

In [109]:
# show SES forecast alpha = 0.1
def plot_ts_forecast(ts, frc_ts, ts_num=0, alg_title=''):
    frc_ts.columns = ts.columns+'; '+alg_title
    ts[[ts.columns[ts_num]]].merge(frc_ts[[frc_ts.columns[ts_num]]], how = 'outer', left_index = True, right_index = True)\
      .plot().update_layout(height=350, width=1300,
                  xaxis_title="time ticks",
                  yaxis_title="ts and forecast values").show()

    # fig = go.Figure()
    # fig.add_trace(go.Scatter(x=frc_ts.index, y=frc_ts[[frc_ts.columns[ts_num]]], mode='lines+markers',  name=alg_title))
    # fig.add_trace(go.Scatter(x=frc_ts.index, y=ts[[ts.columns[ts_num]]], mode='lines+markers', name=ts.columns[ts_num]))
    # fig.update_layout(legend_orientation="h",
    #               legend=dict(x=.5, xanchor="center"),
    #               hovermode="x",
    #               margin=dict(l=0, r=0, t=0, b=0))
    # fig.update_layout(title="Plot Title",
    #               xaxis_title="time ticks",
    #               yaxis_title="ts and forecast values")
    # fig.show()
    # ax = frc_ts[frc_ts.columns[ts_num]].plot().update_layout(height=350, width=1300, style='r-^', linewidth=1.0)
    # plt.xlabel("Time ticks")
    # plt.ylabel("TS values")
    # plt.legend()
    return

plot_ts_forecast(ts.loc['2009-01-01':'2009-03-03'], frc_ts.loc['2009-01-01':'2009-03-03'], ts_num=0, alg_title='ES alpha=0.1')
# ts_num = 0
# ts[[ts.columns[ts_num]]].merge(frc_ts[[frc_ts.columns[ts_num]]], how = 'outer', left_index = True, right_index = True)\
#       .plot().update_layout(height=350, width=1300)


**Question:**
 * Why the forecasts values for 27th, 28th, 29th and 30th of January are the same?
 * When does the last change of the forecast value occur?  

In [110]:
# Generate forecast for h = 30
h = 30
start = ts.index[-1]+timedelta(1)
end = ts.index[-1]+timedelta(h)
rng = pd.date_range(start, end)
frc_ts = pd.DataFrame(index = ts.index.append(rng), columns = ts.columns)

for cntr in ts.columns:
    frc_ts[cntr] = SimpleExponentialSmoothing(ts[cntr], h, {'alpha':0.1})

In [111]:
# show forecast h = 30, alpha = 0.1
plot_ts_forecast(ts.loc['2009-01-01':'2009-03-03'], frc_ts.loc['2009-01-01':'2009-03-03'], ts_num=0, alg_title='ES alpha=0.1')

## Search for the optimal $\alpha$

In [74]:
def build_forecast(h, ts, AlgName, AlgTitle, ParamsArray, step='D'):
  'grid'

  FRC_TS = dict()

  for p in ParamsArray:
      frc_horizon = pd.date_range(ts.index[-1], periods=h+1, freq=step)[1:]
      frc_ts = pd.DataFrame(index = ts.index.append(frc_horizon), columns = ts.columns)

      for cntr in ts.columns:
          frc_ts[cntr] = eval(AlgName)(ts[cntr], h, p)
          # frc_ts[cntr] = AlgName(ts[cntr], h, p)

#         frc_ts.columns = frc_ts.columns+('%s %s' % (AlgTitle, p))
      FRC_TS['%s %s' % (AlgTitle, p)] = frc_ts

  return FRC_TS

In [98]:
#Fit parameters
ALPHA = [0.7, 0.4, 0.2, .15, 0.1, 0.05, 0.01, 0.001]
ESParamsArray = [{'alpha':alpha} for alpha in ALPHA]
FRC_TS = build_forecast(h=1, ts=ts, AlgName =  'SimpleExponentialSmoothing', AlgTitle='ES' ,ParamsArray = ESParamsArray)

Loss of the SES forecast for all history

In [94]:
# intoduce loss function
def qualityMAPE(x,y):
    # Mean absolute percentage error
    # x,y - pandas structures
    # x - real values
    # y - forecasts
  qlt = ((x-y).abs()/x).replace([np.inf, -np.inf], np.nan)
  return qlt.mean() , (x-y).abs()

In [100]:
# compare ES parameters
quality_wholehist = pd.DataFrame(index = ts.columns, columns = FRC_TS.keys())

# Quality within all 1500 steps
for param_cntr in sorted(quality_wholehist.columns):
    frc_ts = FRC_TS[param_cntr]
    quality_wholehist[param_cntr],_ = qualityMAPE(ts, frc_ts.loc[ts.index])


px.histogram(quality_wholehist.unstack().reset_index().rename(columns = {'level_0':'algs', 'level_1':'items', 0:'MAPE'}), x="items", y="MAPE",
             color='algs', barmode='group')

# quality_wholehist.plot.bar(x="sex", y="total_bill",
#              color='smoker', barmode='group',)

* The optimal value of $\alpha$ is about $0.01$

Loss for first time series point

In [116]:
quality_initphase = pd.DataFrame(index = ts.columns, columns = FRC_TS.keys())

# Quality in first 100 steps
init_steps = 170
for model in quality_initphase.columns:
    frc_ts = FRC_TS[model]
    for ts_num in ts.columns:
        ix = pd.date_range(ts[ts_num].first_valid_index()+timedelta(140), ts[ts_num].first_valid_index()+timedelta(init_steps))
        quality_initphase[model][ts_num],_ = qualityMAPE(ts[ts_num].loc[ix], frc_ts[ts_num].loc[ix])

px.histogram(quality_initphase.unstack().reset_index().rename(columns = {'level_0':'algs', 'level_1':'items', 0:'MAPE'}), x="items", y="MAPE",
             color='algs', barmode='group')


**Question:**

Why is loss function  not calculated for some items?

(Hint: see definition of loss function)

In [114]:
# ts VS forecast in first 100 steps
model_num = [0,4,6]  # [0,6], [0,4,6]
Models = sorted(FRC_TS.keys())

ts_num = 3 # 7
plot_ts = pd.DataFrame(index =ts.index)
plot_ts[ts.columns[ts_num]] = ts[ts.columns[ts_num]]
for model in model_num:
     frc_ts = FRC_TS[Models[model]]
     plot_ts[frc_ts.columns[ts_num]+'; '+Models[model]] = frc_ts[frc_ts.columns[ts_num]]

ix = pd.date_range(ts[ts.columns[ts_num]].first_valid_index()+timedelta(140), ts[ts.columns[ts_num]].first_valid_index()+timedelta(170))
plot_ts.loc[ix].plot().update_layout(height=350, width=1300, title="$Which~\\alpha~is~better?$",
                  xaxis_title="time ticks",
                  yaxis_title="ts and forecast values").show() # :250

# frc_ts.columns = ts.columns+'; '+alg_title
#     ts[[ts.columns[ts_num]]].merge(frc_ts[[frc_ts.columns[ts_num]]], how = 'outer', left_index = True, right_index = True)\
#       .plot().update_layout(height=350, width=1300).show()


**Wow: we need to use big $\alpha$ for first steps of ES!**
    - We need to modify algorithm for tirst steps!

**Question:**
   * How algorithm can be modified for first steps?

## Examples of modification (self-study)

* First variant: let's make $\alpha$ higher for first time point of TS

In [119]:
def InitExponentialSmoothing(x, h, Params):
    T = len(x)
    alpha = Params['alpha']
    AdaptationPeriod=Params['AdaptationPeriod']
    FORECAST = [np.NaN]*(T+h)
    if alpha>1:
        w.warn('Alpha can not be more than 1')
        #alpha = 1
        return FORECAST
    if alpha<0:
        w.warn('Alpha can not be less than 0')
        #alpha = 0
        return FORECAST
    y = x[0]
    t0=0
    for t in range(0, T):
        if not math.isnan(x[t]):
            if math.isnan(y):
                y=x[t]
                t0=t
            if (t-t0+1)<AdaptationPeriod:
                y = y*(1-alpha)*(t-t0+1)/(AdaptationPeriod) + (1-(1-alpha)*(t-t0+1)/(AdaptationPeriod))*x[t]
            else:
                y = y*(1-alpha) + alpha*x[t]
            #else do not nothing
        FORECAST[t+h] = y
    return FORECAST

* Second variant: normalize weights of ES: $1 - (1-\alpha)^t$

In [120]:
def NormExponentialSmoothing(x, h, Params):
    T = len(x)
    alpha = Params['alpha']
    FORECAST = [np.NaN]*(T+h)
    if alpha>1:
        w.warn('Alpha can not be more than 1')
        #alpha = 1
        return FORECAST
    if alpha<0:
        w.warn('Alpha can not be less than 0')
        #alpha = 0
        return FORECAST

    y = 0
    norm = 0
    for t in range(0, T):
        if not math.isnan(x[t]):
            if math.isnan(y):
                norm=1  # initialize when first data point comes
            norm = norm*(1-alpha)
            y = y*(1-alpha) + (alpha)*x[t]
        FORECAST[t+h] = y/(1-norm)
    return FORECAST

In [204]:
#Fit parameters
ALPHA = [0.7, 0.4, 0.2, .15, 0.1, 0.05, 0.01]
ESParamsArray = [{'alpha':alpha, 'AdaptationPeriod': 5} for alpha in ALPHA]
FRC_TS = build_forecast(h=1, ts=ts, AlgName =  'InitExponentialSmoothing', AlgTitle='IES' ,ParamsArray = ESParamsArray)
FRC_TS.update(build_forecast(h=1, ts=ts, AlgName =  'NormExponentialSmoothing', AlgTitle='NES' ,ParamsArray = ESParamsArray))

In [205]:
# compare ES methods
qlt_ = pd.DataFrame(index = ts.columns, columns = sorted(FRC_TS.keys()))

for model in sorted(qlt_.columns):
    frc_ts = FRC_TS[model]
    qlt_[model],_ = qualityMAPE(ts, frc_ts)

qlt_plot = qlt_.unstack().reset_index().rename(columns = {'level_0':'algs', 'level_1':'items', 0:'MAPE'})
qlt_plot['alg_family'] = [x[:2] for x in qlt_plot['algs']]
qlt_plot['alpha'] = [x[x.find('alpha')+8:x.find(',')] for x in qlt_plot['algs']]
px.histogram(qlt_plot, x="alpha", y="MAPE",
             color='alg_family', barmode='group', histfunc='sum')

* Conclusion: **The optimal value of $\alpha$ is about the same for all ES modifications**

In [207]:
# compare ES methods in first 100 steps
qlt_100 = pd.DataFrame(index = ts.columns, columns = sorted(FRC_TS.keys()))

for model in qlt_100.columns:
    frc_ts = FRC_TS[model]
    for ts_num in ts.columns:
        ix = pd.date_range(ts[ts_num].first_valid_index(), ts[ts_num].first_valid_index()+timedelta(50))
        qlt_100[model][ts_num],_ = qualityMAPE(ts[ts_num].loc[ix], frc_ts[ts_num].loc[ix])


qlt_plot = qlt_100.unstack().reset_index().rename(columns = {'level_0':'algs', 'level_1':'items', 0:'MAPE'})
qlt_plot['alg_family'] = [x[:2] for x in qlt_plot['algs']]
qlt_plot['alpha'] = [x[x.find('alpha')+8:x.find(',')] for x in qlt_plot['algs']]
px.histogram(qlt_plot, x="alpha", y="MAPE",
             color='alg_family', barmode='group', histfunc='avg')

## Adaptive ES (self-study)
Лукашин Ю.П. Адаптивные методы краткосрочного прогнозирования временных рядов. Финансы и статистика. 2003, глава 1

Tracking signal see 1.

$e_t = y_t - \hat{y}_t$

$\tilde{e}_t = \gamma e_{t-1} + (1-\gamma) \tilde{e}_{t-1}$

$\overline{e}_t = \gamma \left|e_{t-1}\right| + (1-\gamma) \overline{e}_{t-1}$

* Tracking signal

$$K_t = \frac{\tilde{e}_t}{\overline{e}_t}$$

* to do algorithm more stable
$$\alpha_t = \left|K_{t-1}\right|$$

In [132]:
# AdaptiveExponentialSmoothing
# x <array Tx1>- time series,
# h <scalar> - forecasting delay
# Params <dict> - dictionary with
#    alpha <scalar in [0,1]> - smoothing parameter
#    AdaptivePeriod scalar> - adapation period for initialization
#    gamma<scalar in [0,1]> - parametr of cross validation

def AdaptiveExponentialSmoothing(x, h, Params):
    T = len(x)
    alpha = Params['alpha']
    gamma = Params['gamma']
    AdaptationPeriod=Params['AdaptationPeriod']
    FORECAST = [np.NaN]*(T+h)
    if alpha>1:
        w.warn('Alpha can not be more than 1')
        #alpha = 1
        return FORECAST
    if alpha<0:
        w.warn('Alpha can not be less than 0')
        #alpha = 0
        return FORECAST
    y = np.NaN
    t0= np.NaN
    e1= np.NaN
    e2= np.NaN
    Kt_1 = alpha
    K=alpha
    for t in range(0, T):
        if not math.isnan(x[t]):
            if math.isnan(y):
                y=x[t]
                t0=t
                e1=alpha
                e2 = 1
            else:
                if (t-t0)<h:
                    e1 = gamma*(x[t]-y)+(1-gamma)*e1
                    e2 = gamma*np.abs(x[t]-y)+(1-gamma)*e2
                else:
                    e1 = gamma*(x[t]-FORECAST[t])+(1-gamma)*e1
                    e2 = gamma*np.abs(x[t]-FORECAST[t])+(1-gamma)*e2

            if e2==0:
                K=alpha
            else:
                K=np.abs(e1/e2)

            alpha=Kt_1
            Kt_1=K

            if (t-t0+1)<AdaptationPeriod:
                y = y*(1-alpha)*(t-t0+1)/(AdaptationPeriod) + (1-(1-alpha)*(t-t0+1)/(AdaptationPeriod))*x[t]
            else:
                y = y*(1-alpha) + (alpha)*x[t]
        FORECAST[t+h] = y
    return FORECAST

In [157]:
#Fit parameters
GAMMA = [0.1, 0.01, 0.005, 0.001, 0.0005, 0.0001, 0.00005]
alpha = 0.1
AESParamsArray = [{'alpha':alpha, 'gamma':gamma, 'AdaptationPeriod': 5} for gamma in GAMMA]
FRC_TS = build_forecast(h=1, ts=ts, AlgName =  'SimpleExponentialSmoothing', AlgTitle='ES' ,ParamsArray = ESParamsArray)
FRC_TS.update(build_forecast(h=1, ts=ts, AlgName =  'InitExponentialSmoothing', AlgTitle='IES' ,ParamsArray = ESParamsArray))
FRC_TS.update(build_forecast(h=1, ts=ts, AlgName =  'NormExponentialSmoothing', AlgTitle='NES' ,ParamsArray = ESParamsArray))
FRC_TS.update(build_forecast(h=1, ts=ts, AlgName =  'AdaptiveExponentialSmoothing', AlgTitle='AES' ,ParamsArray = AESParamsArray))

In [159]:
# compare ES methods
qlt_ = pd.DataFrame(index = ts.columns, columns = sorted(FRC_TS.keys()))

for model in sorted(qlt_.columns):
    frc_ts = FRC_TS[model]
    qlt_[model],_ = qualityMAPE(ts, frc_ts)



qlt_plot = qlt_.unstack().reset_index().rename(columns = {'level_0':'algs', 'level_1':'items', 0:'MAPE'})
qlt_plot['alg_family'] = [x[:2] for x in qlt_plot['algs']]
qlt_plot['alpha'] = [x[x.find('alpha')+8:x.find(',')] for x in qlt_plot['algs']]
px.histogram(qlt_plot, x="items", y="MAPE",
             color='alg_family', barmode='group', histfunc='min')

** Conclusion**: Adaptive ES rarely outperforms SES and it's modifications

In [160]:
# Sort Quality
qlt_[qlt_.columns].mean().sort_values()

NES {'alpha': 0.01, 'AdaptationPeriod': 5}                    0.790717
ES {'alpha': 0.01, 'AdaptationPeriod': 5}                     0.791620
AES {'alpha': 0.1, 'gamma': 0.005, 'AdaptationPeriod': 5}     0.796167
NES {'alpha': 0.05, 'AdaptationPeriod': 5}                    0.800726
ES {'alpha': 0.05, 'AdaptationPeriod': 5}                     0.800987
AES {'alpha': 0.1, 'gamma': 0.001, 'AdaptationPeriod': 5}     0.802603
AES {'alpha': 0.1, 'gamma': 0.0005, 'AdaptationPeriod': 5}    0.804516
NES {'alpha': 0.1, 'AdaptationPeriod': 5}                     0.804587
ES {'alpha': 0.1, 'AdaptationPeriod': 5}                      0.804595
AES {'alpha': 0.1, 'gamma': 0.01, 'AdaptationPeriod': 5}      0.805314
IES {'alpha': 0.05, 'AdaptationPeriod': 5}                    0.806022
AES {'alpha': 0.1, 'gamma': 0.0001, 'AdaptationPeriod': 5}    0.806740
AES {'alpha': 0.1, 'gamma': 5e-05, 'AdaptationPeriod': 5}     0.807060
IES {'alpha': 0.1, 'AdaptationPeriod': 5}                     0.807400
ES {'a

## Case when SES doesn't work

In [161]:
# Wage data in RF
wage = pd.read_csv('https://raw.githubusercontent.com/aromanenko/ATSF/main/data/monthly-wage.csv', sep=';', index_col= 0, parse_dates=True)
wage.plot().update_layout(height=350, width=1300).show()

**Questions**
    - Which charachteristic of TS can you mention so far?

Wage ts forecast with SES alpha =0.1

In [162]:
ESParamsArray = [{'alpha':0.1, 'AdaptationPeriod':10}]
FRC_WAGE = build_forecast(h=1, ts=wage, AlgName =  'SimpleExponentialSmoothing', AlgTitle='SES' ,ParamsArray = ESParamsArray)

plot_ts_forecast(wage.loc['1993-01-01':'2017-01-01'], FRC_WAGE[list(FRC_WAGE)[0]].loc['1993-01-01':'2017-01-01']
               , ts_num=0, alg_title='IES alpha=0.1')

**Question**
  * What indicates that forecast is inadequate (=there could be more proper forecastl)?

Search for the optimal $\alpha$

In [163]:
ALPHA = np.linspace(0.01,0.99,99)
ESParamsArray = [{'alpha':alpha, 'AdaptationPeriod':10} for alpha in ALPHA]
FRC_WAGE = build_forecast(h=1, ts=wage, AlgName =  'SimpleExponentialSmoothing', AlgTitle='SES' ,ParamsArray = ESParamsArray)

In [164]:
# compare ES parameters
QualityStr = pd.DataFrame(index = wage.columns, columns = FRC_WAGE.keys())

ix = wage.loc['1998-09-01':'2018-01-01'].index
for param_cntr in sorted(QualityStr.columns):
    frc_wage = FRC_WAGE[param_cntr]
    QualityStr[param_cntr],_ = qualityMAPE(wage.loc[ix], frc_wage.loc[ix])

QualityStr[QualityStr.columns].mean().sort_values()

SES {'alpha': 0.38, 'AdaptationPeriod': 10}                   0.058839
SES {'alpha': 0.37, 'AdaptationPeriod': 10}                   0.058851
SES {'alpha': 0.39, 'AdaptationPeriod': 10}                   0.058851
SES {'alpha': 0.4, 'AdaptationPeriod': 10}                    0.058872
SES {'alpha': 0.36000000000000004, 'AdaptationPeriod': 10}    0.058888
                                                                ...   
SES {'alpha': 0.05, 'AdaptationPeriod': 10}                   0.129426
SES {'alpha': 0.04, 'AdaptationPeriod': 10}                   0.147997
SES {'alpha': 0.03, 'AdaptationPeriod': 10}                   0.175454
SES {'alpha': 0.02, 'AdaptationPeriod': 10}                   0.219394
SES {'alpha': 0.01, 'AdaptationPeriod': 10}                   0.298435
Length: 99, dtype: float64

draw the forecast with optimial value $\alpha $

In [166]:
algName = QualityStr[QualityStr.columns].mean().sort_values().index[0]
plot_ts_forecast(wage.loc['1999-01-01':'2017-01-01'], FRC_WAGE[algName].loc['1999-01-01':'2017-01-01']
               , ts_num=0, alg_title=algName)

**Question**
  - Why is the forecast still inadequate?
  - Is it possible to detect forecast illness looking at forecast accuracy?
  - What is the best rule to detect that SES is not proper?

Calculate loss of the forecast of TS in [02.2016, 01.2017]

In [167]:
qualityMAPE(wage.loc['2016-02-01':'2017-01-01'], FRC_WAGE[algName].loc['2016-02-01':'2017-01-01'])[0]

Real wage    0.056378
dtype: float64

SES to Yearly Wage Data

In [168]:
# Aggregate original TS by Years
wage_year = wage.resample("12MS").sum()[:-1] # cut 2017 year
wage_year[-4:]

Unnamed: 0_level_0,Real wage
Month,Unnamed: 1_level_1
2013-01-01,2940.9
2014-01-01,3007.2
2015-01-01,2766.7
2016-01-01,2790.6


In [169]:
wage_year.plot().update_layout(height=350, width=1300).show()

Search of optimal $\alpha$ for aggregated data

In [170]:
ALPHA = np.linspace(0.01,1,100)
ESParamsArray = [{'alpha':alpha, 'AdaptationPeriod':10} for alpha in ALPHA]
FRC_WAGE_YEAR = build_forecast(h=1, ts=wage_year, AlgName =  'SimpleExponentialSmoothing', AlgTitle='IES'
                              ,ParamsArray = ESParamsArray, step='12MS')

In [171]:
# compare ES parameters
QualityStr = pd.DataFrame(index = wage_year.columns, columns = FRC_WAGE_YEAR.keys())

ix = wage_year.loc['1999-01':'2010-01'].index
for param_cntr in sorted(QualityStr.columns):
    frc_wage = FRC_WAGE_YEAR[param_cntr]
    QualityStr[param_cntr],_ = qualityMAPE(wage_year.loc[ix], frc_wage.loc[ix])

QualityStr[QualityStr.columns].mean().sort_values()[:5]

IES {'alpha': 1.0, 'AdaptationPeriod': 10}                   0.132986
IES {'alpha': 0.99, 'AdaptationPeriod': 10}                  0.133505
IES {'alpha': 0.98, 'AdaptationPeriod': 10}                  0.134027
IES {'alpha': 0.97, 'AdaptationPeriod': 10}                  0.134553
IES {'alpha': 0.9600000000000001, 'AdaptationPeriod': 10}    0.135084
dtype: float64

Forecast with optimial value  $\alpha$

In [173]:
algName = QualityStr[QualityStr.columns].mean().sort_values().index[0]
plot_ts_forecast(wage_year.loc['1999-01-01':'2016-01-01'], FRC_WAGE_YEAR[algName].loc['1999-01-01':'2016-01-01']
               , ts_num=0, alg_title=algName)

print('MAPE: %s' % qualityMAPE(wage_year.loc['2015-01-01':'2016-01-01'], FRC_WAGE_YEAR[algName].loc['2015-01-01':'2016-01-01'])[0])

MAPE: Real wage    0.047746
dtype: float64


**Question**
  - Why is the forecast still inadequate?
  - Is it possible to detect forecast illness looking at forecast accuracy?
  - What is the best rule to detect that SES is not proper?



**<font color='green'> Remember</font>**
  
   Empirical rules:

   - if $\alpha^*\in(0,0.3)$ the series is stationary, SES works;

   - if $\alpha^*\in(0.3,1)$ the series is non-stationary, we need a more sophisticated (trend or seasonal) model.

# Prophet vs Simple Exponential Smoothing

In [174]:
from copy import deepcopy
quality_wholehist1 = deepcopy(quality_wholehist)


In [177]:
!pip install prophet



In [186]:
# forecast['ds']

Unnamed: 0_level_0,trend,yhat_lower,yhat_upper,trend_lower,trend_upper,additive_terms,additive_terms_lower,additive_terms_upper,weekly,weekly_lower,weekly_upper,yearly,yearly_lower,yearly_upper,multiplicative_terms,multiplicative_terms_lower,multiplicative_terms_upper,yhat
ds,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1
2007-01-18,4.337806,0.306423,7.046348,4.337806,4.337806,-0.604998,-0.604998,-0.604998,-0.350700,-0.350700,-0.350700,-0.254298,-0.254298,-0.254298,0.0,0.0,0.0,3.732808
2007-01-19,4.335819,1.352693,7.999425,4.335819,4.335819,0.312616,0.312616,0.312616,0.541077,0.541077,0.541077,-0.228461,-0.228461,-0.228461,0.0,0.0,0.0,4.648435
2007-01-20,4.333831,1.018691,7.889279,4.333831,4.333831,0.184866,0.184866,0.184866,0.379834,0.379834,0.379834,-0.194968,-0.194968,-0.194968,0.0,0.0,0.0,4.518698
2007-01-21,4.331844,0.839734,7.760514,4.331844,4.331844,0.048078,0.048078,0.048078,0.202403,0.202403,0.202403,-0.154324,-0.154324,-0.154324,0.0,0.0,0.0,4.379922
2007-01-24,4.325882,0.038563,6.748222,4.325882,4.325882,-0.761329,-0.761329,-0.761329,-0.764981,-0.764981,-0.764981,0.003652,0.003652,0.003652,0.0,0.0,0.0,3.564553
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2009-05-06,4.510118,1.254625,7.843604,4.483524,4.539552,0.125517,0.125517,0.125517,-0.764981,-0.764981,-0.764981,0.890498,0.890498,0.890498,0.0,0.0,0.0,4.635635
2009-05-07,4.512649,1.805827,8.158064,4.485550,4.542456,0.524675,0.524675,0.524675,-0.350700,-0.350700,-0.350700,0.875376,0.875376,0.875376,0.0,0.0,0.0,5.037325
2009-05-08,4.515181,2.703414,9.364145,4.487564,4.545372,1.395600,1.395600,1.395600,0.541077,0.541077,0.541077,0.854523,0.854523,0.854523,0.0,0.0,0.0,5.910780
2009-05-09,4.517712,2.444502,9.095164,4.489370,4.548295,1.208223,1.208223,1.208223,0.379834,0.379834,0.379834,0.828389,0.828389,0.828389,0.0,0.0,0.0,5.725935


In [192]:
from prophet import Prophet

#suppressing the info logs
import logging
logging.getLogger('fbprophet').setLevel(logging.ERROR)

for col in ts.columns:
    #creating a dataframe that fbprophet requires
    df = pd.DataFrame(ts[col])
    df['ds'] = df.index
    df.columns = ['y', 'ds']

    #making predictions
    m = Prophet()
    m.fit(df)
    #'make_future_dataframe' parameter decides how far into the future we're looking
    future = m.make_future_dataframe(100)
    forecast = m.predict(future)

    #plotting the results
    forecast.merge(df, how = 'outer', left_on = 'ds', right_on = 'ds')[['y','yhat']].plot().update_layout(height=350, width=1300).show()
    quality_wholehist1.loc[col, 'FBP'],_ = qualityMAPE(df.set_index('ds')['y'], forecast.set_index('ds')['yhat'])


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/ywo8f21o.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/rm3mrg32.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=37781', 'data', 'file=/tmp/tmp17r8sgfp/ywo8f21o.json', 'init=/tmp/tmp17r8sgfp/rm3mrg32.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_modelhob996pv/prophet_model-20231109225003.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:50:03 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:50:03 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/46v2toa_.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/5pd5txrj.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=56416', 'data', 'file=/tmp/tmp17r8sgfp/46v2toa_.json', 'init=/tmp/tmp17r8sgfp/5pd5txrj.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_modelrq0043u2/prophet_model-20231109225004.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:50:04 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:50:04 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/vng3f7n5.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/9kj0_v7d.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=72762', 'data', 'file=/tmp/tmp17r8sgfp/vng3f7n5.json', 'init=/tmp/tmp17r8sgfp/9kj0_v7d.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_model77j2xj_2/prophet_model-20231109225004.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:50:04 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:50:05 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/8otcywc5.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/0f8vrn70.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=75662', 'data', 'file=/tmp/tmp17r8sgfp/8otcywc5.json', 'init=/tmp/tmp17r8sgfp/0f8vrn70.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_model909qaxl2/prophet_model-20231109225005.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:50:05 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:50:05 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/rchyawvj.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/vp2o79c5.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=51536', 'data', 'file=/tmp/tmp17r8sgfp/rchyawvj.json', 'init=/tmp/tmp17r8sgfp/vp2o79c5.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_modelwwv1qwdn/prophet_model-20231109225006.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:50:06 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:50:06 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/hkm8fdbi.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/rs8argwl.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=24519', 'data', 'file=/tmp/tmp17r8sgfp/hkm8fdbi.json', 'init=/tmp/tmp17r8sgfp/rs8argwl.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_model9im8d8ph/prophet_model-20231109225006.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:50:06 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:50:07 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/26ur77iq.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/psjekjqf.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=76709', 'data', 'file=/tmp/tmp17r8sgfp/26ur77iq.json', 'init=/tmp/tmp17r8sgfp/psjekjqf.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_model2jbtvrjh/prophet_model-20231109225007.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:50:07 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:50:07 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/de51tmnx.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/cngzhexz.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=1776', 'data', 'file=/tmp/tmp17r8sgfp/de51tmnx.json', 'init=/tmp/tmp17r8sgfp/cngzhexz.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_model9z4g4uxl/prophet_model-20231109225008.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:50:08 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:50:08 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/_8us62j7.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/dv2wz78x.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=57891', 'data', 'file=/tmp/tmp17r8sgfp/_8us62j7.json', 'init=/tmp/tmp17r8sgfp/dv2wz78x.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_modelvbzla_t2/prophet_model-20231109225009.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:50:09 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:50:09 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/mrijvub9.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/5njg3uw2.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=73753', 'data', 'file=/tmp/tmp17r8sgfp/mrijvub9.json', 'init=/tmp/tmp17r8sgfp/5njg3uw2.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_modelxl24h0yo/prophet_model-20231109225009.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:50:09 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:50:09 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/h_batuw9.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/fyn9o145.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=22146', 'data', 'file=/tmp/tmp17r8sgfp/h_batuw9.json', 'init=/tmp/tmp17r8sgfp/fyn9o145.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_model7z0gncds/prophet_model-20231109225010.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:50:10 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:50:10 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


In [193]:
# generate forecast with SES for h = 100 next days after fs_start_dt
fs_start_dt = '07-01-2007'
h = 100

ALPHA = [0.4, 0.2, .15, 0.1, 0.05, 0.01, 0.005]
ESParamsArray = [{'alpha':alpha} for alpha in ALPHA]
FRC_TS = build_forecast(h=h, ts=ts[:fs_start_dt], AlgName =  'SimpleExponentialSmoothing', AlgTitle='ES' ,ParamsArray = ESParamsArray)


quality_100days = pd.DataFrame(index = ts.columns, columns = FRC_TS.keys())

# Quality within first 100 steps after fs_start_dt
for model in quality_100days.columns:
    frc_ts = FRC_TS[model]
    for ts_num in ts.columns:
        ix = pd.date_range(pd.to_datetime(fs_start_dt), pd.to_datetime(fs_start_dt)+timedelta(h) )
        quality_100days[model][ts_num],_ = qualityMAPE(ts[ts_num].loc[ix], frc_ts[ts_num].loc[ix])

In [196]:
# generate forecast since fs_start_dt
for col in ts.columns:
    #creating a dataframe that fbprophet requires
    df = pd.DataFrame(ts.loc[:fs_start_dt, col])
    df['ds'] = ts[:fs_start_dt].index
    df.columns = ['y', 'ds']

    #making predictions
    m = Prophet()
    m.fit(df)
    #'make_future_dataframe' parameter decides how far into the future we're looking
    future = m.make_future_dataframe(100)
    forecast = m.predict(future)

    #plotting the results
    forecast.set_index('ds').merge(df.set_index('ds'), how = 'outer', left_index = True, right_index = True)[['y','yhat']].plot().update_layout(height=350, width=1300).show()
    quality_100days.loc[col, 'FBP'],_ = qualityMAPE(ts.loc[fs_start_dt:, col], forecast.set_index('ds').loc[fs_start_dt:, 'yhat'])


INFO:prophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/4t07oou0.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/fdx4dt3e.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=53308', 'data', 'file=/tmp/tmp17r8sgfp/4t07oou0.json', 'init=/tmp/tmp17r8sgfp/fdx4dt3e.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_modelyjdjly6e/prophet_model-20231109225511.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:55:11 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:55:11 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/ng0g3coo.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/h7d2i2lz.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=10093', 'data', 'file=/tmp/tmp17r8sgfp/ng0g3coo.json', 'init=/tmp/tmp17r8sgfp/h7d2i2lz.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_model05dhhzkw/prophet_model-20231109225512.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:55:12 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:55:12 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/sdgv5vmu.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/mmaid5iq.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=26583', 'data', 'file=/tmp/tmp17r8sgfp/sdgv5vmu.json', 'init=/tmp/tmp17r8sgfp/mmaid5iq.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_model78nti987/prophet_model-20231109225512.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:55:12 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:55:12 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/dpx27j6r.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/wy5ffa1y.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=95986', 'data', 'file=/tmp/tmp17r8sgfp/dpx27j6r.json', 'init=/tmp/tmp17r8sgfp/wy5ffa1y.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_model21sruwjn/prophet_model-20231109225513.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:55:13 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:55:13 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/4i169tq9.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/5_a6u1bf.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=32108', 'data', 'file=/tmp/tmp17r8sgfp/4i169tq9.json', 'init=/tmp/tmp17r8sgfp/5_a6u1bf.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_modelu05fgkn2/prophet_model-20231109225513.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:55:13 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:55:13 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/8_7cmeyn.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/sgnujxsf.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=79352', 'data', 'file=/tmp/tmp17r8sgfp/8_7cmeyn.json', 'init=/tmp/tmp17r8sgfp/sgnujxsf.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_modelqdi0_rdt/prophet_model-20231109225514.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:55:14 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:55:14 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/tfoa6ikr.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/ptoz9ra4.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=64882', 'data', 'file=/tmp/tmp17r8sgfp/tfoa6ikr.json', 'init=/tmp/tmp17r8sgfp/ptoz9ra4.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_model6lhkwj_j/prophet_model-20231109225515.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:55:15 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:55:15 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/_m5deicj.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/gfvj3mcd.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=1959', 'data', 'file=/tmp/tmp17r8sgfp/_m5deicj.json', 'init=/tmp/tmp17r8sgfp/gfvj3mcd.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_model6tsq2o8i/prophet_model-20231109225516.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:55:16 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:55:16 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/z2gfgqfc.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/5mr9i6ti.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=91326', 'data', 'file=/tmp/tmp17r8sgfp/z2gfgqfc.json', 'init=/tmp/tmp17r8sgfp/5mr9i6ti.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_modelh5rxrexd/prophet_model-20231109225516.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:55:16 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:55:16 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/85rni_z9.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/bof5tpu7.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=56317', 'data', 'file=/tmp/tmp17r8sgfp/85rni_z9.json', 'init=/tmp/tmp17r8sgfp/bof5tpu7.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_modelrc0ypiq8/prophet_model-20231109225517.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:55:17 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:55:17 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


INFO:prophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/pdgl1waz.json
DEBUG:cmdstanpy:input tempfile: /tmp/tmp17r8sgfp/hi7h2j_2.json
DEBUG:cmdstanpy:idx 0
DEBUG:cmdstanpy:running CmdStan, num_threads: None
DEBUG:cmdstanpy:CmdStan args: ['/usr/local/lib/python3.10/dist-packages/prophet/stan_model/prophet_model.bin', 'random', 'seed=2468', 'data', 'file=/tmp/tmp17r8sgfp/pdgl1waz.json', 'init=/tmp/tmp17r8sgfp/hi7h2j_2.json', 'output', 'file=/tmp/tmp17r8sgfp/prophet_model0qypg6_a/prophet_model-20231109225517.csv', 'method=optimize', 'algorithm=lbfgs', 'iter=10000']
22:55:17 - cmdstanpy - INFO - Chain [1] start processing
INFO:cmdstanpy:Chain [1] start processing
22:55:17 - cmdstanpy - INFO - Chain [1] done processing
INFO:cmdstanpy:Chain [1] done processing


In [198]:
# compare accuracy of SES and prophet
px.histogram(quality_100days.unstack().reset_index().rename(columns = {'level_0':'algs', 'level_1':'items', 0:'MAPE'}), x="items", y="MAPE",
             color='algs', barmode='group', histfunc='sum')

# Chech Questions
  * What is an idea of exponential smoothing model? Why expinential smoothing model could work for some time series?
  * What is the difference between moving average and exponential smoothing?
  * Why ES model has exponential law?
  * Descrive behaviour of exponentail smoothing forecast when smoothing paramter $\alpha$ goes to 0 (goes to 1).
  * Write down formula of exponential smoothing forecast.
  * How to initialize forecast value of the first element of time series ?
  * How to fit smoothing paramter $\alpha$?
  * How to define that Simple Exponential Smoothing model does not fit to a time series?

# Materials
* Лукашин Ю.П. Адаптивные методы краткосрочного прогнозирования временных рядов. Финансы и статистика. 2003, главы 1,4,5,7.