Commit 62b9710

Updated vignette of oes(), reflecting the model one source of uncertainty for the occurrence.
config-i1 committed Feb 8, 2020
1 parent f7f5433 commit 62b9710
Showing 3 changed files with 43 additions and 41 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -1,8 +1,8 @@
Package: smooth
Type: Package
Title: Forecasting Using State Space Models
Version: 2.5.5.41008
Date: 2020-01-28
Version: 2.5.5.41009
Date: 2020-02-08
Authors@R: person("Ivan", "Svetunkov", email = "ivan@svetunkov.ru", role = c("aut", "cre"),
comment="Lecturer at Centre for Marketing Analytics and Forecasting, Lancaster University, UK")
URL: https://github.com/config-i1/smooth
3 changes: 2 additions & 1 deletion NEWS
@@ -1,4 +1,4 @@
smooth v2.5.4 (Release data: 2020-01-25)
smooth v2.5.5 (Release data: 2020-01-25)
==============

Changes:
@@ -10,6 +10,7 @@ Changes:
* forecast with prediction intervals for msdecompose().
* residuals.smooth() now returns either et or log(1+et) - depending on the type of model (additive / multiplicative).
* New methods: rstandard() and rstudent() for the smooth class.
* Updated vignette for oes(), taking into account the latest version of our paper with John E. Boylan.

Bugfix:
* es() with multiplicative models would fail sometimes in case of bounds="a" because of the negative values. We now restrict the bounds with positive values only.
77 changes: 39 additions & 38 deletions vignettes/oes.Rmd
@@ -25,24 +25,22 @@ The canonical general iETS model (called iETS$_G$) can be summarised as:
\begin{matrix}
y_t = o_t z_t \\
o_t \sim \text{Bernoulli} \left(p_t \right) \\
p_t = f(a_t, b_t) \\
p_t = f(\mu_{a,t}, \mu_{b,t}) \\
a_t = w_a(v_{a,t-L}) + r_a(v_{a,t-L}) \epsilon_{a,t} \\
v_{a,t} = f_a(v_{a,t-L}) + g_a(v_{a,t-L}) \epsilon_{a,t} \\
(1 + \epsilon_{a,t}) \sim \text{log}\mathcal{N}(0, \sigma_{a}^2) \\
b_t = w_a(v_{b,t-L}) + r_a(v_{b,t-L}) \epsilon_{b,t} \\
v_{b,t} = f_a(v_{b,t-L}) + g_a(v_{b,t-L}) \epsilon_{b,t} \\
(1 + \epsilon_{b,t}) \sim \text{log}\mathcal{N}(0, \sigma_{b}^2)
v_{b,t} = f_a(v_{b,t-L}) + g_a(v_{b,t-L}) \epsilon_{b,t}
\end{matrix},
\end{equation}
where $y_t$ is the observed values, $z_t$ is the demand size, which is a pure multiplicative ETS model on its own, $w(\cdot)$ is the measurement function, $r(\cdot)$ is the error function, $f(\cdot)$ is the transition function and $g(\cdot)$ is the persistence function (the subscripts allow separating the functions for different parts of the model). These four functions define how the elements of the vector $v_{t}$ interact with each other. Furthermore, $\epsilon_{a,t}$ and $\epsilon_{b,t}$ are the mutually independent error terms, $o_t$ is the binary occurrence variable (1 - demand is non-zero, 0 - no demand in the period $t$) which is distributed according to Bernoulli with probability $p_t$ that has a logit-normal distribution ($o_t \sim \text{Bernoulli} \left(p_t \right)$, $p_t \sim \text{logit} \mathcal{N}$). Any ETS model can be used for $a_t$ and $b_t$, and the transformation of them into the probability $p_t$ depends on the type of the error. The general formula for the multiplicative error is:
where $y_t$ is the observed value, $z_t$ is the demand size, which is a pure multiplicative ETS model on its own, $w(\cdot)$ is the measurement function, $r(\cdot)$ is the error function, $f(\cdot)$ is the transition function and $g(\cdot)$ is the persistence function (the subscripts allow separating the functions for different parts of the model). These four functions define how the elements of the vector $v_{t}$ interact with each other. Furthermore, $\epsilon_{a,t}$ and $\epsilon_{b,t}$ are mutually independent error terms that follow an unknown distribution, $o_t$ is the binary occurrence variable (1 - demand is non-zero, 0 - no demand in the period $t$), which is distributed according to Bernoulli with probability $p_t$. $\mu_{a,t}$ and $\mu_{b,t}$ are the conditional expectations of the unobservable variables $a_t$ and $b_t$. Any ETS model can be used for $a_t$ and $b_t$, and their transformation into the probability $p_t$ depends on the type of the error. The general formula for the multiplicative error is:
\begin{equation} \label{eq:oETS(MZZ)}
p_t = \frac{a_t}{a_t+b_t} ,
p_t = \frac{\mu_{a,t}}{\mu_{a,t}+\mu_{b,t}} ,
\end{equation}
while for the additive error it is:
\begin{equation} \label{eq:oETS(AZZ)}
p_t = \frac{\exp(a_t)}{\exp(a_t)+\exp(b_t)} .
p_t = \frac{\exp(\mu_{a,t})}{\exp(\mu_{a,t})+\exp(\mu_{b,t})} .
\end{equation}
This is because both $a_t$ and $b_t$ need to be positive, and the additive error models support the real plane. The canonical iETS model assumes that the pure multiplicative model is used for the both $a_t$ and $b_t$. This type of model is positively defined for any values of error, trend and seasonality, which is essential for the values of $a_t$ and $b_t$. If a combination of additive and multiplicative error models is used, then the additive part is exponentiated prior to the usage of the formulae for the calculation of the probability.
This is because both $\mu_{a,t}$ and $\mu_{b,t}$ need to be positive, while the additive error models are defined on the whole real plane. The canonical iETS model assumes that a pure multiplicative model is used for both $a_t$ and $b_t$. This type of model is positively defined for any values of error, trend and seasonality, which is essential for the values of $a_t$ and $b_t$ and their expectations. If a combination of additive and multiplicative error models is used, then the additive part is exponentiated prior to the usage of the formulae for the calculation of the probability.
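The two transforms above are easy to check numerically. A minimal Python sketch (illustrative only: the vignette's own code is R, and the values here are made up):

```python
import math

def p_mult(mu_a, mu_b):
    # Multiplicative-error transform: requires mu_a, mu_b > 0
    return mu_a / (mu_a + mu_b)

def p_add(mu_a, mu_b):
    # Additive-error transform: exp() maps the real line to positive
    # values, after which the multiplicative formula applies
    return math.exp(mu_a) / (math.exp(mu_a) + math.exp(mu_b))

print(p_mult(3.0, 1.0))  # 0.75
print(p_add(0.0, 0.0))   # 0.5
```

Note that `p_add(a, b)` coincides with `p_mult(exp(a), exp(b))`, which is exactly the "exponentiate the additive part first" rule described above.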

An example of an iETS model is the basic local-level model iETS(M,N,N)$_G$(M,N,N)(M,N,N):
\begin{equation} \label{eq:iETSGExample}
@@ -53,16 +51,18 @@ An example of an iETS model is the basic local-level model iETS(M,N,N)$_G$(M,N,N
(1 + \epsilon_{t}) \sim \text{log}\mathcal{N}(0, \sigma_\epsilon^2) \\
\\
o_t \sim \text{Bernoulli} \left(p_t \right) \\
p_t = \frac{a_t}{a_t+b_t} \\
p_t = \frac{\mu_{a,t}}{\mu_{a,t}+\mu_{b,t}} \\
\\
a_t = l_{a,t-1} \left(1 + \epsilon_{a,t} \right) \\
l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} \epsilon_{a,t}) \\
(1 + \epsilon_{a,t}) \sim \text{log}\mathcal{N}(0, \sigma_{a}^2) \\
\mu_{a,t} = l_{a,t-1} \\
\\
b_t = l_{b,t-1} \left(1 + \epsilon_{b,t} \right) \\
l_{b,t} = l_{b,t-1}( 1 + \alpha_{b} \epsilon_{b,t}) \\
(1 + \epsilon_{b,t}) \sim \text{log}\mathcal{N}(0, \sigma_{b}^2)
\mu_{b,t} = l_{b,t-1} \\
\end{matrix},
\end{equation}
where $l_{a,t}$ and $l_{b,t}$ are the levels for each of the shape parameters and $\alpha_{a}$ and $\alpha_{b}$ are the smoothing parameters. More advanced models can be constructing by specifying the ETS models for each part and / or adding explanatory variables.
where $l_{a,t}$ and $l_{b,t}$ are the levels for each of the shape parameters, $\alpha_{a}$ and $\alpha_{b}$ are the smoothing parameters, and the error terms $1+\epsilon_{a,t}$ and $1+\epsilon_{b,t}$ are positive and have means of one. We do not make any other distributional assumptions concerning the error terms. More advanced models can be constructed by specifying the ETS models for each part and / or adding explanatory variables.

In the notation of the model iETS(M,N,N)$_G$(M,N,N)(M,N,N), the first brackets describe the ETS model for the demand sizes, the subscript letter indicates the specific subtype of model (see below), the second brackets describe the ETS model underlying the variable $a_t$, and the last ones stand for the model for $b_t$. If only one variable is needed (either $a_t$ or $b_t$), then the redundant brackets are dropped, so that the notation simplifies, for example, to: iETS(M,N,N)$_O$(M,N,N). If the same type of model is used for both demand sizes and demand occurrence, then the second brackets can be dropped as well, simplifying the notation to: iETS(M,N,N)$_G$. Furthermore, the notation without any brackets, such as iETS$_G$, stands for the general class of a specific subtype of iETS model (any error / trend / seasonality). Also, given that iETS$_G$ is the most general of all iETS models, the "$G$" can be dropped when the properties are applicable to all subtypes. Finally, the "oETS" notation is used when the occurrence part of the model is discussed explicitly, skipping the demand sizes.

@@ -74,13 +74,13 @@ where $\textbf{Y}$ is the vector of all the in-sample observations, $\boldsymbol

Depending on the restrictions on $a_t$ and $b_t$, there can be several iETS models:

1. iETS$_F$: $a_t = \text{const}$, $b_t = \text{const}$ - the model with the fixed probability of occurrence;
2. iETS$_O$: $b_t = 1$ - the "odds ratio" model, based on the logistic transform of the $o_t$ variable;
3. iETS$_I$: $a_t = 1$ - the "inverse odds ratio" model, based on the inverse logistic transform of the $o_t$ variable;
4. iETS$_D$: $a_t + b_t = 1$, $a_t \leq 1$ - the direct probability model, where the $p_t$ is calculated directly from the occurrence variable $o_t$;
5. iETS$_G$: No restrictions - the model based on the evolution of both $a_t$ and $b_t$.
1. iETS$_F$: $\mu_{a,t} = \text{const}$, $\mu_{b,t} = \text{const}$ - the model with the fixed probability of occurrence;
2. iETS$_O$: $\mu_{b,t} = 1$ - the "odds ratio" model, based on the logistic transform of the $o_t$ variable;
3. iETS$_I$: $\mu_{a,t} = 1$ - the "inverse odds ratio" model, based on the inverse logistic transform of the $o_t$ variable;
4. iETS$_D$: $\mu_{a,t} + \mu_{b,t} = 1$, $\mu_{a,t} \leq 1$ - the direct probability model, where the $p_t$ is calculated directly from the occurrence variable $o_t$;
5. iETS$_G$: No restrictions - the model based on the evolution of both $\mu_{a,t}$ and $\mu_{b,t}$.

Depending on the type of the model, there are different mechanisms of the model construction, error calculation, update of the states and the generation of forecasts. The one thing uniting all the subtypes of models is that all the multiplicative error ETS models in `smooth` package rely on the conditional medians rather than on the conditional means. This is caused by the assumption of log normality of the error term and simplifies some of the calculations, preserving the main idea of ETS models (separation of components of the model).
Depending on the type of the model, there are different mechanisms of the model construction, error calculation, update of the states and generation of forecasts. The one thing uniting all the subtypes of models is that all the multiplicative error ETS models in the `smooth` package rely on the conditional medians for the demand sizes part of the model rather than on the conditional means. This is caused by the assumption of log-normality of the error term and simplifies some of the calculations, preserving the main idea of ETS models (separation of the components of the model).

In this vignette we will use the ETS(M,N,N) model as a base for the different parts of the models. Although this is a simplification, it allows a better understanding of the basics of the different types of iETS models, without loss of generality.

@@ -103,7 +103,7 @@ In case of the fixed $a_t$ and $b_t$, the iETS$_G$ model reduces to:

The conditional h-steps ahead mean of the demand occurrence probability is calculated as:
\begin{equation} \label{eq:pt_fixed_expectation}
\mu_{o,t+h|t} = \tilde{p}_{t+h|t} = \hat{p} .
\hat{o}_{t+h|t} = \hat{p} .
\end{equation}

The estimate of the probability $p$ is calculated based on the maximisation of the following concentrated log-likelihood function:
@@ -112,9 +112,9 @@ The estimate of the probability $p$ is calculated based on the maximisation of t
\end{equation}
where $T_0$ is the number of zero observations and $T_1$ is the number of non-zero observations in the data. The number of estimated parameters in this case is equal to $k_z+1$, where $k_z$ is the number of parameters for the demand sizes part, and 1 is for the estimation of the probability $p$. Maximising this likelihood yields the analytical solution for $p$:
\begin{equation} \label{eq:ISSETS(MNN)FixedLikelihoodSmoothProbability}
\hat{p} = \frac{1}{T} \sum_{t=1}^T o_t ,
\hat{p} = \frac{T_1}{T},
\end{equation}
where $T$ is the number of all the available observations.
where $T_1$ is the number of non-zero observations and $T$ is the number of all the available observations.
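The closed-form estimate is simply the share of non-zero observations. A quick Python sketch with a made-up occurrence series (the vignette itself uses R):

```python
# Toy binary occurrence series: 1 = demand occurred, 0 = no demand
o = [1, 0, 1, 1, 0, 1, 0, 1]

T1 = sum(o)     # number of non-zero observations
T = len(o)      # total number of observations
p_hat = T1 / T  # MLE of the fixed occurrence probability

print(p_hat)  # 0.625
```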

The occurrence part of the model oETS$_F$ is constructed using `oes()` function:
```{r iETSFExample1}
@@ -130,15 +130,15 @@ es(y, "MMN", occurrence="fixed", h=10, holdout=TRUE, silent=FALSE)


## iETS$_O$
The odds-ratio iETS uses only one model for the occurrence part, for the $a_t$ variable (setting $b_t=1$), which simplifies the iETS$_G$ model. For example, for the iETS$_O$(M,N,N):
The odds-ratio iETS uses only one model for the occurrence part, for the $\mu_{a,t}$ variable (setting $\mu_{b,t}=1$), which simplifies the iETS$_G$ model. For example, for the iETS$_O$(M,N,N):
\begin{equation} \label{eq:iETSO} \tag{5}
\begin{matrix}
y_t = o_t z_t \\
o_t \sim \text{Bernoulli} \left(p_t \right) \\
p_t = \frac{a_t}{a_t+1} \\
p_t = \frac{\mu_{a,t}}{\mu_{a,t}+1} \\
a_t = l_{a,t-1} \left(1 + \epsilon_{a,t} \right) \\
l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} \epsilon_{a,t}) \\
(1 + \epsilon_{a,t}) \sim \text{log}\mathcal{N}(0, \sigma_{a}^2)
\mu_{a,t} = l_{a,t-1}
\end{matrix}.
\end{equation}

@@ -152,8 +152,9 @@ The construction of the model is done via the following set of equations (exampl
l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} e_{a,t}) \\
1+e_{a,t} = \frac{u_t}{1-u_t} \\
u_{t} = \frac{1 + o_t - \hat{p}_t}{2}
\end{matrix}.
\end{matrix},
\end{equation}
where $\hat{a}_t$ is the estimate of $\mu_{a,t}$.
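One pass of this recursion can be sketched in Python. This is illustrative only: the smoothing parameter, initial level and series are made up, and it mirrors just the occurrence part, not the full `oes()` routine:

```python
alpha_a = 0.1   # smoothing parameter (assumed)
l_a = 1.0       # initial level, giving p_hat = 0.5 at t = 1
o = [1, 1, 0, 1]

p_hats = []
for o_t in o:
    a_hat = l_a                    # \hat{a}_t = l_{a,t-1}
    p_hat = a_hat / (a_hat + 1.0)  # odds-ratio transform
    u = (1.0 + o_t - p_hat) / 2.0  # u_t lies strictly inside (0, 1)
    e = u / (1.0 - u) - 1.0        # from 1 + e_t = u_t / (1 - u_t)
    l_a *= 1.0 + alpha_a * e       # level update
    p_hats.append(p_hat)

print(p_hats[0])  # 0.5
```

Because $\hat{p}_t \in (0, 1)$, the proxy $u_t$ never reaches 0 or 1, so the odds transform is always well defined and the level stays positive.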

Given that the model is estimated using the likelihood (2), it has $k_z+k_a$ parameters to estimate, where $k_z$ includes all the initial values, the smoothing parameters and the scale of the error of the demand sizes part of the model, and $k_a$ includes only initial values and the smoothing parameters of the model for the demand occurrence. In case of iETS$_O$(M,N,N) this number is equal to 5.

@@ -177,30 +178,31 @@ This gives an additional flexibility, because the construction can be done in tw


## iETS$_I$
Similarly to the odds-ratio iETS, inverse-odds-ratio model uses only one model for the occurrence part, but for the $b_t$ variable instead of $a_t$ (now $a_t=1$). Here is an example of iETS$_I$(M,N,N):
Similarly to the odds-ratio iETS, the inverse-odds-ratio model uses only one model for the occurrence part, but for the $\mu_{b,t}$ variable instead of $\mu_{a,t}$ (now $\mu_{a,t}=1$). Here is an example of iETS$_I$(M,N,N):
\begin{equation} \label{eq:iETSI} \tag{6}
\begin{matrix}
y_t = o_t z_t \\
o_t \sim \text{Bernoulli} \left(p_t \right) \\
p_t = \frac{1}{1+b_t} \\
p_t = \frac{1}{1+\mu_{b,t}} \\
b_t = l_{b,t-1} \left(1 + \epsilon_{b,t} \right) \\
l_{b,t} = l_{b,t-1}( 1 + \alpha_{b} \epsilon_{b,t}) \\
(1 + \epsilon_{b,t}) \sim \text{log}\mathcal{N}(0, \sigma_{b}^2)
\mu_{b,t} = l_{b,t-1}
\end{matrix}.
\end{equation}

In the estimation of the model, the initial level is set to the transformed mean probability of occurrence $l_{b,0}=\frac{1-\bar{p}}{\bar{p}}$ for multiplicative error model and $l_{b,0} = \log l_{b,0}$ for the additive one, where $\bar{p}=\frac{1}{T} \sum_{t=1}^T o_t$, the initial trend is equal to 0 in case of the additive and 1 in case of the multiplicative types. The seasonality is treated similar to the iETS$_O$ model, but using the inverse-odds transformation.
This model resembles the logistic regression, where the probability is obtained from an underlying regression model $x_t'A$. In the estimation of the model, the initial level is set to the transformed mean probability of occurrence, $l_{b,0}=\frac{1-\bar{p}}{\bar{p}}$, for the multiplicative error model and $l_{b,0} = \log \frac{1-\bar{p}}{\bar{p}}$ for the additive one, where $\bar{p}=\frac{1}{T} \sum_{t=1}^T o_t$; the initial trend is equal to 0 in case of the additive and to 1 in case of the multiplicative types. The seasonality is treated similarly to the iETS$_O$ model, but using the inverse-odds transformation.

The construction of the model is done via the set of equations similar to the ones for the iETS$_O$ model:
\begin{equation} \label{eq:iETSIEstimation}
\begin{matrix}
\hat{p}_t = \frac{\hat{b}_t}{\hat{b}_t+1} \\
\hat{p}_t = \frac{1}{1+\hat{b}_t} \\
\hat{b}_t = l_{b,t-1} \\
l_{b,t} = l_{b,t-1}( 1 + \alpha_{b} e_{b,t}) \\
1+e_{b,t} = \frac{1-u_t}{u_t} \\
u_{t} = \frac{1 + o_t - \hat{p}_t}{2}
\end{matrix}.
\end{matrix},
\end{equation}
where $\hat{b}_t$ is the estimate of $\mu_{b,t}$.
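A matching Python sketch of the mirrored recursion, including the initialisation $l_{b,0}=\frac{1-\bar{p}}{\bar{p}}$ described above (values made up; the package itself is R):

```python
o = [1, 0, 1, 1, 0, 1]       # toy occurrence series
p_bar = sum(o) / len(o)      # mean probability of occurrence
l_b = (1.0 - p_bar) / p_bar  # initial level, multiplicative error model
alpha_b = 0.1                # smoothing parameter (assumed)

for o_t in o:
    p_hat = 1.0 / (1.0 + l_b)      # inverse-odds transform
    u = (1.0 + o_t - p_hat) / 2.0  # same proxy as in iETS_O
    e = (1.0 - u) / u - 1.0        # from 1 + e_t = (1 - u_t) / u_t
    l_b *= 1.0 + alpha_b * e       # level update
```

By construction the very first $\hat{p}_t$ equals $\bar{p}$, since $1 / (1 + \frac{1-\bar{p}}{\bar{p}}) = \bar{p}$.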

So the model iETS$_I$ is like a mirror reflection of the model iETS$_O$. However, it produces different forecasts, because it focuses on the probability of non-occurrence, rather than the probability of occurrence. Interestingly enough, the probability of occurrence $p_t$ can also be estimated if $1+b_t$ in the denominator is set to be equal to the demand intervals (between the demand occurrences). The model (6) underlies Croston's method in this case.

@@ -225,29 +227,28 @@ es(y, "MMN", occurrence=oETSIModel, h=10, holdout=TRUE, silent=FALSE)
## iETS$_D$
This model appears, when a specific restriction is imposed:
\begin{equation} \label{eq:iETSGRestriction}
a_t + b_t = 1, a_t \in [0, 1]
\mu_{a,t} + \mu_{b,t} = 1, \mu_{a,t} \in [0, 1]
\end{equation}
The pure multiplicative iETS model is then transformed into:
The pure multiplicative iETS$_G$(M,N,N) model is then transformed into iETS$_D$(M,N,N):
\begin{equation} \label{eq:iETSD} \tag{7}
\begin{matrix}
y_t = o_t z_t \\
o_t \sim \text{Bernoulli} \left(a_t \right) \\
a_t = \min \left(l_{a,t-1} \left(1 + \epsilon_{a,t} \right) , 1 \right) \\
a_t = l_{a,t-1} \left(1 + \epsilon_{a,t} \right) \\
l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} \epsilon_{a,t}) \\
(1 + \epsilon_{a,t}) \sim \text{log}\mathcal{N}(0, \sigma_{a}^2)
\mu_{a,t} = \min(l_{a,t-1}, 1)
\end{matrix}.
\end{equation}
An option with the additive model in this case has a different, more complicated form:
\begin{equation} \label{eq:iETSDAdditive}
\begin{matrix}
y_t = o_t z_t \\
o_t \sim \text{Bernoulli} \left(a_t \right) \\
a_t = \max \left( \min \left(l_{a,t-1} + \epsilon_{a,t} , 1 \right), 0 \right) \\
a_t = l_{a,t-1} + \epsilon_{a,t} \\
l_{a,t} = l_{a,t-1} + \alpha_{a} \epsilon_{a,t} \\
\epsilon_{a,t} \sim \mathcal{N}(0, \sigma_{a}^2)
\mu_{a,t} = \max \left( \min(l_{a,t-1}, 1), 0 \right)
\end{matrix}.
\end{equation}
The conditional expectation and variance for the demand occurrence part are quite nasty in this case, although they have an analytical solution. In case of the additive model this might cause problems, so **the implementation in `smooth` is not accurate** and needs to be fixed. However, in case of the multiplicative model, the medians are used, which are much simpler, and, as a result, the point forecasts for several steps ahead correspond to the ones from the ETS models, restricted to the [0, 1] region. So, I would recommend either sticking with the multiplicative error models or using some other iETS subtypes.
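The clipping used by the two iETS$_D$ variants can be sketched as follows (Python, illustrative only):

```python
def mu_mult(l_prev):
    # Multiplicative-error iETS_D: the level is positive by construction,
    # so the probability only needs to be capped from above at 1
    return min(l_prev, 1.0)

def mu_add(l_prev):
    # Additive-error iETS_D: the level lives on the real line,
    # so it is clamped to the [0, 1] interval from both sides
    return max(min(l_prev, 1.0), 0.0)

print(mu_mult(1.3))   # 1.0
print(mu_add(-0.2))   # 0.0
```

The asymmetry between the two (one-sided versus two-sided clipping) is what makes the additive variant harder to handle analytically.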

The estimation of the multiplicative error model is done using a similar set of equations (the remainder of this hunk is collapsed in the diff view).
