# 6 Forecasting

The goal of forecasting is to predict a future value $x_{n+m}$ from the past $x_{1:n}=(x_n,\dotsc,x_1)$. We assume that $\{x_t\}$ is stationary and that the model parameters are known (or can be estimated).

## Mean Square Formulation 

Suppose we seek the predictor $x_{n+m}^n$ (a function of $x_{1:n}$, hence a constant rather than a random variable once we condition on $x_{1:n}$) that minimizes the conditional MSE. Denote $E = \mathbb E(x_{n+m}|x_{1:n})$. Then,
$$\begin{aligned}\mathbb E\left((x_{n+m} - x_{n+m}^n)^2|x_{1:n}\right)&
=\mathbb E\left(\left((x_{n+m} - E) + (E - x_{n+m}^n)\right)^2|x_{1:n}\right)\\ 
&=\mathbb E\left((x_{n+m} - E)^2|x_{1:n}\right) 
+ \mathbb E\left((E- x_{n+m}^n)^2|x_{1:n}\right)
\\
&\quad + 2\mathbb E\left(x_{n+m} - E|x_{1:n}\right) \mathbb E\left( E - x_{n+m}^n|x_{1:n}\right)
\end{aligned}$$

Since $\mathbb E\left(x_{n+m} - E|x_{1:n}\right)=0$, the cross term vanishes and we conclude 
$$\mathbb E\left((x_{n+m} - x_{n+m}^n)^2|x_{1:n}\right)\geqslant \mathbb E\left((x_{n+m} - E)^2|x_{1:n}\right) $$
with equality at $x_{n+m}^n\equiv E=\mathbb E(x_{n+m}|x_{1:n})$.
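
As a quick numerical check of this decomposition, the sketch below uses a made-up three-point conditional distribution (the values and probabilities are illustrative assumptions, not taken from the text) and compares the conditional MSE of the conditional mean with that of other constants.

```python
# Hypothetical conditional distribution of x_{n+m} given x_{1:n} (illustrative numbers).
values = [-1.0, 0.0, 2.0]
probs = [0.2, 0.5, 0.3]


def cond_mse(c):
    """Conditional MSE of the constant predictor c under the distribution above."""
    return sum(p * (v - c) ** 2 for v, p in zip(values, probs))


# The conditional mean E of the distribution above.
cond_mean = sum(p * v for v, p in zip(values, probs))

# The excess MSE of any other constant c should equal (E - c)^2.
for c in [-1.0, 0.0, 0.4, 1.0, 2.0]:
    excess = cond_mse(c) - cond_mse(cond_mean)
    print(f"c = {c:5.2f}  MSE = {cond_mse(c):.4f}  excess = {excess:.4f}")
```

The excess over the minimum is the term $\mathbb E\left((E- x_{n+m}^n)^2|x_{1:n}\right)$ from the expansion, which is zero only at the conditional mean.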

### Linear Predictor

Given a series (not necessarily stationary) $x_1,\dotsc,x_n$, the best linear predictor has the form
$$x_{n+m}^n = a_0+\sum_{k=1}^n a_k x_k.$$
If we denote $x_0 = 1$, then $x_{n+m}^n=  \sum_{k=0}^n a_k x_k$.
The coefficients are obtained by solving the linear system of (unconditional) expectations
$$\mathbb E\left((x_{n+m} - x_{n+m}^n)x_k\right)=0\quad(k=0,1,\dotsc,n).$$

In particular, if $\{x_t\}_{t\geqslant 0}$ is stationary with mean zero, taking $k = 0$ gives $\mathbb E(x_{n+m}^n)=\mathbb E(x_{n+m}) = 0$. Taking expectations on both sides of $x_{n+m}^n = a_0+\sum_{k=1}^n a_k x_k$ then yields $a_0 = 0$. Below we consider only this stationary, zero-mean case. (Some other series can be reduced to it by subtracting their mean.)

## One-step Forecasting

When $m = 1$, i.e. we predict only one step ahead, denote $\phi_{nj} = a_{n+1-j}$ and write
$$x_{n+1}^n = \phi_{n1}x_n +\phi_{n2}x_{n-1}+\dotsc +\phi_{nn}x_1 = \sum_{j=1}^n \phi_{nj}x_{n+1-j}.$$

On the one hand, the linear system requires that $\mathbb E(x_{n+1}^nx_{n+1-k}) =\mathbb E(x_{n+1}x_{n+1-k}) = \gamma (k)$ for $k=1,\dotsc,n$. On the other hand, the form of the predictor gives
$$\mathbb E(x_{n+1}^nx_{n+1-k}) = \sum_{j=1}^n\phi_{nj}\mathbb E(x_{n+1-j}x_{n+1-k}) =  \sum_{j=1}^n\phi_{nj}\gamma(k-j).$$

Equating the two expressions, we can write the problem in matrix form, $\Gamma_n\phi_n = \gamma_n$, i.e.
$$\left[\begin{matrix}\gamma(0) & \gamma(-1) & \gamma(-2) & \dotsc & \gamma(1-n)\\ 
\gamma(1) & \gamma(0) & \gamma(-1) & \dotsc & \gamma(2-n)\\
\gamma(2) & \gamma(1) & \gamma(0) &\dotsc & \gamma(3-n)\\ 
\vdots &\vdots &\vdots &\ddots &\vdots\\ 
\gamma(n-1)&\gamma(n-2) &\gamma(n-3)&\dotsc & \gamma(0) 
\end{matrix}\right]\left[\begin{matrix}\phi_{n1}\\\phi_{n2}\\\phi_{n3}\\\vdots \\\phi_{nn}\end{matrix}\right] 
= \left[\begin{matrix}\gamma(1)\\\gamma(2)\\\gamma(3)\\\vdots \\\gamma(n)\end{matrix}\right] $$
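
To make the system concrete, the following sketch builds $\Gamma_n$ and $\gamma_n$ and solves for $\phi_n$ by plain Gaussian elimination. The AR(1) autocovariance $\gamma(h)=\sigma_w^2\alpha^{|h|}/(1-\alpha^2)$, the coefficient value $0.6$, and all function names are assumptions chosen for illustration.

```python
def ar1_acov(h, alpha=0.6, sigma2=1.0):
    """Autocovariance of a causal AR(1) (illustrative model, not from the text)."""
    return sigma2 * alpha ** abs(h) / (1.0 - alpha ** 2)


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def one_step_coefficients(gamma, n):
    """Return phi_n = [phi_n1, ..., phi_nn] solving Gamma_n phi_n = gamma_n."""
    Gamma = [[gamma(i - j) for j in range(n)] for i in range(n)]
    rhs = [gamma(k) for k in range(1, n + 1)]
    return solve_linear(Gamma, rhs)


phi_4 = one_step_coefficients(ar1_acov, 4)
print(phi_4)  # for this AR(1), close to [0.6, 0, 0, 0] up to rounding
```

For this model only the most recent observation receives weight, so the best linear one-step predictor is $0.6\,x_n$.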

### Positivity

We already know that $\Gamma_n$ is positive semidefinite. Moreover, one can show that $\Gamma_n$ is positive definite for every $n$ as long as $\gamma(0)>0$ and $\rho(h)\rightarrow 0$ as $h\rightarrow \infty$.

* Hint A: if $\Gamma_m$ is singular, show by analyzing the variance that there exist coefficients $v_m,\dotsc,v_1$, not all zero, such that $v_mx_m+v_{m-1}x_{m-1}+\dotsc +v_1x_1\equiv 0$, i.e. the series satisfies a deterministic recursion.
* Hint B: prove the following lemma: if $A\in\mathbb C^{h\times h}$ is invertible and there exists some vector $v$ such that $A^nv\rightarrow 0$ both as $n\rightarrow +\infty$ and as $n\rightarrow -\infty$, then $v$ must be zero. The lemma can be proved by first considering the Jordan blocks.

### Mean Square Error

Denote by $P_{n+1}^n$ the MSE and write $x=[x_n,\dotsc,x_1]^T$. Then 
$$\begin{aligned}P_{n+1}^n= \mathbb E(x_{n+1} - \phi_n^Tx )^2&
=\mathbb E(x_{n+1}- \gamma_n^T\Gamma_n^{-1}x)^2\\
&=[1,- \gamma_n^T\Gamma_n^{-1}]\left[\begin{matrix}
\gamma(0) &\gamma_n^T \\
\gamma_n & \Gamma_n
\end{matrix}\right][1,- \gamma_n^T\Gamma_n^{-1}]^T=\gamma(0) - \gamma_n^T\Gamma_n^{-1}\gamma_n
.\end{aligned}$$
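
As a sanity check, consider the causal AR(1) model $x_t=\alpha x_{t-1}+w_t$ with $|\alpha|<1$ (the symbol $\alpha$ and this example are illustrative assumptions, not part of the derivation above). Its autocovariance is $\gamma(h)=\sigma_w^2\alpha^{|h|}/(1-\alpha^2)$, and $\phi_n=[\alpha,0,\dotsc,0]^T$ solves $\Gamma_n\phi_n=\gamma_n$, so
$$P_{n+1}^n=\gamma(0)-\gamma_n^T\Gamma_n^{-1}\gamma_n=\gamma(0)-\alpha\gamma(1)=\frac{\sigma_w^2(1-\alpha^2)}{1-\alpha^2}=\sigma_w^2,$$
that is, the one-step error equals the noise variance, as one would expect.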


### Durbin-Levinson Algorithm

The Durbin-Levinson algorithm recursively solves the Toeplitz system $\Gamma_n\phi_n = \gamma_n$. Let 
$\widetilde \gamma_n=[\gamma(n),\dotsc,\gamma(1)]^T$ and $\widetilde \phi_n = [\phi_{nn},\dotsc,\phi_{n1}]^T$ be the reversals of $\gamma_n$ and $\phi_n$.
Note that the system can be written in the block form 
$$\left[\begin{matrix}
 \Gamma_{n-1}& \widetilde \gamma_{n-1} \\ \widetilde \gamma_{n-1}^T& \gamma(0)
\end{matrix}\right]\phi_n =\left[\begin{matrix}
 \gamma_{n-1} \\ \gamma(n)
\end{matrix}\right].$$
By the Schur complement (here ${}^*$ coincides with the transpose, since the series is real), we obtain 
$$\left[\begin{matrix}\Gamma_{n-1}&\widetilde \gamma_{n-1}\\\widetilde \gamma_{n-1}^* & \gamma(0)\end{matrix}\right]
 =\left[\begin{matrix}I_{n-1}& \\\widetilde \gamma_{n-1}^*\Gamma_{n-1}^{-1} &1\end{matrix}\right]
\left[\begin{matrix}\Gamma_{n-1}& \\ &\gamma(0)-\widetilde \gamma_{n-1}^*\Gamma_{n-1}^{-1}\widetilde \gamma_{n-1}\end{matrix}\right]
\left[\begin{matrix}I_{n-1} & \Gamma_{n-1}^{-1}\widetilde \gamma_{n-1}\\ &1\end{matrix}\right].$$

Thus we can solve for 
$$\begin{aligned}\phi_n &= \left[\begin{matrix}\Gamma_{n-1}^{-1}\gamma_{n-1} - \Gamma_{n-1}^{-1}\widetilde \gamma_{n-1}\frac{-\widetilde \gamma_{n-1}^*\Gamma_{n-1}^{-1}\gamma_{n-1} + \gamma(n)}{\gamma(0) - \widetilde \gamma_{n-1}^*\Gamma_{n-1}^{-1}\widetilde \gamma_{n-1}}\\\frac{-\widetilde \gamma_{n-1}^*\Gamma_{n-1}^{-1}\gamma_{n-1} + \gamma(n)}{\gamma(0) - \widetilde \gamma_{n-1}^*\Gamma_{n-1}^{-1}\widetilde \gamma_{n-1}}\end{matrix}\right]
=\left[\begin{matrix}\phi_{n-1}-\widetilde\phi_{n-1}\phi_{nn}\\ \phi_{nn}\end{matrix}\right]
\end{aligned}$$
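where, since $\Gamma_{n-1}^{-1}\gamma_{n-1}=\phi_{n-1}$ and (by the symmetric Toeplitz structure) $\Gamma_{n-1}^{-1}\widetilde\gamma_{n-1}=\widetilde\phi_{n-1}$, 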
$$\phi_{nn}=\frac{\gamma(n)-\widetilde \gamma_{n-1}^*\phi_{n-1}}{\gamma(0) - \gamma_{n-1}^*\phi_{n-1}}.$$
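
The recursion can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `durbin_levinson`, the list layout `gamma[h]`, and the MA(1) test values ($\theta=0.5$, $\sigma_w^2=1$) are assumptions made here.

```python
def durbin_levinson(gamma, n):
    """Solve Gamma_n phi_n = gamma_n recursively.

    gamma: sequence with gamma[h] = gamma(h) for h = 0, ..., n.
    Returns (phi, mse) with phi = [phi_n1, ..., phi_nn] and
    mse = gamma(0) - gamma_n^T phi_n, the one-step MSE P_{n+1}^n.
    """
    phi = []  # phi_0 is the empty vector
    for k in range(1, n + 1):
        # At this point phi[j] holds phi_{k-1, j+1}.
        # Numerator: gamma(k) - reversed(gamma_{k-1})^T phi_{k-1}
        num = gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))
        # Denominator: gamma(0) - gamma_{k-1}^T phi_{k-1}
        den = gamma[0] - sum(phi[j] * gamma[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j} for j = 1, ..., k-1
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
    mse = gamma[0] - sum(phi[j] * gamma[j + 1] for j in range(n))
    return phi, mse


# Illustrative MA(1) autocovariances: gamma(0) = 1 + theta^2, gamma(1) = theta.
gamma_ma1 = [1.25, 0.5, 0.0, 0.0]
phi_2, mse_2 = durbin_levinson(gamma_ma1, 2)
print(phi_2, mse_2)
```

Each step costs $O(k)$ operations, so the whole solve is $O(n^2)$ instead of the $O(n^3)$ of a generic linear solver.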


### Innovations Algorithm

The innovations algorithm is another way to compute $x_{n+1}^n$. Recall that $x_{t+1}^t$ stands for the predictor of $x_{t+1}$ given $x_t,\dotsc,x_1$. We call the error $x_{t+1} - x_{t+1}^t$ the innovation at time $t+1$. Suppose that we have the recurrence
$$x_{t+1}^t = \sum_{j=1}^t\theta_{tj}\left(x_{t-j+1}-x_{t-j+1}^{t-j}\right)$$
and 
$$P_{t+1}^t = {\rm Var}(x_{t+1} - x_{t+1}^t)=\gamma(0) - \sum_{j=1}^t\theta_{tj}^2P_{t-j+1}^{t-j} .$$
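Since the prediction error is uncorrelated with the past, ${\rm Cov}(x_{t+1}-x_{t+1}^t, x_{t-j+1})=0$ for $j=1,\dotsc,t$, the innovations are mutually uncorrelated. Taking the covariance of $x_{t+1}$ with the innovation at time $j+1$, we deduce, with $P_1^0=\gamma(0)$,
$$\theta_{t,t-j}=\frac{\gamma(t-j)-\sum_{k=0}^{j-1}\theta_{j,j-k}\theta_{t,t-k}P_{k+1}^k}{P_{j+1}^j}\quad(j=0,1,\dotsc,t-1).$$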
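
A minimal sketch of the resulting recursion follows. It assumes the standard innovations formula $\theta_{t,t-j}=\big(\gamma(t-j)-\sum_{k=0}^{j-1}\theta_{j,j-k}\theta_{t,t-k}P_{k+1}^k\big)/P_{j+1}^j$ with $P_1^0=\gamma(0)$. The function name, the storage layout, and the MA(1) test values ($\theta=0.5$, $\sigma_w^2=1$) are illustrative choices.

```python
def innovations(gamma, n):
    """Innovations recursion for a stationary zero-mean series.

    gamma: sequence with gamma[h] = gamma(h) for h = 0, ..., n.
    Returns (theta, P) where theta[t][i - 1] = theta_{t,i} for 1 <= i <= t
    and P[t] = P_{t+1}^t (so P[0] = gamma(0)).
    """
    P = [gamma[0]]
    theta = [[]]  # theta[0] is empty: there is nothing to predict from
    for t in range(1, n + 1):
        row = [0.0] * t
        for j in range(t):  # fills theta_{t,t-j}, j = 0, ..., t-1
            s = sum(theta[j][j - k - 1] * row[t - k - 1] * P[k] for k in range(j))
            row[t - j - 1] = (gamma[t - j] - s) / P[j]
        theta.append(row)
        # P_{t+1}^t = gamma(0) - sum_j theta_{t,t-j}^2 P_{j+1}^j
        P.append(gamma[0] - sum(row[t - j - 1] ** 2 * P[j] for j in range(t)))
    return theta, P


# Illustrative MA(1) autocovariances: gamma(0) = 1.25, gamma(1) = 0.5, zero beyond.
gamma_ma1 = [1.25, 0.5] + [0.0] * 19
theta, P = innovations(gamma_ma1, 20)
print(theta[20][0], P[20])  # should approach theta = 0.5 and sigma_w^2 = 1
```

For an MA(1) only $\theta_{t1}$ is nonzero, which is why the innovations form is convenient for moving-average models.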

## ARMA Forecasting

We can also make a prediction from infinitely many past observations. For example, write 
$$\tilde x_{n+m}=\mathbb E\left(x_{n+m}|x_n,\dotsc,x_1,x_0,x_{-1},\dotsc\right).$$

Intuitively, when $n$ is large, $x_{n+m}^n$ approximates $\tilde x_{n+m}$. Now consider an ARMA model $\phi(B)x_t = \theta(B)w_t$ that is both causal and invertible, i.e.
$$\begin{aligned}x_t  = \sum_{j=0}^{\infty}\psi_j w_{t-j}\quad\quad 
w_t = \sum_{j=0}^{\infty}\pi_j x_{t-j}\end{aligned}$$

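Using the same tilde notation for the noise (by invertibility $w_t$ is determined by the past for $t\leqslant n$, and we assume that $w_t$ is independent of the past for $t>n$), 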
$$\tilde w_t = \mathbb E\left(w_t|x_n,\dotsc,x_0,\dotsc\right)=
\left\{\begin{array}{ll}0 & t>n\\ w_t & t\leqslant n \end{array}\right. =w_t\mathbb I_{t\leqslant n}.$$

Thus we can represent $\tilde x_{n+m}$ with $w$ by 
$$\tilde x_{n+m}=\sum_{j=0}^{\infty}\psi_j \tilde w_{n+m-j}
=\sum_{j=m}^{\infty}\psi_j w_{n+m-j}.$$

The MSE is given by
$$P_{n+m}^n = \mathbb E\left((x_{n+m} - \tilde x_{n+m})^2|x_n,\dotsc\right)
=\mathbb E\left(\left(\sum_{j=0}^{m-1} \psi_j  w_{n+m-j}\right)^2|x_n,\dotsc\right)
=\sigma_w^2\sum_{j=0}^{m-1}\psi_j^2.$$
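
The $\psi$-weights and the $m$-step MSE can be computed numerically. The sketch below assumes the sign convention $\phi(B)=1-\phi_1B-\dotsc-\phi_pB^p$ and $\theta(B)=1+\theta_1B+\dotsc+\theta_qB^q$, which the text does not fix, and obtains $\psi_j$ by matching coefficients in $\phi(z)\psi(z)=\theta(z)$. Function names and parameter values are illustrative.

```python
def psi_weights(ar, ma, count):
    """Return [psi_0, ..., psi_{count-1}] for a causal ARMA model.

    ar = [phi_1, ..., phi_p], ma = [theta_1, ..., theta_q] under the assumed
    convention phi(B) = 1 - phi_1 B - ..., theta(B) = 1 + theta_1 B + ...
    Recursion: psi_0 = 1, psi_j = theta_j + sum_{k=1}^{min(j,p)} phi_k psi_{j-k}.
    """
    psi = [1.0]
    for j in range(1, count):
        value = ma[j - 1] if j <= len(ma) else 0.0
        value += sum(ar[k - 1] * psi[j - k] for k in range(1, min(j, len(ar)) + 1))
        psi.append(value)
    return psi


def forecast_mse(ar, ma, sigma2, m):
    """m-step MSE sigma_w^2 * sum_{j=0}^{m-1} psi_j^2."""
    return sigma2 * sum(p * p for p in psi_weights(ar, ma, m))


# Illustrative ARMA(1,1) with phi_1 = 0.9, theta_1 = 0.5.
print(psi_weights([0.9], [0.5], 4))
# Illustrative AR(1) with phi_1 = 0.6: the MSE grows with the horizon m.
for m in (1, 2, 3, 10):
    print(m, forecast_mse([0.6], [], 1.0, m))
```

For the ARMA(1,1) case the weights follow $\psi_j=(\phi_1+\theta_1)\phi_1^{j-1}$ for $j\geqslant 1$, which gives a closed form to compare against.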

### Triviality

Since a causal ARMA model has $\psi_j\rightarrow 0$ exponentially fast (by the residue theorem), we have $\lim_{m\rightarrow \infty}\sum_{j=m}^{\infty} \psi_j^2=0$, and hence, as $m\rightarrow \infty$ (the first limit in mean square), 
$$\tilde x_{n+m}\rightarrow 0 \quad {\rm and}\quad P_{n+m}^n \rightarrow \sigma_w^2\sum_{j=0}^{\infty}\psi_j^2 = \gamma(0),$$
which implies that for long horizons the prediction degenerates to predicting the mean, with MSE equal to $\gamma(0)$.

### Truncated prediction


We can also represent $\tilde x_{n+m}$ in terms of past $x$ (since $0=\tilde w_t = \tilde x_{t}+\sum_{j=1}^{\infty} \pi_j\tilde x_{t-j}$ for $t>n$, with $\pi_0=1$ and $\tilde x_t=x_t$ for $t\leqslant n$):
$$\tilde x_{n+m}=-\sum_{j=1}^{\infty}\pi_j \tilde x_{n+m-j}
= -\sum_{j=1}^{m-1}\pi_j \tilde x_{n+m-j}-\sum_{j=m}^{\infty}\pi_j   x_{n+m-j}.$$
Here the second sum involves only the already observed values $x_n,x_{n-1},\dotsc$, while the first sum involves the earlier predictions 
$\tilde x_{n+1},\tilde x_{n+2},\dotsc,\tilde x_{n+m-1}$, so the forecasts can be computed recursively for $m=1,2,\dotsc$.
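
In practice only $x_1,\dotsc,x_n$ are available, so a common workaround is to truncate the infinite sum by treating $x_t$ as zero for $t\leqslant 0$. The sketch below does exactly that. The truncation rule, the convention $\phi(B)=1-\phi_1B-\dotsc$, $\theta(B)=1+\theta_1B+\dotsc$, the function names, and the sample data are all assumptions of this illustration.

```python
def pi_weights(ar, ma, count):
    """Return [pi_0, ..., pi_{count-1}] from pi(z) theta(z) = phi(z).

    Recursion: pi_0 = 1, pi_j = -phi_j - sum_{k=1}^{min(j,q)} theta_k pi_{j-k},
    with phi_j = 0 for j > p (assumed sign convention, see the lead-in).
    """
    pi = [1.0]
    for j in range(1, count):
        value = -(ar[j - 1] if j <= len(ar) else 0.0)
        value -= sum(ma[k - 1] * pi[j - k] for k in range(1, min(j, len(ma)) + 1))
        pi.append(value)
    return pi


def truncated_forecasts(x, ar, ma, horizon):
    """Forecasts for times n+1, ..., n+horizon from x = [x_1, ..., x_n].

    Unobserved values x_t with t <= 0 are treated as zero (truncation).
    """
    n = len(x)
    pi = pi_weights(ar, ma, n + horizon)
    series = list(x)  # series[s - 1] is x_s for s <= n, a forecast for s > n
    for m in range(1, horizon + 1):
        t = n + m
        # -sum_{j=1}^{t-1} pi_j * (value at time t - j); forecasts are reused
        series.append(-sum(pi[j] * series[t - j - 1] for j in range(1, t)))
    return series[n:]


# Illustrative AR(1) with phi_1 = 0.6: forecasts decay geometrically from x_n.
print(truncated_forecasts([1.0, -0.5, 2.0], [0.6], [], 3))
```

For a pure AR($p$) model with $n\geqslant p$ the truncation changes nothing, because $\pi_j=0$ for $j>p$. With a moving-average part the neglected terms decay geometrically under invertibility.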