---
title: "7. Time series regression models"
author: "7.9 Matrix formulation"
date: "OTexts.org/fpp3/"
classoption: aspectratio=169
titlepage: fpp3title.png
titlecolor: fpp3red
toc: false
output:
binb::monash:
colortheme: monashwhite
fig_width: 7.5
fig_height: 3
keep_tex: no
includes:
in_header: fpp3header.tex
---
```{r setup, include=FALSE}
source("setup.R")
```
## Matrix formulation
\begin{block}{}
\centerline{$y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \cdots + \beta_kx_{k,t} + \varepsilon_t.$}
\end{block}
\pause
Let $\bm{y} = (y_1,\dots,y_T)'$, $\bm{\varepsilon} = (\varepsilon_1,\dots,\varepsilon_T)'$, $\bm{\beta} = (\beta_0,\beta_1,\dots,\beta_k)'$ and
\[
\bm{X} = \begin{bmatrix}
1 & x_{1,1} & x_{2,1} & \dots & x_{k,1}\\
1 & x_{1,2} & x_{2,2} & \dots & x_{k,2}\\
\vdots & \vdots & \vdots & & \vdots\\
1 & x_{1,T} & x_{2,T} & \dots & x_{k,T}
\end{bmatrix}.
\]\pause
Then
###
\centerline{$\bm{y} = \bm{X}\bm{\beta} + \bm{\varepsilon}.$}
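
## Matrix formulation

A minimal sketch of this set-up in R: `model.matrix()` builds $\bm{X}$ (including the intercept column) from a model formula. The data frame `df` and the predictors `x1`, `x2` are illustrative names, not from the book.

```{r matrix-sketch, echo=TRUE, eval=FALSE}
# Hypothetical data: T = 20 observations, k = 2 predictors
df <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
y <- df$y
X <- model.matrix(y ~ x1 + x2, data = df)  # column of 1s, then x1, x2
```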
## Matrix formulation
**Least squares estimation**
Minimize: $(\bm{y} - \bm{X}\bm{\beta})'(\bm{y} - \bm{X}\bm{\beta})$\pause
Differentiating with respect to $\bm{\beta}$ gives
\begin{block}{}
\centerline{$\hat{\bm{\beta}} = (\bm{X}'\bm{X})^{-1}\bm{X}'\bm{y}$}
\end{block}
\pause
(The "normal equation".)\pause
\[
\hat{\sigma}^2 = \frac{1}{T-k-1}(\bm{y} - \bm{X}\hat{\bm{\beta}})'(\bm{y} - \bm{X}\hat{\bm{\beta}})
\]
\structure{Note:} If you fall for the dummy variable trap, $(\bm{X}'\bm{X})$ is a singular matrix.
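
## Matrix formulation

Continuing the sketch above, the closed-form estimates translate directly into R. Using `solve(crossprod(X), ...)` solves the normal equations without forming $(\bm{X}'\bm{X})^{-1}$ explicitly, which is numerically preferable.

```{r ols-sketch, echo=TRUE, eval=FALSE}
beta_hat <- solve(crossprod(X), crossprod(X, y))  # (X'X)^{-1} X'y
e <- y - X %*% beta_hat
sigma2_hat <- sum(e^2) / (nrow(X) - ncol(X))      # divide by T - k - 1
# Agrees with lm() up to numerical error
cbind(beta_hat, coef(lm(y ~ x1 + x2, data = df)))
```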
## Likelihood
If the errors are iid and normally distributed, then
\[
\bm{y} \sim \text{N}(\bm{X}\bm{\beta},\sigma^2\bm{I}).
\]\pause
So the likelihood is
\[
L = \frac{1}{\sigma^T(2\pi)^{T/2}}\exp\left(-\frac1{2\sigma^2}(\bm{y}-\bm{X}\bm{\beta})'(\bm{y}-\bm{X}\bm{\beta})\right)
\]\pause
which is maximized when $(\bm{y}-\bm{X}\bm{\beta})'(\bm{y}-\bm{X}\bm{\beta})$ is minimized.\pause
\centerline{\alert{So \textbf{MLE = OLS}.}}
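
## Likelihood

One way to check MLE = OLS numerically: profile out $\sigma^2$ (so $\hat\sigma^2 = \text{RSS}/T$) and minimize the resulting negative log-likelihood over $\bm{\beta}$ with `optim()`. A sketch, reusing `X`, `y` and `beta_hat` from earlier:

```{r mle-sketch, echo=TRUE, eval=FALSE}
negloglik <- function(beta) {
  rss <- sum((y - X %*% beta)^2)
  # Profiled likelihood: up to constants, -log L = (T/2) log(RSS)
  nrow(X) / 2 * log(rss)
}
optim(rep(0, ncol(X)), negloglik)$par  # approximately equal to beta_hat
```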
## Cross-validation
\begin{block}{Fitted values}\vspace*{-0.8cm}
\begin{align*}
\hat{\bm{y}} &= \bm{X}\hat{\bm{\beta}} \\
& = \bm{X}(\bm{X}'\bm{X})^{-1}\bm{X}'\bm{y}\\
&= \bm{H}\bm{y}\qquad\qquad \text{where $\bm{H} = \bm{X}(\bm{X}'\bm{X})^{-1}\bm{X}'$.}\\[-0.9cm]
\end{align*}
\end{block}\pause\vspace*{0.3cm}
\begin{block}{LOO cross-validation MSE}
\centerline{$\text{CV} = \displaystyle\frac1T \sum_{t=1}^T [e_t/(1-h_t)]^2$}
\begin{itemize}\tightlist
\item $e_t=$ residual at time $t$ (from fitting the model to all the data)
\item $h_1,\dots,h_T$ are the diagonals of $\bm{H}$.
\end{itemize}
\end{block}
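
## Cross-validation

This shortcut is available directly from a fitted `lm` object: `residuals()` gives $e_t$ and `hatvalues()` gives $h_t$, so LOO CV needs no refitting. A sketch using the illustrative `df` from earlier:

```{r cv-sketch, echo=TRUE, eval=FALSE}
fit <- lm(y ~ x1 + x2, data = df)
e <- residuals(fit)
h <- hatvalues(fit)          # diagonal of the hat matrix H
CV <- mean((e / (1 - h))^2)  # equals refitting T times, dropping one point
```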
## Multiple regression forecasts
\begin{block}{Optimal forecasts}\vspace*{-0.2cm}
\[
\hat{y}^* =
\text{E}(y^* | \bm{y},\bm{X},\bm{x}^*) =
\bm{x}^*\hat{\bm{\beta}} = \bm{x}^*(\bm{X}'\bm{X})^{-1}\bm{X}'\bm{y}
\]
\end{block}\vspace*{-0.2cm}
where $\bm{x}^*$ is a row vector containing the values of the predictors for the forecast, with a leading 1 for the intercept (in the same format as a row of $\bm{X}$).
\pause
\begin{block}{Forecast variance}\vspace*{-0.2cm}
\[
\text{Var}(y^* | \bm{X},\bm{x}^*) = \sigma^2 \left[1 + \bm{x}^* (\bm{X}'\bm{X})^{-1} (\bm{x}^*)'\right]
\]
\end{block}
\vspace*{-0.2cm}\pause
* This ignores any errors in $\bm{x}^*$.
* 95% prediction intervals assuming normal errors:
\centerline{$\hat{y}^* \pm 1.96 \sqrt{\text{Var}(y^*| \bm{X},\bm{x}^*)}.$}
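
## Multiple regression forecasts

A sketch of these formulas for a hypothetical predictor row `xstar` (the values are illustrative). Note that `predict()` uses a $t$ quantile rather than 1.96, so its interval is slightly wider for small $T$.

```{r forecast-sketch, echo=TRUE, eval=FALSE}
xstar <- c(1, 0.5, -1)  # leading 1 for the intercept
yhat <- drop(xstar %*% beta_hat)
v <- sigma2_hat * (1 + drop(xstar %*% solve(crossprod(X)) %*% xstar))
yhat + c(-1, 1) * 1.96 * sqrt(v)  # approximate 95% prediction interval
predict(fit, newdata = data.frame(x1 = 0.5, x2 = -1),
        interval = "prediction")  # t-based analogue
```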