---
title: "7. Time series regression models"
author: "7.9 Matrix formulation"
date: "OTexts.org/fpp3/"
classoption: aspectratio=169
titlepage: fpp3title.png
titlecolor: fpp3red
toc: false
output:
binb::monash:
colortheme: monashwhite
fig_width: 7.5
fig_height: 3
keep_tex: no
includes:
in_header: fpp3header.tex
---
```{r setup, include=FALSE}
source("setup.R")
```
## Matrix formulation
\begin{block}{}
\centerline{$y_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \cdots + \beta_kx_{k,t} + \varepsilon_t.$}
\end{block}
\pause
Let $\bm{y} = (y_1,\dots,y_T)'$, $\bm{\varepsilon} = (\varepsilon_1,\dots,\varepsilon_T)'$, $\bm{\beta} = (\beta_0,\beta_1,\dots,\beta_k)'$ and
\[
\bm{X} = \begin{bmatrix}
1 & x_{1,1} & x_{2,1} & \dots & x_{k,1}\\
1 & x_{1,2} & x_{2,2} & \dots & x_{k,2}\\
\vdots & \vdots & \vdots & & \vdots\\
1 & x_{1,T} & x_{2,T} & \dots & x_{k,T}
\end{bmatrix}.
\]\pause
Then
###
\centerline{$\bm{y} = \bm{X}\bm{\beta} + \bm{\varepsilon}.$}
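
## Matrix formulation

A minimal sketch of this set-up in R: `model.matrix()` builds $\bm{X}$ (including the intercept column) from a model formula. The data frame `df` and the predictors `x1`, `x2` are illustrative names, not from the book.

```{r matrix-sketch, echo=TRUE, eval=FALSE}
# Hypothetical data: T = 20 observations, k = 2 predictors
df <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))
y <- df$y
X <- model.matrix(y ~ x1 + x2, data = df)  # column of 1s, then x1, x2
```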
## Matrix formulation
**Least squares estimation**
Minimize: $(\bm{y} - \bm{X}\bm{\beta})'(\bm{y} - \bm{X}\bm{\beta})$\pause
Differentiating with respect to $\bm{\beta}$ gives
\begin{block}{}
\centerline{$\hat{\bm{\beta}} = (\bm{X}'\bm{X})^{-1}\bm{X}'\bm{y}$}
\end{block}
\pause
(The "normal equation".)\pause
\[
\hat{\sigma}^2 = \frac{1}{T-k-1}(\bm{y} - \bm{X}\hat{\bm{\beta}})'(\bm{y} - \bm{X}\hat{\bm{\beta}})
\]
\structure{Note:} If you fall for the dummy variable trap, $(\bm{X}'\bm{X})$ is a singular matrix.
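
## Matrix formulation

Continuing the sketch above, the closed-form estimates translate directly into R. Using `solve(crossprod(X), ...)` solves the normal equations without forming $(\bm{X}'\bm{X})^{-1}$ explicitly, which is numerically preferable.

```{r ols-sketch, echo=TRUE, eval=FALSE}
beta_hat <- solve(crossprod(X), crossprod(X, y))  # (X'X)^{-1} X'y
e <- y - X %*% beta_hat
sigma2_hat <- sum(e^2) / (nrow(X) - ncol(X))      # divide by T - k - 1
# Agrees with lm() up to numerical error
cbind(beta_hat, coef(lm(y ~ x1 + x2, data = df)))
```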
## Likelihood
If the errors are iid and normally distributed, then
\[
\bm{y} \sim \text{N}(\bm{X}\bm{\beta},\sigma^2\bm{I}).
\]\pause
So the likelihood is
\[
L = \frac{1}{\sigma^T(2\pi)^{T/2}}\exp\left(-\frac1{2\sigma^2}(\bm{y}-\bm{X}\bm{\beta})'(\bm{y}-\bm{X}\bm{\beta})\right)
\]\pause
which is maximized when $(\bm{y}-\bm{X}\bm{\beta})'(\bm{y}-\bm{X}\bm{\beta})$ is minimized.\pause
\centerline{\alert{So \textbf{MLE = OLS}.}}
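
## Likelihood

One way to check MLE = OLS numerically: profile out $\sigma^2$ (so $\hat\sigma^2 = \text{RSS}/T$) and minimize the resulting negative log-likelihood over $\bm{\beta}$ with `optim()`. A sketch, reusing `X`, `y` and `beta_hat` from earlier:

```{r mle-sketch, echo=TRUE, eval=FALSE}
negloglik <- function(beta) {
  rss <- sum((y - X %*% beta)^2)
  # Profiled likelihood: up to constants, -log L = (T/2) log(RSS)
  nrow(X) / 2 * log(rss)
}
optim(rep(0, ncol(X)), negloglik)$par  # approximately equal to beta_hat
```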
## Cross-validation
\begin{block}{Fitted values}\vspace*{-0.8cm}
\begin{align*}
\hat{\bm{y}} &= \bm{X}\hat{\bm{\beta}} \\
& = \bm{X}(\bm{X}'\bm{X})^{-1}\bm{X}'\bm{y}\\
&= \bm{H}\bm{y}\qquad\qquad \text{where $\bm{H} = \bm{X}(\bm{X}'\bm{X})^{-1}\bm{X}'$.}\\[-0.9cm]
\end{align*}
\end{block}\pause\vspace*{0.3cm}
\begin{block}{LOO cross-validation MSE}
\centerline{$\text{CV} = \displaystyle\frac1T \sum_{t=1}^T [e_t/(1-h_t)]^2$}
\begin{itemize}\tightlist
\item $e_t=$ residual at time $t$ (from fitting the model to all the data)
\item $h_1,\dots,h_T$ are the diagonals of $\bm{H}$.
\end{itemize}
\end{block}
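
## Cross-validation

This shortcut is available directly from a fitted `lm` object: `residuals()` gives $e_t$ and `hatvalues()` gives $h_t$, so LOO CV needs no refitting. A sketch using the illustrative `df` from earlier:

```{r cv-sketch, echo=TRUE, eval=FALSE}
fit <- lm(y ~ x1 + x2, data = df)
e <- residuals(fit)
h <- hatvalues(fit)          # diagonal of the hat matrix H
CV <- mean((e / (1 - h))^2)  # equals refitting T times, dropping one point
```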
## Multiple regression forecasts
\begin{block}{Optimal forecasts}\vspace*{-0.2cm}
\[
\hat{y}^* =
\text{E}(y^* | \bm{y},\bm{X},\bm{x}^*) =
\bm{x}^*\hat{\bm{\beta}} = \bm{x}^*(\bm{X}'\bm{X})^{-1}\bm{X}'\bm{y}
\]
\end{block}\vspace*{-0.2cm}
where $\bm{x}^*$ is a row vector containing the values of the predictors for the forecast, with a leading 1 for the intercept (in the same format as a row of $\bm{X}$).
\pause
\begin{block}{Forecast variance}\vspace*{-0.2cm}
\[
\text{Var}(y^* | \bm{X},\bm{x}^*) = \sigma^2 \left[1 + \bm{x}^* (\bm{X}'\bm{X})^{-1} (\bm{x}^*)'\right]
\]
\end{block}
\vspace*{-0.2cm}\pause
* This ignores any errors in $\bm{x}^*$.
* 95% prediction intervals assuming normal errors:
\centerline{$\hat{y}^* \pm 1.96 \sqrt{\text{Var}(y^*| \bm{X},\bm{x}^*)}.$}
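
## Multiple regression forecasts

A sketch of these formulas for a hypothetical predictor row `xstar` (the values are illustrative). Note that `predict()` uses a $t$ quantile rather than 1.96, so its interval is slightly wider for small $T$.

```{r forecast-sketch, echo=TRUE, eval=FALSE}
xstar <- c(1, 0.5, -1)  # leading 1 for the intercept
yhat <- drop(xstar %*% beta_hat)
v <- sigma2_hat * (1 + drop(xstar %*% solve(crossprod(X)) %*% xstar))
yhat + c(-1, 1) * 1.96 * sqrt(v)  # approximate 95% prediction interval
predict(fit, newdata = data.frame(x1 = 0.5, x2 = -1),
        interval = "prediction")  # t-based analogue
```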