Notes from Stat 849 at UW-Madison: http://www.stat.wisc.edu/courses/st849-bates/lectures/

## Chapter 1: The Gaussian Linear Model
Responses: a vector-valued random variable, $\mathcal{Y}$. The observed values of the responses are represented by the vector $y$.

Predictors: $X\beta$, where the $n \times p$ matrix $X$ is the *model matrix*, $n$ is the number of observations, and $p$ is the dimension of the *coefficient vector*, $\beta$. The coefficients are *parameters* in the model. We form *estimates* $\hat{\beta}$ of these parameters from the observed data. We assume that $n \geq p$.

$$\mathcal{Y} \sim \text{Multivariate Gaussian}(X\beta_T, \sigma^2I_n)$$

where $\beta_T$ is the "true", but unknown, value of the coefficient vector. The probability density of $\mathcal{Y}$, also called a *spherical normal density*, is $$f_{\mathcal{Y}}(\mathbf{y}) = \frac{1}{(2\pi\sigma^{2})^{n/2}} \exp\left(\frac{-\parallel\mathbf{y} - \mathbf{X}\beta_{T}\parallel^{2}}{2\sigma^2}\right)$$

The likelihood: $$L(\beta,\sigma | \mathbf{y}) = \frac{1}{(2\pi\sigma^{2})^{n/2}} \exp\left(\frac{-\parallel\mathbf{y} - \mathbf{X}\beta\parallel^{2}}{2\sigma^2}\right)$$

The *maximum likelihood estimates (mles)* of the parameters are the values of the parameters ($\hat{\beta}, \hat{\sigma}$) that maximize the likelihood.

The log-likelihood: $$\ell(\beta,\sigma|\mathbf{y})=\log(L(\beta,\sigma|\mathbf{y}))=-\frac{n}{2}\log(2\pi\sigma^{2})-\frac{\parallel\mathbf{y}-\mathbf{X}\beta\parallel^{2}}{2\sigma^{2}}$$

The *deviance*: $$d(\beta,\sigma|\mathbf{y}) = -2\ell(\beta,\sigma|\mathbf{y}) = n\log(2\pi\sigma^{2}) + \frac{\parallel\mathbf{y}-\mathbf{X}\beta\parallel^{2}}{\sigma^{2}}$$

Because the deviance is $-2$ times the log-likelihood, the mles are the values that minimize the deviance. For any fixed value of $\sigma^2$, the deviance is minimized with respect to $\beta$ when the residual sum of squares, $$S(\beta | \mathbf{y}) = \parallel\mathbf{y} - \mathbf{X}\beta\parallel^{2},$$
is minimized. Thus the mle of the coefficient vector, $\hat{\beta}$, in the Gaussian linear model is the *least squares estimate* $$\hat{\beta}=\arg\,\min_{\beta}\,\parallel\mathbf{y}-\mathbf{X}\beta\parallel^2.$$
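
As a numerical check of this equivalence, the sketch below evaluates the deviance on R's built-in `Formaldehyde` data and confirms that a general-purpose optimizer applied to $S(\beta|\mathbf{y})$ lands on the same coefficients as `lm()`. The helper names `rss` and `deviance_fn`, the starting point, and the fixed value of $\sigma$ are illustrative choices, not part of the course notes.

```r
# Residual sum of squares S(beta | y) for a model matrix X and response y
rss <- function(beta, X, y) sum((y - X %*% beta)^2)

# Deviance d(beta, sigma | y) = n log(2 pi sigma^2) + S(beta | y) / sigma^2
deviance_fn <- function(beta, sigma, X, y) {
  n <- length(y)
  n * log(2 * pi * sigma^2) + rss(beta, X, y) / sigma^2
}

X <- cbind(1, Formaldehyde$carb)
y <- Formaldehyde$optden

# Minimize S(beta | y) numerically, starting from zero (arbitrary start)
opt <- optim(c(0, 0), rss, X = X, y = y, method = "BFGS")
beta_ls <- unname(coef(lm(optden ~ 1 + carb, Formaldehyde)))

# The two estimates agree up to optimizer tolerance
all.equal(opt$par, beta_ls, tolerance = 1e-3)

# For a fixed sigma, the deviance at beta_ls is no larger than at a perturbed beta
deviance_fn(beta_ls, 0.01, X, y) <= deviance_fn(beta_ls + c(0.01, 0), 0.01, X, y)
```

The optimizer is used here only to illustrate the argument; the sections that follow show the direct linear-algebra route.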


### Linear algebra of least squares
An *orthogonal* $n \times n$ matrix, $Q$, has the property $Q'Q=QQ'=I_{n}$. An orthogonal matrix has the special property that it **preserves lengths**:
$$\parallel Qx\parallel^{2}=(Qx)'Qx=x'Q'Qx=x'x=\parallel x\parallel^{2}$$
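
The identity can be checked numerically. The sketch below builds an orthogonal matrix as the $Q$ factor of a random square matrix (assumed to be full rank, which holds with probability one for Gaussian entries) and compares the squared lengths of $x$ and $Qx$. The seed and dimension are arbitrary.

```r
set.seed(1)
n <- 5

# The Q factor of a random n x n matrix is an orthogonal matrix
Q <- qr.Q(qr(matrix(rnorm(n * n), n, n)))
x <- rnorm(n)

# Q'Q = I_n up to rounding error
zapsmall(crossprod(Q))

# Squared lengths of x and Qx agree
c(sum(x^2), sum((Q %*% x)^2))
all.equal(sum(x^2), sum((Q %*% x)^2))
```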

#### The QR decomposition
**Any** $n\times p$ matrix $X$ (with $n \geq p$) has a QR decomposition consisting of
an orthogonal $n\times n$ matrix $Q$ and a $p\times p$ matrix $R$
that is zero below the main diagonal (in other words, it is *upper
triangular*). The QR decomposition of the model matrix $X$ is written
$$X=Q\begin{bmatrix}R\\
0
\end{bmatrix}=\begin{bmatrix}Q_{1} & Q_{2}\end{bmatrix}\begin{bmatrix}R\\
0
\end{bmatrix}=Q_{1}R$$

where $Q_{1}$ is the first $p$ columns of $Q$ and $Q_{2}$ is the last $n-p$ columns of $Q$.

If the diagonal elements of $R$ are all non-zero (in practice this means that none of them are very small in absolute value) then $X$ has *full column rank* and the columns of $Q_1$ form an orthonormal basis for the column space of $X$ [col($X$)]. If the rank of $X$ is $k < p$ (rank-deficient), using a $p \times p$ permutation matrix $P$, we can make the first $k$ columns of $Q$ form an orthonormal basis for col($XP$).
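
A small rank-deficient example may make the role of the permutation concrete. In this sketch the matrix `X` is made up for illustration: its second column is a multiple of the first, so its rank is 2. R's default `qr()` (the LINPACK-based routine with limited column pivoting) reports the numerical rank in `$rank` and the column order of $XP$ in `$pivot`.

```r
# Hypothetical 4 x 3 matrix: column "dup" is twice column "int", so rank is 2
X <- cbind(int = 1, dup = 2, x = 1:4)

qrX <- qr(X)
qrX$rank    # numerical rank k
qrX$pivot   # column order of XP; the dependent column is moved to the end

# Magnitudes of the diagonal of R: the trailing entry is negligible
abs(diag(qr.R(qrX)))

# The first k columns of Q span col(X): projecting X onto them returns X
Qk <- qr.Q(qrX)[, seq_len(qrX$rank), drop = FALSE]
all.equal(Qk %*% crossprod(Qk, X), X, check.attributes = FALSE)
```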

$$\begin{array}{cl}
\hat{\beta} & = \arg\,\min_{\beta}\,\parallel\mathbf{y}-\mathbf{X}\beta\parallel^2\\
 & = \arg\,\min_{\beta}\,\parallel Q' (\mathbf{y}-\mathbf{X}\beta)\parallel^2\\
 & = \arg\,\min_{\beta}\,\parallel Q'\mathbf{y}-Q'\mathbf{X}\beta\parallel^2\\
 & = \arg\,\min_{\beta}\,\left\Vert Q'\mathbf{y}-Q'Q\begin{bmatrix}R\\ 0\end{bmatrix}\beta\right\Vert^2\\
 & = \arg\,\min_{\beta}\,\parallel Q_1'\mathbf{y}-R\beta\parallel^2 + \parallel Q_2'\mathbf{y}\parallel^2
\end{array} $$
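
The last line of the derivation can be verified numerically. The sketch below, using the built-in `Formaldehyde` data, checks that $S(\beta|\mathbf{y}) = \parallel Q_1'\mathbf{y}-R\beta\parallel^2 + \parallel Q_2'\mathbf{y}\parallel^2$ holds for an arbitrary trial $\beta$, and that at $\hat{\beta}$ the residual sum of squares reduces to $\parallel Q_2'\mathbf{y}\parallel^2$. The trial value of `beta` and the variable names are illustrative.

```r
lm1 <- lm(optden ~ 1 + carb, Formaldehyde)
X <- model.matrix(lm1)
y <- model.response(model.frame(lm1))
p <- ncol(X)

qrX <- lm1$qr
R <- qr.R(qrX)
Qty <- qr.qty(qrX, y)     # Q'y, a vector of length n
c1 <- Qty[seq_len(p)]     # Q1'y, the first p elements
c2 <- Qty[-seq_len(p)]    # Q2'y, the remaining n - p elements

beta <- c(0.1, 0.8)       # an arbitrary trial value, not the estimate
lhs <- sum((y - X %*% beta)^2)
rhs <- sum((c1 - R %*% beta)^2) + sum(c2^2)
all.equal(lhs, rhs)

# At the least squares estimate the first term vanishes
all.equal(sum(resid(lm1)^2), sum(c2^2))
```

Since the second term does not involve $\beta$, only the first term matters for the minimization.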

If rank($X$) = $p$ then rank($R$) = $p$ and $R^{-1}$ exists, so we can write $\hat{\beta} = R^{-1}Q_1'\mathbf{y}$. In practice you do not actually calculate $R^{-1}$; because $R$ is upper triangular, the system $R\hat{\beta} = Q_1'\mathbf{y}$ is solved by back substitution.
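
One way to carry out this solve in R, sketched here on the built-in `Formaldehyde` data, is `backsolve()`, which solves an upper triangular system without forming the inverse. `qr.coef()` does the equivalent computation directly from the compact QR object. The variable names are illustrative.

```r
lm1 <- lm(optden ~ 1 + carb, Formaldehyde)
y <- model.response(model.frame(lm1))
qrX <- lm1$qr
p <- qrX$rank

R <- qr.R(qrX)
c1 <- qr.qty(qrX, y)[seq_len(p)]   # Q1'y

# Solve the triangular system R beta = Q1'y by back substitution
beta_hat <- backsolve(R, c1)

# All three routes give the same coefficients
rbind(backsolve = beta_hat, qr.coef = qr.coef(qrX, y), lm = coef(lm1))
```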

In a model fit by the `lm()` or `aov()` functions in `R` there is a component `$effects` which is $Q'\mathbf{y}$. The component `$qr` is a condensed form of the $QR$ decomposition of the model matrix $X$. The matrix $R$ is embedded in there but the matrix $Q$ is a virtual matrix represented as a product of Householder reflections and not usually evaluated/created explicitly.

In [4]:
data(Formaldehyde)
Formaldehyde

Unnamed: 0,carb,optden
1,0.1,0.086
2,0.3,0.269
3,0.5,0.446
4,0.6,0.538
5,0.7,0.626
6,0.9,0.782


In [5]:
(X <- model.matrix(lm1 <- lm(optden ~ 1 + carb, Formaldehyde)))
# model.matrix returns X

Unnamed: 0,(Intercept),carb
1,1.0,0.1
2,1.0,0.3
3,1.0,0.5
4,1.0,0.6
5,1.0,0.7
6,1.0,0.9


In [23]:
# model.frame returns the data frame of variables used in the fit;
# model.response extracts the response vector y from it
(y <- model.response(model.frame(lm1)))

In [6]:
class(qrlm1 <- lm1$qr)

In [7]:
(R <- qr.R(qrlm1))

Unnamed: 0,(Intercept),carb
1,-2.44949,-1.26557
2,0.0,0.6390097


In [8]:
(Q1 <- qr.Q(qrlm1))

0,1
-0.4082483,-0.6520507
-0.4082483,-0.3390663
-0.40824829,-0.02608203
-0.4082483,0.1304101
-0.4082483,0.2869023
-0.4082483,0.5998866


In [13]:
(Q1R <- Q1 %*% R) 

(Intercept),carb
1.0,0.1
1.0,0.3
1.0,0.5
1.0,0.6
1.0,0.7
1.0,0.9


In [15]:
all.equal(X, Q1R, check.attributes = FALSE) # Q1 %*% R should reconstruct the model matrix X

In [19]:
(Q <- qr.Q(qrlm1, complete=TRUE)) # produce the full n*n orthogonal matrix Q

0,1,2,3,4,5
-0.4082483,-0.6520507,-0.3737045,-0.340529,-0.3073534,-0.2410023
-0.40824829,-0.33906635,0.05460995,0.22071963,0.38682932,0.71904869
-0.40824829,-0.02608203,0.86857638,-0.14397908,-0.15653455,-0.18164549
-0.4082483,0.1304101,-0.1535966,0.8125532,-0.2212971,-0.2889976
-0.4082483,0.2869023,-0.1757696,-0.2309146,0.7139404,-0.3963496
-0.4082483,0.5998866,-0.2201156,-0.3178501,-0.4155847,0.3889463


In [28]:
as.vector(lm1$effects)

In [27]:
as.vector(crossprod(Q, y)) # crossprod(A, B) creates A'B directly without creating A' from A
# crossprod(X) creates X'X
# tcrossprod(X) creates XX'

In [30]:
as.vector(qr.qty(qrlm1, y)) # another way to produce Q'y

In [34]:
zapsmall(crossprod(Q1)) # Q1'Q1 = I

0,1
1,0
0,1


In [33]:
zapsmall(crossprod(Q)) # Q'Q = I

0,1,2,3,4,5
1,0,0,0,0,0
0,1,0,0,0,0
0,0,1,0,0,0
0,0,0,1,0,0
0,0,0,0,1,0
0,0,0,0,0,1
