# THE HAT MATRIX
<br>


## Introduction

<br>
In the previous notebook (Ordinary Least Squares), we derived and solved the OLS normal equations; we also went through some of the possible formulations for our estimators. 

<br>
In this notebook we will explore one more formulation, as we will express the predicted values of the response variable in terms of a particular matrix, called the hat matrix.


## The Hat Matrix

<br>
As usual, we will start from equations and formulations we already know :

<br>
<blockquote>
$
    \mathbf{Y} \ = \ \mathbf{X} \ \boldsymbol{\beta} + \boldsymbol{\varepsilon} \\
    \hat{\mathbf{Y}} \ = \ \mathbf{X}\ \hat{\boldsymbol{\beta}}_\boldsymbol{OLS} \\
    \mathbf{e} \ = \ \mathbf{Y} - \hat{\mathbf{Y}}    
$
</blockquote>

<br>
<blockquote>
$
    \hat{\boldsymbol{\beta}}_\boldsymbol{OLS}
    \ = \ (\mathbf{X}^{\top}\mathbf{X})^{-1} \mathbf{X}^{\top}\mathbf{Y}
    \ = \ \boldsymbol{\beta} + (\mathbf{X}^{\top}\mathbf{X})^{-1} \mathbf{X}^{\top}\boldsymbol{\varepsilon}
$
</blockquote>

<br>
Using the last equation, it's possible to express the predicted variable $\hat{\mathbf{Y}}$ and the residuals $\mathbf{e}$ in terms of the original response variable $\mathbf{Y}$ : 

<br>
$
    \quad
    \begin{align}
        \hat{\mathbf{Y}} 
        &= \mathbf{X} \ \hat{\boldsymbol{\beta}}_\boldsymbol{OLS} 
        \newline
        &= \mathbf{X} 
          \big[ \ (\mathbf{X}^{\top}\mathbf{X})^{-1} \ \mathbf{X}^{\top}\mathbf{Y} \ \big] 
        \newline
        &= \big[ \ \mathbf{X} \ (\mathbf{X}^{\top}\mathbf{X})^{-1} \ \mathbf{X}^{\top} \ \big] \mathbf{Y}
        \newline
        &=  \mathbf{H} \ \mathbf{Y}        
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad 
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad 
            & [\textbf{E1}] 
    \end{align}
$

<br>
$
    \quad
    \begin{align}
        \mathbf{e}
        &= \mathbf{Y}  - \hat{\mathbf{Y}} 
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad  
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \quad
            & \text{by } \textbf{E1} 
        \newline
        &= \mathbf{Y}  - \mathbf{H} \ \mathbf{Y} 
        \newline
        &= (\mathbf{I}  - \mathbf{H}) \ \mathbf{Y} 
        \newline
        &=  \mathbf{M} \ \mathbf{Y} 
            & [\textbf{E2}] 
        \newline \newline
        & \text{where } \quad \mathbf{H} = \big[ \ \mathbf{X} \ (\mathbf{X}^{\top}\mathbf{X})^{-1} \ \mathbf{X}^{\top} \ \big]
        \quad \text{and} \quad \mathbf{M} = (\mathbf{I} - \mathbf{H})
    \end{align}
$

<br>
The <b>hat matrix</b> $\mathbf{H}$, sometimes also called the <b>projection matrix</b> $\mathbf{P}$, maps the vector of response values to the vector of fitted values. 

<br>
Equivalently, it describes the influence each response value $\boldsymbol{\mathbf{Y}_i}$ has on each fitted value $\hat{\mathbf{Y}}_\boldsymbol{j}$; the diagonal elements of the hat matrix are called <b>leverages</b>, and describe the influence each response value has on the fitted value for that same observation.

<br>
Another matrix we will see in the course of this notebook is the <b>residual maker</b> matrix $\mathbf{M} = ( \mathbf{I} - \mathbf{H})$, sometimes called the annihilator matrix.
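
<br>
As a quick numerical check of <b>E1</b> and <b>E2</b>, the sketch below builds $\mathbf{H}$ and $\mathbf{M}$ with NumPy for a small simulated design matrix. The data, the seed and the variable names are illustrative assumptions, not part of the derivation; `np.linalg.lstsq` is used only as an independent reference for the OLS fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: m = 8 observations, p = 3 columns (intercept + 2 regressors)
m, p = 8, 3
X = np.column_stack([np.ones(m), rng.normal(size=(m, p - 1))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=m)

# Hat matrix H = X (X'X)^{-1} X' and residual maker M = I - H
H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(m) - H

# E1: fitted values, E2: residuals
Y_hat = H @ Y
e = M @ Y

# Independent reference: OLS coefficients from a least-squares solver
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(Y_hat, X @ beta_hat))   # do the fitted values agree?
print(np.allclose(e, Y - X @ beta_hat))   # do the residuals agree?
```

The explicit inverse keeps the code close to the formula; for larger or ill-conditioned problems a solver-based approach is usually preferable to forming $(\mathbf{X}^{\top}\mathbf{X})^{-1}$ directly.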


## Shape of the Hat Matrix

<br>
Before introducing the properties of this matrix, it is useful to establish its shape; here $\mathbf{X}$ has $m$ rows (observations) and $p$ columns (regressors) :

<br>
$
    \quad
    \begin{align}
        \mathbf{X} _\textit{ m x p } \
        (\mathbf{X}^{\top} _\textit{ p x m } \mathbf{X} _\textit{ m x p })^{-1} \
        \mathbf{X}^{\top} _\textit{ p x m}
        \newline
        &=  \mathbf{X} _\textit{ m x p } \
            (\mathbf{X}^{\top} \mathbf{X})^{-1} _\textit{ p x p } \
            \mathbf{X}^{\top} _\textit{ p x m }
        \newline
        &=  \big[ \ \mathbf{X} \ (\mathbf{X}^{\top} \mathbf{X})^{-1} \ \big] _\textit{ m x p } \
            \mathbf{X}^{\top} _\textit{ p x m }
        \newline
        &=  \mathbf{H} _\textit{ m x m }
    \end{align}
$

<br>
Therefore, $\mathbf{H}$ is an $m \times m$ matrix, and so is $\mathbf{M}$ .


## Properties of the Hat Matrix

<br>
The hat matrix has a number of useful algebraic properties; those discussed in the current notebook are summarized below : 

<br>
<ul style="list-style-type:square">
    <li>
        $\mathbf{H}$ is symmetric, and so is $\mathbf{M} = (\mathbf{I} - \mathbf{H})$
    </li>
    <br>
    <li>
        $\mathbf{H}$ is idempotent, and so is $\mathbf{M}$
    </li>
    <br>
    <li>
        $\mathbf{X}$ is invariant under $\mathbf{H}$ : $\mathbf{H} \mathbf{X} = \mathbf{X}$, hence $\mathbf{M} \mathbf{X} = 0$ 
    </li>
    <br>
    <li>
        $\boldsymbol{e} = \mathbf{M} \ \boldsymbol{\varepsilon}$ 
    </li>
    <br>
    <li>
        $\mathrm{V}(\mathbf{e}) = \boldsymbol{\sigma^2} \mathbf{M}$
    </li>
</ul>


## [P1] Symmetry

<br>
Recall that in linear algebra a symmetric matrix is a square matrix that is equal to its transpose : $(\star = \star^{\top})$ .
Following the formal definition, the first step is to prove that the matrix $\mathbf{H}$ is equal to its transpose :

<br>
$
    \quad
    \begin{align}
    \mathbf{H}^{\top} 
    &= 
        & \text{by definition}
    \newline
    &= \big[ \mathbf{X} \ (\mathbf{X}^{\top}\mathbf{X})^{-1} \ \mathbf{X}^{\top} \big] ^{\top} 
    \newline
    &= \big[ \mathbf{X}^{\top} \big] ^{\top} \big[ \mathbf{X} \ (\mathbf{X}^{\top}\mathbf{X})^{-1} \big] ^{\top} 
    \newline
    &= 
        \mathbf{X} \big[ (\mathbf{X}^{\top}\mathbf{X})^{-1} \big] ^{\top} \mathbf{X}^{\top}
        \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad         
        & (\mathbf{X}^{\top}\mathbf{X}) \text{ is symmetric, therefore so is } (\mathbf{X}^{\top}\mathbf{X})^{-1}
    \newline
    &= 
        \mathbf{X} \ (\mathbf{X}^{\top}\mathbf{X})^{-1} \ \mathbf{X}^{\top}
        & \text{by definition}
    \newline
    &= \mathbf{H}
       & [\textbf{P1}] 
    \end{align}
$

<br>
Since the hat matrix is symmetric, the matrix $\mathbf{M} = ( \mathbf{I} - \mathbf{H})$ is symmetric as well : transposition distributes over the difference, so $\mathbf{M}^{\top} = (\mathbf{I} - \mathbf{H})^{\top} = \mathbf{I}^{\top} - \mathbf{H}^{\top} = \mathbf{I} - \mathbf{H} = \mathbf{M}$. Intuitively, the subtraction only changes the diagonal elements of $\mathbf{H}$ (from $h_{ii}$ to $1 - h_{ii}$), while the off-diagonal elements simply change sign.
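
<br>
The symmetry argument can be checked numerically. This is a minimal sketch on an arbitrary simulated design matrix (sizes and seed are assumptions chosen for illustration); it also checks the intermediate claim that $(\mathbf{X}^{\top}\mathbf{X})^{-1}$ is symmetric.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative 6 x 3 design matrix (intercept + 2 regressors)
X = np.column_stack([np.ones(6), rng.normal(size=(6, 2))])

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T
M = np.eye(6) - H

print(np.allclose(XtX_inv, XtX_inv.T))  # is (X'X)^{-1} symmetric?
print(np.allclose(H, H.T))              # P1 for H
print(np.allclose(M, M.T))              # P1 for M
```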


## [P2] Idempotency

<br>
By definition, an idempotent matrix is a matrix which, when multiplied by itself, yields itself $(\star \star = \star)$ . For this product $\star \star$ to be defined, $\star$ must necessarily be a square matrix.

<br>
$
    \quad
    \begin{align}
        \mathbf{H} \ \mathbf{H}
        &= 
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad 
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad 
            & \text{by definition}
        \newline
        &= 
            \big[ \mathbf{X} \ (\mathbf{X}^{\top}\mathbf{X})^{-1} \ \mathbf{X}^{\top} \big]
            \big[ \mathbf{X} \ (\mathbf{X}^{\top}\mathbf{X})^{-1} \ \mathbf{X}^{\top} \big]
        \newline
        &= 
            \mathbf{X} 
            \ (\mathbf{X}^{\top}\mathbf{X})^{-1} 
            \ (\mathbf{X}^{\top} \mathbf{X})
            \ (\mathbf{X}^{\top}\mathbf{X})^{-1} \ \mathbf{X}^{\top}
        \newline
        &= 
            \mathbf{X} 
            \ (\mathbf{X}^{\top}\mathbf{X})^{-1} 
            \ \mathbf{X}^{\top}
            & \text{by definition}
        \newline
        &=  \mathbf{H}
            & [\textbf{P2-A}] 
    \end{align}
$

<br>
$
    \quad
    \begin{align}
        \mathbf{M} \ \mathbf{M}
        &= 
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad 
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad 
            & \text{by definition}
        \newline
        &=
            (\mathbf{I} - \mathbf{H}) (\mathbf{I} - \mathbf{H})
        \newline
        &=
            \mathbf{I} \ (\mathbf{I} - \mathbf{H}) - \mathbf{H} (\mathbf{I} - \mathbf{H})
        \newline
        &=
            \mathbf{I} - \mathbf{H} - \mathbf{H} + \mathbf{H}\mathbf{H}
            & \text{by } \textbf{P2-A}
        \newline
        &=
            \mathbf{I} - 2\mathbf{H} + \mathbf{H}
        \newline
        &=
            \mathbf{I} - \mathbf{H}
            & \text{by definition} 
        \newline
        &= \mathbf{M}
           & [\textbf{P2-B}] 
    \end{align}
$
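
<br>
A small numerical illustration of <b>P2-A</b> and <b>P2-B</b>, again on simulated data chosen only for the example: applying $\mathbf{H}$ (or $\mathbf{M}$) a second time leaves the result unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative design matrix and response vector
X = np.column_stack([np.ones(6), rng.normal(size=(6, 2))])
Y = rng.normal(size=6)

H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(6) - H

print(np.allclose(H @ H, H))            # P2-A
print(np.allclose(M @ M, M))            # P2-B
print(np.allclose(H @ (H @ Y), H @ Y))  # projecting the fitted values again changes nothing
```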

## [P3] Invariance and Annihilation 

<br>
$
    \quad
    \begin{align}
        \mathbf{H} \ \mathbf{X}
        &=
        \newline
        &= \big[ \ \mathbf{X} (\mathbf{X}^{\top}\mathbf{X})^{-1} \mathbf{X}^{\top} \ \big] \mathbf{X}
        \newline
        &= \mathbf{X} \ (\mathbf{X}^{\top}\mathbf{X})^{-1} \ (\mathbf{X}^{\top} \mathbf{X})
        \newline
        &=  \mathbf{X}
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad 
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad
            & [\textbf{P3-A}] 
    \end{align}
$

<br>
$
    \quad
    \begin{align}
        \mathbf{M} \ \mathbf{X}
        &=
        \newline
        &= \big[ \mathbf{I} - \mathbf{X} (\mathbf{X}^{\top}\mathbf{X})^{-1} \mathbf{X}^{\top} \ \big] \mathbf{X}
        \newline
        &= \mathbf{X} - \mathbf{X} 
        \newline
        &= 0
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad 
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad
            & [\textbf{P3-B}] 
    \end{align}
$
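
<br>
The two results can be verified numerically; the sketch below uses an arbitrary simulated design matrix (an assumption made for illustration) and compares $\mathbf{M}\mathbf{X}$ with zero up to floating-point tolerance.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative 7 x 3 design matrix
X = np.column_stack([np.ones(7), rng.normal(size=(7, 2))])

H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(7) - H

print(np.allclose(H @ X, X))     # P3-A: X is left unchanged by H
print(np.allclose(M @ X, 0.0))   # P3-B: X is annihilated by M

# Any vector in the column space of X behaves the same way
v = X @ np.array([2.0, -1.0, 0.5])
print(np.allclose(H @ v, v), np.allclose(M @ v, 0.0))
```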

## [P4] Residuals as estimators of the Error term

<br>
This property follows directly from <b>E2</b> :  

<br>
$
    \quad
    \begin{align}
        \mathbf{e}
        &= 
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad 
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad 
            & \text{by } \textbf{E2}
        \newline
        &= \mathbf{M} \ \mathbf{Y}
        \newline
        &= \mathbf{M} \ (\mathbf{X} \ \boldsymbol{\beta} + \boldsymbol{\varepsilon})
        \newline
        &=  \mathbf{M} \ \mathbf{X} \ \boldsymbol{\beta} + \mathbf{M} \ \boldsymbol{\varepsilon}
            & \text{by annihilation (} \textbf{P3-B} \text{)}
        \newline
        &=  
            \mathbf{M} \ \boldsymbol{\varepsilon}    
            & [\textbf{P4}] 
    \end{align}
$

<br>
Equivalently in matrix form : 

<br>
$
    \quad
    \begin{align}
        \mathbf{e}
        \quad &= \quad
        \begin{bmatrix}
            (1 - h_{11})    &  -h_{12}       &  \dots  & -h_{1m}       \\
            -h_{21}         &  (1 - h_{22})  &  \dots  & -h_{2m}       \\
            \vdots & \vdots & \dots  & \vdots \\
            \vdots & \vdots & \ddots & \vdots \\
            -h_{m1}         &  -h_{m2}  &  \dots  & (1 - h_{mm})
        \end{bmatrix}_\textit{ m x m}
        \quad
        \begin{bmatrix}
            \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \vdots \\ \varepsilon_m
        \end{bmatrix}_\textit{ m x 1}
        \quad = \quad
        \begin{bmatrix}
            (1 - h_{11}) \ \varepsilon_1 + \sum_{j \neq 1}^{m} (-h_{1j}) \ \varepsilon_j  \\
            (1 - h_{22}) \ \varepsilon_2 + \sum_{j \neq 2}^{m} (-h_{2j}) \ \varepsilon_j  \\
            \vdots \\
            \vdots \\
            (1 - h_{mm}) \ \varepsilon_m + \sum_{j \neq m}^{m} (-h_{mj}) \ \varepsilon_j  
        \end{bmatrix}_\textit{ m x 1}
        \quad = \quad
        \begin{bmatrix}
            e_1 \\ e_2 \\ \vdots \\ \vdots \\ e_m
        \end{bmatrix}_\textit{ m x 1} 
    \end{align}
$

<br>
Each residual $\boldsymbol{\mathbf{e}_i}$ is a linear combination of all the error terms $\boldsymbol{\varepsilon}$ ; if $(1 - h_{ii})$ is large relative to the off-diagonal weights $-h_{ij}$ with $j \neq i$ (that is, if $\mathbf{H}$ is "small" relative to $\mathbf{I}$), then most of the weight is concentrated on $\boldsymbol{\varepsilon_i}$ and the residual is a reasonable proxy for the corresponding error term (this is frequently not the case, though).
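
<br>
Because the error terms are unobservable in practice, <b>P4</b> can only be checked in a simulation where $\boldsymbol{\beta}$ and $\boldsymbol{\varepsilon}$ are known by construction. The sketch below does exactly that; the coefficients, sample size and seed are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

m = 10
X = np.column_stack([np.ones(m), rng.normal(size=(m, 2))])
beta = np.array([0.5, -1.0, 2.0])     # "true" coefficients, known only because we simulate
eps = rng.normal(scale=1.0, size=m)   # error terms, unobservable with real data
Y = X @ beta + eps

H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(m) - H

e = M @ Y
print(np.allclose(e, M @ eps))        # P4: the X beta part is annihilated

# Weight each residual puts on its own error term versus all the others
own_weight = np.diag(M)                                   # 1 - h_ii
other_weight = np.abs(M).sum(axis=1) - np.abs(own_weight)  # sum of |h_ij| for j != i
print(np.round(own_weight, 3))
print(np.round(other_weight, 3))
```

Comparing the two printed vectors gives a rough idea of how closely each residual tracks its own error term for this particular design.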


## [P5] Variance of the residuals

<br>
Before we start looking at the covariance matrix of the residuals, we are going to make some considerations about the variance of the response variable (in terms of both the single observation and the covariance matrix) : 

<br>
$
    \quad
    \begin{align}
        \mathrm{Var}(\boldsymbol{\mathbf{Y}_i}) 
        &=
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad
            \qquad \qquad \qquad \qquad\qquad \qquad \quad 
            & \text{by definition}
        \newline
        &= 
            \mathrm{Var} \big[ \boldsymbol{\mathbf{X}_i} \ \boldsymbol{\beta} + \boldsymbol{\varepsilon_i} \big]
        \newline
        &= 
            \mathrm{Var} \big[ \boldsymbol{\mathbf{X}_i} \ \boldsymbol{\beta} \big] 
            + \mathrm{Var} \big[ \boldsymbol{\varepsilon_i} \big]
            + 2 \ \mathrm{Cov} \big[ \boldsymbol{\mathbf{X}_i} \ \boldsymbol{\beta} , \boldsymbol{\varepsilon_i} \big]
            & \text{by strict exogeneity (} \textbf{A2} \text{) and } \mathrm{Var}(\boldsymbol{\beta}) = 0
        \newline
        &= 
            \mathrm{Var} ( \boldsymbol{\varepsilon_i} )
            & \text{by homoscedasticity (} \textbf{A3} \text{)}
        \newline
        &= \boldsymbol{\sigma^2}            
    \end{align}
$

and the covariance matrix takes the form : 

<br>
$
    \quad
    \begin{align}
        \mathrm{V}(\mathbf{Y})
        \quad &=
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad
            \qquad \qquad \qquad \qquad\qquad \qquad \qquad \quad
            & \text{by definition}
        \newline \newline
        &= \quad  
        \begin{bmatrix}
            \mathrm{Var}(y_1)        & \mathrm{Cov}(y_1, y_2) & \dots  & \mathrm{Cov}(y_1, y_m) \\
            \mathrm{Cov}(y_2, y_1)   & \mathrm{Var}(y_2)      & \dots  & \mathrm{Cov}(y_2, y_m) \\
            \vdots & \vdots & \dots  & \vdots \\
            \vdots & \vdots & \ddots & \vdots \\
            \mathrm{Cov}(y_m, y_1)   & \mathrm{Cov}(y_m, y_2) & \dots  & \mathrm{Var}(y_m)
        \end{bmatrix}_\textit{ m x m}
        \newline \newline
        &= \quad  
        \begin{bmatrix}
              \mathrm{Var}(\varepsilon_1) 
            & \mathrm{Cov}(\varepsilon_1, \varepsilon_2) 
            & \dots  & \mathrm{Cov}(\varepsilon_1, \varepsilon_m) 
            \\
              \mathrm{Cov}(\varepsilon_2, \varepsilon_1)   
            & \mathrm{Var}(\varepsilon_2)      
            & \dots  
            & \mathrm{Cov}(\varepsilon_2, \varepsilon_m) 
            \\
            \vdots & \vdots & \dots  & \vdots \\
            \vdots & \vdots & \ddots & \vdots \\
              \mathrm{Cov}(\varepsilon_m, \varepsilon_1)   
            & \mathrm{Cov}(\varepsilon_m, \varepsilon_2) 
            & \dots  
            & \mathrm{Var}(\varepsilon_m)
        \end{bmatrix}_\textit{ m x m}
            & \text{by spherical errors (} \textbf{A3 + A4} \text{)}
        \newline \newline
        &= \quad  
        \begin{bmatrix}
            \boldsymbol{\sigma^2}    &  0                        & \dots  & 0  \\
            0                        & \boldsymbol{\sigma^2}     & \dots  & 0  \\
            \vdots & \vdots & \dots  & \vdots \\
            \vdots & \vdots & \ddots & \vdots \\
            0                        & 0                         & \dots  & \boldsymbol{\sigma^2}
        \end{bmatrix}_\textit{ m x m}
        & [\textbf{P5-A}] 
    \end{align}
$

Finally, we can examine the structure of the covariance matrix of the residuals : 

<br>
$
    \quad
    \begin{align}
        \mathrm{V}(\mathbf{e})
        \quad &=
            \qquad \qquad \qquad \qquad \qquad \qquad \qquad
            \qquad \qquad \qquad \qquad\qquad \qquad \qquad \
            & \text{by } \textbf{E2}
        \newline
        &= 
            \mathrm{V} \big[ (\mathbf{I} - \mathbf{H}) \mathbf{Y} \big]
            & \mathrm{Var} (\star \bullet) = \star \mathrm{Var} (\bullet) \star^{\top}
        \newline
        &
            & \text{ when } \star \text{ is a matrix and } \bullet \text{ is a vector} 
        \newline
        &= 
            (\mathbf{I} - \mathbf{H}) \ \mathrm{V} (\mathbf{Y}) \ (\mathbf{I} - \mathbf{H})^{\top}
            & \text{by variance of the response variable (} \textbf{P5-A} \text{)}
        \newline
        &= (\mathbf{I} - \mathbf{H}) \ \boldsymbol{\sigma^2} \mathbf{I} \ (\mathbf{I} - \mathbf{H})^{\top}
        \newline
        &= 
            \boldsymbol{\sigma^2} (\mathbf{I} - \mathbf{H}) (\mathbf{I} - \mathbf{H})^{\top}
            & \text{by symmetry (} \textbf{P1} \text{) and idempotency (} \textbf{P2-B} \text{)}
        \newline
        &= 
            \boldsymbol{\sigma^2} \ (\mathbf{I} - \mathbf{H})    
            & [\textbf{P5-B}] 
    \end{align}
$

Equivalently in matrix form : 

<br>
$
    \quad
    \mathrm{V}(\mathbf{e})
    \ = \
    \boldsymbol{\sigma^2} \ (\mathbf{I} - \mathbf{H})    
    \quad = \quad
    \boldsymbol{\sigma^2} \ 
    \begin{bmatrix}
        (1 - h_{11})    &  -h_{12}       &  \dots  & -h_{1m}       \\
        -h_{21}         &  (1 - h_{22})  &  \dots  & -h_{2m}       \\
        \vdots & \vdots & \dots  & \vdots \\
        \vdots & \vdots & \ddots & \vdots \\
        -h_{m1}         &  -h_{m2}  &  \dots  & (1 - h_{mm})
    \end{bmatrix}_\textit{ m x m}
    \ = \quad
    \begin{bmatrix}
        \mathrm{Var}(e_1)        & \mathrm{Cov}(e_1, e_2) & \dots  & \mathrm{Cov}(e_1, e_m) \\
        \mathrm{Cov}(e_2, e_1)   & \mathrm{Var}(e_2)      & \dots  & \mathrm{Cov}(e_2, e_m) \\
        \vdots & \vdots & \dots  & \vdots \\
        \vdots & \vdots & \ddots & \vdots \\
        \mathrm{Cov}(e_m, e_1)   & \mathrm{Cov}(e_m, e_2) & \dots  & \mathrm{Var}(e_m)
    \end{bmatrix}_\textit{ m x m}
$
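
<br>
<b>P5-B</b> can also be illustrated by simulation. The sketch below draws many error vectors with $\mathrm{V}(\boldsymbol{\varepsilon}) = \boldsymbol{\sigma^2} \mathbf{I}$, maps them to residuals through $\mathbf{M}$, and compares the empirical covariance matrix of the residuals with $\boldsymbol{\sigma^2} \mathbf{M}$. The design matrix, $\sigma$, the number of replications and the seed are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(4)

m, sigma = 5, 2.0
X = np.column_stack([np.ones(m), np.arange(m, dtype=float)])  # fixed illustrative design
H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(m) - H

# Simulate error vectors with covariance sigma^2 I; each row is one replication
n_rep = 200_000
eps = rng.normal(scale=sigma, size=(n_rep, m))
E = eps @ M.T                     # each row is the residual vector e = M eps

# Residuals have mean zero, so E'E / n_rep estimates their covariance matrix
V_emp = E.T @ E / n_rep
V_theory = sigma**2 * M

print(np.abs(V_emp - V_theory).max())   # largest Monte Carlo discrepancy
```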

The same conclusion can be achieved much more easily using <b>P4</b> instead of <b>P5-A</b> : computing the variance of the residuals as $\mathrm{V}(\mathbf{e}) = \mathrm{V} \big[ \mathbf{M} \ \boldsymbol{\varepsilon} \big]$ yields the same result in fewer steps, but we would have missed the considerations regarding the variance of the response variable. 
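
<br>
Written out, the shorter route is sketched below; it relies on $\mathrm{V}(\boldsymbol{\varepsilon}) = \boldsymbol{\sigma^2} \mathbf{I}$, that is, on spherical errors (<b>A3 + A4</b>) :

<br>
$
    \quad
    \begin{align}
        \mathrm{V}(\mathbf{e})
        &= \mathrm{V} \big[ \mathbf{M} \ \boldsymbol{\varepsilon} \big]
            & \text{by } \textbf{P4}
        \newline
        &= \mathbf{M} \ \mathrm{V} (\boldsymbol{\varepsilon}) \ \mathbf{M}^{\top}
        \newline
        &= \mathbf{M} \ \boldsymbol{\sigma^2} \mathbf{I} \ \mathbf{M}^{\top}
        \newline
        &= \boldsymbol{\sigma^2} \ \mathbf{M} \ \mathbf{M}
            & \text{by symmetry (} \textbf{P1} \text{)}
        \newline
        &= \boldsymbol{\sigma^2} \ \mathbf{M}
            & \text{by idempotency (} \textbf{P2-B} \text{)}
    \end{align}
$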

<br>
Now we know two important points : <br>

<ul style="list-style-type:square">
    <li>
         the variance of the response variable is the variance of the error term, 
         $\mathrm{V}(\mathbf{Y}) = \boldsymbol{\sigma^2} \ \mathbf{I}$
    </li>
    <br>
    <li>
        the variance of the residuals is the variance of the error term scaled by the residual maker matrix, 
        $\mathrm{V}(\mathbf{e}) = \boldsymbol{\sigma^2} \ (\mathbf{I} - \mathbf{H})$
    </li>
</ul>

### Estimation of the variance

<br>
As we will see again in the notebook regarding the Gauss-Markov theorem, it's crucial to notice that the variance of the error terms $\boldsymbol{\sigma^2}$ is unobservable, since the error terms themselves are unobservable. An estimate of this quantity can be computed from the regression residuals :

<br>
$
    \quad
    \boldsymbol{s^2} 
    \ = \ \dfrac
        { \sum_{i=1}^{m} \boldsymbol{{e_i}^2} }
        { m - p }
    \ = \ \dfrac
        { \boldsymbol{e}^{\top}\boldsymbol{e} }
        { m - p }
    \ = \ \dfrac
        { \text{SSR} }
        { m - p }
    \qquad \Rightarrow \qquad
    \widehat{\mathrm{V}(\mathbf{e})} = \boldsymbol{s^2} \ (\mathbf{I} - \mathbf{H})    
$
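
<br>
A minimal sketch of this estimate on simulated data follows. The value of $\sigma$ is known here only because the data are generated by the script; the sample size, coefficients and seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

m, p, sigma = 200, 3, 1.5
X = np.column_stack([np.ones(m), rng.normal(size=(m, p - 1))])
Y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(scale=sigma, size=m)

H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(m) - H
e = M @ Y

ssr = e @ e            # e'e, the sum of squared residuals
s2 = ssr / (m - p)     # estimate of sigma^2 with m - p degrees of freedom
V_e_hat = s2 * M       # estimated covariance matrix of the residuals

print(s2, sigma**2)    # estimate next to the value used in the simulation
```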


### Variance of the residual for a particular observation

<br>
When interested in computing the variance of the residual for a particular observation, the important elements are those on the diagonal of the hat matrix $\boldsymbol{\mathbf{H}_{ii}}$ :

<br>
$
    \quad
    \mathrm{Var}(\boldsymbol{\mathbf{e}_i}) \ = \ \boldsymbol{\sigma^2} \ (1 - \boldsymbol{\mathbf{H}_{ii}})   
$

<br>
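In other words, the diagonal values of $\mathbf{M} = (\mathbf{I} - \mathbf{H})$ determine how much the residual is expected to vary for a particular observation of $\mathbf{X}$ : the higher the leverage $\boldsymbol{\mathbf{H}_{ii}}$, the smaller the variance of the corresponding residual, because the fitted value is pulled closer to the observed response.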
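
<br>
To make the role of the leverages concrete, the sketch below uses a hypothetical one-regressor design in which the last observation lies far from the others, and an assumed error variance of 1. It prints the leverage and the implied residual variance for each observation.

```python
import numpy as np

# Illustrative design: intercept plus one regressor, last point far from the rest
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
X = np.column_stack([np.ones_like(x), x])
sigma2 = 1.0  # assumed error variance, chosen only for the example

H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)                   # h_ii
var_resid = sigma2 * (1.0 - leverage)   # Var(e_i) = sigma^2 (1 - h_ii)

for xi, h, v in zip(x, leverage, var_resid):
    print(f"x = {xi:5.1f}   h_ii = {h:.3f}   Var(e_i) = {v:.3f}")
```

In this design the isolated observation has the largest leverage and therefore the smallest residual variance, even though every error term has the same variance.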

## References

<br>
<ul style="list-style-type:square">
    <li>
         University of Kansas - Paul Johnson - 
         <a href="https://bit.ly/2s7lCi5">
         The Hat Matrix and Regression Diagnostics</a>
    </li>
    <br>
    <li>
        Stack Exchange - Cross Validated -
        <a href="https://bit.ly/2s44dHJ">
        Is the residual $e$ an estimator of the error $\varepsilon$ ?</a>
    </li>
    <br>
    <li>
        Wikipedia -
        <a href="https://bit.ly/2KOjoMo">
        Projection Matrix</a>
    </li>
</ul>
