# _Multiple_ Linear Regression (MLR) Model

## The Model

In the population we have <ins>scalar</ins> random variables, $[Y,X_1,\ldots,X_{k},U]$, that fulfill the following relationship

$$
\begin{align*}
Y=\beta_0+\beta_1 X_1+\ldots+\beta_k X_k+U\text{,}
\end{align*}
$$

where $E[U|X_1,\ldots,X_{k}]=0$, there are no exact linear relationships among the set of regressors (covariates, confounders, independent variables, predictors, etc.) $[X_1,\ldots,X_{k}]$, and var$(U|X_1,\ldots,X_{k})<+\infty$. The scalar random variable, $Y$, is called the outcome, or the dependent variable.



<p style='text-align: right;'> <a href="https://en.wikipedia.org/wiki/Conditional_expectation" style="color: #cc0000">Conditional Expectation</a></p>

<p style='text-align: right;'> <a href="https://en.wikipedia.org/wiki/Conditional_variance" style="color: #cc0000">Conditional Variance</a></p>


## The Data

We observe a _random sample_ of $n$ observations on $[Y,X_1,\ldots,X_{k}]$, i.e., $\{(y_i,x_{i,1},\ldots,x_{i,k}):i=1,\ldots,n\}$; the corresponding errors $u_1,\ldots,u_n$ are not observed. Therefore one has

$$
\begin{array}
[c]{c}
y_{1}=\beta_{0}+\beta_{1}x_{1,1}+\cdots+\beta_{k}x_{1,k}+u_{1}\\
y_{2}=\beta_{0}+\beta_{1}x_{2,1}+\cdots+\beta_{k}x_{2,k}+u_{2}\\
y_{3}=\beta_{0}+\beta_{1}x_{3,1}+\cdots+\beta_{k}x_{3,k}+u_{3}\\
\vdots\\
y_{n}=\beta_{0}+\beta_{1}x_{n,1}+\cdots+\beta_{k}x_{n,k}+u_{n}
\end{array}
$$ or equivalently
$$\left[
\begin{array}
[c]{c}
y_{1}\\
y_{2}\\
y_{3}\\
\vdots\\
y_{n}
\end{array}
\right]  =\left[
\begin{array}
[c]{cccc}
1 & x_{1,1} & \cdots & x_{1,k}\\
1 & x_{2,1} & \cdots & x_{2,k}\\
1 & x_{3,1} & \cdots & x_{3,k}\\
\vdots & \vdots & \ddots & \vdots\\
1 & x_{n,1} & \cdots & x_{n,k}
\end{array}
\right]  \left[
\begin{array}
[c]{c}
\beta_{0}\\
\beta_{1}\\
\vdots\\
\beta_{k}
\end{array}
\right]  +\left[
\begin{array}
[c]{c}
u_{1}\\
u_{2}\\
u_{3}\\
\vdots\\
u_{n}
\end{array}
\right]
$$ that can be rewritten in **matrix form**  as

$$
\begin{align*}
\mathbf{y}=\mathbf{X}\mathbf{\beta}+\mathbf{u}\text{,}
\end{align*}
$$

where $\mathbf{y}$ is an $n\times 1$ [vector](https://en.wikipedia.org/wiki/Vector_(mathematics_and_physics)), $\mathbf{X}$ is an $n\times(k+1)$ [matrix](https://en.wikipedia.org/wiki/Matrix_(mathematics)) - sometimes called the [*design matrix*](https://en.wikipedia.org/wiki/Design_matrix), $\mathbf{\beta}$ is a $(k+1)\times 1$ vector of <ins>unknown</ins> parameters, and $\mathbf{u}$ is an $n\times 1$ vector of unobserved errors.
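
The stacking of the $n$ equations into matrix form can be sketched on simulated data. Everything in this block is an illustrative assumption rather than part of the housing example: the sample size, the distributions of the regressors, the parameter vector `beta`, and the names `X.sim` and `y.sim`.

```r
## a minimal simulated example (all numbers below are illustrative assumptions)
set.seed(123)
n <- 100   # sample size
k <- 2     # number of regressors

## drawing the regressors and the error term
x1 <- rnorm(n)
x2 <- runif(n)
u  <- rnorm(n)   # drawn independently of x1 and x2, so E[U|X1,X2] = 0

## hypothetical population parameters (beta_0, beta_1, beta_2)
beta <- c(1, 0.5, -2)

## stacking the design matrix: a column of ones plus the regressors
X.sim <- cbind(1, x1, x2)

## generating the outcome in matrix form: y = X beta + u
y.sim <- X.sim %*% beta + u

## checking the dimensions: X is n x (k+1) and y is n x 1
dim(X.sim)
dim(y.sim)
```

Each row of `y.sim` reproduces the corresponding scalar equation, e.g., the first row equals `beta[1] + beta[2]*x1[1] + beta[3]*x2[1] + u[1]`.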

In [None]:
## removing everything from memory
rm(list=ls())
## turning all warnings off
options(warn=-1)

## installing the 'wooldridge' package if not previously installed
if (!require(wooldridge)) install.packages('wooldridge')

## loading the packages
library(wooldridge)

data(hprice2)

##  hprice2
##  Obs:   506

##  1. price                    median housing price, $
##  2. crime                    crimes committed per capita
##  3. nox                      nitrous oxide, parts per 100 mill.
##  4. rooms                    avg number of rooms per house
##  5. dist                     weighted dist. to 5 employ centers
##  6. radial                   accessibility index to radial highways
##  7. proptax                  property tax per $1000
##  8. stratio                  average student-teacher ratio
##  9. lowstat                  % of people 'lower status'
## 10. lprice                   log(price)
## 11. lnox                     log(nox)
## 12. lproptax                 log(proptax)

head(hprice2)

## The Assumptions

**<span style="color:blue">Assumption MLR.1:</span>** $\mathbf{y}=\mathbf{X}\mathbf{\beta}+\mathbf{u}$.

In [None]:
## specifying the outcome variable (y) and regressors (X)
outcome <- "lprice"
predictors <- c("lnox", "lproptax", "crime", "rooms", "dist", "radial", "stratio", "lowstat")

## creating a specification of the linear model
f <- as.formula(
                paste(outcome, 
                      paste(predictors, collapse = " + "), 
                      sep = " ~ ")
                )
print(f)

**<span style="color:blue">Assumption MLR.2:</span>** $\{(y_i,x_{i,1},\ldots,x_{i,k}):i=1,\ldots,n\}$ is a random sample (independent and identically distributed - i.i.d.).

In [None]:
## previewing the first rows of the outcome and the regressors used in the model
head(hprice2[, c(outcome, predictors)])

**<span style="color:blue">Assumption MLR.3:</span>** rank$[E(\mathbf{x}_i\mathbf{x}_i^\prime)]=k+1$, where $\mathbf{x}_i^\prime=[1,x_{i,1},\ldots,x_{i,k}]$. <p style='text-align: right;'> <a href="https://en.wikipedia.org/wiki/Rank_(linear_algebra)" style="color: #cc0000">Rank of a Matrix</a></p>
<p style='text-align: right;'> <a href="https://en.wikipedia.org/wiki/Transpose" style="color: #cc0000">Transpose</a></p> <p style='text-align: right;'> <a href="https://en.wikipedia.org/wiki/Expected_value" style="color: #cc0000">Expected Value</a></p>

In [None]:
## asking R to print the design matrix for the chosen model
X <- model.matrix(f,data=hprice2)
X

In [None]:
## calculating & printing the sample counterpart of E[xx']
X.X.n <- t(X)%*%X/nrow(X)
X.X.n

In [None]:
## asking R to calculate the actual rank of the estimated E[xx']
qr(X.X.n)$rank
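
To see what a failure of the rank condition looks like, here is a small sketch on simulated data, where a third regressor is constructed as an exact linear combination of the other two. The variable names (`x1`, `x2`, `x3`, `X.ok`, `X.bad`) and the sample size are assumptions made only for this illustration.

```r
## simulated regressors (illustrative assumption, not the housing data)
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)

## a regressor that is an exact linear combination of x1 and x2
x3 <- 2*x1 - 3*x2

## design matrices without and with the redundant regressor
X.ok  <- cbind(1, x1, x2)
X.bad <- cbind(1, x1, x2, x3)

## rank of the sample counterpart of E[xx'] in each case
rank.ok  <- qr(t(X.ok)%*%X.ok/n)$rank
rank.bad <- qr(t(X.bad)%*%X.bad/n)$rank

## comparing each rank with the number of columns of the design matrix
c(rank.ok = rank.ok, columns.ok = ncol(X.ok))
c(rank.bad = rank.bad, columns.bad = ncol(X.bad))
```

In the second case the rank falls short of the number of columns, so the sample analogue of the rank condition fails.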

**<span style="color:blue">Assumption MLR.4:</span>** $E[\mathbf{u}|\mathbf{X}]=\mathbf{0}$.

$$
E[\mathbf{u}|\mathbf{X}]=\left[
\begin{array}
[c]{c}
E[u_{1}|\mathbf{X}]\\
E[u_{2}|\mathbf{X}]\\
E[u_{3}|\mathbf{X}]\\
\vdots\\
E[u_{n}|\mathbf{X}]
\end{array}
\right]  =\left[
\begin{array}
[c]{c}
0\\
0\\
0\\
\vdots\\
0
\end{array}
\right]=\mathbf{0}.  $$
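
The difference between a zero unconditional mean and a zero conditional mean can be sketched by simulation. In the hypothetical setup below (a single standard normal regressor `x` and two made-up error terms), both errors have mean zero, but only the first has mean zero within every region of `x`.

```r
## simulated regressor (illustrative assumption)
set.seed(321)
n <- 10000
x <- rnorm(n)

## case 1: error drawn independently of x, so E[u|x] = 0
u.ok <- rnorm(n)

## case 2: E[u] = 0 but E[u|x] = x^2 - 1, which violates the assumption
u.bad <- (x^2 - 1) + rnorm(n)

## splitting the sample by whether |x| is below or above one
group <- ifelse(abs(x) < 1, "|x| < 1", "|x| >= 1")

## average error within each group
means.ok  <- tapply(u.ok, group, mean)
means.bad <- tapply(u.bad, group, mean)
rbind(means.ok, means.bad)

## both errors average out to roughly zero over the whole sample
c(mean(u.ok), mean(u.bad))
```

Because the errors are unobserved in real data, a check of this kind is only possible in a simulation where the errors are generated by hand.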

**<span style="color:blue">Assumption MLR.5:</span>** var$(\mathbf{u}|\mathbf{X})=E[\mathbf{u}\mathbf{u}^{\prime}|\mathbf{X}]=\Omega$ (the first equality uses **<span style="color:blue">Assumption MLR.4</span>**), i.e.,

$$
E[\mathbf{u}\mathbf{u}^{\prime}|\mathbf{X}]=
\begin{bmatrix}
E[u_{1}^{2}|\mathbf{X}] & E[u_{1}u_{2}|\mathbf{X}] & \cdots & E[u_{1}u_{n}|\mathbf{X}]\\
E[u_{2}u_{1}|\mathbf{X}] & E[u_{2}^{2}|\mathbf{X}] & \cdots & E[u_{2}u_{n}|\mathbf{X}]\\
E[u_{3}u_{1}|\mathbf{X}] & E[u_{3}u_{2}|\mathbf{X}] & \cdots & E[u_{3}u_{n}|\mathbf{X}]\\
\vdots & \vdots & \ddots & \vdots\\
E[u_{n}u_{1}|\mathbf{X}] & E[u_{n}u_{2}|\mathbf{X}] & \cdots & E[u_{n}^{2}|\mathbf{X}]
\end{bmatrix}  = \Omega
$$

where $\vert\Omega\vert < +\infty$.

<p style='text-align: right;'> <a href="https://en.wikipedia.org/wiki/Determinant" style="color: #cc0000">Determinant</a></p>

## The *Ordinary Least Squares* (OLS) Estimator

Define the residual sum of squares ($RSS$) as a function of any candidate guess, $\mathbf{b}=[b_{0},b_{1},\cdots,b_{k}]^{\prime}$, for the unknown $[\beta_{0},\beta_{1},\beta_{2},\cdots,\beta_{k}]^{\prime}\equiv\mathbf{\beta}$, i.e.,

$$
RSS(\mathbf{b})=\sum_{i=1}^{n}(y_{i}-b_{0}-b_{1}x_{i,1}-b_{2}x_{i,2}
-\cdots-b_{k}x_{i,k})^{2}=(\mathbf{y}-\mathbf{Xb})^{\prime}(\mathbf{y}
-\mathbf{Xb})\text{.}
$$
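
The equality between the summation form and the matrix form of $RSS(\mathbf{b})$ can be checked numerically. The sketch below uses simulated data and two hypothetical helper functions, `rss.sum` and `rss.mat`; neither the data nor the function names come from the housing example.

```r
## simulated data (illustrative assumption)
set.seed(42)
n     <- 30
X.sim <- cbind(1, rnorm(n), rnorm(n))
y.sim <- X.sim %*% c(1, 2, -1) + rnorm(n)

## RSS as a sum over observations
rss.sum <- function(b, y, X) {
  total <- 0
  for (i in seq_len(nrow(X))) {
    total <- total + (y[i] - sum(X[i, ] * b))^2
  }
  total
}

## RSS in matrix form: (y - Xb)'(y - Xb)
rss.mat <- function(b, y, X) {
  e <- y - X %*% b
  as.numeric(t(e) %*% e)
}

## evaluating both versions at an arbitrary candidate guess
b.guess <- c(0, 1, 1)
rss.sum(b.guess, y.sim, X.sim)
rss.mat(b.guess, y.sim, X.sim)
```

The two values should agree up to floating-point rounding for any candidate guess.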

By standard [matrix calculus](https://en.wikipedia.org/wiki/Matrix_calculus) one has
$$
\frac{\partial RSS(\mathbf{b})}{\partial\mathbf{b}}=\left[
\begin{array}
[c]{c}
\partial RSS(\mathbf{b})/\partial b_{0}\\
\partial RSS(\mathbf{b})/\partial b_{1}\\
\partial RSS(\mathbf{b})/\partial b_{2}\\
\vdots\\
\partial RSS(\mathbf{b})/\partial b_{k}
\end{array}
\right]  =-2\mathbf{X}^{\prime}(\mathbf{y}-\mathbf{Xb})\text{,}
$$
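
The gradient formula can be sanity-checked against central finite differences. This is only a sketch on simulated data; the helper names `rss`, `grad.analytic`, and `grad.numeric`, as well as the step size `h`, are assumptions chosen for the illustration.

```r
## simulated data (illustrative assumption)
set.seed(7)
n     <- 40
X.sim <- cbind(1, rnorm(n), rnorm(n))
y.sim <- as.numeric(X.sim %*% c(1, 2, -1) + rnorm(n))

## the residual sum of squares at a candidate guess b
rss <- function(b) sum((y.sim - X.sim %*% b)^2)

## analytical gradient: -2 X'(y - Xb)
grad.analytic <- function(b) {
  as.numeric(-2 * t(X.sim) %*% (y.sim - X.sim %*% b))
}

## numerical gradient by central finite differences
grad.numeric <- function(b, h = 1e-6) {
  sapply(seq_along(b), function(j) {
    e    <- rep(0, length(b))
    e[j] <- h
    (rss(b + e) - rss(b - e)) / (2 * h)
  })
}

## comparing the two at an arbitrary candidate guess
b.guess <- c(0.5, 1, 0)
cbind(analytic = grad.analytic(b.guess), numeric = grad.numeric(b.guess))
```

Since $RSS(\mathbf{b})$ is quadratic in $\mathbf{b}$, the two columns should differ only by rounding error.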

and

$$
\frac{\partial^{2}RSS(\mathbf{b})}{\partial\mathbf{b}\partial\mathbf{b}
^{\prime}}=\left[
\begin{array}
[c]{cccc}
\partial^{2}RSS(\mathbf{b})/\partial b_{0}^{2} & \partial^{2}RSS(\mathbf{b}
)/\partial b_{0}\partial b_{1} & \cdots & \partial^{2}RSS(\mathbf{b})/\partial
b_{0}\partial b_{k}\\
\partial^{2}RSS(\mathbf{b})/\partial b_{1}\partial b_{0} & \partial
^{2}RSS(\mathbf{b})/\partial b_{1}^{2} & \cdots & \partial^{2}RSS(\mathbf{b}
)/\partial b_{1}\partial b_{k}\\
\partial^{2}RSS(\mathbf{b})/\partial b_{2}\partial b_{0} & \partial
^{2}RSS(\mathbf{b})/\partial b_{2}\partial b_{1} & \cdots & \partial
^{2}RSS(\mathbf{b})/\partial b_{2}\partial b_{k}\\
\vdots & \vdots & \ddots & \vdots\\
\partial^{2}RSS(\mathbf{b})/\partial b_{k}\partial b_{0} & \partial
^{2}RSS(\mathbf{b})/\partial b_{k}\partial b_{1} & \cdots & \partial
^{2}RSS(\mathbf{b})/\partial b_{k}^{2}
\end{array}
\right]  =2\mathbf{X}^{\prime}\mathbf{X}\text{,}
$$

where $\mathbf{X}^{\prime}\mathbf{X}$ is a [positive definite matrix](https://en.wikipedia.org/wiki/Definiteness_of_a_matrix) whenever $\mathbf{X}$ has full column rank $k+1$, which is the sample counterpart of **<span style="color:blue">Assumption MLR.3</span>**.

In [None]:
## printing 2X'X
2*t(X)%*%X
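
Positive definiteness of the Hessian can be verified numerically by inspecting its eigenvalues, or by attempting a Cholesky factorization. The following is a sketch on a simulated design matrix; `X.sim` and `H` are names assumed for this illustration only.

```r
## simulated design matrix with full column rank (illustrative assumption)
set.seed(11)
n     <- 60
X.sim <- cbind(1, rnorm(n), rnorm(n))

## the Hessian of RSS: 2X'X
H <- 2 * t(X.sim) %*% X.sim

## a symmetric matrix is positive definite if all its eigenvalues are positive
H.eigenvalues <- eigen(H, symmetric = TRUE)$values
H.eigenvalues

## chol() returns an error if the matrix is not positive definite
R <- chol(H)

## the Cholesky factor reproduces H: R'R = H
max(abs(t(R) %*% R - H))
```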

Since the Hessian is positive definite, the first-order condition $\left.  \frac{\partial RSS(\mathbf{b})}
{\partial\mathbf{b}}\right\vert _{\mathbf{b}=\widehat{\mathbf{\beta}}
}=\mathbf{0}$ defines a [minimum](https://en.wikipedia.org/wiki/Maxima_and_minima), i.e., $\widehat{\mathbf{\beta}}=(\mathbf{X}
^{\prime}\mathbf{X})^{-1}\mathbf{X}^{\prime}\mathbf{y}$. Here we have used the
notation $\mathbf{A}^{-1}$ to denote the [inverse of a matrix](https://en.wikipedia.org/wiki/Invertible_matrix) $\mathbf{A}$.

<p style='text-align: right;'> <a href="https://en.wikipedia.org/wiki/Euclidean_distance" style="color: #cc0000">Euclidean Distance</a></p>

In [None]:
## calculating OLS by hand
solve(t(X)%*%X)%*%(t(X)%*%hprice2$lprice)

In [None]:
## calculating OLS using the `lm' command in R
coef(lm(f,data=hprice2))
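
The agreement between the closed-form estimator and `lm` can also be confirmed programmatically instead of by eye. The sketch below uses simulated data with hypothetical parameter values, and it solves the normal equations $\mathbf{X}^{\prime}\mathbf{X}\mathbf{b}=\mathbf{X}^{\prime}\mathbf{y}$ directly with `solve(A, b)` rather than forming the inverse explicitly, which is a design choice made here and not something the derivation requires.

```r
## simulated data with hypothetical parameters (1, 0.5, -2)
set.seed(2024)
n   <- 200
d   <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.5 * d$x1 - 2 * d$x2 + rnorm(n)

## design matrix for the simulated model
X.sim <- model.matrix(y ~ x1 + x2, data = d)

## OLS by solving the normal equations X'X b = X'y
beta.hat <- solve(crossprod(X.sim), crossprod(X.sim, d$y))

## OLS using lm
beta.lm <- coef(lm(y ~ x1 + x2, data = d))

## TRUE if the two sets of estimates agree up to numerical tolerance
all.equal(as.numeric(beta.hat), as.numeric(beta.lm))

## the first-order condition: X'(y - X beta.hat) should be numerically zero
foc <- crossprod(X.sim, d$y - X.sim %*% beta.hat)
max(abs(foc))
```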