# Regression Models with Multiple Regressors

## Omitted Variable Bias

The analysis of the relationship between test scores and class size in Chapters 4 and 5 has a major flaw: it ignored other determinants of the dependent variable (test score) that are correlated with the regressor (class size). Recall that influences on the dependent variable that are not captured by the model are collected in the error term, which we have so far assumed to be uncorrelated with the regressor. This assumption is violated if we exclude determinants of the dependent variable that vary with the regressor. The result can be an estimation bias: the mean of the OLS estimator's sampling distribution no longer equals the true coefficient. In our example, we would therefore, on average, misestimate the causal effect on test scores of a unit change in the student-teacher ratio. This issue is called omitted variable bias (OVB) and is summarized in Key Concept 6.1.

![title](images/chapter6/img1.jpg)

![title](images/chapter6/img2.png)
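To make the bias concrete, here is a minimal simulation sketch. The data-generating process, the coefficient values, and all variable names (`X1`, `X2`, `b1_short`, `b1_long`) are assumptions chosen for illustration. They are not taken from the test score data.

```r
set.seed(1)
n <- 10000

# Hypothetical data: X2 affects Y and is correlated with X1
X2 <- rnorm(n)
X1 <- 0.7 * X2 + rnorm(n)
Y  <- 1 + 2 * X1 + 3 * X2 + rnorm(n)   # true coefficient on X1 is 2

# Short regression omits X2, long regression includes it
b1_short <- coef(lm(Y ~ X1))[["X1"]]
b1_long  <- coef(lm(Y ~ X1 + X2))[["X1"]]

c(short = b1_short, long = b1_long)
```

In this design the omitted regressor is positively correlated with `X1` and has a positive effect on `Y`, so the short regression should overstate the coefficient on `X1`. Its probability limit is $2 + 3 \cdot 0.7 / 1.49 \approx 3.41$, whereas the long regression should stay close to 2.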

## Multiple Regression Model

![title](images/chapter6/img3.png)

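The OLS estimators $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$ are the values of $b_0, b_1, \dots, b_k$ that minimize the sum of squared prediction mistakes $$ \sum_{i=1}^n (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i} - \dots -  b_k X_{ki})^2 \tag{6.5} $$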
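The minimization in (6.5) can be checked numerically. The sketch below uses simulated data with $k = 2$. The data, the helper `ssr()`, and the closed-form solution via the normal equations are illustrative assumptions, not part of the original text.

```r
set.seed(42)
n  <- 200
X1 <- rnorm(n)
X2 <- rnorm(n)
Y  <- 1 + 2 * X1 - 1 * X2 + rnorm(n)

# Objective function from (6.5) for k = 2 (hypothetical helper)
ssr <- function(b) sum((Y - b[1] - b[2] * X1 - b[3] * X2)^2)

# Closed-form minimizer: solve the normal equations (X'X) b = X'Y
X        <- cbind(1, X1, X2)
b_closed <- as.numeric(solve(crossprod(X), crossprod(X, Y)))

# lm() should arrive at the same coefficients
b_lm <- as.numeric(coef(lm(Y ~ X1 + X2)))

# Moving any coefficient away from the minimizer increases the objective
ssr_min     <- ssr(b_closed)
ssr_shifted <- ssr(b_closed + c(0, 0.1, 0))
```

Comparing `b_closed` with `b_lm`, and `ssr_min` with `ssr_shifted`, gives a quick check that `lm()` returns the minimizer of (6.5).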

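With `SSR` the sum of squared residuals, `TSS` the total sum of squares, `n` the number of observations, and `k` the number of regressors, the measures of fit are computed as follows:

SER <- sqrt(1/(n-k-1) * SSR)                    # standard error of the regression   
Rsq <- 1 - (SSR / TSS)                          # R^2   
adj_Rsq <- 1 - (n-1)/(n-k-1) * SSR/TSS          # adj. R^2   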
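As a self-contained sketch of these formulas, the example below computes the three measures by hand and compares them with the values reported by `summary()`. The simulated data are an assumption made for illustration.

```r
set.seed(7)
n <- 100   # number of observations
k <- 2     # number of regressors (excluding the intercept)

X1 <- rnorm(n)
X2 <- rnorm(n)
Y  <- 1 + 2 * X1 + 0.5 * X2 + rnorm(n)

fit <- lm(Y ~ X1 + X2)

SSR <- sum(residuals(fit)^2)      # sum of squared residuals
TSS <- sum((Y - mean(Y))^2)       # total sum of squares

SER     <- sqrt(1/(n-k-1) * SSR)                 # standard error of the regression
Rsq     <- 1 - (SSR / TSS)                       # R^2
adj_Rsq <- 1 - (n-1)/(n-k-1) * SSR/TSS           # adj. R^2

# Values reported by R for the same model
s <- summary(fit)
c(SER = s$sigma, Rsq = s$r.squared, adj_Rsq = s$adj.r.squared)
```

The hand-computed values should match `s$sigma`, `s$r.squared`, and `s$adj.r.squared` up to floating-point error.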

As already mentioned, $\bar{R}^2$ may be used to quantify how well a model fits the data. However, it is rarely a good idea to maximize $R^2$ or $\bar{R}^2$ by stuffing the model with regressors. You will not find any serious study that does so. Instead, it is more useful to include regressors that improve the estimation of the causal effect of interest, which is not assessed by means of the $\bar{R}^2$ of the model. The issue of variable selection is covered in Chapter 8.
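A small sketch of why fit measures are a poor guide to model choice: adding regressors that are pure noise cannot lower $R^2$, even though they carry no information about $Y$. The data and the matrix `noise` are hypothetical.

```r
set.seed(3)
n  <- 100
X1 <- rnorm(n)
Y  <- 1 + 2 * X1 + rnorm(n)

# 20 regressors that are unrelated to Y by construction
noise <- matrix(rnorm(n * 20), nrow = n)

fit_small <- lm(Y ~ X1)
fit_big   <- lm(Y ~ X1 + noise)

Rsq_small <- summary(fit_small)$r.squared
Rsq_big   <- summary(fit_big)$r.squared

c(small = Rsq_small, big = Rsq_big)
```

The larger model reports the higher $R^2$ although none of the added regressors belongs in the model, which is why a better fit alone says nothing about whether the causal effect of interest is estimated more reliably.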

## Imperfect Multicollinearity

If $X_1$ and $X_2$ are highly correlated, OLS struggles to estimate $\beta_1$ precisely. That is, although $\hat\beta_1$ is a consistent and unbiased estimator of $\beta_1$, it has a large variance because $X_2$ is included in the model.
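The effect on the variance can be illustrated with a small Monte Carlo sketch. The function `sim_beta1()`, the coefficient values, the sample size, and the two correlation levels are assumptions chosen for this example.

```r
# Hypothetical helper: repeatedly draw a sample in which cor(X1, X2) = rho,
# estimate Y ~ X1 + X2, and return the estimates of the coefficient on X1
sim_beta1 <- function(rho, n = 50, reps = 500) {
  replicate(reps, {
    X1 <- rnorm(n)
    X2 <- rho * X1 + sqrt(1 - rho^2) * rnorm(n)
    Y  <- 5 + 2.5 * X1 + 3 * X2 + rnorm(n)   # true beta_1 is 2.5
    coef(lm(Y ~ X1 + X2))[["X1"]]
  })
}

set.seed(1)
b_low  <- sim_beta1(rho = 0.25)
b_high <- sim_beta1(rho = 0.85)

# Both estimators are centered near 2.5, but the spread differs
c(mean_low = mean(b_low), mean_high = mean(b_high))
c(var_low = var(b_low), var_high = var(b_high))
```

Under homoskedasticity, the large-sample variance of $\hat\beta_1$ is proportional to $1 / (1 - \rho^2_{X_1, X_2})$, so the variance at $\rho = 0.85$ should be roughly three times the variance at $\rho = 0.25$ in this design, while the mean of the estimates stays near 2.5 in both cases.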

https://www.econometrics-with-r.org/6-4-ols-assumptions-in-multiple-regression.html