# Building Regression Models in `R`
... You will see more examples and get more experience with this in the associated practical session. However, it is useful to see a basic example before this.

## The `lm()` Function
Regression models are built in `R` using the `lm()` function. Here, `lm` stands for *linear model*, which is the first hint that what we are doing is much more general than just fitting a regression model to the data.

## Regression Results

```R
data(mtcars)
mod <- lm(mpg ~ wt + hp, data=mtcars)
summary(mod)
```


```R
Call:
lm(formula = mpg ~ wt + hp, data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max 
-3.941 -1.600 -0.182  1.050  5.854 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
wt          -3.87783    0.63273  -6.129 1.12e-06 ***
hp          -0.03177    0.00903  -3.519  0.00145 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared:  0.8268,	Adjusted R-squared:  0.8148 
F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12
```


### Understanding the Output Table

The first section simply repeats the call to `lm()`, so we have a record of how the results were generated.

```R
Call:
lm(formula = mpg ~ wt + hp, data = mtcars)
```
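The same record is stored inside the fitted model object, so it can be retrieved later without re-printing the full summary. The snippet below is a minimal sketch that refits the model from the example above so that it runs on its own; `mod` is simply the name used for the fitted object in these notes.

```R
# Refit the example model so this block is self-contained
data(mtcars)
mod <- lm(mpg ~ wt + hp, data = mtcars)

# The call that was used to fit the model
mod$call

# Just the model formula, without the rest of the call
formula(mod)
```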

Next, we have some descriptive statistics for the distribution of the residuals. These are useful because we would expect the median to be close to 0, the 1st and 3rd quartiles to be similar in magnitude (but opposite in sign), indicating a symmetric distribution, and the minimum and maximum to be similar in magnitude and not too large (on the scale of MPG), suggesting that there are no outliers. We will dive into this in much more detail next week.

```R
Residuals:
   Min     1Q Median     3Q    Max 
-3.941 -1.600 -0.182  1.050  5.854 
```
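These descriptive statistics can also be computed directly from the residuals, which is a useful way of confirming where the numbers come from. The following is a minimal sketch, refitting the example model so that the block is self-contained; the variable name `res` is just an illustrative choice.

```R
# Refit the example model so this block is self-contained
data(mtcars)
mod <- lm(mpg ~ wt + hp, data = mtcars)

# Extract the residuals (observed mpg minus fitted mpg)
res <- residuals(mod)

# Minimum, quartiles and maximum: the same five values printed by summary(mod)
quantile(res)

# Informal symmetry check: compare the magnitudes of the 1st and 3rd quartiles
abs(quantile(res, c(0.25, 0.75)))

# Informal outlier check: the most extreme residuals, on the scale of MPG
range(res)
```

The output of `quantile(res)` is labelled with percentages (`0%`, `25%`, and so on) rather than `Min`, `1Q`, and so on, but the values should agree with the summary table up to rounding.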

Below this, we have the actual results table, which provides the estimate of each parameter. The other columns are relevant to *statistical inference*, which will be part of our focus next week. For the time being, they can be ignored. Within the context of our model, we therefore have $\hat{\beta}_{0} = 37.23$ (the intercept), $\hat{\beta}_{1} = -3.88$ (the slope for `wt`) and $\hat{\beta}_{2} = -0.032$ (the slope for `hp`).

```R
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
wt          -3.87783    0.63273  -6.129 1.12e-06 ***
hp          -0.03177    0.00903  -3.519  0.00145 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
```
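Rather than reading the estimates off the printed table, they can be extracted from the model object. The sketch below refits the example model so that it runs on its own and uses the `coef()` function from base `R`; the name `b` is only an illustrative choice.

```R
# Refit the example model so this block is self-contained
data(mtcars)
mod <- lm(mpg ~ wt + hp, data = mtcars)

# Named vector of estimates: (Intercept), wt, hp
b <- coef(mod)
round(b, 3)

# An individual estimate can be selected by name
b["wt"]

# The full coefficients table (estimates, standard errors, t values, p values) as a matrix
coef(summary(mod))
```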

At the very bottom of the output, we have some additional information about the model:

```R
Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared:  0.8268,	Adjusted R-squared:  0.8148 
F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12
```

The residual standard error is the square root of the variance estimated from the residuals. In other words, this gives $\sqrt{\hat{\sigma}^{2}} = \hat{\sigma} = 2.593$. Everything else relates largely to *model fit*, which we will discuss further next week.
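To make the link between the residuals and the residual standard error concrete, the value can be reproduced by hand. This is a minimal sketch that assumes the variance estimate divides the residual sum of squares by the residual degrees of freedom (the 29 reported in the output, i.e. 32 observations minus 3 estimated parameters); the names `rdf`, `sigma2_hat` and `sigma_hat` are illustrative.

```R
# Refit the example model so this block is self-contained
data(mtcars)
mod <- lm(mpg ~ wt + hp, data = mtcars)

res <- residuals(mod)     # residuals from the fitted model
rdf <- df.residual(mod)   # residual degrees of freedom (n minus number of parameters)

# Variance estimate: residual sum of squares divided by the residual degrees of freedom
sigma2_hat <- sum(res^2) / rdf

# Residual standard error: square root of the variance estimate
sigma_hat <- sqrt(sigma2_hat)
sigma_hat

# The same quantity as stored by R, for comparison
sigma(mod)
summary(mod)$sigma
```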