# Regression model

- The values of $\beta_0$, $\beta_1$, and $\sigma^2$ will almost never be known to an investigator.
- Instead, the sample data consists of $n$ observed pairs
    
    $(x_1, y_1), \ldots, (x_n, y_n)$,

   from which the model parameters and the true regression line itself can be estimated.
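- The data (pairs) are assumed to have been obtained independently of one another. That is, $y_i$ is the observed value of a random variable $Y_i$, where

  $Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ for $i = 1, 2, \ldots, n$

  and the $n$ deviations $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ are independent random variables.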
- The “best fit” line is motivated by the principle of least squares, which can be traced back to the German mathematician Gauss (1777–1855):
  
  <img src="ML-image/Multi-lin-reg.png" width="500" height="350" />

> A line provides the best fit to the data if the sum of the squared vertical distances (deviations) from the observed points to that line is as small as it can be. 

- The sum of squared vertical deviations from the points $(x_1, y_1), \ldots, (x_n, y_n)$ to a candidate line $y = b_0 + b_1 x$ is then:
  
  $f(b_0, b_1) =  \sum_{i=1}^n [y_i - (b_0+b_1 x_i)]^2$

- The point estimates of $\beta_0$ and $\beta_1$, denoted by $\hat{\beta}_0$ and $\hat{\beta}_1$, are called the least squares estimates. They are the values of $b_0$ and $b_1$ that minimize $f(b_0, b_1)$.
- The fitted regression line or least squares line is then the line whose equation is:

  $y = \hat{\beta}_0+\hat{\beta}_1 x$.
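
As a minimal sketch of the least squares criterion, the Python snippet below evaluates $f(b_0, b_1)$ for a few candidate lines. The toy data set, the candidate lines, and the function name `sum_squared_deviations` are illustrative assumptions, not part of the original notes.

```python
def sum_squared_deviations(b0, b1, xs, ys):
    """Return f(b0, b1): the sum of squared vertical deviations
    from the points (x_i, y_i) to the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, chosen only for illustration.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]

# Three candidate lines (b0, b1); a smaller f means a better fit.
candidates = [(1.0, 1.0), (0.0, 1.5), (0.5, 1.4)]
for b0, b1 in candidates:
    f = sum_squared_deviations(b0, b1, xs, ys)
    print(f"b0={b0}, b1={b1}: f = {f:.3f}")
```

Among these three candidates, the line with $b_0 = 0.5$ and $b_1 = 1.4$ gives the smallest value of $f$ for this toy data.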

## Estimating model parameters

- The minimizing values of $b_0$ and $b_1$ are found by taking the partial derivatives of $f(b_0, b_1)$ with respect to $b_0$ and $b_1$, setting both equal to zero [analogous to $f'(b) = 0$ in univariate calculus], and solving the equations
  
  $\frac{\partial f(b_0, b_1)}{\partial b_0} = \sum 2 (y_i - b_0 - b_1 x_i) (-1) = 0$ 

  $\frac{\partial f(b_0, b_1)}{\partial b_1} =  \sum 2 (y_i - b_0 - b_1 x_i) (-x_i) = 0$.

- These in turn give the two equations:
  
  $\sum (y_i - b_0 - b_1 x_i) = 0$

  $\sum (y_i x_i- b_0x_i - b_1 x_i^2) = 0$.

  After some simplification, we obtain

  $b_1 = \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}$

  where 

   - $S_{xy} = \sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}$ 
   - $S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$

  (Typically, columns for $x_i$, $y_i$, $x_i y_i$, and $x_i^2$ are constructed, and then $S_{xy}$ and $S_{xx}$ are calculated from the column totals.)
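
- As a sketch of the omitted simplification (standard algebra, filled in here rather than taken from the original notes), the first equation gives

  $\sum y_i - n b_0 - b_1 \sum x_i = 0$, so $b_0 = \bar{y} - b_1 \bar{x}$.

  Substituting this into the second equation and using $\sum x_i = n\bar{x}$ gives

  $\sum x_i y_i - n\bar{x}\bar{y} = b_1 \left(\sum x_i^2 - n\bar{x}^2\right)$.

  Since $\sum x_i y_i - n\bar{x}\bar{y} = \sum (x_i - \bar{x})(y_i - \bar{y})$ and $\sum x_i^2 - n\bar{x}^2 = \sum (x_i - \bar{x})^2$, dividing both sides by $\sum (x_i - \bar{x})^2$ (assumed nonzero, i.e. the $x_i$ are not all equal) yields the formula for $\hat{\beta}_1$ given above.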

- The least squares estimate of the intercept $\beta_0$ of the true regression line is

  $b_0 = \hat{\beta}_0 = \frac{\sum y_i - \hat{\beta}_1 \sum x_i}{n} = \bar{y}- \hat{\beta}_1 \bar{x}$.

- The computational formulas for $S_{xy}$ and $S_{xx}$ require only the summary statistics $\sum x_i$, $\sum y_i$, $\sum x_i y_i$, and $\sum x_i^2$ ($\sum y_i^2$ will be needed shortly for the variance).
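
The computation from summary statistics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `least_squares_estimates` and the toy data are hypothetical, and $S_{xx}$ is computed as $\sum x_i^2 - (\sum x_i)^2 / n$, with the square taken after summing.

```python
def least_squares_estimates(xs, ys):
    """Return (b0_hat, b1_hat) computed from the summary statistics
    n, sum x_i, sum y_i, sum x_i*y_i and sum x_i^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # The four column totals used by the computational formulas.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)

    s_xy = sum_xy - sum_x * sum_y / n   # S_xy
    s_xx = sum_xx - sum_x ** 2 / n      # S_xx
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    b1_hat = s_xy / s_xx                    # slope estimate
    b0_hat = (sum_y - b1_hat * sum_x) / n   # intercept estimate
    return b0_hat, b1_hat


# Hypothetical toy data, chosen only for illustration.
b0_hat, b1_hat = least_squares_estimates([1, 2, 3, 4], [2, 3, 5, 6])
print(f"intercept = {b0_hat:.3f}, slope = {b1_hat:.3f}")
```

The summary-statistic form is convenient for hand calculation. In floating-point arithmetic it can lose precision when the $x_i$ are large relative to their spread, in which case the deviation form $\sum (x_i - \bar{x})(y_i - \bar{y})$ is usually the safer choice.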

## Fitted values

- **Fitted values:** The fitted (or predicted) values $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n$ are obtained by substituting $x_1, x_2, \ldots, x_n$ into the equation of the estimated regression line:

    - $\hat{y}_1 = \hat{\beta}_0 + \hat{\beta}_1 x_1$
    - $\hat{y}_2 = \hat{\beta}_0 + \hat{\beta}_1 x_2$
    - $\vdots$
    - $\hat{y}_n = \hat{\beta}_0 + \hat{\beta}_1 x_n$
- **Residuals:**
  - The residuals are the differences $y_1 - \hat{y}_1, y_2 - \hat{y}_2, \ldots, y_n - \hat{y}_n$ between the observed and fitted $y$ values.
  - When the estimated regression line is obtained via the principle of least squares, the sum of the residuals is zero (apart from rounding error in practice), whatever the error distribution, since
  
    $\sum (y_i - (\hat{\beta}_0+ \hat{\beta}_1 x_i)) = n \bar{y}- n \hat{\beta}_0 - \hat{\beta}_1 n \bar{x} = n \hat{\beta}_0 - n \hat{\beta}_0 = 0$.
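
To make the fitted values and residuals concrete, here is a minimal Python sketch. The function name `fitted_values_and_residuals` and the toy data are hypothetical choices for illustration. The sketch fits the least squares line using the deviation form of $S_{xy}$ and $S_{xx}$, then checks numerically that the residuals sum to zero up to floating-point rounding.

```python
def fitted_values_and_residuals(xs, ys):
    """Fit the least squares line, then return (fitted, residuals)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Deviation form of S_xy and S_xx.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)

    b1_hat = s_xy / s_xx              # slope estimate
    b0_hat = y_bar - b1_hat * x_bar   # intercept estimate

    # Fitted value: y_hat_i = b0_hat + b1_hat * x_i.
    fitted = [b0_hat + b1_hat * x for x in xs]
    # Residual: observed y_i minus fitted y_hat_i.
    residuals = [y - y_hat for y, y_hat in zip(ys, fitted)]
    return fitted, residuals


# Hypothetical toy data, chosen only for illustration.
fitted, residuals = fitted_values_and_residuals([1, 2, 3, 4], [2, 3, 5, 6])
print("fitted values:", [round(v, 3) for v in fitted])
print("residuals:    ", [round(r, 3) for r in residuals])
print("sum of residuals is numerically zero:", abs(sum(residuals)) < 1e-9)
```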