# __Regression Models__

## ___Notebook 1:___ _Textbook Notes: Chapter 9; Regression: Basics_

_By: Trevor Rowland_ ([@dBCooper2](https://github.com/dBCooper2))

This notebook is a set of notes on Chapter 9 of the Data Analysis textbook. It covers the theory of regression with respect to financial engineering.

## 9.1 Introduction

Regression is one of the most used statistical models.

For univariate regression, there is one response (dependent) variable and $p$ predictor (independent) variables, each measured on $n$ observations.

Let $Y$ be the response variable,

Let $X_1, ..., X_p$ be the predictor/explanatory variables

For the $i$-th observation, the values are:

$Y_i$ and $X_{i,1}, ..., X_{i,p}$

__The goals of regression are determining how $Y$ is related to $X_1,...,X_p$, estimating the expectation of $Y$ given $X_1,...,X_p$, and predicting future $Y$-values when $X_1,...,X_p$ are already available.__

The Multiple Linear Regression Model for $Y_i$ and $X_{i, 1},...,X_{i, p}$ is:

$$Y_i = \beta_0 + \beta_1X_{i,1}+...+\beta_p X_{i,p}+\epsilon_i$$

Where:

$\epsilon_i$ is the noise (also called disturbances or errors).

$\beta_0$ is the intercept, and is denoted as $\alpha$ in the CAPM regression model.

$\beta_1,...,\beta_p$ are the regression coefficients, or slopes. Each $\beta_j$ is the partial derivative of the expected response with respect to the $j$-th predictor (the $j$-th independent variable):

$$\beta_j = \frac{\partial E(Y_i|X_{i,1},...,X_{i,p})}{\partial X_{i,j}}$$

Therefore, $\beta_j$ is the change in the expected value of $Y_i$ when $X_{i,j}$ increases by one unit and the other predictors are held fixed.

The noise, $\epsilon_i$, is assumed to be i.i.d. (independent and identically distributed) with mean 0 and constant variance $\sigma_{\epsilon}^2$.

Often the noise is also assumed to be normally distributed, in which case it is Gaussian white noise.
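
To make the model concrete, here is a minimal sketch that simulates data from it with NumPy. All parameter values (`n`, `p`, `beta0`, `beta`, `sigma_eps`) and the helper `expected_y` are made up for illustration and do not come from the textbook. The last few lines check the interpretation of $\beta_j$: bumping one predictor by one unit, with the others fixed, shifts the expected response by that predictor's slope.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

n, p = 500, 3                        # hypothetical sample size and predictor count
beta0 = 0.5                          # hypothetical intercept
beta = np.array([1.0, -2.0, 0.25])   # hypothetical slopes beta_1, ..., beta_p
sigma_eps = 0.3                      # hypothetical noise standard deviation

X = rng.normal(size=(n, p))          # row i holds X_{i,1}, ..., X_{i,p}
eps = rng.normal(0.0, sigma_eps, n)  # Gaussian white noise
Y = beta0 + X @ beta + eps           # Y_i = beta_0 + sum_j beta_j X_{i,j} + eps_i


def expected_y(x_row):
    """E(Y | X) for one observation under the linear model (hypothetical helper)."""
    return beta0 + x_row @ beta


x_before = X[0]
x_after = x_before.copy()
x_after[1] += 1.0                    # raise the second predictor by one unit

# The change in the expected response equals the second slope, beta[1].
delta = expected_y(x_after) - expected_y(x_before)
print(delta, beta[1])
```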

###### This is just a note in case I have to come back and look at this, I have not covered noise yet.

## 9.2 Straight-Line Regression

Straight-Line Regression is linear regression using only one predictor variable. The model is defined as: 

$$Y_i = \beta_0 + \beta_1X_i+\epsilon_i$$

### 9.2.1 _Least Squares Estimation_

The regression coefficients can be estimated using the _method of least squares_. The least-squares estimates are the values of $\hat{\beta_0}$ and $\hat{\beta_1}$ that minimize the following sum of squared residuals:

$$\sum_{i=1}^{n}\{Y_i-(\hat{\beta_0}+\hat{\beta_1}X_i)\}^2$$

In Figure 9.1 from the textbook, this is represented geometrically as minimizing the sum of the squared lengths of the vertical residual lines between the actual and fitted values of $Y$.

<img src="../../docs/notebook_images/regression_modeling/fig_9_1.png" style="width: 50%;">

Minimizing the least-squares [sum of squares](#921-least-squares-estimation) gives the slope estimate $\hat{\beta_1}$:

$$\hat{\beta_1} = \frac{\sum_{i=1}^{n}(Y_i-\bar{Y})(X_i-\bar{X})}{\sum_{i=1}^{n}(X_i-\bar{X})^2} = \frac{\sum_{i=1}^{n}Y_i(X_i-\bar{X})}{\sum_{i=1}^{n}(X_i-\bar{X})^2}$$

The same minimization gives the intercept estimate $\hat{\beta_0}$:

$$\hat{\beta_0} = \bar{Y}-\hat{\beta_1}\bar{X}$$

The equation of the Least-Squares Line is:

$$\hat{Y} = \hat{\beta_0} + \hat{\beta_1}X = \bar{Y} + \hat{\beta_1}(X-\bar{X})$$

$$= \bar{Y} + \left\{\frac{\sum_{i=1}^{n}(Y_i-\bar{Y})(X_i-\bar{X})}{\sum_{i=1}^{n}(X_i-\bar{X})^2}\right\}(X-\bar{X})$$

$$= \bar{Y} + \frac{s_{XY}}{s_X^2}(X-\bar{X})$$

Where $s_{XY} = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i-\bar{Y})(X_i-\bar{X})$ is the _sample covariance_ of $X$ and $Y$, and $s_X^2$ is the _sample variance_ of $X$.
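
As a rough sketch, the formulas above translate to NumPy as follows. The data are simulated with made-up parameters (true intercept 0.1, true slope 1.5), and `np.polyfit` is used only as an independent cross-check, not as the method the textbook describes.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n = 200
x = rng.normal(0.0, 1.0, n)                   # hypothetical predictor values
y = 0.1 + 1.5 * x + rng.normal(0.0, 0.5, n)   # hypothetical response, true slope 1.5

x_bar, y_bar = x.mean(), y.mean()

# Slope from the sums-of-deviations formula, intercept from the sample means.
beta1_hat = np.sum((y - y_bar) * (x - x_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

# The same slope as sample covariance over sample variance (both divide by n - 1).
s_xy = np.cov(x, y, ddof=1)[0, 1]
s_x2 = np.var(x, ddof=1)
beta1_from_cov = s_xy / s_x2

# Cross-check: a degree-1 polynomial fit returns [slope, intercept].
slope_np, intercept_np = np.polyfit(x, y, 1)

print(beta0_hat, beta1_hat)
```

The two slope expressions agree because the $\frac{1}{n-1}$ factors in $s_{XY}$ and $s_X^2$ cancel.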

These formulas can be translated directly into Python code to build a custom regression model for CAPM.

__NOTE__: I have already completed a similar algorithm that iteratively fits the regression line, following a YouTube tutorial by NeuralNine. That notebook can be found [here](<https://github.com/dBCooper2/pythonic-finance/blob/main/notebooks/regression_models/simple_linear_regression.ipynb>).

### 9.2.2 _Variance of_ $\hat{\beta_1}$

To show how the estimator $\hat{\beta_1}$ depends on aspects of the data like the sample size and the values of the predictor (independent) variable, a formula for the variance of the estimator is needed. Fortunately, this can be derived from the equation for $\hat{\beta_1}$, which can be rewritten as a weighted sum of the responses:

$$\hat{\beta_1} = \sum_{i=1}^{n}w_iY_i$$

Where:

$w_i$ is the weight given by: 

$$w_i = \frac{X_i-\bar{X}}{\sum_{i=1}^{n}(X_i-\bar{X})^2}$$
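
A quick numerical check of this rewriting, on simulated data with made-up parameters: the weighted sum reproduces the usual slope estimate, the weights sum to zero, and the sum of squared weights equals $1/\sum_{i=1}^{n}(X_i-\bar{X})^2$, which is the identity the variance formula relies on.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed
x = rng.uniform(-1.0, 1.0, 50)                   # hypothetical predictor values
y = 2.0 - 0.7 * x + rng.normal(0.0, 0.2, 50)     # hypothetical response

dev = x - x.mean()
w = dev / np.sum(dev ** 2)                       # weights w_i

beta1_weighted = np.sum(w * y)                   # sum_i w_i Y_i
beta1_direct = np.sum((y - y.mean()) * dev) / np.sum(dev ** 2)

sum_w = np.sum(w)                                # zero up to rounding error
sum_w_sq = np.sum(w ** 2)                        # equals 1 / sum(dev**2)

print(beta1_weighted, beta1_direct)
```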

The textbook also provides a formula for the variance of $\hat{\beta_1}$, conditional on the predictor values $X_1, ..., X_n$ (which may themselves be random variables). It assumes that $Var(Y_i|X_1, ..., X_n) = \sigma_{\epsilon}^2$ for every $i$ and that $Y_1,...,Y_n$ are uncorrelated.

Therefore, the variance of $\hat{\beta_1}$ can be written as:

$$Var(\hat{\beta_1}|X_1,...,X_n)=\sigma_{\epsilon}^2 \sum_{i=1}^{n}w_i^2 = \frac{\sigma_{\epsilon}^2}{\sum_{i=1}^{n}(X_i-\bar{X})^2} = \frac{\sigma_{\epsilon}^2}{(n-1)s_X^2}$$

or: 

$$Var(\hat{\beta_1}|X_1,...,X_n)= \frac{\sigma_{\epsilon}^2}{(n-1)s_X^2}$$
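
The formula can be sanity-checked with a small Monte Carlo experiment. This is only a sketch under assumed values: the predictor values are drawn once and then held fixed (mimicking the conditioning on $X_1,...,X_n$), the noise is Gaussian white noise, and all parameters are made up.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
n, sigma_eps = 100, 0.5          # hypothetical sample size and noise sd
beta0, beta1 = 0.2, 1.2          # hypothetical true coefficients

x = rng.normal(0.0, 1.0, n)      # drawn once, then held fixed across simulations
dev = x - x.mean()

# Variance of the slope estimator according to the formula.
theoretical_var = sigma_eps ** 2 / ((n - 1) * np.var(x, ddof=1))

# Simulate many response vectors for the same x; each row is one sample.
n_sims = 20_000
eps = rng.normal(0.0, sigma_eps, size=(n_sims, n))
y = beta0 + beta1 * x + eps

# Slope per row; since sum(dev) == 0, sum((y - y_bar) * dev) == sum(y * dev).
slopes = (y @ dev) / np.sum(dev ** 2)
empirical_var = slopes.var(ddof=1)

print(theoretical_var, empirical_var)
```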

This formula is very important to examine for the interpretation of the regression line.

First, the numerator, $\sigma_{\epsilon}^2$, is the variance of the noise $\epsilon_i$. The noisier the data, the more the estimate $\hat{\beta_1}$ varies from sample to sample.

The denominator shows that the variance of $\hat{\beta_1}$ is inversely proportional to $(n-1)$ and $s_X^2$. 

This means __the precision of the estimator increases__ as...

- $\sigma_{\epsilon}^2$ __decreases__

- $n$ __(the number of observations) increases__

- $s_X^2$ __increases__ (this happens because a larger $s_X^2$ means the $X$ values are spread further apart, making the slope of the regression line easier to estimate)

Applying this theory to the practice of collecting financial data, a significant practical question is which sampling interval to use for calculations.

Because the variance of $\hat{\beta_1}$ is inversely proportional to $n-1$, the regression model says that the highest possible sampling frequency should be used.

Therefore, daily candles > weekly > monthly, and so on.

This assumes that $X_t$ and $Y_t$ are white noise, but the conclusion still holds if the data turn out to be _stationary but autocorrelated_ (this will be covered later).

However, the noise series $\epsilon_i$ __needs to be uncorrelated__. If the noise is autocorrelated and becomes more highly correlated as the sampling frequency increases, then this conclusion may not hold, and there may be a point of diminishing returns where more frequent sampling does not improve accuracy of estimations.
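
The sampling-frequency argument can be illustrated with a simulation. This sketch covers only the white-noise case and rests on several assumptions that are mine, not the textbook's: the series are treated as log returns so that `k = 5` daily values sum to one "weekly" value, the noise is uncorrelated, and every parameter is made up. Under these assumptions the weekly slope estimates should be noticeably more variable than the daily ones, with a variance ratio in the neighborhood of `k`.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed
beta0, beta1, sigma_eps = 0.0, 1.0, 0.01   # hypothetical true model
n_days, k = 500, 5                          # 500 daily points, 5 days per "week"
n_sims = 5_000


def slope(x, y):
    """Least-squares slope for a straight-line regression of y on x."""
    dev = x - x.mean()
    return np.sum(dev * y) / np.sum(dev ** 2)


daily_slopes, weekly_slopes = [], []
for _ in range(n_sims):
    x = rng.normal(0.0, 0.01, n_days)                         # white-noise predictor
    y = beta0 + beta1 * x + rng.normal(0.0, sigma_eps, n_days)
    daily_slopes.append(slope(x, y))

    # Aggregate to weekly data by summing non-overlapping blocks of k days.
    x_week = x.reshape(-1, k).sum(axis=1)
    y_week = y.reshape(-1, k).sum(axis=1)
    weekly_slopes.append(slope(x_week, y_week))

var_daily = np.var(daily_slopes, ddof=1)
var_weekly = np.var(weekly_slopes, ddof=1)
ratio = var_weekly / var_daily

print(var_daily, var_weekly, ratio)
```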

## 9.3 Multiple Linear Regression

Recall that the multiple linear regression model is:

$$Y_i = \beta_0 + \beta_1X_{i,1}+...+\beta_p X_{i,p}+\epsilon_i$$

The least-squares estimate can be used for multiple linear regression as well, and the estimates are the values $\hat{\beta_0}, \hat{\beta_1}, ..., \hat{\beta_p}$ that minimize:

$$\sum_{i=1}^{n}\{Y_i-(\hat{\beta_0}+\hat{\beta_1}X_{i,1}+...+\hat{\beta_p}X_{i,p})\}^2$$

The calculation of the least-squares estimate will be covered in Chapter 11. For now, the calculations that are relevant are:

The $i$-th _fitted value_:

$$\hat{Y_i}=\hat{\beta_0}+\hat{\beta_1}X_{i,1}+...+\hat{\beta_p}X_{i,p}$$

This estimates the expected value of $Y_i$ given the predictors, $E(Y_i|X_{i,1},...,X_{i,p})$.

The $i$-th residual is:

$$\hat{\epsilon_i} = Y_i-\hat{Y_i}=Y_i-(\hat{\beta_0}+\hat{\beta_1}X_{i,1}+...+\hat{\beta_p}X_{i,p})$$

This estimates $\epsilon_i$.

Taking these 2 equations, $Y_i$ can be expressed as:

$$Y_i = \hat{Y_i}+\hat{\epsilon_i}$$ 

Lastly, an unbiased estimate of the variance of the noise, $\sigma_{\epsilon}^2$, is:

$$\hat{\sigma}_{\epsilon}^2 = \frac{\sum_{i=1}^{n}\hat{\epsilon_i}^2}{n-1-p}$$

Where the denominator is the sample size minus the number of regression coefficients that are estimated ($p$ slopes plus the intercept).
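
The fitted values, residuals, and noise-variance estimate can be sketched in NumPy as below. Since the textbook defers the actual least-squares calculation to Chapter 11, this example uses `np.linalg.lstsq` as a stand-in solver. The data and the "true" coefficients in `beta_true` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed
n, p = 300, 2                               # hypothetical sample size and predictor count
beta_true = np.array([0.5, 1.0, -2.0])      # hypothetical [beta_0, beta_1, beta_2]
sigma_eps = 0.4                             # hypothetical noise sd

X = rng.normal(size=(n, p))
Y = beta_true[0] + X @ beta_true[1:] + rng.normal(0.0, sigma_eps, n)

# Design matrix: a column of ones for the intercept, then the predictors.
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)   # least-squares coefficients

Y_hat = A @ beta_hat                        # fitted values
resid = Y - Y_hat                           # residuals, estimates of eps_i

# Unbiased noise-variance estimate: divide by n - 1 - p, not n.
sigma2_hat = np.sum(resid ** 2) / (n - 1 - p)

print(beta_hat, sigma2_hat)
```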

### 9.3.1 _Standard Errors, t-Values, and P-Values_

This section explores statistics used in regression output that allow for interpretation of the model.

The coefficients in a linear regression model each have 3 other statistics associated with them:

- The standard error (SE): the estimated standard deviation of the least-squares estimator, which gives the precision of the estimated coefficient.

- The t-value: the t-statistic for testing that the coefficient is 0. The t-statistic is the ratio of the estimate to its standard error, or $\frac{\hat{\beta_j}}{SE}$.

- The p-value: the p-value is associated with testing the null hypothesis that the coefficient is 0 versus the alternative that it is not 0. If the p-value for a slope is small, then that is evidence that the slope is __not__ 0, which means that the predictor (independent variable) has a linear relationship with the response (dependent variable).

It is important to remember that this test only detects a linear relationship. A large p-value means there is little evidence of a linear relationship, but the variables could still have a strong nonlinear relationship. Because of this, graphical analysis of the data and the regression line is essential to better understand the model.
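
A small constructed example of this pitfall (the data are made up): when $Y = X^2$ exactly and the $X$ values are symmetric about zero, $Y$ is completely determined by $X$, yet the least-squares slope and the sample correlation are both zero up to rounding error.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 101)   # predictor values symmetric about zero
y = x ** 2                         # exact, noise-free quadratic relationship

dev = x - x.mean()
slope = np.sum(dev * (y - y.mean())) / np.sum(dev ** 2)   # least-squares slope
corr = np.corrcoef(x, y)[0, 1]                            # sample correlation

print(slope, corr)
```

A scatter plot of `x` against `y` would reveal the parabola immediately, which is why the graphical check matters.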

The formula for the standard error (SE) of the coefficient $\hat{\beta_1}$ __when there is only one predictor variable__ is:

$$SE = \frac{\hat{\sigma}_{\epsilon}}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}}$$

The formula for the SE of multiple predictor variables requires matrix notation, and will be continued in Chapter 11.
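
For the one-predictor case, the three statistics can be computed by hand as a sketch. Two assumptions go beyond what is written above: the noise is taken to be normal, so the t-statistic is compared against a t distribution with $n-2$ degrees of freedom, and SciPy is used for that distribution. `scipy.stats.linregress` serves only as a cross-check, and the simulated data use made-up parameters.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)  # arbitrary seed
n = 120
x = rng.normal(0.0, 1.0, n)                    # hypothetical predictor
y = 0.3 + 0.2 * x + rng.normal(0.0, 1.0, n)    # hypothetical response, true slope 0.2

dev = x - x.mean()
beta1_hat = np.sum(dev * y) / np.sum(dev ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Residual standard deviation with n - 1 - p degrees of freedom, p = 1.
resid = y - (beta0_hat + beta1_hat * x)
sigma_hat = np.sqrt(np.sum(resid ** 2) / (n - 2))

se = sigma_hat / np.sqrt(np.sum(dev ** 2))     # standard error of the slope
t_value = beta1_hat / se                       # t-statistic for H0: beta_1 = 0

# Two-sided p-value from the t distribution (assumes normal noise).
p_value = 2.0 * stats.t.sf(abs(t_value), df=n - 2)

check = stats.linregress(x, y)                 # independent cross-check
print(se, t_value, p_value)
```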

## 9.4 Analysis of Variance, Sums of Squares, and $R^2$