# __Multiple Linear Regression in Python__

By: Trevor Rowland ([dBCooper2](https://github.com/dBCooper2))

Creating a Multiple Linear Regression Model from Scratch

Expanding on the Simple Linear Regression notebook, this notebook aims to implement a Multiple Linear Regression Model for use in Fama-French 3-Factor and 5-Factor Analysis.

### _References:_

[Deriving Normal Equation for Multiple Linear Regression](<https://medium.com/@bhanu0925/deriving-normal-equation-for-multiple-linear-regression-85241965ee3b>) by [Bhanumathi Ramesh](https://medium.com/@bhanu0925)

[Matrix Approach to Multiple Linear Regression](https://youtu.be/NzuK4iAfxhU?si=cxU-v8ZBgbA1s-FG) by [LearnChemE](https://www.youtube.com/@LearnChemE)

[Matrix Form Multiple Linear Regression MLR](https://youtu.be/Imjfp1cxy6g?si=gWXnA9F_XisVzFA4) Tutorial by [Boer Commander](https://www.youtube.com/@BoerCommander)

---------

[Gradient Descent Tutorial](https://www.machinelearningworks.com/tutorials/gradient-descent)

[Multiple Linear Regression from Scratch - Machine Learning Math & Python](https://youtu.be/fldD6fGmsQE?si=IwQntHRUuJCFB-iz) by [kai](https://www.youtube.com/@dylankailau6672)

[Statistics 101: Multiple Linear Regression, The Very Basics 📈](https://youtu.be/dQNpSa-bq4M?si=9vpoTxdyGzZEPGOx) by Brandon Foltz

[Statistics 101: Multiple Linear Regression, Data Preparation](https://youtu.be/2I_AYIECCOQ?si=axl8PUqk-JUR8QQn) by Brandon Foltz





## Theory

### _Introduction:_

Regression models take a series of predictor (X) variables and a single response (Y) variable, and estimate a line of best fit that can be used to predict unknown values of the response variable.

The formula for the Multiple Regression Model is:

$$y = \beta_0 + \beta_1x_1+\beta_2x_2+...+\beta_px_p + \epsilon_i$$

For the multiple regression equation, the error term $\epsilon_i$ is assumed to have a mean of 0; however, this noise does exist and needs to be accounted for in the analysis of the model.

## _Deriving the Gradient Descent Formula_

In the simple linear regression model, it was easy to calculate the gradient descent as there were only 2 partial derivatives to calculate. For the multiple regression model, there are 4 and 6 partial derivatives for the 3-Factor and 5-Factor Fama-French Models, respectively: one for each of the 3 or 5 predictor variables, plus one for the alpha, or y-intercept, of the regression line.

Additionally, for the Fama-French Regression Class and future Regression Models, it is necessary to have an abstract Regression Model that can handle an indeterminate number of predictor variables. This means the derivation of the error function must be done in a way that can be translated to an array of size $n$ in Python. This requires the use of Matrices to simplify the calculations:

The Multiple Linear Regression Model is:

$$y = \beta_0 + \beta_1x_1+\beta_2x_2+...+\beta_px_p + \epsilon_i$$

Which can be translated into the Matrix Form:

$$
Y_i =
\begin{bmatrix}
\beta_0 & \beta_1 & \cdots & \beta_p
\end{bmatrix}
\begin{bmatrix}
X_0 \\
X_1 \\
\vdots \\
X_p
\end{bmatrix}
, \quad X_0 = 1
$$

Setting $X_0 = 1$ allows the matrices to be the same size, which simplifies the calculations by including the Y-intercept.
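As a minimal sketch of this idea in NumPy, the column of ones can be prepended to the predictor array so that the intercept is handled like any other coefficient. The helper names `add_intercept_column` and `predict`, and the example data, are hypothetical and only illustrate the shapes involved:

```python
import numpy as np

def add_intercept_column(X):
    """Prepend a column of ones (X_0 = 1) to an (n, p) predictor array."""
    X = np.asarray(X, dtype=float)
    ones = np.ones((X.shape[0], 1))
    return np.hstack([ones, X])

def predict(X, beta):
    """Return fitted values for a coefficient array [beta_0, ..., beta_p]."""
    return add_intercept_column(X) @ np.asarray(beta, dtype=float)

# Hypothetical data: 3 observations, 2 predictors
X_example = np.array([[1.0, 2.0],
                      [3.0, 4.0],
                      [5.0, 6.0]])
beta_example = np.array([0.5, 2.0, -1.0])  # [beta_0, beta_1, beta_2]

y_hat_example = predict(X_example, beta_example)
print(y_hat_example)
```

Because the number of columns is read from the input, the same two functions work for 3 predictors, 5 predictors, or any other count.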

Like the Simple Linear Regression Model, the Ordinary Least Squares approach will be used to estimate the $\beta$ coefficients $\beta_0$ to $\beta_p$.

The Least Squares Estimate $\hat{\beta}$ is the solution for $\beta$ when the partial derivative of the Error Function is 0, and is calculated below:

### _Deriving the Least Squares Estimator_ $\hat{\beta}$

This was done using [Bhanumathi Ramesh's](https://medium.com/@bhanu0925) article [Deriving Normal Equation for Multiple Linear Regression](<https://medium.com/@bhanu0925/deriving-normal-equation-for-multiple-linear-regression-85241965ee3b>)

#### The Model and the SSE

The Multiple Linear Regression Model: $$y = \beta_0 + \beta_1x_1+\beta_2x_2+...+\beta_px_p + \epsilon_i$$

can be expressed in matrix form as:

$$Y = X \beta + \mathcal{E}$$

with the fitted values given by $\hat{Y} = X \beta$.

The Formula for the Sum of Squared Errors (SSE) is: $$E = SSE = \sum_{i=1}^{n}(Y_i-\hat{Y}_i)^2$$

Another way to represent the Error Function is to break the summation into matrices:

$$
E =
\begin{bmatrix}
y_1 - \hat{y}_1 &
y_2 - \hat{y}_2 &
\cdots &
y_n - \hat{y}_n
\end{bmatrix}
\begin{bmatrix}
y_1 - \hat{y}_1 \\
y_2 - \hat{y}_2 \\
\vdots \\
y_n - \hat{y}_n
\end{bmatrix}
$$

Which is equivalent to:

$$E = \hat{\mathcal{E}}^T\hat{\mathcal{E}}$$
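The equivalence between the summation and the matrix product can be checked numerically. The sketch below computes the sum of squared residuals both ways; the function names and the small data arrays are made up for illustration:

```python
import numpy as np

def sse_summation(y, y_hat):
    """Sum of squared residuals written as an explicit summation."""
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))

def sse_matrix(y, y_hat):
    """Sum of squared residuals written as the inner product e^T e."""
    residuals = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(residuals @ residuals)

# Hypothetical observed and fitted values
y_obs = np.array([3.0, 5.0, 7.5])
y_fit = np.array([2.5, 5.5, 7.0])

print(sse_summation(y_obs, y_fit), sse_matrix(y_obs, y_fit))
```

For 1-D NumPy arrays, `residuals @ residuals` is the inner product, so no explicit transpose is needed.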

In Linear Algebra, the transpose of a sum can be decomposed in the following ways:

$$(A+B)^T = A^T+B^T$$

$$(A-B)^T = A^T-B^T$$

Which means the transpose operator in $E = \hat{\mathcal{E}}^T\hat{\mathcal{E}}$ can be distributed, making the function:

$$ E = (Y^T-\hat{Y}^T)(Y-\hat{Y})$$

Substituting the matrix form $\hat{Y} = X \beta$ into the error function returns:

$$ E = (Y^T-(X \beta)^T)(Y-(X \beta))$$

$$ E = Y^T Y - Y^T X \beta - (X \beta)^T Y + (X \beta)^T (X \beta)$$

To finish simplifying toward the solution $\hat{\beta} = (X^T X)^{-1} X^T Y$, the two middle terms must be shown to be equal:

$$(X \beta)^T Y = Y^T (X \beta)$$

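Let $A = Y$ and $B = X \beta$, so the equation $(X \beta)^T Y = Y^T (X \beta)$ becomes $B^T A = A^T B$.

By Linear Algebra, the transpose of a product reverses the order of its factors:

$$ (AB)^T = B^T A^T $$

$$ (A^T B)^T = B^T A $$

Because $A$ and $B$ are both $n \times 1$ column vectors, $A^T B$ is a $1 \times 1$ matrix (a scalar), and a scalar is equal to its own transpose. Therefore

$$ A^T B = (A^T B)^T = B^T A $$

$$ Y^T (X \beta) = (Y^T (X \beta))^T = (X \beta)^T Y $$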

Substituting this back into the SSE Equation allows it to be simplified:

$$ E = Y^T Y - Y^T X \beta - (X \beta)^T Y + (X \beta)^T (X \beta)$$

$$ E = Y^T Y - 2Y^T X \beta + (X \beta)^T (X \beta)$$
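A quick numerical check can make this simplification more concrete. The sketch below uses randomly generated matrices (an assumption made purely for illustration, not data from this notebook) to confirm that the two cross terms are the same scalar and that the simplified expression matches the SSE computed directly from the residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3

# Random design matrix with a leading column of ones, plus random Y and beta
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])
Y = rng.normal(size=(n, 1))
beta = rng.normal(size=(p + 1, 1))

# The two cross terms are both 1x1 matrices holding the same number
cross_1 = (Y.T @ (X @ beta)).item()
cross_2 = ((X @ beta).T @ Y).item()

# SSE from the residual vector versus the simplified expansion
residual = Y - X @ beta
sse_direct = (residual.T @ residual).item()
sse_expanded = (Y.T @ Y - 2 * Y.T @ X @ beta + (X @ beta).T @ (X @ beta)).item()

print(np.isclose(cross_1, cross_2), np.isclose(sse_direct, sse_expanded))
```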

#### Computing the Partial Derivative:

From here, the partial derivative with respect to $\beta$ can be applied:

$$ \frac{\partial E}{\partial \beta} = \frac{\partial}{\partial \beta} [Y^T Y - 2Y^T X \beta + (X \beta)^T (X \beta)] $$

$$ \frac{\partial E}{\partial \beta} = 0 - 2Y^T X + 2\beta^T X^T X$$

To solve for $\hat{\beta}$, set the derivative equal to 0 and solve:

$$ 0 = 0 - 2Y^T X + 2\hat{\beta}^T X^T X$$

$$ 2Y^T X = 2\hat{\beta}^T X^T X$$

Right-multiplying both sides by $(X^T X)^{-1}$, which exists when the columns of $X$ are linearly independent:

$$ \hat{\beta}^T = (Y^T X)(X^T X)^{-1}$$

$$ \hat{\beta} = [(Y^T X)(X^T X)^{-1}]^T $$

$$ \hat{\beta} = [(X^T X)^{-1}]^T (Y^T X)^T $$

Because $X^T X$ is symmetric, so is its inverse:

$$ \hat{\beta} = (X^T X)^{-1} X^T Y $$

This matches the expected solution, and $\hat{\beta}$ is the vector of coefficients that can be computed from the predictor and response variables.

By solving for $\hat{\beta}$, the normal equations for the model have also been solved for. The normal equations are:

$$X^TX\hat{\beta} = X^T Y$$
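The normal equations can be solved directly in NumPy. The following is a sketch under stated assumptions rather than this notebook's final implementation: the data is synthetic, `fit_normal_equations` is a hypothetical helper name, and `np.linalg.solve` is used on the system $X^TX\hat{\beta} = X^TY$ instead of forming the inverse explicitly, which is usually the more numerically stable choice. The result is compared against `np.linalg.lstsq`, and the gradient $-2X^T(Y - X\hat{\beta})$ is evaluated at the solution to confirm it is approximately zero.

```python
import numpy as np

def fit_normal_equations(X, y):
    """Solve X^T X beta = X^T y; X must already include the column of ones."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic data: 200 observations of 3 hypothetical factors
rng = np.random.default_rng(42)
n_obs = 200
factors = rng.normal(size=(n_obs, 3))
X_design = np.hstack([np.ones((n_obs, 1)), factors])
true_beta = np.array([0.1, 1.2, -0.4, 0.7])  # [intercept, 3 slopes]
y = X_design @ true_beta + rng.normal(scale=0.05, size=n_obs)

beta_hat = fit_normal_equations(X_design, y)

# Cross-check against NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# The gradient of the SSE should vanish at the least squares estimate
gradient_at_solution = -2 * X_design.T @ (y - X_design @ beta_hat)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))
```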

### _The Gradient Descent Function_

In the Simple Linear Regression Model, the Formula for the Gradient Descent of the slope was:

$$m_{new} = m_{current} - \frac{\partial E}{\partial m}$$

Because the Multiple Linear Regression Model is composed of multiple slopes, a more abstract version of this equation must be constructed that uses $\beta$ for each of the slopes, as well as the intercept. This formula can be represented as:

$$\beta_{i,new} = \beta_{i,current} - \frac{\partial E}{\partial \beta_i}$$

Where $i = 0, 1, ..., p$ indexes the coefficients in the regression equation, with $i = 0$ being the intercept.

Lastly, a learning rate, $L$, needs to be added to the equation. This will be used in the iterative steps of the Python program that will be written. This means that the completed Gradient Descent Function is:

$$\beta_{i,new} = \beta_{i,current} - L \frac{\partial E}{\partial \beta_i} $$

and in matrix form:

$$\hat{\beta}_{new} = \hat{\beta}_{current} - L \frac{\partial E}{\partial \hat{\beta}} $$

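The partial derivative was derived in the previous section:

$$ \frac{\partial E}{\partial \beta} = - 2Y^T X + 2\beta^T X^T X$$

Transposing this row vector gives a column vector with the same shape as $\beta$, which is the form used in the update step:

$$ \left(\frac{\partial E}{\partial \beta}\right)^T = -2X^T Y + 2X^T X \beta = -2X^T (Y - X \beta)$$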
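The update rule can be sketched in NumPy as follows. This is an illustrative sketch, not the final class for the Fama-French models: the function name, default learning rate, iteration count, and synthetic data are all assumptions. One deliberate deviation from the formulas above is that the gradient is divided by the number of observations (so the loop minimizes the mean squared error rather than the raw SSE), which keeps a given learning rate usable regardless of sample size without changing the minimizer. All coefficients are held in a single array and updated together.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.05, n_iterations=2000):
    """Estimate beta by gradient descent; X must include the column of ones."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_obs = X.shape[0]
    beta = np.zeros(X.shape[1])  # one array holding beta_0 ... beta_p
    for _ in range(n_iterations):
        # Column-vector gradient -2 X^T (y - X beta), scaled by 1/n (assumption)
        gradient = -2 * X.T @ (y - X @ beta) / n_obs
        beta = beta - learning_rate * gradient
    return beta

# Synthetic data: 200 observations of 3 hypothetical factors
rng = np.random.default_rng(7)
n_obs = 200
X_gd = np.hstack([np.ones((n_obs, 1)), rng.normal(size=(n_obs, 3))])
beta_true = np.array([0.2, 1.0, -0.5, 0.3])
y_gd = X_gd @ beta_true + rng.normal(scale=0.05, size=n_obs)

beta_gd = gradient_descent(X_gd, y_gd)

# Closed-form least squares solution for comparison
beta_closed, *_ = np.linalg.lstsq(X_gd, y_gd, rcond=None)

print(beta_gd)
print(np.allclose(beta_gd, beta_closed, atol=1e-6))
```

If the learning rate is set too high, the iterates can diverge instead of converging, so it usually needs to be tuned to the scale of the predictors.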

After speaking with Dr. Allen:

- Betas should be an array of existing betas, do not substitute.