In [117]:
# Import libraries
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

### LOS 10.a: Describe a simple linear regression model, how the least squares criterion is used to estimate regression coefficients, and the interpretation of these coefficients.

The purpose of simple linear regression is to explain the variation in a dependent variable in terms of the variation in a single independent variable. Here, the term variation is interpreted as the degree to which a variable differs from its mean value. Don't confuse variation with variance—they are related, but they are not the same.


$\text{variation in Y} = {\Large\sum_{i=1}^{n}} (Y_i - \bar{Y})^2$
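As a quick illustration of the difference between the two terms, the sketch below computes the variation and the sample variance of a small made-up series. The values in `y` are hypothetical and were chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# Hypothetical sample of five observations of Y (illustrative values only)
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Variation: sum of squared deviations from the mean
variation = np.sum((y - y.mean()) ** 2)

# Sample variance: variation divided by (n - 1)
sample_variance = variation / (len(y) - 1)

print(variation)        # 40.0
print(sample_variance)  # 10.0
```

Variation is a sum and grows with the number of observations, while variance scales that sum by the degrees of freedom.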

* The **dependent variable** is the variable whose variation is explained by the independent variable. We are interested in answering the question, "What explains fluctuations in the dependent variable?" The dependent variable is also referred to as the explained variable, endogenous variable, or predicted variable.
</br>

* The **independent variable** is the variable used to explain the variation of the dependent variable. The independent variable is also referred to as the explanatory variable, exogenous variable, or predicting variable.
</br>

**Example: Dependent vs. independent variables**

Suppose you want to predict stock returns with GDP growth. Which variable is the independent variable?

&emsp;**Answer:**

Because GDP growth is used as the predictor of stock returns, stock returns are being *explained* by GDP growth. Hence, stock returns are the dependent (explained) variable, and GDP growth is the independent (explanatory) variable.


#### Simple Linear Regression Model

The following linear regression model is used to describe the relationship between two variables, $X$ and $Y$:
<br></br>
$\Large{Y_i = b_0 + b_1X_i + \epsilon_i, \quad i = 1, \ldots, n}$

&emsp; 

<U>where:</U>

$Y_i$ = $i$th observation of the dependent variable, $Y$

$X_i$ = $i$th observation of the independent variable, $X$

$b_0$ = regression intercept term

$b_1$ = regression slope coefficient

$\epsilon_i$ = **residual** for the $i$th observation (also referred to as the disturbance term or error term)

Based on this regression model, the regression process estimates an equation for a line through a scatter plot of the data that "best" explains the observed values for $Y$ in terms of the observed values for $X$.
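To make the model concrete, here is a minimal sketch that generates data from $Y_i = b_0 + b_1X_i + \epsilon_i$. The parameter values, sample size, and distributions are assumptions chosen purely for illustration; they do not come from the text.

```python
import numpy as np

# Hypothetical "true" parameters, chosen only for illustration
b0, b1 = 1.0, 0.5
n = 50

rng = np.random.default_rng(seed=0)           # fixed seed for reproducibility
x = rng.normal(loc=0.0, scale=2.0, size=n)    # independent variable X_i
eps = rng.normal(loc=0.0, scale=1.0, size=n)  # error term epsilon_i
y = b0 + b1 * x + eps                         # dependent variable Y_i

# The error term is the part of Y that the line b0 + b1 * X does not explain
unexplained = y - (b0 + b1 * x)
print(np.allclose(unexplained, eps))
```

In practice $b_0$, $b_1$, and the error terms are not observed. Only the pairs $(X_i, Y_i)$ are, which is why the coefficients must be estimated.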


#### Estimated Regression Line

The linear equation, often called the line of best fit or regression line, takes the following form:
<br></br>

$\Large\hat{Y}_{i} = \hat{b}_{0} + \hat{b}_{1}X_i, \quad i = 1, 2, 3, \ldots, n$

&emsp; 

<U>where:</U>

$\hat{Y}_{i}$ = estimated value of $Y_i$ given $X_i$

$\hat{b}_{0}$ = estimated intercept term.

$\hat{b}_{1}$ = estimated slope coefficient.

 
<br>
The hat "^" above a variable or parameter indicates an estimated or predicted value.
</br>


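The regression line is chosen so that the sum of the squared vertical distances between the actual $Y$-values and the $Y$-values predicted by the regression equation is as small as possible. This sum is the sum of squared errors (**SSE**), $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$. Thus, the regression line is the line that minimizes the SSE. This explains why simple linear regression is frequently referred to as ordinary least squares **(OLS) regression**, and the values determined by the estimated regression equation, $\hat{Y}_i$, are called least squares estimates.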
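The least squares idea can be checked numerically. The sketch below computes the SSE (the sum of squared differences between actual and predicted $Y$-values) for the fitted line and for two slightly perturbed lines. The data and the helper `sse` are hypothetical, and `np.polyfit` is used here only as a convenient least squares fitter.

```python
import numpy as np

def sse(b0, b1, x, y):
    """Sum of squared errors for the line y_hat = b0 + b1 * x."""
    y_hat = b0 + b1 * x
    return float(np.sum((y - y_hat) ** 2))

# Hypothetical data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# np.polyfit with degree 1 returns the least squares [slope, intercept]
b1_hat, b0_hat = np.polyfit(x, y, 1)
sse_ols = sse(b0_hat, b1_hat, x, y)

# Nudging either coefficient away from the OLS estimate increases the SSE
sse_shift_intercept = sse(b0_hat + 0.5, b1_hat, x, y)
sse_shift_slope = sse(b0_hat, b1_hat + 0.1, x, y)

print(sse_ols < sse_shift_intercept, sse_ols < sse_shift_slope)  # True True
```

Any other line through the same scatter plot produces a larger SSE than the OLS line.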

<br>
The estimated slope coefficient $\hat{b}_{1}$ for the regression line describes the change in $Y$ for a one-unit change in $X$. It can be positive, negative, or zero, depending on the relationship between the regression variables. The slope term is calculated as follows:
</br>
&emsp; 

$\Large\hat{b}_{1} = \frac{\text{Cov}(X, Y)}{\sigma^2_X}$

The intercept term $\hat{b}_{0}$ is the line's intersection with the $Y$-axis at $X = 0$. It can be positive, negative, or zero. A property of the least squares method is that the intercept term may be expressed as follows:

&emsp; 
$\large\hat{b}_{0}=\bar{Y}-\hat{b}_{1}\bar{X}$

where:

$\bar{Y}$ = mean of $Y$

$\bar{X}$ = mean of $X$

The intercept equation highlights the fact that the regression line passes through the point whose coordinates are the means of the independent and dependent variables (i.e., the point $(\bar{X}, \bar{Y})$).
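The slope and intercept formulas can be applied directly to raw data. The following sketch uses hypothetical observations, computes the slope as covariance over variance and the intercept from the two means, and then cross-checks the result against `np.polyfit`. The variable names are illustrative only.

```python
import numpy as np

# Hypothetical paired observations, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Slope: sample covariance of X and Y divided by sample variance of X
cov_xy = np.cov(x, y, ddof=1)[0, 1]
var_x = np.var(x, ddof=1)
b1_hat = cov_xy / var_x

# Intercept: mean of Y minus slope times mean of X
b0_hat = y.mean() - b1_hat * x.mean()

# Cross-check against NumPy's least squares fit of a degree-1 polynomial
slope_check, intercept_check = np.polyfit(x, y, 1)
print(np.isclose(b1_hat, slope_check), np.isclose(b0_hat, intercept_check))

# The fitted line passes through the point of means (x_bar, y_bar)
print(np.isclose(b0_hat + b1_hat * x.mean(), y.mean()))
```

The covariance and the variance must use the same divisor (here $n - 1$ for both), so that the divisor cancels in the ratio.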

<hr>

**Example: Computing the slope coefficient and intercept term**

Compute the slope coefficient and intercept term using the following information:

<table>
<thead>
  <tr>
    <th>Statistic</th>
    <th>Value</th>
    <th>Statistic</th>
    <th>Value</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>Cov(S&amp;P 500, ABC)</td>
    <td>0.000336</td>
    <td>Mean return, S&amp;P 500</td>
    <td>−2.70%</td>
  </tr>
  <tr>
    <td>Var(S&amp;P 500)</td>
    <td>0.000522</td>
    <td>Mean return, ABC</td>
    <td>−4.05%</td>
  </tr>
</tbody>
</table>

**Answer:**

The slope coefficient is calculated as $\hat{b}_{1} = \frac{0.000336}{0.000522} = 0.64$.

The intercept term is calculated as follows:

$\hat{b}_{0}=\overline{ABC}-\hat{b}_{1}\overline{SP500}=-4.05\% - 0.64(-2.70\%) = -2.3\%$
<br>
</br>
The estimated regression line that minimizes the SSE in our ABC stock return example is shown in the figure Estimated Regression Equation for ABC vs. S&P 500 Excess Returns.

<br>
This regression line has an intercept of $-2.3\%$ and a slope of $0.64$. The model predicts that if the S&P 500 excess return is $-7.8\%$ (the May 20X4 value), then the ABC excess return would be $-2.3\% + (0.64)(-7.8\%) = -7.3\%$. The residual (error) for the May 20X4 ABC prediction is $8.4\%$, which is the difference between the actual ABC excess return of $1.1\%$ and the predicted return of $-7.3\%$.
</br>


In [145]:
# Inputs from the example table (means are in percent)
covAB = 0.000336        # Cov(S&P 500, ABC)
SP500_Var = 0.000522    # Var(S&P 500), the independent variable
SP500_Mu = -2.70        # mean S&P 500 excess return (%)
ABC_Mu = -4.05          # mean ABC excess return (%)

# Slope and intercept of the regression line
bhat_1 = covAB / SP500_Var
bhat_0 = ABC_Mu - (bhat_1 * SP500_Mu)

# Predicted ABC excess return for the May 20X4 S&P 500 excess return of -7.8%
predABC = bhat_0 + bhat_1 * -7.8

# Actual ABC excess return and the residual (actual minus predicted)
actABC = 1.1
residual = actABC - predABC

In [151]:
print("Slope coefficient b^1      =", round(bhat_1, 2))
print("Intercept b^0 (%)          =", round(bhat_0, 1))
print("Predicted ABC return (%)   =", round(predABC, 1))
print("Actual ABC return (%)      =", round(actABC, 1))
print("Residual for May 20X4 (%)  =", round(residual, 1))

Slope coefficient b^1      = 0.64
Intercept b^0 (%)          = -2.3
Predicted ABC return (%)   = -7.3
Actual ABC return (%)      = 1.1
Residual for May 20X4 (%)  = 8.4


<hr>


<img src="https://github.com/PachaTech/CFA-Level-1/blob/main/10_1%20graph.jpeg?raw=true">