In [1]:
import numpy as np
import pandas as pd
from scipy.stats import linregress
from sklearn.linear_model import LinearRegression
from IPython.display import display, Latex

# [Linear Regression](https://en.wikipedia.org/wiki/Linear_regression)

> [Interpreting computer output for regression](https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/assessing-the-fit-in-least-squares-regression/a/interpreting-computer-output-regression): Association does not necessarily imply causation.


In [statistics](https://en.wikipedia.org/wiki/Statistics "Statistics"), **linear regression** is a [linear](https://en.wikipedia.org/wiki/Linearity "Linearity") approach for modelling the relationship between a [scalar](https://en.wikipedia.org/wiki/Scalar_(mathematics) "Scalar (mathematics)") response and one or more explanatory variables (also known as [dependent and independent variables](https://en.wikipedia.org/wiki/Dependent_and_independent_variables "Dependent and independent variables")). The case of one explanatory variable is called _[simple linear regression](https://en.wikipedia.org/wiki/Simple_linear_regression "Simple linear regression")_; for more than one, the process is called **multiple linear regression**.[[1]](https://en.wikipedia.org/wiki/Linear_regression#cite_note-Freedman09-1) This term is distinct from [multivariate linear regression](https://en.wikipedia.org/wiki/Multivariate_linear_regression "Multivariate linear regression"), where multiple [correlated](https://en.wikipedia.org/wiki/Correlation_and_dependence "Correlation and dependence") dependent variables are predicted, rather than a single scalar variable.

- $\displaystyle \hat y = a + bx$
- $\displaystyle b = r\frac{s_y}{s_x}$
- $\displaystyle a = \bar y - b \bar x$

In [2]:
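def linear_reg(mu_x, mu_y, sd_x, sd_y, r):
    """Display the least-squares line computed from summary statistics."""
    b = r * (sd_y / sd_x)  # slope: b = r * s_y / s_x
    a = mu_y - b * mu_x    # intercept: a = y_bar - b * x_bar
    # Raw f-string so the backslash in "\hat" is not read as an escape sequence
    display(Latex(rf"$\hat y = {round(a, 3)} + {round(b, 3)}x$"))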

In [3]:
mu_x, mu_y, sd_x, sd_y, r = 8.9, 74.3, 4.8, 7.2, -0.88
linear_reg(mu_x, mu_y, sd_x, sd_y, r)

<IPython.core.display.Latex object>

In [4]:
X = np.array([[-2], [-1], [1], [4]])
y = np.array([-3, -1, 2, 3])
reg = LinearRegression().fit(X, y)
a = reg.intercept_
b = reg.coef_[0]
r_squared = reg.score(X, y)
precision = 3

# Raw f-strings so the backslash in "\hat" is not read as an escape sequence
display(Latex(rf"$\hat y = {round(a, precision)} + {round(b, precision)}x$"))
display(Latex(rf"$R^2 = {round(r_squared, precision)}$"))

<IPython.core.display.Latex object>

<IPython.core.display.Latex object>

In [5]:
X = X.reshape(len(X))  # flatten to 1-D: linregress expects 1-D arrays
linregress(X, y)

LinregressResult(slope=0.9761904761904762, intercept=-0.23809523809523808, rvalue=0.9378934722869389, pvalue=0.062106527713061126, stderr=0.25532869749437176, intercept_stderr=0.5987988733313951)
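As a cross-check, the closed-form formulas $b = r\frac{s_y}{s_x}$ and $a = \bar y - b \bar x$ can be evaluated directly on the same four points and compared with `linregress`. This is a minimal sketch; the variable names `x`, `fit`, `a_formula`, and `b_formula` are chosen here for illustration and do not come from the cells above.

```python
import numpy as np
from scipy.stats import linregress

# Same four points as in the cells above, as 1-D float arrays
x = np.array([-2.0, -1.0, 1.0, 4.0])
y = np.array([-3.0, -1.0, 2.0, 3.0])

# Pearson correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]

# Sample standard deviations (ddof=1); the ratio s_y / s_x is the same for any ddof
b_formula = r * y.std(ddof=1) / x.std(ddof=1)
a_formula = y.mean() - b_formula * x.mean()

# Least-squares fit computed by SciPy for comparison
fit = linregress(x, y)

print(f"formula:    a = {a_formula:.4f}, b = {b_formula:.4f}")
print(f"linregress: a = {fit.intercept:.4f}, b = {fit.slope:.4f}")
```

Both lines should report the same intercept and slope, up to floating-point rounding.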

## [Errors and residuals](https://en.wikipedia.org/wiki/Errors_and_residuals)

In [statistics](https://en.wikipedia.org/wiki/Statistics "Statistics") and [optimization](https://en.wikipedia.org/wiki/Mathematical_optimization "Mathematical optimization"), **errors** and **residuals** are two closely related and easily confused measures of the [deviation](https://en.wikipedia.org/wiki/Deviation_(statistics) "Deviation (statistics)") of an [observed value](https://en.wikipedia.org/wiki/Observed_value "Observed value") of an [element](https://en.wikipedia.org/wiki/Elementary_event "Elementary event") of a [statistical sample](https://en.wikipedia.org/wiki/Sample_(statistics) "Sample (statistics)") from its "[true value](https://en.wikipedia.org/wiki/True_value "True value")" (not necessarily observable). 

- The **error** (or **disturbance**) of an [observation](https://en.wikipedia.org/wiki/Observation "Observation") is the deviation of the observed value from the true value of a quantity of interest (for example, a [population mean](https://en.wikipedia.org/wiki/Population_mean "Population mean")). 
- The **residual** is the difference between the observed value and the _[estimated](https://en.wikipedia.org/wiki/Estimation "Estimation")_ value of the quantity of interest (for example, a [sample mean](https://en.wikipedia.org/wiki/Sample_mean "Sample mean")). 

The distinction is most important in [regression analysis](https://en.wikipedia.org/wiki/Regression_analysis "Regression analysis"), where the concepts are sometimes called the **regression errors** and **regression residuals** and where they lead to the concept of [studentized residuals](https://en.wikipedia.org/wiki/Studentized_residual "Studentized residual").

- $\displaystyle e_i = X_i - \mu$
- $\displaystyle r_i = X_i - \bar X$
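The two definitions above can be contrasted on simulated data, where the true mean $\mu$ is known only because we choose it ourselves. The sketch below assumes a normal population with $\mu = 10$ and standard deviation 2; these values, the seed, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
mu = 10.0                       # true population mean (known only in a simulation)
sample = rng.normal(loc=mu, scale=2.0, size=8)

errors = sample - mu                # e_i = X_i - mu
residuals = sample - sample.mean()  # r_i = X_i - X_bar

# Residuals always sum to zero (up to rounding); errors generally do not
print("sum of errors:   ", errors.sum())
print("sum of residuals:", residuals.sum())
```

Every error differs from the corresponding residual by the same constant, $\bar X - \mu$, which is why the residuals sum to zero while the errors usually do not.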

The standard deviation of the residuals, or $S$, measures the size of a typical prediction error in the $y$ variable. So the units of $S$ match the units on the $y$-variable.
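A minimal sketch of computing $S$ for the four-point example follows. It assumes the common convention of dividing the sum of squared residuals by $n - 2$, since two parameters (slope and intercept) are estimated; other texts may divide by $n$ instead.

```python
import numpy as np
from scipy.stats import linregress

x = np.array([-2.0, -1.0, 1.0, 4.0])
y = np.array([-3.0, -1.0, 2.0, 3.0])

fit = linregress(x, y)
y_hat = fit.intercept + fit.slope * x  # fitted values
residuals = y - y_hat                  # observed minus predicted

n = len(x)
# Assumed convention: n - 2 degrees of freedom for simple linear regression
s = np.sqrt(np.sum(residuals**2) / (n - 2))

print(f"S = {s:.3f} (same units as y)")
```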

## [Pearson correlation coefficient](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient)

In [statistics](https://en.wikipedia.org/wiki/Statistics "Statistics"), the **Pearson correlation coefficient** (**PCC**) ― also known as **Pearson's _r_**, the **Pearson product-moment correlation coefficient** (**PPMCC**), the **bivariate correlation**,[[1]](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#cite_note-1) or colloquially simply as **the correlation coefficient**[[2]](https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#cite_note-2) ― is a measure of [linear](https://en.wikipedia.org/wiki/Linear "Linear") [correlation](https://en.wikipedia.org/wiki/Correlation_and_dependence "Correlation and dependence") between two sets of data. It is the ratio between the [covariance](https://en.wikipedia.org/wiki/Covariance "Covariance") of two variables and the product of their [standard deviations](https://en.wikipedia.org/wiki/Standard_deviation "Standard deviation"); thus it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables, and ignores many other types of relationship or correlation. As a simple example, one would expect the age and height of a sample of teenagers from a high school to have a Pearson correlation coefficient significantly greater than 0, but less than 1 (as 1 would represent an unrealistically perfect correlation).
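The definition as a ratio of covariance to the product of standard deviations can be checked numerically. The sketch below uses the four-point data set from the earlier cells and compares the hand-computed value with `np.corrcoef`; the variable names are illustrative.

```python
import numpy as np

x = np.array([-2.0, -1.0, 1.0, 4.0])
y = np.array([-3.0, -1.0, 2.0, 3.0])

# Sample covariance (np.cov divides by n - 1 by default)
cov_xy = np.cov(x, y)[0, 1]

# Sample standard deviations with the matching n - 1 divisor
r = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

print(f"r from definition: {r:.6f}")
print(f"r from corrcoef:   {np.corrcoef(x, y)[0, 1]:.6f}")
```

The covariance and both standard deviations must use the same divisor ($n - 1$ here, or $n$ throughout) for the ratio to equal $r$.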

In [6]:
X = np.random.random(10)
y = np.random.random(10)
result = linregress(X, y)  # fit once and reuse the result
slope, intercept, r_value, p_value, std_err = result
result

LinregressResult(slope=-0.1518881489028096, intercept=0.5148172723635894, rvalue=-0.20274585383366478, pvalue=0.5742789390728125, stderr=0.25936552064492635, intercept_stderr=0.12344737360893261)

In [7]:
X = X.reshape(len(X), 1)  # LinearRegression expects a 2-D array of shape (n_samples, n_features)
reg = LinearRegression().fit(X, y)
a = reg.intercept_
b = reg.coef_[0]
r_squared = reg.score(X, y)
precision = 3

# Raw f-strings so the backslash in "\hat" is not read as an escape sequence
display(Latex(rf"$\hat y = {round(a, precision)} + {round(b, precision)}x$"))
display(Latex(rf"$R^2 = {round(r_squared, precision)}$"))

<IPython.core.display.Latex object>

<IPython.core.display.Latex object>

## [R-squared](https://en.wikipedia.org/wiki/Coefficient_of_determination) (coefficient of determination)

> [R-squared intuition](https://www.khanacademy.org/math/statistics-probability/describing-relationships-quantitative-data/assessing-the-fit-in-least-squares-regression/a/r-squared-intuition)

R-squared tells us what percent of the prediction error in the $y$ variable is eliminated when we use least-squares regression on the $x$ variable.

As a result, $r^2$ is also called the **coefficient of determination**.

Many formal definitions say that $r^2$ tells us what percent of the variability in the $y$ variable is accounted for by the regression on the $x$ variable.

For simple linear regression, squaring the correlation coefficient $r$ gives exactly this measurement. Proving that $r^2$ equals the fraction of variability explained is beyond the scope of an introductory statistics course.

- $\displaystyle \bar y = \frac{1}{n} \sum^n_{i=1}{y_i}$
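- $\displaystyle SS_{res} = \sum_{i}(y_i - \hat y_i)^2 = \sum_{i}e^2_i$, where $\hat y_i$ is the value predicted by the regression and $e_i = y_i - \hat y_i$ is the $i$-th residual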
- $\displaystyle SS_{tot} = \sum_{i}(y_i - \bar y)^2$
- $\displaystyle R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$
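The formulas above can be evaluated step by step and compared with the square of Pearson's $r$. This is a sketch on the four-point data set from the earlier cells; the names `ss_res`, `ss_tot`, and `r_squared` are chosen here to mirror the symbols in the list.

```python
import numpy as np
from scipy.stats import linregress

x = np.array([-2.0, -1.0, 1.0, 4.0])
y = np.array([-3.0, -1.0, 2.0, 3.0])

fit = linregress(x, y)
y_hat = fit.intercept + fit.slope * x  # predictions of the fitted line

ss_res = np.sum((y - y_hat) ** 2)     # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot

print(f"R^2 from sums of squares: {r_squared:.6f}")
print(f"r^2 from linregress:      {fit.rvalue ** 2:.6f}")
```

For a least-squares line with an intercept, the two values agree up to floating-point rounding.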

## [Root-mean-square deviation](https://en.wikipedia.org/wiki/Root-mean-square_deviation)

- $\displaystyle RMSE = RMSD = \sqrt{\frac{\sum^N_{i=1}(x_i - \hat x_i)^2}{N}}$
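A minimal sketch of the formula above in NumPy follows. The helper name `rmse` and the sample arrays are hypothetical, introduced only for illustration.

```python
import numpy as np


def rmse(observed, predicted):
    """Root-mean-square error between two equally sized sequences."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Mean of the squared differences, then the square root
    return np.sqrt(np.mean((observed - predicted) ** 2))


# Illustrative values only: differences are 0.5, -0.5, 0 and -1
observed = [3.0, -0.5, 2.0, 7.0]
predicted = [2.5, 0.0, 2.0, 8.0]

print(f"RMSE = {rmse(observed, predicted):.4f}")
```

Because the differences are squared before averaging, large individual errors weigh more heavily in the RMSE than they would in a mean absolute error.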