# Linear Regression

## Outline
## Learning Algorithms Used to Estimate the Coefficients
* __Simple Linear Regression__

When there is a single input variable (x), the method is referred to as simple linear regression. 
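With a single input, the coefficients have a closed form. The slope is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. Below is a minimal plain-Python sketch. The function name `fit_simple` and the toy data are illustrative only:

```python
def fit_simple(xs, ys):
    """Return (intercept b0, slope b1) for y = b0 + b1 * x by least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared x-deviations
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    b1 = num / den
    b0 = y_mean - b1 * x_mean  # the fitted line passes through (x_mean, y_mean)
    return b0, b1


b0, b1 = fit_simple([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```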

* __Ordinary Least Squares__

When we have more than one input, we can use Ordinary Least Squares to estimate the values of the coefficients. The Ordinary Least Squares procedure seeks to minimize the sum of the squared residuals.
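With several inputs, the least-squares coefficients satisfy the normal equation $\hat\theta = (X^T X)^{-1} X^T y$. In practice, a least-squares solver such as NumPy's `np.linalg.lstsq` is numerically safer than forming the inverse explicitly. The following sketch uses synthetic, noise-free data that is assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                 # two input variables
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]      # true coefficients: 1, 2, -3

X_b = np.c_[np.ones(len(y)), X]              # prepend a column of ones for the intercept
theta, residuals, rank, sv = np.linalg.lstsq(X_b, y, rcond=None)
print(theta)  # approximately [ 1.  2. -3.]
```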

* __Gradient Descent__

Gradient Descent is a very generic optimization algorithm capable of finding optimal solutions to a wide range of problems. The general idea of Gradient Descent is to tweak parameters iteratively in order to minimize a cost function.
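For linear regression with the MSE cost, the gradient is $\frac{2}{m}X^T(X\theta - y)$, where $X$ includes a bias column. Batch gradient descent repeatedly steps against this gradient. In the minimal sketch below, the learning rate `eta` and the epoch count are assumed values that suit this toy data; they are not general recommendations:

```python
import numpy as np

def batch_gradient_descent(X, y, eta=0.1, n_epochs=1000):
    m = len(y)
    X_b = np.c_[np.ones((m, 1)), X]          # add bias feature x_0 = 1
    theta = np.zeros(X_b.shape[1])           # start from all-zero parameters
    for _ in range(n_epochs):
        gradients = (2 / m) * X_b.T @ (X_b @ theta - y)  # gradient of MSE(theta)
        theta -= eta * gradients             # step downhill
    return theta

rng = np.random.default_rng(42)
X = rng.uniform(0, 2, size=(100, 1))
y = 4.0 + 3.0 * X[:, 0]                      # noise-free data, true theta = [4, 3]
theta = batch_gradient_descent(X, y)
print(theta)  # close to [4. 3.]
```

If `eta` is set too large, the updates overshoot and diverge. If it is set too small, many more epochs are needed.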

* __Regularization__

Regularization methods extend the training of a linear model. They seek both to minimize the sum of the squared error of the model on the training data (using ordinary least squares) and to reduce the complexity of the model (for example, the number of coefficients or the absolute size of the sum of all coefficients in the model).


## Prepare Data for Linear Regression
* Linear Assumption (transform inputs, e.g. with a log transform, so the relationship is linear)
* Remove Noise (remove outliers)
* Remove Collinearity (highly correlated inputs cause overfitting)
* Gaussian Distributions (log or Box-Cox transform)
* Rescale Inputs (standardization or normalization)

https://machinelearningmastery.com/linear-regression-for-machine-learning/
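Several of the transforms listed above are sketched below with NumPy and SciPy on a synthetic right-skewed feature. `scipy.stats.boxcox` requires strictly positive data. When no λ is given, it returns the transformed values together with the λ it estimated. The 3-standard-deviation outlier cut-off is an assumed rule of thumb and is not part of the list above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # synthetic right-skewed, strictly positive feature

# Gaussian-like distribution: log or Box-Cox transform
x_log = np.log(x)
x_boxcox, lam = stats.boxcox(x)  # lam is the maximum-likelihood estimate of lambda

# Rescale inputs
x_std = (x - x.mean()) / x.std()                 # standardization: mean 0, std 1
x_norm = (x - x.min()) / (x.max() - x.min())     # normalization: range [0, 1]

# Remove noise: drop points more than 3 standard deviations from the mean (assumed cut-off)
x_clean = x[np.abs(x_std) < 3]
```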

## Variable Types and Coding
* Numerical
* Categorical: one-hot encoding, integer encoding
* Ordinal: a natural ordering between categories

https://www.ismll.uni-hildesheim.de/lehre/ml-07w/skript/ml-2up-01-linearregression.pdf
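The three codings can be sketched in plain Python. The category names and the size ordering below are hypothetical examples. Integer encoding imposes an artificial order on the categories, which is why one-hot encoding is usually preferred for nominal categories. Ordinal variables, by contrast, can keep an integer code that follows their natural order:

```python
def integer_encode(values):
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [index[v] for v in values], categories

def one_hot_encode(values):
    categories = sorted(set(values))
    # one indicator column per category
    return [[1 if v == c else 0 for c in categories] for v in values], categories

def ordinal_encode(values, order):
    rank = {c: i for i, c in enumerate(order)}  # code follows the natural order
    return [rank[v] for v in values]

colors = ["red", "green", "blue", "green"]
codes, cats = integer_encode(colors)   # cats = ['blue', 'green', 'red'], codes = [2, 1, 0, 1]
onehot, _ = one_hot_encode(colors)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
sizes = ordinal_encode(["small", "large", "medium"], order=["small", "medium", "large"])  # [0, 2, 1]
```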

## Assumptions
* __Linearity & Additivity__: There should be a linear relationship between the dependent and independent variables, and the effect of a change in each independent variable on the dependent variable should be additive.
* __Normality of error distribution__: The residuals should be normally distributed.
* __Homoscedasticity__: The variance of the errors should be constant with respect to (a) time, (b) the predictions, and (c) the values of the independent variables.
* __Statistical independence of errors__: The residuals should not be correlated with one another.

https://www.dezyre.com/data-science-in-r-programming-tutorial/linear-regression-tutorial
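Some of these assumptions can be checked numerically on the residuals of a fitted model. The sketch below swaps in two common tests that the list above does not name. The Shapiro-Wilk test (`scipy.stats.shapiro`) checks normality. The Durbin-Watson statistic, computed by hand here, checks independence; values near 2 suggest no lag-1 autocorrelation. The correlation between fitted values and absolute residuals is only a crude homoscedasticity signal, and residual plots remain the usual check:

```python
import numpy as np
from scipy import stats

def durbin_watson(residuals):
    # sum of squared successive differences / sum of squared residuals, range [0, 4]
    diff = np.diff(residuals)
    return np.sum(diff ** 2) / np.sum(residuals ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 0.5 + X @ np.array([1.5, -2.0]) + rng.normal(scale=0.3, size=200)  # synthetic data

X_b = np.c_[np.ones(len(y)), X]
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
fitted = X_b @ theta
residuals = y - fitted

shapiro_stat, shapiro_p = stats.shapiro(residuals)          # normality of errors
dw = durbin_watson(residuals)                               # independence of errors
spread_corr = np.corrcoef(fitted, np.abs(residuals))[0, 1]  # crude homoscedasticity check
print(shapiro_p, dw, spread_corr)
```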

## Evaluation Metrics
* __$R^2$__: A measure of how well observed outcomes are replicated by the model, expressed as the proportion of the total variation in outcomes that is explained by the model.

http://bigdata-madesimple.com/how-to-run-linear-regression-in-python-scikit-learn/
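Concretely, $R^2 = 1 - SS_{res}/SS_{tot}$. Here $SS_{res}$ is the sum of squared residuals, and $SS_{tot}$ is the sum of squared deviations of the observed outcomes from their mean. A minimal NumPy sketch follows. The function name `r_squared` is illustrative; `sklearn.metrics.r2_score` computes the same quantity:

```python
import numpy as np

def r_squared(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

print(r_squared([1, 2, 3, 4], [1, 2, 3, 5]))  # 0.8
```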

## Regression Diagnostics
https://medium.com/@emredjan/emulating-r-regression-plots-in-python-43741952c034

## Multicollinearity
The Variance Inflation Factor (VIF) measures collinearity among the predictor variables in a multiple regression. Compute the factor for each predictor variable. A VIF above roughly 5 to 10 suggests that multicollinearity is present, and you should consider dropping that variable.
$$VIF_i = \frac{1}{1 - R_i^2}$$
where $R_i^2$ is the $R^2$ obtained by regressing the $i$-th predictor on all the other predictors.
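To compute the VIF, each predictor is regressed on all the others (plus an intercept), and the resulting $R^2$ is plugged into $1/(1 - R^2)$. A NumPy sketch follows. The helper name `vif` is hypothetical; statsmodels also provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`:

```python
import numpy as np

def vif(X):
    m, n = X.shape
    out = np.empty(n)
    for i in range(n):
        target = X[:, i]
        others = np.c_[np.ones(m), np.delete(X, i, axis=1)]  # intercept + remaining predictors
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[i] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + rng.normal(scale=0.05, size=200)  # nearly a linear combination of x1 and x2
vif_indep = vif(np.column_stack([x1, x2]))       # both close to 1
vif_coll = vif(np.column_stack([x1, x2, x3]))    # all large
print(vif_indep, vif_coll)
```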

## Regularized Linear Models
### Ridge Regression
Ridge Regression cost function

$J(\theta) = MSE(\theta) + \alpha\frac{1}{2}\sum_{i=1}^{n} \theta_i^2$
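Ridge regression has a closed-form solution. Assume $MSE(\theta) = \frac{1}{m}\lVert X\theta - y\rVert^2$ and a half-squared penalty $\frac{\alpha}{2}\sum_{i=1}^{n}\theta_i^2$, which is the form the Elastic Net cost reduces to at r = 0. Setting the gradient to zero then gives $(X^T X + \frac{m\alpha}{2}A)\theta = X^T y$. Here $A$ is the identity matrix with its top-left entry zeroed, so the bias $\theta_0$ is not penalized. A NumPy sketch follows (`ridge_closed_form` is a hypothetical name). Note that scikit-learn's `Ridge` scales its objective differently, so its `alpha` is not the same number:

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    m = len(y)
    X_b = np.c_[np.ones((m, 1)), X]      # bias column theta_0
    A = np.eye(X_b.shape[1])
    A[0, 0] = 0.0                        # penalty sum starts at i=1: bias is not penalized
    # zero gradient of (1/m)||X_b theta - y||^2 + (alpha/2) * sum theta_i^2
    return np.linalg.solve(X_b.T @ X_b + (m * alpha / 2) * A, X_b.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
theta_ols = ridge_closed_form(X, y, alpha=0.0)    # alpha = 0 reduces to ordinary least squares
theta_ridge = ridge_closed_form(X, y, alpha=1.0)  # weights shrink toward zero
print(theta_ols, theta_ridge)
```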

### Lasso Regression
Least Absolute Shrinkage and Selection Operator Regression (simply called Lasso Regression) is another regularized version of Linear Regression: just like Ridge Regression, it adds a regularization term to the cost function, but it uses the ℓ1 norm of the weight vector instead of half the square of the ℓ2 norm.

Lasso Regression cost function

$J(\theta) = MSE(\theta) + \alpha\sum_{i=1}^{n} |\theta_i|$
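Unlike the Ridge penalty, the ℓ1 penalty is not differentiable at zero, so Lasso has no closed-form solution. Coordinate descent with soft-thresholding is a common solver, and scikit-learn's `Lasso` also uses coordinate descent. The sketch below minimizes the cost above with an unpenalized intercept. The function names are hypothetical. Soft-thresholding returns exactly zero whenever a feature's correlation with the residual is small, which is how Lasso performs the "selection" in its name:

```python
import numpy as np

def soft_threshold(rho, lam):
    # shrink rho toward zero by lam; return exactly 0 when |rho| <= lam
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/m)||y - b - Xw||^2 + alpha * sum|w_j| one coordinate at a time."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    z = np.sum(X ** 2, axis=0)  # x_j^T x_j for each column
    for _ in range(n_iters):
        b = np.mean(y - X @ w)  # exact minimizer for the unpenalized intercept
        for j in range(n):
            r_j = y - b - X @ w + X[:, j] * w[j]  # residual ignoring feature j
            rho = X[:, j] @ r_j
            # zero subgradient gives w_j = S(rho, m * alpha / 2) / z_j
            w[j] = soft_threshold(rho, m * alpha / 2) / z[j]
    return b, w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)  # the second feature is irrelevant
b, w = lasso_coordinate_descent(X, y, alpha=0.5)
print(b, w)  # w[1] is driven to 0, w[0] is shrunk below 3
```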

### Elastic Net
Elastic Net is a middle ground between Ridge Regression and Lasso Regression. The regularization term is a simple mix of both Ridge and Lasso’s regularization terms, and you can control the mix ratio r. When r = 0, Elastic Net is equivalent to Ridge Regression, and when r = 1, it is equivalent to Lasso Regression.

Elastic Net cost function

$J(\theta) = MSE(\theta) + r\alpha\sum_{i=1}^{n} |\theta_i| + \frac{1-r}{2}\alpha\sum_{i=1}^{n}\theta_i^2$
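The role of the mix ratio is easiest to see by evaluating the cost directly. With r = 1, only the ℓ1 term remains, which gives Lasso. With r = 0, only $\frac{\alpha}{2}\sum\theta_i^2$ remains, which is half the squared ℓ2 norm and gives Ridge. The NumPy sketch below uses hypothetical function names and random toy data:

```python
import numpy as np

def mse(theta, X_b, y):
    return np.mean((X_b @ theta - y) ** 2)

def elastic_net_cost(theta, X_b, y, alpha, r):
    w = theta[1:]  # the penalty sums start at i=1, so the bias theta_0 is excluded
    l1 = np.sum(np.abs(w))
    l2 = np.sum(w ** 2)
    return mse(theta, X_b, y) + r * alpha * l1 + (1 - r) / 2 * alpha * l2

rng = np.random.default_rng(0)
X_b = np.c_[np.ones(20), rng.normal(size=(20, 2))]  # bias column plus two features
y = rng.normal(size=20)
theta = np.array([0.5, 1.0, -2.0])

lasso_like = elastic_net_cost(theta, X_b, y, alpha=0.1, r=1.0)  # MSE + alpha * sum|theta_i|
ridge_like = elastic_net_cost(theta, X_b, y, alpha=0.1, r=0.0)  # MSE + (alpha/2) * sum theta_i^2
print(lasso_like, ridge_like)
```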

## Linear Models in Python
* __scipy.stats.linregress__ only handles the case of a single explanatory variable with specialized code and calculates a few extra statistics.

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html
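The following usage sketch runs `linregress` on made-up data points. The returned object exposes `slope`, `intercept`, `rvalue`, `pvalue` and `stderr`, where `stderr` is the standard error of the slope:

```python
import numpy as np
from scipy import stats

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.9])  # made-up, roughly linear data

res = stats.linregress(x, y)
print(res.slope, res.intercept)   # about 1.98 and 1.08
print(res.rvalue ** 2)            # R^2 of the fit
print(res.pvalue, res.stderr)     # test of zero slope, standard error of the slope
```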

* __statsmodels.OLS__ is a generic linear model (OLS) estimation class. It doesn't prespecify what the explanatory variables are and can handle any multivariate array of explanatory variables, or formulas and pandas DataFrames. It returns not only the estimated parameters but also a large set of result statistics and methods for statistical inference and prediction.

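* __sklearn.linear_model__ provides linear estimators such as `LinearRegression`, `Ridge`, `Lasso` and `ElasticNet` that share scikit-learn's `fit`/`predict` API.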

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
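The sketch below fits the plain and regularized models from this module on the same synthetic data. In `ElasticNet`, `l1_ratio` plays the role of the mix ratio r. However, scikit-learn scales the data-fit term by 1/(2 n_samples) rather than using the MSE, so its `alpha` values are not directly interchangeable with the α in the cost functions above. The alpha values chosen here are arbitrary:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),  # l1_ratio ~ mix ratio r
}
for name, model in models.items():
    model.fit(X, y)
    print(name, model.intercept_, model.coef_, model.score(X, y))  # score() returns R^2
```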