# ABOUT REGRESSION

## Linear Regression: How it works

#### What it is

Regression is a **supervised machine learning** technique used to model the relationships between one or more independent/feature variables and how they contribute to producing a particular outcome, represented by a **continuous dependent/target variable**.   


#### Goal 

Find a function that 'mimics' or 'models' this relationship, so that when new observations become available, the output (dependent variable) can be predicted by evaluating the function on the new input values.


#### Assumptions

The dependent variable is continuous and a linear function (at least to some approximation) of the independent variables. 

> $y = c_{0} + c_{1}x_{1} + c_{2}x_{2} + c_{3}x_{3} + … + c_{n}x_{n}$


#### How

The algorithm attempts to find the “best” values for the parameters, which in a linear regression model are the coefficients $c_{i}$ (including the intercept $c_{0}$), so that the formula is as “accurate” as possible, i.e., so that its predictions are as close as possible to the observed values. Once estimated, the parameters allow the value of the dependent variable to be computed from the values of the independent variables.
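
Once the parameters are known, evaluating the model is just the formula above. A minimal sketch in plain Python; `predict` and the parameter values are hypothetical, chosen for illustration rather than estimated from data.

```python
def predict(intercept, coefficients, features):
    """Evaluate y = c0 + c1*x1 + ... + cn*xn for a single observation."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return intercept + sum(c * x for c, x in zip(coefficients, features))

# Hypothetical, already-estimated parameters: c0 = 1, c1 = 2, c2 = 3
print(predict(1.0, [2.0, 3.0], [4.0, 5.0]))  # 1 + 2*4 + 3*5 = 24.0
```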

## Univariate Linear Regression

### Single Independent Variable

- **Number of independent/feature variables:** 1

- **Goal:** minimize the error between the actual values and the estimated values.  

- **Parameters:** slope, $\beta$, and y-intercept, $\alpha$

- **Function:** $y_{i} = \beta x_{i} + \alpha + \epsilon_{i}$, where $\epsilon$ is the error term

![univariate_bestfitline.png](univariate_bestfitline.png)

image source:  https://towardsdatascience.com/polynomial-regression-bbe8b9d97491
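
For a single feature, minimizing the squared error has a closed-form solution: $\beta = \sum_{i} (x_{i} - \bar{x})(y_{i} - \bar{y}) / \sum_{i} (x_{i} - \bar{x})^{2}$ and $\alpha = \bar{y} - \beta \bar{x}$. A plain-Python sketch of those two formulas (`fit_simple_linear` is a hypothetical name), checked on made-up points that lie exactly on $y = 2x + 1$:

```python
def fit_simple_linear(xs, ys):
    """Least-squares slope (beta) and intercept (alpha) for y = beta*x + alpha."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Covariance-style and variance-style sums around the means
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = y_mean - beta * x_mean
    return beta, alpha

beta, alpha = fit_simple_linear([0, 1, 2, 3], [1, 3, 5, 7])
print(beta, alpha)  # 2.0 1.0
```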

## Multi-Variate Linear Regression

### Multiple Independent Variables

Models the relationship between **multiple** independent input variables (feature variables) and an output dependent variable. The model remains linear in that the output is a linear combination of the input variables.

![multivariate.png](multivariate.png)

image source:  http://nbviewer.ipython.org/urls/s3.amazonaws.com/datarobotblog/notebooks/multiple_regression_in_python.ipynb
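
A minimal sketch of fitting a model with two features using scikit-learn's `LinearRegression`, assuming scikit-learn and NumPy are installed. The data is synthetic, generated from $y = 1 + 2x_{1} - 3x_{2}$ plus a little noise, so the fitted `intercept_` and `coef_` should land near those values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 2))  # two synthetic feature columns
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0, 0.5, size=100)

model = LinearRegression().fit(X, y)
# Intercept should be near 1.0 and coefficients near [2.0, -3.0]
print(model.intercept_, model.coef_)
```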


## Polynomial Regression

Models a non-linear relationship between the independent variables and the dependent variable by fitting a polynomial in the features. A polynomial of suitable degree captures a quadratic relationship exactly and can approximate other curved shapes, such as sine, cosine, exponential, or logarithmic trends, over a limited range. This is still considered a linear model, because it is linear in the coefficients/weights associated with the features: although the curve being fitted may be quadratic, x² is just another feature. The original features must therefore be converted into their higher-order terms before fitting. (In Python, we can use the PolynomialFeatures class provided by scikit-learn to generate these terms, and then train the model using LinearRegression.)

![polynomial_regression.png](polynomial_regression.png)

image source:  https://towardsdatascience.com/polynomial-regression-bbe8b9d97491
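
A minimal sketch of the PolynomialFeatures + LinearRegression approach described above, assuming scikit-learn and NumPy are installed. The noise-free quadratic data ($y = 1 + 2x + 3x^{2}$) is made up for illustration; `PolynomialFeatures(degree=2, include_bias=False)` expands $x$ into $[x, x^{2}]$, and the ordinary linear model then learns one weight per generated feature (plus its own intercept).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(-3, 3, 50).reshape(-1, 1)           # one input feature, as a column
y = 1.0 + 2.0 * x.ravel() + 3.0 * x.ravel() ** 2    # synthetic quadratic target

# Expand x -> [x, x^2], then fit an ordinary linear model on those columns
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
model.fit(x, y)

print(model.named_steps["linearregression"].coef_)  # approximately [2. 3.]
print(model.predict([[4.0]]))                       # approximately [57.]
```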

## Evaluation of regression models

### Sum of Squared Error

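a.k.a. the residual sum of squares (RSS) or sum of squared residuals: $RSS = \sum_{i} (y_{i} - \hat{y}_{i})^{2}$, where $\hat{y}_{i}$ is the model's prediction for observation $i$. It measures the discrepancy between the data and an estimation model; a small RSS indicates a tight fit of the model to the data.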
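
A plain-Python sketch that sums the squared differences between observed and predicted values; `rss` is a hypothetical helper and the numbers are made up.

```python
def rss(y_true, y_pred):
    """Residual sum of squares between observed and predicted values."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))

print(rss([3.0, 5.0, 7.0], [2.5, 5.0, 8.0]))  # 0.25 + 0 + 1 = 1.25
```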

### Coefficient of Determination

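a.k.a. R-squared, measures the fraction of the total variation in the dependent variable that is captured by the model: $R^{2} = 1 - RSS/TSS$, where $TSS = \sum_{i} (y_{i} - \bar{y})^{2}$ is the total sum of squares. A value of 1 means the model explains all of the variation; a value of 0 means it does no better than always predicting the mean $\bar{y}$.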
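
A plain-Python sketch that computes R² as one minus the ratio of the residual sum of squares to the total sum of squares around the mean; `r_squared` is a hypothetical helper and the data is made up.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot

y = [3.0, 5.0, 7.0]
print(r_squared(y, [2.5, 5.0, 8.0]))  # 1 - 1.25 / 8 = 0.84375
print(r_squared(y, [5.0, 5.0, 5.0]))  # always predicting the mean gives 0.0
```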

## The Curse of Dimensionality
As the dimensionality of the feature space increases, the number of possible configurations (regions of the feature space) grows exponentially, so the fraction of configurations covered by a fixed number of observations decreases.  
This is visualized below, showing fewer observations per region as dimensionality increases.

![curse_of_dimensionality.png](curse_of_dimensionality.png)

image source:  https://www.kdnuggets.com/2015/03/deep-learning-curse-dimensionality-autoencoders.html/2
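
A back-of-the-envelope sketch of why coverage drops, assuming (hypothetically) that each feature is split into 10 equal bins, so $d$ features give $10^{d}$ cells, and that we have 100 observations. Even if every observation fell into a different cell, they could cover at most $100 / 10^{d}$ of the cells. The `coverage` helper and its `bins_per_dim` parameter are illustrative names only.

```python
def coverage(n_observations, n_dims, bins_per_dim=10):
    """Number of grid cells, and the largest fraction n observations could occupy."""
    cells = bins_per_dim ** n_dims
    return cells, min(1.0, n_observations / cells)

for d in (1, 2, 3):
    cells, frac = coverage(100, d)
    print(d, cells, frac)
# 1 10 1.0
# 2 100 1.0
# 3 1000 0.1
```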

## Common Regression Algorithms

1. Ordinary Least Squares
2. Stepwise Regression

### Ordinary Least Squares  

OLS finds the best parameters (intercept and coefficients) for a given series of X's and Y's by minimizing the sum of squared errors, that is, the sum of the squares of the differences between the observed values of the dependent variable in the dataset and the values predicted by the linear function.  
This is known as the 'least squares' principle. 
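
For a linear model, the least-squares principle has a closed-form solution, the normal equations $(A^{T}A)\,c = A^{T}y$, where $A$ is the feature matrix with a leading column of ones for the intercept. A minimal NumPy sketch (the function name `ols_fit` is hypothetical); in practice a least-squares solver such as `np.linalg.lstsq` is numerically more robust than forming $A^{T}A$ explicitly.

```python
import numpy as np

def ols_fit(X, y):
    """Return [intercept, c1, ..., cn] minimizing the sum of squared errors."""
    A = np.column_stack([np.ones(len(y)), X])  # prepend a column of 1s for the intercept
    # Solve the normal equations (A^T A) c = A^T y
    return np.linalg.solve(A.T @ A, A.T @ y)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])  # made-up points on y = 2x + 1
print(ols_fit(X, y))  # approximately [1. 2.]
```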

#### Pros

1. Fast to model 
2. Useful when the relationship to be modeled is not extremely complex
3. Also useful when you don’t have a lot of data.
4. Simple to understand and explain to stakeholders, which can be very valuable for business decisions.

#### Cons

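1. Assumes a single, continuous dependent variable that is a linear function of the independent variable(s); OLS itself works with one or many independent variables. When the dependent variable follows a non-normal distribution (e.g. counts or yes/no outcomes), a Generalized Linear Model is the usual substitute.
2. Very sensitive to outliers, which can badly distort the regression line and, in turn, the forecasted values.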
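
To see the outlier sensitivity concretely, a small plain-Python sketch refits a least-squares line after replacing one point of otherwise perfectly linear, made-up data with an outlier. `slope_intercept` is a hypothetical helper using the closed-form simple-regression formulas.

```python
def slope_intercept(xs, ys):
    """Closed-form least-squares slope and intercept for one feature."""
    n = len(xs)
    xm, ym = sum(xs) / n, sum(ys) / n
    beta = sum((x - xm) * (y - ym) for x, y in zip(xs, ys)) / sum((x - xm) ** 2 for x in xs)
    return beta, ym - beta * xm

xs = [0, 1, 2, 3, 4]
clean = [1, 3, 5, 7, 9]          # exactly y = 2x + 1
with_outlier = [1, 3, 5, 7, 30]  # last point replaced by a hypothetical outlier

print(slope_intercept(xs, clean))         # (2.0, 1.0)
print(slope_intercept(xs, with_outlier))  # slope about 6.2, intercept about -3.2
```

A single bad point roughly triples the slope, which is why outliers are usually inspected before trusting an OLS fit.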


![ordinary_least_squares.jpeg](ordinary_least_squares.jpeg)


### Stepwise Regression

Stepwise regression is an appropriate analysis when you have many variables and you’re interested in identifying a useful subset of the predictors. It is a method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure: at each step, a variable is considered for addition to or removal from the set of explanatory variables based on some prespecified criterion.

#### Options

1. **Forward selection**:  The algorithm begins with no predictors in the model and adds the most significant variable at each step. It stops when all variables not in the model have p-values that are greater than the specified Alpha-to-Enter value.
2. **Backward elimination**:  The algorithm begins with all predictors in the model and removes the least significant variable at each step. It stops when all variables in the model have p-values that are less than or equal to the specified Alpha-to-Remove value.
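
A minimal sketch of forward selection with NumPy. It swaps in a simpler stopping rule than the p-value test described above: a feature is added only if it lowers the residual sum of squares by at least a chosen fraction (`min_rel_improvement`, a hypothetical parameter). The helper names `rss_of` and `forward_selection` are also hypothetical, and the data is synthetic. Backward elimination follows the same pattern in reverse, starting from all features and dropping the one whose removal hurts the fit least.

```python
import numpy as np

def rss_of(X, y, cols):
    """RSS of an OLS fit (with intercept) using only the given feature columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def forward_selection(X, y, min_rel_improvement=0.05):
    """Greedily add the feature that lowers RSS most, until the gain is too small."""
    selected, remaining = [], list(range(X.shape[1]))
    current = rss_of(X, y, selected)  # start from the intercept-only model
    while remaining:
        scores = {j: rss_of(X, y, selected + [j]) for j in remaining}
        best = min(scores, key=scores.get)
        gain = (current - scores[best]) / current if current > 0 else 0.0
        if gain < min_rel_improvement:
            break  # no candidate improves the fit enough
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(0, 0.1, size=50)  # feature 1 is irrelevant
print(forward_selection(X, y))  # the first two picks should be 0, then 2
```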

#### Pros

1. Compared with evaluating every possible subset of predictors, the stepwise approach is much faster and less prone to overfitting the data. You often learn something by watching the order in which variables are added or removed, and it doesn't tend to drown you in detailed rankings that make you lose sight of the big picture.
2. It can take into account all the predictors, as opposed to analyzing each predictor separately.   

#### Cons

1. If two independent variables are highly correlated, only one may end up in the model even though both may be important.
2. Risk of overfitting: because many candidate variables are tested on the same data, some may be selected by chance.
3. No algorithm can take into account special knowledge the data scientist or analyst may have about the data. Therefore, the model selected may not be the most practical one.
4. It is important to note that charting the individual predictors against the response is often misleading because these do not account for other predictors in the model.


### Other Regression Algorithms

1. Lasso
2. Elastic Net
3. Ridge Regression
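
These three are regularized variants of linear regression: Ridge adds an L2 penalty on the coefficients, Lasso an L1 penalty that can shrink some coefficients exactly to zero, and Elastic Net a mix of the two. A small sketch with scikit-learn, assuming made-up data where only the first of five features matters; the `alpha` and `l1_ratio` values are arbitrary illustrative choices, not tuned.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 4.0 * X[:, 0] + rng.normal(0, 0.1, size=100)  # only feature 0 is relevant

models = {
    "ridge": Ridge(alpha=1.0),                           # L2 penalty: shrinks, rarely zeroes
    "lasso": Lasso(alpha=0.1),                           # L1 penalty: can zero out features
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),  # blend of L1 and L2
}
for name, model in models.items():
    model.fit(X, y)
    print(name, np.round(model.coef_, 3))
```

Lasso typically reports exact zeros for the irrelevant features here, while Ridge keeps small non-zero weights on them.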