# What is Regression Analysis

A predictive modelling technique which investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables. It can be used for forecasting, time series modelling, and finding cause-and-effect relationships between variables.

# Why do we use it?

It can indicate the significant relationships between dependent and independent variables.

It indicates the strength of impact of multiple independent variables on a dependent variable.

It also allows us to compare the effects of variables measured on different scales, e.g. the effect of price changes and the number of promotional activities.

These benefits help us evaluate candidate variables and eliminate weak ones, arriving at the best set of variables for building predictive models.

# Linear Regression

Establishes a relationship between dependent variable (y) and one or more independent variables (X) using a BEST FIT STRAIGHT LINE (AKA Regression line)

Equation:
y = a + bx + e

Where a is the intercept, b is the slope of the line, and e is the error term.

The difference between simple and multiple linear regression is that multiple linear regression has more than one (>1) independent variable, whereas simple linear regression has only one.

How do we obtain a best fit line?
By using the least squares method, which calculates the best fit line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line.

We can then evaluate the model performance by using the metric R-square
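
For the single-predictor case, the least squares fit and R-square can be sketched in plain Python. The function names `fit_line` and `r_squared` and the sample observations are made up for illustration; they are not taken from any library.

```python
def fit_line(xs, ys):
    """Return (a, b) minimizing the sum of squared vertical deviations."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    a = y_mean - b * x_mean  # the fitted line passes through the two means
    return a, b


def r_squared(xs, ys, a, b):
    """Share of the variation in y explained by the line y = a + b*x."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Made-up observations that lie close to y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
r2 = r_squared(xs, ys, a, b)
print(f"intercept={a:.2f}, slope={b:.2f}, R-square={r2:.4f}")
```

An R-square close to 1 means the line explains most of the variation in y; it says nothing on its own about whether the linear form is appropriate.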

IMPORTANT POINTS
- There must be a linear relationship between independent and dependent variables.
- Multiple regression may suffer from multicollinearity, autocorrelation, heteroskedasticity
- Linear Regression is very sensitive to outliers which can terribly affect the regression line and eventually the forecasted values.
- Multicollinearity can increase the variance of the coefficient estimates and make the estimates very sensitive to minor changes in the model. The result is that the coefficient estimates are unstable.
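
The sensitivity to outliers is easy to see on a toy example. The sketch below uses a hypothetical helper `slope` and invented data: five points lying exactly on y = 2x, then the same points with the last value replaced by an outlier.

```python
def slope(xs, ys):
    """Least squares slope of y on x for a single predictor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
clean = [2.0, 4.0, 6.0, 8.0, 10.0]          # exactly y = 2x
with_outlier = [2.0, 4.0, 6.0, 8.0, 30.0]   # last point replaced by an outlier

clean_slope = slope(xs, clean)
outlier_slope = slope(xs, with_outlier)
print(clean_slope, outlier_slope)
```

A single extreme point at the edge of the x range is enough to change the fitted slope substantially, which then carries over to every forecast made from the line.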

# Logistic Regression

Used to find the probability of event = Success and event = Failure, i.e. it applies when the dependent variable is binary in nature. The parameters are chosen to maximize the likelihood of observing the sample values rather than to minimize the sum of squared errors as in ordinary regression.
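
A minimal sketch of the maximum likelihood idea, assuming one predictor and plain gradient ascent on the log-likelihood. The names `fit_logistic` and `log_likelihood`, the learning rate, the step count, and the hours/passed data are all illustrative assumptions; real libraries use more refined optimizers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(a, b, xs, ys):
    """Log-probability of the observed 0/1 outcomes under the model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(a + b * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Gradient ascent on the average log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient terms are (observed - predicted probability).
        grad_a = sum(y - sigmoid(a + b * x) for x, y in zip(xs, ys)) / n
        grad_b = sum((y - sigmoid(a + b * x)) * x for x, y in zip(xs, ys)) / n
        a += learning_rate * grad_a
        b += learning_rate * grad_b
    return a, b


# Invented data: hours studied and whether the exam was passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

a, b = fit_logistic(hours, passed)
print("P(pass | 1 hour)  =", round(sigmoid(a + b * 1.0), 3))
print("P(pass | 4 hours) =", round(sigmoid(a + b * 4.0), 3))
```

The fitted model outputs a probability between 0 and 1; turning it into a class label requires choosing a cut-off such as 0.5.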

IMPORTANT POINTS
- Widely used for classification problems
- Doesn't require a linear relationship between the dependent and independent variables. It can handle various types of relationships because it applies a non-linear log transformation to the predicted odds (the logit), although it still assumes the log-odds are linear in the predictors.
- To avoid overfitting and underfitting, we should include all significant variables. A good approach to ensure this is to use a stepwise method to estimate the logistic regression.
- It requires LARGE SAMPLE SIZES because maximum likelihood estimates are less powerful at low sample sizes than ordinary least squares
- The independent variables should not be correlated with each other, i.e. NO MULTICOLLINEARITY. However, we have the option to include interaction effects of categorical variables in the analysis and in the model.
- If the values of the dependent variable are ordinal, it is called Ordinal Logistic Regression
- If the dependent variable is multi-class, it is known as Multinomial Logistic Regression

# Polynomial Regression

A regression equation is a polynomial regression equation if the power of the independent variable is more than 1

Equation:
y = a + bx^2

In this regression technique, the best fit line is not a straight line but rather a curve that fits the data points.
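
Fitting a curve is still a least squares problem once the powers of x are treated as separate columns. The sketch below fits a degree-2 polynomial with all lower-order terms (intercept, x, x^2) by solving the normal equations. The helpers `solve` and `fit_polynomial` and the data are hypothetical, written here only for illustration.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Move the row with the largest entry in this column into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least squares coefficients [c0, c1, ..., c_degree] of c0 + c1*x + ..."""
    rows = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Points generated exactly from y = 1 + 2x + 0.5x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [-1.0, -0.5, 1.0, 3.5, 7.0, 11.5]
coefs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coefs])
```

Solving the normal equations directly is fine for low degrees on a small x range; for higher degrees it becomes numerically fragile, which is one more reason to be cautious with high-degree fits.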

IMPORTANT POINTS
- While it may be tempting to fit a higher degree polynomial to get lower error, this can result in over-fitting.
- Always plot the relationships to see the fit and focus on making sure that the curve fits the nature of the problem
- Especially look out for the curve towards the ends and see whether those shapes and trends make sense. Higher-degree polynomials can end up producing weird results on extrapolation.

# Stepwise Regression

This form of regression is used when we deal with multiple independent variables. In this technique, the selection of independent variables is done with the help of an automatic process which involves NO HUMAN INTERVENTION

This is done by observing statistical values like R-square, t-stats, and AIC metric to discern significant variables.

Basically, it fits the regression model by adding/dropping covariates one at a time based on a specified criterion.

Standard Stepwise Regression
- adds and removes predictors as needed for each step

Forward Selection
- starts with the most significant predictor in the model and adds a variable for each step

Backward elimination
- starts with all predictors in the model and removes the least significant variable in each step

The aim of this modeling technique is to maximize the prediction power with minimum number of predictor variables. It is one of the methods used to handle higher dimensionality of a data set.
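
Forward selection can be sketched as a greedy loop. This version makes several simplifying assumptions that should be read as such: the data are centered (so no intercept is fitted), the criterion is AIC in its Gaussian form n*ln(RSS/n) + 2k, and the function name `forward_select` plus the price/promo/noise data are invented for the example. Statistical packages typically offer p-value or AIC/BIC based variants with more safeguards.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def forward_select(columns, y):
    """Add one predictor per step while AIC keeps improving.

    columns: dict mapping a name to a list of centered predictor values.
    Returns the selected names in the order they were added.
    """
    n = len(y)
    resid = list(y)  # part of y not yet explained by the selected predictors
    remaining = {name: list(col) for name, col in columns.items()}
    selected = []
    rss = dot(resid, resid)
    best_aic = n * math.log(rss / n)
    while remaining:
        # Adding candidate x lowers the RSS by (resid . x)^2 / (x . x),
        # because candidates are kept orthogonal to the selected predictors.
        gains = {
            name: dot(resid, x) ** 2 / dot(x, x)
            for name, x in remaining.items()
            if dot(x, x) > 1e-12
        }
        if not gains:
            break
        name = max(gains, key=gains.get)
        new_rss = rss - gains[name]
        aic = n * math.log(new_rss / n) + 2 * (len(selected) + 1)
        if aic >= best_aic:
            break  # the best candidate no longer pays for its extra parameter
        x = remaining.pop(name)
        coef = dot(resid, x) / dot(x, x)
        resid = [r - coef * xi for r, xi in zip(resid, x)]
        # Remove the new predictor's direction from the other candidates.
        for other, col in remaining.items():
            c = dot(col, x) / dot(x, x)
            remaining[other] = [ci - c * xi for ci, xi in zip(col, x)]
        selected.append(name)
        rss, best_aic = new_rss, aic
    return selected


# Invented, centered data: y depends on price and promo but not on noise.
price = [-3.5, -2.5, -1.5, -0.5, 0.5, 1.5, 2.5, 3.5]
promo = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
noise = [1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0]
errors = [0.1, -0.1, 0.05, -0.05, -0.1, 0.1, 0.05, -0.05]
y = [2 * p + 3 * q + e for p, q, e in zip(price, promo, errors)]

selected = forward_select({"price": price, "promo": promo, "noise": noise}, y)
print(selected)
```

Backward elimination follows the same pattern in reverse: start from the full model and drop the predictor whose removal hurts the criterion least.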

# Ridge Regression

Technique used when data suffers from multicollinearity (independent variables are highly correlated).

Under multicollinearity, even though the ordinary least squares (OLS) estimates are unbiased, their variances are large, which can push the estimated values far from the true values. By adding a degree of bias to the regression estimates, ridge regression reduces the standard errors.

Equation:
y = a + b1x1 + b2x2 + ... + e, for multiple independent variables where e is the error term/value needed to correct for a prediction error between the observed and predicted value

In a linear equation, prediction errors can be decomposed into two sub components. First is due to bias and the second due to variance. Prediction error can occur due to any one of these two or both components. 

Ridge regression solves the multicollinearity problem through a shrinkage parameter λ (lambda).
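
The effect of the shrinkage parameter can be sketched with two nearly identical predictors. For centered data without an intercept, the ridge coefficients solve (X'X + lam*I) b = X'y; with two predictors this is a 2x2 system that can be solved directly. The function name `ridge_two_features` and the data are assumptions made for the example, and `lam = 0` reduces to ordinary least squares.

```python
def ridge_two_features(x1, x2, y, lam):
    """Closed-form ridge fit for two centered predictors, no intercept."""
    s11 = sum(a * a for a in x1) + lam  # lam is added on the diagonal only
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    # Cramer's rule for the 2x2 system.
    b1 = (t1 * s22 - t2 * s12) / det
    b2 = (t2 * s11 - t1 * s12) / det
    return b1, b2


# Invented, centered data: x2 is almost a copy of x1, and y is roughly x1 + x2.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.9, 2.0]
y = [-4.1, -1.9, 0.1, 2.0, 3.9]

ols = ridge_two_features(x1, x2, y, lam=0.0)
ridge = ridge_two_features(x1, x2, y, lam=1.0)
print("OLS  :", [round(b, 3) for b in ols])
print("ridge:", [round(b, 3) for b in ridge])
```

With the penalty switched on, the two coefficients are pulled towards each other and towards zero, but neither becomes exactly zero.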

IMPORTANT POINTS
- The assumptions of this regression are the same as least squares regression, except that normality is not assumed.
- It shrinks the value of coefficients but they don't reach zero, which means NO FEATURE SELECTION
- This is a regularization method and uses L2 regularization

# Lasso Regression

Similar to Ridge Regression, Lasso (Least Absolute Shrinkage and Selection Operator) also penalizes the absolute size of the regression coefficients. In addition, it is capable of reducing the variability and improving the accuracy of linear regression models. 

Lasso Regression DIFFERS from ridge regression in that it uses absolute values in the penalty function instead of squares. Penalizing (or, equivalently, constraining) the sum of the absolute values of the estimates causes some of the parameter estimates to turn out exactly zero. The larger the penalty applied, the further the estimates are shrunk towards zero. This results in variable selection out of the given n variables.
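
The difference between the two penalties is clearest in the special case of orthonormal predictors, where each coefficient can be treated separately. Assuming the objectives 0.5*||y - Xb||^2 + lam*||b||_1 for lasso and 0.5*||y - Xb||^2 + 0.5*lam*||b||^2 for ridge, lasso soft-thresholds each least squares coefficient while ridge divides it by (1 + lam). The coefficient values below are invented.

```python
def soft_threshold(value, lam):
    """Move value towards zero by lam, stopping at exactly zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


# Hypothetical least squares coefficients for four orthonormal predictors.
ols = [3.0, -0.4, 0.2, -1.5]
lam = 0.5

lasso = [soft_threshold(b, lam) for b in ols]
ridge = [b / (1.0 + lam) for b in ols]
print("lasso:", lasso)
print("ridge:", [round(b, 4) for b in ridge])
```

Coefficients whose absolute value is below the penalty are set to exactly zero by lasso, whereas ridge only rescales them. Other scalings of the objective change the constants but not this qualitative behaviour.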

IMPORTANT POINTS
- The assumptions of this regression are the same as least squares regression, except that normality is not assumed.
- It shrinks coefficients to zero (exactly zero) which helps in feature selection
- This is a regularization method and uses L1 regularization
- If a group of predictors are highly correlated, lasso picks only one of them and shrinks the others to zero

# ElasticNet Regression

Hybrid of Lasso and Ridge Regression techniques. It is trained with both L1 and L2 penalties as regularizers.

Useful when there are multiple features which are correlated. Lasso is likely to pick one of these at random, while elastic-net is likely to pick both

A practical advantage of trading off between Lasso and Ridge is that it allows Elastic-Net to inherit some of Ridge's stability under rotation

IMPORTANT POINTS
- Encourages group effect in case of highly correlated variables
- There are no limitations on the number of selected variables
- It can suffer from double shrinkage
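
The group effect can be sketched with two identical predictors and a small coordinate descent routine. The objective assumed here is (1/2n)*||y - Xb||^2 + lam*(l1_ratio*||b||_1 + 0.5*(1 - l1_ratio)*||b||^2), so `l1_ratio=1.0` gives lasso and smaller values mix in the ridge penalty. The function name `elastic_net`, its parameters, and the data are illustrative assumptions, and the data are assumed centered.

```python
def soft_threshold(value, threshold):
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(columns, y, lam, l1_ratio, sweeps=200):
    """Coordinate descent: update one coefficient at a time, holding the rest."""
    n = len(y)
    coefs = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, xj in enumerate(columns):
            # Residual with predictor j's current contribution added back.
            partial = [
                yi - sum(c * col[i] for c, col in zip(coefs, columns)) + coefs[j] * xj[i]
                for i, yi in enumerate(y)
            ]
            rho = sum(x * r for x, r in zip(xj, partial)) / n
            z = sum(x * x for x in xj) / n
            # L1 part thresholds the numerator, L2 part inflates the denominator.
            coefs[j] = soft_threshold(rho, lam * l1_ratio) / (z + lam * (1.0 - l1_ratio))
    return coefs


# Invented data: two identical predictors, y = 2 * x.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]

lasso = elastic_net([x, x], y, lam=0.5, l1_ratio=1.0)
enet = elastic_net([x, x], y, lam=0.5, l1_ratio=0.5)
print("lasso      :", lasso)
print("elastic net:", [round(b, 4) for b in enet])
```

In this sketch lasso puts all the weight on whichever duplicate is updated first, while the elastic net splits the weight evenly between the two. Both coefficients are also shrunk by the L1 and the L2 part at once, which is the double shrinkage mentioned above.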

# Other Models to explore

- Bayesian Regression
- Ecological Regression
- Robust Regression

# How to select the right regression model?

Below are key factors to consider when selecting the right regression model

1. Data Exploration should always be the first step before selecting the right model to identify the relationship and impact of variables

2. To compare the goodness of fit of different models, we can analyse metrics such as statistical significance of parameters, R-square, Adjusted R-square, AIC, BIC, and the error term.

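3. CROSS-VALIDATION is the best way to evaluate models used for prediction. Here you divide your data set into two groups (training and testing), fit the model on the training group only, and predict the testing group. A simple mean squared difference between the observed and predicted values gives you a measure of prediction accuracy. In k-fold cross-validation the split is repeated so that every observation is used for testing exactly once.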

4. If your data set has multiple confounding variables, you should NOT choose an automatic model selection method because you do not want to put these in a model at the same time.

5. It'll also depend on your objective. It can occur that a less powerful model is easy to implement as compared to a highly statistically significant model.

6. Regression regularization methods (Lasso, Ridge, and ElasticNet) work well in case of high dimensionality and multicollinearity among the variables in the data set.
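
The cross-validation step from the list above can be sketched as a k-fold loop that compares two candidate models by their held-out mean squared error. Everything named here (`k_fold_mse`, the two model helpers, the data) is made up for illustration. The folds are formed by taking every k-th point without shuffling, which is a simplification; in practice the rows are usually shuffled first.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the training mean of y."""
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


def fit_line(xs, ys):
    """Least squares straight line, returned as (intercept, slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    return y_mean - b * x_mean, b


def predict_line(model, x):
    a, b = model
    return a + b * x


def k_fold_mse(xs, ys, k, fit, predict):
    """Average held-out mean squared error over k folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point is held out
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        model = fit(train_x, train_y)  # the model never sees the test points
        squared = [(ys[i] - predict(model, xs[i])) ** 2 for i in test_idx]
        fold_errors.append(sum(squared) / len(squared))
    return sum(fold_errors) / k


# Invented data lying close to y = 2x + 1.
xs = [float(i) for i in range(10)]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9, 13.2, 14.8, 17.1, 18.9]

mean_mse = k_fold_mse(xs, ys, 5, fit_mean, predict_mean)
line_mse = k_fold_mse(xs, ys, 5, fit_line, predict_line)
print("mean-only model:", round(mean_mse, 3))
print("straight line  :", round(line_mse, 3))
```

The model with the lower held-out error is the better predictor on this data; the same loop works for any pair of `fit` and `predict` functions.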

