# Regression

Regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors'). More specifically, regression analysis helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied while the other independent variables are held fixed. Most commonly, regression analysis estimates the conditional expectation of the dependent variable given the independent variables – that is, the average value of the dependent variable when the independent variables are fixed. Less commonly, the focus is on a quantile or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. It is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.

Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. It is also used to understand which of the independent variables are related to the dependent variable, and to explore the forms of these relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables. However, this can lead to illusory or false relationships, so caution is advisable;[2] for example, correlation does not imply causation.

----------------------------------------------------------------------------------------------------------------------

##### Ordinary Least Squares Regression (OLSR) - 

Pros: 

Cons: 

##### Linear Regression - 

Pros: 

Cons: 

##### Binary Logistic Regression - 

Pros: 

Cons: 

##### Multinomial Logistic Regression - 

Pros: 

Cons: 

##### Stepwise Regression - 

Pros: 

Cons: 

##### Multivariate Adaptive Regression Splines (MARS) - 

Pros: 

Cons: 

##### Locally Estimated Scatterplot Smoothing (LOESS) - 

Pros: 

Cons: 

##### Jackknife Regression - 

Pros: 

Cons: 

----------------------------------------------------------------------------------------------------------------------


## -------- Ordinary Least Squares Regression (OLSR)

#### Wiki Definition: 
http://www.theanalysisfactor.com/interpreting-regression-coefficients/
https://en.wikipedia.org/wiki/Linear_regression

In statistics, ordinary least squares (OLS) or linear least squares is a method for estimating the unknown parameters in a linear regression model, with the goal of minimizing the sum of the squares of the differences between the observed responses in the given dataset and those predicted by a linear function of a set of explanatory variables (visually this is seen as the sum of the vertical distances between each data point in the set and the corresponding point on the regression line – the smaller the differences, the better the model fits the data).
#### Input Data: 
X(Numeric)/X(Categorical)
#### Initial Parameters: 
NA
#### Cost Function: 
OLS (Ordinary Least Squares) Estimator
#### Process Flow: 
Adjust the parameters to minimize the sum of squared residuals, Σ(Y - Ypred)^2, which amounts to fitting a straight line (or, with several predictors, a hyperplane) to the data.
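
The minimization can be sketched directly with NumPy. This is a minimal illustration rather than a reference implementation: the synthetic data, the true coefficients (2, 3, -1) and the variable names are all assumptions made for the example.

```python
import numpy as np

# Synthetic data (assumed for illustration): y = 2 + 3*x1 - 1*x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Add a column of ones so the first coefficient is the intercept
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution: the beta minimizing sum((y - X_design @ beta)**2)
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

residuals = y - X_design @ beta
rss = float(np.sum(residuals ** 2))  # residual sum of squares at the optimum
print("intercept and slopes:", beta)
print("residual sum of squares:", rss)
```

Any other choice of coefficients gives a residual sum of squares at least as large as `rss`, which is what "best fit" means here.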
#### Evaluation Methods: 

#### Tips: 


In [None]:
# ------------------------------------- R Code

# https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lm.html
lm(formula = y ~ x1 + x2, data = data)

In [None]:
# ------------------------------------- Python Code
# http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
# Estimate the coefficients of a regression model
# (X_train, X_test, Y_train, Y_test are assumed to be an existing train/test split)
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
slr = LinearRegression()
slr.fit(X_train, Y_train)
slr.coef_ # slope coefficients, one per feature
slr.intercept_ # intercept
Y_train_pred = slr.predict(X_train)
Y_test_pred = slr.predict(X_test)

# How to evaluate a regression model
# -- Residual plots -> non-linearity / outliers (any nonlinear pattern? residuals centered on y = 0?)
plt.scatter(Y_train_pred, Y_train_pred - Y_train,
            c='blue', marker='o', label='Training data')
plt.scatter(Y_test_pred, Y_test_pred - Y_test,
            c='lightgreen', marker='s', label='Test data')
plt.xlabel('Predicted Values')
plt.ylabel('Residuals')
plt.legend(loc='upper left')
plt.hlines(y=0, xmin=-10, xmax=50, lw=2, color='red')
plt.xlim([-10, 50]) # limits from the original example; adjust to the range of the predictions
plt.show()

# -- Use MSE (Mean Squared Error)
from sklearn.metrics import mean_squared_error
mean_squared_error(Y_train, Y_train_pred) # Training MSE
mean_squared_error(Y_test, Y_test_pred) # Test MSE

# -- R squared
from sklearn.metrics import r2_score
r2_score(Y_train, Y_train_pred)
r2_score(Y_test, Y_test_pred)


## -------- Linear Regression

#### Wiki Definition: 
http://www.theanalysisfactor.com/interpreting-regression-coefficients/
https://en.wikipedia.org/wiki/Linear_regression

In contrast to least squares viewed purely as curve fitting, linear regression is a statistical inference problem. The "y values" take on the interpretation of data you wish to model, and the "x values" take on the interpretation of extra information you have about each data point that might be helpful in predicting their "y values". You are trying to build a probabilistic model that describes "y" while taking into account "x", and a linear model is one of many ways to do this. A linear model assumes that "y" has a different mean for each possible value of "x", and that these means happen to follow a straight line with a certain intercept and a certain slope. As with many statistical inference problems, you can estimate the unknown parameters using maximum likelihood estimation. But since in this case the unknown parameters are an intercept and a slope, the end result of maximum likelihood estimation is basically that you are choosing the straight line that fits the observed data best, so this essentially becomes the least-squares curve fitting problem described in the OLSR section above.
#### Input Data: 
X(Numeric)/X(Categorical)
#### Initial Parameters: 
NA
#### Cost Function: 
Maximum Likelihood Estimator
#### Process Flow: 
When solving the statistical linear regression problem, a very common modeling assumption is that for every possible value of "x", the quantity "y" is normally distributed with a mean that is linear in "x". Therefore, the likelihood function is essentially a product of PDFs of the normal distribution. As stated above, you estimate the unknown parameters (and therefore find the best fitting line) by maximizing the likelihood function. If you look at what the product of normal PDFs looks like, you will notice that maximizing this expression happens to be equivalent to... you guessed it... minimizing the sum of squared errors. That is, the line you get performing curve fitting via least squares is equivalent to the line you get performing linear regression using a normal model.
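
The equivalence can be written out in a few lines. This is a sketch under the normal model stated above; the symbols $\beta_0$ (intercept), $\beta_1$ (slope), $\sigma^2$ (noise variance) and $n$ (sample size) are introduced here for illustration.

$$
\begin{aligned}
L(\beta_0, \beta_1, \sigma^2) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2} \right) \\
\log L(\beta_0, \beta_1, \sigma^2) &= -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2
\end{aligned}
$$

For any fixed $\sigma^2$, the first term does not depend on $\beta_0$ or $\beta_1$, and the second term is a negative multiple of the sum of squared errors. Maximizing $\log L$ over the intercept and slope is therefore the same as minimizing $\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$.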
#### Evaluation Methods: 

#### Tips: 

In [None]:
# ------------------------------------- R Code

# Same as above (see the OLSR section)


In [None]:
# ------------------------------------- Python Code

# Same as above (see the OLSR section)


## -------- Binary Logistic Regression

#### Wiki Definition: 
In statistics, logistic regression, or logit regression, or logit model[1] is a regression model where the dependent variable (DV) is categorical. Binary logistic regression covers the case of a binary dependent variable, that is, one that can take only two values, "0" and "1", which represent outcomes such as pass/fail, win/lose, alive/dead or healthy/sick. It relates the binary response variable to a set of explanatory variables, which can be discrete and/or continuous. The important point is that linear regression models the expected value of the response variable from the combination of values taken by the predictors, whereas logistic regression models the probability or odds of the response taking a particular value. As in linear regression (and unlike log-linear models), we make an explicit distinction between a response variable and one or more predictor (explanatory) variables.
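
The link between predictors, log-odds, odds and probability can be illustrated numerically. The sketch below is only an illustration: the coefficients `b0` and `b1`, the predictor values and the labels are hypothetical, not fitted to any dataset. It also evaluates the log loss, the negative log-likelihood that maximum likelihood fitting of a logistic model minimizes.

```python
import numpy as np

def sigmoid(z):
    """Map log-odds z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients for one predictor: log-odds = b0 + b1 * x
b0, b1 = -1.0, 2.0
x = np.array([0.0, 0.5, 1.0])

log_odds = b0 + b1 * x          # linear in the predictor
prob = sigmoid(log_odds)        # P(y = 1 | x)
odds = prob / (1.0 - prob)      # equals exp(log_odds)

# Log loss (mean negative log-likelihood) for some assumed observed labels
y = np.array([0, 1, 1])
log_loss = -np.mean(y * np.log(prob) + (1 - y) * np.log(1 - prob))
print(prob, odds, log_loss)
```

A log-odds of 0 corresponds to a probability of 0.5, and each unit increase in `x` multiplies the odds by `exp(b1)`.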
#### Input Data: 
X(Numeric)/X(Categorical)
#### Initial Parameters: 
NA
#### Cost Function: 
Maximum Likelihood Estimator (equivalently, minimizing the log loss / cross-entropy)
#### Process Flow: 

#### Evaluation Methods: 

#### Tips: 

In [None]:
# ------------------------------------- R Code

# https://stat.ethz.ch/R-manual/R-patched/library/stats/html/glm.html
train <- data[1:800,]
test <- data[801:889,]
model <- glm(Survived ~.,family=binomial(link='logit'),data=train)
summary(model)

In [None]:
# ------------------------------------- Python Code

# http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
# load lib {numpy, sklearn}
from sklearn import datasets
import numpy as np

# loading datasets
iris = datasets.load_iris()
X = iris.data[:,[2,3]]
y = iris.target # note: iris has three classes, so this fit is multiclass rather than strictly binary
# Splitting datasets (sklearn.cross_validation was replaced by sklearn.model_selection)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Scaling data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler() # - define scaler object
sc.fit(X_train) # fit the scaler on the training data to learn mean and standard deviation
X_train_std = sc.transform(X_train) # scale data
X_test_std = sc.transform(X_test) # scale data
# Fitting Model
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(C=1000.0, random_state=0) # C {inverse of regularization strength; larger C = weaker penalty}
lr.fit(X_train_std, y_train)
lr.predict_proba(X_test_std[0:1,:]) # P() of each class for one sample (input must be 2-D)

# array([[ 0.000, 0.063, 0.937 ]]) -- three class probabilities recorded in the original run; exact values may vary by scikit-learn version

## -------- Multinomial Logistic Regression

#### Wiki Definition: 
https://en.wikipedia.org/wiki/Multinomial_logistic_regression

In statistics, multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. with more than two possible discrete outcomes.[1] That is, it is a model that is used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables (which may be real-valued, binary-valued, categorical-valued, etc.). Multinomial logistic regression is used when the dependent variable in question is nominal (equivalently categorical, meaning that it falls into any one of a set of categories which cannot be ordered in any meaningful way) and for which there are more than two categories. 
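
One common way to turn a linear score per class into the outcome probabilities described above is the softmax function. The sketch below assumes that formulation; the weight matrix `W`, the intercepts `b` and the two samples in `X` are hypothetical values chosen for illustration, not fitted parameters.

```python
import numpy as np

def softmax(scores):
    """Convert one row of class scores per sample into class probabilities."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Hypothetical weights for 2 predictors and 3 classes (one column per class)
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 0.0, -0.5]])
b = np.array([0.0, 0.1, 0.0])

X = np.array([[2.0, 1.0],
              [-1.0, 0.0]])
probs = softmax(X @ W + b)             # one row of class probabilities per sample
predicted_class = probs.argmax(axis=1) # most probable class for each sample
print(probs)
print(predicted_class)
```

Each row of `probs` sums to 1, and with only two classes the same construction reduces to binary logistic regression.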
#### Input Data: 
X(Numeric)/X(Categorical)
#### Initial Parameters: 

#### Cost Function: 

#### Process Flow: 

#### Evaluation Methods: 

#### Tips: 

In [None]:
# ------------------------------------- R Code

# http://www.ats.ucla.edu/stat/r/dae/mlogit.htm


In [None]:
# ------------------------------------- Python Code

# http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
# load lib {numpy, sklearn}
from sklearn import datasets
import numpy as np

# loading datasets
iris = datasets.load_iris()
X = iris.data[:,[2,3]]
y = iris.target # three classes
# Splitting datasets (sklearn.cross_validation was replaced by sklearn.model_selection)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Scaling data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler() # - define scaler object
sc.fit(X_train) # fit the scaler on the training data to learn mean and standard deviation
X_train_std = sc.transform(X_train) # scale data
X_test_std = sc.transform(X_test) # scale data
# Fitting Model
from sklearn.linear_model import LogisticRegression
# C {inverse of regularization strength}. Recent scikit-learn versions fit a multinomial model for
# multiclass targets with the default solver; older versions needed multi_class='multinomial' with a solver such as 'lbfgs'.
lr = LogisticRegression(C=1000.0, random_state=0)
lr.fit(X_train_std, y_train)
lr.predict_proba(X_test_std[0:1,:]) # P() of each class for one sample (input must be 2-D)
# array([[ 0.000, 0.063, 0.937 ]]) -- three class probabilities recorded in the original run; exact values may vary by scikit-learn version

## -------- Multivariate Adaptive Regression Splines (MARS)

#### Wiki Definition: 
In statistics, multivariate adaptive regression splines (MARS) is a form of regression analysis introduced by Jerome H. Friedman in 1991.[1] It is a non-parametric regression technique and can be seen as an extension of linear models that automatically models nonlinearities and interactions between variables.
#### Input Data: 
X(Numeric)/X(Categorical)
#### Initial Parameters: 

#### Cost Function: 

#### Process Flow: 
MARS builds a model in two phases: the forward and the backward pass. This two-stage approach is the same as that used by recursive partitioning trees.
The forward pass:
MARS starts with a model which consists of just the intercept term (which is the mean of the response values). MARS then repeatedly adds basis functions in pairs to the model. At each step it finds the pair of basis functions that gives the maximum reduction in sum-of-squares residual error (it is a greedy algorithm). The two basis functions in the pair are identical except that a different side of a mirrored hinge function is used for each function. Each new basis function consists of a term already in the model (which could perhaps be the intercept term) multiplied by a new hinge function. A hinge function is defined by a variable and a knot, so to add a new basis function, MARS must search over all combinations of the following:

1. existing terms (called parent terms in this context)
2. all variables (to select one for the new basis function)
3. all values of each variable (for the knot of the new hinge function).

To calculate the coefficient of each term MARS applies a linear regression over the terms.
This process of adding terms continues until the change in residual error is too small to continue or until the maximum number of terms is reached. The maximum number of terms is specified by the user before model building starts.
The search at each step is done in a brute force fashion, but a key aspect of MARS is that because of the nature of hinge functions the search can be done relatively quickly using a fast least-squares update technique. Actually, the search is not quite brute force. The search can be sped up with a heuristic that reduces the number of parent terms to consider at each step ("Fast MARS" [4]).
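
A single forward-pass step can be sketched with NumPy. This is a deliberately simplified illustration, not the py-earth or earth implementation: it handles one variable, uses the intercept as the only parent term, and refits from scratch for each candidate knot instead of using the fast least-squares update. The synthetic data and the helper names `hinge_pair` and `rss_for_knot` are assumptions made for the example.

```python
import numpy as np

def hinge_pair(x, knot):
    """Return the mirrored hinge functions max(0, x - knot) and max(0, knot - x)."""
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)

# Synthetic data (assumed): a V-shaped response with a kink at x = 4
rng = np.random.default_rng(0)
x = rng.uniform(-10, 10, size=300)
y = np.abs(x - 4.0) + rng.normal(scale=0.1, size=300)

def rss_for_knot(knot):
    """Fit intercept + hinge pair by least squares and return the residual sum of squares."""
    h_plus, h_minus = hinge_pair(x, knot)
    B = np.column_stack([np.ones_like(x), h_plus, h_minus])
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return float(np.sum((y - B @ coef) ** 2))

# Greedy step: try every observed value as a knot and keep the best one
candidate_knots = np.sort(x)
best_knot = min(candidate_knots, key=rss_for_knot)
print("best knot:", best_knot)
```

On this data the selected knot should land close to the true kink at 4, since that is where the hinge pair reduces the residual error the most.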

The backward pass:
The forward pass usually builds an overfit model. (An overfit model has a good fit to the data used to build the model but will not generalize well to new data.) To build a model with better generalization ability, the backward pass prunes the model. It removes terms one by one, deleting the least effective term at each step until it finds the best submodel. Model subsets are compared using the generalized cross-validation (GCV) criterion.

The backward pass has an advantage over the forward pass: at any step it can choose any term to delete, whereas the forward pass at each step can only see the next pair of terms.
The forward pass adds terms in pairs, but the backward pass typically discards one side of the pair, so terms are often not seen in pairs in the final model.
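
The GCV criterion is commonly written as RSS / (N * (1 - C / N)^2), where C is an effective number of parameters that grows with the number of terms. The sketch below assumes the usual form C = terms + penalty * (terms - 1) / 2 with a penalty of about 2 or 3; the function name `gcv` and the example numbers are hypothetical.

```python
def gcv(rss, n_samples, n_terms, penalty=3.0):
    """Generalized cross-validation score; lower is better."""
    effective_params = n_terms + penalty * (n_terms - 1) / 2.0
    return rss / (n_samples * (1.0 - effective_params / n_samples) ** 2)

# Hypothetical comparison on the same 100 observations:
# a 7-term model with slightly lower RSS versus a 5-term model
gcv_large = gcv(rss=40.0, n_samples=100, n_terms=7)
gcv_small = gcv(rss=42.0, n_samples=100, n_terms=5)
print(gcv_large, gcv_small)
```

Here the smaller model scores better even though its residual error is higher, which is how the backward pass trades goodness of fit against model complexity.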
#### Evaluation Methods: 

#### Tips: 

In [None]:
# ------------------------------------- R Code

# https://cran.r-project.org/web/packages/earth/earth.pdf
library(earth)
earth(y ~ x) # the fitting function in the earth package is lowercase

In [None]:
# ------------------------------------- Python Code

# https://github.com/scikit-learn-contrib/py-earth
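# Install from source (shell commands; run these in a terminal, not in the Python session):
#   git clone https://github.com/scikit-learn-contrib/py-earth.git
#   cd py-earth
#   sudo python setup.py install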

import numpy
from pyearth import Earth
from matplotlib import pyplot

#Create some fake data
numpy.random.seed(0)
m = 1000
n = 10
X = 80*numpy.random.uniform(size=(m,n)) - 40
y = numpy.abs(X[:,6] - 4.0) + 1*numpy.random.normal(size=m)

#Fit an Earth model
model = Earth()
model.fit(X,y)

#Print the model
print(model.trace())
print(model.summary())

#Plot the model
y_hat = model.predict(X)
pyplot.figure()
pyplot.plot(X[:,6],y,'r.')
pyplot.plot(X[:,6],y_hat,'b.')
pyplot.xlabel('x_6')
pyplot.ylabel('y')
pyplot.title('Simple Earth Example')
pyplot.show()

## -------- Locally Estimated Scatterplot Smoothing (LOESS) 

#### Wiki Definition: 
https://en.wikipedia.org/wiki/Local_regression

LOESS and LOWESS (locally weighted scatterplot smoothing) are two strongly related non-parametric regression methods that combine multiple regression models in a k-nearest-neighbor-based meta-model. "LOESS" is a later generalization of LOWESS; although it is not a true initialism, it may be understood as standing for "LOcal regrESSion".
#### Input Data: 
X(Numeric)/X(Categorical)
#### Initial Parameters: 
NA
#### Cost Function: 

#### Process Flow: 
LOESS and LOWESS build on "classical" methods, such as linear and nonlinear least squares regression. They address situations in which the classical procedures do not perform well or cannot be effectively applied without undue labor. LOESS combines much of the simplicity of linear least squares regression with the flexibility of nonlinear regression. It does this by fitting simple models to localized subsets of the data to build up a function that describes the deterministic part of the variation in the data, point by point. In fact, one of the chief attractions of this method is that the data analyst is not required to specify a global function of any form to fit a model to the data, only to fit segments of the data.
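
The point-by-point local fitting can be sketched with NumPy alone. This is a simplified illustration rather than R's `loess` algorithm: it fits a weighted straight line to the nearest neighbors of each point using tricube weights and skips the robustness iterations. The function name `loess_point`, the `span` values and the synthetic sine data are assumptions made for the example.

```python
import numpy as np

def loess_point(x0, x, y, span=0.5):
    """Estimate the smooth curve at x0 with a locally weighted linear fit."""
    k = max(2, int(np.ceil(span * len(x))))      # number of neighbors used
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]                   # k nearest neighbors of x0
    d_max = dist[idx].max()
    u = dist[idx] / d_max if d_max > 0 else np.zeros(k)
    w = (1.0 - u ** 3) ** 3                      # tricube weights
    # Weighted least squares for a local line a + b * (x - x0)
    A = np.column_stack([np.ones(k), x[idx] - x0])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y[idx] * sw, rcond=None)
    return coef[0]                               # fitted value at x0

# Synthetic data (assumed): a noisy sine curve
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, size=200))
y = np.sin(x) + rng.normal(scale=0.2, size=200)

y_smooth = np.array([loess_point(x0, x, y, span=0.3) for x0 in x])
print("mean absolute error of the smooth:", np.mean(np.abs(y_smooth - np.sin(x))))
```

A larger `span` uses more neighbors per fit and gives a smoother but more biased curve; a smaller `span` follows the data more closely.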
#### Evaluation Methods: 

#### Tips: 

In [None]:
# ------------------------------------- R Code

# https://stat.ethz.ch/R-manual/R-devel/library/stats/html/loess.html
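# Usage signature from the R documentation:
# loess(formula, data, weights, subset, na.action, model = FALSE,
#       span = 0.75, enp.target, degree = 2,
#       parametric = FALSE, drop.square = FALSE, normalize = TRUE,
#       family = c("gaussian", "symmetric"),
#       method = c("loess", "model.frame"),
#       control = loess.control(...), ...)

# Example call (y, x and data are placeholders)
fit <- loess(y ~ x, data = data, span = 0.75, degree = 2)
predict(fit, newdata = data)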



In [None]:
# ------------------------------------- Python Code

# http://stackoverflow.com/questions/36252434/predicting-on-new-data-using-locally-weighted-regression-loess-lowess


## -------- Jackknife Regression

#### Wiki Definition: 
In statistics, the jackknife is a resampling technique especially useful for variance and bias estimation. The jackknife predates other common resampling methods such as the bootstrap. The jackknife estimator of a parameter is found by systematically leaving out each observation from a dataset, calculating the estimate on the remaining observations, and then averaging these estimates. Given a sample of size N, the jackknife estimate is found by aggregating the estimates from each sub-sample of size N-1. For jackknife regression, the estimate computed on each sub-sample is an OLS regression fit.
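
The leave-one-out procedure can be sketched for the slope of a simple OLS fit. This is one possible reading of jackknife regression, shown on synthetic data; the helper `ols_slope`, the sample size and the true slope of 2 are assumptions made for the example.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    return np.polyfit(x, y, deg=1)[0]

# Synthetic data (assumed): y = 1 + 2*x + noise
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

theta_full = ols_slope(x, y)

# Leave each observation out once and refit on the remaining n - 1 points
theta_loo = np.array([
    ols_slope(np.delete(x, i), np.delete(y, i)) for i in range(n)
])
theta_bar = theta_loo.mean()

bias_jack = (n - 1) * (theta_bar - theta_full)                         # jackknife bias estimate
se_jack = np.sqrt((n - 1) / n * np.sum((theta_loo - theta_bar) ** 2))  # jackknife standard error
theta_corrected = theta_full - bias_jack                               # bias-corrected slope
print(theta_full, theta_corrected, se_jack)
```

The spread of the leave-one-out slopes gives a standard error without relying on the normal-theory formula, which is the main practical use of the jackknife here.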
#### Input Data: 
X(Numeric)/X(Categorical)
#### Initial Parameters: 
NA
#### Cost Function: 

#### Process Flow: 

#### Evaluation Methods: 

#### Tips: 

In [None]:
# ------------------------------------- R Code



In [None]:
# ------------------------------------- Python Code

