# Spot-Check Regression Algorithms
### Starting with four linear machine learning algorithms:
- Linear Regression.
- Ridge Regression.
- LASSO Linear Regression.
- Elastic Net Regression.
### Then three nonlinear machine learning algorithms:
- k-Nearest Neighbors.
- Classification and Regression Trees.
- Support Vector Machines.

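Note that the mean squared error values are negated. This is a convention of the scoring interface used by `cross_val_score()`: every metric is treated so that a larger value is better, so error metrics are reported as negative numbers (`scoring='neg_mean_squared_error'`). Taking the square root of the absolute value gives the root mean squared error (RMSE) used to compare the models at the end.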
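The sign convention can be checked directly. This is a minimal sketch that assumes synthetic data from `make_regression` in place of `housing.data`. It shows that the `'neg_mean_squared_error'` scorer returns exactly the negated MSE, and that flipping the sign before the square root recovers the RMSE.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import get_scorer, mean_squared_error

# Synthetic data stands in for the housing dataset (an assumption for illustration).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
model = LinearRegression().fit(X, y)

# The 'neg_mean_squared_error' scorer returns -MSE so that "greater is better".
scorer = get_scorer("neg_mean_squared_error")
neg_mse = scorer(model, X, y)
mse = mean_squared_error(y, model.predict(X))
print(neg_mse, mse)  # same magnitude, opposite sign

# Flip the sign back before taking the square root to get the RMSE.
rmse = np.sqrt(-neg_mse)
print(rmse)
```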

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
# housing.data is whitespace-delimited with no header row
# (sep=r'\s+' replaces the deprecated delim_whitespace=True)
df = pd.read_csv('housing.data', names=names, sep=r'\s+')
# separate the data into input features (X) and target (Y)
X = df.drop('MEDV', axis='columns')
Y = df['MEDV']
```

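```python
# random_state only has an effect when shuffle=True; recent scikit-learn raises a
# ValueError if it is set without shuffling, so these folds are unshuffled
kfold = KFold(n_splits=10)
model = LinearRegression()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

# collect each model's RMSE: square root of the (negated) mean MSE
final = []
final.append(np.sqrt(np.abs(results.mean())))
```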

-34.7052559445


## Ridge Regression
Ridge regression is an extension of linear regression in which the loss function is modified to also minimize the complexity of the model, measured as the sum of the squared coefficient values (the squared L2 norm).

For intuition on the bias-variance trade-off behind LASSO and ridge, see: https://www.datasciencecentral.com/profiles/blogs/intuition-behind-bias-variance-trade-off-lasso-and-ridge
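To make the L2 penalty concrete, here is a minimal sketch on assumed synthetic data. Without an intercept, ridge minimizes $\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2$, which has the closed form $w = (X^\top X + \alpha I)^{-1} X^\top y$. The sketch compares that formula with scikit-learn's `Ridge(fit_intercept=False)` and checks that the penalty shrinks the coefficient vector relative to ordinary least squares.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Small synthetic problem (an assumption; not the housing data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

alpha = 1.0
# Setting the gradient of ||y - Xw||^2 + alpha * ||w||^2 to zero gives
# (X^T X + alpha I) w = X^T y, solved here without forming an inverse.
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

# scikit-learn's Ridge without an intercept solves the same problem.
w_sklearn = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_

# Ordinary least squares (alpha = 0) for comparison.
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(w_closed, w_sklearn, w_ols)
```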

```python
from sklearn.linear_model import Ridge
model = Ridge()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

final.append(np.sqrt(np.abs(results.mean())))
```

-34.0782462093


## LASSO Regression
The Least Absolute Shrinkage and Selection Operator (LASSO for short) is a modification of linear regression that, like ridge regression, modifies the loss function to minimize the complexity of the model, here measured as the sum of the absolute coefficient values (the L1 norm).
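The practical difference between the L1 and L2 penalties is that L1 can set coefficients exactly to zero, which is the "selection" in the name. The sketch below assumes synthetic data where only the first two of ten features matter. It then counts how many coefficients each model zeroes out. The `alpha` values are illustrative choices, not tuned ones.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
# Only the first two features actually influence the target (assumed setup).
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# The L1 penalty drives irrelevant coefficients to exactly zero;
# the L2 penalty only shrinks them towards zero.
print("LASSO zero coefficients:", np.sum(lasso.coef_ == 0))
print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))
```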

```python
from sklearn.linear_model import Lasso
model = Lasso()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

final.append(np.sqrt(np.abs(results.mean())))
```

-34.4640845883


## ElasticNet Regression
ElasticNet is a form of regularized regression that combines the properties of both ridge and LASSO regression. It seeks to minimize the complexity of the regression model (the magnitude and number of regression coefficients) by penalizing it with both the L2 norm (sum of squared coefficient values) and the L1 norm (sum of absolute coefficient values).
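In scikit-learn the two penalties are mixed by `l1_ratio`. The penalty term documented for `ElasticNet` is `alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2`, and the notebook's `ElasticNet()` uses the defaults `alpha=1.0, l1_ratio=0.5`. The helper `elastic_net_penalty` below is hypothetical and only restates that formula. The sketch also checks, on assumed synthetic data, that `l1_ratio=1.0` reduces to LASSO.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

def elastic_net_penalty(w, alpha, l1_ratio):
    """Hypothetical helper: the ElasticNet penalty term as written in scikit-learn's docs."""
    l1 = np.sum(np.abs(w))    # L1 norm: sum of absolute coefficients
    l2_sq = np.sum(w ** 2)    # squared L2 norm: sum of squared coefficients
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1 - l1_ratio) * l2_sq

w = np.array([1.0, -2.0, 0.0])
print(elastic_net_penalty(w, alpha=1.0, l1_ratio=0.5))  # 0.5*3 + 0.25*5 = 2.75

# With l1_ratio=1.0 the L2 part vanishes and ElasticNet behaves like Lasso.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
y = X @ np.array([2.0, 0.0, -1.0, 0.0]) + rng.normal(scale=0.3, size=100)
enet = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print(enet.coef_, lasso.coef_)
```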

```python
from sklearn.linear_model import ElasticNet
model = ElasticNet()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

final.append(np.sqrt(np.abs(results.mean())))
```

-31.1645737142


# Nonlinear Machine Learning Algorithms
## K-Nearest Neighbors
- The k-Nearest Neighbors algorithm (or KNN) locates the k most similar instances in the training dataset for a new data instance.
- From the k neighbors, the <span class="girk">mean or median of the output variable is taken as the prediction</span> (scikit-learn's `KNeighborsRegressor` uses the mean).
- Of note is the distance metric used (the `metric` argument). The <span class="mark">Minkowski distance is used by default</span> (with `p=2`, which makes it the Euclidean distance). Minkowski distance is a <span class="mark">generalization</span> of both the <span class="burk">Euclidean distance</span> (`p=2`, used when all inputs have the same scale) and the <span class="burk">Manhattan distance</span> (`p=1`, for when the scales of the input variables differ).
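The KNN steps above can be written out by hand. This is a minimal sketch on assumed random data; `knn_predict` is a hypothetical helper, not a library function. It computes Minkowski distances, takes the k closest training rows, and averages their targets. It then compares the result with `KNeighborsRegressor` for `p=1` (Manhattan) and `p=2` (Euclidean). With the default `weights='uniform'`, scikit-learn also uses a plain mean.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def knn_predict(X_train, y_train, x_new, k=5, p=2):
    """Hypothetical helper: mean target of the k nearest rows under Minkowski distance."""
    # Minkowski distance (sum |a_i - b_i|^p)^(1/p); p=1 is Manhattan, p=2 is Euclidean.
    dists = np.sum(np.abs(X_train - x_new) ** p, axis=1) ** (1.0 / p)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

rng = np.random.default_rng(3)
X_train = rng.uniform(size=(100, 3))
y_train = rng.normal(size=100)
x_new = rng.uniform(size=3)

pairs = []
for p in (1, 2):
    manual = knn_predict(X_train, y_train, x_new, k=5, p=p)
    model = KNeighborsRegressor(n_neighbors=5, p=p).fit(X_train, y_train)
    sk = model.predict(x_new.reshape(1, -1))[0]
    pairs.append((manual, sk))
    print(f"p={p}: manual={manual:.4f}, sklearn={sk:.4f}")
```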

```python
from sklearn.neighbors import KNeighborsRegressor
model = KNeighborsRegressor()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

final.append(np.sqrt(np.abs(results.mean())))
```

-107.28683898


## Classification and Regression Trees
Decision trees, also known as Classification and Regression Trees (CART), use the training data to select the best points at which to split the data so as to minimize a cost metric. The <span class="burk">default cost metric</span> for regression decision trees is the <span class="girk">mean squared error,</span> specified by the `criterion` parameter (`'squared_error'`, called `'mse'` in scikit-learn versions before 1.0).
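To show how a regression tree picks a split point, the sketch below brute-forces every candidate threshold on one feature. It keeps the threshold whose two child nodes have the smallest total squared error, which is equivalent to minimizing their weighted MSE. It then compares the result with the root split of a depth-1 `DecisionTreeRegressor`. The data (a noisy step at 0.6) and the helper `best_split` are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def best_split(x, y):
    """Hypothetical helper: threshold on one feature minimizing the children's total squared error."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_t, best_sse = None, np.inf
    for i in range(1, len(xs)):
        left, right = ys[:i], ys[i:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if sse < best_sse:
            # Candidate thresholds are midpoints between consecutive sorted values.
            best_t, best_sse = (xs[i - 1] + xs[i]) / 2.0, sse
    return best_t

rng = np.random.default_rng(4)
x = rng.uniform(size=100)
y = np.where(x > 0.6, 5.0, 1.0) + rng.normal(scale=0.2, size=100)

manual_t = best_split(x, y)
stump = DecisionTreeRegressor(max_depth=1, random_state=0).fit(x.reshape(-1, 1), y)
sklearn_t = stump.tree_.threshold[0]  # threshold used at the root node
print(manual_t, sklearn_t)
```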

```python
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

final.append(np.sqrt(np.abs(results.mean())))
```

-43.4218278431


## Support Vector Machines
Support Vector Machines (SVM) were developed for binary classification. The technique has been extended to predict real-valued outputs, a variant called Support Vector Regression (SVR). Like scikit-learn's SVM classifier (`SVC`), `SVR` is built upon the LIBSVM library.
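SVR fits a function using an epsilon-insensitive loss: errors smaller than `epsilon` cost nothing, and larger errors cost linearly. The notebook's `SVR()` uses scikit-learn's defaults (RBF kernel, `C=1.0`, `epsilon=0.1`). This sketch instead uses a linear kernel and a larger `C` on assumed noise-free linear data so the "tube" is easy to see. The helper `epsilon_insensitive_loss` is hypothetical.

```python
import numpy as np
from sklearn.svm import SVR

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Hypothetical helper: zero inside the epsilon tube, linear outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

# An error of 0.05 is inside the tube; an error of 0.3 costs 0.3 - 0.1 = 0.2.
print(epsilon_insensitive_loss(np.array([1.0, 1.0]), np.array([1.05, 1.3])))

# Noise-free linear data: with a large C the fit should keep every point
# within (approximately) epsilon of the prediction.
x = np.linspace(0, 1, 50).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0
svr = SVR(kernel="linear", C=100.0, epsilon=0.1).fit(x, y)
residuals = np.abs(y - svr.predict(x))
print(residuals.max())
```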

```python
from sklearn.svm import SVR
model = SVR()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

final.append(np.sqrt(np.abs(results.mean())))
```

-91.0478243332


```python
final
```

[5.8911166975815883,
 5.8376575960961654,
 5.8706119432561987,
 5.5825239555464305,
 10.357936038632028,
 6.589524098380493,
 9.5418983610833052]

```python
mdls = ['Linear', 'Ridge', 'LASSO', 'ElasticNet', 'knn', 'CART', 'svm']

# pair each model name with its RMSE and sort from best (lowest) to worst
resdict = pd.DataFrame.from_dict(dict(zip(mdls, final)), orient='index')
resdict.sort_values(0)
```

| Model | RMSE |
|---|---|
| ElasticNet | 5.582524 |
| Ridge | 5.837658 |
| LASSO | 5.870612 |
| Linear | 5.891117 |
| CART | 6.589524 |
| svm | 9.541898 |
| knn | 10.357936 |


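ElasticNet performed best here (lowest RMSE, about 5.58), plausibly because it combines L1 and L2 regularization. The four linear models are within roughly 0.3 of each other, though, so the margin is small.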
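The cells above repeat the same pattern for each model. Here is a compact sketch of the same spot-check as a single loop. It assumes synthetic data from `make_regression` in place of `housing.data` so it runs on its own, and it uses shuffled folds with a fixed `random_state`. It keeps the notebook's conversion (square root of the negated mean MSE), so the numbers it prints will not match the table above.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (13 features, like the original inputs).
X, Y = make_regression(n_samples=300, n_features=13, noise=15.0, random_state=7)

models = {
    "Linear": LinearRegression(),
    "Ridge": Ridge(),
    "LASSO": Lasso(),
    "ElasticNet": ElasticNet(),
    "knn": KNeighborsRegressor(),
    "CART": DecisionTreeRegressor(random_state=7),
    "svm": SVR(),
}

# shuffle=True is required for random_state to take effect
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
rmse = {}
for name, model in models.items():
    scores = cross_val_score(model, X, Y, cv=kfold, scoring="neg_mean_squared_error")
    # Same conversion as the notebook: RMSE of the mean cross-validated MSE.
    rmse[name] = np.sqrt(-scores.mean())

results = pd.Series(rmse, name="RMSE").sort_values()
print(results)
```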