# Spot-Checking Regression Algorithms

As we did for classification algorithms, we now spot-check regression algorithms.

# Algorithms overview

In this lesson we are going to take a look at 7 regression algorithms that you can spot-check on your dataset. We start with 4 linear ML algorithms:

* Linear Regression
* Ridge Regression
* LASSO Linear Regression
* Elastic Net Regression

Then, we will look at 3 nonlinear ML algorithms:

* k-Nearest Neighbors
* Classification and Regression Trees
* Support Vector Machines and Support Vector Regression

Each recipe is demonstrated on the housing dataset. This is a regression problem where all attributes are numeric. A test harness with 10-fold cross-validation is used to demonstrate how to spot-check each ML algorithm. Mean squared error measures are used to indicate algorithm performance. 
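To make the 10-fold harness concrete, here is a minimal pure-Python sketch of how contiguous, unshuffled folds can be carved out of a dataset. The helper name `kfold_indices` is hypothetical, and the row count of 506 is an assumption about the housing file rather than something stated in this lesson. It mirrors the idea behind `KFold(n_splits=10)` and is not scikit-learn's actual implementation.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) index lists for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


n_samples = 506  # assumed number of rows in the housing file
folds = list(kfold_indices(n_samples, 10))
fold_sizes = [len(test_idx) for _, test_idx in folds]
print(fold_sizes)
```

Each sample lands in exactly one test fold, so every row is used for evaluation once and for training nine times.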

NOTE: mean squared error values are reported as negative numbers. This is a quirk of the `cross_val_score()` function, which follows the convention that a larger score is always better, so error metrics such as mean squared error are negated.
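As a small illustration of the sign convention, the sketch below flips negated scores back into ordinary error values. The three fold scores are made-up numbers chosen for easy arithmetic, not results from the housing dataset.

```python
import math

# Hypothetical per-fold scores, shaped like the values that
# cross_val_score(..., scoring='neg_mean_squared_error') returns.
neg_mse_scores = [-9.0, -16.0, -25.0]

# Flip the sign to recover the usual (positive) mean squared error.
mean_mse = -sum(neg_mse_scores) / len(neg_mse_scores)

# The square root of each fold's MSE gives an error in the target's own units.
rmse_per_fold = [math.sqrt(-score) for score in neg_mse_scores]
mean_rmse = sum(rmse_per_fold) / len(rmse_per_fold)

print(mean_mse)
print(rmse_per_fold)
print(mean_rmse)
```

When comparing models, the score closest to zero (the largest negated value) corresponds to the smallest error.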

*DISCLAIMER: The recipes assume that you know about each ML algorithm and how to use them. We will not go into the API or parameterization of each algorithm.*

# Linear ML Algorithms

## Linear Regression

You can construct a linear regression model using the LinearRegression class, documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html).

In [1]:
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

In [2]:
from sklearn.linear_model import LinearRegression                              # <---

In [3]:
# input data
filename = 'housing.data.csv'
names=['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO',
'B', 'LSTAT', 'MEDV']
# the file is whitespace-delimited; sep=r'\s+' replaces the deprecated delim_whitespace=True
dataframe = read_csv(filename, sep=r'\s+', names=names)
array = dataframe.values
X = array[:,0:13]
Y = array[:,13]

# Linear Regression
# unshuffled folds; random_state only applies when shuffle=True
kfold = KFold(n_splits=10)
model = LinearRegression()                              # <---
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

-34.7052559445


## Ridge Regression

You can construct a ridge regression model by using the Ridge class, documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html).

In [4]:
from sklearn.linear_model import Ridge                              # <---

In [5]:
# Ridge Regression
kfold = KFold(n_splits=10)
model = Ridge()                              # <---
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

-34.0782462093


## LASSO Regression

You can construct a LASSO model by using the Lasso class, documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html).

In [6]:
from sklearn.linear_model import Lasso                              # <---

In [7]:
# Lasso Regression
kfold = KFold(n_splits=10)
model = Lasso()                              # <---
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

-34.4640845883


## ElasticNet Regression

You can construct an ElasticNet model using the ElasticNet class, documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html).

In [8]:
from sklearn.linear_model import ElasticNet                              # <---

In [9]:
# ElasticNet Regression
kfold = KFold(n_splits=10)
model = ElasticNet()                              # <---
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

-31.1645737142


# Nonlinear ML Algorithms

## k-Nearest Neighbors (kNN)

You can construct a kNN model for regression using the KNeighborsRegressor class, documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html).

In [11]:
from sklearn.neighbors import KNeighborsRegressor                              # <--- note! Regressor now!

In [12]:
# KNN Regression
kfold = KFold(n_splits=10)
model = KNeighborsRegressor()                              # <---
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

-107.28683898


## Classification and Regression Trees

You can create a CART model for regression using the DecisionTreeRegressor class, documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html).

In [13]:
from sklearn.tree import DecisionTreeRegressor                              # <---

In [14]:
# Decision Tree Regression
kfold = KFold(n_splits=10)
model = DecisionTreeRegressor()                              # <---
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

-36.2850470588


## Support Vector Machines and Support Vector Regression

You can create an SVM model for regression using the SVR class, documented [here](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html).

In [15]:
from sklearn.svm import SVR                              # <---

In [16]:
# SVM Regression
kfold = KFold(n_splits=10)
model = SVR()                              # <---
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())

-91.0478243332


## Summary

What we did:

* We discovered how to spot-check ML algorithms for regression problems in Python using scikit-learn. Specifically, we looked at 4 linear ML algorithms (Linear Regression, Ridge Regression, LASSO Linear Regression and Elastic Net Regression) as well as 3 nonlinear ML algorithms (k-Nearest Neighbors, Classification and Regression Trees, SVM/SVR).
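The seven recipes above share the same harness, so they can also be run in a single loop. The following is a minimal sketch under stated assumptions: it uses synthetic data from `make_regression` as a stand-in for the housing file, and the names `X_demo`, `y_demo`, `models` and `mean_scores` are illustrative only. The printed scores will therefore not match the outputs shown in this lesson.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (assumption): 13 numeric features.
X_demo, y_demo = make_regression(n_samples=200, n_features=13,
                                 noise=10.0, random_state=7)

# All seven algorithms from this lesson, with default parameters.
models = {
    'LinearRegression': LinearRegression(),
    'Ridge': Ridge(),
    'Lasso': Lasso(),
    'ElasticNet': ElasticNet(),
    'KNN': KNeighborsRegressor(),
    'CART': DecisionTreeRegressor(random_state=7),  # seeded for repeatability
    'SVR': SVR(),
}

kfold = KFold(n_splits=10)
scoring = 'neg_mean_squared_error'

mean_scores = {}
for name, model in models.items():
    results = cross_val_score(model, X_demo, y_demo, cv=kfold, scoring=scoring)
    mean_scores[name] = results.mean()
    print('%s: %.3f' % (name, mean_scores[name]))
```

Reusing one `kfold` object for every model means each algorithm is evaluated on identical train/test splits, which keeps the spot-check scores comparable.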

## What's next 

Now that you know how to use classification and regression algorithms, you need to know how to compare the results of different algorithms to each other. We will now discover how to design simple experiments to directly compare ML algorithms to each other on your dataset.