## Spot-Check Regression Algorithms

In [50]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [51]:
dataset = pd.read_csv('dataset/boston_housing.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

In [52]:
from sklearn.model_selection import KFold, cross_val_score
kfold = KFold(n_splits=10, shuffle=True, random_state=42)


### Linear Machine Learning Algorithms

#### Linear Regression
Assumes that the input variables follow a Gaussian distribution and that each one is relevant to the output variable.

In [53]:
from sklearn.linear_model import LinearRegression
model = LinearRegression()

kf = kfold  # reuse the 10-fold splitter defined above (same splits, same seed)
scoring = 'neg_mean_squared_error'
scores = cross_val_score(model, X, y, cv=kf, scoring=scoring)
scores.mean()

-23.364203007530854
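
The score above is negative because scikit-learn's `neg_mean_squared_error` scorer returns the negated MSE, so that a higher score always means a better model. A minimal sketch of turning such a score into an RMSE, which is expressed in the units of the target; the helper name `rmse_from_neg_mse` is hypothetical and not part of scikit-learn:

```python
import numpy as np

def rmse_from_neg_mse(neg_mse_scores):
    """Convert negated-MSE scores (as returned by cross_val_score) to RMSE."""
    scores = np.asarray(neg_mse_scores, dtype=float)
    return np.sqrt(-scores)  # flip the sign back, then take the square root

# Mean negated MSE reported for LinearRegression above
mean_neg_mse = -23.364203007530854
rmse = float(rmse_from_neg_mse(mean_neg_mse))
print(round(rmse, 2))
```

Note that the square root of the mean MSE is generally not equal to the mean of the per-fold RMSE values; pass the full `scores` array to the helper to get one RMSE per fold.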

#### Ridge Regression
An extension of linear regression in which the loss function is modified to penalize model complexity, measured as the sum of the squared coefficient values (an L2 penalty).

In [54]:
from sklearn.linear_model import Ridge
model = Ridge()

scoring = 'neg_mean_squared_error'
scores = cross_val_score(model, X, y, cv=kf, scoring=scoring)
scores.mean()

-23.55453050994862
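
To illustrate the effect of the squared-coefficient penalty, here is a minimal NumPy sketch of ridge regression in closed form. It assumes synthetic data and no intercept term, so it is not equivalent to the `Ridge()` call above; `ridge_fit`, `X_demo`, and `y_demo` are hypothetical names.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge: minimize ||y - Xw||^2 + alpha * ||w||^2 (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Synthetic data (assumption): 3 features with known coefficients plus noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

w_ols = ridge_fit(X_demo, y_demo, alpha=0.0)     # alpha=0 is ordinary least squares
w_ridge = ridge_fit(X_demo, y_demo, alpha=10.0)  # larger alpha shrinks the weights

print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

The coefficient vector fitted with `alpha=10.0` has a smaller norm than the unpenalized one, which is the shrinkage the penalty is designed to produce.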

#### LASSO Regression
The Least Absolute Shrinkage and Selection Operator (LASSO for short) is a modification of linear regression, like ridge regression, in which the loss function is modified to penalize model complexity, here measured as the sum of the absolute coefficient values (an L1 penalty).

In [55]:
from sklearn.linear_model import Lasso
model = Lasso()

scoring = 'neg_mean_squared_error'
scores = cross_val_score(model, X, y, cv=kf, scoring=scoring)
scores.mean()

-28.235449776422662
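
The absolute-value penalty can drive some coefficients exactly to zero, which is where the "selection" in LASSO comes from. The sketch below shows this with a small coordinate-descent solver in NumPy. It is an illustration under stated assumptions (synthetic data, no intercept, fixed iteration count), not scikit-learn's implementation; `soft_threshold` and `lasso_cd` are hypothetical names.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back
            residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w

# Synthetic data (assumption): only features 0 and 3 influence the target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + rng.normal(scale=0.1, size=200)

w = lasso_cd(X_demo, y_demo, alpha=0.5)
print(w)
```

The coefficients of the three irrelevant features come out as exact zeros, while the two relevant ones are kept but shrunk toward zero.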

#### ElasticNet Regression
A form of regularized regression that combines the penalties of both ridge regression (L2) and LASSO regression (L1).

In [56]:
from sklearn.linear_model import ElasticNet
model = ElasticNet()

scoring = 'neg_mean_squared_error'
scores = cross_val_score(model, X, y, cv=kf, scoring=scoring)
scores.mean()

-27.760635084781303
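
How the two penalties are mixed can be written out directly. The sketch below computes the penalty term in the form scikit-learn documents for `ElasticNet`, where `alpha` sets the overall strength and `l1_ratio` the balance between L1 and L2; the function name `elastic_net_penalty` and the example weights are hypothetical.

```python
import numpy as np

def elastic_net_penalty(w, alpha=1.0, l1_ratio=0.5):
    """alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2"""
    w = np.asarray(w, dtype=float)
    l1 = np.abs(w).sum()
    l2_squared = (w ** 2).sum()
    return alpha * l1_ratio * l1 + 0.5 * alpha * (1.0 - l1_ratio) * l2_squared

w_demo = [3.0, -4.0]  # hypothetical coefficient vector
print(elastic_net_penalty(w_demo, l1_ratio=1.0))  # pure L1 (LASSO-style): 7.0
print(elastic_net_penalty(w_demo, l1_ratio=0.0))  # pure L2 (ridge-style): 12.5
print(elastic_net_penalty(w_demo, l1_ratio=0.5))  # equal mix: 9.75
```

`ElasticNet()` with default arguments uses `alpha=1.0` and `l1_ratio=0.5`, so the cell above applies an equal mix of the two penalties.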

### Nonlinear Machine Learning Algorithms

#### K-Nearest Neighbors
Locates the k most similar instances in the training data and predicts the average of their output values.

In [57]:
from sklearn.neighbors import KNeighborsRegressor
model = KNeighborsRegressor()

scoring = 'neg_mean_squared_error'
scores = cross_val_score(model, X, y, cv=kfold, scoring=scoring)
scores.mean()

-38.681226933333335
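
The prediction rule can be sketched in a few lines of NumPy: measure the distance from the query point to every training point, keep the k closest, and average their targets. This assumes Euclidean distance and uniform weights; `knn_predict` and the toy data are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=5):
    """Predict the mean target of the k training points closest to x_query."""
    distances = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distance
    nearest = np.argsort(distances)[:k]                    # indices of the k closest
    return y_train[nearest].mean()

# Toy one-feature data set (assumption)
X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
y_train = np.array([1.0, 2.0, 3.0, 20.0, 22.0])

pred = knn_predict(X_train, y_train, np.array([1.2]), k=3)
print(pred)  # 2.0, the mean of the targets at x = 1, 2 and 0
```

Because the neighbors are chosen by raw distance, features with large numeric ranges dominate the comparison, so standardizing the inputs is often worth trying with this algorithm.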

#### Regression Tree
Splits the data at the points that minimize a cost metric, such as the mean squared error.

In [58]:
from sklearn.tree import DecisionTreeRegressor
model = DecisionTreeRegressor()

scoring = 'neg_mean_squared_error'
scores = cross_val_score(model, X, y, cv=kfold, scoring=scoring)
scores.mean()

-19.115754117647057
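
A single split of a regression tree can be sketched as an exhaustive search over thresholds on one feature, keeping the threshold with the lowest total squared error when each side predicts its own mean. This is a simplified illustration, not the scikit-learn implementation; `best_split` and the toy data are hypothetical.

```python
import numpy as np

def best_split(x, y):
    """Return the threshold on x that minimizes the summed squared error of y."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_threshold, best_cost = None, np.inf
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no threshold can separate identical values
        left, right = y_sorted[:i], y_sorted[i:]
        # Each side predicts its mean; cost is the total squared error
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_cost = cost
            best_threshold = (x_sorted[i - 1] + x_sorted[i]) / 2
    return best_threshold, best_cost

# Toy data (assumption): two clearly separated groups
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 6.0, 5.5, 20.0, 21.0, 19.5])

threshold, cost = best_split(x, y)
print(threshold)  # 6.5, halfway between the two groups
```

A full tree repeats this search over every feature and then recurses into each side of the chosen split until a stopping rule is met.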

#### Support Vector Machines
Fits a function that keeps as many training points as possible within a margin (epsilon) of the prediction, penalizing only the errors that fall outside it.

In [60]:
from sklearn.svm import SVR
model = SVR()

scoring = 'neg_mean_squared_error'
scores = cross_val_score(model, X, y, cv=kfold, scoring=scoring)
scores.mean()

-67.56030921170223
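
Support vector regression is built around the epsilon-insensitive loss: residuals smaller than `epsilon` cost nothing, and larger ones are penalized linearly. A minimal NumPy sketch of that loss follows; the function name `epsilon_insensitive_loss` and the sample values are hypothetical, and `epsilon=0.1` matches the `SVR()` default.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """max(0, |y_true - y_pred| - epsilon), computed element-wise."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(residual - epsilon, 0.0)

# First prediction is inside the epsilon tube, second is outside it
losses = epsilon_insensitive_loss([1.0, 2.0], [1.05, 2.5])
print(losses)
```

`SVR()` defaults to an RBF kernel, which is sensitive to feature scale, so the weak score above may partly reflect the unscaled inputs rather than the algorithm itself.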