### Boston Dataset used for Linear Regression, Scaling, K-Folds, Support Vector Regression (SVR), and calculating model performance with R2 score.

##### Import and read the bunch object "Boston" from scikit-learn.
##### Predict the "MEDV"
##### Use the "LSTAT","INDUS","TAX","RM","CRIM" columns.
##### Save model into a variable called boston_model.
##### Fit the model.
##### Print the intercept, coefficients, and Mean Absolute Error (MAE)

In [25]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_boston 
from sklearn import linear_model
from sklearn.metrics import mean_absolute_error
import warnings
warnings.filterwarnings('ignore')
import seaborn as sns; sns.set()
from matplotlib import pyplot as plt

Boston_ = load_boston()
Boston = pd.DataFrame(Boston_.data, columns = Boston_.feature_names)
Boston["MEDV"] = Boston_.target

 #### Look at the keys(features) included in the bunch object
 #### Explore

In [26]:
Boston.keys()

Index(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT', 'MEDV'],
      dtype='object')

In [27]:
type(Boston)

pandas.core.frame.DataFrame

In [28]:
Boston.head(10)

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,B,LSTAT,MEDV
0,0.00632,18.0,2.31,0.0,0.538,6.575,65.2,4.09,1.0,296.0,15.3,396.9,4.98,24.0
1,0.02731,0.0,7.07,0.0,0.469,6.421,78.9,4.9671,2.0,242.0,17.8,396.9,9.14,21.6
2,0.02729,0.0,7.07,0.0,0.469,7.185,61.1,4.9671,2.0,242.0,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0.0,0.458,6.998,45.8,6.0622,3.0,222.0,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0.0,0.458,7.147,54.2,6.0622,3.0,222.0,18.7,396.9,5.33,36.2
5,0.02985,0.0,2.18,0.0,0.458,6.43,58.7,6.0622,3.0,222.0,18.7,394.12,5.21,28.7
6,0.08829,12.5,7.87,0.0,0.524,6.012,66.6,5.5605,5.0,311.0,15.2,395.6,12.43,22.9
7,0.14455,12.5,7.87,0.0,0.524,6.172,96.1,5.9505,5.0,311.0,15.2,396.9,19.15,27.1
8,0.21124,12.5,7.87,0.0,0.524,5.631,100.0,6.0821,5.0,311.0,15.2,386.63,29.93,16.5
9,0.17004,12.5,7.87,0.0,0.524,6.004,85.9,6.5921,5.0,311.0,15.2,386.71,17.1,18.9


In [29]:
X = Boston[["LSTAT","INDUS","TAX", "RM","CRIM"]]
y = Boston["MEDV"]

In [30]:
multiple_prediction = linear_model.LinearRegression()
boston_model = multiple_prediction.fit(X,y)
boston_model
predictions = boston_model.predict(X)

In [31]:
print(f"Intercept: {multiple_prediction.intercept_}")
print(f"Coefficients: {multiple_prediction.coef_}")
print(f"Score: {multiple_prediction.score(X,y)}")

Intercept: -1.510423759820231
Coefficients: [-0.54671792  0.04509313 -0.00612077  5.27782007 -0.05859972]
Score: 0.6510689983930412


##### Create a Pandas.DataFrame and a Pandas.Series for the features and the targets:

##### Create a Pandas.DataFrame with the features. 
##### Assign the object to the variable features.
##### Create a Pandas.Series object for the targets.
##### Set the name of the Series to price.
##### Assign the object to the variable output.

In [35]:
data = load_boston()

In [36]:
features = pd.DataFrame(data.data, columns=data.feature_names)
output = pd.Series(data.target, name='price')

##### Instantiate a scaler and assign it to a variable called scaler_stand.
##### Scale the features using the method fit_transform.
##### Assign the result to a variable called features_stand.

In [37]:
from sklearn.preprocessing import StandardScaler

scaler_stand = StandardScaler()
features_stand = scaler_stand.fit_transform(features)

print("features_stand:\n mean: {}\n stdev: {}".format(features_stand.mean(0), features_stand.std(0)))

features_stand:
 mean: [-8.78743718e-17 -6.34319123e-16 -2.68291099e-15  4.70199198e-16
  2.49032240e-15 -1.14523016e-14 -1.40785495e-15  9.21090169e-16
  5.44140929e-16 -8.86861950e-16 -9.20563581e-15  8.16310129e-15
 -3.37016317e-16]
 stdev: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


##### Instantiate a MinMaxScaler to scale the data between 0 and 1. 
##### Assign it to a variable called scaler_minmax. 
##### Scale the features using the method fit_transform() 
##### Assign the result to a variable called features_minmax.

In [38]:
from sklearn.preprocessing import MinMaxScaler

scaler_minmax = MinMaxScaler(feature_range=(0, 1))
features_minmax = scaler_minmax.fit_transform(features)

print("features_minmax:\n min: {}\n max: {}".format(features_minmax.min(0), features_minmax.max(0)))

features_minmax:
 min: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 max: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


##### Create a k-Fold cross-validation object with 45 splits with shuffling. 
##### Set the random_state of the object to 14. 
##### Assign the KFold object to a variable named cv.

In [40]:
from sklearn.model_selection import KFold

cv = KFold(n_splits=45, random_state=14, shuffle=True)

##### Instantiate a Epsilon-Support Vector Regression (SVR) model using the rbf kernel.
##### Assign it to a variable called svr_model.

In [41]:
from sklearn.svm import SVR

svr_model = SVR(kernel='rbf')

##### Evaluate the score of our model using k-fold cross-validation by measuring the performance of the model in each of the splits.

##### Build a for loop and use the k-fold cross validation object cv to iterate over the different splits.

##### With these indices create, at each iteration, the objects X_train, X_test, y_train, y_test. 
##### Get the data from the scaled features features_stand.

##### Create a template by cloning the svr_model.
##### Assign the copy to iter_model.
##### Fit iter_model with the training samples (X_train, y_train).
##### Compute the R2 score on the test samples and append the score of each iteration to the list scores.

In [49]:
from sklearn.base import clone

scores = []
for train_idx, test_idx in cv.split(features):
    X_train, X_test, y_train, y_test = features_stand[train_idx], features_stand[test_idx], output.iloc[train_idx], output.iloc[test_idx]
iter_model = clone(svr_model)
iter_model.fit(X_train, y_train)
scores.append(iter_model.score(X_test, y_test))


print("The average R2 score of the model is: {}".format(np.mean(scores)))

The average R2 score of the model is: 0.8799597219733448
