# Results
- Models used, with their hyperparameters
- Best model parameters
- Mean cross-validation score of the best model
- Test score of the best model
- Train score of the best model

Out of all the models, Lasso works best with this dataset.

Best model parameters:

    Lasso parameters:  {'alpha': 1109.090909090909}

Best mean cross-validation score (R²): 0.89

Lasso train performance (R²):  0.9045742635271026

Lasso test performance (R²):  0.8868403098159492


## Data Preprocessing

In [1]:
from math import sqrt
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

pd.set_option('display.max_columns', None)
%matplotlib inline

### Load Datasets

In [2]:
#from google.colab import drive
#drive.mount('/content/drive')

In [3]:
# load the house price dataset (houseprice.csv is expected in the working directory)
data = pd.read_csv('houseprice.csv')


### Types of variables



In [4]:
# the Id variable identifies each house and should not be used for predictions:

print('Number of House Id labels: ', len(data.Id.unique()))
print('Number of Houses in the Dataset: ', len(data))

Number of House Id labels:  1460
Number of Houses in the Dataset:  1460


#### Find categorical variables

In [5]:
# find categorical variables (those with object dtype, 'O')

categorical = [var for var in data.columns if data[var].dtype=='O']

print(f'There are {len(categorical)} categorical variables')

There are 43 categorical variables


#### Find temporal variables

In [6]:
# make a list of the numerical variables first (dtype other than 'O')
numerical = [var for var in data.columns if data[var].dtype != 'O']

# list of variables that contain year information (the name contains 'Yr' or 'Year')
year_vars = [var for var in numerical if 'Yr' in var or 'Year' in var]

year_vars

['YearBuilt', 'YearRemodAdd', 'GarageYrBlt', 'YrSold']

#### Find discrete variables

Discrete variables are the numerical variables with fewer than 20 unique values, excluding the year variables.

In [7]:
# discrete variables: numerical variables with fewer than 20 unique values, excluding year variables
discrete = [var for var in numerical if len(data[var].unique()) < 20 and var not in year_vars]

print(f'There are {len(discrete)} discrete variables')

There are 14 discrete variables


#### Continuous variables

In [8]:
# find continuous variables: numerical variables that are neither discrete nor year variables.
# Also remove the Id variable and the target variable SalePrice,
# which are both numerical as well

continuous = [var for var in numerical if var not in discrete and var not in [
    'Id', 'SalePrice'] and var not in year_vars]

print(f'There are {len(numerical)} numerical variables, of which {len(continuous)} are continuous')

There are 38 numerical variables, of which 18 are continuous
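The four rules used above (object dtype, year-like names, fewer than 20 unique values, and the remainder) can be collected into one helper. A minimal sketch, assuming a hypothetical `classify_variables` function and a made-up toy DataFrame; it is not part of the original notebook, it only restates the notebook's rules in one place:

```python
import numpy as np
import pandas as pd


def classify_variables(df, target, id_col, max_unique=20):
    """Hypothetical helper: split columns into categorical, year,
    discrete and continuous lists using the rules of this notebook."""
    categorical = [v for v in df.columns if df[v].dtype == 'O']
    numerical = [v for v in df.columns if df[v].dtype != 'O']
    # year variables are recognised by their name only
    year_vars = [v for v in numerical if 'Yr' in v or 'Year' in v]
    # discrete: few distinct values, and not a year variable
    discrete = [v for v in numerical
                if len(df[v].unique()) < max_unique and v not in year_vars]
    # continuous: every other numerical column except the id and the target
    continuous = [v for v in numerical
                  if v not in discrete and v not in year_vars
                  and v not in (id_col, target)]
    return categorical, year_vars, discrete, continuous


# made-up toy frame (30 rows) with one column of each kind
toy = pd.DataFrame({
    'Id': np.arange(30),
    'Street': pd.Series(['Pave', 'Grvl'] * 15, dtype=object),
    'YearBuilt': np.arange(1970, 2000),
    'Fireplaces': [0, 1, 2] * 10,
    'LotArea': np.linspace(5000, 20000, 30),
    'SalePrice': np.linspace(100000, 400000, 30),
})

cat, years, disc, cont = classify_variables(toy, target='SalePrice', id_col='Id')
print(cat, years, disc, cont)
# ['Street'] ['YearBuilt'] ['Fireplaces'] ['LotArea']
```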


### Separate train and test set

In [9]:
# separate into train (90%) and test (10%) sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(data.drop(['Id', 'SalePrice'], axis=1),
                                                    data['SalePrice'],
                                                    test_size=0.1,
                                                    random_state=0)

X_train.shape, X_test.shape

((1314, 79), (146, 79))

**Now we move on to engineering the features of this dataset, the most important part of this course.**

### Create New Variables

Replace 'YearBuilt', 'YearRemodAdd' and 'GarageYrBlt' with the number of years elapsed until 'YrSold'.
For example, YearBuilt becomes YrSold - YearBuilt.

'YearRemodAdd' and 'GarageYrBlt' are transformed the same way.
After the transformation, drop YrSold.

In [10]:
# function to calculate elapsed time

def elapsed_years(df, var):
    # capture difference between year variable and
    # year the house was sold
    
    df[var] = df['YrSold'] - df[var]
    return df

In [11]:
for var in ['YearBuilt', 'YearRemodAdd', 'GarageYrBlt']:
    X_train = elapsed_years(X_train, var)
    X_test = elapsed_years(X_test, var)

In [12]:
# drop YrSold
X_train.drop('YrSold', axis=1, inplace=True)
X_test.drop('YrSold', axis=1, inplace=True)
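As a quick check of the elapsed-years transformation, here is a sketch on a made-up two-house DataFrame (the values are illustrative only). It repeats the `elapsed_years` logic and shows that a missing `GarageYrBlt` stays missing after the subtraction, which is why that variable is imputed later in the pipeline:

```python
import pandas as pd


def elapsed_years(df, var):
    # same logic as the notebook: years between the event and the sale
    df[var] = df['YrSold'] - df[var]
    return df


# made-up example: the second house has no garage, so GarageYrBlt is missing
houses = pd.DataFrame({
    'YearBuilt': [1990, 2005],
    'YearRemodAdd': [2000, 2005],
    'GarageYrBlt': [1990, None],
    'YrSold': [2008, 2010],
})

for var in ['YearBuilt', 'YearRemodAdd', 'GarageYrBlt']:
    houses = elapsed_years(houses, var)
houses = houses.drop('YrSold', axis=1)

print(houses)
```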

In [13]:
year_vars.remove('YrSold')

In [14]:
# capture the column names for use later in the notebook
final_columns = X_train.columns
final_columns

Index(['MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street', 'Alley',
       'LotShape', 'LandContour', 'Utilities', 'LotConfig', 'LandSlope',
       'Neighborhood', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle',
       'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd', 'RoofStyle',
       'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'MasVnrArea',
       'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond',
       'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1', 'BsmtFinType2',
       'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating', 'HeatingQC',
       'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF', 'LowQualFinSF',
       'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath', 'HalfBath',
       'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual', 'TotRmsAbvGrd',
       'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType', 'GarageYrBlt',
       'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual', 'GarageCond',
       'PavedDrive', 'Wo

In [15]:
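# note: the imports below use the legacy (pre-1.0) Feature-engine module layout;
# newer Feature-engine releases moved and renamed these modules
%pip install feature_engine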

Note: you may need to restart the kernel to use updated packages.


### Feature Engineering Pipeline

In [16]:
# Treat discrete variables as if they were categorical: for Feature-engine's
# categorical encoders to handle them, they need to be re-cast as object dtype

X_train[discrete] = X_train[discrete].astype('O')
X_test[discrete] = X_test[discrete].astype('O')

In [17]:
# import relevant modules for feature engineering
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from feature_engine import missing_data_imputers as mdi
from feature_engine import categorical_encoders as ce
from feature_engine.variable_transformers import YeoJohnsonTransformer
from feature_engine.discretisers import DecisionTreeDiscretiser

In [18]:
house_preprocess = Pipeline([

    # missing data imputation
    ('missing_ind', mdi.AddNaNBinaryImputer(
        variables=['LotFrontage', 'MasVnrArea', 'GarageYrBlt'])),
    ('imputer_num', mdi.MeanMedianImputer(imputation_method='mean',
                                          variables=['LotFrontage', 'MasVnrArea', 'GarageYrBlt'])),
    ('imputer_cat', mdi.CategoricalVariableImputer(variables=categorical)),

    # categorical encoding
    ('rare_label_enc', ce.RareLabelCategoricalEncoder(
        tol=0.01, n_categories=6, variables=categorical + discrete)),
    ('categorical_enc', ce.MeanCategoricalEncoder(variables=categorical + discrete)),

    # transforming numerical variables
    ('yjt', YeoJohnsonTransformer(variables=['LotFrontage', 'MasVnrArea', 'GarageYrBlt'])),

    # discretisation and encoding
    ('treeDisc', DecisionTreeDiscretiser(cv=2, scoring='neg_mean_squared_error',
                                         regression=True,
                                         param_grid={'max_depth': [1, 2, 3, 4, 5, 6]})),

    # feature scaling
    ('scaler', StandardScaler()),
])

In [19]:
house_preprocess.fit(X_train,y_train)

Pipeline(memory=None,
         steps=[('missing_ind',
                 AddNaNBinaryImputer(variables=['LotFrontage', 'MasVnrArea',
                                                'GarageYrBlt'])),
                ('imputer_num',
                 MeanMedianImputer(imputation_method='mean',
                                   variables=['LotFrontage', 'MasVnrArea',
                                              'GarageYrBlt'])),
                ('imputer_cat',
                 CategoricalVariableImputer(variables=['MSZoning', 'Street',
                                                       'Alley', 'LotShape',
                                                       'LandContour',
                                                       'Utilities', '...
                                                    'Utilities', 'LotConfig',
                                                    'LandSlope', 'Neighborhood',
                                                    'Condition1', 'Condition2',
    

In [20]:
# Apply Transformations
X_train=house_preprocess.transform(X_train)
X_test=house_preprocess.transform(X_test)
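The two categorical-encoding steps in the pipeline above (rare-label grouping, then mean encoding) can be approximated in plain pandas. The sketch below uses hypothetical helpers `group_rare_labels` and `mean_encode`; it is a simplified illustration of the idea, not Feature-engine's implementation, and details such as the exact `n_categories` boundary or the handling of labels unseen during training may differ. It also fits and applies the encoding on the same made-up data, whereas the pipeline learns it on `X_train` only:

```python
import pandas as pd


def group_rare_labels(col, tol=0.01, n_categories=6, rare_label='Rare'):
    """Hypothetical, simplified rare-label grouping: if the variable has more
    than n_categories labels, labels seen in less than `tol` of the rows are
    replaced by `rare_label`; otherwise the column is returned unchanged."""
    freqs = col.value_counts(normalize=True)
    if len(freqs) <= n_categories:
        return col
    frequent = freqs[freqs >= tol].index
    return col.where(col.isin(frequent), rare_label)


def mean_encode(col, y):
    """Hypothetical mean (target) encoding: replace each label by the mean
    of the target over the rows that carry that label."""
    label_means = y.groupby(col).mean()
    return col.map(label_means)


# made-up column with 7 labels; 'F' and 'G' each cover less than 5% of the rows
col = pd.Series(['A'] * 50 + ['B'] * 20 + ['C'] * 10 + ['D'] * 10
                + ['E'] * 6 + ['F'] * 3 + ['G'] * 1, dtype=object)
y = pd.Series([100.0] * 50 + [200.0] * 50)

grouped = group_rare_labels(col, tol=0.05, n_categories=6)
encoded = mean_encode(grouped, y)
print(grouped.value_counts())
print(encoded.unique())
```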

## <span class="mark">DO NOT CHANGE STEPS BEFORE THIS POINT</span>

## Regression Models: Tune Different Models One by One

# LINEAR REGRESSION

In [46]:
# Train a linear regression model, report the coefficients and model performance
import warnings
warnings.filterwarnings("ignore")  # silences all warnings for the rest of the notebook
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV

lr = LinearRegression()
lr.fit(X_train, y_train)
cv_scores = cross_val_score(lr, X_train, y_train, cv=5)

# Cross-validation R² for each of the 5 folds
print("Cross-validation scores (per fold): {}".format(cv_scores))

# Print coefficients
print("lr.coef_:", lr.coef_)
print("lr.intercept_:", lr.intercept_)

# Check train and test set performance (R²)
print("LR train performance: ", lr.score(X_train, y_train))
print("LR test performance: ", lr.score(X_test, y_test))


Cross-validation scores (per fold): [ 8.68250071e-01 -1.82946152e+21  8.77161153e-01  8.98296189e-01
  8.92093773e-01]
lr.coef_: [ 8.74692295e+02  9.72879918e+02  1.43598424e+03  2.41530789e+03
  1.55922446e+03  3.31539942e+02  5.84804509e+02  1.20332634e+03
  1.48520143e+03  2.33009441e+03  1.17630965e+03  1.13744157e+04
  1.26748073e+03  2.03977054e+03  1.11272495e+03 -1.03635447e+03
  1.58434519e+04 -2.93676378e+02 -5.41610661e+03  3.98221701e+03
  3.68590080e+02 -1.04048824e+03  3.19187631e+03 -2.39551624e+03
 -8.64923875e+02  2.87894951e+02  2.67846806e+03  7.08957253e+02
  1.02948711e+02  2.34865476e+03  5.71438886e+02  3.60875596e+03
  1.23986809e+03  5.64889112e+03 -1.47594682e+03  1.93575359e+03
 -2.70259491e+02  7.61385836e+03  2.15847425e+02  1.45627677e+03
  1.18463265e+03 -7.40189368e+02  1.22327269e+04  1.12776771e+04
  4.16965516e+03  5.78069444e+03  2.85514634e+03 -1.32019971e+03
  4.18391534e+03  5.24407111e+03  6.05123160e+02  3.06070473e+03
  3.74704855e+03  3.12566418e+03
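The per-fold scores above show one fold collapsing (R² of about -1.8e21) while the other four lie between about 0.87 and 0.90, most likely because the unregularised least-squares fit is numerically unstable on that split. A single mean would be dominated by that fold. A small sketch, with the scores copied from the output, comparing the mean with the median:

```python
import numpy as np

# per-fold R² scores copied from the LinearRegression output above
fold_scores = np.array([8.68250071e-01, -1.82946152e+21, 8.77161153e-01,
                        8.98296189e-01, 8.92093773e-01])

print('mean R²:  ', fold_scores.mean())
print('median R²:', np.median(fold_scores))
```

The median (about 0.877) describes the typical fold; the mean (about -3.7e20) only reflects the failed one.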

# SGD REGRESSOR


In [22]:
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.linear_model import SGDRegressor

reg_sgd_pipe = Pipeline([
    ('sgd_reg', SGDRegressor(max_iter=1000, tol=1e-6))
])
param_sgd = {'sgd_reg__eta0': np.linspace(0.0001, 1, 100),
             'sgd_reg__penalty': ['l1', 'l2'],
             'sgd_reg__alpha': [100, 10, 1, 0.1, 0.01, 0.001]}
# scoring is negative MSE, so best_score_ below is a negated MSE (higher is better)
grid_sgd = GridSearchCV(reg_sgd_pipe, param_sgd, cv=5, n_jobs=-1,
                        return_train_score=True, scoring='neg_mean_squared_error')
grid_sgd.fit(X_train, y_train)

# let's get the predictions
X_train_preds = grid_sgd.predict(X_train)
X_test_preds = grid_sgd.predict(X_test)
print("Best parameters:", grid_sgd.best_params_)
print("Best CV score (neg. MSE):", grid_sgd.best_score_)

# check train and test set performance
print('train mse: {}'.format(mean_squared_error(y_train, X_train_preds)))
print('train rmse: {}'.format(sqrt(mean_squared_error(y_train, X_train_preds))))
print('train r2: {}'.format(r2_score(y_train, X_train_preds)))
print()
print('test mse: {}'.format(mean_squared_error(y_test, X_test_preds)))
print('test rmse: {}'.format(sqrt(mean_squared_error(y_test, X_test_preds))))
print('test r2: {}'.format(r2_score(y_test, X_test_preds)))

Best parameters: {'sgd_reg__alpha': 0.01, 'sgd_reg__eta0': 0.010199999999999999, 'sgd_reg__penalty': 'l2'}
Best CV score (neg. MSE): -730358068.6082113
train mse: 579375275.099471
train rmse: 24070.215518342808
train r2: 0.9072083799576792

test mse: 873059303.7710655
test rmse: 29547.576952621097
test r2: 0.8729562489344264
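Because the SGD search uses `scoring='neg_mean_squared_error'`, its `best_score_` is a negated mean MSE rather than an R². Converting it back to an RMSE makes it comparable with the train and test RMSE printed above; a small sketch, with the value copied from the output:

```python
from math import sqrt

# best_score_ printed above for the SGD grid search (scoring='neg_mean_squared_error')
best_neg_mse = -730358068.6082113

# undo the sign flip, then take the square root to get back to the target's units
cv_rmse = sqrt(-best_neg_mse)
print('RMSE of the best mean CV MSE:', round(cv_rmse, 2))
```

The result, about 27,025, lies between the train RMSE (about 24,070) and the test RMSE (about 29,548). Strictly, it is the square root of the mean fold MSE, not the mean of the per-fold RMSEs.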


# RIDGE

In [23]:
# Train a Ridge regression model, report the coefficients, the best parameters, and model performance
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Ridge
ridge = Ridge()

# baseline: 5-fold CV R² of an untuned Ridge (alpha=1)
cv_scores = cross_val_score(ridge, X_train, y_train, cv=5)

# define a grid of parameters
# note: with the default solver, max_iter and tol do not affect the Ridge fit,
# so in practice only alpha is being searched
param_ridge = {'alpha': np.linspace(0.001, 1000, 100),
               'max_iter': [1, 10, 100, 1000],
               'tol': [0.1, 0.01, 0.001, 0.0001, 0.00001]}

grid_ridge = GridSearchCV(ridge, param_ridge, cv=7, return_train_score=True)
grid_ridge.fit(X_train, y_train)

# Mean cross-validation score (R²) of the best parameter combination
print("Best Mean Cross-validation score: {:.2f}".format(grid_ridge.best_score_))

print("Number of CV folds (untuned Ridge): {}".format(len(cv_scores)))
print("Mean CV R² (untuned Ridge): {}".format(cv_scores.mean()))

# find best parameters
print('Ridge parameters: ', grid_ridge.best_params_)

# print coefficients
print("Ridge.coef_:", grid_ridge.best_estimator_.coef_)
print("Ridge.intercept_:", grid_ridge.best_estimator_.intercept_)

# Check train and test set performance
print("Ridge Test Performance: ", grid_ridge.score(X_test, y_test))
X_train_preds = grid_ridge.predict(X_train)
X_test_preds = grid_ridge.predict(X_test)
print('train mse: {}'.format(mean_squared_error(y_train, X_train_preds)))
print('train rmse: {}'.format(sqrt(mean_squared_error(y_train, X_train_preds))))
print('train r2: {}'.format(r2_score(y_train, X_train_preds)))
print()
print('test mse: {}'.format(mean_squared_error(y_test, X_test_preds)))
print('test rmse: {}'.format(sqrt(mean_squared_error(y_test, X_test_preds))))
print('test r2: {}'.format(r2_score(y_test, X_test_preds)))

Best Mean Cross-validation score: 0.89
Number of CV folds (untuned Ridge): 5
Mean CV R² (untuned Ridge): 0.8818969790895619
Ridge parameters:  {'alpha': 111.11200000000001, 'max_iter': 1, 'tol': 0.1}
Ridge.coef_: [ 6.14923570e+02  1.18815132e+03  1.43374536e+03  2.66397231e+03
  1.50256700e+03  2.57909805e+02  6.77444690e+02  1.18505571e+03
  1.41272144e+03  2.23195970e+03  1.25655577e+03  9.80143992e+03
  1.22913309e+03  1.97789166e+03  1.21185105e+03 -6.37455745e+02
  1.28761649e+04 -4.74099437e+02 -2.66752274e+03  3.18082395e+03
  7.08025188e+02 -5.71762854e+02  1.70812640e+03 -1.19053599e+03
 -8.03633343e+02  8.61663506e+02  3.43176014e+03  6.86957598e+02
 -3.05551189e+02  2.74455022e+03  4.47955751e+02  3.44259635e+03
  1.38072610e+03  5.86835782e+03 -1.18908757e+03  1.82979133e+03
 -2.41222354e+02  7.77736208e+03  1.56649872e+02  1.41839632e+03
  1.23312397e+03 -5.26924115e+02  1.06539171e+04  1.00738499e+04
  3.69647792e+03  6.27870376e+03  2.52255395e+03 -1.08401246e+03
  3.79347973e+03  4.44808380e+03  8.4
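With the default `solver='auto'`, scikit-learn's `Ridge` uses a direct (Cholesky) solve for dense input, so the `max_iter` and `tol` values in `param_ridge` should not change the fit. That would explain why the reported best parameters are simply the first `max_iter` and `tol` values in the grid: all candidates sharing an `alpha` tie, and `GridSearchCV` reports the first one. A minimal check on synthetic `make_regression` data (an assumption; the house-price matrix is not used here):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# synthetic dense data standing in for X_train / y_train
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# same alpha, very different max_iter / tol, default solver='auto'
few = Ridge(alpha=111.112, max_iter=1, tol=0.1).fit(X, y)
many = Ridge(alpha=111.112, max_iter=1000, tol=1e-5).fit(X, y)

print(np.allclose(few.coef_, many.coef_))
```

If this holds for the notebook's data as well, dropping `max_iter` and `tol` from `param_ridge` would give the same best `alpha` with 20 times fewer fits. With an iterative solver such as `'sag'`, the two settings could give different results.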

# LASSO

In [24]:
# Train a Lasso regression model, report the coefficients, the best parameters, and model performance
from sklearn.linear_model import Lasso
lasso = Lasso(random_state=11)  # random_state only has an effect when selection='random'

# define a grid of 100 evenly spaced alpha values between 100 and 100000
param_lasso = {'alpha': np.linspace(100, 100000, 100)}

grid_lasso = GridSearchCV(lasso, param_lasso, cv=7, return_train_score=True)
grid_lasso.fit(X_train, y_train)

# Mean cross-validation score (R²) of the best alpha
print("Best Mean Cross-validation score: {:.2f}".format(grid_lasso.best_score_))
print()

# find best parameters
print('Lasso parameters: ', grid_lasso.best_params_)

# print coefficients (Lasso sets some of them exactly to zero)
print("Lasso.coef_:", grid_lasso.best_estimator_.coef_)
print("Lasso.intercept_:", grid_lasso.best_estimator_.intercept_)

# Check train and test set performance (R²)
print("Lasso Train Performance: ", grid_lasso.score(X_train, y_train))
print("Lasso Test Performance: ", grid_lasso.score(X_test, y_test))

Best Mean Cross-validation score: 0.89

Lasso parameters:  {'alpha': 1109.090909090909}
Lasso.coef_: [ 5.94841396e+02  7.46183065e+02  5.43402399e+02  2.97270207e+03
  2.67105934e+02  0.00000000e+00  2.19337119e+02  5.23935000e+02
  9.65855405e+01  1.36857752e+03  6.38955344e+02  1.07178204e+04
  3.38519535e+02  1.21606364e+03  1.02208738e+03  0.00000000e+00
  1.61244090e+04 -0.00000000e+00 -0.00000000e+00  2.53218590e+03
  0.00000000e+00 -0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  1.72745723e+03  0.00000000e+00
  0.00000000e+00  1.25068685e+03  0.00000000e+00  3.02624383e+03
  0.00000000e+00  6.21707390e+03 -0.00000000e+00  8.26682290e+02
 -0.00000000e+00  8.20211675e+03  0.00000000e+00  4.07343171e+02
  9.51707656e+02  0.00000000e+00  1.20039329e+04  1.13400370e+04
  2.58611453e+03  6.09049609e+03  2.16389139e+03 -1.49084533e+02
  2.01193506e+03  3.79039979e+03  2.51428333e+02  1.69670258e+03
  4.09161557e+03  2.52831844e+03  1.53012663e+03  1.32
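The best `alpha`, 1109.09, is the second of the 100 evenly spaced values in `np.linspace(100, 100000, 100)`, whose points are about 1009 apart, so the optimum sits near the lower edge of a coarse grid. The sketch confirms the grid position and builds a finer grid between the neighbouring points; the finer search itself is a hypothetical follow-up that was not run in this notebook:

```python
import numpy as np

# the alpha grid searched by grid_lasso above
alphas = np.linspace(100, 100000, 100)
step = alphas[1] - alphas[0]

# position of the reported best alpha within that grid
best_idx = int(np.argmin(np.abs(alphas - 1109.090909090909)))
print('grid step:', round(step, 2))
print('index of best alpha:', best_idx)

# hypothetical finer grid between the two neighbours of the best value
fine_alphas = np.linspace(alphas[best_idx - 1], alphas[best_idx + 1], 50)
print(fine_alphas[0], fine_alphas[-1])
```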

# ELASTIC NET

In [47]:
from sklearn.linear_model import ElasticNet
elastic = ElasticNet()
# note: small max_iter values (1, 10) may stop before convergence; the resulting
# ConvergenceWarnings are hidden by the warnings filter set in the linear regression cell
param_elasticnet = {'alpha': [0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100],
                    'l1_ratio': [0.2, 0.4, 0.6, 0.8],
                    'max_iter': [1, 10, 100, 500]}
grid_elasticnet = GridSearchCV(elastic, param_elasticnet, cv=5, return_train_score=True)
grid_elasticnet.fit(X_train, y_train)

grid_elasticnet_train_score = grid_elasticnet.score(X_train, y_train)
grid_elasticnet_test_score = grid_elasticnet.score(X_test, y_test)

print('Training set score: ', grid_elasticnet_train_score)
print('Test score: ', grid_elasticnet_test_score)

# find best parameters
print('Best parameters: ', grid_elasticnet.best_params_)
print('Best cross-validation score:', grid_elasticnet.best_score_)

Training set score:  0.9102410052276642
Test score:  0.8730194756756253
Best parameters:  {'alpha': 0.1, 'l1_ratio': 0.2, 'max_iter': 100}
Best cross-validation score: 0.8840709989966689


# K-NEAREST NEIGHBORS REGRESSOR

In [26]:
from sklearn.neighbors import KNeighborsRegressor
knn_reg = KNeighborsRegressor()
# range(11, 15) tries n_neighbors = 11, 12, 13, 14
param_knn = {'n_neighbors': range(11, 15)}
grid_knn = GridSearchCV(knn_reg, param_knn, cv=5, return_train_score=True)
grid_knn.fit(X_train, y_train)

print('train score: ', grid_knn.score(X_train, y_train))
print('test score: ', grid_knn.score(X_test, y_test))

# find best parameters
print('Best parameters: ', grid_knn.best_params_)
print('Best cross-validation score:', grid_knn.best_score_)



train score:  0.843268818865971
test score:  0.7208736881444381
Best parameters:  {'n_neighbors': 11}
Best cross-validation score: 0.7988607307818896
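`KNeighborsRegressor` with its default `weights='uniform'` predicts the plain average of the targets of the `k` nearest training rows (Euclidean distance by default). The sketch reproduces one prediction by hand on random data (an assumption; nothing here comes from the house-price set). Note also that the best `n_neighbors` above, 11, is the smallest value in `range(11, 15)`, so smaller values of `k` were never tried:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))     # made-up training features
y = rng.normal(size=50)          # made-up targets
query = rng.normal(size=(1, 3))  # one made-up row to predict

k = 11
# by hand: mean target of the k training rows closest to the query (Euclidean)
dists = np.linalg.norm(X - query, axis=1)
manual = y[np.argsort(dists)[:k]].mean()

knn = KNeighborsRegressor(n_neighbors=k).fit(X, y)
print(manual, knn.predict(query)[0])
```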


# POLYNOMIAL REGRESSION


In [27]:
from sklearn.preprocessing import PolynomialFeatures
pipe_poly = Pipeline([
    ('polynomialfeatures', PolynomialFeatures()),
    ('norm_reg', LinearRegression())
])
param_poly = {'polynomialfeatures__degree': [1, 2]}

# scoring is negative MSE, so all scores below are negated MSEs (higher is better)
grid_poly = GridSearchCV(pipe_poly, param_poly, cv=5, n_jobs=-1,
                         return_train_score=True, scoring='neg_mean_squared_error')
grid_poly.fit(X_train, y_train)

# let's get the predictions
X_train_preds = grid_poly.predict(X_train)
X_test_preds = grid_poly.predict(X_test)

# check model performance:
print('train score (neg. MSE): ', grid_poly.score(X_train, y_train))
print('test score (neg. MSE): ', grid_poly.score(X_test, y_test))

# find best parameters
print('Best parameters: ', grid_poly.best_params_)
print('Best score:', grid_poly.best_score_)


train score (neg. MSE):  -27493.178878248375
test score (neg. MSE):  -6.112494792154257e+25
Best parameters:  {'polynomialfeatures__degree': 2}
Best score: -5.864770099797923e+25
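With `scoring='neg_mean_squared_error'`, every polynomial score above is a negated MSE: the train MSE is about 27,500 while the test and cross-validation MSEs are around 6e25, so the degree-2 model overfits badly, and degree 1 scored even worse in cross-validation (otherwise it would have been selected). One likely reason is the size of the degree-2 expansion. Assuming the preprocessed matrix has at least 78 columns (the 79 raw predictors minus `YrSold`; the pipeline also adds missing-value indicators), a sketch counting the expanded features:

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# assumption: 78 columns, a lower bound for the preprocessed matrix
n_features = 78
poly = PolynomialFeatures(degree=2).fit(np.zeros((1, n_features)))

# bias + linear terms + squares + pairwise products = (n + 2 choose 2)
print(poly.n_output_features_, comb(n_features + 2, 2))
```

That is more than twice the 1,314 training rows, so an unregularised `LinearRegression` on the expanded features is underdetermined.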


# DECISION TREE REGRESSOR

In [28]:
from sklearn.tree import DecisionTreeRegressor
pt_tree = DecisionTreeRegressor(random_state=0)
# note: 'mse' and 'mae' are criterion names from older scikit-learn releases;
# newer releases call them 'squared_error' and 'absolute_error'
param_DT = {'max_depth': range(1, 20), 'criterion': ['mse', 'friedman_mse', 'mae'], 'splitter': ['best']}

grid_tree = GridSearchCV(pt_tree, param_DT, cv=5)
grid_tree.fit(X_train, y_train)
print(grid_tree.best_params_)
print("R² on training set: {:.3f}".format(grid_tree.score(X_train, y_train)))
print("R² on test set: {:.3f}".format(grid_tree.score(X_test, y_test)))
print("Best mean CV score (R²):", grid_tree.best_score_)


{'criterion': 'mse', 'max_depth': 5, 'splitter': 'best'}
R² on training set: 0.858
R² on test set: 0.800
Best mean CV score (R²): 0.7615945500938354


# SUPPORT VECTOR MACHINE REGRESSION

In [32]:
from sklearn.svm import SVR

param_grid = [{'kernel': ['rbf'],
               'C': [0.001, 0.01, 0.1, 1, 10, 100],
               'gamma': [0.001, 0.01, 0.1, 1, 10, 100]},
              {'kernel': ['sigmoid'],
               'C':[0.001,0.01,0.1,0.25,0.5,1,10],
               'gamma': [0.1,1,2,5,10,100,500]},
              {'kernel':['poly'],
               'degree':[1,2,3],
               'C':[0.001,0.01,0.1,0.25,0.5,1,10],
               'gamma':[0.1,1,2,5,10,100]}]
print("List of grids:\n{}".format(param_grid))

List of grids:
[{'kernel': ['rbf'], 'C': [0.001, 0.01, 0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}, {'kernel': ['sigmoid'], 'C': [0.001, 0.01, 0.1, 0.25, 0.5, 1, 10], 'gamma': [0.1, 1, 2, 5, 10, 100, 500]}, {'kernel': ['poly'], 'degree': [1, 2, 3], 'C': [0.001, 0.01, 0.1, 0.25, 0.5, 1, 10], 'gamma': [0.1, 1, 2, 5, 10, 100]}]


In [30]:
grid_search = GridSearchCV(SVR(), param_grid, cv=3,
                          return_train_score=True)
grid_search.fit(X_train, y_train)
print('train score: ', grid_search.score(X_train, y_train))
print('test score: ', grid_search.score(X_test, y_test))
print("Best parameters: {}".format(grid_search.best_params_))
print("Best cross-validation score: {:.2f}".format(grid_search.best_score_))

train score:  0.9016336525800952
test score:  0.8702806425506886
Best parameters: {'C': 10, 'degree': 1, 'gamma': 100, 'kernel': 'poly'}
Best cross-validation score: 0.89
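The best SVR uses `kernel='poly'` with `degree=1`. scikit-learn's polynomial kernel is `(gamma * <x, x'> + coef0) ** degree`, and `SVR` defaults to `coef0=0`, so with degree 1 it reduces to the linear kernel multiplied by `gamma`. The sketch checks that identity on random data with `sklearn.metrics.pairwise`. Scaling the kernel by `gamma` should be equivalent to a linear-kernel SVR with `C` multiplied by `gamma` (here 10 × 100), which puts the best SVR in the same family as the linear models above:

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel

rng = np.random.RandomState(0)
X = rng.normal(size=(5, 3))  # made-up rows

# poly kernel with the best SVR settings (degree=1, gamma=100) and SVR's default coef0=0
K_poly = polynomial_kernel(X, degree=1, gamma=100, coef0=0)
K_lin = linear_kernel(X)

print(np.allclose(K_poly, 100 * K_lin))
```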


## Tune Multiple Models with one GridSearch

In [33]:
# Draft (commented out). Note: a Pipeline applies its steps in sequence and every step
# except the last must be a transformer, so this list would not compare the models.
#model_gs = Pipeline([('Linear Regression', LinearRegression()),
#                     ('SGD Regressor', SGDRegressor(max_iter=1000, tol=1e-6)),
#                     ('Ridge', Ridge()),
#                     ('Lasso', Lasso()),
#                     ('K Nearest neighbors', KNeighborsRegressor()),
#                     ('Polynomial Regression', PolynomialFeatures()),
#                     ('Decision Tree Regressor', DecisionTreeRegressor()),
#                     ('Support Vector Machine', SVR())])

In [36]:
#model_parm_gd = [
#    {'sgd_reg__eta0': np.linspace(0.0001, 1, 100), 'sgd_reg__penalty': ['l1', 'l2'], 'sgd_reg__alpha': [100, 10, 1, 0.1, 0.01, 0.001]},
#    {'regressor__ridge': np.linspace(0.001, 1, 100)},
#    {'regressor__alpha': np.linspace(100, 100000, 100)},
#    {'regressor_elasticnet': {'alpha': [0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100], 'l1_ratio': [0.2, 0.4, 0.6, 0.8], 'max_iter': [1, 10, 100, 500]}},
#    {'regressor_knn': {'n_neighbors': range(11, 15)}},
#    {'regressor_poly': {'polynomialfeatures__degree': range(1, 3)}},
#    {'regressor_decisiontree': {'max_depth': range(1, 30), 'criterion': ['mse', 'friedman_mse', 'mae'], 'splitter': ['best']}},
#    {'regressor_SVM': {'kernel': ['rbf'],
#                       'C': [0.001, 0.01, 0.1, 1, 10, 100],
#                       'gamma': [0.001, 0.01, 0.1, 1, 10, 100]},
#                      {'kernel': ['sigmoid'],
#                       'C': [0.001, 0.01, 0.1, 0.25, 0.5, 1, 10],
#                       'gamma': [0.1, 1, 2, 5, 10, 100, 500]},
#                      {'kernel': ['poly'],
#                       'degree': [1, 2, 3],
#                       'C': [0.001, 0.01, 0.1, 0.25, 0.5, 1, 10],
#                       'gamma': [0.1, 1, 2, 5, 10, 100]}},
#]

In [37]:
#grid_search_house_pipe = GridSearchCV(model_gs, model_parm_gd)

In [38]:
#grid_search_house_pipe.fit(X_train,y_train)

In [39]:
#print(grid_search_house_pipe.best_params_)

In [40]:
# let's get the predictions
#X_train_preds = grid_search_house_pipe.predict(X_train)
#X_test_preds = grid_search_house_pipe.predict(X_test)

In [41]:
#print("Best Mean Cross-validation score: {}".format(grid_search_house_pipe.best_score_))

In [44]:
# check model performance:
#from sklearn.metrics import mean_squared_error
#from sklearn.metrics import r2_score

#print('train mse: {}'.format(mean_squared_error(y_train, X_train_preds)))
#print('train rmse: {}'.format(sqrt(mean_squared_error(y_train, X_train_preds))))
#print('train r2: {}'.format(r2_score(y_train, X_train_preds)))
#print()
#print('test mse: {}'.format(mean_squared_error(y_test, X_test_preds)))
#print('test rmse: {}'.format(sqrt(mean_squared_error(y_test, X_test_preds))))
#print('test r2: {}'.format(r2_score(y_test, X_test_preds)))
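The commented-out draft above puts every model into one `Pipeline`, but a pipeline chains its steps rather than comparing them. A common scikit-learn pattern is a single placeholder step whose estimator is itself swapped by the grid, with one sub-grid per model. A minimal sketch on synthetic `make_regression` data (an assumption, standing in for `X_train`/`y_train`), with small illustrative grids rather than the ones tuned above:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline

# synthetic stand-in for the preprocessed house-price matrix (assumption)
X, y = make_regression(n_samples=300, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

# one placeholder step; the grid swaps a different estimator into it
pipe = Pipeline([('regressor', Ridge())])

# each dict is its own sub-grid: 'regressor' picks the model,
# 'regressor__<param>' tunes that model's hyperparameters
param_grid = [
    {'regressor': [Ridge()], 'regressor__alpha': np.logspace(-3, 3, 7)},
    {'regressor': [Lasso(max_iter=10000)], 'regressor__alpha': np.logspace(-3, 3, 7)},
    {'regressor': [KNeighborsRegressor()], 'regressor__n_neighbors': [5, 11, 15]},
]

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_tr, y_tr)

print('best model:', search.best_params_['regressor'])
print('best mean CV R²: {:.3f}'.format(search.best_score_))
print('test R²: {:.3f}'.format(search.score(X_te, y_te)))
```

Because every candidate shares the same CV splits and the same scorer (R² by default), their cross-validation scores are directly comparable. Models that need their own preprocessing, such as the polynomial regression, would need a nested pipeline inside their sub-grid.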