# Table of Contents <a id = "top"></a>

### 0. [Import Libraries](#0)

### 1. [HR Analytics](#1)
- [1.1 Load and Explore the data](#1.1)
- [1.2 Build the Model(s)](#1.2)
    - [1.2.1 Logistic Classifier](#1.2.1)
    - [1.2.2 DecisionTreeClassifier](#1.2.2)
    - [1.2.3 MLPClassifier](#1.2.3)
    - [1.2.4 RandomForestClassifier](#1.2.4)
    - [1.2.5 BaggingClassifier](#1.2.5)
    - [1.2.6 AdaBoostClassifier](#1.2.6)
    - [1.2.7 XGBClassifier](#1.2.7)
    - [1.2.8 SVC](#1.2.8)
    - [1.2.9 VotingClassifier](#1.2.9)
- [1.3 Evaluate and Improve the Model(s)](#1.3)
    - [1.3.1 Logistic Classifier](#1.3.1)
    - [1.3.2 DecisionTreeClassifier](#1.3.2)
    - [1.3.3 MLPClassifier](#1.3.3)
    - [1.3.4 RandomForestClassifier](#1.3.4)
    - [1.3.5 BaggingClassifier](#1.3.5)
    - [1.3.6 AdaBoostClassifier](#1.3.6)
    - [1.3.7 XGBClassifier](#1.3.7)
    - [1.3.8 SVC](#1.3.8)
    - [1.3.9 VotingClassifier](#1.3.9)

### 2. [Airbnb](#2)
- [2.1 Load and Explore the data](#2.1)
- [2.2 Build the Model(s)](#2.2)
    - [2.2.1 Linear Regressor](#2.2.1)
    - [2.2.2 DecisionTreeRegressor](#2.2.2)
    - [2.2.3 MLPRegressor](#2.2.3)
    - [2.2.4 RandomForestRegressor](#2.2.4)
    - [2.2.5 BaggingRegressor](#2.2.5)
    - [2.2.6 AdaBoostRegressor](#2.2.6)
    - [2.2.7 XGBRegressor](#2.2.7)
    - [2.2.8 SVR](#2.2.8)
    - [2.2.9 VotingRegressor](#2.2.9)
- [2.3 Evaluate and Improve the Model(s)](#2.3)
    - [2.3.1 Linear Regressor](#2.3.1)
    - [2.3.2 DecisionTreeRegressor](#2.3.2)
    - [2.3.3 MLPRegressor](#2.3.3)
    - [2.3.4 RandomForestRegressor](#2.3.4)
    - [2.3.5 BaggingRegressor](#2.3.5)
    - [2.3.6 AdaBoostRegressor](#2.3.6)
    - [2.3.7 XGBRegressor](#2.3.7)
    - [2.3.8 SVR](#2.3.8)
    - [2.3.9 VotingRegressor](#2.3.9)

# 0. Import Libraries <a id = "0"></a>

[Back to top](#top)

In [1]:
# Import the required packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Import warnings to turn off warning messages
import warnings
warnings.filterwarnings('ignore')

# Linear and logistic regression
from sklearn.linear_model import LinearRegression, LogisticRegression

# Decision Tree
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Multi-layer perceptron (classifier and regressor)
from sklearn.neural_network import MLPClassifier, MLPRegressor

# Ensemble learning
from sklearn.ensemble import (
    RandomForestClassifier, RandomForestRegressor,
    BaggingClassifier, BaggingRegressor,
    AdaBoostClassifier, AdaBoostRegressor,
    VotingClassifier, VotingRegressor,
)

# XGBoost (classifier and regressor)
from xgboost import XGBClassifier, XGBRegressor

# Support Vector Machine
from sklearn.svm import SVC, SVR

# Model evaluation
from sklearn.metrics import mean_squared_error

# GridSearch
from sklearn.model_selection import GridSearchCV

# 1. HR Analytics <a id = "1"></a>

[Back to top](#top)

## 1.1 Load and Explore the data <a id = "1.1"></a>

[Back to top](#top)

In [2]:
# target column = is_promoted
# 0 = not promoted
# 1 = promoted

hr_xtrain = pd.read_csv('./data/hr_final_Xtrain.csv')
hr_xtest = pd.read_csv('./data/hr_final_Xtest.csv')

hr_ytrain = pd.read_csv('./data/hr_final_ytrain.csv')
hr_ytest = pd.read_csv('./data/hr_final_ytest.csv')
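`pd.read_csv` returns each target file as a DataFrame, while scikit-learn estimators expect a 1-D target. Passing a column vector typically triggers a `DataConversionWarning`, which the `warnings.filterwarnings('ignore')` call above hides. A minimal sketch of flattening the target before fitting, assuming each target file holds a single column (the helper `to_1d_target` and the toy frame are hypothetical stand-ins, not part of the original notebook):

```python
import numpy as np
import pandas as pd


def to_1d_target(y_frame: pd.DataFrame) -> np.ndarray:
    """Flatten a single-column target DataFrame into a 1-D NumPy array."""
    if y_frame.shape[1] != 1:
        # Assumption: the target CSV contains exactly one column
        raise ValueError(f'expected one target column, got {y_frame.shape[1]}')
    return y_frame.to_numpy().ravel()


# Toy stand-in for a target frame such as hr_ytrain
toy_y = pd.DataFrame({'is_promoted': [0, 1, 0, 1]})
flat_y = to_1d_target(toy_y)
print(flat_y.shape)
```

In the notebook this could be applied as `to_1d_target(hr_ytrain)` before calling `.fit()`, provided the single-column assumption holds.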

## 1.2 Build the Model(s) <a id = "1.2"></a>

[Back to top](#top)

### 1.2.1 Logistic Classifier <a id = "1.2.1"></a>

[Back to top](#top)

In [3]:
# Create the model
lg = LogisticRegression(solver = 'lbfgs', max_iter = 10000, random_state = 0)

# Fit the model
lg.fit(hr_xtrain, hr_ytrain)

LogisticRegression(max_iter=10000, random_state=0)

### 1.2.2 DecisionTreeClassifier <a id = "1.2.2"></a>

[Back to top](#top)

In [4]:
# Create the model
dtc = DecisionTreeClassifier(max_depth = 2, random_state = 0)

# Fit the model
dtc.fit(hr_xtrain, hr_ytrain)

DecisionTreeClassifier(max_depth=2, random_state=0)

### 1.2.3 MLPClassifier <a id = "1.2.3"></a>

[Back to top](#top)

In [5]:
# Create the model
mlpc = MLPClassifier(hidden_layer_sizes = (10,), max_iter = 2000, random_state = 0)

# Fit the model
mlpc.fit(hr_xtrain, hr_ytrain)

MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)

### 1.2.4 RandomForestClassifier <a id = "1.2.4"></a>

[Back to top](#top)

In [6]:
# Create the model
rfc = RandomForestClassifier(n_estimators = 10, max_depth = 4, random_state = 0)

# Fit the model
rfc.fit(hr_xtrain, hr_ytrain)

RandomForestClassifier(max_depth=4, n_estimators=10, random_state=0)

### 1.2.5 BaggingClassifier <a id = "1.2.5"></a>

[Back to top](#top)

In [7]:
# Create the model
bgc = BaggingClassifier(n_estimators = 10, random_state = 0)

# Fit the model
bgc.fit(hr_xtrain, hr_ytrain)

BaggingClassifier(random_state=0)

### 1.2.6 AdaBoostClassifier <a id = "1.2.6"></a>

[Back to top](#top)

In [8]:
# Create the model
adbc = AdaBoostClassifier(DecisionTreeClassifier(max_depth = 3), n_estimators = 10, learning_rate = 0.1, random_state = 0)

# Fit the model
adbc.fit(hr_xtrain, hr_ytrain)

AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=3),
                   learning_rate=0.1, n_estimators=10, random_state=0)

### 1.2.7 XGBClassifier <a id = "1.2.7"></a>

[Back to top](#top)

In [9]:
# Create the model
xgbc = XGBClassifier(n_estimators = 20, learning_rate = 0.1, eval_metric = 'logloss', random_state = 0)

# Fit the model
xgbc.fit(hr_xtrain, hr_ytrain)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              eval_metric='logloss', gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.1, max_delta_step=0,
              max_depth=6, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=20, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

### 1.2.8 SVC <a id = "1.2.8"></a>

[Back to top](#top)

In [10]:
# Create the model
svc = SVC(C = 0.8, kernel = 'rbf', random_state = 0)
# test C, kernel

# Fit the model
svc.fit(hr_xtrain, hr_ytrain)

SVC(C=0.8, random_state=0)

### 1.2.9 VotingClassifier <a id = "1.2.9"></a>

[Back to top](#top)

In [11]:
# Create the model
vc = VotingClassifier(estimators = [('dtc', dtc), ('mlpc', mlpc), ('rfc', rfc), 
                                    ('bgc', bgc), ('adbc', adbc), ('xgbc', xgbc), ('svc', svc)], 
                      voting = 'hard')

# Fit the model
vc.fit(hr_xtrain, hr_ytrain)

VotingClassifier(estimators=[('dtc',
                              DecisionTreeClassifier(max_depth=2,
                                                     random_state=0)),
                             ('mlpc',
                              MLPClassifier(hidden_layer_sizes=(10,),
                                            max_iter=2000, random_state=0)),
                             ('rfc',
                              RandomForestClassifier(max_depth=4,
                                                     n_estimators=10,
                                                     random_state=0)),
                             ('bgc', BaggingClassifier(random_state=0)),
                             ('adbc',
                              AdaBoostClassifier(base_estimator=DecisionTreeClassifier(ma...
                                            interaction_constraints='',
                                            learning_rate=0.1, max_delta_step=0,
                                        

## 1.3 Evaluate and Improve the Model(s) <a id = "1.3"></a>

[Back to top](#top)

### 1.3.1 Logistic Classifier <a id = "1.3.1"></a>

[Back to top](#top)

In [12]:
# Accuracy
train_acc = lg.score(hr_xtrain, hr_ytrain)
test_acc = lg.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.7364957918898241
Testing accuracy: 0.5080328454123527


In [13]:
# Create a GridSearch model to find the best parameters

param_grid = {"solver": ['lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'],
              "max_iter": [25, 50, 75, 100, 125, 150, 175]
             }

gs = GridSearchCV(lg, param_grid = param_grid, scoring = 'accuracy', cv = 10, n_jobs = -1)

gs = gs.fit(hr_xtrain, hr_ytrain)
print(gs.best_score_)
print(gs.best_params_)

0.7346560920896732
{'max_iter': 25, 'solver': 'lbfgs'}
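With the default `refit=True`, `GridSearchCV` refits the best parameter combination on the full training data and exposes it as `best_estimator_`, so the winning parameters do not have to be copied by hand into a new model. A small sketch on synthetic data (the toy data, the `C` grid, and the variable names are illustrative assumptions, not the notebook's settings):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the HR training data
X_toy, y_toy = make_classification(n_samples=200, n_features=5, random_state=0)

toy_grid = {'C': [0.1, 1.0, 10.0]}
toy_gs = GridSearchCV(LogisticRegression(max_iter=1000, random_state=0),
                      param_grid=toy_grid, scoring='accuracy', cv=5)
toy_gs.fit(X_toy, y_toy)

# Already fitted on all of X_toy with the best parameters
best_lg = toy_gs.best_estimator_
print(toy_gs.best_params_)
print(best_lg.score(X_toy, y_toy))
```

Reusing `best_estimator_` also removes the risk of a transcription mismatch between the printed `best_params_` and the re-created model.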


In [14]:
# Create the model
lg = LogisticRegression(solver = 'lbfgs', 
                        max_iter = 25, 
                        random_state = 0)

# Fit the model
lg.fit(hr_xtrain, hr_ytrain)

LogisticRegression(max_iter=25, random_state=0)

In [15]:
# Accuracy
train_acc = lg.score(hr_xtrain, hr_ytrain)
test_acc = lg.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.7364957918898241
Testing accuracy: 0.5080328454123527
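The test accuracy is unchanged by tuning and sits well below the training accuracy. One way to investigate, sketched below, is to look at the distribution of predicted labels and the confusion matrix rather than the accuracy alone. The helper `describe_predictions` and the toy arrays are hypothetical, and the constant-prediction case shown is only one possible explanation, not something the notebook confirms:

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def describe_predictions(y_true, y_pred):
    """Summarise predicted-label counts, the confusion matrix and accuracy."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    labels, counts = np.unique(y_pred, return_counts=True)
    return {
        'predicted_counts': dict(zip(labels.tolist(), counts.tolist())),
        # Rows are true labels, columns are predicted labels
        'confusion_matrix': confusion_matrix(y_true, y_pred, labels=[0, 1]),
        'accuracy': float((y_true == y_pred).mean()),
    }


# Toy example: a model that always predicts class 0
toy_true = np.array([0, 1, 0, 1, 1, 0])
toy_pred = np.zeros(6, dtype=int)
report = describe_predictions(toy_true, toy_pred)
print(report)
```

In the notebook the same helper could be called as `describe_predictions(hr_ytest, lg.predict(hr_xtest))`.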


### 1.3.2 DecisionTreeClassifier <a id = "1.3.2"></a>

[Back to top](#top)

In [16]:
# Accuracy
train_acc = dtc.score(hr_xtrain, hr_ytrain)
test_acc = dtc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.6910482019892884
Testing accuracy: 0.5080328454123527


In [17]:
# Create a GridSearch model to find the best parameters

param_grid = {"criterion" : ["gini", "entropy"], 
              'splitter': ['best', 'random'],
              "min_samples_leaf" : [1, 2, 3, 4, 5], 
              "min_samples_split" : [2, 4, 6, 8, 10], 
              'max_depth' : [None, 1, 2, 3, 4, 5]
             }

gs = GridSearchCV(dtc, param_grid = param_grid, scoring = 'accuracy', cv = 10, n_jobs = -1)
gs = gs.fit(hr_xtrain, hr_ytrain)
print(gs.best_score_)
print(gs.best_params_)

0.7801099137830103
{'criterion': 'entropy', 'max_depth': None, 'min_samples_leaf': 5, 'min_samples_split': 2, 'splitter': 'random'}


In [18]:
# Create the model
dtc = DecisionTreeClassifier(criterion = 'entropy', 
                             max_depth = None, 
                             min_samples_leaf = 5, 
                             min_samples_split = 2,
                             splitter = 'random',
                             random_state = 0)

# Fit the model
dtc.fit(hr_xtrain, hr_ytrain)

DecisionTreeClassifier(criterion='entropy', min_samples_leaf=5, random_state=0,
                       splitter='random')

In [19]:
# Accuracy
train_acc = dtc.score(hr_xtrain, hr_ytrain)
test_acc = dtc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.8465187452180566
Testing accuracy: 0.5080328454123527


### 1.3.3 MLPClassifier <a id = "1.3.3"></a>

[Back to top](#top)

In [20]:
# Accuracy
train_acc = mlpc.score(hr_xtrain, hr_ytrain)
test_acc = mlpc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.7935730680948737
Testing accuracy: 0.5080328454123527


In [21]:
# Create a GridSearch model to find the best parameters

param_grid = {"activation": ['logistic', 'relu', 'identity'],
              'hidden_layer_sizes': [(50,),(100,),(150,)],
              'learning_rate':['constant','invscaling','adaptive'],
              'max_iter': [100,200,300],
              'solver': ['sgd', 'adam']
             }

gs = GridSearchCV(mlpc, param_grid = param_grid, scoring = 'accuracy', cv = 5, n_jobs = -1)
gs = gs.fit(hr_xtrain, hr_ytrain)
print(gs.best_score_)
print(gs.best_params_)

0.8006120887528692
{'activation': 'relu', 'hidden_layer_sizes': (150,), 'learning_rate': 'constant', 'max_iter': 300, 'solver': 'adam'}


In [22]:
# Create the model
mlpc = MLPClassifier(activation = 'relu', 
                     hidden_layer_sizes = (150,), 
                     learning_rate = 'constant',
                     max_iter = 300, 
                     solver = 'adam', 
                     random_state = 0)

# Fit the model
mlpc.fit(hr_xtrain, hr_ytrain)

MLPClassifier(hidden_layer_sizes=(150,), max_iter=300, random_state=0)

In [23]:
# Accuracy
train_acc = mlpc.score(hr_xtrain, hr_ytrain)
test_acc = mlpc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.8125478194338179
Testing accuracy: 0.5080328454123527


### 1.3.4 RandomForestClassifier <a id = "1.3.4"></a>

[Back to top](#top)

In [24]:
# Accuracy
train_acc = rfc.score(hr_xtrain, hr_ytrain)
test_acc = rfc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.7690895179801072
Testing accuracy: 0.5080328454123527


In [25]:
# Create a GridSearch model to find the best parameters

param_grid = {"criterion" : ["gini", "entropy"], 
              "max_depth": [1,2,3,4,5], 
              'min_samples_split':[2,3,4,5],
              "min_samples_leaf" : [1,2,3,4,5], 
              "n_estimators": [25,50,100,125,150]
             }

gs = GridSearchCV(estimator = rfc, param_grid = param_grid, scoring = 'accuracy', cv = 5, n_jobs = -1)
gs = gs.fit(hr_xtrain, hr_ytrain)
print(gs.best_score_)
print(gs.best_params_)

0.7677123182861514
{'criterion': 'gini', 'max_depth': 5, 'min_samples_leaf': 4, 'min_samples_split': 2, 'n_estimators': 150}


In [27]:
# Create the model
rfc = RandomForestClassifier(criterion = 'gini', 
                             n_estimators = 150, 
                             max_depth = 5, 
                             min_samples_leaf = 4, 
                             min_samples_split = 2,
                             random_state = 0)

# Fit the model
rfc.fit(hr_xtrain, hr_ytrain)

RandomForestClassifier(max_depth=5, min_samples_leaf=4, n_estimators=150,
                       random_state=0)

In [28]:
# Accuracy
train_acc = rfc.score(hr_xtrain, hr_ytrain)
test_acc = rfc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.7684774292272379
Testing accuracy: 0.5080328454123527


### 1.3.5 BaggingClassifier <a id = "1.3.5"></a>

[Back to top](#top)

In [29]:
# Accuracy
train_acc = bgc.score(hr_xtrain, hr_ytrain)
test_acc = bgc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.9827084927314461
Testing accuracy: 0.5080328454123527


In [30]:
# Create a GridSearch model to find the best parameters

param_grid = {'max_samples': [1,2,3,4,5],
              'max_features': [1,2,3,4,5],
              "n_estimators": [5, 10, 15, 20, 25]
             }

gs = GridSearchCV(estimator = bgc, param_grid = param_grid, scoring = 'accuracy', cv = 5, n_jobs = -1)
gs = gs.fit(hr_xtrain, hr_ytrain)
print(gs.best_score_)
print(gs.best_params_)

0.6587605202754399
{'max_features': 3, 'max_samples': 5, 'n_estimators': 25}
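In `BaggingClassifier`, integer values of `max_samples` and `max_features` are absolute counts, while floats are fractions of the training set, so the grid above lets each base estimator see at most five rows. A short sketch of the difference on synthetic data (the toy data and variable names are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Synthetic stand-in with 200 rows
X_toy, y_toy = make_classification(n_samples=200, n_features=6, random_state=0)

# Integer: each base estimator is trained on exactly 5 drawn rows
bag_int = BaggingClassifier(n_estimators=5, max_samples=5, random_state=0).fit(X_toy, y_toy)

# Float: each base estimator is trained on 50% of the rows
bag_frac = BaggingClassifier(n_estimators=5, max_samples=0.5, random_state=0).fit(X_toy, y_toy)

# estimators_samples_ lists the row indices drawn for each base estimator
n_int = len(bag_int.estimators_samples_[0])
n_frac = len(bag_frac.estimators_samples_[0])
print(n_int, n_frac)
```

A grid of fractions such as `[0.2, 0.4, 0.6, 0.8, 1.0]` would be one alternative to try; whether it improves on the results above is not something this sketch establishes.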


In [31]:
# Create the model
bgc = BaggingClassifier(n_estimators = 25, 
                        max_features = 3, 
                        max_samples = 5, 
                        random_state = 0)

# Fit the model
bgc.fit(hr_xtrain, hr_ytrain)

BaggingClassifier(max_features=3, max_samples=5, n_estimators=25,
                  random_state=0)

In [32]:
# Accuracy
train_acc = bgc.score(hr_xtrain, hr_ytrain)
test_acc = bgc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.6306044376434583
Testing accuracy: 0.5080328454123527


### 1.3.6 AdaBoostClassifier <a id = "1.3.6"></a>

[Back to top](#top)

In [33]:
# Accuracy
train_acc = adbc.score(hr_xtrain, hr_ytrain)
test_acc = adbc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.7951032899770467
Testing accuracy: 0.5080328454123527


In [34]:
# Create a GridSearch model to find the best parameters

param_grid = {'learning_rate': [0.1, 0.5, 1, 1.5, 2],
              "n_estimators": [25,50,75,100,125],
              'algorithm': ["SAMME", "SAMME.R"]
             }

gs = GridSearchCV(estimator = adbc, param_grid = param_grid, scoring = 'accuracy', cv = 5, n_jobs = -1)
gs = gs.fit(hr_xtrain, hr_ytrain)
print(gs.best_score_)
print(gs.best_params_)

0.8120887528691659
{'algorithm': 'SAMME.R', 'learning_rate': 0.1, 'n_estimators': 125}


In [35]:
# Create the model
adbc = AdaBoostClassifier(dtc, 
                          n_estimators = 125, 
                          learning_rate = 0.1, 
                          algorithm = 'SAMME.R', 
                          random_state = 0)

# Fit the model
adbc.fit(hr_xtrain, hr_ytrain)

AdaBoostClassifier(base_estimator=DecisionTreeClassifier(criterion='entropy',
                                                         min_samples_leaf=5,
                                                         random_state=0,
                                                         splitter='random'),
                   learning_rate=0.1, n_estimators=125, random_state=0)

In [36]:
# Accuracy
train_acc = adbc.score(hr_xtrain, hr_ytrain)
test_acc = adbc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.9935730680948738
Testing accuracy: 0.5080328454123527


### 1.3.7 XGBClassifier <a id = "1.3.7"></a>

[Back to top](#top)

In [37]:
# Accuracy
train_acc = xgbc.score(hr_xtrain, hr_ytrain)
test_acc = xgbc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.8180566182096404
Testing accuracy: 0.5080328454123527


In [38]:
# Create a GridSearch model to find the best parameters

param_grid = {'learning_rate': [0.1, 0.3, 0.5, 1],
              'max_depth':[2,4,6,8,10],
              'eval_metric': ['logloss', 'rmse', 'mae']
             }

gs = GridSearchCV(estimator = xgbc, param_grid = param_grid, scoring = 'accuracy', cv = 5, n_jobs = -1)
gs = gs.fit(hr_xtrain, hr_ytrain)
print(gs.best_score_)
print(gs.best_params_)

0.8125478194338178
{'eval_metric': 'logloss', 'learning_rate': 0.5, 'max_depth': 4}


In [39]:
# Create the model
xgbc = XGBClassifier(learning_rate = 0.5, 
                     eval_metric = 'logloss', 
                     max_depth = 4,
                     random_state = 0)

# Fit the model
xgbc.fit(hr_xtrain, hr_ytrain)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
              eval_metric='logloss', gamma=0, gpu_id=-1, importance_type=None,
              interaction_constraints='', learning_rate=0.5, max_delta_step=0,
              max_depth=4, min_child_weight=1, missing=nan,
              monotone_constraints='()', n_estimators=100, n_jobs=12,
              num_parallel_tree=1, predictor='auto', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [40]:
# Accuracy
train_acc = xgbc.score(hr_xtrain, hr_ytrain)
test_acc = xgbc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.8800306044376435
Testing accuracy: 0.5080328454123527


### 1.3.8 SVC <a id = "1.3.8"></a>

[Back to top](#top)

In [41]:
# Accuracy
train_acc = svc.score(hr_xtrain, hr_ytrain)
test_acc = svc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.7860749808722265
Testing accuracy: 0.5080328454123527


In [42]:
# Create a GridSearch model to find the best parameters

param_grid = {'C': [0.5, 0.75, 1, 1.25, 1.5],
              "kernel": ['linear', 'poly', 'rbf', 'sigmoid'],
              'gamma': ['scale', 'auto']
             }

gs = GridSearchCV(estimator = svc, param_grid = param_grid, scoring = 'accuracy', cv = 5, n_jobs = -1)
gs = gs.fit(hr_xtrain, hr_ytrain)
print(gs.best_score_)
print(gs.best_params_)

0.7869931140015302
{'C': 1.5, 'gamma': 'scale', 'kernel': 'rbf'}


In [43]:
# Create the model
svc = SVC(C = 1.5, 
          kernel = 'rbf', 
          gamma = 'scale', 
          random_state = 0)

# Fit the model
svc.fit(hr_xtrain, hr_ytrain)

SVC(C=1.5, random_state=0)

In [44]:
# Accuracy
train_acc = svc.score(hr_xtrain, hr_ytrain)
test_acc = svc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.7918898240244836
Testing accuracy: 0.5080328454123527


### 1.3.9 VotingClassifier <a id = "1.3.9"></a>

[Back to top](#top)

In [45]:
# Accuracy
train_acc = vc.score(hr_xtrain, hr_ytrain)
test_acc = vc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.8012241775057384
Testing accuracy: 0.5080328454123527


In [46]:
# Create the model
vc = VotingClassifier(estimators = [('dtc', dtc), ('mlpc', mlpc), ('rfc', rfc), 
                                    ('bgc', bgc), ('adbc', adbc), ('xgbc', xgbc), ('svc', svc)], 
                      voting = 'hard')

# Fit the model
vc.fit(hr_xtrain, hr_ytrain)

VotingClassifier(estimators=[('dtc',
                              DecisionTreeClassifier(criterion='entropy',
                                                     min_samples_leaf=5,
                                                     random_state=0,
                                                     splitter='random')),
                             ('mlpc',
                              MLPClassifier(hidden_layer_sizes=(150,),
                                            max_iter=300, random_state=0)),
                             ('rfc',
                              RandomForestClassifier(max_depth=5,
                                                     min_samples_leaf=4,
                                                     n_estimators=150,
                                                     random_state=0)),
                             ('bgc',
                              BaggingClassifier(max_features=3, max_...
                                            interaction_const

In [47]:
# Accuracy
train_acc = vc.score(hr_xtrain, hr_ytrain)
test_acc = vc.score(hr_xtest, hr_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')

Training accuracy: 0.8485080336648814
Testing accuracy: 0.5080328454123527
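The evaluation cells in this section repeat the same two `.score()` calls for every model. A compact alternative, sketched here on synthetic data, is to loop over a dictionary of models and collect the scores in one table (the toy data, the two example models, and the names `toy_models` and `summary` are illustrative assumptions):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the HR train/test split
X_toy, y_toy = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.25, random_state=0)

toy_models = {
    'lg': LogisticRegression(max_iter=1000, random_state=0),
    'dtc': DecisionTreeClassifier(max_depth=2, random_state=0),
}

rows = []
for name, model in toy_models.items():
    model.fit(X_tr, y_tr)
    rows.append({'model': name,
                 'train_acc': model.score(X_tr, y_tr),
                 'test_acc': model.score(X_te, y_te)})

# One row per model, side by side for comparison
summary = pd.DataFrame(rows).set_index('model')
print(summary)
```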



# 2. Airbnb <a id = "2"></a>

[Back to top](#top)

## 2.1 Load and Explore the data <a id = "2.1"></a>

[Back to top](#top)

In [48]:
# target column = price

airbnb_xtrain = pd.read_csv('./data/airbnb_final_Xtrain.csv')
airbnb_xtest = pd.read_csv('./data/airbnb_final_Xtest.csv')

airbnb_ytrain = pd.read_csv('./data/airbnb_final_ytrain.csv')
airbnb_ytest = pd.read_csv('./data/airbnb_final_ytest.csv')

## 2.2 Build the Model(s) <a id = "2.2"></a>

[Back to top](#top)

### 2.2.1 Linear Regressor <a id = "2.2.1"></a>

[Back to top](#top)

In [49]:
# Create the model
lm = LinearRegression()

# Fit the model
lm.fit(airbnb_xtrain, airbnb_ytrain)

LinearRegression()

### 2.2.2 DecisionTreeRegressor <a id = "2.2.2"></a>

[Back to top](#top)

In [50]:
# Create the model
dtr = DecisionTreeRegressor(max_depth = 2, 
                            random_state = 0)

# Fit the model
dtr.fit(airbnb_xtrain, airbnb_ytrain)

DecisionTreeRegressor(max_depth=2, random_state=0)

### 2.2.3 MLPRegressor <a id = "2.2.3"></a>

[Back to top](#top)

In [51]:
# Create the model
mlpr = MLPRegressor(hidden_layer_sizes = (10,), 
                    max_iter = 2000, 
                    random_state = 0)

# Fit the model
mlpr.fit(airbnb_xtrain, airbnb_ytrain)

MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)

### 2.2.4 RandomForestRegressor <a id = "2.2.4"></a>

[Back to top](#top)

In [52]:
# Create the model
rfr = RandomForestRegressor(n_estimators = 10, 
                            max_depth = 4, 
                            random_state = 0)

# Fit the model
rfr.fit(airbnb_xtrain, airbnb_ytrain)

RandomForestRegressor(max_depth=4, n_estimators=10, random_state=0)

### 2.2.5 BaggingRegressor <a id = "2.2.5"></a>

[Back to top](#top)

In [53]:
# Create the model
bgr = BaggingRegressor(n_estimators = 10, 
                       random_state = 0)

# Fit the model
bgr.fit(airbnb_xtrain, airbnb_ytrain)

BaggingRegressor(random_state=0)

### 2.2.6 AdaBoostRegressor <a id = "2.2.6"></a>

[Back to top](#top)

In [54]:
# Create the model
adbr = AdaBoostRegressor(dtr, 
                         n_estimators = 10, 
                         learning_rate = 0.1, 
                         random_state = 0)

# Fit the model
adbr.fit(airbnb_xtrain, airbnb_ytrain)

AdaBoostRegressor(base_estimator=DecisionTreeRegressor(max_depth=2,
                                                       random_state=0),
                  learning_rate=0.1, n_estimators=10, random_state=0)

### 2.2.7 XGBRegressor <a id = "2.2.7"></a>

[Back to top](#top)

In [55]:
# Create the model
xgbr = XGBRegressor(n_estimators = 20, 
                    learning_rate = 0.1, 
                    eval_metric = 'logloss',  # classification metric; 'rmse' is the usual choice for regression
                    random_state = 0)

# Fit the model
xgbr.fit(airbnb_xtrain, airbnb_ytrain)

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
             eval_metric='logloss', gamma=0, gpu_id=-1, importance_type=None,
             interaction_constraints='', learning_rate=0.1, max_delta_step=0,
             max_depth=6, min_child_weight=1, missing=nan,
             monotone_constraints='()', n_estimators=20, n_jobs=12,
             num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,
             reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',
             validate_parameters=1, verbosity=None)

### 2.2.8 SVR <a id = "2.2.8"></a>

[Back to top](#top)

In [56]:
# Create the model
svr = SVR(C = 0.8, kernel = 'rbf')

# Fit the model
svr.fit(airbnb_xtrain, airbnb_ytrain)

SVR(C=0.8)

### 2.2.9 VotingRegressor <a id = "2.2.9"></a>

[Back to top](#top)

In [57]:
# Create the model
vr = VotingRegressor(estimators = [('dtr', dtr), ('mlpr', mlpr), ('rfr', rfr), 
                                    ('bgr', bgr), ('adbr', adbr), ('xgbr', xgbr), ('svr', svr)])

# Fit the model
vr.fit(airbnb_xtrain, airbnb_ytrain)

VotingRegressor(estimators=[('dtr',
                             DecisionTreeRegressor(max_depth=2,
                                                   random_state=0)),
                            ('mlpr',
                             MLPRegressor(hidden_layer_sizes=(10,),
                                          max_iter=2000, random_state=0)),
                            ('rfr',
                             RandomForestRegressor(max_depth=4, n_estimators=10,
                                                   random_state=0)),
                            ('bgr', BaggingRegressor(random_state=0)),
                            ('adbr',
                             AdaBoostRegressor(base_estimator=DecisionTreeRegressor(max_depth...
                                          gpu_id=-1, importance_type=None,
                                          interaction_constraints='',
                                          learning_rate=0.1, max_delta_step=0,
                                    

## 2.3 Evaluate and Improve the Model(s) <a id = "2.3"></a>

[Back to top](#top)

### 2.3.1 Linear Regressor <a id = "2.3.1"></a>

[Back to top](#top)

In [58]:
# R^2 (coefficient of determination); .score() on a regressor returns R^2, not accuracy
train_r2 = lm.score(airbnb_xtrain, airbnb_ytrain)
test_r2 = lm.score(airbnb_xtest, airbnb_ytest)

print(f'Training R^2: {train_r2}')
print(f'Testing R^2: {test_r2}')
print()

# MSE (scikit-learn's argument order is y_true, y_pred)
train_mse = mean_squared_error(airbnb_ytrain, lm.predict(airbnb_xtrain))
test_mse = mean_squared_error(airbnb_ytest, lm.predict(airbnb_xtest))

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training R^2: 0.057399293521222305
Testing R^2: -1145.7546145421225

the training mean squared error is: 104565.41385857265
the testing mean squared error is: 145477666.13345268
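For a regressor, `.score()` returns the coefficient of determination, R^2 = 1 - SS_res / SS_tot, so a large negative test score means the model does far worse on the test set than always predicting the mean would. A minimal sketch of a reusable report that keeps the two metrics together (the helper `regression_report` and the synthetic data are hypothetical, not part of the original notebook):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score


def regression_report(model, X, y):
    """Return R^2 and MSE for a fitted regressor on (X, y)."""
    pred = model.predict(X)
    # Both metrics take arguments in the order (y_true, y_pred)
    return {'r2': r2_score(y, pred), 'mse': mean_squared_error(y, pred)}


# Noise-free synthetic data, so a linear model fits it almost exactly
X_toy, y_toy = make_regression(n_samples=100, n_features=3, noise=0.0, random_state=0)
toy_lm = LinearRegression().fit(X_toy, y_toy)

toy_report = regression_report(toy_lm, X_toy, y_toy)
print(toy_report)
```

In the notebook the helper could be called once per split, for example `regression_report(lm, airbnb_xtest, airbnb_ytest)`.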


### 2.3.2 DecisionTreeRegressor <a id = "2.3.2"></a>

[Back to top](#top)

In [59]:
# R^2 (coefficient of determination); .score() on a regressor returns R^2, not accuracy
train_r2 = dtr.score(airbnb_xtrain, airbnb_ytrain)
test_r2 = dtr.score(airbnb_xtest, airbnb_ytest)

print(f'Training R^2: {train_r2}')
print(f'Testing R^2: {test_r2}')
print()

# MSE (scikit-learn's argument order is y_true, y_pred)
train_mse = mean_squared_error(airbnb_ytrain, dtr.predict(airbnb_xtrain))
test_mse = mean_squared_error(airbnb_ytest, dtr.predict(airbnb_xtest))

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training R^2: 0.16363243485508594
Testing R^2: -0.021580178126967775

the training mean squared error is: 92780.66522352397
the testing mean squared error is: 129597.99611658695


In [60]:
# Create a GridSearch model to find the best parameters

param_grid = {'criterion': ['friedman_mse', 'poisson'],
              "splitter": ['best', 'random'],
              'max_depth': [2, 3, 4, 5],
              'min_samples_split': [2,3,4],
              'min_samples_leaf': [1,2,3]
             }

gs = GridSearchCV(estimator = dtr, param_grid = param_grid, scoring = 'r2', cv = 5, n_jobs = -1, error_score = 'raise')
gs = gs.fit(airbnb_xtrain, airbnb_ytrain)
print(gs.best_score_)
print(gs.best_params_)

0.06047986105585419
{'criterion': 'poisson', 'max_depth': 5, 'min_samples_leaf': 1, 'min_samples_split': 3, 'splitter': 'random'}


In [61]:
# Create the model
dtr = DecisionTreeRegressor(criterion = 'poisson', 
                            splitter = 'random', 
                            max_depth = 5, 
                            min_samples_leaf = 1,
                            min_samples_split = 3,
                            random_state = 0)

# Fit the model
dtr.fit(airbnb_xtrain, airbnb_ytrain)

DecisionTreeRegressor(criterion='poisson', max_depth=5, min_samples_split=3,
                      random_state=0, splitter='random')

In [62]:
# R^2 (coefficient of determination); .score() on a regressor returns R^2, not accuracy
train_r2 = dtr.score(airbnb_xtrain, airbnb_ytrain)
test_r2 = dtr.score(airbnb_xtest, airbnb_ytest)

print(f'Training R^2: {train_r2}')
print(f'Testing R^2: {test_r2}')
print()

# MSE (scikit-learn's argument order is y_true, y_pred)
train_mse = mean_squared_error(airbnb_ytrain, dtr.predict(airbnb_xtrain))
test_mse = mean_squared_error(airbnb_ytest, dtr.predict(airbnb_xtest))

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training R^2: 0.17071716165148576
Testing R^2: 0.0175466341511199

the training mean squared error is: 91994.73605494962
the testing mean squared error is: 124634.35589114024


### 2.3.3 MLPRegressor <a id = "2.3.3"></a>

[Back to top](#top)

In [63]:
# R^2 (coefficient of determination); .score() on a regressor returns R^2, not accuracy
train_r2 = mlpr.score(airbnb_xtrain, airbnb_ytrain)
test_r2 = mlpr.score(airbnb_xtest, airbnb_ytest)

print(f'Training R^2: {train_r2}')
print(f'Testing R^2: {test_r2}')
print()

# MSE (scikit-learn's argument order is y_true, y_pred)
train_mse = mean_squared_error(airbnb_ytrain, mlpr.predict(airbnb_xtrain))
test_mse = mean_squared_error(airbnb_ytest, mlpr.predict(airbnb_xtest))

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training R^2: 0.06806817809812504
Testing R^2: -2372.5903084999773

the training mean squared error is: 103381.88373439033
the testing mean squared error is: 301114444.24004537


In [64]:
# Create a GridSearch model to find the best parameters

param_grid = {"activation": ['logistic', 'relu', 'identity','tanh'],
              'hidden_layer_sizes': [(90,), (100,), (110,)],
              'max_iter': [100, 200, 300],
              'solver': ['sgd', 'adam'],
              'learning_rate':['constant', 'invscaling', 'adaptive']
             }

gs = GridSearchCV(mlpr, param_grid = param_grid, scoring = 'r2', cv = 5, n_jobs = -1)
gs = gs.fit(airbnb_xtrain, airbnb_ytrain)
print(gs.best_score_)
print(gs.best_params_)

0.08297406351508259
{'activation': 'relu', 'hidden_layer_sizes': (90,), 'learning_rate': 'adaptive', 'max_iter': 100, 'solver': 'sgd'}
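
`GridSearchCV` refits the best parameter combination on the full training data by default (`refit=True`) and exposes the result as `best_estimator_`, so the winning parameters do not have to be retyped by hand. A minimal sketch on synthetic data follows; the small tree grid is an assumption chosen only to keep the example fast.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical, not the Airbnb set)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

gs = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"max_depth": [2, 4, 6]},
    scoring="r2",
    cv=5,
)
gs.fit(X, y)

# Already fitted on all of X, y with the best parameters found
best_model = gs.best_estimator_
print(gs.best_params_)
print(best_model.score(X, y))
```

The refitted estimator inherits every other setting (such as `random_state`) from the estimator passed to `GridSearchCV`.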


In [65]:
# Create the model
mlpr = MLPRegressor(activation = 'relu', 
                    hidden_layer_sizes = (90,), 
                    learning_rate = 'adaptive',
                    max_iter = 100, 
                    solver = 'sgd', 
                    random_state = 0)

# Fit the model
mlpr.fit(airbnb_xtrain, airbnb_ytrain)

MLPRegressor(hidden_layer_sizes=(90,), learning_rate='adaptive', max_iter=100,
             random_state=0, solver='sgd')

In [66]:
# R^2 score (a regressor's .score() returns R^2, not classification accuracy)
train_acc = mlpr.score(airbnb_xtrain, airbnb_ytrain)
test_acc = mlpr.score(airbnb_xtest, airbnb_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')
print()

# MSE
train_mse = mean_squared_error(mlpr.predict(airbnb_xtrain), airbnb_ytrain)
test_mse = mean_squared_error(mlpr.predict(airbnb_xtest), airbnb_ytest)

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training accuracy: 0.17077675575449436
Testing accuracy: -15877063.637383914

the training mean squared error is: 91988.12510929491
the testing mean squared error is: 2014169621997.8398
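
Multi-layer perceptrons are sensitive to feature scale, and a test error this large often points to a few unscaled or extreme inputs. Whether the Airbnb features were standardized is not shown in this section, so the following is only a sketch under that assumption: it wraps the regressor in a pipeline with `StandardScaler`, using synthetic data with deliberately mismatched column scales.

```python
import math

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical) with columns on very different scales
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X = X * np.array([1.0, 10.0, 100.0, 1000.0, 1.0])
x_tr, x_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# The scaler is fitted on the training split only, then applied to the test split
pipe = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(90,), max_iter=300, random_state=0),
)
pipe.fit(x_tr, y_tr)

test_r2 = pipe.score(x_te, y_te)
print(f"Test R^2 with scaling: {test_r2}")
```

Because the scaler lives inside the pipeline, the same object can be passed to `GridSearchCV` without leaking test-fold statistics into training.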


### 2.3.4 RandomForestRegressor <a id = "2.3.4"></a>

[Back to top](#top)

In [67]:
# R^2 score (a regressor's .score() returns R^2, not classification accuracy)
train_acc = rfr.score(airbnb_xtrain, airbnb_ytrain)
test_acc = rfr.score(airbnb_xtest, airbnb_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')
print()

# MSE
train_mse = mean_squared_error(rfr.predict(airbnb_xtrain), airbnb_ytrain)
test_mse = mean_squared_error(rfr.predict(airbnb_xtest), airbnb_ytest)

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training accuracy: 0.2880554769480188
Testing accuracy: -1.8895565241443633

the training mean squared error is: 78978.05845634663
the testing mean squared error is: 366570.0874123432


In [68]:
# Create a GridSearch model to find the best parameters

param_grid = {'criterion': ['friedman_mse', 'absolute_error', 'poisson'],
              'n_estimators': [90, 100, 200, 300],
              "max_depth": [1, 2, 3, 4, 5], 
              "min_samples_leaf" : [1, 2, 3, 4, 5]
             }

gs = GridSearchCV(rfr, param_grid = param_grid, scoring = 'r2', cv = 5, n_jobs = -1)
gs = gs.fit(airbnb_xtrain, airbnb_ytrain)
print(gs.best_score_)
print(gs.best_params_)

0.09011808002126782
{'criterion': 'friedman_mse', 'max_depth': 5, 'min_samples_leaf': 4, 'n_estimators': 90}


In [69]:
# Create the model
rfr = RandomForestRegressor(criterion = 'friedman_mse', 
                            max_depth = 5,
                            min_samples_leaf = 4, 
                            n_estimators = 90, 
                            random_state = 0)

# Fit the model
rfr.fit(airbnb_xtrain, airbnb_ytrain)

RandomForestRegressor(criterion='friedman_mse', max_depth=5, min_samples_leaf=4,
                      n_estimators=90, random_state=0)

In [70]:
# R^2 score (a regressor's .score() returns R^2, not classification accuracy)
train_acc = rfr.score(airbnb_xtrain, airbnb_ytrain)
test_acc = rfr.score(airbnb_xtest, airbnb_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')
print()

# MSE
train_mse = mean_squared_error(rfr.predict(airbnb_xtrain), airbnb_ytrain)
test_mse = mean_squared_error(rfr.predict(airbnb_xtest), airbnb_ytest)

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training accuracy: 0.2239200577669188
Testing accuracy: -0.30442129085139213

the training mean squared error is: 86092.78540654943
the testing mean squared error is: 165479.31234932548


### 2.3.5 BaggingRegressor <a id = "2.3.5"></a>

[Back to top](#top)

In [71]:
# R^2 score (a regressor's .score() returns R^2, not classification accuracy)
train_acc = bgr.score(airbnb_xtrain, airbnb_ytrain)
test_acc = bgr.score(airbnb_xtest, airbnb_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')
print()

# MSE
train_mse = mean_squared_error(bgr.predict(airbnb_xtrain), airbnb_ytrain)
test_mse = mean_squared_error(bgr.predict(airbnb_xtest), airbnb_ytest)

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training accuracy: 0.769126940854773
Testing accuracy: -1.9107598079363632

the training mean squared error is: 25611.41405091756
the testing mean squared error is: 369259.9429414243


In [72]:
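# Create a GridSearch model to find the best parameters
# Note: integer max_samples / max_features are absolute counts (rows / columns per estimator);
# floats in (0, 1] would instead be interpreted as fractions of the training set.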

param_grid = {'max_samples': [1, 2, 3, 4, 5],
              'max_features': [1, 2, 3, 4, 5],
              "n_estimators": [5, 10, 15, 20, 25]
             }

gs = GridSearchCV(estimator = bgr, param_grid = param_grid, scoring = 'r2', cv = 5, n_jobs = -1)
gs = gs.fit(airbnb_xtrain, airbnb_ytrain)
print(gs.best_score_)
print(gs.best_params_)

0.030682993925259062
{'max_features': 4, 'max_samples': 4, 'n_estimators': 5}
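
In `BaggingRegressor`, an integer `max_samples` is an absolute number of rows, while a float is a fraction of the training set. With `max_samples = 4`, each base estimator therefore sees only four rows. The sketch below contrasts the two interpretations on synthetic data (the dataset and the value `0.5` are assumptions for illustration).

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Synthetic stand-in data (hypothetical, not the Airbnb set)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Integer: each base estimator is trained on 4 drawn rows
bag_int = BaggingRegressor(max_samples=4, n_estimators=5, random_state=0).fit(X, y)

# Float: each base estimator is trained on 50% of the rows
bag_frac = BaggingRegressor(max_samples=0.5, n_estimators=5, random_state=0).fit(X, y)

# estimators_samples_ holds the row indices drawn for each base estimator
n_int = len(bag_int.estimators_samples_[0])
n_frac = len(bag_frac.estimators_samples_[0])
print(n_int, n_frac)
```

A grid such as `[0.25, 0.5, 0.75, 1.0]` would search over fractions rather than tiny absolute counts.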


In [73]:
# Create the model
bgr = BaggingRegressor(max_features = 4, 
                       max_samples = 4, 
                       n_estimators = 5, 
                       random_state = 0)

# Fit the model
bgr.fit(airbnb_xtrain, airbnb_ytrain)

BaggingRegressor(max_features=4, max_samples=4, n_estimators=5, random_state=0)

In [74]:
# R^2 score (a regressor's .score() returns R^2, not classification accuracy)
train_acc = bgr.score(airbnb_xtrain, airbnb_ytrain)
test_acc = bgr.score(airbnb_xtest, airbnb_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')
print()

# MSE
train_mse = mean_squared_error(bgr.predict(airbnb_xtrain), airbnb_ytrain)
test_mse = mean_squared_error(bgr.predict(airbnb_xtest), airbnb_ytest)

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training accuracy: 0.022273973107930334
Testing accuracy: -0.0001434429748456978

the training mean squared error is: 108461.9669172389
the testing mean squared error is: 126878.5248714707


### 2.3.6 AdaBoostRegressor <a id = "2.3.6"></a>

[Back to top](#top)

In [75]:
# R^2 score (a regressor's .score() returns R^2, not classification accuracy)
train_acc = adbr.score(airbnb_xtrain, airbnb_ytrain)
test_acc = adbr.score(airbnb_xtest, airbnb_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')
print()

# MSE
train_mse = mean_squared_error(adbr.predict(airbnb_xtrain), airbnb_ytrain)
test_mse = mean_squared_error(adbr.predict(airbnb_xtest), airbnb_ytest)

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training accuracy: 0.07912478090447972
Testing accuracy: -1.1596883787934331e-05

the training mean squared error is: 102155.34290923514
the testing mean squared error is: 126861.79883315992


In [76]:
# Create a GridSearch model to find the best parameters

param_grid = {'learning_rate': [0.1, 0.5, 1, 1.5, 2],
              "n_estimators": [25, 50, 75, 100, 125],
              'loss': ["linear", "square", 'exponential']
             }

gs = GridSearchCV(estimator = adbr, param_grid = param_grid, scoring = 'r2', cv = 5, n_jobs = -1)
gs = gs.fit(airbnb_xtrain, airbnb_ytrain)
print(gs.best_score_)
print(gs.best_params_)

-0.21006618021213946
{'learning_rate': 0.1, 'loss': 'exponential', 'n_estimators': 25}


In [77]:
# Create the model
adbr = AdaBoostRegressor(dtr, learning_rate = 0.1, 
                         loss = 'exponential', 
                         n_estimators = 25, 
                         random_state = 0)

# Fit the model
adbr.fit(airbnb_xtrain, airbnb_ytrain)

AdaBoostRegressor(base_estimator=DecisionTreeRegressor(criterion='poisson',
                                                       max_depth=5,
                                                       min_samples_split=3,
                                                       random_state=0,
                                                       splitter='random'),
                  learning_rate=0.1, loss='exponential', n_estimators=25,
                  random_state=0)

In [78]:
# R^2 score (a regressor's .score() returns R^2, not classification accuracy)
train_acc = adbr.score(airbnb_xtrain, airbnb_ytrain)
test_acc = adbr.score(airbnb_xtest, airbnb_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')
print()

# MSE
train_mse = mean_squared_error(adbr.predict(airbnb_xtrain), airbnb_ytrain)
test_mse = mean_squared_error(adbr.predict(airbnb_xtest), airbnb_ytest)

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training accuracy: 0.28552646344773724
Testing accuracy: 0.003399460909406238

the training mean squared error is: 79258.6092150011
the testing mean squared error is: 126429.07092388671


### 2.3.7 XGBRegressor <a id = "2.3.7"></a>

[Back to top](#top)

In [79]:
# R^2 score (a regressor's .score() returns R^2, not classification accuracy)
train_acc = xgbr.score(airbnb_xtrain, airbnb_ytrain)
test_acc = xgbr.score(airbnb_xtest, airbnb_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')
print()

# MSE
train_mse = mean_squared_error(xgbr.predict(airbnb_xtrain), airbnb_ytrain)
test_mse = mean_squared_error(xgbr.predict(airbnb_xtest), airbnb_ytest)

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training accuracy: 0.5375175453802377
Testing accuracy: -0.051553837574127126

the training mean squared error is: 51304.51201368031
the testing mean squared error is: 133400.46437488362


In [80]:
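# Create a GridSearch model to find the best parameters
# Note: eval_metric only affects evaluation-set monitoring, so with scoring = 'r2' it does not change which model is selected.
# Note: the XGBoost docs list 'uniform' and 'gradient_based' as sampling_method values; 'subsample' is not among them.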

param_grid = {'learning_rate': [0.1, 0.3, 0.5, 1, 1.5],
              'max_depth':[1,4,6,8,10],
              'sampling_method':['uniform','subsample','gradient_based'],
              "n_estimators": [50, 100, 200, 300, 400],
              'eval_metric': ["logloss", "mae"]
             }

gs = GridSearchCV(estimator = xgbr, param_grid = param_grid, scoring = 'r2', cv = 5, n_jobs = -1)
gs = gs.fit(airbnb_xtrain, airbnb_ytrain)
print(gs.best_score_)
print(gs.best_params_)

0.042783151473505976
{'eval_metric': 'logloss', 'learning_rate': 0.1, 'max_depth': 1, 'n_estimators': 50, 'sampling_method': 'uniform'}


In [81]:
# Create the model
xgbr = XGBRegressor(n_estimators = 50, 
                    learning_rate = 0.1, 
                    eval_metric = 'logloss', 
                    max_depth = 1,
                    sampling_method = 'uniform',
                    random_state = 0)

# Fit the model
xgbr.fit(airbnb_xtrain, airbnb_ytrain)

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
             eval_metric='logloss', gamma=0, gpu_id=-1, importance_type=None,
             interaction_constraints='', learning_rate=0.1, max_delta_step=0,
             max_depth=1, min_child_weight=1, missing=nan,
             monotone_constraints='()', n_estimators=50, n_jobs=12,
             num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,
             reg_lambda=1, sampling_method='uniform', scale_pos_weight=1,
             subsample=1, tree_method='exact', validate_parameters=1,
             verbosity=None)

In [82]:
# R^2 score (a regressor's .score() returns R^2, not classification accuracy)
train_acc = xgbr.score(airbnb_xtrain, airbnb_ytrain)
test_acc = xgbr.score(airbnb_xtest, airbnb_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')
print()

# MSE
train_mse = mean_squared_error(xgbr.predict(airbnb_xtrain), airbnb_ytrain)
test_mse = mean_squared_error(xgbr.predict(airbnb_xtest), airbnb_ytest)

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training accuracy: 0.09262478979739786
Testing accuracy: -0.06458255917599587

the training mean squared error is: 100657.7480026328
the testing mean squared error is: 135053.29226614017


### 2.3.8 SVR <a id = "2.3.8"></a>

[Back to top](#top)

In [83]:
# R^2 score (a regressor's .score() returns R^2, not classification accuracy)
train_acc = svr.score(airbnb_xtrain, airbnb_ytrain)
test_acc = svr.score(airbnb_xtest, airbnb_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')
print()

# MSE
train_mse = mean_squared_error(svr.predict(airbnb_xtrain), airbnb_ytrain)
test_mse = mean_squared_error(svr.predict(airbnb_xtest), airbnb_ytest)

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training accuracy: 0.019479039485357785
Testing accuracy: -0.045289786278011945

the training mean squared error is: 108772.01696169865
the testing mean squared error is: 132605.80477505032


In [84]:
# Create a GridSearch model to find the best parameters

param_grid = {'C': [0.5, 0.75, 1, 1.25, 1.5],
              "kernel": ['linear', 'poly', 'rbf', 'sigmoid'],
              'gamma': ['scale', 'auto'],
              'degree': [1,2,3,4,5]
             }

gs = GridSearchCV(estimator = svr, param_grid = param_grid, scoring = 'r2', cv = 5, n_jobs = -1)
gs = gs.fit(airbnb_xtrain, airbnb_ytrain)
print(gs.best_score_)
print(gs.best_params_)

0.0326425334921711
{'C': 1.5, 'degree': 1, 'gamma': 'scale', 'kernel': 'linear'}


In [86]:
# Create the model
svr = SVR(C = 1.5, 
          degree = 1,
          gamma = 'scale', 
          kernel = 'linear')

# Fit the model
svr.fit(airbnb_xtrain, airbnb_ytrain)

SVR(C=1.5, degree=1, kernel='linear')

In [87]:
# R^2 score (a regressor's .score() returns R^2, not classification accuracy)
train_acc = svr.score(airbnb_xtrain, airbnb_ytrain)
test_acc = svr.score(airbnb_xtest, airbnb_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')
print()

# MSE
train_mse = mean_squared_error(svr.predict(airbnb_xtrain), airbnb_ytrain)
test_mse = mean_squared_error(svr.predict(airbnb_xtest), airbnb_ytest)

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training accuracy: 0.024363139335722317
Testing accuracy: -264.08256148502517

the training mean squared error is: 108230.20968458788
the testing mean squared error is: 33628460.603942424


### 2.3.9 VotingRegressor <a id = "2.3.9"></a>

[Back to top](#top)

In [88]:
# R^2 score (a regressor's .score() returns R^2, not classification accuracy)
train_acc = vr.score(airbnb_xtrain, airbnb_ytrain)
test_acc = vr.score(airbnb_xtest, airbnb_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')
print()

# MSE
train_mse = mean_squared_error(vr.predict(airbnb_xtrain), airbnb_ytrain)
test_mse = mean_squared_error(vr.predict(airbnb_xtest), airbnb_ytest)

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training accuracy: 0.3713045142942796
Testing accuracy: -52.75304308905891

the training mean squared error is: 69743.0027391085
the testing mean squared error is: 6819128.658391783


In [89]:
# Create the model
vr = VotingRegressor(estimators = [('dtr', dtr), ('mlpr', mlpr), ('rfr', rfr), 
                                    ('bgr', bgr), ('adbr', adbr), ('xgbr', xgbr), ('svr', svr)])

# Fit the model
vr.fit(airbnb_xtrain, airbnb_ytrain)

VotingRegressor(estimators=[('dtr',
                             DecisionTreeRegressor(criterion='poisson',
                                                   max_depth=5,
                                                   min_samples_split=3,
                                                   random_state=0,
                                                   splitter='random')),
                            ('mlpr',
                             MLPRegressor(hidden_layer_sizes=(90,),
                                          learning_rate='adaptive',
                                          max_iter=100, random_state=0,
                                          solver='sgd')),
                            ('rfr',
                             RandomForestRegressor(criterion='friedman_mse',
                                                   max_depth=5,
                                                   min_samples_leaf=4,
                                                   n_estim...
    

In [90]:
# R^2 score (a regressor's .score() returns R^2, not classification accuracy)
train_acc = vr.score(airbnb_xtrain, airbnb_ytrain)
test_acc = vr.score(airbnb_xtest, airbnb_ytest)

print(f'Training accuracy: {train_acc}')
print(f'Testing accuracy: {test_acc}')
print()

# MSE
train_mse = mean_squared_error(vr.predict(airbnb_xtrain), airbnb_ytrain)
test_mse = mean_squared_error(vr.predict(airbnb_xtest), airbnb_ytest)

print(f'the training mean squared error is: {train_mse}')
print(f'the testing mean squared error is: {test_mse}')

Training accuracy: 0.18007681875870396
Testing accuracy: -326669.5552903266

the training mean squared error is: 90956.44230844197
the testing mean squared error is: 41441533677.30801
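
To compare the tuned regressors side by side, the final test MSE values printed in sections 2.3.2 to 2.3.9 can be collected and ranked. The numbers below are copied from those outputs (rounded to two decimals); the dictionary itself is a convenience sketch and not part of the original notebook.

```python
# Test MSE after tuning, copied from the cell outputs above
test_mse_by_model = {
    "DecisionTreeRegressor": 124634.36,
    "MLPRegressor": 2014169621997.84,
    "RandomForestRegressor": 165479.31,
    "BaggingRegressor": 126878.52,
    "AdaBoostRegressor": 126429.07,
    "XGBRegressor": 135053.29,
    "SVR": 33628460.60,
    "VotingRegressor": 41441533677.31,
}

# Lower MSE is better, so sort ascending
ranking = sorted(test_mse_by_model.items(), key=lambda item: item[1])

for rank, (name, mse) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {mse:,.2f}")
```

On these figures the single decision tree has the lowest test MSE, and the voting ensemble inherits the MLP's and SVR's extreme test errors because it averages their predictions with the rest.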
