<a href="https://colab.research.google.com/github/AhmedMinerva/Practical_DataScience/blob/master/Bayesian_Optimization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Activity 2: Bayesian Optimization

## Part 1: Bayesian Optimization on King's County Houses

You will find, in the following cells, that the code to (a) create an XGBoost model to predict the prices of houses of the dataset you used for pre-class work and (b) an implementation of the BayesianOptimization algorithm to optimize hyperparameters.

### Run the cells below, and check to see if you understand each step!


In [1]:
!pip install bayesian-optimization #we have to instal the bayesian-optimization package

Collecting bayesian-optimization
  Downloading https://files.pythonhosted.org/packages/bb/7a/fd8059a3881d3ab37ac8f72f56b73937a14e8bb14a9733e68cc8b17dbe3c/bayesian-optimization-1.2.0.tar.gz
Building wheels for collected packages: bayesian-optimization
  Building wheel for bayesian-optimization (setup.py) ... [?25l[?25hdone
  Created wheel for bayesian-optimization: filename=bayesian_optimization-1.2.0-cp36-none-any.whl size=11685 sha256=1b1c431a38888a956c40e84ad67e8836d9208abc29cd3b8553888d3cc51b865a
  Stored in directory: /root/.cache/pip/wheels/5a/56/ae/e0e3c1fc1954dc3ec712e2df547235ed072b448094d8f94aec
Successfully built bayesian-optimization
Installing collected packages: bayesian-optimization
Successfully installed bayesian-optimization-1.2.0


In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from bayes_opt import BayesianOptimization

data = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vRaQcPHF6GaPB5bHKF1Q6ndb4l2Gv4CIXmFqSTeZi1c7OqKuYM9HHHoBIotsxQiM7Yjr9K0Qb6lhnDI/pub?output=csv") #Importing the data
data = data.drop(labels=["id", "date"], axis=1) #Dropping these columns
X_train, X_test, y_train, y_test = train_test_split(data.loc[:, data.columns != 'price'], data['price'], test_size=0.25, random_state=42) #Splitting test and train!


In [3]:
import xgboost as xgb
from sklearn.metrics import mean_squared_error

data_dmatrix = xgb.DMatrix(data=X_train, label=y_train) #converting our test and train to a data matrix, do you know why??
params = {"objective":'reg:squarederror', "colsample_bytree":0.2, "learning_rate":0.08, "max_depth":4, "alpha":16} #defining our parameters through a dictionary
xg_m = xgb.train(params=params, dtrain=data_dmatrix, num_boost_round=70) #training the model!

data_dmatrix_test = xgb.DMatrix(data=X_test)#preparing the test data for prediction
preds = xg_m.predict(data_dmatrix_test) #predicting the test data

np.sqrt(mean_squared_error(preds, y_test)) #How well did we do?

168732.47527618211

In [4]:
#for the cross validation process, I wrote a function that takes in certian parameters
#and outputs the minimum rmse after cross validation!
def fcv(max_depth, gamma, min_child_weight, subsample, colsample_bytree, learning_rate, num_boost_round):
  params = {"objective":'reg:squarederror', "max_depth":int(max_depth), 'gamma':gamma, 'min_child_weight':min_child_weight, 'subsample':subsample, "colsample_bytree":colsample_bytree, "learning_rate":learning_rate}
  cv_results=xgb.cv(dtrain=data_dmatrix, params=params, nfold=10, num_boost_round=int(num_boost_round), early_stopping_rounds=10, metrics='rmse', as_pandas=True)
  return -cv_results['test-rmse-mean'].min() #any idea why I used a negative sign? Hint: it matters for the Bayesian Optimization function

fcv(4, 3, 0.5, 0.2, 0.5, 0.5, 70) #random settings to see if it works

-156913.3515625

In [5]:
dict_cv = {
          'max_depth': (2, 12),
          'gamma': (0.001, 10.0),
          'min_child_weight': (0, 20),
          'subsample': (0.4, 1.0),
          'colsample_bytree': (0.4, 1.0),
          'learning_rate': (0.1, 1.0),
          'num_boost_round' :(30, 100)
          }
#Creating a dictionary with the ranges for each parameter in a tuple! Note that the 
#Dictionary's keys HAVE to match the keys for the cross validation (fcv) function


XGB_BO = BayesianOptimization(fcv, dict_cv) #Creating the optimizer
XGB_BO.maximize(init_points=10, n_iter=30, acq='ei', xi=0.0) #Running optimization!

|   iter    |  target   | colsam... |   gamma   | learni... | max_depth | min_ch... | num_bo... | subsample |
-------------------------------------------------------------------------------------------------------------
| [0m 1       [0m | [0m-1.526e+0[0m | [0m 0.7778  [0m | [0m 8.058   [0m | [0m 0.8737  [0m | [0m 6.738   [0m | [0m 10.9    [0m | [0m 63.36   [0m | [0m 0.8741  [0m |
| [95m 2       [0m | [95m-1.465e+0[0m | [95m 0.809   [0m | [95m 3.321   [0m | [95m 0.8349  [0m | [95m 3.41    [0m | [95m 12.53   [0m | [95m 65.71   [0m | [95m 0.7268  [0m |
| [95m 3       [0m | [95m-1.327e+0[0m | [95m 0.5519  [0m | [95m 7.0     [0m | [95m 0.4686  [0m | [95m 8.362   [0m | [95m 9.9     [0m | [95m 44.76   [0m | [95m 0.9664  [0m |
| [0m 4       [0m | [0m-1.471e+0[0m | [0m 0.9208  [0m | [0m 5.058   [0m | [0m 0.2331  [0m | [0m 2.538   [0m | [0m 16.03   [0m | [0m 57.08   [0m | [0m 0.6146  [0m |
| [95m 5       [0m | [95m-1.19e

## Part 2: Building Bayesian Optimization Yourself!

Now, it is time for you to try your hand at Bayesian Optimization! To do this, we will work with a new dataset, but that has the exact same premise: we have around 84 variables that predict the sale price of a house in Ames, Iowa.

Your task is, again, to use XGBoost to predict sale prices. You will notice that the data processing and model training steps are already done for you below: what you have to do next is to implement the cross validation and bayesian optimization steps of the code above!

In [6]:
data2 = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vTp6iMy3iSfMS-3BzqX5wtu4AFSlZZVn8QFNeScSrJmsGLC29tIqarJ3I5ODb-SusrCNZ0hoNnHTqp-/pub?output=csv")
data2.head()

#delete columns with many missing data
data2.drop(['PoolQC','MiscFeature','Alley','Fence','FireplaceQu','LotFrontage'], axis = 1,inplace=True)

#Drop rows with missing data 
data2.dropna(inplace=True)
data2.shape

data2 = pd.get_dummies(data2)
X_train, X_test, y_train, y_test = train_test_split(data2.loc[:, data2.columns != 'SalePrice'], data2['SalePrice'], test_size=0.25, random_state=42)

In [7]:
import xgboost as xgb
from sklearn.metrics import mean_squared_error

data_dmatrix2 = xgb.DMatrix(data=X_train, label=y_train) #converting our test and train to a data matrix, do you know why??
params = {"objective":'reg:squarederror', "colsample_bytree":0.2, "learning_rate":0.08, "max_depth":4, "alpha":16} #defining our parameters through a dictionary
xg_m2 = xgb.train(params=params, dtrain=data_dmatrix2, num_boost_round=70) #training the model!

data_dmatrix_test2 = xgb.DMatrix(data=X_test, label=y_test)#preparing the test data for prediction
preds = xg_m2.predict(data_dmatrix_test2) #predicting the test data

print(np.sqrt(mean_squared_error(preds, y_test)))

32389.117783905178


In [10]:
#Define here your cross validation function!!
def fcv(max_depth, gamma, min_child_weight, subsample, colsample_bytree, learning_rate, num_boost_round):
  params = {"objective":'reg:squarederror', "max_depth":int(max_depth), 'gamma':gamma, 'min_child_weight':min_child_weight, 'subsample':subsample, "colsample_bytree":colsample_bytree, "learning_rate":learning_rate}
  cv_results=xgb.cv(dtrain=data_dmatrix, params=params, nfold=10, num_boost_round=int(num_boost_round), early_stopping_rounds=10, metrics='rmse', as_pandas=True)
  return -cv_results['test-rmse-mean'].min()

In [11]:
#Now, create a dictionary for the boundaries we should search within, and call
#the bayesian optimization function!
dict_cv = {
          'max_depth': (2, 12),
          'gamma': (0.001, 10.0),
          'min_child_weight': (0, 20),
          'subsample': (0.4, 1.0),
          'colsample_bytree': (0.4, 1.0),
          'learning_rate': (0.1, 1.0),
          'num_boost_round' :(30, 100)
          }



XGB_BO = BayesianOptimization(fcv, dict_cv) #Creating the optimizer
XGB_BO.maximize(init_points=10, n_iter=20, acq='ei', xi=0.0) #Running optimization!

|   iter    |  target   | colsam... |   gamma   | learni... | max_depth | min_ch... | num_bo... | subsample |
-------------------------------------------------------------------------------------------------------------
| [0m 1       [0m | [0m-1.408e+0[0m | [0m 0.438   [0m | [0m 7.126   [0m | [0m 0.5951  [0m | [0m 5.619   [0m | [0m 19.13   [0m | [0m 65.65   [0m | [0m 0.5187  [0m |
| [95m 2       [0m | [95m-1.305e+0[0m | [95m 0.6899  [0m | [95m 5.592   [0m | [95m 0.4893  [0m | [95m 6.146   [0m | [95m 10.84   [0m | [95m 58.25   [0m | [95m 0.9691  [0m |
| [0m 3       [0m | [0m-1.387e+0[0m | [0m 0.7276  [0m | [0m 9.468   [0m | [0m 0.5308  [0m | [0m 9.927   [0m | [0m 12.61   [0m | [0m 42.92   [0m | [0m 0.6859  [0m |
| [0m 4       [0m | [0m-1.694e+0[0m | [0m 0.5077  [0m | [0m 2.746   [0m | [0m 0.9959  [0m | [0m 6.519   [0m | [0m 19.42   [0m | [0m 58.82   [0m | [0m 0.577   [0m |
| [0m 5       [0m | [0m-1.425e+0[0m | 