## Boosting
Boosting is an ensemble approach (meaning it combines several models, usually trees) that starts from a weak learner and keeps adding new ones, so that the final prediction is a weighted sum of all the weak learners.
Each tree's weight is assigned based on its individual performance.
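
The idea of weighting learners by their performance can be made concrete with AdaBoost, one classic boosting algorithm. It is named here only as an illustration, since the text does not commit to a specific algorithm. For a binary classifier with weighted error rate $\epsilon$, AdaBoost assigns the weight $\alpha = \frac{1}{2}\ln\frac{1-\epsilon}{\epsilon}$. The sketch below assumes labels in $\{-1, +1\}$ and uses made-up error rates. `learner_weight` and `weighted_vote` are hypothetical helper names.

```python
import math


def learner_weight(error_rate):
    """AdaBoost weight for a weak learner with weighted error rate in (0, 0.5)."""
    return 0.5 * math.log((1 - error_rate) / error_rate)


def weighted_vote(predictions, weights):
    """Combine {-1, +1} predictions as a weighted sum and return its sign."""
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score >= 0 else -1


# Hypothetical error rates of three weak learners: lower error, larger weight.
errors = [0.1, 0.3, 0.45]
weights = [learner_weight(e) for e in errors]
print([round(w, 4) for w in weights])  # [1.0986, 0.4236, 0.1003]

# The most accurate learner votes +1 and the two weaker ones vote -1.
# Its weight (about 1.099) exceeds theirs combined (about 0.524), so +1 wins.
print(weighted_vote([1, -1, -1], weights))  # 1
```

A single accurate learner can therefore outvote several weak ones, which is what "weighted sum" means in practice.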

<img src= "boosting_basic.PNG" alt='boosting' style="width: 400px;">


Ensemble parameters are calculated in a **stagewise** way. When the next tree and its weight are fitted, the model takes into account what the previous trees have already learned, and the trees already in the ensemble are left unchanged.
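
Here is a minimal sketch of stagewise fitting. It assumes squared-error loss, a single input feature, and one-split trees (stumps). Each stage fits a stump to the residuals of the ensemble built so far and adds it scaled by a fixed `learning_rate`. This fixed shrinkage factor is the gradient-boosting convention, not a per-tree weight derived from performance. `fit_stump` and `stagewise_boost` are hypothetical names, not XGBoost's implementation.

```python
import numpy as np


def fit_stump(x, r):
    """Least-squares one-split regression tree on feature x (needs >= 2 distinct x values)."""
    best = None
    # Candidate thresholds: every distinct value except the largest, so both sides are non-empty.
    for t in np.unique(x)[:-1]:
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)


def stagewise_boost(x, y, n_stages=50, learning_rate=0.3):
    """Forward stagewise boosting: earlier stumps are never refitted."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps, history = [], [((y - pred) ** 2).mean()]
    for _ in range(n_stages):
        residuals = y - pred  # what the previous stages still get wrong
        stump = fit_stump(x, residuals)
        pred = pred + learning_rate * stump(x)
        stumps.append(stump)
        history.append(((y - pred) ** 2).mean())  # training MSE after this stage

    def predict(z):
        return base + learning_rate * sum(s(z) for s in stumps)

    return predict, history


x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x)
predict, history = stagewise_boost(x, y)
print(f"training MSE: {history[0]:.3f} -> {history[-1]:.3f}")
```

With `learning_rate` between 0 and 2, each least-squares stump can only lower the training error. The shrinkage trades more stages for smaller, more cautious steps.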


### XGBoost
XGBoost builds on the gradient boosting method.
> **XGBoost** (*extreme gradient boosting*) regularises the model more thoroughly than standard gradient boosted trees by adding an explicit penalty on tree complexity to the training objective.

It was developed by Tianqi Chen in C++ and now has interfaces for Python, R and Julia, among other languages.

XGBoost's objective function is the sum of the loss function evaluated over all $n$ predictions and a regularisation term for each of the $K$ trees. In the formula, $f_k$ denotes the $k$-th tree. The prediction for sample $i$ is the sum of all tree outputs, $\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$.

$$
obj(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega (f_k)
$$

The loss function $l$ depends on the task being performed (classification, regression, etc.). The regularisation term is defined as:

$$
\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T}w_j^2
$$

Here $T$ is the number of leaves in the tree and $w_j$ is the score (weight) of leaf $j$. The first term, $\gamma T$, penalises the number of leaves. The second term, $\frac{1}{2} \lambda \sum_{j=1}^{T}w_j^2$, penalises large leaf scores.
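
The two penalties can be checked numerically. In the `XGBRegressor` printed later in this notebook, $\gamma$ and $\lambda$ appear as the `gamma` and `reg_lambda` parameters. XGBoost's documentation derives the best score for leaf $j$, under a second-order approximation of the loss, as $w_j^* = -G_j / (H_j + \lambda)$. Here $G_j$ and $H_j$ are the sums of the loss's first and second derivatives over the samples in that leaf. The sketch below is a minimal illustration with a hypothetical three-leaf tree and hypothetical helpers `omega` and `optimal_leaf_weight`. It assumes the squared-error loss $\frac{1}{2}(y - \hat{y})^2$ for the leaf-weight part.

```python
def omega(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * sum(w_j^2) for a single tree."""
    T = len(leaf_weights)
    return gamma * T + 0.5 * lam * sum(w * w for w in leaf_weights)


def optimal_leaf_weight(gradients, hessians, lam):
    """Leaf score minimising the second-order objective: w* = -G / (H + lambda)."""
    return -sum(gradients) / (sum(hessians) + lam)


# Hypothetical tree with three leaves: gamma charges 1.0 per leaf.
print(omega([2.0, -1.0, 0.5], gamma=1.0, lam=1.0))  # 5.625
print(omega([2.0, -1.0, 0.5], gamma=0.0, lam=1.0))  # 2.625

# For 0.5 * (y - y_hat)^2 the gradient is y_hat - y and the hessian is 1.
# Suppose a leaf holds two samples whose residuals y - y_hat are 2 and 4.
g, h = [-2.0, -4.0], [1.0, 1.0]
for lam in (0.0, 1.0, 4.0):
    print(lam, optimal_leaf_weight(g, h, lam))  # weight 3.0, then 2.0, then 1.0
```

With $\lambda = 0$ the leaf score is just the mean residual (3.0). Increasing $\lambda$ shrinks it towards zero, which is the sense in which the second term "watches over" the scores.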


In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('ML_df.csv')
df.head()

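| Unnamed: 0 | carat | depth | table | x | y | z | Good | Ideal | Premium | Very Good | ... | I | J | IF | SI1 | SI2 | VS1 | VS2 | VVS1 | VVS2 | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1.198168 | -0.174092 | -1.099672 | -1.587837 | -1.536196 | -1.571129 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 326 |
| 1 | -1.240361 | -1.360738 | 1.585529 | -1.641325 | -1.658774 | -1.741175 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 326 |
| 2 | -1.198168 | -3.385019 | 3.375663 | -1.498691 | -1.457395 | -1.741175 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 327 |
| 3 | -1.071587 | 0.454133 | 0.242928 | -1.364971 | -1.317305 | -1.28772 | 0 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 334 |
| 4 | -1.029394 | 1.082358 | 0.242928 | -1.240167 | -1.212238 | -1.117674 | 1 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 335 |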
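
The `Unnamed: 0` column above looks like a row index that was written to the CSV along with the data, since pandas gives that name to an unnamed first CSV column. If that is its origin, it carries no information about the diamonds, yet the `df.drop` call below keeps it as a feature. The sketch below shows, with two made-up rows and assuming that cause, how `index_col=0` reads such a column back as the index instead. Dropping the column explicitly would work too.

```python
import io

import pandas as pd

# Made-up stand-in for a CSV saved with DataFrame.to_csv() without index=False:
# the index is written as an unnamed first column.
original = pd.DataFrame({"carat": [0.23, 0.21], "price": [326, 326]})
csv_text = original.to_csv()

df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.columns.tolist())  # ['Unnamed: 0', 'carat', 'price']

# index_col=0 turns that first column back into the index instead of a feature.
df_fixed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_fixed.columns.tolist())  # ['carat', 'price']
```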


In [4]:
X = df.drop('price', axis=1)  # features: every column except the target
y = df.price                  # target: the diamond price

In [5]:
X.head()


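| Unnamed: 0 | carat | depth | table | x | y | z | Good | Ideal | Premium | Very Good | ... | H | I | J | IF | SI1 | SI2 | VS1 | VS2 | VVS1 | VVS2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1.198168 | -0.174092 | -1.099672 | -1.587837 | -1.536196 | -1.571129 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | -1.240361 | -1.360738 | 1.585529 | -1.641325 | -1.658774 | -1.741175 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | -1.198168 | -3.385019 | 3.375663 | -1.498691 | -1.457395 | -1.741175 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | -1.071587 | 0.454133 | 0.242928 | -1.364971 | -1.317305 | -1.28772 | 0 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | -1.029394 | 1.082358 | 0.242928 | -1.240167 | -1.212238 | -1.117674 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |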


In [6]:
y.head()

0    326
1    326
2    327
3    334
4    335
Name: price, dtype: int64

In [7]:
from sklearn.model_selection import train_test_split

In [8]:
# hold out 25% of the rows as a test set; random_state makes the split reproducible
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)

In [9]:
from xgboost import XGBRegressor

In [10]:
model = XGBRegressor()
model.fit(x_train,y_train)

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints='',
             learning_rate=0.300000012, max_delta_step=0, max_depth=6,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=100, n_jobs=4, num_parallel_tree=1, random_state=0,
             reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
             tree_method='exact', validate_parameters=1, verbosity=None)

In [11]:
y_pred = model.predict(x_test)
y_pred

array([2148.815  , 1796.8528 ,  402.72238, ..., 7257.8267 ,  598.7939 ,
       1944.1868 ], dtype=float32)

In [12]:
model.score(x_train, y_train)  # R^2 on the training data

0.9888763243538651

In [13]:
model.score(x_test, y_test)  # R^2 on the held-out test data

0.9777781332299885
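
For a scikit-learn style regressor such as `XGBRegressor`, `score` returns the coefficient of determination, $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$. A minimal NumPy version, shown on a small made-up example, makes the definition concrete. `r2` is a hypothetical helper name.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum()         # squared errors of the model
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # squared errors of always predicting the mean
    return 1 - ss_res / ss_tot


print(round(r2([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]), 4))  # 0.9486
```

Applied to `y_test` and `model.predict(x_test)`, it should match `model.score` up to small floating-point differences, since the predictions are `float32`.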

In [16]:
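def adj_r2(x, y, estimator=model):
    """Adjusted R^2: penalises R^2 for the number of predictors p given n samples."""
    r2 = estimator.score(x, y)
    n = x.shape[0]  # number of samples
    p = x.shape[1]  # number of predictors
    adjusted_r2 = 1 - (((1 - r2) * (n - 1)) / (n - p - 1))
    return adjusted_r2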

In [17]:
adj_r2(x_train,y_train)

0.9888699964238148

In [18]:
adj_r2(x_test,y_test)

0.9777401640645691

### Hyperparameter Tuning

In [19]:
from sklearn.model_selection import RandomizedSearchCV

In [20]:
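param_grid = {
    'booster': ['gbtree', 'gblinear'],     # gblinear ignores max_depth and min_child_weight
    'eta': [0.01, 0.1, 0.3, 0.5, 1],       # alias of learning_rate, the sklearn wrapper's own name
    'min_child_weight': range(1, 5, 1),    # 1 to 4
    'max_depth': range(2, 10, 1),          # 2 to 9
    # eval_metric only affects evaluation on an eval_set, not the fitted model or the CV score (R^2);
    # logloss, error, mlogloss and auc are classification metrics
    'eval_metric': ['rmse', 'mae', 'logloss', 'error', 'mlogloss', 'auc']
}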
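
It helps to know how large this search space is before choosing `n_iter`. The quick count below rebuilds the same grid and multiplies the number of values per key. It shows that `RandomizedSearchCV` with `n_iter=100` samples only about 5% of the combinations. Because `eval_metric`, and for `gblinear` also `max_depth` and `min_child_weight`, do not change the fitted model here, many of those combinations are effectively duplicates.

```python
import math

# The same search space as param_grid above.
param_grid = {
    'booster': ['gbtree', 'gblinear'],
    'eta': [0.01, 0.1, 0.3, 0.5, 1],
    'min_child_weight': range(1, 5, 1),
    'max_depth': range(2, 10, 1),
    'eval_metric': ['rmse', 'mae', 'logloss', 'error', 'mlogloss', 'auc'],
}

# Every combination of one value per key.
total = math.prod(len(values) for values in param_grid.values())
print(total)                                    # 1920
print(f"n_iter=100 covers {100 / total:.1%}")   # n_iter=100 covers 5.2%
```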

In [21]:
rand_search = RandomizedSearchCV(estimator=model, param_distributions=param_grid, n_iter=100,
                                 cv=5, verbose=2, n_jobs=-1, random_state=33)
rand_search.fit(x_train, y_train)  # 100 sampled combinations x 5 folds = 500 fits

Fitting 5 folds for each of 100 candidates, totalling 500 fits


RandomizedSearchCV(cv=5,
                   estimator=XGBRegressor(base_score=0.5, booster='gbtree',
                                          colsample_bylevel=1,
                                          colsample_bynode=1,
                                          colsample_bytree=1, gamma=0,
                                          gpu_id=-1, importance_type='gain',
                                          interaction_constraints='',
                                          learning_rate=0.300000012,
                                          max_delta_step=0, max_depth=6,
                                          min_child_weight=1, missing=nan,
                                          monotone_constraints='()',
                                          n_estimators=100, n_jobs=4,
                                          num_par...
                                          reg_alpha=0, reg_lambda=1,
                                          scale_pos_weight=1, subsample=1,
   

In [22]:
rand_search.best_params_

{'min_child_weight': 2,
 'max_depth': 7,
 'eval_metric': 'rmse',
 'eta': 0.1,
 'booster': 'gbtree'}

In [23]:
best_random_grid = rand_search.best_estimator_
best_random_grid

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, eta=0.1,
             eval_metric='rmse', gamma=0, gpu_id=-1, importance_type='gain',
             interaction_constraints='', learning_rate=0.300000012,
             max_delta_step=0, max_depth=7, min_child_weight=2, missing=nan,
             monotone_constraints='()', n_estimators=100, n_jobs=4,
             num_parallel_tree=1, random_state=0, reg_alpha=0, reg_lambda=1,
             scale_pos_weight=1, subsample=1, tree_method='exact',
             validate_parameters=1, verbosity=None)

In [24]:
rand_search.score(x_train, y_train)  # R^2 of best_estimator_, refit on the full training set

0.9920881409096172

In [25]:
rand_search.score(x_test,y_test)

0.9767264403465765

In [26]:
import pickle


In [27]:
filename = 'xgboost.pickle'
with open(filename, 'wb') as f:  # the with-block closes the file after writing
    pickle.dump(rand_search, f)