## XGBoost

- Boosting is a strong alternative to bagging. 

- Bagging trains its models independently and aggregates their predictions. Boosting instead trains weak learners (usually shallow decision trees) one after another, each focusing on where the previous models went wrong, and combines them into a strong learner.

- In gradient boosting, each new tree is trained on the residuals, the differences between the actual targets and the current predictions (more generally, the negative gradient of the loss). The ensemble's prediction is the sum of all trees, so each boosting round corrects part of the remaining error.
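The residual-fitting idea above can be sketched in a few lines. This is a minimal illustration for squared-error regression, not XGBoost's actual algorithm (XGBoost additionally uses second-order gradient information and regularization). The helper names `fit_boosted_trees` and `predict_boosted_trees` and the synthetic data are assumptions made for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    """Toy gradient boosting for squared error: each tree fits the residuals."""
    base = float(np.mean(y))              # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred              # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        pred = pred + learning_rate * tree.predict(X)  # small corrective step
        trees.append(tree)
    return base, trees


def predict_boosted_trees(base, trees, X, learning_rate=0.1):
    """Sum the constant base prediction and the scaled output of every tree."""
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred


# Synthetic data, purely for illustration
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

base, trees = fit_boosted_trees(X_demo, y_demo)
mse_start = np.mean((y_demo - base) ** 2)
mse_end = np.mean((y_demo - predict_boosted_trees(base, trees, X_demo)) ** 2)
print(f"training MSE before boosting: {mse_start:.4f}, after: {mse_end:.4f}")
```

The learning rate shrinks each tree's contribution, so many small corrections accumulate instead of one tree overfitting the residuals.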

### Importing libraries and dataset

In [1]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

In [30]:
dataset = pd.read_csv('./resources/Datasets/Social_Network_Ads.csv')
dataset.head()

|   | Age | EstimatedSalary | Purchased |
|---|-----|-----------------|-----------|
| 0 | 19  | 19000           | 0         |
| 1 | 35  | 20000           | 0         |
| 2 | 26  | 43000           | 0         |
| 3 | 27  | 57000           | 0         |
| 4 | 19  | 76000           | 0         |


In [31]:
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
X[:5]

array([[   19, 19000],
       [   35, 20000],
       [   26, 43000],
       [   27, 57000],
       [   19, 76000]], dtype=int64)

### Splitting dataset

In [32]:
from sklearn.model_selection import train_test_split 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [33]:
X_test[:5]

array([[   30, 87000],
       [   38, 50000],
       [   35, 75000],
       [   30, 79000],
       [   35, 50000]], dtype=int64)

### Feature scaling

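Standardize each feature to zero mean and unit variance. The scaler is fit on the training set only and then applied to the test set, so no test-set statistics leak into training. Tree-based models such as XGBoost are generally insensitive to feature scaling, so this step is optional here.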

In [34]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

X_train[:5]

array([[ 1.92295008,  2.14601566],
       [ 2.02016082,  0.3787193 ],
       [-1.3822153 , -0.4324987 ],
       [-1.18779381, -1.01194013],
       [ 1.92295008, -0.92502392]])
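As a quick check of what `StandardScaler` computes, the sketch below reproduces it by hand on the five rows shown by `dataset.head()`. The variable names are chosen for this example; the comparison assumes the scaler's default settings, which use the population standard deviation (`ddof=0`).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# First five rows of (Age, EstimatedSalary) from dataset.head()
X_demo = np.array([[19, 19000],
                   [35, 20000],
                   [26, 43000],
                   [27, 57000],
                   [19, 76000]], dtype=float)

scaled = StandardScaler().fit_transform(X_demo)

# Manual standardization: subtract the column mean, divide by the column std
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(scaled, manual))
print(scaled.mean(axis=0), scaled.std(axis=0))
```

Each column ends up with mean 0 and standard deviation 1, but its shape is unchanged: standardization rescales the data and does not make it normally distributed.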

### Training model

[XGBoost](https://xgboost.readthedocs.io/en/latest/)

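 - With two classes, as here, `XGBClassifier` uses the `binary:logistic` objective by default, i.e. logistic (cross-entropy) loss.
 - With more than two classes, it switches to a softmax objective (`multi:softprob`); there is no `multi_class` option as in scikit-learn's `LogisticRegression`.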

In [35]:
from xgboost import XGBClassifier
classifier = XGBClassifier()  # some older XGBoost releases warn unless use_label_encoder=False is passed
classifier.fit(X_train, y_train)





XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=8, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [36]:
y_predicted = classifier.predict(X_test)

### Validation

In [37]:
from sklearn.metrics import (
    accuracy_score, confusion_matrix,
    precision_score, recall_score, f1_score,
    precision_recall_curve, roc_curve, roc_auc_score,
)

In [38]:
confusion_matrix(y_test, y_predicted)

array([[54,  4],
       [ 3, 19]], dtype=int64)

In [39]:
accuracy_score(y_test, y_predicted)

0.9125
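The accuracy above follows directly from the confusion matrix. The sketch below redoes the arithmetic by hand, assuming scikit-learn's layout (rows are true labels, columns are predicted labels), and also derives precision, recall and F1 for the positive class from the same four counts.

```python
import numpy as np

# Confusion matrix reported above: rows = actual class, columns = predicted class
cm = np.array([[54, 4],
               [3, 19]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()       # (19 + 54) / 80
precision = tp / (tp + fp)            # 19 / 23
recall = tp / (tp + fn)               # 19 / 22
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

With 58 negatives and 22 positives in the test set, the classes are imbalanced, so precision and recall give a more informative picture of the positive class than accuracy alone.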