# XGBoost (Extreme Gradient Boosting)

XGBoost (Extreme Gradient Boosting) is an optimized, distributed gradient boosting library. It is built on the gradient boosting machine (GBM) framework, yet it typically does better than a plain GBM implementation. XGBoost was created by Tianqi Chen while he was a PhD student at the University of Washington. It is used for supervised ML problems. Let's look at what makes it so good:

#### 1. Parallel Computing:
XGBoost supports parallel processing (using OpenMP); i.e., when you run xgboost, by default it uses all the cores of your laptop/machine.

#### 2. Regularization:
I believe this is the biggest advantage of xgboost. A standard GBM has no built-in regularization term, whereas XGBoost adds L1 and L2 penalties to its objective. Regularization is a technique used to avoid overfitting in linear and tree-based models.

#### 3. Built-in Cross Validation:
In R, we usually use external packages such as caret and mlr to obtain CV results. XGBoost, however, comes with a built-in cross-validation function (xgb.cv in the Python package).
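To make the idea of cross validation concrete, here is a minimal sketch of how the rows can be divided into folds. It is an illustration only: the function name `k_fold_indices` is hypothetical, no shuffling is done, and XGBoost's own CV routine handles the splitting and the training internally.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train_indices, validation_indices) for each fold, without shuffling."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, validation
        start += size


# Each row is used for validation exactly once and for training in all other folds.
for train, validation in k_fold_indices(n_rows=10, n_folds=3):
    print("train:", train, "validation:", validation)
```

The CV error is then the average of the validation errors obtained on each fold.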

#### 4. Missing Values:
XGBoost is designed to handle missing values internally. Missing values are treated in such a way that if there is any trend in them, the model captures it.
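One way to picture this is the "default direction" idea: for a candidate split, the rows with a missing value are tried on the left side and on the right side, and the direction that gives the higher gain is kept. The sketch below is a simplified illustration under that assumption; the function names are hypothetical, missing values are represented by `None`, and this is not XGBoost's actual implementation.

```python
def leaf_score(grad_sum, hess_sum, reg_lambda=1.0):
    """Structure score of a leaf: G^2 / (H + lambda)."""
    return grad_sum ** 2 / (hess_sum + reg_lambda)


def best_default_direction(values, grads, hessians, threshold, reg_lambda=1.0):
    """Try sending missing rows (None) left, then right, and keep the better gain."""
    gains = {}
    for direction in ("left", "right"):
        g_left = h_left = g_right = h_right = 0.0
        for value, grad, hess in zip(values, grads, hessians):
            if value is None:
                goes_left = direction == "left"
            else:
                goes_left = value < threshold
            if goes_left:
                g_left += grad
                h_left += hess
            else:
                g_right += grad
                h_right += hess
        parent = leaf_score(g_left + g_right, h_left + h_right, reg_lambda)
        gains[direction] = 0.5 * (
            leaf_score(g_left, h_left, reg_lambda)
            + leaf_score(g_right, h_right, reg_lambda)
            - parent
        )
    return max(gains, key=gains.get), gains


# Toy data: the rows with a missing value have targets similar to the high-value rows.
values = [1.0, 2.0, None, 8.0, 9.0, None]
targets = [1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
grads = [0.0 - t for t in targets]   # squared error, starting prediction of 0
hessians = [1.0] * len(targets)      # squared error has a constant hessian of 1

direction, gains = best_default_direction(values, grads, hessians, threshold=5.0)
print(direction)  # right
```

Because the rows with missing values behave like the rows on the right of the split, sending them right gives the larger gain, which is how a trend in the missing values ends up in the model.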

#### 5. Flexibility:
In addition to regression, classification, and ranking problems, it also supports user-defined objective functions. An objective function is used to measure the performance of the model given a certain set of parameters. Furthermore, it supports user-defined evaluation metrics as well.
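A user-defined objective has to supply, for every row, the first and second derivative (gradient and hessian) of the loss with respect to the raw prediction. The sketch below shows only that math for the logistic loss, using plain lists so it runs without XGBoost. The function names are hypothetical; in the native Python API such a function would typically receive the predictions and the training matrix and be passed to `xgb.train` through its `obj` argument, so treat the signature here as a simplification.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def logistic_objective(raw_scores, labels):
    """Per-row gradient and hessian of the log loss w.r.t. the raw score."""
    grads, hessians = [], []
    for score, label in zip(raw_scores, labels):
        p = sigmoid(score)
        grads.append(p - label)         # first derivative
        hessians.append(p * (1.0 - p))  # second derivative
    return grads, hessians


def error_rate(raw_scores, labels, threshold=0.5):
    """A simple user-defined evaluation metric: fraction of misclassified rows."""
    wrong = sum(
        1 for score, label in zip(raw_scores, labels)
        if (sigmoid(score) > threshold) != bool(label)
    )
    return wrong / len(labels)


grads, hessians = logistic_objective([0.0, 2.0], [1, 0])
print(grads[0], hessians[0])  # -0.5 0.25
```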

#### 6. Availability:
Currently, it is available for programming languages such as R, Python, Java, Julia, and Scala.

#### 7. Save and Reload:
XGBoost lets us save our data matrix and model and reload them later. If we have a large data set, we can simply save the trained model and reuse it in the future instead of wasting time redoing the computation.

#### 8. Tree Pruning:
Unlike GBM, which stops splitting a node as soon as it encounters a negative loss reduction, XGBoost grows the tree up to max_depth and then prunes backward, removing splits whose improvement in the loss function is below a threshold.
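The difference matters when a weak split opens the way to a strong one further down. The following sketch illustrates the grow-then-prune idea on a hand-built tree. It is a simplified illustration, not XGBoost's implementation: the tree is a nested dictionary, a leaf is represented by `None`, and the gain values are made up for the example.

```python
def prune(node, gamma=0.0):
    """Collapse splits bottom-up; a split is removed only if both children are leaves."""
    if node is None:
        return None
    node["left"] = prune(node["left"], gamma)
    node["right"] = prune(node["right"], gamma)
    both_children_are_leaves = node["left"] is None and node["right"] is None
    if both_children_are_leaves and node["gain"] < gamma:
        return None  # the split does not pay for itself, so remove it
    return node


def count_splits(node):
    if node is None:
        return 0
    return 1 + count_splits(node["left"]) + count_splits(node["right"])


def make_tree():
    # The root split has a negative gain, but it enables a strongly positive split below it.
    return {
        "gain": -0.5,
        "left": {"gain": 4.0, "left": None, "right": None},
        "right": None,
    }


print(count_splits(prune(make_tree(), gamma=0.0)))  # 2
print(count_splits(prune(make_tree(), gamma=5.0)))  # 0
```

With a threshold of 0, both splits survive because the useful lower split protects the root. A procedure that stops at the first negative gain would never have reached the lower split.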

### Implementation of XGBoost on a data set
### >> Import all useful libraries

```python
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
import pandas as pd
import xgboost as xgb

# from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn import metrics
```

```python
# Load the data set from a CSV file in the working directory
df = pd.read_csv("new.csv")
```

### >> Check the data after reading the file

```python
print(df.head())
```

```
    age  sex  chest  resting_blood_pressure  serum_cholestoral  \
0  70.0  1.0    4.0                   130.0              322.0   
1  67.0  0.0    3.0                   115.0              564.0   
2  57.0  1.0    2.0                   124.0              261.0   
3  64.0  1.0    4.0                   128.0              263.0   
4  74.0  0.0    2.0                   120.0              269.0   

   fasting_blood_sugar  resting_electrocardiographic_results  \
0                  0.0                                   2.0   
1                  0.0                                   2.0   
2                  0.0                                   0.0   
3                  0.0                                   0.0   
4                  0.0                                   2.0   

   maximum_heart_rate_achieved  exercise_induced_angina  oldpeak  slope  \
0                        109.0                      0.0      2.4    2.0   
1                        160.0                      0.0      1.6    
```

### >> Store the input features and the target feature in separate variables

```python
# Input features: columns 3 to 12 (the first three columns, age, sex and chest, are left out)
x = df.iloc[:, 3:13].values
y = df.iloc[:, 13].values   # target feature
print(x)
```

```
[[130. 322.   0. ...   2.   3.   3.]
 [115. 564.   0. ...   2.   0.   7.]
 [124. 261.   0. ...   1.   0.   7.]
 ...
 [140. 294.   0. ...   2.   0.   3.]
 [140. 192.   0. ...   2.   0.   6.]
 [160. 286.   0. ...   2.   3.   3.]]
```


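### >> Split the data into training and testing sets

```python
# Hold out 20% of the rows for testing; no random_state is set, so the split differs between runs
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
```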

## Parameters of XGBoost
#### booster[default=gbtree]
Sets the booster type (gbtree, gblinear or dart) to use. gbtree and dart are tree-based boosters, while gblinear fits a linear model; all three can be used for both classification and regression.
#### nthread[default=maximum cores available]
Sets the number of threads used for parallel computation. Generally, people don't change it, as using the maximum number of cores leads to the fastest computation.
#### silent[default=0]
With the default of 0, running messages are printed to the console; setting it to 1 suppresses them. Later XGBoost releases replace this parameter with verbosity.
#### nrounds[default=100]
It controls the maximum number of boosting iterations. For classification, it is similar to the number of trees to grow. In the Python scikit-learn wrapper, this parameter is called n_estimators.
Should be tuned using CV.
#### eta[default=0.3][range: (0,1)]
It controls the learning rate, i.e., the rate at which our model learns patterns in data. After every round, it shrinks the weights of the newly added tree, so that later rounds correct the remaining error in smaller steps.
A lower eta makes training slower, because it must be compensated by an increase in nrounds.
Typically, it lies between 0.01 and 0.3. In the Python scikit-learn wrapper, this parameter is called learning_rate.
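The trade-off between eta and nrounds can be seen in a toy setting. The sketch below assumes an idealized booster in which every round predicts the current residual exactly and the prediction is then shrunk by eta; the function name `rounds_to_converge` and the tolerance are hypothetical choices for the illustration.

```python
def rounds_to_converge(target, eta, tolerance=0.01, max_rounds=10_000):
    """Count the rounds needed until the prediction is within tolerance of the target."""
    prediction = 0.0
    for round_number in range(1, max_rounds + 1):
        residual = target - prediction
        prediction += eta * residual  # the round's output is shrunk by eta
        if abs(target - prediction) < tolerance:
            return round_number
    return max_rounds


print(rounds_to_converge(10.0, eta=0.3))  # 20
print(rounds_to_converge(10.0, eta=0.1))  # 66
```

After each round the remaining error is multiplied by (1 - eta), so a smaller eta needs more rounds to reach the same error.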
#### gamma[default=0][range: (0,Inf)]
It controls regularization (or prevents overfitting) by setting the minimum loss reduction required to split a leaf node further. The optimal value of gamma depends on the data set and other parameter values.
The higher the value, the stronger the regularization: a split is kept only if it reduces the loss by at least gamma. The default of 0 means no such penalty on splits.
#### Tune trick:
Start with 0 and check the CV error rate. If you see train error <<< test error, bring gamma into action. The higher the gamma, the smaller the difference between train and test CV error. If you have no clue what value to use, use gamma=5 and see the performance. Remember that gamma brings improvement when you want to use shallow (low max_depth) trees.
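The effect of gamma is easiest to see in the split gain formula from the XGBoost paper, where G and H are the sums of gradients and hessians of the rows on each side of the split. The sketch below evaluates that formula; the function name and the numbers are made up for the illustration.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Gain of a split: 0.5 * (left score + right score - parent score) - gamma."""
    def score(g, h):
        return g * g / (h + reg_lambda)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


# The same split is worthwhile with gamma=0 but rejected with gamma=5.
print(split_gain(-4.0, 4.0, 3.0, 3.0, gamma=0.0))  # positive, about 2.66
print(split_gain(-4.0, 4.0, 3.0, 3.0, gamma=5.0))  # negative, about -2.34
```

A split with a non-positive gain is not worth keeping, so raising gamma removes the weaker splits first.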
#### max_depth[default=6][range: (0,Inf)]
It controls the maximum depth of each tree.
The larger the depth, the more complex the model and the higher the chance of overfitting. There is no standard value for max_depth. Larger data sets require deeper trees to learn the rules from the data.
Should be tuned using CV.
#### min_child_weight[default=1][range: (0,Inf)]
In regression with squared error, it amounts to the minimum number of instances required in a child node, because every instance contributes a weight of 1. In classification, if a split would produce a leaf node whose sum of instance weights (calculated from the second-order partial derivative of the loss) is lower than min_child_weight, the tree stops splitting there.
In simple words, it blocks potential feature interactions to prevent overfitting. Should be tuned using CV.
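To see why the same parameter means a row count in regression but something else in classification, the sketch below computes the sum of second derivatives (hessians) in a candidate child node for both losses. The function names are hypothetical and the raw scores are made-up values.

```python
import math


def hessian_sum_squared_error(n_rows):
    """For squared error every row has a hessian of 1, so the sum is the row count."""
    return float(n_rows)


def hessian_sum_logistic(raw_scores):
    """For logistic loss each row contributes p * (1 - p)."""
    total = 0.0
    for score in raw_scores:
        p = 1.0 / (1.0 + math.exp(-score))
        total += p * (1.0 - p)
    return total


def child_allowed(hessian_sum, min_child_weight=1.0):
    return hessian_sum >= min_child_weight


# Five rows the model is already very sure about carry almost no weight.
confident_rows = [6.0, 6.0, 6.0, 6.0, -6.0]
# Five rows the model is unsure about carry 0.25 each.
uncertain_rows = [0.0, 0.0, 0.0, 0.0, 0.0]

print(child_allowed(hessian_sum_logistic(confident_rows)))  # False
print(child_allowed(hessian_sum_logistic(uncertain_rows)))  # True
print(child_allowed(hessian_sum_squared_error(5)))          # True
```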
#### subsample[default=1][range: (0,1]]
It controls the fraction of samples (observations) supplied to a tree.
Typically, its values lie between 0.5 and 0.8.
#### colsample_bytree[default=1][range: (0,1]]
It controls the fraction of features (variables) supplied to a tree.
Typically, its values lie between 0.5 and 0.9.
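As a rough picture of what these two fractions do, the sketch below draws the row and column indices that one tree would get to see. It is a simplification: the function name `sample_for_tree` is hypothetical, a fixed number of rows is drawn without replacement, and the real library may sample differently.

```python
import random


def sample_for_tree(n_rows, n_cols, subsample=0.8, colsample_bytree=0.8, seed=0):
    """Pick the row and column indices made available to a single tree."""
    rng = random.Random(seed)
    n_sampled_rows = max(1, int(subsample * n_rows))
    n_sampled_cols = max(1, int(colsample_bytree * n_cols))
    rows = sorted(rng.sample(range(n_rows), n_sampled_rows))
    cols = sorted(rng.sample(range(n_cols), n_sampled_cols))
    return rows, cols


rows, cols = sample_for_tree(n_rows=200, n_cols=10)
print(len(rows), "rows and", len(cols), "columns used for this tree")
print("columns:", cols)
```

Because every tree sees a different subset, the trees are less correlated with each other, which tends to reduce overfitting.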
#### lambda[default=1]
It controls L2 regularization (equivalent to Ridge regression) on weights. It is used to avoid overfitting. In the Python scikit-learn wrapper, this parameter is called reg_lambda.
#### alpha[default=0]
It controls L1 regularization (equivalent to Lasso regression) on weights. In addition to shrinkage, enabling alpha also results in feature selection. Hence, it's more useful on high-dimensional data sets. In the Python scikit-learn wrapper, this parameter is called reg_alpha.
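For tree boosters, the two penalties act on the leaf weights. A minimal sketch, assuming the usual closed form in which the gradient sum G is soft-thresholded by alpha and the hessian sum H is increased by lambda; the function name `leaf_weight` and the numbers are made up for the illustration.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0, reg_alpha=0.0):
    """Leaf weight with L2 (lambda) shrinkage and L1 (alpha) soft-thresholding."""
    if grad_sum > reg_alpha:
        shrunk = grad_sum - reg_alpha
    elif grad_sum < -reg_alpha:
        shrunk = grad_sum + reg_alpha
    else:
        return 0.0  # L1 penalty sets small weights exactly to zero
    return -shrunk / (hess_sum + reg_lambda)


print(leaf_weight(-6.0, 2.0, reg_lambda=0.0, reg_alpha=0.0))   # 3.0
print(leaf_weight(-6.0, 2.0, reg_lambda=1.0, reg_alpha=0.0))   # 2.0
print(leaf_weight(-6.0, 2.0, reg_lambda=1.0, reg_alpha=10.0))  # 0.0
```

Lambda pulls every weight toward zero smoothly, while a large enough alpha removes a weight altogether.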


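### >> Apply XGBClassifier and check the predictions

```python
# Configure the classifier; reg_alpha=0 and reg_lambda=0 switch off the L1/L2 penalties
output = xgb.XGBClassifier(max_depth=7,
                           min_child_weight=1,
                           learning_rate=0.1,
                           n_estimators=500,
                           silent=True,  # suppress training messages (later releases use verbosity instead)
                           objective='binary:logistic',
                           gamma=0,
                           max_delta_step=0,
                           subsample=1,
                           reg_alpha=0,
                           reg_lambda=0,
                           scale_pos_weight=1,
                           seed=1,
                           missing=None)
# eval_metric is only used for evaluation when an eval_set is supplied
output.fit(x_train, y_train, eval_metric='auc')
# Predict class labels for the held-out test rows
y_pred = output.predict(x_test)
print(y_pred)
```

```
[1 1 1 0 0 0 1 1 1 0 0 0 1 0 0 1 1 1 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 1 0 1 1
 1 1 1 0 0 0 1 1 1 1 0 0 1 1 1 1 1]
```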


### >> Let's check the accuracy of the model

```python
from sklearn import metrics

# Fraction of test rows whose predicted label matches the true label
print("accuracy:", metrics.accuracy_score(y_test, y_pred))
```

```
accuracy: 0.7037037037037037
```
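Accuracy is simply the number of correct predictions divided by the number of test rows; with the 54 test predictions printed above, 0.7037 corresponds to 38 correct labels. The sketch below computes the same quantity by hand, together with the confusion counts that a single accuracy number hides. The function names are hypothetical and the labels are toy values, not the notebook's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def confusion_counts(y_true, y_pred, positive=1):
    """Return (true positives, false positives, true negatives, false negatives)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


toy_true = [1, 0, 1, 1]
toy_pred = [1, 0, 0, 1]
print(accuracy(toy_true, toy_pred))          # 0.75
print(confusion_counts(toy_true, toy_pred))  # (2, 0, 1, 1)
```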


## Research Infinite Solutions LLP

by [Research Infinite Solutions](http://www.researchinfinitesolutions.com/)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.