# Introduction

* XGBoost stands for Extreme Gradient Boosting.

* XGBoost is a library designed and optimized for boosted tree algorithms. The gradient boosted trees model was originally proposed by Friedman et al. XGBoost's underlying algorithm is similar: it is an extension of the classic GBM algorithm. By using multithreading and adding regularization, XGBoost can use more of the available computational power and often produces more accurate predictions.

* XGBoost (XGB) is built around gradient boosting (GBM) at its core.
 
**XGBoost is efficient for the following reasons:**
1. The computational part is implemented in C++.
2. It can be multi-threaded on a single machine.
3. It preprocesses the data before the training algorithm runs.
![](https://raw.githubusercontent.com/szilard/benchm-ml/145e029e1d092539566ece52d528196b4117b411/2-rf/x-plot-time.png)

# Features of XGBoost

**Model Features**
* Gradient Boosting algorithm, also called gradient boosting machine (GBM), including the learning rate.
* Stochastic Gradient Boosting with sub-sampling at the row, column and column per split levels.
* Regularized Gradient Boosting with both L1 and L2 regularization.
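
As a rough guide, the model features above map onto hyperparameters of XGBoost's scikit-learn wrapper. The sketch below only builds a plain dictionary; the parameter names are the ones the wrapper accepts, while the dictionary name and all values are illustrative placeholders, not tuned settings.

```python
# Hypothetical, untuned values; only the parameter names come from XGBoost.
model_feature_params = {
    # Gradient boosting with shrinkage: each tree's contribution is scaled down.
    "learning_rate": 0.1,
    # Stochastic gradient boosting: sub-sampling of rows and columns.
    "subsample": 0.8,          # fraction of rows sampled for each tree
    "colsample_bytree": 0.8,   # fraction of columns sampled for each tree
    "colsample_bylevel": 1.0,  # fraction of columns sampled at each depth level
    "colsample_bynode": 1.0,   # fraction of columns sampled at each split
    # Regularized gradient boosting: penalties on the leaf weights.
    "reg_alpha": 0.0,          # L1 regularization strength
    "reg_lambda": 1.0,         # L2 regularization strength
}

for name, value in model_feature_params.items():
    print(f"{name:>18} = {value}")
```

A dictionary like this could be unpacked into the estimator, for example `XGBRegressor(**model_feature_params)`.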

**System Features**
* Parallelization of tree construction using all of your CPU cores during training.
* Distributed Computing for training very large models using a cluster of machines.
* Out-of-Core Computing for very large datasets that don’t fit into memory.
* Cache Optimization of data structures and algorithms to make best use of hardware.

**Algorithm Features**
* Sparse Aware implementation with automatic handling of missing data values.
* Block Structure to support the parallelization of tree construction.
* Continued Training so that you can further boost an already fitted model on new data.


# Ensemble methods
Ensemble methods are techniques that aim to improve the accuracy of a model's results by combining multiple models instead of relying on a single one. The combined models can increase accuracy significantly, which has boosted the popularity of ensemble methods in machine learning.
![](https://cdn.corporatefinanceinstitute.com/assets/ensemble-methods.png)
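
To see why combining models can help, here is a minimal simulation using only the standard library. It assumes each "model" is just the true value plus independent Gaussian noise, which is an idealization: real models make correlated errors, so the gain is usually smaller. All constants and names are hypothetical.

```python
import random
import statistics

random.seed(0)

TRUE_VALUE = 10.0   # quantity every model tries to predict (hypothetical)
N_MODELS = 25       # ensemble size (hypothetical)
N_TRIALS = 2000     # repetitions used to estimate the average squared error


def noisy_model():
    """One imperfect model: the true value plus independent Gaussian noise."""
    return TRUE_VALUE + random.gauss(0.0, 2.0)


single_errors = []
ensemble_errors = []
for _ in range(N_TRIALS):
    predictions = [noisy_model() for _ in range(N_MODELS)]
    # Error of one model on its own.
    single_errors.append((predictions[0] - TRUE_VALUE) ** 2)
    # Error of the ensemble, which averages all model predictions.
    ensemble_errors.append((statistics.fmean(predictions) - TRUE_VALUE) ** 2)

single_mse = statistics.fmean(single_errors)
ensemble_mse = statistics.fmean(ensemble_errors)
print(f"single model MSE:   {single_mse:.3f}")
print(f"ensemble (avg) MSE: {ensemble_mse:.3f}")
```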

# 1. Bagging
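The idea behind bagging (bootstrap aggregating) is to train multiple models (for instance, all decision trees) on different random samples of the training data and combine their results, for example by averaging or voting, to get a generalized result.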
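
A minimal sketch of bagging, assuming a toy regression problem and a least-squares line as the base model (a stand-in for the decision trees that are typically used). Each model is trained on rows sampled with replacement, and the ensemble prediction is the average. The data and the helper names `fit_line`, `bagged_models` and `bagged_predict` are hypothetical.

```python
import random
import statistics

random.seed(1)

# Toy regression data (hypothetical): y = 3x + noise
xs = [i / 10 for i in range(50)]
ys = [3.0 * x + random.gauss(0.0, 1.0) for x in xs]


def fit_line(x_vals, y_vals):
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    mx, my = statistics.fmean(x_vals), statistics.fmean(y_vals)
    sxx = sum((x - mx) ** 2 for x in x_vals)
    sxy = sum((x - mx) * (y - my) for x, y in zip(x_vals, y_vals))
    a = sxy / sxx
    return a, my - a * mx


def bagged_models(n_models):
    """Train each model on a bootstrap sample (rows drawn with replacement)."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [random.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagged_predict(models, x):
    """Aggregate by averaging the predictions of all models."""
    return statistics.fmean(a * x + b for a, b in models)


models = bagged_models(n_models=30)
print(f"bagged prediction at x=2.0: {bagged_predict(models, 2.0):.2f}")
```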

# 2. Boosting
* Boosting is an ensemble technique that learns from the mistakes of previous predictors to make better predictions in the future. The technique combines several weak base learners to form one strong learner, which can significantly improve the predictive power of the model.
* Boosting works by arranging weak learners in a sequence, such that each weak learner learns from the errors of the learners before it in the sequence, producing a better predictive model at every step.
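
The sequential idea can be sketched with decision stumps (one-split trees) on toy data: each round fits a stump to the errors left by the ensemble so far and adds a scaled-down version of it. This is a simplified illustration, not XGBoost's actual algorithm; the data, `LEARNING_RATE` and the helper names are hypothetical.

```python
import random

random.seed(2)

# Toy data (hypothetical): a step-like target that a single stump cannot fit.
xs = [i / 10 for i in range(60)]
ys = [(1.0 if x < 2 else 3.0 if x < 4 else 2.0) + random.gauss(0.0, 0.1)
      for x in xs]


def mean(values):
    return sum(values) / len(values)


def fit_stump(x_vals, targets):
    """Find the single threshold split that minimizes squared error."""
    best = None
    for t in sorted(set(x_vals))[1:]:  # skip the smallest so the left side is non-empty
        left = [r for x, r in zip(x_vals, targets) if x < t]
        right = [r for x, r in zip(x_vals, targets) if x >= t]
        lv, rv = mean(left), mean(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x < t else rv


def mse(preds):
    return mean([(y - p) ** 2 for y, p in zip(ys, preds)])


LEARNING_RATE = 0.5
preds = [mean(ys)] * len(xs)      # start from a constant prediction
history = [mse(preds)]
for _ in range(20):
    residuals = [y - p for y, p in zip(ys, preds)]  # mistakes of the ensemble so far
    stump = fit_stump(xs, residuals)                # next weak learner targets them
    preds = [p + LEARNING_RATE * stump(x) for x, p in zip(xs, preds)]
    history.append(mse(preds))

print(f"MSE before boosting: {history[0]:.4f}")
print(f"MSE after 20 rounds: {history[-1]:.4f}")
```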

# Gradient descent vs Gradient boosting vs Gradient boosted trees
* **Gradient descent** is an algorithm for finding a set of parameters that optimizes a loss function. 

* **Gradient boosting** is a technique for building an ensemble of weak models such that the predictions of the ensemble minimize a loss function. 

* **Gradient boosted trees** are the application of gradient boosting to decision trees: the individual weak models are trees.

* **XGBoost** is one of the implementations of gradient boosted trees. It comes with many hyperparameters, which give fine-grained control over how the model learns.
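
The difference between the first two ideas can be shown side by side on toy data. Gradient descent updates a parameter; gradient boosting updates the ensemble's predictions in the direction of the negative gradient of the loss. In this sketch the "weak model" reproduces the negative gradient exactly, which is an assumption made to keep the code short; a real implementation fits a tree to it instead. The step sizes and data are hypothetical.

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated by y = 2x (hypothetical toy data)


# --- Gradient descent: update a PARAMETER w of the model y = w * x ---
def loss(w):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad(w):
    # Derivative of the mean squared error with respect to w.
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.0
for _ in range(100):
    w -= 0.05 * grad(w)
print(f"gradient descent:  w = {w:.4f}, loss = {loss(w):.2e}")

# --- Gradient boosting: update the PREDICTIONS F(x_i) of the ensemble ---
F = [0.0] * len(xs)         # initial ensemble prediction
for _ in range(100):
    # The negative gradient of 0.5 * (y - F)^2 with respect to F is the residual y - F.
    neg_grad = [y - f for y, f in zip(ys, F)]
    # Stand-in for "fit a weak model to neg_grad, then add it to the ensemble".
    F = [f + 0.1 * g for f, g in zip(F, neg_grad)]

boosting_loss = sum((y - f) ** 2 for y, f in zip(ys, F)) / len(ys)
print(f"gradient boosting: loss = {boosting_loss:.2e}")
```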



# Implementation of XGBoost

In [None]:
import pandas as pd
import numpy as np
import xgboost as xgb

train=pd.read_csv('../input/tabular-playground-series-aug-2021/train.csv')
test=pd.read_csv('../input/tabular-playground-series-aug-2021/test.csv')


X=train.drop(['loss','id'],axis=1)
y=train['loss']
test=test.drop('id',axis=1)

In [None]:
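params = {'n_estimators': 5000,        # number of boosting rounds (trees)
          'learning_rate': 0.02,       # shrinkage applied to each tree
          'subsample': 0.5,            # fraction of rows sampled per tree
          'colsample_bytree': 0.7,     # fraction of columns sampled per tree
          'max_depth': 6,
          'booster': 'gbtree',
          # 'gpu_hist' needs a CUDA-capable GPU; XGBoost 2.0+ prefers
          # tree_method='hist' together with device='cuda'
          'tree_method': 'gpu_hist',
          'reg_lambda': 60,            # L2 regularization
          'reg_alpha': 60,             # L1 regularization
          'n_jobs': 4}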

In [None]:

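from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import StratifiedKFold

splits = 12
# StratifiedKFold treats every distinct value of y as a class,
# so this assumes 'loss' takes discrete values.
stf = StratifiedKFold(n_splits=splits, shuffle=True, random_state=42)
oof = np.zeros(X.shape[0])   # out-of-fold predictions
prediction = 0               # running average of test predictions over folds

for num, (train_id, valid_id) in enumerate(stf.split(X, y)):
    # split() yields positional indices, so index with iloc rather than loc
    X_train, X_valid = X.iloc[train_id], X.iloc[valid_id]
    y_train, y_valid = y.iloc[train_id], y.iloc[valid_id]

    # eval_metric is set on the estimator; recent XGBoost versions
    # no longer accept it as an argument to fit()
    model = XGBRegressor(**params, eval_metric="rmse")
    model.fit(X_train, y_train,
              eval_set=[(X_train, y_train), (X_valid, y_valid)],
              verbose=0)

    prediction += model.predict(test) / splits
    # negative predictions are clipped to 0 for this fold
    oof[valid_id] = np.clip(model.predict(X_valid), 0, None)

    fold_rmse = np.sqrt(mean_squared_error(y_valid, oof[valid_id]))
    print(f"Fold {num} RMSE: {fold_rmse}")

# overall out-of-fold RMSE across all folds
total_mean_rmse = np.sqrt(mean_squared_error(y, oof))
print(f"Overall OOF RMSE: {total_mean_rmse}")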

      

In [None]:
sub=pd.read_csv("../input/tabular-playground-series-aug-2021/sample_submission.csv")
sub["loss"] = prediction

sub.to_csv('submission.csv', index=False)
