## Boosting
Boosting is an ensemble approach(meaning it involves several trees) that starts from a weaker decision and keeps on building the models such that the final prediction is the weighted sum of all the weaker decision-makers.
The weights are assigned based on the performance of an individual tree.
The main idea of boosting is to train weak learners sequentially, each trying to correct its predecessor

<img src= "Image/boosting_basic.PNG">


Ensemble parameters are calculated in **stagewise way** which means that while calculating the subsequent weight, the learning from the previous tree is considered as well.


### Weak classifier - why tree?
First what is a weak classifier?
**Weak classifier** -  *slightly better* than random guessing.

Any algorithm could have been used as a base for the boosting technique, but the reason for choosing trees are:

#### Pro's
- computational scalability,
- handles missing values,
- robust to outliers,
- does not require feature scaling,
- can deal with irrelevant inputs,
- interpretable (if small),
- handles mixed predictors as well (quantitive and qualitative)

#### Con's
- inability to extract a linear combination of features
- high variance leading to a small computational power

And that’s where boosting comes into the picture. It minimises the variance by taking into consideration the results from various trees.


## Adaboost Explanation
- Adaboost combines multiple weak learners into a single strong learner. 
- It will create different decision trees with a single split (one depth), called decision stumps. 
- The number of decision stumps it will make will depend on the number of features in the dataset. Suppose there are M features then, Adaboost will create M  decision stumps. 
 1. We will assign an equal sample weight to each observation. 
 2. We will create M decision stumps, for M number of features.
 3. Out of all M decision stumps, I first have to select one best decision tree model. For selecting it, we will either calculate the Entropy or Gini coefficient. The model with lesser entropy will be selected (means model that is less disordered).
 4. Now, after the first decision stump is built, an algorithm would be train on randomly select record from Dataset and then Prediction is made on whole dataset and check how many observations the model has misclassified.
 5. Suppose out of N observations, The first decision stump (Weak/base learner) has misclassified T number of observations.
 6. For this, we will calculate the total error (TE), 
                          Total Error (T.E) = T/N.
 7. Now we will calculate the performance of the first decision stump.
                          Performance of stump = 1/2*loge((1-TE)/TE)
 8. Now we will update the weights assigned before. To do this, we will first update the weights of those observations, which we have misclassified. The weights of wrongly classified observations will be increased and the weights of correctly classified weights will be reduced.
 9. By using this formula: 
                           Incorrect Classified record weigths:- [old weight * (e^performance of stump)]
                           Correct Classified Record Weigths :- [old weight * (e^-(performance of stump))]
 10. Now respectively for each observation, we will add and subtract the updated weights to get the final weights. 
 11. But these weights are not normalized that is their sum is not equal to one. To do this, we will sum them and divide each final weight with that sum. 
 12. After this, we have to make our second decision stump. For this, we will make a class intervals for the normalized weights.
 13. After that, we want to make a second weak model. But to do that, we need a sample dataset on which the second weak model can be run. For making it, we will run N number of iterations. On each iteration, it will calculate a random number ranging between 0-1 and this random will be compared with class intervals we created and on which class interval it lies, that row will be selected for sample data set. So new sample data set would also be of N observation. 
 14. This whole process will continue for M decision stumps. The final sequential tree would be considered as the final tree.

**Example**

For understanding this algorithm, we'll use the following simple dataset for heart  patient prediction.

In [1]:
import pandas as pd
heart_data= pd.read_csv('heart_disease.csv')
heart_data

Unnamed: 0,Is Chest Pain Present,Are any arteries blocked,Weight of the person,Is Heart Patient
0,YES,YES,205,YES
1,NO,YES,180,YES
2,YES,NO,210,YES
3,YES,YES,167,YES
4,NO,YES,156,NO
5,NO,YES,125,NO
6,YES,NO,168,NO
7,YES,YES,172,NO


- There are a total of 8 rows in our dataset. Hence, we’ll initialize the sample weights($w=\frac {1}{N}$) as 1/8 in the beginning. And, at the beginning, all the samples are equally important.

<img src='Image/sw1.PNG' width=”500”>

- We’ll consider the individual columns to create weak decision-makers as shown below and then try to figure out what are the correct and incorrect predictions based on that column.

<img src='Image/cp1.PNG' width=”200”>

<img src='Image/ba1.PNG' width=”200”>

<img src='Image/bw1.PNG' width=”200”>

- We’ll now calculate the Gini index of the individual stumps using the formula

     G.I= $\sum (weight of the decision)*(1-(p^2+(1-p)^2))$

        G.I for chest pain tree= 0.47
        G.I for blocked arteries tree= 0.5
        G.I for body-weight tree= 0.2
        
        And, we select the tree with the lowest Gini Index. This will be the first decision-maker for our model.

- Now, we’ll calculate the contribution of this tree(stump) to our final decision using the formula:

Contribution= $½(log (1-total error)/total error)$

    As this stump classified only one data incorrectly out of the 8, hence the total error is 1/8.

    Putting this into the formula we get contribution= 0.97
    
- We’ll now calculate the new weights using the formula:

1. Increase the sample weight for incorrectly classified datapoints
    New weight= old weight*e^ contribution= 1/8* e^0.97=0.33
1. Decrease the sample weight for incorrectly classified datapoints
   New weight= old weight*e^- contribution= 1/8* e^-0.97=0.05

- Populate the new weights as shown below:

     <img src='Image/nsw1.PNG' width=”300”>

- Normalize the sample weights: If we add all the new sample weights, we get 0.68. Hence, for normalization we divide all the sample weights by 0.68 and then create normalized sample weights as shown below: 

     <img src='Image/normalized_wt.PNG' width=”200”>

       These new normalized weights will act as the sample weights for the next iteration.

- Then we create new trees which consider the dataset which was prepared using the new sample weights.

- Suppose, m trees(stumps) are classifying a person as a heart patient and n trees(stumps) are classifying a person as a healthy one, then the contribution of m and n trees are added separately and whichever has the higher value, the person is classified as that. 

_For example, if the contribution of m trees is 1.2 and the contribution of n trees is 0.5 then the final result will go in the favour of m trees and the person will be classified as a heart patient._


### Gradient Boosted Trees
**Gradient Boosting** is similar to AdaBoost in that they both use an ensemble of decision trees to predict a target label. However, unlike AdaBoost, the Gradient Boost trees have a depth larger than 1. In practice, you’ll typically see Gradient Boost being used with a maximum number of leaves of between 8 and 32


Before we dive into the code, it’s important that we grasp how the Gradient Boost algorithm is implemented under the hood. Suppose, we were trying to predict the price of a house given their age, square footage and location.

<img src='Image/new_1.png' width=”300”>

# Step 1: Calculate the average of the target label
When tackling regression problems, we start with a leaf that is the average value of the variable we want to predict. This leaf will be used as a baseline to approach the correct solution in the proceeding steps.

<img src='Image/new_2.png' width=”300”>

<img src='Image/new_3.png' width=”300”>

# Step 2: Calculate the residuals
For every sample, we calculate the residual with the proceeding formula.

### residual = actual value – predicted value
In our example, the predicted value is the equal to the mean calculated in the previous step and the actual value can be found in the price column of each sample. After computing the residuals, we get the following table.

<img src='Image/new_4.png' width=”300”>

# Step 3: Construct a decision tree
Next, we build a tree with the goal of predicting the residuals. In other words, every leaf will contain a prediction as to the value of the residual (not the desired label).

<img src='Image/new_5.png' width=”300”>

In the event there are more residuals than leaves, some residuals will end up inside the same leaf. When this happens, we compute their average and place that inside the leaf.

<img src='Image/new_6.png' width=”300”>

Thus, the tree becomes:

<img src='Image/new_7.png' width=”300”>

# Step 4: Predict the target label using all of the trees within the ensemble
Each sample passes through the decision nodes of the newly formed tree until it reaches a given lead. The residual in said leaf is used to predict the house price.

It’s been shown through experimentation that taking small incremental steps towards the solution achieves a comparable bias with a lower overall vatiance (a lower variance leads to better accuracy on samples outside of the training data). Thus, to prevent overfitting, we introduce a hyperparameter called learning rate. When we make a prediction, each residual is multiplied by the learning rate. This forces us to use more decision trees, each taking a small step towards the final solution.

<img src='Image/new_8.png' width=”300”>

# Step 5: Compute the new residuals
We calculate a new set of residuals by subtracting the actual house prices from the predictions made in the previous step. The residuals will then be used for the leaves of the next decision tree as described in step 3.

<img src='Image/new_9.png' width=”300”>

<img src='Image/new_10.png' width=”300”>

# Step 6: Repeat steps 3 to 5 until the number of iterations matches the number specified by the hyperparameter (i.e. number of estimators)

<img src='Image/new_11.png' width=”300”>

# Step 7: Once trained, use all of the trees in the ensemble to make a final prediction as to the value of the target variable
The final prediction will be equal to the mean we computed in the first step, plus all of the residuals predicted by the trees that make up the forest multiplied by the learning rate.

<img src='Image/new_12.png' width=”300”>

#### Example

For understanding this algorithm we'll use the following simple dataset for weight prediction

In [2]:
from sklearn.ensemble import GradientBoostingRegressor
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_boston
from sklearn.metrics import mean_absolute_error

In [3]:
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = pd.Series(boston.target)

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [5]:
regressor = GradientBoostingRegressor(
    max_depth=2,
    n_estimators=3,
    learning_rate=1.0
)
regressor.fit(X_train, y_train)

GradientBoostingRegressor(alpha=0.9, ccp_alpha=0.0, criterion='friedman_mse',
                          init=None, learning_rate=1.0, loss='ls', max_depth=2,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=3,
                          n_iter_no_change=None, presort='deprecated',
                          random_state=None, subsample=1.0, tol=0.0001,
                          validation_fraction=0.1, verbose=0, warm_start=False)

In [6]:

errors = [mean_squared_error(y_test, y_pred) for y_pred in regressor.staged_predict(X_test)]
best_n_estimators = np.argmin(errors)

In [9]:
errors

[40.59759956956771, 44.36872713905481, 41.9601355054927]

In [13]:
best_n_estimators=2

In [14]:

best_regressor = GradientBoostingRegressor(
    max_depth=2,
    n_estimators=best_n_estimators,
    learning_rate=1.0
)
best_regressor.fit(X_train, y_train)

GradientBoostingRegressor(alpha=0.9, ccp_alpha=0.0, criterion='friedman_mse',
                          init=None, learning_rate=1.0, loss='ls', max_depth=2,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=2,
                          n_iter_no_change=None, presort='deprecated',
                          random_state=None, subsample=1.0, tol=0.0001,
                          validation_fraction=0.1, verbose=0, warm_start=False)

In [15]:
y_pred = best_regressor.predict(X_test)

mean_absolute_error(y_test, y_pred)

4.260291655079165

# GBM Parameters
The overall parameters of this ensemble model can be divided into 3 categories:

- Tree-Specific Parameters: These affect each individual tree in the model.
- Boosting Parameters: These affect the boosting operation in the model.
- Miscellaneous Parameters: Other parameters for overall functioning.

# Tree-Specific Parameters
The parameters used for defining a tree are explained below
### min_samples_split
- Defines the minimum number of samples (or observations) which are required in a node to be considered for splitting.
- Used to control over-fitting. Higher values prevent a model from learning relations which might be highly specific to the particular sample selected for a tree.
- Too high values can lead to under-fitting hence, it should be tuned using CV.

### min_samples_leaf
- Defines the minimum samples (or observations) required in a terminal node or leaf.
- Used to control over-fitting similar to min_samples_split.
- Generally lower values should be chosen for imbalanced class problems because the regions in which the minority class will be in majority will be very small.

### max_depth
- The maximum depth of a tree.
- Used to control over-fitting as higher depth will allow model to learn relations very specific to a particular sample.
- Should be tuned using CV.

### max_leaf_nodes
- The maximum number of terminal nodes or leaves in a tree.
- Can be defined in place of max_depth. Since binary trees are created, a depth of ‘n’ would produce a maximum of 2^n leaves.
- If this is defined, GBM will ignore max_depth.

### max_features
- The number of features to consider while searching for a best split. These will be randomly selected.
- As a thumb-rule, square root of the total number of features works great but we should check upto 30-40% of the total number of features.
- Higher values can lead to over-fitting but depends on case to case.

# Boosting Parameters
### learning_rate
- This determines the impact of each tree on the final outcome. GBM works by starting with an initial estimate which is updated using the output of each tree. The learning parameter controls the magnitude of this change in the estimates.
- Lower values are generally preferred as they make the model robust to the specific characteristics of tree and thus allowing it to generalize well.
- Lower values would require higher number of trees to model all the relations and will be computationally expensive.

### n_estimators
- The number of sequential trees to be modeled
- Though GBM is fairly robust at higher number of trees but it can still overfit at a point. Hence, this should be tuned using CV for a particular learning rate.

### subsample
- The fraction of observations to be selected for each tree. Selection is done by random sampling.
- Values slightly less than 1 make the model robust by reducing the variance.
- Typical values ~0.8 generally work fine but can be fine-tuned further.

Apart from these, there are certain **miscellaneous parameters** which affect overall functionality:

### loss
- It refers to the loss function to be minimized in each split.
- It can have various values for classification and regression case. Generally the default values work fine. Other values should be chosen only if you understand their impact on the model.

### init
- This affects initialization of the output.
- This can be used if we have made another model whose outcome is to be used as the initial estimates for GBM.

### random_state
- The random number seed so that same random numbers are generated every time.
- This is important for parameter tuning. If we don’t fix the random number, then we’ll have different outcomes for subsequent runs on the same parameters and it becomes difficult to compare models.
- It can potentially result in overfitting to a particular random sample selected. We can try running models for different random samples, which is computationally expensive and generally not used.

### verbose
- The type of output to be printed when the model fits. The different values can be:
 > - 0: no output generated (default)
 > - 1: output generated for trees in certain intervals
 > > 1: output generated for all trees

### warm_start
- This parameter has an interesting application and can help a lot if used judicially.
- Using this, we can fit additional trees on previous fits of a model. It can save a lot of time and you should explore this option for advanced applications

### presort 
- Select whether to presort data for faster splits.
- It makes the selection automatically by default but it can be changed if needed.

# XGBoost
XGBoost is a decision-tree-based ensemble Machine Learning algorithm that uses a gradient boosting framework. However, we can say that XGBoost is the nest version of Gradient boosting and improves the base GBM framework through following feature :

1) **Regularization**: Regularization is a technique used to avoid overfitting . I believe this is the biggest advantage of xgboost. GBM has no provision for regularization.

2) **Parallel Computing**: It is enabled with parallel processing (using OpenMP); i.e., when you run xgboost, by default, it would use all the cores of your laptop/machine.

3) **Enabled Cross Validation**: In R, we usually use external packages such as caret and mlr to obtain CV results. But, xgboost is enabled with internal CV function (we'll see below).

4) **Missing Values**: XGBoost is designed to handle missing values internally. The missing values are treated in such a manner that if there exists any trend in missing values, it is captured by the XGBoost.

5) **Flexibility**: In addition to regression, classification, and ranking problems, it supports user-defined objective functions also. An objective function is used to measure the performance of the model given a certain set of parameters. Furthermore, it supports user defined evaluation metrics as well.

6) **Availability**: Currently, it is available for programming languages such as R, Python, Java, Julia, and Scala.

7) **Save and Reload**: XGBoost gives us a feature to save our data matrix and model and reload it later. Suppose, we have a large data set, we can simply save the model and use it in future instead of wasting time redoing the computation.

8) **Tree Pruning**: Unlike GBM, where tree pruning stops once a negative loss is encountered, XGBoost grows the tree upto max_depth and then prune backward until the improvement in loss function is below a threshold.

# Understanding XGBoost Tuning Parameters
Every parameter has a significant role to play in the model's performance. Before hypertuning, let's first understand about these parameters and their importance. In this article, I've only explained the most frequently used and tunable parameters. To look at all the parameters, you can refer to its official documentation.

XGBoost parameters can be divided into three categories (as suggested by its authors):

- **General Parameters**: Controls the booster type in the model which eventually drives overall functioning
- **Booster Parameters**: Controls the performance of the selected booster
- **Learning Task Parameters**: Sets and evaluates the learning process of the booster from the given data

# 1. General Parameters

 **1) Booster[default=gbtree]**
   - Sets the booster type (gbtree, gblinear or dart) to use. For classification problems, you can use gbtree, dart. For    regression, you can use any.
    - gbtree parameter; i.e., a tree is grown one after other and attempts to reduce misclassification rate in subsequent iterations. In this, the next tree is built by giving a higher weight to misclassified points by the previous tree.
    -  In gblinear, it builds generalized linear model and optimizes it using regularization (L1,L2) and gradient descent.
  
 
 **2) nthread[default=maximum cores available]**
   - Activates parallel computation. Generally, people don't change it as using maximum cores leads to the fastest computation.
   

**3) silent[default=0]**
   - If you set it to 1, your R console will get flooded with running messages. Better not to change it.

# 2. Booster Parameters
As mentioned above, parameters for tree and linear boosters are different. Let's understand each one of them:

## Parameters for Tree Booster

**1. nrounds[default=100]**
   - It controls the maximum number of iterations. For classification, it is similar to the number of trees to grow.
     Should be tuned using CV

**2. eta[default=0.3][range: (0,1)]**
   - It controls the learning rate, i.e., the rate at which our model learns patterns in data. After every round, it shrinks the feature weights to reach the best optimum.
   - Lower eta leads to slower computation. It must be supported by increase in nrounds.
   - Typically, it lies between 0.01 - 0.3
   
**3. gamma[default=0][range: (0,Inf)]**
   - It controls regularization (or prevents overfitting). The optimal value of gamma depends on the data set and other parameter values.
   - Higher the value, higher the regularization. Regularization means penalizing large coefficients which don't improve the model's performance. default = 0 means no regularization.
     -Tune trick: Start with 0 and check CV error rate. If you see train error >>> test error, bring gamma into action. Higher the gamma, lower the difference in train and test CV. If you have no clue what value to use, use gamma=5 and see the performance. Remember that gamma brings improvement when you want to use shallow (low max_depth) trees.
     
**4. max_depth[default=6][range: (0,Inf)]**
   - It controls the depth of the tree.
   - Larger the depth, more complex the model; higher chances of overfitting. There is no standard value for max_depth. Larger data sets require deep trees to learn the rules from data.
   - Should be tuned using CV
   
**5. min_child_weight[default=1][range:(0,Inf)]**
   - In regression, it refers to the minimum number of instances required in a child node. In classification, if the leaf node has a minimum sum of instance weight (calculated by second order partial derivative) lower than min_child_weight, the tree splitting stops.
   - In simple words, it blocks the potential feature interactions to prevent overfitting. Should be tuned using CV.
   
**6. subsample[default=1][range: (0,1)]**
   - The fraction of observations to be selected for each tree. Selection is done by random sampling.
   - Values slightly less than 1 make the model robust by reducing the variance.
   - Typically, its values lie between (0.5-0.8)
   
**7. colsample_bytree[default=1][range: (0,1)]**
   - It control the number of features (variables) supplied to a tree
   - Typically, its values lie between (0.5,0.9)
   
**8. lambda[default=0]**
   - It controls L2 regularization (equivalent to Ridge regression) on weights. It is used to avoid overfitting.
   
**9. alpha[default=1]**
   - It controls L1 regularization (equivalent to Lasso regression) on weights. In addition to shrinkage, enabling alpha also results in feature selection. Hence, it's more useful on high dimensional data sets.
   
## Parameters for Linear Booster
Using linear booster has relatively lesser parameters to tune, hence it computes much faster than gbtree booster.

**1. nrounds[default=100]**
   - It controls the maximum number of iterations (steps) required for gradient descent to converge.
   - Should be tuned using CV
   
**2. lambda[default=0]**
   - It enables Ridge Regression.
   
**3. alpha[default=1]**
   - It enables Lasso Regression. 

# 3. Learning Task Parameters
These parameters specify methods for the loss function and model evaluation. In addition to the parameters listed below, you are free to use a customized objective / evaluation function.

**1. Objective[default=reg:linear]**
   - reg:linear - for linear regression
   - binary:logistic - logistic regression for binary classification. It returns class probabilities
   - multi:softmax - multiclassification using softmax objective. It returns predicted class labels. It requires setting num_class parameter denoting number of unique prediction classes.
   - multi:softprob - multiclassification using softmax objective. It returns predicted class probabilities.
   
**2. eval_metric [no default, depends on objective selected]**
   - These metrics are used to evaluate a model's accuracy on validation data. For regression, default metric is RMSE. For classification, default metric is error.
   - Available error functions are as follows:
       -  mae - Mean Absolute Error (used in regression)
       - Logloss - Negative loglikelihood (used in classification)
       - AUC - Area under curve (used in classification)
       - RMSE - Root mean square error (used in regression)
       - error - Binary classification error rate [#wrong cases/#all cases]
       - mlogloss - multiclass logloss (used in classification)

### Python Implementation

**Problem Statement**:
The Pima Indians Diabetes Dataset involves predicting the onset of diabetes within 5 years in Pima Indians given medical details.
It is a binary (2-class) classification problem. The number of observations for each class is not balanced. There are 768 observations with 8 input variables and 1 output variable. Missing values are believed to be encoded with zero values. The variable names are as follows:
1.	Number of times pregnant.
2.	Plasma glucose concentration 2 hours in an oral glucose tolerance test.
3.	Diastolic blood pressure (mm Hg).
4.	Triceps skinfold thickness (mm).
5.	2-Hour serum insulin (mu U/ml).
6.	Body mass index (weight in kg/(height in m)^2).
7.	Diabetes pedigree function.
8.	Age (years).
9.	Is Diabetic (0 or 1).

In [2]:
!pip install xgboost

Collecting xgboost
  Downloading xgboost-1.1.1-py3-none-win_amd64.whl (54.4 MB)
Installing collected packages: xgboost
Successfully installed xgboost-1.1.1


In [1]:
import pandas as pd
import numpy as np
import xgboost as xgb
import pickle
from sklearn import datasets
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

In [4]:
# reading the features and the labels
data= pd.read_csv('pima-indians-diabetes.csv')

In [5]:
data.head()

Unnamed: 0,Number of times pregnant,Plasma glucose concentration,Diastolic blood pressure (mm Hg),Triceps skinfold thickness (mm),2-Hour serum insulin (mu U/ml),Body mass index (weight in kg/(height in m)^2),Diabetes pedigree function,Age,Is Diabetic
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [6]:
data.columns

Index(['Number of times pregnant', 'Plasma glucose concentration',
       'Diastolic blood pressure (mm Hg)', 'Triceps skinfold thickness (mm)',
       '2-Hour serum insulin (mu U/ml)',
       'Body mass index (weight in kg/(height in m)^2)',
       'Diabetes pedigree function', 'Age', 'Is Diabetic'],
      dtype='object')

In [7]:
cols = ['Plasma glucose concentration',
       'Diastolic blood pressure (mm Hg)', 'Triceps skinfold thickness (mm)',
       '2-Hour serum insulin (mu U/ml)',
       'Body mass index (weight in kg/(height in m)^2)',
       'Diabetes pedigree function', 'Age']

In [8]:
# as mentioned in the data description, the missing values have been replaced by zeroes. So, we are replacing zeroes with nan
for col in cols:
    data[col]=data[col].replace(0, np.nan)

In [9]:
# checking for missing values
data.isna().sum()

Number of times pregnant                            0
Plasma glucose concentration                        5
Diastolic blood pressure (mm Hg)                   35
Triceps skinfold thickness (mm)                   227
2-Hour serum insulin (mu U/ml)                    374
Body mass index (weight in kg/(height in m)^2)     11
Diabetes pedigree function                          0
Age                                                 0
Is Diabetic                                         0
dtype: int64

In [10]:
# imputing the missing values
data['Plasma glucose concentration']=data['Plasma glucose concentration'].fillna(data['Plasma glucose concentration'].mode()[0])
data['Diastolic blood pressure (mm Hg)']=data['Diastolic blood pressure (mm Hg)'].fillna(data['Diastolic blood pressure (mm Hg)'].mode()[0])
data['Triceps skinfold thickness (mm)']=data['Triceps skinfold thickness (mm)'].fillna(data['Triceps skinfold thickness (mm)'].mean())
data['2-Hour serum insulin (mu U/ml)']=data['2-Hour serum insulin (mu U/ml)'].fillna(data['2-Hour serum insulin (mu U/ml)'].mean())
data['Body mass index (weight in kg/(height in m)^2)']=data['Body mass index (weight in kg/(height in m)^2)'].fillna(data['Body mass index (weight in kg/(height in m)^2)'].mean())


In [11]:
# checking for missing values after imputation
data.isna().sum()

Number of times pregnant                          0
Plasma glucose concentration                      0
Diastolic blood pressure (mm Hg)                  0
Triceps skinfold thickness (mm)                   0
2-Hour serum insulin (mu U/ml)                    0
Body mass index (weight in kg/(height in m)^2)    0
Diabetes pedigree function                        0
Age                                               0
Is Diabetic                                       0
dtype: int64

In [12]:
#Separating the feature and the Label columns 
x=data.drop(labels='Is Diabetic', axis=1)
y= data['Is Diabetic']

In [13]:
x.head()

Unnamed: 0,Number of times pregnant,Plasma glucose concentration,Diastolic blood pressure (mm Hg),Triceps skinfold thickness (mm),2-Hour serum insulin (mu U/ml),Body mass index (weight in kg/(height in m)^2),Diabetes pedigree function,Age
0,6,148.0,72.0,35.0,155.548223,33.6,0.627,50
1,1,85.0,66.0,29.0,155.548223,26.6,0.351,31
2,8,183.0,64.0,29.15342,155.548223,23.3,0.672,32
3,1,89.0,66.0,23.0,94.0,28.1,0.167,21
4,0,137.0,40.0,35.0,168.0,43.1,2.288,33


In [14]:
# as the datapoints differ a lot in magnitude, we'll scale them
from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
scaled_data=scaler.fit_transform(x)

In [15]:
from sklearn.model_selection import train_test_split
train_x,test_x,train_y,test_y=train_test_split(scaled_data,y,test_size=0.3,random_state=42)

In [16]:
# fit model no training data
model = XGBClassifier(objective='binary:logistic')
model.fit(train_x, train_y)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=0, num_parallel_tree=1,
              objective='binary:logistic', random_state=0, reg_alpha=0,
              reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [17]:
# cheking training accuracy
y_pred = model.predict(train_x)
predictions = [round(value) for value in y_pred]
accuracy = accuracy_score(train_y,predictions)
accuracy

1.0

In [18]:
# cheking initial test accuracy
y_pred = model.predict(test_x)
predictions = [round(value) for value in y_pred]
accuracy = accuracy_score(test_y,predictions)
accuracy

0.7272727272727273

In [19]:
test_x[0]

array([ 0.63994726, -0.77251205, -1.18156252,  0.43784695,  0.40547846,
        0.22451019, -0.1264714 ,  0.83038113])

In [20]:
# Now to increae the accuracy of the model, we'll do hyperparameter tuning using grid search


In [21]:
from sklearn.model_selection import GridSearchCV

In [22]:
param_grid={
   
    ' learning_rate':[1,0.5,0.1,0.01,0.001],
    'max_depth': [3,5,10,20],
    'n_estimators':[10,50,100,200]
    
}

In [23]:
grid= GridSearchCV(XGBClassifier(objective='binary:logistic'),param_grid, verbose=3)

In [24]:
grid.fit(train_x,train_y)

Fitting 5 folds for each of 80 candidates, totalling 400 fits
[CV]  learning_rate=1, max_depth=3, n_estimators=10 ..................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=1, max_depth=3, n_estimators=10, score=0.852, total=   0.0s
[CV]  learning_rate=1, max_depth=3, n_estimators=10 ..................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=1, max_depth=3, n_estimators=10, score=0.750, total=   0.0s
[CV]  learning_rate=1, max_depth=3, n_estimators=10 ..

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s



[CV]  learning_rate=1, max_depth=3, n_estimators=50 ..................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=1, max_depth=3, n_estimators=50, score=0.757, total=   0.0s
[CV]  learning_rate=1, max_depth=3, n_estimators=50 ..................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=1, max_depth=3, n_estimators=50, score=0.692, total=   0.0s
[CV]  learning_rate=1, max_depth=3, n_estimators=50 ..................
Parameters: {  learning_rate } might not be 

[CV]   learning_rate=1, max_depth=5, n_estimators=10, score=0.766, total=   0.0s
[CV]  learning_rate=1, max_depth=5, n_estimators=50 ..................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=1, max_depth=5, n_estimators=50, score=0.778, total=   0.0s
[CV]  learning_rate=1, max_depth=5, n_estimators=50 ..................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=1, max_depth=5, n_estimators=50, score=0.759, total=   0.0s
[CV]  learning_rate=1, max_depth=5,

[CV]   learning_rate=1, max_depth=10, n_estimators=10, score=0.776, total=   0.0s
[CV]  learning_rate=1, max_depth=10, n_estimators=10 .................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=1, max_depth=10, n_estimators=10, score=0.710, total=   0.0s
[CV]  learning_rate=1, max_depth=10, n_estimators=10 .................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=1, max_depth=10, n_estimators=10, score=0.738, total=   0.0s
[CV]  learning_rate=1, max_depth

[CV]   learning_rate=1, max_depth=10, n_estimators=200, score=0.710, total=   0.1s
[CV]  learning_rate=1, max_depth=20, n_estimators=10 .................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=1, max_depth=20, n_estimators=10, score=0.787, total=   0.0s
[CV]  learning_rate=1, max_depth=20, n_estimators=10 .................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=1, max_depth=20, n_estimators=10, score=0.713, total=   0.0s
[CV]  learning_rate=1, max_dept

[CV]   learning_rate=1, max_depth=20, n_estimators=200, score=0.787, total=   0.2s
[CV]  learning_rate=1, max_depth=20, n_estimators=200 ................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=1, max_depth=20, n_estimators=200, score=0.731, total=   0.2s
[CV]  learning_rate=1, max_depth=20, n_estimators=200 ................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=1, max_depth=20, n_estimators=200, score=0.766, total=   0.2s
[CV]  learning_rate=1, max_de

[CV]   learning_rate=0.5, max_depth=3, n_estimators=100, score=0.722, total=   0.1s
[CV]  learning_rate=0.5, max_depth=3, n_estimators=100 ...............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.5, max_depth=3, n_estimators=100, score=0.738, total=   0.1s
[CV]  learning_rate=0.5, max_depth=3, n_estimators=100 ...............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.5, max_depth=3, n_estimators=100, score=0.701, total=   0.1s
[CV]  learning_rate=0.5, m

[CV]   learning_rate=0.5, max_depth=5, n_estimators=50, score=0.710, total=   0.0s
[CV]  learning_rate=0.5, max_depth=5, n_estimators=50 ................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.5, max_depth=5, n_estimators=50, score=0.692, total=   0.0s
[CV]  learning_rate=0.5, max_depth=5, n_estimators=100 ...............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.5, max_depth=5, n_estimators=100, score=0.778, total=   0.1s
[CV]  learning_rate=0.5, max

[CV]   learning_rate=0.5, max_depth=10, n_estimators=50, score=0.785, total=   0.1s
[CV]  learning_rate=0.5, max_depth=10, n_estimators=50 ...............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.5, max_depth=10, n_estimators=50, score=0.738, total=   0.1s
[CV]  learning_rate=0.5, max_depth=10, n_estimators=50 ...............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.5, max_depth=10, n_estimators=50, score=0.738, total=   0.1s
[CV]  learning_rate=0.5, m


[CV]   learning_rate=0.5, max_depth=20, n_estimators=50, score=0.731, total=   0.1s
[CV]  learning_rate=0.5, max_depth=20, n_estimators=50 ...............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.5, max_depth=20, n_estimators=50, score=0.766, total=   0.1s
[CV]  learning_rate=0.5, max_depth=20, n_estimators=50 ...............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.5, max_depth=20, n_estimators=50, score=0.710, total=   0.1s
[CV]  learning_rate=0.5, 


[CV]   learning_rate=0.1, max_depth=3, n_estimators=10, score=0.692, total=   0.0s
[CV]  learning_rate=0.1, max_depth=3, n_estimators=10 ................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.1, max_depth=3, n_estimators=10, score=0.748, total=   0.0s
[CV]  learning_rate=0.1, max_depth=3, n_estimators=50 ................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.1, max_depth=3, n_estimators=50, score=0.778, total=   0.0s
[CV]  learning_rate=0.1, max

[CV]   learning_rate=0.1, max_depth=3, n_estimators=200, score=0.710, total=   0.1s
[CV]  learning_rate=0.1, max_depth=3, n_estimators=200 ...............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.1, max_depth=3, n_estimators=200, score=0.757, total=   0.1s
[CV]  learning_rate=0.1, max_depth=5, n_estimators=10 ................
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.1, max_depth=5, n_estimators=10, score=0.787, total=   0.0s
[CV]  learning_rate=0.1, ma

[CV]   learning_rate=0.1, max_depth=5, n_estimators=200, score=0.778, total=   0.1s
[CV]  learning_rate=0.1, max_depth=5, n_estimators=200 ...............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.1, max_depth=5, n_estimators=200, score=0.750, total=   0.1s
[CV]  learning_rate=0.1, max_depth=5, n_estimators=200 ...............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.1, max_depth=5, n_estimators=200, score=0.785, total=   0.1s
[CV]  learning_rate=0.1, m

[CV]   learning_rate=0.1, max_depth=10, n_estimators=100, score=0.796, total=   0.1s
[CV]  learning_rate=0.1, max_depth=10, n_estimators=100 ..............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.1, max_depth=10, n_estimators=100, score=0.731, total=   0.1s
[CV]  learning_rate=0.1, max_depth=10, n_estimators=100 ..............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.1, max_depth=10, n_estimators=100, score=0.757, total=   0.1s
[CV]  learning_rate=0.1


[CV]   learning_rate=0.1, max_depth=20, n_estimators=50, score=0.731, total=   0.1s
[CV]  learning_rate=0.1, max_depth=20, n_estimators=50 ...............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.1, max_depth=20, n_estimators=50, score=0.766, total=   0.1s
[CV]  learning_rate=0.1, max_depth=20, n_estimators=50 ...............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.1, max_depth=20, n_estimators=50, score=0.710, total=   0.1s
[CV]  learning_rate=0.1, 

[CV]   learning_rate=0.01, max_depth=3, n_estimators=10, score=0.794, total=   0.0s
[CV]  learning_rate=0.01, max_depth=3, n_estimators=10 ...............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.01, max_depth=3, n_estimators=10, score=0.692, total=   0.0s
[CV]  learning_rate=0.01, max_depth=3, n_estimators=10 ...............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.01, max_depth=3, n_estimators=10, score=0.748, total=   0.0s
[CV]  learning_rate=0.01, 

[CV]   learning_rate=0.01, max_depth=3, n_estimators=200, score=0.710, total=   0.1s
[CV]  learning_rate=0.01, max_depth=3, n_estimators=200 ..............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.01, max_depth=3, n_estimators=200, score=0.757, total=   0.1s
[CV]  learning_rate=0.01, max_depth=5, n_estimators=10 ...............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.01, max_depth=5, n_estimators=10, score=0.787, total=   0.0s
[CV]  learning_rate=0.01

[CV]   learning_rate=0.01, max_depth=5, n_estimators=200, score=0.778, total=   0.1s
[CV]  learning_rate=0.01, max_depth=5, n_estimators=200 ..............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.01, max_depth=5, n_estimators=200, score=0.750, total=   0.2s
[CV]  learning_rate=0.01, max_depth=5, n_estimators=200 ..............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.01, max_depth=5, n_estimators=200, score=0.785, total=   0.1s
[CV]  learning_rate=0.0


[CV]   learning_rate=0.01, max_depth=10, n_estimators=100, score=0.757, total=   0.1s
[CV]  learning_rate=0.01, max_depth=10, n_estimators=100 .............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.01, max_depth=10, n_estimators=100, score=0.738, total=   0.1s
[CV]  learning_rate=0.01, max_depth=10, n_estimators=100 .............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.01, max_depth=10, n_estimators=100, score=0.710, total=   0.1s
[CV]  learning_rate

[CV]   learning_rate=0.01, max_depth=20, n_estimators=50, score=0.710, total=   0.1s
[CV]  learning_rate=0.01, max_depth=20, n_estimators=50 ..............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.01, max_depth=20, n_estimators=50, score=0.720, total=   0.1s
[CV]  learning_rate=0.01, max_depth=20, n_estimators=100 .............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.01, max_depth=20, n_estimators=100, score=0.778, total=   0.1s
[CV]  learning_rate=0.

[CV]  learning_rate=0.001, max_depth=3, n_estimators=50 ..............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.001, max_depth=3, n_estimators=50, score=0.776, total=   0.0s
[CV]  learning_rate=0.001, max_depth=3, n_estimators=100 .............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.001, max_depth=3, n_estimators=100, score=0.759, total=   0.0s
[CV]  learning_rate=0.001, max_depth=3, n_estimators=100 .............
Parameters: {  learning_rate } might


[CV]   learning_rate=0.001, max_depth=5, n_estimators=50, score=0.785, total=   0.0s
[CV]  learning_rate=0.001, max_depth=5, n_estimators=50 ..............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.001, max_depth=5, n_estimators=50, score=0.710, total=   0.0s
[CV]  learning_rate=0.001, max_depth=5, n_estimators=50 ..............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.001, max_depth=5, n_estimators=50, score=0.692, total=   0.0s
[CV]  learning_rate=0.

[CV]   learning_rate=0.001, max_depth=10, n_estimators=50, score=0.713, total=   0.1s
[CV]  learning_rate=0.001, max_depth=10, n_estimators=50 .............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.001, max_depth=10, n_estimators=50, score=0.785, total=   0.1s
[CV]  learning_rate=0.001, max_depth=10, n_estimators=50 .............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.001, max_depth=10, n_estimators=50, score=0.738, total=   0.1s
[CV]  learning_rate=

[CV]   learning_rate=0.001, max_depth=20, n_estimators=50, score=0.731, total=   0.1s
[CV]  learning_rate=0.001, max_depth=20, n_estimators=50 .............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.001, max_depth=20, n_estimators=50, score=0.766, total=   0.1s
[CV]  learning_rate=0.001, max_depth=20, n_estimators=50 .............
Parameters: {  learning_rate } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]   learning_rate=0.001, max_depth=20, n_estimators=50, score=0.710, total=   0.1s
[CV]  learning_rate=

[Parallel(n_jobs=1)]: Done 400 out of 400 | elapsed:   27.9s finished


GridSearchCV(cv=None, error_score=nan,
             estimator=XGBClassifier(base_score=None, booster=None,
                                     colsample_bylevel=None,
                                     colsample_bynode=None,
                                     colsample_bytree=None, gamma=None,
                                     gpu_id=None, importance_type='gain',
                                     interaction_constraints=None,
                                     learning_rate=None, max_delta_step=None,
                                     max_depth=None, min_child_weight=None,
                                     missing=nan, monotone_constraints=None,
                                     n_es...
                                     random_state=None, reg_alpha=None,
                                     reg_lambda=None, scale_pos_weight=None,
                                     subsample=None, tree_method=None,
                                     validate_parameters=None, 

In [25]:
# To  find the parameters givingmaximum accuracy
grid.best_params_

{' learning_rate': 1, 'max_depth': 3, 'n_estimators': 10}

In [26]:
# Create new model using the same parameters
new_model=XGBClassifier(learning_rate= 1, max_depth= 5, n_estimators= 50)
new_model.fit(train_x, train_y)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=1, max_delta_step=0, max_depth=5,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=50, n_jobs=0, num_parallel_tree=1,
              objective='binary:logistic', random_state=0, reg_alpha=0,
              reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [27]:
y_pred_new = new_model.predict(test_x)
predictions_new = [round(value) for value in y_pred_new]
accuracy_new = accuracy_score(test_y,predictions_new)
accuracy_new

0.7445887445887446

In [28]:
# As we have increased the accuracy of the model, we'll save this model

In [29]:
filename = 'xgboost_model.pickle'
pickle.dump(new_model, open(filename, 'wb'))

loaded_model = pickle.load(open(filename, 'rb'))

In [30]:
# we'll save the scaler object as well for prediction
filename_scaler = 'scaler_model.pickle'
pickle.dump(scaler, open(filename_scaler, 'wb'))

scaler_model = pickle.load(open(filename_scaler, 'rb'))

In [31]:
# Trying a random prediction
d=scaler_model.transform([[6,148,72,35,80,33.6,0.627,50]])
pred=loaded_model.predict(d)
print('This data belongs to class :',pred[0])

This data belongs to class : 1


**The main advantages:**
- out of the box feature of appropriate bias-variance trade-off,
- great computation speed as it utilises parallel computing and cache optimization,
- uses hardware optimization,
- works well even if the features are correlated
- robust even if there is noise for classification problem
- the facility of early stopping
- the package is evolving, i.e., new features are being added.