# XGBOsst praactical implementation on diabatic dataset 

**Problem Statement**:
The Pima Indians Diabetes Dataset involves predicting the onset of diabetes within 5 years in Pima Indians given medical details.
It is a binary (2-class) classification problem. The number of observations for each class is not balanced. There are 768 observations with 8 input variables and 1 output variable. Missing values are believed to be encoded with zero values. The variable names are as follows:
1.	Number of times pregnant.
2.	Plasma glucose concentration 2 hours in an oral glucose tolerance test.
3.	Diastolic blood pressure (mm Hg).
4.	Triceps skinfold thickness (mm).
5.	2-Hour serum insulin (mu U/ml).
6.	Body mass index (weight in kg/(height in m)^2).
7.	Diabetes pedigree function.
8.	Age (years).
9.	Is Diabetic (0 or 1).

In [3]:
import pandas as pd
import numpy as np
import xgboost as xgb
import pickle
from sklearn import datasets
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

In [5]:
# reading the features and the labels
data= pd.read_csv('pima-indians-diabetes.csv')

In [6]:
data.columns

Index(['Number of times pregnant', 'Plasma glucose concentration',
       'Diastolic blood pressure (mm Hg)', 'Triceps skinfold thickness (mm)',
       '2-Hour serum insulin (mu U/ml)',
       'Body mass index (weight in kg/(height in m)^2)',
       'Diabetes pedigree function', 'Age', 'Is Diabetic'],
      dtype='object')

In [7]:
cols = ['Plasma glucose concentration',
       'Diastolic blood pressure (mm Hg)', 'Triceps skinfold thickness (mm)',
       '2-Hour serum insulin (mu U/ml)',
       'Body mass index (weight in kg/(height in m)^2)',
       'Diabetes pedigree function', 'Age']

In [8]:
# as mentioned in the data description, the missing values have been replaced by zeroes. So, we are replacing zeroes with nan
for col in cols:
    data[col]=data[col].replace(0, np.nan)

In [9]:
# checking for missing values
data.isna().sum()

Number of times pregnant                            0
Plasma glucose concentration                        5
Diastolic blood pressure (mm Hg)                   35
Triceps skinfold thickness (mm)                   227
2-Hour serum insulin (mu U/ml)                    374
Body mass index (weight in kg/(height in m)^2)     11
Diabetes pedigree function                          0
Age                                                 0
Is Diabetic                                         0
dtype: int64

In [10]:
# imputing the missing values
data['Plasma glucose concentration']=data['Plasma glucose concentration'].fillna(data['Plasma glucose concentration'].mode()[0])
data['Diastolic blood pressure (mm Hg)']=data['Diastolic blood pressure (mm Hg)'].fillna(data['Diastolic blood pressure (mm Hg)'].mode()[0])
data['Triceps skinfold thickness (mm)']=data['Triceps skinfold thickness (mm)'].fillna(data['Triceps skinfold thickness (mm)'].mean())
data['2-Hour serum insulin (mu U/ml)']=data['2-Hour serum insulin (mu U/ml)'].fillna(data['2-Hour serum insulin (mu U/ml)'].mean())
data['Body mass index (weight in kg/(height in m)^2)']=data['Body mass index (weight in kg/(height in m)^2)'].fillna(data['Body mass index (weight in kg/(height in m)^2)'].mean())


In [11]:
# checking for missing values after imputation
data.isna().sum()

Number of times pregnant                          0
Plasma glucose concentration                      0
Diastolic blood pressure (mm Hg)                  0
Triceps skinfold thickness (mm)                   0
2-Hour serum insulin (mu U/ml)                    0
Body mass index (weight in kg/(height in m)^2)    0
Diabetes pedigree function                        0
Age                                               0
Is Diabetic                                       0
dtype: int64

In [12]:
#Separating the feature and the Label columns 
x=data.drop(labels='Is Diabetic', axis=1)
y= data['Is Diabetic']

In [13]:
x.head()

Unnamed: 0,Number of times pregnant,Plasma glucose concentration,Diastolic blood pressure (mm Hg),Triceps skinfold thickness (mm),2-Hour serum insulin (mu U/ml),Body mass index (weight in kg/(height in m)^2),Diabetes pedigree function,Age
0,6,148.0,72.0,35.0,155.548223,33.6,0.627,50
1,1,85.0,66.0,29.0,155.548223,26.6,0.351,31
2,8,183.0,64.0,29.15342,155.548223,23.3,0.672,32
3,1,89.0,66.0,23.0,94.0,28.1,0.167,21
4,0,137.0,40.0,35.0,168.0,43.1,2.288,33


In [14]:
# as the datapoints differ a lot in magnitude, we'll scale them
from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
scaled_data=scaler.fit_transform(x)

In [15]:
from sklearn.model_selection import train_test_split
train_x,test_x,train_y,test_y=train_test_split(scaled_data,y,test_size=0.3,random_state=42)

In [16]:
# fit model no training data
model = XGBClassifier(objective='binary:logistic')
model.fit(train_x, train_y)





XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.300000012, max_delta_step=0, max_depth=6,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=100, n_jobs=8, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

1. There are many parameter present inside algorithm

In [17]:
# cheking training accuracy
y_pred = model.predict(train_x)
predictions = [round(value) for value in y_pred]
accuracy = accuracy_score(train_y,predictions)
accuracy

1.0

In [18]:
# cheking initial test accuracy
y_pred = model.predict(test_x)
predictions = [round(value) for value in y_pred]
accuracy = accuracy_score(test_y,predictions)
accuracy

0.7272727272727273

#### Increase accuracy of model by optimized input parameters

In [20]:
from sklearn.model_selection import GridSearchCV

In [21]:
param_grid={
   
    ' learning_rate':[1,0.5,0.1,0.01,0.001],
    'max_depth': [3,5,10,20],
    'n_estimators':[10,50,100,200]
    
}

In [22]:
grid= GridSearchCV(XGBClassifier(objective='binary:logistic'),param_grid, verbose=3)

In [23]:
grid.fit(train_x,train_y)

Fitting 5 folds for each of 80 candidates, totalling 400 fits
[CV 1/5] END . learning_rate=1, max_depth=3, n_estimators=10; total time=   0.0s
[CV 2/5] END . learning_rate=1, max_depth=3, n_estimators=10; total time=   0.0s
[CV 3/5] END . learning_rate=1, max_depth=3, n_estimators=10; total time=   0.0s
[CV 4/5] END . learning_rate=1, max_depth=3, n_estimators=10; total time=   0.0s
[CV 5/5] END . learning_rate=1, max_depth=3, n_estimators=10; total time=   0.0s
[CV 1/5] END . learning_rate=1, max_depth=3, n_estimators=50; total time=   0.0s
[CV 2/5] END . learning_rate=1, max_depth=3, n_estimators=50; total time=   0.0s
[CV 3/5] END . learning_rate=1, max_depth=3, n_estimators=50; total time=   0.0s
[CV 4/5] END . learning_rate=1, max_depth=3, n_estimators=50; total time=   0.0s
[CV 5/5] END . learning_rate=1, max_depth=3, n_estimators=50; total time=   0.0s
[CV 1/5] END  learning_rate=1, max_depth=3, n_estimators=100; total time=   0.0s
[CV 2/5] END  learning_rate=1, max_depth=3, n_e

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

[CV 2/5] END . learning_rate=1, max_depth=5, n_estimators=10; total time=   0.0s
[CV 3/5] END . learning_rate=1, max_depth=5, n_estimators=10; total time=   0.0s
[CV 4/5] END . learning_rate=1, max_depth=5, n_estimators=10; total time=   0.0s
[CV 5/5] END . learning_rate=1, max_depth=5, n_estimators=10; total time=   0.0s
[CV 1/5] END . learning_rate=1, max_depth=5, n_estimators=50; total time=   0.0s
[CV 2/5] END . learning_rate=1, max_depth=5, n_estimators=50; total time=   0.0s
[CV 3/5] END . learning_rate=1, max_depth=5, n_estimators=50; total time=   0.0s
[CV 4/5] END . learning_rate=1, max_depth=5, n_estimators=50; total time=   0.0s
[CV 5/5] END . learning_rate=1, max_depth=5, n_estimators=50; total time=   0.0s
[CV 1/5] END  learning_rate=1, max_depth=5, n_estimators=100; total time=   0.0s
[CV 2/5] END  learning_rate=1, max_depth=5, n_estimators=100; total time=   0.0s
[CV 3/5] END  learning_rate=1, max_depth=5, n_estimators=100; total time=   0.0s
[CV 4/5] END  learning_rate=

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

[CV 5/5] END  learning_rate=1, max_depth=10, n_estimators=50; total time=   0.0s
[CV 1/5] END  learning_rate=1, max_depth=10, n_estimators=100; total time=   0.0s
[CV 2/5] END  learning_rate=1, max_depth=10, n_estimators=100; total time=   0.0s
[CV 3/5] END  learning_rate=1, max_depth=10, n_estimators=100; total time=   0.0s
[CV 4/5] END  learning_rate=1, max_depth=10, n_estimators=100; total time=   0.0s
[CV 5/5] END  learning_rate=1, max_depth=10, n_estimators=100; total time=   0.0s
[CV 1/5] END  learning_rate=1, max_depth=10, n_estimators=200; total time=   0.0s
[CV 2/5] END  learning_rate=1, max_depth=10, n_estimators=200; total time=   0.0s
[CV 3/5] END  learning_rate=1, max_depth=10, n_estimators=200; total time=   0.0s
[CV 4/5] END  learning_rate=1, max_depth=10, n_estimators=200; total time=   0.0s
[CV 5/5] END  learning_rate=1, max_depth=10, n_estimators=200; total time=   0.0s
[CV 1/5] END  learning_rate=1, max_depth=20, n_estimators=10; total time=   0.0s
[CV 2/5] END  lear

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

[CV 4/5] END  learning_rate=1, max_depth=20, n_estimators=200; total time=   0.0s
[CV 5/5] END  learning_rate=1, max_depth=20, n_estimators=200; total time=   0.0s
[CV 1/5] END  learning_rate=0.5, max_depth=3, n_estimators=10; total time=   0.0s
[CV 2/5] END  learning_rate=0.5, max_depth=3, n_estimators=10; total time=   0.0s
[CV 3/5] END  learning_rate=0.5, max_depth=3, n_estimators=10; total time=   0.0s
[CV 4/5] END  learning_rate=0.5, max_depth=3, n_estimators=10; total time=   0.0s
[CV 5/5] END  learning_rate=0.5, max_depth=3, n_estimators=10; total time=   0.0s
[CV 1/5] END  learning_rate=0.5, max_depth=3, n_estimators=50; total time=   0.0s
[CV 2/5] END  learning_rate=0.5, max_depth=3, n_estimators=50; total time=   0.0s
[CV 3/5] END  learning_rate=0.5, max_depth=3, n_estimators=50; total time=   0.0s
[CV 4/5] END  learning_rate=0.5, max_depth=3, n_estimators=50; total time=   0.0s
[CV 5/5] END  learning_rate=0.5, max_depth=3, n_estimators=50; total time=   0.0s
[CV 1/5] END  le

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

[CV 1/5] END  learning_rate=0.5, max_depth=5, n_estimators=50; total time=   0.0s
[CV 2/5] END  learning_rate=0.5, max_depth=5, n_estimators=50; total time=   0.0s
[CV 3/5] END  learning_rate=0.5, max_depth=5, n_estimators=50; total time=   0.0s
[CV 4/5] END  learning_rate=0.5, max_depth=5, n_estimators=50; total time=   0.0s
[CV 5/5] END  learning_rate=0.5, max_depth=5, n_estimators=50; total time=   0.0s
[CV 1/5] END  learning_rate=0.5, max_depth=5, n_estimators=100; total time=   0.0s
[CV 2/5] END  learning_rate=0.5, max_depth=5, n_estimators=100; total time=   0.0s
[CV 3/5] END  learning_rate=0.5, max_depth=5, n_estimators=100; total time=   0.0s
[CV 4/5] END  learning_rate=0.5, max_depth=5, n_estimators=100; total time=   0.0s
[CV 5/5] END  learning_rate=0.5, max_depth=5, n_estimators=100; total time=   0.0s
[CV 1/5] END  learning_rate=0.5, max_depth=5, n_estimators=200; total time=   0.0s
[CV 2/5] END  learning_rate=0.5, max_depth=5, n_estimators=200; total time=   0.0s
[CV 3/5] 

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

[CV 1/5] END  learning_rate=0.5, max_depth=10, n_estimators=100; total time=   0.0s
[CV 2/5] END  learning_rate=0.5, max_depth=10, n_estimators=100; total time=   0.0s
[CV 3/5] END  learning_rate=0.5, max_depth=10, n_estimators=100; total time=   0.0s
[CV 4/5] END  learning_rate=0.5, max_depth=10, n_estimators=100; total time=   0.0s
[CV 5/5] END  learning_rate=0.5, max_depth=10, n_estimators=100; total time=   0.0s
[CV 1/5] END  learning_rate=0.5, max_depth=10, n_estimators=200; total time=   0.0s
[CV 2/5] END  learning_rate=0.5, max_depth=10, n_estimators=200; total time=   0.0s
[CV 3/5] END  learning_rate=0.5, max_depth=10, n_estimators=200; total time=   0.0s
[CV 4/5] END  learning_rate=0.5, max_depth=10, n_estimators=200; total time=   0.0s
[CV 5/5] END  learning_rate=0.5, max_depth=10, n_estimators=200; total time=   0.0s
[CV 1/5] END  learning_rate=0.5, max_depth=20, n_estimators=10; total time=   0.0s
[CV 2/5] END  learning_rate=0.5, max_depth=20, n_estimators=10; total time=  

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

[CV 5/5] END  learning_rate=0.5, max_depth=20, n_estimators=100; total time=   0.0s
[CV 1/5] END  learning_rate=0.5, max_depth=20, n_estimators=200; total time=   0.0s
[CV 2/5] END  learning_rate=0.5, max_depth=20, n_estimators=200; total time=   0.0s
[CV 3/5] END  learning_rate=0.5, max_depth=20, n_estimators=200; total time=   0.0s
[CV 4/5] END  learning_rate=0.5, max_depth=20, n_estimators=200; total time=   0.0s
[CV 5/5] END  learning_rate=0.5, max_depth=20, n_estimators=200; total time=   0.0s
[CV 1/5] END  learning_rate=0.1, max_depth=3, n_estimators=10; total time=   0.0s
[CV 2/5] END  learning_rate=0.1, max_depth=3, n_estimators=10; total time=   0.0s
[CV 3/5] END  learning_rate=0.1, max_depth=3, n_estimators=10; total time=   0.0s
[CV 4/5] END  learning_rate=0.1, max_depth=3, n_estimators=10; total time=   0.0s
[CV 5/5] END  learning_rate=0.1, max_depth=3, n_estimators=10; total time=   0.0s
[CV 1/5] END  learning_rate=0.1, max_depth=3, n_estimators=50; total time=   0.0s
[CV 

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

[CV 2/5] END  learning_rate=0.1, max_depth=5, n_estimators=10; total time=   0.0s
[CV 3/5] END  learning_rate=0.1, max_depth=5, n_estimators=10; total time=   0.0s
[CV 4/5] END  learning_rate=0.1, max_depth=5, n_estimators=10; total time=   0.0s
[CV 5/5] END  learning_rate=0.1, max_depth=5, n_estimators=10; total time=   0.0s
[CV 1/5] END  learning_rate=0.1, max_depth=5, n_estimators=50; total time=   0.0s
[CV 2/5] END  learning_rate=0.1, max_depth=5, n_estimators=50; total time=   0.0s
[CV 3/5] END  learning_rate=0.1, max_depth=5, n_estimators=50; total time=   0.0s
[CV 4/5] END  learning_rate=0.1, max_depth=5, n_estimators=50; total time=   0.0s
[CV 5/5] END  learning_rate=0.1, max_depth=5, n_estimators=50; total time=   0.0s
[CV 1/5] END  learning_rate=0.1, max_depth=5, n_estimators=100; total time=   0.0s
[CV 2/5] END  learning_rate=0.1, max_depth=5, n_estimators=100; total time=   0.0s
[CV 3/5] END  learning_rate=0.1, max_depth=5, n_estimators=100; total time=   0.0s
[CV 4/5] END 

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

[CV 2/5] END  learning_rate=0.1, max_depth=10, n_estimators=50; total time=   0.0s
[CV 3/5] END  learning_rate=0.1, max_depth=10, n_estimators=50; total time=   0.0s
[CV 4/5] END  learning_rate=0.1, max_depth=10, n_estimators=50; total time=   0.0s
[CV 5/5] END  learning_rate=0.1, max_depth=10, n_estimators=50; total time=   0.0s
[CV 1/5] END  learning_rate=0.1, max_depth=10, n_estimators=100; total time=   0.0s
[CV 2/5] END  learning_rate=0.1, max_depth=10, n_estimators=100; total time=   0.0s
[CV 3/5] END  learning_rate=0.1, max_depth=10, n_estimators=100; total time=   0.0s
[CV 4/5] END  learning_rate=0.1, max_depth=10, n_estimators=100; total time=   0.0s
[CV 5/5] END  learning_rate=0.1, max_depth=10, n_estimators=100; total time=   0.0s
[CV 1/5] END  learning_rate=0.1, max_depth=10, n_estimators=200; total time=   0.0s
[CV 2/5] END  learning_rate=0.1, max_depth=10, n_estimators=200; total time=   0.0s
[CV 3/5] END  learning_rate=0.1, max_depth=10, n_estimators=200; total time=   0

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

[CV 1/5] END  learning_rate=0.1, max_depth=20, n_estimators=100; total time=   0.0s
[CV 2/5] END  learning_rate=0.1, max_depth=20, n_estimators=100; total time=   0.0s
[CV 3/5] END  learning_rate=0.1, max_depth=20, n_estimators=100; total time=   0.0s
[CV 4/5] END  learning_rate=0.1, max_depth=20, n_estimators=100; total time=   0.0s
[CV 5/5] END  learning_rate=0.1, max_depth=20, n_estimators=100; total time=   0.0s
[CV 1/5] END  learning_rate=0.1, max_depth=20, n_estimators=200; total time=   0.0s
[CV 2/5] END  learning_rate=0.1, max_depth=20, n_estimators=200; total time=   0.0s
[CV 3/5] END  learning_rate=0.1, max_depth=20, n_estimators=200; total time=   0.0s
[CV 4/5] END  learning_rate=0.1, max_depth=20, n_estimators=200; total time=   0.0s
[CV 5/5] END  learning_rate=0.1, max_depth=20, n_estimators=200; total time=   0.0s
[CV 1/5] END  learning_rate=0.01, max_depth=3, n_estimators=10; total time=   0.0s
[CV 2/5] END  learning_rate=0.01, max_depth=3, n_estimators=10; total time=  

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

[CV 4/5] END  learning_rate=0.01, max_depth=3, n_estimators=100; total time=   0.0s
[CV 5/5] END  learning_rate=0.01, max_depth=3, n_estimators=100; total time=   0.0s
[CV 1/5] END  learning_rate=0.01, max_depth=3, n_estimators=200; total time=   0.0s
[CV 2/5] END  learning_rate=0.01, max_depth=3, n_estimators=200; total time=   0.0s
[CV 3/5] END  learning_rate=0.01, max_depth=3, n_estimators=200; total time=   0.0s
[CV 4/5] END  learning_rate=0.01, max_depth=3, n_estimators=200; total time=   0.0s
[CV 5/5] END  learning_rate=0.01, max_depth=3, n_estimators=200; total time=   0.0s
[CV 1/5] END  learning_rate=0.01, max_depth=5, n_estimators=10; total time=   0.0s
[CV 2/5] END  learning_rate=0.01, max_depth=5, n_estimators=10; total time=   0.0s
[CV 3/5] END  learning_rate=0.01, max_depth=5, n_estimators=10; total time=   0.0s
[CV 4/5] END  learning_rate=0.01, max_depth=5, n_estimators=10; total time=   0.0s
[CV 5/5] END  learning_rate=0.01, max_depth=5, n_estimators=10; total time=   0.

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

[CV 3/5] END  learning_rate=0.01, max_depth=5, n_estimators=200; total time=   0.0s
[CV 4/5] END  learning_rate=0.01, max_depth=5, n_estimators=200; total time=   0.0s
[CV 5/5] END  learning_rate=0.01, max_depth=5, n_estimators=200; total time=   0.0s
[CV 1/5] END  learning_rate=0.01, max_depth=10, n_estimators=10; total time=   0.0s
[CV 2/5] END  learning_rate=0.01, max_depth=10, n_estimators=10; total time=   0.0s
[CV 3/5] END  learning_rate=0.01, max_depth=10, n_estimators=10; total time=   0.0s
[CV 4/5] END  learning_rate=0.01, max_depth=10, n_estimators=10; total time=   0.0s
[CV 5/5] END  learning_rate=0.01, max_depth=10, n_estimators=10; total time=   0.0s
[CV 1/5] END  learning_rate=0.01, max_depth=10, n_estimators=50; total time=   0.0s
[CV 2/5] END  learning_rate=0.01, max_depth=10, n_estimators=50; total time=   0.0s
[CV 3/5] END  learning_rate=0.01, max_depth=10, n_estimators=50; total time=   0.0s
[CV 4/5] END  learning_rate=0.01, max_depth=10, n_estimators=50; total time=

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

[CV 3/5] END  learning_rate=0.01, max_depth=20, n_estimators=10; total time=   0.0s
[CV 4/5] END  learning_rate=0.01, max_depth=20, n_estimators=10; total time=   0.0s
[CV 5/5] END  learning_rate=0.01, max_depth=20, n_estimators=10; total time=   0.0s
[CV 1/5] END  learning_rate=0.01, max_depth=20, n_estimators=50; total time=   0.0s
[CV 2/5] END  learning_rate=0.01, max_depth=20, n_estimators=50; total time=   0.0s
[CV 3/5] END  learning_rate=0.01, max_depth=20, n_estimators=50; total time=   0.0s
[CV 4/5] END  learning_rate=0.01, max_depth=20, n_estimators=50; total time=   0.0s
[CV 5/5] END  learning_rate=0.01, max_depth=20, n_estimators=50; total time=   0.0s
[CV 1/5] END  learning_rate=0.01, max_depth=20, n_estimators=100; total time=   0.0s
[CV 2/5] END  learning_rate=0.01, max_depth=20, n_estimators=100; total time=   0.0s
[CV 3/5] END  learning_rate=0.01, max_depth=20, n_estimators=100; total time=   0.0s
[CV 4/5] END  learning_rate=0.01, max_depth=20, n_estimators=100; total t

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

[CV 5/5] END  learning_rate=0.001, max_depth=3, n_estimators=50; total time=   0.0s
[CV 1/5] END  learning_rate=0.001, max_depth=3, n_estimators=100; total time=   0.0s
[CV 2/5] END  learning_rate=0.001, max_depth=3, n_estimators=100; total time=   0.0s
[CV 3/5] END  learning_rate=0.001, max_depth=3, n_estimators=100; total time=   0.0s
[CV 4/5] END  learning_rate=0.001, max_depth=3, n_estimators=100; total time=   0.0s
[CV 5/5] END  learning_rate=0.001, max_depth=3, n_estimators=100; total time=   0.0s
[CV 1/5] END  learning_rate=0.001, max_depth=3, n_estimators=200; total time=   0.0s
[CV 2/5] END  learning_rate=0.001, max_depth=3, n_estimators=200; total time=   0.0s
[CV 3/5] END  learning_rate=0.001, max_depth=3, n_estimators=200; total time=   0.0s
[CV 4/5] END  learning_rate=0.001, max_depth=3, n_estimators=200; total time=   0.0s
[CV 5/5] END  learning_rate=0.001, max_depth=3, n_estimators=200; total time=   0.0s
[CV 1/5] END  learning_rate=0.001, max_depth=5, n_estimators=10; t

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

[CV 5/5] END  learning_rate=0.001, max_depth=5, n_estimators=100; total time=   0.0s
[CV 1/5] END  learning_rate=0.001, max_depth=5, n_estimators=200; total time=   0.0s
[CV 2/5] END  learning_rate=0.001, max_depth=5, n_estimators=200; total time=   0.0s
[CV 3/5] END  learning_rate=0.001, max_depth=5, n_estimators=200; total time=   0.0s
[CV 4/5] END  learning_rate=0.001, max_depth=5, n_estimators=200; total time=   0.0s
[CV 5/5] END  learning_rate=0.001, max_depth=5, n_estimators=200; total time=   0.0s
[CV 1/5] END  learning_rate=0.001, max_depth=10, n_estimators=10; total time=   0.0s
[CV 2/5] END  learning_rate=0.001, max_depth=10, n_estimators=10; total time=   0.0s
[CV 3/5] END  learning_rate=0.001, max_depth=10, n_estimators=10; total time=   0.0s
[CV 4/5] END  learning_rate=0.001, max_depth=10, n_estimators=10; total time=   0.0s
[CV 5/5] END  learning_rate=0.001, max_depth=10, n_estimators=10; total time=   0.0s
[CV 1/5] END  learning_rate=0.001, max_depth=10, n_estimators=50;

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

[CV 5/5] END  learning_rate=0.001, max_depth=10, n_estimators=200; total time=   0.0s
[CV 1/5] END  learning_rate=0.001, max_depth=20, n_estimators=10; total time=   0.0s
[CV 2/5] END  learning_rate=0.001, max_depth=20, n_estimators=10; total time=   0.0s
[CV 3/5] END  learning_rate=0.001, max_depth=20, n_estimators=10; total time=   0.0s
[CV 4/5] END  learning_rate=0.001, max_depth=20, n_estimators=10; total time=   0.0s
[CV 5/5] END  learning_rate=0.001, max_depth=20, n_estimators=10; total time=   0.0s
[CV 1/5] END  learning_rate=0.001, max_depth=20, n_estimators=50; total time=   0.0s
[CV 2/5] END  learning_rate=0.001, max_depth=20, n_estimators=50; total time=   0.0s
[CV 3/5] END  learning_rate=0.001, max_depth=20, n_estimators=50; total time=   0.0s
[CV 4/5] END  learning_rate=0.001, max_depth=20, n_estimators=50; total time=   0.0s
[CV 5/5] END  learning_rate=0.001, max_depth=20, n_estimators=50; total time=   0.0s
[CV 1/5] END  learning_rate=0.001, max_depth=20, n_estimators=10

Traceback (most recent call last):
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 593, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 436, in inner_f
    return f(**kwargs)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\sklearn.py", line 1176, in fit
    self._Booster = train(
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 189, in train
    bst = _train_internal(params, dtrain,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\training.py", line 81, in _train_internal
    bst.update(dtrain, i, obj)
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 1499, in update
    _check_call(_LIB.XGBoosterUpdateOneIter(self.handle,
  File "C:\Users\CHINTAN PATEL\anaconda3\lib\site-packages\xgboost\core.py", line 210, in _check_call
  

XGBoostError: [18:34:28] C:/Users/Administrator/workspace/xgboost-win64_release_1.4.0/src/learner.cc:550: Invalid parameter " learning_rate" contains whitespace.

In [24]:
# To  find the parameters givingmaximum accuracy
grid.best_params_

{' learning_rate': 1, 'max_depth': 3, 'n_estimators': 10}

In [26]:
# Create new model using the same parameters
new_model=XGBClassifier(learning_rate= 1, max_depth= 3, n_estimators= 10)
new_model.fit(train_x, train_y)



XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=10, n_jobs=8, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

In [27]:
y_pred_new = new_model.predict(test_x)
predictions_new = [round(value) for value in y_pred_new]
accuracy_new = accuracy_score(test_y,predictions_new)
accuracy_new

0.7532467532467533

In [28]:
#Save model inside pickle file as following
filename = 'xgboost_model.pickle'
pickle.dump(new_model, open(filename, 'wb'))

loaded_model = pickle.load(open(filename, 'rb'))

In [29]:
# we'll save the scaler object as well for prediction
filename_scaler = 'scaler_model.pickle'
pickle.dump(scaler, open(filename_scaler, 'wb'))

scaler_model = pickle.load(open(filename_scaler, 'rb'))