# The Pima Indians Diabetes Dataset

#### Problem Statement

The Pima Indians Diabetes Dataset is used to predict the onset of diabetes within 5 years in Pima Indians from medical measurements. It is a binary (2-class) classification problem with imbalanced classes: about 35% of the observations are positive. There are 768 observations with 8 input variables and 1 output variable. Missing values are believed to be encoded as zeros. The variables are as follows:

1. Number of times pregnant.
2. Plasma glucose concentration 2 hours in an oral glucose tolerance test.
3. Diastolic blood pressure (mm Hg).
4. Triceps skinfold thickness (mm).
5. 2-Hour serum insulin (mu U/ml).
6. Body mass index (weight in kg/(height in m)^2).
7. Diabetes pedigree function.
8. Age (years).
9. Is Diabetic (0 or 1).
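
Since missing values are believed to be encoded as zeros, a quick first check is to count the zeros in each column. The sketch below uses a tiny hand-made frame with shortened, hypothetical column names (`pregnancies`, `glucose`, `insulin`) and made-up values, not the real file.

```python
import pandas as pd

# Toy frame with hypothetical short column names; values are illustrative only.
toy = pd.DataFrame({
    "pregnancies": [6, 1, 0, 1],   # zero is a legitimate value here
    "glucose": [148, 85, 0, 89],   # zero is physiologically implausible
    "insulin": [0, 0, 94, 168],    # zero most likely marks a missing measurement
})

# Count zeros per column to see where missing values may be hiding.
zero_counts = (toy == 0).sum()
print(zero_counts)
```

A zero count alone does not say whether a zero is a real value or a placeholder; that decision depends on what the variable measures.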

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
import xgboost as xgb
from xgboost import XGBRFClassifier
from sklearn import datasets


In [2]:
data=pd.read_csv('pima-indians-diabetes.csv')
data

Unnamed: 0,Number of times pregnant,Plasma glucose concentration,Diastolic blood pressure (mm Hg),Triceps skinfold thickness (mm),2-Hour serum insulin (mu U/ml),Body mass index (weight in kg/(height in m)^2),Diabetes pedigree function,Age,Is Diabetic
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1
...,...,...,...,...,...,...,...,...,...
763,10,101,76,48,180,32.9,0.171,63,0
764,2,122,70,27,0,36.8,0.340,27,0
765,5,121,72,23,112,26.2,0.245,30,0
766,1,126,60,0,0,30.1,0.349,47,1


In [3]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                                          Non-Null Count  Dtype  
---  ------                                          --------------  -----  
 0   Number of times pregnant                        768 non-null    int64  
 1   Plasma glucose concentration                    768 non-null    int64  
 2   Diastolic blood pressure (mm Hg)                768 non-null    int64  
 3   Triceps skinfold thickness (mm)                 768 non-null    int64  
 4   2-Hour serum insulin (mu U/ml)                  768 non-null    int64  
 5   Body mass index (weight in kg/(height in m)^2)  768 non-null    float64
 6   Diabetes pedigree function                      768 non-null    float64
 7   Age                                             768 non-null    int64  
 8   Is Diabetic                                     768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB

In [4]:
data.describe()

Unnamed: 0,Number of times pregnant,Plasma glucose concentration,Diastolic blood pressure (mm Hg),Triceps skinfold thickness (mm),2-Hour serum insulin (mu U/ml),Body mass index (weight in kg/(height in m)^2),Diabetes pedigree function,Age,Is Diabetic
count,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0
mean,3.845052,120.894531,69.105469,20.536458,79.799479,31.992578,0.471876,33.240885,0.348958
std,3.369578,31.972618,19.355807,15.952218,115.244002,7.88416,0.331329,11.760232,0.476951
min,0.0,0.0,0.0,0.0,0.0,0.0,0.078,21.0,0.0
25%,1.0,99.0,62.0,0.0,0.0,27.3,0.24375,24.0,0.0
50%,3.0,117.0,72.0,23.0,30.5,32.0,0.3725,29.0,0.0
75%,6.0,140.25,80.0,32.0,127.25,36.6,0.62625,41.0,1.0
max,17.0,199.0,122.0,99.0,846.0,67.1,2.42,81.0,1.0


In [5]:
# As mentioned in the data description, missing values were encoded as zeros, so we replace the zeros with NaN.

column=['Plasma glucose concentration',
       'Diastolic blood pressure (mm Hg)', 'Triceps skinfold thickness (mm)',
       '2-Hour serum insulin (mu U/ml)',
       'Body mass index (weight in kg/(height in m)^2)',
       'Diabetes pedigree function', 'Age']

In [6]:
for col in column:
    data[col] = data[col].replace(0, np.nan)

In [7]:
data.isna().sum()

Number of times pregnant                            0
Plasma glucose concentration                        5
Diastolic blood pressure (mm Hg)                   35
Triceps skinfold thickness (mm)                   227
2-Hour serum insulin (mu U/ml)                    374
Body mass index (weight in kg/(height in m)^2)     11
Diabetes pedigree function                          0
Age                                                 0
Is Diabetic                                         0
dtype: int64

In [8]:
data['Plasma glucose concentration'].mode()[0]

99.0

In [9]:
# Fill glucose and blood pressure with the column mode.
for col in ['Plasma glucose concentration', 'Diastolic blood pressure (mm Hg)']:
    data[col] = data[col].fillna(data[col].mode()[0])

# Fill skinfold thickness, insulin and BMI with the column mean.
for col in ['Triceps skinfold thickness (mm)',
            '2-Hour serum insulin (mu U/ml)',
            'Body mass index (weight in kg/(height in m)^2)']:
    data[col] = data[col].fillna(data[col].mean())
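
The cell above computes the fill values from the full dataset, before any train/test split. One possible alternative, sketched here on toy data with a hypothetical column name `insulin`, is to learn the fill value from the training rows only and reuse it for the test rows, so that no information from the test set enters the preprocessing.

```python
import numpy as np
import pandas as pd

# Toy data with a hypothetical column name; NaN marks a missing measurement.
train = pd.DataFrame({"insulin": [94.0, np.nan, 168.0, 88.0]})
test = pd.DataFrame({"insulin": [np.nan, 112.0]})

# Learn the fill value from the training rows only ...
fill_value = train["insulin"].mean()

# ... and reuse that same value for both splits.
train_filled = train.fillna({"insulin": fill_value})
test_filled = test.fillna({"insulin": fill_value})

print(fill_value)
print(test_filled)
```

This is a sketch of the idea, not the approach used in this notebook; the same pattern applies to the mode-based fills.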


In [10]:
data.isna().sum()

Number of times pregnant                          0
Plasma glucose concentration                      0
Diastolic blood pressure (mm Hg)                  0
Triceps skinfold thickness (mm)                   0
2-Hour serum insulin (mu U/ml)                    0
Body mass index (weight in kg/(height in m)^2)    0
Diabetes pedigree function                        0
Age                                               0
Is Diabetic                                       0
dtype: int64

In [11]:
data

Unnamed: 0,Number of times pregnant,Plasma glucose concentration,Diastolic blood pressure (mm Hg),Triceps skinfold thickness (mm),2-Hour serum insulin (mu U/ml),Body mass index (weight in kg/(height in m)^2),Diabetes pedigree function,Age,Is Diabetic
0,6,148.0,72.0,35.00000,155.548223,33.6,0.627,50,1
1,1,85.0,66.0,29.00000,155.548223,26.6,0.351,31,0
2,8,183.0,64.0,29.15342,155.548223,23.3,0.672,32,1
3,1,89.0,66.0,23.00000,94.000000,28.1,0.167,21,0
4,0,137.0,40.0,35.00000,168.000000,43.1,2.288,33,1
...,...,...,...,...,...,...,...,...,...
763,10,101.0,76.0,48.00000,180.000000,32.9,0.171,63,0
764,2,122.0,70.0,27.00000,155.548223,36.8,0.340,27,0
765,5,121.0,72.0,23.00000,112.000000,26.2,0.245,30,0
766,1,126.0,60.0,29.15342,155.548223,30.1,0.349,47,1


In [12]:
# The features differ greatly in magnitude, so they need to be scaled.
from sklearn.preprocessing import StandardScaler
scale=StandardScaler()

In [13]:
x=data.drop('Is Diabetic',axis=1)
y=data['Is Diabetic']

In [14]:
scaled_data=scale.fit_transform(x)
scaled_data

array([[ 0.63994726,  0.86840303, -0.02442979, ...,  0.16629174,
         0.46849198,  1.4259954 ],
       [-0.84488505, -1.19914997, -0.52034382, ..., -0.85253118,
        -0.36506078, -0.19067191],
       [ 1.23388019,  2.01704359, -0.68564849, ..., -1.33283341,
         0.60439732, -0.10558415],
       ...,
       [ 0.3429808 , -0.01769112, -0.02442979, ..., -0.91074963,
        -0.68519336, -0.27575966],
       [-0.84488505,  0.14640039, -1.01625784, ..., -0.34311972,
        -0.37110101,  1.17073215],
       [-0.84488505, -0.93660356, -0.18973447, ..., -0.29945588,
        -0.47378505, -0.87137393]])

In [15]:
pd.DataFrame(scaled_data)

Unnamed: 0,0,1,2,3,4,5,6,7
0,0.639947,0.868403,-0.024430,6.655021e-01,-3.345079e-16,0.166292,0.468492,1.425995
1,-0.844885,-1.199150,-0.520344,-1.746338e-02,-3.345079e-16,-0.852531,-0.365061,-0.190672
2,1.233880,2.017044,-0.685648,8.087936e-16,-3.345079e-16,-1.332833,0.604397,-0.105584
3,-0.844885,-1.067877,-0.520344,-7.004289e-01,-7.243887e-01,-0.634212,-0.920763,-1.041549
4,-1.141852,0.507402,-2.669305,6.655021e-01,1.465506e-01,1.548980,5.484909,-0.020496
...,...,...,...,...,...,...,...,...
763,1.827813,-0.674057,0.306180,2.145261e+00,2.877840e-01,0.064409,-0.908682,2.532136
764,-0.547919,0.015127,-0.189734,-2.451185e-01,-3.345079e-16,0.632039,-0.398282,-0.531023
765,0.342981,-0.017691,-0.024430,-7.004289e-01,-5.125386e-01,-0.910750,-0.685193,-0.275760
766,-0.844885,0.146400,-1.016258,8.087936e-16,-3.345079e-16,-0.343120,-0.371101,1.170732
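
Entries such as `-3.345079e-16` in the table above are zeros up to floating-point error: they belong to values that were filled with the column mean, and standardization maps the mean to 0. The sketch below reproduces this with NumPy on a few made-up insulin-like values, assuming the usual z-score with the population standard deviation (`ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np

# Made-up insulin-like values; the second entry is missing.
raw = np.array([94.0, np.nan, 168.0, 180.0, 112.0])

# Fill the missing entry with the mean of the observed values.
filled = np.where(np.isnan(raw), np.nanmean(raw), raw)

# Standardize: subtract the mean, divide by the population standard deviation.
z = (filled - filled.mean()) / filled.std(ddof=0)
print(z)
```

The mean-filled entry ends up at (approximately) 0, and the standardized column has mean 0 and standard deviation 1.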


In [16]:
x_train,x_test,y_train,y_test=train_test_split(scaled_data,y,test_size=.30,random_state=355)
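
Because the classes are imbalanced, a plain random split can leave the two parts with somewhat different class proportions. A minimal sketch of a stratified split follows, using synthetic stand-in arrays `X` and `y` (not the notebook's data) and the `stratify` argument of `train_test_split`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows, 35 positives, mimicking a roughly 35/65 class balance.
X = np.arange(200).reshape(100, 2)
y = np.array([1] * 35 + [0] * 65)

# stratify=y keeps the class proportions (nearly) equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=355, stratify=y
)

print(y_train.mean(), y_test.mean())
```

Both printed fractions should be close to 0.35. Whether stratification changes the results on the real data is not something this sketch shows.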

In [17]:
model=XGBRFClassifier()

In [18]:
model.fit(x_train,y_train)

XGBRFClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                colsample_bytree=1, gamma=0, gpu_id=-1, importance_type='gain',
                interaction_constraints='', max_delta_step=0, max_depth=6,
                min_child_weight=1, missing=nan, monotone_constraints='()',
                n_estimators=100, n_jobs=0, num_parallel_tree=100,
                objective='binary:logistic', random_state=0, reg_alpha=0,
                scale_pos_weight=1, tree_method='exact', validate_parameters=1,
                verbosity=None)

In [19]:
# training accuracy (predict already returns class labels, so no rounding is needed)
y_pred = model.predict(x_train)
accuracy = accuracy_score(y_train, y_pred)
accuracy

0.9199255121042831

In [20]:
# test accuracy
y_pred = model.predict(x_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_accuracy

0.7359307359307359
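
Accuracy alone can hide how a model treats the minority class when the classes are imbalanced. The sketch below shows how a confusion matrix, precision and recall could be computed with scikit-learn; the labels and predictions are made up for illustration and are not the output of the model above.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels and predictions, not taken from the notebook's model.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_hat = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()

precision = precision_score(y_true, y_hat)  # TP / (TP + FP)
recall = recall_score(y_true, y_hat)        # TP / (TP + FN)
print(cm)
print(precision, recall)
```

In this toy case the accuracy is 0.7, yet only half of the positive cases are found, which is the kind of gap the confusion matrix makes visible.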

In [21]:
# hyperparameter tuning
from sklearn.model_selection import GridSearchCV

In [22]:
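peram_grid={
    'learning_rate':[1,0.5,0.1,0.01,0.001],
    'max_depth': [3,5,10,20],
    # NOTE: 'n_estimatores' is a misspelling of 'n_estimators'. XGBoost warns that the
    # key might not be used, so the number of trees is most likely not tuned by this grid.
    'n_estimatores':[10,50,100,200]
}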
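
The key `n_estimatores` in the grid above is a misspelling of `n_estimators`. One way to catch such typos before starting a long search is to compare the grid keys with the names returned by the estimator's `get_params()`. The sketch below does this with a hypothetical helper `unknown_grid_keys` and uses scikit-learn's `RandomForestClassifier` as a stand-in estimator; it is an illustration, not part of the original notebook.

```python
from sklearn.ensemble import RandomForestClassifier


def unknown_grid_keys(estimator, grid):
    """Return the grid keys that are not parameters of the estimator."""
    valid = set(estimator.get_params())
    return sorted(set(grid) - valid)


# RandomForestClassifier is a stand-in here; the notebook uses XGBRFClassifier.
estimator = RandomForestClassifier()
grid_with_typo = {"max_depth": [3, 5], "n_estimatores": [10, 50]}
grid_fixed = {"max_depth": [3, 5], "n_estimators": [10, 50]}

print(unknown_grid_keys(estimator, grid_with_typo))
print(unknown_grid_keys(estimator, grid_fixed))
```

An empty list means every key in the grid is a known parameter of the estimator.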

In [23]:
grid=GridSearchCV(XGBRFClassifier(),peram_grid,verbose=3)

In [24]:
grid.fit(x_train,y_train)

Fitting 5 folds for each of 80 candidates, totalling 400 fits
[CV] learning_rate=1, max_depth=3, n_estimatores=10 ..................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=1, max_depth=3, n_estimatores=10, score=0.787, total=   0.1s
[CV] learning_rate=1, max_depth=3, n_estimatores=10 ..................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=1, max_depth=3, n_estimatores=10, score=0.713, total=   0.0s
[CV] learning_rate=1, max_depth=3, n_estimatores=10 ....

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.1s remaining:    0.0s


[CV]  learning_rate=1, max_depth=3, n_estimatores=50, score=0.787, total=   0.0s
[CV] learning_rate=1, max_depth=3, n_estimatores=50 ..................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=1, max_depth=3, n_estimatores=50, score=0.713, total=   0.1s
[CV] learning_rate=1, max_depth=3, n_estimatores=50 ..................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=1, max_depth=3, n_estimatores=50, score=0.785, total=   0.0s
[CV] learning_rate=1, max_depth=3, n_

[CV]  learning_rate=1, max_depth=5, n_estimatores=10, score=0.748, total=   0.1s
[CV] learning_rate=1, max_depth=5, n_estimatores=10 ..................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=1, max_depth=5, n_estimatores=10, score=0.729, total=   0.0s
[CV] learning_rate=1, max_depth=5, n_estimatores=10 ..................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=1, max_depth=5, n_estimatores=10, score=0.832, total=   0.0s
[CV] learning_rate=1, max_depth=5, n_

[CV]  learning_rate=1, max_depth=10, n_estimatores=10, score=0.796, total=   0.1s
[CV] learning_rate=1, max_depth=10, n_estimatores=10 .................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=1, max_depth=10, n_estimatores=10, score=0.769, total=   0.0s
[CV] learning_rate=1, max_depth=10, n_estimatores=10 .................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=1, max_depth=10, n_estimatores=10, score=0.748, total=   0.1s
[CV] learning_rate=1, max_depth=10

[CV]  learning_rate=1, max_depth=10, n_estimatores=200, score=0.769, total=   0.1s
[CV] learning_rate=1, max_depth=10, n_estimatores=200 ................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=1, max_depth=10, n_estimatores=200, score=0.748, total=   0.1s
[CV] learning_rate=1, max_depth=10, n_estimatores=200 ................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=1, max_depth=10, n_estimatores=200, score=0.729, total=   0.1s
[CV] learning_rate=1, max_depth

[CV]  learning_rate=1, max_depth=20, n_estimatores=200, score=0.796, total=   0.1s
[CV] learning_rate=1, max_depth=20, n_estimatores=200 ................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=1, max_depth=20, n_estimatores=200, score=0.769, total=   0.1s
[CV] learning_rate=1, max_depth=20, n_estimatores=200 ................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=1, max_depth=20, n_estimatores=200, score=0.748, total=   0.1s
[CV] learning_rate=1, max_depth

[CV]  learning_rate=0.5, max_depth=3, n_estimatores=200, score=0.785, total=   0.0s
[CV] learning_rate=0.5, max_depth=3, n_estimatores=200 ...............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.5, max_depth=3, n_estimatores=200, score=0.710, total=   0.0s
[CV] learning_rate=0.5, max_depth=3, n_estimatores=200 ...............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.5, max_depth=3, n_estimatores=200, score=0.804, total=   0.0s
[CV] learning_rate=0.5, max_

[CV]  learning_rate=0.5, max_depth=5, n_estimatores=200, score=0.778, total=   0.1s
[CV] learning_rate=0.5, max_depth=5, n_estimatores=200 ...............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.5, max_depth=5, n_estimatores=200, score=0.750, total=   0.0s
[CV] learning_rate=0.5, max_depth=5, n_estimatores=200 ...............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.5, max_depth=5, n_estimatores=200, score=0.748, total=   0.0s
[CV] learning_rate=0.5, max_

[CV]  learning_rate=0.5, max_depth=10, n_estimatores=100, score=0.729, total=   0.1s
[CV] learning_rate=0.5, max_depth=10, n_estimatores=100 ..............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.5, max_depth=10, n_estimatores=100, score=0.841, total=   0.1s
[CV] learning_rate=0.5, max_depth=10, n_estimatores=200 ..............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.5, max_depth=10, n_estimatores=200, score=0.796, total=   0.1s
[CV] learning_rate=0.5, m

Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.5, max_depth=20, n_estimatores=50, score=0.841, total=   0.1s
[CV] learning_rate=0.5, max_depth=20, n_estimatores=100 ..............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.5, max_depth=20, n_estimatores=100, score=0.796, total=   0.2s
[CV] learning_rate=0.5, max_depth=20, n_estimatores=100 ..............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only us

[CV]  learning_rate=0.1, max_depth=3, n_estimatores=50, score=0.713, total=   0.1s
[CV] learning_rate=0.1, max_depth=3, n_estimatores=50 ................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.1, max_depth=3, n_estimatores=50, score=0.785, total=   0.0s
[CV] learning_rate=0.1, max_depth=3, n_estimatores=50 ................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.1, max_depth=3, n_estimatores=50, score=0.710, total=   0.0s
[CV] learning_rate=0.1, max_dep

[CV]  learning_rate=0.1, max_depth=5, n_estimatores=10, score=0.748, total=   0.0s
[CV] learning_rate=0.1, max_depth=5, n_estimatores=10 ................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.1, max_depth=5, n_estimatores=10, score=0.729, total=   0.0s
[CV] learning_rate=0.1, max_depth=5, n_estimatores=10 ................
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.1, max_depth=5, n_estimatores=10, score=0.832, total=   0.0s
[CV] learning_rate=0.1, max_dep

[CV]  learning_rate=0.1, max_depth=5, n_estimatores=200, score=0.832, total=   0.1s
[CV] learning_rate=0.1, max_depth=10, n_estimatores=10 ...............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.1, max_depth=10, n_estimatores=10, score=0.796, total=   0.1s
[CV] learning_rate=0.1, max_depth=10, n_estimatores=10 ...............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.1, max_depth=10, n_estimatores=10, score=0.769, total=   0.1s
[CV] learning_rate=0.1, max_

[CV]  learning_rate=0.1, max_depth=10, n_estimatores=200, score=0.796, total=   0.1s
[CV] learning_rate=0.1, max_depth=10, n_estimatores=200 ..............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.1, max_depth=10, n_estimatores=200, score=0.769, total=   0.1s
[CV] learning_rate=0.1, max_depth=10, n_estimatores=200 ..............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.1, max_depth=10, n_estimatores=200, score=0.748, total=   0.1s
[CV] learning_rate=0.1, m

[CV]  learning_rate=0.1, max_depth=20, n_estimatores=100, score=0.769, total=   0.1s
[CV] learning_rate=0.1, max_depth=20, n_estimatores=100 ..............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.1, max_depth=20, n_estimatores=100, score=0.748, total=   0.1s
[CV] learning_rate=0.1, max_depth=20, n_estimatores=100 ..............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.1, max_depth=20, n_estimatores=100, score=0.729, total=   0.1s
[CV] learning_rate=0.1, m

[CV]  learning_rate=0.01, max_depth=3, n_estimatores=50, score=0.713, total=   0.0s
[CV] learning_rate=0.01, max_depth=3, n_estimatores=50 ...............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.01, max_depth=3, n_estimatores=50, score=0.785, total=   0.0s
[CV] learning_rate=0.01, max_depth=3, n_estimatores=50 ...............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.01, max_depth=3, n_estimatores=50, score=0.710, total=   0.0s
[CV] learning_rate=0.01, max

[CV]  learning_rate=0.01, max_depth=5, n_estimatores=10, score=0.748, total=   0.1s
[CV] learning_rate=0.01, max_depth=5, n_estimatores=10 ...............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.01, max_depth=5, n_estimatores=10, score=0.729, total=   0.1s
[CV] learning_rate=0.01, max_depth=5, n_estimatores=10 ...............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.01, max_depth=5, n_estimatores=10, score=0.832, total=   0.0s
[CV] learning_rate=0.01, max

[CV]  learning_rate=0.01, max_depth=5, n_estimatores=200, score=0.729, total=   0.1s
[CV] learning_rate=0.01, max_depth=5, n_estimatores=200 ..............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.01, max_depth=5, n_estimatores=200, score=0.832, total=   0.0s
[CV] learning_rate=0.01, max_depth=10, n_estimatores=10 ..............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.01, max_depth=10, n_estimatores=10, score=0.796, total=   0.0s
[CV] learning_rate=0.01, 

[CV]  learning_rate=0.01, max_depth=10, n_estimatores=200, score=0.796, total=   0.1s
[CV] learning_rate=0.01, max_depth=10, n_estimatores=200 .............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.01, max_depth=10, n_estimatores=200, score=0.769, total=   0.0s
[CV] learning_rate=0.01, max_depth=10, n_estimatores=200 .............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.01, max_depth=10, n_estimatores=200, score=0.748, total=   0.0s
[CV] learning_rate=0.0

[CV]  learning_rate=0.01, max_depth=20, n_estimatores=100, score=0.729, total=   0.1s
[CV] learning_rate=0.01, max_depth=20, n_estimatores=100 .............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.01, max_depth=20, n_estimatores=100, score=0.841, total=   0.1s
[CV] learning_rate=0.01, max_depth=20, n_estimatores=200 .............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.01, max_depth=20, n_estimatores=200, score=0.796, total=   0.0s
[CV] learning_rate=0.0

[CV]  learning_rate=0.001, max_depth=3, n_estimatores=100, score=0.785, total=   0.0s
[CV] learning_rate=0.001, max_depth=3, n_estimatores=100 .............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.001, max_depth=3, n_estimatores=100, score=0.710, total=   0.0s
[CV] learning_rate=0.001, max_depth=3, n_estimatores=100 .............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.001, max_depth=3, n_estimatores=100, score=0.804, total=   0.0s
[CV] learning_rate=0.0

[CV]  learning_rate=0.001, max_depth=5, n_estimatores=50, score=0.748, total=   0.0s
[CV] learning_rate=0.001, max_depth=5, n_estimatores=50 ..............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.001, max_depth=5, n_estimatores=50, score=0.729, total=   0.0s
[CV] learning_rate=0.001, max_depth=5, n_estimatores=50 ..............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.001, max_depth=5, n_estimatores=50, score=0.832, total=   0.0s

[CV]  learning_rate=0.001, max_depth=10, n_estimatores=10, score=0.748, total=   0.1s
[CV] learning_rate=0.001, max_depth=10, n_estimatores=10 .............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.001, max_depth=10, n_estimatores=10, score=0.729, total=   0.0s
[CV] learning_rate=0.001, max_depth=10, n_estimatores=10 .............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.001, max_depth=10, n_estimatores=10, score=0.841, total=   0.1s

[CV]  learning_rate=0.001, max_depth=10, n_estimatores=200, score=0.729, total=   0.1s
[CV] learning_rate=0.001, max_depth=10, n_estimatores=200 ............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.001, max_depth=10, n_estimatores=200, score=0.841, total=   0.1s
[CV] learning_rate=0.001, max_depth=20, n_estimatores=10 .............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.001, max_depth=20, n_estimatores=10, score=0.796, total=   0.1s

[CV]  learning_rate=0.001, max_depth=20, n_estimatores=100, score=0.729, total=   0.1s
[CV] learning_rate=0.001, max_depth=20, n_estimatores=100 ............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.001, max_depth=20, n_estimatores=100, score=0.841, total=   0.1s
[CV] learning_rate=0.001, max_depth=20, n_estimatores=200 ............
Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.


[CV]  learning_rate=0.001, max_depth=20, n_estimatores=200, score=0.796, total=   0.1s

[Parallel(n_jobs=1)]: Done 400 out of 400 | elapsed:   19.4s finished


GridSearchCV(estimator=XGBRFClassifier(base_score=None, booster=None,
                                       colsample_bylevel=None,
                                       colsample_bytree=None, gamma=None,
                                       gpu_id=None, importance_type='gain',
                                       interaction_constraints=None,
                                       max_delta_step=None, max_depth=None,
                                       min_child_weight=None, missing=nan,
                                       monotone_constraints=None,
                                       n_estimators=100, n_jobs=None,
                                       num_parallel_tree=None,
                                       objective='binary:logistic',
                                       random_state=None, reg_alpha=None,
                                       scale_pos_weight=None, tree_method=None,
                                       validate_parameters=None,
           

In [25]:
grid.best_params_

{'learning_rate': 1, 'max_depth': 10, 'n_estimatores': 10}
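The key `n_estimatores` in the result above is a misspelling of `n_estimators`, which is why XGBoost keeps warning that the parameter "might not be used"; the search most likely never varied the number of trees. A small guard like the one below can catch such keys before a search is run. This is only a sketch: `unknown_grid_keys` is a hypothetical helper, `StandInEstimator` is a made-up stand-in for any estimator with a scikit-learn-style `get_params()` (such as `XGBRFClassifier`), and the grid values are illustrative rather than the ones used in this notebook. It assumes `get_params()` on a freshly constructed estimator lists only recognised names.

```python
import difflib


def unknown_grid_keys(estimator, param_grid):
    """Map each grid key the estimator does not recognise to close valid names."""
    valid = sorted(estimator.get_params().keys())
    return {
        key: difflib.get_close_matches(key, valid, n=3)
        for key in param_grid
        if key not in valid
    }


class StandInEstimator:
    """Hypothetical stand-in exposing only get_params(), for illustration."""

    def get_params(self, deep=True):
        return {"learning_rate": None, "max_depth": None, "n_estimators": 100}


# Illustrative grid; the last key is misspelled on purpose, as in the notebook.
param_grid = {
    "learning_rate": [0.001, 1],
    "max_depth": [3, 10],
    "n_estimatores": [10, 100],
}

problems = unknown_grid_keys(StandInEstimator(), param_grid)
print(problems)  # each unknown key with its suggested replacements
```

With the real objects, the call would presumably be `unknown_grid_keys(XGBRFClassifier(), param_grid)` before constructing `GridSearchCV`; an empty result means every key was recognised.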

In [26]:
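# NOTE: `n_estimatores` is a misspelling of `n_estimators`. XGBoost ignores it
# (see the warning printed by fit), so the model trains with the default n_estimators=100.
new_model=XGBRFClassifier(learning_rate= 1, max_depth= 10, n_estimatores= 10)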


In [27]:
new_model.fit(x_train,y_train)


Parameters: { n_estimatores } might not be used.

  This may not be accurate due to some parameters are only used in language bindings but
  passed down to XGBoost core.  Or some parameters are not used but slip through this
  verification. Please open an issue if you find above cases.




XGBRFClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
                colsample_bytree=1, gamma=0, gpu_id=-1, importance_type='gain',
                interaction_constraints='', max_delta_step=0, max_depth=10,
                min_child_weight=1, missing=nan, monotone_constraints='()',
                n_estimatores=10, n_estimators=100, n_jobs=0,
                num_parallel_tree=100, objective='binary:logistic',
                random_state=0, reg_alpha=0, scale_pos_weight=1,
                tree_method='exact', validate_parameters=1, verbosity=None)

In [28]:
y_new_pred=new_model.predict(x_test)
new_acc=accuracy_score(y_test,y_new_pred)
new_acc

0.7316017316017316
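Because the two classes are not balanced, accuracy alone can hide how the model treats the diabetic class. The sketch below counts true/false positives and negatives and derives recall and precision from them. The labels are made up for illustration and are not the notebook's `y_test` or `y_new_pred`; `confusion_counts` is a hypothetical helper. On the real data, `confusion_matrix(y_test, y_new_pred)` (already imported at the top) reports the same four counts.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


# Made-up labels: 6 non-diabetic (0) and 4 diabetic (1) samples.
y_true_demo = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

counts = confusion_counts(y_true_demo, y_pred_demo)
accuracy = (counts["tp"] + counts["tn"]) / len(y_true_demo)
recall = counts["tp"] / (counts["tp"] + counts["fn"])        # share of diabetics found
precision = counts["tp"] / (counts["tp"] + counts["fp"])     # share of alarms that are right

print(counts)
print("accuracy:", accuracy, "recall:", recall, "precision:", round(precision, 3))
```

In this toy case the accuracy looks acceptable while half of the positive class is missed, which is the kind of gap a single accuracy figure does not show.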

In [42]:
import pickle
with open("XGBoost_model.pickel",'wb') as f:
    pickle.dump(new_model,f)

In [43]:
with open("XGBoost_model.pickel",'rb') as f:
    mp=pickle.load(f)

In [45]:
with open('scaler_model.pickle','wb') as f:
    pickle.dump(scale,f)

In [46]:
with open('scaler_model.pickle','rb') as f:
    sc=pickle.load(f)

In [52]:
# Now let's check whether we can make a prediction for a single sample
d=sc.transform([[1,85.0,66.0,29.00000,155.548223,26.6,0.351,31]])
d

array([[-8.44885053e-01, -1.19914997e+00, -5.20343817e-01,
        -1.74633816e-02, -4.12229492e-09, -8.52531177e-01,
        -3.65060778e-01, -1.90671905e-01]])

In [60]:
final=mp.predict(d)
print('this data belongs to class',final[0])

this data belongs to class 0
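The scale-then-predict steps above can be wrapped in one function so that a raw row is always scaled before it reaches the model. This is a minimal sketch: `predict_one` is a hypothetical helper, and `StandInScaler` and `StandInModel` are made-up placeholders so the block runs on its own. In the notebook the loaded `sc` and `mp` objects would be passed instead.

```python
FEATURE_COUNT = 8  # the dataset has 8 input variables


def predict_one(scaler, model, row):
    """Scale a single raw feature row and return the predicted class as an int."""
    if len(row) != FEATURE_COUNT:
        raise ValueError(f"expected {FEATURE_COUNT} features, got {len(row)}")
    scaled = scaler.transform([list(row)])  # both steps expect a 2-D input
    return int(model.predict(scaled)[0])


class StandInScaler:
    """Hypothetical placeholder: returns the rows unchanged."""

    def transform(self, rows):
        return [list(r) for r in rows]


class StandInModel:
    """Hypothetical placeholder: always predicts class 0."""

    def predict(self, rows):
        return [0 for _ in rows]


sample = [1, 85.0, 66.0, 29.0, 155.548223, 26.6, 0.351, 31]
label = predict_one(StandInScaler(), StandInModel(), sample)
print("this data belongs to class", label)
```

With the unpickled objects the call would be `predict_one(sc, mp, sample)`. The length check guards against rows with a missing or extra feature. It does not verify the column order, which still has to match the order used during training.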
