# Problem Statement
- The objective of the dataset is to diagnostically predict whether or not a patient has diabetes, based on certain diagnostic measurements included in the dataset. All patients are females of at least 21 years of age and of Pima Indian heritage.
- The dataset consists of several medical predictor variables and one target variable, Outcome. Predictor variables include the number of pregnancies each patient has had, their BMI, insulin level, age, and so on.

In [2]:
# Import the required libraries
import pandas as pd
from sklearn import model_selection

# For AdaBoost
from sklearn.ensemble import AdaBoostClassifier

# For XGBoost
from xgboost import XGBClassifier

In [3]:
df = pd.read_csv(r"C:\Users\navee\OneDrive\Desktop\Simplilearn\Machine Learning\dataset\pima-indians-diabetes.csv")

In [4]:
df.head()

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               768 non-null    int64  
 1   Glucose                   768 non-null    int64  
 2   BloodPressure             768 non-null    int64  
 3   SkinThickness             768 non-null    int64  
 4   Insulin                   768 non-null    int64  
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64  
 8   Outcome                   768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
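
`df.info()` reports no nulls, yet the rows printed by `df.head()` already contain zeros in `SkinThickness` and `Insulin`, which are unlikely to be real readings. The sketch below counts such zeros. It assumes, and the notebook does not state this, that a zero in these columns stands for a missing measurement; the column list `suspect_cols` is a hypothetical choice. To stay self-contained it rebuilds only the five rows shown above instead of reading the CSV.

```python
import pandas as pd

# The five rows printed by df.head() above, rebuilt by hand so the sketch runs on its own.
sample = pd.DataFrame(
    {
        "Pregnancies": [6, 1, 8, 1, 0],
        "Glucose": [148, 85, 183, 89, 137],
        "BloodPressure": [72, 66, 64, 66, 40],
        "SkinThickness": [35, 29, 0, 23, 35],
        "Insulin": [0, 0, 0, 94, 168],
        "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
        "DiabetesPedigreeFunction": [0.627, 0.351, 0.672, 0.167, 2.288],
        "Age": [50, 31, 32, 21, 33],
        "Outcome": [1, 0, 1, 0, 1],
    }
)

# Assumption: a zero in these columns means "not measured" rather than a real value.
suspect_cols = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]

# Count zeros per suspect column.
zero_counts = (sample[suspect_cols] == 0).sum()
print(zero_counts)
```

Replacing `sample` with `df` would give the same counts for the full dataset.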


In [6]:
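# Extract the values as a NumPy array:
# the first 8 columns are the predictors, the last column is Outcome
array = df.values
X = array[:, 0:8]
y = array[:, 8]
seed = 7        # random seed for reproducibility
num_tress = 30  # number of trees (boosting rounds)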
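
Positional slicing on `df.values` works, but it depends on column order and turns the whole table into a single float array because the columns mix integers and floats. As a possible alternative, the sketch below selects predictors and target by column name. The names `X_named` and `y_named` are illustrative, and the two rows are copied from the `df.head()` output above.

```python
import pandas as pd

# Two rows from df.head() above, enough to show the shapes and dtypes.
sample = pd.DataFrame(
    {
        "Pregnancies": [6, 1],
        "Glucose": [148, 85],
        "BloodPressure": [72, 66],
        "SkinThickness": [35, 29],
        "Insulin": [0, 0],
        "BMI": [33.6, 26.6],
        "DiabetesPedigreeFunction": [0.627, 0.351],
        "Age": [50, 31],
        "Outcome": [1, 0],
    }
)

# Select by name: everything except the target, then the target itself.
X_named = sample.drop(columns="Outcome")
y_named = sample["Outcome"]

print(X_named.shape)        # predictors keep their column names
print(y_named.dtype)        # the target stays an integer column
print(sample.values.dtype)  # mixed int/float columns are upcast to one float dtype
```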

In [7]:
# Create the AdaBoost model. By default AdaBoost uses a decision tree
# classifier as its base estimator.
# The model is passed to cross_val_score, which evaluates it with
# cross-validation: the data is shuffled and split into 10 folds, and each
# fold is used once for validation while the remaining nine folds form the
# training set.
kfold = model_selection.KFold(n_splits=10, random_state=seed, shuffle=True)
model_ADA = AdaBoostClassifier(n_estimators=num_tress)
results_ADA = model_selection.cross_val_score(model_ADA, X, y, cv=kfold)
print(results_ADA)
print(results_ADA.mean())

[0.76623377 0.71428571 0.71428571 0.79220779 0.79220779 0.74025974
 0.68831169 0.77922078 0.80263158 0.76315789]
0.7552802460697198


- AdaBoost gives a mean accuracy of about 76% across the 10 folds
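
To make the cross-validation step more concrete, here is a minimal sketch of what `cross_val_score` does with a `KFold` splitter: fit a fresh copy of the model on nine folds, score it on the held-out fold, and repeat. It runs on synthetic data from `make_classification` as a stand-in for the diabetes data, and it fixes `random_state` on the classifier (the cell above does not) so that the manual loop and `cross_val_score` fit identical models. The names `X_demo`, `y_demo`, and `manual_scores` are illustrative.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in with the same shape as the diabetes data; not the real dataset.
X_demo, y_demo = make_classification(n_samples=768, n_features=8, random_state=7)

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
# random_state is fixed here so both evaluations below fit identical models.
model = AdaBoostClassifier(n_estimators=30, random_state=7)

manual_scores = []
for train_idx, test_idx in kfold.split(X_demo):
    fold_model = clone(model)  # fresh, unfitted copy for every fold
    fold_model.fit(X_demo[train_idx], y_demo[train_idx])
    # score() returns accuracy on the held-out fold for classifiers
    manual_scores.append(fold_model.score(X_demo[test_idx], y_demo[test_idx]))
manual_scores = np.array(manual_scores)

# The library helper performs the same loop internally.
auto_scores = cross_val_score(model, X_demo, y_demo, cv=kfold)
print(np.allclose(manual_scores, auto_scores))
```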

In [14]:
# Similarly, apply the XGBoost algorithm
seed = 7
num_tress = 30

# Build the XGBoost classifier and evaluate it with 10-fold cross-validation.
# Note: this KFold does not shuffle, so the folds are consecutive blocks of
# rows, unlike the shuffled split used for AdaBoost above.
kfold = model_selection.KFold(n_splits=10)
model_XG = XGBClassifier(n_estimators=num_tress, random_state=seed)
results_XG = model_selection.cross_val_score(model_XG, X, y, cv=kfold)
print(results_XG)
print(results_XG.mean())

[0.71428571 0.79220779 0.68831169 0.66233766 0.79220779 0.79220779
 0.79220779 0.80519481 0.68421053 0.77631579]
0.7499487354750513


- XGBoost gives a mean accuracy of about 75% across the 10 folds

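- The mean accuracy of the two models is almost the same (about 75.5% vs 75.0%). The folds differ between the two runs (shuffled for AdaBoost, consecutive for XGBoost), so the comparison is only approximate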
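
The AdaBoost run uses `KFold(..., shuffle=True)` while the XGBoost run uses `KFold(n_splits=10)` without shuffling, so the two sets of scores come from different splits. One way to compare models on equal footing is to score them all on a single shared split, as sketched below. The helper `compare_on_same_folds` is hypothetical, the data is synthetic, and scikit-learn's `GradientBoostingClassifier` stands in for `XGBClassifier` so that the sketch needs only scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import KFold, cross_val_score


def compare_on_same_folds(models, X, y, n_splits=10, seed=7):
    """Score every model on one shared, shuffled K-fold split (hypothetical helper)."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return {name: cross_val_score(m, X, y, cv=kfold) for name, m in models.items()}


# Synthetic stand-in data; not the real dataset.
X_demo, y_demo = make_classification(n_samples=768, n_features=8, random_state=7)

models = {
    "AdaBoost": AdaBoostClassifier(n_estimators=30, random_state=7),
    # Stand-in for XGBClassifier so the sketch needs only scikit-learn.
    "GradientBoosting": GradientBoostingClassifier(n_estimators=30, random_state=7),
}

scores = compare_on_same_folds(models, X_demo, y_demo)
for name, fold_scores in scores.items():
    print(f"{name}: mean={fold_scores.mean():.3f} std={fold_scores.std():.3f}")
```

With the notebook's own objects, the same helper could be called as `compare_on_same_folds({"AdaBoost": model_ADA, "XGBoost": model_XG}, X, y)`. Reporting the standard deviation next to the mean helps judge whether a gap of about half a percentage point is meaningful.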

# ___Summary Table___

### Step-1 Import the Required Libraries
### Step-2 Build Models Using AdaBoost and XGBoost
### Step-3 Compare the Accuracy of the Classifiers