<h3>AIM: To implement a machine learning classification model using a Decision Tree Classifier and improve it with K-Fold cross-validation and GridSearchCV.</h3>

<h4>Importing the libraries</h4>

In [27]:
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

<h4>Load the dataset</h4>

In [29]:
file_path = './CSV files/Diabetes.csv'
data = pd.read_csv(file_path)
data

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1
...,...,...,...,...,...,...,...,...,...
763,10,101,76,48,180,32.9,0.171,63,0
764,2,122,70,27,0,36.8,0.340,27,0
765,5,121,72,23,112,26.2,0.245,30,0
766,1,126,60,0,0,30.1,0.349,47,1


<h4>Split the data into features (X) and target variable (y)</h4>

In [30]:
X = data.drop('Outcome', axis=1)
y = data['Outcome']
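
The table printed above has an `Unnamed: 0` column whose values run 0, 1, 2, and so on. This suggests it is a row index that was written out when the CSV was saved (an assumption; the notebook does not say). With `data.drop('Outcome', axis=1)` alone, that column stays in `X` as a feature. A minimal sketch of dropping it as well, using a tiny stand-in frame (`data_demo`, hypothetical, with a few values copied from the table) rather than the real file:

```python
import pandas as pd

# Hypothetical stand-in with the same column layout as the first rows shown above
data_demo = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Pregnancies": [6, 1, 8],
    "Glucose": [148, 85, 183],
    "Outcome": [1, 0, 1],
})

# Drop both the saved index column and the target from the feature matrix
X_demo = data_demo.drop(columns=["Unnamed: 0", "Outcome"])
y_demo = data_demo["Outcome"]

print(list(X_demo.columns))
```

Reading the file with `pd.read_csv(file_path, index_col=0)` would likely have the same effect, if the column really is a saved index.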

<h4>Standardize the features</h4>

In [31]:
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

<h4>Split the data into training and testing sets</h4>

In [32]:
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
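
In the cells above the scaler is fitted on all rows before the split, so the test rows contribute to the mean and standard deviation used for scaling. One common alternative is to split first and put the scaler in a pipeline so that it is fitted on the training rows only. The sketch below illustrates this on synthetic data from `make_classification` (an assumption, standing in for the diabetes file); the `_demo` names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 rows, 8 features, binary target
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)

# Split the raw (unscaled) data first
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The pipeline fits the scaler on the training rows only,
# then applies the same transformation when predicting on test rows
knn_pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
knn_pipe.fit(X_tr, y_tr)

test_accuracy = knn_pipe.score(X_te, y_te)
print("Test accuracy:", test_accuracy)
```

A pipeline like this can also be passed directly to `cross_val_score` or `GridSearchCV`, in which case the scaler is refitted inside each fold.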

<h4>Implement Decision Tree Classifier with GridSearchCV</h4>

In [34]:
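# Fix the seed so the fitted tree (and the search result) is reproducible
dt_classifier = DecisionTreeClassifier(random_state=42)
# Candidate hyperparameter values to try
param_grid_dt = {
    'max_depth': [3, 5, 7, 10],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}
# 5-fold cross-validated search over all 36 combinations
grid_search_dt = GridSearchCV(estimator=dt_classifier, param_grid=param_grid_dt, cv=5)
grid_search_dt.fit(X_train, y_train)
# Tree refitted on the full training set with the best combination
best_dt_classifier = grid_search_dt.best_estimator_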
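
The notebook stores the tuned tree in `best_dt_classifier` but does not print the chosen parameters or score it on the test set. A minimal sketch of how that could look is given below. It runs on synthetic data from `make_classification` (an assumption, so the numbers it prints say nothing about the diabetes data), and the `_demo` names and `dt_metrics` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Same grid as in the notebook cell above
param_grid_demo = {
    "max_depth": [3, 5, 7, 10],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}
grid_demo = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid_demo, cv=5)
grid_demo.fit(X_tr, y_tr)

# Winning combination and its mean cross-validated accuracy
print("Best parameters:", grid_demo.best_params_)
print("Best CV accuracy:", grid_demo.best_score_)

# Score the refitted best tree on the held-out rows
y_pred_dt = grid_demo.best_estimator_.predict(X_te)
dt_metrics = {
    "accuracy": accuracy_score(y_te, y_pred_dt),
    "precision": precision_score(y_te, y_pred_dt),
    "recall": recall_score(y_te, y_pred_dt),
    "f1": f1_score(y_te, y_pred_dt),
}
print(dt_metrics)
```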

<h4>Perform K-Fold cross-validation for K Nearest Neighbors Classifier</h4>

In [36]:
knn_classifier = KNeighborsClassifier()
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
cv_results_knn = cross_val_score(knn_classifier, X_scaled, y, cv=kfold)

In [37]:
print("Mean Cross-validation Accuracy for KNN Classifier:", cv_results_knn.mean())

Mean Cross-validation Accuracy for KNN Classifier: 0.7200831847890672
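
The mean alone hides how much the accuracy varies from fold to fold, and the stated aim also mentions K-Fold for the decision tree. The sketch below applies the same kind of `KFold` splitter to a decision tree and prints the per-fold scores with their standard deviation. It uses synthetic data (an assumption), and `max_depth=5` is an arbitrary placeholder rather than a tuned value.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)

# Same splitter settings as the KNN cell above
kfold_demo = KFold(n_splits=5, shuffle=True, random_state=42)
tree_demo = DecisionTreeClassifier(max_depth=5, random_state=42)

# One accuracy value per fold
fold_scores = cross_val_score(tree_demo, X_demo, y_demo, cv=kfold_demo)

print("Per-fold accuracy:", fold_scores)
print("Mean:", fold_scores.mean(), "Std:", fold_scores.std())
```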


<h4>Predictions using K Nearest Neighbors Classifier</h4>

In [38]:
knn_classifier.fit(X_train, y_train)
y_pred_knn = knn_classifier.predict(X_test)

<h4>Calculate metrics for K Nearest Neighbors Classifier</h4>

In [40]:
accuracy_knn = accuracy_score(y_test, y_pred_knn)
precision_knn = precision_score(y_test, y_pred_knn)
recall_knn = recall_score(y_test, y_pred_knn)
f1_knn = f1_score(y_test, y_pred_knn)
print("\nK Nearest Neighbors Classifier Metrics:")
print("Accuracy:", accuracy_knn)
print("Precision:", precision_knn)
print("Recall:", recall_knn)
print("F1-score:", f1_knn)


K Nearest Neighbors Classifier Metrics:
Accuracy: 0.6883116883116883
Precision: 0.574468085106383
Recall: 0.4909090909090909
F1-score: 0.5294117647058822
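
The four printed values are mutually consistent with one confusion matrix. Assuming a 154-row test set (20% of the data), the only whole-number counts that reproduce them are 27 true positives, 20 false positives, 28 false negatives and 79 true negatives. These counts are inferred from the printed metrics; the notebook itself does not print a confusion matrix. The arithmetic can be checked as follows:

```python
# Counts inferred from the printed metrics (not output by the notebook)
tp, fp, fn, tn = 27, 20, 28, 79

accuracy = (tp + tn) / (tp + fp + fn + tn)   # correct predictions / all predictions
precision = tp / (tp + fp)                   # of predicted positives, how many were right
recall = tp / (tp + fn)                      # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)
```

Read this way, the classifier misses 28 of the 55 positive cases in the test set, which is why recall sits below 0.5 even though accuracy is close to 0.69.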
