You are given a dataset called `Diabetes.csv`.
- Load the dataset
- Print the first ten observations on the screen
- Check the shape of the dataset
- Print the column names of the dataset
- Split the dataset into feature set (X) and the target variable (y). The target variable is ‘OnDiab’, indicating onset of diabetes within five years.
- Find the number of unique classes for the target variable.
- Split the dataset into training and test datasets
- Train a KNeighborsClassifier on your training dataset and print the score on the test dataset. Set the number of neighbors to 5.
- Import GridSearchCV from `sklearn.model_selection`
- Split your data into train and test datasets
- For `n_neighbors` from 1 to 30, run GridSearchCV on the training dataset with 10-fold cross-validation.
- Print the best cross-validation score
- Print the best parameter
- Print the test score

In [1]:
import pandas as pd

# Load the dataset
df = pd.read_csv('Diabetes.csv')

# Print the first ten observations
print(df.head(10))

# Check the shape of the dataset
print("Shape of the dataset:", df.shape)

# Print the column names of the dataset
print("Column names:", df.columns.tolist())

    6  148  72  35    0  33.6  0.627  50  1
0   1   85  66  29    0  26.6  0.351  31  0
1   8  183  64   0    0  23.3  0.672  32  1
2   1   89  66  23   94  28.1  0.167  21  0
3   0  137  40  35  168  43.1  2.288  33  1
4   5  116  74   0    0  25.6  0.201  30  0
5   3   78  50  32   88  31.0  0.248  26  1
6  10  115   0   0    0  35.3  0.134  29  0
7   2  197  70  45  543  30.5  0.158  53  1
8   8  125  96   0    0   0.0  0.232  54  1
9   4  110  92   0    0  37.6  0.191  30  0
Shape of the dataset: (767, 9)
Column names: ['6', '148', '72', '35', '0', '33.6', '0.627', '50', '1']
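The column names printed above are data values (`'6'`, `'148'`, ...) rather than labels, which suggests that `Diabetes.csv` has no header row and that `read_csv` consumed the first record as the header. A minimal sketch of the effect, using two records copied from the output above in place of the real file. The feature names `f0` to `f7` are placeholders assumed for illustration; only `OnDiab` comes from the assignment text.

```python
import io
import pandas as pd

# Two records copied from the printed output, standing in for the real file.
sample = "6,148,72,35,0,33.6,0.627,50,1\n1,85,66,29,0,26.6,0.351,31,0\n"

# Default behaviour: the first record is consumed as the header row.
with_header = pd.read_csv(io.StringIO(sample))

# header=None keeps every record as data; names= supplies the labels.
# f0..f7 are placeholder feature names (an assumption, not from the file).
cols = [f"f{i}" for i in range(8)] + ["OnDiab"]
no_header = pd.read_csv(io.StringIO(sample), header=None, names=cols)

print(with_header.shape)  # (1, 9): one record lost to the header
print(no_header.shape)    # (2, 9): both records kept
```

Applied to the real file, the same arguments would be expected to keep the first record as data and give the target column its name.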


In [None]:
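# Diabetes.csv has no header row: the earlier read_csv call used the first record as
# column names, so 'OnDiab' does not exist yet. Reload with header=None and label the
# last column 'OnDiab' (assumption: the final 0/1 column is the target).
df = pd.read_csv('Diabetes.csv', header=None)
df = df.rename(columns={df.columns[-1]: 'OnDiab'})

# Split the dataset into feature set (X) and target variable (y)
X = df.drop(columns=['OnDiab'])
y = df['OnDiab']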

unique_classes = y.unique()
print("Unique classes in the target variable:", unique_classes)
print("Number of unique classes:", len(unique_classes))
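`unique()` lists the class labels but not how many observations fall in each class. As a small sketch on a made-up label series (the values in `y_demo` are illustrative, not taken from the dataset), `nunique()` gives the class count directly and `value_counts()` shows the class balance:

```python
import pandas as pd

# Hypothetical labels for illustration only.
y_demo = pd.Series([1, 0, 1, 0, 1, 0, 1, 0, 0, 0], name="OnDiab")

n_classes = y_demo.nunique()    # number of distinct labels
counts = y_demo.value_counts()  # observations per label, most frequent first

print(n_classes)         # 2
print(counts.to_dict())  # {0: 6, 1: 4}
```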

In [None]:
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
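A plain random split does not guarantee that both parts contain the classes in the same proportion. The sketch below, on synthetic labels (30 positives out of 100, chosen only for illustration), shows the optional `stratify` argument of `train_test_split`, which keeps the class ratio the same in both parts. This is an optional variation, not something the assignment asks for.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 100 rows, 30 of them labelled 1 (illustrative values only).
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 30 + [0] * 70)

# stratify=y_demo preserves the 30% positive rate in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(len(y_te), int(y_te.sum()))  # test size and number of positives
print(len(y_tr), int(y_tr.sum()))  # train size and number of positives
```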


In [None]:
from sklearn.neighbors import KNeighborsClassifier

# Fit a k-nearest-neighbors classifier with k=5 on the training data
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Print the score on the test dataset
test_score = knn.score(X_test, y_test)
print("Test score with K=5:", test_score)
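For a classifier, `score` returns the mean accuracy on the data it is given. The sketch below checks this on synthetic data from `make_classification` (a stand-in for the real dataset, with sizes chosen arbitrarily) by comparing `score` against `accuracy_score` computed from explicit predictions:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data with 8 features.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

knn_demo = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)

score = knn_demo.score(X_te, y_te)                     # built-in scoring
manual = accuracy_score(y_te, knn_demo.predict(X_te))  # fraction of correct predictions

print(score == manual)
```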

In [None]:
from sklearn.model_selection import GridSearchCV

# Candidate values for n_neighbors: 1 through 30
param_grid = {'n_neighbors': range(1, 31)}

# Create a GridSearchCV object with KNeighborsClassifier
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=10)

grid_search.fit(X_train, y_train)

print("Best cross-validation score:", grid_search.best_score_)

print("Best parameter (n_neighbors):", grid_search.best_params_)

best_knn = grid_search.best_estimator_
test_score_best = best_knn.score(X_test, y_test)
print("Test score with best parameter:", test_score_best)
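To see how the search arrived at its answer, the fitted `GridSearchCV` object exposes per-candidate results in `cv_results_`. The sketch below runs the same kind of search on synthetic data (a stand-in for the real dataset) and checks two things: `best_score_` is the highest mean cross-validation score among the 30 candidates, and scoring through the search object gives the same test score as scoring through `best_estimator_`, because the best model is refit on the whole training set by default.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data with 8 features.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

search = GridSearchCV(
    KNeighborsClassifier(), {"n_neighbors": list(range(1, 31))}, cv=10
)
search.fit(X_tr, y_tr)

# One mean cross-validation accuracy per candidate value of n_neighbors.
mean_scores = search.cv_results_["mean_test_score"]
best_mean = mean_scores[search.best_index_]

# Both calls evaluate the same refit model on the held-out test set.
via_search = search.score(X_te, y_te)
via_estimator = search.best_estimator_.score(X_te, y_te)

print(len(mean_scores), search.best_params_)
```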