### Hyperparameters:

- Hyperparameters are the parameters we specify for an ML algorithm before training (for example, `n_neighbors` in KNN), as opposed to values the algorithm learns from the data. Because they affect the accuracy of the model, they need to be chosen carefully.
- Tuning: adjusting.

#### Hyperparameter Tuning: adjusting the hyperparameters of an ML algorithm so that the model achieves the highest accuracy.
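
In scikit-learn, the hyperparameters of an estimator are its constructor arguments. A minimal sketch: `get_params()` lists them with their current values, and `set_params()` changes them. The values 13 and `p=1` below are only illustrative.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hyperparameters are constructor arguments chosen before training;
# get_params() returns them as a dict.
knc = KNeighborsClassifier()
params = knc.get_params()
print(params["n_neighbors"], params["metric"], params["p"])   # 5 minkowski 2

# "Tuning" means trying other values, e.g. 13 neighbours with p=1 (Manhattan distance).
knc.set_params(n_neighbors=13, p=1)
print(knc.get_params()["n_neighbors"], knc.get_params()["p"])  # 13 1
```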

### Load the standard libraries

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

## Load the data

In [3]:
data = pd.read_csv('Health_Insurance.csv')
data.head()

|   | Age | Purchased |
|---|-----|-----------|
| 0 | 19  | 0         |
| 1 | 35  | 0         |
| 2 | 26  | 0         |
| 3 | 27  | 0         |
| 4 | 19  | 0         |


## Separate X and y

In [4]:
X = data.drop('Purchased', axis = 1)
y = data['Purchased']

### Split the data into train and test set

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
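
`test_size = 0.2` keeps 20% of the rows for testing, and `random_state = 0` makes the split reproducible. The sketch below checks the resulting sizes on a hypothetical stand-in frame with the same columns as `Health_Insurance.csv`. The 400-row size is an assumption, chosen because it is consistent with the 80 test predictions shown later; the random values are not the real data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in with the same columns as Health_Insurance.csv (values are random).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Age": rng.integers(18, 61, size=400),
    "Purchased": rng.integers(0, 2, size=400),
})
X_demo = demo.drop("Purchased", axis=1)   # feature matrix (only Age)
y_demo = demo["Purchased"]                # target vector

# 80/20 split: ceil(0.2 * 400) = 80 test rows, 320 training rows
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
print(X_tr.shape, X_te.shape)   # (320, 1) (80, 1)
```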

## Apply KNeighborsClassifier Algorithm on the data

In [7]:
from sklearn.neighbors import KNeighborsClassifier
knc = KNeighborsClassifier()   # defaults: n_neighbors=5, metric='minkowski' with p=2, i.e. Euclidean distance
knc

In [8]:
knc.fit(X_train, y_train)
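
For KNN, `fit` essentially stores the training data, and `predict` labels each new point by a majority vote among its `k` nearest training points. The NumPy sketch below illustrates that idea; it is not scikit-learn's implementation (which uses faster neighbour searches and may break ties differently). The function name `knn_predict` and the toy ages and labels are hypothetical.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5, p=2):
    """Minimal k-nearest-neighbours classifier (majority vote).

    Distances use the Minkowski formula: p=1 is Manhattan, p=2 is Euclidean.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for q in np.asarray(X_query, dtype=float):
        # Minkowski distance from the query point to every training point
        dist = np.sum(np.abs(X_train - q) ** p, axis=1) ** (1.0 / p)
        # indices of the k closest training points
        nearest = np.argsort(dist, kind="stable")[:k]
        # majority vote among their labels (ties go to the smallest label)
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Toy 1-D data (ages) with a hypothetical purchase label
ages = np.array([[19], [22], [25], [27], [45], [48], [52], [58]])
bought = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(knn_predict(ages, bought, [[20], [50]], k=3))   # [0 1]
```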

## Perform predictions on X_test

In [9]:
y_pred = knc.predict(X_test)
y_pred

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1,
       0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1,
       0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1], dtype=int64)

### Perform Evaluations

In [10]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

0.825

#### Note:
With KNeighborsClassifier at its default settings (n_neighbors = 5, Euclidean distance), the accuracy on the test set is 82.5%.
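
Accuracy is simply the fraction of predictions that match the true labels. A small sketch with hypothetical labels shows that `accuracy_score` agrees with the manual computation; it also converts the notebook's 82.5% on the 80 test rows into a count.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Accuracy = (number of correct predictions) / (number of predictions)
y_true_demo = np.array([0, 1, 1, 0, 1, 0, 0, 1])   # hypothetical labels
y_pred_demo = np.array([0, 1, 0, 0, 1, 1, 0, 1])   # hypothetical predictions

manual = np.mean(y_true_demo == y_pred_demo)
print(manual, accuracy_score(y_true_demo, y_pred_demo))   # 0.75 0.75

# In the notebook, 0.825 on 80 test rows means 0.825 * 80 = 66 correct predictions.
print(round(0.825 * 80))   # 66
```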

# Hyperparameter Tuning:

- At which value of n_neighbors (k), and with which distance metric, does the KNeighborsClassifier algorithm perform best?
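
Conceptually, answering this question for k means looping over candidate values and scoring each one with cross-validation on the training data. The sketch below does this by hand with `cross_val_score`. Synthetic data from `make_classification` stands in for `X_train` / `y_train`, so the best k it prints will not match the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (the notebook uses X_train / y_train from Health_Insurance.csv)
X_demo, y_demo = make_classification(n_samples=320, n_features=1, n_informative=1,
                                     n_redundant=0, n_clusters_per_class=1, random_state=0)

scores = {}
for k in [3, 5, 7, 9, 11, 13, 15, 17, 19, 21]:
    knc = KNeighborsClassifier(n_neighbors=k)
    # mean 5-fold cross-validated accuracy on the training data only
    scores[k] = cross_val_score(knc, X_demo, y_demo, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 4))
```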

In [18]:
from sklearn.neighbors import KNeighborsClassifier
knc = KNeighborsClassifier()

### Hyperparameter optimization
parameters = {'n_neighbors' : [3, 5, 7, 9, 11, 13, 15, 17, 19, 21]}

### To apply KNeighborsClassifier to the data with each value of n_neighbors, we use scikit-learn's GridSearchCV class, which scores every candidate value with cross-validation on the training set

In [19]:
from sklearn.model_selection import GridSearchCV
gscv = GridSearchCV(knc, parameters)
gscv
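
The call above relies on GridSearchCV's defaults: 5-fold (stratified) cross-validation and, for a classifier, accuracy as the score. The sketch below spells those out and reads the fitted results. Synthetic `make_classification` data stands in for `X_train` / `y_train`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for X_train / y_train
X_demo, y_demo = make_classification(n_samples=320, n_features=1, n_informative=1,
                                     n_redundant=0, n_clusters_per_class=1, random_state=0)

parameters = {"n_neighbors": [3, 5, 7, 9, 11, 13, 15, 17, 19, 21]}
# cv=5 and scoring="accuracy" make the notebook's defaults explicit
gscv = GridSearchCV(KNeighborsClassifier(), parameters, cv=5, scoring="accuracy")
gscv.fit(X_demo, y_demo)          # 10 candidates x 5 folds = 50 fits, then one refit

print(gscv.best_params_)          # the winning candidate
print(round(gscv.best_score_, 4)) # its mean cross-validated accuracy
print(len(gscv.cv_results_["params"]))   # 10
```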

In [20]:
gscv.fit(X_train, y_train)

In [21]:
gscv.cv_results_

{'mean_fit_time': array([0.00315514, 0.00339098, 0.00328689, 0.00227342, 0.00238938,
        0.00208535, 0.00217776, 0.00209789, 0.00205932, 0.00245786]),
 'std_fit_time': array([0.00084826, 0.00094638, 0.00139289, 0.00049971, 0.00063912,
        0.00032821, 0.00019087, 0.00044403, 0.00055071, 0.00022895]),
 'mean_score_time': array([0.00648646, 0.00670638, 0.00502601, 0.00465798, 0.00456047,
        0.00448565, 0.00467649, 0.00485244, 0.00454884, 0.00446653]),
 'std_score_time': array([0.00137384, 0.00194152, 0.00108306, 0.00065522, 0.00032069,
        0.00021238, 0.00055169, 0.00035615, 0.00090963, 0.00028569]),
 'param_n_neighbors': masked_array(data=[3, 5, 7, 9, 11, 13, 15, 17, 19, 21],
              mask=[False, False, False, False, False, False, False, False,
                    False, False],
        fill_value='?',
             dtype=object),
 'params': [{'n_neighbors': 3},
  {'n_neighbors': 5},
  {'n_neighbors': 7},
  {'n_neighbors': 9},
  {'n_neighbors': 11},
  {'n_neighbors' ... (output truncated)

In [22]:
res = pd.DataFrame(gscv.cv_results_)
res[['params', 'mean_test_score', 'rank_test_score']]

|   | params              | mean_test_score | rank_test_score |
|---|---------------------|-----------------|-----------------|
| 0 | {'n_neighbors': 3}  | 0.753125        | 10              |
| 1 | {'n_neighbors': 5}  | 0.803125        | 7               |
| 2 | {'n_neighbors': 7}  | 0.8             | 8               |
| 3 | {'n_neighbors': 9}  | 0.7875          | 9               |
| 4 | {'n_neighbors': 11} | 0.80625         | 6               |
| 5 | {'n_neighbors': 13} | 0.81875         | 1               |
| 6 | {'n_neighbors': 15} | 0.815625        | 2               |
| 7 | {'n_neighbors': 17} | 0.815625        | 2               |
| 8 | {'n_neighbors': 19} | 0.815625        | 2               |
| 9 | {'n_neighbors': 21} | 0.815625        | 2               |


#### At k = 13 the KNN algorithm achieves the highest mean cross-validated accuracy on the training data (0.81875). Hence the optimized value of the hyperparameter k is 13.

- For k = 15 and above the accuracy stops changing: k = 15, 17, 19 and 21 all score 0.815625, slightly below the peak at k = 13. This indicates that the accuracy only varies in the range k ≤ 15.
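
Note that `mean_test_score` is a cross-validated accuracy on the training split, not the test-set accuracy (0.825) computed earlier. A sketch of the usual final step: with `refit=True` (the default), GridSearchCV retrains the best k on the whole training split, and that model can then be scored once on the held-out test split. Synthetic data stands in for `Health_Insurance.csv`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for Health_Insurance.csv, split 80/20 as in the notebook
X_demo, y_demo = make_classification(n_samples=400, n_features=1, n_informative=1,
                                     n_redundant=0, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

gscv = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": list(range(3, 22, 2))})
gscv.fit(X_tr, y_tr)

# refit=True (the default) retrains the best k on the whole training split,
# so the tuned model can be evaluated once on the untouched test split.
best_model = gscv.best_estimator_
test_acc = accuracy_score(y_te, best_model.predict(X_te))
print(gscv.best_params_, round(test_acc, 4))
```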

Next question: for k = 13 (and nearby values), does Euclidean distance or Manhattan distance perform better?
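
For reference, scikit-learn's KNeighborsClassifier uses `metric='minkowski'` by default, and its `p` parameter picks the special case. For points x and y with n features:

$$
d_p(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{n} \lvert x_i - y_i \rvert^{p} \right)^{1/p}
$$

$$
p = 1:\quad d_1(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert \quad \text{(Manhattan)}, \qquad
p = 2:\quad d_2(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \quad \text{(Euclidean)}
$$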

In [24]:
from sklearn.neighbors import KNeighborsClassifier
knc = KNeighborsClassifier()

## Hyperparameter optimization: p = 1 is Manhattan distance, p = 2 is Euclidean distance
parameters = {'n_neighbors' : [3, 5, 7, 9, 11, 13, 15, 17], 'p' : [1, 2]}
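
With two hyperparameters, GridSearchCV tries every combination. scikit-learn's `ParameterGrid` enumerates the same candidates, which makes it easy to check the size of the search before running it.

```python
from sklearn.model_selection import ParameterGrid

parameters = {"n_neighbors": [3, 5, 7, 9, 11, 13, 15, 17], "p": [1, 2]}
grid = list(ParameterGrid(parameters))

# 8 values of k x 2 values of p = 16 candidates (x 5 folds = 80 fits in GridSearchCV)
print(len(grid))   # 16
print(grid[0], grid[1])
```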

In [25]:
from sklearn.model_selection import GridSearchCV
gscv = GridSearchCV(knc, parameters)
gscv

In [26]:
gscv.fit(X_train, y_train)

In [27]:
gscv.cv_results_

{'mean_fit_time': array([0.00370474, 0.00257301, 0.00203681, 0.00261841, 0.00267763,
        0.00252438, 0.00204997, 0.00202341, 0.00211391, 0.00263782,
        0.00337543, 0.00245519, 0.00263124, 0.00417533, 0.00284967,
        0.00286779]),
 'std_fit_time': array([0.00084462, 0.00100119, 0.00039343, 0.00032846, 0.00061057,
        0.00026252, 0.00033765, 0.00071853, 0.00019967, 0.00039414,
        0.00085914, 0.00057716, 0.00088631, 0.00285101, 0.00096726,
        0.00116578]),
 'mean_score_time': array([0.00608783, 0.00479355, 0.00476604, 0.00518718, 0.004598  ,
        0.00415196, 0.00424457, 0.00446463, 0.00509362, 0.00534534,
        0.00622549, 0.00548205, 0.00613298, 0.00710301, 0.00629783,
        0.00524316]),
 'std_score_time': array([0.00126196, 0.00084194, 0.00042638, 0.00182412, 0.00074674,
        0.00020359, 0.00069012, 0.00075283, 0.00128516, 0.00056707,
        0.00115838, 0.00101302, 0.00121056, 0.00298787, 0.00116844,
        0.00054686]),
 'param_n_neighbors': mask ... (output truncated)

In [28]:
res = pd.DataFrame(gscv.cv_results_)
res

|   | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_n_neighbors | param_p | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.003705 | 0.000845 | 0.006088 | 0.001262 | 3 | 1 | {'n_neighbors': 3, 'p': 1} | 0.765625 | 0.71875 | 0.6875 | 0.78125 | 0.8125 | 0.753125 | 0.044634 | 15 |
| 1 | 0.002573 | 0.001001 | 0.004794 | 0.000842 | 3 | 2 | {'n_neighbors': 3, 'p': 2} | 0.765625 | 0.71875 | 0.6875 | 0.78125 | 0.8125 | 0.753125 | 0.044634 | 15 |
| 2 | 0.002037 | 0.000393 | 0.004766 | 0.000426 | 5 | 1 | {'n_neighbors': 5, 'p': 1} | 0.765625 | 0.75 | 0.8125 | 0.796875 | 0.890625 | 0.803125 | 0.049014 | 9 |
| 3 | 0.002618 | 0.000328 | 0.005187 | 0.001824 | 5 | 2 | {'n_neighbors': 5, 'p': 2} | 0.765625 | 0.75 | 0.8125 | 0.796875 | 0.890625 | 0.803125 | 0.049014 | 9 |
| 4 | 0.002678 | 0.000611 | 0.004598 | 0.000747 | 7 | 1 | {'n_neighbors': 7, 'p': 1} | 0.78125 | 0.75 | 0.765625 | 0.8125 | 0.890625 | 0.8 | 0.049804 | 11 |
| 5 | 0.002524 | 0.000263 | 0.004152 | 0.000204 | 7 | 2 | {'n_neighbors': 7, 'p': 2} | 0.78125 | 0.75 | 0.765625 | 0.8125 | 0.890625 | 0.8 | 0.049804 | 11 |
| 6 | 0.00205 | 0.000338 | 0.004245 | 0.00069 | 9 | 1 | {'n_neighbors': 9, 'p': 1} | 0.765625 | 0.765625 | 0.71875 | 0.796875 | 0.890625 | 0.7875 | 0.057282 | 13 |
| 7 | 0.002023 | 0.000719 | 0.004465 | 0.000753 | 9 | 2 | {'n_neighbors': 9, 'p': 2} | 0.765625 | 0.765625 | 0.71875 | 0.796875 | 0.890625 | 0.7875 | 0.057282 | 13 |
| 8 | 0.002114 | 0.0002 | 0.005094 | 0.001285 | 11 | 1 | {'n_neighbors': 11, 'p': 1} | 0.78125 | 0.765625 | 0.78125 | 0.796875 | 0.90625 | 0.80625 | 0.050967 | 7 |
| 9 | 0.002638 | 0.000394 | 0.005345 | 0.000567 | 11 | 2 | {'n_neighbors': 11, 'p': 2} | 0.78125 | 0.765625 | 0.78125 | 0.796875 | 0.90625 | 0.80625 | 0.050967 | 7 |
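
The `split0_test_score` to `split4_test_score` columns are the accuracies on the five cross-validation folds; `mean_test_score` and `std_test_score` summarize them. The sketch recomputes both from the split scores of row 0 of the table above (copied, not invented); the standard deviation matches with NumPy's default `ddof=0`.

```python
import numpy as np

# Split scores for {'n_neighbors': 3, 'p': 1}, copied from row 0 of the table above
split_scores = np.array([0.765625, 0.71875, 0.6875, 0.78125, 0.8125])

mean_test_score = split_scores.mean()   # average over the 5 folds
std_test_score = split_scores.std()     # population standard deviation (ddof=0)
print(round(mean_test_score, 6), round(std_test_score, 6))   # 0.753125 0.044634
```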


In [29]:
res[['params', 'mean_test_score', 'rank_test_score']]

|   | params                      | mean_test_score | rank_test_score |
|---|-----------------------------|-----------------|-----------------|
| 0 | {'n_neighbors': 3, 'p': 1}  | 0.753125        | 15              |
| 1 | {'n_neighbors': 3, 'p': 2}  | 0.753125        | 15              |
| 2 | {'n_neighbors': 5, 'p': 1}  | 0.803125        | 9               |
| 3 | {'n_neighbors': 5, 'p': 2}  | 0.803125        | 9               |
| 4 | {'n_neighbors': 7, 'p': 1}  | 0.8             | 11              |
| 5 | {'n_neighbors': 7, 'p': 2}  | 0.8             | 11              |
| 6 | {'n_neighbors': 9, 'p': 1}  | 0.7875          | 13              |
| 7 | {'n_neighbors': 9, 'p': 2}  | 0.7875          | 13              |
| 8 | {'n_neighbors': 11, 'p': 1} | 0.80625         | 7               |
| 9 | {'n_neighbors': 11, 'p': 2} | 0.80625         | 7               |
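
Pivoting the scores so that each k is a row and each `p` is a column makes the Manhattan vs Euclidean comparison easier to read. The sketch rebuilds the ten displayed rows by hand. With the notebook's full `res` frame, a similar pivot on the `param_n_neighbors` and `param_p` columns should work, though that variant is not shown here.

```python
import pandas as pd

# The ten (k, p, mean_test_score) rows displayed above
rows = [(3, 1, 0.753125), (3, 2, 0.753125), (5, 1, 0.803125), (5, 2, 0.803125),
        (7, 1, 0.8), (7, 2, 0.8), (9, 1, 0.7875), (9, 2, 0.7875),
        (11, 1, 0.80625), (11, 2, 0.80625)]
scores = pd.DataFrame(rows, columns=["n_neighbors", "p", "mean_test_score"])

# One row per k, one column per p: identical columns mean the metric made no difference
table = scores.pivot(index="n_neighbors", columns="p", values="mean_test_score")
print(table)
print((table[1] == table[2]).all())   # True
```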


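### Note:

- Switching between Manhattan (p = 1) and Euclidean (p = 2) distance leaves the accuracy unchanged for every k. This is expected here: X appears to contain a single feature (Age), and in one dimension both distances reduce to |a - b|. So the metric does not matter for this data set, which does not mean accuracy is independent of the distance metric in general.
- The accuracy is highest when k = 13 (mean cross-validated accuracy 0.81875 in the first search), and either Manhattan or Euclidean distance can be used. The k = 13 rows are not visible above because the displayed output shows only the first 10 of the 16 candidates.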
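
A quick check of why p = 1 and p = 2 tie here, assuming X holds only the Age column: with one feature, Manhattan and Euclidean distances are equal, so every point has the same neighbours under both metrics. With a second feature they can differ. The helper `minkowski` and the second-feature values are hypothetical.

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance: p=1 gives Manhattan, p=2 gives Euclidean."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float(np.sum(diff ** p) ** (1 / p))

# One feature (Age only): both metrics give the same distance
print(minkowski([19], [35], 1), minkowski([19], [35], 2))                # 16.0 16.0

# Two features (second value hypothetical): the metrics disagree,
# so the ranking of nearest neighbours can change
print(minkowski([19, 3], [35, 15], 1), minkowski([19, 3], [35, 15], 2))  # 28.0 20.0
```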