# Metadata

**Python Topics** - ml_principles > knn_classification

**Type** - modification

**Target Number of Turns (User + Assistant)** - 2+


# Conversation

**User**

I need to perform a grid search to find the best number of neighbors for my KNN classifier. Can you show me how to integrate GridSearchCV for this purpose? Here's my current setup:
```python
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
```


**Assistant**

Certainly! You can use `GridSearchCV` from the `sklearn.model_selection` module to perform a grid search and find the best hyperparameters for your KNN classifier. Here's an example of how to integrate GridSearchCV into your setup -
```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# X_train and y_train are assumed to be defined already, as in your setup

# Create a KNN classifier
knn = KNeighborsClassifier()

# Define the parameter grid to search
param_grid = {'n_neighbors': [1, 3, 5, 7, 9, 11]}  # Adjust the list of values as needed

# Create the GridSearchCV object (5-fold cross-validation, scored by accuracy)
grid_search = GridSearchCV(knn, param_grid, cv=5, scoring='accuracy')

# Fit the GridSearchCV object to the data
grid_search.fit(X_train, y_train)

# Print the best hyperparameters
print("Best Number of Neighbors:", grid_search.best_params_['n_neighbors'])
```

**User**

Can you show this on the Iris dataset?

**Assistant**

Certainly! Here's an example of how to perform a grid search for the best number of neighbors on the Iris dataset using GridSearchCV with a KNeighborsClassifier -

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a KNN classifier
knn = KNeighborsClassifier()

# Define the parameter grid to search
param_grid = {'n_neighbors': [1, 3, 5, 7, 9]}

# Create the GridSearchCV object
grid_search = GridSearchCV(knn, param_grid, cv=5, scoring='accuracy')

# Fit the GridSearchCV object to the data
grid_search.fit(X_train, y_train)

# Print the best hyperparameters
best_n_neighbors = grid_search.best_params_['n_neighbors']
print("Best Number of Neighbors:", best_n_neighbors)

# Access the best model (refit on the full training set with the best hyperparameters)
best_knn = grid_search.best_estimator_

# Evaluate the best model on the test set
y_pred = best_knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test Accuracy with Best Model:", accuracy)
```

```
Best Number of Neighbors: 3
Test Accuracy with Best Model: 1.0
```
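
If you also want to see how each candidate value performed, rather than only the winner, the fitted search object exposes per-candidate cross-validation scores through `cv_results_`. The following is a minimal sketch that repeats the same Iris setup so it runs on its own; the printed format is just one illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Same data and split as in the example above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Same grid search over n_neighbors
param_grid = {'n_neighbors': [1, 3, 5, 7, 9]}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# cv_results_ holds one entry per candidate in the grid
results = grid_search.cv_results_
for params, mean, std in zip(results['params'],
                             results['mean_test_score'],
                             results['std_test_score']):
    print(f"k={params['n_neighbors']}: mean CV accuracy={mean:.3f} (std {std:.3f})")

# best_score_ is the mean cross-validated score of the best candidate
print("Best mean CV accuracy:", round(grid_search.best_score_, 3))
```

Comparing the mean scores across candidates can show whether the best value is clearly ahead or whether several values of `n_neighbors` perform about the same.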


**User**

What other hyperparameters can I consider while tuning a KNN Classifier?

**Assistant**

Based on the official sklearn documentation, here are some key hyperparameters for the `KNeighborsClassifier` -

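1. `weights`: Specifies how neighbors are weighted when making predictions (`'uniform'` gives every neighbor equal weight; `'distance'` weights each neighbor by the inverse of its distance, so closer neighbors have more influence).
2. `p`: Defines the power parameter for the Minkowski distance metric (1 for Manhattan distance, 2 for Euclidean distance). It only matters when `metric` is `'minkowski'`.
3. `algorithm`: Algorithm for computing nearest neighbors (`'auto'`, `'ball_tree'`, `'kd_tree'`, `'brute'`). This mainly affects speed and memory use rather than the predictions.
4. `leaf_size`: Leaf size passed to the tree-based algorithms (`'ball_tree'` and `'kd_tree'`). It affects tree construction and query speed, as well as memory use.
5. `metric`: Specifies the distance metric used to measure how far apart samples are. The default is `'minkowski'`, which with `p=2` is the standard Euclidean distance.
6. `n_jobs`: Determines the number of parallel jobs for the neighbor search (-1 uses all available CPU cores). It changes runtime only, not the model.

Note that `outlier_label` is a parameter of `RadiusNeighborsClassifier`, not `KNeighborsClassifier`, so it cannot be tuned here.

Of these, `weights`, `p`, and `metric` (together with `n_neighbors`) change the model's predictions and are the usual candidates for a grid search. `algorithm`, `leaf_size`, and `n_jobs` are better chosen for speed, based on the size and dimensionality of your dataset.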
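
As a rough sketch of how several of these could be searched together, the example below extends the earlier Iris grid search to cover `n_neighbors`, `weights`, and `p` at once. The specific candidate values are illustrative assumptions, not recommendations. Adjust them for your data.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load and split the Iris dataset as before
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Illustrative grid: 4 x 2 x 2 = 16 candidate combinations
param_grid = {
    'n_neighbors': [3, 5, 7, 9],
    'weights': ['uniform', 'distance'],
    'p': [1, 2],  # 1 = Manhattan, 2 = Euclidean (with the default Minkowski metric)
}

# Every combination is evaluated with 5-fold cross-validation
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

print("Best parameters:", grid_search.best_params_)
print("Best mean CV accuracy:", round(grid_search.best_score_, 3))

# Evaluate the refit best model on the held-out test set
test_accuracy = accuracy_score(y_test, grid_search.best_estimator_.predict(X_test))
print("Test accuracy:", test_accuracy)
```

Keep in mind that the number of fits grows multiplicatively with each parameter added to the grid, so larger grids can become slow on bigger datasets.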
