Step 1: Import Required Libraries
First, import the libraries needed to implement the k-nearest neighbors (KNN) algorithm. This tutorial uses NumPy for numerical computations and Pandas for data manipulation; the scikit-learn modules are imported in the steps where they are used.

In [1]:
import numpy as np
import pandas as pd

Step 2: Load the Dataset
Next, load the dataset into a Pandas DataFrame. This example uses the well-known Iris dataset, which records the sepal length, sepal width, petal length, and petal width (in centimeters) of 150 flowers from three iris species, 50 per species. The file has no header row, so we pass header=None and supply the column names ourselves.

In [2]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']
df = pd.read_csv(url, header=None, names=columns)
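
If the UCI URL is unreachable, one possible fallback is the copy of Iris that ships with scikit-learn. The sketch below is an assumption layered on top of the tutorial, not part of it: the name df_local is made up for illustration, the class labels come out as 'setosa' rather than 'Iris-setosa', and a few measurements may differ slightly from the UCI file.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the copy of Iris bundled with scikit-learn (no network access needed).
iris = load_iris()
columns = ['sepal length', 'sepal width', 'petal length', 'petal width']
df_local = pd.DataFrame(iris.data, columns=columns)

# Map the integer targets (0, 1, 2) to species names.
df_local['class'] = iris.target_names[iris.target]

# Quick sanity check: row/column count and samples per class.
print(df_local.shape)
print(df_local['class'].value_counts())
```

Checking the shape and class counts right after loading is a cheap way to catch a truncated download or a misread header before any modelling starts.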

Step 3: Split the Dataset into Training and Test Sets
We now split the dataset into a training set and a test set, using 70% of the data for training and 30% for testing. Setting random_state makes the split reproducible, and stratify=y keeps the class proportions the same in both sets.

In [3]:
from sklearn.model_selection import train_test_split

X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
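
To see what stratify does, the following sketch runs the same call on stand-in data with the same layout as Iris (150 rows, three balanced classes) and counts the labels on each side of the split. The arrays X_demo and y_demo are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data shaped like Iris: 150 rows, 4 features, 3 balanced classes.
X_demo = np.arange(150 * 4, dtype=float).reshape(150, 4)
y_demo = np.repeat(['a', 'b', 'c'], 50)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1, stratify=y_demo
)

# Count how many samples of each class landed in each set.
train_labels, train_counts = np.unique(y_tr, return_counts=True)
test_labels, test_counts = np.unique(y_te, return_counts=True)
print('train:', list(zip(train_labels, train_counts)))
print('test: ', list(zip(test_labels, test_counts)))
```

With balanced classes, each class contributes 35 rows to the training set and 15 to the test set. Without stratify, the counts would generally vary from class to class.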


Step 4: Normalize the Data
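Next, we normalize the data so that all features share the same scale. This matters for KNN because a feature with a larger numeric range would otherwise dominate the distance calculation. We use Min-Max scaling, which maps each feature to the range [0, 1]. The scaler is fitted on the training set only and then applied to the test set, so no information from the test data leaks into preprocessing.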

In [4]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
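
As a rough illustration of what the scaler computes, here is the Min-Max formula, (x - min) / (max - min), written out in NumPy. The arrays are made up and merely stand in for X_train and X_test.

```python
import numpy as np

# Small made-up arrays standing in for X_train / X_test.
train = np.array([[1.0, 10.0], [2.0, 20.0], [4.0, 40.0]])
test = np.array([[3.0, 50.0]])

# The per-column statistics come from the training rows only.
col_min = train.min(axis=0)
col_max = train.max(axis=0)

# Apply the same transformation to both sets.
train_scaled = (train - col_min) / (col_max - col_min)
test_scaled = (test - col_min) / (col_max - col_min)

print(train_scaled)
print(test_scaled)
```

Because the minimum and maximum come from the training rows, a test value outside the training range (50 in the second column here) is mapped outside [0, 1]. That is expected behaviour, not an error.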

Step 5: Implement the KNN Algorithm
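Now we can fit a KNN classifier using scikit-learn's KNeighborsClassifier class. The class accepts several parameters; here we set two of them: the number of neighbors (n_neighbors, our k) and the distance metric (metric, set to Euclidean distance). The default metric, Minkowski with p=2, is equivalent to Euclidean distance, so setting it explicitly mainly documents the choice.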

In [5]:
from sklearn.neighbors import KNeighborsClassifier

k = 5
metric = 'euclidean'

knn = KNeighborsClassifier(n_neighbors=k, metric=metric)
knn.fit(X_train, y_train)

KNeighborsClassifier(metric='euclidean')
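
For intuition about what the classifier does at prediction time, here is a minimal from-scratch sketch of KNN with Euclidean distance and majority voting. It is an illustration, not scikit-learn's implementation: the function knn_predict and the toy data are hypothetical, and tie-breaking may differ from KNeighborsClassifier.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_query, k=5):
    """Label each query row by majority vote among its k nearest training rows."""
    predictions = []
    for x in X_query:
        # Euclidean distance from x to every training row.
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k smallest distances.
        nearest = np.argsort(distances)[:k]
        # Majority vote among the neighbours' labels.
        votes = Counter(y_train[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

# Tiny made-up data set: two well-separated clusters.
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_toy = np.array(['low', 'low', 'low', 'high', 'high', 'high'])
queries = np.array([[0.05, 0.05], [0.95, 1.0]])

toy_pred = knn_predict(X_toy, y_toy, queries, k=3)
print(toy_pred)  # ['low' 'high']
```

The sketch computes every distance for every query, which is fine for a data set the size of Iris but slow for large ones.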

Step 6: Evaluate the Model
Finally, we evaluate the model on the test set. We use accuracy as the performance metric: the fraction of test samples whose predicted class matches the true class.

In [7]:
from sklearn.metrics import accuracy_score

y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)  # y_test: true labels, y_pred: model predictions

print('Accuracy:', accuracy)


Accuracy: 0.9777777777777777
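
With 45 test samples, an accuracy of 0.9777... corresponds to 44 correct predictions out of 45. The sketch below shows how accuracy can be computed by hand and how a confusion matrix locates the errors. The labels are made up for illustration and are not the tutorial's actual predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels for illustration only.
y_true_demo = np.array(['a', 'a', 'b', 'b', 'c', 'c'])
y_pred_demo = np.array(['a', 'a', 'b', 'c', 'c', 'c'])

# Accuracy is the mean of the element-wise "prediction is correct" mask.
manual_accuracy = np.mean(y_true_demo == y_pred_demo)
print(manual_accuracy, accuracy_score(y_true_demo, y_pred_demo))

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=['a', 'b', 'c'])
print(cm)
```

Here one 'b' sample is predicted as 'c', which shows up as the single off-diagonal entry in the second row of the matrix.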


That's it! You have successfully implemented the KNN algorithm in Python. Here's the complete code:

In [10]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'class']
df = pd.read_csv(url, header=None, names=columns)

# Split the dataset into training and test sets
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)

# Normalize the data
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Implement the KNN algorithm
k = 5
metric = 'euclidean'
knn = KNeighborsClassifier(n_neighbors=k, metric=metric)
knn.fit(X_train, y_train)

# Evaluate the model
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print('Accuracy:', accuracy)


Accuracy: 0.9777777777777777
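
The scaling and classification steps can also be chained with scikit-learn's Pipeline, so the scaler is always fitted on the training data and reused on the test data. This is an optional variation rather than the tutorial's method. As an assumption made to keep the sketch self-contained, it uses scikit-learn's bundled Iris copy instead of the UCI file, so the printed accuracy is not guaranteed to match the value above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: scikit-learn's bundled Iris copy stands in for the UCI file.
X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=1, stratify=y_all
)

# Scaling and classification chained into one estimator.
pipe = Pipeline([
    ('scaler', MinMaxScaler()),
    ('knn', KNeighborsClassifier(n_neighbors=5, metric='euclidean')),
])
pipe.fit(X_tr, y_tr)
pipe_pred = pipe.predict(X_te)

# The same steps done by hand, as in the tutorial.
scaler = MinMaxScaler()
knn = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
knn.fit(scaler.fit_transform(X_tr), y_tr)
manual_pred = knn.predict(scaler.transform(X_te))

# Both routes should give identical predictions.
print(np.array_equal(pipe_pred, manual_pred))
print('Accuracy:', pipe.score(X_te, y_te))
```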
