Q1. Implement the K-NN classification algorithm with Euclidean and Manhattan distance metrics. The program should be generic and work for any value of k on the Iris dataset.

• Keep 80% of the samples for training and the rest for testing.

• Show the results using both distance metrics.

• Compare your results with the scikit-learn library function.
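As a small worked illustration of the two metrics named in the task, the sketch below computes both distances between two Iris-like feature vectors with plain NumPy. The helper names `euclidean_distance` and `manhattan_distance` are illustrative only and are not used elsewhere in this notebook, which relies on `scipy.spatial.distance` instead.

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line (L2) distance: square root of the summed squared differences."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def manhattan_distance(a, b):
    """City-block (L1) distance: sum of the absolute differences."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b)))

# Two four-feature samples (sepal length, sepal width, petal length, petal width).
p = [5.1, 3.5, 1.4, 0.2]
q = [4.9, 3.0, 1.4, 0.2]

d_euclidean = euclidean_distance(p, q)  # sqrt(0.2**2 + 0.5**2)
d_manhattan = manhattan_distance(p, q)  # 0.2 + 0.5
print("Euclidean:", round(d_euclidean, 4))
print("Manhattan:", round(d_manhattan, 4))
```

The Manhattan distance is never smaller than the Euclidean distance for the same pair of points, so the two metrics can rank neighbours differently when samples differ along several features at once.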



In [21]:
import numpy as np 
import pandas as pd 
from sklearn.metrics import accuracy_score
from scipy.spatial import distance

In [22]:
class KNNClassifier:
    def __init__(self, k, distance_metric='euclidean'):
        # Reject unknown metrics up front instead of failing inside predict().
        if distance_metric not in ('euclidean', 'manhattan'):
            raise ValueError(f"Unsupported distance metric: {distance_metric!r}")
        self.k = k
        self.distance_metric = distance_metric

    def fit(self, X_train, y_train):
        # K-NN is a lazy learner: fitting only stores the training data.
        self.X_train = np.asarray(X_train)
        self.y_train = np.asarray(y_train)

    def predict(self, X_test):
        y_pred = []
        for x in X_test:
            distances = []
            for i, x_train in enumerate(self.X_train):
                if self.distance_metric == 'euclidean':
                    dist = distance.euclidean(x, x_train)
                else:  # 'manhattan'
                    dist = distance.cityblock(x, x_train)
                distances.append((dist, self.y_train[i]))

            # Keep the k closest training samples and take a majority vote.
            distances.sort(key=lambda pair: pair[0])
            k_nearest = distances[:self.k]
            k_nearest_labels = [label for (_, label) in k_nearest]
            y_pred.append(max(set(k_nearest_labels), key=k_nearest_labels.count))

        return y_pred
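The nested Python loops above can become slow on larger datasets. A possible vectorized variant is sketched below, assuming the inputs fit comfortably in memory as NumPy arrays. The function name `knn_predict` and the toy data are hypothetical and not part of the original notebook.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, X_test, k=3, metric="euclidean"):
    """Classify each row of X_test by majority vote among its k nearest training rows."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training samples")

    # Pairwise differences via broadcasting: shape (n_test, n_train, n_features).
    diff = X_test[:, None, :] - X_train[None, :, :]
    if metric == "euclidean":
        dists = np.sqrt((diff ** 2).sum(axis=2))
    elif metric == "manhattan":
        dists = np.abs(diff).sum(axis=2)
    else:
        raise ValueError(f"Unsupported metric: {metric!r}")

    # Indices of the k closest training samples for every test sample.
    nearest = np.argsort(dists, axis=1, kind="stable")[:, :k]

    predictions = []
    for row in nearest:
        votes = Counter(y_train[row].tolist())
        # On a tied vote, the label that appears first among the neighbours wins.
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

# Toy data (hypothetical): two well-separated clusters.
X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_toy = [0, 0, 0, 1, 1, 1]
queries = [[0.5, 0.5], [5.5, 5.5]]

pred_euclidean = knn_predict(X_toy, y_toy, queries, k=3, metric="euclidean")
pred_manhattan = knn_predict(X_toy, y_toy, queries, k=3, metric="manhattan")
print(pred_euclidean, pred_manhattan)
```

The broadcasting step builds an array of size n_test × n_train × n_features, so this trades memory for speed. For Iris that is negligible, but it may not suit much larger datasets.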

In [23]:
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target

In [24]:
from sklearn.model_selection import train_test_split
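# Note: test_size=0.8 holds out 80% of the samples for testing and trains on only 20%.
# The 80/20 train/test split requested in the task corresponds to test_size=0.2.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=42)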
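For reference, a split that keeps 80% of the samples for training, as the task asks, could look like the sketch below. The variable names `X_tr`, `X_te`, `y_tr`, `y_te` are chosen here to avoid clashing with the notebook's own variables, and the `stratify=y` argument is an optional addition (not in the original) that keeps the three classes balanced in both subsets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_data = load_iris()
X_all, y_all = iris_data.data, iris_data.target

# test_size=0.2 keeps 80% of the 150 samples for training and 20% for testing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42, stratify=y_all
)

print("Training samples:", X_tr.shape[0])
print("Test samples:", X_te.shape[0])
print("Test samples per class:", np.bincount(y_te))
```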

In [25]:
#Euclidean Distance
knn_euclidean = KNNClassifier(k=5, distance_metric='euclidean')
knn_euclidean.fit(X_train, y_train)
y_pred_euclidean = knn_euclidean.predict(X_test)

In [26]:
#Manhattan Distance
knn_manhattan = KNNClassifier(k=5, distance_metric='manhattan')
knn_manhattan.fit(X_train, y_train)
y_pred_manhattan = knn_manhattan.predict(X_test)

In [27]:
#accuracy scores
from sklearn.metrics import accuracy_score
accuracy_euclidean = accuracy_score(y_test, y_pred_euclidean)
accuracy_manhattan = accuracy_score(y_test, y_pred_manhattan)
print("Euclidean Accuracy", round(accuracy_euclidean,4))
print("Manhattan Accuracy", round(accuracy_manhattan,4))

Euclidean Accuracy 0.9667
Manhattan Accuracy 0.9583


In [28]:
#using Scikit library
from sklearn.neighbors import KNeighborsClassifier
knn_sklearn = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
knn_sklearn.fit(X_train, y_train)
y_pred_sklearn = knn_sklearn.predict(X_test)

accuracy_sklearn = accuracy_score(y_test, y_pred_sklearn)
print("scikit-learn Accuracy", round(accuracy_sklearn,4))

scikit-learn Accuracy 0.9667
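The comparison above covers only the Euclidean metric with k=5. A sketch of how the scikit-learn baseline could be extended to both metrics and several k values follows. The k values and the 80/20 split used here are assumptions made for illustration, so the accuracies will not necessarily match the figures printed above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_data = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris_data.data, iris_data.target, test_size=0.2, random_state=42
)

# Accuracy for every (metric, k) combination.
results = {}
for metric in ("euclidean", "manhattan"):
    for k in (1, 3, 5, 7):
        model = KNeighborsClassifier(n_neighbors=k, metric=metric)
        model.fit(X_tr, y_tr)
        results[(metric, k)] = accuracy_score(y_te, model.predict(X_te))

for (metric, k), acc in results.items():
    print(f"{metric:<10} k={k}: {acc:.4f}")
```

Running the custom `KNNClassifier` over the same grid gives a like-for-like comparison. Small differences can still appear when several training samples are equally distant or a vote is tied, because the two implementations may break ties differently.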


Q2. Modify your K-NN implementation for a regression problem.
• Make an auxiliary dataset from the Iris.csv file consisting of only sepal length and sepal width. Assume you want to predict sepal width from sepal length.
• Keep 80% of the samples for training and the rest for testing.
• Show the results using the Euclidean metric and different k values.
• Use the appropriate scikit-learn library function to apply K-NN regression to the given dataset and compare the results with your implementation.
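One way to adapt K-NN to regression is to replace the majority vote with the mean of the k nearest targets. The class below is a minimal sketch of that idea using the Euclidean metric. The name `KNNRegressor` and the toy data are hypothetical and do not appear in the cells that follow.

```python
import numpy as np

class KNNRegressor:
    """K-NN regression: predict the mean target of the k nearest training samples."""

    def __init__(self, k=5):
        self.k = k

    def fit(self, X_train, y_train):
        # Store the data as 2-D features and 1-D targets; no training happens here.
        self.X_train = np.asarray(X_train, dtype=float).reshape(len(X_train), -1)
        self.y_train = np.asarray(y_train, dtype=float)
        return self

    def predict(self, X_test):
        X_test = np.asarray(X_test, dtype=float).reshape(len(X_test), -1)
        # Euclidean distance between every test row and every training row.
        diff = X_test[:, None, :] - self.X_train[None, :, :]
        dists = np.sqrt((diff ** 2).sum(axis=2))
        # Average the targets of the k closest training samples.
        nearest = np.argsort(dists, axis=1, kind="stable")[:, :self.k]
        return self.y_train[nearest].mean(axis=1)

# Toy check (hypothetical data): y = 2x sampled at x = 1..5.
toy_model = KNNRegressor(k=2).fit([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
toy_pred = toy_model.predict([2.4, 4.9])
print(toy_pred)  # neighbours of 2.4 are x=2 and x=3; neighbours of 4.9 are x=5 and x=4
```

With a single feature such as sepal length, the Euclidean distance reduces to the absolute difference, so many training samples can be equally distant from a query. Which of them ends up among the k nearest then depends on the sort order.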


In [29]:
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

In [30]:
iris_df = pd.read_csv('Iris.csv')

In [31]:
iris_df

      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm         Species
0      1            5.1           3.5            1.4           0.2     Iris-setosa
1      2            4.9           3.0            1.4           0.2     Iris-setosa
2      3            4.7           3.2            1.3           0.2     Iris-setosa
3      4            4.6           3.1            1.5           0.2     Iris-setosa
4      5            5.0           3.6            1.4           0.2     Iris-setosa
...  ...            ...           ...            ...           ...             ...
145  146            6.7           3.0            5.2           2.3  Iris-virginica
146  147            6.3           2.5            5.0           1.9  Iris-virginica
147  148            6.5           3.0            5.2           2.0  Iris-virginica
148  149            6.2           3.4            5.4           2.3  Iris-virginica


In [32]:
auxiliary_data = iris_df[['SepalLengthCm', 'SepalWidthCm']]

In [33]:
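# Note: test_size=0.8 holds out 80% of the samples for testing; the requested 80/20 split is test_size=0.2.
# No random_state is set, so the split (and the resulting RMSE) changes from run to run.
X_train, X_test, y_train, y_test = train_test_split(auxiliary_data['SepalLengthCm'], auxiliary_data['SepalWidthCm'], test_size=0.8)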

In [34]:
knn_regressor = KNeighborsRegressor(n_neighbors=5)
X_train = X_train.values.reshape(-1, 1)

In [35]:
knn_regressor.fit(X_train, y_train)

In [36]:
X_test_reshaped = X_test.values.reshape(-1, 1)
y_pred = knn_regressor.predict(X_test_reshaped)

In [37]:
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

In [38]:
print("Root mean square error:", round(rmse,4))

Root mean square error: 0.4401


In [39]:
sklearn_knn_regressor = KNeighborsRegressor(n_neighbors=5)
sklearn_knn_regressor.fit(X_train, y_train)
sklearn_y_pred = sklearn_knn_regressor.predict(X_test_reshaped)
sklearn_rmse = np.sqrt(mean_squared_error(y_test, sklearn_y_pred))

In [40]:
print("Root mean square error using Scikit Library:", round(sklearn_rmse,4))

Root mean square error using Scikit Library: 0.4401
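The task also asks for results at different k values. The sketch below loops over a few values of k and reports the RMSE of a hand-written mean-of-neighbours predictor next to scikit-learn's `KNeighborsRegressor`. It makes several assumptions: the data come from `load_iris()` (columns 0 and 1 are sepal length and sepal width) rather than from Iris.csv, the split is 80/20 with a fixed seed, and the helper `knn_regress` is hypothetical. The numbers it prints are therefore not comparable with the RMSE shown above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

features = load_iris().data
sepal_length = features[:, [0]]  # 2-D feature matrix with one column
sepal_width = features[:, 1]     # 1-D target vector

X_tr, X_te, y_tr, y_te = train_test_split(
    sepal_length, sepal_width, test_size=0.2, random_state=42
)

def knn_regress(X_train, y_train, X_query, k):
    """Mean target of the k nearest neighbours; with one feature, Euclidean distance is |a - b|."""
    dists = np.abs(X_query[:, None, 0] - X_train[None, :, 0])
    nearest = np.argsort(dists, axis=1, kind="stable")[:, :k]
    return y_train[nearest].mean(axis=1)

# Maps k to (RMSE of the hand-written predictor, RMSE of scikit-learn).
rmse_by_k = {}
for k in (1, 3, 5, 7, 9):
    own_pred = knn_regress(X_tr, y_tr, X_te, k)
    sk_pred = KNeighborsRegressor(n_neighbors=k).fit(X_tr, y_tr).predict(X_te)
    rmse_by_k[k] = (
        float(np.sqrt(mean_squared_error(y_te, own_pred))),
        float(np.sqrt(mean_squared_error(y_te, sk_pred))),
    )

for k, (own_rmse, sk_rmse) in rmse_by_k.items():
    print(f"k={k}: own RMSE={own_rmse:.4f}, scikit-learn RMSE={sk_rmse:.4f}")
```

The two columns may differ slightly because many flowers share the same sepal length, and the two implementations may pick different neighbours among equally distant samples.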
