# Module72 KNN Assignment3

Q1. Write Python code to implement the KNN classifier algorithm on the load_iris dataset from
sklearn.datasets.

In [3]:
# A1. Implementing KNN Classifier on the load_iris dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train KNN classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Make predictions
y_pred = knn.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy of KNN Classifier on Iris dataset:", accuracy)


Accuracy of KNN Classifier on Iris dataset: 1.0
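
To make explicit what a KNN classifier does conceptually, here is a minimal from-scratch sketch: compute the distance from a query point to every training point, keep the k closest, and take a majority vote. The function name `knn_predict` and the toy data are hypothetical and only for illustration; this is not how `KNeighborsClassifier` is implemented internally.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Predict a label for each query row by majority vote of its k nearest training rows."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for x in np.asarray(X_query, dtype=float):
        # Euclidean distance from the query point to every training point
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k smallest distances
        nearest = np.argsort(distances)[:k]
        # Majority vote among the neighbours' labels
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


# Hypothetical toy data: two well-separated clusters
X_toy = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y_toy = [0, 0, 0, 1, 1, 1]

toy_pred = knn_predict(X_toy, y_toy, [[1.1, 1.0], [5.1, 5.0]], k=3)
print(toy_pred)  # [0 1]
```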


Q2. Write Python code to implement the KNN regressor algorithm on the load_boston dataset from
sklearn.datasets.

In [4]:
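# A2: Implementing KNN Regressor on the load_boston dataset
# Note: load_boston was removed in scikit-learn 1.2, so the import below raises
# ImportError on current versions and the rest of this cell never runs.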
from sklearn.datasets import load_boston
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Load the dataset
boston = load_boston()
X, y = boston.data, boston.target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize and train KNN regressor
knn_reg = KNeighborsRegressor(n_neighbors=3)
knn_reg.fit(X_train, y_train)

# Make predictions
y_pred = knn_reg.predict(X_test)

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error of KNN Regressor on Boston dataset:", mse)

ImportError: 
`load_boston` has been removed from scikit-learn since version 1.2.

The Boston housing prices dataset has an ethical problem: as
investigated in [1], the authors of this dataset engineered a
non-invertible variable "B" assuming that racial self-segregation had a
positive impact on house prices [2]. Furthermore the goal of the
research that led to the creation of this dataset was to study the
impact of air quality but it did not give adequate demonstration of the
validity of this assumption.

The scikit-learn maintainers therefore strongly discourage the use of
this dataset unless the purpose of the code is to study and educate
about ethical issues in data science and machine learning.

In this special case, you can fetch the dataset from the original
source::

    import pandas as pd
    import numpy as np

    data_url = "http://lib.stat.cmu.edu/datasets/boston"
    raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
    data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
    target = raw_df.values[1::2, 2]

Alternative datasets include the California housing dataset and the
Ames housing dataset. You can load the datasets as follows::

    from sklearn.datasets import fetch_california_housing
    housing = fetch_california_housing()

for the California housing dataset and::

    from sklearn.datasets import fetch_openml
    housing = fetch_openml(name="house_prices", as_frame=True)

for the Ames housing dataset.

[1] M Carlisle.
"Racist data destruction?"
<https://medium.com/@docintangible/racist-data-destruction-113e3eff54a8>

[2] Harrison Jr, David, and Daniel L. Rubinfeld.
"Hedonic housing prices and the demand for clean air."
Journal of environmental economics and management 5.1 (1978): 81-102.
<https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air>

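Since `load_boston` is no longer available, the same steps can be sketched on a dataset that still ships with scikit-learn. The choice of `load_diabetes` is an assumption made here only so the cell runs offline; it is not the assignment's dataset, and its MSE is not comparable to any Boston result. The variable names (`Xr_train`, `knn_reg_alt`, ...) are deliberately different from the ones used elsewhere so that the Iris split from A1 is not overwritten.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Assumption: load_diabetes stands in for the removed load_boston
X_reg, y_reg = load_diabetes(return_X_y=True)

# Same split settings as in A1
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_reg, y_reg, test_size=0.3, random_state=42
)

# Initialize and train the KNN regressor
knn_reg_alt = KNeighborsRegressor(n_neighbors=3)
knn_reg_alt.fit(Xr_train, yr_train)

# Evaluate with Mean Squared Error
mse_alt = mean_squared_error(yr_test, knn_reg_alt.predict(Xr_test))
print("Mean Squared Error of KNN Regressor on diabetes dataset:", mse_alt)
```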

Q3. Write a Python code snippet to find the optimal value of K for the KNN classifier algorithm using
cross-validation on the load_iris dataset from sklearn.datasets.

In [6]:
# A3: Finding optimal k for KNN Classifier on Iris dataset
# X and y are the Iris features and labels loaded in A1; the failed A2 cell did not reassign them.
from sklearn.model_selection import cross_val_score
import numpy as np

# Test various k values
k_values = range(1, 31)
cv_scores = []

for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=5, scoring='accuracy')
    cv_scores.append(scores.mean())

optimal_k = k_values[np.argmax(cv_scores)]
print("Optimal k value:", optimal_k)

Optimal k value: 6
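
The same search can also be written with `GridSearchCV`, which runs the cross-validation loop and keeps the per-k scores in one object. This is a sketch of an alternative, not the method used in A3; the names `X_iris`, `y_iris`, `search` and `best_k` are introduced here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X_iris, y_iris = load_iris(return_X_y=True)

# Try k = 1..30 with 5-fold cross-validation, scoring by accuracy
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 31))},
    cv=5,
    scoring="accuracy",
)
search.fit(X_iris, y_iris)

best_k = search.best_params_["n_neighbors"]
print("Best k:", best_k)
print("Mean CV accuracy at best k:", search.best_score_)
```

Several values of k often reach nearly the same mean accuracy on Iris, so the reported optimum is best read as one good choice rather than a sharp peak.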


Q4. Implement the KNN regressor algorithm with feature scaling on the load_boston dataset from
sklearn.datasets.

In [10]:
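# A4: KNN Regressor with feature scaling on Boston dataset
# Caution: the A2 cell failed at its first import, so X_train, X_test, y_train and y_test
# were never reassigned and most likely still hold the Iris split from A1. The MSE printed
# by this cell would then be computed on Iris class labels, not on Boston house prices.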
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train KNN Regressor
knn_reg = KNeighborsRegressor(n_neighbors=3)
knn_reg.fit(X_train_scaled, y_train)

y_pred_scaled = knn_reg.predict(X_test_scaled)
mse_scaled = mean_squared_error(y_test, y_pred_scaled)
print("MSE of scaled KNN Regressor on Boston dataset:", mse_scaled)


MSE of scaled KNN Regressor on Boston dataset: 0.012345679012345675
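
To see why scaling matters for a distance-based regressor, here is a small sketch on synthetic data. The data is an assumption made purely for illustration: one informative feature on a unit scale and one noise feature on a much larger scale. `make_pipeline` bundles the scaler and the regressor so the scaler is fitted on the training split only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative assumption): the target depends only on the
# first feature; the second feature is pure noise on a 1000x larger scale
rng = np.random.default_rng(0)
informative = rng.normal(0.0, 1.0, size=400)
noise_feature = rng.normal(0.0, 1000.0, size=400)
X_syn = np.column_stack([informative, noise_feature])
y_syn = 3.0 * informative + rng.normal(0.0, 0.1, size=400)

Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=42
)

# Without scaling, distances are dominated by the large-scale noise feature
raw_model = KNeighborsRegressor(n_neighbors=3).fit(Xs_train, ys_train)

# With scaling, both features contribute on a comparable scale
scaled_model = make_pipeline(
    StandardScaler(), KNeighborsRegressor(n_neighbors=3)
).fit(Xs_train, ys_train)

mse_raw = mean_squared_error(ys_test, raw_model.predict(Xs_test))
mse_pipeline = mean_squared_error(ys_test, scaled_model.predict(Xs_test))
print("MSE without scaling:", mse_raw)
print("MSE with scaling:", mse_pipeline)
```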


Q5. Write a Python code snippet to implement the KNN classifier algorithm with weighted voting on
the load_iris dataset from sklearn.datasets.

In [11]:
# A5: KNN Classifier with weighted voting on Iris dataset
knn_weighted = KNeighborsClassifier(n_neighbors=3, weights='distance')
knn_weighted.fit(X_train, y_train)

y_pred_weighted = knn_weighted.predict(X_test)
accuracy_weighted = accuracy_score(y_test, y_pred_weighted)
print("Accuracy of weighted KNN Classifier on Iris dataset:", accuracy_weighted)


Accuracy of weighted KNN Classifier on Iris dataset: 1.0
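
Both uniform and distance-weighted voting reach the same accuracy on this split, so the difference is easier to see on a hand-made neighbourhood. The sketch below weights each neighbour by the inverse of its distance, which is the idea behind `weights='distance'`. The helper `vote` and the numbers are hypothetical, and the code assumes all distances are non-zero.

```python
from collections import defaultdict


def vote(labels, distances, weighted):
    """Return the winning label under uniform or inverse-distance weighting."""
    totals = defaultdict(float)
    for label, distance in zip(labels, distances):
        # Uniform voting counts each neighbour once; weighted voting counts 1/distance
        totals[label] += 1.0 / distance if weighted else 1.0
    return max(totals, key=totals.get)


# Hypothetical neighbourhood: one very close point of class 0, two farther points of class 1
neighbour_labels = [0, 1, 1]
neighbour_distances = [0.5, 2.0, 2.0]

uniform_winner = vote(neighbour_labels, neighbour_distances, weighted=False)
weighted_winner = vote(neighbour_labels, neighbour_distances, weighted=True)
print("Uniform vote:", uniform_winner)    # class 1 wins 2 votes to 1
print("Weighted vote:", weighted_winner)  # class 0 wins with weight 2.0 against 1.0
```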


Q6. Implement a function to standardise the features before applying KNN classifier.

In [12]:
# A6: Function to standardize features before KNN Classifier
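def standardize_features(X_train, X_test):
    """Fit a StandardScaler on X_train only and apply it to both splits."""
    scaler = StandardScaler()  # StandardScaler was imported in A4
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    return X_train_scaled, X_test_scaled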

# Example usage
X_train_scaled, X_test_scaled = standardize_features(X_train, X_test)


Q7. Write a Python function to calculate the euclidean distance between two points.

In [13]:
# A7: Function to calculate Euclidean distance between two points
def euclidean_distance(point1, point2):
    return np.sqrt(np.sum((np.array(point1) - np.array(point2))**2))


Q8. Write a Python function to calculate the manhattan distance between two points.

In [14]:
# A8: Function to calculate Manhattan distance between two points
def manhattan_distance(point1, point2):
    return np.sum(np.abs(np.array(point1) - np.array(point2)))

# Example usage for distances
point_a = [1, 2, 3]
point_b = [4, 5, 6]

print("Euclidean Distance:", euclidean_distance(point_a, point_b))
print("Manhattan Distance:", manhattan_distance(point_a, point_b))


Euclidean Distance: 5.196152422706632
Manhattan Distance: 9
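
Both distances are special cases of the Minkowski distance of order p: p=1 gives Manhattan and p=2 gives Euclidean. The sketch below is a hypothetical helper (`minkowski_distance` is a name chosen here) that reproduces the two results above with a single function.

```python
import numpy as np


def minkowski_distance(point1, point2, p=2):
    """Minkowski distance of order p; p=1 is Manhattan, p=2 is Euclidean."""
    diff = np.abs(np.array(point1, dtype=float) - np.array(point2, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))


point_a = [1, 2, 3]
point_b = [4, 5, 6]

d1 = minkowski_distance(point_a, point_b, p=1)
d2 = minkowski_distance(point_a, point_b, p=2)
print("p=1 (Manhattan):", d1)  # 9.0
print("p=2 (Euclidean):", d2)  # about 5.196
```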
