## KNN Assignment - 3
**By Shahequa Modabbera**

### Q1. Write a Python code to implement the KNN classifier algorithm on load_iris dataset in sklearn.datasets.

In [1]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the dataset
iris = load_iris()

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Create a KNN classifier object
knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier
knn.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn.predict(X_test)

# Calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 1.0


In this code, we first load the Iris dataset with `load_iris` from `sklearn.datasets`. Then, we split it into training and testing sets using the `train_test_split` function, holding out 20% of the samples for testing.

Next, we create a `KNeighborsClassifier` object, specifying the number of neighbors (k) as `n_neighbors=3`. We then train the classifier on the training data by calling the `fit` method.

After training, we use the trained classifier to make predictions on the test set by calling the `predict` method. The predicted labels are stored in `y_pred`.

Finally, we calculate the accuracy of the classifier by comparing the predicted labels with the true labels from the test set using the `accuracy_score` function. The accuracy is then printed.
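To make the `fit`/`predict` steps less of a black box, here is a minimal from-scratch sketch of the idea behind KNN classification: find the k closest training points and take a majority vote. The function name `knn_predict` and the toy data are made up for illustration; this is not how scikit-learn implements it internally, and it assumes Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label
    distances = [(math.dist(x, query), label) for x, label in zip(X_train, y_train)]
    # Keep the k closest training points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters labeled 0 and 1
X_toy = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y_toy = [0, 0, 0, 1, 1, 1]

print(knn_predict(X_toy, y_toy, [1.1, 1.0]))  # 0 (query sits in the first cluster)
print(knn_predict(X_toy, y_toy, [5.1, 5.0]))  # 1 (query sits in the second cluster)
```

Note that "training" here amounts to storing the data; all the work happens at prediction time.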

### Q2. Write a Python code to implement the KNN regressor algorithm on load_boston dataset in sklearn.datasets.

- `load_boston` was removed from `sklearn.datasets` in scikit-learn 1.2, so we use `fetch_openml` to load the Boston Housing dataset instead.

In [2]:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Load the Boston Housing dataset
boston = fetch_openml(data_id=531)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=42)

# Create a KNN regressor object
knn = KNeighborsRegressor(n_neighbors=3)

# Train the regressor
knn.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn.predict(X_test)

# Calculate the mean squared error
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)


Mean Squared Error: 21.65955337690632


### Q3. Write a Python code snippet to find the optimal value of K for the KNN classifier algorithm using cross-validation on load_iris dataset in sklearn.datasets.

In [3]:
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Define the range of K values to try
k_values = range(1,31)

# Initialize an empty list to store the cross-validation scores
cross_val_scores = []

# Perform cross-validation for each K value
for k in k_values:
    # Create a KNN classifier with the current K value
    knn = KNeighborsClassifier(n_neighbors=k)
    
    # Perform cross-validation with 5 folds
    scores = cross_val_score(knn, X, y, cv=5)
    
    # Calculate the mean cross-validation score
    mean_score = scores.mean()
    
    # Append the mean score to the list of cross-validation scores
    cross_val_scores.append(mean_score)

# Find the index of the maximum cross-validation score
best_index = cross_val_scores.index(max(cross_val_scores))
    
# Find the optimal K value
optimal_k = k_values[best_index]

# Print the optimal K value
print("Optimal K value:", optimal_k)

Optimal K value: 6
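The manual loop above can also be expressed with scikit-learn's `GridSearchCV`, which runs the same kind of cross-validated search over `n_neighbors`. The following is a sketch under the same assumptions as the loop (5 folds, K from 1 to 30, accuracy as the score); it is an alternative, not a replacement for the approach shown.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Load features and labels directly as arrays
X, y = load_iris(return_X_y=True)

# Candidate K values, matching the range used in the manual loop
param_grid = {"n_neighbors": list(range(1, 31))}

# 5-fold cross-validated search, scored by accuracy
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best K:", search.best_params_["n_neighbors"])
print("Best mean CV accuracy:", search.best_score_)
```

After fitting, `search.cv_results_` holds the per-K scores, which is handy for plotting accuracy against K.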


### Q4. Implement the KNN regressor algorithm with feature scaling on load_boston dataset in sklearn.datasets.

In [4]:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# Load the Boston Housing dataset
boston = fetch_openml(data_id=531)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2, random_state=42)

# Perform feature scaling on the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize the KNN regressor with the desired number of neighbors
k = 5
knn_regressor = KNeighborsRegressor(n_neighbors=k)

# Fit the regressor to the scaled training data
knn_regressor.fit(X_train_scaled, y_train)

# Predict the target values for the test data
y_pred = knn_regressor.predict(X_test_scaled)

# Calculate the MSE to evaluate the model performance
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

Mean Squared Error: 20.60552941176471


### Q5. Write a Python code snippet to implement the KNN classifier algorithm with weighted voting on load_iris dataset in sklearn.datasets.

In [5]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the KNN classifier with the desired number of neighbors and weights
k = 5
weights = 'distance' # Weighted voting based on distance
knn_classifier = KNeighborsClassifier(n_neighbors=k, weights=weights)

# Fit the classifier to the training data
knn_classifier.fit(X_train, y_train)

# Predict the class labels for the test data
y_pred = knn_classifier.predict(X_test)

# Calculate the accuracy of the model to evaluate its performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 1.0
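On Iris, both voting schemes reach the same accuracy here, so the effect of `weights='distance'` is easy to miss. The sketch below uses a made-up one-dimensional data set (an assumption purely for illustration) where the two schemes disagree: with distance weighting each neighbor's vote counts as 1/distance, so one very close neighbor can outvote two distant ones.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one class-0 point close to the query, two class-1 points farther away
X_toy = np.array([[0.5], [2.0], [-2.0]])
y_toy = np.array([0, 1, 1])
query = np.array([[0.0]])

uniform = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_toy, y_toy)
weighted = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_toy, y_toy)

# Uniform voting: class 1 wins with 2 votes against 1
print(uniform.predict(query))   # [1]

# Distance weighting: class 0 gets 1/0.5 = 2.0, class 1 gets 1/2 + 1/2 = 1.0
print(weighted.predict(query))  # [0]
```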


### Q6. Implement a function to standardise the features before applying KNN classifier.

In [6]:
from sklearn.preprocessing import StandardScaler

def standardize_features(X_train, X_test):
    # Initialize the StandardScaler object
    scaler = StandardScaler()
    
    # Fit the scaler to the training data and transform the training features
    X_train_std = scaler.fit_transform(X_train)

    # Transform the test features using the fitted scaler from training data
    X_test_std = scaler.transform(X_test)
    
    return X_train_std, X_test_std

# Implementing on the iris dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
X_train_std, X_test_std = standardize_features(X_train, X_test)

# Initialize and fit the KNN classifier with standardized features
knn_classifier = KNeighborsClassifier(n_neighbors=5)
knn_classifier.fit(X_train_std, y_train)

# Predict the class labels for the test data
y_pred = knn_classifier.predict(X_test_std)

# Calculate the accuracy of the model to evaluate its performance
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 1.0


In this code, the `standardize_features` function takes `X_train` and `X_test` as input, and it returns the standardized versions of the features. It uses the `StandardScaler` from `sklearn.preprocessing` to perform the standardization. The `fit_transform` method is used on the training data to fit the scaler and transform the training features, while the `transform` method is used on the test data to transform the test features using the fitted scaler from the training data.

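Standardizing gives each feature a mean of 0 and a standard deviation of 1 on the training set. The test set is transformed with the same training statistics, so its features are only approximately centred and scaled. Because KNN relies on distances between points, this keeps features with large numeric ranges from dominating the neighbor search and can improve the performance of the classifier.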
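As a quick check of that claim, the sketch below verifies the mean and standard deviation of the scaled training features, and then shows one common way to bundle scaling and KNN together with `make_pipeline`. The pipeline variant is an alternative suggested here, not part of the original answer; it assumes the same split and `n_neighbors=5` as above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Training features end up with mean 0 and standard deviation 1 (up to rounding)
print(np.allclose(X_train_std.mean(axis=0), 0))  # True
print(np.allclose(X_train_std.std(axis=0), 1))   # True

# Test features reuse the training statistics, so their means are near 0 but not exactly 0
print(X_test_std.mean(axis=0))

# Alternative: a pipeline that scales and classifies in one object
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print("Pipeline accuracy:", model.score(X_test, y_test))
```

A pipeline is convenient with cross-validation, because the scaler is refit on each training fold and the held-out fold never leaks into the scaling statistics.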

### Q7. Write a Python function to calculate the euclidean distance between two points.

In [7]:
import math

def euclidean_distance(point1, point2):
    """
    Calculate the Euclidean distance between two points.
    
    Arguments:
    point1 -- tuple or list containing the coordinates of the first point (e.g. [x1, y1])
    point2 -- tuple or list containing the coordinates of the second point (e.g. [x2, y2])
    
    Returns:
    The Euclidean distance between the two points.
    """
    distance = 0
    for i in range(len(point1)):
        distance += (point1[i] - point2[i]) ** 2
    return math.sqrt(distance)

point1 = [2, 3]
point2 = [5, 7]

distance = euclidean_distance(point1, point2)
print(distance)

5.0


### Q8. Write a Python function to calculate the manhattan distance between two points.

In [8]:
def manhattan_distance(point1, point2):
    """
    Calculate the Manhattan distance between two points.
    
    Arguments:
    point1 -- tuple or list containing the coordinates of the first point (e.g. [x1, y1])
    point2 -- tuple or list containing the coordinates of the second point (e.g. [x2, y2])
    
    Returns:
    The Manhattan distance between the two points.
    """
    
    distance = 0
    for i in range(len(point1)):
        distance += abs(point1[i] - point2[i])
    return distance

point1 = [2, 3]
point2 = [5, 7]

distance = manhattan_distance(point1, point2)
print(distance)

7
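The Euclidean and Manhattan distances are both special cases of the Minkowski distance, which is also the default metric of scikit-learn's KNN estimators (with `p=2`). The helper below, `minkowski_distance`, is a hypothetical name used here as a small sketch to show how one function covers both cases; it reuses the same two example points.

```python
def minkowski_distance(point1, point2, p):
    """Minkowski distance of order p: p=1 is Manhattan, p=2 is Euclidean."""
    if len(point1) != len(point2):
        raise ValueError("points must have the same number of coordinates")
    # Sum the p-th powers of the absolute coordinate differences
    total = sum(abs(a - b) ** p for a, b in zip(point1, point2))
    # Take the p-th root of the sum
    return total ** (1 / p)

print(minkowski_distance([2, 3], [5, 7], p=1))  # 7.0 (Manhattan)
print(minkowski_distance([2, 3], [5, 7], p=2))  # 5.0 (Euclidean)
```

In `KNeighborsClassifier` and `KNeighborsRegressor`, the same choice is made through the `p` parameter.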
