In [None]:
##Q1.

Here's an example of Python code that implements the K-Nearest Neighbors (KNN) classifier on the Iris dataset, loaded with load_iris from sklearn.datasets:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Input features
y = iris.target  # Target variable

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a KNN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier on the training set
knn.fit(X_train, y_train)

# Predict the classes for the test set
y_pred = knn.predict(X_test)

# Calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


In this code, we first import the sklearn modules needed to load the dataset, split it into training and testing sets, create the KNN classifier, and calculate the accuracy. We then load the Iris dataset with load_iris and assign the input features (four flower measurements) to X and the class labels (three iris species) to y.

Next, we split the dataset into training and testing sets using the train_test_split function, where 80% of the data is used for training and 20% for testing. We create a KNN classifier object with n_neighbors=3, indicating that it should consider the three nearest neighbors for classification.

After that, we train the KNN classifier on the training set using the fit method. We then use the trained classifier to predict the classes for the test set with the predict method. Finally, we calculate the accuracy of the classifier by comparing the predicted classes with the actual classes and printing the result.

Note that random_state is set to 42 in the train_test_split call so that the same split is produced on every run. Feel free to adjust the parameters and experiment with different values to see how they affect the accuracy.
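To make the fit/predict steps less of a black box, here is a minimal from-scratch sketch of what a KNN classifier does at prediction time: compute the Euclidean distance from each query point to every training point, take the k closest, and return the majority label. The function name knn_predict is hypothetical (not part of scikit-learn), and its tie-breaking (among tied labels, the one belonging to the closest neighbor wins) is an assumption of this sketch that may differ from KNeighborsClassifier, so the comparison at the end checks agreement rather than exact equality.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_query, k=3):
    """Hypothetical helper: predict each query's label by majority vote of its k nearest training rows."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for q in np.asarray(X_query, dtype=float):
        # Euclidean distance from this query point to every training point
        distances = np.linalg.norm(X_train - q, axis=1)
        # Indices of the k closest training points, nearest first
        nearest = np.argsort(distances, kind="stable")[:k]
        # Majority vote; on a tie, Counter keeps the label counted first (the closest one)
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
ours = knn_predict(X_train, y_train, X_test, k=3)
sklearn_pred = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)
print("Agreement with KNeighborsClassifier:", np.mean(ours == sklearn_pred))
```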


In [None]:
##Q2.

Here's an example of Python code that implements the K-Nearest Neighbors (KNN) regressor on the Boston Housing dataset, which older versions of scikit-learn provided as load_boston in sklearn.datasets:

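import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Load the Boston Housing dataset.
# load_boston was removed from sklearn.datasets in scikit-learn 1.2, so the raw
# data is read from its original source instead (the workaround given in
# scikit-learn's deprecation notice; requires network access).
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])  # Input features (13 columns)
y = raw_df.values[1::2, 2]  # Target variable (median house value, in $1000s)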

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a KNN regressor with k=5
knn = KNeighborsRegressor(n_neighbors=5)

# Train the regressor on the training set
knn.fit(X_train, y_train)

# Predict the target variable for the test set
y_pred = knn.predict(X_test)

# Calculate the mean squared error of the regressor
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

In this code, we first import the modules needed to load the data, split it into training and testing sets, create the KNN regressor, and calculate the mean squared error. We then load the Boston Housing data and assign the input features to X and the target variable (housing prices) to y.

Next, we split the dataset into training and testing sets using the train_test_split function, where 80% of the data is used for training and 20% for testing. We create a KNN regressor object with n_neighbors=5, indicating that it should consider the five nearest neighbors for regression.

After that, we train the KNN regressor on the training set using the fit method. We then use the trained regressor to predict the target variable for the test set with the predict method. Finally, we calculate the mean squared error of the regressor by comparing the predicted values with the actual values and printing the result.

Note that random_state is set to 42 in the train_test_split call so that the same split is produced on every run. Feel free to adjust the parameters and experiment with different values to see how they affect the mean squared error.
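Both the regressor's prediction and the error metric can be checked by hand. The sketch below uses hypothetical one-feature toy data: with its default uniform weights, KNeighborsRegressor predicts the mean target of the k nearest training points, and the mean squared error is the average of the squared differences between true and predicted values.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical one-feature toy data, chosen only for illustration
X_train = np.array([[1.0], [2.0], [3.0], [10.0], [11.0]])
y_train = np.array([10.0, 20.0, 30.0, 100.0, 110.0])
query = np.array([[2.2]])
k = 3

# Manual prediction: average the targets of the k closest training points
distances = np.abs(X_train[:, 0] - query[0, 0])
nearest = np.argsort(distances)[:k]
manual_pred = y_train[nearest].mean()
sklearn_pred = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train).predict(query)[0]
print(manual_pred, sklearn_pred)  # → 20.0 20.0

# Manual MSE: mean of the squared prediction errors
y_true = np.array([3.0, 5.0])
y_hat = np.array([2.0, 7.0])
manual_mse = np.mean((y_true - y_hat) ** 2)
print(manual_mse, mean_squared_error(y_true, y_hat))  # → 2.5 2.5
```

The three nearest points to 2.2 are 2.0, 3.0 and 1.0, so the prediction is (20 + 30 + 10) / 3 = 20.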


In [None]:
##Q3.

Here's a Python code snippet that uses cross-validation to find the optimal value of K for the K-Nearest Neighbors (KNN) classifier on the Iris dataset (load_iris from sklearn.datasets):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Input features
y = iris.target  # Target variable

# Define the range of K values to try
k_values = list(range(1, 21))

# Perform cross-validation for each K value
cv_scores = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=5)  # 5-fold cross-validation
    cv_scores.append(scores.mean())

# Find the optimal K value with the highest cross-validated score
optimal_k = k_values[cv_scores.index(max(cv_scores))]
print("Optimal K:", optimal_k)


In this code, we first import the sklearn modules needed to load the dataset, perform cross-validation, and create the KNN classifier. We then load the Iris dataset with load_iris and assign the input features to X and the target variable to y.

Next, we define a range of K values to try, in this case, from 1 to 20. For each K value, we create a KNN classifier object and use the cross_val_score function to perform 5-fold cross-validation. The cross_val_score function splits the data into 5 folds, trains the KNN classifier on 4 folds, and evaluates its performance on the remaining fold. This process is repeated 5 times, with each fold serving as the test set once.
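The fold-by-fold procedure described above can be written out explicitly. This sketch assumes that, for a classifier with an integer cv, cross_val_score uses unshuffled stratified folds (StratifiedKFold), which keep the class proportions roughly equal in every fold; under that assumption the manual loop should reproduce its mean score for a single K (K = 5 here, chosen arbitrarily).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
k = 5

fold_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    # Train on 4 folds, evaluate accuracy on the held-out fold
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X[train_idx], y[train_idx])
    fold_scores.append(knn.score(X[test_idx], y[test_idx]))

manual_mean = np.mean(fold_scores)
library_mean = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
print("Manual 5-fold mean accuracy:", manual_mean)
print("cross_val_score mean accuracy:", library_mean)
```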

We store the mean cross-validated score for each K value in the cv_scores list. Finally, we select the K value with the highest mean score and print it. If several K values tie for the highest score, cv_scores.index(max(cv_scores)) returns the first of them, so the smallest such K is chosen.

Note that you can adjust the range of K values and the number of folds (by modifying the cv parameter in cross_val_score) according to your needs.
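scikit-learn can also run this search for you with GridSearchCV, which cross-validates every value in a parameter grid and records the best one. A minimal sketch using the same range of K (1 to 20) and the same 5 folds as the loop above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try every K from 1 to 20 with 5-fold cross-validation (accuracy is the default score)
param_grid = {"n_neighbors": list(range(1, 21))}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid.fit(X, y)

print("Optimal K:", grid.best_params_["n_neighbors"])
print("Best mean CV accuracy:", grid.best_score_)
```

A side benefit is that grid.cv_results_ keeps the per-K scores, which is handy for plotting accuracy against K.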


In [None]:
##Q4.

To implement the K-Nearest Neighbors (KNN) regressor with feature scaling on the Boston Housing dataset (load_boston in older versions of sklearn.datasets), you can use the StandardScaler class from sklearn.preprocessing to scale the input features. Here's an example of Python code:

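import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# Load the Boston Housing dataset.
# load_boston was removed from sklearn.datasets in scikit-learn 1.2, so the raw
# data is read from its original source instead (the workaround given in
# scikit-learn's deprecation notice; requires network access).
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])  # Input features (13 columns)
y = raw_df.values[1::2, 2]  # Target variable (median house value, in $1000s)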

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Perform feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create a KNN regressor with k=5
knn = KNeighborsRegressor(n_neighbors=5)

# Train the regressor on the scaled training set
knn.fit(X_train_scaled, y_train)

# Predict the target variable for the scaled test set
y_pred = knn.predict(X_test_scaled)

# Calculate the mean squared error of the regressor
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

In this code, we first import the modules needed to load the data, split it into training and testing sets, scale the features, create the KNN regressor, and calculate the mean squared error. We then load the Boston Housing data and assign the input features to X and the target variable (housing prices) to y.

Next, we split the dataset into training and testing sets using the train_test_split function, where 80% of the data is used for training and 20% for testing. We then create a StandardScaler object and use it to perform feature scaling on the training set with the fit_transform method. We apply the same scaling transformation to the test set using the transform method.

After that, we create a KNN regressor object with n_neighbors=5. We train the regressor on the scaled training set using the fit method. We then use the trained regressor to predict the target variable for the scaled test set with the predict method. Finally, we calculate the mean squared error of the regressor by comparing the predicted values with the actual values and printing the result.

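Feature scaling matters for KNN because its predictions depend entirely on distances: a feature with a large numeric range (in the Boston data, TAX is in the hundreds while NOX is below 1) would otherwise dominate the Euclidean distance. StandardScaler rescales each feature to zero mean and unit variance, so every feature contributes on a comparable scale, which often (though not always) improves the results.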
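A small, hypothetical example makes the effect concrete. Below, feature 0 spans roughly 1 to 10 and feature 1 spans roughly 200 to 400, so without scaling the Euclidean distance is driven almost entirely by feature 1; after StandardScaler both features count, and the nearest neighbor of the query changes. The data points and the helper nearest_index are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: feature 0 is small-valued, feature 1 is large-valued
X_train = np.array([[1.0, 200.0], [2.0, 400.0], [9.0, 210.0], [10.0, 390.0]])
query = np.array([[1.0, 260.0]])


def nearest_index(X, q):
    """Hypothetical helper: index of the row of X closest to q in Euclidean distance."""
    return int(np.argmin(np.linalg.norm(X - q, axis=1)))


raw_nn = nearest_index(X_train, query)

# Fit the scaler on the training data only, then apply it to both
scaler = StandardScaler().fit(X_train)
scaled_nn = nearest_index(scaler.transform(X_train), scaler.transform(query))

print("Nearest neighbor without scaling:", raw_nn)  # → 2
print("Nearest neighbor with scaling:", scaled_nn)  # → 0
```

Without scaling, row 2 wins because its feature-1 value (210) is closest to 260, even though its feature-0 value (9) is far from 1; after scaling, row 0, which matches feature 0 exactly, becomes the nearest.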


In [None]:
##Q5.

To implement the K-Nearest Neighbors (KNN) classifier with weighted voting on the Iris dataset (load_iris from sklearn.datasets), you can use the KNeighborsClassifier class from sklearn.neighbors and set the weights parameter to "distance". Each neighbor's vote is then weighted by the inverse of its distance to the query point, so closer neighbors have more influence. Here's an example of Python code:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Input features
y = iris.target  # Target variable

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a KNN classifier with weighted voting
knn = KNeighborsClassifier(n_neighbors=3, weights="distance")

# Train the classifier on the training set
knn.fit(X_train, y_train)

# Predict the classes for the test set
y_pred = knn.predict(X_test)

# Calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


In this code, we first import the sklearn modules needed to load the dataset, split it into training and testing sets, create the KNN classifier, and calculate the accuracy. We then load the Iris dataset with load_iris and assign the input features to X and the target variable to y.

Next, we split the dataset into training and testing sets using the train_test_split function, where 80% of the data is used for training and 20% for testing. We create a KNN classifier object with n_neighbors=3 and set the weights parameter to "distance", indicating that the weights for each neighbor will be inversely proportional to their distance.

After that, we train the KNN classifier on the training set using the fit method. We then use the trained classifier to predict the classes for the test set with the predict method. Finally, we calculate the accuracy of the classifier by comparing the predicted classes with the actual classes and printing the result.

Note that you can adjust the n_neighbors parameter and experiment with different values to see how it affects the accuracy.
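Distance weighting can change the outcome compared with a plain majority vote. This sketch, with the hypothetical helpers weighted_vote and majority_vote, sums 1/distance per class for three neighbors: two far neighbors of class 1 lose to one very close neighbor of class 0. How exact matches (distance 0) are handled here is a simplifying assumption of the sketch, not a description of scikit-learn's internals.

```python
from collections import Counter, defaultdict


def weighted_vote(distances, labels):
    """Hypothetical helper: label with the largest total inverse-distance weight."""
    if any(d == 0 for d in distances):
        # Assumption for this sketch: exact matches outvote everyone else,
        # which also avoids dividing by zero
        pairs = [(1.0, label) for d, label in zip(distances, labels) if d == 0]
    else:
        pairs = [(1.0 / d, label) for d, label in zip(distances, labels)]
    scores = defaultdict(float)
    for weight, label in pairs:
        scores[label] += weight
    return max(scores, key=scores.get)


def majority_vote(labels):
    """Hypothetical helper: unweighted majority vote."""
    return Counter(labels).most_common(1)[0][0]


distances = [0.2, 1.0, 1.25]
labels = [0, 1, 1]
print("Majority vote:", majority_vote(labels))  # → 1
print("Distance-weighted vote:", weighted_vote(distances, labels))  # → 0
```

Class 0 collects a weight of 1 / 0.2 = 5.0, while class 1 collects only 1 / 1.0 + 1 / 1.25 = 1.8.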



In [None]:
##Q6.

To standardize the features before applying the K-Nearest Neighbors (KNN) classifier, you can use the StandardScaler class from sklearn.preprocessing. Here's an example of a Python function that standardizes the features:


from sklearn.preprocessing import StandardScaler

def standardize_features(X_train, X_test):
    """
    Standardizes the features using the StandardScaler.

    Parameters:
        X_train (numpy.ndarray): Training features.
        X_test (numpy.ndarray): Test features.

    Returns:
        numpy.ndarray: Standardized training features.
        numpy.ndarray: Standardized test features.
    """
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    return X_train_scaled, X_test_scaled

You can use this function together with the KNN classifier to standardize the features before training and testing. Here's how you can apply it to the Iris dataset loaded with load_iris:


from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Input features
y = iris.target  # Target variable

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
X_train_scaled, X_test_scaled = standardize_features(X_train, X_test)

# Create a KNN classifier
knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier on the standardized training set
knn.fit(X_train_scaled, y_train)

# Predict the classes for the standardized test set
y_pred = knn.predict(X_test_scaled)

# Calculate the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

In this code, the standardize_features function takes the training features X_train and the test features X_test as input. It creates a StandardScaler object, fits it on the training features and transforms them in one step with fit_transform, and then applies the same transformation (using the training mean and standard deviation) to the test features with transform. Fitting only on the training data keeps information from the test set out of the scaling. The function returns the standardized training and test features.

In the main code, after loading the Iris dataset and splitting it into training and testing sets, the standardize_features function is called to obtain the standardized features. The KNN classifier is then created and trained on the standardized training set. Finally, the classifier is used to predict the classes for the standardized test set, and the accuracy is calculated and printed.

Standardizing the features before applying the KNN classifier can be beneficial as it ensures that all features are on a similar scale, which can improve the performance of the algorithm and prevent any features from dominating the distance calculations.
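An alternative to a hand-written helper is scikit-learn's Pipeline, which chains the scaler and the classifier into a single estimator. In this sketch built with make_pipeline, calling fit learns the scaling from the training data only, and when the pipeline is passed to cross_val_score the scaler is refit inside each training fold, so no statistics from the held-out fold leak into the scaling.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaler and classifier combined into one estimator
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
pipe.fit(X_train, y_train)
test_accuracy = pipe.score(X_test, y_test)
print("Test accuracy:", test_accuracy)

# The scaler is refit on the training folds of each split
cv_accuracy = cross_val_score(pipe, X, y, cv=5).mean()
print("5-fold CV accuracy:", cv_accuracy)
```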


In [None]:
##Q7.

Here's a Python function that calculates the Euclidean distance between two points:

import math

def euclidean_distance(point1, point2):
    """
    Calculates the Euclidean distance between two points.

    Parameters:
        point1 (list or tuple): Coordinates of the first point.
        point2 (list or tuple): Coordinates of the second point.

    Returns:
        float: Euclidean distance between the two points.
    """
    # Ensure both points have the same dimensionality
    assert len(point1) == len(point2), "Points must have the same dimensionality"

    # Calculate the sum of squared differences for each dimension
    squared_diff = sum([(p1 - p2) ** 2 for p1, p2 in zip(point1, point2)])

    # Take the square root of the sum of squared differences
    distance = math.sqrt(squared_diff)

    return distance

In this function, we first import the math module to use the sqrt function for taking the square root. The euclidean_distance function takes two points, point1 and point2, as input.

To ensure that both points have the same dimensionality, we use an assertion statement to check if the lengths of point1 and point2 are equal. If they are not, an assertion error will be raised.

Next, we compute the squared difference between each pair of corresponding coordinates with a list comprehension, using zip to iterate over the coordinates of point1 and point2 in parallel, and sum these squared differences across all dimensions.

Finally, we take the square root of the sum of squared differences using the math.sqrt function, and return the resulting Euclidean distance.

You can use this euclidean_distance function to calculate the Euclidean distance between any two points in your code.
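When distances to many points are needed (as in KNN, where a query is compared with every training point), a vectorized NumPy version avoids the Python-level loop. This is a sketch with the hypothetical helper euclidean_distances_to; for a single pair of points, the standard library's math.dist (Python 3.8+) computes the same Euclidean distance.

```python
import math

import numpy as np


def euclidean_distances_to(points, query):
    """Hypothetical helper: Euclidean distance from each row of points to query."""
    # Broadcasting subtracts query from every row at once
    diffs = np.asarray(points, dtype=float) - np.asarray(query, dtype=float)
    return np.sqrt((diffs ** 2).sum(axis=1))


points = [[0, 0], [3, 4], [6, 8]]
distances = euclidean_distances_to(points, [0, 0])
print(distances)
print(math.dist([0, 0], [3, 4]))  # → 5.0
```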


In [None]:
##Q8.

Here's a Python function that calculates the Manhattan distance between two points:

def manhattan_distance(point1, point2):
    """
    Calculates the Manhattan distance between two points.

    Parameters:
        point1 (list or tuple): Coordinates of the first point.
        point2 (list or tuple): Coordinates of the second point.

    Returns:
        float: Manhattan distance between the two points (an int if all coordinates are ints).
    """
    # Ensure both points have the same dimensionality
    assert len(point1) == len(point2), "Points must have the same dimensionality"

    # Calculate the sum of absolute differences for each dimension
    abs_diff = sum([abs(p1 - p2) for p1, p2 in zip(point1, point2)])

    return abs_diff
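Manhattan and Euclidean distance are both special cases of the Minkowski distance, (sum of |x_i - y_i|^p)^(1/p): p = 1 gives Manhattan and p = 2 gives Euclidean. This is why KNeighborsClassifier, whose default metric is "minkowski" with p=2, switches to Manhattan distance when p=1. A sketch of the general formula (minkowski_distance is a hypothetical helper):

```python
def minkowski_distance(point1, point2, p=2):
    """Hypothetical helper: Minkowski distance (sum |a - b|**p) ** (1/p), for p >= 1."""
    if len(point1) != len(point2):
        raise ValueError("Points must have the same dimensionality")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(point1, point2)) ** (1.0 / p)


print(minkowski_distance((0, 0), (3, 4), p=1))  # → 7.0 (Manhattan)
print(minkowski_distance((0, 0), (3, 4), p=2))  # → 5.0 (Euclidean)
```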