In [None]:
Q1. Write a Python code to implement the KNN classifier algorithm on load_iris dataset in
sklearn.datasets.

In [None]:
The example below implements a K-Nearest Neighbors (KNN) classifier on the load_iris dataset from sklearn.datasets, using scikit-learn's KNeighborsClassifier:

In [1]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features (sepal length, sepal width, petal length, petal width)
y = iris.target  # Target variable (species)

# Split the dataset into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the KNN classifier
k = 3  # Number of neighbors to consider (you can adjust this)
knn_classifier = KNeighborsClassifier(n_neighbors=k)

# Fit the classifier on the training data
knn_classifier.fit(X_train, y_train)

# Make predictions on the test data
y_pred = knn_classifier.predict(X_test)

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of KNN Classifier with k={k}: {accuracy*100:.2f}%")


Accuracy of KNN Classifier with k=3: 100.00%


In [None]:
In this code:

We first import the necessary libraries, including scikit-learn.
We load the Iris dataset using load_iris(), where X contains the feature data (sepal length, sepal width, petal length, petal width) and y contains the target variable (species).
We split the dataset into a training set and a test set using train_test_split to evaluate the model's performance.
We initialize the KNN classifier with the desired number of neighbors (k).
The classifier is trained on the training data using fit.
We make predictions on the test data using predict.
Finally, we calculate the accuracy of the KNN classifier on the test data using accuracy_score and print the result.
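You can adjust the value of k and other hyperparameters to experiment with different settings and evaluate their impact on the classifier's performance. Note that the 100% accuracy above is measured on a single 45-sample test split, so it may not hold for other splits.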
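
In [None]:
To make the steps above more concrete, here is a minimal from-scratch sketch of the voting rule that KNeighborsClassifier applies. The helper name knn_predict is hypothetical (it is not part of scikit-learn), and tie-breaking may differ from the library's, so treat it as an illustration rather than a replacement:

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_query, k=3):
    """Hypothetical helper: majority vote among the k nearest training points."""
    predictions = []
    for x in X_query:
        # Euclidean distance from this query point to every training point
        distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
        # Indices of the k smallest distances
        nearest = np.argsort(distances)[:k]
        # Most common label among those k neighbors
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

manual_pred = knn_predict(X_train, y_train, X_test, k=3)
sklearn_pred = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_test)

manual_accuracy = float(np.mean(manual_pred == y_test))
agreement = float(np.mean(manual_pred == sklearn_pred))
print(f"Manual KNN accuracy: {manual_accuracy:.2f}")
print(f"Agreement with scikit-learn: {agreement:.2f}")
```

The loop is written for readability, not speed; scikit-learn uses tree-based or vectorized neighbor searches instead.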

In [None]:
Q2. Write a Python code to implement the KNN regressor algorithm on load_boston dataset in
sklearn.datasets.

In [None]:
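The example below implements a K-Nearest Neighbors (KNN) regressor on the load_boston dataset from sklearn.datasets. Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so this cell runs only on older releases (and emits the warning shown in its output):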

In [2]:
# Import necessary libraries
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Load the Boston Housing dataset
boston = load_boston()
X = boston.data  # Features (e.g., crime rate, number of rooms, etc.)
y = boston.target  # Target variable (median house price)

# Split the dataset into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the KNN regressor
k = 5  # Number of neighbors to consider (you can adjust this)
knn_regressor = KNeighborsRegressor(n_neighbors=k)

# Fit the regressor on the training data
knn_regressor.fit(X_train, y_train)

# Make predictions on the test data
y_pred = knn_regressor.predict(X_test)

# Calculate the mean squared error (MSE) to evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error of KNN Regressor with k={k}: {mse:.2f}")


Mean Squared Error of KNN Regressor with k=5: 30.95



    The Boston housing prices dataset has an ethical problem. You can refer to
    the documentation of this function for further details.

    The scikit-learn maintainers therefore strongly discourage the use of this
    dataset unless the purpose of the code is to study and educate about
    ethical issues in data science and machine learning.

    In this special case, you can fetch the dataset from the original
    source::

        import pandas as pd
        import numpy as np

        data_url = "http://lib.stat.cmu.edu/datasets/boston"
        raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
        data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
        target = raw_df.values[1::2, 2]

    Alternative datasets include the California housing dataset (i.e.
    :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
    dataset. You can load the datasets as follows::

        from sklearn.datasets import fetch_california_ho

In [None]:
In this code:

We import the necessary libraries, including scikit-learn.
We load the Boston Housing dataset using load_boston(), where X contains the feature data (e.g., crime rate, number of rooms) and y contains the target variable (median house price).
We split the dataset into a training set and a test set using train_test_split to evaluate the model's performance.
We initialize the KNN regressor with the desired number of neighbors (k).
The regressor is trained on the training data using fit.
We make predictions on the test data using predict.
Finally, we calculate the mean squared error (MSE) as a measure of the KNN regressor's performance on the test data and print the result.
You can adjust the value of k and other hyperparameters to experiment with different settings and evaluate their impact on the regressor's performance.
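
In [None]:
For environments where load_boston cannot be imported, the same workflow can be tried on another dataset. The sketch below swaps in load_diabetes, which ships with scikit-learn; this is a substitution for illustration, not the dataset the question asks for, and its MSE is not comparable to the Boston result above:

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Assumption: load_diabetes stands in for load_boston (different features and target)
X, y = load_diabetes(return_X_y=True)

# Same split settings as the cell above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit a KNN regressor: each prediction is the mean target of the k nearest neighbors
k = 5
knn_regressor = KNeighborsRegressor(n_neighbors=k)
knn_regressor.fit(X_train, y_train)
y_pred = knn_regressor.predict(X_test)

# MSE is in squared target units; its square root (RMSE) is in the target's own units
mse = mean_squared_error(y_test, y_pred)
rmse = mse ** 0.5
print(f"MSE with k={k}: {mse:.2f}")
print(f"RMSE with k={k}: {rmse:.2f}")
```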

In [None]:
Q3. Write a Python code snippet to find the optimal value of K for the KNN classifier algorithm using
cross-validation on load_iris dataset in sklearn.datasets.

In [None]:
To find the optimal value of K for the K-Nearest Neighbors (KNN) classifier algorithm using cross-validation on the load_iris dataset from sklearn.datasets, you can perform a grid search over different K values and use cross-validation to evaluate their performance. Here's a Python code snippet to do that:

In [3]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features (sepal length, sepal width, petal length, petal width)
y = iris.target  # Target variable (species)

# Split the dataset into a training set and a test set (for final evaluation)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define a range of K values to explore
k_values = list(range(1, 21))  # You can adjust the range as needed

# Initialize variables to store results
best_k = None
best_accuracy = 0.0

# Perform cross-validation to find the optimal K
for k in k_values:
    knn_classifier = KNeighborsClassifier(n_neighbors=k)
    # Perform 5-fold cross-validation and compute mean accuracy
    scores = cross_val_score(knn_classifier, X_train, y_train, cv=5)
    mean_accuracy = np.mean(scores)
    if mean_accuracy > best_accuracy:
        best_k = k
        best_accuracy = mean_accuracy

# Train the KNN classifier with the best K on the full training set
best_knn_classifier = KNeighborsClassifier(n_neighbors=best_k)
best_knn_classifier.fit(X_train, y_train)

# Evaluate the model on the test set
test_accuracy = best_knn_classifier.score(X_test, y_test)

print(f"Best K: {best_k}")
print(f"Cross-Validation Accuracy with Best K: {best_accuracy:.2f}")
print(f"Test Accuracy with Best K: {test_accuracy:.2f}")


Best K: 1
Cross-Validation Accuracy with Best K: 0.95
Test Accuracy with Best K: 1.00


In [None]:
In this code:

We import the necessary libraries, load the Iris dataset, and split it into a training set and a test set.
We define a range of K values to explore (in this example, from 1 to 20).
We perform cross-validation with 5 folds for each K value and compute the mean accuracy across the folds.
We keep track of the K value that yields the highest mean cross-validation accuracy; because the comparison is a strict >, ties are resolved in favor of the smallest K.
Finally, we train the KNN classifier with the best K on the full training set and evaluate its performance on the test set.
This code will help you find the optimal K value for your KNN classifier using cross-validation on the load_iris dataset.
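
In [None]:
The manual loop above can also be expressed with scikit-learn's GridSearchCV, which runs the same kind of cross-validated search and refits the best model on the full training set. This is a sketch of an alternative, not the approach used in the cell above, and it may select a different K when several values tie:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Candidate K values, matching the range explored in the manual loop
param_grid = {"n_neighbors": list(range(1, 21))}

# 5-fold cross-validation on the training set only; the test set stays untouched
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["n_neighbors"]
best_cv_accuracy = search.best_score_

# With the default refit=True, score() uses the model refit with the best K
test_accuracy = search.score(X_test, y_test)

print(f"Best K: {best_k}")
print(f"Cross-Validation Accuracy with Best K: {best_cv_accuracy:.2f}")
print(f"Test Accuracy with Best K: {test_accuracy:.2f}")
```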

In [None]:
Q4. Implement the KNN regressor algorithm with feature scaling on load_boston dataset in
sklearn.datasets.

In [None]:
To implement the K-Nearest Neighbors (KNN) regressor algorithm with feature scaling on the load_boston dataset from sklearn.datasets, you can follow these steps. Feature scaling matters for KNN because it relies on distance metrics: without scaling, features with larger numeric ranges dominate the distance calculations. Here's a Python code example:

In [4]:
# Import necessary libraries
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

# Load the Boston Housing dataset
boston = load_boston()
X = boston.data  # Features (e.g., crime rate, number of rooms, etc.)
y = boston.target  # Target variable (median house price)

# Split the dataset into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the StandardScaler for feature scaling
scaler = StandardScaler()

# Fit the scaler on the training features and transform both the training and test features
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Initialize the KNN regressor
k = 5  # Number of neighbors to consider (you can adjust this)
knn_regressor = KNeighborsRegressor(n_neighbors=k)

# Fit the regressor on the scaled training data
knn_regressor.fit(X_train_scaled, y_train)

# Make predictions on the scaled test data
y_pred = knn_regressor.predict(X_test_scaled)

# Calculate the mean squared error (MSE) to evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error of KNN Regressor with feature scaling and k={k}: {mse:.2f}")


Mean Squared Error of KNN Regressor with feature scaling and k=5: 18.84




In [None]:
In this code:

We import the necessary libraries, including scikit-learn.
We load the Boston Housing dataset and split it into a training set and a test set.
We initialize a StandardScaler to perform feature scaling. We fit the scaler on the training features and transform both the training and test features.
We initialize the KNN regressor with the desired number of neighbors (k).
The regressor is trained on the scaled training data using fit.
We make predictions on the scaled test data using predict.
Finally, we calculate the mean squared error (MSE) as a measure of the KNN regressor's performance on the test data and print the result.
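By fitting StandardScaler on the training set, we give each training feature zero mean and unit variance (the test features are transformed with the same training statistics), so no single large-scale feature dominates the distance computation. In this run the test MSE drops from 30.95 without scaling (Q2) to 18.84 with scaling. Adjust the value of k and other hyperparameters as needed for your specific task.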
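
In [None]:
The effect of scaling can be isolated on synthetic data. The sketch below is an illustration under made-up assumptions (one informative feature on a 0 to 1 scale, one pure-noise feature with a standard deviation of 1000); it also bundles the scaler and regressor with make_pipeline so the scaler is always fit on training data only:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption for illustration): the target depends only on the
# small-scale feature, while the large-scale feature is pure noise
rng = np.random.default_rng(0)
n_samples = 500
informative = rng.uniform(0.0, 1.0, size=n_samples)
noise_feature = rng.normal(0.0, 1000.0, size=n_samples)
X = np.column_stack([informative, noise_feature])
y = 10.0 * informative + rng.normal(0.0, 0.1, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Without scaling, distances are dominated by the large-scale noise feature
raw_model = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
mse_raw = mean_squared_error(y_test, raw_model.predict(X_test))

# With scaling inside a pipeline, both features contribute on a comparable scale
scaled_model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
scaled_model.fit(X_train, y_train)
mse_scaled = mean_squared_error(y_test, scaled_model.predict(X_test))

print(f"MSE without scaling: {mse_raw:.2f}")
print(f"MSE with scaling:    {mse_scaled:.2f}")
```

A pipeline is also convenient when combined with cross-validation, since the scaler is refit inside each fold rather than on the whole dataset.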

In [None]:
Q5. Write a Python code snippet to implement the KNN classifier algorithm with weighted voting on
load_iris dataset in sklearn.datasets.

In [None]:
To implement the K-Nearest Neighbors (KNN) classifier algorithm with weighted voting on the load_iris dataset from sklearn.datasets, you can use the KNeighborsClassifier from scikit-learn and set the weights parameter to 'distance' to enable weighted voting based on the inverse of distances. Here's a Python code snippet:

In [5]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features (sepal length, sepal width, petal length, petal width)
y = iris.target  # Target variable (species)

# Split the dataset into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the KNN classifier with weighted voting (using 'distance' for weights)
k = 3  # Number of neighbors to consider (you can adjust this)
knn_classifier = KNeighborsClassifier(n_neighbors=k, weights='distance')

# Fit the classifier on the training data
knn_classifier.fit(X_train, y_train)

# Make predictions on the test data
y_pred = knn_classifier.predict(X_test)

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of KNN Classifier with weighted voting and k={k}: {accuracy*100:.2f}%")


Accuracy of KNN Classifier with weighted voting and k=3: 100.00%


In [None]:
In this code:

We import the necessary libraries, including scikit-learn.
We load the Iris dataset and split it into a training set and a test set.
We initialize the KNeighborsClassifier with the desired number of neighbors (k) and set the weights parameter to 'distance' to enable weighted voting based on the inverse of distances.
The classifier is trained on the training data using fit.
We make predictions on the test data using predict.
Finally, we calculate and print the accuracy of the KNN classifier with weighted voting on the test data.
By setting the weights parameter to 'distance', closer neighbors will have a greater influence on the predictions, and farther neighbors will have less influence. You can adjust the value of k and other hyperparameters to experiment with different settings as needed for your specific classification task.
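
In [None]:
A tiny hand-checkable case shows how distance weighting can change a prediction. The three training points and the query below are made up for illustration: the single nearest neighbor belongs to class 0, while the two farther neighbors belong to class 1:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 1-D data (assumption for illustration): distances to the query are 0.5, 1.0, 2.0
X_train = np.array([[0.5], [1.0], [2.0]])
y_train = np.array([0, 1, 1])
query = np.array([[0.0]])

uniform_knn = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_train, y_train)
distance_knn = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_train, y_train)

uniform_pred = int(uniform_knn.predict(query)[0])    # one vote per neighbor
distance_pred = int(distance_knn.predict(query)[0])  # votes weighted by 1 / distance

# Reproduce the weighted vote by hand: sum 1 / distance per class
distances = np.abs(X_train[:, 0] - query[0, 0])
weights = 1.0 / distances
class_weights = {
    int(label): float(weights[y_train == label].sum()) for label in np.unique(y_train)
}

print(f"Uniform voting predicts class {uniform_pred}")
print(f"Distance-weighted voting predicts class {distance_pred}")
print(f"Summed inverse-distance weights per class: {class_weights}")
```

Class 0 receives a weight of 1/0.5 = 2.0 and class 1 receives 1/1.0 + 1/2.0 = 1.5, so the weighted vote picks class 0 even though class 1 has more neighbors.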

In [None]:
Q6. Implement a function to standardise the features before applying KNN classifier.

In [None]:
To standardize the features before applying a K-Nearest Neighbors (KNN) classifier, you can create a function that uses the StandardScaler from scikit-learn. Standardization ensures that the features have zero mean and unit variance, which is important for distance-based algorithms like KNN. Here's a Python function to do this:

In [6]:
from sklearn.preprocessing import StandardScaler

def standardize_features(X_train, X_test):
    """
    Standardizes the features in the training and test datasets using StandardScaler.

    Parameters:
    - X_train: The training feature data (numpy array or pandas DataFrame).
    - X_test: The test feature data (numpy array or pandas DataFrame).

    Returns:
    - X_train_scaled: The standardized training features.
    - X_test_scaled: The standardized test features.
    """
    # Initialize the StandardScaler
    scaler = StandardScaler()

    # Fit and transform the scaler on the training features
    X_train_scaled = scaler.fit_transform(X_train)

    # Transform the test features using the same scaler
    X_test_scaled = scaler.transform(X_test)

    return X_train_scaled, X_test_scaled


In [None]:
You can use this function as follows:

In [7]:
# Example usage
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score  # needed for the accuracy computation below

# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features (sepal length, sepal width, petal length, petal width)
y = iris.target  # Target variable (species)

# Split the dataset into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the features
X_train_scaled, X_test_scaled = standardize_features(X_train, X_test)

# Initialize the KNN classifier
k = 3  # Number of neighbors to consider (you can adjust this)
knn_classifier = KNeighborsClassifier(n_neighbors=k)

# Fit the classifier on the standardized training data
knn_classifier.fit(X_train_scaled, y_train)

# Make predictions on the standardized test data
y_pred = knn_classifier.predict(X_test_scaled)

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of KNN Classifier with standardized features and k={k}: {accuracy*100:.2f}%")


Accuracy of KNN Classifier with standardized features and k=3: 100.00%


In [None]:
This function standardizes the training and test features using StandardScaler and returns the standardized datasets, which can then be used for KNN classification.
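
In [None]:
To see what StandardScaler computes, the same transformation can be written with plain NumPy. This is a sketch for checking the arithmetic, assuming no feature is constant (a zero standard deviation would cause a division by zero here):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)

# Statistics come from the training set only
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)  # population standard deviation (ddof=0)

# z = (x - mean) / std, applied to both sets with the training statistics
X_train_manual = (X_train - train_mean) / train_std
X_test_manual = (X_test - train_mean) / train_std

# Compare against StandardScaler fitted on the same training data
scaler = StandardScaler().fit(X_train)
matches = bool(
    np.allclose(X_train_manual, scaler.transform(X_train))
    and np.allclose(X_test_manual, scaler.transform(X_test))
)
print(f"Manual standardization matches StandardScaler: {matches}")
```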






In [None]:
Q7. Write a Python function to calculate the Euclidean distance between two points.

In [None]:
You can create a Python function to calculate the Euclidean distance between two points in n-dimensional space using the following formula:

Euclidean Distance (L2 Norm) between two points P and Q in n-dimensional space:

$$\text{Distance} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + \dots + (z_1 - z_2)^2}$$

Here's a Python function to calculate the Euclidean distance between two points represented as NumPy arrays:

In [8]:
import numpy as np

def euclidean_distance(point1, point2):
    """
    Calculate the Euclidean distance between two points represented as NumPy arrays.

    Parameters:
    - point1: NumPy array representing the coordinates of the first point.
    - point2: NumPy array representing the coordinates of the second point.

    Returns:
    - distance: The Euclidean distance between the two points.
    """
    if len(point1) != len(point2):
        raise ValueError("Points must have the same dimensionality")
    
    # Calculate the squared differences for each dimension
    squared_diff = np.square(point1 - point2)

    # Sum the squared differences and take the square root
    distance = np.sqrt(np.sum(squared_diff))

    return distance


In [None]:
You can use this function to calculate the Euclidean distance between two points by passing their coordinates as NumPy arrays. Here's an example of how to use it:

In [None]:
# Example usage
point1 = np.array([1, 2, 3])
point2 = np.array([4, 5, 6])

distance = euclidean_distance(point1, point2)
print(f"Euclidean Distance: {distance:.2f}")


In [None]:
This code prints Euclidean Distance: 5.20, since each coordinate differs by 3 and sqrt(3^2 + 3^2 + 3^2) = sqrt(27) ≈ 5.196.
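
In [None]:
As a cross-check, the same quantity is available through NumPy's np.linalg.norm, which also extends naturally to computing distances from one point to many points at once. The points array below is made up for illustration:

```python
import numpy as np

point1 = np.array([1, 2, 3])
point2 = np.array([4, 5, 6])

# Explicit formula: square the differences, sum them, take the square root
explicit = np.sqrt(np.sum(np.square(point1 - point2)))

# Same result via the L2 norm of the difference vector
via_norm = np.linalg.norm(point1 - point2)

# Distances from point1 to every row of a 2-D array, using broadcasting
points = np.array([[4, 5, 6], [1, 2, 3], [1, 2, 7]])
row_distances = np.linalg.norm(points - point1, axis=1)

print(f"Explicit formula: {explicit:.2f}")
print(f"np.linalg.norm:   {via_norm:.2f}")
print(f"Distances to each row: {row_distances}")
```

The row-wise form is the operation a KNN implementation repeats for each query point against the training set.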

In [None]:
Q8. Write a Python function to calculate the Manhattan distance between two points.