
**k-NN**

The k-NN (k-Nearest Neighbors) algorithm is a classification and regression method based on the proximity between data points. To make a prediction, it finds the k training examples closest to the query point and decides based on their labels: a majority vote for classification, or an average of the neighbors' target values for regression.
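The idea can be illustrated on a tiny made-up dataset. The sketch below is only an illustration: the helper name `knn_neighbors`, the toy arrays, and the query point are assumptions and do not come from the Abalone data.

```python
import numpy as np

def knn_neighbors(X_train, x_query, k):
    """Return the indices of the k training points closest to x_query."""
    distances = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    return np.argsort(distances)[:k]

# Toy data (made up for illustration): one feature, two well-separated groups.
X_toy = np.array([[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]])
y_toy = np.array([0, 0, 0, 1, 1, 1])                       # class labels
targets = np.array([10.0, 12.0, 8.0, 50.0, 52.0, 48.0])    # numeric targets

query = np.array([1.1])
idx = knn_neighbors(X_toy, query, k=3)

# Classification: majority vote among the neighbors' labels.
predicted_class = np.bincount(y_toy[idx]).argmax()

# Regression: average of the neighbors' target values.
predicted_value = targets[idx].mean()

print(predicted_class, predicted_value)
```

The three nearest neighbors of the query all belong to the first group, so the vote and the average both reflect that group only.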

**Pseudocode**

Define a list called 'column_names' containing the names of the columns in the dataset.

Load the Abalone dataset from the CSV file 'abalone.data' into the variable 'data' using the pandas library, passing 'column_names' as the column headers.

Map the categories 'M', 'F' and 'I' in the 'Sex' column to the numerical values 0, 1 and 2.

Separate the dataset into features (X, every column except 'Rings') and labels (y, the 'Rings' column).

Define a function called 'euclidean_distance' to calculate the Euclidean distance between two points.

Define a function called 'knn_predict' that predicts the class of a point by majority vote among its k nearest training points.

Split the dataset into a training set (X_train, y_train) and a test set (X_test, y_test) by taking the first 80% of the rows for training and the remaining 20% for testing.

Define 'k_value', the number of nearest neighbors that k-NN consults (10 in this notebook).

Initialize a counter 'correct_predictions' to keep track of correct predictions.

Store the number of points in the test set in 'total_predictions'.

For each point in the test set:

- Use the 'knn_predict' function to predict the class.

- Compare the prediction with the actual label in the test set.

- If the prediction is correct, increment the counter 'correct_predictions'.

Calculate the accuracy of the k-NN model by dividing 'correct_predictions' by the total number of predictions in the test set.

Print the class prediction for the last point in the test set.

Print the accuracy of the k-NN model to two decimal places.
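Two of the steps above can be checked by hand before running them on the real data. The following sketch uses made-up points and made-up label arrays (`y_true`, `y_pred` are hypothetical names) to show what the distance function and the accuracy calculation return.

```python
import numpy as np

def euclidean_distance(point1, point2):
    # Square root of the sum of squared coordinate differences.
    return np.sqrt(np.sum((point1 - point2) ** 2))

# A 3-4-5 right triangle: the distance from (0, 0) to (3, 4) is 5.
d = euclidean_distance(np.array([0.0, 0.0]), np.array([3.0, 4.0]))
print(d)  # 5.0

# Accuracy is the fraction of predictions that match the true labels.
# These labels are invented for illustration only.
y_true = np.array([9, 10, 7, 11])
y_pred = np.array([9, 8, 7, 10])
accuracy = np.mean(y_true == y_pred)
print(f'{accuracy:.2f}')  # 0.50
```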

In [None]:
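import pandas as pd
import numpy as np

# Column names for the Abalone dataset (the raw file has no header row).
column_names = ['Sex', 'Length', 'Diameter', 'Height', 'WholeWeight', 'ShuckedWeight', 'VisceraWeight', 'ShellWeight', 'Rings']
data = pd.read_csv('abalone.data', names=column_names)

# Encode the categorical 'Sex' column as integers so it can enter the distance.
data['Sex'] = data['Sex'].map({'M': 0, 'F': 1, 'I': 2})

# Features: every column except 'Rings'. Labels: the integer ring count.
X = data.drop('Rings', axis=1).values
y = data['Rings'].values


def euclidean_distance(point1, point2):
    """Euclidean distance between two feature vectors."""
    return np.sqrt(np.sum((point1 - point2) ** 2))


def knn_predict(X_train, y_train, x_test, k):
    """Predict the label of x_test by majority vote among its k nearest training points."""
    distances = [euclidean_distance(x, x_test) for x in X_train]
    # Indices of the k smallest distances.
    k_indices = np.argsort(distances)[:k]
    k_nearest_labels = [y_train[i] for i in k_indices]
    # bincount counts each non-negative integer label; argmax picks the most frequent
    # (the smallest label wins a tie).
    most_common = np.bincount(k_nearest_labels).argmax()
    return most_common


# Ordered 80/20 split: the first 80% of rows train, the rest test (no shuffling).
split_ratio = 0.8
split_index = int(len(X) * split_ratio)
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

# Number of neighbors and counters for the evaluation loop.
k_value = 10
correct_predictions = 0
total_predictions = len(X_test)

for i in range(total_predictions):
    predicted_class = knn_predict(X_train, y_train, X_test[i], k_value)
    if predicted_class == y_test[i]:
        correct_predictions += 1

# Fraction of test points whose predicted ring count matches exactly.
accuracy = correct_predictions / total_predictions

# After the loop, predicted_class holds the prediction for the last test point.
print(f'Class prediction using k-NN: {predicted_class}')
print(f'k-NN model accuracy: {accuracy:.2f}')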


Class prediction using k-NN: 11
k-NN model accuracy: 0.29
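The prediction function above computes one distance per training point in a Python loop. A possible variant, shown here only as a sketch, lets NumPy broadcasting compute all distances in one call. The name `knn_predict_vectorized` and the toy arrays are assumptions introduced for illustration; the function is intended to return the same vote as `knn_predict` when `y_train` is an array of non-negative integers.

```python
import numpy as np

def knn_predict_vectorized(X_train, y_train, x_test, k):
    """Majority vote among the k nearest training points, without a Python loop."""
    # Broadcasting subtracts x_test from every row of X_train at once.
    distances = np.sqrt(np.sum((X_train - x_test) ** 2, axis=1))
    k_indices = np.argsort(distances)[:k]
    # Fancy indexing gathers the neighbors' labels; bincount/argmax takes the vote.
    return np.bincount(y_train[k_indices]).argmax()

# Toy data (made up): two clusters labeled 3 and 7.
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [0.2, 0.1]])
y_toy = np.array([3, 3, 7, 7, 3])

prediction = knn_predict_vectorized(X_toy, y_toy, np.array([0.95, 1.0]), k=3)
print(prediction)
```

With k=3 the query's neighbors are the two points labeled 7 and one point labeled 3, so the vote goes to 7.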


**Loss Function and Optimization Function**

Loss and optimization functions are fundamental in machine learning, but this k-NN classification code uses neither. k-NN is a "lazy learning" algorithm: it has no training phase in which parameters are optimized. Instead, it stores the training data and defers all computation to prediction time, when it looks up the nearest neighbors. Loss and optimization functions belong to algorithms that adjust parameters to minimize an error measure, whereas k-NN relies on similarity and performs no such explicit optimization.
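To make the "lazy" aspect concrete, here is a minimal sketch of k-NN wrapped in a class. The class `LazyKNN` and its method names are hypothetical, chosen for illustration; the point is that `fit` only stores the data, with no loss evaluated and no parameters updated, while all computation happens when a prediction is requested.

```python
import numpy as np

class LazyKNN:
    """Hypothetical minimal k-NN classifier illustrating lazy learning."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        # "Training" only memorizes the data: no loss, no gradient, no optimizer.
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict_one(self, x):
        # All the work is deferred to prediction time.
        distances = np.sqrt(np.sum((self.X_train - x) ** 2, axis=1))
        nearest = np.argsort(distances)[:self.k]
        return np.bincount(self.y_train[nearest]).argmax()

# Toy data (made up): one feature, integer class labels.
model = LazyKNN(k=3).fit([[0.0], [0.1], [0.2], [1.0], [1.1]], [0, 0, 0, 1, 1])
print(model.predict_one(np.array([0.15])))
```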

