## K-Nearest Neighbors

The k-nearest neighbors algorithm, also known as KNN or k-NN, is a non-parametric, supervised learning method that uses proximity to classify an individual data point or to predict a value for it.

## Intuition behind the KNN algorithm

The K-Nearest Neighbors (KNN) algorithm is based on the idea that similar objects tend to be close to each other in a feature space. To classify a new data point or predict a value for it, KNN finds the K training points nearest to that point and decides based on the majority class (in classification) or the average of the values (in regression) of those neighbors. The choice of distance metric and the value of K both affect the result. KNN is effective for problems where local information is relevant, but it can be computationally expensive on large datasets because every prediction compares the query against the training data.
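
To make the two decision rules concrete, here is a minimal sketch on a made-up one-dimensional training set. The data values and the helper name `k_nearest_indices` are illustrative assumptions, and Euclidean distance is assumed as the metric. It shows the same neighbors feeding a majority vote (classification) and a mean (regression), and how changing K can change the answer.

```python
import numpy as np
from collections import Counter

# Hypothetical toy training set: one feature, a class label, and a numeric target
X_train = np.array([[1.0], [2.0], [3.0], [6.0], [7.0]])
labels = ["a", "a", "b", "b", "b"]
targets = np.array([10.0, 20.0, 30.0, 60.0, 70.0])

def k_nearest_indices(query, X, k):
    """Return indices of the k training points closest to query (Euclidean distance)."""
    distances = np.linalg.norm(X - query, axis=1)
    return np.argsort(distances, kind="stable")[:k]

query = np.array([2.6])
results = {}
for k in (1, 3):
    idx = k_nearest_indices(query, X_train, k)
    neighbor_labels = [labels[i] for i in idx]
    predicted_class = Counter(neighbor_labels).most_common(1)[0][0]  # majority vote
    predicted_value = float(targets[idx].mean())                     # average of neighbor targets
    results[k] = (predicted_class, predicted_value)
    print(f"K={k}: class={predicted_class}, value={predicted_value:.1f}")
```

With K=1 the single nearest point (at 3.0) decides, giving class `b` and value 30.0. With K=3 the two `a` points outvote it, giving class `a` and value 20.0.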

## Algorithm pseudocode


    Function KNN(TrainingSet, QueryPoint, K):
        # Create a list to store distances and labels of nearby points
        DistanceList = []

        # Calculate the distance between QueryPoint and each point in TrainingSet
        For each point in TrainingSet:
            Distance = CalculateDistance(QueryPoint, point)
            DistanceList.append((Distance, point.label))  # Store the distance and label/class of the point

        # Sort the distance list in ascending order
        DistanceList.sort_by_distance()

        # Take the first K points from the sorted list
        NearestNeighbors = DistanceList[:K]

        # Count the classes/labels of the K nearest neighbors
        ClassCounter = CountClasses(NearestNeighbors)

        # Return the most common class/label among the K nearest neighbors
        PredictedClass = MostCommonClass(ClassCounter)

        Return PredictedClass
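
The pseudocode above can be sketched in plain Python as follows. This is one possible translation, not a reference implementation: the function name `knn_classify`, the `(features, label)` tuple layout of the training set, and the use of Euclidean distance via `math.dist` are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_classify(training_set, query_point, k):
    """Classify query_point by majority vote among its k nearest training points.

    training_set is assumed to be a list of (features, label) pairs.
    """
    # Distance from the query to every training point, paired with that point's label
    distance_list = [
        (math.dist(query_point, features), label)
        for features, label in training_set
    ]

    # Sort ascending by distance and keep the k closest entries
    distance_list.sort(key=lambda pair: pair[0])
    nearest_neighbors = distance_list[:k]

    # Count the labels of the k nearest neighbors and return the most common one
    class_counter = Counter(label for _, label in nearest_neighbors)
    return class_counter.most_common(1)[0][0]

# Hypothetical toy data: two "red" points near the origin, three "blue" points farther away
training_set = [
    ((1.0, 1.0), "red"),
    ((1.5, 2.0), "red"),
    ((5.0, 5.0), "blue"),
    ((6.0, 5.5), "blue"),
    ((5.5, 6.5), "blue"),
]

prediction = knn_classify(training_set, (2.0, 2.0), k=3)
print(prediction)
```

For the query `(2.0, 2.0)` the three nearest points are the two red ones and one blue one, so the vote returns `red`. When labels tie, `Counter.most_common` returns the label encountered first, which here is the one belonging to the closer neighbor.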

## Algorithm implementation

    # Import the necessary libraries
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Load the Iris dataset as an example
    iris = load_iris()
    X = iris.data  # Features
    y = iris.target  # Labels/classes

    # Split the dataset into training and test sets (70% / 30%)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    # Create a KNN classifier with a K value of 3
    knn_classifier = KNeighborsClassifier(n_neighbors=3)

    # Fit the classifier on the training data
    knn_classifier.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred = knn_classifier.predict(X_test)

    # Calculate the accuracy of the classifier
    accuracy = accuracy_score(y_test, y_pred)
    print("KNN Accuracy:", accuracy)

Output:

    KNN Accuracy: 1.0


The Iris dataset serves as the example. The code imports the necessary libraries, splits the data into training (70%) and test (30%) sets, creates a KNN classifier with K equal to 3, fits it on the training data, and predicts labels for the test set. Finally, it computes and prints the classifier's accuracy on the test set, which is 1.0 for this particular split.
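
Because the value of K matters and a single train/test split gives only one accuracy estimate, one common way to compare K values is cross-validation. The following is a minimal sketch under assumptions: the candidate list `candidate_ks` and the choice of 5 folds are arbitrary illustrations, not recommendations from the original example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate K values: an arbitrary choice for illustration
candidate_ks = [1, 3, 5, 7, 9, 11]

mean_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[k] = scores.mean()

# Pick the candidate with the highest mean cross-validated accuracy
best_k = max(mean_scores, key=mean_scores.get)

for k, score in mean_scores.items():
    print(f"K={k}: mean accuracy {score:.3f}")
print("Best K among candidates:", best_k)
```

The exact scores depend on the scikit-learn version and the fold assignment, so no expected output is shown here.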

## Loss function and optimization function identification


The K-Nearest Neighbors (KNN) algorithm has neither a loss function nor an optimization function. It is a supervised learning algorithm that finds the K nearest neighbors of a query point in the feature space and decides based on the majority class among those neighbors (in classification) or the average of their values (in regression).

Unlike other supervised learning algorithms such as Support Vector Machines (SVM), neural networks, or linear regression, KNN does not optimize a loss function during training. Instead, it stores the entire training dataset and computes distances at prediction time, deciding from the proximity of the nearest neighbors to the query point.

Therefore, KNN has no model parameters to fit through an optimization procedure, unlike algorithms that minimize a loss function to find the best model coefficients. Its hyperparameters, K and the distance metric, are chosen by the user, typically by comparing validation performance. KNN is a "lazy learning" algorithm: it does not train a model in the traditional sense but stores the training data and defers the computation to prediction time.
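
The lazy-learning behavior can be observed with scikit-learn's `KNeighborsClassifier`, whose `kneighbors` method returns the distances to, and indices of, the stored training points nearest to a query. The sketch below uses a made-up toy dataset (an assumption for illustration) to show that a prediction is a vote over training samples looked up at query time.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two well-separated groups of three points each
X_train = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],   # class 0
    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0],   # class 1
])
y_train = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)
# fit() minimizes no loss; it stores the data (possibly in a search index)
knn.fit(X_train, y_train)

query = np.array([[4.5, 5.0]])

# Look up the 3 stored training points nearest to the query
distances, indices = knn.kneighbors(query)
neighbor_labels = y_train[indices[0]]

# The prediction is the majority label among those stored neighbors
prediction = knn.predict(query)[0]

print("Neighbor indices:", indices[0])
print("Neighbor labels:", neighbor_labels)
print("Prediction:", prediction)
```

Here the three nearest stored points are the class 1 samples at indices 3, 4, and 5, so the predicted label is 1.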