In [None]:
'''
KNN Algorithm:
- KNN (K-Nearest Neighbors) is a simple, yet powerful, supervised machine learning algorithm used for 
classification and regression tasks. 
- It works by finding the 'k' nearest data points in the training set to a 
given test point and making predictions based on the majority class (for classification) or 
average value (for regression) of those neighbors.
- KNN is non-parametric, meaning it does not assume any underlying distribution of the data, 
and it is sensitive to the choice of 'k' and the distance metric used (e.g., Euclidean distance). 
It is often used in scenarios where the decision boundary is complex and not easily separable by linear models.
- KNN is particularly effective for small to medium-sized datasets, 
but prediction can become computationally expensive for large datasets, because a brute-force search 
computes the distance from each query point to every training point. 
- It is also sensitive to irrelevant features and the scale of the data, 
so feature scaling (e.g., normalization or standardization) is often recommended before applying KNN.
- KNN is widely used in various applications, including image recognition, 
recommendation systems, and anomaly detection.

'''
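
The majority-vote idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not how scikit-learn implements it: the helper `knn_predict` and the toy arrays `X_toy` and `y_toy` are hypothetical names and made-up values introduced only for this example.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common class label among those neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Toy data (made up for illustration): two well-separated clusters
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([0.5, 0.5]), k=3))  # 0
print(knn_predict(X_toy, y_toy, np.array([5.5, 5.5]), k=3))  # 1
```

For regression, the same neighbor search would be followed by averaging the neighbors' target values instead of counting votes.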

In [None]:
'''
Distance Metrics:
- Distance metrics are mathematical functions used to quantify the similarity or dissimilarity between two data points in a feature space.
- Common distance metrics include:
1. Euclidean Distance: The straight-line distance between two points in Euclidean space, 
    calculated as the square root of the sum of squared differences between corresponding coordinates.
    Formula (two dimensions): d((x1, y1), (x2, y2)) = sqrt((x2-x1)^2 + (y2-y1)^2)
2. Manhattan Distance: Also known as L1 distance or taxicab distance, 
    it is the sum of the absolute differences between corresponding coordinates.
    Formula (two dimensions): d((x1, y1), (x2, y2)) = |x2-x1| + |y2-y1|

These are used in classification algorithms like KNN to determine the closest neighbors based on their feature values.
'''
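
As a quick check of the two formulas, the sketch below computes both distances for one pair of points. The function names and the sample points are assumptions chosen for illustration; the functions accept points with any number of coordinates, not just two.

```python
import numpy as np

def euclidean_distance(p, q):
    """Square root of the sum of squared coordinate differences."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((p - q) ** 2)))

def manhattan_distance(p, q):
    """Sum of absolute coordinate differences (L1 distance)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(np.abs(p - q)))

# Sample points (made up): coordinate differences are 3 and 4
p, q = (1, 2), (4, 6)
print(euclidean_distance(p, q))  # 5.0
print(manhattan_distance(p, q))  # 7.0
```

The Manhattan distance is never smaller than the Euclidean distance for the same pair of points, so the two metrics can rank neighbors differently.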

In [None]:
'''
Optimization Algorithms of KNN:
- KNN does not have traditional optimization algorithms like gradient descent, 
  as it is a non-parametric algorithm that does not learn a model in the same way as parametric models.
- However, there are techniques to optimize KNN's performance:
1. Feature Selection: Identifying and selecting the most relevant features can improve KNN's performance 
    by reducing noise and computational complexity.
2. Hyperparameter Tuning: Adjusting the value of 'k' (the number of neighbors) can significantly impact KNN's performance.
3. Distance Metric Selection: Choosing the appropriate distance metric (e.g., Euclidean, Manhattan) 
    based on the data characteristics can enhance KNN's accuracy.
4. Data Preprocessing: Normalizing or standardizing the data can help KNN perform better,
    especially when features have different scales.
5. Dimensionality Reduction: Techniques like PCA (Principal Component Analysis)
    can be used to reduce the dimensionality of the data, making KNN more efficient and effective.
6. KD-Trees or Ball Trees: These are data structures that can be used to speed up the nearest neighbor search process
    by organizing the data points so that large parts of the dataset can be skipped without computing their distances.
    KD-Trees work best on low-dimensional data; Ball Trees tend to hold up better as the number of dimensions grows.
- These techniques help improve KNN's efficiency and accuracy, especially in large datasets or high-dimensional spaces.
'''
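
Points 2 to 4 above (tuning 'k', choosing a metric, scaling features) can be combined in one search. The sketch below is one possible setup, assuming a scikit-learn `Pipeline` with `StandardScaler` and a small, arbitrarily chosen parameter grid; the variable names and grid values are illustrative, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, smaller than the dataset used later in this notebook
X_demo, y_demo = make_classification(
    n_samples=300, n_features=3, n_classes=2, n_redundant=1, random_state=999
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42
)

# Scaling lives inside the pipeline so it is re-fit on each CV training fold
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Illustrative grid: candidate values for k and the distance metric
param_grid = {
    "knn__n_neighbors": [3, 5, 7, 9],
    "knn__metric": ["euclidean", "manhattan"],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Held-out accuracy:", search.score(X_te, y_te))
```

Keeping the scaler inside the pipeline avoids leaking statistics from the validation folds into the training folds.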

In [None]:
'''
Focus on KD-Trees and Ball Trees:
- KD-Trees (k-dimensional trees) and Ball Trees are data structures used to organize points in a k-dimensional space
    to enable efficient nearest neighbor searches.
- KD-Trees:
    A binary tree structure where each node represents a point in k-dimensional space.
        Each level of the tree splits the data based on one dimension, alternating dimensions at each level.
        This allows for efficient searching by pruning large portions of the search space.
- Ball Trees:
    A binary tree structure where each node represents a "ball" (a hypersphere) that contains a subset of points.
        Each node stores the center and radius of its ball, so a whole ball can be skipped when its closest possible point
        is already farther from the query than the best neighbor found so far.
        Ball Trees are particularly useful for high-dimensional data, as they can handle varying densities of points better than KD-Trees.
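- Both KD-Trees and Ball Trees are used to speed up the nearest neighbor search process in algorithms like KNN,
    reducing the average cost of one query from O(n) with brute force to roughly O(log n) when the number of dimensions is small.
    As the number of dimensions grows, the trees can prune less and the cost moves back toward O(n).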

'''
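
To see that the trees change only the speed of the search and not its result, the sketch below queries scikit-learn's `KDTree` and `BallTree` directly and compares them with a brute-force scan. The random points, the query, and `leaf_size=30` are assumptions made for this illustration.

```python
import numpy as np
from sklearn.neighbors import BallTree, KDTree

# Random 3-dimensional points and one query point (made up for illustration)
rng = np.random.default_rng(0)
points = rng.random((500, 3))
query = rng.random((1, 3))

# Brute force: distance to every point, then keep the 5 smallest
brute_dist = np.sqrt(((points - query) ** 2).sum(axis=1))
brute_idx = np.argsort(brute_dist)[:5]

# Tree-based searches; query() returns (distances, indices), sorted by distance
kd_dist, kd_idx = KDTree(points, leaf_size=30).query(query, k=5)
ball_dist, ball_idx = BallTree(points, leaf_size=30).query(query, k=5)

print("KD-Tree matches brute force:  ", np.allclose(kd_dist[0], brute_dist[brute_idx]))
print("Ball Tree matches brute force:", np.allclose(ball_dist[0], brute_dist[brute_idx]))
```

In `KNeighborsClassifier` and `KNeighborsRegressor`, the same choice is exposed through the `algorithm` parameter (`'brute'`, `'kd_tree'`, `'ball_tree'`, or `'auto'`).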

## K Nearest Neighbors (KNN) Classifier Code


In [7]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

In [6]:
from sklearn.datasets import make_classification
# Synthetic binary classification data: 1000 samples, 3 features (1 of them redundant)
X, y = make_classification(n_samples=1000, n_features=3, n_classes=2, n_redundant=1, random_state=999)


In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [9]:
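# algorithm='auto' lets scikit-learn choose between brute force, KD-Tree, and Ball Tree based on the data
classifier = KNeighborsClassifier(n_neighbors=5, algorithm='auto')
classifier.fit(X_train, y_train)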

In [10]:
y_pred = classifier.predict(X_test)

In [11]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))

Accuracy: 0.906060606060606
Confusion Matrix:
 [[158  11]
 [ 20 141]]
Classification Report:
               precision    recall  f1-score   support

           0       0.89      0.93      0.91       169
           1       0.93      0.88      0.90       161

    accuracy                           0.91       330
   macro avg       0.91      0.91      0.91       330
weighted avg       0.91      0.91      0.91       330
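
The figures in the report can be recomputed by hand from the confusion matrix printed above. The sketch below hard-codes that matrix and assumes scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true class, columns: predicted class)
cm = np.array([[158, 11],
               [20, 141]])

# Accuracy: correct predictions (the diagonal) over all predictions
accuracy = np.trace(cm) / cm.sum()

# Precision: correct predictions for a class over everything predicted as that class
precision = np.diag(cm) / cm.sum(axis=0)

# Recall: correct predictions for a class over everything truly in that class
recall = np.diag(cm) / cm.sum(axis=1)

print(round(accuracy, 4))   # 0.9061
print(precision.round(2))   # [0.89 0.93]
print(recall.round(2))      # [0.93 0.88]
```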



## KNN Regressor

In [12]:
from sklearn.datasets import make_regression
# Synthetic regression data: 1000 samples, 2 features, Gaussian noise with standard deviation 10
X, y = make_regression(n_samples=1000, n_features=2, noise=10, random_state=42)


In [13]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)


In [14]:
from sklearn.neighbors import KNeighborsRegressor
regressor = KNeighborsRegressor(n_neighbors=6, algorithm='auto')
regressor.fit(X_train, y_train)

In [15]:
y_pred = regressor.predict(X_test)

In [17]:
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
print("R^2 Score:", r2_score(y_test, y_pred))
print("Mean Absolute Error:", mean_absolute_error(y_test, y_pred))
print("Mean Squared Error:", mean_squared_error(y_test, y_pred))


R^2 Score: 0.9189275159979495
Mean Absolute Error: 9.009462452972217
Mean Squared Error: 127.45860414317289
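
With the default `weights='uniform'`, a `KNeighborsRegressor` prediction is the plain average of the neighbors' target values. The sketch below checks this on a smaller synthetic dataset; the names `X_demo`, `y_demo`, and `model`, and the slightly shifted query point, are assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor

# Smaller synthetic dataset, generated the same way as above
X_demo, y_demo = make_regression(n_samples=200, n_features=2, noise=10, random_state=42)
model = KNeighborsRegressor(n_neighbors=6).fit(X_demo, y_demo)

# Hypothetical query: the first training point, shifted slightly
query = X_demo[:1] + 0.05

# kneighbors() returns (distances, indices) of the 6 nearest training points
_, neighbor_idx = model.kneighbors(query)

# Average the targets of those neighbors by hand
manual_prediction = y_demo[neighbor_idx[0]].mean()

print("Manual average:  ", manual_prediction)
print("Model prediction:", model.predict(query)[0])
print("Match:", np.isclose(manual_prediction, model.predict(query)[0]))
```

Passing `weights='distance'` instead would weight closer neighbors more heavily, and the plain average would no longer match.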
