
KNN (K-Nearest Neighbors):
Used for both classification and regression, though it is mostly used for classification.

It is a lazy learner – meaning it doesn’t learn an internal model. Instead, it memorizes the training data and makes predictions by comparing new input to the stored examples.


---



How Does KNN Work?

1.   Choose K (number of neighbors, like 3 or 5).
2.   Measure the distance between the new data point and all training data points using a metric like Euclidean distance.
3.   Select the K nearest neighbors to the new data point.
4.   Vote or average:
     *   Classification: majority class among the neighbors.
     *   Regression: average of the neighbors' values.
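
The four steps above can be sketched on a tiny dataset. This is only an illustrative sketch: the points, labels, and target values are made up, and the helper name `k_nearest_indices` is hypothetical, not part of any library.

```python
import numpy as np
from collections import Counter

# Hypothetical toy data: five 2-D points with a class label and a numeric target each.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0], [3.5, 5.0]])
y_class = np.array(["a", "a", "b", "b", "b"])
y_value = np.array([1.0, 2.0, 3.0, 5.0, 4.0])

def k_nearest_indices(x_new, X, k):
    """Steps 2 and 3: Euclidean distance to every training point, keep the k closest."""
    distances = np.sqrt(((X - x_new) ** 2).sum(axis=1))
    return np.argsort(distances)[:k]

x_new = np.array([3.0, 4.5])
idx = k_nearest_indices(x_new, X_train, k=3)  # Step 1: K = 3

# Step 4, classification: majority vote among the neighbors' labels.
predicted_class = Counter(y_class[idx].tolist()).most_common(1)[0][0]

# Step 4, regression: mean of the neighbors' target values.
predicted_value = float(y_value[idx].mean())

print("Neighbor indices:", idx)
print("Predicted class:", predicted_class)
print("Predicted value:", predicted_value)
```

Here the three closest points are the ones at indices 2, 4, and 1, so the vote is two "b" against one "a", and the regression estimate is the mean of 3.0, 4.0, and 2.0.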

---



Distance Metrics (Common ones)

Euclidean Distance

Manhattan Distance
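
As a rough illustration of the two metrics: Euclidean distance is the square root of the summed squared coordinate differences, and Manhattan distance is the sum of the absolute coordinate differences. The function names `euclidean` and `manhattan` below are chosen for this sketch only.

```python
import numpy as np

def euclidean(a, b):
    """Straight-line (L2) distance: sqrt(sum((a_i - b_i)^2))."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

def manhattan(a, b):
    """City-block (L1) distance: sum(|a_i - b_i|)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b)))

p, q = [1, 2], [4, 6]
print(euclidean(p, q))  # 5.0  (sqrt(3^2 + 4^2))
print(manhattan(p, q))  # 7.0  (3 + 4)
```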


---



**Advantages**

Simple to understand and implement.

No explicit training phase: fitting only stores the data, which suits small datasets.

Works for multi-class problems.


---


**Disadvantages**

Prediction is slow for large datasets (a brute-force search computes the distance to every training point).

Sensitive to irrelevant features and data scaling.

Curse of dimensionality – performance degrades in high-dimensional data.
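
The sensitivity to scaling can be seen in a small sketch. The data below is invented for illustration (columns are assumed to be age in years and income in dollars), and `nearest_index` is a hypothetical helper. Because income spans thousands while age spans tens, income dominates the raw Euclidean distance.

```python
import numpy as np

# Hypothetical training points: [age in years, income in dollars].
X_train = np.array(
    [[31, 52000], [60, 50500], [25, 48000], [58, 53000]], dtype=float
)
x_new = np.array([30, 50000], dtype=float)

def nearest_index(x, X):
    """Index of the training row closest to x under Euclidean distance."""
    return int(np.argmin(np.sqrt(((X - x) ** 2).sum(axis=1))))

# Raw features: the income column dominates the distance.
raw_nearest = nearest_index(x_new, X_train)

# Standardize each column using training-set statistics, then search again.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
scaled_nearest = nearest_index((x_new - mean) / std, (X_train - mean) / std)

print("Nearest neighbor on raw features:   ", raw_nearest)
print("Nearest neighbor on scaled features:", scaled_nearest)
```

On the raw features the 60-year-old with a similar income is the nearest neighbor; after standardization the 31-year-old is, since both features now contribute on a comparable scale.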


---



**When to Use**

Small to medium-sized datasets.

When prediction speed is not critical.

When the data is not high-dimensional.

---



In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

In [None]:
iris=load_iris()

In [None]:
x=iris.data
y=iris.target

In [None]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)

In [None]:
knn=KNeighborsClassifier(n_neighbors=3)

In [None]:
knn.fit(x_train,y_train)

In [None]:
y_pred=knn.predict(x_test)

In [None]:
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=iris.target_names))

Accuracy: 1.0

Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



Manual KNN Implementation

In [None]:
import numpy as np
from collections import Counter

# Euclidean Distance Function
def euclidean_distance(x1, x2):
    return np.sqrt(np.sum((x1 - x2) ** 2))

# KNN Classifier
class KNN:
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        predictions = [self._predict(x) for x in X]
        return np.array(predictions)

    def _predict(self, x):
        # Compute distances between x and all examples in the training set
        distances = [euclidean_distance(x, x_train) for x_train in self.X_train]
        # Sort by distance and return indices of the first k neighbors
        k_indices = np.argsort(distances)[:self.k]
        # Extract the labels of the k nearest neighbor training samples
        k_nearest_labels = [self.y_train[i] for i in k_indices]
        # Majority vote, most common class label
        most_common = Counter(k_nearest_labels).most_common(1)
        return most_common[0][0]

# Example dataset (Iris: 150 samples, 4 features, 3 classes)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train model
model = KNN(k=3)
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Accuracy
print("Accuracy:", accuracy_score(y_test, predictions))

Accuracy: 1.0




---


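KNN does not use an activation function: it is not a neural network and has no neurons or layers.

Backpropagation is not used in KNN either: there are no weights to learn, so predictions come directly from distances to the stored examples.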

---



Methods to Increase KNN Accuracy


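1.   Choose the optimal K value (number of neighbors), for example by cross-validation.
2.   Feature scaling (standardization/normalization).
3.   Feature selection:
     *   Remove irrelevant or noisy features.
     *   Use techniques like correlation analysis or SelectKBest; PCA is related but builds new features rather than selecting existing ones.
     *   Irrelevant features distort the distances.
4.   Handle imbalanced datasets.
5.   Use distance weighting:
     *   Instead of giving equal weight to all neighbors, weight closer neighbors more.
     *   In scikit-learn this is the weights='distance' option of KNeighborsClassifier (a weighting scheme, not a distance metric).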


In [None]:
best_k = 3  # placeholder (assumption): replace with a K chosen by cross-validation
model = KNeighborsClassifier(n_neighbors=best_k, weights='distance')

6.   Outlier removal.
7.   Try different distance metrics (scikit-learn's default is Minkowski with p=2, which is Euclidean; others such as manhattan, cosine, or Minkowski with a different p may perform better).
8.   Use dimensionality reduction (e.g., PCA).
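
Several of these ideas (choosing K, feature scaling, distance weighting, and trying another metric) can be combined in one cross-validated search. The following is a minimal sketch rather than a tuned recipe: the grid values, the 5-fold setting, and the step names "scaler" and "knn" are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling lives inside the pipeline so each CV fold is scaled
# using only its own training portion.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Illustrative grid (assumed values): odd K, both weighting schemes, two metrics.
param_grid = {
    "knn__n_neighbors": [1, 3, 5, 7, 9, 11],
    "knn__weights": ["uniform", "distance"],
    "knn__metric": ["euclidean", "manhattan"],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(X_test, y_test)

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)
print("Test accuracy:", test_accuracy)
```

Iris is small and easy, so the differences between settings are likely to be minor here; the same pattern is more informative on larger or noisier datasets.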