
# **Experiment – 2**
**Date:**  
**Roll No.: 24201013**  
**Title:** *K-Nearest Neighbors (KNN) Algorithm*

---

## **Theory**

K-Nearest Neighbors (KNN) is a simple, intuitive supervised machine learning algorithm used for both classification and regression. It is a lazy learner: training only stores the labeled examples, and all computation is deferred until a new data point has to be predicted. The method rests on the assumption that points lying close together in feature space tend to share the same label or a similar target value.
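
To make the "lazy" part concrete, here is a minimal sketch, assuming a hypothetical `LazyKNNStore` class that is not part of the experiment code. Its `fit` step estimates no parameters and only keeps the examples for later distance computations.

```python
class LazyKNNStore:
    """Hypothetical helper: 'training' only memorizes the data."""

    def fit(self, features, labels):
        # No parameters are estimated; the examples are stored as-is.
        self.features = [list(row) for row in features]
        self.labels = list(labels)
        return self


store = LazyKNNStore().fit([[1, 2], [3, 1]], ["A", "B"])
print(len(store.features), "examples stored")
```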

---

## **Euclidean Distance Formula**

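To measure similarity, KNN most commonly uses the **Euclidean distance** between two data points: the smaller the distance, the more similar the points.

For two points $x$ and $y$ with $n$ features each, it is defined as: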

<br>

$$
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
$$

<br>

Where:  
- $x_i$ = value of the $i$-th feature of point **x**  
- $y_i$ = value of the $i$-th feature of point **y**  
- $n$ = number of features  
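
As a quick worked example of the formula, the sketch below computes the distance between two made-up points, `x = [1, 2]` and `y = [4, 6]` (both chosen only for illustration).

```python
import math

x = [1, 2]
y = [4, 6]

# Sum of squared per-feature differences: (1 - 4)^2 + (2 - 6)^2 = 9 + 16 = 25
squared_sum = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
distance = math.sqrt(squared_sum)

print(distance)  # 5.0
```

On Python 3.8 and later, `math.dist(x, y)` should give the same value.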

---

## **Step-wise Explanation of KNN**

### **Step 1 — Data Representation**
Each example consists of features + a label.  
Example: `[feature1, feature2, ..., label]`

### **Step 2 — Feature Scaling**
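Needed because KNN relies on distances: without scaling, a feature with a large numeric range dominates the result.  
Common methods: Min–Max Scaling (rescale each feature to [0, 1]) and Standardization (zero mean, unit variance).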
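
Both methods can be sketched in a few lines of plain Python. The helper names `min_max_scale` and `standardize` and the `incomes` values are assumptions made for this illustration; each function works on one feature column at a time.

```python
import math


def min_max_scale(values):
    """Rescale values to [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to mean 0 and scale to unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


incomes = [20000, 40000, 60000]  # hypothetical feature on a large scale
print(min_max_scale(incomes))    # [0.0, 0.5, 1.0]
print(standardize(incomes))
```

In practice the minimum, maximum, mean, and standard deviation are computed on the training data and then reused to scale the query point.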

### **Step 3 — Choose K**
- Small K → sensitive to noise  
- Large K → smoother but may underfit  
- An odd K avoids tied votes in two-class problems (ties are still possible with three or more classes)
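
The effect of K on the vote can be seen with a small sketch. The list `neighbor_labels` is made up and assumed to be sorted from nearest to farthest; `tied_labels` is a hypothetical helper that reports which labels share the highest vote count.

```python
from collections import Counter


def tied_labels(labels, k):
    """Return the labels sharing the highest vote count among the first k."""
    votes = Counter(labels[:k])
    top_count = max(votes.values())
    return sorted(label for label, count in votes.items() if count == top_count)


# Hypothetical neighbor labels, nearest first.
neighbor_labels = ["B", "A", "A", "B", "A"]

for k in (1, 2, 3, 4, 5):
    winners = tied_labels(neighbor_labels, k)
    print(k, "tie" if len(winners) > 1 else winners[0])
```

Here K = 1 follows the single nearest point, the even values K = 2 and K = 4 produce ties, and K = 3 and K = 5 both give a clear winner.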

### **Step 4 — Calculate Distances**
Compute the distance between the query point and all training samples.

### **Step 5 — Pick K Nearest Neighbors**
Sort distances and select the closest **K** points.
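
Steps 4 and 5 can be combined as in the sketch below. It assumes rows in the `[feature1, ..., label]` layout from Step 1 and uses `heapq.nsmallest` from the standard library instead of a full sort. The function name `k_nearest` and the `points` data are illustrative only.

```python
import heapq
import math


def k_nearest(data, query, k):
    """Return the k (distance, label) pairs closest to query."""
    scored = (
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(row[:-1], query))), row[-1])
        for row in data
    )
    # nsmallest keeps only the k best candidates instead of sorting everything.
    return heapq.nsmallest(k, scored, key=lambda pair: pair[0])


points = [[0, 0, "A"], [3, 4, "B"], [1, 0, "A"]]  # hypothetical data
print(k_nearest(points, [0, 0], 2))  # [(0.0, 'A'), (1.0, 'A')]
```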

### **Step 6 — Majority Voting / Averaging**
- Classification → majority vote  
- Regression → average of K neighbors

### **Step 7 — Predict Output**
Return the most frequent label (classification) or mean value (regression).
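
The regression case is not implemented in this experiment, so the following is only a sketch of how the averaging step might look. It assumes rows of the form `[feature1, ..., target]` with a numeric target; `knn_regress` and the `houses` data are hypothetical.

```python
import math


def knn_regress(data, query, k):
    """Predict a numeric target as the mean target of the k nearest rows."""
    distances = []
    for row in data:
        features, target = row[:-1], row[-1]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(features, query)))
        distances.append((dist, target))
    distances.sort(key=lambda pair: pair[0])
    nearest_targets = [target for _, target in distances[:k]]
    return sum(nearest_targets) / len(nearest_targets)


# Hypothetical one-feature data: [size, price]
houses = [[1, 10.0], [2, 20.0], [3, 30.0], [10, 100.0]]
print(knn_regress(houses, [2], 3))  # 20.0
```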

### **Step 8 — Evaluate (Optional)**
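Split the data into training and test sets, predict the label of each test example, and compute the accuracy (the fraction of correct predictions).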
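
A minimal evaluation sketch is shown below. The `train` and `test` lists are a hand-made split used only for illustration (a real experiment would usually shuffle the data first), and `predict` and `accuracy` are hypothetical helper names.

```python
import math
from collections import Counter


def predict(train, query, k):
    """Classify query by majority vote among its k nearest training rows."""
    nearest = sorted(
        train,
        key=lambda row: math.sqrt(
            sum((a - b) ** 2 for a, b in zip(row[:-1], query))
        ),
    )[:k]
    return Counter(row[-1] for row in nearest).most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test rows whose predicted label matches the true label."""
    correct = sum(predict(train, row[:-1], k) == row[-1] for row in test)
    return correct / len(test)


train = [[1, 1, "A"], [2, 1, "A"], [1, 2, "A"],
         [8, 8, "B"], [9, 8, "B"], [8, 9, "B"]]
test = [[2, 2, "A"], [9, 9, "B"], [5, 4, "B"]]
print(accuracy(train, test, k=3))
```

On this toy split two of the three test rows are classified correctly. The point `[5, 4]` lies between the two clusters and is voted into class A.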

---

## **Advantages**
- Simple to implement  
- Works for classification & regression  
- No explicit training phase  

## **Disadvantages**
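- Slow predictions on large datasets, since every query is compared with all training points  
- Memory heavy, since the whole training set must be stored  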
- Sensitive to scale and noise  

---


In [1]:
import math
from collections import Counter

*Step 1: Euclidean Distance Function*


In [2]:
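def euclidean_distance(point1, point2):
    """Return the Euclidean distance between two points with the same number of features."""
    if len(point1) != len(point2):
        raise ValueError("points must have the same number of features")
    # Sum the squared per-feature differences, then take the square root
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(point1, point2)))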


*Step 2: KNN Function*

In [3]:
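def knn(data, query, k):
    """Classify query by majority vote among its k nearest examples in data."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of examples")
    distances = []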

    # Step 3: Calculate distance from query to all points
    for example in data:
        features = example[:-1]       # All values except label
        label = example[-1]           # Last value is label
        dist = euclidean_distance(features, query)
        distances.append((dist, label))

    # Step 4: Sort by distance and take top K
    distances.sort(key=lambda x: x[0])
    k_nearest_labels = [label for _, label in distances[:k]]

    # Step 5: Majority vote (classification)
    prediction = Counter(k_nearest_labels).most_common(1)[0][0]
    return prediction


*Step 6: Sample Dataset*

*Format -> [feature1, feature2, label]*

In [4]:
dataset = [
    [1, 2, 'A'],
    [2, 3, 'A'],
    [3, 1, 'B'],
    [6, 5, 'B'],
    [7, 7, 'B'],
    [8, 6, 'A']
]

*Step 7: Predict for a new point*

In [5]:
query_point = [5, 2]
k = 3

result = knn(dataset, query_point, k)

*Step 8: Display Result*

In [6]:
print("Query Point:", query_point)
print("Predicted Class:", result)

Query Point: [5, 2]
Predicted Class: B
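
To see why the prediction is `B`, the following self-contained sketch lists every training point's distance to the query, nearest first. It repeats the sample dataset and query from above; the variable name `ranked` is introduced only for this check.

```python
import math

dataset = [
    [1, 2, 'A'],
    [2, 3, 'A'],
    [3, 1, 'B'],
    [6, 5, 'B'],
    [7, 7, 'B'],
    [8, 6, 'A'],
]
query_point = [5, 2]

# (distance, label) pairs, sorted from nearest to farthest
ranked = sorted(
    (
        (math.sqrt(sum((a - b) ** 2 for a, b in zip(row[:-1], query_point))), row[-1])
        for row in dataset
    ),
    key=lambda pair: pair[0],
)

for dist, label in ranked:
    print(f"{dist:.3f} {label}")
```

The three nearest labels are `B`, `A`, `B` (distances of about 2.236, 3.162, and 3.162), so `B` wins the vote 2 to 1. The second and third neighbors are equally far from the query, which shows that the result for a given K can depend on how such distance ties are ordered.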
