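K-Nearest Neighbors (KNN) is a **supervised machine learning algorithm** used for both **classification and regression** problems, although it is most often used for classification. It stores the training data and predicts a new point from the labels of the K training points closest to it.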
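For regression, the neighbors' labels are combined differently: instead of a majority vote, a common choice is to average their target values. Below is a minimal sketch, assuming Euclidean distance; `knn_regress` is a hypothetical helper and the one-dimensional data is made up for illustration.

```python
import numpy as np

def knn_regress(X_train, y_train, x, k=3):
    """Predict a continuous target as the mean of the k nearest targets."""
    # Euclidean distance from the query point x to every training point
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()

# Hypothetical toy data: one feature, target equal to the feature
X_train = np.array([[1.0], [2.0], [3.0], [10.0], [11.0]])
y_train = np.array([1.0, 2.0, 3.0, 10.0, 11.0])

# The 3 nearest neighbors of 2.1 have targets 2, 3 and 1, so the mean is 2.0
print(knn_regress(X_train, y_train, np.array([2.1]), k=3))
```

Swapping the mean for a majority vote over `y_train[nearest]` turns the same function back into a classifier.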

---

### 🧠 **When to Use KNN**

You can use KNN when:

1. **Your dataset is small to medium-sized** (KNN is computationally expensive at prediction time, because a basic implementation computes the distance from each new point to every training sample).
2. **The data is not high-dimensional** (KNN struggles with the "curse of dimensionality").
3. **You want a simple, easy-to-implement model** with no assumptions about the data distribution.
4. **The decision boundary is nonlinear or complex**: KNN is flexible and can adapt to complex boundaries.
5. **Your features are on a similar scale, or you can rescale them** (features with large numeric ranges dominate the distance, so normalization or standardization is important for KNN).
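The effect of feature scaling on the list above is easy to check empirically. The sketch below assumes scikit-learn is available and uses its bundled Wine dataset, whose features span very different ranges (proline runs into the hundreds and thousands, while most other columns are single digits). It compares 5-fold cross-validated accuracy of `KNeighborsClassifier` with and without a `StandardScaler`; `n_neighbors=5` is an arbitrary choice.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Raw features: large-range columns dominate the Euclidean distance
unscaled = KNeighborsClassifier(n_neighbors=5)

# Standardize every feature to zero mean and unit variance first; inside a
# pipeline the scaler is refit on each training fold, so test folds don't leak in
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

unscaled_acc = cross_val_score(unscaled, X, y, cv=5).mean()
scaled_acc = cross_val_score(scaled, X, y, cv=5).mean()
print(f"unscaled: {unscaled_acc:.3f}  scaled: {scaled_acc:.3f}")
```

Putting the scaler inside the pipeline, rather than scaling the full dataset up front, keeps the cross-validation estimate honest.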

---

### 🔢 **How to Choose the Best Value of K**

There's no fixed rule, but here's how you can find a good value:

#### 1. **Try several values of K and use cross-validation**
- Use a range like `k = 1 to 30` and test performance using **cross-validation**.
- Plot **accuracy vs. k** and pick the value with the **highest accuracy** (or lowest error).
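The cross-validation search described in step 1 can be sketched as follows. This assumes scikit-learn's `KNeighborsClassifier` in place of a hand-written class, the Iris dataset, and 5 folds; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

k_values = range(1, 31)
# Mean 5-fold cross-validated accuracy for each candidate k
cv_scores = [
    cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in k_values
]

# np.argmax returns the first maximum, i.e. the smallest k among ties
best_k = k_values[int(np.argmax(cv_scores))]
print("best k:", best_k, "mean CV accuracy:", round(max(cv_scores), 3))
```

To get the accuracy-vs-k plot, pass `k_values` and `cv_scores` to a plotting library such as matplotlib.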

#### 2. **Odd number for K (in binary classification)**
- Use an odd value so that two classes can never split the vote evenly.
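To see why an even K can cause trouble, here is what a `collections.Counter` vote (the same mechanism the notebook's `KNN` class uses) returns for hypothetical neighbor labels in a binary problem. With a tie, `most_common` orders equal counts by first appearance, so the "winner" is whichever label happened to come first in the list.

```python
from collections import Counter

# Hypothetical labels of the 4 nearest neighbors (k = 4, even)
labels_even_k = [0, 1, 1, 0]
print(Counter(labels_even_k).most_common())  # [(0, 2), (1, 2)]: a 2-2 tie

# With k = 3 (odd), two classes cannot split the votes evenly
labels_odd_k = [0, 1, 1]
print(Counter(labels_odd_k).most_common())   # [(1, 2), (0, 1)]
```

With three or more classes an odd K no longer rules out ties (for example, votes of 1-1-1 with K = 3), so some tie-breaking rule is still needed there.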

#### 3. **Rule of thumb** (just a starting point):
```python
import math
k = round(math.sqrt(n))  # n = number of training samples
```
where `n` is the number of training samples. Treat it only as a starting point and always confirm the choice with validation.
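The heuristic can be wrapped in a small helper that also applies the odd-K advice from step 2. `sqrt_rule_k` is a hypothetical name for illustration. For the Iris split used later in this notebook (`test_size=0.3` on 150 samples, leaving 105 for training), it suggests a K around 10.

```python
import math

def sqrt_rule_k(n_samples, make_odd=True):
    """Starting guess for k: round(sqrt(n)), optionally bumped to the next odd number."""
    k = max(1, round(math.sqrt(n_samples)))
    if make_odd and k % 2 == 0:
        k += 1
    return k

print(sqrt_rule_k(105))                  # sqrt(105) ≈ 10.25 → 10 → bumped to 11
print(sqrt_rule_k(105, make_odd=False))  # 10
```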

#### 4. **Avoid overfitting and underfitting**
- Small K (like 1 or 3): **High variance**, may overfit.
- Large K: **High bias**, may underfit.
- Choosing K is therefore a **bias-variance tradeoff**.
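One way to see this tradeoff is to compare training and test accuracy for a small, a medium, and a very large K. The sketch below assumes scikit-learn's `KNeighborsClassifier` and the same Iris split as the from-scratch experiment in this notebook. The K values 1, 15, and 75 are arbitrary illustrations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

results = {}
for k in (1, 15, 75):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # (training accuracy, test accuracy) for this k
    results[k] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"k={k:3d}  train={results[k][0]:.3f}  test={results[k][1]:.3f}")
```

With K = 1, each training point is its own nearest neighbor, so training accuracy is essentially perfect regardless of how well the model generalizes. With K = 75 out of 105 training points, each vote covers most of the training set rather than the local neighborhood, and both scores tend to fall.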

---


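```python
import numpy as np
from collections import Counter

def euclidean_distance(x1, x2):
    """Straight-line (L2) distance between two feature vectors."""
    return np.sqrt(np.sum((x1 - x2) ** 2))

class KNN:
    def __init__(self, k=10):
        self.k = k

    def fit(self, X, y):
        # KNN is a lazy learner: "training" just stores the data
        self.X_train = X
        self.y_train = y

    def predict(self, X):
        # Predict each sample independently and return a NumPy array
        predictions = [self._predict(x) for x in X]
        return np.array(predictions)

    def _predict(self, x):
        # Distance from x to every stored training sample
        # (uses self.X_train, not a global variable)
        dist = [euclidean_distance(x, x_train) for x_train in self.X_train]
        # Indices of the k closest training samples
        k_indices = np.argsort(dist)[:self.k]
        k_nearest_labels = [self.y_train[i] for i in k_indices]
        # Majority vote; on a tie, Counter keeps the label it saw first
        most_common = Counter(k_nearest_labels).most_common(1)
        return most_common[0][0]
```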
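The `_predict` method above loops over the training samples in Python, which becomes slow on larger datasets. A common alternative is to compute the whole test-by-train distance matrix at once with NumPy broadcasting. This is a minimal sketch, and `knn_predict_vectorized` is a hypothetical name. It assumes non-negative integer class labels (required by `np.bincount`), breaks ties toward the smallest label rather than the first-seen label, and uses memory proportional to n_test × n_train × n_features.

```python
import numpy as np

def knn_predict_vectorized(X_train, y_train, X_test, k=10):
    """Majority-vote KNN that computes all test-to-train distances at once."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    # (n_test, 1, d) - (1, n_train, d) broadcasts to (n_test, n_train, d);
    # summing over the feature axis gives an (n_test, n_train) distance matrix
    dists = np.sqrt(((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2))
    # Indices of the k closest training points for each test row
    nearest = np.argsort(dists, axis=1)[:, :k]
    neighbor_labels = y_train[nearest]  # shape (n_test, k)
    # Majority vote per row; bincount needs non-negative integer labels
    return np.array([np.bincount(row).argmax() for row in neighbor_labels])

# Hypothetical toy data: two well-separated clusters
X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_toy = [0, 0, 0, 1, 1, 1]
print(knn_predict_vectorized(X_toy, y_toy, [[0.5, 0.5], [5.5, 5.5]], k=3))  # [0 1]
```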


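```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

data = datasets.load_iris()
X, y = data.data, data.target

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = KNN(k=10)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)

# Accuracy = fraction of test samples whose predicted label matches the true label
accuracy = np.mean(predictions == y_test)
print(accuracy)
```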

Output: `1.0`, i.e. all 45 test samples are classified correctly.
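A quick sanity check for a from-scratch implementation is to run the same split through scikit-learn's `KNeighborsClassifier`. Its defaults (uniform weights, Minkowski metric with `p=2`) amount to the same Euclidean majority vote. The sketch below relies on that assumption. Individual predictions may still differ in rare tie cases, because the two implementations break ties differently.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = datasets.load_iris()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Same k and the same (default) Euclidean metric as the from-scratch class
reference = KNeighborsClassifier(n_neighbors=10).fit(X_train, y_train)
sk_predictions = reference.predict(X_test)
sk_accuracy = np.mean(sk_predictions == y_test)
print("scikit-learn accuracy:", sk_accuracy)
```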
