# 📌 **KNN Algorithm**

**KNN (K-Nearest Neighbors)** is a **supervised learning algorithm** that **predicts the label of a data point by finding the 'K' most similar neighbors** and making the prediction based on those neighbors. 🧑‍🤝‍🧑🔍

---

## 📚 **Key Things to Know About KNN**

🔢 **1. K-value (Number of Neighbors):**

* You choose how many neighbors (K) to look at.
* Example: If **K = 3**, the algorithm checks the **3 nearest points**.
* 🧠 Tip: For **binary classification**, an **odd K** avoids tied votes. With three or more classes, ties can still happen even when K is odd.

**K in KNN:**

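* **Low K (e.g., 1-3):** Very sensitive to noise and individual points in the training data, so it can overfit 🐉.
* **High K (e.g., 15+):** Averages over many neighbors, so it smooths out local patterns and can underfit 🦖.

**Key:** Small K = complex, flexible decision boundary; large K = smoother, more generalized boundary. 🎯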
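A common way to pick K is to try several values and compare cross-validated accuracy. The snippet below is a minimal sketch of that idea; it assumes scikit-learn's bundled Iris data (`load_iris`) as a stand-in dataset, and the candidate K values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the bundled Iris data stands in for your own dataset.
X, y = load_iris(return_X_y=True)

scores = {}
for k in (1, 3, 5, 11, 25, 75):  # illustrative candidates, from very small to very large
    model = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

for k, score in scores.items():
    print(f"K={k:>2}  mean CV accuracy={score:.3f}")

best_k = max(scores, key=scores.get)
print("Best K on this data:", best_k)
```

The best K depends on the dataset, so treat the printed value as specific to this example rather than a general recommendation.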


📏 **2. Distance Metric:**
KNN finds the **closest points** using distance formulas:

* **Euclidean Distance (most common)** 📐
* Manhattan Distance 🛣️
* Minkowski Distance 🧮 (a general form: **p = 1** gives Manhattan, **p = 2** gives Euclidean)
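To make the three metrics concrete, here is a small sketch using NumPy. The helper `minkowski` is a hypothetical function written for this example (not a library call); Manhattan and Euclidean distance fall out of it as the cases `p=1` and `p=2`.

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance of order p between two vectors (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

a = [1.0, 2.0]
b = [4.0, 6.0]

euclidean = minkowski(a, b, p=2)  # sqrt(3^2 + 4^2)
manhattan = minkowski(a, b, p=1)  # |3| + |4|

print("Euclidean:", euclidean)  # 5.0
print("Manhattan:", manhattan)  # 7.0
```

In scikit-learn, `KNeighborsClassifier` exposes the same choice through its `metric` and `p` parameters; the defaults (`metric="minkowski"`, `p=2`) correspond to Euclidean distance.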

🔄 **3. Lazy Learner:**

* KNN **doesn’t learn during training**.
* It **stores all data** and **predicts only when needed** (called **lazy learning**) 🐢💤

🧠 **4. Works for:**

* **Classification** (e.g., spam or not spam 📧🚫)
* **Regression** (e.g., predicting price 💵)
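For regression, the prediction is typically the average of the neighbors' target values instead of a vote. A minimal sketch with scikit-learn's `KNeighborsRegressor` follows; the house sizes and prices are made-up toy numbers, not real data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Made-up toy data: house size (m^2) -> price (in thousands)
X_train = np.array([[50], [60], [80], [100], [120]])
y_train = np.array([150, 180, 240, 300, 360])

reg = KNeighborsRegressor(n_neighbors=3)  # default weights: plain average
reg.fit(X_train, y_train)

# The 3 nearest sizes to 85 are 80, 100 and 60,
# so the prediction is the mean of 240, 300 and 180.
pred = reg.predict([[85]])[0]
print(pred)  # 240.0
```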

⚖️ **5. Sensitive to:**

* **Outliers** 😬
* **Irrelevant features** 🤷
* **Feature scaling** (must **normalize** or **standardize** the data!) 📊
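The effect of scaling is easiest to see on data whose features have very different ranges. The sketch below assumes scikit-learn's bundled Wine data (`load_wine`) purely for illustration, and compares KNN with and without `StandardScaler`; the split settings and K are arbitrary choices.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: Wine data chosen because its features differ a lot in magnitude.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# KNN on raw features: large-valued features dominate the distance
raw_knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
raw_acc = raw_knn.score(X_test, y_test)

# KNN on standardized features; the pipeline fits the scaler
# on the training data only, then applies it to the test data
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_knn.fit(X_train, y_train)
scaled_acc = scaled_knn.score(X_test, y_test)

print(f"Accuracy without scaling: {raw_acc:.3f}")
print(f"Accuracy with scaling:    {scaled_acc:.3f}")
```

Wrapping the scaler and the classifier in one pipeline keeps test data out of the scaling statistics, which is the usual way to avoid leakage.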

📝 **6. How It Works:**
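KNN **measures the distance from the input data point to every training point**, keeps the **'K' closest ones**, and then **predicts from those neighbors' labels**: a **majority vote** for classification, or the **average value** for regression. Think of it like asking your closest neighbors what they think! 🏘️💬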
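The steps described above can be sketched from scratch in a few lines. This is a simplified illustration assuming Euclidean distance and a plain majority vote; `knn_predict` is a hypothetical helper, and the points are toy data.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # 1. Euclidean distance from x_new to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # 2. Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # 3. Majority vote over the neighbors' labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters
X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(X_train, y_train, [2, 2], k=3))  # A
print(knn_predict(X_train, y_train, [7, 9], k=3))  # B
```

Note that there is no training step at all: the function only needs the stored data at prediction time.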

🐢 **7. Slow for large datasets:**

* Because it **calculates the distance from each test point to every training point**, prediction gets **slow** as the dataset grows 📉
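A quick way to see the cost is to count the distances a brute-force search has to compute. The sketch below uses random data with arbitrary sizes (2000 training points, 300 test points) just to show how the work scales.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 4))  # 2000 stored training points
X_test = rng.normal(size=(300, 4))    # 300 points to predict

# Broadcasting: (300, 1, 4) - (1, 2000, 4) -> (300, 2000, 4)
diffs = X_test[:, None, :] - X_train[None, :, :]
distances = np.sqrt((diffs ** 2).sum(axis=2))

print(distances.shape)  # (300, 2000): one row per test point
print(distances.size)   # 600000 distances in total
```

The number of distances is `n_test * n_train`, so doubling the training set doubles the work for every single prediction.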





---

## 📏 **Euclidean Distance**

- **Euclidean Distance** is the **straight-line distance** between two points in space.
- It’s like measuring the shortest path between two locations on a map 🗺️.
- In the case of **KNN**, Euclidean distance is used to find how "close" or "similar" a data point is to others.
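For reference, the standard formula for two points $p = (p_1, \dots, p_n)$ and $q = (q_1, \dots, q_n)$ with $n$ features is:

$$
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
$$

As a worked example with two made-up points $p = (1, 1)$ and $q = (7, 9)$:

$$
d(p, q) = \sqrt{(1 - 7)^2 + (1 - 9)^2} = \sqrt{36 + 64} = \sqrt{100} = 10
$$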


```python
# Data loading
import pandas as pd
DATA = pd.read_csv(r"C:\Users\Nagesh Agrawal\OneDrive\Desktop\MACHINE LEARNING\1_DATASETS\DECISION TREE DATA.csv")

# Label encoding: convert the species names into integer class labels
from sklearn.preprocessing import LabelEncoder
LE = LabelEncoder()
DATA["Species"] = LE.fit_transform(DATA["Species"])

# Selecting features and target
X = DATA.iloc[:, 1:5]   # Features (columns 1 to 4)
Y = DATA["Species"]     # Target

# Splitting the data: 70% train, 30% test
from sklearn.model_selection import train_test_split
X_TRAIN, X_TEST, Y_TRAIN, Y_TEST = train_test_split(X, Y, test_size=0.3, random_state=2)

# Create the KNN model
from sklearn.neighbors import KNeighborsClassifier
KNN = KNeighborsClassifier(n_neighbors=10)  # also try 4 or 5

# 🏋️ Train the model (KNN just stores the training data here)
KNN.fit(X_TRAIN, Y_TRAIN)

# 🔍 Make predictions
Y_PRED = KNN.predict(X_TEST)

# 📊 Check accuracy
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(Y_TEST, Y_PRED)
print("Accuracy:", accuracy)
```

Output:

```
Accuracy: 0.9777777777777777
```
