# K-Nearest Neighbors (KNN) – Beginner Notebook

## 1. What is KNN?
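K-Nearest Neighbors (KNN) is a simple supervised machine learning algorithm. To predict the output for a new point, it finds the K training points closest to that point and combines their outputs: a majority vote of their labels for classification, or the average of their values for regression.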
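
To make the idea concrete, here is a minimal from-scratch sketch of KNN classification in plain Python. The helper name `knn_predict` and the toy points are made up for illustration; the sections below use scikit-learn's `KNeighborsClassifier` instead.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Hypothetical helper: label `query` by majority vote of its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [math.dist(p, query) for p in train_points]
    # Indices of the k training points with the smallest distances.
    nearest = sorted(range(len(train_points)), key=lambda i: distances[i])[:k]
    # Majority vote among the labels of those neighbors.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data (made up): two small, well-separated clusters.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 5.5)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (1.2, 1.4)))  # query near the first cluster
print(knn_predict(points, labels, (6.4, 6.1)))  # query near the second cluster
```

This sketch compares the query against every training point, which is fine for small data but slow for large datasets.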

## 2. Distance Calculation
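The most common choice is the Euclidean distance between two points \(x = (x_1, \dots, x_n)\) and \(y = (y_1, \dots, y_n)\):

\[
d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_n - y_n)^2}
\]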
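
As a quick check of the formula, the sketch below computes the distance between two made-up 3-dimensional points term by term with NumPy and compares the result with `np.linalg.norm`.

```python
import numpy as np

# Two made-up points; the coordinate differences are 3, 4, and 0.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

# Direct translation of the formula: square the differences, sum, take the root.
d_manual = np.sqrt(np.sum((x - y) ** 2))

# The same quantity via NumPy's vector norm.
d_norm = np.linalg.norm(x - y)

print(d_manual, d_norm)  # both 5.0, since sqrt(9 + 16 + 0) = 5
```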


## 3. Why Scaling?
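KNN relies on distances, so a feature with a larger numeric range (for example, income in dollars versus age in years) dominates the distance calculation regardless of how informative it is. Scaling the features to a comparable range, for example by standardizing each one to zero mean and unit variance, lets every feature contribute.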
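
The effect can be seen on a small made-up dataset. In the sketch below, the array `X_demo`, the helper `euclidean`, and the age and income values are all assumptions chosen for illustration. Before scaling, the income column decides which point is nearest; after standardization, the age difference counts as well and the nearest point changes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: column 0 is age in years, column 1 is income in dollars.
X_demo = np.array([
    [25.0, 50_000.0],  # reference point
    [60.0, 50_500.0],  # very different age, similar income
    [26.0, 52_000.0],  # similar age, somewhat different income
    [58.0, 53_000.0],
])

def euclidean(a, b):
    """Euclidean distance between two 1-D arrays."""
    return np.linalg.norm(a - b)

# Raw features: income differences swamp age differences.
raw_d01 = euclidean(X_demo[0], X_demo[1])
raw_d02 = euclidean(X_demo[0], X_demo[2])
print("Raw:    d(0,1) =", raw_d01, " d(0,2) =", raw_d02)

# Standardized features: both columns now have zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X_demo)
scaled_d01 = euclidean(X_scaled[0], X_scaled[1])
scaled_d02 = euclidean(X_scaled[0], X_scaled[2])
print("Scaled: d(0,1) =", scaled_d01, " d(0,2) =", scaled_d02)
```

On the raw data, point 1 looks closer to point 0 even though it is 35 years older. After scaling, point 2 is the closer one.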

## 4. KNN Classification Example (Iris Dataset)

In [None]:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load data
data = load_iris()
X = data.data
y = data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

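# Scaling: fit the scaler on the training set only, then apply the same
# transformation to the test set so no test information leaks into training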
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Model
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Prediction
y_pred = knn.predict(X_test)

# Accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))


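## 5. Choosing the Best K – Elbow Method

Train the model for a range of K values, plot the error rate against K, and pick a K near the "elbow", where the error stops dropping noticeably. This cell reuses the scaled Iris split (`X_train`, `X_test`, `y_train`, `y_test`) from section 4.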

In [None]:

import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Test-set error rate (1 - accuracy) for each K
error = []

for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    pred = knn.predict(X_test)
    error.append(1 - accuracy_score(y_test, pred))

plt.plot(range(1, 21), error, marker='o')
plt.xlabel("K value")
plt.ylabel("Error rate")
plt.title("Elbow Method")
plt.show()
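
Picking K from the test-set error, as the plot above does, tends to give an optimistic estimate because the test set influences the choice. One common alternative is to select K by cross-validation on the training set and touch the test set only once at the end. The sketch below is one way to do that, not the only one. The names `X_iris`, `X_tr`, `X_te`, `cv_scores`, and `best_k` are chosen here for illustration, and 5 folds is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=0
)

cv_scores = {}
for k in range(1, 21):
    # The pipeline refits the scaler inside each fold, so the held-out
    # fold never influences the scaling statistics.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    cv_scores[k] = cross_val_score(model, X_tr, y_tr, cv=5).mean()

# K with the highest mean cross-validated accuracy (ties go to the smaller K).
best_k = max(cv_scores, key=cv_scores.get)
print("Best K by 5-fold CV:", best_k)

# Refit on the full training set with the chosen K and evaluate once on the test set.
final_model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
final_model.fit(X_tr, y_tr)
test_accuracy = final_model.score(X_te, y_te)
print("Test accuracy:", test_accuracy)
```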


## 6. KNN Regression Example (California Housing)

In [None]:

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Load dataset (downloaded and cached on first use)
data = fetch_california_housing()
X = data.data
y = data.target

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Model
knn_reg = KNeighborsRegressor(n_neighbors=5)
knn_reg.fit(X_train, y_train)

# Prediction
y_pred = knn_reg.predict(X_test)

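# Evaluate: the target is the median house value in units of $100,000,
# so RMSE is in those units and MSE is in their square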
mse = mean_squared_error(y_test, y_pred)
print("MSE:", mse)
print("RMSE:", mse ** 0.5)
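
To see what `KNeighborsRegressor` does under the hood, the sketch below uses a tiny made-up 1-D dataset (the names `X_toy`, `y_toy`, and `toy_reg` are illustrative). With the default uniform weights, the prediction for a query point is the mean of the target values of its K nearest training points, which can be checked with the `kneighbors` method.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Made-up training data: one feature, one target.
X_toy = np.array([[1.0], [2.0], [3.0], [10.0], [11.0]])
y_toy = np.array([10.0, 20.0, 30.0, 100.0, 110.0])

toy_reg = KNeighborsRegressor(n_neighbors=3)  # default weights="uniform"
toy_reg.fit(X_toy, y_toy)

query = np.array([[2.2]])

# kneighbors returns the distances to, and indices of, the nearest training points.
distances, indices = toy_reg.kneighbors(query)
neighbor_targets = y_toy[indices[0]]
manual_prediction = neighbor_targets.mean()
model_prediction = toy_reg.predict(query)[0]

print("Neighbor targets:", neighbor_targets)
print("Mean of neighbors:", manual_prediction)
print("Model prediction:", model_prediction)
```

The three nearest points to 2.2 are 2.0, 3.0, and 1.0, with targets 20, 30, and 10, so both values come out as 20.0.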
