## Introduction to k-Nearest Neighbors (k-NN) Algorithm and Its Applications

<img src="img/knn_algorithm_intro.png" width=400>

### How k-NN Works for Classification and Regression
#### Step-by-Step Process
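1. Feature Scaling: put all features on a comparable scale so that no single feature dominates the distance
2. Calculate Distances: compute the distance (e.g., Euclidean) from the query point to every training point
3. Identify the *k* Nearest Neighbors
4. Make Predictions
   - Classification: majority vote among the neighbors' labels
   - Regression: average of the neighbors' target values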
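The four steps above can be sketched from scratch with NumPy. This is a minimal illustration, not scikit-learn's implementation: it assumes z-score scaling fitted on the training data, Euclidean distance, and a plain majority vote (ties go to the label that `Counter` encounters first, i.e. the nearer neighbor). The helper names `standardize` and `knn_predict` and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter


def standardize(X_train, X_query):
    """Step 1: z-score scaling using the training set's mean and std only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X_train - mean) / std, (X_query - mean) / std


def knn_predict(X_train, y_train, X_query, k=3, task="classification"):
    """Hypothetical helper: predict each query point from its k nearest neighbors."""
    X_train_s, X_query_s = standardize(X_train, X_query)
    preds = []
    for x in X_query_s:
        # Step 2: Euclidean distance from the query point to every training point
        dists = np.sqrt(((X_train_s - x) ** 2).sum(axis=1))
        # Step 3: indices of the k smallest distances
        nearest = np.argsort(dists)[:k]
        neighbor_targets = y_train[nearest]
        # Step 4: majority vote (classification) or mean (regression)
        if task == "classification":
            preds.append(Counter(neighbor_targets.tolist()).most_common(1)[0][0])
        else:
            preds.append(neighbor_targets.mean())
    return np.array(preds)


# Toy data (hypothetical): the second feature is on a much larger scale than the first
X_train = np.array([[1.0, 200.0], [1.2, 210.0], [5.0, 800.0], [5.5, 820.0]])
y_class = np.array([0, 0, 1, 1])
y_reg = np.array([10.0, 12.0, 50.0, 54.0])
X_query = np.array([[1.1, 205.0], [5.2, 810.0]])

print(knn_predict(X_train, y_class, X_query, k=2))                   # [0 1]
print(knn_predict(X_train, y_reg, X_query, k=2, task="regression"))  # [11. 52.]
```

The same neighbor search serves both tasks; only the final aggregation step differs.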

### Choosing the Optimal Value of *k*
**Choosing *k***
- Small *k*
  - High sensitivity to noise (prone to overfitting)
  - Captures local variations in data
- Large *k*
  - Smoother decision boundaries and less sensitivity to noise
  - Can miss finer local structure (prone to underfitting)
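To see this trade-off, the sketch below compares training and test accuracy for a few values of *k* on synthetic data. The setup is an assumption chosen for illustration: `make_classification` with `flip_y=0.2` randomly reassigns about 20% of the labels to inject noise. With *k* = 1, every training point is its own nearest neighbor, so training accuracy is perfect even on the corrupted labels. That is the noise sensitivity described above. Larger *k* averages over more neighbors and can no longer fit every noisy label.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic 2-D data with label noise (assumed setup, for illustration only)
X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for k in (1, 15, 101):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    train_acc = model.score(X_tr, y_tr)  # accuracy on the data the model memorized
    test_acc = model.score(X_te, y_te)   # accuracy on held-out data
    results[k] = (train_acc, test_acc)
    print(f"k = {k:>3}: train accuracy = {train_acc:.2f}, test accuracy = {test_acc:.2f}")
```

A large gap between training and test accuracy at small *k* is the usual sign of overfitting.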

**Common Practices**
- Use cross-validation to determine the optimal value of *k*
- A common rule-of-thumb starting point is *k* ≈ √*n*, where *n* is the number of training samples
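Both practices can be combined: use √*n* to center a search range, then let cross-validation pick *k*. This is a sketch assuming the same Iris split used later in this notebook. The grid bounds (1 to 2·√*n*) are an arbitrary choice. Scaling is placed inside a scikit-learn `Pipeline` so that each fold is scaled using only its own training portion.

```python
import math

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Rule-of-thumb starting point: k ≈ sqrt(n)
n = len(X_train)
k_start = max(1, round(math.sqrt(n)))
print(f"n = {n}, starting k = {k_start}")  # n = 120, starting k = 11

# Scaler inside the pipeline, so it is refit on each CV training fold
pipe = Pipeline([("scaler", StandardScaler()), ("knn", KNeighborsClassifier())])
param_grid = {"knn__n_neighbors": list(range(1, 2 * k_start + 1))}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
print(f"best k = {best_k}, mean CV accuracy = {search.best_score_:.3f}")
print(f"test accuracy = {search.score(X_test, y_test):.3f}")
```

When several values of *k* tie on cross-validation accuracy, `GridSearchCV` reports the first one in the grid, so the chosen *k* may be the smallest of the tied candidates.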


### Understanding the Model’s Limitations
<img src="img/knn_understanding_model_limitations.png" width=400>

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.linear_model import LogisticRegression
```

```python
data = load_iris()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Experiment with different values of k
for k in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)

    y_pred = knn.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"k = {k}, Accuracy = {accuracy: .2f}")
    print("\nClassification Report:\n", classification_report(y_test, y_pred))
```

```
k = 1, Accuracy =  1.00

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

k = 2, Accuracy =  1.00

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

k = 3, Accuracy =  1.00

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00   
```

```python
# Logistic Regression
data = load_iris()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train, y_train)

y_pred_lr = log_reg.predict(X_test)
accuracy_lr = accuracy_score(y_test, y_pred_lr)

print(f"Accuracy = {accuracy_lr: .2f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred_lr))
```

```
Accuracy =  1.00

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
```
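On this single 30-sample test split, both k-NN and logistic regression reach 1.00 accuracy, so the split alone cannot separate the two models. The sketch below instead compares them with 10-fold stratified cross-validation over the full Iris dataset. The choice of *k* = 5 (scikit-learn's default `n_neighbors`) and the fold settings are assumptions for illustration. As above, scaling sits inside a pipeline so that each fold is scaled using only its own training data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# Same preprocessing for both models; k = 5 is an assumed value
models = {
    "k-NN (k=5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=200)),
}

cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name}: mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```

Reporting the spread across folds alongside the mean gives a sense of whether any difference between the models is larger than the split-to-split variation.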

