# Day-22 k-Nearest Neighbors (k-NN)

Today, we're diving into one of the simplest yet most intuitive machine learning algorithms: k-Nearest Neighbors, or k-NN. It's a non-parametric, supervised learning method used for both classification and regression. The core idea is simple: a new data point is assigned the majority class of its 'k' nearest neighbors in the feature space (for regression, the average of their target values). Think of it like this: you are who you hang out with.
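To make the majority-vote idea concrete, here is a minimal from-scratch sketch in NumPy. It is not how scikit-learn implements k-NN internally; the function name `knn_predict` and the toy arrays `X_toy` and `y_toy` are made up purely for illustration.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    # Euclidean distance from x_new to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, [1.1, 1.0], k=3))
print(knn_predict(X_toy, y_toy, [5.1, 5.0], k=3))
```

There is no training step to speak of: the "model" is just the stored training data, and all the work happens at prediction time.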

## Topics Covered:

- Understanding the Core Principle: How k-NN works.

- Distance Metrics: Measuring "closeness" between data points.

- Choosing the Optimal 'k': Finding the right number of neighbors.

- Strengths & Limitations: When to use k-NN and when to be cautious.



## Understanding the Core Principle: How k-NN works.

### Importing necessary libraries

In [11]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline as pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

### Dataset loading

In [12]:
df = pd.read_csv('Social_Network_Ads.csv')
print(df.head())
df.info()  # info() prints its summary directly and returns None, so no print() is needed

   Age  EstimatedSalary  Purchased
0   19            19000          0
1   35            20000          0
2   26            43000          0
3   27            57000          0
4   19            76000          0
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Age              400 non-null    int64
 1   EstimatedSalary  400 non-null    int64
 2   Purchased        400 non-null    int64
dtypes: int64(3)
memory usage: 9.5 KB


### Data Preprocessing

In [13]:
X = df[['Age', 'EstimatedSalary']]
y = df['Purchased']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# No manual scaling here: the StandardScaler step inside the pipeline below
# is fitted on the training data and applied to the test data automatically.
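Standardization matters for k-NN because the algorithm compares points by distance, and `EstimatedSalary` is numerically thousands of times larger than `Age`. The sketch below uses only the five rows printed by `df.head()` to show how much of the squared distance comes from salary before and after scaling. Fitting a scaler on five rows is purely illustrative, and the names `X_head` and `euclidean` are made up for this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# First five rows as printed by df.head(): [Age, EstimatedSalary]
X_head = np.array([
    [19, 19000],
    [35, 20000],
    [26, 43000],
    [27, 57000],
    [19, 76000],
], dtype=float)


def euclidean(a, b):
    """Plain Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


# Rows 0 and 1 differ by 16 years of age but only 1000 in salary
raw_dist = euclidean(X_head[0], X_head[1])
salary_share = (X_head[0, 1] - X_head[1, 1]) ** 2 / raw_dist ** 2

# After standardization each column has zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X_head)
diff = X_scaled[0] - X_scaled[1]
scaled_salary_share = diff[1] ** 2 / np.sum(diff ** 2)

print(f"raw distance: {raw_dist:.2f}")
print(f"salary share of squared distance (raw):    {salary_share:.4f}")
print(f"salary share of squared distance (scaled): {scaled_salary_share:.4f}")
```

On the raw values, the 16-year age gap barely registers next to the salary gap, so the nearest neighbors would be chosen almost entirely by salary. After scaling, both features contribute in proportion to how unusual the difference is.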

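#### Create pipeline

In [16]:
from sklearn.pipeline import Pipeline

# Use the class under its real name so that the instance assigned to
# `pipeline` does not overwrite it; re-running this cell then still works.
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('knn', KNeighborsClassifier())
])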
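The `knn` step relies on scikit-learn's default of `n_neighbors=5`. One common way to choose k is a cross-validated grid search over the pipeline, sketched below. Because `Social_Network_Ads.csv` may not be at hand, the example uses synthetic data from `make_classification` as a stand-in (an assumption, so the numbers it prints say nothing about the real dataset); `demo_pipeline`, `param_grid`, and the range of k values are likewise illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the notebook's data (assumption: 400 rows, 2 features)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

demo_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# "knn__n_neighbors" addresses the n_neighbors parameter of the "knn" step.
# Odd values avoid tied votes in a two-class problem.
param_grid = {"knn__n_neighbors": list(range(1, 22, 2))}
search = GridSearchCV(demo_pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

best_k = search.best_params_["knn__n_neighbors"]
print("Best k:", best_k)
print("Mean CV accuracy:", round(search.best_score_, 3))
print("Held-out accuracy:", round(search.score(X_te, y_te), 3))
```

Keeping the scaler inside the pipeline means it is refitted on each training fold, so no information from the validation fold leaks into the scaling.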

#### Predictions

In [20]:
pipeline.fit(X_train, y_train)

Parameters of the fitted pipeline, as shown in the notebook's display:

- Pipeline: steps=[('scaler', ...), ('knn', ...)], transform_input=None, memory=None, verbose=False

- StandardScaler: copy=True, with_mean=True, with_std=True

- KNeighborsClassifier: n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None
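The classifier's settings above include `metric='minkowski'` with `p=2`. The Minkowski distance of order p between points a and b is (sum over i of |a_i - b_i|^p)^(1/p), which gives the Manhattan distance for p=1 and the familiar Euclidean distance for p=2. A small sketch, where the helper `minkowski` is written for illustration and is not scikit-learn's own implementation:

```python
import numpy as np


def minkowski(a, b, p=2):
    """Minkowski distance of order p between two points."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


a, b = [0.0, 0.0], [3.0, 4.0]
manhattan = minkowski(a, b, p=1)  # |3| + |4|
euclid = minkowski(a, b, p=2)     # sqrt(3^2 + 4^2)
print("p=1 (Manhattan):", manhattan)
print("p=2 (Euclidean):", euclid)
```

In scikit-learn, switching to Manhattan distance is a matter of passing `p=1` to `KNeighborsClassifier`.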


In [21]:
y_pred = pipeline.predict(X_test)

In [22]:
# Evaluate the model's performance
print("Accuracy on test set:", accuracy_score(y_test, y_pred))
print("\nConfusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Accuracy on test set: 0.93

Confusion Matrix:
 [[64  4]
 [ 3 29]]

Classification Report:
               precision    recall  f1-score   support

           0       0.96      0.94      0.95        68
           1       0.88      0.91      0.89        32

    accuracy                           0.93       100
   macro avg       0.92      0.92      0.92       100
weighted avg       0.93      0.93      0.93       100
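The figures in the classification report can be recomputed by hand from the confusion matrix, which is a useful check on how to read it. In scikit-learn's layout, rows are the true classes and columns the predicted classes. The sketch below does this for class 1 (purchased), using only the matrix printed above:

```python
import numpy as np

# Confusion matrix from the output above: rows = true class, columns = predicted class
cm = np.array([[64, 4],
               [3, 29]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()          # correct predictions / all predictions
precision_1 = tp / (tp + fp)             # of those predicted 1, how many are truly 1
recall_1 = tp / (tp + fn)                # of those truly 1, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print("accuracy: ", round(accuracy, 2))
print("precision:", round(precision_1, 2))
print("recall:   ", round(recall_1, 2))
print("f1-score: ", round(f1_1, 2))
```

The rounded values match the class 1 row of the report: 29 of 33 predicted purchases were correct, and 29 of the 32 actual purchasers were identified.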

