
## **K-Nearest Neighbors (KNN)**
KNN is a simple, instance-based algorithm that classifies a data point by the majority class among its nearest neighbors. It is non-parametric: it makes no assumptions about the underlying data distribution, which makes it versatile.

### How it works:
1. You choose a value for \( k \), which represents the number of neighbors to consider.
2. The algorithm calculates the distance (e.g., Euclidean or Manhattan) between the query point and every point in the training set.
3. It selects the \( k \) nearest neighbors based on these distances.
4. For classification, it assigns the class with the majority vote among the neighbors. For regression, it takes the average of the neighbors’ values.
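
The four steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not what scikit-learn does internally: the function name `knn_predict` and the toy arrays `X_toy` and `y_toy` are made up for this example, and Euclidean distance is assumed.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, query, k=3):
    """Classify one query point by majority vote among its k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    query = np.asarray(query, dtype=float)

    # Step 2: Euclidean distance from the query to every training point.
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))

    # Step 3: indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]

    # Step 4: majority vote among the neighbors' labels.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (hypothetical, for illustration only): two small clusters.
X_toy = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                  [6.0, 6.0], [6.5, 7.0], [7.0, 6.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, [1.2, 1.4], k=3))  # query near the first cluster
print(knn_predict(X_toy, y_toy, [6.4, 6.2], k=3))  # query near the second cluster
```

Every prediction scans the whole training set, which is why plain KNN becomes slow as the dataset grows.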

### When to use it:
KNN is useful for simple classification and regression tasks, especially when the dataset is small or the relationship between features and target is non-linear. It’s also used in recommendation systems and anomaly detection.
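
For the regression case, the vote is replaced by an average of the neighbors' target values. The sketch below is a hypothetical helper (`knn_regress` and the toy arrays are not part of this notebook) that also shows how the Euclidean and Manhattan distances differ.

```python
import numpy as np


def knn_regress(X_train, y_train, query, k=3, metric="euclidean"):
    """Predict a numeric value as the mean target of the k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    diff = X_train - np.asarray(query, dtype=float)

    if metric == "euclidean":
        # Square root of the summed squared differences.
        distances = np.sqrt((diff ** 2).sum(axis=1))
    elif metric == "manhattan":
        # Sum of the absolute differences.
        distances = np.abs(diff).sum(axis=1)
    else:
        raise ValueError(f"unsupported metric: {metric}")

    # Average the targets of the k closest training points.
    nearest = np.argsort(distances)[:k]
    return y_train[nearest].mean()


# Toy data (hypothetical): one feature, target roughly 10x the feature.
X_reg = np.array([[1.0], [2.0], [3.0], [10.0]])
y_reg = np.array([10.0, 20.0, 30.0, 100.0])

print(knn_regress(X_reg, y_reg, [2.2], k=3))
print(knn_regress(X_reg, y_reg, [2.2], k=3, metric="manhattan"))
```

With a single feature both metrics give the same distances; they only diverge once there are two or more features.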

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv('/content/drive/MyDrive/Curso_estatística_Python/census.csv')
df.head()

| Unnamed: 0 | age | workclass | final-weight | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loos | hour-per-week | native-country | income |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |


### Importing the preprocessed data:

In [None]:
import pickle

with open('/content/drive/MyDrive/Curso_estatística_Python/census_x_train.pkl', 'rb') as f:
    X_train_census, X_test_census, y_train_census, y_test_census = pickle.load(f)

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Fit a KNN classifier that votes among the 5 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_census, y_train_census)

# Predict the test set and measure the fraction of correct predictions.
prediction = knn.predict(X_test_census)

accuracy = accuracy_score(y_test_census, prediction)
print(accuracy)

0.782435129740519


In [None]:
matrix = confusion_matrix(y_test_census, prediction)
# Rows are the true classes, columns are the predicted classes.
print(f'correct = {matrix[0][0]}', end='|')
print(f'errors = {matrix[0][1]}')
print(f'errors = {matrix[1][0]}', end='|')
print(f'correct = {matrix[1][1]}')


correct = 4571|errors = 374
errors = 1043|correct = 525


In [None]:
report = classification_report(y_test_census, prediction)
print(report)

              precision    recall  f1-score   support

           0       0.81      0.92      0.87      4945
           1       0.58      0.33      0.43      1568

    accuracy                           0.78      6513
   macro avg       0.70      0.63      0.65      6513
weighted avg       0.76      0.78      0.76      6513
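
The figures in this report can be recomputed by hand from the four confusion matrix counts printed earlier, which helps show what each metric measures. The sketch below hard-codes those counts rather than reusing the notebook's variables, and the names `tn`, `fp`, `fn`, `tp` and `baseline` are introduced here for illustration, treating class 1 as the positive class.

```python
import numpy as np

# Counts copied from the confusion matrix output above
# (rows: true class, columns: predicted class).
matrix = np.array([[4571, 374],
                   [1043, 525]])

tn, fp = matrix[0]  # true class 0: predicted 0, predicted 1
fn, tp = matrix[1]  # true class 1: predicted 0, predicted 1

accuracy = (tn + tp) / matrix.sum()
precision_1 = tp / (tp + fp)   # of the points predicted as 1, how many are 1
recall_1 = tp / (tp + fn)      # of the true 1s, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Accuracy of a model that always predicts the majority class 0.
baseline = matrix[0].sum() / matrix.sum()

print(f"accuracy    = {accuracy:.2f}")
print(f"precision 1 = {precision_1:.2f}")
print(f"recall 1    = {recall_1:.2f}")
print(f"f1 1        = {f1_1:.2f}")
print(f"baseline    = {baseline:.2f}")
```

The baseline works out to about 0.76, so the 0.78 accuracy is only a small gain over always predicting class 0. The low recall for class 1 (0.33) is where the model struggles.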

