
### **KNN**

K-Nearest Neighbors (KNN) is a supervised learning algorithm built on the idea that data points close together in feature space tend to share the same label. To predict the label of an unknown data point, KNN finds the K nearest points ("neighbors") in the training dataset and assigns the most common label among them.
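
As a minimal sketch of that idea, the snippet below classifies two query points by majority vote among their three nearest neighbors. The toy data and the helper name `knn_predict_one` are assumptions made for illustration; they are not part of the notebook's dataset or classifier.

```python
import numpy as np

# Hypothetical toy data: two features per point, binary labels.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

def knn_predict_one(x_query, X_train, y_train, k=3):
    """Predict one point's label by majority vote of its k nearest neighbors."""
    # Euclidean distance from the query to every training point
    distances = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors
    return int(np.bincount(y_train[nearest]).argmax())

pred_a = knn_predict_one(np.array([1.2, 1.4]), X_train, y_train)
pred_b = knn_predict_one(np.array([6.4, 6.2]), X_train, y_train)
print(pred_a, pred_b)
```

The first query sits inside the cluster labeled 0 and the second inside the cluster labeled 1, so each takes the label of the cluster around it.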

### **Pseudocode**


Load dataset

Split data into features (X) and labels (y)

Define test set size (20%)

Set random seed for reproducibility

Split data into training and test sets

Define KNN classifier

- Initialize number of nearest neighbors (n_neighbors)
- Fit(X_train, y_train)
  - Store X_train and y_train
- Predict(X_query)
  - Calculate distances between X_query and X_train
  - Find indices of nearest neighbors
  - Get labels of nearest neighbors
  - Determine most common label for each point, y_pred

Create and train KNN classifier

Predict labels for test set

Calculate accuracy

Print the most frequent predicted class and the accuracy




In [None]:
import numpy as np
import pandas as pd

# Load the dataset
df = pd.read_csv("diabetes.csv", sep=",")

# Split into features (X) and labels (y)
X = df[['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age']].values
y = df['Outcome'].values

# Define the test set size (20%)
test_size = 0.2
num_samples = len(df)
num_test_samples = int(test_size * num_samples)

# Set the seed for reproducibility
np.random.seed(42)

# Draw random indices (without replacement) for the test set
test_indices = np.random.choice(num_samples, num_test_samples, replace=False)

# Build the training and test sets
X_train, y_train = np.delete(X, test_indices, axis=0), np.delete(y, test_indices)
X_test, y_test = X[test_indices], y[test_indices]

# Define the KNN classifier
class KNNClassifier:
    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors

    def fit(self, X_train, y_train):
        self.X_train, self.y_train = X_train, y_train

    def euclidean_distance(self, x1, x2):
        # Pairwise distances: entry [i, j] is the distance between x1[i] and x2[j]
        return np.sqrt(np.sum((x1[:, np.newaxis] - x2) ** 2, axis=2))

    def predict(self, X_query):
        distances = self.euclidean_distance(X_query, self.X_train)
        # Indices of the n_neighbors closest training points for each query point
        k_nearest_neighbors = np.argsort(distances)[:, :self.n_neighbors]
        k_nearest_labels = self.y_train[k_nearest_neighbors]
        # Take the most frequent class (mode) among the neighbors
        y_pred = np.apply_along_axis(lambda x: np.bincount(x).argmax(), axis=1, arr=k_nearest_labels)
        return y_pred

# Create and train the KNN classifier
knn = KNNClassifier(n_neighbors=31)
knn.fit(X_train, y_train)

# Predict labels for the test set and find the most frequent predicted class
y_pred_mode = knn.predict(X_test)
predicted_mode = np.bincount(y_pred_mode).argmax()

# Compute the accuracy
accuracy = np.mean(y_pred_mode == y_test)

# Print the most frequent predicted class and the accuracy
print("Predicted Class:")
print(predicted_mode)
print('Accuracy:', accuracy)

Predicted Class:
0
Accuracy: 0.7647058823529411
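
Accuracy alone does not show how the classifier performs on each class. A per-class breakdown could be sketched as follows. The labels here are hypothetical illustration data, not results from the diabetes dataset, and `per_class_report` is a helper name introduced only for this example.

```python
import numpy as np

# Hypothetical labels, for illustration only (not results from diabetes.csv).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_hat = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])

def per_class_report(y_true, y_hat):
    """Return {class: (precision, recall, support)} for each class in y_true."""
    report = {}
    for c in np.unique(y_true):
        tp = np.sum((y_hat == c) & (y_true == c))   # correctly predicted as c
        predicted = np.sum(y_hat == c)              # everything predicted as c
        actual = np.sum(y_true == c)                # everything truly c
        precision = tp / predicted if predicted > 0 else 0.0
        recall = tp / actual
        report[int(c)] = (float(precision), float(recall), int(actual))
    return report

report = per_class_report(y_true, y_hat)
for c, (precision, recall, support) in report.items():
    print(f"class {c}: precision={precision:.2f} recall={recall:.2f} support={support}")
```

With the notebook's own variables, the same helper could be called as `per_class_report(y_test, y_pred_mode)`.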


### **Loss Function**

A loss function is a score that measures how far the model's predictions are from the true values: the lower the score, the better the model is doing. During training, this score guides how the model is adjusted.
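
For a classifier, one simple choice is the 0-1 loss, which counts a wrong prediction as 1 and a correct one as 0. Its average over a dataset is one minus the accuracy. The sketch below uses hypothetical labels and is an illustration, not a loss the notebook itself defines.

```python
import numpy as np

# Hypothetical labels, for illustration only.
y_true = np.array([1, 0, 1, 1, 0])
y_hat = np.array([1, 0, 0, 1, 1])

# 0-1 loss: 1 for each wrong prediction, 0 for each correct one, averaged
zero_one_loss = float(np.mean(y_hat != y_true))
accuracy = float(np.mean(y_hat == y_true))

print("0-1 loss:", zero_one_loss)
print("Accuracy:", accuracy)
```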

### **Optimization Function Identification**

Optimization function identification means choosing the method used to adjust the model during training so that the loss decreases and the model's predictions improve.
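
KNN has no weights that are updated step by step, so one practical way to improve it is to pick the number of neighbors that scores best on held-out data. The sketch below assumes synthetic two-class data and a validation split; `knn_predict`, `candidate_ks`, and the data are hypothetical and stand in for the notebook's classifier and dataset.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """Majority-vote KNN prediction using Euclidean distance."""
    distances = np.sqrt(np.sum((X_query[:, np.newaxis] - X_train) ** 2, axis=2))
    nearest = np.argsort(distances, axis=1)[:, :k]
    return np.array([np.bincount(row).argmax() for row in y_train[nearest]])

# Hypothetical synthetic data: two Gaussian blobs, one per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(60, 2)),
               rng.normal(3.0, 1.0, size=(60, 2))])
y = np.array([0] * 60 + [1] * 60)

# Shuffle, then hold out the last 30 points as a validation set.
order = rng.permutation(len(X))
X, y = X[order], y[order]
X_fit, y_fit = X[:90], y[:90]
X_val, y_val = X[90:], y[90:]

# Try several odd values of k and keep the one with the best validation accuracy.
candidate_ks = [1, 3, 5, 7, 9, 11]
val_accuracy = {k: float(np.mean(knn_predict(X_fit, y_fit, X_val, k) == y_val))
                for k in candidate_ks}
best_k = max(val_accuracy, key=val_accuracy.get)

print("Validation accuracy per k:", val_accuracy)
print("Best k:", best_k)
```

Odd values of k are used so that a two-class vote cannot end in a tie. Selecting k on a validation set, and not on the test set, keeps the final test accuracy an honest estimate.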

