# Make your own KNN


In this notebook, we use K-Nearest Neighbors (KNN) regression to predict sales from TV advertising spend. We'll load the data, fit scikit-learn's KNeighborsRegressor on a small sample, plot its predictions, and then implement the same regressor from scratch.


## 1. Cloning the Repository and Loading Data

First, we need to clone the repository that contains the dataset we'll be using.

In [None]:
!git clone https://github.com/cesarlegendre/credit_scoring_7904_Q4_2024


## 2. Importing Necessary Libraries


We start by importing all the necessary libraries for data manipulation, visualization, and modeling.




In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, roc_curve, auc
import warnings
warnings.filterwarnings('ignore')

# Load data
path = 'credit_scoring_7904_Q4_2024/data_sets/advertising/data.csv'
df_base = pd.read_csv(path)
df_base

## 3. K-Nearest Neighbors (KNN) Regression

### Fitting KNN on Sample Data
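
For reference, KNN regression predicts the average target of the k training points closest to the query. As a sketch, writing $N_k(x)$ for the index set of those k nearest neighbors (notation introduced here, not taken from the notebook) and assuming the uniform weighting that scikit-learn uses by default:

$$\hat{y}(x) = \frac{1}{k} \sum_{i \in N_k(x)} y_i$$

With k = 1, the prediction is simply the target of the single closest training point.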

In [None]:
# Fit a KNN predictor on a 10-row sample of the data and plot it:
# real values as a scatter plot, predicted values as a line

from sklearn.neighbors import KNeighborsRegressor

# Sample data
df_sample = df_base.sample(10)
X = df_sample[['TV']]
y = df_sample['Sales']

# Create and train the KNN model
knn = KNeighborsRegressor(n_neighbors=1)
knn.fit(X, y)

# Generate predictions
X_pred = np.linspace(X['TV'].min(), X['TV'].max(), 50).reshape(-1, 1)
y_pred = knn.predict(X_pred)


# Plot the results
plt.figure(figsize=(10, 6))
plt.scatter(X, y, label='Real Values', marker='X',  s=100)
plt.plot(X_pred, y_pred, label='Predicted Values', color='red')
plt.xlabel('TV Advertising Spend')
plt.ylabel('Sales in 1000')
plt.title('KNN Regression for Sales Prediction')
plt.legend()
plt.show()
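
The fit above uses n_neighbors=1, so the red line passes through every sampled point and jumps between them. The following is a minimal sketch of how a larger k averages over more neighbors. It uses a small made-up dataset (X_toy, y_toy, and X_query are hypothetical values, not taken from the advertising data) so the numbers are easy to check by hand.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical toy data (not from the advertising dataset)
X_toy = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])
y_toy = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# Two query points: one exactly on a training point, one between two of them
X_query = np.array([[20.0], [33.0]])

predictions = {}
for k in (1, 3):
    model = KNeighborsRegressor(n_neighbors=k)
    model.fit(X_toy, y_toy)
    # Each prediction is the mean target of the k closest training points
    predictions[k] = model.predict(X_query)
    print(f"k={k}: {predictions[k]}")
```

With k=1 the query at 20 returns the training target at 20 unchanged, while with k=3 it returns the average of the targets at 10, 20, and 30.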


## 4. Code It Yourself

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Dummy data for illustration only.
# Note: this overwrites df_base; skip these lines to keep the real dataset loaded above.
np.random.seed(0)
df_base = pd.DataFrame({
    'TV': np.random.uniform(0, 300, 200),
    'Sales': np.random.uniform(0, 20, 200)
})

class KNNRegressor:
    def __init__(self, n_neighbors=1):
        """
        Initialize the KNN regressor with the number of neighbors.

        Parameters:
        n_neighbors (int): Number of nearest neighbors to consider.
        """
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        """
        Fit the KNN regressor with training data.

        Parameters:
        X (array-like): Feature matrix of shape (n_samples, n_features).
        y (array-like): Target vector of shape (n_samples,).
        """
        self.X_train = np.asarray(X)
        self.y_train = np.asarray(y)

    def predict(self, X_pred):
        """
        Predict the target values for given feature matrix.

        Parameters:
        X_pred (array-like): Feature matrix of shape (n_samples, n_features).

        Returns:
        y_pred (array): Predicted target values.
        """
        X_pred = np.asarray(X_pred)
        y_pred = []
        for x_test in X_pred:
            # Compute distances between x_test and all training samples
            # Hint: use np.linalg.norm to compute the distance between self.X_train and x_test
            # Hint: use import pdb; pdb.set_trace() to pause the code and exit() to leave the debugger
            #import pdb; pdb.set_trace()
            #####################  Your code here ####################
            # distances = # your code here
            #####################  Your code here ####################

            # Find the indices of the k nearest neighbors
            # Hint: sort the distances, take the resulting indices, and keep the first k of them

            #####################  Your code here ####################
            #neighbors_idx = # your code here
            #####################  Your code here ####################

            # Compute the mean target value of the nearest neighbors
            y_mean = np.mean(self.y_train[neighbors_idx])
            y_pred.append(y_mean)
        return np.array(y_pred)



# Sample data (df_sample is the 10-row sample drawn in the previous section)
X = df_sample[['TV']]
y = df_sample['Sales']

# Create and train the KNN model
knn_my = KNNRegressor(n_neighbors=1)
knn_my.fit(X, y)

# Generate predictions
X_pred = np.linspace(X['TV'].min(), X['TV'].max(), 50).reshape(-1, 1)
y_pred = knn_my.predict(X_pred)

# Plot the results
plt.figure(figsize=(10, 6))
plt.scatter(X, y, label='Real Values', marker='X', s=100)
plt.plot(X_pred, y_pred, label='Predicted Values', color='red')
plt.xlabel('TV Advertising Spend')
plt.ylabel('Sales in 1000')
plt.title('KNN Regression for Sales Prediction')
plt.legend()
plt.show()
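
Try the exercise first. If you get stuck, the sketch below shows one possible way to fill in the two blanks, written as a standalone function so it does not depend on the class above. The name knn_predict and the toy data are hypothetical and chosen only for illustration. It assumes Euclidean distance and a plain (unweighted) mean, and other choices are equally valid.

```python
import numpy as np

def knn_predict(X_train, y_train, X_pred, n_neighbors=1):
    """Hypothetical helper: average the targets of the k nearest training rows."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_pred = np.asarray(X_pred, dtype=float)
    y_pred = []
    for x_test in X_pred:
        # Euclidean distance from x_test to every training row
        distances = np.linalg.norm(X_train - x_test, axis=1)
        # Indices of the k smallest distances
        neighbors_idx = np.argsort(distances)[:n_neighbors]
        # Prediction is the mean target of those neighbors
        y_pred.append(np.mean(y_train[neighbors_idx]))
    return np.array(y_pred)

# Hypothetical toy data (not from the advertising dataset)
X_toy = np.array([[10.0], [20.0], [30.0], [40.0], [50.0]])
y_toy = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
X_query = np.array([[20.0], [33.0]])

pred_k1 = knn_predict(X_toy, y_toy, X_query, n_neighbors=1)
pred_k3 = knn_predict(X_toy, y_toy, X_query, n_neighbors=3)
print("k=1:", pred_k1)
print("k=3:", pred_k3)
```

The two lines that assign distances and neighbors_idx are the ones that would go into the marked blanks of KNNRegressor.predict, with X_train replaced by self.X_train and n_neighbors by self.n_neighbors.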
