# Advertising Sales Prediction using KNN and SVM

## Project Extension – Non-Linear Models

This notebook extends the advertising sales prediction study by applying **non-linear machine learning models** to the same dataset.  
The goal is to evaluate whether **K-Nearest Neighbors (KNN)** and **Support Vector Machines (SVM)** can improve prediction performance compared to linear regression models.



## Importing Libraries

This step imports the necessary Python libraries for data manipulation, visualization, model training, and performance evaluation.  
Specific machine learning models such as KNN and SVM are included for regression analysis.


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
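
For reference, the three metrics imported above have the following standard definitions, where $y_i$ is an observed sales value, $\hat{y}_i$ the corresponding prediction, $\bar{y}$ the mean of the observed values, and $n$ the number of test observations:

$$
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

MAE and RMSE are expressed in the units of `Sales`, while $R^2$ is unitless and equals 1 for a perfect fit.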


## Dataset Description

In [3]:
df = pd.read_csv("data/Advertising.csv")
df.head()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   TV         200 non-null    float64
 1   Radio      200 non-null    float64
 2   Newspaper  200 non-null    float64
 3   Sales      200 non-null    float64
dtypes: float64(4)
memory usage: 6.4 KB


## Data Preparation and Scaling

In [5]:
X = df[['TV', 'Radio', 'Newspaper']]
y = df['Sales']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

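# KNN and SVR are distance-based, so features are standardized first.
# The scaler is fitted on the training set only; the test set is transformed
# with the training mean and standard deviation to avoid data leakage.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)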
X_test_scaled = scaler.transform(X_test)
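
The cell above fits the scaler on the training set only, which keeps test-set statistics out of the model. A minimal sketch of the same idea under cross-validation wraps the scaler and the model in a scikit-learn pipeline, so that the scaler is re-fitted inside every fold. The arrays `X_demo` and `y_demo` are synthetic stand-ins (an assumption for illustration, not the Advertising data), and `knn_pipeline` is a hypothetical name.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the three budget columns and the sales target
# (assumption: illustrative data only, not the real Advertising.csv).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 300, size=(200, 3))
y_demo = 3 + 0.05 * X_demo[:, 0] + 0.1 * X_demo[:, 1] + rng.normal(0, 1, 200)

# Scaling and the model live in one estimator, so each CV fold
# fits the scaler on its own training part only.
knn_pipeline = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))

cv_r2 = cross_val_score(knn_pipeline, X_demo, y_demo, cv=5, scoring="r2")
print("R² per fold:", np.round(cv_r2, 3))
print("Mean R²:", round(cv_r2.mean(), 3))
```

With the real data, `X` and `y` from the cell above could be passed in place of the synthetic arrays.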


## KNN Regression Results

In [6]:
# k = 5 neighbors (scikit-learn's default) with uniform weights
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X_train_scaled, y_train)

y_pred_knn = knn.predict(X_test_scaled)

mae_knn = mean_absolute_error(y_test, y_pred_knn)
rmse_knn = np.sqrt(mean_squared_error(y_test, y_pred_knn))
r2_knn = r2_score(y_test, y_pred_knn)

print("KNN Results")
print("MAE :", mae_knn)
print("RMSE :", rmse_knn)
print("R² :", r2_knn)


KNN Results
MAE : 1.4074999999999993
RMSE : 1.726276339408033
R² : 0.9035623327324918
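
The KNN model above uses a fixed `n_neighbors=5`. One possible way to choose this value from the data is a cross-validated grid search, sketched below. The candidate grid, the step names `scale` and `knn`, and the synthetic arrays `X_demo` and `y_demo` are assumptions made for illustration; they are not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the real Advertising.csv).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 300, size=(200, 3))
y_demo = 3 + 0.05 * X_demo[:, 0] + 0.1 * X_demo[:, 1] + rng.normal(0, 1, 200)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsRegressor()),
])

# Hypothetical candidate values; adjust the grid to the problem at hand.
param_grid = {
    "knn__n_neighbors": [1, 3, 5, 7, 9, 11, 15],
    "knn__weights": ["uniform", "distance"],
}

# scikit-learn maximizes scores, so RMSE is exposed as a negated value.
search = GridSearchCV(pipe, param_grid, cv=5,
                      scoring="neg_root_mean_squared_error")
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Cross-validated RMSE:", round(-search.best_score_, 3))
```

Running the search on the training split only, and keeping the test split for a final check, avoids tuning the model against the data used to report its performance.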


## Support Vector Regression Results

In [7]:
# RBF kernel with scikit-learn's default C=1.0, epsilon=0.1, gamma='scale'
svm = SVR(kernel='rbf')
svm.fit(X_train_scaled, y_train)

y_pred_svm = svm.predict(X_test_scaled)

mae_svm = mean_absolute_error(y_test, y_pred_svm)
rmse_svm = np.sqrt(mean_squared_error(y_test, y_pred_svm))
r2_svm = r2_score(y_test, y_pred_svm)

print("SVM Results")
print("MAE :", mae_svm)
print("RMSE :", rmse_svm)
print("R² :", r2_svm)


SVM Results
MAE : 1.5553546442382153
RMSE : 2.1026255636341373
R² : 0.8569295507005326
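
The SVR above runs with scikit-learn's default `C=1.0`, `epsilon=0.1`, and `gamma='scale'`, and SVR performance often depends strongly on these settings. The sketch below shows how they might be searched with cross-validation. The grid values and the synthetic arrays `X_demo` and `y_demo` are assumptions for illustration, so the printed result says nothing about the Advertising data itself.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption: not the real Advertising.csv).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 300, size=(200, 3))
y_demo = 3 + 0.05 * X_demo[:, 0] + 0.1 * X_demo[:, 1] + rng.normal(0, 1, 200)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("svr", SVR(kernel="rbf")),
])

# Hypothetical grid; it includes the default configuration
# (C=1, gamma='scale', epsilon=0.1) as one of the candidates.
param_grid = {
    "svr__C": [1, 10, 100],
    "svr__gamma": ["scale", 0.1, 0.01],
    "svr__epsilon": [0.1, 0.5],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best mean CV R²:", round(search.best_score_, 3))
```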


## Model Comparison

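This comparison sets the test-set R² of the two non-linear models against the linear regression baseline.  
The linear regression score (0.80) is entered as a constant rather than recomputed in this notebook, so the comparison is like-for-like only if that baseline was obtained on the same train/test split.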


In [8]:
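# The linear regression R² (0.80) is hard-coded, not recomputed here;
# it is presumably taken from the earlier linear-regression study.
results = pd.DataFrame({
    "Model": ["Linear Regression", "KNN", "SVM"],
    "R² Score": [0.80, r2_knn, r2_svm]
})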

results


|   | Model             | R² Score |
|---|-------------------|----------|
| 0 | Linear Regression | 0.800000 |
| 1 | KNN               | 0.903562 |
| 2 | SVM               | 0.856930 |
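
To keep the comparison like-for-like, the linear baseline could be fitted on the same split as the other two models instead of being entered by hand. The sketch below is one way to do that and to report all three metrics in a single table. It runs on synthetic stand-in data (an assumption for illustration), so its numbers do not correspond to the results above; `models`, `rows`, and `comparison` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption: not the real Advertising.csv).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 300, size=(200, 3))
y_demo = 3 + 0.05 * X_demo[:, 0] + 0.1 * X_demo[:, 1] + rng.normal(0, 1, 200)

# Same split settings as the notebook: 80/20 with random_state=42.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": make_pipeline(StandardScaler(), LinearRegression()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVR(kernel="rbf")),
}

# Fit every model on the same training data and score it on the same test data.
rows = []
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rows.append({
        "Model": name,
        "MAE": mean_absolute_error(y_te, pred),
        "RMSE": np.sqrt(mean_squared_error(y_te, pred)),
        "R²": r2_score(y_te, pred),
    })

comparison = pd.DataFrame(rows)
print(comparison.round(3))
```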


## Conclusion

The objective of this project was to predict sales based on advertising investments using different regression techniques.  
**Linear Regression** provides a solid baseline with an R² score of **0.80**, meaning that a linear fit already explains about 80% of the variance in sales.

The **K-Nearest Neighbors (KNN)** model achieved the best test-set performance, with an R² score of **0.90** (MAE 1.41, RMSE 1.73), which suggests that it captures local, non-linear patterns that a linear fit misses.  
The **Support Vector Machine (SVM)** model, an RBF-kernel SVR with default hyperparameters, reached an R² score of **0.86** (MAE 1.56, RMSE 2.10): above the linear baseline but below KNN.

Overall, while linear regression remains the most interpretable option, **non-linear models such as KNN and SVM improved R² on this test split**.  
These figures come from a single 80/20 split (40 test observations) with untuned hyperparameters, so the ranking should be confirmed with cross-validation before firm conclusions are drawn.  
The choice of model should therefore balance interpretability and performance depending on the application context.
