# KNN (Regression) Model
- KNN regression predicts a continuous numerical target (for example, a tip amount) from labelled training examples whose target values are known.
- It uses the independent variables (features) to find similar rows, and the dependent variable (target) of those rows to form the prediction.
- Similarity between data points is usually measured with Euclidean distance; other metrics, such as Manhattan distance, can also be used.
- The prediction for a new point is based on its K nearest neighbors in the training data.
- By default the prediction is the mean of the neighbors' target values; the median is a more outlier-robust alternative.
- A weighted average can be used instead, typically with inverse-distance weighting, so that closer neighbors count more.
- Related methods such as locally weighted regression (LOESS) go a step further and fit a small regression model on the neighborhood instead of averaging it.
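In scikit-learn, several of these options are constructor parameters of `KNeighborsRegressor`: `n_neighbors` is K, `weights='uniform'` takes the plain mean, `weights='distance'` applies inverse-distance weighting, and `metric='euclidean'` selects Euclidean distance (the default `metric='minkowski'` with `p=2` gives the same distance). The median and local regression are not built-in options of this class. A minimal sketch on a made-up one-dimensional dataset, where the target simply equals the feature:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy data (illustrative only): the target equals the single feature.
X = np.array([[0.0], [1.0], [2.0], [3.0], [10.0]])
y = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
query = np.array([[1.2]])

# With K = 2, the nearest training points to 1.2 are x = 1 (distance 0.2)
# and x = 2 (distance 0.8).
uniform = KNeighborsRegressor(n_neighbors=2, weights="uniform", metric="euclidean").fit(X, y)
weighted = KNeighborsRegressor(n_neighbors=2, weights="distance", metric="euclidean").fit(X, y)

print(uniform.predict(query))   # plain mean of 1 and 2: [1.5]
print(weighted.predict(query))  # weights 1/0.2 = 5 and 1/0.8 = 1.25: approximately [1.2]
```

The distance-weighted prediction is pulled toward the much closer point x = 1, which is exactly the effect inverse-distance weighting is meant to have.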


### Mathematics behind the KNN Regression Model
KNN regression predicts the target of a new data point by finding its K nearest neighbors in the training data and averaging their target values. The steps are:
1.  **Distance Calculation**: Compute the distance between the new point $x$ and every training point $x_i$. For Euclidean distance over $p$ features, $d(x, x_i) = \sqrt{\sum_{j=1}^{p} (x_j - x_{ij})^2}$.
2.  **K-Nearest Neighbors Selection**: Keep the K training points with the smallest distances; call this set $N_K(x)$.
3.  **Prediction**: Use the average of their target values as the prediction, $\hat{y} = \frac{1}{K} \sum_{i \in N_K(x)} y_i$.
4.  **Handling Ties**: When several training points are at the same distance as the K-th neighbor, implementations differ: some include all tied points in the average, while scikit-learn keeps exactly K and the result then depends on the order of the training data.
5.  **Weighted Average**: Optionally, weight each neighbor by the inverse of its distance, $w_i = 1 / d(x, x_i)$, and predict $\hat{y} = \sum_{i \in N_K(x)} w_i y_i \big/ \sum_{i \in N_K(x)} w_i$, so that closer neighbors have more influence.
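The steps above can be sketched from scratch in NumPy. This is an illustrative implementation, not scikit-learn's: the function name `knn_regress`, the small `eps` guard against division by zero, and the toy data are assumptions made for the example, and ties at the K-th position are broken by training-data order through a stable sort.

```python
import numpy as np

def knn_regress(X_train, y_train, x_new, k=5, weighted=False, eps=1e-12):
    """Predict one target value with K-nearest-neighbor regression (sketch)."""
    # Step 1: Euclidean distance from x_new to every training point.
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 2: indices of the k smallest distances; the stable sort keeps
    # training-data order when distances tie (step 4).
    nearest = np.argsort(distances, kind="stable")[:k]
    if not weighted:
        # Step 3: plain average of the neighbors' targets.
        return y_train[nearest].mean()
    # Step 5: inverse-distance weights; eps avoids dividing by zero when
    # x_new coincides with a training point.
    w = 1.0 / (distances[nearest] + eps)
    return np.sum(w * y_train[nearest]) / np.sum(w)

# Illustrative data (not the tips dataset).
X_train = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [6.0, 5.0], [7.0, 8.0]])
y_train = np.array([1.5, 1.8, 2.4, 4.0, 5.2])
x_new = np.array([2.0, 2.0])

plain = knn_regress(X_train, y_train, x_new, k=3)
weighted = knn_regress(X_train, y_train, x_new, k=3, weighted=True)
print(plain)     # mean of 1.5, 1.8 and 2.4, approximately 1.9
print(weighted)  # the two points at distance 1 count more than the one at sqrt(2)
```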
    

***Import the libraries***

```python
# Import the libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
```

***Import the ML libraries from scikit-learn***

```python
# Import KNN for regression
from sklearn.neighbors import KNeighborsRegressor
# Import train_test_split
from sklearn.model_selection import train_test_split
# Import metrics for regression
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
# Import the preprocessing tools
from sklearn.preprocessing import StandardScaler, LabelEncoder, MinMaxScaler
```

```python
# Load the tips dataset from seaborn
# (downloaded from the seaborn-data repository on first use, then cached)
df = sns.load_dataset('tips')
```

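```python
# Check the first rows of the dataset
df.head()
```

| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |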


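```python
# Encode the categorical columns (sex, smoker, day, time) as integers.
# LabelEncoder assigns codes in alphabetical order, e.g. day: Fri=0, Sat=1, Sun=2, Thur=3.
# The encoder is refitted for each column, so afterwards `le` only holds the last mapping.
le = LabelEncoder()
for col in df.columns:
    if df[col].dtype == "object" or df[col].dtype == "category":
        df[col] = le.fit_transform(df[col])
```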
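Label encoding turns `day` into the integers 0 to 3, so Euclidean distance treats Fri (0) as closer to Sat (1) than to Thur (3), even though the codes carry no numeric meaning. One common alternative for KNN is one-hot encoding, for example with `pandas.get_dummies`. A minimal sketch on a small hand-made frame that borrows the tips column names (the values are illustrative):

```python
import pandas as pd

# Small hand-made frame with tips-style columns (values are illustrative).
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 8.77],
    "day": ["Sun", "Sat", "Thur", "Fri"],
    "time": ["Dinner", "Dinner", "Lunch", "Lunch"],
})

# One 0/1 indicator column per category, so every pair of different days
# is the same distance apart.
encoded = pd.get_dummies(sample, columns=["day", "time"], dtype=int)
print(encoded.columns.tolist())
# ['total_bill', 'day_Fri', 'day_Sat', 'day_Sun', 'day_Thur', 'time_Dinner', 'time_Lunch']
```

The trade-off is more columns, and the 0/1 indicators still need to be considered when choosing how to scale the numeric features.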

```python
# Check the column types after encoding
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   total_bill  244 non-null    float64
 1   tip         244 non-null    float64
 2   sex         244 non-null    int64  
 3   smoker      244 non-null    int64  
 4   day         244 non-null    int64  
 5   time        244 non-null    int64  
 6   size        244 non-null    int64  
dtypes: float64(2), int64(5)
memory usage: 13.5 KB
```


```python
# Separate the features (X) from the target (y = tip)
X = df.drop('tip', axis=1)
y = df['tip']
```

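```python
# Standardize the features (mean 0, standard deviation 1 per column)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Note: X_scaled is not used below; the split and the model use the unscaled X,
# so the scores reported in this notebook come from unscaled features.
```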
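Scaling matters for KNN because Euclidean distance adds up squared differences in each feature's own units, so which row counts as "nearest" depends on those units. A minimal sketch on five made-up rows of (`total_bill`, `size`), not taken from the tips data:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up rows with columns (total_bill, size).
rows = np.array([
    [20.0, 2.0],  # reference point
    [25.0, 2.0],  # row A: bill 5 dollars higher, same party size
    [20.0, 5.0],  # row B: same bill, 3 more people
    [5.0, 1.0],
    [40.0, 4.0],
])

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

# Raw units: A is 5 away and B is 3 away, so B is the nearer neighbor.
raw_a, raw_b = euclidean(rows[0], rows[1]), euclidean(rows[0], rows[2])

# Standardized units: each column is divided by its own standard deviation.
# Bills vary much more than party sizes, so A becomes the nearer neighbor.
z = StandardScaler().fit_transform(rows)
scaled_a, scaled_b = euclidean(z[0], z[1]), euclidean(z[0], z[2])

print(raw_a, raw_b)  # 5.0 3.0
print(scaled_a, scaled_b)
```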

```python
# Hold out 20% of the rows for testing (49 of 244)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
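Fitting a scaler on all of `X` before the split would let test-set statistics leak into preprocessing if its output were used. A common pattern is to split first and chain the scaler and the model in a scikit-learn pipeline, which fits `StandardScaler` on the training rows only and reuses those statistics on the test rows. A minimal sketch, using `make_regression` data as an assumed stand-in for the tips features, so the scores will not match this notebook:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with the same shape as the tips features (244 x 6).
X, y = make_regression(n_samples=244, n_features=6, n_informative=6,
                       noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# fit() learns the scaler's mean/std from X_train only; predict() reuses them on X_test.
pipe = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
pipe.fit(X_train, y_train)

y_pred = pipe.predict(X_test)
print(pipe.score(X_test, y_test))  # R^2 on the held-out rows
```

With this layout, new inputs passed to `pipe.predict` are scaled automatically, so raw feature values can be used directly.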


```python
# Create the KNN regression model with K = 5
model = KNeighborsRegressor(n_neighbors=5)
# Fit the model on the training data
model.fit(X_train, y_train)
```
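`n_neighbors=5` is scikit-learn's default rather than a tuned value. One way to choose K, and the weighting scheme, is a cross-validated grid search on the training split only. A sketch with synthetic stand-in data; the grid of 1 to 20 neighbors and the MAE scoring are illustrative choices:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the tips dataset).
X, y = make_regression(n_samples=244, n_features=6, n_informative=6,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsRegressor())])
param_grid = {
    "knn__n_neighbors": list(range(1, 21)),
    "knn__weights": ["uniform", "distance"],
}
# 5-fold cross-validation on the training rows; sklearn maximizes scores,
# so MAE is expressed as its negative.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_absolute_error")
search.fit(X_train, y_train)

print(search.best_params_)
print(-search.best_score_)           # mean cross-validated MAE of the best setting
print(-search.score(X_test, y_test))  # MAE of the refitted best model on the test split
```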

```python
# Predict the tips for the test set
y_pred = model.predict(X_test)
```

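```python
# Predict the tip for one new customer. Values follow the column order of X:
# total_bill, sex, smoker, day, time, size (label-encoded as above:
# a male non-smoker, Saturday lunch, party of 3).
# A DataFrame with the training column names avoids scikit-learn's feature-name warning.
new_customer = pd.DataFrame([[19.82, 1, 0, 1, 1, 3]], columns=X.columns)
model.predict(new_customer)
```

```text
array([2.632])
```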
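To see which training rows produced a prediction, `KNeighborsRegressor.kneighbors` returns the distances to, and the training-set positions of, the K nearest neighbors. With the default `weights='uniform'`, the prediction equals the mean of those neighbors' targets. A sketch on synthetic data standing in for the tips features:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic training data (not the tips dataset).
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(50, 3))
y_train = X_train.sum(axis=1) + rng.normal(0, 1, size=50)

model = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
x_new = np.array([[5.0, 5.0, 5.0]])

# Both arrays have shape (1, 5); distances are sorted in ascending order.
distances, indices = model.kneighbors(x_new)
manual = y_train[indices[0]].mean()  # plain average of the 5 neighbors' targets

print(indices[0], distances[0])
print(model.predict(x_new)[0], manual)  # the two values agree
```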

```python
# Evaluate the predictions on the test set with regression metrics
print("R2_Score: ", r2_score(y_test, y_pred))
print("Mean Absolute Error: ", mean_absolute_error(y_test, y_pred))
print("Mean Squared Error: ", mean_squared_error(y_test, y_pred))
```

```text
R2_Score:  0.3294034029001649
Mean Absolute Error:  0.7262448979591837
Mean Squared Error:  0.8382265306122448
```
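For reference, with residuals $e_i = y_i - \hat{y}_i$ over $n$ test rows: $\text{MAE} = \frac{1}{n}\sum |e_i|$, $\text{MSE} = \frac{1}{n}\sum e_i^2$ and $R^2 = 1 - \sum e_i^2 / \sum (y_i - \bar{y})^2$. The root mean squared error, $\sqrt{\text{MSE}}$, is back in the target's units; for the MSE above it is about 0.92 dollars of tip. A sketch computing the metrics by hand on small made-up vectors and comparing them with `sklearn.metrics`:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Small illustrative vectors (not the notebook's y_test / y_pred).
y_true = np.array([3.0, 2.5, 4.0, 1.5])
y_hat = np.array([2.5, 3.0, 3.5, 2.0])

residuals = y_true - y_hat
mae = np.mean(np.abs(residuals))
mse = np.mean(residuals ** 2)
rmse = np.sqrt(mse)
r2 = 1 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae, mse, rmse, r2)  # MAE 0.5, MSE 0.25, RMSE 0.5, R^2 about 0.692
# The same values from scikit-learn:
print(mean_absolute_error(y_true, y_hat), mean_squared_error(y_true, y_hat), r2_score(y_true, y_hat))
```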


```python
# Save the trained model to disk with pickle
import pickle

with open('knn_r_model.pkl', 'wb') as f:
    pickle.dump(model, f)
```
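A pickled model can be loaded back with `pickle.load`. A minimal round-trip sketch with a tiny stand-in model, written to a temporary directory rather than the working directory. Pickle files should only be loaded from trusted sources, and loading a model with a different scikit-learn version from the one that saved it is not guaranteed to work.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Tiny stand-in model; in the notebook this would be the fitted `model`.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
model = KNeighborsRegressor(n_neighbors=2).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "knn_r_model.pkl")
    # Save: the with-block closes the file after writing.
    with open(path, "wb") as f:
        pickle.dump(model, f)
    # Load it back into a new object.
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded.predict([[2.5]]), model.predict([[2.5]]))  # [2.5] [2.5]
```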

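```python
# Inspect the first rows of the test features
X_test.head()
```

| | total_bill | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|
| 24 | 19.82 | 1 | 0 | 1 | 0 | 2 |
| 6 | 8.77 | 1 | 0 | 2 | 0 | 2 |
| 153 | 24.55 | 1 | 0 | 2 | 0 | 4 |
| 211 | 25.89 | 1 | 1 | 1 | 0 | 4 |
| 198 | 13.00 | 0 | 1 | 3 | 1 | 2 |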
