# KNN for Regression

<!-- 1. Import data set
2. Separate X (Gender, Height) and Y (y=Weight).
3. Train = 70%, Test = 30%
4. Apply Linear Regression
5. Evaluate the Model (Testing and training Accuracy, MSE for testing)
6. Apply KNN Regressor: Scikit-Learn Link
7. Evaluate the Model (Testing and training Accuracy, MSE for testing)
8. Compare the KNN and Linear Regression models. -->

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('weight-height.csv')  # 1. Import the data set

In [3]:
df.head()

Unnamed: 0,Gender,Height,Weight
0,Male,73.847017,241.893563
1,Male,68.781904,162.310473
2,Male,74.110105,212.740856
3,Male,71.730978,220.04247
4,Male,69.881796,206.349801


# Separating X (Gender, Height) and Y (y=Weight).

In [4]:
df = pd.get_dummies(df, columns=['Gender'], drop_first=True)
df.head()

Unnamed: 0,Height,Weight,Gender_Male
0,73.847017,241.893563,1
1,68.781904,162.310473,1
2,74.110105,212.740856,1
3,71.730978,220.04247,1
4,69.881796,206.349801,1
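As a small illustration of what `drop_first=True` does, the sketch below encodes a made-up three-row frame (the rows are hypothetical, not taken from `weight-height.csv`). Passing `dtype=int` is an assumption added here: recent pandas versions return boolean dummy columns by default, whereas the output above shows 0/1 integers.

```python
import pandas as pd

# Hypothetical three-row sample; the notebook itself reads weight-height.csv.
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Height": [70.0, 64.0, 72.0],
})

# drop_first=True drops the first category (Female, in sorted order),
# leaving a single Gender_Male indicator column.
# dtype=int forces 0/1 integers instead of booleans.
encoded = pd.get_dummies(sample, columns=["Gender"], drop_first=True, dtype=int)
print(encoded)
```

One indicator column is enough for a two-category feature, since `Gender_Male == 0` already identifies the other category.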


In [5]:
x = df.drop('Weight', axis = 1)
x.head()

Unnamed: 0,Height,Gender_Male
0,73.847017,1
1,68.781904,1
2,74.110105,1
3,71.730978,1
4,69.881796,1


In [6]:
y = df[['Weight']]
y.head()

Unnamed: 0,Weight
0,241.893563
1,162.310473
2,212.740856
3,220.04247
4,206.349801


# Train = 70%, Test = 30%

In [7]:
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.30)
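The split above is random, so the scores further down will change on every run. A minimal sketch of how a fixed seed makes the split repeatable, using toy lists in place of `x` and `y` (the seed value 42 is arbitrary and not part of the original notebook):

```python
from sklearn.model_selection import train_test_split

# Toy data standing in for x and y; ten rows, so a 30% test share is 3 rows.
x_demo = [[i] for i in range(10)]
y_demo = [2 * i for i in range(10)]

# random_state=42 is an arbitrary seed chosen for illustration.
split_a = train_test_split(x_demo, y_demo, test_size=0.30, random_state=42)
split_b = train_test_split(x_demo, y_demo, test_size=0.30, random_state=42)

xtrain_demo, xtest_demo, ytrain_demo, ytest_demo = split_a
print(len(xtrain_demo), len(xtest_demo))  # sizes of the train and test parts
print(split_a == split_b)                 # same seed, same split
```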

# Applying Linear Regression

In [8]:
from sklearn.linear_model import LinearRegression  # 4. Apply Linear Regression
reg = LinearRegression()
reg.fit(xtrain, ytrain)

In [9]:
reg.predict(xtest)

array([[132.02247701],
       [128.30556338],
       [177.25798324],
       ...,
       [199.8235822 ],
       [182.84922742],
       [141.11981374]])

# Evaluating the Model (training and testing R² score, MSE for testing)

In [11]:
reg.score(xtrain, ytrain)

0.8992469991784297

In [15]:
ytest['Predicted Weight'] = reg.predict(xtest)
ytest

Unnamed: 0,Weight,Predicted Weight
8543,138.451499,132.022477
6783,137.086322,128.305563
1752,174.030954,177.257983
4823,166.631666,174.984844
1357,172.508657,169.655031
...,...,...
5160,153.846107,136.282259
6231,128.269265,121.213296
4969,195.330596,199.823582
103,177.984729,182.849227


In [16]:
ytest.drop('Predicted Weight', axis=1, inplace=True)

In [17]:
ytest

Unnamed: 0,Weight
8543,138.451499
6783,137.086322
1752,174.030954
4823,166.631666
1357,172.508657
...,...
5160,153.846107
6231,128.269265
4969,195.330596
103,177.984729


In [18]:
reg.score(xtest,ytest)

0.9016159543776875
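For a regressor, `score` returns the coefficient of determination R², not a classification accuracy. A minimal pure-Python sketch of that formula on toy numbers (the helper name `r_squared` is made up for this illustration):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0]
print(r_squared(actual, [1.0, 2.0, 3.0]))  # perfect predictions -> 1.0
print(r_squared(actual, [2.0, 2.0, 2.0]))  # always predicting the mean -> 0.0
```

A score of about 0.90 therefore means the model explains roughly 90% of the variance in `Weight` on the test set.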

In [20]:
from sklearn.metrics import mean_squared_error
MSE_LR = mean_squared_error(ytest, reg.predict(xtest))  # test-set MSE for Linear Regression
MSE_LR

100.30984032150735
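MSE is expressed in squared units of `Weight`, which makes it hard to read directly. Taking the square root gives the RMSE, in the same units as `Weight`. The sketch below first checks the MSE formula on two toy points and then converts the value printed above (the helper name `mse` is hypothetical):

```python
import math

def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy check: errors are 2 and 0, so MSE = (4 + 0) / 2.
toy_mse = mse([3.0, 5.0], [1.0, 5.0])
print(toy_mse)  # 2.0

# Square root of the test MSE reported above for Linear Regression.
rmse_lr = math.sqrt(100.30984032150735)
print(round(rmse_lr, 2))  # 10.02
```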

# Applying KNN Regressor:

In [21]:
from sklearn.neighbors import KNeighborsRegressor

knn = KNeighborsRegressor(n_neighbors=3)

knn.fit(xtrain, ytrain)
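With its default settings (uniform weights, Euclidean distance), `KNeighborsRegressor` predicts the plain average of the targets of the `k` closest training rows. A rough pure-Python sketch of that idea on made-up (height, gender) rows; `knn_predict` is a hypothetical helper, not part of scikit-learn:

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training rows closest to `query`."""
    # Pair each training row's Euclidean distance to the query with its target.
    distances = sorted(
        (math.dist(row, query), target) for row, target in zip(train_x, train_y)
    )
    nearest = distances[:k]
    return sum(target for _, target in nearest) / k

# Made-up rows: (Height, Gender_Male) -> Weight
train_x = [(60, 0), (62, 0), (64, 0), (70, 1), (72, 1)]
train_y = [110.0, 120.0, 130.0, 180.0, 200.0]

# The three nearest rows to (63, 0) are heights 62, 64 and 60.
print(knn_predict(train_x, train_y, (63, 0), k=3))  # 120.0
```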

In [22]:
ytest['Predicted Weight'] = knn.predict(xtest)

In [23]:
ytest

Unnamed: 0,Weight,Predicted Weight
8543,138.451499,134.533858
6783,137.086322,119.314446
1752,174.030954,168.170201
4823,166.631666,169.617635
1357,172.508657,168.480448
...,...,...
5160,153.846107,138.708212
6231,128.269265,128.888924
4969,195.330596,197.963757
103,177.984729,195.217803


In [24]:
ytest.drop('Predicted Weight', axis=1, inplace=True)

# Evaluating the Model (training and testing R² score, MSE for testing)

In [25]:
knn.score(xtrain, ytrain)

0.9311659925004823

In [26]:
knn.score(xtest, ytest)

0.8681598056079112

In [27]:
from sklearn.metrics import mean_squared_error
MSE_KNN = mean_squared_error(ytest, knn.predict(xtest))
MSE_KNN

134.42086838140403
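The KNN training score (about 0.93) is noticeably higher than its test score (about 0.87), which suggests that `n_neighbors=3` may be fitting noise. One common way to pick `k` is cross-validation. The sketch below runs it on synthetic data, so the data, the candidate grid, and the resulting `k` are all assumptions for illustration and say nothing about the best value for `weight-height.csv`:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the real data: one noisy, roughly linear feature.
rng = np.random.default_rng(0)
X_demo = rng.uniform(55, 80, size=(300, 1))
y_demo = 6.0 * X_demo[:, 0] - 250 + rng.normal(0, 10, size=300)

# Try several neighbour counts and keep the one with the lowest
# cross-validated MSE (scikit-learn maximises, hence the negated score).
search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [1, 3, 5, 10, 20, 40]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_demo, y_demo)
print(search.best_params_)
```

On the real data, the same search could be run on `xtrain` and `ytrain` only, keeping `xtest` for the final evaluation.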

# Comparing the KNN and Linear Regression Models

In [28]:
comparison_df = ytest.copy()

In [29]:
comparison_df['KNN weight'] = knn.predict(xtest)
comparison_df['LR weight'] = reg.predict(xtest)

In [30]:
comparison_df.head()

Unnamed: 0,Weight,KNN weight,LR weight
8543,138.451499,134.533858,132.022477
6783,137.086322,119.314446,128.305563
1752,174.030954,168.170201,177.257983
4823,166.631666,169.617635,174.984844
1357,172.508657,168.480448,169.655031


In [31]:
print('Difference between the MSE of KNN and Linear Regression is:')
MSE_KNN-MSE_LR

Difference between the MSE of KNN and Linear Regression is:


34.11102805989668
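The absolute difference is easier to judge as a share of the Linear Regression error. A short sketch using the two test MSE values reported earlier in this notebook (the lowercase variable names are chosen for this example):

```python
# Test-set MSE values copied from the outputs above.
mse_lr = 100.30984032150735
mse_knn = 134.42086838140403

# How much larger the KNN error is, relative to Linear Regression.
relative_increase = (mse_knn - mse_lr) / mse_lr
print(f"{relative_increase:.1%}")  # 34.0%
```

On this particular split, KNN with `n_neighbors=3` has a test MSE about a third higher than Linear Regression, which matches its lower test R² score.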