# K Nearest Neighbors: Predicting King County Housing Prices



Dataset
The dataset is available at "data/kc_house_data.csv" in the respective challenge's repo.
Original Source: https://www.kaggle.com/shivachandel/kc-house-data 



### How would you predict the price of a house that is about to go on sale?



Online property companies offer valuations of houses using machine learning techniques. The aim of this report is to predict house sale prices in King County, Washington State, USA, using K-Nearest Neighbors (KNN) regression. The dataset consists of historic data on houses sold between May 2014 and May 2015.
We aim to predict house sale prices in King County with an accuracy of at least 75-80% and to understand which factors are responsible for higher property values ($650K and above).

The dataset consists of house prices from King County, an area in the US state of Washington; the data also covers Seattle.

## Similar houses should be similar in price

* Square footage
* Number of floors
* Location


## Distance as a measure of similarity

How 'far away' are houses from each other given all of their features?



## What is K-Nearest Neighbors?

**_K-Nearest Neighbors_** (or KNN, for short) is a supervised learning algorithm that can be used for both **_Classification_** and **_Regression_** tasks. KNN is a distance-based algorithm, meaning that it implicitly assumes that the smaller the distance between two points, the more similar they are. In KNN, each column acts as a dimension. In a dataset with two columns, we can easily visualize this by treating values for one column as X coordinates and the other as Y coordinates. Since this is a **_Supervised Learning Algorithm_**, we must also have the labels (here, the sale prices) for each point in our dataset, or else we can't use this algorithm for prediction.

## Fitting the Model

KNN is unique compared to other algorithms in that it does almost nothing during the "fit" step, and all the work during the "predict" step. During the 'fit' step, KNN just stores all the training data and corresponding values. No distances are calculated at this point. 

## Making Predictions with K

All the magic happens during the 'predict' step. During this step, KNN takes a point that we want a prediction for, and calculates the distances between that point and every single point in the training set. It then finds the `K` closest points, or **_Neighbors_**, and examines the values of each. You can think of each of the K closest points getting a 'vote' about the predicted value. For classification, the majority class wins; for regression, the mean of the neighbors' values is typically taken as the prediction for the new point.

In the following animation, K=3.

<img src='https://github.com/Bmcgarry194/knn_workshop/blob/master/knn.gif?raw=1'>
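The fit and predict steps can be sketched in a few lines of NumPy. The training points and prices below are made up for illustration, and `knn_predict` is a hypothetical helper name; averaging the neighbors' values is the regression variant of the 'vote'.

```python
import numpy as np

# "Fit": simply keep the training points and their known prices (toy values)
X_train_toy = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [8.0, 8.0], [9.0, 9.0]])
y_train_toy = np.array([100.0, 200.0, 300.0, 800.0, 900.0])

def knn_predict(x_new, X, y, k=3):
    """Predict a value for x_new as the mean of its k nearest neighbors."""
    distances = np.sqrt(np.sum((X - x_new) ** 2, axis=1))  # Euclidean distances
    nearest = np.argsort(distances)[:k]                     # indices of the k closest points
    return float(np.mean(y[nearest]))

# "Predict": all distance work happens here, at query time
prediction = knn_predict(np.array([2.1, 2.1]), X_train_toy, y_train_toy, k=3)
print(prediction)
```

The query point sits next to the three cheapest toy houses, so only their prices enter the average; the two distant, expensive houses have no influence.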

## Distance Metrics

As we explored in a previous lesson, there are different **_distance metrics_** when using KNN. For KNN, we can use **_Manhattan_**, **_Euclidean_**, or **_Minkowski Distance_**--from an algorithmic standpoint, it doesn't matter which! However, it should be noted that from a practical standpoint, these can affect our results and our overall model performance. 
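To make the three metrics concrete, here is a minimal sketch that computes them for two hypothetical houses described by already-scaled features. The function name `minkowski_distance` and the feature values are illustrative assumptions, not part of the dataset. Manhattan and Euclidean distance are the special cases p=1 and p=2 of Minkowski distance.

```python
import numpy as np

def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p between two 1-D points."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

# Two hypothetical houses: (scaled sqft_living, scaled lat, scaled long)
house_a = [0.10, 0.50, 0.20]
house_b = [0.40, 0.10, 0.20]

manhattan = minkowski_distance(house_a, house_b, p=1)  # sum of absolute differences
euclidean = minkowski_distance(house_a, house_b, p=2)  # straight-line distance
cubic = minkowski_distance(house_a, house_b, p=3)      # a higher-order Minkowski distance

print(f"Manhattan: {manhattan:.3f}")
print(f"Euclidean: {euclidean:.3f}")
print(f"Minkowski (p=3): {cubic:.3f}")
```

The same pair of houses is a different distance apart under each metric, which is why the choice can change which neighbors are considered closest.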


Tasks
1. Load and preprocess the dataset
2. Create our own implementation of a KNN regressor
3. Housing data predictions
4. Limit our predictions to the middle 80% of our dataset
5. Apply data scaling
6. Predict data using our own KNN
7. Predict data using sklearn's KNN
8. Choose the optimal number of neighbors: model behavior with increasing k for a regression problem
9. Find the optimal k for the King County dataset


Download and load the data (`read_csv` below relies on its default comma delimiter)

In [24]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from scipy.spatial.distance import euclidean as euc
# From visualize import generate_moons_df, preprocess, plot_boundaries

# Sklearn processing
from sklearn.preprocessing import MinMaxScaler, StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split, KFold, cross_val_score
from sklearn.metrics import mean_squared_error, r2_score

from sklearn.metrics import classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
np.random.seed(0)

## Importing Data

In [25]:
data = pd.read_csv('Data/kc_house_data.csv')

In [26]:
data

Unnamed: 0,id,date,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,...,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,sqft_lot15
0,7129300520,20141013T000000,221900.0,3,1.00,1180,5650,1.0,0,0,...,7,1180.0,0,1955,0,98178,47.5112,-122.257,1340,5650
1,6414100192,20141209T000000,538000.0,3,2.25,2570,7242,2.0,0,0,...,7,2170.0,400,1951,1991,98125,47.7210,-122.319,1690,7639
2,5631500400,20150225T000000,180000.0,2,1.00,770,10000,1.0,0,0,...,6,770.0,0,1933,0,98028,47.7379,-122.233,2720,8062
3,2487200875,20141209T000000,604000.0,4,3.00,1960,5000,1.0,0,0,...,7,1050.0,910,1965,0,98136,47.5208,-122.393,1360,5000
4,1954400510,20150218T000000,510000.0,3,2.00,1680,8080,1.0,0,0,...,8,1680.0,0,1987,0,98074,47.6168,-122.045,1800,7503
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21608,263000018,20140521T000000,360000.0,3,2.50,1530,1131,3.0,0,0,...,8,1530.0,0,2009,0,98103,47.6993,-122.346,1530,1509
21609,6600060120,20150223T000000,400000.0,4,2.50,2310,5813,2.0,0,0,...,8,2310.0,0,2014,0,98146,47.5107,-122.362,1830,7200
21610,1523300141,20140623T000000,402101.0,2,0.75,1020,1350,2.0,0,0,...,7,1020.0,0,2009,0,98144,47.5944,-122.299,1020,2007
21611,291310100,20150116T000000,400000.0,3,2.50,1600,2388,2.0,0,0,...,8,1600.0,0,2004,0,98027,47.5345,-122.069,1410,1287


**Preprocessing**

In [27]:
# Checking the data types of each column
for i in data:
    print(i, ":", type(data[i][0]))

id : <class 'numpy.int64'>
date : <class 'str'>
price : <class 'numpy.float64'>
bedrooms : <class 'numpy.int64'>
bathrooms : <class 'numpy.float64'>
sqft_living : <class 'numpy.int64'>
sqft_lot : <class 'numpy.int64'>
floors : <class 'numpy.float64'>
waterfront : <class 'numpy.int64'>
view : <class 'numpy.int64'>
condition : <class 'numpy.int64'>
grade : <class 'numpy.int64'>
sqft_above : <class 'numpy.float64'>
sqft_basement : <class 'numpy.int64'>
yr_built : <class 'numpy.int64'>
yr_renovated : <class 'numpy.int64'>
zipcode : <class 'numpy.int64'>
lat : <class 'numpy.float64'>
long : <class 'numpy.float64'>
sqft_living15 : <class 'numpy.int64'>
sqft_lot15 : <class 'numpy.int64'>


In [28]:
# Check for null values
data.isnull().sum()

id               0
date             0
price            0
bedrooms         0
bathrooms        0
sqft_living      0
sqft_lot         0
floors           0
waterfront       0
view             0
condition        0
grade            0
sqft_above       2
sqft_basement    0
yr_built         0
yr_renovated     0
zipcode          0
lat              0
long             0
sqft_living15    0
sqft_lot15       0
dtype: int64
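
The two missing `sqft_above` values can be handled before modeling. A minimal sketch, assuming that `sqft_above` equals `sqft_living` minus `sqft_basement` (this holds for the rows displayed earlier, but it is an assumption about the rest of the dataset). The small frame `toy` is a stand-in for `data`.

```python
import numpy as np
import pandas as pd

# Stand-in for `data` with one missing sqft_above value (toy rows)
toy = pd.DataFrame({
    "sqft_living": [1180, 2570, 1960],
    "sqft_basement": [0, 400, 910],
    "sqft_above": [1180.0, np.nan, 1050.0],
})

# Assumption: living area = above-ground area + basement area
filled = toy["sqft_above"].fillna(toy["sqft_living"] - toy["sqft_basement"])
toy["sqft_above"] = filled

print(toy.isnull().sum())
```

Since the model in this notebook only uses `sqft_living`, `lat` and `long`, simply dropping the two affected rows with `dropna` would work as well.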

In [29]:
# Checking unique values
len(data['id'].unique())

21436

In [30]:
#Printing the unique values

for i in data.columns:
    print(i, ":", len(data[i].unique()))

id : 21436
date : 372
price : 4028
bedrooms : 13
bathrooms : 30
sqft_living : 1038
sqft_lot : 9782
floors : 6
waterfront : 2
view : 5
condition : 5
grade : 12
sqft_above : 947
sqft_basement : 306
yr_built : 116
yr_renovated : 70
zipcode : 70
lat : 5034
long : 752
sqft_living15 : 777
sqft_lot15 : 8689


In [31]:
# id is only a record identifier with no predictive value, so drop it
data = data.drop('id', axis = 1)

# Label-encode the date strings (they sort chronologically, so the codes follow sale order)
data['date'] = LabelEncoder().fit_transform(data['date'])
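
Label encoding works here because the date strings sort chronologically, but it hides the calendar structure. As an alternative sketch (not what this notebook uses), the strings can be parsed into real timestamps. The format string is an assumption based on values such as `20141013T000000` shown above, and the variable names are illustrative.

```python
import pandas as pd

# Sample values in the same format as the `date` column
dates = pd.Series(["20141013T000000", "20141209T000000", "20150225T000000"])

# Parse the strings into timestamps
parsed = pd.to_datetime(dates, format="%Y%m%dT%H%M%S")

# Derive numeric features that a distance-based model can use
sale_year = parsed.dt.year
sale_month = parsed.dt.month
days_since_first_sale = (parsed - parsed.min()).dt.days

print(days_since_first_sale.tolist())
```

Unlike label codes, `days_since_first_sale` keeps the real spacing between sale dates, so two sales a week apart end up closer than two sales months apart.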

In [32]:
data

Unnamed: 0,date,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,sqft_lot15
0,164,221900.0,3,1.00,1180,5650,1.0,0,0,3,7,1180.0,0,1955,0,98178,47.5112,-122.257,1340,5650
1,220,538000.0,3,2.25,2570,7242,2.0,0,0,3,7,2170.0,400,1951,1991,98125,47.7210,-122.319,1690,7639
2,290,180000.0,2,1.00,770,10000,1.0,0,0,3,6,770.0,0,1933,0,98028,47.7379,-122.233,2720,8062
3,220,604000.0,4,3.00,1960,5000,1.0,0,0,5,7,1050.0,910,1965,0,98136,47.5208,-122.393,1360,5000
4,283,510000.0,3,2.00,1680,8080,1.0,0,0,3,8,1680.0,0,1987,0,98074,47.6168,-122.045,1800,7503
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21608,19,360000.0,3,2.50,1530,1131,3.0,0,0,3,8,1530.0,0,2009,0,98103,47.6993,-122.346,1530,1509
21609,288,400000.0,4,2.50,2310,5813,2.0,0,0,3,8,2310.0,0,2014,0,98146,47.5107,-122.362,1830,7200
21610,52,402101.0,2,0.75,1020,1350,2.0,0,0,3,7,1020.0,0,2009,0,98144,47.5944,-122.299,1020,2007
21611,252,400000.0,3,2.50,1600,2388,2.0,0,0,3,8,1600.0,0,2004,0,98027,47.5345,-122.069,1410,1287


**Normalizing**

In [33]:
# Storing columns names
column_names = data.columns

In [34]:
# Normalizing
feature_columns = [i for i in data.columns if i!='price']
mms = MinMaxScaler()
data[feature_columns] = mms.fit_transform(data[feature_columns])

In [35]:
# data is still a DataFrame after the assignment above; this call only keeps the original column order
data = pd.DataFrame(data, columns=column_names)

In [36]:
data

Unnamed: 0,date,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,sqft_lot15
0,0.442049,221900.0,0.090909,0.12500,0.067170,0.003108,0.0,0.0,0.0,0.5,0.500000,0.097588,0.000000,0.478261,0.000000,0.893939,0.571498,0.217608,0.161934,0.005742
1,0.592992,538000.0,0.090909,0.28125,0.172075,0.004072,0.4,0.0,0.0,0.5,0.500000,0.206140,0.082988,0.443478,0.988089,0.626263,0.908959,0.166113,0.222165,0.008027
2,0.781671,180000.0,0.060606,0.12500,0.036226,0.005743,0.0,0.0,0.0,0.5,0.416667,0.052632,0.000000,0.286957,0.000000,0.136364,0.936143,0.237542,0.399415,0.008513
3,0.592992,604000.0,0.121212,0.37500,0.126038,0.002714,0.0,0.0,0.0,1.0,0.500000,0.083333,0.188797,0.565217,0.000000,0.681818,0.586939,0.104651,0.165376,0.004996
4,0.762803,510000.0,0.090909,0.25000,0.104906,0.004579,0.0,0.0,0.0,0.5,0.583333,0.152412,0.000000,0.756522,0.000000,0.368687,0.741354,0.393688,0.241094,0.007871
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
21608,0.051213,360000.0,0.090909,0.31250,0.093585,0.000370,0.8,0.0,0.0,0.5,0.583333,0.135965,0.000000,0.947826,0.000000,0.515152,0.874055,0.143688,0.194631,0.000986
21609,0.776280,400000.0,0.121212,0.31250,0.152453,0.003206,0.4,0.0,0.0,0.5,0.583333,0.221491,0.000000,0.991304,0.000000,0.732323,0.570693,0.130399,0.246257,0.007523
21610,0.140162,402101.0,0.060606,0.09375,0.055094,0.000503,0.4,0.0,0.0,0.5,0.500000,0.080044,0.000000,0.947826,0.000000,0.722222,0.705324,0.182724,0.106866,0.001558
21611,0.679245,400000.0,0.090909,0.31250,0.098868,0.001132,0.4,0.0,0.0,0.5,0.583333,0.143640,0.000000,0.904348,0.000000,0.131313,0.608975,0.373754,0.173980,0.000731
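
Fitting the scaler on the full dataset lets information from the future test rows influence the scaling. A common alternative, sketched here on synthetic data rather than the housing data, is to split first and fit `MinMaxScaler` on the training portion only. The array `X_demo` and the other variable names are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X_demo = rng.uniform(low=0, high=5000, size=(100, 3))  # synthetic feature matrix

# Split first, so the test rows stay unseen during scaling
X_tr, X_te = train_test_split(X_demo, test_size=0.20, random_state=42)

scaler = MinMaxScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learn min and max from training rows only
X_te_scaled = scaler.transform(X_te)      # reuse the training min and max

print(X_tr_scaled.min(axis=0), X_tr_scaled.max(axis=0))
```

Scaled test values can fall slightly outside [0, 1] when a test row exceeds the training range; that is expected with this approach.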


## Limit our predictions to the middle 80% of our dataset

It is easier to make predictions where the data is most dense. The trade-off is that any prediction made outside the range of values we trained on will be highly suspect.
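
One way to apply this restriction, sketched under the assumption that "middle 80%" means prices between the 10th and 90th percentiles, is to filter on price quantiles. The small frame `toy_prices` is a stand-in for `data`, and the variable names are illustrative.

```python
import pandas as pd

# Stand-in for `data`: ten toy prices, one of them an extreme outlier
toy_prices = pd.DataFrame({"price": [100, 200, 300, 400, 500, 600, 700, 800, 900, 5000]})

# Assumed cut-offs: 10th and 90th percentile of price
low, high = toy_prices["price"].quantile([0.10, 0.90])

# Keep only rows whose price lies inside the central band
middle = toy_prices[toy_prices["price"].between(low, high)]

print(len(middle), "of", len(toy_prices), "rows kept")
```

On the real data the same two lines would be applied to `data['price']` before the features and target are extracted.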

In [37]:
features = ['sqft_living', 'lat', 'long']

X = np.array(data[features])
y = np.array(data['price'])

In [38]:
# Checking the columns
pd.DataFrame(X)

Unnamed: 0,0,1,2
0,0.067170,0.571498,0.217608
1,0.172075,0.908959,0.166113
2,0.036226,0.936143,0.237542
3,0.126038,0.586939,0.104651
4,0.104906,0.741354,0.393688
...,...,...,...
21608,0.093585,0.874055,0.143688
21609,0.152453,0.570693,0.130399
21610,0.055094,0.705324,0.182724
21611,0.098868,0.608975,0.373754


In [39]:
# Checking the target values
y

array([221900., 538000., 180000., ..., 402101., 400000., 325000.])

In [40]:
# Do train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

In [41]:
# To check if the data is correctly segregated
X_train_shape = X_train.shape
y_train_shape = y_train.shape
X_test_shape  = X_test.shape
y_test_shape  = y_test.shape

print(f"X_train: {X_train_shape} , y_train: {y_train_shape}")
print(f"X_test: {X_test_shape} , y_test: {y_test_shape}")

X_train: (17290, 3) , y_train: (17290,)
X_test: (4323, 3) , y_test: (4323,)


In [42]:
# Checking the first few training rows (printing all 17,290 rows would flood the output)
for train_row in X_train[:5]:
    print(train_row)

[0.11245283 0.33955284 0.30481728]
[0.05358491 0.63712401 0.19019934]
[0.05962264 0.5274248  0.15282392]
[0.13584906 0.38475149 0.28820598]
[0.10950943 0.88209747 0.25747508]

## Model

### Creating our own implementation of KNN regressor

In [56]:
class KNN:
    """Minimal K-Nearest Neighbors regressor using Euclidean distance."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # KNN is a lazy learner: fitting only stores the training data
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y, dtype=float)

    def predictOutput(self, X):
        X = np.asarray(X, dtype=float)
        responses = []

        for x in X:
            neighbours = self.distNeighbours(self.X_train, self.y_train, x, self.k)
            # Prediction is the mean target value of the k nearest neighbours
            responses.append(np.mean([target for _, _, target in neighbours]))

        return responses

    def distNeighbours(self, X_train, Y_train, X_test, K):
        X_train = np.asarray(X_train, dtype=float)

        # Euclidean distance from the query point to every training point
        distances = np.sqrt(np.sum((X_train - X_test) ** 2, axis=1))

        # Indices of the K smallest distances
        nearest = np.argsort(distances)[:K]

        # (distance, training index, target) tuples, nearest first
        return [(distances[i], i, Y_train[i]) for i in nearest]

    # Accuracy of categorical predictions (share of exact matches, in percent)
    def getAccuracyCategorical(self, y, y_pred):
        # Convert to NumPy arrays in case plain lists are passed in
        y = np.array(y)
        y_pred = np.array(y_pred)

        correct = np.sum(y == y_pred)

        return round((correct / len(y)) * 100, 2)

    # Score for numerical predictions: R^2 expressed as a percentage
    def getAccuracyNumeric(self, y, y_pred):
        y = np.array(y, dtype=float)
        y_pred = np.array(y_pred, dtype=float)

        ss_res = np.sum((y - y_pred) ** 2)     # residual sum of squares
        ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares

        return round((1 - ss_res / ss_tot) * 100, 2)

In [57]:
# Initializing the model
my_knn = KNN(k=3)

In [58]:
# Fitting
my_knn.fit(X_train, y_train)

In [None]:
# Prediction compares every test point with every training point, so this can take a while
y_pred = my_knn.predictOutput(X_test)

**Metrics**

In [None]:
print("----------KNN----------")
test_accuracy = my_knn.getAccuracyNumeric(y_test, y_pred)
print(f"Testing Accuracy : {test_accuracy}")

## Let's use Sklearn's KNN implementation

In [None]:
from sklearn.neighbors import KNeighborsRegressor

In [None]:
nn = KNeighborsRegressor(n_neighbors=5, n_jobs=-1)

nn.fit(X_train, y_train)

sk_preds = nn.predict(X_test)

# Exact-match accuracy is meaningless for continuous prices, so report R^2 instead
r2 = r2_score(y_test, sk_preds)
rmse = np.sqrt(mean_squared_error(y_test, sk_preds))

print(f'Root Mean Squared Error: {rmse:.2f}')
print(f'R^2 Score: {r2:.2f}')

## Finding optimal k for King County Dataset

In [None]:
ks = range(1, 30)

test_errors = np.zeros(len(list(ks)))

for i, k in enumerate(ks):
    
    nn = KNeighborsRegressor(n_neighbors=k, n_jobs=-1)

    nn.fit(X_train, y_train)
    test_preds = nn.predict(X_test)
    
    test_errors[i] = np.sqrt(mean_squared_error(y_test, test_preds))

In [None]:
fig, ax = plt.subplots(figsize=(10, 5))

ax.plot(list(ks), test_errors)
ax.set_xlabel('Number of neighbors (k)')
ax.set_ylabel('Test RMSE')
ax.axvline(list(ks)[np.argmin(test_errors)], linestyle='--', color='black');

In [None]:
optimal_k = list(ks)[np.argmin(test_errors)]

optimal_error = np.min(test_errors)

print(f'Optimal number of Neighbors: {optimal_k} Root Mean Squared Error: {optimal_error:.2f}')
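
Picking `k` by test-set error means the test set is no longer an unbiased estimate of performance. A hedged alternative is to choose `k` with cross-validation on the training data only. The sketch below uses synthetic data from `make_regression` so that it runs on its own; on the housing data, `X_train` and `y_train` would take the place of the illustrative `X_cv` and `y_cv`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the training split
X_cv, y_cv = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)

candidate_ks = range(1, 16)
cv_rmse = []

for k in candidate_ks:
    model = KNeighborsRegressor(n_neighbors=k)
    # scikit-learn maximizes scores, so RMSE is reported as a negative number
    scores = cross_val_score(model, X_cv, y_cv, cv=5,
                             scoring="neg_root_mean_squared_error")
    cv_rmse.append(-scores.mean())

best_k = list(candidate_ks)[int(np.argmin(cv_rmse))]
print(f"Best k by 5-fold cross-validation: {best_k}")
```

With this approach the test set is touched only once, after `k` has been fixed, to report the final error.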

## Trial

In [None]:
# Named y_true so the real y_test from the split above is not overwritten
y_true = np.array([1, 2, 3, 4, 5])
predictions = np.array([1, 0, 5, 2, 5])

# Count exact matches
print(np.sum(predictions == y_true))

In [None]:
pow(2, 3)

In [None]:
# X_train is a NumPy array, so index it directly (.iloc only exists on pandas objects)
X_train[1, 2]

In [None]:
X_test[2]

In [None]:
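# euclidean() expects two 1-D points: compare one test point with the first few training points
print([euc(X_test[2], x_train) for x_train in X_train[:5]])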