## KNN Classification

### About
KNN classification is a non-parametric method that assigns a class to a new observation based on the classes of the training observations closest to it, that is, the observations in the same neighbourhood.

A point is classified according to the classes of its K nearest neighbours.

Each neighbour casts one vote, and the class with the most votes in the neighbourhood wins.

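An odd K avoids tied votes when there are only two classes. With more than two classes (this dataset has four), ties can still occur for any K.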
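The voting rule described above can be sketched in a few lines of NumPy. This is only an illustration of the idea, not how scikit-learn implements it; the function `knn_predict`, the Euclidean distance, and the toy points are all assumptions made for the example.

```python
from collections import Counter

import numpy as np


def knn_predict(train_points, train_labels, query, k):
    """Return the majority label among the k training points closest to query."""
    train_points = np.asarray(train_points, dtype=float)
    # Euclidean distance from the query to every training point.
    distances = np.linalg.norm(train_points - np.asarray(query, dtype=float), axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Each neighbour casts one vote for its own class.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
points = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, query=[0.5, 0.5], k=3))  # a
print(knn_predict(points, labels, query=[5.5, 5.5], k=3))  # b
```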

### Imports

In [53]:
import pandas as pd
import numpy as np
import sklearn.model_selection  # load the submodule explicitly; train_test_split is used below
from sklearn import preprocessing
from sklearn.utils import shuffle
from sklearn.neighbors import KNeighborsClassifier

### Read data

In [54]:
data = pd.read_csv("../../data/cars/car.data")
data.head()

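| Unnamed: 0 | buying | maint | door | persons | lug_boot | safety | class |
|---|---|---|---|---|---|---|---|
| 0 | vhigh | vhigh | 2 | 2 | small | low | unacc |
| 1 | vhigh | vhigh | 2 | 2 | small | med | unacc |
| 2 | vhigh | vhigh | 2 | 2 | small | high | unacc |
| 3 | vhigh | vhigh | 2 | 2 | med | low | unacc |
| 4 | vhigh | vhigh | 2 | 2 | med | med | unacc |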


### Preprocess data
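Convert string values to corresponding integers. `LabelEncoder` numbers the labels in sorted (alphabetical) order, not in their natural low-to-high order.
> Example
>- high: 0
>- low: 1
>- med: 2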

In [55]:
label_encoder = preprocessing.LabelEncoder()

In [56]:
# Each fit_transform call refits the same encoder on one column,
# so after the last call it only remembers the mapping for 'class'.
buying = label_encoder.fit_transform(list(data['buying']))
maint = label_encoder.fit_transform(list(data['maint']))
door = label_encoder.fit_transform(list(data['door']))
persons = label_encoder.fit_transform(list(data['persons']))
lug_boot = label_encoder.fit_transform(list(data['lug_boot']))
safety = label_encoder.fit_transform(list(data['safety']))
cls = label_encoder.fit_transform(list(data['class']))
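Because `LabelEncoder` orders labels alphabetically, the resulting integers do not follow the natural ranking of values such as low, med and high, and KNN distances are computed on those integers. One possible alternative, shown here only as a sketch, is `OrdinalEncoder` with an explicit category order. The small `safety_demo` frame is hypothetical data, not the car dataset itself.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical sample of a 'safety' column.
safety_demo = pd.DataFrame({"safety": ["low", "med", "high", "med"]})

# LabelEncoder sorts labels alphabetically: high -> 0, low -> 1, med -> 2.
alphabetical = LabelEncoder().fit_transform(safety_demo["safety"])

# OrdinalEncoder with an explicit order: low -> 0, med -> 1, high -> 2.
ordered_encoder = OrdinalEncoder(categories=[["low", "med", "high"]])
ordered = ordered_encoder.fit_transform(safety_demo[["safety"]]).ravel().astype(int)

print(alphabetical.tolist())  # [1, 2, 0, 2]
print(ordered.tolist())       # [0, 1, 2, 1]
```

Whether the ordered encoding actually improves accuracy on this dataset would need to be checked; the sketch only shows how the two encodings differ.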


### Select attribute to predict

In [57]:
predict = "class"

### Split data

In [58]:
x = list(zip(buying, maint, door, persons, lug_boot, safety))
y = list(cls)

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)
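The split above is random, so the accuracy changes from run to run. A minimal sketch of a reproducible, class-balanced split follows, using the `random_state` and `stratify` parameters of `train_test_split`. The data (`x_demo`, `y_demo`) is made up for the example, and the seed value 42 is arbitrary.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 100 samples spread over 4 classes.
x_demo = [(i, i % 3) for i in range(100)]
y_demo = [0] * 70 + [1] * 20 + [2] * 6 + [3] * 4

# random_state fixes the shuffle; stratify keeps class proportions
# roughly equal in the train and test parts.
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.1, random_state=42, stratify=y_demo
)

print(len(x_tr), len(x_te))  # 90 10
print(Counter(y_te))         # class counts in the test part
```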

### Train model
> k = number of neighbors

In [59]:
k = 9
model = KNeighborsClassifier(n_neighbors=k)

In [60]:
model.fit(x_train, y_train)

KNeighborsClassifier(n_neighbors=9)
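The value k = 9 is one choice among many. One common way to compare candidates is cross-validation. The sketch below uses `cross_val_score` on synthetic data from `make_classification` as a stand-in for the car features; the candidate list and the 5 folds are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (not the car dataset).
features, target = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=3, random_state=0
)

candidate_ks = [1, 3, 5, 7, 9, 11]
mean_scores = {}
for candidate in candidate_ks:
    # Mean accuracy over 5 cross-validation folds for this k.
    fold_scores = cross_val_score(
        KNeighborsClassifier(n_neighbors=candidate), features, target, cv=5
    )
    mean_scores[candidate] = fold_scores.mean()

best_k = max(mean_scores, key=mean_scores.get)
print(best_k, round(mean_scores[best_k], 3))
```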

### Model accuracy

In [61]:
acc = model.score(x_test, y_test)
acc

0.9421965317919075
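A single accuracy figure can hide poor performance on rare classes. As a sketch, `confusion_matrix` and `classification_report` from `sklearn.metrics` break the result down per class. The `actual` and `predicted_demo` lists below are invented to keep the example self-contained; in the notebook they would correspond to `y_test` and the model's predictions.

```python
from sklearn.metrics import classification_report, confusion_matrix

class_names = ["acc", "good", "unacc", "vgood"]

# Hypothetical encoded labels (0 = acc, 1 = good, 2 = unacc, 3 = vgood).
actual = [2, 2, 2, 2, 0, 0, 1, 3, 2, 0]
predicted_demo = [2, 2, 2, 2, 0, 2, 0, 3, 2, 0]

# Rows are actual classes, columns are predicted classes.
matrix = confusion_matrix(actual, predicted_demo, labels=[0, 1, 2, 3])
print(matrix)

# Per-class precision, recall and F1; zero_division=0 silences the warning
# for classes that were never predicted.
print(classification_report(
    actual, predicted_demo, labels=[0, 1, 2, 3],
    target_names=class_names, zero_division=0,
))
```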

### Model prediction

In [62]:
names = ['acc', 'good', 'unacc', 'vgood']  # same sorted order LabelEncoder used, so names[i] decodes code i

In [63]:
predicted = model.predict(x_test)

In [64]:
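# Iterate over the three sequences together; this also avoids reusing the name x,
# which already holds the feature list.
for pred, actual, row in zip(predicted, y_test, x_test):
    print('Predicted:', names[pred], "Actual:", names[actual], "Data:", row)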
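To see why a particular prediction was made, the fitted classifier's `kneighbors` method returns the distances to, and the training indices of, the nearest neighbours of a query point. The following is a small sketch on invented data (`train_x`, `train_y`, `query`), not on the car dataset.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: two clusters with labels 0 and 1.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3).fit(train_x, train_y)

query = [[0.2, 0.2]]
# distances and indices each have one row per query point.
distances, indices = knn.kneighbors(query)
neighbour_labels = [train_y[i] for i in indices[0]]

print("Prediction:", knn.predict(query)[0])
print("Neighbour labels:", neighbour_labels)
print("Neighbour distances:", distances[0].round(3).tolist())
```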