### ALGORITHM 7: KNN (K-NEAREST NEIGHBORS) CLASSIFIER

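This algorithm can be used for both classification and regression. The approach is simple: conceptually, plot all the training data points, let the 'k' nearest neighbors of each new data point vote, and assign the new point to the class with the majority of the votes. If k=1, the new data point is simply assigned to the class of its single nearest neighbor, which makes the prediction sensitive to noisy points, so choosing a suitable value of K can be difficult in some cases. KNN is also computationally expensive at prediction time, because in its basic form the distance from the new point to every training point has to be calculated.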

```
distance between two data points (D) = square root of ((x2-x1)^2 + (y2-y1)^2) [Euclidean distance]
```
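
As a minimal sketch of the formula above, the distance can be computed in plain Python. The function name `euclidean_distance` is only illustrative; it generalizes the two-coordinate formula to points with any number of coordinates.

```python
import math

def euclidean_distance(p, q):
    """Return the Euclidean distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    # Sum the squared differences per coordinate, then take the square root
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# 2-D case from the formula: (x1, y1) = (1, 2) and (x2, y2) = (4, 6)
print(euclidean_distance((1, 2), (4, 6)))  # 5.0
```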

![knn-1](../docs/knn1.jpg)

The working of K-NN can be explained with the following algorithm:

Step-1: Select the number of neighbors, K.

Step-2: Calculate the Euclidean distance from the new data point to every point in the training data.

Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.

Step-4: Among these K neighbors, count the number of data points in each category.

Step-5: Assign the new data point to the category that has the largest number of neighbors.

Step-6: Our model is ready.
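
The steps above can be sketched from scratch in a few lines of plain Python. This is only an illustration of the idea, not how scikit-learn implements it; the function name `knn_predict` and the toy points and labels are made up for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points."""
    # Step 2: distance from the new point to every training point
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 3: keep the k closest training points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: count how many of those neighbors fall in each category
    votes = Counter(label for _, label in nearest)
    # Step 5: pick the category with the most votes
    # (on a tie, the label that appears first among the nearest neighbors wins)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated groups labelled "A" and "B"
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (6.5, 7.0), (7.0, 6.0)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2.0, 2.0), k=3))  # A
print(knn_predict(train_points, train_labels, (6.0, 6.5), k=3))  # B
```

Note that there is no real training step here: the "model" is just the stored training data, and all the work happens when a prediction is requested.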


In [9]:
# importing required libraries
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

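# read the dataset and split it into train and test sets
# (shuffle=False keeps the row order, so the last 20% of rows form the test set)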
dataset = pd.read_csv("../data/titanic.csv")
train_data, test_data = train_test_split(dataset, test_size=0.2, shuffle=False)

# separate the train X,y and test X,y dataset
train_X = train_data.drop("Survived", axis=1)
train_y = train_data["Survived"]

test_X = test_data.drop("Survived", axis=1)
test_y = test_data["Survived"]


In [10]:
# create the model and train with data
model = KNeighborsClassifier()
model.fit(train_X, train_y)

print("no. of neighbors used to predict :", model.n_neighbors)
print("test data :")
display(test_data.head())


no. of neighbors used to predict : 5
test data :


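|  | Survived | Age | Fare | Pclass_1 | Pclass_2 | Pclass_3 | Sex_female | Sex_male | SibSp_0 | SibSp_1 | ... | Parch_0 | Parch_1 | Parch_2 | Parch_3 | Parch_4 | Parch_5 | Parch_6 | Embarked_C | Embarked_Q | Embarked_S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 712 | 0 | 35.0 | 7.125 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 713 | 0 | 20.0 | 7.05 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 714 | 0 | 26.0 | 7.8958 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 715 | 1 | 58.0 | 146.5208 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 716 | 1 | 35.0 | 83.475 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |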


In [11]:
# predict the results
pred_y = model.predict(test_X)
print("predicted survivors :", pred_y[:5])

# score of the model
score = accuracy_score(test_y, pred_y)
print("score of model      :", score * 100, "%")


predicted survivors : [0 0 0 1 1]
score of model      : 72.06703910614524 %
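
The model above used 5 neighbors, which is the scikit-learn default. Since picking K can be difficult, one possible way to compare candidate values is leave-one-out evaluation: predict every point from all the others and measure how often the prediction is right. The sketch below is dependency-free and is not part of the original notebook; the helper names and the toy data (which contains one deliberately mislabeled point) are assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k):
    """Majority vote among the k training points closest to new_point."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], new_point))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(points, labels, k):
    """Predict each point from all the others; return the fraction predicted correctly."""
    correct = 0
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(rest_points, rest_labels, points[i], k) == labels[i]:
            correct += 1
    return correct / len(points)

# Hypothetical toy data: class 0 near (1, 1), class 1 near (5, 5),
# plus one noisy class-1 point at (1.1, 1.1) sitting inside the class-0 group
points = [(1.0, 1.0), (1.5, 1.2), (0.8, 1.6), (1.3, 0.7), (1.1, 1.1),
          (5.0, 5.0), (5.5, 4.8), (4.7, 5.4), (5.2, 5.6)]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]

candidates = [1, 3, 5]  # odd values avoid tied votes with two classes
scores = {k: leave_one_out_accuracy(points, labels, k) for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])  # first best value wins on ties

for k in candidates:
    print(f"k={k}: leave-one-out accuracy = {scores[k]:.2f}")
print("best k:", best_k)
```

On this toy data k=1 scores worse than k=3, because the single noisy point becomes the nearest neighbor of several correctly labeled points. With scikit-learn the same comparison would normally be done with its cross-validation utilities rather than a hand-written loop.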
