## Today's Content
- K-Nearest Neighbours Classifier
- Evaluation Metrics

### KNN Classifier
- KNN is easy to implement.
- KNN assumes that similar data points lie close to each other.

### How does KNN work?
- Step 1: Choose K, the number of neighbours (K = 3, 4, 5, ...). Odd values are preferred because they avoid tied votes between two classes.
- Step 2: Calculate the Euclidean distance from the new data point to every point in the training data.
    - **Formula** (two features x and y): sqrt((x2 - x1)^2 + (y2 - y1)^2)
- Step 3: Assign the new data point to the majority class among its K nearest neighbours.
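The three steps above can be sketched from scratch in a few lines. This is a minimal illustration, not scikit-learn's implementation; the helper names `euclidean_distance` and `knn_predict` and the toy height/weight data are hypothetical. The distance function works for any number of features, so it generalises the two-feature formula.

```python
import numpy as np
from collections import Counter

def euclidean_distance(a, b):
    """Square root of the summed squared differences, for any number of features."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

def knn_predict(X_train, y_train, x_new, k=3):
    """Hypothetical helper: majority vote among the k nearest training points."""
    # Step 2: distance from the new point to every training point
    distances = [euclidean_distance(x, x_new) for x in X_train]
    # Positions of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Step 3: majority vote over the neighbours' labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: [height in cm, weight in kg] -> shirt size
X_train = [[158, 58], [160, 59], [163, 61], [170, 68], [172, 70], [175, 72]]
y_train = ['m', 'm', 'm', 'l', 'l', 'l']

print(knn_predict(X_train, y_train, [161, 60], k=3))
print(knn_predict(X_train, y_train, [171, 69], k=3))
```

Here [161, 60] has its three nearest neighbours among the 'm' rows, and [171, 69] among the 'l' rows.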

### Advantages and disadvantages
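- Advantage: KNN handles multi-class (multinomial) classification naturally, because the majority vote works for any number of classes.
- Disadvantage: with a huge dataset, prediction is slow. KNN does almost no work at training time and defers the distance computations to prediction time.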

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv('https://raw.githubusercontent.com/nagamounika5/Datasets/master/Tshirt_size.csv')
df.head()

|   | Height | Weight | Shirt_Size |
|---|--------|--------|------------|
| 0 | 158    | 58     | m          |
| 1 | 158    | 59     | m          |
| 2 | 158    | 63     | m          |
| 3 | 160    | 59     | m          |
| 4 | 160    | 60     | m          |


In [3]:
df.shape

(18, 3)

In [4]:
df.isnull().sum()

Height        0
Weight        0
Shirt_Size    0
dtype: int64

In [6]:
df['Shirt_Size'].value_counts()

l    11
m     7
Name: Shirt_Size, dtype: int64

In [10]:
X = df[['Height','Weight']] ## Input variable
Y = df['Shirt_Size'] ## Target Variable

### Apply KNN classifier

In [7]:
from sklearn.neighbors import KNeighborsClassifier

In [8]:
knn = KNeighborsClassifier()
knn

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

In [9]:
knn = KNeighborsClassifier(n_neighbors = 3)
knn

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=3, p=2,
                     weights='uniform')
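The printed parameters show the default distance `metric='minkowski'` with `p=2`. Minkowski distance with p=2 is exactly the Euclidean distance from Step 2, and p=1 gives the Manhattan distance. A small numpy sketch of the formula with hypothetical points (not scikit-learn's internal code):

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance: (sum of |a_i - b_i|^p) ^ (1/p)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b) ** p) ** (1 / p)

a, b = [158, 58], [161, 62]          # hypothetical points, differences 3 and 4
euclid = minkowski(a, b, p=2)        # sqrt(3^2 + 4^2)
manhattan = minkowski(a, b, p=1)     # 3 + 4
print(euclid, manhattan)
```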

In [11]:
knn.fit(X,Y)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=3, p=2,
                     weights='uniform')

In [12]:
Y_prediction = knn.predict(X)
Y_prediction

array(['m', 'm', 'm', 'm', 'm', 'm', 'm', 'l', 'l', 'l', 'l', 'l', 'l',
       'l', 'l', 'l', 'l', 'l'], dtype=object)

In [14]:
X.head()

|   | Height | Weight |
|---|--------|--------|
| 0 | 158    | 58     |
| 1 | 158    | 59     |
| 2 | 158    | 63     |
| 3 | 160    | 59     |
| 4 | 160    | 60     |


In [16]:
Y[:5]

0    m
1    m
2    m
3    m
4    m
Name: Shirt_Size, dtype: object

In [17]:
knn.predict([[160,58]])

array(['m'], dtype=object)
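To see which training rows drove a prediction like the one above, a fitted `KNeighborsClassifier` provides `kneighbors`, which returns the distances to, and the row indices of, the nearest training points. A minimal sketch on hypothetical toy data (not the T-shirt CSV):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: [height in cm, weight in kg] -> shirt size
X_toy = np.array([[158, 58], [160, 59], [163, 61], [170, 68], [172, 70], [175, 72]])
y_toy = np.array(['m', 'm', 'm', 'l', 'l', 'l'])

knn_toy = KNeighborsClassifier(n_neighbors=3)
knn_toy.fit(X_toy, y_toy)

new_point = [[161, 60]]
distances, indices = knn_toy.kneighbors(new_point)

print(indices[0])          # row indices of the 3 nearest training points
print(distances[0])        # their distances, smallest first
print(y_toy[indices[0]])   # their labels, which go to the majority vote
print(knn_toy.predict(new_point))
```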

### Evaluation Metrics
- Confusion matrix
- accuracy_score

In [18]:
## Confusion matrix
##===================

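### Example: 50 cats, 50 birds, 50 fish (150 images in total)
#-----------------------------------------
## Rows = actual class, columns = predicted class (the convention sklearn uses)

##          cats           birds         fish
##   cats    50             0             0
##   birds   0              45            5
##   fish    0              4            46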
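The example matrix can be reproduced with `sklearn.metrics.confusion_matrix` by building label lists with the stated counts. The `labels` argument fixes the row and column order; without it, scikit-learn sorts the labels, which is why the T-shirt matrix below lists 'l' before 'm'.

```python
from sklearn.metrics import confusion_matrix

labels = ['cats', 'birds', 'fish']

# Actual classes: 50 images of each animal
y_true = ['cats'] * 50 + ['birds'] * 50 + ['fish'] * 50
# Predictions: all cats correct; 45 birds correct, 5 called fish;
# 4 fish called birds, 46 fish correct
y_pred = (['cats'] * 50
          + ['birds'] * 45 + ['fish'] * 5
          + ['birds'] * 4 + ['fish'] * 46)

# Rows = actual class, columns = predicted class, in the order of `labels`
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
```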

In [19]:
## Accuracy_score
##====================

##                         TP + TN
## accuracy_score = --------------------------
##                     TP + TN + FP + FN
##
## i.e. accuracy = correct predictions / all predictions
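The TP/TN/FP/FN formula is written for two classes. With more classes, the correct predictions are the diagonal of the confusion matrix, so accuracy is the diagonal sum divided by the sum of all cells. Worked on the cats/birds/fish example: (50 + 45 + 46) / 150 = 141 / 150 = 0.94. The binary counts below are hypothetical.

```python
import numpy as np

cm = np.array([[50, 0, 0],
               [0, 45, 5],
               [0, 4, 46]])

# Correct predictions sit on the diagonal; every cell counts one prediction
accuracy = np.trace(cm) / cm.sum()          # 141 / 150
print(accuracy)

# Binary case with the formula above (hypothetical counts)
tp, tn, fp, fn = 40, 45, 5, 10
binary_accuracy = (tp + tn) / (tp + tn + fp + fn)   # 85 / 100
print(binary_accuracy)
```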

In [20]:
from sklearn.metrics import confusion_matrix
confusion_matrix(Y, Y_prediction)

array([[11,  0],
       [ 0,  7]], dtype=int64)

In [21]:
from sklearn.metrics import accuracy_score
accuracy_score(Y, Y_prediction)

1.0

In [22]:
df2 = pd.read_csv('https://raw.githubusercontent.com/nagamounika5/Datasets/master/IRIS.csv')

In [23]:
df2.head()

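|   | sepal_length | sepal_width | petal_length | petal_width | species     |
|---|--------------|-------------|--------------|-------------|-------------|
| 0 | 5.1          | 3.5         | 1.4          | 0.2         | Iris-setosa |
| 1 | 4.9          | 3.0         | 1.4          | 0.2         | Iris-setosa |
| 2 | 4.7          | 3.2         | 1.3          | 0.2         | Iris-setosa |
| 3 | 4.6          | 3.1         | 1.5          | 0.2         | Iris-setosa |
| 4 | 5.0          | 3.6         | 1.4          | 0.2         | Iris-setosa |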


In [24]:
df2.shape

(150, 5)

In [27]:
df2['species'].value_counts()

Iris-setosa        50
Iris-virginica     50
Iris-versicolor    50
Name: species, dtype: int64

In [33]:
X2 = df2.iloc[:,[0,1,2,3]]
Y2 = df2['species']

In [35]:
X2.head()

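|   | sepal_length | sepal_width | petal_length | petal_width |
|---|--------------|-------------|--------------|-------------|
| 0 | 5.1          | 3.5         | 1.4          | 0.2         |
| 1 | 4.9          | 3.0         | 1.4          | 0.2         |
| 2 | 4.7          | 3.2         | 1.3          | 0.2         |
| 3 | 4.6          | 3.1         | 1.5          | 0.2         |
| 4 | 5.0          | 3.6         | 1.4          | 0.2         |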


In [36]:
Y2[:5]

0    Iris-setosa
1    Iris-setosa
2    Iris-setosa
3    Iris-setosa
4    Iris-setosa
Name: species, dtype: object

### Apply KNN for IRIS dataset

In [37]:
from sklearn.neighbors import KNeighborsClassifier

In [38]:
knn2 = KNeighborsClassifier(n_neighbors = 3)
knn2

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=3, p=2,
                     weights='uniform')

In [39]:
knn2.fit(X2,Y2)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=3, p=2,
                     weights='uniform')

In [41]:
Y_prediction2 = knn2.predict(X2)
Y_prediction2

array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versic

In [42]:
from sklearn.metrics import accuracy_score
accuracy_score(Y2, Y_prediction2)

0.96

### Increase K value

In [44]:
from sklearn.neighbors import KNeighborsClassifier

In [45]:
knn3 = KNeighborsClassifier(n_neighbors = 5)
knn3

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

In [46]:
knn3.fit(X2, Y2)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')

In [47]:
Y_prediction3 = knn3.predict(X2)

In [48]:
accuracy_score(Y2,Y_prediction3)

0.9666666666666667
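Both accuracies above (K = 3 and K = 5) are measured on the same rows the model was trained on, which tends to overstate performance: with K = 1, each training point would typically be its own nearest neighbour. A minimal sketch that compares several K values on a held-out test split instead. It assumes scikit-learn's bundled copy of the iris data (`load_iris`) in place of the IRIS.csv file, so the numbers will not match the ones above; `test_size=0.3` and `random_state=42` are arbitrary choices. The same loop structure fits the Task below.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Assumption: sklearn's bundled iris data stands in for IRIS.csv
X_iris, y_iris = load_iris(return_X_y=True)

# Hold out 30% of the rows; stratify keeps the class balance in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=42, stratify=y_iris)

scores = {}
for k in [3, 5, 7]:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)          # learn only from the training rows
    y_pred = model.predict(X_test)       # predict rows the model has not seen
    scores[k] = accuracy_score(y_test, y_pred)
    print(f"K = {k}: test accuracy = {scores[k]:.3f}")
```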

In [49]:
knn3.predict([[5.1,3.5,1.4,0.2]])

array(['Iris-setosa'], dtype=object)

In [50]:
knn3.predict([[3.1,3.5,5.4,0.2]])

array(['Iris-versicolor'], dtype=object)

### Task:
- Apply the KNN model to the dataset below. Use K = 3, 5 and 7, and find the accuracy for each K value.
- https://raw.githubusercontent.com/nagamounika5/Datasets/master/Student_PassOrFail.csv