## Day Objectives
- K Nearest Neighbors
- Logistic Regression

### Classification Models
- K Nearest Neighbors
- Logistic Regression
- Support Vector Machine
- Decision Tree
- Random Forest

### K Nearest Neighbors
- Simple and easy to implement.
- Main disadvantage: prediction becomes slow on large datasets, because each new point has to be compared against the stored training points.

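### How does KNN work
- Step 1: Choose K, the number of neighbors (K = 3, 4, 5, 6, ...).
- Step 2: Calculate the distance between the new data point and every training point, and keep the K nearest.
    - Distance formula (Euclidean, two features): ***sqrt( (X2 - X1)^2 + (Y2 - Y1)^2 )***; with more features, add one squared difference per feature.
- Step 3: Assign the new data point to the class that is most common among its K nearest neighbors.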
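
The steps above can be sketched in plain Python. This is a minimal illustration, not how scikit-learn implements KNN internally. The function name `knn_predict` is hypothetical, and the nine training rows are a small hand-picked subset of the Iris data; the species for rows whose label is not printed in this notebook are assumed from the dataset's setosa / versicolor / virginica ordering.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the new point to every training point
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Step 3: the most common class among the neighbors wins
    return Counter(nearest_labels).most_common(1)[0][0]

# Small illustrative subset (sepal_length, sepal_width, petal_length, petal_width)
train_points = [
    (5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2),
    (6.1, 2.8, 4.7, 1.2), (6.1, 3.0, 4.6, 1.4), (5.6, 2.5, 3.9, 1.1),
    (6.3, 2.5, 5.0, 1.9), (7.7, 3.0, 6.1, 2.3), (6.4, 2.8, 5.6, 2.1),
]
train_labels = (
    ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 3 + ["Iris-virginica"] * 3
)

print(knn_predict(train_points, train_labels, (5.1, 3.8, 1.5, 0.3), k=3))
# Iris-setosa
```

Sorting all distances is fine for a toy example; it is also the reason plain KNN slows down as the training set grows.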

In [1]:
import numpy as np
import pandas as pd

In [3]:
df = pd.read_csv('https://raw.githubusercontent.com/nagamounika5/Datasets/master/IRIS.csv')
df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [4]:
df['species'].value_counts()

Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: species, dtype: int64

In [6]:
X = df.drop(['species'], axis=1)  # Independent variables (features)
Y = df['species']  # Dependent variable (target)

In [8]:
X.head(3)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2


In [10]:
Y

0         Iris-setosa
1         Iris-setosa
2         Iris-setosa
3         Iris-setosa
4         Iris-setosa
            ...      
145    Iris-virginica
146    Iris-virginica
147    Iris-virginica
148    Iris-virginica
149    Iris-virginica
Name: species, Length: 150, dtype: object

### Split the dataset for testing and training

In [11]:
from sklearn.model_selection import train_test_split

In [12]:
xtrain, xtest, ytrain, ytest = train_test_split(X, Y, test_size = 0.2, random_state=1)

In [14]:
df.head(3)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa


In [13]:
df.shape

(150, 5)

In [15]:
xtrain.shape, xtest.shape

((120, 4), (30, 4))

In [16]:
ytrain.shape, ytest.shape

((120,), (30,))
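
The shapes above follow from `test_size = 0.2`: 20% of the 150 rows (30) go to the test set and the remaining 120 to the training set. A rough sketch of the idea, using a hypothetical helper `split_indices`; the shuffling here is not the same as scikit-learn's, so the selected row indices will differ from the ones `train_test_split` picks:

```python
import random

def split_indices(n_rows, test_size=0.2, seed=1):
    """Shuffle row indices, then cut off a test_size fraction as the test set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)   # fixed seed makes the split repeatable
    n_test = round(n_rows * test_size)     # 150 * 0.2 -> 30 rows
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(150)
print(len(train_idx), len(test_idx))
# 120 30
```

Fixing the seed plays the same role as `random_state=1` in the notebook: rerunning the cell gives the same split every time.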

In [17]:
xtrain

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
91,6.1,3.0,4.6,1.4
135,7.7,3.0,6.1,2.3
69,5.6,2.5,3.9,1.1
128,6.4,2.8,5.6,2.1
114,5.8,2.8,5.1,2.4
...,...,...,...,...
133,6.3,2.8,5.1,1.5
137,6.4,3.1,5.5,1.8
72,6.3,2.5,4.9,1.5
140,6.7,3.1,5.6,2.4


In [18]:
xtest

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
14,5.8,4.0,1.2,0.2
98,5.1,2.5,3.0,1.1
75,6.6,3.0,4.4,1.4
16,5.4,3.9,1.3,0.4
131,7.9,3.8,6.4,2.0
56,6.3,3.3,4.7,1.6
141,6.9,3.1,5.1,2.3
44,5.1,3.8,1.9,0.4
29,4.7,3.2,1.6,0.2
120,6.9,3.2,5.7,2.3


### Apply KNN classifier

In [19]:
from sklearn.neighbors import KNeighborsClassifier

In [20]:
knn = KNeighborsClassifier(n_neighbors=3)

In [21]:
knn.fit(xtrain, ytrain)  # training

KNeighborsClassifier(n_neighbors=3)

In [22]:
Y_predict = knn.predict(xtest)  # predict labels for the test set
Y_predict

array(['Iris-setosa', 'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa',
       'Iris-virginica', 'Iris-versicolor', 'Iris-virginica',
       'Iris-setosa', 'Iris-setosa', 'Iris-virginica', 'Iris-versicolor',
       'Iris-setosa', 'Iris-virginica', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-setosa', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa',
       'Iris-virginica', 'Iris-versicolor', 'Iris-setosa', 'Iris-setosa',
       'Iris-versicolor', 'Iris-virginica'], dtype=object)

In [25]:
xtest.tail(3)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
19,5.1,3.8,1.5,0.3
73,6.1,2.8,4.7,1.2
146,6.3,2.5,5.0,1.9


In [26]:
ytest[-3:]

19         Iris-setosa
73     Iris-versicolor
146     Iris-virginica
Name: species, dtype: object

In [27]:
knn.predict([[5.1, 3.8, 1.5, 0.3]])

array(['Iris-setosa'], dtype=object)

In [29]:
knn.predict([[3.1, 5.8, 5.9, 4.9]])

array(['Iris-virginica'], dtype=object)

### Evaluation metrics for Classification models

In [30]:
from sklearn.metrics import accuracy_score

In [31]:
accuracy_score(ytest, Y_predict)

1.0
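
Accuracy is the fraction of predictions that match the true labels, so a score of 1.0 means all 30 test rows were classified correctly. A minimal hand-written version, where the function name `accuracy` and the two short label lists are made up for illustration:

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of positions where prediction equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("both label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Toy labels: three of the four predictions are right
y_true = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-virginica"]
y_pred = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-versicolor"]

print(accuracy(y_true, y_pred))
# 0.75
```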

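- If the accuracy is not satisfactory, try a different K value. A larger K smooths out noisy neighbors, but it does not always improve accuracy, so compare several values.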
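
One way to compare K values is to score each one with cross-validation rather than a single train/test split. The sketch below is an assumption-laden example: it uses the copy of Iris bundled with scikit-learn (`load_iris`) instead of the CSV loaded above, and the candidate K values and `cv=5` are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Bundled Iris data (numeric class labels), used here to avoid a download
X_iris, y_iris = load_iris(return_X_y=True)

mean_scores = {}
for k in (1, 3, 5, 7, 9, 11):
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation: five accuracy scores, one per held-out fold
    scores = cross_val_score(model, X_iris, y_iris, cv=5)
    mean_scores[k] = scores.mean()

# K with the highest mean accuracy across the folds
best_k = max(mean_scores, key=mean_scores.get)

for k, score in mean_scores.items():
    print(f"k={k:2d}  mean CV accuracy={score:.3f}")
print("best k:", best_k)
```

On a dataset as small and clean as Iris the scores for different K values tend to be close together, so the "best" K here should be read as a rough guide.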

In [32]:
knn2 = KNeighborsClassifier(n_neighbors = 5)

In [33]:
knn2.fit(xtrain, ytrain)

KNeighborsClassifier()

In [34]:
Y_predict2 = knn2.predict(xtest)

In [35]:
accuracy_score(ytest, Y_predict2)

1.0

### Task
- Load the dataset (Fish species) from the repository.
- Apply the KNN classifier to that data:
    - First take k=3 and check the accuracy.
    - Then take k=5 and check the accuracy.