# K-Nearest Neighbors
## Scikit-learn implementation
Let's implement K-nearest neighbors (KNN) using scikit-learn.

We will use the iris dataset from sklearn.datasets for our KNN implementation. Let's load the independent variables into X and the target variable into y.

In [1]:
## load the dataset
from sklearn.datasets import load_iris
dataset = load_iris()

In [2]:
# X - independent variables
X = dataset.data
 
# y - target variable
y = dataset.target

Scale X using StandardScaler before applying KNN. The algorithm is based on a distance metric, so features on larger scales would otherwise dominate the distance.

In [3]:
## pre-processing
# standardize the data to make sure each feature contributes equally 
# to the distance
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
X_processed = ss.fit_transform(X)

In [4]:
# Dataset after scaling
print('Dataset after scaling')
print('X_processed (first 5 rows)')
print(X_processed[:5, :])

Dataset after scaling
X_processed (first 5 rows)
[[-0.90068117  1.01900435 -1.34022653 -1.3154443 ]
 [-1.14301691 -0.13197948 -1.34022653 -1.3154443 ]
 [-1.38535265  0.32841405 -1.39706395 -1.3154443 ]
 [-1.50652052  0.09821729 -1.2833891  -1.3154443 ]
 [-1.02184904  1.24920112 -1.34022653 -1.3154443 ]]
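
To make the scaling step concrete, here is a minimal sketch that reproduces the standardization by hand with NumPy and compares it to StandardScaler. The names `X_manual`, `X_sklearn` and `matches` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# z-score each column: subtract the column mean, divide by the column
# standard deviation (population std, ddof=0, as StandardScaler uses)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# the same transformation done by scikit-learn
X_sklearn = StandardScaler().fit_transform(X)

# True when both approaches agree up to floating-point error
matches = np.allclose(X_manual, X_sklearn)
print(matches)
```

After this transformation every feature has mean 0 and unit variance, which is why no single feature dominates the distance computation.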


In [7]:
# Print first 5 values of y
print('y (first 5 values)')
print(y[:5]) 

y (first 5 values)
[0 0 0 0 0]


Split the data into train and test sets so that the trained model can be evaluated on data it has not seen.

In [8]:
## split the dataset into train and test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_processed, y, test_size=0.3, random_state=42)
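
Note that the notebook fits StandardScaler on the full dataset before splitting, so the test rows contribute to the mean and standard deviation used for scaling. A common alternative, sketched here under the assumption that you want the test set kept fully separate, is to split first and fit the scaler on the training rows only. The variable names (`X_tr`, `X_te`, `scaler`, and so on) are illustrative and chosen so they do not overwrite the notebook's variables.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_raw, y_raw = load_iris(return_X_y=True)

# split the unscaled data first (same split settings as the notebook)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.3, random_state=42
)

# learn the scaling parameters from the training rows only
scaler = StandardScaler().fit(X_tr)

# apply the same training-set parameters to both subsets
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)  # (105, 4) (45, 4)
```

On a small, well-behaved dataset such as iris the difference is usually minor, but the split-then-scale order gives a more honest estimate of performance on unseen data.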

Now, let's fit the KNN model. We will set n_neighbors=5 and use the Minkowski metric with p=2, which is the Euclidean distance.

In [10]:
## fit k nearest neighbors model
from sklearn.neighbors import KNeighborsClassifier
# minkowski metric with p=2 is the Euclidean distance
model = KNeighborsClassifier(n_neighbors=5, metric="minkowski", p=2)
 

In [11]:
# fit the model with train set
model.fit(X_train, y_train)
print('\nDisplay all arguments scikit-learn is using for the KNN model')
print(model)


Display all arguments scikit-learn is using for the KNN model
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform')
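
To show what the classifier does with these settings (Euclidean distance, uniform weights), here is a rough from-scratch sketch of KNN prediction in NumPy. The function `knn_predict` is a hypothetical helper written for illustration; it is not scikit-learn's implementation, which may use tree-based search and can break ties differently.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def knn_predict(X_train, y_train, X_query, k=5):
    """Predict labels by majority vote among the k closest training points."""
    # pairwise Euclidean distances, shape (n_query, n_train)
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # indices of the k nearest training points for each query row
    nearest = np.argsort(dists, axis=1)[:, :k]
    # majority vote; argmax on the counts breaks ties toward the smaller label
    return np.array([np.bincount(y_train[idx]).argmax() for idx in nearest])

# rebuild the same scaled train/test split as the notebook
X_all, y_all = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X_all)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_scaled, y_all, test_size=0.3, random_state=42
)

manual_pred = knn_predict(X_tr, y_tr, X_te, k=5)
manual_accuracy = (manual_pred == y_te).mean()
print(manual_accuracy)
```

There is no training step beyond storing the data: all the work happens at prediction time, when distances to every training point are computed.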


Finally, let's evaluate the model on the test set. For a classifier, score returns the mean accuracy of the predictions.

In [12]:
## evaluate
score = model.score(X_test, y_test)
print('\nScore:', score)


Score: 1.0
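
A single split with 45 test samples gives a fairly coarse estimate. As a sketch of one way to check it, the block below confirms that `score` matches `accuracy_score` on the predictions and then runs 5-fold cross-validation. The pipeline `clf` and the choice of `cv=5` are assumptions made for illustration, not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.3, random_state=42
)

# scaler and classifier chained so scaling is learned from training rows only
clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(X_tr, y_tr)

# score() is the fraction of correctly predicted test labels
test_score = clf.score(X_te, y_te)
same = abs(test_score - accuracy_score(y_te, clf.predict(X_te))) < 1e-12

# 5-fold cross-validation returns one accuracy value per fold
cv_scores = cross_val_score(clf, X_all, y_all, cv=5)
print(same, cv_scores.mean(), cv_scores.std())
```

The spread of the fold scores gives a sense of how much the accuracy depends on which samples land in the test set.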
