## What is KNN?

K-Nearest Neighbors (KNN) is one of the simplest supervised machine learning algorithms used for classification. It stores all available labeled cases and classifies a new data point according to the classes of the stored cases most similar to it, that is, its nearest neighbors.

The example below uses a KNN algorithm to predict whether a glass of wine is red or white. The variables considered in this example include sulphur dioxide and chloride levels.

![chloride.jpg](attachment:chloride.jpg)

K in KNN is a parameter that refers to the number of nearest neighbors in the majority voting process. 

![chloride-level.jpg](attachment:chloride-level.jpg)

Here, we have taken K=5. The algorithm takes a majority vote among the five nearest neighbors and assigns the winning class to the data point. The glass of wine is classified as red, since four of its five nearest neighbors are red.
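The voting step can be sketched in a few lines of Python. This is a minimal illustration, not a full KNN implementation: the function name `majority_vote` is hypothetical, and the label list simply mirrors the four-red, one-white split described above.

```python
from collections import Counter

def majority_vote(neighbor_labels):
    """Return the most common label among the K nearest neighbors."""
    counts = Counter(neighbor_labels)
    label, _ = counts.most_common(1)[0]
    return label

# Labels of the five nearest neighbors: four red, one white (as in the example).
nearest_labels = ["red", "red", "white", "red", "red"]
print(majority_vote(nearest_labels))  # red
```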

### How to Choose the Factor ‘K’?

A KNN algorithm is based on feature similarity. Selecting the right K value is a process called parameter tuning, which is important to achieve higher accuracy.

There is no definitive way to determine the best value of K. It depends on the type of problem you are solving, as well as the business scenario. A K value of five is a common starting point. A K value of one or two makes the prediction sensitive to noise and outliers in the data, which results in overfitting: the algorithm performs well on the training set but noticeably worse on unseen test data.
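One way to compare candidate K values is leave-one-out cross-validation, a technique the text above does not name but which fits the idea of parameter tuning: each point is held out in turn and predicted from the rest. The sketch below is only an illustration. The function names and the toy dataset (including one deliberately mislabeled point) are hypothetical.

```python
import math
from collections import Counter

def predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(
        train,
        key=lambda row: math.hypot(row[0][0] - query[0], row[0][1] - query[1]),
    )
    labels = [label for _, label in ranked[:k]]
    return Counter(labels).most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Hold out each point in turn and predict it from the remaining points."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if predict(rest, point, k) == label:
            hits += 1
    return hits / len(data)

# Hypothetical toy data: two clusters plus one noisy label inside cluster "a".
data = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((3.0, 3.0), "b"), ((3.2, 2.9), "b"), ((2.8, 3.1), "b"),
    ((1.1, 1.0), "b"),  # mislabeled point acting as noise
]

for k in (1, 3):
    print(k, leave_one_out_accuracy(data, k))
```

On this toy data, K=1 lets the single noisy point mislead its neighbors, so K=3 scores higher. Real datasets may behave differently, which is why K is tuned per problem.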

Consider the example below, which predicts the class of a new data point.

If you take K=3, the new data point is classified as a red square.

![k-3.jpg](attachment:k-3.jpg)

But if we consider K=7, the new data point is classified as a blue triangle. This is because the blue triangles outnumber the red squares among its seven nearest neighbors.

![k-7.jpg](attachment:k-7.jpg)
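The effect of K on the outcome can be reproduced with a short sketch. The neighbor labels below are hypothetical and are assumed to be already sorted from nearest to farthest. They are chosen so that red squares win at K=3 and blue triangles win at K=7, and are not read off the figures.

```python
from collections import Counter

# Hypothetical neighbor labels, sorted from nearest to farthest.
labels_by_distance = [
    "red square", "blue triangle", "red square",
    "blue triangle", "blue triangle", "red square", "blue triangle",
]

def vote(labels_sorted, k):
    """Majority vote over the k nearest labels."""
    return Counter(labels_sorted[:k]).most_common(1)[0][0]

print(vote(labels_by_distance, 3))  # red square
print(vote(labels_by_distance, 7))  # blue triangle
```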

A common rule of thumb for choosing K is to take the square root of n (sqrt(n)), where n is the total number of data points. An odd value of K is usually selected to avoid ties between two classes.
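As a rough sketch, the square-root heuristic with the odd-value adjustment might look like this. The function name `suggest_k` and the choice to round up to the next odd number are assumptions made for illustration.

```python
import math

def suggest_k(n_points):
    """Suggest K as the rounded square root of n, bumped to the next odd number."""
    k = max(1, round(math.sqrt(n_points)))
    if k % 2 == 0:
        k += 1  # prefer an odd K to avoid ties between two classes
    return k

print(suggest_k(25))   # 5
print(suggest_k(100))  # 11
```

The result is only a starting point. It is still worth checking nearby values of K against held-out data.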

### When Do We Use the KNN Algorithm?

The KNN algorithm is used in the following scenarios:

1. Data is labeled
2. Data is noise-free
3. Dataset is small, as KNN is a lazy learner that defers all computation to prediction time

## How Does a KNN Algorithm Work?

Consider a dataset that contains two variables: height (cm) and weight (kg). Each point is classified as normal or underweight.

![weight-2.jpg](attachment:weight-2.jpg)

Based on the above data, you need to classify the following set as normal or underweight using the KNN algorithm.

To find the nearest neighbors, we will calculate the Euclidean distance.

The Euclidean distance between two points in the plane with coordinates (x,y) and (a,b) is given by:

![dist.jpg](attachment:dist.jpg)

![distance.jpg](attachment:distance.jpg)
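In plain notation, the distance between (x, y) and (a, b) is sqrt((x - a)^2 + (y - b)^2). A minimal Python version, with the hypothetical function name `euclidean_distance`, could be:

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two points in the plane."""
    (x, y), (a, b) = p, q
    return math.sqrt((x - a) ** 2 + (y - b) ** 2)

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```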

Let us calculate the Euclidean distance from the unknown data point to each known point.

![distance-2.jpg](attachment:distance-2.jpg)

The following table shows the calculated Euclidean distance from the unknown data point to every point in the dataset.

![height.jpg](attachment:height.jpg)

Now, we have a new data point (x1, y1), and we need to determine its class.

![kg.jpg](attachment:kg.jpg)

With K=3, we consider the last three rows of the table, which are the three nearest neighbors of the new data point.

![rows.jpg](attachment:rows.jpg)

Since the majority of these neighbors are classified as normal, the KNN algorithm classifies the data point (57, 170), that is, 57 kg and 170 cm, as normal.
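The whole procedure (compute distances, keep the K nearest, take a majority vote) can be put together in a short sketch. The training rows below are hypothetical stand-ins, since the actual table is only available as an image. They are given as (weight in kg, height in cm) to match the order of the query point (57, 170). The function name `knn_classify` is also an assumption.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(
        train,
        key=lambda row: math.hypot(row[0][0] - query[0], row[0][1] - query[1]),
    )
    top_labels = [label for _, label in ranked[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical training data: ((weight_kg, height_cm), label).
train = [
    ((51, 167), "underweight"),
    ((62, 182), "normal"),
    ((69, 176), "normal"),
    ((64, 173), "normal"),
    ((65, 172), "normal"),
    ((56, 174), "underweight"),
    ((58, 169), "normal"),
    ((57, 173), "normal"),
    ((55, 170), "normal"),
]

print(knn_classify(train, (57, 170), k=3))  # normal
```

Sorting the full dataset for every query is fine for small data like this. It also shows why KNN is usually recommended only for small datasets.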