In [1]:
import math

def classifyAPoint(points,p,k=3):
	'''
	Classify p with the k-nearest-neighbours algorithm.
	Assumes exactly two groups and returns 0 if p belongs
	to group 0, otherwise 1. A tied vote is resolved in
	favour of group 1.

	Parameters -
		points: Dictionary of training points with two keys, 0 and 1.
				Each key maps to a list of the training points that belong to that group.

		p : A tuple, the test data point, of the form (x,y)

		k : number of nearest neighbours to consider, default is 3
	'''

	distance=[]
	for group in points:
		for feature in points[group]:

			#calculate the euclidean distance of p from training points 
			euclidean_distance = math.sqrt((feature[0]-p[0])**2 +(feature[1]-p[1])**2)

			# Add a tuple of form (distance,group) in the distance list
			distance.append((euclidean_distance,group))

	# sort the distance list in ascending order
	# and select first k distances
	distance = sorted(distance)[:k]

	freq1 = 0 #frequency of group 0
	freq2 = 0 #frequency of group 1

	for d in distance:
		if d[1] == 0:
			freq1 += 1
		elif d[1] == 1:
			freq2 += 1

	return 0 if freq1>freq2 else 1

# driver function
def main():

	# Dictionary of training points with two keys, 0 and 1
	# key 0 holds the points that belong to class 0
	# key 1 holds the points that belong to class 1

	points = {0:[(1,12),(2,5),(3,6),(3,10),(3.5,8),(2,11),(2,9),(1,7)],
			1:[(5,3),(3,2),(1.5,9),(7,2),(6,1),(3.8,1),(5.6,4),(4,2),(2,5)]}

	# testing point p(x,y)
	p = (2.5,7)

	# Number of neighbours 
	k = 3

	print("The value classified to unknown point is: {}".\
		format(classifyAPoint(points,p,k)))

if __name__ == '__main__':
	main()


The value classified to unknown point is: 0


# Another Example

In [3]:
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the iris dataset
data = load_iris()
X = data.data
y = data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the KNN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)

# Train the model
knn.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')




Accuracy: 100.00%


K-Nearest Neighbors (KNN) is a simple yet effective machine learning algorithm used for both classification and regression tasks. Here's an overview of KNN: its working principles, key parameters, advantages, disadvantages, and applications.

### What is K-Nearest Neighbors (KNN)?

K-Nearest Neighbors is a non-parametric, lazy learning algorithm. Non-parametric means that it makes no explicit assumptions about the form of the function it is trying to learn, and lazy learning means that the algorithm does not learn an explicit model during training but instead stores the training data and performs computations only when it needs to make predictions.

### How Does KNN Work?

1. **Data Storage**: The algorithm stores all the training data points.
2. **Distance Calculation**: When a prediction is required for a new data point, the algorithm calculates the distance between the new data point and all the stored data points. Common distance metrics include Euclidean distance, Manhattan distance, and Minkowski distance.
3. **Neighbor Identification**: The algorithm identifies the 'k' training data points that are closest to the new data point.
4. **Voting/Prediction**:
   - **Classification**: For classification tasks, the algorithm assigns the most common class among the 'k' nearest neighbors to the new data point.
   - **Regression**: For regression tasks, the algorithm averages the values of the 'k' nearest neighbors to predict the value for the new data point.
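
The four steps above can be sketched in plain Python. This is a minimal illustration rather than how scikit-learn implements KNN; the helper names `k_nearest`, `knn_classify`, and `knn_regress` and the toy data are made up for this example, and `math.dist` assumes Python 3.8 or later.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k (features, target) pairs closest to query."""
    # Steps 2 and 3: compute every distance, then keep the k smallest.
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]

def knn_classify(train, query, k=3):
    """Step 4 (classification): majority vote among the k neighbours."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Step 4 (regression): mean target value of the k neighbours."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)

# Step 1: "training" is just keeping the labelled points around.
labelled = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
            ((6, 6), "b"), ((6, 7), "b"), ((7, 6), "b")]
valued = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]

print(knn_classify(labelled, (2, 2)))  # the three closest points are all "a"
print(knn_regress(valued, (2,)))       # mean of 10.0, 20.0 and 30.0
```

Sorting every distance keeps the sketch short; for large training sets a partial selection or a spatial index would be the usual choice.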

### Key Parameters in KNN

1. **Number of Neighbors (k)**: The number of nearest neighbors to consider for making the prediction. A small 'k' makes predictions sensitive to noise in individual training points (overfitting), while a large 'k' smooths the decision boundary and can blur the distinction between classes (underfitting).
2. **Distance Metric**: The measure of similarity or dissimilarity between data points. Common metrics include:
   - **Euclidean Distance**: \( \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \)
   - **Manhattan Distance**: \( \sum_{i=1}^{n} |x_i - y_i| \)
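   - **Minkowski Distance**: \( \left(\sum_{i=1}^{n} |x_i - y_i|^p\right)^{1/p} \), which reduces to Manhattan distance for \( p = 1 \) and to Euclidean distance for \( p = 2 \)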
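
The three metrics above can be covered by one small function, since the Minkowski formula gives the Manhattan distance at p = 1 and the Euclidean distance at p = 2. The function name `minkowski` and the sample points are chosen only for this sketch.

```python
def minkowski(x, y, p=2):
    """Minkowski distance of order p between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

a, b = (0, 0), (3, 4)
print(minkowski(a, b, p=1))  # Manhattan: |3| + |4| = 7.0
print(minkowski(a, b, p=2))  # Euclidean: sqrt(9 + 16) = 5.0
```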

### Advantages of KNN

1. **Simplicity**: KNN is easy to understand and implement.
2. **No Training Phase**: Since it is a lazy learner, training amounts to storing the data. There is no model-fitting step, so new training points can be added without retraining.
3. **Versatility**: Can be used for both classification and regression tasks.
4. **Adaptability**: Handles binary and multi-class classification problems without any change to the algorithm.

### Disadvantages of KNN

1. **Computationally Intensive**: Prediction can be slow for large datasets, since a brute-force search computes the distance from each query point to every training point.
2. **Memory Intensive**: Requires storing all the training data, which can be impractical for large datasets.
3. **Curse of Dimensionality**: As the number of dimensions increases, the distance between points becomes less meaningful, which can degrade the algorithm's performance.
4. **Sensitivity to Noise**: KNN is sensitive to noisy data and outliers.
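
The curse of dimensionality (item 3) can be observed numerically. The sketch below assumes points drawn uniformly from the unit hypercube, which is an illustrative choice and not a property of any particular dataset, and compares the nearest and farthest distances from one random query point. The function name `nearest_to_farthest_ratio` is hypothetical.

```python
import math
import random

def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Ratio of the nearest to the farthest distance from one random query."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return min(distances) / max(distances)

# Print the ratio for a few dimensionalities.
for dim in (2, 10, 100, 1000):
    print(dim, round(nearest_to_farthest_ratio(dim), 3))
```

A ratio close to 1 means the nearest neighbor is hardly closer than the farthest point, so the ratio should be expected to rise as `dim` grows, which is the sense in which distances become less meaningful.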

### Applications of KNN

1. **Image Recognition**: Used in image classification tasks where the similarity between images is measured.
2. **Recommendation Systems**: Utilized to recommend products based on the similarity of users' preferences.
3. **Finance**: Applied in predicting stock prices and credit risk assessment.
4. **Medical Diagnosis**: Used for diagnosing diseases based on patient data and historical records.

### Example of KNN in Python

Here's a simple example using the `scikit-learn` library to implement KNN for a classification task:

```python
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the iris dataset
data = load_iris()
X = data.data
y = data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the KNN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)

# Train the model
knn.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')
```

This example demonstrates how to load a dataset, split it into training and testing sets, train a KNN classifier, make predictions, and evaluate the model's accuracy. Adjusting the number of neighbors (k) and experimenting with different distance metrics can help optimize the model for specific datasets and tasks.

### Explanation

1. **Data Loading**: We load the Iris dataset using `load_iris()` from `sklearn.datasets`.
2. **Data Splitting**: We split the dataset into training and testing sets using `train_test_split()`, holding out 30% of the samples for testing.
3. **Model Initialization**: We initialize the KNN classifier with `n_neighbors=3`.
4. **Model Training**: We train the KNN model using the training data.
5. **Predictions**: We make predictions on the test data.
6. **Evaluation**: We evaluate the model's accuracy with `accuracy_score()`.

You can modify the number of neighbors (`n_neighbors`) and experiment with different distance metrics to see how they affect the model's performance. This example provides a solid foundation for understanding and implementing the KNN algorithm in Python.
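
One way to run that experiment is to score each setting with cross-validation instead of a single train/test split. The sketch below is one possible approach, not a recommended recipe: the candidate values of k, the two metrics, and the choice of 5 folds are arbitrary assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validated accuracy for each (metric, k) pair
results = {}
for metric in ("euclidean", "manhattan"):
    for k in (1, 3, 5, 7, 9, 11):
        knn = KNeighborsClassifier(n_neighbors=k, metric=metric)
        results[(metric, k)] = cross_val_score(knn, X, y, cv=5).mean()

# Pick the setting with the highest mean accuracy
best_metric, best_k = max(results, key=results.get)
print(f"Best setting: metric={best_metric}, k={best_k}, "
      f"mean accuracy={results[(best_metric, best_k)]:.3f}")
```

On a dataset as small as Iris the differences between settings are likely to be slight, so the selected values should be read as a demonstration of the procedure, not as a general result.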