This repository contains a Python implementation of a K-Nearest Neighbors (KNN) classifier from scratch. The KNN classifier is applied to the "BankNote_Authentication" dataset, which consists of four features (variance, skewness, curtosis, and entropy) and a class attribute indicating whether a banknote is genuine or forged.
The dataset used for training and testing the KNN classifier is provided in the "BankNote_Authentication.csv" file. The dataset is loaded into a pandas DataFrame and then shuffled to ensure randomization during training and testing.
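This loading step could look roughly like the following sketch (assuming pandas; the random seed is illustrative, not necessarily the one used in the repository):

```python
import pandas as pd

# Load the banknote dataset: four feature columns plus a class column.
df = pd.read_csv("BankNote_Authentication.csv")

# Shuffle the rows so the later 70/30 split is randomized;
# reset_index keeps positional indexing simple after the shuffle.
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
```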
The KNN classifier is implemented in the `KNN_Classifier` class, which takes the following inputs during initialization:
- `x_train`: the training data features.
- `y_train`: the training data labels.
- `x_test`: the test data features.
- `k`: the number of nearest neighbors to consider.
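A minimal sketch of what this constructor could look like (attribute names are illustrative and may differ from the repository's own):

```python
import numpy as np

class KNN_Classifier:
    def __init__(self, x_train, y_train, x_test, k):
        # Store the (already normalized) training features and labels,
        # the test features, and the neighborhood size K.
        self.x_train = np.asarray(x_train, dtype=float)
        self.y_train = np.asarray(y_train)
        self.x_test = np.asarray(x_test, dtype=float)
        self.k = k
```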
The KNN classifier consists of the following methods:
This method calculates the Euclidean distance between a training row and a test row. Given two input vectors, it computes the distance according to the formula:
distance = sqrt(sum((x_train_row[i] - x_test_row[i])^2))
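A sketch of this computation, shown as a standalone function for brevity (the repository defines it as a method of `KNN_Classifier`):

```python
import numpy as np

def euclidean_distance(x_train_row, x_test_row):
    # Square root of the sum of squared per-feature differences.
    diff = np.asarray(x_train_row, dtype=float) - np.asarray(x_test_row, dtype=float)
    return np.sqrt(np.sum(diff ** 2))
```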
This method predicts the class for each test point based on the K-nearest neighbors in the training data. For each test point, the Euclidean distance is calculated between the test point and all training points. The K-nearest neighbors with the smallest distances are determined, and their corresponding class labels are counted. If there is a tie in the number of votes for different classes, the tie is broken in favor of the class that comes first in the training data.
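One way the voting and tie-breaking described above could be written (a sketch that assumes the attributes from the constructor shown earlier; the repository's actual code may differ):

```python
import numpy as np

def predict(self):
    predictions = []
    for test_row in self.x_test:
        # Euclidean distance from this test point to every training point.
        dists = np.sqrt(np.sum((self.x_train - test_row) ** 2, axis=1))
        # Stable sort so earlier training rows win exact distance ties.
        nearest = np.argsort(dists, kind="stable")[:self.k]
        labels = self.y_train[nearest]
        # Count the votes for each class among the K neighbors.
        classes, votes = np.unique(labels, return_counts=True)
        top = classes[votes == votes.max()]
        if len(top) == 1:
            predictions.append(top[0])
        else:
            # Tie-break in favor of the class that appears first in y_train.
            first_pos = {c: int(np.argmax(self.y_train == c)) for c in top}
            predictions.append(min(top, key=lambda c: first_pos[c]))
    return np.array(predictions)
```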
This method calculates the accuracy of the KNN classifier by comparing the predicted labels with the true labels for the test data. The accuracy is computed as the ratio of correctly classified instances to the total number of instances in the test set.
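A matching sketch of the accuracy computation:

```python
import numpy as np

def accuracy(y_true, y_pred):
    # Fraction of test instances whose predicted label matches the true label.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sum(y_true == y_pred) / len(y_true)
```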
Before training and testing the KNN classifier, the feature columns are normalized separately using the mean and standard deviation of the values in the training data. Each feature is transformed using the function:
f(v) = (v - mean) / std
This normalization ensures that each feature contributes equally to the distance calculation.
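Note that the mean and standard deviation come from the training split only, so no information about the test set leaks into the model. A sketch:

```python
import numpy as np

def normalize(train_features, test_features):
    # Per-column statistics computed on the training data only,
    # then applied to both splits.
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    return (train_features - mean) / std, (test_features - mean) / std
```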
The dataset is split into 70% for training and 30% for testing. The training and test sets are created by dividing the feature and label arrays accordingly.
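Continuing the sketches above (the exact slicing in the repository may differ):

```python
# Features are the first four columns; the class label is the last column.
features = df.iloc[:, :-1].to_numpy(dtype=float)
labels = df.iloc[:, -1].to_numpy()

# First 70% of the shuffled rows for training, remaining 30% for testing.
split = int(0.7 * len(features))
x_train, x_test = normalize(features[:split], features[split:])
y_train, y_test = labels[:split], labels[split:]
```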
The KNN classifier is then trained on the training data and tested on the test data for values of K ranging from 1 to 15. For each value of K, the classifier's accuracy is calculated, stored in a list, and summarized in the table below.
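Assuming the method sketches above are attached to `KNN_Classifier`, the evaluation loop could look like this:

```python
import numpy as np

accuracies = []
for k in range(1, 16):
    clf = KNN_Classifier(x_train, y_train, x_test, k)
    y_pred = clf.predict()
    acc = accuracy(y_test, y_pred)
    accuracies.append(acc)

    print(f"K Value: {k}")
    print(f"Number of correctly classified instances: {int(np.sum(y_pred == y_test))}")
    print(f"Total number of instances: {len(y_test)}")
    print(f"Accuracy: {acc}")
```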
| K  | Accuracy           |
|----|--------------------|
| 1  | 1.0                |
| 2  | 1.0                |
| 3  | 1.0                |
| 4  | 1.0                |
| 5  | 1.0                |
| 6  | 1.0                |
| 7  | 1.0                |
| 8  | 1.0                |
| 9  | 1.0                |
| 10 | 1.0                |
| 11 | 1.0                |
| 12 | 0.9975728155339806 |
| 13 | 1.0                |
| 14 | 0.9975728155339806 |
| 15 | 0.9975728155339806 |
The results of the KNN classifier for different values of K are printed to the console. For each K value, the output shows the value of K used, the number of correctly classified test instances, the total number of instances in the test set, and the accuracy.
An example of the output:

```
K Value: 12
Number of correctly classified instances: 411
Total number of instances: 412
Accuracy: 0.9975728155339806
```
This code provides an implementation of a KNN classifier from scratch in Python. It demonstrates the steps involved in training and testing a KNN classifier, including data normalization, distance calculation, and prediction. By experimenting with different values of K, the code evaluates the classifier's performance and reports an accuracy for each K value.
Contributions are welcome! If you find any issues or have suggestions for improvement, please open an issue or submit a pull request.
- Khaled Ashraf Hanafy Mahmoud - 20190186.
- Noura Ashraf Abdelnaby Mansour - 20190592.
- Samaa Khalifa Elsayed Othman - 20190247.
This program is licensed under the MIT License.