In [None]:
# Install the libraries used in this notebook (data handling, machine learning, plotting).
# %pip installs into the environment of the running kernel.
%pip install pandas scikit-learn matplotlib

# K-Nearest Neighbors (KNN) Classification - Fruits Dataset

In this notebook, we use K-Nearest Neighbors (KNN) to classify fruits by their size and weight. We first visualize the dataset, then train a KNN model and evaluate its performance.
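To make the idea behind KNN concrete, here is a minimal from-scratch sketch (not scikit-learn's implementation). It measures the Euclidean distance from a query point to every training point, keeps the k closest, and returns the majority label among them. The function name `knn_predict` and the (size, weight) values are hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point, paired with that point's label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote; on a tie, Counter.most_common keeps the label counted first
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (size, weight) pairs, for illustration only
train_points = [(7.0, 150.0), (7.5, 160.0), (5.0, 90.0), (5.2, 95.0)]
train_labels = ["Apple", "Apple", "Lemon", "Lemon"]
print(knn_predict(train_points, train_labels, (7.2, 155.0), k=3))  # → Apple
```

scikit-learn's `KNeighborsClassifier` follows the same principle but adds options such as distance-based weighting and faster neighbor-search structures.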


In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

## Step 1: Data Collection

We will start by reading in the dataset that contains information about various fruits.

In [None]:
dataset = pd.read_csv("../Data/fruits.csv")
dataset.sample(10)
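Before plotting, it can help to confirm that the loaded data has the columns the rest of the notebook relies on (`Size`, `Weight`, `Label`) and only the labels that the Step 2 color map knows about, since an unmapped label would get no color. The helper `check_fruit_columns` below is a hypothetical sketch that assumes exactly those column names and the three labels Apple, Lemon, and Orange.

```python
REQUIRED_COLUMNS = {"Size", "Weight", "Label"}


def check_fruit_columns(df, known_labels=("Apple", "Lemon", "Orange")):
    """Hypothetical helper: verify the columns and labels the notebook expects."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    # Labels outside `known_labels` would not receive a color in the scatter plot
    unknown = set(df["Label"].unique()) - set(known_labels)
    if unknown:
        raise ValueError(f"Unexpected labels: {sorted(unknown)}")
    # Class counts: majority voting in KNN can favor over-represented classes
    return df["Label"].value_counts()
```

In the notebook this would be called as `check_fruit_columns(dataset)`.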

## Step 2: Data Visualization

Let's visualize the dataset using a scatter plot. We'll plot size versus weight and color the points according to the fruit label.

In [None]:
label_colors = {'Apple': 'red', 'Lemon': 'yellow', 'Orange': 'orange'}
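# Plot Size (x-axis) against Weight (y-axis), coloring each point by its 'Label'.
# The colors are given explicitly, so no colormap is needed; black edges keep yellow points visible.
dataset.plot.scatter(x='Size', y='Weight', c=dataset['Label'].map(label_colors), edgecolors='black', title='Fruits by Label')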
# Show the plot
plt.show()
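The scatter plot above colors points by label but does not draw a legend. One way to get a legend, sketched here under the assumption that the columns are named `Size`, `Weight`, and `Label`, is to plot each fruit type as its own series with a `label=`. The function name `plot_fruits_with_legend` is hypothetical.

```python
import matplotlib.pyplot as plt


def plot_fruits_with_legend(df, label_colors):
    """Scatter Size vs. Weight with one colored series per fruit label, plus a legend."""
    fig, ax = plt.subplots()
    # groupby yields (label, sub-DataFrame) pairs, one per fruit type, sorted by label
    for label, group in df.groupby("Label"):
        ax.scatter(
            group["Size"],
            group["Weight"],
            color=label_colors.get(label, "gray"),  # fall back to gray for unmapped labels
            edgecolors="black",
            label=label,
        )
    ax.set_xlabel("Size")
    ax.set_ylabel("Weight")
    ax.set_title("Fruits by Label")
    ax.legend(title="Label")
    return fig, ax
```

In the notebook: `fig, ax = plot_fruits_with_legend(dataset, label_colors)` followed by `plt.show()`.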

## Step 3: Train a KNN Classifier

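Now we prepare the data by standardizing the features and splitting them into training and test sets, then train a KNN model. Scaling matters because KNN compares points by distance, so a feature measured on a larger numeric scale would otherwise dominate.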
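The next cell standardizes Size and Weight before training. The small demonstration below shows why, using three hypothetical measurements (assumed to be size in cm and weight in grams; the values are illustrative only). It compares Euclidean distances from the first point before and after applying `StandardScaler`, which rescales each column to zero mean and unit variance.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical measurements: [size, weight], illustrative values only
X = np.array([
    [7.0, 150.0],
    [9.0, 152.0],  # very different size, almost the same weight
    [7.1, 180.0],  # almost the same size, different weight
])


def distances_from_first(points):
    """Euclidean distance from the first row to every other row."""
    return np.linalg.norm(points[1:] - points[0], axis=1)


print("raw:   ", distances_from_first(X))
X_scaled = StandardScaler().fit_transform(X)
print("scaled:", distances_from_first(X_scaled))
```

In raw units the weight difference makes the third point look roughly ten times farther away than the second; after scaling, the two distances come out nearly equal, so both features contribute comparably.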

In [None]:
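# 3. Train a KNN Classifier
features = dataset[["Size", "Weight"]]
labels = dataset["Label"]

# Split into training and test sets (one third held out for testing)
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.33, random_state=42)

# Standardize features. Fit the scaler on the training set only,
# so that no information from the test set leaks into preprocessing.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize KNN with k=5 and the default uniform weighting
# (pass weights='distance' to weight neighbors by inverse distance instead)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)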
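Wrapping the scaler and the classifier in a scikit-learn `Pipeline` is one way to guarantee that scaling statistics are learned only from the data passed to `fit`, which makes test-set leakage harder to introduce by accident. The helper name `build_knn_pipeline` is a hypothetical sketch; `make_pipeline`, `StandardScaler`, and `KNeighborsClassifier` are standard scikit-learn classes.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_knn_pipeline(n_neighbors=5, weights="uniform"):
    """Scaler + KNN as one estimator: fit() learns the scaling from the training data only."""
    return make_pipeline(
        StandardScaler(),
        KNeighborsClassifier(n_neighbors=n_neighbors, weights=weights),
    )
```

With this approach the pipeline is fit on the *unscaled* training split, for example `model = build_knn_pipeline()` then `model.fit(X_train, y_train)` and `model.predict(X_test)`, and it applies the training-set scaling to the test data automatically.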

## Step 4: Make Predictions and Evaluate

After training the KNN classifier, we predict labels for the test data and print a classification report, which lists per-class precision, recall, and F1 score together with overall accuracy.

In [None]:
# 4. Make predictions with the KNN Classifier
y_pred = knn.predict(X_test)
# Evaluate with additional metrics
print(classification_report(y_test, y_pred))
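A confusion matrix complements the classification report by showing which fruits are mistaken for which. The sketch below returns overall accuracy and a labeled confusion matrix (rows are true labels, columns are predicted labels). The helper name `summarize_predictions` is hypothetical; `accuracy_score` and `confusion_matrix` are from `sklearn.metrics`.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix


def summarize_predictions(y_true, y_pred):
    """Return overall accuracy and a labeled confusion matrix (rows = true, columns = predicted)."""
    labels = sorted(set(y_true) | set(y_pred))
    matrix = confusion_matrix(y_true, y_pred, labels=labels)
    table = pd.DataFrame(matrix, index=labels, columns=labels)
    return accuracy_score(y_true, y_pred), table
```

In the notebook: `accuracy, table = summarize_predictions(y_test, y_pred)`. Off-diagonal entries count misclassifications; for example, a nonzero value in row `Orange`, column `Apple` means some oranges were predicted as apples.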


The classification report gives a first view of the model's performance. We could improve it further by tuning hyperparameters such as `n_neighbors` and the neighbor weighting, or by trying other machine learning algorithms.
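One common way to tune KNN's hyperparameters is a cross-validated grid search on the training data, keeping the test set untouched for the final evaluation. The sketch below uses scikit-learn's `GridSearchCV` over a pipeline of `StandardScaler` and `KNeighborsClassifier`; the helper name `tune_knn` and the grid values are assumptions chosen for illustration.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def tune_knn(X_train, y_train, cv=5):
    """Cross-validated search over k and the neighbor weighting scheme."""
    pipeline = Pipeline([
        ("scaler", StandardScaler()),
        ("knn", KNeighborsClassifier()),
    ])
    # Parameter names follow scikit-learn's "<step name>__<parameter>" convention
    param_grid = {
        "knn__n_neighbors": [1, 3, 5, 7, 9],
        "knn__weights": ["uniform", "distance"],
    }
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
    search.fit(X_train, y_train)
    return search
```

Because the pipeline includes the scaler, `tune_knn` expects *unscaled* features, e.g. the `Size` and `Weight` columns of the training split. Afterwards, `search.best_params_` holds the chosen settings and `search.predict(...)` uses the refitted best model. On a very small dataset, each cross-validation training fold must contain at least as many samples as the largest `n_neighbors` in the grid.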