## We will classify fruits based on:

- Weight (grams) - In the synthetic data generated below, apples are heavier (100-149 g) than oranges (90-119 g).
- Color Intensity (1-100 scale) - Apples are generated with lower values (10-59), oranges with higher values (50-99).
- Roundness (1-100 scale) - Apples are generated with lower values (30-49), oranges with higher values (50-59).

### The target variable (y) will be:

- 0 = Apple
- 1 = Orange


K-Nearest Neighbors (KNN) follows a very simple training process because it does not actually "train" a model in the traditional sense. Instead, it just stores all the training data and waits for a prediction request.

Now, let’s go step by step through the training phase and explain how KNN handles multiple features.




### Training Scenario
#### We have a dataset where we are classifying fruits as either Apples (0) or Oranges (1) based on the following features:

| Feature Name | What It Represents |
|---|---|
| Weight (grams) | How heavy the fruit is |
| Color Intensity (scale 1-100) | How dark the color is |
| Roundness (scale 1-100) | How round the fruit is |

In [2]:
# Step 1: Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score


In [40]:
# # Creating 100 random fruit samples
# data = pd.DataFrame({
#     'Weight': np.random.randint(120, 250, 100),   # Weight in grams (120g - 250g)
#     'Color_Intensity': np.random.randint(20, 100, 100),  # Color shade scale (1-100)
#     'Roundness': np.random.randint(50, 100, 100), # Roundness scale (1-100)
#     'Fruit_Type': np.random.choice([0, 1], 100)  # 0 = Apple, 1 = Orange
# })

# # Display the first few rows of the dataset
# print("Sample Dataset:")
# print(data.head())

In [51]:

# Creating 100 random fruit samples
mix_data = pd.DataFrame({
    'Weight': np.random.randint(90, 150, 100),   # Weight in grams (90g - 149g; the upper bound is exclusive)
    'Color_Intensity': np.random.randint(10, 100, 100),  # Color shade scale (1-100)
    'Roundness': np.random.randint(30, 60, 100), # Roundness scale (1-100)
    'Fruit_Type': np.random.choice([0,1], 100)  # 0 = Apple, 1 = Orange
})

# Display the first few rows of the dataset
print("Sample Dataset MIX:")
(mix_data.head())

Sample Dataset MIX:


|   | Weight | Color_Intensity | Roundness | Fruit_Type |
|---|---|---|---|---|
| 0 | 124 | 72 | 58 | 1 |
| 1 | 108 | 94 | 36 | 0 |
| 2 | 137 | 41 | 34 | 0 |
| 3 | 105 | 96 | 51 | 0 |
| 4 | 92 | 42 | 58 | 1 |


In [54]:

# Step 2: Generate a random dataset (APPLES)
np.random.seed(42)  # Ensures the same random numbers each time for reproducibility

# Creating 100 random fruit samples
data_apple = pd.DataFrame({
    'Weight': np.random.randint(100, 150, 100),   # Weight in grams (100g - 150g)
    'Color_Intensity': np.random.randint(10, 60, 100),  # Color shade scale (1-100)
    'Roundness': np.random.randint(30, 50, 100), # Roundness scale (1-100)
    'Fruit_Type': np.random.choice([0], 100)  # 0 = Apple, 1 = Orange
})

# Display the first few rows of the dataset
print("Sample Dataset APPLE:")



Sample Dataset APPLE:


In [47]:
# Step 2.1: Generate a random dataset (ORANGE)

np.random.seed(42)  # Ensures the same random numbers each time for reproducibility

data_oranges = pd.DataFrame({
    'Weight': np.random.randint(90, 120, 100),   # Weight in grams (90g - 120g)
    'Color_Intensity': np.random.randint(50, 100, 100),  # Color shade scale (1-100)
    'Roundness': np.random.randint(50, 60, 100), # Roundness scale (1-100)
    'Fruit_Type': np.random.choice([1], 100)  # 0 = Apple, 1 = Orange
})

# Display the first few rows of the dataset
print("Sample Dataset ORANGE:")
(data_oranges.head())



Sample Dataset ORANGE:


|   | Weight | Color_Intensity | Roundness | Fruit_Type |
|---|---|---|---|---|
| 0 | 96 | 59 | 53 | 1 |
| 1 | 109 | 85 | 59 | 1 |
| 2 | 118 | 63 | 56 | 1 |
| 3 | 104 | 80 | 58 | 1 |
| 4 | 100 | 97 | 56 | 1 |


In [49]:
data = pd.concat([data_apple, data_oranges], ignore_index=True, sort=False)

data.head()

|   | Weight | Color_Intensity | Roundness | Fruit_Type |
|---|---|---|---|---|
| 0 | 138 | 18 | 44 | 0 |
| 1 | 128 | 33 | 38 | 0 |
| 2 | 114 | 10 | 49 | 0 |
| 3 | 142 | 53 | 46 | 0 |
| 4 | 107 | 17 | 46 | 0 |


In [50]:
data.tail()

|   | Weight | Color_Intensity | Roundness | Fruit_Type |
|---|---|---|---|---|
| 195 | 118 | 77 | 54 | 1 |
| 196 | 107 | 51 | 54 | 1 |
| 197 | 115 | 91 | 56 | 1 |
| 198 | 101 | 94 | 58 | 1 |
| 199 | 91 | 55 | 58 | 1 |


In [52]:

# Step 3: Split the dataset into training and testing sets
X = data[['Weight', 'Color_Intensity', 'Roundness']]  # Features (inputs)
y = data['Fruit_Type']  # Target (output)

# Splitting 80% training and 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
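
A quick way to confirm what the 80/20 split produces is to check the shapes of the resulting pieces. The sketch below is self-contained and uses a hypothetical stand-in frame (`demo`, filled with placeholder values) instead of the notebook's `data`; only the row count (200) and column names are taken from above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the 200-row `data` frame built above
# (100 apples followed by 100 oranges); the feature values are placeholders.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'Weight': rng.integers(90, 150, 200),
    'Color_Intensity': rng.integers(10, 100, 200),
    'Roundness': rng.integers(30, 60, 200),
    'Fruit_Type': [0] * 100 + [1] * 100,
})

X_demo = demo[['Weight', 'Color_Intensity', 'Roundness']]
y_demo = demo['Fruit_Type']

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

print(X_tr.shape, X_te.shape)          # rows and columns in each split
print(y_te.value_counts().to_dict())   # how many apples/oranges landed in the test set
```

With 200 rows and `test_size=0.2`, 160 rows go to training and 40 to testing, which matches the 40 predictions printed later in this notebook.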



### How Does KNN Handle Multiple Features?
Each fruit is stored as a point in a 3D space, where:

- Weight is one axis
- Color Intensity is another axis
- Roundness is the third axis

So, the training data forms a cloud of points in a three-dimensional space.

When making predictions, KNN will:

- Measure the distance from a new fruit to all stored points.
- Find the K closest neighbors.
- Use majority voting to decide the class (Apple or Orange).

Since KNN does not learn a function, it relies on distance measurements between fruits.

The most commonly used distance formula is Euclidean Distance, which calculates how far apart two points are in the 3D space:

### Euclidean Distance Formula

For two points **Fruit 1** \((W_1, C_1, R_1)\) and **Fruit 2** \((W_2, C_2, R_2)\), the Euclidean distance \(d\) is calculated as:

$$
d = \sqrt{(W_1 - W_2)^2 + (C_1 - C_2)^2 + (R_1 - R_2)^2}
$$

Where:
- \( W \) = **Weight** (grams)
- \( C \) = **Color Intensity** (scale 1-100)
- \( R \) = **Roundness** (scale 1-100)
- \( W_1, C_1, R_1 \) are the Weight, Color_Intensity, and Roundness of fruit 1.
- \( W_2, C_2, R_2 \) are the Weight, Color_Intensity, and Roundness of fruit 2.

- KNN will use this distance when making predictions, not during training.
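
To make the formula concrete, here is a small worked example. It takes one apple row and one orange row from the tables printed earlier and computes the distance twice: once step by step, once with `np.linalg.norm`. The variable names are only illustrative.

```python
import numpy as np

# Two rows taken from the tables printed above: (Weight, Color_Intensity, Roundness)
apple = np.array([138, 18, 44])    # first row of data.head()
orange = np.array([96, 59, 53])    # first row of data_oranges.head()

diff = apple - orange                   # per-feature differences: W, C, R
d_manual = np.sqrt(np.sum(diff ** 2))   # square each difference, sum, take the root
d_numpy = np.linalg.norm(diff)          # same quantity via NumPy's vector norm

print(diff)
print(round(float(d_manual), 2))
print(bool(np.isclose(d_manual, d_numpy)))
```

The squared differences are 42² + 41² + 9² = 3526, so the distance is roughly 59.4. Weight and Color Intensity contribute almost all of it, while Roundness adds very little.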

In [53]:
# Step 4: Train a KNN classifier
knn = KNeighborsClassifier(n_neighbors=3)  # K=3 means we look at 3 nearest neighbors
knn.fit(X_train, y_train)  # KNN stores the training data as it is.




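What happens here?

- KNN does not fit weights or optimize a loss function.
- It stores the training rows as they are. Depending on the `algorithm` setting, scikit-learn may also build a search index (a KD-tree or ball tree) over them to speed up later neighbor lookups.
- No distances to new fruits are computed yet.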
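
One way to see this is to look at what a fitted classifier exposes. The sketch below assumes a tiny hypothetical training set of four fruits (`X_small`, `y_small`) and prints a few public attributes that `fit` sets on the estimator.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Tiny hypothetical training set: (Weight, Color_Intensity, Roundness)
X_small = np.array([
    [138, 18, 44],
    [128, 33, 38],
    [96, 59, 53],
    [109, 85, 59],
])
y_small = np.array([0, 0, 1, 1])  # 0 = Apple, 1 = Orange

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_small, y_small)

# Public attributes set by fit(): they describe the stored data, not learned weights
print(model.classes_)            # distinct labels seen during fit
print(model.n_samples_fit_)      # number of stored training rows
print(model.n_features_in_)      # number of feature columns
print(hasattr(model, 'coef_'))   # whether a coefficient vector exists
```

Unlike a linear model, the fitted object has no `coef_` attribute: what it holds is a description of the stored rows, not learned parameters.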

In [55]:
knn.predict([[80,55,50]])



array([1])

In [56]:
knn.predict([[120,55,50]])



array([1])

In [57]:
knn.predict([[130,55,50]])



array([0])
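
To see which stored fruits drive a prediction like the ones above, the classifier's `kneighbors` method can be queried directly. This is a minimal sketch that assumes a hypothetical six-row training set rather than the notebook's 160 training rows, so the neighbors it reports are not the ones used in the cells above.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical mini training set: (Weight, Color_Intensity, Roundness)
X_small = np.array([
    [138, 18, 44],   # apple
    [128, 33, 38],   # apple
    [142, 53, 46],   # apple
    [96, 59, 53],    # orange
    [109, 85, 59],   # orange
    [118, 63, 56],   # orange
])
y_small = np.array([0, 0, 0, 1, 1, 1])  # 0 = Apple, 1 = Orange

model = KNeighborsClassifier(n_neighbors=3).fit(X_small, y_small)

query = np.array([[120, 55, 50]])
distances, indices = model.kneighbors(query)  # the 3 nearest stored rows

print(indices[0])               # positions of the neighbors in X_small
print(distances[0].round(2))    # their Euclidean distances, closest first
print(y_small[indices[0]])      # the labels that take part in the vote
print(model.predict(query))     # majority label among those neighbors
```

In this toy set, two of the three nearest rows are oranges and one is an apple, so the vote goes to Orange (1).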

In [59]:
# Step 5: Make predictions on test data
y_pred = knn.predict(X_test)

y_pred

array([0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0,
       1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0])

## Once a new fruit comes in, KNN:

- Computes distances between the new fruit and stored training data.
- Finds the K nearest neighbors.
- Uses majority voting to classify the fruit.
- This is the only point at which KNN actually "works": at prediction time.

In [61]:
# Step 6: Evaluate accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print("\nModel Accuracy:", accuracy*100,"%")




Model Accuracy: 97.5 %
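
Accuracy is simply the fraction of test fruits whose predicted label matches the true label. The sketch below reproduces the same kind of number by hand. It assumes hypothetical label arrays (`y_true_demo`, `y_pred_demo`) with exactly one mistake in 40, not the notebook's actual `y_test` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels for 40 test fruits, not the notebook's actual y_test
y_true_demo = np.array([0, 1] * 20)
y_pred_demo = y_true_demo.copy()
y_pred_demo[0] = 1  # flip one prediction so exactly one fruit is misclassified

manual = np.mean(y_true_demo == y_pred_demo)          # fraction of matching labels
library = accuracy_score(y_true_demo, y_pred_demo)    # scikit-learn's equivalent

print(manual * 100, "%")
print(bool(np.isclose(manual, library)))
```

With 40 test fruits, one wrong prediction gives 39/40 = 97.5%, so the score reported above is consistent with a single misclassified fruit.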


In [62]:
# Step 7: Predict a new fruit
new_fruit = pd.DataFrame({'Weight': [160], 'Color_Intensity': [80], 'Roundness': [90]})
prediction = knn.predict(new_fruit)
print("\nPrediction for new fruit (Weight=160g, Color=80, Roundness=90):", "Apple" if prediction[0] == 0 else "Orange")



Prediction for new fruit (Weight=160g, Color=80, Roundness=90): Apple


### When given a new example to classify, K-Nearest Neighbors (KNN) follows these steps:

- Measure the distance between the new example and all training examples.
- Sort the distances in ascending order (smallest distance = closest neighbor).
- Select the top K nearest examples from the sorted list.
- Majority voting: The new example is classified based on the most common class among the K nearest neighbors.

- KNN repeats these steps for every new example during prediction.
- During training it does not learn any parameters; it just stores the data.
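
The four steps can be sketched in a few lines of plain NumPy. This is a teaching sketch under stated assumptions, not scikit-learn's implementation: the function `knn_predict` and the six-row training set are hypothetical, and ties are broken simply by keeping whichever row or label comes first.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify one example with brute-force KNN (illustrative only)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x_new = np.asarray(x_new, dtype=float)
    # 1. Euclidean distance from the new example to every training example
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # 2. Sort by distance, ascending (stable sort keeps earlier rows first on ties)
    order = np.argsort(distances, kind="stable")
    # 3. Keep the labels of the k closest examples
    nearest_labels = y_train[order[:k]]
    # 4. Majority vote among those labels
    return Counter(nearest_labels.tolist()).most_common(1)[0][0]


# Hypothetical mini training set: (Weight, Color_Intensity, Roundness)
X_small = [[138, 18, 44], [128, 33, 38], [142, 53, 46],
           [96, 59, 53], [109, 85, 59], [118, 63, 56]]
y_small = [0, 0, 0, 1, 1, 1]  # 0 = Apple, 1 = Orange

label = knn_predict(X_small, y_small, [120, 55, 50], k=3)
print("Apple" if label == 0 else "Orange")
```

Every call recomputes the distance to all stored rows, which is why prediction cost grows with the size of the training set while "training" stays essentially free.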