In [1]:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

### 📦 Step 1: Import Required Libraries
- **pandas (`pd`)** → For handling tabular data (rows & columns).
- **load_iris** → Loads the Iris flower dataset bundled with scikit-learn.
- **train_test_split** → Splits the dataset into training and testing sets.
- **KNeighborsClassifier** → The K-Nearest Neighbors algorithm for classification.
- **accuracy_score, classification_report** → Tools to measure model accuracy and show detailed classification results.

In [3]:
iris = load_iris()
X = iris.data
y = iris.target

print("Feature Names:", iris.feature_names)
print("Target Names:", iris.target_names)


Feature Names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Target Names: ['setosa' 'versicolor' 'virginica']



### 📂 Step 2: Load the Iris Dataset
- **`load_iris()`** loads the classic Iris dataset: 150 flower samples, 50 from each of three species.
- **`X = iris.data`** → The feature matrix with one row per flower and four columns (sepal length, sepal width, petal length, petal width), all in cm.
- **`y = iris.target`** → The labels (0 = Setosa, 1 = Versicolor, 2 = Virginica).
- Printing `feature_names` and `target_names` shows the name of each feature and of each species.
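
To inspect the loaded data more comfortably, it can be wrapped in a pandas DataFrame. The sketch below is one possible way to do this; the names `df` and `species` are illustrative and do not appear in the notebook's code cells.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# One row per flower, one column per feature
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Translate the integer labels (0, 1, 2) into species names
df["species"] = iris.target_names[iris.target]

print(df.shape)                      # number of rows and columns
print(df.head())                     # first five flowers
print(df["species"].value_counts())  # samples per species
```

Viewing the counts per species is a quick way to confirm that the dataset is balanced before splitting it.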


In [5]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

### ✂️ Step 3: Train-Test Split
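`train_test_split` shuffles the data and splits it into two parts:
- **Training set (80%, 120 samples)** → Used to train the model.
- **Testing set (20%, 30 samples)** → Held back to check accuracy on data the model has not seen.
- `random_state=42` → Fixes the shuffling seed so that the same split is produced on every run.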
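
A quick way to confirm the split is to print the array shapes and the number of test samples per class. The sketch below repeats the notebook's split and then, as an optional variation that the notebook does not use, passes `stratify=y` so that each class keeps the same proportion in both sets. The variable names ending in `_s` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same split as in the notebook
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # sizes of the two sets
print(np.bincount(y_test))          # test samples per class

# Optional variation (not in the notebook): keep class proportions equal
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.bincount(y_test_s))        # test samples per class, stratified
```

The per-class counts of the plain split correspond to the `support` column of the classification report printed later in the notebook.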


In [7]:
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)


### 🤖 Step 4: Create and Train the Model
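- We create a **K-Nearest Neighbors** model with `n_neighbors=3`, so each prediction is a majority vote among the 3 closest training samples.
- `model.fit(X_train, y_train)` → Fits the model by storing the training samples and their labels. KNN defers the actual comparison work to prediction time.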
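
To make the idea behind KNN concrete, here is a minimal from-scratch sketch of a single prediction. It assumes Euclidean distance (scikit-learn's default metric for this classifier) and a simple majority vote. The function `knn_predict_one` and the toy data are hypothetical, and tie-breaking may differ from scikit-learn's behavior.

```python
from collections import Counter

import numpy as np


def knn_predict_one(X_train, y_train, x, k=3):
    """Predict the label of one sample x by majority vote of its k nearest neighbors."""
    # Euclidean distance from x to every training sample
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (illustrative only): two well-separated clusters
X_toy = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

label_a = knn_predict_one(X_toy, y_toy, np.array([1.1, 1.0]))
label_b = knn_predict_one(X_toy, y_toy, np.array([5.1, 5.0]))
print(label_a, label_b)
```

In practice `KNeighborsClassifier` does the same kind of lookup, typically with faster neighbor-search data structures.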


In [9]:
y_pred = model.predict(X_test)


### 📊 Step 5: Make Predictions
- `model.predict(X_test)` → Predicts a class label (0, 1, or 2) for each sample in the test set.
- The predictions are stored in `y_pred`.
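
Because the predictions are integer labels, it can help to translate them back into species names. The sketch below repeats the notebook's steps and then indexes `iris.target_names` with the predictions. The `new_flower` measurements are a hypothetical example, not data from the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Convert integer labels into species names
predicted_species = iris.target_names[y_pred]
print(predicted_species[:5])

# Predict a single new flower (hypothetical measurements, in cm).
# predict expects a 2D array: one row per sample.
new_flower = [[5.1, 3.5, 1.4, 0.2]]
new_species = iris.target_names[model.predict(new_flower)][0]
print(new_species)
```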

In [11]:
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))


Accuracy: 1.0

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



### 📈 Step 6: Evaluate the Model
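- `accuracy_score(y_test, y_pred)` → Calculates the fraction of predictions that were correct. Here it is 1.0, meaning all 30 test samples were classified correctly.
- `classification_report(y_test, y_pred)` → Shows precision, recall, F1-score, and support (the number of true samples) for each class.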
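
Since every score in this run is 1.00, the report does not reveal much about how the metrics behave. The sketch below computes them by hand, following the standard definitions, on a small hypothetical set of labels that contains one mistake. The helper `per_class_metrics` and the label arrays are illustrative and not part of the notebook.

```python
import numpy as np

# Hypothetical labels for six samples (not the notebook's results)
y_true = np.array([0, 0, 1, 1, 2, 2])
y_hat = np.array([0, 0, 1, 2, 2, 2])  # one class-1 sample predicted as class 2

# Accuracy: fraction of predictions that match the true label
accuracy = np.mean(y_true == y_hat)


def per_class_metrics(y_true, y_hat, label):
    """Return (precision, recall, f1, support) for one class."""
    tp = np.sum((y_hat == label) & (y_true == label))  # true positives
    predicted = np.sum(y_hat == label)                 # times this class was predicted
    actual = np.sum(y_true == label)                   # support
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1, int(actual)


print("accuracy:", round(accuracy, 3))
for label in (0, 1, 2):
    p, r, f1, support = per_class_metrics(y_true, y_hat, label)
    print(label, round(p, 2), round(r, 2), round(f1, 2), support)
```

In this toy case the single mistake lowers recall for class 1 (one of its two samples was missed) and precision for class 2 (one of its three predictions was wrong).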

### 1️⃣ Macro Average (macro avg)
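Takes the plain average of each metric across classes, treating all classes equally.
Does not consider how many samples each class has (support).
Useful when every class matters equally, because a poorly predicted small class lowers the score as much as a large one. When all classes have roughly the same number of examples, it is close to the weighted average.

$$
\text{Macro Avg Precision} = \frac{\text{Precision}_0 + \text{Precision}_1 + \text{Precision}_2}{3}
$$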

### 2️⃣ Weighted Average (weighted avg)
Takes the average of each metric weighted by the number of samples in each class (support).
Larger classes contribute more, so the result reflects performance over all samples. On imbalanced datasets it can hide poor results on a small class.
Formula example for precision:

$$
\text{Weighted Avg Precision} = \frac{\text{Precision}_0 \cdot \text{Support}_0 + \text{Precision}_1 \cdot \text{Support}_1 + \text{Precision}_2 \cdot \text{Support}_2}{\text{Total samples}}
$$
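
The difference between the two averages only shows up when classes differ in size. The sketch below uses a small imbalanced set of hypothetical labels (8 samples of class 0, 2 of class 1), computes both averages by hand, and compares them with scikit-learn's `precision_score` using its `average` parameter. The label arrays are illustrative only.

```python
import numpy as np
from sklearn.metrics import precision_score

# Hypothetical imbalanced labels: 8 samples of class 0, 2 of class 1
y_true = np.array([0] * 8 + [1] * 2)
y_hat = np.array([0] * 7 + [1] + [1, 1])  # one class-0 sample predicted as class 1

# Precision of each class separately, and the support of each class
per_class = precision_score(y_true, y_hat, average=None)
support = np.bincount(y_true)

# Macro: plain mean over classes
macro_manual = per_class.mean()
# Weighted: mean over classes, weighted by support
weighted_manual = np.sum(per_class * support) / support.sum()

# The same averages as computed by scikit-learn
macro_sklearn = precision_score(y_true, y_hat, average="macro")
weighted_sklearn = precision_score(y_true, y_hat, average="weighted")

print("per class:", per_class)
print("macro:", macro_manual, macro_sklearn)
print("weighted:", weighted_manual, weighted_sklearn)
```

Here the weaker precision of the small class pulls the macro average down more than the weighted average, which is dominated by the larger class.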
   