## 🧮 Exercise 3 – Naïve Bayes Classifier (Iris Dataset)

### 🎯 Objective
In this exercise, we implement a **Naïve Bayes classifier** using the **Iris dataset**, applying probability theory to predict the class of each example.
The Naïve Bayes method assumes that all features are independent given the class, allowing for efficient computation of conditional probabilities.

### 🧩 Tasks Overview
1. **Data Preparation**
   - Load the Iris dataset and discretize all numerical columns into three categories: **low**, **medium**, and **high**.
   - Split the dataset randomly into 70 % training and 30 % testing subsets.

2. **Naïve Bayes Implementation**
   - Estimate prior probabilities *P(Class)* for each class.
   - Estimate conditional probabilities *P(Xᵢ | Class)* from the training data.
   - For each test instance, compute *P(Class | X)* using Bayes’ theorem and choose the class with the highest probability.

3. **Evaluation**
   - Repeat 30 random train/test splits.
   - Compute accuracy and display a confusion matrix for one representative run.

4. **Comparison**
   - Compare performance with the k-NN classifier (Exercise 2) and discuss advantages/disadvantages.

### ⚙️ Tools Used
- **pandas** for data manipulation
- **numpy** for numeric calculations
- **matplotlib** for visualizations
- **sklearn.metrics** for evaluation metrics (accuracy, confusion matrix)

This implementation is written manually to reinforce understanding of probabilistic reasoning before using automated libraries.


🧩 Cell 1 – Imports & basic setup
- pandas and numpy for data handling
- matplotlib.pyplot for visualizations
- sklearn.datasets for loading the Iris dataset
- sklearn.model_selection for train/test split
- sklearn.metrics for accuracy and confusion matrix

In [None]:
# Import libraries used throughout the experiment
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

📝 Cell 2 – Load and Display the Iris Dataset
- Load the Iris dataset and convert it into a pandas DataFrame for easier handling

In [None]:
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name='species')
df = pd.concat([X, y], axis=1)
df.head()

📝 Cell 3 – Discretize Continuous Features
- Discretize each numerical feature into three categories: low, medium, high.
- We use pandas 'qcut' to split the distribution into three equal-frequency bins.

In [None]:
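def discretize_features(df):
    """Return a copy of df with each Iris feature binned into low/medium/high."""
    discretized = df.copy()
    for col in iris.feature_names:
        # qcut places the cut points at the 1/3 and 2/3 quantiles of the column
        discretized[col] = pd.qcut(df[col], q=3, labels=["low", "medium", "high"])
    return discretized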

df_disc = discretize_features(df)
df_disc.head()
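
To make the equal-frequency binning concrete, the toy example below applies `pd.qcut` to nine made-up values (they are not Iris measurements, and the name `toy` is only illustrative). With `q=3`, each label should end up covering three of the nine values.

```python
import pandas as pd

# Nine hypothetical measurements, purely illustrative (not taken from Iris)
toy = pd.Series([1.0, 1.2, 1.5, 3.0, 3.3, 3.9, 5.1, 5.8, 6.4])

# q=3 places the cut points at the 1/3 and 2/3 quantiles,
# so each label covers roughly the same number of rows
toy_bins = pd.qcut(toy, q=3, labels=["low", "medium", "high"])

bin_counts = toy_bins.value_counts().to_dict()
print(bin_counts)
```

Equal-width bins (`pd.cut`) would be the alternative; equal-frequency bins keep every category populated even when a feature is skewed.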


📝 Cell 4 – Split Dataset into Train/Test Sets
#### Randomly split the discretized dataset into training (70%) and testing (30%) subsets.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    df_disc.iloc[:, :-1], df_disc['species'], test_size=0.3, stratify=df_disc['species']
)
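
The split above passes `stratify`, which keeps the class proportions the same in both subsets. The sketch below checks this on the raw Iris labels alone; `random_state=0` is an arbitrary seed added here for repeatability and is not part of the original cell.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

labels = load_iris().target  # 150 labels, 50 per class

# Split only the labels; random_state=0 is an arbitrary seed for repeatability
y_tr, y_te = train_test_split(labels, test_size=0.3, stratify=labels, random_state=0)

train_counts = Counter(y_tr.tolist())
test_counts = Counter(y_te.tolist())
print(train_counts, test_counts)
```

Because Iris is balanced, a stratified 70/30 split gives every class the same number of rows in the training set and the same number in the test set, so the estimated priors stay uniform across runs.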


📝 Cell 5 – Estimate Prior and Conditional Probabilities
#### Build dictionaries storing:
1. Prior probabilities for each class (P(Class))
2. Conditional probabilities for each feature value given a class (P(Xi | Class))

#### Laplace smoothing is applied (+1) to avoid zero probabilities.


In [None]:
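def train_naive_bayes(X_train, y_train):
    """Estimate class priors and Laplace-smoothed conditional probabilities."""
    classes = np.unique(y_train)
    # Prior P(Class): fraction of training rows belonging to each class
    priors = {c: (y_train == c).mean() for c in classes}
    cond_probs = {}

    for c in classes:
        X_c = X_train[y_train == c]
        cond_probs[c] = {}
        for col in X_train.columns:
            value_counts = X_c[col].value_counts()
            total = value_counts.sum()
            # Laplace smoothing: add 1 to each count and 3 (the number of bins) to the total
            probs = {
                val: (value_counts.get(val, 0) + 1) / (total + 3)
                for val in ["low", "medium", "high"]
            }
            cond_probs[c][col] = probs

    return priors, cond_probs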
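
As a small worked example of the Laplace smoothing step, the sketch below uses hypothetical bin counts for one feature within one class (the numbers are invented for illustration). Without smoothing, a bin that never occurs gets probability zero and would wipe out the whole product; with smoothing, it gets a small positive probability and the three values still sum to one.

```python
# Hypothetical counts of one feature's bins within a single class
counts = {"low": 0, "medium": 5, "high": 30}
n_values = len(counts)          # three possible bins
total = sum(counts.values())    # 35 training rows in this class

# Raw relative frequencies: "low" gets exactly zero
unsmoothed = {v: c / total for v, c in counts.items()}
# Add-one smoothing: (count + 1) / (total + number of bins)
smoothed = {v: (c + 1) / (total + n_values) for v, c in counts.items()}

print(unsmoothed)
print(smoothed)
```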


📝 Cell 6 – Naïve Bayes Prediction Function
#### For a single sample x, compute the posterior probability for each class:
- P(Class|X) ∝ P(Class) × Π_i P(X_i|Class)
- Return the class with the highest posterior probability.
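
Because the logarithm is monotonic, maximizing the posterior is equivalent to maximizing its logarithm. This turns the product into a sum and avoids numerical underflow when many small probabilities are multiplied. With $n$ features (a symbol introduced here for illustration), the decision rule can be written as:

$$
\widehat{\text{Class}} = \arg\max_{\text{Class}} \left[ \log P(\text{Class}) + \sum_{i=1}^{n} \log P(X_i \mid \text{Class}) \right]
$$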

In [None]:
def predict_naive_bayes(x, priors, cond_probs):
    """Return the class with the highest log-posterior for one sample x."""
    posteriors = {}
    for c in priors.keys():
        prob = np.log(priors[c])  # work in log space to avoid underflow
        for feature, value in x.items():
            # 1e-6 is a fallback for a value never seen during training
            prob += np.log(cond_probs[c][feature].get(value, 1e-6))
        posteriors[c] = prob
    return max(posteriors, key=posteriors.get)
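
The evaluation task from the overview (30 random train/test splits, accuracy, and a confusion matrix for one representative run) could be sketched as follows. This is a self-contained approximation, not the notebook's own evaluation cell: the helper names `fit` and `predict`, the seeds 0 to 29, and the choice of the last split as the representative run are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

BINS = ["low", "medium", "high"]

# Rebuild the discretized data so this sketch runs on its own
iris = load_iris()
X_disc = pd.DataFrame(iris.data, columns=iris.feature_names).apply(
    lambda col: pd.qcut(col, q=3, labels=BINS).astype(str)
)
y = pd.Series(iris.target, name="species")


def fit(X_tr, y_tr):
    """Priors and Laplace-smoothed conditionals, mirroring Cell 5."""
    priors, cond = {}, {}
    for c in np.unique(y_tr):
        X_c = X_tr[y_tr == c]
        priors[c] = len(X_c) / len(X_tr)
        cond[c] = {
            col: {v: ((X_c[col] == v).sum() + 1) / (len(X_c) + len(BINS)) for v in BINS}
            for col in X_tr.columns
        }
    return priors, cond


def predict(row, priors, cond):
    """Class with the largest log-posterior, mirroring Cell 6."""
    scores = {
        c: np.log(priors[c]) + sum(np.log(cond[c][f][v]) for f, v in row.items())
        for c in priors
    }
    return max(scores, key=scores.get)


accuracies = []
for seed in range(30):  # seeds 0..29 are an arbitrary choice for repeatability
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_disc, y, test_size=0.3, stratify=y, random_state=seed
    )
    priors, cond = fit(X_tr, y_tr)
    y_pred = [predict(row, priors, cond) for _, row in X_te.iterrows()]
    accuracies.append(accuracy_score(y_te, y_pred))

print(f"mean accuracy over {len(accuracies)} runs: {np.mean(accuracies):.3f}")
print(f"standard deviation: {np.std(accuracies):.3f}")

# Confusion matrix of the last split, used here as the representative run
cm = confusion_matrix(y_te, y_pred)
print(cm)
```

Reporting the mean and spread over all runs, rather than a single split, makes the comparison with k-NN less sensitive to one lucky or unlucky partition.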


📝 Cell 7 – Confusion Matrix for Each k
- This cell runs one representative trial for each k and prints its confusion matrix.
- The confusion matrix shows how well the classifier distinguishes between classes.


In [None]:
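# NOTE: `ks` and `evaluate_knn` are not defined in this notebook;
# this cell assumes they are available from the k-NN exercise (Exercise 2).
for k in ks: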
    acc, y_true, y_pred = evaluate_knn(df, k)
    print(f"\nConfusion Matrix for k={k} (accuracy={acc:.3f})")
    cm = confusion_matrix(y_true, y_pred)
    print(cm)
    plt.figure()
    plt.imshow(cm, cmap='Blues', interpolation='nearest')
    plt.title(f"Confusion Matrix (k={k})")
    plt.colorbar()
    plt.xlabel("Predicted")
    plt.ylabel("True")
    plt.show()


#### 🧩 Why Should k Be an Odd Number?

In a two-class problem, using an **odd value of k** rules out ties during majority voting.
If k were even (e.g., k = 4) and two neighbors belonged to class A while two belonged to class B,
the classifier would have no clear majority and would need a tie-breaking rule.
With an odd k and only two classes, one class always receives more votes than the other,
so the prediction is deterministic.
With three classes, as in Iris, an odd k does not eliminate ties
(k = 3 can yield one vote per class), so a tie-breaking rule is still needed.
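
To see when majority voting yields a clear winner, the sketch below counts votes for a few hand-written neighbor lists. The function `vote` and the example labels are hypothetical and only illustrate the counting; they are not part of the k-NN implementation from Exercise 2.

```python
from collections import Counter


def vote(neighbor_labels):
    """Return (winning labels, is_tie) for a list of neighbor class labels."""
    counts = Counter(neighbor_labels)
    top = max(counts.values())
    winners = sorted(label for label, n in counts.items() if n == top)
    return winners, len(winners) > 1


# Two classes, even k: a 2-2 split is possible
binary_even = vote(["A", "A", "B", "B"])
# Two classes, odd k: some class must hold the majority
binary_odd = vote(["A", "A", "B"])
# Three classes, odd k: a 1-1-1 split is still possible
three_class_odd = vote(["setosa", "versicolor", "virginica"])

print(binary_even, binary_odd, three_class_odd)
```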


### ✅ Discussion and Conclusions

#### 🧾 Summary of Results
- As *k* increases, the classifier becomes smoother (less variance) but may lose sensitivity to local patterns.
- Lower values of *k* (like 3) can achieve slightly higher accuracy but may be more affected by noise.
- The overall performance across 30 runs is consistent and high (> 0.9 accuracy for all tested *k* values).

#### 📊 Observations
- The boxplots confirm that different *k* values produce similar accuracy, with small variability.
- Confusion matrices show clear class separations for **Iris-setosa** and small confusion between **versicolor** and **virginica**.

#### 🔍 Conclusions
1. The k-NN algorithm is simple yet powerful for small, well-structured datasets like Iris.
2. Proper choice of *k* controls the trade-off between overfitting (low *k*) and underfitting (high *k*).
3. Repeated random splits provide a more reliable estimate of model performance.

#### 🚀 Future Improvements
- Apply feature scaling (normalization) to improve distance computation consistency.
- Use cross-validation instead of random splitting for more robust accuracy estimation.
- Extend the experiment to higher-dimensional datasets or larger values of *k* to study asymptotic behavior.

This exercise demonstrates the intuition behind **instance-based learning** and how the choice of neighborhood size affects classification performance.
