<a href="https://colab.research.google.com/github/LishaRamon/applied-ml/blob/main/HW3_Comparison_of_ML_Classification_Algorithms.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# HW3: Classification Comparison with Synthetic Data

Thought process:

1. Need to compare **6 classifiers** on **4 datasets**
2. For every dataset, need to: split data → fit on train → evaluate on both train and test
3. Also need to visualize the decision boundaries
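
The split → fit → evaluate steps above can be sketched for one dataset and one classifier. This is a minimal sketch rather than the final pipeline: the `make_moons` dataset, the 70/30 split, and the variable names are assumptions picked only for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed example dataset and split ratio; any of the 4 datasets could go here.
X, y = make_moons(n_samples=200, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit on the training split only.
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X_train, y_train)

# Score both splits: a large train/test gap hints at overfitting.
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")

# Per-class precision, recall, and F1 on the held-out split.
print(classification_report(y_test, clf.predict(X_test)))
```

Evaluating on both splits matters most for flexible models such as KNN with K=1, which memorizes the training points and so scores perfectly on them.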

The 6 classifiers are:
- Naive Bayes
- Logistic Regression
- Quadratic Discriminant Analysis (QDA)
- SVM with a radial basis function (RBF) kernel
- Decision Tree
- KNN with K=1


---

## Step 1: Importing libraries

- `numpy` and `matplotlib`
- `sklearn.datasets` for creating synthetic data
- `sklearn.model_selection` for train/test splitting
- All 6 classifier classes
- `classification_report` for evaluation

Also set a random seed so results are reproducible on every run.

In [2]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

# data generation
from sklearn.datasets import make_blobs, make_circles, make_moons

# split data into train/test
from sklearn.model_selection import train_test_split

# import the 6 classifiers
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# for evaluation
from sklearn.metrics import classification_report

# set a random seed for reproducibility
SEED = 42
np.random.seed(SEED)
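
As a quick check of what the seed buys, the sketch below compares an explicit `random_state` with relying on NumPy's global seed. The dataset choice (`make_moons`) and the variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_moons

SEED = 42

# Passing the same random_state gives the same samples on every call.
X_a, y_a = make_moons(n_samples=100, noise=0.2, random_state=SEED)
X_b, y_b = make_moons(n_samples=100, noise=0.2, random_state=SEED)
same_with_state = np.array_equal(X_a, X_b)

# Without random_state, scikit-learn draws from NumPy's global generator,
# so two consecutive calls consume different random numbers.
np.random.seed(SEED)
X_c, _ = make_moons(n_samples=100, noise=0.2)
X_d, _ = make_moons(n_samples=100, noise=0.2)
same_with_global = np.array_equal(X_c, X_d)

print(same_with_state, same_with_global)
```

The global seed still makes a full top-to-bottom run repeatable, but passing `random_state=SEED` explicitly keeps each call reproducible even when cells are re-run out of order.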

## Step 2: Define Classifiers

Each dataset needs a fresh set of classifiers so that no fitted state carries over from a previous dataset. I'll wrap them in a function and call it once per dataset.

In [4]:
def get_classifiers():
    # return a fresh dict of all 6 classifiers (new, unfitted instances)
    return {
        'Naive Bayes':    GaussianNB(),
        'Logistic Reg':   LogisticRegression(max_iter=1000, random_state=SEED),
        'QDA':            QuadraticDiscriminantAnalysis(),
        'SVM (RBF)':      SVC(kernel='rbf', random_state=SEED),
        'Decision Tree':  DecisionTreeClassifier(random_state=SEED),
        'KNN (K=1)':      KNeighborsClassifier(n_neighbors=1),
    }
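
To show why a factory function is handy, here is a hedged sketch using a hypothetical, shortened factory `make_models` (a stand-in for `get_classifiers`, with only two models) and two assumed example datasets. Calling the factory once per dataset yields separate fitted objects that can all be kept around for later plotting.

```python
from sklearn.datasets import make_blobs, make_moons
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def make_models():
    # Hypothetical two-model stand-in for get_classifiers().
    return {
        'Decision Tree': DecisionTreeClassifier(random_state=42),
        'KNN (K=1)': KNeighborsClassifier(n_neighbors=1),
    }


# Assumed example datasets, used only for this illustration.
datasets = {
    'moons': make_moons(n_samples=100, noise=0.2, random_state=42),
    'blobs': make_blobs(n_samples=100, centers=2, random_state=42),
}

fitted = {}
for ds_name, (X, y) in datasets.items():
    models = make_models()          # new, unfitted instances each time
    for clf in models.values():
        clf.fit(X, y)
    fitted[ds_name] = models        # keep the fitted models per dataset

# Each dataset has its own KNN object, so neither fit overwrote the other.
same_object = fitted['moons']['KNN (K=1)'] is fitted['blobs']['KNN (K=1)']
print(same_object)
```

Reusing a single dict across datasets would also work for scoring, since `fit` retrains from scratch, but the models fitted on earlier datasets would be lost.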

## Step 3: Decision Boundary Plot Function

Since the data is 2D, I can visualize each classifier's decision boundary by predicting the class for every point on a mesh grid, coloring the resulting regions, and then overlaying the actual data points on top.
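
The mesh-grid idea can be checked on a coarse grid before any plotting. The sketch below is illustrative only: the dataset, the classifier, and the 0.5 step size are assumptions, and it prints array shapes instead of drawing anything.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier

# Assumed example data and model.
X, y = make_moons(n_samples=100, noise=0.2, random_state=42)
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Coarse grid (assumed step of 0.5) covering the data plus a 0.5 margin.
xx, yy = np.meshgrid(
    np.arange(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, 0.5),
    np.arange(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, 0.5),
)

# Flatten both coordinate grids and pair them into (n_points, 2) rows,
# which is the input layout predict() expects.
grid_points = np.c_[xx.ravel(), yy.ravel()]

# One predicted label per grid point, reshaped back to the grid layout
# so it can be passed to a filled-contour plot.
Z = clf.predict(grid_points).reshape(xx.shape)

print(xx.shape, grid_points.shape, Z.shape)
```

`Z` has the same shape as `xx` and `yy`, which is what `contourf` needs in order to color each grid cell by its predicted class.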

In [7]:
def plot_decision_boundaries(X, y, fitted_classifiers, dataset_name):
    """ Plots the decision boundary for each fitted classifier over the 2D data
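    Params:
        X: feature array of shape (n_samples, 2)
        y: label array of shape (n_samples,)
        fitted_classifiers: dict mapping display name to a fitted classifier
        dataset_name: dataset name shown in the figure title"""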

    # colormaps for the decision regions (background) and the data points
    cmap_bg     = ListedColormap(['#FFAAAA', '#AAAAFF'])  # light red/blue regions
    cmap_points = ListedColormap(['#CC0000', '#0000CC'])  # dark red/blue dots

    # mesh grid over feature space
    h = 0.05  # step size (smaller = finer grid, but slower to render)
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(
        np.arange(x_min, x_max, h),
        np.arange(y_min, y_max, h)
    )

    # one subplot per classifier, laid out as 2 rows x 3 columns
    fig, axes = plt.subplots(2, 3, figsize=(16, 9))
    axes = axes.flatten()
    fig.suptitle(f'Decision Boundaries — {dataset_name}', fontsize=14, fontweight='bold')

    # pair each subplot with the next (name, classifier) entry
    for ax, (name, clf) in zip(axes, fitted_classifiers.items()):
        # flatten the 2D grid into (x, y) rows and predict a class for each point
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
        # reshape the predictions back into the grid shape for coloring
        Z = Z.reshape(xx.shape)

        # fill the decision regions with color
        ax.contourf(xx, yy, Z, cmap=cmap_bg, alpha=0.5)

        # plot points on top
        ax.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_points,
                   edgecolors='k', s=25, linewidth=0.4)

        ax.set_title(name, fontsize=11)
        ax.set_xlim(xx.min(), xx.max())
        ax.set_ylim(yy.min(), yy.max())
        ax.set_xticks([])
        ax.set_yticks([])

    plt.tight_layout()
    plt.show()