{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Naive Bayes vs Support Vector Machines: Probabilistic vs Geometric Classification\n",
    "\n",
    "**Welcome back, St. Mark!** Today we explore two fundamentally different classification approaches:\n",
    "\n",
    "- **Naive Bayes:** Probabilistic classification using Bayes' theorem\n",
    "- **Support Vector Machines:** Geometric classification that finds optimal decision boundaries\n",
    "\n",
    "We'll build both from scratch and compare their strengths for Nigerian healthcare applications.\n",
    "\n",
    "## The Big Picture\n",
    "\n",
    "**Naive Bayes:**\n",
    "- Uses probability theory and Bayes' theorem\n",
    "- Assumes features are independent given the class (the \"naive\" assumption)\n",
    "- Fast training; works well with small datasets\n",
    "- Handles categorical features naturally (via the Categorical and Multinomial variants)\n",
    "\n",
    "**Support Vector Machines:**\n",
    "- Finds the optimal separating hyperplane\n",
    "- Uses the kernel trick for non-linear boundaries\n",
    "- Maximizes the margin between classes\n",
    "- Powerful for high-dimensional data\n",
    "\n",
    "**Key Question:** When should you use probabilistic vs geometric approaches?\n",
    "\n",
    "## Data Preparation: Healthcare Classification Setup\n",
    "\n",
    "We'll use a synthetic, medical-style dataset to compare both approaches."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.naive_bayes import GaussianNB\n",
    "from sklearn.svm import SVC\n",
    "from sklearn.metrics import accuracy_score, classification_report, confusion_matrix\n",
    "from sklearn.datasets import make_classification\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "# Create synthetic medical data\n",
    "# Features: stand-ins for clinical measurements, symptoms, lab results\n",
    "# Target: disease categories (0=healthy, 1=disease_A, 2=disease_B)\n",
    "X, y = make_classification(n_samples=1000,\n",
    "                           n_features=6,\n",
    "                           n_classes=3,\n",
    "                           n_informative=4,\n",
    "                           n_redundant=2,\n",
    "                           n_clusters_per_class=1,\n",
    "                           random_state=42)\n",
    "\n",
    "# Split data (70% train / 30% test)\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n",
    "\n",
    "# Standardize features (important for SVM); fit the scaler on the training data only\n",
    "scaler = StandardScaler()\n",
    "X_train_scaled = scaler.fit_transform(X_train)\n",
    "X_test_scaled = scaler.transform(X_test)\n",
    "\n",
    "print(f\"Training set: X={X_train.shape}, y={y_train.shape}\")\n",
    "print(f\"Test set: X={X_test.shape}, y={y_test.shape}\")\n",
    "print(f\"Class distribution: {np.bincount(y_train)}\")\n",
    "print(\"Feature statistics after scaling:\")\n",
    "print(f\"Mean: {X_train_scaled.mean(axis=0)}\")\n",
    "print(f\"Std: {X_train_scaled.std(axis=0)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Cell Analysis:** We've prepared our multiclass medical dataset.\n",
    "\n",
    "- **Multiclass setup:** 3 disease categories instead of binary classification\n",
    "- **Feature scaling:** Critical for SVM performance; Gaussian Naive Bayes estimates a separate mean and variance for each feature, so it is largely insensitive to scaling\n",
    "- **Healthcare analogy:** Like preparing diagnostic data to distinguish malaria, typhoid, and COVID-19\n",
    "\n",
    "**Reflection Question:** Why is feature scaling more important for SVM than for Naive Bayes?\n",
    "\n",
    "## Method 1: Naive Bayes - Probabilistic Classification\n",
    "\n",
    "**Bayes' Theorem:**\n",
    "\n",
    "$$P(\\text{Class} \\mid \\text{Features}) = \\frac{P(\\text{Features} \\mid \\text{Class}) \\times P(\\text{Class})}{P(\\text{Features})}$$\n",
    "\n",
    "**\"Naive\" assumption:** Features are conditionally independent given the class, so the likelihood factorizes into a product over features: $P(\\text{Features} \\mid \\text{Class}) = \\prod_i P(x_i \\mid \\text{Class})$\n",
    "\n",
    "For Gaussian Naive Bayes, each feature is normally distributed within each class: $P(x_i \\mid y) = \\mathcal{N}(x_i;\\, \\mu_{y,i},\\, \\sigma_{y,i}^2)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def gaussian_naive_bayes_fit(X, y):\n",
    "    \"\"\"\n",
    "    Fit a Gaussian Naive Bayes classifier.\n",
    "\n",
    "    Parameters:\n",
    "    X: Feature matrix (samples × features)\n",
    "    y: Target labels (samples,)\n",
    "\n",
    "    Returns:\n",
    "    priors: Class prior probabilities\n",
    "    means: Class-conditional feature means\n",
    "    variances: Class-conditional feature variances (plus a small smoothing term)\n",
    "    classes: Sorted unique class labels\n",
    "    \"\"\"\n",
    "    n_samples, n_features = X.shape\n",
    "    classes = np.unique(y)\n",
    "    n_classes = len(classes)\n",
    "\n",
    "    # Initialize parameters\n",
    "    priors = np.zeros(n_classes)\n",
    "    means = np.zeros((n_classes, n_features))\n",
    "    variances = np.zeros((n_classes, n_features))\n",
    "\n",
    "    for i, c in enumerate(classes):\n",
    "        # Extract samples for this class\n",
    "        X_c = X[y == c]\n",
    "\n",
    "        # Prior probability P(class)\n",
    "        priors[i] = len(X_c) / n_samples\n",
    "\n",
    "        # Class-conditional means and variances\n",
    "        means[i] = np.mean(X_c, axis=0)\n",
    "        variances[i] = np.var(X_c, axis=0)\n",
    "\n",
    "    # Add a small variance floor so a constant feature cannot cause division by zero\n",
    "    # (same rule as scikit-learn's GaussianNB: var_smoothing=1e-9 times the largest feature variance)\n",
    "    variances += 1e-9 * np.var(X, axis=0).max()\n",
    "\n",
    "    return priors, means, variances, classes\n",
    "\n",
    "\n",
    "def gaussian_naive_bayes_predict(X, priors, means, variances, classes):\n",
    "    \"\"\"\n",
    "    Predict using a trained Gaussian Naive Bayes model.\n",
    "\n",
    "    Returns:\n",
    "    predictions: Predicted class labels\n",
    "    probabilities: Class probabilities for each sample\n",
    "    \"\"\"\n",
    "    n_samples = X.shape[0]\n",
    "    n_classes = len(classes)\n",
    "\n",
    "    # Initialize log-probability matrix\n",
    "    log_probs = np.zeros((n_samples, n_classes))\n",
    "\n",
    "    for i, c in enumerate(classes):\n",
    "        # Log prior\n",
    "        log_prior = np.log(priors[i])\n",
    "\n",
    "        # Log-likelihood summed over features (Gaussian + independence assumption)\n",
    "        # log P(x|class) = Σ_j log N(x_j; μ_j, σ_j²)\n",
    "        diff = X - means[i]\n",
    "        log_likelihood = -0.5 * np.sum(\n",
    "            np.log(2 * np.pi * variances[i]) +\n",
    "            (diff ** 2) / variances[i],\n",
    "            axis=1\n",
    "        )\n",
    "\n",
    "        log_probs[:, i] = log_prior + log_likelihood\n",
    "\n",
    "    # Convert to probabilities (normalize)\n",
    "    # Subtract the row-wise max before exponentiating for numerical stability\n",
    "    log_probs_stable = log_probs - np.max(log_probs, axis=1, keepdims=True)\n",
    "    probabilities = np.exp(log_probs_stable)\n",
    "    probabilities = probabilities / np.sum(probabilities, axis=1, keepdims=True)\n",
    "\n",
    "    # Predictions\n",
    "    predictions = classes[np.argmax(probabilities, axis=1)]\n",
    "\n",
    "    return predictions, probabilities\n",
    "\n",
    "\n",
    "# Train our Naive Bayes on the unscaled features (Gaussian NB is largely insensitive to scaling)\n",
    "priors, means, variances, classes = gaussian_naive_bayes_fit(X_train, y_train)\n",
    "\n",
    "# Make predictions\n",
    "nb_predictions, nb_probabilities = gaussian_naive_bayes_predict(\n",
    "    X_test, priors, means, variances, classes\n",
    ")\n",
    "\n",
    "print(\"Naive Bayes Training Complete:\")\n",
    "print(f\"Classes: {classes}\")\n",
    "print(f\"Class priors: {priors}\")\n",
    "print(\"Feature means per class (first 3 features):\")\n",
    "for i, c in enumerate(classes):\n",
    "    print(f\"  Class {c}: {means[i][:3]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Cell Analysis:** Our Naive Bayes implementation is complete.\n",
    "\n",
    "- **Parameter learning:** Estimates class priors and per-class Gaussian feature distributions\n",
    "- **Log-probabilities:** Summing log-likelihoods avoids the numerical underflow that multiplying many small probabilities would cause\n",
    "- **Healthcare analogy:** Like learning symptom patterns for different diseases\n",
    "\n",
    "**Reflection Question:** How does the \"naive\" independence assumption help with limited medical data?\n",
    "\n",
    "## Method 2: Support Vector Machines - Geometric Classification\n",
    "\n",
    "**Core idea:** Find the hyperplane that separates the classes with the maximum margin.\n",
    "\n",
    "**Mathematical formulation (soft margin):**\n",
    "\n",
    "$$\\min_{w,\\, b,\\, \\xi} \\; \\frac{1}{2}\\lVert w \\rVert^2 + C \\sum_i \\xi_i$$\n",
    "\n",
    "$$\\text{subject to} \\quad y_i (w \\cdot x_i + b) \\ge 1 - \\xi_i, \\quad \\xi_i \\ge 0$$\n",
    "\n",
    "where $y_i \\in \\{-1, +1\\}$ are the labels, $\\xi_i$ are slack variables, and $C$ trades margin width against training errors.\n",
    "\n",
    "**Kernel trick:** In the dual form of this problem, the data appear only through dot products $x_i \\cdot x_j$. Replacing them with a kernel $K(x_i, x_j)$ implicitly maps the features into a higher-dimensional space, enabling non-linear boundaries without computing that mapping explicitly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def linear_kernel(x1, x2):\n",
    "    \"\"\"Linear kernel: K(x1, x2) = x1·x2\"\"\"\n",
    "    return np.dot(x1, x2)\n",
    "\n",
    "\n",
    "def rbf_kernel(x1, x2, gamma=1.0):\n",
    "    \"\"\"RBF kernel: K(x1, x2) = exp(-γ||x1 - x2||²)\"\"\"\n",
    "    diff = x1 - x2\n",
    "    return np.exp(-gamma * np.dot(diff, diff))\n",
    "\n",
    "\n",
    "def polynomial_kernel(x1, x2, degree=3, coef0=1.0):\n",
    "    \"\"\"Polynomial kernel: K(x1, x2) = (x1·x2 + coef0)^degree\"\"\"\n",
    "    return (np.dot(x1, x2) + coef0) ** degree\n",
    "\n",
    "\n",
    "class SimpleSVM:\n",
    "    \"\"\"\n",
    "    Simplified binary SVM trained with a simplified SMO algorithm.\n",
    "\n",
    "    For educational purposes - not optimized for performance.\n",
    "    \"\"\"\n",
    "\n",
    "    def __init__(self, kernel='linear', C=1.0, gamma=1.0, degree=3, max_iter=100):\n",
    "        self.kernel = kernel\n",
    "        self.C = C  # Regularization parameter\n",
    "        self.gamma = gamma\n",
    "        self.degree = degree\n",
    "        self.max_iter = max_iter\n",
    "        self.alpha = None  # Lagrange multipliers\n",
    "        self.b = 0  # Bias term\n",
    "        self.support_vectors = None\n",
    "        self.support_labels = None\n",
    "        self.kernel_func = None\n",
    "\n",
    "    def _get_kernel_func(self):\n",
    "        \"\"\"Select kernel function based on kernel type.\"\"\"\n",
    "        if self.kernel == 'linear':\n",
    "            return lambda x1, x2: linear_kernel(x1, x2)\n",
    "        elif self.kernel == 'rbf':\n",
    "            return lambda x1, x2: rbf_kernel(x1, x2, self.gamma)\n",
    "        elif self.kernel == 'poly':\n",
    "            return lambda x1, x2: polynomial_kernel(x1, x2, self.degree, 1.0)\n",
    "        else:\n",
    "            raise ValueError(f\"Unknown kernel: {self.kernel}\")\n",
    "\n",
    "    def fit(self, X, y):\n",
    "        \"\"\"\n",
    "        Fit the SVM using a simplified SMO algorithm.\n",
    "\n",
    "        This is a basic implementation for learning; production libraries such as\n",
    "        LIBSVM (used by scikit-learn's SVC) use more sophisticated optimization.\n",
    "        \"\"\"\n",
    "        self.kernel_func = self._get_kernel_func()\n",
    "        n_samples, n_features = X.shape\n",
    "\n",
    "        # Convert labels to +1/-1 for binary SVM (class 0 -> -1, everything else -> +1)\n",
    "        y_svm = np.where(y == 0, -1, 1)\n",
    "\n",
    "        # Initialize Lagrange multipliers and bias\n",
    "        self.alpha = np.zeros(n_samples)\n",
    "        self.b = 0.0\n",
    "        tol = 1e-3  # Tolerance for KKT violations\n",
    "\n",
    "        # Precompute the kernel (Gram) matrix once: K[i, j] = K(x_i, x_j), O(n^2) kernel calls\n",
    "        K = np.array([[self.kernel_func(xi, xj) for xj in X] for xi in X])\n",
    "\n",
    "        def decision(k):\n",
    "            # f(x_k) = Σ_m α_m y_m K(x_m, x_k) + b over all training samples\n",
    "            return np.dot(self.alpha * y_svm, K[:, k]) + self.b\n",
    "\n",
    "        # Simplified SMO: pick a KKT-violating alpha_i and a random partner alpha_j\n",
    "        for iteration in range(self.max_iter):\n",
    "            alpha_changed = 0\n",
    "\n",
    "            for i in range(n_samples):\n",
    "                # Prediction error for sample i\n",
    "                E_i = decision(i) - y_svm[i]\n",
    "\n",
    "                # Check KKT conditions (within tolerance)\n",
    "                if (y_svm[i] * E_i < -tol and self.alpha[i] < self.C) or \\\n",
    "                   (y_svm[i] * E_i > tol and self.alpha[i] > 0):\n",
    "\n",
    "                    # Select random j != i and compute its error\n",
    "                    j = np.random.randint(n_samples)\n",
    "                    while j == i:\n",
    "                        j = np.random.randint(n_samples)\n",
    "                    E_j = decision(j) - y_svm[j]\n",
    "\n",
    "                    alpha_i_old = self.alpha[i]\n",
    "                    alpha_j_old = self.alpha[j]\n",
    "\n",
    "                    # Bounds L, H keep both alphas in [0, C] and preserve Σ α_k y_k = 0\n",
    "                    if y_svm[i] != y_svm[j]:\n",
    "                        L = max(0, self.alpha[j] - self.alpha[i])\n",
    "                        H = min(self.C, self.C + self.alpha[j] - self.alpha[i])\n",
    "                    else:\n",
    "                        L = max(0, self.alpha[i] + self.alpha[j] - self.C)\n",
    "                        H = min(self.C, self.alpha[i] + self.alpha[j])\n",
    "\n",
    "                    if L == H:\n",
    "                        continue\n",
    "\n",
    "                    # Compute eta (second derivative of the objective along the update direction)\n",
    "                    eta = 2 * K[i, j] - K[i, i] - K[j, j]\n",
    "                    if eta >= 0:\n",
    "                        continue\n",
    "\n",
    "                    # Update alpha_j from the difference of errors, then clip to [L, H]\n",
    "                    self.alpha[j] -= y_svm[j] * (E_i - E_j) / eta\n",
    "                    self.alpha[j] = np.clip(self.alpha[j], L, H)\n",
    "\n",
    "                    if abs(self.alpha[j] - alpha_j_old) < 1e-5:\n",
    "                        continue\n",
    "\n",
    "                    # Update alpha_i by the same amount in the opposite direction\n",
    "                    self.alpha[i] += y_svm[i] * y_svm[j] * (alpha_j_old - self.alpha[j])\n",
    "\n",
    "                    # Update the bias so the KKT conditions hold for the changed alphas\n",
    "                    d_i = y_svm[i] * (self.alpha[i] - alpha_i_old)\n",
    "                    d_j = y_svm[j] * (self.alpha[j] - alpha_j_old)\n",
    "                    b1 = self.b - E_i - d_i * K[i, i] - d_j * K[i, j]\n",
    "                    b2 = self.b - E_j - d_i * K[i, j] - d_j * K[j, j]\n",
    "                    if 0 < self.alpha[i] < self.C:\n",
    "                        self.b = b1\n",
    "                    elif 0 < self.alpha[j] < self.C:\n",
    "                        self.b = b2\n",
    "                    else:\n",
    "                        self.b = (b1 + b2) / 2\n",
    "\n",
    "                    alpha_changed += 1\n",
    "\n",
    "            if alpha_changed == 0:\n",
    "                break\n",
    "\n",
    "        # Keep only the support vectors (alpha > 0)\n",
    "        support_indices = self.alpha > 1e-5\n",
    "        self.support_vectors = X[support_indices]\n",
    "        self.support_labels = y_svm[support_indices]\n",
    "        self.alpha = self.alpha[support_indices]\n",
    "\n",
    "        return self\n",
    "\n",
    "    def _predict_sample(self, x):\n",
    "        \"\"\"Decision value f(x) = Σ α_k y_k K(sv_k, x) + b over the support vectors.\"\"\"\n",
    "        if self.alpha is None or self.support_vectors is None:\n",
    "            raise RuntimeError(\"Call fit() before predicting.\")\n",
    "\n",
    "        prediction = 0.0\n",
    "        for alpha_k, y_k, sv_k in zip(self.alpha, self.support_labels, self.support_vectors):\n",
    "            prediction += alpha_k * y_k * self.kernel_func(x, sv_k)\n",
    "\n",
    "        return prediction + self.b\n",
    "\n",
    "    def predict(self, X):\n",
    "        \"\"\"Predict class labels (0/1) for test data.\"\"\"\n",
    "        predictions = []\n",
    "        for x in X:\n",
    "            pred = self._predict_sample(x)\n",
    "            predictions.append(1 if pred > 0 else 0)  # Convert back to 0/1 labels\n",
    "        return np.array(predictions)\n",
    "\n",
    "\n",
    "# Train our SVM (binary classification for simplicity)\n",
    "# Convert to binary: class 0 (healthy) vs classes 1 and 2 combined (disease)\n",
    "y_binary = np.where(y_train == 0, 0, 1)\n",
    "\n",
    "np.random.seed(42)  # SMO picks partners with np.random; seed for reproducibility\n",
    "svm_model = SimpleSVM(kernel='linear', C=1.0, max_iter=50)\n",
    "svm_model.fit(X_train_scaled, y_binary)\n",
    "\n",
    "# Make predictions\n",
    "svm_predictions = svm_model.predict(X_test_scaled)\n",
    "\n",
    "print(\"SVM Training Complete:\")\n",
    "print(f\"Number of support vectors: {len(svm_model.support_vectors)}\")\n",
    "print(f\"Bias term: {svm_model.b:.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Cell Analysis:** Our simplified SVM implementation is complete.\n",
    "\n",
    "- **Support vectors:** The training samples with nonzero Lagrange multipliers; they alone define the decision boundary\n",
    "- **Kernel functions:** Enable non-linear classification through implicit feature mapping\n",
    "- **Healthcare analogy:** Like finding the most informative patients for diagnostic guidelines\n",
    "\n",
    "**Reflection Question:** Why are support vectors so important for SVM decision boundaries?\n",
    "\n",
    "## Comparative Analysis: Probabilistic vs Geometric Approaches\n",
    "\n",
    "Now let's compare our implementations with scikit-learn and analyze their performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Scikit-learn baselines\n",
    "sk_nb = GaussianNB()\n",
    "sk_nb.fit(X_train, y_train)\n",
    "sk_nb_predictions = sk_nb.predict(X_test)\n",
    "\n",
    "# For SVM, use binary classification to match our implementation\n",
    "y_test_binary = np.where(y_test == 0, 0, 1)\n",
    "sk_svm = SVC(kernel='linear', C=1.0, random_state=42)\n",
    "sk_svm.fit(X_train_scaled, y_binary)\n",
    "sk_svm_predictions = sk_svm.predict(X_test_scaled)\n",
    "\n",
    "# Performance comparison\n",
    "print(\"\\n🎯 Performance Comparison:\")\n",
    "print(\"=\" * 60)\n",
    "\n",
    "# Naive Bayes comparison (all 3 classes)\n",
    "nb_accuracy = accuracy_score(y_test, nb_predictions)\n",
    "sk_nb_accuracy = accuracy_score(y_test, sk_nb_predictions)\n",
    "\n",
    "print(\"Naive Bayes:\")\n",
    "print(f\"  Our implementation: {nb_accuracy:.4f}\")\n",
    "print(f\"  Scikit-learn:       {sk_nb_accuracy:.4f}\")\n",
    "print(f\"  Difference:         {abs(nb_accuracy - sk_nb_accuracy):.4f}\")\n",
    "\n",
    "# SVM comparison (binary classification)\n",
    "svm_accuracy = accuracy_score(y_test_binary, svm_predictions)\n",
    "sk_svm_accuracy = accuracy_score(y_test_binary, sk_svm_predictions)\n",
    "\n",
    "print(\"\\nSVM (Binary Classification):\")\n",
    "print(f\"  Our implementation: {svm_accuracy:.4f}\")\n",
    "print(f\"  Scikit-learn:       {sk_svm_accuracy:.4f}\")\n",
    "print(f\"  Difference:         {abs(svm_accuracy - sk_svm_accuracy):.4f}\")\n",
    "\n",
    "# Detailed classification reports\n",
    "print(\"\\n📊 Detailed Performance Analysis:\")\n",
    "print(\"\\nNaive Bayes Classification Report:\")\n",
    "print(classification_report(y_test, nb_predictions))\n",
    "\n",
    "print(\"\\nSVM Classification Report (Binary):\")\n",
    "print(classification_report(y_test_binary, svm_predictions))\n",
    "\n",
    "# Confusion matrices (rows = true class, columns = predicted class)\n",
    "nb_cm = confusion_matrix(y_test, nb_predictions)\n",
    "svm_cm = confusion_matrix(y_test_binary, svm_predictions)\n",
    "\n",
    "print(\"\\nConfusion Matrices:\")\n",
    "print(\"Naive Bayes:\")\n",
    "print(nb_cm)\n",
    "print(\"\\nSVM (Binary):\")\n",
    "print(svm_cm)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Cell Analysis:** Performance comparison complete.\n",
    "\n",
    "- **Accuracy metrics:** How well each method classifies test samples\n",
    "- **Classification reports:** Precision, recall, and F1-score for each class\n",
    "- **Confusion matrices:** Detailed breakdown of correct vs incorrect predictions\n",
    "\n",
    "**Healthcare Translation:** Like comparing diagnostic tools: accuracy matters, but so does avoiding false negatives (missed diagnoses), which show up as low recall for the disease classes.\n",
    "\n",
    "## 🎯 Key Takeaways and Nigerian Healthcare Applications\n",
    "\n",
    "**Algorithm Summary:**\n",
    "\n",
    "- **Naive Bayes:** Probabilistic approach using Bayes' theorem with a conditional independence assumption\n",
    "- **SVM:** Geometric approach that finds the separating boundary with the maximum margin\n",
    "- **Trade-offs:** Naive Bayes trains in a single pass and its parameters (per-class means and variances) are easy to inspect; SVMs cost more to train, but kernels let them learn more flexible boundaries\n",
    "\n",
    "**Healthcare Translation - Mark:**\n",
    "\n",
    "Imagine building AI for Nigerian hospitals:\n",
    "\n",
    "- **Naive Bayes:** Quick disease screening with limited data (well suited to rural clinics)\n",
    "- **SVM:** More precise diagnosis when comprehensive patient data is available (e.g., urban hospitals)\n",
    "- **Probabilistic outputs:** Naive Bayes returns class probabilities directly (though they are often overconfident), which helps flag uncertain cases; a standard SVM returns a decision score rather than a probability\n",
    "- **Feature scaling:** A critical preprocessing step for reliable SVM performance\n",
    "\n",
    "**Performance check:** Compare the accuracies of our from-scratch implementations with scikit-learn's in the output above. Our Naive Bayes should match closely, since both estimate the same per-class means and variances; our simplified SMO may differ somewhat from `SVC`, which uses a more sophisticated solver.\n",
    "\n",
    "**Reflection Questions:**\n",
    "\n",
    "1. When would you choose Naive Bayes over SVM for Nigerian health applications?\n",
    "\n",
    "2. How does the \"kernel trick\" enable SVM to handle complex medical patterns?\n",
    "\n",
    "3. Compare probabilistic vs geometric approaches to how Nigerian doctors make diagnoses.\n",
    "\n",
    "**Next Steps:**\n",
    "\n",
    "- Add regularization techniques to prevent overfitting\n",
    "- Implement advanced optimization methods\n",
    "- Extend to dimensionality reduction techniques\n",
    "\n",
    "**🏆 Excellent progress, my student! You've mastered both probabilistic and geometric classification approaches.**"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}