# Bayesian Classifiers

**Example (1): Classification with Categorical Features**

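Consider the Iris data set with $N = 150$ observations and two of its features, Sepal Length $= X_1$ and Sepal Width $= X_2$. Each feature is discretized into bins, and the frequency tables below give the joint distribution of the binned features within each class.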

Each of the $150$ observations belongs to one of two classes, $c_1$ or $c_2$. The prior probabilities of the classes are:
- $P(c_1) = 50/150 \approx 0.333$
- $P(c_2) = 100/150 \approx 0.667$

For an observation $\mathbf{x} = (5.3, 3.0)^T$, with Sepal Length $5.3$ and Sepal Width $3.0$, we can find the class to which $\mathbf{x}$ belongs as follows.

1. Find the $X_1$ bin in which $5.3$ falls.
2. Find the $X_2$ bin in which $3.0$ falls.
3. Note down the prior probability $P(c_i)$ for the class.
4. Look up the likelihood $P(\mathbf{x} | c_i)$ (i.e., the joint probability of the $X_1$ and $X_2$ bins) in the frequency table for class $c_i$.
5. Calculate the posterior probability $f(c_i | \mathbf{x}) \propto P(\mathbf{x} | c_i) \times P(c_i)$.
6. Repeat steps 3-5 for every class (the bins found in steps 1-2 do not depend on the class).
7. Compare the posteriors of all classes $c_i$. The class with the highest posterior probability is the class to which $\mathbf{x}$ is assigned.

**$X_1 = $ Sepal Length**
| Bins | Type |
|-|-|
| $[4.3, 5.2]$ | Very Short |
| $(5.2, 6.1]$ | Short |
| $(6.1, 7.0]$ | Long |
| $(7.0, 7.9]$ | Very Long |

<br/>

**$X_2 = $ Sepal Width**
| Bins | Type |
|-|-|
| $[2.0, 2.8]$ | Short |
| $(2.8, 3.6]$ | Medium |
| $(3.6, 4.4]$ | Long |
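
Steps 1 and 2 can be sketched in code as a lookup against the bin edges in the two tables above. This is a minimal illustration, not part of the original notebook: the helper `bin_label` and the variable names are hypothetical, and the only data used are the interior bin edges and labels from the tables. Because every bin is closed on the right, `bisect_left` from the standard library returns the matching bin index.

```python
from bisect import bisect_left

# Interior bin edges copied from the tables above; each bin is closed on the right.
sepal_length_edges = [5.2, 6.1, 7.0]
sepal_length_labels = ["Very Short", "Short", "Long", "Very Long"]
sepal_width_edges = [2.8, 3.6]
sepal_width_labels = ["Short", "Medium", "Long"]


def bin_label(value, edges, labels):
    """Return the label of the right-closed bin containing value (hypothetical helper)."""
    # bisect_left gives the index i with edges[i-1] < value <= edges[i]
    return labels[bisect_left(edges, value)]


x1_bin = bin_label(5.3, sepal_length_edges, sepal_length_labels)
x2_bin = bin_label(3.0, sepal_width_edges, sepal_width_labels)
print(x1_bin, x2_bin)
```

The pair of labels identifies the row and column to read in each class's frequency table. The sketch does not check that a value lies inside the overall range of the feature (for example $[4.3, 7.9]$ for Sepal Length).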

<br/>

**Class 1**
| | $X_{2 \text{, short}}$ | $X_{2 \text{, medium}}$ | $X_{2 \text{, long}}$ | $X_{1, \text{total}}$ |
|-|-|-|-|-|
|$X_{1 \text{, very short}}$ | 1/50 | 33/50 | 5/50 | 39/50 |
|$X_{1 \text{, short}}$ | 0 | 3/50 | 8/50 | 11/50 |
|$X_{1 \text{, long}}$ |0 | 0 | 0 | 0 |
|$X_{1 \text{, very long}}$ | 0 | 0 | 0 | 0 |
|$X_{2, \text{total}}$| 1/50 | 36/50 | 13/50 | 50/50 |

<br/>

**Class 2**
| | $X_{2 \text{, short}}$ | $X_{2 \text{, medium}}$ | $X_{2 \text{, long}}$ | **$X_{1, \text{total}}$** |
|-|-|-|-|-|
|$X_{1 \text{, very short}}$ | 6/100 | 0 | 0 | 6/100 |
|$X_{1 \text{, short}}$ | 24/100 | 15/100 | 0 | 39/100 |
|$X_{1 \text{, long}}$ | 13/100 | 30/100 | 0 | 43/100 |
|$X_{1 \text{, very long}}$ | 3/100 | 7/100 | 2/100 | 12/100 |
| **$X_{2, \text{total}}$** | 46/100 | 52/100 | 2/100 | 100/100 |

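**Finding $f(c_1 | \mathbf{x})$**

$$f(c_1 | \mathbf{x}) \propto P(\mathbf{x} | c_1) \times P(c_1) = \dfrac{3}{50} \times \dfrac{50}{150} = 0.02$$

where
- $P(c_1) = 50/150 \approx 0.333$
- $P(\mathbf{x} | c_1) = 3/50 = 0.06$, the ($X_1$ short, $X_2$ medium) cell of the Class 1 table, since $5.3 \in (5.2, 6.1]$ and $3.0 \in (2.8, 3.6]$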
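
To finish steps 6 and 7, the same calculation has to be repeated for $c_2$ and the two scores compared. The sketch below does this with exact fractions. The likelihoods are read from the ($X_1$ short, $X_2$ medium) cells of the Class 1 and Class 2 tables ($3/50$ and $15/100$); the dictionary and variable names are illustrative assumptions, not part of the original notebook.

```python
from fractions import Fraction

# Priors and likelihoods read from the text and the (X1 short, X2 medium) table cells
priors = {"c1": Fraction(50, 150), "c2": Fraction(100, 150)}
likelihoods = {"c1": Fraction(3, 50), "c2": Fraction(15, 100)}

# Unnormalized posterior for each class: P(x | c_i) * P(c_i)
scores = {c: likelihoods[c] * priors[c] for c in priors}

# Dividing by the sum of the scores gives proper posterior probabilities
total = sum(scores.values())
posteriors = {c: s / total for c, s in scores.items()}

predicted = max(scores, key=scores.get)
for c in scores:
    print(c, float(scores[c]), float(posteriors[c]))
print("predicted class:", predicted)
```

Under these table values the score for $c_2$ ($0.15 \times 100/150 = 0.1$) exceeds the score for $c_1$ ($0.02$), so $\mathbf{x} = (5.3, 3.0)^T$ is assigned to $c_2$.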

<br/><br/>

**Example (2): Classification with Continuous Features**

Given the class means and covariance matrices listed in the code below, use Bayes' rule to classify the observation $\mathbf{x}$:

- $P(c_i | \mathbf{x}) \propto p(\mathbf{x} | c_i) \times P(c_i)$
- $\mathbf{x} = (6.75, 4.25)^T$

where
- $n = 150$ is the sample size

- $\hat{P}(c_1) = \dfrac{n_1}{n} = 50/150$ is the prior probability for $c_1$

- $\hat{P}(c_2) = \dfrac{n_2}{n} = 100/150$ is the prior probability for $c_2$

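Each class-conditional likelihood $p(\mathbf{x} | c_i)$ is modeled as a multivariate Gaussian with the mean $\mathbf{\mu}$ and covariance matrix $\Sigma$ of that class. The PDF of a $k$-dimensional multivariate Gaussian is given by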
$$f(\mathbf{x}) = \dfrac{1}{\sqrt{(2\pi)^k \det(\Sigma)}} \exp \left[-\dfrac{1}{2} (\mathbf{x} - \mathbf{\mu})^T \: \Sigma^{-1} (\mathbf{x} - \mathbf{\mu})\right]$$

In [19]:
import numpy as np
from numpy.linalg import inv


def f(x, mu, sigma):
    """
    Compute the probability density function (PDF) of a multivariate Gaussian distribution.

    Args:
        x (array-like): Observation (feature vector) of length k.
        mu (array-like): Mean vector of the class distribution, length k.
        sigma (array-like): k x k covariance matrix of the class distribution.

    Returns:
        pdf (float): Probability density evaluated at x.
    """
    k = len(mu)
    det_cov = np.linalg.det(sigma)
    sigma_inv = inv(sigma)
    x_mu = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), scaled by -1/2
    exponent = -(1/2) * np.dot(np.dot(x_mu, sigma_inv), x_mu)
    # Normalizing constant 1 / sqrt((2 pi)^k det(Sigma))
    coefficient = 1 / np.sqrt((2 * np.pi) ** k * det_cov)
    pdf = coefficient * np.exp(exponent)
    return pdf


# Observation to classify
x = np.array([6.75, 4.25])


# Class 1 Parameters
mu_1 = np.array([5.006, 3.418])

sigma_1 = np.array(
    [
        [0.1218, 0.0983],
        [0.0983, 0.1423]
    ]
)

prior_1 = 0.333
likelihood_1 = f(x, mu_1, sigma_1)    # class-conditional density p(x | c_1)
posterior_1 = likelihood_1 * prior_1  # unnormalized posterior for c_1


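# Class 2 Parameters
# Note: mu_2 is identical to mu_1 as given, so the two classes differ
# only in covariance and prior in this calculation.
mu_2 = np.array([5.006, 3.418])

sigma_2 = np.array(
    [
        [0.435, 0.1209],
        [0.1209, 0.435]
    ]
)

prior_2 = 0.67
likelihood_2 = f(x, mu_2, sigma_2)    # class-conditional density p(x | c_2)
posterior_2 = likelihood_2 * prior_2  # unnormalized posterior for c_2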


# Classification Result
print(f"class 1 probability: {posterior_1}")
print(f"class 2 probability: {posterior_2}")

if posterior_2 > posterior_1:
    print(f"{x} is class 2")
else:
    print(f"{x} is class 1")
    

class 1 probability: 1.6486729528270847e-07
class 2 probability: 0.00665754843814572
[6.75 4.25] is class 2


In [20]:
# Cross-check the class 1 result with scipy's multivariate Gaussian PDF
import scipy.stats as stats
likelihood = stats.multivariate_normal.pdf(x, mean=mu_1, cov=sigma_1)
prior = 0.333

posterior = likelihood * prior
posterior

1.6486729528271024e-07
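
The values printed above are unnormalized scores $p(\mathbf{x} | c_i) \times P(c_i)$, not probabilities that sum to one. The sketch below shows one way to turn them into posterior probabilities, working in log space with `scipy.stats.multivariate_normal.logpdf` so that very small densities do not underflow. It is an illustration under stated assumptions: the class parameters are copied as given in the cell above (including the identical means), and the names `class_params`, `log_scores`, and `posteriors` are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

x = np.array([6.75, 4.25])

# (mean, covariance, prior) per class, copied as given in the cell above
class_params = {
    "class 1": (np.array([5.006, 3.418]),
                np.array([[0.1218, 0.0983], [0.0983, 0.1423]]),
                0.333),
    "class 2": (np.array([5.006, 3.418]),
                np.array([[0.435, 0.1209], [0.1209, 0.435]]),
                0.67),
}

# Log of the unnormalized posterior: log p(x | c_i) + log P(c_i)
log_scores = {
    name: multivariate_normal.logpdf(x, mean=mu, cov=sigma) + np.log(prior)
    for name, (mu, sigma, prior) in class_params.items()
}

# Subtract the largest log score before exponentiating, then normalize
max_log = max(log_scores.values())
weights = {name: np.exp(s - max_log) for name, s in log_scores.items()}
total = sum(weights.values())
posteriors = {name: w / total for name, w in weights.items()}

for name, p in posteriors.items():
    print(f"{name}: {p:.6f}")
```

Normalizing does not change which class wins; it only rescales the scores so they can be read as probabilities. It also absorbs the rounding in the priors ($0.333 + 0.67 \neq 1$).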