# 3.3: Naive Bayes Classification

```python
import numpy as np
```

## Basic Probability

* Probability is in the range $[0,1]$
* $P(A)$: probability that event $A$ happens
    * $P(A) = \frac{\text{number of favorable outcomes}}{\text{total number of outcomes}}$, assuming all outcomes are equally likely
* Conditional Probability ($P(A|B)$): probability of event $A$ occurring given that event $B$ has occurred
    * By definition, $P(A|B) = \frac{P(A \cap B)}{P(B)}$; since $P(A \cap B) = P(B|A)P(A)$, this gives Bayes' Theorem: $P(A|B) = \frac{P(B|A)P(A)}{P(B)}$
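To make these formulas concrete, here is a small NumPy check on a hypothetical set of eight equally likely outcomes. The weather/played data is made up purely for illustration. It estimates each probability by counting, then confirms that the definition of $P(A|B)$ and Bayes' Theorem give the same value.

```python
import numpy as np

# Hypothetical toy sample: each row is one observed outcome (weather, played?)
outcomes = np.array([
    ["sunny", "yes"],
    ["sunny", "no"],
    ["rainy", "no"],
    ["sunny", "yes"],
    ["rainy", "yes"],
    ["sunny", "no"],
    ["sunny", "yes"],
    ["rainy", "no"],
])

is_a = outcomes[:, 1] == "yes"    # event A: the game was played
is_b = outcomes[:, 0] == "sunny"  # event B: the weather was sunny

# The mean of a boolean array is (# True outcomes) / (total outcomes)
p_a = is_a.mean()                 # P(A)
p_b = is_b.mean()                 # P(B)
p_a_and_b = (is_a & is_b).mean()  # P(A ∩ B)

p_a_given_b = p_a_and_b / p_b     # definition: P(A|B) = P(A ∩ B) / P(B)
p_b_given_a = p_a_and_b / p_a     # P(B|A) = P(A ∩ B) / P(A)
bayes = p_b_given_a * p_a / p_b   # Bayes' Theorem: P(B|A) P(A) / P(B)

print(p_a_given_b, bayes)
```

Here $P(A|B) = 0.375 / 0.625 = 0.6$, and Bayes' Theorem recovers the same value from $P(B|A) = 0.75$, $P(A) = 0.5$, and $P(B) = 0.625$.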

> **Classification Big Picture:**
>
> Find the largest probability of a class *given* a test instance.

## Notation

* We are finding the largest $P(c_i|x)$
    * $c_i$ is a class label
    * $x$ is an instance
* Using Bayes' Theorem, we can rewrite this as $P(c_i|x) = \frac{P(x|c_i)P(c_i)}{P(x)}$
* We can omit $P(x)$ because it is the same for every class label
    * It is constant because it doesn't depend on $c_i$, and dividing every class's score by the same positive number does not change which class has the largest score
* $P(c_i)$ is the prior: $\frac{\text{number of } c_i \text{ instances}}{\text{total \# of instances}}$, counted from the training set
    * `get_frequencies()` will help us do this
* There are two ways to calculate $P(x|c_i)$
    1. Without assuming conditional independence $\implies$ look at all attribute values $v_k$ together PLUS $c_i$
        * $P(x|c_i) = \frac{P(a_1 = v_1 \cap a_2 = v_2 \cap \cdots \cap a_n = v_n \cap \,class = c_i)}{P(c_i)}$ where $a_k$ is the $k$-th attribute and $v_k$ is the instance's value for it
        * Restrictive: few or no training instances match every attribute value exactly, so this estimate is often very small or zero
    2. Assuming conditional independence $\implies$ given the class label, each attribute's value is independent of the other attributes' values
        * $P(x|c_i) = \prod_{k=1}^n P(a_k = v_k | c_i) = P(a_1 = v_1 | c_i) \cdot P(a_2 = v_2 | c_i) \cdot \cdots \cdot P(a_n = v_n | c_i)$
        * NOTE: $P(a_k = v_k | c_i) = \frac{P(a_k = v_k \cap \,class = c_i)}{P(c_i)}$
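The steps above can be sketched in plain Python, assuming purely categorical attributes stored as lists. The training data and the function names (`compute_priors`, `joint_likelihood`, `naive_likelihood`, `naive_bayes_predict`) are hypothetical; `collections.Counter` stands in for the notes' `get_frequencies()`, whose exact signature is not shown here. No smoothing is applied, so an attribute value never seen with a class still zeroes out that class.

```python
from collections import Counter

# Hypothetical training set: categorical attribute values and class labels
X_train = [
    ["sunny", "hot"],
    ["sunny", "hot"],
    ["rainy", "mild"],
    ["sunny", "mild"],
    ["rainy", "mild"],
    ["rainy", "mild"],
    ["sunny", "mild"],
]
y_train = ["no", "no", "no", "yes", "yes", "yes", "yes"]


def compute_priors(y):
    """P(c_i) = (# instances with label c_i) / (total # of instances)."""
    counts = Counter(y)  # stand-in for get_frequencies() (assumption)
    return {label: count / len(y) for label, count in counts.items()}


def joint_likelihood(X, y, x, label):
    """Way 1, no independence assumption:
    P(a_1=v_1 ∩ ... ∩ a_n=v_n ∩ class=c_i) / P(c_i).
    Assumes label occurs at least once in y."""
    n_label = sum(1 for c in y if c == label)
    n_match = sum(1 for row, c in zip(X, y) if c == label and row == x)
    # Both probabilities share the denominator len(y), so it cancels
    return n_match / n_label


def naive_likelihood(X, y, x, label):
    """Way 2, conditional independence: product over k of P(a_k=v_k | c_i).
    Assumes label occurs at least once in y."""
    rows = [row for row, c in zip(X, y) if c == label]
    result = 1.0
    for k, v_k in enumerate(x):
        # P(a_k=v_k | c_i) = P(a_k=v_k ∩ class=c_i) / P(c_i), as a ratio of counts
        result *= sum(1 for row in rows if row[k] == v_k) / len(rows)
    return result


def naive_bayes_predict(X, y, x):
    """Return the label maximizing P(x|c_i) P(c_i), plus every class's score.
    P(x) is omitted because it is the same for all classes."""
    priors = compute_priors(y)
    scores = {c: naive_likelihood(X, y, x, c) * p for c, p in priors.items()}
    return max(scores, key=scores.get), scores


query = ["rainy", "hot"]
label, scores = naive_bayes_predict(X_train, y_train, query)
print(label, scores)
```

The query `["rainy", "hot"]` never appears as a full combination in the training rows, so `joint_likelihood` is 0 for both classes. The naive product is still nonzero for `"no"` ($\frac{1}{3} \cdot \frac{2}{3}$), which is why the independence assumption makes the classifier usable on unseen attribute combinations.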

## Example

![Example 3.1](images/3-3-a.PNG)