# Naive Bayesian Network 

## Introduction
Let's classify some spam emails!




## Mathematics
### Definitions

Let $N$ be the total number of emails in the dataset and $N_{s}$ the number of spam emails in it.

$N_{so}$ is the number of spam emails that contain the word "offer".

$N_{o}$ is the number of emails, spam or not, that contain the word "offer".

Then the probability that an email in the set is spam is estimated as:

$$
P(SPAM=1) = \frac{N_{s}}{N}
$$ 

Likewise, the probability that an email contains the word *offer* is:

$$
P(OFFER=1) = \frac{N_{o}}{N}
$$

Finally, the conditional probability that an email is spam, given that it contains the word *offer*, is defined as:

$$
P(SPAM=1\mid OFFER=1) := \frac{N_{so}}{N_{o}}
$$
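The three estimates above are plain counting ratios. A minimal sketch, assuming a hypothetical toy format in which each email is a `(text, is_spam)` pair (the function name and the toy data are illustrative and not part of the `naive_bayes` package):

```python
def estimate_probabilities(emails, word="offer"):
    """Estimate P(SPAM=1), P(OFFER=1) and P(SPAM=1 | OFFER=1) by counting.

    `emails` is a list of (text, is_spam) pairs; this format is an
    assumption made for illustration only.
    """
    def contains(text):
        # Case-insensitive match on whitespace-separated tokens.
        return word in text.lower().split()

    n = len(emails)                                           # N
    n_s = sum(1 for _, is_spam in emails if is_spam)          # N_s
    n_o = sum(1 for text, _ in emails if contains(text))      # N_o
    n_so = sum(1 for text, is_spam in emails
               if is_spam and contains(text))                 # N_so

    return {
        "p_spam": n_s / n,
        "p_offer": n_o / n,
        "p_spam_given_offer": n_so / n_o,
    }


# Made-up emails, used only to exercise the function.
toy_emails = [
    ("Limited offer just for you", True),
    ("Special offer inside", True),
    ("Win a prize now", True),
    ("Meeting notes attached", False),
    ("Job offer details", False),
    ("Lunch tomorrow", False),
    ("Quarterly report", False),
    ("See you at the gym", False),
]

probs = estimate_probabilities(toy_emails)
print(probs)
```

In this toy set, three of the eight emails are spam and two of the three emails containing *offer* are spam, so the conditional estimate is larger than the unconditional one.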

### Postulate

If the word *offer* is more likely to appear in a spam email than in a non-spam email:

$$
P(OFFER =1 \mid SPAM=1)  > P(OFFER = 1 \mid SPAM=0)
$$

then we can infer that:

$$
P(SPAM=1 \mid OFFER=1) > P(SPAM = 1)
$$

### Proof

$$
P(SPAM=1 \mid OFFER=1) = \frac{P(OFFER=1 \mid SPAM=1)P(SPAM=1)}{P(OFFER=1)} = \frac{\frac{N_{so}}{N_{s}}\frac{N_{s}}{N}}{\frac{N_{o}}{N}} = \frac{N_{so}}{N_{o}}
$$


This identity is **Bayes' rule**, commonly stated as $P(A \mid B)=\frac{P(B \mid A)P(A)}{P(B)}$. Applied to both classes, it gives:

$$
\begin{aligned}
P(SPAM=0 \mid OFFER=1) &= \frac{P(OFFER=1 \mid SPAM=0)P(SPAM=0)}{P(OFFER=1)} \\
P(SPAM=1 \mid OFFER=1) &= \frac{P(OFFER=1 \mid SPAM=1)P(SPAM=1)}{P(OFFER=1)}
\end{aligned}
$$
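The Bayes' rule identity can be checked with exact rational arithmetic. The sketch below uses made-up counts (the numbers and the function name `bayes_posterior` are illustrative assumptions) and confirms that the Bayes' rule expression collapses to $\frac{N_{so}}{N_{o}}$:

```python
from fractions import Fraction


def bayes_posterior(n, n_s, n_o, n_so):
    """Return P(SPAM=1 | OFFER=1) computed through Bayes' rule."""
    p_s = Fraction(n_s, n)              # P(SPAM=1)
    p_o = Fraction(n_o, n)              # P(OFFER=1)
    p_o_given_s = Fraction(n_so, n_s)   # P(OFFER=1 | SPAM=1)
    return p_o_given_s * p_s / p_o


# Illustrative counts, not taken from any real dataset.
n, n_s, n_o, n_so = 100, 30, 20, 15

posterior = bayes_posterior(n, n_s, n_o, n_so)
direct = Fraction(n_so, n_o)            # N_so / N_o, the direct definition

print(posterior, direct)
```

Using `Fraction` avoids floating-point rounding, so the two values can be compared for exact equality.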



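For brevity, define the following abbreviations:

$$
\begin{aligned}
P(S) &:= P(SPAM=1) \\
P(S_{c}) &:= P(SPAM=0) = 1 - P(S) \\
P(O) &:= P(OFFER=1) \\
P(O \mid S) &:= P(OFFER=1 \mid SPAM=1) \\
P(O \mid S_{c}) &:= P(OFFER=1 \mid SPAM=0) \\
P(S \mid O) &:= P(SPAM=1 \mid OFFER=1) \\
P(S_{c} \mid O) &:= P(SPAM=0 \mid OFFER=1) = 1 - P(S \mid O)
\end{aligned}
$$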

Begin with the premise of the postulate:

$$
P(O \mid S) > P(O \mid S_{c})
$$

Rewrite both sides using **Bayes' rule**:

$$
\frac{P(S \mid O) P(O)}{P(S)} > \frac{P(S_{c} \mid O)P(O)}{P(S_{c})}
$$

Since $P(O) > 0$, it cancels from both sides:

$$
\frac{P(S \mid O)}{P(S)} > \frac{P(S_{c} \mid O)}{P(S_{c})}
$$

Because $P(S_{c} \mid O) = 1 - P(S \mid O)$ and $P(S_{c}) = 1 - P(S)$, the right-hand side becomes:

$$
\frac{P(S \mid O)}{P(S)} > \frac{1 - P(S \mid O)}{1 - P(S)}
$$

Multiply both sides by $\frac{1 - P(S)}{P(S \mid O)}$, which is positive, to rearrange the terms:

$$
\frac{1 - P(S)}{P(S)} > \frac{1 - P(S \mid O)}{P(S \mid O)}
$$

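Splitting each fraction gives:

$$
\begin{aligned}
\frac{1}{P(S)} - 1 &> \frac{1}{P(S \mid O)} - 1 \\
\frac{1}{P(S)} &> \frac{1}{P(S \mid O)}
\end{aligned}
$$

Both sides are positive, so taking reciprocals reverses the inequality and yields the claim:

$$
P(S \mid O) > P(S)
$$

**Q.E.D.**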
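As a sanity check on the result, the implication can be brute-forced over small count tables. This is only a sketch: the function names and the table size are arbitrary choices, and it complements the proof rather than replacing it. It assumes at least one spam and one non-spam email, so that both conditional probabilities are defined.

```python
from fractions import Fraction


def postulate_holds(n_s, n_h, n_so, n_ho):
    """Check the implication for one table of counts.

    n_s / n_h:   number of spam / non-spam emails (both >= 1)
    n_so / n_ho: how many of those contain the word "offer"
    """
    p_o_given_s = Fraction(n_so, n_s)     # P(O | S)
    p_o_given_sc = Fraction(n_ho, n_h)    # P(O | S_c)
    if not p_o_given_s > p_o_given_sc:
        return True                       # premise not met, nothing to check
    p_s = Fraction(n_s, n_s + n_h)        # P(S)
    # The premise implies n_so >= 1, so the denominator below is non-zero.
    p_s_given_o = Fraction(n_so, n_so + n_ho)   # P(S | O)
    return p_s_given_o > p_s


def check_all(total=12):
    """Try every count table with `total` emails; return how many were tried."""
    checked = 0
    for n_s in range(1, total):
        n_h = total - n_s
        for n_so in range(n_s + 1):
            for n_ho in range(n_h + 1):
                assert postulate_holds(n_s, n_h, n_so, n_ho)
                checked += 1
    return checked


tables_checked = check_all()
print("Tables checked: %d" % tables_checked)
```

If any table violated the implication, the `assert` inside `check_all` would fail.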

```python
from naive_bayes.email_set import EmailSet, build_and_save_email_set
from naive_bayes.feature_prob import FeatureProbability

# Run this first if the email set has not been pickled yet.
build_and_save_email_set()

es = EmailSet.get()
fp = FeatureProbability.from_email_set(es)

# Look up the integer code for the word "offer" and print its statistics.
code = es.word_encoding_dictionary.word_to_code("offer")
print("Code: %s" % code)
print("Ham count: %s" % fp.class_count.ham_count)
print("Spam count: %s" % fp.class_count.spam_count)
print("Code count: %s" % fp.code_count[code])
print("Prob ratio: %s" % fp.code_prob_ratio(code))
```

Output:

```
Dataset already processed!
Code: 3751
Ham count: 3672
Spam count: 1500
Code count: {'spam_count': 141, 'ham_count': 61}
Prob ratio: 5.65849180328
```
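
The printed counts can be plugged back into the formulas from the Mathematics section. The sketch below assumes that `code_prob_ratio` returns $P(O \mid S) / P(O \mid S_{c})$; that is a guess about the library, although the printed value is consistent with it. The variable names are illustrative.

```python
# Counts copied from the output above.
spam_total, ham_total = 1500, 3672
spam_with_offer, ham_with_offer = 141, 61

# Class-conditional probabilities of seeing the word "offer".
p_o_given_s = spam_with_offer / spam_total     # P(O | S)
p_o_given_sc = ham_with_offer / ham_total      # P(O | S_c)

# Assumed meaning of code_prob_ratio: P(O | S) / P(O | S_c).
ratio = p_o_given_s / p_o_given_sc

# Prior and posterior probability of spam.
p_s = spam_total / (spam_total + ham_total)                       # P(S)
p_s_given_o = spam_with_offer / (spam_with_offer + ham_with_offer)  # P(S | O)

print("P(O|S)   = %.4f" % p_o_given_s)
print("P(O|S_c) = %.4f" % p_o_given_sc)
print("Ratio    = %.4f" % ratio)
print("P(S)     = %.4f" % p_s)
print("P(S|O)   = %.4f" % p_s_given_o)
```

With these counts, $P(S) \approx 0.29$ while $P(S \mid O) \approx 0.70$, so seeing the word *offer* raises the estimated probability of spam, as the postulate predicts.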
