# How do we use Bayes' theorem to do machine learning?

***Repeatedly!***

Imagine we have a dataset of emails: some are spam, and the ones that are not spam we call ham.

`Spam: "Win money now!", "Make cash easy!", "Cheap money, reply."`

`Ham: "How are you?", "There you are", "Can I borrow money?", "Say hi to grandma.", "Was the exam easy?"`

**Prior probabilities**

Of the 8 emails, 3 are spam and 5 are ham, so p(spam) = 3/8 and p(ham) = 5/8.
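The priors are just the class frequencies in the training data. Here is a minimal Python sketch, assuming the eight emails listed above are the entire dataset (the names `spam_emails` and `ham_emails` are illustrative, not from the original):

```python
from fractions import Fraction

# The eight training emails from the example (assumed to be the whole dataset).
spam_emails = ["Win money now!", "Make cash easy!", "Cheap money, reply."]
ham_emails = ["How are you?", "There you are", "Can I borrow money?",
              "Say hi to grandma.", "Was the exam easy?"]

total = len(spam_emails) + len(ham_emails)

# Prior = fraction of all training emails that belong to each class.
p_spam = Fraction(len(spam_emails), total)
p_ham = Fraction(len(ham_emails), total)

print(p_spam, p_ham)  # 3/8 5/8
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand-computed values.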

Now suppose a new email arrives that says `easy money!`, and we want to decide whether it is spam or ham. To do this, we look at it word by word. We could be more accurate if we took the order of the words into account, but we won't do that for this classifier.
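Looking at an email "word by word" while ignoring order amounts to treating it as an unordered set of words (a "bag of words"). A short Python sketch; the lowercase-and-drop-punctuation rule in the hypothetical `tokenize` helper is an assumption for this example, not something the text specifies:

```python
import re

def tokenize(text):
    """Return the set of lowercase words in `text`, ignoring punctuation and word order."""
    return set(re.findall(r"[a-z]+", text.lower()))

print(tokenize("easy money!"))                                # {'easy', 'money'} (set order may vary)
print(tokenize("easy money!") == tokenize("Money, easy."))   # True: word order is ignored
```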

**Easy** - It appears in one of the 3 spam emails and in one of the 5 ham emails.

**Money** - It appears in two of the 3 spam emails and in one of the 5 ham emails.

Given this information, let us calculate some preliminary probabilities:

1) What is the probability that an email contains the word 'easy' given that it is spam? 
p('easy'|spam) = 1/3

2) What is the probability that an email contains the word 'money' given that it is spam? 
p('money'|spam) = 2/3

3) What is the probability that an email contains the word 'easy' given that it is ham? 
p('easy'|ham) = 1/5

4) What is the probability that an email contains the word 'money' given that it is ham? 
p('money'|ham) = 1/5
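Each of these conditional probabilities is the fraction of a class's emails that contain the word. A Python sketch that reproduces the four values, assuming the same eight emails and a hypothetical `tokenize` helper that lowercases and drops punctuation:

```python
import re
from fractions import Fraction

spam_emails = ["Win money now!", "Make cash easy!", "Cheap money, reply."]
ham_emails = ["How are you?", "There you are", "Can I borrow money?",
              "Say hi to grandma.", "Was the exam easy?"]

def tokenize(text):
    # Hypothetical helper: set of lowercase words, punctuation dropped.
    return set(re.findall(r"[a-z]+", text.lower()))

def likelihood(word, emails):
    """p(word | class): fraction of the class's emails that contain `word` at least once."""
    containing = sum(1 for email in emails if word in tokenize(email))
    return Fraction(containing, len(emails))

for word in ("easy", "money"):
    print(word, likelihood(word, spam_emails), likelihood(word, ham_emails))
# easy 1/3 1/5
# money 2/3 1/5
```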

#### Bayesian Learning: 

- Go from what is known, p('easy'|spam) or p('money'|spam), to what is inferred, p(spam|'easy') or p(spam|'money').

Because the total number of emails is small, we can simply look at them to find the following values:

**contains easy**: "Make cash easy!" and "Was the exam easy?", so 1/2 are spam and 1/2 are ham.

**contains money**: "Win money now!", "Cheap money, reply." and "Can I borrow money?", so 2/3 are spam and 1/3 are ham.
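With so few emails, this direct count is easy to reproduce. A Python sketch that, for a given word, counts how many of the emails containing it are spam (the labelled list and the `tokenize` rule are assumptions for illustration):

```python
import re
from fractions import Fraction

# (text, label) pairs for the eight example emails.
emails = [
    ("Win money now!", "spam"), ("Make cash easy!", "spam"), ("Cheap money, reply.", "spam"),
    ("How are you?", "ham"), ("There you are", "ham"), ("Can I borrow money?", "ham"),
    ("Say hi to grandma.", "ham"), ("Was the exam easy?", "ham"),
]

def tokenize(text):
    # Hypothetical helper: set of lowercase words, punctuation dropped.
    return set(re.findall(r"[a-z]+", text.lower()))

def fraction_spam(word):
    """Among the emails that contain `word`, the fraction labelled spam."""
    labels = [label for text, label in emails if word in tokenize(text)]
    return Fraction(labels.count("spam"), len(labels))

print(fraction_spam("easy"), fraction_spam("money"))  # 1/2 2/3
```

This brute-force approach needs every email in hand; the Bayes' theorem calculations that follow reach the same numbers from the priors and the per-class likelihoods alone.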

In real life, however, there are far too many emails to inspect by hand, so we compute these values mathematically using Bayes' theorem:


#### Probability that an email is spam given that it contains easy 


\begin{align}
\text{p(spam|'easy')} =& \frac{\text{p(spam)p('easy'|spam)}}{\text{(p(spam)p('easy'|spam)} + \text{p(ham)p('easy'|ham)) }}\\
=& \frac{(3/8 * 1/3)}{(3/8 * 1/3) + (5/8 * 1/5)} = 1/2
\end{align}


#### Probability that an email is spam given that it contains money 


\begin{align}
\text{p(spam|'money')} =& \frac{\text{p(spam)p('money'|spam)}}{\text{(p(spam)p('money'|spam)} + \text{p(ham)p('money'|ham))} }\\
=& \frac{(3/8 * 2/3)}{ ((3/8 * 2/3) + (5/8 * 1/5)) }= 2/3
\end{align}

#### Probability that an email is ham given that it contains easy


\begin{align}
\text{p(ham|'easy')} =& \frac{\text{p(ham)p('easy'|ham)}}{\text{(p(ham)p('easy'|ham)} + \text{p(spam)p('easy'|spam)) }}\\
=& \frac{(5/8 * 1/5)}{(5/8 * 1/5) + (3/8 * 1/3)} = 1/2
\end{align}


#### Probability that an email is ham given that it contains money 


\begin{align}
\text{p(ham|'money')} =& \frac{\text{p(ham)p('money'|ham)}}{\text{(p(ham)p('money'|ham)} + \text{p(spam)p('money'|spam))} }\\
=& \frac{(5/8 * 1/5)}{ ((5/8 * 1/5) + (3/8 * 2/3)) }= 1/3
\end{align}
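The four calculations above all follow one pattern, which fits in a small Python function. The priors and likelihoods below are hardcoded from the values computed earlier; `posterior` is a hypothetical helper name:

```python
from fractions import Fraction as F

priors = {"spam": F(3, 8), "ham": F(5, 8)}
# p(word | class), taken from the counts above.
likelihoods = {
    "spam": {"easy": F(1, 3), "money": F(2, 3)},
    "ham": {"easy": F(1, 5), "money": F(1, 5)},
}

def posterior(cls, word):
    """Bayes' theorem: p(cls|word) = p(cls) p(word|cls) / sum over classes c of p(c) p(word|c)."""
    evidence = sum(priors[c] * likelihoods[c][word] for c in priors)
    return priors[cls] * likelihoods[cls][word] / evidence

for word in ("easy", "money"):
    print(word, posterior("spam", word), posterior("ham", word))
# easy 1/2 1/2
# money 2/3 1/3
```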

These results agree with our earlier observations.

#### Probability that an email is spam given that it contains money and easy

Here we make the "naive" assumption that the words occur independently of each other once the class is known, so p('easy','money'|spam) = p('easy'|spam)p('money'|spam), and likewise for ham.

\begin{align}
\text{p(spam|'easy','money')} =& \frac{\text{p(spam)p('easy','money'|spam)}}{\text{p(spam)p('easy','money'|spam)} + \text{p(ham)p('easy','money'|ham)}}\\
=& \frac{\text{p(spam)p('easy'|spam)p('money'|spam)}}{\text{p(spam)p('easy'|spam)p('money'|spam)} + \text{p(ham)p('easy'|ham)p('money'|ham)}}\\
=& \frac{(3/8 * 1/3 * 2/3)}{(3/8 * 1/3 * 2/3) + (5/8 * 1/5 * 1/5)} = \frac{10}{13}
\end{align}


#### Probability that an email is ham given that it contains money and easy


\begin{align}
\text{p(ham|'easy','money')} =& \frac{\text{p(ham)p('easy','money'|ham)}}{\text{p(ham)p('easy','money'|ham)} + \text{p(spam)p('easy','money'|spam)}}\\
=& \frac{\text{p(ham)p('easy'|ham)p('money'|ham)}}{\text{p(ham)p('easy'|ham)p('money'|ham)} + \text{p(spam)p('easy'|spam)p('money'|spam)}}\\
=& \frac{(5/8 * 1/5 * 1/5)}{(5/8 * 1/5 * 1/5) + (3/8 * 1/3 * 2/3)} = \frac{3}{13}
\end{align}
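The two-word case extends the single-word formula by multiplying the per-word likelihoods, under the naive independence assumption. A Python sketch with the same hardcoded values (`naive_bayes_posterior` is a hypothetical name; `math.prod` requires Python 3.8+):

```python
from fractions import Fraction as F
from math import prod

priors = {"spam": F(3, 8), "ham": F(5, 8)}
likelihoods = {
    "spam": {"easy": F(1, 3), "money": F(2, 3)},
    "ham": {"easy": F(1, 5), "money": F(1, 5)},
}

def naive_bayes_posterior(cls, words):
    """p(cls | words), assuming the words are conditionally independent given the class."""
    def joint(c):
        # p(c) times the product of p(word | c) over all words: the "naive" factorization.
        return priors[c] * prod(likelihoods[c][w] for w in words)
    return joint(cls) / sum(joint(c) for c in priors)

words = ["easy", "money"]
print(naive_bayes_posterior("spam", words), naive_bayes_posterior("ham", words))  # 10/13 3/13
```

With a single word the function reduces to the one-word posterior, so the same code covers every case worked out above.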

### Step 4.1: Bayes' Theorem implementation from scratch ###

Now that we have seen how Bayes' theorem works, let's go into a little more detail.

In layman's terms, Bayes' theorem calculates the probability of an event occurring based on other probabilities that are related to that event. It starts from "prior probabilities", or just "priors", which are the probabilities we already know or are given. From these it computes "posterior probabilities", or just "posteriors", which are the probabilities we are looking for.

Let us implement Bayes' theorem from scratch using a simple example. Say we are trying to find the probability that an individual has diabetes, given that he or she was tested for it and got a positive result. In the medical field, such probabilities play a very important role, as they often inform life-and-death decisions.

We assume the following:

`P(D)` is the probability of a person having diabetes. Its value is `0.01`, or in other words, 1% of the general population has diabetes (disclaimer: these values are assumptions and are not reflective of any actual medical study).

`P(Pos)` is the probability of getting a positive test result.

`P(Neg)` is the probability of getting a negative test result.

`P(Pos|D)` is the probability of getting a positive result on a test done for detecting diabetes, given that you have diabetes. This has a value of `0.9`. In other words, the test correctly returns a positive result for 90% of the people who have diabetes. This is also called the Sensitivity or True Positive Rate.

`P(Neg|~D)` is the probability of getting a negative result on a test done for detecting diabetes, given that you do not have diabetes. This also has a value of `0.9`: the test correctly returns a negative result for 90% of the people who do not have diabetes. This is also called the Specificity or True Negative Rate.

The Bayes formula is as follows:

<img src="bayes_formula.png" height="242" width="242">
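For reference, Bayes' theorem in the notation of the definitions that follow, with our example's symbols substituted on the second line, is:

\begin{align}
P(A|B) =& \frac{P(A)\,P(B|A)}{P(B)}\\
P(D|Pos) =& \frac{P(D)\,P(Pos|D)}{P(Pos)}
\end{align}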

* `P(A)` is the prior probability of A, before we know anything about B. In our example this is `P(D)`. This value is given to us.

* `P(B)` is the overall probability of B occurring, whether or not A occurs. In our example this is `P(Pos)`.

* `P(A|B)` is the posterior probability that A occurs given B. In our example this is `P(D|Pos)`. That is, **the probability of an individual having diabetes, given that this individual got a positive test result. This is the value that we are looking to calculate.**

* `P(B|A)` is the probability of B occurring given A, also called the likelihood. In our example this is `P(Pos|D)`. This value is given to us.
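`P(Pos)` is not given directly, but it follows from the assumed values: a positive result comes either from someone with diabetes or from someone without it (the law of total probability). A minimal Python sketch of the whole calculation, using only the assumed numbers from the text:

```python
# Assumed values from the text (not real medical data).
p_d = 0.01                # P(D): prior probability of having diabetes
p_pos_given_d = 0.9       # P(Pos|D): sensitivity
p_neg_given_not_d = 0.9   # P(Neg|~D): specificity

p_not_d = 1 - p_d                          # P(~D)
p_pos_given_not_d = 1 - p_neg_given_not_d  # P(Pos|~D): false positive rate

# Law of total probability: P(Pos) = P(D)P(Pos|D) + P(~D)P(Pos|~D)
p_pos = p_d * p_pos_given_d + p_not_d * p_pos_given_not_d

# Bayes' theorem: P(D|Pos) = P(D)P(Pos|D) / P(Pos)
p_d_given_pos = p_d * p_pos_given_d / p_pos

print(f"P(Pos) = {p_pos:.3f}, P(D|Pos) = {p_d_given_pos:.3f}")
# P(Pos) = 0.108, P(D|Pos) = 0.083
```

Under these assumptions, a positive result raises the probability of diabetes from 1% to only about 8.3%, because the true positives (0.009) are outnumbered by the false positives (about 0.099).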