[Table of Contents](00.00-Learning-ML.ipynb#Table-of-Contents) &bull; [&larr; *Chapter 2.01 - Dummy Classifiers*](02.01-Dummy-Classifiers.ipynb) &bull; [*Chapter 2.03 - ?* &rarr;](02.02-?.ipynb)

---

# 02.02 - Naive Bayes

In statistics, *Bayes' theorem* describes the probability of an event based on prior knowledge of conditions related to that event. The theorem states that the probability of A given B equals the probability of B given A, multiplied by the probability of A, divided by the probability of B. In notation:

    P(A|B) = P(B|A) * P(A) / P(B)

`P()` means *the probability of* and `|` means *given* or "where".
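
As a minimal sketch, the theorem can be written as a small Python function. The name `bayes` and its parameter names are our own choices for illustration, and we use the standard library's `fractions.Fraction` so the arithmetic stays exact:

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B).

    Assumes p_b > 0, since P(A|B) is undefined otherwise.
    """
    if p_b == 0:
        raise ValueError("P(B) must be greater than zero")
    return p_b_given_a * p_a / p_b


# Illustrative values: P(B|A) = 1/2, P(A) = 1/3, P(B) = 1/4
print(bayes(Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)))  # 2/3
```

Plain floats work too; fractions just make it easy to compare results with the hand calculations in this chapter.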

A naive Bayes classifier applies this theorem *naively*: it assumes that the features (the inputs to the model) are independent of (unrelated to) each other within each class.

In the previous chapter, we used class probabilities to build a dummy classifier, with the example that 95% of loans do not default. This is known as a *prior* probability: it is what we know before looking at any of the inputs.


## Proof of Bayes' theorem

We can prove Bayes' theorem by starting with the probability of two events, A and B, occurring together:

    P(A and B) = P(A) * P(B|A)
    also
    P(A and B) = P(B) * P(A|B)

Equating the right-hand sides of the two equations:

    P(B) * P(A|B) = P(A) * P(B|A)

Dividing both sides by `P(B)` (assuming `P(B) > 0`) gives us Bayes' theorem:

    P(A|B) = P(A) * P(B|A) / P(B)
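
To sanity-check the derivation, a short sketch can plug in the loan counts from the example below (taking A = "the loan defaults" and B = "the borrower is part-time") and confirm that both factorisations of `P(A and B)` agree. Exact fractions are our choice here so that equality checks are reliable:

```python
from fractions import Fraction

# Loan counts from the example table: (employment, default) -> count
counts = {("FT", "N"): 59, ("FT", "Y"): 1, ("PT", "N"): 36, ("PT", "Y"): 4}
total = sum(counts.values())  # 100 loans

n_default = counts[("FT", "Y")] + counts[("PT", "Y")]
n_pt = counts[("PT", "N")] + counts[("PT", "Y")]

# A = "loan defaults", B = "borrower is part-time"
p_a_and_b = Fraction(counts[("PT", "Y")], total)
p_a = Fraction(n_default, total)
p_b = Fraction(n_pt, total)
p_b_given_a = Fraction(counts[("PT", "Y")], n_default)
p_a_given_b = Fraction(counts[("PT", "Y")], n_pt)

# Both factorisations of the joint probability give the same value...
assert p_a * p_b_given_a == p_a_and_b
assert p_b * p_a_given_b == p_a_and_b

# ...so dividing by P(B) recovers P(A|B), exactly as Bayes' theorem says
assert p_a * p_b_given_a / p_b == p_a_given_b
print(p_a_and_b, p_a_given_b)  # 1/25 1/10
```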

## Example

Let's extend our dummy classifier example with an input: the borrower's employment type, full-time (FT) or part-time (PT). The counts below cover 100 loans:

| Employment | Default | Count |
|---|---|---|
| FT | N | 59 |
| FT | Y | 1 |
| PT | N | 36 |
| PT | Y | 4 |
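
Every probability used below can be read off this table by counting. As a sketch, storing the table as a dictionary of counts (the `prob` helper is hypothetical, not from any library):

```python
from fractions import Fraction

# (employment, default) -> count, copied from the table above
counts = {("FT", "N"): 59, ("FT", "Y"): 1, ("PT", "N"): 36, ("PT", "Y"): 4}
total = sum(counts.values())


def prob(emp=None, default=None):
    """P(emp and default) estimated by counting; None means 'any value'."""
    n = sum(
        c
        for (e, d), c in counts.items()
        if (emp is None or e == emp) and (default is None or d == default)
    )
    return Fraction(n, total)


p_default = prob(default="Y")                     # prior P(Default=Y)
p_ft = prob(emp="FT")                             # evidence P(Emp=FT)
p_ft_given_default = prob("FT", "Y") / p_default  # likelihood P(Emp=FT|Default=Y)
print(p_default, p_ft, p_ft_given_default)        # 1/20 3/5 1/5
```

These are the 0.05, 0.6 and 0.2 used in the calculation that follows.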

Probability of default given full-time employment:

    P(Default=Y|Emp=FT) = P(Default=Y) * P(Emp=FT|Default=Y) / P(Emp=FT)

    = 0.05 * 0.2 / 0.6
    = 0.0167...

Probability of default given part-time employment:

    P(Default=Y|Emp=PT) = P(Default=Y) * P(Emp=PT|Default=Y) / P(Emp=PT)

    = 0.05 * 0.8 / 0.4
    = 0.1

With just this one input, we can see that part-time employees are 6 times as likely to default as their full-time counterparts (0.1 compared with 1/60 ≈ 0.0167).
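
A short sketch that computes `P(Default=Y|Emp)` for both employment types with Bayes' theorem, and checks each result against simply counting defaults within that group (for example, 4 defaults out of 40 part-time loans):

```python
from fractions import Fraction

counts = {("FT", "N"): 59, ("FT", "Y"): 1, ("PT", "N"): 36, ("PT", "Y"): 4}
total = sum(counts.values())

n_default = counts[("FT", "Y")] + counts[("PT", "Y")]
p_default = Fraction(n_default, total)  # P(Default=Y) = 1/20

posterior = {}
for emp in ("FT", "PT"):
    n_emp = counts[(emp, "N")] + counts[(emp, "Y")]
    p_emp = Fraction(n_emp, total)                                 # P(Emp=emp)
    p_emp_given_default = Fraction(counts[(emp, "Y")], n_default)  # P(Emp=emp|Default=Y)
    posterior[emp] = p_default * p_emp_given_default / p_emp
    # Bayes' theorem agrees with counting defaults within each group directly
    assert posterior[emp] == Fraction(counts[(emp, "Y")], n_emp)

print(posterior["FT"], posterior["PT"])   # 1/60 1/10
print(posterior["PT"] / posterior["FT"])  # 6
```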

To predict the class for a given employment type, we calculate the probability of each class and take the class with the highest probability.

Continuing the example, if the employment type is FT, we know the probability of default is 0.0167.

The probability of not defaulting is:

    P(Default=N|Emp=FT) = P(Default=N) * P(Emp=FT|Default=N) / P(Emp=FT)

    = 0.95 * (59/95) / 0.6
    = 0.983...

Since there are only two classes (default or no default), the two probabilities are complementary: they sum to 1 (1/60 + 59/60). No default has the higher probability, so a loan to a full-time worker is predicted not to default.
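
The prediction rule (compute the posterior for every class, then pick the largest) might be sketched as follows; `posteriors` and `predict` are hypothetical helper names:

```python
from fractions import Fraction

counts = {("FT", "N"): 59, ("FT", "Y"): 1, ("PT", "N"): 36, ("PT", "Y"): 4}
total = sum(counts.values())
classes = ("N", "Y")


def posteriors(emp):
    """P(Default=c|Emp=emp) for every class c, via Bayes' theorem."""
    p_emp = Fraction(sum(counts[(emp, c)] for c in classes), total)
    result = {}
    for c in classes:
        n_c = sum(counts[(e, c)] for e in ("FT", "PT"))
        prior = Fraction(n_c, total)                  # P(Default=c)
        likelihood = Fraction(counts[(emp, c)], n_c)  # P(Emp=emp|Default=c)
        result[c] = prior * likelihood / p_emp
    return result


def predict(emp):
    """Return the class with the highest posterior probability."""
    post = posteriors(emp)
    return max(post, key=post.get)


post_ft = posteriors("FT")
print(post_ft["N"], post_ft["Y"], sum(post_ft.values()))  # 59/60 1/60 1
print(predict("FT"), predict("PT"))                       # N N
```

Note that even part-time loans are predicted not to default: a 10% default rate is higher than for full-time loans, but still well below 50%.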

If we only care about the predicted class and not the probability itself, we can take a shortcut and skip calculating the divisor `P(Emp=FT)`. It is the same for both classes, so it scales both results equally and cannot change which one is larger.
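
As a sketch of this shortcut, we can compare only the numerators `P(Default) * P(Emp|Default)` and still recover the probabilities afterwards by normalising the scores so they sum to 1:

```python
from fractions import Fraction

counts = {("FT", "N"): 59, ("FT", "Y"): 1, ("PT", "N"): 36, ("PT", "Y"): 4}
total = sum(counts.values())
classes = ("N", "Y")

emp = "FT"
scores = {}
for c in classes:
    n_c = sum(counts[(e, c)] for e in ("FT", "PT"))
    # Numerator only: P(Default=c) * P(Emp=emp|Default=c); P(Emp=emp) is skipped
    scores[c] = Fraction(n_c, total) * Fraction(counts[(emp, c)], n_c)

prediction = max(scores, key=scores.get)
# Dividing by the sum of the scores restores the skipped divisor, because
# P(Emp=emp) is the sum over classes of P(Default=c) * P(Emp=emp|Default=c)
probs = {c: s / sum(scores.values()) for c, s in scores.items()}

print(scores["N"], scores["Y"])  # 59/100 1/100
print(prediction, probs["Y"])    # N 1/60
```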

## What about multiple inputs?

We can extend Bayes' theorem to more inputs to try to improve our classifier. This is where the *naive* aspect comes into play: we assume (naively) that, within each class, every input is unrelated to every other input. Consider the following counts, which add gender to the same 100 loans:

| Gender | Employment | Default | Count |
|---|---|---|---|
| M | FT | N | 30 |
| M | FT | Y | 1 |
| M | PT | N | 14 |
| M | PT | Y | 3 |
| F | FT | N | 29 |
| F | FT | Y | 0 |
| F | PT | N | 22 |
| F | PT | Y | 1 |

While we won't go through the full derivation here, applying this naive assumption generalises Bayes' theorem to multiple inputs. As in the shortcut above, we drop the divisor `p(f1,f2,f3,...)`, which is the same for every class, so the result is *proportional to* (∝) the probability rather than equal to it:

    p(class|f1,f2,f3,...) ∝ p(class) * p(f1|class) * p(f2|class) * p(f3|class) * ...

Let's predict default for a full-time employed female:

    p(default=True|emp=FT,gen=F) ∝ p(default=True) * p(emp=FT|default=True) * p(gen=F|default=True)
    = 0.05 * 0.2 * 0.2
    = 0.002

    p(default=False|emp=FT,gen=F) ∝ p(default=False) * p(emp=FT|default=False) * p(gen=F|default=False)
    = 0.95 * (59/95) * (51/95)
    = 0.3167

Now we take the larger of the two values and assign the corresponding class as our prediction. For a full-time employed female, we predict no default.
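
A minimal sketch of this two-feature naive Bayes calculation, counting directly from the gender/employment table. Like the hand calculation, it applies no smoothing, so a feature value never seen with a class would give that class a score of zero; `naive_scores` is a hypothetical helper name:

```python
from fractions import Fraction

# (gender, employment, default) -> count, copied from the table above
counts = {
    ("M", "FT", "N"): 30, ("M", "FT", "Y"): 1,
    ("M", "PT", "N"): 14, ("M", "PT", "Y"): 3,
    ("F", "FT", "N"): 29, ("F", "FT", "Y"): 0,
    ("F", "PT", "N"): 22, ("F", "PT", "Y"): 1,
}
total = sum(counts.values())


def naive_scores(gender, emp):
    """Unnormalised p(class) * p(emp|class) * p(gen|class) for each class."""
    scores = {}
    for c in ("N", "Y"):
        n_c = sum(n for (g, e, d), n in counts.items() if d == c)
        n_emp = sum(n for (g, e, d), n in counts.items() if d == c and e == emp)
        n_gen = sum(n for (g, e, d), n in counts.items() if d == c and g == gender)
        # Naive assumption: the features are treated as independent given the class
        scores[c] = Fraction(n_c, total) * Fraction(n_emp, n_c) * Fraction(n_gen, n_c)
    return scores


scores = naive_scores("F", "FT")
print(float(scores["Y"]), round(float(scores["N"]), 4))  # 0.002 0.3167
print(max(scores, key=scores.get))                       # N
```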

## What about numerical inputs?

## What about multiple classes?

