## First, Conditional Probability & Bayes' Rule

Before you can understand and appreciate the nuances of Naive Bayes, you need to know a couple of related concepts: Conditional Probability and Bayes' Rule. **Conditional Probability** in plain English: what is the probability that something will happen, given that something else has already happened?

Let's say that there is some **Outcome O**. And some **Evidence E**. From the way these probabilities are defined: 
> The probability of having both the Outcome O and the Evidence E is: (probability of O occurring) multiplied by (probability of E given that O happened)

One Example to understand Conditional Probability:

Let's say we have a collection of US Senators. Senators could be Democrats or Republicans. They are also either male or female.

If we select one senator completely randomly, what is the probability that this person is a female Democrat? Conditional Probability can help us answer that.

> Probability of (Democrat and Female Senator) = Prob(Senator is Democrat) multiplied by Conditional Probability of Being Female given that they are a Democrat.

<pre>
P(Democrat & Female) = P(Democrat) * P(Female | Democrat) 
</pre>

We could compute the exact same thing, the reverse way:

<pre>
P(Democrat & Female) = P(Female) * P(Democrat | Female)
</pre>
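
To see that both factorizations give the same joint probability, here is a small sketch in Python. The counts are hypothetical, made up purely for illustration (they are not real Senate data), and the variable names are our own.

```python
from fractions import Fraction

# Hypothetical counts, chosen only for illustration (not real Senate data).
total_senators = 100
democrats = 50
females = 25
female_democrats = 16

# Marginal probabilities
p_democrat = Fraction(democrats, total_senators)
p_female = Fraction(females, total_senators)

# Conditional probabilities
p_female_given_democrat = Fraction(female_democrats, democrats)
p_democrat_given_female = Fraction(female_democrats, females)

# The joint probability P(Democrat & Female), computed both ways
joint_via_democrat = p_democrat * p_female_given_democrat
joint_via_female = p_female * p_democrat_given_female

print(joint_via_democrat, joint_via_female)
```

Both routes give the same value because each one is just the share of female Democrats among all senators.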

## Understanding Bayes' Rule

Conceptually, this is a way to go from 
   > P(Evidence| Known Outcome) to P(Outcome|Known Evidence).

Often, we know how frequently some particular evidence is observed, given a known outcome. We use this known fact to compute the reverse: the chance of that outcome happening, given the evidence.

> P(Outcome given that we know some Evidence) = P(Evidence given that we know the Outcome) times P(Outcome), divided by P(Evidence)

The classic example to understand Bayes' Rule:
<pre>
                                Prob(Test is positive | Disease) * P(Disease)
P(Disease | Test is positive) = ___________________________________________________
                                Prob(Testing positive, with or without the disease)
</pre>
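
As a rough numeric illustration of the disease example, the sketch below plugs in some assumed numbers. The prevalence and test accuracy figures are hypothetical, picked only to make the arithmetic concrete.

```python
from fractions import Fraction

# Hypothetical numbers, for illustration only.
p_disease = Fraction(1, 100)             # P(Disease): assumed 1% prevalence
p_pos_given_disease = Fraction(95, 100)  # P(Test is positive | Disease)
p_pos_given_healthy = Fraction(5, 100)   # P(Test is positive | No disease)

# P(Testing positive, with or without the disease)
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Bayes' Rule: P(Disease | Test is positive)
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive

print(float(p_disease_given_pos))
```

Under these assumed numbers the result is about 16%: even with a fairly accurate test, a positive result is far from conclusive when the disease itself is rare.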

## Getting to Naive Bayes

So far, we have talked only about one piece of evidence. In reality, we have to predict an outcome given **multiple pieces of evidence**. In that case, the math gets very complicated. To get around that complication, one approach is to 'uncouple' the multiple pieces of evidence and to treat each piece of evidence as independent. This simplifying assumption is why the method is called *naive* Bayes.

<pre>
                      Likelihood of Evidence * Prior prob of outcome
P(outcome|evidence) = ______________________________________________
                                       P(Evidence)
</pre>
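
Written out for several pieces of evidence, the 'uncoupling' can be sketched as follows. The symbols $O$ (the outcome) and $E_1, \dots, E_n$ (the individual pieces of evidence) are notation introduced here for illustration, and the product form holds only under the naive assumption that the pieces of evidence are independent of each other given the outcome.

$$
P(O \mid E_1, \dots, E_n) = \frac{P(E_1 \mid O)\, P(E_2 \mid O) \cdots P(E_n \mid O)\, P(O)}{P(E_1, \dots, E_n)}
$$

The denominator does not depend on the outcome, so when we only want to know which outcome is most probable, it is enough to compare the numerators.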

## Fruit Example

Let's try it out on a fruit identification example to increase our understanding.

Let's say that we have data on 1000 pieces of fruit. They happen to be **Banana, Orange** or some **Other Fruit**. We know 3 characteristics about each fruit:

1. Whether it is Long
2. Whether it is Sweet and
3. If its color is Yellow.

Here is the data we are given:

| Type | Long | Not Long | Sweet | Not Sweet | Yellow | Not Yellow | TOTAL |
| :--: | :--: | :--: | :--: | :--: | :--: | :--: | :---: |
| Banana | 400 | 100 | 350 | 150 | 450 | 50 | 500 |
| Orange | 0 | 300 | 150 | 150 | 300 | 0 | 300 |
| Other Fruit | 100 | 100 | 150 | 50 | 50 | 150 | 200 |
| **TOTAL** | **500** | **500** | **650** | **350** | **800** | **200** | **1000** |

Next, we compute the **Prior** probabilities.

<pre>
# P(Banana): share of bananas among all 1000 fruits
P_Banana = 500 / 1000

# P(Orange)
P_Orange = 300 / 1000

# P(Other Fruit)
P_Other_Fruit = 200 / 1000
</pre>


Probability of **Evidence**.

<pre>
# P(Long): share of long fruits among all 1000 fruits
P_Long = 500 / 1000

# P(Sweet)
P_Sweet = 650 / 1000

# P(Yellow)
P_Yellow = 800 / 1000
</pre>

**Likelihood** probabilities.

<pre>
# Each likelihood is (count with the feature) / (total of that fruit type)

# P(Long|Banana)
P_Long_Banana = 400 / 500
# P(Long|Orange)
P_Long_Orange = 0 / 300
# P(Long|Other Fruit)
P_Long_Other = 100 / 200

# P(Sweet|Banana)
P_Sweet_Banana = 350 / 500
# P(Sweet|Orange)
P_Sweet_Orange = 150 / 300
# P(Sweet|Other Fruit)
P_Sweet_Other = 150 / 200

# P(Yellow|Banana)
P_Yellow_Banana = 450 / 500
# P(Yellow|Orange)
P_Yellow_Orange = 300 / 300
# P(Yellow|Other Fruit)
P_Yellow_Other = 50 / 200
</pre>

## Given a Fruit, how to classify it?

Let's say that we are given the properties of an unknown fruit and asked to classify it. We are told that the fruit is Long, Sweet and Yellow. Is it a Banana? Is it an Orange? Or is it some Other Fruit?

We can simply run the numbers for each of the 3 outcomes, one by one. Then we choose the highest probability and 'classify' our unknown fruit as belonging to the class that had the highest probability based on our prior evidence (our 1000 fruit training set):

<pre>
# P(Banana| Long, Sweet, Yellow)

# P(Orange| Long, Sweet, Yellow)

# P(Other fruit| Long, Sweet, Yellow)
</pre>
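
One possible way to run these numbers is sketched below. It is a minimal illustration rather than the only way to do it: the counts are copied from the table above, the names `counts`, `scores` and `best` are our own, and the scores are the unnormalized numerators (likelihoods times prior). Since P(Evidence) is the same for all three fruits, comparing numerators is enough to pick the winner.

```python
from fractions import Fraction

# Counts copied from the fruit table: fruits having each feature, per type.
counts = {
    "Banana":      {"Long": 400, "Sweet": 350, "Yellow": 450, "Total": 500},
    "Orange":      {"Long": 0,   "Sweet": 150, "Yellow": 300, "Total": 300},
    "Other Fruit": {"Long": 100, "Sweet": 150, "Yellow": 50,  "Total": 200},
}
n_fruit = sum(c["Total"] for c in counts.values())  # 1000 fruits in total
evidence = ["Long", "Sweet", "Yellow"]               # the unknown fruit's features

scores = {}
for fruit, c in counts.items():
    score = Fraction(c["Total"], n_fruit)            # prior, e.g. P(Banana)
    for feature in evidence:
        # naive assumption: multiply in each likelihood, e.g. P(Long|Banana)
        score *= Fraction(c[feature], c["Total"])
    scores[fruit] = score

# Pick the class with the highest (unnormalized) score
best = max(scores, key=scores.get)

for fruit, score in scores.items():
    print(f"{fruit}: {float(score)}")
print("Classified as:", best)
```

With the table's counts, Banana scores 0.252, Orange scores 0 (no orange in the data is long), and Other Fruit scores 0.01875, so the unknown fruit is classified as a Banana.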