In [34]:
%matplotlib inline

In [1]:
import numpy as np
import sympy

***
# What is the Naive Bayes algorithm?

   It is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

   For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter. Even if these features depend on each other or on the existence of other features, the classifier treats each property as contributing independently to the probability that the fruit is an apple, which is why it is known as ‘Naive’.
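The independence assumption can be sketched as a simple product: each class gets a score equal to its prior probability multiplied by one probability per feature. This is only a minimal illustration; every number below, the `other` class, and the names `likelihoods`, `priors`, and `naive_score` are hypothetical and not taken from any data set.

```python
# Hypothetical P(feature | class) values; all numbers are made up for illustration
likelihoods = {
    "apple": {"red": 0.7, "round": 0.9, "about_3_inches": 0.8},
    "other": {"red": 0.2, "round": 0.5, "about_3_inches": 0.3},
}
priors = {"apple": 0.4, "other": 0.6}  # hypothetical P(class)

def naive_score(label):
    """Unnormalized score: P(class) times the product of P(feature | class)."""
    score = priors[label]
    for p in likelihoods[label].values():
        score *= p  # each feature contributes independently of the others
    return score

scores = {label: naive_score(label) for label in priors}

# Normalize the scores so that they sum to 1
total = sum(scores.values())
posteriors = {label: s / total for label, s in scores.items()}
print(posteriors)
```

With these made-up numbers the `apple` class receives the higher score, because all three features are more likely under it.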

   A Naive Bayes model is easy to build and particularly useful for very large data sets. Despite its simplicity, it can perform competitively with more sophisticated classification methods on some problems.

   Bayes' theorem provides a way of calculating the posterior probability $P(c|x)$ from the prior $P(c)$, the evidence $P(x)$, and the likelihood $P(x|c)$. Look at the equation below:


<br>

\begin{align}
{\large P(c|x) = \frac{P(x|c)P(c)}{P(x)}}
\end{align}

<br>
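As a minimal sketch of the equation above, the function below computes $P(c|x)$ from the three quantities on the right-hand side. The function name `posterior` and all the numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(c|x) = P(x|c) * P(c) / P(x)."""
    return likelihood * prior / evidence

# Hypothetical numbers, chosen only for illustration
p_c = 0.3              # P(c): prior probability of class c
p_x_given_c = 0.8      # P(x|c): probability of observing x within class c
p_x_given_not_c = 0.1  # P(x|not c): probability of observing x outside class c

# P(x) adds up both ways of observing x: together with c, and without c
p_x = p_x_given_c * p_c + p_x_given_not_c * (1 - p_c)

p_c_given_x = posterior(p_x_given_c, p_c, p_x)
print(round(p_c_given_x, 4))
```

Observing $x$ raises the probability of $c$ from the prior of 0.3 to roughly 0.77 here, because $x$ is much more likely inside class $c$ than outside it.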

For a random variable $x$, $P(x)$ is a function that assigns a probability to all values of $x$.

- Probability Density of $x = P(x)$

The probability of a specific event A for a random variable x is denoted as P(x=A), or simply as P(A).

- Probability of Event $A = P(A)$

Probability is calculated as the number of desired outcomes divided by the total possible outcomes, in the case where all outcomes are equally likely.

- Probability = ${\large \frac{\text{number of desired outcomes}}{\text{total number of possible outcomes}}}$

>This is intuitive if we think about a discrete random variable such as the roll of a die. For example, the probability of rolling a 5 is one outcome (rolling a 5) divided by the total number of discrete outcomes (6), which is 1/6, or about 0.1667, or about 16.67%.

In [12]:
# A simple probability calculator
def probability(desired_outcomes, total_outcomes):
    """Return desired / total; valid when all outcomes are equally likely."""
    p = desired_outcomes / total_outcomes

    print("The probability is:", str(round(p * 100, 2)) + "%")
    return p  # a number between 0 and 1, where 1 denotes 100% and 0 denotes 0%

# Let's say we have 12 green balls, 8 blue balls, 2 red balls and 1 yellow ball.
# What is the probability of reaching in and drawing the yellow ball?

total_balls = 12 + 8 + 2 + 1
desired_balls = 1  # there is only one yellow ball

probability(desired_balls, total_balls)

The probability is: 4.35%


0.043478260869565216

The sum of the probabilities of all outcomes must equal one. If not, we do not have valid probabilities.

- Sum of the Probabilities for All Outcomes = 1.0.

The probability of an impossible outcome is zero. For example, it is impossible to roll a 7 with a standard six-sided die.

- Probability of Impossible Outcome = 0.0

The probability of a certain outcome is one. For example, it is certain that a value between 1 and 6 will occur when rolling a six-sided die.

- Probability of Certain Outcome = 1.0

The probability of an event not occurring is called the complement.

It can be calculated as one minus the probability of the event, or 1 - P(A). For example, the probability of not rolling a 5 is 1 - P(5), or 1 - 0.167, which is about 0.833 or 83.33%.

- Probability of Not Event $A = 1 - P(A)$
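The rules above can be checked on a fair six-sided die. This is a small sketch using exact fractions from the standard library; the names `die` and `prob` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction

# Fair six-sided die: each face has probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}

def prob(event):
    """Probability that the roll lands on any face listed in `event`."""
    return sum((die.get(face, Fraction(0)) for face in event), Fraction(0))

total = sum(die.values())      # all outcomes together
p_seven = prob({7})            # impossible outcome
p_any = prob(range(1, 7))      # certain outcome
p_not_five = 1 - prob({5})     # complement of rolling a 5

print(total, p_seven, p_any, p_not_five)
```

Using `Fraction` keeps the results exact, so the sum over all faces is exactly 1 rather than a floating-point value close to 1.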

## Joint, Marginal and Conditional probability

Recall that marginal probability is the probability of an event, irrespective of other random variables. If the random variable is independent of the others, the marginal probability is simply the probability of the event. Otherwise, it is the joint probability summed over all outcomes of the other variables, which is called the sum rule.

- Marginal Probability: The probability of an event irrespective of the outcomes of other random variables, e.g. P(A).

The joint probability is the probability of two (or more) simultaneous events, often described in terms of events A and B from two dependent random variables, e.g. X and Y. The joint probability is often summarized as just the outcomes, e.g. A and B.

- Joint Probability: Probability of two (or more) simultaneous events, e.g. P(A and B) or P(A, B).
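To make the link between joint and marginal probability concrete, here is a minimal sketch that stores a joint distribution as a table and applies the sum rule. The table values and the names `joint`, `marginal_x`, and `marginal_y` are hypothetical.

```python
import numpy as np

# Hypothetical joint distribution P(X, Y); rows are outcomes of X, columns are outcomes of Y
joint = np.array([
    [0.10, 0.20],  # X = x0
    [0.30, 0.40],  # X = x1
])

# Sum rule: the marginal of one variable sums the joint over the other variable
marginal_x = joint.sum(axis=1)  # P(X), summing over the outcomes of Y
marginal_y = joint.sum(axis=0)  # P(Y), summing over the outcomes of X

print("P(X):", marginal_x)
print("P(Y):", marginal_y)
print("Total:", joint.sum())
```

Each entry of `joint` is one joint probability, for example `joint[0, 1]` is $P(X = x_0, Y = y_1)$, and both marginals sum to 1 because the whole table does.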

The conditional probability is the probability of one event given the occurrence of another event, often described in terms of events A and B from two dependent random variables e.g. $X$ and $Y$.

- Conditional Probability: Probability of one (or more) event given the occurrence of another event, e.g. $P(A \text{ given } B)$ or $P(A | B)$.

The joint probability can be calculated using the conditional probability; for example:

- $P(A, B) = P(A | B) * P(B)$

This is called the product rule. Importantly, the joint probability is symmetrical, meaning that:

- $P(A, B) = P(B, A)$

The conditional probability can be calculated using the joint probability; for example:

- $P(A | B) = P(A, B) / P(B)$

In general, the conditional probability is not symmetrical; for example:

- $P(A | B) \neq P(B | A)$
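The product rule and the asymmetry of conditional probability can be verified numerically. The sketch below assumes a hypothetical 2x2 joint table for two events A and B; the numbers and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical joint table: rows are (A, not A), columns are (B, not B)
joint = np.array([
    [0.10, 0.20],
    [0.30, 0.40],
])

p_a = joint[0, :].sum()   # marginal P(A)
p_b = joint[:, 0].sum()   # marginal P(B)
p_a_and_b = joint[0, 0]   # joint P(A, B)

# Conditional probabilities from the joint probability
p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Product rule recovers the same joint probability from either direction
print(np.isclose(p_a_given_b * p_b, p_a_and_b))
print(np.isclose(p_b_given_a * p_a, p_a_and_b))

# The two conditionals differ because P(A) and P(B) differ
print(p_a_given_b, p_b_given_a)
```

The joint probability is the same whichever conditional is used, while $P(A | B)$ and $P(B | A)$ take different values whenever $P(A)$ and $P(B)$ are not equal.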