# Naive Bayes Algorithm (Classification)

## Summary

* The **Naive Bayes algorithm** is a machine learning technique utilized for both binary and multi-class classification.


* The algorithm is fundamentally rooted in core **probability concepts**, particularly the distinction between **independent** and **dependent events**.


* **Bayes' Theorem** is derived using the logic of **conditional probability**, which evaluates the likelihood of an outcome based on a previous occurrence.


* When applied to machine learning, the algorithm calculates probabilities to predict a **dependent feature** (the output) based on multiple given **independent features**.


* The denominator in the Bayes equation is the same for every output class for a given input, so it can be dropped when comparing classes to make the final prediction.



---

## Probability Concepts

### Independent Events

**Independent events** occur when one outcome does not influence or alter the probability of another outcome. A classic example is rolling a standard six-sided die. The probability of rolling a specific number, such as a 1, is always 1/6. Earlier rolls do not change this probability: on any roll, a 2 or a 3 likewise has a 1/6 chance.
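
The independence claim can be checked by brute-force counting. The sketch below assumes a fair die and enumerates every pair of rolls; the helper name `probability` is introduced here purely for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first roll, second roll) outcomes of a fair die.
outcomes = list(product(range(1, 7), repeat=2))

def probability(event, given=lambda o: True):
    """P(event | given), computed by counting equally likely outcomes."""
    conditioned = [o for o in outcomes if given(o)]
    favourable = [o for o in conditioned if event(o)]
    return Fraction(len(favourable), len(conditioned))

# Probability that the second roll is a 2, with and without knowing the first roll.
p_second_is_2 = probability(lambda o: o[1] == 2)
p_second_is_2_given_first_is_1 = probability(
    lambda o: o[1] == 2, given=lambda o: o[0] == 1
)

print(p_second_is_2)                   # 1/6
print(p_second_is_2_given_first_is_1)  # 1/6
```

Both values match, which is what independence means: knowing the first roll does not change the probability of the second.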

### Dependent Events

**Dependent events** describe scenarios where the outcome of the first event directly changes the probability of subsequent events. For instance, consider drawing marbles from a bag containing 3 orange marbles and 2 yellow marbles. The initial probability of drawing an orange marble is 3/5. Once that orange marble is removed, only 4 marbles remain in the bag. Consequently, the probability of drawing a yellow marble next shifts to 2/4 (or 1/2).
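
The same numbers can be reproduced by enumerating every ordered pair of draws. This is a minimal sketch that assumes each marble is equally likely to be drawn and that marbles are not returned to the bag; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# The bag from the text: 3 orange marbles and 2 yellow marbles.
bag = ["orange"] * 3 + ["yellow"] * 2

# Every ordered pair of distinct marbles (drawing without replacement).
draws = list(permutations(range(len(bag)), 2))  # 5 * 4 = 20 ordered pairs

first_orange = [d for d in draws if bag[d[0]] == "orange"]
orange_then_yellow = [d for d in first_orange if bag[d[1]] == "yellow"]

p_first_orange = Fraction(len(first_orange), len(draws))
p_yellow_given_orange = Fraction(len(orange_then_yellow), len(first_orange))

print(p_first_orange)         # 3/5
print(p_yellow_given_orange)  # 1/2
```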

### Conditional Probability

**Conditional probability** represents the likelihood of an event occurring given that another specific event has already taken place. This is denoted mathematically as $P(B|A)$. The joint probability of both events occurring consecutively can be calculated using the formula: 

$$P(A \text{ and } B) = P(A) * P(B|A)$$

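As a worked instance, take the marble bag from the Dependent Events section, with $A$ = "the first marble is orange" and $B$ = "the second marble is yellow" (these event labels are chosen here for illustration):

$$P(A \text{ and } B) = P(A) * P(B|A) = \frac{3}{5} * \frac{2}{4} = \frac{3}{10}$$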

---

## Deriving Bayes' Theorem

Because of the principles of conditional probability, the joint probability of two events can be written in two mathematically equivalent ways: $P(A \text{ and } B) = P(A) * P(B|A)$, and conversely, $P(B \text{ and } A) = P(B) * P(A|B)$.

Both expressions describe the same joint probability, so setting them equal and dividing both sides by $P(B)$ (assuming $P(B) > 0$) gives **Bayes' Theorem**:

$$P(A|B) = \frac{P(A) * P(B|A)}{P(B)}$$



* **$P(A|B)$**: The probability of event A given B has occurred.


* **$P(A)$**: The prior probability of event A.


* **$P(B)$**: The marginal probability of event B (often called the evidence).


* **$P(B|A)$**: The probability of event B given A has occurred.
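
A small numeric check can make the formula concrete. The sketch below reuses the marble bag (3 orange, 2 yellow), with A = "first marble is orange" and B = "second marble is yellow". The function name `bayes_posterior` is hypothetical, and $P(B)$ is obtained with the law of total probability, which the text above does not cover.

```python
from fractions import Fraction

def bayes_posterior(prior_a, likelihood_b_given_a, evidence_b):
    """Return P(A|B) = P(A) * P(B|A) / P(B)."""
    return prior_a * likelihood_b_given_a / evidence_b

p_a = Fraction(3, 5)              # P(A): first marble is orange
p_b_given_a = Fraction(2, 4)      # P(B|A): 2 yellow among the 4 left
p_b_given_not_a = Fraction(1, 4)  # first was yellow: 1 yellow among the 4 left

# Law of total probability (an assumption added for this example):
# P(B) = P(A) * P(B|A) + P(not A) * P(B|not A)
p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a

p_a_given_b = bayes_posterior(p_a, p_b_given_a, p_b)

print(p_b)          # 2/5
print(p_a_given_b)  # 3/4
```

The result agrees with direct reasoning: if the second marble is known to be yellow, the first was drawn from the other four marbles, three of which are orange.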



---

## Naive Bayes in Machine Learning

In classification problems, the goal is to predict a **dependent feature** ($y$) based on the input of multiple **independent features** ($x_1, x_2, x_3$). To adapt Bayes' Theorem for this purpose, the formula is expanded to handle multiple variables:

$$P(y|x_1, x_2, x_3) = \frac{P(y) * P(x_1, x_2, x_3|y)}{P(x_1, x_2, x_3)}$$



To calculate the numerator, the algorithm makes the "naive" assumption that the features are conditionally independent of one another given the class $y$. Under that assumption, the numerator factors into: $P(y) * P(x_1|y) * P(x_2|y) * P(x_3|y)$.

Because the denominator, $P(x_1, x_2, x_3)$, does not depend on $y$, it has the same value for every candidate class. It can therefore be dropped when comparing classes: the class with the largest numerator is also the class with the largest posterior probability.
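
Written compactly, the comparison reduces to the two expressions below. The symbol $\hat{y}$ for the predicted class is notation introduced here and does not appear elsewhere in these notes.

$$P(y|x_1, x_2, x_3) \propto P(y) * P(x_1|y) * P(x_2|y) * P(x_3|y)$$

$$\hat{y} = \arg\max_{y} \; P(y) * P(x_1|y) * P(x_2|y) * P(x_3|y)$$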

---

## Example: Play Tennis Dataset

To demonstrate how the Naive Bayes algorithm operates on real data, we can manually process a prediction using a sample dataset regarding whether individuals will play tennis. The objective is to calculate the probability of playing tennis ("Yes" or "No") given a new test data point where **Outlook = Sunny** and **Temperature = Hot**.

* **Step 1 - Calculate Prior Output Probabilities**: Out of the 14 total records in the dataset, there are 9 "Yes" outcomes and 5 "No" outcomes. Therefore, $P(\text{Yes})$ is 9/14, and $P(\text{No})$ is 5/14.


* **Step 2 - Calculate Conditional Feature Probabilities**: Looking at the dataset for the "Sunny" feature, $P(\text{Sunny}|\text{Yes})$ is 2/9, and $P(\text{Sunny}|\text{No})$ is 3/5. For the "Hot" feature, $P(\text{Hot}|\text{Yes})$ is 2/9, and $P(\text{Hot}|\text{No})$ is 2/5.


* **Step 3 - Compute Proportional Outcomes**: For a "Yes" outcome, multiply the relevant probabilities: $(9/14) * (2/9) * (2/9) = 2/63$, or roughly 0.0317. For a "No" outcome, the calculation is $(5/14) * (3/5) * (2/5) = 3/35$, or roughly 0.0857.


* **Step 4 - Normalize to Percentages**: To convert these proportional scores into probabilities that sum to 1, divide each by their sum. The final probability for "Yes" is $0.0317 / (0.0317 + 0.0857)$, or roughly 27%. The probability for "No" is $0.0857 / (0.0317 + 0.0857)$, or roughly 73%.


* **Final Prediction**: Because the probability for "No" (about 73%) is higher than the probability for "Yes" (about 27%), the algorithm predicts that the individual is **not going to play tennis** under Sunny and Hot conditions.
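
The four steps above can be sketched in a few lines of Python. This is a minimal illustration that hard-codes only the counts quoted in the steps, not the full dataset; the names `priors`, `likelihoods`, and `naive_bayes_posteriors` are hypothetical. Exact fractions are used so that rounding does not affect the result.

```python
from fractions import Fraction

# Step 1: prior probabilities (14 records: 9 "Yes", 5 "No").
priors = {"Yes": Fraction(9, 14), "No": Fraction(5, 14)}

# Step 2: conditional feature probabilities quoted in the text.
likelihoods = {
    "Yes": {"Sunny": Fraction(2, 9), "Hot": Fraction(2, 9)},
    "No": {"Sunny": Fraction(3, 5), "Hot": Fraction(2, 5)},
}

def naive_bayes_posteriors(observed):
    """Return normalized class probabilities for the observed feature values."""
    # Step 3: prior times each conditional probability (the unnormalized score).
    scores = {}
    for label, prior in priors.items():
        score = prior
        for value in observed:
            score *= likelihoods[label][value]
        scores[label] = score
    # Step 4: divide by the sum so the probabilities add up to 1.
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

posteriors = naive_bayes_posteriors(["Sunny", "Hot"])
prediction = max(posteriors, key=posteriors.get)

for label, p in posteriors.items():
    print(label, p, f"{float(p):.1%}")
# Yes 10/37 27.0%
# No 27/37 73.0%
print("Prediction:", prediction)  # Prediction: No
```

In practice a library implementation would estimate these tables from the data and typically apply smoothing to avoid zero probabilities; neither is covered by this sketch.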