# Naive Bayes Classifier

## Understanding Bayes Theorem

Before diving into the Naive Bayes algorithm, it's essential to understand the basics of Bayes' theorem.

### Probability

Probability is a way to measure how likely an event is to happen. It ranges from 0 (impossible) to 1 (certain). When all outcomes are equally likely, the probability of an event A is:

$$
P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes}}
$$
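As a minimal sketch of this formula, the snippet below counts favorable outcomes for a fair six-sided die. The die example and the helper name `probability` are assumptions made for illustration, and the formula only applies when every outcome is equally likely.

```python
from fractions import Fraction

def probability(favorable, total):
    """Classical probability: favorable outcomes over total equally likely outcomes."""
    return Fraction(favorable, total)

# Hypothetical example: rolling an even number on a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
even = [o for o in outcomes if o % 2 == 0]

p_even = probability(len(even), len(outcomes))
print(p_even)  # 1/2
```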

### Dependent and Independent Events

Two events can be either independent or dependent:

- **Independent events:** The outcome of one doesn’t affect the other. For example, flipping a coin and rolling a die.
  
- **Dependent events:** The outcome of one does affect the other. For example, picking two children from a group one after the other without returning the first: the first pick changes who is left for the second.
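The difference can be checked by enumerating outcomes. The sketch below is illustrative only: the helper `prob` and the three-child group are hypothetical, chosen to mirror the two examples above. For independent events the joint probability equals the product of the individual probabilities; for dependent events, conditioning on the first outcome changes the probability of the second.

```python
from fractions import Fraction
from itertools import permutations, product

def prob(event, space):
    """Probability of `event` (a predicate) over equally likely outcomes in `space`."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))

# Independent: flip a coin and roll a die.
coin_die = list(product(["H", "T"], range(1, 7)))
p_heads = prob(lambda o: o[0] == "H", coin_die)
p_six = prob(lambda o: o[1] == 6, coin_die)
p_heads_and_six = prob(lambda o: o[0] == "H" and o[1] == 6, coin_die)
print(p_heads_and_six == p_heads * p_six)  # True: P(A and B) = P(A) * P(B)

# Dependent: pick two children from a hypothetical group without replacement.
group = ["girl_1", "girl_2", "boy"]
picks = list(permutations(group, 2))  # ordered pairs, first pick not returned
p_second_girl = prob(lambda o: o[1].startswith("girl"), picks)
p_first_girl = prob(lambda o: o[0].startswith("girl"), picks)
p_both_girls = prob(
    lambda o: o[0].startswith("girl") and o[1].startswith("girl"), picks
)
p_second_girl_given_first_girl = p_both_girls / p_first_girl
print(p_second_girl, p_second_girl_given_first_girl)  # 2/3 1/2
```

Knowing that the first pick was a girl lowers the chance that the second is a girl from 2/3 to 1/2, which is what makes the two picks dependent.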

### Bayes Theorem

Bayes' theorem is used to calculate the conditional probability of an event occurring given that another event has already occurred.

Mathematically, it is represented as:  
$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$

Where:
- P(A|B) is the probability of event A given that B has occurred.
- P(B|A) is the probability of event B given that A has occurred.
- P(A) is the probability of event A happening, regardless of whether B occurs.
- P(B) is the probability of event B happening, regardless of whether A occurs.

#### Example:

Let’s say we want to calculate the probability that it rains given that the sky is cloudy.

We know the following:

- P(A): The probability that it rains on any given day is 30%, so P(A) = 0.3.
- P(B|A): The probability that the sky is cloudy when it's raining is 80%, so P(B|A) = 0.8.
- P(B): The probability that the sky is cloudy, whether it's raining or not, is 50%, so P(B) = 0.5.

Now, to calculate the probability that it rains when the sky is cloudy, i.e. P(A|B), we use Bayes' theorem.

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$

Substituting the values:

$$
P(A|B) = \frac{0.8 \cdot 0.3}{0.5} = \frac{0.24}{0.5} = 0.48
$$

So, there’s a 48% chance of rain given that the sky is cloudy.
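The same calculation can be written as a small function. This is a minimal sketch; the function name `bayes` and its parameter names are assumptions made for illustration.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) computed from P(B|A), P(A), and P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

# Values from the rain example: A = rain, B = cloudy sky.
p_rain_given_cloudy = bayes(p_b_given_a=0.8, p_a=0.3, p_b=0.5)
print(round(p_rain_given_cloudy, 2))  # 0.48
```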

---

## Naive Bayes Classifier

The Naive Bayes classifier is based on Bayes' theorem. It assigns an input to one of several classes (for example, yes or no, spam or not spam) based on several features, under the assumption that those features are independent of each other given the class.

In the Bayes' theorem example above, a single event was conditioned on a single piece of evidence. Naive Bayes extends this to multiple features and assumes that each feature contributes to the probability of a class independently of the others.

This independence assumption is the "naive" part and rarely holds exactly, yet Naive Bayes often performs well in real-world tasks such as text classification and spam detection.

Let us assume we have an event whose probability depends on $n$ features $X_1, X_2, \dots, X_n$. The probability of that event occurring (yes) and not occurring (no) can be given by applying Bayes' Theorem.

$$
P(\text{Yes} | X_1, X_2, \dots, X_n) = \frac{P(X_1, X_2, \dots, X_n | \text{Yes}) \cdot P(\text{Yes})}{P(X_1, X_2, \dots, X_n)}
$$

$$
P(\text{No} | X_1, X_2, \dots, X_n) = \frac{P(X_1, X_2, \dots, X_n | \text{No}) \cdot P(\text{No})}{P(X_1, X_2, \dots, X_n)}
$$

Here, the denominator $P(X_1, X_2, \dots, X_n)$ is the same in both the "Yes" and "No" cases, so it can be dropped when we only need to compare the two. Each posterior is then proportional to its numerator:

$$
P(\text{Yes} | X_1, X_2, \dots, X_n) \propto P(X_1, X_2, \dots, X_n | \text{Yes}) \cdot P(\text{Yes})
$$

$$
P(\text{No} | X_1, X_2, \dots, X_n) \propto P(X_1, X_2, \dots, X_n | \text{No}) \cdot P(\text{No})
$$

Since we assume that the features are independent of each other given the class, the joint likelihood factorizes into a product of per-feature likelihoods:

$$
P(\text{Yes} | X_1, X_2, \dots, X_n) \propto P(X_1 | \text{Yes}) \cdot P(X_2 | \text{Yes}) \cdot \dots \cdot P(X_n | \text{Yes}) \cdot P(\text{Yes})
$$

$$
P(\text{No} | X_1, X_2, \dots, X_n) \propto P(X_1 | \text{No}) \cdot P(X_2 | \text{No}) \cdot \dots \cdot P(X_n | \text{No}) \cdot P(\text{No})
$$
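The factorized formula above can be sketched in a few lines of Python. The function names and the numbers below are hypothetical, chosen only to illustrate the computation; the sketch assumes the priors and per-feature likelihoods have already been estimated. Because the shared denominator is dropped, the raw values are scores rather than probabilities, so the sketch also divides by their sum to recover posteriors.

```python
from math import prod

def naive_bayes_score(prior, likelihoods):
    """Unnormalized score: P(class) times the product of P(X_i | class)."""
    return prior * prod(likelihoods)

def classify(priors, likelihoods_by_class):
    """Return the highest-scoring class and the normalized posteriors."""
    scores = {
        label: naive_bayes_score(priors[label], likelihoods_by_class[label])
        for label in priors
    }
    total = sum(scores.values())
    posteriors = {label: score / total for label, score in scores.items()}
    return max(scores, key=scores.get), posteriors

# Hypothetical priors and per-feature likelihoods for two features.
priors = {"Yes": 0.6, "No": 0.4}
likelihoods = {"Yes": [0.5, 0.2], "No": [0.1, 0.3]}

label, posteriors = classify(priors, likelihoods)
print(label)  # Yes
```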

### Naive Bayes Classifier Example: Predicting Fever

Imagine we want to predict whether a person has a fever based on whether they have COVID and whether they have the Flu. We can use the Naive Bayes Classifier for this.

#### Problem

We are trying to determine the likelihood of a person having fever or no fever based on two factors:
1. Does the person have COVID?
2. Does the person have the Flu?

Our goal is to find the probability of fever or no fever based on these conditions.

#### Applying Bayes' Theorem

We want to calculate the probability that a person has fever given they have both COVID and Flu. Using Bayes' Theorem, we calculate the following:

- Fever: 
   $$
   P(\text{Fever} | \text{COVID}, \text{Flu}) = \frac{P(\text{COVID}, \text{Flu} | \text{Fever}) \cdot P(\text{Fever})}{P(\text{COVID}, \text{Flu})}
   $$

- No Fever: 
   $$
   P(\text{No Fever} | \text{COVID}, \text{Flu}) = \frac{P(\text{COVID}, \text{Flu} | \text{No Fever}) \cdot P(\text{No Fever})}{P(\text{COVID}, \text{Flu})}
   $$

Since we are only comparing the probabilities of fever vs no fever, we can ignore the denominator $P(\text{COVID}, \text{Flu})$, which is the same in both cases. This simplifies our formulas to:

$$
P(\text{Fever} | \text{COVID}, \text{Flu}) \propto P(\text{COVID}, \text{Flu} | \text{Fever}) \cdot P(\text{Fever})
$$
$$
P(\text{No Fever} | \text{COVID}, \text{Flu}) \propto P(\text{COVID}, \text{Flu} | \text{No Fever}) \cdot P(\text{No Fever})
$$

#### Naive Bayes Assumption

Now, Naive Bayes assumes that the features (in this case, COVID and Flu) are independent of each other given the class (fever or no fever). This means we can split the joint probability like this:

$$
P(\text{Fever} | \text{COVID}, \text{Flu}) \propto P(\text{COVID} | \text{Fever}) \cdot P(\text{Flu} | \text{Fever}) \cdot P(\text{Fever})
$$

$$
P(\text{No Fever} | \text{COVID}, \text{Flu}) \propto P(\text{COVID} | \text{No Fever}) \cdot P(\text{Flu} | \text{No Fever}) \cdot P(\text{No Fever})
$$

#### Example Data

Let’s use some data to calculate the probabilities:
- We have 10 people.
- 7 people have fever, and 3 people have no fever.

Among those with fever:
- 4 have COVID, and 3 do not.
- 3 have Flu, and 4 do not.

Among those with no fever:
- 1 has COVID, and 2 do not.
- 1 has Flu, and 2 do not.
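One possible way to hold this data in code is a dictionary of counts per class. The layout and key names here are assumptions made for illustration; only the numbers come from the example.

```python
# Counts from the example data; "total" is the number of people in each class.
counts = {
    "Fever": {"total": 7, "COVID": 4, "Flu": 3},
    "No Fever": {"total": 3, "COVID": 1, "Flu": 1},
}

n_people = sum(c["total"] for c in counts.values())
print(n_people)  # 10

# The "do not" counts follow by subtraction within each class.
for label, c in counts.items():
    without_covid = c["total"] - c["COVID"]
    without_flu = c["total"] - c["Flu"]
    print(label, without_covid, without_flu)
```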

#### Calculating the Probabilities

##### Step 1: Calculate the prior probabilities

- The probability of having fever:
  $$
  P(\text{Fever}) = \frac{7}{10}
  $$
- The probability of having no fever:
  $$
  P(\text{No Fever}) = \frac{3}{10}
  $$

##### Step 2: Calculate the conditional probabilities

For fever:
- The probability of having COVID given fever:
  $$
  P(\text{COVID} | \text{Fever}) = \frac{4}{7}
  $$
- The probability of having Flu given fever:
  $$
  P(\text{Flu} | \text{Fever}) = \frac{3}{7}
  $$

For no fever:
- The probability of having COVID given no fever:
  $$
  P(\text{COVID} | \text{No Fever}) = \frac{1}{3}
  $$
- The probability of having Flu given no fever:
  $$
  P(\text{Flu} | \text{No Fever}) = \frac{1}{3}
  $$
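Steps 1 and 2 amount to dividing counts. The sketch below estimates the priors and conditional probabilities from the example counts, using `Fraction` so the results match the fractions above exactly. The dictionary layout and variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Counts from the example data.
counts = {
    "Fever": {"total": 7, "COVID": 4, "Flu": 3},
    "No Fever": {"total": 3, "COVID": 1, "Flu": 1},
}
n_people = sum(c["total"] for c in counts.values())

# Step 1: prior P(class) = people in class / all people.
priors = {label: Fraction(c["total"], n_people) for label, c in counts.items()}

# Step 2: conditional P(feature | class) = people in class with feature / people in class.
conditionals = {
    label: {f: Fraction(c[f], c["total"]) for f in ("COVID", "Flu")}
    for label, c in counts.items()
}

print(priors["Fever"], priors["No Fever"])  # 7/10 3/10
print(conditionals["Fever"]["COVID"], conditionals["Fever"]["Flu"])  # 4/7 3/7
```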

##### Step 3: Compute the final probabilities

Now we can calculate the probabilities for both fever and no fever.

For fever:
$$
P(\text{Fever} | \text{COVID}, \text{Flu}) \propto P(\text{COVID} | \text{Fever}) \cdot P(\text{Flu} | \text{Fever}) \cdot P(\text{Fever})
$$
Substitute the values:
$$
P(\text{Fever} | \text{COVID}, \text{Flu}) \propto \frac{4}{7} \cdot \frac{3}{7} \cdot \frac{7}{10} = \frac{12}{70}
$$

For no fever:
$$
P(\text{No Fever} | \text{COVID}, \text{Flu}) \propto P(\text{COVID} | \text{No Fever}) \cdot P(\text{Flu} | \text{No Fever}) \cdot P(\text{No Fever})
$$
Substitute the values:
$$
P(\text{No Fever} | \text{COVID}, \text{Flu}) \propto \frac{1}{3} \cdot \frac{1}{3} \cdot \frac{3}{10} = \frac{1}{30}
$$

##### Step 4: Compare the results

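- The score for fever is $\frac{12}{70} \approx 0.171$
- The score for no fever is $\frac{1}{30} \approx 0.033$

These values are proportional to the posterior probabilities rather than probabilities themselves, because the shared denominator was dropped. Dividing each score by their sum gives $P(\text{Fever} | \text{COVID}, \text{Flu}) \approx 0.837$ and $P(\text{No Fever} | \text{COVID}, \text{Flu}) \approx 0.163$.

Since $0.171 > 0.033$, the classifier predicts that the person is more likely to have fever.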
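Steps 3 and 4 can be reproduced with exact fractions. This is a minimal sketch using the priors and conditional probabilities computed above; the variable names are assumptions made for illustration. It also divides the scores by their sum to show the normalized posteriors.

```python
from fractions import Fraction

priors = {"Fever": Fraction(7, 10), "No Fever": Fraction(3, 10)}
# P(COVID | class) and P(Flu | class) for each class.
likelihoods = {
    "Fever": [Fraction(4, 7), Fraction(3, 7)],
    "No Fever": [Fraction(1, 3), Fraction(1, 3)],
}

# Step 3: unnormalized score = prior times the product of the likelihoods.
scores = {}
for label in priors:
    score = priors[label]
    for p in likelihoods[label]:
        score *= p
    scores[label] = score

# Step 4: compare the scores; dividing by the sum recovers the posteriors.
total = sum(scores.values())
posteriors = {label: score / total for label, score in scores.items()}
prediction = max(scores, key=scores.get)

print(scores["Fever"], scores["No Fever"])  # 6/35 1/30
print(posteriors["Fever"], posteriors["No Fever"])  # 36/43 7/43
print(prediction)  # Fever
```

Here 6/35 is the reduced form of 12/70, and 36/43 is roughly 0.837.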

#### Conclusion

Based on the Naive Bayes Classifier, a person with both COVID and Flu is more likely to have fever than no fever. This shows how Naive Bayes classifies an input by combining the class prior with per-feature probabilities that are assumed to be independent given the class.

This example keeps the calculation simple while still showing each step Naive Bayes takes to reach a classification.

---