# Naive Bayes

### Q1. What is Bayes' theorem?

Bayes' theorem is a fundamental concept in probability theory and statistics that describes the probability of an event, given prior knowledge or information. It's named after the 18th-century British mathematician and theologian, Thomas Bayes, who developed the foundational principles behind this theorem. Bayes' theorem is particularly useful in situations involving conditional probability, where you have some initial information or evidence and want to update your beliefs based on new data.

The key idea behind Bayes' theorem is that you can update your belief about the probability of event A occurring, given new evidence (event B), by considering both the prior probability of A and the likelihood of B occurring given A.

This theorem has applications in various fields, including statistics, machine learning, and artificial intelligence. It's a fundamental building block for Bayesian inference, which is a powerful method for updating probabilities as new information becomes available, making it a valuable tool for decision-making and probabilistic reasoning.

### Q2. What is the formula for Bayes' theorem?

The theorem can be expressed as follows, where A and B are events, and P(B) is not zero:

\[ P(A|B) = \frac{P(B|A) \, P(A)}{P(B)} \]

In words, this can be interpreted as:

- \( P(A|B) \): The probability of event A occurring given that event B has occurred.
- \( P(B|A) \): The probability of event B occurring given that event A has occurred.
- \( P(A) \): The prior probability of event A occurring.
- \( P(B) \): The marginal probability of event B occurring (the evidence), which normalizes the result.
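
As a quick illustration, the formula can be written as a small Python function. This is a minimal sketch: the name `bayes_posterior`, its parameter names, and the example numbers are illustrative assumptions, not part of any library or of the text above.

```python
from fractions import Fraction


def bayes_posterior(likelihood, prior, evidence):
    """Return P(A|B) given P(B|A), P(A), and P(B)."""
    if evidence == 0:
        raise ValueError("P(B) must be non-zero")
    return likelihood * prior / evidence


# Hypothetical numbers chosen only for illustration:
# P(B|A) = 3/4, P(A) = 1/5, P(B) = 3/10
posterior = bayes_posterior(Fraction(3, 4), Fraction(1, 5), Fraction(3, 10))
print(posterior)  # 1/2
```

Using `Fraction` keeps the arithmetic exact, which makes it easier to compare the result with a hand calculation.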

### Q3. How is Bayes' theorem used in practice?

Imagine you have a fair six-sided die, and you want to find the probability that a roll came up 6, given that you know the outcome is an even number.

1. **Prior Probability (Initial Belief)**:
   - The probability of rolling a 6 on a fair six-sided die is P(6) = 1/6.
   - The probability of rolling an even number is P(even) = 3/6, because three of the six possible outcomes (2, 4, and 6) are even.

2. **New Evidence**:
   - We learn that the outcome is an even number. Because 6 is itself even, the likelihood of seeing this evidence when a 6 was rolled is P(even|6) = 1.

We want to find the **conditional probability** of rolling a 6 given that the outcome is an even number.

Using Bayes' theorem:

P(6|even) = P(even|6) * P(6) / P(even)

Plug in the values:

P(6|even) = (1 * (1/6)) / (3/6)

Simplify:

  P(6|even) = 1/3

So, the probability of rolling a 6, given that the outcome is an even number, is P(6|even) = 1/3.
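
The result can be checked by enumerating the six faces directly. The sketch below is illustrative only; the helper names (`prob`, `is_six`, `is_even`) are assumptions made for this example.

```python
from fractions import Fraction

FACES = range(1, 7)  # faces of a fair six-sided die


def prob(event):
    """Probability that a uniformly random face satisfies the predicate."""
    favorable = sum(1 for face in FACES if event(face))
    return Fraction(favorable, len(FACES))


def is_six(face):
    return face == 6


def is_even(face):
    return face % 2 == 0


p_six = prob(is_six)    # prior P(6) = 1/6
p_even = prob(is_even)  # evidence P(even) = 3/6
# Likelihood P(even|6): share of "6" outcomes that are also even
p_even_given_six = prob(lambda f: is_six(f) and is_even(f)) / p_six

p_six_given_even = p_even_given_six * p_six / p_even
print(p_six_given_even)  # 1/3
```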

### Q4. What is the relationship between Bayes' theorem and conditional probability?

Bayes' theorem and conditional probability are closely related concepts, and Bayes' theorem can be thought of as a way to compute conditional probabilities when you have additional information or evidence.

**Conditional Probability:** Conditional probability is the probability of an event (A) occurring given that another event (B) has already occurred. It's denoted as P(A|B), and it measures the likelihood of event A happening within the context of event B.

**Bayes' Theorem:** Bayes' theorem provides a way to compute conditional probabilities when you know the reverse conditional probability and the prior probabilities of the individual events involved. The theorem allows you to update your beliefs about the likelihood of an event, given new evidence (B), by considering the initial beliefs (prior probabilities) and the likelihood of observing the new evidence if the event of interest (A) were true.
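
The connection can be made explicit with a short derivation from the definition of conditional probability, assuming P(A) and P(B) are both non-zero:

\[ P(A|B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B|A) = \frac{P(A \cap B)}{P(A)} \]

Solving the second equation for \( P(A \cap B) \) and substituting into the first gives Bayes' theorem:

\[ P(A \cap B) = P(B|A) \, P(A) \quad \Longrightarrow \quad P(A|B) = \frac{P(B|A) \, P(A)}{P(B)} \]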

### Q5. How do you choose which type of Naive Bayes classifier to use for any given problem?


The choice depends mainly on the type of the features:

1. **Gaussian Naive Bayes**:
   - Assumption: Each feature is approximately normally distributed within each class.
   - Useful for continuous, real-valued features such as measurements. It can also be applied to discrete numeric features when their distribution is roughly bell-shaped.

2. **Multinomial Naive Bayes**:
   - Assumption: Features are non-negative counts or frequencies.
   - Useful for text classification with word counts, and more broadly for any data that can be represented as counts or frequencies, such as histogram-style features in image classification.

3. **Bernoulli Naive Bayes**:
   - Assumption: Features are binary (0/1), where each feature represents the presence (1) or absence (0) of a particular attribute.
   - Useful for binary data, such as text classification where you only care whether a specific word occurs in a document (1 if it does, 0 if it doesn't). Also applicable to other types of binary data, like user preferences.
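
The guidelines above can be condensed into a rough rule of thumb. The function below is a hypothetical helper written for this note, not a library API, and it only looks at the values of a single feature column; real datasets usually call for inspecting the distributions and comparing variants empirically.

```python
def suggest_naive_bayes_variant(values):
    """Suggest a Naive Bayes variant from one feature column's values.

    Hypothetical heuristic: binary -> Bernoulli, non-negative
    integer counts -> Multinomial, anything else -> Gaussian.
    """
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if all(v in (0, 1) for v in values):
        return "Bernoulli"
    if all(v >= 0 and float(v).is_integer() for v in values):
        return "Multinomial"
    return "Gaussian"


print(suggest_naive_bayes_variant([0, 1, 1, 0]))      # Bernoulli
print(suggest_naive_bayes_variant([3, 0, 7, 2]))      # Multinomial
print(suggest_naive_bayes_variant([1.7, 2.4, -0.3]))  # Gaussian
```

If you use scikit-learn, the three variants correspond to the `GaussianNB`, `MultinomialNB`, and `BernoulliNB` classes in `sklearn.naive_bayes`.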


### Q6. Assignment:

You have a dataset with two features, X1 and X2, and two possible classes, A and B. You want to use Naive
Bayes to classify a new instance with features X1 = 3 and X2 = 4. The following table shows the frequency of
each feature value for each class:

| Class | X1=1 | X1=2 | X1=3 | X2=1 | X2=2 | X2=3 | X2=4 |
|-------|------|------|------|------|------|------|------|
| A     | 3    | 3    | 4    | 4    | 3    | 3    | 3    |
| B     | 2    | 2    | 1    | 2    | 2    | 2    | 3    |



Assuming equal prior probabilities for each class, which class would Naive Bayes predict the new instance
to belong to?


The Naive Bayes classifier will predict the class of a new instance by calculating the posterior probability of each class, given the features of the new instance. The class with the highest posterior probability will be the class that the Naive Bayes classifier predicts.

In this case, the prior probabilities of the two classes are equal. So, we need to calculate the conditional probability of the new instance belonging to class A, given the features X1 = 3 and X2 = 4, and the conditional probability of the new instance belonging to class B, given the features X1 = 3 and X2 = 4.

Naive Bayes assumes the features are conditionally independent given the class, so the joint likelihood factors into a product of per-feature likelihoods. For class A:

```
P(A|X1=3, X2=4) = P(X1=3|A) * P(X2=4|A) * P(A) / P(X1=3, X2=4)
```

Each likelihood is estimated from the table by dividing a count by the total for that feature within the class (the X1 and X2 totals differ in this table, so each feature is normalized by its own total). For class A, the X1 counts sum to 3 + 3 + 4 = 10 and the X2 counts sum to 4 + 3 + 3 + 3 = 13, so P(X1=3|A) = 4/10 and P(X2=4|A) = 3/13. The prior probability of class A is 1/2. The numerator for class A is therefore:

```
P(X1=3|A) * P(X2=4|A) * P(A) = (4/10) * (3/13) * (1/2) = 3/65 ≈ 0.0462
```

For class B, the X1 counts sum to 2 + 2 + 1 = 5 and the X2 counts sum to 2 + 2 + 2 + 3 = 9, so P(X1=3|B) = 1/5 and P(X2=4|B) = 3/9. The prior probability of class B is 1/2. The numerator for class B is:

```
P(X1=3|B) * P(X2=4|B) * P(B) = (1/5) * (3/9) * (1/2) = 1/30 ≈ 0.0333
```

The denominator P(X1=3, X2=4) is the same for both classes and equals the sum of the two numerators, 3/65 + 1/30 = 31/390. Dividing by it gives the posterior probabilities:

```
P(A|X1=3, X2=4) = (3/65) / (31/390) = 18/31 ≈ 0.58
P(B|X1=3, X2=4) = (1/30) / (31/390) = 13/31 ≈ 0.42
```

Since the posterior probability of class A is greater than the posterior probability of class B, the Naive Bayes classifier will predict that the new instance belongs to class A.
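
The prediction for this table can be checked with a short script. This is a sketch under stated assumptions: each likelihood is estimated as a count divided by that feature's total count within the class, no smoothing is applied, and all variable and function names are illustrative.

```python
from fractions import Fraction

# Frequency table from the assignment: counts[class][feature][value]
counts = {
    "A": {"X1": {1: 3, 2: 3, 3: 4}, "X2": {1: 4, 2: 3, 3: 3, 4: 3}},
    "B": {"X1": {1: 2, 2: 2, 3: 1}, "X2": {1: 2, 2: 2, 3: 2, 4: 3}},
}
priors = {"A": Fraction(1, 2), "B": Fraction(1, 2)}  # equal priors
instance = {"X1": 3, "X2": 4}


def likelihood(cls, feature, value):
    """Estimate P(feature=value | cls) as count / feature total in that class."""
    table = counts[cls][feature]
    return Fraction(table[value], sum(table.values()))


# Numerator per class: prior times the product of per-feature likelihoods
scores = {}
for cls, prior in priors.items():
    score = prior
    for feature, value in instance.items():
        score *= likelihood(cls, feature, value)
    scores[cls] = score

# Normalize by the evidence so the posteriors sum to 1
evidence = sum(scores.values())
posteriors = {cls: score / evidence for cls, score in scores.items()}
prediction = max(posteriors, key=posteriors.get)

print(posteriors)
print(prediction)  # A
```

Because the evidence term is identical for every class, comparing the unnormalized `scores` alone would give the same prediction.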
