### Q1: What is Bayes' Theorem?

A1. Bayes' Theorem is a result in probability theory that describes how to update the probability of a hypothesis when new evidence is observed. It is used throughout statistics, machine learning, and decision-making.

### Q2: What is the formula for Bayes' Theorem?

A2. The formula for Bayes' Theorem is:

$$
P(H|E) = \frac{P(E|H) \cdot P(H)}{P(E)}
$$

Where:
- P(H|E) is the posterior probability: the probability of hypothesis H given the evidence E.
- P(E|H) is the likelihood: the probability of the evidence E given that the hypothesis H is true.
- P(H) is the prior probability: the probability of the hypothesis H before seeing the evidence E.
- P(E) is the marginal likelihood (also called the evidence): the overall probability of observing E, whether or not H is true. It must be greater than zero for the formula to be defined.
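As a minimal sketch of the formula, the snippet below computes the posterior for a binary hypothesis, expanding P(E) with the law of total probability. The function name `posterior`, its parameter names, and the numbers in the example are assumptions chosen only for illustration; they do not come from any dataset.

```python
def posterior(prior, likelihood, likelihood_if_false):
    """Return P(H|E) for a binary hypothesis H.

    prior               -- P(H)
    likelihood          -- P(E|H)
    likelihood_if_false -- P(E|not H)
    """
    # P(E) via the law of total probability over H and not-H.
    evidence = likelihood * prior + likelihood_if_false * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("P(E) is zero; the posterior is undefined")
    return likelihood * prior / evidence


# Hypothetical inputs: P(H) = 0.01, P(E|H) = 0.95, P(E|not H) = 0.10
p = posterior(prior=0.01, likelihood=0.95, likelihood_if_false=0.10)
print(f"P(H|E) = {p:.4f}")  # prints "P(H|E) = 0.0876"
```

Even with a likelihood of 0.95, the posterior stays below 0.09 here because the prior is small, which is the kind of effect the formula makes explicit.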

### Q3: How is Bayes' Theorem used in practice?

A3. Bayes' Theorem is used in various practical applications, including:
- **Spam Filtering:** To calculate the probability that an email is spam based on the presence of certain words.
- **Medical Diagnosis:** To update the probability of a disease given the presence of certain symptoms.
- **Machine Learning:** In classification algorithms like Naive Bayes to predict the class of new instances based on prior observations.
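To make the spam-filtering case concrete, here is a rough sketch that combines evidence from several words under the naive independence assumption. The word table `WORD_PROBS`, its probabilities, and the function name `spam_probability` are all hypothetical; a real filter would estimate these values from labeled mail.

```python
import math

# Hypothetical estimates: word -> (P(word | spam), P(word | ham)).
WORD_PROBS = {
    "free":    (0.30, 0.02),
    "meeting": (0.01, 0.10),
    "winner":  (0.20, 0.01),
}


def spam_probability(words, prior_spam=0.5):
    """P(spam | words), treating words as conditionally independent given the class."""
    # Work in log-space so products of many small probabilities do not underflow.
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for word in words:
        if word in WORD_PROBS:  # words without an estimate are ignored
            p_spam, p_ham = WORD_PROBS[word]
            log_spam += math.log(p_spam)
            log_ham += math.log(p_ham)
    # Normalize the two class scores so they sum to 1.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


print(round(spam_probability(["free", "winner"]), 3))
print(round(spam_probability(["meeting"]), 3))
```

With these made-up numbers, a message containing "free" and "winner" scores close to 1, while one containing only "meeting" scores well below 0.5.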

### Q4: What is the relationship between Bayes' Theorem and Conditional Probability?

A4. Bayes' Theorem follows directly from the definition of conditional probability. It lets us reverse a conditional probability: given P(A|B) together with the marginal probabilities P(A) and P(B), we can compute P(B|A). The marginal probability of the event being inferred plays the role of the prior, which is how prior knowledge enters the calculation.
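
The reversal can be derived in two lines from the definition of conditional probability, assuming P(A) > 0 and P(B) > 0. Both conditionals are defined through the same joint probability:

$$
P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A)
$$

Dividing both sides of the second equality by P(A) gives Bayes' Theorem:

$$
P(B|A) = \frac{P(A|B) \cdot P(B)}{P(A)}
$$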

### Q5: How do you choose which type of Naive Bayes classifier to use for any given problem?

A5. The choice of Naive Bayes classifier depends on the nature of the features in the dataset:
- **Gaussian Naive Bayes:** Use when the features are continuous and can reasonably be assumed to follow a Gaussian (normal) distribution within each class.
- **Multinomial Naive Bayes:** Use when the features represent counts or frequencies (e.g., word counts in text classification).
- **Bernoulli Naive Bayes:** Use when the features are binary (e.g., presence or absence of a feature).
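
The three variants differ only in how they model the likelihood of a feature given the class. The sketch below shows each likelihood in plain Python. The function names and parameters are hypothetical, and the parameters (means, variances, per-feature probabilities) are assumed to have been estimated from training data already.

```python
import math


def gaussian_likelihood(x, mean, var):
    """Density of a continuous feature value x under a normal N(mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def multinomial_log_likelihood(counts, probs):
    """Log-likelihood of count features, omitting the multinomial
    coefficient (it is the same for every class, so it does not affect
    which class scores highest)."""
    return sum(c * math.log(p) for c, p in zip(counts, probs))


def bernoulli_likelihood(bits, probs):
    """Likelihood of binary features; an absent feature contributes (1 - p)."""
    result = 1.0
    for b, p in zip(bits, probs):
        result *= p if b else (1.0 - p)
    return result


print(gaussian_likelihood(0.0, mean=0.0, var=1.0))
print(multinomial_log_likelihood([2, 1], [0.5, 0.5]))
print(bernoulli_likelihood([1, 0], [0.8, 0.3]))
```

Note that the Bernoulli model explicitly penalizes absent features, whereas the multinomial model simply gives zero-count features no weight. That difference is one practical reason to pick one over the other for text data.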

### Q6: You have a dataset with two features, X1 and X2, and two possible classes, A and B. You want to use Naive Bayes to classify a new instance with features X1 = 3 and X2 = 4. The following table shows the frequency of each feature value for each class:

| Class | X1=1 | X1=2 | X1=3 | X2=1 | X2=2 | X2=3 | X2=4 |
|-------|------|------|------|------|------|------|------|
| A     | 3    | 3    | 4    | 4    | 3    | 3    | 3    |
| B     | 2    | 2    | 1    | 2    | 2    | 2    | 3    |

Assuming equal prior probabilities for each class, which class would Naive Bayes predict the new instance to belong to?

A6. Given the dataset with features X1 and X2 and classes A and B, we want to classify a new instance with X1 = 3 and X2 = 4 using Naive Bayes.

#### Frequencies:
- Class A:
  - X1=3 : 4 occurrences
  - X2=4 : 3 occurrences
- Class B:
  - X1=3 : 1 occurrence
  - X2=4 : 3 occurrences

#### Step-by-Step Calculation:
- **Prior Probabilities (assumed equal):**

  $$ P(A) = P(B) = 0.5 $$

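- **Likelihoods** (each count is divided by the class total used in this solution: 10 instances for class A and 7 for class B):
  - $$ P(X1=3 | A) = \frac{4}{10} = 0.4 $$
  - $$ P(X2=4 | A) = \frac{3}{10} = 0.3 $$
  - $$ P(X1=3 | B) = \frac{1}{7} \approx 0.143 $$
  - $$ P(X2=4 | B) = \frac{3}{7} \approx 0.429 $$

  These class totals are an assumption of the solution rather than something the table fixes: the X1 counts sum to 10 for class A and 5 for class B, while the X2 counts sum to 13 and 9. Normalizing each feature by its own column sum changes the numbers but not the predicted class.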

- **Posterior Probabilities (unnormalized):**
  - For Class A:
    $$
    P(A | X1=3, X2=4) \propto P(X1=3 | A) \times P(X2=4 | A) \times P(A)
    $$
    $$
    P(A | X1=3, X2=4) \propto 0.4 \times 0.3 \times 0.5 = 0.06
    $$
  - For Class B:
    $$
    P(B | X1=3, X2=4) \propto P(X1=3 | B) \times P(X2=4 | B) \times P(B)
    $$
    $$
    P(B | X1=3, X2=4) \propto \frac{1}{7} \times \frac{3}{7} \times 0.5 \approx 0.0306
    $$

#### Conclusion:
The unnormalized score for class A (0.06) is greater than the score for class B (about 0.0306), so Naive Bayes predicts that the new instance belongs to **Class A**. These scores are proportional to the posteriors rather than equal to them. Dividing each by their sum gives normalized posterior probabilities of about 0.66 for class A and 0.34 for class B.
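
The calculation can be checked with a short script. This is a sketch under the same assumptions as the worked solution: equal priors, and class totals of 10 for A and 7 for B as denominators. The dictionary layout and the helper name `unnormalized_posterior` are illustrative choices, and exact fractions are used to avoid rounding drift.

```python
from fractions import Fraction

# Counts for the observed feature values, plus the class totals that the
# worked solution uses as denominators (10 for A, 7 for B).
counts = {
    "A": {"X1=3": 4, "X2=4": 3, "total": 10},
    "B": {"X1=3": 1, "X2=4": 3, "total": 7},
}
priors = {"A": Fraction(1, 2), "B": Fraction(1, 2)}  # equal priors


def unnormalized_posterior(cls):
    """Prior times the product of per-feature likelihoods for one class."""
    c = counts[cls]
    return (priors[cls]
            * Fraction(c["X1=3"], c["total"])
            * Fraction(c["X2=4"], c["total"]))


scores = {cls: unnormalized_posterior(cls) for cls in counts}
total = sum(scores.values())
posteriors = {cls: score / total for cls, score in scores.items()}
prediction = max(scores, key=scores.get)

for cls in scores:
    print(cls, round(float(scores[cls]), 4), round(float(posteriors[cls]), 3))
print("Predicted class:", prediction)  # prints "Predicted class: A"
```

Because both classes are divided by the same total, normalization never changes the predicted class. It only turns the scores into probabilities that sum to 1.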