### Q1. Probability that an employee is a smoker given they use the health insurance plan

We are given:
- 70% of employees use the health insurance plan, \( P(Plan) = 0.7 \).
- 40% of the employees who use the plan are smokers, \( P(Smoker|Plan) = 0.4 \).

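The question asks for \( P(Smoker|Plan) \), which is given directly as **0.4** (40%). The plan-usage rate \( P(Plan) = 0.7 \) is not needed for this answer; it would only matter for a joint or reverse probability such as \( P(Smoker \cap Plan) \) or \( P(Plan|Smoker) \).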

This probability means that given an employee uses the company's health insurance plan, there is a 40% chance that the employee is a smoker.
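As a quick check of the arithmetic, the sketch below computes the joint probability from the two given values and then recovers the conditional from the definition \( P(Smoker|Plan) = P(Smoker \cap Plan) / P(Plan) \). The variable names are illustrative only.

```python
# Given values from the problem statement
p_plan = 0.7               # P(Plan)
p_smoker_given_plan = 0.4  # P(Smoker | Plan)

# Joint probability: P(Smoker and Plan) = P(Smoker | Plan) * P(Plan)
p_smoker_and_plan = p_smoker_given_plan * p_plan

# Recover the conditional from the joint using the definition
p_recovered = p_smoker_and_plan / p_plan

print(f"P(Smoker and Plan) = {p_smoker_and_plan:.2f}")
print(f"P(Smoker | Plan)   = {p_recovered:.2f}")
```

So 28% of all employees both use the plan and smoke, and dividing by the 70% who use the plan returns the stated 40%.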

### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

**Bernoulli Naive Bayes** and **Multinomial Naive Bayes** differ primarily in how they handle the feature space:

- **Bernoulli Naive Bayes**:
  - This classifier is used when the features are binary (0 or 1), so each feature records only the presence or absence of something.
  - It suits text classification tasks where the model checks whether a particular word appears in a document (for example, an email), regardless of how many times it appears.

- **Multinomial Naive Bayes**:
  - This classifier is used for discrete, count-based features. It is typically applied in text classification tasks where each feature is the number of times a word occurs in a document.
  - It is appropriate when the features are not binary and the number of occurrences carries information (e.g., a word appearing five times counts for more than a word appearing once).

Thus, **Bernoulli Naive Bayes** is suited for binary feature vectors, while **Multinomial Naive Bayes** handles frequency-based feature vectors.
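The contrast can be sketched on a tiny, made-up word-count matrix: the same data is fed to `MultinomialNB` as counts and to `BernoulliNB` as presence/absence indicators. The data and labels below are hypothetical and serve only to show the two input representations.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy data: rows are documents, columns are counts of three words
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 2, 0],
])
y = np.array([1, 1, 0, 0])  # made-up class labels

# Bernoulli view: keep only presence (1) or absence (0) of each word
X_binary = (X_counts > 0).astype(int)

bernoulli = BernoulliNB().fit(X_binary, y)
multinomial = MultinomialNB().fit(X_counts, y)

# A new document in which word 0 appears five times
new_doc = np.array([[5, 0, 0]])
pred_bernoulli = bernoulli.predict((new_doc > 0).astype(int))
pred_multinomial = multinomial.predict(new_doc)

print("Bernoulli prediction:  ", pred_bernoulli)
print("Multinomial prediction:", pred_multinomial)
```

The Bernoulli model sees the new document as `[1, 0, 0]`, so the five occurrences count no more than one would; the multinomial model uses the count itself.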

### Q3. How does Bernoulli Naive Bayes handle missing values?

Bernoulli Naive Bayes assumes binary features and does not inherently handle missing values. scikit-learn's `BernoulliNB`, for example, raises an error when the input contains NaN. When faced with missing data in real-world scenarios:
- One option is to fill the missing values with a constant 0 or 1, chosen by what missingness most plausibly means (for instance, treating a missing word indicator as "absent").
- Another option is mode imputation, which replaces each missing entry with the most frequent value of that feature. Mean imputation is a poor fit here, because the mean of a binary column is generally a fraction rather than 0 or 1. Either way, imputation assumes that missingness is unrelated to the class label.

Handling missing values in Bernoulli Naive Bayes is therefore a pre-processing step, since the model does not handle missing data natively.
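Both options above can be sketched with scikit-learn's `SimpleImputer` placed in front of `BernoulliNB` in a pipeline. The small matrix and its labels are hypothetical, and which strategy is appropriate depends on what missingness means in the real data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary feature matrix with missing entries marked as NaN
X = np.array([
    [1.0, 0.0, np.nan],
    [1.0, np.nan, 0.0],
    [0.0, 1.0, 1.0],
    [np.nan, 1.0, 1.0],
])
y = np.array([1, 1, 0, 0])  # made-up class labels

# Option 1: treat every missing entry as "absent" (0)
absent_model = make_pipeline(
    SimpleImputer(strategy="constant", fill_value=0),
    BernoulliNB(),
)

# Option 2: fill each column with its most frequent observed value (the mode)
mode_model = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    BernoulliNB(),
)

for name, model in [("constant 0", absent_model), ("mode", mode_model)]:
    model.fit(X, y)
    print(name, model.predict(X))
```

Keeping the imputer inside the pipeline means it is fit only on training data during cross-validation, which avoids leaking information from the held-out fold.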

### Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, **Gaussian Naive Bayes** can be used for **multi-class classification**. For each class, the model combines the class prior with the likelihood of the observed features and predicts the class with the highest posterior probability. Nothing in this procedure limits the number of classes to two.

Gaussian Naive Bayes assumes that each continuous feature is normally distributed within each class, so every class gets its own mean and variance for every feature. It is effective for multi-class problems where this per-class normality is a reasonable approximation.
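A minimal sketch of the multi-class case, using scikit-learn's bundled three-class Iris dataset as a stand-in (the dataset choice and split parameters are assumptions made for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris has three classes and four continuous features
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = GaussianNB().fit(X_train, y_train)

# One mean per (class, feature) pair: shape is (n_classes, n_features)
print("Classes:", model.classes_)
print("Per-class means shape:", model.theta_.shape)

# Posterior probabilities over all three classes for a few test rows
proba = model.predict_proba(X_test[:3])
pred = model.predict(X_test[:3])
print(np.round(proba, 3))
print("Predicted classes:", pred)
```

Each row of `predict_proba` holds one posterior per class, and `predict` returns the class with the largest one.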

### Q5. Assignment

#### Data Preparation:
Download the **Spambase Data Set** from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains features extracted from email messages to predict whether an email is spam or not.

#### Implementation:

You will implement three Naive Bayes classifiers:
- **Bernoulli Naive Bayes**
- **Multinomial Naive Bayes**
- **Gaussian Naive Bayes**

Here’s how to implement these classifiers using **scikit-learn** in Python:

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Load dataset: spambase.data has no header row; the last column is the label
data = pd.read_csv('spambase.data', header=None)
X = data.iloc[:, :-1]  # Features (all non-negative, as MultinomialNB requires)
y = data.iloc[:, -1]   # Target (1 = spam, 0 = not spam)

# Hold out 20% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize classifiers
# Note: BernoulliNB binarizes its input at 0.0 by default (any value > 0 becomes 1)
models = {
    "Bernoulli": BernoulliNB(),
    "Multinomial": MultinomialNB(),
    "Gaussian": GaussianNB(),
}

# 10-fold cross-validation on the training set
for name, model in models.items():
    accuracy = cross_val_score(model, X_train, y_train, cv=10, scoring='accuracy').mean()
    print(f"\n{name} Naive Bayes:")
    print(f"Mean CV accuracy: {accuracy:.4f}")

def print_metrics(y_true, y_pred, model_name):
    """Print test-set metrics, treating spam (label 1) as the positive class."""
    print(f"\n{model_name} Naive Bayes:")
    print(f"Accuracy: {accuracy_score(y_true, y_pred):.4f}")
    print(f"Precision: {precision_score(y_true, y_pred):.4f}")
    print(f"Recall: {recall_score(y_true, y_pred):.4f}")
    print(f"F1 Score: {f1_score(y_true, y_pred):.4f}")

# Fit each model on the training set and evaluate it on the held-out test set
for name, model in models.items():
    model.fit(X_train, y_train)
    print_metrics(y_test, model.predict(X_test), name)
```
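The cross-validation loop above reports only accuracy. If all four metrics are wanted per fold, scikit-learn's `cross_validate` accepts a list of scorers. The sketch below uses synthetic Poisson counts as a stand-in for Spambase so that it runs without the data file; the synthetic data and the `summary` dictionary are assumptions made for illustration, not part of the assignment.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Synthetic stand-in for Spambase: non-negative counts whose rates depend on the class
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)
rates = np.where(y[:, None] == 1, [4, 4, 4, 1, 1, 1], [1, 1, 1, 4, 4, 4])
X = rng.poisson(lam=rates)

scoring = ["accuracy", "precision", "recall", "f1"]
models = [
    ("Bernoulli", BernoulliNB()),
    ("Multinomial", MultinomialNB()),
    ("Gaussian", GaussianNB()),
]

summary = {}
for name, model in models:
    # An integer cv with a classifier uses stratified folds
    scores = cross_validate(model, X, y, cv=10, scoring=scoring)
    summary[name] = {m: scores[f"test_{m}"].mean() for m in scoring}
    print(name, {m: round(v, 3) for m, v in summary[name].items()})
```

To apply this to the assignment, replace the synthetic `X` and `y` with the Spambase training split.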

#### Results:
For each classifier, you will report:
- **Accuracy**: Measures how often the classifier is correct.
- **Precision**: The ratio of true positives to the sum of true positives and false positives.
- **Recall**: The ratio of true positives to the sum of true positives and false negatives.
- **F1 Score**: The harmonic mean of precision and recall, balancing the two.
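The definitions above can be checked by hand against scikit-learn's metric functions. The labels below are a made-up example with 3 true positives, 1 false positive, 1 false negative, and 5 true negatives.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels: 1 = spam (positive class), 0 = not spam
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
print(f"F1 Score:  {f1:.2f}")
```

The hand-computed values agree with `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` on the same labels.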

#### Discussion:
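Run the code above and compare the measured metrics before drawing conclusions. Some expectations to check against the results:
- **Multinomial Naive Bayes** often performs well on text classification with count-based features. Note that Spambase stores word and character frequencies as percentages rather than raw counts, plus capital-letter run-length statistics, so the multinomial assumption holds only approximately here.
- **Bernoulli Naive Bayes** models only the presence or absence of each feature. scikit-learn's `BernoulliNB` binarizes its input at a threshold of 0 by default (`binarize=0.0`), so the code above already feeds it presence/absence features.
- **Gaussian Naive Bayes** might underperform because it assumes each feature is normally distributed within each class, which rarely holds for sparse, skewed frequency features like these.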

One limitation of Naive Bayes you might observe is its assumption of feature independence, which may not hold in real-world datasets, leading to suboptimal performance in some cases.

#### Conclusion:
The **Multinomial Naive Bayes** classifier is expected to perform best, given its suitability for frequency-style features such as word frequencies in emails, but the measured results should confirm this. For future work, you could explore models that do not assume feature independence (such as logistic regression or support vector machines) and apply feature engineering techniques to improve performance further.