## Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

Let's use conditional probability to find the probability that an employee is a smoker given that he/she uses the health insurance plan.

Let:
- \( A \) be the event that an employee uses the health insurance plan.
- \( B \) be the event that an employee is a smoker.

The probability that an employee uses the health insurance plan is \( P(A) = 0.70 \) (given as 70%).

The probability that an employee who uses the health insurance plan is a smoker is \( P(B|A) = 0.40 \) (given as 40%).

The question asks for exactly this conditional probability, \( P(B|A) \), so the answer is already given in the problem statement. The definition of conditional probability confirms it:

\[ P(B|A) = \frac{P(A \cap B)}{P(A)} \]

Here \( P(A \cap B) \) is the probability that an employee both uses the health insurance plan and is a smoker. By the multiplication rule:

\[ P(A \cap B) = P(B|A) \, P(A) = 0.40 \times 0.70 = 0.28 \]

Substituting into the definition:

\[ P(B|A) = \frac{0.28}{0.70} = 0.40 \]

So, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.40 (40%). The 0.40 must not be divided by \( P(A) = 0.70 \) again, because it is already a conditional probability, not the joint probability \( P(A \cap B) \).
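
As a quick numeric check, the definition of conditional probability can be applied to a hypothetical workforce of 1,000 employees. The headcount is an assumption made only for illustration; the 70% and 40% figures come from the question. A minimal sketch:

```python
from fractions import Fraction

# Hypothetical headcount, chosen only to make the arithmetic concrete.
total_employees = 1000

# Figures taken from the question.
p_plan = Fraction(70, 100)               # P(A): employee uses the health plan
p_smoker_given_plan = Fraction(40, 100)  # P(B|A): smoker among plan users

plan_users = total_employees * p_plan                  # 700 employees
smoking_plan_users = plan_users * p_smoker_given_plan  # 280 employees

p_joint = smoking_plan_users / total_employees  # P(A and B)
p_conditional = p_joint / p_plan                # P(B|A) = P(A and B) / P(A)

print(float(p_joint))        # 0.28
print(float(p_conditional))  # 0.4
```

Counting smokers among plan users and dividing by the number of plan users returns the 40% stated in the question.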

## Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

The key difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in the nature of the features and the underlying assumptions about the data.

1. **Bernoulli Naive Bayes:**
   - **Feature Type:** It is suitable for binary feature data, where features can take on only two values (typically 0 and 1).
   - **Example Use Cases:** Text classification where features represent the presence (1) or absence (0) of certain words in a document. It's often used in document classification tasks.

2. **Multinomial Naive Bayes:**
   - **Feature Type:** It is designed for discrete data, specifically for features that represent counts or frequencies (e.g., word counts in a document).
   - **Example Use Cases:** Text classification where features are word frequencies in a document. It is commonly used in natural language processing tasks such as spam filtering, sentiment analysis, and topic classification.

In summary, Bernoulli Naive Bayes models whether each feature is present, and it explicitly penalizes the absence of a feature. Multinomial Naive Bayes models how often each feature occurs and is unaffected by features that do not appear. Choose Bernoulli when presence or absence carries the signal (for example, short documents), and Multinomial when counts or frequencies matter.
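
The difference between the two likelihood models can be sketched on a single document. The vocabulary, counts, and per-class parameters below are hypothetical values picked for illustration, not estimates from any dataset:

```python
import math

# Hypothetical three-word vocabulary and one document's word counts.
vocabulary = ["free", "meeting", "report"]
counts = [3, 0, 1]                              # multinomial view: how often
presence = [1 if c > 0 else 0 for c in counts]  # Bernoulli view: whether at all

# Hypothetical parameters for one class (assumed, not learned).
p_word_present = [0.6, 0.1, 0.3]  # Bernoulli: P(word appears in document | class)
p_word_draw = [0.7, 0.1, 0.2]     # multinomial: P(a token is this word | class)


def bernoulli_log_likelihood(x, p):
    """Sum over ALL words; an absent word contributes log(1 - p)."""
    return sum(math.log(pi) if xi else math.log(1.0 - pi) for xi, pi in zip(x, p))


def multinomial_log_likelihood(n, theta):
    """Sum over observed tokens only.

    The multinomial coefficient is omitted because it depends only on the
    counts and is therefore identical for every class.
    """
    return sum(ni * math.log(ti) for ni, ti in zip(n, theta) if ni > 0)


bern_ll = bernoulli_log_likelihood(presence, p_word_present)
multi_ll = multinomial_log_likelihood(counts, p_word_draw)

print(presence)
print(bern_ll)
print(multi_ll)
```

The absent word "meeting" changes the Bernoulli score through `log(1 - 0.1)` but has no effect on the multinomial score, while the repeated word "free" counts three times in the multinomial score and only once in the Bernoulli score.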

## Q3. How does Bernoulli Naive Bayes handle missing values?

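Bernoulli Naive Bayes assumes binary feature data, where each feature is 0 or 1, and the model itself has no built-in mechanism for missing values. In practice (scikit-learn's `BernoulliNB`, for example, does not accept NaN input), missing values are handled before or around the model using one of a few common approaches: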

1. **Imputation with Zero or One:**
   - If a missing value is encountered, it can be imputed (filled in) with either 0 or 1, depending on the context and the nature of the data. This assumes that the missing value can be reasonably considered as the absence (0) or presence (1) of a certain feature.

2. **Ignoring Missing Values:**
   - Another approach is to drop instances that contain missing values before training. Alternatively, because Naive Bayes multiplies one likelihood term per feature, the term for a missing feature can simply be skipped for that instance, so the missing value contributes nothing to that feature's probability estimate or to the prediction.

3. **Imputation with Feature Statistics:**
   - Missing values can be imputed based on the statistics of the observed values for the specific feature. For binary features the natural choice is the mode (the most frequent of 0 and 1); the mean is generally not 0 or 1, so it would have to be thresholded before use.

It's important to note that the choice of how to handle missing values depends on the nature of the data and the specific characteristics of the problem at hand. The chosen approach should be reasonable and aligned with the assumptions of the Bernoulli Naive Bayes model for binary feature data.
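
The imputation and row-dropping strategies above can be sketched in plain Python. The small matrix and the helper names (`column_modes`, `impute_with_mode`, `drop_incomplete`) are hypothetical and written only for illustration, with `None` standing in for a missing value:

```python
from collections import Counter

# Hypothetical binary feature matrix; None marks a missing value.
rows = [
    [1, 0, None],
    [1, None, 0],
    [0, 0, 1],
    [1, 0, 1],
]


def column_modes(data):
    """Most frequent observed value (0 or 1) in each column, ignoring None."""
    modes = []
    for column in zip(*data):
        observed = [v for v in column if v is not None]
        modes.append(Counter(observed).most_common(1)[0][0])
    return modes


def impute_with_mode(data):
    """Replace each None with its column's mode so every entry stays binary."""
    modes = column_modes(data)
    return [[modes[j] if v is None else v for j, v in enumerate(row)] for row in data]


def drop_incomplete(data):
    """Alternative strategy: discard rows that contain any missing value."""
    return [row for row in data if None not in row]


print(column_modes(rows))      # [1, 0, 1]
print(impute_with_mode(rows))  # [[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 0, 1]]
print(drop_incomplete(rows))   # [[0, 0, 1], [1, 0, 1]]
```

Mode imputation keeps all four rows at the cost of inventing values, while dropping incomplete rows keeps only observed data but halves this small sample.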

## Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification. It is the Naive Bayes variant for continuous features, each assumed to follow a Gaussian (normal) distribution within a class. Nothing in the model is specific to two classes: Bayes' rule is evaluated once per class, so the algorithm handles any number of classes without modification.

In the context of multi-class classification, the Gaussian Naive Bayes algorithm operates by estimating the parameters of the Gaussian distribution (mean and variance) for each class and each feature. When making predictions for a new instance, it calculates the probability of the instance belonging to each class based on the Gaussian distribution parameters and then assigns the class with the highest probability.

In summary, Gaussian Naive Bayes can be applied to handle multi-class classification problems, making it a versatile algorithm for scenarios where features are continuous and assumed to be normally distributed.
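
The procedure described above can be sketched with a single feature and three classes. The data points and class names are hypothetical, and this is a minimal illustration rather than scikit-learn's implementation; with several features, the per-feature log densities would be summed:

```python
import math
from collections import defaultdict

# Hypothetical 1-D training data with three classes.
samples = [(1.0, "low"), (1.2, "low"), (0.8, "low"),
           (5.0, "mid"), (5.3, "mid"), (4.7, "mid"),
           (9.0, "high"), (9.4, "high"), (8.6, "high")]


def fit(data):
    """Estimate the prior, mean, and variance of the feature for every class."""
    by_class = defaultdict(list)
    for x, label in data:
        by_class[label].append(x)
    params = {}
    for label, xs in by_class.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        params[label] = (len(xs) / len(data), mean, var)
    return params


def log_score(x, prior, mean, var):
    """Unnormalised log posterior: log P(class) + log N(x | mean, var)."""
    return math.log(prior) - 0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(params, x):
    """Return the class with the largest unnormalised log posterior."""
    return max(params, key=lambda label: log_score(x, *params[label]))


params = fit(samples)
print(predict(params, 1.1), predict(params, 5.5), predict(params, 8.0))
# low mid high
```

Adding a fourth class would only add one more entry to `params`; the prediction step is unchanged.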

## Q5. Assignment:

**Data preparation:**
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message is spam or not based on several input features.

**Implementation:**
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.

**Results:**
Report the following performance metrics for each classifier:
- Accuracy
- Precision
- Recall
- F1 score

**Discussion:**
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?

**Conclusion:**
Summarise your findings and provide some suggestions for future work.

```python
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import StratifiedKFold

# Load the Spambase dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
names = ["word_freq_make", "word_freq_address", "word_freq_all", "word_freq_3d", "word_freq_our", "word_freq_over",
         "word_freq_remove", "word_freq_internet", "word_freq_order", "word_freq_mail", "word_freq_receive",
         "word_freq_will", "word_freq_people", "word_freq_report", "word_freq_addresses", "word_freq_free",
         "word_freq_business", "word_freq_email", "word_freq_you", "word_freq_credit", "word_freq_your",
         "word_freq_font", "word_freq_000", "word_freq_money", "word_freq_hp", "word_freq_hpl", "word_freq_george",
         "word_freq_650", "word_freq_lab", "word_freq_labs", "word_freq_telnet", "word_freq_857", "word_freq_data",
         "word_freq_415", "word_freq_85", "word_freq_technology", "word_freq_1999", "word_freq_parts",
         "word_freq_pm", "word_freq_direct", "word_freq_cs", "word_freq_meeting", "word_freq_original",
         "word_freq_project", "word_freq_re", "word_freq_edu", "word_freq_table", "word_freq_conference",
         "char_freq_;", "char_freq_(", "char_freq_[", "char_freq_!", "char_freq_$", "char_freq_#", "capital_run_length_average",
         "capital_run_length_longest", "capital_run_length_total", "spam_class"]
data = pd.read_csv(url, names=names, delimiter=",")
X = data.drop("spam_class", axis=1)
y = data["spam_class"]

# Define classifiers
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

# Define performance metrics
scoring_metrics = {
    'accuracy': make_scorer(accuracy_score),
    'precision': make_scorer(precision_score),
    'recall': make_scorer(recall_score),
    'f1': make_scorer(f1_score)
}

# Define cross-validation strategy
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# Evaluate classifiers using cross-validation
bernoulli_scores = cross_validate(bernoulli_nb, X, y, cv=cv, scoring=scoring_metrics)
multinomial_scores = cross_validate(multinomial_nb, X, y, cv=cv, scoring=scoring_metrics)
gaussian_scores = cross_validate(gaussian_nb, X, y, cv=cv, scoring=scoring_metrics)

# Display results
print("Bernoulli Naive Bayes:")
print("Accuracy:", bernoulli_scores['test_accuracy'].mean())
print("Precision:", bernoulli_scores['test_precision'].mean())
print("Recall:", bernoulli_scores['test_recall'].mean())
print("F1 Score:", bernoulli_scores['test_f1'].mean())
print("\n")

print("Multinomial Naive Bayes:")
print("Accuracy:", multinomial_scores['test_accuracy'].mean())
print("Precision:", multinomial_scores['test_precision'].mean())
print("Recall:", multinomial_scores['test_recall'].mean())
print("F1 Score:", multinomial_scores['test_f1'].mean())
print("\n")

print("Gaussian Naive Bayes:")
print("Accuracy:", gaussian_scores['test_accuracy'].mean())
print("Precision:", gaussian_scores['test_precision'].mean())
print("Recall:", gaussian_scores['test_recall'].mean())
print("F1 Score:", gaussian_scores['test_f1'].mean())
```


```
Bernoulli Naive Bayes:
Accuracy: 0.8856762237102707
Precision: 0.8855034508139028
Recall: 0.8157853196527229
F1 Score: 0.8490373364494272


Multinomial Naive Bayes:
Accuracy: 0.7902612468169388
Precision: 0.7406952001397835
Recall: 0.7214801772812822
F1 Score: 0.7305907760768595


Gaussian Naive Bayes:
Accuracy: 0.8202555880411204
Precision: 0.698905668336373
Recall: 0.9575344544957805
F1 Score: 0.8078451270743713
```


**Discussion:**
- **Best Performance:** Bernoulli Naive Bayes achieved the highest accuracy (0.886), precision (0.886), and F1 score (0.849) among the three variants. Its recall (0.816) was second to Gaussian Naive Bayes, so it offers the best balance between catching spam and avoiding false alarms.

- **Multinomial Naive Bayes:** It had the lowest accuracy (0.790) and F1 score (0.731) of the three. The multinomial model is designed for count data, whereas the Spambase features are percentages and capital-run-length statistics on very different scales, so the large-valued run-length features may dominate the likelihood.

- **Gaussian Naive Bayes:** It had the highest recall (0.958), meaning it identifies a high proportion of actual spam, but the lowest precision (0.699), meaning it classifies a substantial number of non-spam messages as spam. The Gaussian variant assumes a normal distribution for each feature within a class, which is a questionable fit for frequency features that are non-negative and often zero.
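
The trade-off between precision and recall discussed above can be made concrete with two sets of confusion-matrix counts. The counts are hypothetical and are not taken from the experiment; they only mimic an aggressive classifier (high recall, low precision) and a more cautious one:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)   # share of flagged messages that are spam
    recall = tp / (tp + fn)      # share of spam messages that are flagged
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts out of 100 spam messages in each case.
aggressive = precision_recall_f1(tp=95, fp=40, fn=5)   # flags almost everything
cautious = precision_recall_f1(tp=80, fp=10, fn=20)    # flags less, errs less

for name, (p, r, f) in [("aggressive", aggressive), ("cautious", cautious)]:
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Because F1 is a harmonic mean, the aggressive classifier's low precision pulls its F1 below that of the cautious classifier even though its recall is much higher.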

**Limitations of Naive Bayes:**
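- **Assumption of Independence:** Naive Bayes assumes that features are conditionally independent given the class, which rarely holds in real-world data such as email text, where words tend to co-occur.
  
- **Sensitive to Feature Correlation:** When features are highly correlated, the same evidence is effectively counted more than once, which makes the predicted probabilities overconfident and can degrade classification performance.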

- **Limited Expressiveness:** Naive Bayes may not capture complex relationships in the data, especially when interactions between features are essential for accurate predictions.
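
The effect of correlated features can be sketched by feeding Naive Bayes the same feature twice. The likelihood values and the helper `spam_posterior` are hypothetical and chosen only for illustration:

```python
def spam_posterior(likelihoods_spam, likelihoods_ham, prior_spam=0.5):
    """Naive Bayes posterior P(spam | features).

    Multiplies the per-feature likelihoods, which is only valid if the
    features are conditionally independent given the class.
    """
    score_spam = prior_spam
    score_ham = 1.0 - prior_spam
    for l_spam, l_ham in zip(likelihoods_spam, likelihoods_ham):
        score_spam *= l_spam
        score_ham *= l_ham
    return score_spam / (score_spam + score_ham)


# Hypothetical feature: P(present | spam) = 0.8, P(present | ham) = 0.2.
single = spam_posterior([0.8], [0.2])
# The same feature entered twice, i.e. two perfectly correlated features.
duplicated = spam_posterior([0.8, 0.8], [0.2, 0.2])

print(round(single, 3))      # 0.8
print(round(duplicated, 3))  # 0.941
```

The duplicated feature carries no new information, yet the posterior rises from 0.8 to about 0.94 because the model treats the copy as independent evidence.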

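In conclusion, the best variant depends on how well each model's assumptions match the features. In this case, Bernoulli Naive Bayes performed the best overall (accuracy 0.886, F1 score 0.849). The Spambase features are continuous frequencies and run lengths rather than binary values, so the likely explanation is that `BernoulliNB` binarizes its input by default (`binarize=0.0`), reducing each feature to presence or absence, which appears to be a more robust signal for spam than the raw magnitudes. Understanding the nature of the features and the underlying assumptions of each variant is crucial for selecting an appropriate Naive Bayes model.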
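
The thresholding that `BernoulliNB` applies through its `binarize` parameter (default `0.0`) can be sketched in plain Python. The `binarize` helper and the example row are hypothetical stand-ins written for illustration, not scikit-learn code or actual Spambase data:

```python
def binarize(row, threshold=0.0):
    """Map each value to 1 if it is strictly greater than the threshold, else 0."""
    return [1 if value > threshold else 0 for value in row]


# Hypothetical email: four frequency features (in percent) followed by a
# capital-run-length total, which lives on a much larger scale.
email = [0.0, 0.64, 0.0, 1.29, 278.0]

print(binarize(email))  # [0, 1, 0, 1, 1]
```

After thresholding, the large run-length value and the small word frequencies carry equal weight, since each feature only records whether it was non-zero.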