**Q1.** A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

**Answer:**

Given information:
- 70% of the employees use the company's health insurance plan (P(A) = 0.70).
- 40% of the employees who use the plan are smokers (P(B|A) = 0.40).

We are asked to find the probability that an employee is a smoker given that he/she uses the health insurance plan (P(B|A)).

This is exactly the conditional probability stated in the problem: "40% of the employees who use the plan are smokers" is the statement $P(B|A) = 0.40$, so no further calculation is needed.

Bayes' theorem is needed only when the conditioning has to be reversed, for example to find $P(A|B)$, the probability that a smoker uses the plan:

$ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} $

That would require the overall smoking rate $P(B)$, which the problem does not give.

The 70% figure is not needed for $P(B|A)$. It only enters the joint probability that an employee both uses the plan and smokes:

$ P(A \cap B) = P(B|A) \cdot P(A) = 0.40 \cdot 0.70 = 0.28 $

So the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.40, or 40%.
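
As a quick check of the arithmetic, the sketch below turns the percentages into head counts. The workforce size of 1,000 is a hypothetical number chosen only to make the figures concrete; it is not part of the problem.

```python
# Hypothetical head count, chosen only to make the percentages concrete.
total_employees = 1000

plan_users = round(0.70 * total_employees)    # P(A) = 0.70
smokers_on_plan = round(0.40 * plan_users)    # P(B|A) = 0.40

# Conditional probability: restrict attention to plan users only.
p_smoker_given_plan = smokers_on_plan / plan_users

# Joint probability: smokers on the plan as a share of all employees.
p_smoker_and_plan = smokers_on_plan / total_employees

print(p_smoker_given_plan)
print(p_smoker_and_plan)
```

The conditional probability divides by the number of plan users, while the joint probability divides by the whole workforce, which is why the 70% figure affects only the second value.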

**Q2.** What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

**Answer:**

The main difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in the type of data they are designed to handle and the assumptions they make about the distribution of features.

1. **Bernoulli Naive Bayes**:
   - Data Type: Bernoulli Naive Bayes is suitable for binary data, where features are represented as 0s and 1s (e.g., presence or absence of a particular attribute).
   - Assumption: It assumes that each feature is conditionally independent of the others given the class label and follows a Bernoulli distribution within each class. Because it models both presence and absence, a feature that is absent from an instance also counts as evidence.
   - Use Case: Bernoulli Naive Bayes is commonly used in text classification tasks, document classification, and problems where the input data is binary or can be transformed into binary features.

2. **Multinomial Naive Bayes**:
   - Data Type: Multinomial Naive Bayes is suitable for discrete data, where features represent counts or frequencies (e.g., word counts in a document).
   - Assumption: It assumes that, within each class, the vector of feature counts follows a multinomial distribution. It models how often each feature occurs in each class, so only features with non-zero counts contribute to the likelihood.
   - Use Case: Multinomial Naive Bayes is frequently used in text classification tasks, such as spam filtering, sentiment analysis, and document categorization, where the input features are word frequencies or other discrete counts.

In summary, the key distinction between Bernoulli Naive Bayes and Multinomial Naive Bayes is the type of data they can handle and the distributional assumptions they make. If the input features are binary (0/1) and represent the presence or absence of attributes, Bernoulli Naive Bayes is a suitable choice. On the other hand, if the input features are discrete counts or frequencies, Multinomial Naive Bayes is more appropriate. Both classifiers are popular choices in text-related tasks due to their simplicity, efficiency, and ability to handle high-dimensional data like word frequencies in documents.
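
The difference shows up directly in the class-conditional likelihood each model computes. Below is a minimal sketch in plain Python, not scikit-learn's implementation; the function names and the parameter values `p` and `theta` are hypothetical.

```python
import math

def bernoulli_log_likelihood(x_binary, p):
    """log P(x | class) when each feature is a present/absent indicator.

    p[i] is the (hypothetical) probability that feature i is present in the class.
    Absent features contribute log(1 - p[i]), so absence is evidence too.
    """
    return sum(
        math.log(p_i) if x_i else math.log(1.0 - p_i)
        for x_i, p_i in zip(x_binary, p)
    )

def multinomial_log_likelihood(x_counts, theta):
    """log P(x | class), up to a constant that does not depend on the class.

    theta[i] is the (hypothetical) share of feature i among all occurrences
    in the class. A feature with count 0 contributes nothing.
    """
    return sum(c * math.log(t) for c, t in zip(x_counts, theta))

counts = [3, 0, 1]                            # e.g. word counts in one document
binary = [1 if c > 0 else 0 for c in counts]  # the same document as indicators

p = [0.8, 0.5, 0.4]       # per-feature presence probabilities for one class
theta = [0.6, 0.3, 0.1]   # per-feature occurrence shares for the same class

print(bernoulli_log_likelihood(binary, p))
print(multinomial_log_likelihood(counts, theta))
```

In the Bernoulli version the second feature, although absent, still changes the score; in the multinomial version it drops out, and the first feature is weighted by its count of 3.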

**Q3.** How does Bernoulli Naive Bayes handle missing values?

**Answer:**

Bernoulli Naive Bayes has no built-in mechanism for missing values, but its independence assumption makes them straightforward to handle: each feature contributes its own factor to the likelihood, so a missing feature can simply be left out of the calculation.

Here's how missing values can be handled:

1. **Training Phase**:
   - During the training phase, Bernoulli Naive Bayes estimates the probability of the presence of each feature for each class based on the available training data.
   - If a particular feature is missing for a specific instance in the training data, that instance is excluded from the estimate for that feature only. Its other, observed features still contribute.

2. **Testing Phase**:
   - When predicting the class of a new instance that contains missing values for some features, the factors for those features are dropped from the product.
   - The model makes predictions based on the observed features, using the estimated probabilities from the training phase.
   - This is not the same as setting the missing feature to 0. In Bernoulli Naive Bayes a 0 means "absent" and contributes a factor of $1 - P(x_i = 1 | y)$, so filling with 0 adds evidence that was never observed.

Library implementations do not necessarily do this automatically. scikit-learn's `BernoulliNB`, for example, rejects input containing NaN, so missing values have to be imputed or encoded before fitting. Whichever approach is used, it may introduce bias if the values are not missing completely at random (MCAR) or if the missingness is informative about the class label. In such cases, more sophisticated techniques, such as model-based imputation or an explicit missing-indicator feature, may be necessary to handle missing data effectively.
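
Skipping a missing feature and filling it with 0 are not equivalent, and the choice can change the prediction. A minimal sketch in plain Python follows; the class priors and feature probabilities in `params` are hypothetical, and `None` is used here to mark a missing value.

```python
import math

def log_score(x, prior, p):
    """Unnormalised log-posterior of one class for a binary feature vector x.

    Features marked None are skipped, which corresponds to marginalising
    them out under the naive independence assumption.
    """
    score = math.log(prior)
    for x_i, p_i in zip(x, p):
        if x_i is None:
            continue  # missing: contributes no factor
        score += math.log(p_i if x_i == 1 else 1.0 - p_i)
    return score

def predict(x, params):
    """Return the label whose log-score is highest."""
    return max(params, key=lambda label: log_score(x, *params[label]))

# Hypothetical parameters: label -> (prior, P(feature present | label)).
params = {
    "spam": (0.4, [0.9, 0.9, 0.2]),
    "ham": (0.6, [0.2, 0.1, 0.3]),
}

x_missing = [1, None, 0]     # second feature unknown and skipped
x_zero_filled = [1, 0, 0]    # second feature treated as absent

print(predict(x_missing, params))
print(predict(x_zero_filled, params))
```

With these parameters the skipped version predicts `spam`, while the zero-filled version predicts `ham`, because the second feature is usually present in spam and its assumed absence counts strongly against that class.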

**Q4.** Can Gaussian Naive Bayes be used for multi-class classification?

**Answer:**

Yes, Gaussian Naive Bayes can be used for multi-class classification. Gaussian Naive Bayes is a variant of the Naive Bayes algorithm that assumes that the features follow a Gaussian (normal) distribution within each class. It is commonly used for continuous numerical data.

No extension is needed for more than two classes, because Naive Bayes is inherently a multi-class method. For every class $k$ the model estimates a prior $P(y = k)$ and, for each feature, a class-specific mean and variance. To classify an instance it evaluates

$ P(y = k) \prod_i P(x_i | y = k) $

for each class and predicts the class with the highest value. The same procedure works for two classes or twenty, and scikit-learn's `GaussianNB` handles multi-class targets this way without any additional configuration.

Decomposition strategies such as "one-vs-all" (OvA) and "one-vs-one" (OvO) exist for classifiers that are binary by construction:

1. **One-vs-All (OvA)**:
   - A separate binary classifier is trained for each class. Each classifier distinguishes between instances of one class (positive class) and instances of all the other classes (negative classes).
   - During prediction, the instance is scored by all the binary classifiers, and the class with the highest score is chosen as the final prediction.

2. **One-vs-One (OvO)**:
   - A separate binary classifier is trained for every pair of classes. For N classes, there are N * (N-1) / 2 binary classifiers.
   - During prediction, each binary classifier votes for one of its two classes. The class with the most votes is chosen as the final prediction.

These strategies can be wrapped around Gaussian Naive Bayes, but they are not required and add training cost without a clear benefit.

In conclusion, Gaussian Naive Bayes supports multi-class classification natively: it compares the posterior of every class and picks the largest.
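
A minimal from-scratch sketch of Gaussian Naive Bayes on three classes, in which the prediction is simply the class with the highest log-prior plus Gaussian log-likelihood. The toy data, the function names, and the small variance floor are assumptions made for illustration; this is not scikit-learn's code.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a log-prior, feature means and feature variances per class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor keeps the variance strictly positive for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log-prior + Gaussian log-likelihood."""
    def score(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=lambda label: score(model[label]))

# Toy data: three well-separated classes with two features each.
X = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
     [5.0, 5.2], [5.1, 4.9], [4.9, 5.0],
     [9.0, 1.0], [9.2, 1.1], [8.9, 0.8]]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

model = fit_gaussian_nb(X, y)
predictions = [predict(model, x) for x in ([1.0, 1.0], [5.0, 5.0], [9.0, 1.0])]
print(predictions)
```

All three classes are scored in a single pass over `model`; no binary sub-classifiers are involved.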

**Q5.** Assignment:
Data preparation:

Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message is spam or not based on several input features.


Implementation:

Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.


Results:

Report the following performance metrics for each classifier:
- Accuracy
- Precision
- Recall
- F1 score

Discussion:

Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?


Conclusion:

Summarise your findings and provide some suggestions for future work.

In [1]:
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

In [14]:
# Load the dataset from the UCI Machine Learning Repository
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
data = pd.read_csv(url, header=None)

In [15]:
data.head()

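|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 |
|---|---|---|---|---|---|---|---|---|---|---|-----|----|----|----|----|----|----|----|----|----|----|
| 0 | 0.0 | 0.64 | 0.64 | 0.0 | 0.32 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.778 | 0.0 | 0.0 | 3.756 | 61 | 278 | 1 |
| 1 | 0.21 | 0.28 | 0.5 | 0.0 | 0.14 | 0.28 | 0.21 | 0.07 | 0.0 | 0.94 | ... | 0.0 | 0.132 | 0.0 | 0.372 | 0.18 | 0.048 | 5.114 | 101 | 1028 | 1 |
| 2 | 0.06 | 0.0 | 0.71 | 0.0 | 1.23 | 0.19 | 0.19 | 0.12 | 0.64 | 0.25 | ... | 0.01 | 0.143 | 0.0 | 0.276 | 0.184 | 0.01 | 9.821 | 485 | 2259 | 1 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.137 | 0.0 | 0.137 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.135 | 0.0 | 0.135 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |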


In [16]:
# Split the dataset into features (X) and target (y)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

In [17]:
# Create instances of the three Naive Bayes classifiers
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

In [18]:
# Perform 10-fold cross-validation and get performance metrics for each classifier
classifiers = [bernoulli_nb, multinomial_nb, gaussian_nb]
clf_names = ["Bernoulli Naive Bayes", "Multinomial Naive Bayes", "Gaussian Naive Bayes"]

In [19]:
for clf, clf_name in zip(classifiers, clf_names):
    # Each call runs its own 10-fold cross-validation; the mean over folds is reported.
    accuracy = cross_val_score(clf, X, y, cv=10, scoring='accuracy').mean()
    precision = cross_val_score(clf, X, y, cv=10, scoring='precision').mean()
    recall = cross_val_score(clf, X, y, cv=10, scoring='recall').mean()
    # Named f1 rather than f1_score so it cannot shadow sklearn.metrics.f1_score.
    f1 = cross_val_score(clf, X, y, cv=10, scoring='f1').mean()

    print(f"\nResults for {clf_name}:")
    print(f"Accuracy: {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall: {recall:.4f}")
    print(f"F1 Score: {f1:.4f}")


Results for Bernoulli Naive Bayes:
Accuracy: 0.8839
Precision: 0.8870
Recall: 0.8152
F1 Score: 0.8481

Results for Multinomial Naive Bayes:
Accuracy: 0.7863
Precision: 0.7393
Recall: 0.7215
F1 Score: 0.7283

Results for Gaussian Naive Bayes:
Accuracy: 0.8218
Precision: 0.7104
Recall: 0.9570
F1 Score: 0.8131
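
For reference, the four metrics reported above can all be derived from confusion-matrix counts. The sketch below is a plain-Python illustration with hypothetical labels, treating label 1 as the positive (spam) class; the function name is an assumption, and the notebook itself relies on scikit-learn's scorers.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from two label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Ratios with an empty denominator are reported as 0.0 in this sketch.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 4 spam (1) and 6 non-spam (0) messages.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision asks how many predicted spam messages really are spam, while recall asks how many actual spam messages were caught, which is why a classifier can score high on one and low on the other.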


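Bernoulli Naive Bayes performed best overall, with the highest accuracy (0.8839), precision (0.8870) and F1 score (0.8481). Gaussian Naive Bayes had the highest recall (0.9570) but the lowest precision (0.7104), so it catches most spam at the cost of flagging many legitimate messages. Multinomial Naive Bayes had the lowest accuracy (0.7863) and F1 score (0.7283). A plausible explanation for Bernoulli's lead is that, with default settings, it binarizes every feature at 0 and therefore uses only whether a word or character occurs at all, whereas the raw features are skewed frequencies and run lengths that fit the Gaussian and multinomial assumptions poorly. One limitation of Naive Bayes is its strong independence assumption, which might not hold in real-world data. It can also struggle when features are highly correlated or when the class distributions are imbalanced. Nevertheless, Naive Bayes can be a fast and effective choice for text classification and similar tasks when the independence assumption is approximately met.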