# Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

This is a conditional probability question, and the answer can be read directly from the problem statement. Let's define:

A: an employee uses the company's health insurance plan
B: an employee is a smoker

We want to find the probability of an employee being a smoker given that he/she uses the health insurance plan, which is P(B|A).

We know that 70% of the employees use the health insurance plan, which means P(A) = 0.7.

We also know that 40% of the employees who use the plan are smokers, which means P(B|A) = 0.4. This is exactly the quantity we are asked for.

Bayes' theorem, P(B|A) = P(A|B) * P(B) / P(A), is not needed here. Applying it would require P(A|B) and P(B), and the problem gives no information about smokers among employees who do not use the plan.

As a consistency check, we can use the definition of conditional probability. The probability that an employee both uses the plan and is a smoker is:

P(A and B) = P(B|A) * P(A) = 0.4 * 0.7 = 0.28

Dividing by P(A) recovers the given value:

P(B|A) = P(A and B) / P(A) = 0.28 / 0.7 = 0.4

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4 or 40%.
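
The arithmetic can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and the two input values are the ones stated in the question.

```python
# Values taken from the problem statement
p_plan = 0.7                # P(A): employee uses the health insurance plan
p_smoker_given_plan = 0.4   # P(B|A): smoker, given that the employee uses the plan

# Joint probability P(A and B) via the multiplication rule
p_plan_and_smoker = p_smoker_given_plan * p_plan

# Recover P(B|A) from the definition of conditional probability
recovered = p_plan_and_smoker / p_plan

print(f"P(A and B) = {p_plan_and_smoker:.2f}")
print(f"P(B|A)     = {recovered:.2f}")
```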

# Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

Bernoulli Naive Bayes and Multinomial Naive Bayes are both variants of the Naive Bayes algorithm, which is a popular algorithm for classification tasks in machine learning. While they are both based on the same underlying principles, there are some differences in the way they handle data.

Bernoulli Naive Bayes is typically used when the features are binary, that is, when each feature takes only the values 0 and 1. It is commonly used in text classification tasks, where each feature represents the presence or absence of a particular word in a document. In Bernoulli Naive Bayes, each feature is modeled as a Bernoulli random variable, with the assumption that the features are conditionally independent given the class. This means that, within a class, the presence or absence of one feature does not affect the probability of the presence or absence of any other feature. Because the model is defined over both presence and absence, a feature that is absent from a sample still contributes to the likelihood, through the factor 1 - P(feature | class). The algorithm then calculates the posterior probability of each class given the presence or absence of each feature, using Bayes' theorem.

Multinomial Naive Bayes, on the other hand, is used when the features are discrete counts, that is, non-negative integers. It is commonly used in text classification tasks, where each feature represents the count of a particular word in a document. In Multinomial Naive Bayes, the feature vector of a sample is modeled as a draw from a class-specific multinomial distribution over the features, again under the naive assumption of conditional independence given the class. A feature with a count of zero does not contribute to the likelihood, and larger counts contribute proportionally more. The algorithm then calculates the posterior probability of each class given the count of each feature, using Bayes' theorem.

In summary, Bernoulli Naive Bayes is used for binary features, while Multinomial Naive Bayes is used for discrete count features. Both algorithms assume that each feature is conditionally independent given the class, and both calculate the conditional probability of each class given the features using Bayes' theorem.
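
The difference can be illustrated on a toy count matrix. The sketch below uses made-up data (the matrix `X`, the labels `y`, and the two query documents are assumptions for illustration only) and scikit-learn's default settings, under which `BernoulliNB` binarizes its input at a threshold of 0. The two query documents differ only in how often the first word occurs, so the Bernoulli model should treat them identically while the multinomial model should not.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy word-count matrix (rows: documents, columns: words); illustrative data only
X = np.array([[3, 0, 1],
              [2, 0, 0],
              [0, 2, 1],
              [0, 3, 0]])
y = np.array([0, 0, 1, 1])

bnb = BernoulliNB().fit(X, y)     # counts > 0 are binarized to 1 by default
mnb = MultinomialNB().fit(X, y)   # counts are used as-is

# Two documents containing the same words, with different counts for the first word
doc_a = np.array([[1, 0, 1]])
doc_b = np.array([[5, 0, 1]])

bern_a, bern_b = bnb.predict_proba(doc_a), bnb.predict_proba(doc_b)
multi_a, multi_b = mnb.predict_proba(doc_a), mnb.predict_proba(doc_b)

print("Bernoulli:  ", bern_a, bern_b)
print("Multinomial:", multi_a, multi_b)
```

With this setup, the Bernoulli posteriors for the two documents coincide because only presence or absence matters, whereas the multinomial posteriors shift as the count grows.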

# Q3. How does Bernoulli Naive Bayes handle missing values?

Bernoulli Naive Bayes is a classification algorithm that is commonly used in natural language processing tasks such as text classification. It is a variant of the Naive Bayes algorithm that assumes that the features are binary or Boolean, indicating whether a particular feature is present or not.

Bernoulli Naive Bayes has no built-in mechanism for missing values. In principle, because the features are assumed to be conditionally independent given the class, a missing feature can simply be left out of the likelihood product for that sample, so that only the observed features contribute to the prediction. In practice, implementations such as scikit-learn's BernoulliNB do not accept missing values (NaN) and raise an error, and encoding a missing value as 0 silently treats it as "feature absent", which is not the same thing as "unknown".

The presence or absence of certain features can have a significant impact on the classification accuracy of the algorithm. Therefore, it is recommended to handle missing values explicitly before applying Bernoulli Naive Bayes. For binary features, imputing the most frequent value (the mode) of the feature is usually more appropriate than the mean, which is generally not a valid binary value. Alternatives are to drop the affected samples or to add a separate binary indicator feature that marks where a value was missing.
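
Imputation before Bernoulli Naive Bayes can be sketched as a scikit-learn pipeline. The data below is made up for illustration, and the choice of `SimpleImputer(strategy="most_frequent")` is an assumption about a reasonable default for binary features, not a prescribed method.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy binary feature matrix with missing entries (np.nan); illustrative data only
X = np.array([[1, 0, 1],
              [1, np.nan, 0],
              [0, 1, np.nan],
              [0, 1, 0]], dtype=float)
y = np.array([1, 1, 0, 0])

# Fill each missing entry with the most frequent value of its column,
# then fit Bernoulli Naive Bayes on the completed matrix
model = make_pipeline(SimpleImputer(strategy="most_frequent"), BernoulliNB())
model.fit(X, y)

# The same imputation is applied automatically at prediction time
X_new = np.array([[1, np.nan, 1]])
pred = model.predict(X_new)
print(pred)
```

Keeping the imputer inside the pipeline means that, under cross-validation, the imputation values are learned from the training folds only.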

# Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification tasks, and it handles multiple classes natively. For each class, the algorithm estimates a prior probability and, for every feature, the mean and variance of a Gaussian distribution. During prediction, it computes the posterior probability of each class for the input instance and selects the class with the highest posterior as the final prediction. No decomposition into binary problems is required.

A "one-vs-rest" strategy (also called "one-vs-all"; the two names refer to the same approach) can still be applied if desired. In this strategy, one binary classifier is trained per class, treating the instances of that class as positive examples and the instances of all other classes as negative examples. During prediction, each classifier is applied to the input instance and the class whose classifier gives the highest probability is selected as the final prediction.

Overall, Gaussian Naive Bayes is a simple and computationally efficient algorithm for multi-class classification tasks, especially in situations where the feature variables are continuous and approximately Gaussian within each class.
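
As a small illustration of multi-class use, the sketch below fits `GaussianNB` on the three-class Iris dataset bundled with scikit-learn. The dataset is an assumption chosen for convenience; it is not part of the question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has three classes and four continuous features
X, y = load_iris(return_X_y=True)

# A single GaussianNB model is fitted on all three classes at once
gnb = GaussianNB().fit(X, y)

# One posterior probability per class for each sample
proba = gnb.predict_proba(X[:5])
pred = gnb.predict(X[:5])

print("Classes:", gnb.classes_)
print("Posterior shape:", proba.shape)
print("Predictions:", pred)
```

The predicted label for each sample is the class with the largest posterior probability in the corresponding row of `proba`.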

# Q5. Assignment:
Data preparation:

Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message is spam or not based on several input features.

Implementation:

Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.

Results:

Report the following performance metrics for each classifier:

- Accuracy
- Precision
- Recall
- F1 score

Discussion:

Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?

Conclusion:

Summarise your findings and provide some suggestions for future work.

In [2]:
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

# Load the data: 57 feature columns followed by the label (1 = spam, 0 = not spam)
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data', header=None)
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

# Classifiers to compare, all with default hyperparameters
classifiers = {
    "Bernoulli Naive Bayes": BernoulliNB(),
    "Multinomial Naive Bayes": MultinomialNB(),
    "Gaussian Naive Bayes": GaussianNB(),
}

# Metrics to report, mapped to scikit-learn scorer names
metrics = {"Accuracy": "accuracy", "Precision": "precision", "Recall": "recall", "F1 Score": "f1"}

# Run 10-fold cross-validation once per classifier, computing all metrics on the same folds
for i, (name, clf) in enumerate(classifiers.items()):
    scores = cross_validate(clf, X, y, cv=10, scoring=list(metrics.values()))
    print(("\n" if i else "") + name + ":")
    for label, scorer in metrics.items():
        print("{}: {:.2f}".format(label, scores["test_" + scorer].mean()))

Bernoulli Naive Bayes:
Accuracy: 0.88
Precision: 0.89
Recall: 0.82
F1 Score: 0.85

Multinomial Naive Bayes:
Accuracy: 0.79
Precision: 0.74
Recall: 0.72
F1 Score: 0.73

Gaussian Naive Bayes:
Accuracy: 0.82
Precision: 0.71
Recall: 0.96
F1 Score: 0.81
