**Q1**. A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
probability that an employee is a smoker given that he/she uses the health insurance plan?

**Answer**:
The quantity asked for is stated directly in the problem, so Bayes' theorem is not needed.

Let's define the events:

A = Employee uses the company's health insurance plan

B = Employee is a smoker

We are given the following probabilities:

P(A) = 0.70 (Probability that an employee uses the health insurance plan)

P(B | A) = 0.40 (Probability that an employee is a smoker given that he/she uses the health insurance plan, since 40% of the employees who use the plan are smokers)

We want to find:
P(B | A) = Probability that an employee is a smoker given that he/she uses the health insurance plan

This is exactly the second given quantity, so P(B | A) = 0.40.

As a check, we can use the definition of conditional probability:

P(A and B) = P(B | A) * P(A) = 0.40 * 0.70 = 0.28

P(B | A) = P(A and B) / P(A) = 0.28 / 0.70 = 0.40

Bayes' theorem, P(B | A) = (P(A | B) * P(B)) / P(A), would be required only if the problem had given P(A | B) and P(B) instead. Note that P(B) and P(A | B) cannot be determined from the given data, because the smoking rate among employees who do not use the plan, P(B | A'), is not provided.

So, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.40 or 40%.
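As a numerical sanity check, here is a small sketch that applies the definition of conditional probability, P(B | A) = P(A and B) / P(A), to a hypothetical workforce of 1,000 employees. The head-count is an assumption chosen only to match the percentages stated in the question.

```python
# Hypothetical head-count consistent with the percentages in the question
total_employees = 1000
plan_users = 700           # 70% use the plan, so P(A) = 0.70
smoking_plan_users = 280   # 40% of the plan users smoke

p_a = plan_users / total_employees                # P(A)
p_a_and_b = smoking_plan_users / total_employees  # P(A and B)
p_b_given_a = p_a_and_b / p_a                     # P(B | A)

print(round(p_b_given_a, 2))
```

Counting smokers only among plan users gives the same 40% that the question states.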

**Q2**. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

**Answer**:
The main difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in the type of data they are designed to handle and the assumptions they make about the data.

**(I) Data Type:**

**Bernoulli Naive Bayes**: This classifier is used for binary data, where features take on only two values (0 or 1) to represent the absence or presence of a particular feature. It is well-suited for problems where each feature is a binary variable, such as text classification where each word is either present (1) or absent (0) in a document.

**Multinomial Naive Bayes**: This classifier is used for discrete count data, where features represent the frequency or count of occurrences of various categories. It is commonly used in text classification tasks where features are word frequencies or word counts in a document, and each feature can take on multiple discrete values.

**(II) Feature Independence Assumption:**
Both Bernoulli and Multinomial Naive Bayes classifiers assume feature independence given the class label. This means that they treat each feature as conditionally independent of other features, given the class label. This "naive" assumption simplifies the modeling process and allows for more efficient and scalable classification.

**(III) Modeling Approach:**

**Bernoulli Naive Bayes**: Each feature is a binary indicator of presence (1) or absence (0) and is modeled with its own Bernoulli distribution per class. The likelihood of a document includes a term for every feature in the vocabulary, so the absence of a feature also counts as evidence.

**Multinomial Naive Bayes**: Each feature is a non-negative count. The vector of counts for a document is modeled as a draw from a Multinomial distribution over the features for each class, so a feature contributes in proportion to how often it occurs, and features that do not occur contribute nothing.

**(IV) Application:**

**Bernoulli Naive Bayes**: Bernoulli Naive Bayes is commonly used in text classification problems where the presence or absence of words is essential, such as spam email classification or sentiment analysis.

**Multinomial Naive Bayes**: Multinomial Naive Bayes is well-suited for text classification tasks where the frequency of words matters, such as document categorization, topic classification, or language detection.
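The difference between the two likelihoods can be sketched in a few lines. The three-word vocabulary and all probabilities below are made-up values for a single hypothetical "spam" class, not estimates from any dataset.

```python
import math

# Hypothetical parameters for one class over the vocabulary ["free", "meeting", "offer"]
p_present = [0.8, 0.1, 0.6]  # Bernoulli: P(word appears in a document | class)
p_token = [0.5, 0.1, 0.4]    # Multinomial: P(a token is this word | class), sums to 1

def bernoulli_log_likelihood(counts, p):
    # Only presence/absence matters; an absent word contributes log(1 - p)
    return sum(math.log(pi) if c > 0 else math.log(1 - pi)
               for c, pi in zip(counts, p))

def multinomial_log_likelihood(counts, p):
    # Each occurrence contributes log(p); absent words contribute nothing.
    # The multinomial coefficient is omitted because it is the same for every class.
    return sum(c * math.log(pi) for c, pi in zip(counts, p))

doc_once = [1, 0, 1]      # "free" once, "offer" once
doc_repeated = [5, 0, 1]  # "free" five times, "offer" once

print(bernoulli_log_likelihood(doc_once, p_present),
      bernoulli_log_likelihood(doc_repeated, p_present))
print(multinomial_log_likelihood(doc_once, p_token),
      multinomial_log_likelihood(doc_repeated, p_token))
```

The Bernoulli score is identical for both documents because repeating a word changes nothing once it is present, while the Multinomial score changes with every extra occurrence.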

**Q3**. How does Bernoulli Naive Bayes handle missing values?

**Answer**:
The standard Bernoulli Naive Bayes model has no built-in mechanism for missing values: it assumes every feature is observed as either absent (0) or present (1). Implementations such as scikit-learn's BernoulliNB raise an error when the input contains NaN, so missing values have to be handled before or around the model. Common strategies are:

**Imputation**: Replace each missing value with a fixed value (for example 0, or the most frequent value of that feature) before training and prediction. This is simple, but it hides the fact that the value was unknown.

**Treating "missing" as its own category**: Give each feature a third state, distinct from presence (1) and absence (0). During training, the probabilities of the presence, absence, and missing states are estimated for each feature and class from their frequencies in the training data. During prediction, a missing value contributes the probability of the missing state to the posterior of each class, and the class with the highest posterior is predicted. Strictly speaking, this turns the model into a categorical Naive Bayes with three states per feature; to stay within a Bernoulli model, the same idea can be expressed by adding a separate binary "is missing" indicator for each feature.

**Ignoring missing features**: Because features are assumed conditionally independent given the class, a feature whose value is unknown can simply be left out of the product of likelihoods for that instance. This is equivalent to marginalizing over the unknown value.

Which strategy is appropriate depends on whether the fact that a value is missing carries information about the class. If it does, modeling it explicitly is useful; if values are missing at random, ignoring or imputing them is usually sufficient.
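One way to make "missing" an explicit category is to give every feature three states and estimate a smoothed probability for each state per class. The sketch below is a minimal illustration under that assumption; the function names, the toy data, and the smoothing value `alpha` are hypothetical and do not correspond to any library API.

```python
import math
from collections import Counter

STATES = (0, 1, None)  # absent, present, missing

def fit(X, y, alpha=1.0):
    """Estimate log priors and smoothed per-class state probabilities."""
    classes = sorted(set(y))
    log_prior = {c: math.log(y.count(c) / len(y)) for c in classes}
    log_prob = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        log_prob[c] = []
        for j in range(len(X[0])):
            counts = Counter(row[j] for row in rows)
            total = len(rows) + alpha * len(STATES)  # Laplace smoothing
            log_prob[c].append(
                {s: math.log((counts[s] + alpha) / total) for s in STATES})
    return log_prior, log_prob

def predict(model, x):
    """Return the class with the highest log posterior for instance x."""
    log_prior, log_prob = model
    scores = {c: log_prior[c] + sum(log_prob[c][j][v] for j, v in enumerate(x))
              for c in log_prior}
    return max(scores, key=scores.get)

# Toy data: feature 0 separates the classes, feature 1 is sometimes missing
X = [[1, 1], [1, None], [1, 0], [0, 0], [0, None], [0, 1]]
y = [1, 1, 1, 0, 0, 0]
model = fit(X, y)
print(predict(model, [1, None]), predict(model, [0, None]))
```

Here the missing state is equally likely in both classes, so it does not shift the prediction; if missingness were more common in one class, it would act as evidence for that class.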


**Q4**. Can Gaussian Naive Bayes be used for multi-class classification?

**Answer**:
Yes, Gaussian Naive Bayes can be used for multi-class classification. Gaussian Naive Bayes is a variant of the Naive Bayes algorithm that assumes that the features follow a Gaussian (normal) distribution within each class. Nothing in the model restricts the number of classes, so it handles both binary and multi-class classification problems.

In multi-class classification, there are more than two classes, and each instance is assigned to one of these multiple classes. Gaussian Naive Bayes extends naturally to handle multiple classes by applying Bayes' theorem independently for each class and then selecting the class with the highest posterior probability as the predicted class for a given instance.

The algorithm works as follows:

**Training:** During the training phase, Gaussian Naive Bayes estimates the mean and variance of each feature within each class. For each feature and each class, it calculates the mean and variance of the feature values for instances belonging to that class. This information is used to model the Gaussian distribution for each feature and class.

**Prediction**: Given a new instance with feature values, Gaussian Naive Bayes calculates the likelihood of the feature values for each class using the estimated mean and variance for that class. It then applies Bayes' theorem to calculate the posterior probability of each class given the observed feature values. The class with the highest posterior probability is assigned as the predicted class for the new instance.
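The training and prediction steps described above can be sketched for three classes as follows. This is a minimal illustration, not scikit-learn's implementation; the toy data, the class labels, and the small variance floor `eps` are assumptions made for the example.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate the prior, and the mean and variance of each feature, per class."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        columns = list(zip(*rows))
        model[c] = {
            "log_prior": math.log(len(rows) / len(y)),
            "means": [mean(col) for col in columns],
            # eps keeps the variance strictly positive
            "vars": [pvariance(col) + eps for col in columns],
        }
    return model

def log_gaussian(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def predict(model, x):
    """Pick the class with the highest log posterior (up to a shared constant)."""
    scores = {
        c: p["log_prior"] + sum(log_gaussian(v, m, s)
                                for v, m, s in zip(x, p["means"], p["vars"]))
        for c, p in model.items()
    }
    return max(scores, key=scores.get)

# Toy data with three well-separated classes
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0],
     [5.0, 5.2], [5.1, 4.8], [4.9, 5.0],
     [9.0, 1.0], [9.2, 0.8], [8.8, 1.2]]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

model = fit_gaussian_nb(X, y)
print(predict(model, [1.0, 2.0]), predict(model, [5.0, 5.0]), predict(model, [9.1, 1.0]))
```

The same code works for any number of classes, since each class simply adds one more entry to the score dictionary.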

The assumption of Gaussian distribution can work reasonably well for continuous or real-valued features. However, if the features are strongly non-Gaussian (for example, heavily skewed) or have complex relationships, transforming the features or using another classification algorithm might be more appropriate. For count or binary features, Multinomial Naive Bayes or Bernoulli Naive Bayes is usually the better fit.

**Q5**. Assignment:

**Data preparation:**

Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message
is spam or not based on several input features.

**Implementation**:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.

**Results**:
Report the following performance metrics for each classifier: accuracy, precision, recall, and F1 score.

**Discussion:**
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?

**Conclusion**:
Summarise your findings and provide some suggestions for future work.

In [7]:
import pandas as pd

# spambase.data has no header row. With the default header setting, read_csv
# uses the first record as column names, so 4600 of the 4601 records are loaded.
# Pass header=None to keep every record.
data = pd.read_csv("spambase.data")

In [8]:
data

Unnamed: 0,0,0.64,0.64.1,0.1,0.32,0.2,0.3,0.4,0.5,0.6,...,0.41,0.42,0.43,0.778,0.44,0.45,3.756,61,278,1
0,0.21,0.28,0.50,0.0,0.14,0.28,0.21,0.07,0.00,0.94,...,0.000,0.132,0.0,0.372,0.180,0.048,5.114,101,1028,1
1,0.06,0.00,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.010,0.143,0.0,0.276,0.184,0.010,9.821,485,2259,1
2,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.000,0.137,0.0,0.137,0.000,0.000,3.537,40,191,1
3,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.000,0.135,0.0,0.135,0.000,0.000,3.537,40,191,1
4,0.00,0.00,0.00,0.0,1.85,0.00,0.00,1.85,0.00,0.00,...,0.000,0.223,0.0,0.000,0.000,0.000,3.000,15,54,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4595,0.31,0.00,0.62,0.0,0.00,0.31,0.00,0.00,0.00,0.00,...,0.000,0.232,0.0,0.000,0.000,0.000,1.142,3,88,0
4596,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.000,0.000,0.0,0.353,0.000,0.000,1.555,4,14,0
4597,0.30,0.00,0.30,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.102,0.718,0.0,0.000,0.000,0.000,1.404,6,118,0
4598,0.96,0.00,0.00,0.0,0.32,0.00,0.00,0.00,0.00,0.00,...,0.000,0.057,0.0,0.000,0.000,0.000,1.147,5,78,0


In [9]:
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

In [10]:
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

In [11]:
X

Unnamed: 0,0,0.64,0.64.1,0.1,0.32,0.2,0.3,0.4,0.5,0.6,...,0.40,0.41,0.42,0.43,0.778,0.44,0.45,3.756,61,278
0,0.21,0.28,0.50,0.0,0.14,0.28,0.21,0.07,0.00,0.94,...,0.0,0.000,0.132,0.0,0.372,0.180,0.048,5.114,101,1028
1,0.06,0.00,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.0,0.010,0.143,0.0,0.276,0.184,0.010,9.821,485,2259
2,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.0,0.000,0.137,0.0,0.137,0.000,0.000,3.537,40,191
3,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.0,0.000,0.135,0.0,0.135,0.000,0.000,3.537,40,191
4,0.00,0.00,0.00,0.0,1.85,0.00,0.00,1.85,0.00,0.00,...,0.0,0.000,0.223,0.0,0.000,0.000,0.000,3.000,15,54
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4595,0.31,0.00,0.62,0.0,0.00,0.31,0.00,0.00,0.00,0.00,...,0.0,0.000,0.232,0.0,0.000,0.000,0.000,1.142,3,88
4596,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.0,0.000,0.000,0.0,0.353,0.000,0.000,1.555,4,14
4597,0.30,0.00,0.30,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.0,0.102,0.718,0.0,0.000,0.000,0.000,1.404,6,118
4598,0.96,0.00,0.00,0.0,0.32,0.00,0.00,0.00,0.00,0.00,...,0.0,0.000,0.057,0.0,0.000,0.000,0.000,1.147,5,78


In [12]:
y

0       1
1       1
2       1
3       1
4       1
       ..
4595    0
4596    0
4597    0
4598    0
4599    0
Name: 1, Length: 4600, dtype: int64

In [13]:
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

In [14]:
# Evaluate a classifier using 10-fold cross-validation
from sklearn.model_selection import cross_validate

def evaluate_classifier(classifier, X, y):
    # Score all four metrics on the same folds in a single pass
    scores = cross_validate(classifier, X, y, cv=10,
                            scoring=['accuracy', 'precision', 'recall', 'f1'])
    return (scores['test_accuracy'].mean(), scores['test_precision'].mean(),
            scores['test_recall'].mean(), scores['test_f1'].mean())


In [15]:
# Evaluate each classifier
accuracy_bernoulli, precision_bernoulli, recall_bernoulli, f1_score_bernoulli = evaluate_classifier(bernoulli_nb, X, y)
accuracy_multinomial, precision_multinomial, recall_multinomial, f1_score_multinomial = evaluate_classifier(multinomial_nb, X, y)
accuracy_gaussian, precision_gaussian, recall_gaussian, f1_score_gaussian = evaluate_classifier(gaussian_nb, X, y)

In [16]:
print("Bernoulli Naive Bayes:")
print("Accuracy:", accuracy_bernoulli)
print("Precision:", precision_bernoulli)
print("Recall:", recall_bernoulli)
print("F1 Score:", f1_score_bernoulli)
print()

print("Multinomial Naive Bayes:")
print("Accuracy:", accuracy_multinomial)
print("Precision:", precision_multinomial)
print("Recall:", recall_multinomial)
print("F1 Score:", f1_score_multinomial)
print()

print("Gaussian Naive Bayes:")
print("Accuracy:", accuracy_gaussian)
print("Precision:", precision_gaussian)
print("Recall:", recall_gaussian)
print("F1 Score:", f1_score_gaussian)

Bernoulli Naive Bayes:
Accuracy: 0.8839130434782609
Precision: 0.886914139754535
Recall: 0.8151235504826666
F1 Score: 0.8480714616697421

Multinomial Naive Bayes:
Accuracy: 0.786086956521739
Precision: 0.7390291264847734
Recall: 0.7207971586424625
F1 Score: 0.7277511309974372

Gaussian Naive Bayes:
Accuracy: 0.8217391304347826
Precision: 0.7102746648832371
Recall: 0.9569394693704085
F1 Score: 0.8129997873786424


**Analysis:**

Accuracy: Bernoulli Naive Bayes achieved the highest accuracy (0.884), followed by Gaussian Naive Bayes (0.822) and then Multinomial Naive Bayes (0.786).

Precision: Bernoulli Naive Bayes has the highest precision (0.887), meaning that few of the emails it flags as spam are actually legitimate. Gaussian Naive Bayes has the lowest precision (0.710).

Recall: Gaussian Naive Bayes achieved the highest recall (0.957), meaning that it misses few of the actual spam emails. Bernoulli Naive Bayes (0.815) and Multinomial Naive Bayes (0.721) miss more.

F1 Score: The F1 score is the harmonic mean of precision and recall. Bernoulli Naive Bayes has the highest F1 score (0.848), ahead of Gaussian Naive Bayes (0.813) and Multinomial Naive Bayes (0.728), which means it achieved the best balance between precision and recall.
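For reference, the four metrics can be computed from the counts of a confusion matrix as sketched below. The counts are made-up values for illustration and are not taken from the Spambase results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted spam that is really spam
    recall = tp / (tp + fn)      # share of real spam that was caught
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts: 80 spam caught, 20 false alarms, 10 spam missed, 90 ham kept
accuracy, precision, recall, f1 = classification_metrics(tp=80, fp=20, fn=10, tn=90)
print(accuracy, precision, recall, f1)
```

A classifier that flags almost everything as spam raises recall at the cost of precision, which is why the F1 score is useful for comparing the three models.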

**Discussion:**

Bernoulli Naive Bayes performed best overall. A likely reason is that scikit-learn's BernoulliNB binarizes each feature at 0.0 by default, so every word-frequency or character-frequency feature is reduced to "present" or "absent" in the email. For spam detection, whether a specific word or character occurs at all appears to be a more reliable signal than its exact frequency, and binarization also removes the effect of the very different feature scales.
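scikit-learn's BernoulliNB has a `binarize` parameter (0.0 by default) that thresholds each feature before fitting. A plain-Python sketch of that thresholding step, applied to the first five feature values of the first row shown in the data output above, looks like this; the helper name `binarize` is only illustrative.

```python
def binarize(row, threshold=0.0):
    """Map each value to 1 if it is strictly above the threshold, else 0."""
    return [1 if value > threshold else 0 for value in row]

first_values = [0.21, 0.28, 0.50, 0.0, 0.14]
print(binarize(first_values))
```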

Multinomial Naive Bayes had the lowest accuracy and F1 score. It assumes that features are counts, whereas most Spambase features are percentages of words or characters, and the three capital-run-length features take much larger values than the rest (for example, 61 and 278 in the first record), which can let them dominate the likelihood.

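Gaussian Naive Bayes reached the highest recall (0.957) but the lowest precision (0.710): it catches most spam but also labels many legitimate emails as spam. Its performance depends on how well the continuous features follow a Gaussian distribution within each class, and many of the Spambase frequency values are exactly zero, which a Gaussian is unlikely to fit well.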

**Limitations:**

Naive Bayes assumes that features are conditionally independent given the class, which rarely holds exactly in real-world data.

When features are highly correlated, their shared evidence is counted more than once, so the predicted probabilities tend to be overconfident and accuracy may suffer.

Naive Bayes may not capture complex relationships between features, and more sophisticated algorithms might be required for certain datasets.
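The effect of violating the independence assumption can be illustrated by feeding the same feature to the model twice. The probabilities below are made-up values, and `posterior_spam` is a hypothetical helper written for this example.

```python
def posterior_spam(likelihoods_spam, likelihoods_ham, prior_spam=0.5):
    """Naive Bayes posterior P(spam | features) for a two-class problem."""
    score_spam = prior_spam
    for p in likelihoods_spam:
        score_spam *= p
    score_ham = 1 - prior_spam
    for p in likelihoods_ham:
        score_ham *= p
    return score_spam / (score_spam + score_ham)

# One feature with P(x | spam) = 0.8 and P(x | ham) = 0.4
single = posterior_spam([0.8], [0.4])

# The same feature included twice (a perfectly correlated copy)
duplicated = posterior_spam([0.8, 0.8], [0.4, 0.4])

print(round(single, 3), round(duplicated, 3))
```

The duplicated feature adds no new information, yet the posterior moves further toward spam because the same evidence is multiplied in twice.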

**Conclusion:**

In conclusion, based on the provided results, Bernoulli Naive Bayes performed the best among the three classifiers for the given dataset. However, the choice of the best variant of Naive Bayes or any classifier depends on the specific dataset and problem at hand. To make a more informed decision, further analysis, hyperparameter tuning, and potentially trying other classification algorithms could be explored. Naive Bayes classifiers can be useful and efficient in many scenarios, but it is essential to understand their assumptions and limitations when applying them to real-world problems.





