Q1. A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
probability that an employee is a smoker given that he/she uses the health insurance plan? 

To find the probability that an employee is a smoker given that they use the health insurance plan, you can use conditional probability. Let's define some terms:

Let A be the event that an employee is a smoker.
Let B be the event that an employee uses the health insurance plan.

You are given:
P(B) = 0.70 (probability that an employee uses the health insurance plan)
P(A|B) = 0.40 (probability that an employee is a smoker given that they use the health insurance plan)

The formula for conditional probability is:
P(A|B) = P(A ∩ B) / P(B)

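The question asks for P(A|B), and this value is given directly: the statement that 40% of the employees who use the plan are smokers is itself a conditional probability. So:
P(A|B) = 0.40

The formula can also be rearranged to find the joint probability that an employee both smokes and uses the plan:
P(A ∩ B) = P(A|B) * P(B)
P(A ∩ B) = 0.40 * 0.70
P(A ∩ B) = 0.28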
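
As a quick check of the arithmetic, the sketch below uses exact fractions to compute the joint probability P(A ∩ B) from the given values and then recovers P(A|B) with the conditional probability formula. The variable names are illustrative only.

```python
from fractions import Fraction

# Given values from the question, kept as exact fractions
p_b = Fraction(70, 100)          # P(B): employee uses the health insurance plan
p_a_given_b = Fraction(40, 100)  # P(A|B): smoker, given that they use the plan

# Joint probability: P(A ∩ B) = P(A|B) * P(B)
p_a_and_b = p_a_given_b * p_b

# Conditional probability recovered from the definition: P(A|B) = P(A ∩ B) / P(B)
recovered = p_a_and_b / p_b

print(float(p_a_and_b))  # 0.28
print(float(recovered))  # 0.4
```

Dividing the joint probability by P(B) returns the 0.40 that the question supplied, which confirms that the requested conditional probability is the given value.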

Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes? 

Bernoulli Naive Bayes and Multinomial Naive Bayes are both variants of the Naive Bayes algorithm used in machine learning and natural language processing for classification tasks. They have some similarities but also key differences based on the nature of the data they are designed to handle.

1. Bernoulli Naive Bayes:

- Binary Data: Bernoulli Naive Bayes is typically used for binary or Boolean features, where each feature can take only one of two possible values (usually 0 or 1). It's suitable for situations where you want to model the presence or absence of certain features.
- Document Classification: It is commonly used in text classification tasks, where each word is treated as a feature, and the presence or absence of a word in a document is considered.
- Assumption: It assumes that the features are conditionally independent given the class label (hence the "Naive" in Naive Bayes), and it calculates the likelihood of a document belonging to a particular class based on the presence or absence of certain words/features.

2. Multinomial Naive Bayes:

- Count Data: Multinomial Naive Bayes is suitable for discrete count data, where features represent counts or frequencies of events. This is commonly used for text classification when considering the frequency of words in a document.
- Document Classification: Like Bernoulli Naive Bayes, it's also used in text classification, where the focus is on the frequency of words rather than just their presence or absence.
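- Assumption: Like Bernoulli Naive Bayes, it assumes conditional independence between features given the class label. The difference lies in the likelihood: it models how many times each feature occurs using a multinomial distribution, so a word that is absent from a document contributes nothing to the likelihood, whereas Bernoulli Naive Bayes explicitly penalizes the absence of a word.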
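
To make the contrast concrete, here is a minimal pure-Python sketch of the two class-conditional log-likelihoods for a single toy document. The vocabulary, the word counts, and the per-class parameters (`p_spam`, `theta_spam`) are made-up values chosen for illustration, not estimates from any dataset.

```python
import math

def bernoulli_log_likelihood(x_binary, p):
    """log P(x | class) under Bernoulli NB.

    Every feature contributes: log(p_i) if present, log(1 - p_i) if absent.
    """
    return sum(math.log(pi) if xi else math.log(1.0 - pi)
               for xi, pi in zip(x_binary, p))

def multinomial_log_likelihood(x_counts, theta):
    """log P(x | class) under multinomial NB, up to a class-independent constant.

    Only features with a nonzero count contribute, weighted by the count.
    """
    return sum(c * math.log(t) for c, t in zip(x_counts, theta) if c > 0)

# Hypothetical vocabulary: ["free", "meeting", "offer"]
counts = [3, 0, 1]                             # word counts in one document
binary = [1 if c > 0 else 0 for c in counts]   # presence/absence view

p_spam = [0.8, 0.1, 0.6]       # assumed P(word present | spam)
theta_spam = [0.5, 0.1, 0.4]   # assumed P(word | spam), sums to 1

bern = bernoulli_log_likelihood(binary, p_spam)
multi = multinomial_log_likelihood(counts, theta_spam)
print(f"Bernoulli log-likelihood:   {bern:.4f}")
print(f"Multinomial log-likelihood: {multi:.4f}")
```

In the Bernoulli version the absent word "meeting" still adds a log(1 - 0.1) term, while in the multinomial version it drops out and "free" is counted three times.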

Q3. How does Bernoulli Naive Bayes handle missing values? 

Bernoulli Naive Bayes, like other Naive Bayes variants, has no built-in mechanism for missing values: the model expects every feature to be observed as 0 or 1, and implementations such as scikit-learn's BernoulliNB raise an error on NaN input. Missing values therefore have to be dealt with before training and prediction. There are a few common approaches:

1. Ignoring Missing Values: One straightforward approach is to simply ignore instances with missing values during training and classification. This can work if the missing values are relatively rare and randomly distributed across the dataset. However, this approach may lead to information loss, especially if missing values are not random.

2. Imputation: Another approach is to impute missing values with either the most common value (0 or 1) for the respective feature in the given class or some other method appropriate for binary data. Imputation can help retain information from instances with missing values, but it may introduce biases if the imputation method is not carefully chosen.

3. Special Category: You can treat missing values as a separate category or value for each feature. Because Bernoulli features are binary, this usually means adding one extra binary indicator feature per original feature that records whether the value was missing. This approach explicitly accounts for missingness in the data, but it also increases the dimensionality of the feature space.
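
The imputation and special-category approaches can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: missing values are represented as `None`, the helper names are hypothetical, and the imputation uses the most common value of the whole column rather than the per-class value mentioned above.

```python
from collections import Counter

def impute_most_frequent(rows):
    """Replace None in each column with that column's most common observed value."""
    n_cols = len(rows[0])
    modes = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        modes.append(Counter(observed).most_common(1)[0][0])
    return [[modes[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]

def add_missing_indicators(rows):
    """Set missing values to 0 and append one 'was missing' flag per feature."""
    return [[0 if v is None else v for v in r] +
            [1 if v is None else 0 for v in r]
            for r in rows]

# Toy binary feature matrix with two missing entries
X_toy = [
    [1, 0, None],
    [1, None, 1],
    [0, 0, 1],
    [1, 0, 1],
]

imputed = impute_most_frequent(X_toy)
flagged = add_missing_indicators(X_toy)
print(imputed)
print(flagged)
```

The flagged matrix has twice as many columns as the original, which is the dimensionality cost of the special-category approach.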

Q4. Can Gaussian Naive Bayes be used for multi-class classification? 

Yes, Gaussian Naive Bayes can be used for multi-class classification. Gaussian Naive Bayes is a variant of the Naive Bayes algorithm that assumes that the features follow a Gaussian (normal) distribution within each class. It's commonly used for continuous numeric features.

No extension is needed for the multi-class case: the Naive Bayes framework computes a posterior probability for every class given the features, so it handles any number of classes natively.

Here's how Gaussian Naive Bayes can be used for multi-class classification:

1. Training:

- For each class in the dataset, calculate the mean and standard deviation of each feature (assuming Gaussian distribution) based on the training data belonging to that class.
- Compute the class prior probabilities (the probability of each class occurring in the dataset).

2. Classification:

- Given a new instance with feature values, calculate the probability of the instance belonging to each class using the Gaussian probability density function for each feature and class.
- Multiply the probabilities of individual features to get the final likelihood for each class. In practice, the log-probabilities are summed instead, to avoid numerical underflow.
- Apply Bayes' theorem to calculate the posterior probabilities for each class.
- Assign the instance to the class with the highest posterior probability.
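
The training and classification steps above can be sketched from scratch in a few lines. This is a minimal illustration on a made-up three-class dataset, not scikit-learn's implementation; the function names, the toy data, and the small variance floor of 1e-9 are assumptions, and log-probabilities are summed rather than multiplying raw densities.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Assumed variance floor to avoid division by zero on constant features
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest (unnormalized) log posterior."""
    def log_posterior(params):
        log_prior, means, variances = params
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
        return log_prior + log_likelihood
    return max(model, key=lambda label: log_posterior(model[label]))

# Toy data: three well-separated classes, two continuous features
X_toy = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
         [5.0, 5.2], [5.1, 4.9], [4.9, 5.0],
         [9.0, 1.0], [9.2, 1.1], [8.9, 0.9]]
y_toy = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

model = fit_gaussian_nb(X_toy, y_toy)
predictions = [predict_gaussian_nb(model, point)
               for point in ([1.0, 1.0], [5.0, 5.0], [9.0, 1.0])]
print(predictions)  # ['a', 'b', 'c']
```

Nothing in the code is specific to three classes: the same loop over `model` works for two classes or twenty.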

Q5. Assignment:

Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message
is spam or not based on several input features.

Implementation:

Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.

Results:

Report the following performance metrics for each classifier:
Accuracy
Precision
Recall
F1 score
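
For reference, the four metrics can be computed from the confusion-matrix counts as in the sketch below. The helper name `classification_metrics` and the toy labels are hypothetical, and the positive class is assumed to be 1 (spam).

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels: 1 = spam, 0 = not spam
y_true_toy = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred_toy = [1, 1, 0, 0, 0, 1, 0, 1]

metrics = classification_metrics(y_true_toy, y_pred_toy)
print(metrics)
```

Precision penalizes legitimate messages flagged as spam, while recall penalizes spam that slips through, so the two can move in opposite directions for the same classifier.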

Discussion:

Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?

Conclusion:

Summarise your findings and provide some suggestions for future work.

In [8]:
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

# Load the dataset
data = pd.read_csv('spambase.data', header=None)
X = data.iloc[:, :-1].values  # Features
y = data.iloc[:, -1].values   # Labels

In [9]:
data

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,48,49,50,51,52,53,54,55,56,57
0,0.00,0.64,0.64,0.0,0.32,0.00,0.00,0.00,0.00,0.00,...,0.000,0.000,0.0,0.778,0.000,0.000,3.756,61,278,1
1,0.21,0.28,0.50,0.0,0.14,0.28,0.21,0.07,0.00,0.94,...,0.000,0.132,0.0,0.372,0.180,0.048,5.114,101,1028,1
2,0.06,0.00,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.010,0.143,0.0,0.276,0.184,0.010,9.821,485,2259,1
3,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.000,0.137,0.0,0.137,0.000,0.000,3.537,40,191,1
4,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.000,0.135,0.0,0.135,0.000,0.000,3.537,40,191,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4596,0.31,0.00,0.62,0.0,0.00,0.31,0.00,0.00,0.00,0.00,...,0.000,0.232,0.0,0.000,0.000,0.000,1.142,3,88,0
4597,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.000,0.000,0.0,0.353,0.000,0.000,1.555,4,14,0
4598,0.30,0.00,0.30,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.102,0.718,0.0,0.000,0.000,0.000,1.404,6,118,0
4599,0.96,0.00,0.00,0.0,0.32,0.00,0.00,0.00,0.00,0.00,...,0.000,0.057,0.0,0.000,0.000,0.000,1.147,5,78,0


In [13]:
# Initialize classifiers
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

In [14]:
# Perform 10-fold cross-validation and calculate metrics
from sklearn.model_selection import cross_validate

def evaluate_classifier(classifier, name):
    # Score all four metrics on the same 10 folds in a single pass,
    # instead of refitting the model once per metric.
    scores = cross_validate(classifier, X, y, cv=10,
                            scoring=('accuracy', 'precision', 'recall', 'f1'))
    # Report the mean of each metric across the folds
    print(f"{name} Naive Bayes:")
    print(f"Accuracy: {scores['test_accuracy'].mean():.4f}")
    print(f"Precision: {scores['test_precision'].mean():.4f}")
    print(f"Recall: {scores['test_recall'].mean():.4f}")
    print(f"F1 Score: {scores['test_f1'].mean():.4f}")
    print()

In [15]:
# Evaluate classifiers
evaluate_classifier(bernoulli_nb, "Bernoulli")
evaluate_classifier(multinomial_nb, "Multinomial")
evaluate_classifier(gaussian_nb, "Gaussian")

Bernoulli Naive Bayes:
Accuracy: 0.8839
Precision: 0.8870
Recall: 0.8152
F1 Score: 0.8481

Multinomial Naive Bayes:
Accuracy: 0.7863
Precision: 0.7393
Recall: 0.7215
F1 Score: 0.7283

Gaussian Naive Bayes:
Accuracy: 0.8218
Precision: 0.7104
Recall: 0.9570
F1 Score: 0.8131

