### Q1. A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
probability that an employee is a smoker given that he/she uses the health insurance plan?

The question asks for P(smoker | uses plan), and that quantity is given directly: 40% of the employees who use the plan are smokers. Bayes' theorem is not needed here, because it is used to reverse a conditional probability and no reversal is required.

Let event A be "being a smoker" and event B be "using the health insurance plan". We are given that:

P(B) = 0.7 (70% of employees use the health insurance plan)
P(A|B) = 0.4 (40% of employees who use the plan are smokers)

As a check, the definition of conditional probability gives the same value. The fraction of all employees who both use the plan and smoke is:

P(A and B) = P(A|B) * P(B)
= 0.4 * 0.7
= 0.28

and therefore:

P(A|B) = P(A and B) / P(B)
= 0.28 / 0.7
= 0.4

Note that the overall smoking rate P(A) and the reverse conditional P(B|A) cannot be determined from the given data, since nothing is said about employees who do not use the plan. No assumed value for P(A|not B) is needed to answer the question.

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4, or 40%.
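A quick numerical check can make the role of each given figure explicit. The sketch below reads the 40% figure as P(smoker | uses plan) and applies the definition of conditional probability; the variable names are illustrative only.

```python
# Given quantities from the question
p_plan = 0.70               # P(B): employee uses the health insurance plan
p_smoker_given_plan = 0.40  # P(A|B): smoker, among employees who use the plan

# Joint probability: fraction of all employees who use the plan AND smoke
p_smoker_and_plan = p_smoker_given_plan * p_plan

# Definition of conditional probability: P(A|B) = P(A and B) / P(B)
recovered = p_smoker_and_plan / p_plan

print(f"P(A and B) = {p_smoker_and_plan:.2f}")
print(f"P(A|B)     = {recovered:.2f}")
```

Dividing the joint probability by P(B) simply returns the conditional probability that was given, which is why no further information is required.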

### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

Bernoulli Naive Bayes and Multinomial Naive Bayes are two variants of the Naive Bayes algorithm that are commonly used in text classification and natural language processing.

The main difference between these two algorithms is the type of input data they are designed to work with. Bernoulli Naive Bayes is used for binary or Boolean data, while Multinomial Naive Bayes is used for count-based data.

Bernoulli Naive Bayes assumes that each feature in the input data is binary or Boolean, meaning that it takes on one of two values (e.g., 0 or 1, false or true, etc.). It is commonly used for text classification tasks where the presence or absence of certain words or features is used as input. In this case, each document is represented as a binary vector, where each element corresponds to the presence or absence of a particular word or feature.

On the other hand, Multinomial Naive Bayes is used when the input data is represented as count-based features. In text classification, this means that each document is represented as a vector of word counts, where each element corresponds to the number of times a particular word appears in the document. This algorithm assumes that the features are discrete and follow a multinomial distribution, hence the name "Multinomial" Naive Bayes.

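In summary, the key difference between Bernoulli Naive Bayes and Multinomial Naive Bayes is the type of input data they are designed to handle. Bernoulli Naive Bayes works with binary data, while Multinomial Naive Bayes works with count-based data. A second, related difference is that Bernoulli Naive Bayes explicitly models the absence of a feature (the term P(xi = 0 | y) contributes to the likelihood), whereas Multinomial Naive Bayes only accumulates evidence from the features that actually occur.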
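The two input representations can be compared side by side on a toy corpus. The following is a minimal sketch, assuming scikit-learn's `CountVectorizer` for feature extraction; the four documents and their labels are made up for illustration and are not part of the assignment data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy corpus: 1 = spam, 0 = not spam
docs = [
    "free money free offer",
    "meeting agenda attached",
    "free offer now",
    "project meeting notes",
]
labels = np.array([1, 0, 1, 0])

# Count features: how many times each word occurs (input for MultinomialNB)
count_vec = CountVectorizer()
X_counts = count_vec.fit_transform(docs)

# Binary features: whether each word occurs at all (input for BernoulliNB)
binary_vec = CountVectorizer(binary=True)
X_binary = binary_vec.fit_transform(docs)

mnb = MultinomialNB().fit(X_counts, labels)
bnb = BernoulliNB().fit(X_binary, labels)

# The same new document is encoded differently for each model
new_doc = ["free free free offer"]
pred_mnb = mnb.predict(count_vec.transform(new_doc))
pred_bnb = bnb.predict(binary_vec.transform(new_doc))

print("Largest count feature: ", X_counts.max())
print("Largest binary feature:", X_binary.max())
print("MultinomialNB prediction:", pred_mnb[0])
print("BernoulliNB prediction:  ", pred_bnb[0])
```

In the count representation, the repeated word "free" is stored as 2 in the first document, while the binary representation caps every entry at 1.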

### Q3. How does Bernoulli Naive Bayes handle missing values?

Bernoulli Naive Bayes has no built-in mechanism for missing values: each feature is modelled as a binary variable, and scikit-learn's BernoulliNB does not accept input containing NaN. Missing values therefore have to be handled before or around the model. One common approach is to treat "missing" as a separate category. This means that instead of assuming that a missing value is either 0 or 1, we give the feature a third value, denoted as "?", for example. Strictly speaking, each feature is then a three-valued categorical variable rather than a Bernoulli one.

When we train the model, we estimate the probability of each class (i.e., the target variable) and, for each feature and class, the probability of the feature being equal to 0, 1, or "?".

During prediction, a missing feature contributes its estimated probability P(xi = ? | y) like any other value. Specifically, we calculate the probability of the instance belonging to each class as follows:

P(y | x1, ..., xn) ∝ P(y) * ∏ P(xi | y) for i = 1 to n

where:

y is the class (target variable)
xi is the value of the i-th feature (either 0, 1, or ?)
∏ is the product symbol

We compute this probability for each class and then select the class with the highest probability as the predicted class.

Note that this approach lets the model learn from the missingness itself, which is useful when the fact that a value is missing is informative about the class. If the values are missing completely at random (MCAR), an alternative is to leave the missing feature's term out of the product, which under the naive independence assumption marginalizes over the unknown value. Imputation before training is a further option.
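The "missing as a third category" idea can be sketched with scikit-learn's `CategoricalNB`, which models each feature as a categorical variable. This is one possible realisation under stated assumptions, not the only way to do it: the code `2` for "?", the toy data, and all variable names are chosen here for illustration.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

MISSING = 2  # assumed integer code standing in for "?"

# Hypothetical binary data with missing entries marked as NaN
X_raw = np.array([
    [1.0, 1.0, np.nan],
    [1.0, np.nan, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, np.nan, 1.0],
    [np.nan, 0.0, 1.0],
])
y = np.array([1, 1, 1, 0, 0, 0])

# Encode every feature with three categories: 0, 1, and MISSING
X_enc = np.where(np.isnan(X_raw), MISSING, X_raw).astype(int)

# min_categories=3 reserves a slot for MISSING in every feature, even if a
# particular feature has no missing values in the training data
model = CategoricalNB(min_categories=3).fit(X_enc, y)

# New instance: first feature is 1, the other two are missing
x_new = np.array([[1, MISSING, MISSING]])
pred = model.predict(x_new)
proba = model.predict_proba(x_new)

print("Predicted class:", pred[0])
print("Class probabilities:", proba[0])
```

With the default smoothing (`alpha=1.0`), categories that were never observed for a class still receive a small non-zero probability, so a rare "?" value does not zero out the whole product.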

### Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification. Gaussian Naive Bayes is a variant of the Naive Bayes algorithm that is used when the features are continuous and are assumed to follow a Gaussian (i.e., normal) distribution within each class.

No decomposition into binary problems is needed, because Naive Bayes is inherently a multi-class model. For a problem with k classes, we estimate a prior P(y) for each class and, for each feature and class, the mean and variance of a Gaussian fitted to the training instances of that class.

At prediction time, given a new instance, we compute a score for each of the k classes:

P(y | x1, ..., xn) ∝ P(y) * ∏ P(xi | y) for i = 1 to n

where P(xi | y) is the Gaussian density of the i-th feature under class y. We then select the class with the highest score as the predicted class.

Reduction schemes such as one-vs-one, which trains k*(k-1)/2 binary classifiers, or one-vs-all (OvA), which trains k binary classifiers, exist for models that are inherently binary. They can be wrapped around Gaussian Naive Bayes, but they are not required.

Overall, Gaussian Naive Bayes handles multi-class classification problems directly; binary classification is simply the special case k = 2.
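A short example on a three-class problem can illustrate this. The sketch below uses the Iris dataset bundled with scikit-learn as a stand-in (it is not part of this assignment) and fits a single `GaussianNB` model to all three classes at once.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris: 150 samples, 4 continuous features, 3 classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# One model covers all three classes
gnb = GaussianNB().fit(X_train, y_train)

# predict_proba returns one column of posterior probabilities per class
proba = gnb.predict_proba(X_test)

print("Classes:", gnb.classes_)
print("Per-class feature means, shape:", gnb.theta_.shape)
print("Posterior matrix, shape:", proba.shape)
print("Test accuracy:", gnb.score(X_test, y_test))
```

The fitted `theta_` array holds one row of feature means per class, which reflects the per-class Gaussian estimates used at prediction time.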

### Q5. Assignment:
    
Data preparation:
    
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase).
This dataset contains numeric features extracted from email messages, where the goal is to predict whether a
message is spam or not based on these input features.

Implementation:
    
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.

Results:
    
Report the following performance metrics for each classifier:
Accuracy
Precision
Recall
F1 score

Discussion:
    
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?

Conclusion:
    
Summarise your findings and provide some suggestions for future work.

Note: This dataset contains a binary classification problem with multiple features. The dataset is
relatively small, but it can be used to demonstrate the performance of the different variants of Naive
Bayes on a real-world problem.


Code for implementing Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using scikit-learn in Python, and evaluating their performance on the Spambase dataset:

import pandas as pd
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB

# Load the dataset (no header row; the last column is the spam label)
data = pd.read_csv('spambase.data', header=None)

# Split the dataset into features and labels
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Initialize the Bernoulli Naive Bayes classifier
bnb = BernoulliNB()

# Evaluate accuracy using 10-fold cross-validation
scores = cross_val_score(bnb, X, y, cv=10)

# Fit on the full dataset and predict once for the remaining metrics.
# Note: these predictions are in-sample, so the precision, recall, and F1
# values below are training-set scores, not cross-validated ones.
bnb.fit(X, y)
y_pred = bnb.predict(X)

# Report the performance metrics
print("Bernoulli Naive Bayes Classifier:")
print("Accuracy:", scores.mean())
print("Precision:", precision_score(y, y_pred))
print("Recall:", recall_score(y, y_pred))
print("F1 Score:", f1_score(y, y_pred))


Bernoulli Naive Bayes Classifier:
Accuracy: 0.8839380364047911
Precision: 0.8860911270983214
Recall: 0.815223386651958
F1 Score: 0.8491812697500718
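The assignment asks for all four metrics under 10-fold cross-validation for each of the three classifiers. One way to obtain them in a single pass is scikit-learn's `cross_validate` with several scorers. The sketch below is a possible approach, not the code that produced the numbers above: `evaluate_naive_bayes` is a hypothetical helper, and the Poisson-distributed demo data is a synthetic stand-in so the block runs without the dataset file.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

SCORING = ["accuracy", "precision", "recall", "f1"]


def evaluate_naive_bayes(X, y, cv=10):
    """Return mean cross-validated metrics for each Naive Bayes variant."""
    models = {
        "Bernoulli": BernoulliNB(),
        "Multinomial": MultinomialNB(),  # requires non-negative features
        "Gaussian": GaussianNB(),
    }
    results = {}
    for name, model in models.items():
        # Every metric is computed on the held-out fold of each split
        scores = cross_validate(model, X, y, cv=cv, scoring=SCORING)
        results[name] = {m: scores[f"test_{m}"].mean() for m in SCORING}
    return results


# Synthetic stand-in data (NOT Spambase): non-negative count-like features
# whose rate depends on the class label
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=200)
X_demo = rng.poisson(lam=1.0 + 2.0 * y_demo[:, None], size=(200, 5)).astype(float)

demo_results = evaluate_naive_bayes(X_demo, y_demo)
for name, metrics in demo_results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```

To apply this to the assignment, the `X` and `y` loaded from `spambase.data` can be passed to `evaluate_naive_bayes` in place of the demo arrays. The resulting precision, recall, and F1 values are averages over held-out folds, so they are expected to differ from scores computed on the training data.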
