## Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?


The question already states the quantity it asks for. "40% of the employees who use the plan are smokers" describes smokers among plan users, which is exactly the conditional probability P(smoker | uses plan). No further calculation is needed:

P(smoker | uses plan) = 0.4

The other figure, P(uses plan) = 0.7, is not needed for this answer. It would be needed for the joint probability that an employee both uses the plan and smokes:

P(smoker and uses plan) = P(smoker | uses plan) * P(uses plan) = 0.4 * 0.7 = 0.28

Bayes' theorem,

P(smoker | uses plan) = P(uses plan | smoker) * P(smoker) / P(uses plan)

would only be required if we were given P(uses plan | smoker) and P(smoker) instead. The question provides neither of these, and they should not be assumed.

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4.
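
As a quick numeric check on the figures given in the question, the sketch below counts employees in a hypothetical workforce of 1,000 (the headcount is an assumption, chosen only so the counts are whole numbers) and recovers the conditional probability from the definition P(A | B) = P(A and B) / P(B).

```python
from fractions import Fraction

# Figures stated in the question, kept as exact fractions.
p_uses_plan = Fraction(7, 10)           # P(uses plan)
p_smoker_given_plan = Fraction(4, 10)   # P(smoker | uses plan)

# Hypothetical headcount (an assumption for illustration only).
n_employees = 1000
n_plan_users = p_uses_plan * n_employees                    # 700 employees
n_smoking_plan_users = p_smoker_given_plan * n_plan_users   # 280 employees

# Joint probability: smokers who use the plan, out of all employees.
p_joint = n_smoking_plan_users / n_employees

# Definition of conditional probability: P(A | B) = P(A and B) / P(B).
p_check = p_joint / p_uses_plan

print(float(p_joint))   # 0.28
print(float(p_check))   # 0.4
```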

## Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?


Bernoulli Naive Bayes and Multinomial Naive Bayes are both variants of Naive Bayes for discrete features, and both are widely used for text classification. Either one can handle binary or multiclass problems. The main difference between them is how they represent the input data and model each feature given the class:

- Bernoulli Naive Bayes assumes that each feature is binary (i.e., present or absent) and models it with a Bernoulli distribution per class. The absence of a feature counts explicitly as evidence. It is commonly used with binary word-occurrence vectors, for example in spam filtering and the classification of short documents.
- Multinomial Naive Bayes assumes that the features are counts (e.g., word frequencies in a document) and models them with a multinomial distribution per class. How often a feature occurs matters, not just whether it occurs. It is commonly used for text classification and sentiment analysis on word-count or TF-IDF features.
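
The difference in input representation can be seen on a toy example. The sketch below uses a made-up word-count matrix (the words and counts are assumptions for illustration) and fits both scikit-learn classifiers on it. Two documents that contain the same word a different number of times look identical to BernoulliNB but not to MultinomialNB.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy word-count matrix (made up for illustration): rows are documents,
# columns are counts of the hypothetical words "free", "meeting", "win".
X_counts = np.array([
    [3, 0, 2],
    [4, 0, 1],
    [0, 2, 0],
    [0, 3, 0],
    [1, 1, 0],
    [2, 0, 3],
])
y = np.array([1, 1, 0, 0, 0, 1])  # 1 = spam, 0 = not spam

# MultinomialNB uses the counts themselves.
mnb = MultinomialNB().fit(X_counts, y)

# BernoulliNB only uses presence/absence: binarize=0.0 (the default)
# maps every count greater than 0 to 1.
bnb = BernoulliNB(binarize=0.0).fit(X_counts, y)

# Two documents that differ only in how often "free" occurs.
doc_once = np.array([[1, 0, 0]])
doc_five = np.array([[5, 0, 0]])

print("Multinomial:", mnb.predict_proba(doc_once), mnb.predict_proba(doc_five))
print("Bernoulli:  ", bnb.predict_proba(doc_once), bnb.predict_proba(doc_five))
```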

## Q3. How does Bernoulli Naive Bayes handle missing values?


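Bernoulli Naive Bayes has no built-in mechanism for missing values. Note that a word being absent from a document is not a missing value: it is an observed 0, and the model uses that absence as evidence. A missing value is a feature whose presence or absence is unknown. scikit-learn's BernoulliNB rejects input that contains NaN, so missing entries have to be handled before training. Common options are to impute them (e.g., with the most frequent value of the feature), or to add a separate binary indicator feature that is set to 1 where the original value was missing and 0 otherwise. In a custom implementation, the independence assumption also makes it possible to simply leave the missing feature's term out of the product of likelihoods, so that a prediction can still be made from the observed features.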
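
One way to put imputation and a missing-value indicator in front of scikit-learn's BernoulliNB is sketched below. The data is made up, and the choice of `most_frequent` imputation is an assumption rather than a recommendation; `add_indicator=True` appends one extra binary column for each feature that had missing values during fitting.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Made-up binary data; np.nan marks entries whose value is unknown.
X = np.array([
    [1, 0, np.nan],
    [1, np.nan, 0],
    [0, 1, 1],
    [0, 1, np.nan],
    [1, 0, 0],
    [0, np.nan, 1],
])
y = np.array([1, 1, 0, 0, 1, 0])

# Step 1: fill NaN with the most frequent value of each column and append
#         "was missing" indicator columns.
# Step 2: fit BernoulliNB on the completed binary matrix.
model = make_pipeline(
    SimpleImputer(strategy="most_frequent", add_indicator=True),
    BernoulliNB(),
)
model.fit(X, y)

# The pipeline applies the same imputation to new samples at prediction time.
new_sample = np.array([[1, np.nan, 0]])
pred = model.predict(new_sample)
print(pred)
```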

## Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multiclass classification. For each class, every feature is modeled by its own Gaussian distribution, with the mean and variance estimated from the training samples of that class. The prediction is made by choosing the class with the highest posterior probability, which works the same way for two classes or for many.
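
A minimal sketch of the multiclass case, using the three-class Iris dataset that ships with scikit-learn (the dataset is an assumption for illustration and is not part of the question):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has 3 classes and 4 continuous features.
X, y = load_iris(return_X_y=True)

gnb = GaussianNB().fit(X, y)

# One mean per (class, feature) pair is learned.
print(gnb.classes_)
print(gnb.theta_.shape)   # (n_classes, n_features)

# Posterior probabilities over all classes for the first sample;
# predict() returns the class with the highest posterior.
proba = gnb.predict_proba(X[:1])
pred = gnb.predict(X[:1])
print(proba, pred)
```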

## Q5. Assignment:
## Data preparation:
## Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message is spam or not based on several input features.

## Implementation:
## Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.

## Results:
## Report the following performance metrics for each classifier:
- Accuracy
- Precision
- Recall
- F1 score

## Discussion:
## Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?

## Conclusion:
## Summarise your findings and provide some suggestions for future work.


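To evaluate the performance of each classifier, you can use scikit-learn's cross_val_score() function with cv=10 for 10-fold cross-validation. For a classifier, this function returns the accuracy for each fold by default, which you can use to calculate the mean and standard deviation of the accuracy across all folds. For precision, recall, and F1 score, you can either pass a scoring argument (e.g., scoring="precision"), or use cross_validate(), which accepts several metrics at once. Alternatively, you can obtain out-of-fold predictions with cross_val_predict() and pass them, together with the true labels, to classification_report(), which reports the precision, recall, and F1 score for each class (spam and not spam).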
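
The evaluation described above could be sketched as follows. The helper names `load_spambase` and `evaluate_naive_bayes` are hypothetical, and `load_spambase` assumes a local copy of the comma-separated `spambase.data` file with the label in the last column. So that the block runs without a download, it is demonstrated on synthetic non-negative data, which is only a stand-in and says nothing about the results on Spambase.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

METRICS = ["accuracy", "precision", "recall", "f1"]


def load_spambase(path):
    """Load a local spambase.data file (assumed: CSV, label in last column)."""
    data = np.loadtxt(path, delimiter=",")
    return data[:, :-1], data[:, -1].astype(int)


def evaluate_naive_bayes(X, y, cv=10):
    """Return the mean cross-validated metrics for each Naive Bayes variant."""
    models = {
        "BernoulliNB": BernoulliNB(),
        "MultinomialNB": MultinomialNB(),
        "GaussianNB": GaussianNB(),
    }
    results = {}
    for name, model in models.items():
        # cross_validate scores every metric on every fold in one pass.
        scores = cross_validate(model, X, y, cv=cv, scoring=METRICS)
        results[name] = {m: float(scores[f"test_{m}"].mean()) for m in METRICS}
    return results


# Synthetic stand-in data (NOT Spambase): non-negative count-like features
# whose rate depends on the label.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=300)
X_demo = rng.poisson(lam=1 + 2 * y_demo[:, None], size=(300, 5)).astype(float)

demo_results = evaluate_naive_bayes(X_demo, y_demo)
for name, metrics in demo_results.items():
    print(name, metrics)
```

With the real dataset, the last lines would become `X, y = load_spambase("spambase.data")` followed by `evaluate_naive_bayes(X, y)`, where the file path is whatever location the download was saved to.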

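The results of your evaluation will help determine which variant of Naive Bayes performed the best. Generally, Bernoulli Naive Bayes is a good choice if the dataset has binary features. The Spambase features are not binary: they consist of 48 word frequencies and 6 character frequencies expressed as percentages, plus 3 statistics on runs of capital letters. With its default setting (binarize=0.0), BernoulliNB converts each of them to present or absent, so it effectively uses only whether a word or character occurs in the message. Multinomial Naive Bayes is the better option for count-based features such as word frequencies, and it can be applied here because all Spambase features are non-negative, although they are percentages rather than raw counts. Gaussian Naive Bayes is typically used for continuous data, but it assumes that each feature is normally distributed within a class. Frequency features with many zeros are strongly skewed, so it may not be the best choice for this particular dataset.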

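Limitations of Naive Bayes include its assumption of conditional independence between features given the class, which rarely holds in real-world data. Correlated features are effectively counted more than once, which tends to push the predicted probabilities towards 0 or 1, so they are often poorly calibrated even when the predicted class is correct. Additionally, Naive Bayes cannot capture interactions between features, so it may not perform well when the class depends on combinations of features rather than on each feature individually. Feature values that never occur with a class in the training data would receive zero probability without smoothing (the alpha parameter in BernoulliNB and MultinomialNB).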
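
The effect of violating the independence assumption can be illustrated with a small experiment. The sketch below uses synthetic data (an assumption for illustration, unrelated to Spambase): one informative feature is copied three times, which adds no new information, yet Gaussian Naive Bayes treats the copies as independent evidence and becomes more confident.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic one-dimensional data: class 0 centred at 0, class 1 centred at 2.
rng = np.random.default_rng(0)
n = 200
x_class0 = rng.normal(loc=0.0, scale=1.0, size=n)
x_class1 = rng.normal(loc=2.0, scale=1.0, size=n)

X_single = np.concatenate([x_class0, x_class1]).reshape(-1, 1)
y = np.array([0] * n + [1] * n)

# Three perfectly correlated copies of the same feature.
X_dup = np.hstack([X_single, X_single, X_single])

# Posterior probability of class 1 for a point somewhat closer to class 1.
point = 1.5
p_single = GaussianNB().fit(X_single, y).predict_proba([[point]])[0, 1]
p_dup = GaussianNB().fit(X_dup, y).predict_proba([[point] * 3])[0, 1]

print("P(class 1) with one feature:     ", p_single)
print("P(class 1) with duplicated copies:", p_dup)
```

The second probability is further from 0.5 than the first, even though both models saw exactly the same information.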

Overall, it is important to consider the specific characteristics of the dataset and the problem at hand when choosing which variant of Naive Bayes to use. The results of your evaluation can help guide this decision and inform future work.