### Q1. A company conducted a survey of its employees and found that 70% of the employees use the
### company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
### probability that an employee is a smoker given that he/she uses the health insurance plan?





The quantity asked for, the probability that an employee is a smoker given that he/she uses the health insurance plan, is stated directly in the problem, so no inversion with Bayes' theorem is needed.

Let:
- \( A \) be the event that an employee uses the company's health insurance plan.
- \( B \) be the event that an employee is a smoker.

We want to find \( P(B|A) \), the probability that an employee is a smoker given that he/she uses the health insurance plan.

Given:
- \( P(A) = 0.70 \) (probability that an employee uses the health insurance plan).
- \( P(B|A) = 0.40 \) (40% of the employees who use the plan are smokers).

The second statement is exactly the conditional probability we are looking for:

\[ P(B|A) = 0.40 \]

As a check, the definition of conditional probability gives the same value. The joint probability of using the plan and being a smoker is

\[ P(A \cap B) = P(B|A) \times P(A) = 0.40 \times 0.70 = 0.28 \]

so that

\[ P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{0.28}{0.70} = 0.40 \]

Bayes' theorem,

\[ P(B|A) = \frac{P(A|B) \times P(B)}{P(A)} \]

is consistent with this result but adds nothing here: it requires \( P(A|B) \) and \( P(B) \), neither of which is given, and its numerator \( P(A|B) \times P(B) \) is just \( P(A \cap B) = 0.28 \) whatever the smoking rate among employees who do not use the plan. Note that \( P(A|B) \) must not be confused with \( P(B|A) = 0.40 \); substituting one for the other leads to a wrong answer.

So, the probability that an employee is a smoker given that he/she uses the health insurance plan is \(0.40\) or \(40\%\).
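
A quick numerical check of the conditional probability, using only the two figures from the problem statement. The values 0.05, 0.10 and 0.50 for \( P(B|\neg A) \) are hypothetical; they are included only to illustrate that a consistently applied Bayes computation returns the same \( P(B|A) \) whatever the smoking rate among non-users.

```python
# Figures given in the problem statement
p_plan = 0.70               # P(A): employee uses the health insurance plan
p_smoker_given_plan = 0.40  # P(B|A): smoker among plan users

# Joint probability P(A and B) = P(B|A) * P(A)
p_joint = p_smoker_given_plan * p_plan

# Definition of conditional probability recovers P(B|A)
p_check = p_joint / p_plan


def bayes_round_trip(p_smoker_given_no_plan):
    """Apply Bayes' theorem for a hypothetical value of P(B|not A)."""
    p_smoker = p_joint + p_smoker_given_no_plan * (1 - p_plan)  # P(B), total probability
    p_plan_given_smoker = p_joint / p_smoker                    # P(A|B)
    return p_plan_given_smoker * p_smoker / p_plan              # P(B|A)


round_trips = [bayes_round_trip(q) for q in (0.05, 0.10, 0.50)]
print(round(p_joint, 4), round(p_check, 4), [round(r, 4) for r in round_trips])
```

Every entry of `round_trips` agrees with `p_check`, because the assumed \( P(B|\neg A) \) cancels out of the Bayes expression.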

### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

The main difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in the type of features they are designed to handle and the underlying probability distributions they assume.

1. **Type of Features**:
   - **Bernoulli Naive Bayes**: It is suitable for binary (Boolean) features that take the value 0 or 1, where each feature represents the presence or absence of a particular attribute. Like every Naive Bayes variant, it assumes that the features are conditionally independent given the class.
   - **Multinomial Naive Bayes**: It is suitable for features that represent counts or frequencies of events in a multinomial distribution. It is commonly used for text classification tasks, where features typically represent word counts or term frequencies.

2. **Underlying Probability Distribution**:
   - **Bernoulli Naive Bayes**: It models each feature, conditional on the class, as a Bernoulli variable whose parameter is the probability that the feature is present in that class.
   - **Multinomial Naive Bayes**: It models the whole feature vector, conditional on the class, as counts drawn from a single multinomial distribution over the features (for text, a distribution over the vocabulary), rather than fitting a separate distribution to each feature.

3. **Handling of Feature Values**:
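   - **Bernoulli Naive Bayes**: It considers only the presence or absence of a feature, ignoring how often it occurs. Absence is itself treated as evidence: a feature that does not occur in a sample contributes a factor of \( 1 - p \) to the likelihood, where \( p \) is the probability of that feature being present in the class.
   - **Multinomial Naive Bayes**: It takes into account how often each feature occurs. A feature with a count of zero contributes nothing to the likelihood, so absence is not explicitly penalised.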

4. **Use Cases**:
   - **Bernoulli Naive Bayes**: It is commonly used for text classification tasks where the presence or absence of certain words in a document is important (e.g., spam detection).
   - **Multinomial Naive Bayes**: It is also used for text classification tasks but is more suitable when the frequency of word occurrences is relevant (e.g., sentiment analysis).

In summary, the choice between Bernoulli Naive Bayes and Multinomial Naive Bayes depends on the nature of the features and the underlying probability distribution assumed for the data. Bernoulli Naive Bayes is suitable for binary features, while Multinomial Naive Bayes is suitable for features representing counts or frequencies.
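
The difference in how the two variants treat counts can be illustrated with a minimal sketch. The toy count matrix and labels below are made up for illustration; the sketch relies on `BernoulliNB` binarizing its input at its default threshold (`binarize=0.0`), while `MultinomialNB` uses the counts as they are.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy word-count matrix (rows: documents, columns: hypothetical vocabulary terms)
X_counts = np.array([[3, 0, 1],
                     [2, 0, 0],
                     [0, 4, 1],
                     [0, 1, 2]], dtype=float)
y = np.array([0, 0, 1, 1])
X_binary = (X_counts > 0).astype(float)  # presence/absence only

# BernoulliNB binarizes its input, so counts and presence flags
# lead to the same fitted per-class feature probabilities.
bnb_counts = BernoulliNB().fit(X_counts, y)
bnb_binary = BernoulliNB().fit(X_binary, y)
bernoulli_same = np.allclose(bnb_counts.feature_log_prob_,
                             bnb_binary.feature_log_prob_)

# MultinomialNB uses the counts themselves, so the two fits differ.
mnb_counts = MultinomialNB().fit(X_counts, y)
mnb_binary = MultinomialNB().fit(X_binary, y)
multinomial_same = np.allclose(mnb_counts.feature_log_prob_,
                               mnb_binary.feature_log_prob_)

print("BernoulliNB identical on counts vs. flags:", bernoulli_same)
print("MultinomialNB identical on counts vs. flags:", multinomial_same)
```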

### Q3. How does Bernoulli Naive Bayes handle missing values?

Bernoulli Naive Bayes has no built-in mechanism for missing values. The model expects every feature to be 0 or 1, and scikit-learn's `BernoulliNB`, for example, rejects input that contains NaN. Missing entries therefore have to be dealt with before or around the classifier. Common approaches are:

1. **Treat Missing Values as a Separate Category**:
   - Because each Bernoulli feature can only be 0 or 1, "missing" cannot be added as a third value of the same feature. Instead, an extra binary indicator feature is created that is 1 when the original value was missing and 0 otherwise. The classifier then learns, per class, how likely the value is to be missing, which is useful when the missingness itself carries information.

2. **Imputation with a Specific Value**:
   - Alternatively, missing values can be imputed before training. Common strategies include replacing them with the mode (most frequent value) of the feature or with a fixed value such as 0. A placeholder such as -1 is not meaningful for a binary feature: `BernoulliNB` binarizes its input at a threshold (0.0 by default), so -1 would simply be treated as 0. After imputation, the classifier is trained on the imputed dataset.

3. **Omit the Missing Feature from the Likelihood**:
   - Since Naive Bayes multiplies one independent term per feature, the term of a missing feature can be left out of the product for that instance, which amounts to marginalising over the unknown value. This is principled, but standard library implementations generally do not offer it, so it requires custom code.

Each approach has advantages and drawbacks. An indicator feature preserves the information that a value was absent, at the cost of additional features. Imputation integrates easily with existing implementations but may introduce bias if the imputed values do not represent the missing data well. Omitting the term avoids inventing values but is only justified when the data are missing for reasons unrelated to the class. The choice depends on the characteristics of the dataset and the goals of the analysis.
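
The indicator and imputation ideas above can be combined in a preprocessing step. The following is a minimal sketch under stated assumptions: the toy data are made up, missing entries are encoded as `np.nan`, and scikit-learn's `SimpleImputer` (with `add_indicator=True`) is used as one possible way to do the preprocessing, not as the only one.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy binary data with missing entries (np.nan) in columns 0 and 2
X = np.array([[1, 0, np.nan],
              [0, 1, 1],
              [np.nan, 1, 0],
              [1, 0, 1],
              [0, 0, 0],
              [1, 1, 1]])
y = np.array([1, 0, 0, 1, 0, 1])

# Mode imputation plus one "was missing" indicator column
# for each feature that had missing values during fit.
imputer = SimpleImputer(strategy="most_frequent", add_indicator=True)
X_filled = imputer.fit_transform(X)

# Same preprocessing chained in front of the classifier
model = make_pipeline(
    SimpleImputer(strategy="most_frequent", add_indicator=True),
    BernoulliNB(),
)
model.fit(X, y)
predictions = model.predict(X)

print(X_filled.shape)
print(predictions)
```

Wrapping the imputer and the classifier in one pipeline means the imputation statistics are learned from the training folds only when the model is cross-validated.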

### Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes. Gaussian Naive Bayes handles multi-class classification natively and does not need a "one-vs-all" (also known as "one-vs-rest") or "one-vs-one" wrapper. Those decomposition schemes exist for classifiers that are inherently binary; Naive Bayes instead models every class directly.

Here's how it works for \( K \) classes:

1. **Training**:
   - For each class \( k \), the classifier estimates the prior \( P(y = k) \) from the class frequencies, together with a mean \( \mu_{k,i} \) and a variance \( \sigma_{k,i}^2 \) for every feature \( i \), using only the training samples of that class.

2. **Prediction**:
   - For a new instance \( x \), it evaluates the unnormalised posterior \( P(y = k) \prod_i \mathcal{N}(x_i; \mu_{k,i}, \sigma_{k,i}^2) \) for every class \( k \) and assigns the class with the largest value.

Binary classification is simply the case \( K = 2 \). The cost grows linearly with the number of classes, since each additional class adds one prior and one set of per-feature means and variances.
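
As a minimal sketch, scikit-learn's `GaussianNB` can be fitted on a three-class problem directly, with no wrapper around it. The iris dataset bundled with scikit-learn is used here purely as a convenient three-class example; it is not part of the assignment.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has three classes and four numeric features
X, y = load_iris(return_X_y=True)

# A single GaussianNB model is fitted for all classes at once
clf = GaussianNB().fit(X, y)

# One column of posterior probabilities per class
proba = clf.predict_proba(X[:5])

print(clf.classes_)       # class labels seen during fit
print(clf.theta_.shape)   # per-class feature means: (n_classes, n_features)
print(proba.shape)        # (n_samples, n_classes)
```

The fitted model stores one row of means per class in `theta_`, and `predict_proba` returns one posterior per class for each sample.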

### Q5. Assignment:
### Data preparation:
### Download the "Spambase Data Set" from the UCI Machine Learning Repository
### (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is
### to predict whether a message is spam or not based on several input features.
### Implementation:
### Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
### scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
### dataset. You should use the default hyperparameters for each classifier.
### Results:
### Report the following performance metrics for each classifier:
### Accuracy
### Precision
### Recall
### F1 score
### Discussion:
### Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
### the case? Are there any limitations of Naive Bayes that you observed?
### Conclusion:
### Summarise your findings and provide some suggestions for future work.

In [1]:
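import pandas as pd
# NOTE: the raw UCI file has no header row; without header=None, pandas
# uses the first record as column labels (pass header=None to keep it as data).
df = pd.read_csv("spambase.csv")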

In [2]:
df

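| | 0 | 0.64 | 0.64.1 | 0.1 | 0.32 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | ... | 0.40 | 0.41 | 0.42 | 0.778 | 0.43 | 0.44 | 3.756 | 61 | 278 | 1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.21 | 0.28 | 0.50 | 0.0 | 0.14 | 0.28 | 0.21 | 0.07 | 0.00 | 0.94 | ... | 0.000 | 0.132 | 0.0 | 0.372 | 0.180 | 0.048 | 5.114 | 101 | 1028 | 1 |
| 1 | 0.06 | 0.00 | 0.71 | 0.0 | 1.23 | 0.19 | 0.19 | 0.12 | 0.64 | 0.25 | ... | 0.010 | 0.143 | 0.0 | 0.276 | 0.184 | 0.010 | 9.821 | 485 | 2259 | 1 |
| 2 | 0.00 | 0.00 | 0.00 | 0.0 | 0.63 | 0.00 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.000 | 0.137 | 0.0 | 0.137 | 0.000 | 0.000 | 3.537 | 40 | 191 | 1 |
| 3 | 0.00 | 0.00 | 0.00 | 0.0 | 0.63 | 0.00 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.000 | 0.135 | 0.0 | 0.135 | 0.000 | 0.000 | 3.537 | 40 | 191 | 1 |
| 4 | 0.00 | 0.00 | 0.00 | 0.0 | 1.85 | 0.00 | 0.00 | 1.85 | 0.00 | 0.00 | ... | 0.000 | 0.223 | 0.0 | 0.000 | 0.000 | 0.000 | 3.000 | 15 | 54 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4595 | 0.31 | 0.00 | 0.62 | 0.0 | 0.00 | 0.31 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.000 | 0.232 | 0.0 | 0.000 | 0.000 | 0.000 | 1.142 | 3 | 88 | 0 |
| 4596 | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.000 | 0.000 | 0.0 | 0.353 | 0.000 | 0.000 | 1.555 | 4 | 14 | 0 |
| 4597 | 0.30 | 0.00 | 0.30 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.102 | 0.718 | 0.0 | 0.000 | 0.000 | 0.000 | 1.404 | 6 | 118 | 0 |
| 4598 | 0.96 | 0.00 | 0.00 | 0.0 | 0.32 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.000 | 0.057 | 0.0 | 0.000 | 0.000 | 0.000 | 1.147 | 5 | 78 | 0 |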
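
The evaluation step requested in the assignment could be sketched as follows. This is one possible arrangement rather than a definitive implementation: the helper name `evaluate_naive_bayes` is hypothetical, the last column of `df` is assumed to hold the spam label, and shuffled stratified folds with a fixed seed are an assumption made here because the rows appear to be sorted by label. All three classifiers keep their default hyperparameters.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB


def evaluate_naive_bayes(X, y, n_splits=10, seed=0):
    """Return mean accuracy, precision, recall and F1 per classifier (hypothetical helper)."""
    # Shuffled, stratified folds (an assumption; the assignment only asks for 10-fold CV)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    models = {
        "BernoulliNB": BernoulliNB(),
        "MultinomialNB": MultinomialNB(),  # requires non-negative features
        "GaussianNB": GaussianNB(),
    }
    metrics = ["accuracy", "precision", "recall", "f1"]
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=metrics)
        # cross_validate reports each metric under the key "test_<metric>"
        results[name] = {m: float(np.mean(scores[f"test_{m}"])) for m in metrics}
    return results


# Example call, assuming the last column of df holds the spam label:
# X, y = df.iloc[:, :-1].to_numpy(), df.iloc[:, -1].to_numpy()
# for name, scores in evaluate_naive_bayes(X, y).items():
#     print(name, scores)
```

Precision, recall and F1 are computed for the positive class (label 1, spam), which is the default behaviour of these scikit-learn scorers for binary targets.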
