## Q1. 
### A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

The probability that an employee is a smoker given that he/she uses the health insurance plan can be calculated using the conditional probability formula:

\[ P(\text{Smoker} | \text{Uses Health Insurance}) = \frac{P(\text{Smoker and Uses Health Insurance})}{P(\text{Uses Health Insurance})} \]

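From the information provided:
- The probability that an employee uses the health insurance plan is 70%, denoted as \( P(\text{Uses Health Insurance}) = 0.70 \).
- Among the employees who use the plan, 40% are smokers. This figure is already conditioned on plan use, so it is \( P(\text{Smoker} | \text{Uses Health Insurance}) = 0.40 \), not the joint probability \( P(\text{Smoker and Uses Health Insurance}) \).

The joint probability follows from the multiplication rule:

\[ P(\text{Smoker and Uses Health Insurance}) = P(\text{Smoker} | \text{Uses Health Insurance}) \cdot P(\text{Uses Health Insurance}) = 0.40 \times 0.70 = 0.28 \]

Now, plug these values into the formula:

\[ P(\text{Smoker} | \text{Uses Health Insurance}) = \frac{0.28}{0.70} = 0.40 \]

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.40 (40%). The answer is stated directly in the question; the formula only confirms it. Dividing 0.40 by 0.70 would be wrong, because it treats a conditional probability as a joint one.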
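The arithmetic can be checked with a few lines of Python. The variable names are illustrative, and `Fraction` is used so the results are exact rather than floating-point approximations.

```python
from fractions import Fraction

# Given in the question
p_uses = Fraction(70, 100)               # P(Uses Health Insurance)
p_smoker_given_uses = Fraction(40, 100)  # P(Smoker | Uses Health Insurance)

# Multiplication rule: P(Smoker and Uses) = P(Smoker | Uses) * P(Uses)
p_smoker_and_uses = p_smoker_given_uses * p_uses

# Conditional probability formula: P(Smoker | Uses) = P(Smoker and Uses) / P(Uses)
p_conditional = p_smoker_and_uses / p_uses

print(p_smoker_and_uses, p_conditional)  # 7/25 2/5
```

Here 7/25 is 0.28 and 2/5 is 0.40, matching the calculation above.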

## Q2. 
### What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

The key difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in the type of data they are designed to handle and the assumptions they make about the distribution of the features.

**1. Data Type:**

- **Bernoulli Naive Bayes:**
  - Designed for binary or Boolean data.
  - Each feature is treated as a binary variable, indicating the presence or absence of a particular attribute.
  - Commonly used for document classification tasks where the focus is on whether certain words are present in a document (a binary bag-of-words representation).
  
- **Multinomial Naive Bayes:**
  - Designed for discrete data, often used for text classification tasks.
  - Suitable for problems where features represent counts or frequencies (e.g., word counts in a document).

**2. Feature Representation:**

- **Bernoulli Naive Bayes:**
  - Features are represented as binary values (0 or 1) indicating the absence or presence of a particular attribute.
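  - Explicitly models absence: an absent feature contributes its probability of being absent, \(1 - p\), to the likelihood, so absent words count as evidence alongside present ones.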

- **Multinomial Naive Bayes:**
  - Features are represented as counts or frequencies.
  - Well-suited for problems where how often a feature occurs is important, such as word counts in natural language processing. Like Bernoulli Naive Bayes, it ignores the order in which features occur (bag-of-words).

**3. Independence Assumption:**

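- Both Bernoulli and Multinomial Naive Bayes assume the features are conditionally independent given the class label. They differ in the per-feature likelihood: Bernoulli Naive Bayes scores each feature as present (probability \(p_i\)) or absent (probability \(1 - p_i\)), whereas Multinomial Naive Bayes raises each feature's probability \(\theta_i\) to the power of its count, so a feature with count 0 contributes nothing.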
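To make the difference concrete, here is a minimal stdlib-only sketch of the per-class log-likelihood each model assigns to one document. The function names, the three-word vocabulary and the probabilities `p` and `theta` are hypothetical values chosen for illustration, not parameters learned from data.

```python
import math

def bernoulli_log_likelihood(counts, p):
    """log P(x | class) under Bernoulli NB.

    counts are reduced to presence (count > 0) or absence; p[i] is the
    class-conditional probability that feature i is present.
    """
    total = 0.0
    for c, p_i in zip(counts, p):
        # A present feature contributes log p_i, an absent one log(1 - p_i)
        total += math.log(p_i) if c > 0 else math.log(1.0 - p_i)
    return total

def multinomial_log_likelihood(counts, theta):
    """log P(x | class) under Multinomial NB, without the multinomial coefficient.

    theta[i] is the class-conditional probability of feature i (sum(theta) == 1).
    The coefficient is the same for every class, so it does not change the
    predicted class and is dropped here.
    """
    # Each occurrence contributes log theta_i; features with count 0 add nothing
    return sum(c * math.log(t) for c, t in zip(counts, theta))

# Hypothetical 3-word vocabulary and one document's word counts
counts = [2, 0, 1]
bern = bernoulli_log_likelihood(counts, p=[0.8, 0.3, 0.5])
mult = multinomial_log_likelihood(counts, theta=[0.5, 0.2, 0.3])
print(round(math.exp(bern), 4), round(math.exp(mult), 4))  # 0.28 0.075
```

The absent second word lowers the Bernoulli likelihood through the factor \(1 - 0.3\) but leaves the multinomial one untouched; conversely, only the multinomial score changes if the first word occurs 5 times instead of 2.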

In summary, the choice between Bernoulli Naive Bayes and Multinomial Naive Bayes depends on the nature of the data and the specific requirements of the classification problem. If the features are binary and the focus is on presence/absence, Bernoulli Naive Bayes may be more appropriate. If the features are counts or frequencies and how often a feature occurs matters, Multinomial Naive Bayes is usually the better choice.

## Q3. 
### How does Bernoulli Naive Bayes handle missing values?

Bernoulli Naive Bayes has no single built-in rule for missing values; the strategy depends on the implementation and on the specific requirements of the problem. Here are two common ways to handle missing values in Bernoulli Naive Bayes:

1. **Treating Missing Values as a Separate Category:**
   - One approach is to treat a missing value as a third state of the feature, alongside present and absent. The model then estimates the class-conditional probability of each state (present, absent, missing) separately, so the fact that a value is missing can itself carry information about the class. Strictly speaking, this turns the binary feature into a three-valued categorical one.

2. **Folding Missing Values into an Existing State:**
   - Another approach is to merge missing values into one of the two existing states, most commonly treating "missing" as "absent" (0). The feature stays binary, and the model estimates the probability of each class given only presence or absence, with missing entries counted as absences. This is simple, but it assumes that an unobserved attribute behaves like an absent one.
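Both approaches can be sketched as preprocessing steps in plain Python. The function names are hypothetical, and `None` is assumed to mark a missing value. Because a purely binary model has only two states, the first approach is approximated here with an extra 0/1 "is missing" column per feature; scikit-learn's `CategoricalNB`, with missing encoded as its own category, would model the three states directly.

```python
def fold_missing_into_absent(rows):
    """Approach 2: treat a missing value (None) as absent, keeping one 0/1 column per feature."""
    return [[1 if v == 1 else 0 for v in row] for row in rows]

def add_missing_indicators(rows):
    """Approach 1 (binary approximation): follow each feature with a 0/1
    'is missing' column, so present, absent and missing map to the bit pairs
    (1, 0), (0, 0) and (0, 1) respectively."""
    encoded_rows = []
    for row in rows:
        encoded = []
        for v in row:
            encoded.append(1 if v == 1 else 0)     # presence bit (missing -> 0)
            encoded.append(1 if v is None else 0)  # missing-indicator bit
        encoded_rows.append(encoded)
    return encoded_rows

# Hypothetical binary data; None marks a missing value
rows = [[1, None, 0],
        [None, 1, 1]]
print(fold_missing_into_absent(rows))  # [[1, 0, 0], [0, 1, 1]]
print(add_missing_indicators(rows))    # [[1, 0, 0, 1, 0, 0], [0, 1, 1, 0, 1, 0]]
```

Either output contains only 0s and 1s and can therefore be fed to a Bernoulli Naive Bayes model. Note that the indicator encoding treats the two bits of a feature as conditionally independent even though they are not, which is the price of staying within a binary model.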

The choice between these approaches depends on the nature of the data and the specific requirements of the classification problem. It's essential to consider the impact of missing values on the model's performance and the assumptions made about the missing data.

In practice, the handling of missing values also depends on the library. scikit-learn's `BernoulliNB`, for example, rejects input containing NaN, so missing values must be encoded before fitting, for instance with `sklearn.impute.SimpleImputer`, whose `add_indicator=True` option appends a binary missing-indicator column for each feature that had missing values.

## Q4. 
### Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification. It is the variant of Naive Bayes that assumes each continuous-valued feature follows a Gaussian (normal) distribution within each class. Like the Bernoulli and Multinomial variants, which handle discrete data, it is natively multi-class: nothing in the model is tied to two classes.

In the multi-class setting, no extension such as one-vs-rest is required. The algorithm estimates the parameters (mean and variance) of the Gaussian distribution for each feature within each class, however many classes there are. When making a prediction for a new instance, it calculates the likelihood of the observed feature values under each class's Gaussian distributions, combines it with that class's prior probability, and picks the class with the highest posterior probability.

The general formula for Gaussian Naive Bayes in the context of multi-class classification is as follows:

\[ P(C_k | x) = \frac{P(x | C_k) \cdot P(C_k)}{P(x)} \]

Here:
- \( P(C_k | x) \) is the posterior probability of class \(C_k\) given the features \(x\).
- \( P(x | C_k) \) is the likelihood of the features \(x\) given class \(C_k\); under the naive independence assumption it is the product of one Gaussian density per feature, each using that feature's mean and variance within class \(C_k\).
- \( P(C_k) \) is the prior probability of class \(C_k\).
- \( P(x) \) is the marginal probability of the features \(x\), and it serves as a normalizing constant.
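Written out, the Gaussian likelihood and the resulting decision rule look as follows, where \( \mu_{k,i} \) and \( \sigma_{k,i}^2 \) denote the mean and variance of feature \( x_i \) estimated from the training examples of class \( C_k \), and \( n \) is the number of features (notation introduced here for illustration). Because \( P(x) \) is the same for every class, it can be dropped when choosing the most likely class, and logarithms are taken to avoid numerical underflow:

\[ P(x | C_k) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_{k,i}^2}} \exp\left( -\frac{(x_i - \mu_{k,i})^2}{2\sigma_{k,i}^2} \right) \]

\[ \hat{y} = \arg\max_{k} \left[ \log P(C_k) + \sum_{i=1}^{n} \log P(x_i | C_k) \right] \]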

In summary, Gaussian Naive Bayes handles multi-class classification natively: it estimates Gaussian distribution parameters for each feature within each class and assigns a new instance to the class with the highest posterior probability.
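A from-scratch sketch makes the multi-class behaviour visible: nothing below depends on the number of classes. It is a minimal stdlib-only illustration, not scikit-learn's implementation; `fit_gaussian_nb`, `predict`, the `eps` smoothing constant and the toy data are all hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate each class's prior and each feature's per-class mean and variance.

    eps is a small constant added to every variance to avoid division by zero
    (a simplification of scikit-learn's var_smoothing).
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n_k = len(rows)
        columns = list(zip(*rows))  # one tuple of values per feature
        means = [sum(col) / n_k for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / n_k + eps
                     for col, m in zip(columns, means)]
        model[label] = (n_k / len(X), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log posterior (P(x) is dropped)."""
    def log_posterior(params):
        prior, means, variances = params
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
            for xi, mu, var in zip(x, means, variances))
    return max(model, key=lambda label: log_posterior(model[label]))

# Hypothetical one-feature data with three classes
X = [[1.0], [1.2], [0.8], [5.0], [5.2], [4.8], [9.0], [9.2], [8.8]]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
model = fit_gaussian_nb(X, y)
print([predict(model, x) for x in ([1.1], [5.1], [8.9])])  # ['a', 'b', 'c']
```

scikit-learn's `GaussianNB` follows the same scheme and exposes the fitted per-class priors, means and variances as `class_prior_`, `theta_` and `var_`.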

## Q5. Assignment:

### Data preparation:

Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset describes email messages by 57 numeric input features, and the goal is to predict from these features whether a message is spam or not.

#### Implementation:

Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.

#### Results:

Report the following performance metrics for each classifier:

- Accuracy
- Precision
- Recall
- F1 score
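For reference, here is a minimal stdlib-only sketch of what the four metrics compute for a binary target, assuming spam (label 1) is the positive class, as in scikit-learn's default `pos_label=1`. The function name and the labels are hypothetical; when a denominator is zero, the sketch returns 0.0, which is roughly what scikit-learn does by default (it also emits a warning).

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem (positive = spam)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted spam that is spam
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual spam that was caught
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of the two
    return accuracy, precision, recall, f1

# Hypothetical labels (1 = spam, 0 = not spam)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
print([round(m, 4) for m in binary_metrics(y_true, y_pred)])  # [0.625, 0.6, 0.75, 0.6667]
```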

**Discussion:**


Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?


**Conclusion:**

Summarise your findings and provide some suggestions for future work.

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

# Load the dataset. spambase.data has no header row: the first 57 columns are
# numeric features and the last column is the label (1 = spam, 0 = not spam).
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
data = pd.read_csv(url, header=None)

# Split the data into features (X) and target variable (y) by column position
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Initialize classifiers with their default hyperparameters
classifiers = {
    'Bernoulli Naive Bayes': BernoulliNB(),
    'Multinomial Naive Bayes': MultinomialNB(),
    'Gaussian Naive Bayes': GaussianNB(),
}

# Evaluation metrics (scikit-learn scorer names)
metrics = ['accuracy', 'precision', 'recall', 'f1']

# One shuffled, stratified 10-fold splitter shared by all classifiers,
# so every model is evaluated on identical folds
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# Perform 10-fold cross-validation for each classifier; cross_validate
# computes all four metrics in a single pass over the folds
for name, classifier in classifiers.items():
    scores = cross_validate(classifier, X, y, cv=cv, scoring=metrics)
    for metric in metrics:
        print(f'{name} - {metric}: {scores["test_" + metric].mean():.4f}')

# Discuss the results, limitations, and conclusion
# ...
```


### Completed_10th_April_Assignment:
## __________________________