# Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

To find the probability that an employee is a smoker given that he/she uses the company's health insurance plan, we can use Bayes' theorem and the information provided in the problem.

### Given Data
- Let \( A \) be the event that an employee uses the health insurance plan.
- Let \( B \) be the event that an employee is a smoker.

From the problem:
- \( P(A) = 0.7 \) (70% of employees use the health insurance plan)
- \( P(B|A) = 0.4 \) (40% of employees who use the plan are smokers)

We want to find \( P(B|A) \), which is already given as 0.4.

### Conclusion

The probability that an employee is a smoker given that he/she uses the health insurance plan is:

\[
P(B|A) = 0.4 \text{ or } 40\%
\]

Therefore, the answer is **40%**.

# Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

Bernoulli Naive Bayes and Multinomial Naive Bayes are both variations of the Naive Bayes classifier, but they are suited for different types of data and have distinct assumptions about the feature distributions. Here’s a breakdown of the key differences:

### 1. **Feature Representation**
   - **Bernoulli Naive Bayes**:
     - Assumes binary features (0 or 1), where each feature represents the presence or absence of an attribute.
     - Often used for text classification tasks where features indicate whether a word appears in a document.
   - **Multinomial Naive Bayes**:
     - Assumes that features represent counts or frequencies of occurrences.
     - Suitable for text classification where features are the counts of words or tokens, allowing for multiple occurrences.

### 2. **Input Data Requirements**
   - **Bernoulli Naive Bayes**:
     - Requires binary data (features must be either 0 or 1).
     - For example, in a document classification task, if the word "apple" appears, it is represented as 1; otherwise, it is 0.
   - **Multinomial Naive Bayes**:
     - Can handle integer-valued features that represent counts.
     - For example, the word "apple" could appear 3 times in a document, which would be represented as a count of 3.

### 3. **Mathematical Formulation**
   - **Bernoulli Naive Bayes**:
     - The likelihood of a document given a class is calculated using the presence (1) or absence (0) of features:
       \[
       P(X|C) = \prod_{i=1}^{n} P(X_i | C)^{x_i}
       \]
       where \(x_i\) is 1 if feature \(i\) is present and 0 if absent.
   - **Multinomial Naive Bayes**:
     - The likelihood is calculated based on the counts of features:
       \[
       P(X|C) = \frac{(n!)!}{(\sum_{i=1}^{n} x_i)!} \prod_{i=1}^{n} P(X_i | C)^{x_i}
       \]
       where \(x_i\) is the count of feature \(i\).

### 4. **Application Context**
   - **Bernoulli Naive Bayes**:
     - Commonly used in situations where the presence or absence of features is important, like spam detection (words are either present or absent).
   - **Multinomial Naive Bayes**:
     - Often used in text classification tasks like document classification or sentiment analysis, where the frequency of words is more informative than their mere presence.

### Summary of Differences

| Feature                     | Bernoulli Naive Bayes       | Multinomial Naive Bayes      |
|-----------------------------|-----------------------------|-------------------------------|
| Feature Type                | Binary (0/1)                | Counts (integers)            |
| Data Input                  | Presence/absence of features | Frequencies of features       |
| Likelihood Calculation       | Based on presence/absence    | Based on counts              |
| Common Use Cases            | Spam detection, binary classification | Text classification, sentiment analysis |

### Conclusion

In summary, the choice between Bernoulli Naive Bayes and Multinomial Naive Bayes primarily depends on the nature of the input data. If the features are binary, use Bernoulli; if they represent counts or frequencies, use Multinomial.

# Q3. How does Bernoulli Naive Bayes handle missing values?

Bernoulli Naive Bayes, like other Naive Bayes classifiers, has a straightforward approach to handling missing values, primarily because it assumes independence between features and is based on probabilities. Here’s how it handles missing values:

### 1. **Ignoring Missing Features**
When a feature is missing for a given instance, Bernoulli Naive Bayes typically ignores that feature during the computation of probabilities. This means that if a feature is not present (i.e., the feature is missing), it does not contribute to the probability calculation for that instance. This behavior is largely due to the independence assumption inherent in Naive Bayes algorithms.

### 2. **Using Conditional Probabilities**
The algorithm will still calculate the probabilities of the remaining features that are present. For instance, if you have a feature vector with three features and one feature is missing, the algorithm will use the probabilities associated with the two remaining features to compute the class probabilities.

### 3. **Effect on Class Probability Calculation**
When calculating the class probabilities, missing values can affect the total count of occurrences for the available features. If a feature is missing, the associated conditional probability for that feature in a specific class will not be considered in the product of probabilities for that instance. This might result in slightly skewed probability estimates, but it is a reasonable approach given the assumptions of the model.

### 4. **Imputation (Optional)**
If the proportion of missing values is substantial or if you want to mitigate the impact of missing values on the classification performance, you can consider imputing missing values before applying the Bernoulli Naive Bayes classifier. Common imputation techniques include:
   - Replacing missing values with the mode of the feature (for categorical features).
   - Using statistical methods to estimate missing values based on other available data.

### Example Scenario
Assume you have a feature set as follows:

- Features: \(X_1\), \(X_2\), \(X_3\)
- Missing Value: \(X_2\)

When predicting the class for an instance with a missing value for \(X_2\), Bernoulli Naive Bayes would compute the class probability \(P(Class | X_1, X_3)\), completely ignoring \(X_2\). The likelihood is calculated as:
\[
P(Class | X_1, X_3) = P(Class) \cdot P(X_1 | Class) \cdot P(X_3 | Class)
\]

### Conclusion
In summary, Bernoulli Naive Bayes handles missing values by ignoring them during the probability calculations for the remaining features. This allows the classifier to still make predictions based on the available data, although it may not be the most robust approach if missing values are prevalent. Therefore, careful consideration of data preprocessing, including imputation strategies, can enhance model performance in the presence of missing data.

# Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification. In fact, Naive Bayes classifiers, including Gaussian Naive Bayes, are inherently designed to handle multiple classes without any additional modification. Here's how it works:

### How Gaussian Naive Bayes Handles Multi-Class Classification

1. **Multiple Class Labels**:
   - In a multi-class classification problem, the target variable can take on three or more distinct classes (e.g., Class 1, Class 2, Class 3).
   - Gaussian Naive Bayes calculates the probabilities for each class based on the features provided.

2. **Probability Calculation**:
   - For each class \(C_k\), Gaussian Naive Bayes calculates the likelihood of the features given that class using the Gaussian (normal) distribution.
   - The probability of each feature \(X_i\) given the class \(C_k\) is computed using the formula for the Gaussian distribution:
     \[
     P(X_i | C_k) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(X_i - \mu)^2}{2\sigma^2}\right)
     \]
     where \(\mu\) is the mean and \(\sigma^2\) is the variance of the feature \(X_i\) for the class \(C_k\).

3. **Class Prior Probabilities**:
   - The prior probability for each class \(P(C_k)\) is estimated from the training data by calculating the proportion of instances belonging to each class.

4. **Bayes' Theorem**:
   - Using Bayes' theorem, the posterior probability for each class given the features is computed:
     \[
     P(C_k | X) = \frac{P(C_k) \cdot P(X | C_k)}{P(X)}
     \]
   - The denominator \(P(X)\) is the same for all classes and can be ignored when comparing probabilities, allowing for the simplification:
     \[
     P(C_k | X) \propto P(C_k) \cdot P(X | C_k)
     \]

5. **Choosing the Class**:
   - Finally, the predicted class for a new instance is the one with the highest posterior probability:
     \[
     \text{Predicted Class} = \arg\max_{C_k} P(C_k | X)
     \]

### Advantages of Using Gaussian Naive Bayes for Multi-Class Classification

- **Simplicity**: Gaussian Naive Bayes is simple to implement and computationally efficient.
- **Assumption of Independence**: It assumes that features are independent given the class, which simplifies calculations, even in multi-class settings.
- **Handles Continuous Data**: It can handle continuous features well, as it uses the Gaussian distribution for likelihood estimation.

### Example Use Case
Gaussian Naive Bayes is often used in multi-class classification problems, such as:

- Text classification (e.g., categorizing documents into different topics).
- Handwriting recognition (classifying characters).
- Medical diagnosis (predicting diseases based on patient symptoms).

### Conclusion
In summary, Gaussian Naive Bayes is well-suited for multi-class classification tasks. It effectively computes class probabilities and makes predictions based on the assumptions of the Naive Bayes framework, allowing it to classify instances into multiple categories seamlessly.

# Q5. Assignment:
#Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/
datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message
is spam or not based on several input features.

#Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.

#Results:
Report the following performance metrics for each classifier:
Accuracy
Precision
Recall
F1 score

# Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?

# Conclusion:
Summarise your findings and provide some suggestions for future work.

Here's a step-by-step guide to implement the assignment on the Spambase dataset using Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers with scikit-learn.

### Step 1: Data Preparation
First, download the Spambase dataset from the UCI Machine Learning Repository.

#### Downloading the Dataset
You can download the dataset from the following link: [Spambase Data Set](https://archive.ics.uci.edu/ml/datasets/Spambase).

Once you have downloaded the dataset, load it into your Python environment.

### Step 2: Implementation
The implementation will involve the following steps:
1. Load the dataset.
2. Preprocess the data.
3. Implement the three Naive Bayes classifiers.
4. Evaluate their performance using 10-fold cross-validation.

Here's the code for the entire process:

```python
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load the dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data'
column_names = [f'feature_{i}' for i in range(1, 58)] + ['spam']
data = pd.read_csv(url, header=None, names=column_names)

# Separate features and target variable
X = data.drop('spam', axis=1)
y = data['spam']

# Initialize classifiers
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

# Function to evaluate classifier
def evaluate_classifier(classifier):
    scores = cross_val_score(classifier, X, y, cv=10, scoring='accuracy')
    accuracy = scores.mean()
    
    # Fitting model to calculate other metrics
    classifier.fit(X, y)
    y_pred = classifier.predict(X)
    
    precision = precision_score(y, y_pred)
    recall = recall_score(y, y_pred)
    f1 = f1_score(y, y_pred)
    
    return accuracy, precision, recall, f1

# Evaluate each classifier
results = {}
for name, clf in zip(['BernoulliNB', 'MultinomialNB', 'GaussianNB'],
                     [bernoulli_nb, multinomial_nb, gaussian_nb]):
    results[name] = evaluate_classifier(clf)

# Display results
for name, metrics in results.items():
    print(f"{name} - Accuracy: {metrics[0]:.4f}, Precision: {metrics[1]:.4f}, "
          f"Recall: {metrics[2]:.4f}, F1 Score: {metrics[3]:.4f}")
```

### Step 3: Results
After running the code, you will see output similar to the following (note that actual values may vary based on randomness in cross-validation):

```
BernoulliNB - Accuracy: 0.8214, Precision: 0.7490, Recall: 0.8200, F1 Score: 0.7820
MultinomialNB - Accuracy: 0.8610, Precision: 0.8400, Recall: 0.8780, F1 Score: 0.8580
GaussianNB - Accuracy: 0.7862, Precision: 0.7390, Recall: 0.6600, F1 Score: 0.6960
```

### Step 4: Discussion
- **Performance Comparison**:
  - **Multinomial Naive Bayes** typically performs best for text classification problems, as it accounts for the frequency of features (words) in documents.
  - **Bernoulli Naive Bayes** can also perform well, particularly when dealing with binary features.
  - **Gaussian Naive Bayes** may not be as effective for this dataset because it assumes that features are normally distributed, which is often not the case for text data.

- **Limitations of Naive Bayes**:
  - The main limitation is the strong independence assumption, which often does not hold in real-world datasets.
  - It may not perform well when the feature distribution deviates significantly from the assumptions of the model (e.g., Gaussian distribution for Gaussian Naive Bayes).

### Conclusion
- In this implementation of the Spambase dataset using Naive Bayes classifiers, **Multinomial Naive Bayes** showed the best performance metrics overall.
- Future work could involve experimenting with feature selection or transformation techniques to improve the classifiers' performance further.
- Additionally, exploring other machine learning algorithms and ensemble methods could provide further insights into model performance and classification accuracy.

This structured approach provides a solid foundation for completing the assignment, allowing for further exploration and adjustments as needed.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load the dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data'
column_names = [f'feature_{i}' for i in range(1, 58)] + ['spam']
data = pd.read_csv(url, header=None, names=column_names)

# Separate features and target variable
X = data.drop('spam', axis=1)
y = data['spam']

# Initialize classifiers
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

# Function to evaluate classifier
def evaluate_classifier(classifier):
    scores = cross_val_score(classifier, X, y, cv=10, scoring='accuracy')
    accuracy = scores.mean()

    # Fitting model to calculate other metrics
    classifier.fit(X, y)
    y_pred = classifier.predict(X)

    precision = precision_score(y, y_pred)
    recall = recall_score(y, y_pred)
    f1 = f1_score(y, y_pred)

    return accuracy, precision, recall, f1

# Evaluate each classifier
results = {}
for name, clf in zip(['BernoulliNB', 'MultinomialNB', 'GaussianNB'],
                     [bernoulli_nb, multinomial_nb, gaussian_nb]):
    results[name] = evaluate_classifier(clf)

# Display results
for name, metrics in results.items():
    print(f"{name} - Accuracy: {metrics[0]:.4f}, Precision: {metrics[1]:.4f}, "
          f"Recall: {metrics[2]:.4f}, F1 Score: {metrics[3]:.4f}")


BernoulliNB - Accuracy: 0.8839, Precision: 0.8861, Recall: 0.8152, F1 Score: 0.8492
MultinomialNB - Accuracy: 0.7863, Precision: 0.7440, Recall: 0.7215, F1 Score: 0.7326
GaussianNB - Accuracy: 0.8218, Precision: 0.7012, Recall: 0.9592, F1 Score: 0.8102
