
## 1

This is a conditional probability question.

Given:
- P(A): Probability that an employee uses the health insurance plan = 70% = 0.7
- P(B|A): Probability that an employee is a smoker given that he/she uses the health insurance plan = 40% = 0.4

The quantity asked for is P(B|A), which the problem states directly as 40% or 0.4. P(A) is not needed to answer it.

Thus, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4 or 40%.
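
As a consistency check, the definition of conditional probability can be applied in both directions. The joint probability P(A ∩ B) is not given in the problem; it is derived here from the two stated values:

$$
P(A \cap B) = P(B \mid A)\,P(A) = 0.4 \times 0.7 = 0.28,
\qquad
P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{0.28}{0.7} = 0.4
$$

In other words, 28% of all employees both use the plan and smoke, and dividing by the 70% who use the plan recovers the stated 40%.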

## 2

- **Bernoulli Naive Bayes**:
  - Suitable for binary/boolean features (0 or 1).
  - Each feature is modeled as a Bernoulli (binary) distribution.
  - Used for tasks where features represent the presence or absence of certain characteristics, such as text classification with binary term occurrence (word present or not).

- **Multinomial Naive Bayes**:
  - Suitable for discrete counts.
  - The vector of feature counts within each class is modeled as a multinomial distribution.
  - Used for tasks where features represent counts or frequencies, such as text classification with term frequency (how many times a word appears).
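
The difference between the two variants can be sketched on a tiny term-count matrix. The data below is made up for illustration (three hypothetical vocabulary terms, four documents); it is not taken from the assignment. `BernoulliNB` with `binarize=0.0` only sees whether a term occurs, while `MultinomialNB` uses the counts themselves.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy term-count matrix (hypothetical data): rows are documents,
# columns are counts of three vocabulary terms.
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 2, 0],
])
y_toy = np.array([0, 0, 1, 1])

# BernoulliNB thresholds every count at 0, so it only models presence/absence.
bnb = BernoulliNB(binarize=0.0).fit(X_counts, y_toy)
# MultinomialNB models how often each term occurs.
mnb = MultinomialNB().fit(X_counts, y_toy)

once = np.array([[1, 0, 0]])   # term 0 appears once
often = np.array([[5, 0, 0]])  # term 0 appears five times

# Bernoulli probabilities are identical for both documents,
# because both binarize to [1, 0, 0].
bern_once = bnb.predict_proba(once)
bern_often = bnb.predict_proba(often)

# Multinomial probabilities change with the count.
multi_once = mnb.predict_proba(once)
multi_often = mnb.predict_proba(often)

print("Bernoulli:  ", bern_once, bern_often)
print("Multinomial:", multi_once, multi_often)
```

Repeating a term makes the multinomial model more confident but leaves the Bernoulli model unchanged, which is the practical distinction between the two.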

## 3

Bernoulli Naive Bayes, like the other Naive Bayes classifiers in scikit-learn, does not handle missing values on its own: `fit` and `predict` raise an error if the input contains NaN. Missing values must be handled before applying the classifier. Common strategies include:
- Imputation: Replace missing values with a summary of the feature. For binary features the mode (most frequent value) is the natural choice, since a mean or median may not be 0 or 1.
- Removal: Remove records with missing values if the dataset is large enough.
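
The imputation strategy can be sketched as a scikit-learn pipeline. The small binary matrix below is invented for illustration, and the choice of `SimpleImputer(strategy="most_frequent")` is one reasonable option for binary features, not the only one.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy binary features with missing entries (hypothetical data).
X_missing = np.array([
    [1, 0, np.nan],
    [1, np.nan, 0],
    [0, 1, 1],
    [np.nan, 1, 1],
], dtype=float)
y_toy = np.array([0, 0, 1, 1])

# Fill each missing entry with that column's most frequent value,
# then fit Bernoulli Naive Bayes on the completed matrix.
model = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    BernoulliNB(),
)
model.fit(X_missing, y_toy)

# The same imputation is applied automatically at prediction time.
pred = model.predict(np.array([[1, 0, np.nan]]))
print(pred)
```

Keeping the imputer inside the pipeline means that, under cross-validation, the fill values are learned from the training folds only.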

## 4

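Yes, Gaussian Naive Bayes can be used for multi-class classification. It fits a separate Gaussian (normal) distribution to each feature within each class, computes the posterior probability of every class with Bayes' rule, and predicts the class with the highest posterior. Nothing in this procedure is limited to two classes.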
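
A minimal sketch of the multi-class case, using the three-class Iris dataset bundled with scikit-learn (the dataset is chosen here for illustration and is not part of the original question):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has 4 continuous features and 3 classes.
X_iris, y_iris = load_iris(return_X_y=True)

clf = GaussianNB().fit(X_iris, y_iris)

# One posterior probability per class for each sample.
probs = clf.predict_proba(X_iris[:5])

print(clf.classes_)      # class labels seen during fit
print(clf.theta_.shape)  # per-class feature means: (n_classes, n_features)
print(probs.shape)
```

The fitted model stores one mean and one variance per class and feature, so adding classes only adds rows to these parameter arrays.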



## 5

```python
import pandas as pd
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.preprocessing import StandardScaler

# Load the dataset
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
data = pd.read_csv(url, header=None)

# Split the data into features and target
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Standardize features for GaussianNB
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Initialize classifiers
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

# 10-fold cross-validation
kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# Function to evaluate and print performance metrics
def evaluate_classifier(clf, X, y):
    accuracy = cross_val_score(clf, X, y, cv=kf, scoring='accuracy').mean()
    precision = cross_val_score(clf, X, y, cv=kf, scoring='precision').mean()
    recall = cross_val_score(clf, X, y, cv=kf, scoring='recall').mean()
    f1 = cross_val_score(clf, X, y, cv=kf, scoring='f1').mean()

    print(f'Accuracy: {accuracy:.4f}')
    print(f'Precision: {precision:.4f}')
    print(f'Recall: {recall:.4f}')
    print(f'F1 Score: {f1:.4f}')

print("Bernoulli Naive Bayes:")
evaluate_classifier(bernoulli_nb, X, y)

print("\nMultinomial Naive Bayes:")
evaluate_classifier(multinomial_nb, X, y)

print("\nGaussian Naive Bayes:")
evaluate_classifier(gaussian_nb, X_scaled, y)
```


```text
Bernoulli Naive Bayes:
Accuracy: 0.8857
Precision: 0.8855
Recall: 0.8158
F1 Score: 0.8490

Multinomial Naive Bayes:
Accuracy: 0.7903
Precision: 0.7407
Recall: 0.7215
F1 Score: 0.7306

Gaussian Naive Bayes:
Accuracy: 0.8166
Precision: 0.6951
Recall: 0.9542
F1 Score: 0.8041
```
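
The evaluation above runs four separate cross-validations per classifier and fits the scaler on the full dataset before splitting. A possible alternative, sketched below, computes all four metrics in one pass with `cross_validate` and puts the scaler inside a pipeline so it is refit on each training fold. To stay self-contained the sketch uses synthetic data from `make_classification` rather than Spambase, so its numbers are not comparable to the results above; the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption: binary target).
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
metrics = ["accuracy", "precision", "recall", "f1"]

# The scaler is part of the estimator, so it is fit on training folds only.
pipeline = make_pipeline(StandardScaler(), GaussianNB())

# One cross-validation run that scores every metric on the same folds.
scores = cross_validate(pipeline, X_demo, y_demo, cv=cv, scoring=metrics)

summary = {name: scores[f"test_{name}"].mean() for name in metrics}
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Because all metrics come from the same fitted models, this also cuts the number of model fits per classifier from 40 to 10.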


### Discussion:

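1. **Results**:
   - Bernoulli Naive Bayes scored highest on accuracy (0.8857), precision (0.8855), and F1 (0.8490).
   - Gaussian Naive Bayes had the highest recall (0.9542) but the lowest precision (0.6951): it catches most spam at the cost of many false positives.
   - Multinomial Naive Bayes had the lowest accuracy (0.7903) and F1 (0.7306).

2. **Which variant performed the best and why**:
   - Bernoulli Naive Bayes performed best overall. A likely reason is that the Spambase features are mostly word and character frequencies that are zero for many emails, so with the default `binarize=0.0` each feature is reduced to present or absent, which is already a strong spam signal.
   - Multinomial Naive Bayes is designed for counts. The Spambase features are percentages plus capital-run-length statistics on a much larger scale, which probably lets a few features dominate the likelihood.
   - Gaussian Naive Bayes assumes each feature is normally distributed within a class. The skewed, zero-heavy features here violate that assumption, which is consistent with its low precision.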

3. **Limitations of Naive Bayes**:
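   - **Independence Assumption**: Assumes that features are conditionally independent given the class, which is rarely true in real-world data.
   - **Zero Probability**: Without smoothing, a feature value that never occurs with a class in the training data gets zero probability, which zeroes out the whole product for that class. Laplace smoothing is often used to mitigate this; scikit-learn's `BernoulliNB` and `MultinomialNB` apply it by default (`alpha=1.0`).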
   - **Sensitivity to irrelevant features**: The model can be sensitive to irrelevant features, impacting performance.
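
The zero-probability issue and the effect of Laplace smoothing can be sketched on a toy count matrix (invented for illustration). Class 0 never contains the third term, so its estimated probability depends entirely on the smoothing parameter `alpha`.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy counts (hypothetical data): term 2 never appears in class 0.
X_counts = np.array([
    [2, 1, 0],
    [3, 0, 0],
    [0, 1, 2],
    [0, 2, 3],
])
y_toy = np.array([0, 0, 1, 1])

# Almost no smoothing: the unseen term gets a near-zero probability.
barely_smoothed = MultinomialNB(alpha=1e-10).fit(X_counts, y_toy)
# Laplace smoothing (alpha=1.0, the scikit-learn default).
smoothed = MultinomialNB(alpha=1.0).fit(X_counts, y_toy)

# Log-probability of term 2 under class 0 for each model.
log_p_barely = barely_smoothed.feature_log_prob_[0, 2]
log_p_smoothed = smoothed.feature_log_prob_[0, 2]

print(f"alpha=1e-10: log P(term 2 | class 0) = {log_p_barely:.2f}")
print(f"alpha=1.0:   log P(term 2 | class 0) = {log_p_smoothed:.2f}")
```

With `alpha=1.0` the estimate is (0 + 1) / (6 + 3) = 1/9, since class 0 has 6 total counts and there are 3 terms; a single unseen term therefore no longer rules the class out.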

### Conclusion:

1. **Summary**:
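   - Three variants of Naive Bayes (Bernoulli, Multinomial, and Gaussian) were evaluated on the Spambase dataset with 10-fold stratified cross-validation.
   - Bernoulli Naive Bayes gave the best accuracy and F1 score, while Gaussian Naive Bayes gave the highest recall at the cost of precision.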

2. **Future Work**:
   - Further tuning of hyperparameters for each Naive Bayes variant.
   - Feature engineering to improve the quality of input features.
   - Consideration of other models that can capture dependencies between features (e.g., Random Forest, SVM).
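
As a starting point for the hyperparameter tuning mentioned above, the following sketch searches over `alpha` and `binarize` for `BernoulliNB` with `GridSearchCV`. The grid values and the synthetic data are assumptions chosen for illustration; on Spambase the useful ranges may differ.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in for the real dataset (assumption).
X_demo, y_demo = make_classification(n_samples=400, n_features=12, random_state=0)

# Candidate smoothing strengths and binarization thresholds (illustrative).
param_grid = {
    "alpha": [0.01, 0.1, 1.0],
    "binarize": [-0.5, 0.0, 0.5],
}

# 5-fold cross-validated search, selecting the combination with the best F1.
search = GridSearchCV(BernoulliNB(), param_grid, cv=5, scoring="f1")
search.fit(X_demo, y_demo)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, f"{best_score:.4f}")
```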
