Q1. Probability Calculation using Bayes' Theorem
Given:

70% of employees use the company's health insurance plan: P(H) = 0.70
40% of the employees who use the plan are smokers: P(S | H) = 0.40
We need to find the probability that an employee is a smoker given that they use the health insurance plan, which is P(S | H).

Since P(S | H) is already given as 0.40, no further calculation is needed:

P(S | H) = 0.40
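As a side check on how the two given quantities relate, the sketch below uses exact fractions to compute the joint probability P(S and H) = P(S | H) * P(H). This goes slightly beyond what the question asks, and the variable names are illustrative. Note that inverting the condition with Bayes' theorem to get P(H | S) would also require P(S), which the problem does not provide.

```python
from fractions import Fraction

p_h = Fraction(70, 100)          # P(H): employee uses the health plan
p_s_given_h = Fraction(40, 100)  # P(S | H): smoker, given plan use

# Product rule: P(S and H) = P(S | H) * P(H)
p_s_and_h = p_s_given_h * p_h

print(float(p_s_given_h))  # 0.4, the requested answer
print(float(p_s_and_h))    # 0.28, employees who both smoke and use the plan
```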





Q2. Difference between Bernoulli Naive Bayes and Multinomial Naive Bayes
Bernoulli Naive Bayes: Models every feature as a binary (0/1) variable, such as whether a word occurs in a document. It uses only presence or absence, not frequency, and the absence of a feature also contributes to the class likelihood.

Multinomial Naive Bayes: Used for discrete count features (e.g., word counts in text classification). It assumes that features follow a multinomial distribution and is particularly suited for cases where feature vectors represent counts.
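The difference can be seen on a tiny, made-up word-count matrix. This is a minimal sketch, assuming scikit-learn's default settings (in particular `binarize=0.0` for `BernoulliNB`); the data and variable names are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy data: rows are documents, columns are counts of three words.
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 2, 0],
])
y = np.array([1, 1, 0, 0])
X_presence = (X_counts > 0).astype(int)

# BernoulliNB thresholds its input at `binarize` (default 0.0), so fitting on
# raw counts is equivalent to fitting on presence/absence indicators.
bnb = BernoulliNB().fit(X_counts, y)
bnb_presence = BernoulliNB().fit(X_presence, y)
same_probs = np.allclose(
    bnb.predict_proba(X_counts), bnb_presence.predict_proba(X_presence)
)

# MultinomialNB keeps the counts themselves.
mnb = MultinomialNB().fit(X_counts, y)

print(same_probs)
print(bnb.feature_count_)  # per class: number of documents containing each word
print(mnb.feature_count_)  # per class: total occurrences of each word
```

The two `feature_count_` arrays show what each model actually accumulates: document-level presence for the Bernoulli model and summed occurrences for the multinomial model.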



Q3. Handling Missing Values in Bernoulli Naive Bayes
Bernoulli Naive Bayes does not handle missing values inherently. If there are missing values, they need to be imputed or dealt with before fitting the model. Common strategies for handling missing values include:

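Imputation with the mode (the most frequent value), which keeps the feature binary; a mean or median can produce values that are neither 0 nor 1.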
Removal of records with missing values.
Use of algorithms that can handle missing values directly.
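The imputation strategy can be sketched with scikit-learn's `SimpleImputer` placed in front of the classifier. This is one possible setup rather than the only option; the small binary matrix below is invented for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary features with two missing entries (np.nan).
X = np.array([
    [1, 0, np.nan],
    [1, np.nan, 0],
    [0, 1, 1],
    [0, 1, 1],
    [1, 1, 1],
])
y = np.array([1, 1, 0, 0, 1])

# Fill each missing entry with the most frequent value in its column,
# so the imputed features stay binary.
imputer = SimpleImputer(strategy="most_frequent")
X_imputed = imputer.fit_transform(X)
print(X_imputed)

# Chaining the imputer and the classifier means the same imputation is
# learned on the training data and reused at prediction time.
model = make_pipeline(SimpleImputer(strategy="most_frequent"), BernoulliNB())
model.fit(X, y)
preds = model.predict(X)
print(preds)
```

Keeping the imputer inside the pipeline also means that, under cross-validation, the fill values are computed from each training fold only.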




Q4. Can Gaussian Naive Bayes be used for Multi-class Classification?
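Yes, Gaussian Naive Bayes can be used for multi-class classification. For each class it estimates a prior probability and a per-feature Gaussian (mean and variance); at prediction time it computes the posterior for every class and picks the class with the highest value. Nothing in this procedure is limited to two classes.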
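A minimal sketch of the multi-class case, assuming the three-class Iris dataset bundled with scikit-learn as a stand-in (it is not part of the original assignment):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has 3 classes and 4 continuous features.
X, y = load_iris(return_X_y=True)

gnb = GaussianNB().fit(X, y)

print(gnb.classes_)      # the class labels seen during fit
print(gnb.class_prior_)  # one prior per class
print(gnb.theta_.shape)  # per-class, per-feature means: (n_classes, n_features)

# predict_proba returns one posterior probability per class.
proba = gnb.predict_proba(X[:1])
print(proba)
```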

# Q5
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load the dataset (the file has no header row)
data = pd.read_csv('spambase.data', header=None)

# Separate features and labels (the label is the last column)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Define classifiers
classifiers = {
    'BernoulliNB': BernoulliNB(),
    'MultinomialNB': MultinomialNB(),
    'GaussianNB': GaussianNB()
}

# Define performance metrics
metrics = {
    'Accuracy': accuracy_score,
    'Precision': precision_score,
    'Recall': recall_score,
    'F1 Score': f1_score
}

# 10-fold stratified cross-validation
kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = {name: {metric: [] for metric in metrics} for name in classifiers}

for name, clf in classifiers.items():
    for train_index, test_index in kf.split(X, y):
        X_train, X_test = X.iloc[train_index], X.iloc[test_index]
        y_train, y_test = y.iloc[train_index], y.iloc[test_index]

        # Fit on the training fold and predict on the held-out fold
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)

        # Precision, recall, and F1 use average='binary' by default
        for metric, func in metrics.items():
            results[name][metric].append(func(y_test, y_pred))

# Calculate mean and standard deviation of each metric for each classifier
summary = {
    name: {metric: (np.mean(values), np.std(values)) for metric, values in per_metric.items()}
    for name, per_metric in results.items()
}

# Print the results
for name, stats in summary.items():
    print(f"Results for {name}:")
    for metric, (mean, std) in stats.items():
        print(f"{metric}: {mean:.4f} ± {std:.4f}")
    print()
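The manual fold loop in Q5 can also be expressed with scikit-learn's `cross_validate`, which evaluates several metrics in one call. The sketch below is an alternative formulation, not the code used above, and it runs on synthetic count data (an assumption made so that it works without `spambase.data`); the names `X_demo` and `y_demo` are hypothetical.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import MultinomialNB

# Synthetic stand-in data (not Spambase): non-negative counts whose
# per-feature rates differ between the two classes.
rng = np.random.default_rng(42)
y_demo = np.repeat([0, 1], 100)
rates = np.array([[1.0, 1.0, 1.0, 3.0, 3.0],
                  [3.0, 3.0, 1.0, 1.0, 1.0]])
X_demo = rng.poisson(rates[y_demo])

kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_validate(
    MultinomialNB(), X_demo, y_demo, cv=kf,
    scoring=["accuracy", "precision", "recall", "f1"],
)

# Each requested metric is returned as an array with one score per fold.
for key in ["test_accuracy", "test_precision", "test_recall", "test_f1"]:
    print(f"{key}: {scores[key].mean():.4f} ± {scores[key].std():.4f}")
```

With the real data, `X_demo` and `y_demo` would be replaced by the `X` and `y` loaded from the dataset, and the call repeated for each classifier.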