Q1. Probability of being a Smoker given the Use of Health Insurance Plan:

Given:

    Probability of using the health insurance plan: $P(\text{Use of Plan}) = 0.7$
    Probability of being a smoker given the use of the plan: $P(\text{Smoker} \mid \text{Use of Plan}) = 0.4$

We want $P(\text{Smoker} \mid \text{Use of Plan})$. Bayes' theorem relates it to the reverse conditional:

$$P(\text{Smoker} \mid \text{Use of Plan}) = \frac{P(\text{Use of Plan} \mid \text{Smoker}) \cdot P(\text{Smoker})}{P(\text{Use of Plan})}$$

Expanding the denominator with the law of total probability:

$$P(\text{Smoker} \mid \text{Use of Plan}) = \frac{P(\text{Use of Plan} \mid \text{Smoker}) \cdot P(\text{Smoker})}{P(\text{Use of Plan} \mid \text{Smoker}) \cdot P(\text{Smoker}) + P(\text{Use of Plan} \mid \text{Non-Smoker}) \cdot P(\text{Non-Smoker})}$$

The survey supplies none of $P(\text{Use of Plan} \mid \text{Smoker})$, $P(\text{Smoker})$, or $P(\text{Use of Plan} \mid \text{Non-Smoker})$, so this route cannot be evaluated. It is also unnecessary. The given value 0.4 is already $P(\text{Smoker} \mid \text{Use of Plan})$, not $P(\text{Use of Plan} \mid \text{Smoker})$, so it must not be substituted into the numerator. The answer follows directly from the data:

$$P(\text{Smoker} \mid \text{Use of Plan}) = 0.4$$

The two given figures together yield the probability that an employee both uses the plan and smokes:

$$P(\text{Smoker} \cap \text{Use of Plan}) = P(\text{Smoker} \mid \text{Use of Plan}) \cdot P(\text{Use of Plan}) = 0.4 \times 0.7 = 0.28$$
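As a numerical check, the sketch below computes the joint probability from the two given figures and applies Bayes' theorem in both directions. The survey gives no value for P(Smoker | No Plan), so the code tries a few hypothetical values for it (illustrative assumptions, not survey data). The reverse conditional P(Use of Plan | Smoker) changes with that unknown, while P(Smoker | Use of Plan) always comes back as the given 0.4.

```python
# Given survey figures (from the question)
p_use = 0.7                 # P(Use of Plan)
p_smoker_given_use = 0.4    # P(Smoker | Use of Plan)

# Joint probability: P(Smoker and Use of Plan)
p_smoker_and_use = p_smoker_given_use * p_use


def reverse_conditionals(p_smoker_given_no_plan):
    """Return (P(Smoker), P(Use | Smoker), recovered P(Smoker | Use)).

    p_smoker_given_no_plan is NOT given by the survey; it is a
    hypothetical input used only to show which quantities depend on it.
    """
    p_no_plan = 1 - p_use
    # Law of total probability over plan users and non-users
    p_smoker = p_smoker_and_use + p_smoker_given_no_plan * p_no_plan
    # Bayes' theorem for the reverse conditional
    p_use_given_smoker = p_smoker_and_use / p_smoker
    # Bayes' theorem back again: recovers the given P(Smoker | Use)
    recovered = p_use_given_smoker * p_smoker / p_use
    return p_smoker, p_use_given_smoker, recovered


print(f"P(Smoker and Use of Plan) = {p_smoker_and_use:.2f}")
for q in (0.1, 0.2, 0.5):  # hypothetical P(Smoker | No Plan) values
    p_s, p_u_s, rec = reverse_conditionals(q)
    print(f"q={q}: P(Smoker)={p_s:.3f}, "
          f"P(Use|Smoker)={p_u_s:.3f}, P(Smoker|Use)={rec:.2f}")
```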

Q2. Difference between Bernoulli Naive Bayes and Multinomial Naive Bayes:

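    Bernoulli Naive Bayes:
        Used for binary data: each feature is a binary variable recording presence or absence (e.g., whether a term occurs in a document).
        Absent features count as evidence: each absent feature contributes $\log(1 - p)$ to a class's score, where $p$ is the estimated probability that the feature is present in that class.
        Suitable for document classification tasks, where each term is either present or absent.

    Multinomial Naive Bayes:
        Used for discrete count data (e.g., word counts in text classification).
        Assumes that features represent counts of occurrences; absent features (count 0) contribute nothing to the likelihood.
        Commonly used in text classification, where features are word frequencies.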
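The difference in how the two models treat counts and absent features can be seen on a tiny toy dataset. In this sketch (the data are made up for illustration), BernoulliNB binarizes its input internally (its `binarize` parameter defaults to 0.0), so two count vectors with the same non-zero pattern get identical probabilities, whereas MultinomialNB uses the counts themselves. For an all-zero vector, MultinomialNB falls back to the class priors, while BernoulliNB still moves away from them because every absent term counts as evidence.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical word-count matrix: 4 documents x 3 terms, 2 balanced classes
X_counts = np.array([[3, 0, 1],
                     [2, 1, 0],
                     [0, 2, 3],
                     [1, 3, 2]])
y_labels = np.array([0, 0, 1, 1])

bnb = BernoulliNB().fit(X_counts, y_labels)    # binarize=0.0: counts > 0 become 1
mnb = MultinomialNB().fit(X_counts, y_labels)  # uses the raw counts

a = np.array([[3, 0, 1]])
b = np.array([[1, 0, 1]])      # same presence pattern as `a`, different counts
empty = np.array([[0, 0, 0]])  # no terms present

# Bernoulli: a and b are identical after binarization
print("Bernoulli  :", bnb.predict_proba(a), bnb.predict_proba(b))
# Multinomial: counts matter, so the probabilities differ
print("Multinomial:", mnb.predict_proba(a), mnb.predict_proba(b))

# All-zero vector: Multinomial returns the class priors,
# Bernoulli does not, because each absent term contributes log(1 - p)
print("Empty doc, Bernoulli  :", bnb.predict_proba(empty))
print("Empty doc, Multinomial:", mnb.predict_proba(empty))
```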

Q3. Handling Missing Values in Bernoulli Naive Bayes:

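    Bernoulli Naive Bayes has no built-in handling of missing values (scikit-learn's BernoulliNB raises an error on NaN inputs), so they must be dealt with before or during prediction.
    The simplest approach is to treat a missing value as absent: for a binary present/absent feature, a missing entry is imputed as 0.
    This choice is not neutral. Bernoulli Naive Bayes treats absence as evidence, since an absent feature contributes $\log(1 - p_{ic})$ to the score of class $c$, where $p_{ic}$ is the probability that feature $i$ is present in class $c$. Imputing 0 is therefore appropriate when missing genuinely means absent (as with a word that does not occur in a document); otherwise, the missing feature can be left out of the likelihood product altogether, which the naive independence assumption makes straightforward.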
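The two strategies can be compared on a toy example. This sketch (made-up data; `class_log_scores` and `to_proba` are hypothetical helpers written for illustration, not part of scikit-learn) reuses the parameters of a fitted BernoulliNB and computes class scores by hand, so that a NaN feature can be skipped rather than imputed. In this data the missing feature is rarer in class 0, so imputing it as absent pushes the posterior further toward class 0 than skipping it does.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary training data: 4 samples x 3 features, 2 classes
X_train = np.array([[1, 0, 1],
                    [1, 1, 0],
                    [0, 1, 1],
                    [1, 1, 1]])
y_train = np.array([0, 0, 1, 1])
model = BernoulliNB().fit(X_train, y_train)


def class_log_scores(model, x):
    """Unnormalized class log scores for one binary sample x.

    Present features add log p, absent features add log(1 - p),
    and NaN (missing) features are skipped, i.e. marginalized out.
    """
    log_p = model.feature_log_prob_         # log P(x_i = 1 | class), (n_classes, n_features)
    log_not_p = np.log1p(-np.exp(log_p))    # log P(x_i = 0 | class)
    observed = ~np.isnan(x)
    present = observed & (x == 1)
    absent = observed & (x == 0)
    return (model.class_log_prior_
            + log_p[:, present].sum(axis=1)
            + log_not_p[:, absent].sum(axis=1))


def to_proba(scores):
    """Normalize log scores into posterior probabilities."""
    e = np.exp(scores - scores.max())
    return e / e.sum()


x = np.array([1.0, np.nan, 0.0])  # second feature is missing

# Strategy 1: treat the missing value as absent (impute 0)
x_imputed = np.nan_to_num(x, nan=0.0)
p_imputed = to_proba(class_log_scores(model, x_imputed))

# Strategy 2: skip the missing feature entirely
p_skipped = to_proba(class_log_scores(model, x))

print("impute as absent:", p_imputed)
print("skip missing    :", p_skipped)
```

For the imputed vector the hand-computed posterior matches `model.predict_proba`, which is a quick sanity check that the helper mirrors the fitted model.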

Q4. Use of Gaussian Naive Bayes for Multi-Class Classification:

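    Gaussian Naive Bayes is designed for continuous features and handles binary and multi-class classification problems alike.
    It is natively multi-class: no one-vs-rest extension of a binary classifier is needed, because the model simply estimates a prior $P(c)$ for every class.
    Within each class $c$, each feature $x_i$ is modeled with its own Gaussian distribution, with a class-specific mean $\mu_{ic}$ and variance $\sigma_{ic}^2$.
    At prediction time the posterior is computed for every class, and the class with the highest posterior, $\hat{y} = \arg\max_c P(c) \prod_i \mathcal{N}(x_i; \mu_{ic}, \sigma_{ic}^2)$, is predicted.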

In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.preprocessing import Binarizer

# Load the Spambase dataset (57 feature columns + 1 label column, no header row)
file_path = "spambase.data"
column_names = [f'X{i}' for i in range(1, 58)] + ['is_spam']
spambase_data = pd.read_csv(file_path, header=None, names=column_names)

# Separate features (X) and target variable (y)
X = spambase_data.iloc[:, :-1].values
y = spambase_data['is_spam'].values

# Binarize the data for Bernoulli Naive Bayes: values > 0 become 1, others 0
binarizer = Binarizer(threshold=0.0)
X_binary = binarizer.fit_transform(X)

# Pair each classifier with the feature representation it expects
classifiers = {
    "Bernoulli Naive Bayes": (BernoulliNB(), X_binary),
    "Multinomial Naive Bayes": (MultinomialNB(), X),  # features are non-negative
    "Gaussian Naive Bayes": (GaussianNB(), X),
}

# Evaluate a classifier with 10-fold (stratified) cross-validation,
# computing all four metrics from one set of fits instead of four
def evaluate_classifier(classifier, X, y):
    metrics = ('accuracy', 'precision', 'recall', 'f1')
    scores = cross_validate(classifier, X, y, cv=10, scoring=list(metrics))
    return tuple(np.mean(scores[f'test_{m}']) for m in metrics)

# Evaluate classifiers and report results
for i, (name, (classifier, features)) in enumerate(classifiers.items()):
    accuracy, precision, recall, f1 = evaluate_classifier(classifier, features, y)
    if i > 0:
        print()
    print(f"Results for {name}:")
    print("Accuracy:", accuracy)
    print("Precision:", precision)
    print("Recall:", recall)
    print("F1 Score:", f1)


Results for Bernoulli Naive Bayes:
Accuracy: 0.8839380364047911
Precision: 0.8869617393737383
Recall: 0.8152389047416673
F1 Score: 0.8481249015095276

Results for Multinomial Naive Bayes:
Accuracy: 0.7863496180326323
Precision: 0.7393175533565436
Recall: 0.7214983911116508
F1 Score: 0.7282909724016348

Results for Gaussian Naive Bayes:
Accuracy: 0.8217730830896915
Precision: 0.7103733928118492
Recall: 0.9569516119239877
F1 Score: 0.8130660909542995
