1. To find the probability that an employee is a smoker given that they use the company's health insurance plan, we can use the information provided and the concept of conditional probability.

Let:

A be the event that an employee is a smoker.
B be the event that an employee uses the company's health insurance plan.

We are given:

P(B) = 0.70 (70% of the employees use the health insurance plan)
P(A | B) = 0.40 (40% of the employees who use the plan are smokers)

We need to find P(A | B), which is already given as 0.40. Therefore, the probability that an employee is a smoker given that they use the health insurance plan is:

P(A | B) = 0.40
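
As a quick check of the definition, the joint probability of being a smoker and using the plan follows from P(A and B) = P(A | B) * P(B). The sketch below uses only the two numbers given in the problem; the variable names are illustrative.

```python
# Values given in the problem statement
p_b = 0.70          # P(B): employee uses the health insurance plan
p_a_given_b = 0.40  # P(A | B): smoker, given that they use the plan

# Joint probability from the definition of conditional probability:
# P(A and B) = P(A | B) * P(B)
p_a_and_b = p_a_given_b * p_b

# Dividing the joint probability by P(B) recovers the conditional probability
recovered = p_a_and_b / p_b

print(f"P(A and B) = {p_a_and_b:.2f}")
print(f"P(A | B)   = {recovered:.2f}")
```

So 28% of all employees both smoke and use the plan, while the requested conditional probability stays at 0.40.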

2. Bernoulli Naive Bayes:

Assumes binary feature vectors (0/1 values) indicating the presence or absence of a feature.
Suitable for text classification tasks where the feature vector represents binary occurrence (e.g., whether a word appears in a document or not).
Models each feature as a Bernoulli variable, so both the presence and the absence of a feature contribute to the likelihood.

Multinomial Naive Bayes:

Assumes feature vectors represent counts or frequencies of events (e.g., word counts in a document).
Suitable for text classification tasks where the feature vector represents term frequency (e.g., the number of times a word appears in a document).
Models the distribution of the counts using the multinomial distribution.
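
The difference between the two models can be sketched on a single document. The vocabulary, the counts, and the per-class parameters below are all hypothetical values chosen for illustration; a real classifier would estimate the parameters from training data.

```python
import math

# Hypothetical three-word vocabulary and one document (illustrative only)
vocab = ["free", "meeting", "offer"]
counts = [3, 0, 1]                            # multinomial view: word counts
binary = [1 if c > 0 else 0 for c in counts]  # Bernoulli view: presence/absence

# Hypothetical parameters for a single class
p_present = [0.8, 0.1, 0.6]  # Bernoulli: P(word appears in a document | class)
p_word = [0.5, 0.2, 0.3]     # Multinomial: P(word at a token position | class)


def bernoulli_log_likelihood(x, p):
    # Every vocabulary word contributes: p if present, (1 - p) if absent
    return sum(math.log(pi if xi else 1.0 - pi) for xi, pi in zip(x, p))


def multinomial_log_likelihood(x, p):
    # Only words that occur contribute, weighted by their count
    # (the multinomial coefficient is omitted; it does not depend on the class)
    return sum(xi * math.log(pi) for xi, pi in zip(x, p))


bern_ll = bernoulli_log_likelihood(binary, p_present)
multi_ll = multinomial_log_likelihood(counts, p_word)

print(binary)
print(bern_ll)
print(multi_ll)
```

The Bernoulli score ignores that "free" occurs three times but does account for "meeting" being absent; the multinomial score does the opposite.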

3. In Bernoulli Naive Bayes, a feature (word or term) that does not occur in a given document is encoded as absent (0). The classifier has no separate mechanism for missing values; non-occurrence is simply the value 0. Absence is not ignored, though: each absent feature contributes a factor of 1 - P(feature | class) to the likelihood, so the non-occurrence of a word counts as evidence. No additional imputation or special treatment is needed.
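
A minimal sketch of this behaviour, assuming a hypothetical vocabulary and made-up per-class probabilities: words that do not occur are encoded as 0, and those zeros still shift the class scores.

```python
import math

vocab = ["free", "meeting", "offer"]  # hypothetical vocabulary
doc_words = {"meeting"}               # words observed in the document

# Absent vocabulary words become 0; no imputation step is involved
x = [1 if w in doc_words else 0 for w in vocab]

# Hypothetical P(word present | class) for two classes
p_spam = {"free": 0.8, "meeting": 0.1, "offer": 0.6}
p_ham = {"free": 0.1, "meeting": 0.5, "offer": 0.1}


def log_likelihood(x, p):
    # An absent word (0) contributes log(1 - p), so absence is itself evidence
    return sum(math.log(p[w] if xi else 1.0 - p[w]) for w, xi in zip(vocab, x))


ll_spam = log_likelihood(x, p_spam)
ll_ham = log_likelihood(x, p_ham)

print(x)
print(ll_spam < ll_ham)
```

Here the absence of "free" and "offer" lowers the spam score far more than the ham score, because those words are assumed to be common in spam.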

4. Yes, Gaussian Naive Bayes can be used for multi-class classification. Gaussian Naive Bayes models the distribution of continuous features using a Gaussian (normal) distribution and can be applied to classify instances into multiple classes. In multi-class classification, the algorithm calculates the posterior probability for each class and assigns the instance to the class with the highest posterior probability. The steps involve:

Calculating the prior probability for each class.
Calculating the likelihood of the features given each class using the Gaussian distribution.
Using Bayes' theorem to compute the posterior probabilities for each class.
Assigning the class with the highest posterior probability to the instance.
Thus, Gaussian Naive Bayes is well-suited for problems where the features are continuous and the goal is to classify instances into more than two classes.
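
The four steps above can be sketched in plain Python on a tiny three-class data set. The training points, the function names, and the small variance floor are assumptions made for illustration; this is not scikit-learn's implementation.

```python
import math
from collections import defaultdict

# Tiny hypothetical training set: (feature vector, class label), three classes
train = [
    ([1.0, 2.1], "a"), ([1.2, 1.9], "a"), ([0.8, 2.0], "a"),
    ([5.0, 6.2], "b"), ([5.3, 5.8], "b"), ([4.7, 6.0], "b"),
    ([9.1, 1.0], "c"), ([8.9, 1.3], "c"), ([9.0, 0.7], "c"),
]


def fit(samples, var_floor=1e-9):
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Per-feature variance; the floor guards against division by zero
        variances = [sum((v - m) ** 2 for v in col) / n + var_floor
                     for col, m in zip(columns, means)]
        log_prior = math.log(n / len(samples))  # step 1: class prior
        model[y] = (log_prior, means, variances)
    return model


def log_posteriors(model, x):
    scores = {}
    for y, (log_prior, means, variances) in model.items():
        # Step 2: Gaussian log-likelihood per feature, summed under the
        # naive independence assumption
        ll = sum(-0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
                 for xi, m, var in zip(x, means, variances))
        scores[y] = log_prior + ll  # step 3: unnormalised log posterior
    return scores


def predict(model, x):
    scores = log_posteriors(model, x)
    return max(scores, key=scores.get)  # step 4: highest posterior wins


model = fit(train)
print(predict(model, [1.1, 2.0]))
print(predict(model, [5.1, 6.0]))
print(predict(model, [9.0, 1.1]))
```

Working in log space avoids numerical underflow when many features are multiplied together, and the normalising constant of Bayes' theorem can be dropped because it is the same for every class.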

import pandas as pd
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

# Load the Spambase dataset (57 feature columns followed by a 0/1 spam label)
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data'
column_names = [f'feature_{i}' for i in range(1, 58)] + ['label']
data = pd.read_csv(url, header=None, names=column_names)

# Split the data into features and target
X = data.iloc[:, :-1]
y = data['label']

# Define a function to evaluate performance using cross-validation
def evaluate_model(model, X, y):
    kf = KFold(n_splits=10, shuffle=True, random_state=42)
    y_pred = cross_val_predict(model, X, y, cv=kf)
    accuracy = accuracy_score(y, y_pred)
    precision = precision_score(y, y_pred)
    recall = recall_score(y, y_pred)
    f1 = f1_score(y, y_pred)
    return accuracy, precision, recall, f1

# Initialize models
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

# Evaluate models
bernoulli_results = evaluate_model(bernoulli_nb, X, y)
multinomial_results = evaluate_model(multinomial_nb, X, y)
gaussian_results = evaluate_model(gaussian_nb, X, y)

# Print results
print(f'BernoulliNB - Accuracy: {bernoulli_results[0]}, Precision: {bernoulli_results[1]}, Recall: {bernoulli_results[2]}, F1 Score: {bernoulli_results[3]}')
print(f'MultinomialNB - Accuracy: {multinomial_results[0]}, Precision: {multinomial_results[1]}, Recall: {multinomial_results[2]}, F1 Score: {multinomial_results[3]}')
print(f'GaussianNB - Accuracy: {gaussian_results[0]}, Precision: {gaussian_results[1]}, Recall: {gaussian_results[2]}, F1 Score: {gaussian_results[3]}')
import numpy as np

# Create a summary DataFrame
results = pd.DataFrame({
    'Classifier': ['BernoulliNB', 'MultinomialNB', 'GaussianNB'],
    'Accuracy': [bernoulli_results[0], multinomial_results[0], gaussian_results[0]],
    'Precision': [bernoulli_results[1], multinomial_results[1], gaussian_results[1]],
    'Recall': [bernoulli_results[2], multinomial_results[2], gaussian_results[2]],
    'F1 Score': [bernoulli_results[3], multinomial_results[3], gaussian_results[3]]
})

print(results)
