In [None]:
"""Q1. A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
probability that an employee is a smoker given that he/she uses the health insurance plan?"""

In [None]:
"""Let A be the event that an employee uses the health insurance plan, and B be the event that an employee is a smoker. We are given:

P(A) = 0.7 (probability of using the health insurance plan)
P(B|A) = 0.4 (probability of being a smoker given that the employee uses the health insurance plan)

We need to find P(B|A), the probability of being a smoker given that the employee uses the health insurance plan. This is exactly the second quantity given above, so no further calculation is required: P(B|A) = 0.4.

Bayes' theorem,

P(B|A) = P(A|B) * P(B) / P(A),

is not needed here. It also could not be applied from the survey data alone. It requires P(A|B) and P(B), and computing P(B) with the law of total probability,

P(B) = P(A and B) + P(A' and B) = P(B|A) * P(A) + P(B|A') * P(A'),

where A' is the complement of A (not using the plan), requires P(B|A'), the smoking rate among employees who do not use the plan. The survey does not report it, so P(B) cannot be determined.

What the given numbers do determine is the probability that a randomly chosen employee both uses the plan and is a smoker:

P(A and B) = P(B|A) * P(A) = 0.4 * 0.7 = 0.28

Dividing this joint probability by P(A) recovers the conditional probability and confirms the answer:

P(B|A) = P(A and B) / P(A) = 0.28 / 0.7 = 0.4

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4, or 40%."""
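
The arithmetic in Q1 can be checked with a few lines of Python. This is a minimal sketch. The variable names are illustrative. The values passed to `p_smoker_overall` are hypothetical guesses for P(B|A'), which the survey does not report. They only show that P(B) depends on that unknown quantity.

```python
# Survey figures given in Q1
p_plan = 0.7               # P(A): employee uses the health insurance plan
p_smoker_given_plan = 0.4  # P(B|A): employee smokes, given that they use the plan

# The requested conditional probability is stated directly in the question
answer = p_smoker_given_plan

# Joint probability P(A and B) = P(B|A) * P(A)
p_plan_and_smoker = p_smoker_given_plan * p_plan

# Consistency check: P(B|A) = P(A and B) / P(A)
recovered = p_plan_and_smoker / p_plan


def p_smoker_overall(p_smoker_given_no_plan):
    """Law of total probability: P(B) = P(B|A) * P(A) + P(B|A') * P(A').

    P(B|A') is not reported by the survey, so the argument is a hypothetical value.
    """
    return p_plan_and_smoker + p_smoker_given_no_plan * (1 - p_plan)


print(f"P(B|A)     = {answer:.2f}")
print(f"P(A and B) = {p_plan_and_smoker:.2f}")
print(f"recovered  = {recovered:.2f}")
for guess in (0.1, 0.3):
    # P(B) moves with the unknown P(B|A'), so Q1 alone cannot pin it down
    print(f"P(B) if P(B|A') = {guess}: {p_smoker_overall(guess):.2f}")
```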

In [None]:
"""Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?"""

In [None]:
"""In Bernoulli Naive Bayes, the input features are binary variables (they can only take the values 0 or 1), and the algorithm models the presence or absence of each feature in a document. In document classification, for example, a feature could indicate whether a particular word appears in the document at all. The features are assumed to be conditionally independent given the class, and every feature in the vocabulary contributes to the class probability. A present word contributes P(word | class), and an absent word contributes 1 - P(word | class). Absence is therefore treated as evidence, and the number of times a word occurs is ignored.

In Multinomial Naive Bayes, the input features are counts of occurrences of words or tokens in a document, such as word frequencies. The algorithm calculates the probability of a document belonging to a particular class from these counts. It models a document of a given class as a sequence of tokens drawn independently from a class-specific probability distribution over the vocabulary, so the vector of word counts follows a multinomial distribution. A word that occurs n times contributes P(word | class) raised to the power n, and words that do not occur contribute nothing. Repeated occurrences therefore strengthen the evidence, and absent words are ignored."""
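
To make the contrast concrete, the sketch below scores one toy document under both models for a single class. The vocabulary, the document and all probabilities are hypothetical values chosen for illustration. The functions are hand-written log-likelihoods, not scikit-learn's implementation.

```python
import math

# Hypothetical toy vocabulary: ["free", "money", "meeting"]
# Hypothetical class-conditional parameters for a single class ("spam")
p_present = [0.6, 0.5, 0.05]  # Bernoulli: P(word appears at least once | spam)
theta = [0.5, 0.4, 0.1]       # Multinomial: P(a token is this word | spam); sums to 1


def bernoulli_log_likelihood(counts, p_present):
    """log P(doc | class) under Bernoulli NB.

    Counts are reduced to present/absent; every vocabulary word contributes,
    present words via log(p) and absent words via log(1 - p).
    """
    return sum(math.log(p) if n > 0 else math.log(1.0 - p)
               for n, p in zip(counts, p_present))


def multinomial_log_likelihood(counts, theta):
    """log P(doc | class) under Multinomial NB, dropping the multinomial
    coefficient (it is the same for every class). Only words that occur
    contribute, each weighted by its count."""
    return sum(n * math.log(t) for n, t in zip(counts, theta) if n > 0)


doc = [3, 1, 0]        # "free" three times, "money" once, "meeting" absent
doc_more = [10, 1, 0]  # same words present, "free" repeated more often

print(bernoulli_log_likelihood(doc, p_present), bernoulli_log_likelihood(doc_more, p_present))
print(multinomial_log_likelihood(doc, theta), multinomial_log_likelihood(doc_more, theta))
```

Repeating "free" leaves the Bernoulli score unchanged. The multinomial score gains another log P(word | class) term for every extra occurrence. The absent word "meeting" only affects the Bernoulli score.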

In [None]:
"""Q3. How does Bernoulli Naive Bayes handle missing values?"""

In [None]:
"""In Bernoulli Naive Bayes, a missing value can be handled by leaving the corresponding feature out when computing the class probabilities. The features are assumed to be conditionally independent given the class, so dropping a feature's term is the same as marginalising it out: its two possible factors, P(x = 1 | class) and P(x = 0 | class), sum to 1. This is valid when the values are missing at random, that is, when the fact that a value is missing is not itself informative about the class. It is not the same as treating the missing value as 0. In Bernoulli Naive Bayes a 0 means 'feature absent' and contributes a factor of 1 - P(x = 1 | class), which can change the prediction.

If the proportion of missing values in a feature is very high, that feature may carry little information for classification and can be removed from the analysis entirely. Alternatively, the missing values can be imputed from the observed data. The imputed values should themselves be binary, for example the most frequent value of the feature, because mean imputation produces fractional values that do not fit the Bernoulli model. Imputation may also introduce bias if the data are not missing completely at random."""
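
The difference between skipping a missing feature and filling it with 0 shows up clearly in a small hand-written Bernoulli Naive Bayes scorer. This sketch rests on assumptions: the two classes, priors and feature probabilities are hypothetical, and `None` is used here to mark a missing value.

```python
import math

# Hypothetical two-class model over three binary features
priors = {"spam": 0.4, "ham": 0.6}
p_present = {"spam": [0.8, 0.7, 0.9],  # P(feature j = 1 | spam)
             "ham": [0.2, 0.3, 0.1]}   # P(feature j = 1 | ham)


def log_scores(x, priors, p_present):
    """Unnormalised log posterior of each class under Bernoulli NB.

    x holds 1, 0 or None (None = missing). A missing feature is skipped, which
    marginalises it out because P(x_j = 1 | c) + P(x_j = 0 | c) = 1.
    """
    scores = {}
    for c, prior in priors.items():
        s = math.log(prior)
        for xj, p in zip(x, p_present[c]):
            if xj is None:
                continue  # missing: contributes a factor of 1
            s += math.log(p) if xj == 1 else math.log(1.0 - p)
        scores[c] = s
    return scores


def predict_class(x):
    """Return the class with the highest unnormalised log posterior."""
    scores = log_scores(x, priors, p_present)
    return max(scores, key=scores.get)


print(predict_class([1, 0, None]))  # third feature skipped (marginalised out)
print(predict_class([1, 0, 0]))     # third feature imputed as 0 ("absent")
```

With the third feature skipped, the sketch predicts spam (0.4 * 0.8 * 0.3 = 0.096 versus 0.6 * 0.2 * 0.7 = 0.084). Filling the feature with 0 multiplies in 1 - p for it (0.1 for spam, 0.9 for ham), which flips the prediction to ham.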

In [None]:
"""Q4. Can Gaussian Naive Bayes be used for multi-class classification?"""

In [None]:
"""Yes, Gaussian Naive Bayes can be used for multi-class classification. The algorithm assumes that, within each class, every feature is normally distributed and independent of the other features. For a given observation, the posterior probability of each class is proportional to the prior probability of the class multiplied by the product of the Gaussian likelihoods of the features given that class. The class with the highest posterior probability is chosen as the predicted class.

Because the model estimates a separate prior, mean and variance for every class, it handles any number of classes natively. There is no need to split the problem into binary sub-problems. Generic reduction strategies can still be wrapped around it, as around any classifier. In the "one-vs-rest" (or "one-vs-all") approach, one binary classifier is trained per class, with that class treated as positive and all other classes as negative, and the class whose classifier is most confident is predicted. In the "one-vs-one" approach, one binary classifier is trained for each pair of classes using only the samples of those two classes, and the class with the most votes across all pairwise classifiers is predicted. For Gaussian Naive Bayes, these strategies add training cost without being required."""
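
A from-scratch sketch shows why no one-vs-rest or one-vs-one wrapper is needed. Fitting stores one prior, mean vector and variance vector per class, and prediction takes the argmax over however many classes were seen. The toy data and the small `eps` added to the variances are assumptions for illustration. scikit-learn's `GaussianNB` implements the same model but adds a fraction (`var_smoothing`) of the largest feature variance instead.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate each class's prior and per-feature mean and variance.

    eps is a small constant added to every variance to avoid division by zero
    (a simplification of scikit-learn's var_smoothing).
    """
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n_c, d = len(rows), len(rows[0])
        means = [sum(r[j] for r in rows) / n_c for j in range(d)]
        variances = [sum((r[j] - means[j]) ** 2 for r in rows) / n_c + eps for j in range(d)]
        model[label] = (n_c / len(X), means, variances)
    return model


def log_normal_pdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_gaussian_nb(model, x):
    """Pick the class maximising log prior + sum of per-feature log likelihoods."""
    def score(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(log_normal_pdf(xj, m, v)
                                     for xj, m, v in zip(x, means, variances))
    return max(model, key=score)


# Toy data with three classes (hypothetical values)
X_toy = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
         [5.0, 5.0], [5.2, 4.9], [4.8, 5.1],
         [9.0, 1.0], [9.1, 1.2], [8.9, 0.9]]
y_toy = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

model = fit_gaussian_nb(X_toy, y_toy)
predictions = [predict_gaussian_nb(model, x) for x in ([1.1, 0.9], [5.1, 5.0], [9.0, 1.1])]
print(predictions)
```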

In [1]:
import pandas as pd

# Load the dataset into a pandas dataframe
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data'
spambase_df = pd.read_csv(url, header=None)

# Set the column names
feature_names = ['word_freq_make', 'word_freq_address', 'word_freq_all', 'word_freq_3d',
                 'word_freq_our', 'word_freq_over', 'word_freq_remove', 'word_freq_internet',
                 'word_freq_order', 'word_freq_mail', 'word_freq_receive', 'word_freq_will',
                 'word_freq_people', 'word_freq_report', 'word_freq_addresses', 'word_freq_free',
                 'word_freq_business', 'word_freq_email', 'word_freq_you', 'word_freq_credit',
                 'word_freq_your', 'word_freq_font', 'word_freq_000', 'word_freq_money',
                 'word_freq_hp', 'word_freq_hpl', 'word_freq_george', 'word_freq_650',
                 'word_freq_lab', 'word_freq_labs', 'word_freq_telnet', 'word_freq_857',
                 'word_freq_data', 'word_freq_415', 'word_freq_85', 'word_freq_technology',
                 'word_freq_1999', 'word_freq_parts', 'word_freq_pm', 'word_freq_direct',
                 'word_freq_cs', 'word_freq_meeting', 'word_freq_original', 'word_freq_project',
                 'word_freq_re', 'word_freq_edu', 'word_freq_table', 'word_freq_conference',
                 'char_freq_;', 'char_freq_(', 'char_freq_[', 'char_freq_!', 'char_freq_$',
                 'char_freq_#', 'capital_run_length_average', 'capital_run_length_longest',
                 'capital_run_length_total', 'is_spam']
spambase_df.columns = feature_names


In [2]:
X = spambase_df.drop('is_spam', axis=1)
y = spambase_df['is_spam']

In [6]:
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split
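from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hold out 30% of the emails as a test set (fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a BernoulliNB instance. With the default binarize=0.0, each feature is
# mapped to 1 if its value is greater than 0 and to 0 otherwise, so the model
# uses only whether a word or character occurs in an email, not how often
bnb = BernoulliNB()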

# Fit the model on the training data
bnb.fit(X_train, y_train)

# Use the model to make predictions on the test data
y_pred = bnb.predict(X_test)
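
The spambase columns are frequencies rather than 0/1 values, so it matters that `BernoulliNB` binarises its input (threshold `binarize=0.0` by default). The sketch below checks this on synthetic non-negative data, not on spambase. Fitting on the raw values with the default threshold should give the same learned parameters and predictions as binarising by hand and passing `binarize=None`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in for word frequencies (not spambase): non-negative, many exact zeros
rng = np.random.default_rng(42)
X_freq = rng.exponential(1.0, size=(60, 4)) * rng.integers(0, 2, size=(60, 4))
y_toy = rng.integers(0, 2, size=60)

# Model 1: raw frequencies, relying on the default binarize=0.0 threshold
raw_model = BernoulliNB().fit(X_freq, y_toy)

# Model 2: binarise explicitly (value > 0 -> 1) and tell BernoulliNB not to binarise again
X_binary = (X_freq > 0).astype(int)
pre_model = BernoulliNB(binarize=None).fit(X_binary, y_toy)

# Both models should learn the same parameters and make the same predictions
print(np.allclose(raw_model.feature_log_prob_, pre_model.feature_log_prob_))
print(np.array_equal(raw_model.predict(X_freq), pre_model.predict(X_binary)))
```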

In [7]:
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print the performance metrics
print('Accuracy:', accuracy)
print('Precision:', precision)
print('Recall:', recall)
print('F1 score:', f1)

Accuracy: 0.8790731354091238
Precision: 0.8882575757575758
Recall: 0.8128249566724437
F1 score: 0.848868778280543
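
The four scores are simple functions of the confusion-matrix counts, with spam as the positive class. The helper below sketches those definitions. The counts passed to it are hypothetical. The final check recomputes F1 as the harmonic mean of the precision and recall printed above.

```python
def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts (spam = positive)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of emails flagged as spam that really are spam
    recall = tp / (tp + fn)     # share of spam emails that were flagged
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return accuracy, precision, recall, f1


# Hypothetical counts, for illustration only
print(metrics_from_counts(tp=40, fp=10, fn=20, tn=130))

# Cross-check: F1 recomputed from the precision and recall printed by the BernoulliNB run
printed_precision = 0.8882575757575758
printed_recall = 0.8128249566724437
f1_check = 2 * printed_precision * printed_recall / (printed_precision + printed_recall)
print(f1_check)  # should agree with the printed F1 score up to floating-point rounding
```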


In [None]:
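"""Only the Bernoulli Naive Bayes variant is implemented and evaluated in the cells above. With a 70/30 train/test split (random_state=42), it achieved an accuracy of about 87.9%, a precision of about 88.8%, a recall of about 81.3% and an F1 score of about 84.9%. When it flags an email as spam it is usually right, but it misses roughly one spam email in five. The Multinomial and Gaussian variants are not run here, so these results do not by themselves show which of the three variants (Bernoulli, Multinomial and Gaussian Naive Bayes) performs best on this dataset. That requires evaluating all three on the same split or with cross-validation."""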
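
A comparison of the three variants can be run by fitting all three scikit-learn classifiers on the same data. The sketch below assumes 5-fold cross-validated accuracy as the evaluation protocol. It is demonstrated on synthetic Poisson counts rather than spambase, so its printed scores say nothing about the spam data.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB


def compare_nb_variants(X, y, cv=5):
    """Mean cross-validated accuracy of the three Naive Bayes variants on (X, y).

    MultinomialNB needs non-negative features; the spambase frequencies satisfy this.
    """
    models = {
        "BernoulliNB": BernoulliNB(),
        "MultinomialNB": MultinomialNB(),
        "GaussianNB": GaussianNB(),
    }
    return {name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
            for name, model in models.items()}


# Synthetic stand-in data (NOT spambase): Poisson counts whose rate depends on the class
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.poisson(1.0, size=(100, 5)), rng.poisson(3.0, size=(100, 5))])
y_demo = np.array([0] * 100 + [1] * 100)

results = compare_nb_variants(X_demo, y_demo)
for name, score in results.items():
    print(f"{name}: {score:.3f}")
```

In the notebook, the same helper could be called as `compare_nb_variants(X, y)` on the spambase features loaded earlier.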