# Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

In [22]:
# Given probabilities
P_A = 0.70  # P(A): probability that an employee uses the health insurance plan
P_B_given_A = 0.40  # P(B|A): probability that an employee is a smoker given that they use the plan

# The question asks for P(B|A), which the survey states directly:
# 40% of plan users are smokers, so no Bayes' theorem calculation is needed.
P_B_given_A

0.4

The answer is therefore 0.40. Bayes' theorem is only needed for the reverse question, P(A|B) (the probability that a smoker uses the plan), and that requires P(B|¬A), which the survey does not provide. The next cell assumes P(B|¬A) = 0.10 purely for illustration.

In [23]:
# Assumed value (NOT given in the survey): P(B|¬A), smoker given no plan
P_B_not_A = 0.10

# P(¬A): probability that an employee does not use the plan
P_not_A = 1 - P_A

# Law of total probability: P(B) = P(B|¬A)P(¬A) + P(B|A)P(A)
P_B = P_B_not_A * P_not_A + P_B_given_A * P_A

# Bayes' theorem for the reverse conditional: P(A|B) = P(B|A)P(A) / P(B)
P_A_given_B = (P_B_given_A * P_A) / P_B
P_A_given_B

0.9032258064516128
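Worked by hand, with A = "uses the plan" and B = "is a smoker", the requested conditional and its Bayes' theorem reverse look like this. The value P(B|¬A) = 0.10 in the second line is an assumption for illustration only; the survey does not report it.

```latex
\begin{aligned}
P(B \mid A) &= \frac{P(A \cap B)}{P(A)} = \frac{0.40 \times 0.70}{0.70} = \frac{0.28}{0.70} = 0.40 \\[4pt]
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)}
             = \frac{0.28}{0.28 + 0.10 \times 0.30} = \frac{0.28}{0.31} \approx 0.903
\end{aligned}
```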

# Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

Ans=Bernoulli Naive Bayes and Multinomial Naive Bayes are two variants of the Naive Bayes algorithm used in machine learning, particularly for text classification. The key difference between them lies in the type of data they are designed to handle.

Bernoulli Naive Bayes:

Data Type: Bernoulli Naive Bayes is suitable for binary-valued features, where each feature is a binary variable (0 or 1). It is commonly used for text classification problems where the presence or absence of words in a document is considered. Unlike the multinomial variant, it explicitly penalizes the absence of a feature that is indicative of a class. scikit-learn's BernoulliNB binarizes non-binary input by default (binarize=0.0, so any value greater than 0 becomes 1).
Example Application: Document classification where the presence or absence of specific words is relevant (e.g., spam detection where certain words indicate spam).

Multinomial Naive Bayes:

Data Type: Multinomial Naive Bayes is designed for discrete count data, i.e., the number of occurrences of each feature (integer frequencies), although fractional counts such as TF-IDF values also tend to work in practice. It is commonly used for text classification where the features represent word counts or term frequencies, and features that do not occur in a document simply contribute nothing to the likelihood.
Example Application: Document classification where the frequency of words in a document is important (e.g., sentiment analysis using word counts).
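A minimal sketch of the difference, using scikit-learn's BernoulliNB and MultinomialNB on a tiny count matrix whose values are made up for illustration. Two new documents contain the same words but with different counts: the multinomial model sees the difference, while the Bernoulli model (after its default binarization) does not.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy word-count matrix (rows = documents, columns = 3 vocabulary words).
# The counts are made up purely for illustration.
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 2, 2],
])
y_toy = np.array([1, 1, 0, 0])

# MultinomialNB models the counts themselves.
mnb = MultinomialNB().fit(X_counts, y_toy)

# BernoulliNB binarizes each feature first (default binarize=0.0: value > 0 -> 1),
# so it only sees whether each word is present or absent.
bnb = BernoulliNB().fit(X_counts, y_toy)

# Two new documents with the same words present but different counts.
X_new = np.array([[1, 1, 0],
                  [5, 1, 0]])
mnb_proba = mnb.predict_proba(X_new)
bnb_proba = bnb.predict_proba(X_new)

print("Multinomial:", mnb_proba)  # rows differ: the extra counts matter
print("Bernoulli:  ", bnb_proba)  # rows identical: only presence matters
```

The Bernoulli model also multiplies in a (1 - p) factor for every absent word, which is how it "penalizes absence"; the multinomial model ignores zero-count words entirely.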

# Q3. How does Bernoulli Naive Bayes handle missing values?

Ans=Bernoulli Naive Bayes has no special built-in mechanism for missing values. Its features are binary, representing the presence (1) or absence (0) of a particular attribute, so the most common practical approach is to treat a missing value as absence.

When dealing with missing values:

Assumption of Absence: If a feature's value is missing, it is commonly imputed as 0, i.e., the attribute is treated as not present. This is natural for text data, where a word that was not recorded is simply absent, but it can bias the model when "missing" and "absent" mean different things.

Probability Calculation: The Bernoulli likelihood includes a term for every feature, including absent ones (a factor of 1 − P(x_i = 1 | class) for each 0), so a feature imputed as 0 still influences each class score. An alternative that Naive Bayes' factorized form makes easy is to skip the missing feature's term entirely, which marginalizes it out.

Bayesian Inference: The classification decision is made using Bayes' theorem over the observed (and any imputed) features for each class, and the class with the highest posterior probability is chosen as the predicted class.

Note that scikit-learn's BernoulliNB rejects NaN inputs, so missing values must be imputed (for example with SimpleImputer(strategy="constant", fill_value=0)) before fitting.
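Both options above can be sketched with scikit-learn on a toy binary matrix (values invented for illustration). Option 1 imputes missing entries as 0 in a pipeline; option 2 is a hand-written helper, `log_joint_ignoring_missing` (hypothetical, not a scikit-learn API), that reuses the fitted per-class probabilities but drops the terms for NaN features.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy binary data with one missing entry (np.nan); values are invented.
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 1, 0],
              [0, 1, np.nan]])
y = np.array([1, 1, 0, 0])

# Option 1: treat missing as absent by imputing 0 before BernoulliNB
# (BernoulliNB itself raises a ValueError on NaN input).
model = make_pipeline(SimpleImputer(strategy="constant", fill_value=0), BernoulliNB())
model.fit(X, y)

# Option 2 (sketch): marginalize missing features by omitting their terms.
bnb = model.named_steps["bernoullinb"]
p = np.exp(bnb.feature_log_prob_)  # P(x_i = 1 | class), shape (n_classes, n_features)

def log_joint_ignoring_missing(x):
    """Hypothetical helper: class log-scores that skip NaN features."""
    observed = ~np.isnan(x)
    xo, po = x[observed], p[:, observed]
    loglik = (xo * np.log(po) + (1 - xo) * np.log(1 - po)).sum(axis=1)
    return bnb.class_log_prior_ + loglik

x_new = np.array([1, np.nan, 1])
print(model.predict(x_new.reshape(1, -1)))                         # missing treated as 0
print(bnb.classes_[np.argmax(log_joint_ignoring_missing(x_new))])  # missing marginalized
```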

# Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Ans=Yes. Gaussian Naive Bayes is the variant of Naive Bayes that assumes each continuous feature follows a Gaussian (normal) distribution within each class. It is natively multi-class: during training it estimates a prior for every class and a mean and variance for every feature in every class, and at prediction time it computes the posterior for all $K$ classes and picks the class with the highest posterior. Nothing in this formulation is limited to two classes, and scikit-learn's GaussianNB accepts any number of classes directly.

Generic decomposition strategies such as "one-vs-all" (OvA) and "one-vs-one" (OvO) can also wrap Gaussian Naive Bayes, although they are not required. Here's a brief explanation of these strategies:

One-vs-All (OvA): For $K$ classes, $K$ separate binary classifiers are trained. Each classifier is trained to distinguish between instances of one class and instances of all other classes. During prediction, the class that is assigned the highest probability by its respective classifier is chosen as the final predicted class.

One-vs-One (OvO): For $K$ classes, $K(K-1)/2$ binary classifiers are trained, each distinguishing between a pair of classes. During prediction, each classifier "votes" for a class, and the class with the most votes is chosen as the final predicted class.

Both OvA and OvO can be applied to Gaussian Naive Bayes, but because the model already handles multiple classes natively, they are rarely needed. When they are used, the choice between them depends on factors such as the size of the dataset, the computational cost of training multiple classifiers ($K$ versus $K(K-1)/2$ models), and the characteristics of the problem.
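A short sketch using the three-class Iris dataset that ships with scikit-learn: GaussianNB fits all three classes directly, and the OvA/OvO wrappers from `sklearn.multiclass` are shown only for comparison.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.naive_bayes import GaussianNB

# Iris has 3 classes and is bundled with scikit-learn (no download needed).
X, y = load_iris(return_X_y=True)

# Native multi-class: one mean (and variance) per feature per class.
gnb = GaussianNB().fit(X, y)
print(gnb.classes_)      # [0 1 2]
print(gnb.theta_.shape)  # (3, 4): per-class feature means

# Optional wrappers, not required for GaussianNB.
ovr = OneVsRestClassifier(GaussianNB())  # K binary classifiers
ovo = OneVsOneClassifier(GaussianNB())   # K(K-1)/2 binary classifiers

for name, model in [("native", GaussianNB()), ("OvA", ovr), ("OvO", ovo)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```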

# Q5. Assignment:
Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains features extracted from email messages, where the goal is to predict whether a message is spam or not based on several input features.

Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.


Results:
Report the following performance metrics for each classifier:
Accuracy
Precision
Recall
F1 score

Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?

In [2]:
import zipfile
import os

# Define the path to the zip file
zip_file_path = 'spambase.zip'

# Define the directory to extract the files to
extract_folder = 'spambase_data'

# Create a folder to extract the files to if it doesn't exist
if not os.path.exists(extract_folder):
    os.makedirs(extract_folder)

# Extract the contents of the zip file
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(extract_folder)

# List the contents of the extraction folder
extracted_files = os.listdir(extract_folder)
extracted_files

['spambase.names', 'spambase.DOCUMENTATION', 'spambase.data']

In [3]:
import pandas as pd

# Load the data from the spambase.data file
file_path = 'spambase_data/spambase.data'

In [7]:
# Column names from spambase.names: 48 word frequencies, 6 character
# frequencies, 3 capital-run-length statistics, and the is_spam class label
column_names = [
    'word_freq_make', 'word_freq_address', 'word_freq_all', 'word_freq_3d', 'word_freq_our',
    'word_freq_over', 'word_freq_remove', 'word_freq_internet', 'word_freq_order', 'word_freq_mail',
    'word_freq_receive', 'word_freq_will', 'word_freq_people', 'word_freq_report', 'word_freq_addresses',
    'word_freq_free', 'word_freq_business', 'word_freq_email', 'word_freq_you', 'word_freq_credit',
    'word_freq_your', 'word_freq_font', 'word_freq_000', 'word_freq_money', 'word_freq_hp',
    'word_freq_hpl', 'word_freq_george', 'word_freq_650', 'word_freq_lab', 'word_freq_labs',
    'word_freq_telnet', 'word_freq_857', 'word_freq_data', 'word_freq_415', 'word_freq_85',
    'word_freq_technology', 'word_freq_1999', 'word_freq_parts', 'word_freq_pm', 'word_freq_direct',
    'word_freq_cs', 'word_freq_meeting', 'word_freq_original', 'word_freq_project', 'word_freq_re',
    'word_freq_edu', 'word_freq_table', 'word_freq_conference', 'char_freq_;', 'char_freq_(', 'char_freq_[',
    'char_freq_!', 'char_freq_$', 'char_freq_#', 'capital_run_length_average', 'capital_run_length_longest',
    'capital_run_length_total', 'is_spam'
]

In [8]:
df = pd.read_csv(file_path, header=None, names=column_names)
df.head()

,word_freq_make,word_freq_address,word_freq_all,word_freq_3d,word_freq_our,word_freq_over,word_freq_remove,word_freq_internet,word_freq_order,word_freq_mail,...,char_freq_;,char_freq_(,char_freq_[,char_freq_!,char_freq_$,char_freq_#,capital_run_length_average,capital_run_length_longest,capital_run_length_total,is_spam
0,0.0,0.64,0.64,0.0,0.32,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.778,0.0,0.0,3.756,61,278,1
1,0.21,0.28,0.5,0.0,0.14,0.28,0.21,0.07,0.0,0.94,...,0.0,0.132,0.0,0.372,0.18,0.048,5.114,101,1028,1
2,0.06,0.0,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.01,0.143,0.0,0.276,0.184,0.01,9.821,485,2259,1
3,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.137,0.0,0.137,0.0,0.0,3.537,40,191,1
4,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.135,0.0,0.135,0.0,0.0,3.537,40,191,1


The data has been successfully loaded with appropriate column names. The dataset contains features giving the frequency (as a percentage) of certain words and characters in each email, statistics on runs of consecutive capital letters, and a last column indicating whether an email is spam (is_spam = 1) or not (is_spam = 0).

Next, I will implement the Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using scikit-learn and evaluate them using 10-fold cross-validation.

In [9]:
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.preprocessing import MinMaxScaler


In [10]:
# Separate features and target variable
X = df.drop('is_spam', axis=1)
y = df['is_spam']


In [11]:
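# MultinomialNB requires non-negative features. The Spambase features are already
# non-negative, so rescaling to [0, 1] is optional; it mainly stops the large
# capital_run_length counts from swamping the word frequencies. Note: fitting the
# scaler on all rows before cross-validation leaks min/max information across folds.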
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)


In [12]:
# Initialize classifiers
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

In [13]:
# Perform 10-fold cross-validation for each classifier
bernoulli_scores = cross_val_score(bernoulli_nb, X, y, cv=10)
multinomial_scores = cross_val_score(multinomial_nb, X_scaled, y, cv=10)
gaussian_scores = cross_val_score(gaussian_nb, X, y, cv=10)


In [14]:
# Calculate the mean accuracy and standard deviation for each classifier
bernoulli_mean = bernoulli_scores.mean()
bernoulli_std = bernoulli_scores.std()
multinomial_mean = multinomial_scores.mean()
multinomial_std = multinomial_scores.std()
gaussian_mean = gaussian_scores.mean()
gaussian_std = gaussian_scores.std()

(bernoulli_mean, bernoulli_std, multinomial_mean, multinomial_std, gaussian_mean, gaussian_std)

(0.8839380364047911,
 0.046658190126965045,
 0.8748146750919551,
 0.03282069829990614,
 0.8217730830896915,
 0.07715751692895616)

The 10-fold cross-validation results for the three Naive Bayes classifiers are as follows:

Bernoulli Naive Bayes: Mean accuracy of 0.8839 with a standard deviation of 0.0467.

Multinomial Naive Bayes: Mean accuracy of 0.8748 with a standard deviation of 0.0328.

Gaussian Naive Bayes: Mean accuracy of 0.8218 with a standard deviation of 0.0772.

These results indicate that the Bernoulli Naive Bayes classifier has the highest mean accuracy on this dataset, although its lead over Multinomial Naive Bayes (about 0.009) is well within one fold-to-fold standard deviation.
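The evaluation could also be sketched in a single pass with `cross_validate`, which returns all four metrics at once, with the MinMaxScaler placed inside a pipeline so it is refit on each training fold. The data below are a synthetic, non-negative stand-in (an assumption so the block runs on its own); replacing `X` and `y` with the Spambase features and labels would reproduce the assignment setup, though results would differ from the cells above because of the shuffled stratified folds.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic count-like stand-in data: class 0 favours the first 5 features,
# class 1 favours the last 5.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=400)
base = np.array([3.0] * 5 + [1.0] * 5)
lam = np.where(y[:, None] == 1, base[::-1], base)  # shape (400, 10)
X = rng.poisson(lam).astype(float)

models = {
    "Bernoulli": BernoulliNB(),
    # Scaler refit per training fold; clip=True keeps test values inside [0, 1],
    # so MultinomialNB never sees negative inputs.
    "Multinomial": make_pipeline(MinMaxScaler(clip=True), MultinomialNB()),
    "Gaussian": GaussianNB(),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scoring = ["accuracy", "precision", "recall", "f1"]

results = {}
for name, model in models.items():
    cv_res = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {m: cv_res[f"test_{m}"].mean() for m in scoring}
    print(name, {m: round(v, 3) for m, v in results[name].items()})
```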

In [15]:
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [16]:
# Perform 10-fold cross-validation and obtain predictions for each fold
bernoulli_predictions = cross_val_predict(bernoulli_nb, X, y, cv=10)
multinomial_predictions = cross_val_predict(multinomial_nb, X_scaled, y, cv=10)
gaussian_predictions = cross_val_predict(gaussian_nb, X, y, cv=10)

In [17]:
# Calculate performance metrics for each classifier
bernoulli_accuracy = accuracy_score(y, bernoulli_predictions)
bernoulli_precision = precision_score(y, bernoulli_predictions)
bernoulli_recall = recall_score(y, bernoulli_predictions)
bernoulli_f1 = f1_score(y, bernoulli_predictions)


In [18]:
multinomial_accuracy = accuracy_score(y, multinomial_predictions)
multinomial_precision = precision_score(y, multinomial_predictions)
multinomial_recall = recall_score(y, multinomial_predictions)
multinomial_f1 = f1_score(y, multinomial_predictions)


In [19]:
gaussian_accuracy = accuracy_score(y, gaussian_predictions)
gaussian_precision = precision_score(y, gaussian_predictions)
gaussian_recall = recall_score(y, gaussian_predictions)
gaussian_f1 = f1_score(y, gaussian_predictions)

In [20]:
(bernoulli_accuracy, bernoulli_precision, bernoulli_recall, bernoulli_f1, multinomial_accuracy, multinomial_precision, multinomial_recall, multinomial_f1, gaussian_accuracy, gaussian_precision, gaussian_recall, gaussian_f1)

(0.8839382742881983,
 0.8813357185450209,
 0.815223386651958,
 0.8469914040114614,
 0.8748098239513149,
 0.8992898644286637,
 0.7683397683397684,
 0.8286734086853063,
 0.8217778743751358,
 0.7004440855874041,
 0.9569773855488142,
 0.8088578088578089)

The performance metrics for each classifier (from the 10-fold cross-validated predictions, rounded to four decimals) are as follows:

| Classifier | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|
| Bernoulli Naive Bayes | 0.8839 | 0.8813 | 0.8152 | 0.8470 |
| Multinomial Naive Bayes | 0.8748 | 0.8993 | 0.7683 | 0.8287 |
| Gaussian Naive Bayes | 0.8218 | 0.7004 | 0.9570 | 0.8089 |

Precision, recall, and F1 score are computed for the positive class (spam, is_spam = 1), while accuracy covers both classes. Together they show how often each model is right overall and how it trades off missed spam against false alarms.
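To make the positive-class definitions concrete, here is a small sketch on made-up labels (1 = spam) that computes precision, recall, and F1 from the confusion matrix and checks them against scikit-learn's defaults (`pos_label=1`).

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Toy labels, invented for illustration (1 = spam, 0 = not spam).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary 0/1 labels, ravel() returns counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of the emails flagged as spam, the share that were spam
recall = tp / (tp + fn)     # of the actual spam emails, the share that were caught
f1 = 2 * precision * recall / (precision + recall)

print(precision, precision_score(y_true, y_pred))  # 0.75 0.75
print(recall, recall_score(y_true, y_pred))        # 0.75 0.75
print(f1, f1_score(y_true, y_pred))                # 0.75 0.75
```

Read this way, Gaussian Naive Bayes' high recall and low precision mean it catches most spam but also flags many legitimate emails.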



Conclusion:
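The Bernoulli Naive Bayes classifier performed best in terms of accuracy (0.8839) and F1 score (0.8470), while the Multinomial Naive Bayes classifier had the highest precision (0.8993). Gaussian Naive Bayes had the lowest accuracy (0.8218) and precision (0.7004) but by far the highest recall (0.9570): it flags almost every spam email at the cost of many false positives.

A likely reason for the Bernoulli classifier's strong showing is that scikit-learn's BernoulliNB binarizes each feature (any value greater than 0 becomes 1), so it models only whether a word or character appears at all. That presence/absence signal is informative for spam and is unaffected by the skewed, heavy-tailed frequency values. The Multinomial classifier, which is designed for count-based features, also performed well because the word frequency features behave like normalized counts, even though they are percentages rather than integer counts. The Gaussian classifier's assumption of normally distributed features fits poorly here, since many features are zero for most emails and have long right tails; this mismatch is a plausible cause of its low precision.

One limitation of Naive Bayes classifiers is the assumption that features are conditionally independent given the class, which rarely holds in real-world data. Highly correlated features are effectively counted more than once, which tends to make the predicted probabilities overconfident. Results also depend on how well the chosen likelihood (Bernoulli, multinomial, or Gaussian) matches the actual feature distributions, as the gap between the Gaussian variant and the other two shows.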

Overall, the results indicate that the choice of Naive Bayes classifier should be based on the nature of the features and the specific characteristics of the dataset. In future work, it would be beneficial to explore feature engineering techniques to improve the performance of the classifiers and to consider the use of ensemble methods to further enhance the predictive power.