#### Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

In [None]:
Ans-

This question does not actually require Bayes' theorem, because the probability being asked for is stated directly in the problem. Let's define the events:

A = an employee uses the company's health insurance plan
B = an employee is a smoker

We want the probability that an employee is a smoker given that he/she uses the health insurance plan, i.e. P(B|A).

From the information given in the problem, we know that:

P(A) = 0.70
P(B|A) = 0.40

The phrase "40% of the employees who use the plan are smokers" is exactly the conditional probability of being a smoker given plan use.

We can confirm this with the definition of conditional probability. The fraction of all employees who both use the plan and smoke is

P(A and B) = P(B|A) * P(A) = 0.40 * 0.70 = 0.28

so

P(B|A) = P(A and B) / P(A) = 0.28 / 0.70 = 0.40

Bayes' theorem, P(B|A) = P(A|B) * P(B) / P(A), would only be needed if we were given P(A|B) (the probability of using the plan given that an employee smokes) and P(B) instead. Neither is given, and P(A|B) is a different quantity from P(B|A), so the 0.40 cannot be substituted for P(A|B).

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.40, or 40%.
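The same result can be checked by counting. A minimal Python sketch, assuming a hypothetical company of 1,000 employees (any headcount gives the same ratio):

```python
# Hypothetical headcount, chosen only to turn the percentages into whole numbers
total = 1000
plan_users = round(0.70 * total)                 # 700 employees use the plan
smokers_among_users = round(0.40 * plan_users)   # 280 of those plan users smoke

p_a = plan_users / total                 # P(A) = 0.70
p_a_and_b = smokers_among_users / total  # P(A and B) = 0.28
p_b_given_a = p_a_and_b / p_a            # P(B|A) = P(A and B) / P(A)

print(p_b_given_a)
```

The printed value equals 0.40 up to floating-point rounding, matching the probability stated in the problem.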

#### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

In [None]:
Ans-

Bernoulli Naive Bayes and Multinomial Naive Bayes are two variants of the Naive Bayes algorithm used in machine learning for classification problems.

The main difference between them lies in the type of data they are designed to handle and, as a consequence, in how each feature contributes to the likelihood.

Bernoulli Naive Bayes is used when the input variables (features) are binary, such as the presence or absence of a certain word in a document.
Each feature is modelled as a Bernoulli variable with a per-class probability p of being 1, and the likelihood of a sample is the product over all features of p (if the feature is present) or 1 - p (if it is absent). Absent features therefore count as evidence too.

On the other hand, Multinomial Naive Bayes is used when the input variables are non-negative counts, such as the number of times each word occurs in a document.
It models each class as a probability distribution over the features, and the likelihood is the product of each feature's probability raised to the power of its count (up to a constant that does not depend on the class). Features with a count of zero contribute nothing, so the absence of a word is not modelled explicitly.

In summary, Bernoulli Naive Bayes is suitable for binary presence/absence data, while Multinomial Naive Bayes is suitable for count (or frequency) data.
However, both algorithms share the same underlying principle of using Bayes' theorem to calculate the probability of the class given the features, and both make the naive assumption that the features are conditionally independent given the class.
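The difference in how absent features are treated can be made concrete with a small NumPy sketch of the two per-class log-likelihoods. The parameter values and function names are hypothetical, chosen only for illustration, and the multinomial coefficient is dropped because it is the same for every class:

```python
import numpy as np

# Hypothetical parameters for one class and a 3-word vocabulary
p_present = np.array([0.8, 0.1, 0.5])  # Bernoulli: P(word j appears | class)
theta = np.array([0.6, 0.1, 0.3])      # Multinomial: P(word j | class), sums to 1


def bernoulli_loglik(x_binary, p):
    # Every feature contributes: log p_j if present, log(1 - p_j) if absent
    return np.sum(x_binary * np.log(p) + (1 - x_binary) * np.log(1 - p))


def multinomial_loglik(x_counts, theta):
    # Each word's log-probability is weighted by its count, so zero counts add nothing
    return np.sum(x_counts * np.log(theta))


counts = np.array([3, 0, 1])           # word 1 never occurs in this document
binary = (counts > 0).astype(float)    # presence/absence view of the same document

print(bernoulli_loglik(binary, p_present))   # includes a log(1 - 0.1) term for word 1
print(multinomial_loglik(counts, theta))     # has no term for word 1 at all
```

Changing the parameter of the absent word 1 changes the Bernoulli score but leaves the multinomial score untouched, which is the practical difference between the two event models.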

#### Q3. How does Bernoulli Naive Bayes handle missing values?

In [None]:
Ans-

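Bernoulli Naive Bayes assumes that each feature is binary (i.e., takes one of two values) and conditionally independent of the other features given the class.
The model has no built-in representation for a missing value, so missing entries must be handled explicitly: typically by imputing them, by omitting the feature from the computation for that data point, or by treating "missing" as a third category (neither 0 nor 1). scikit-learn's BernoulliNB, for example, raises an error if its input contains NaN values.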

In the case of missing values, one common approach is to impute the missing values with a default value, such as the mode (most common value) of the feature across the available data points. 
Another approach is to use a more advanced imputation method, such as k-nearest neighbors or matrix completion, to estimate the missing values based on the available data.
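Mode imputation fits naturally in a scikit-learn pipeline, so the imputer is learned from whatever data the pipeline is fitted on. A minimal sketch on a hypothetical toy dataset, using `SimpleImputer(strategy="most_frequent")` as the mode imputer (k-nearest-neighbour imputation would swap in `KNNImputer` from the same module):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary features with some missing entries (np.nan)
X = np.array([[1, 0, np.nan],
              [1, np.nan, 1],
              [0, 1, 0],
              [np.nan, 1, 0]], dtype=float)
y = np.array([1, 1, 0, 0])

# The imputer replaces each NaN with the most common value of its column,
# then BernoulliNB is trained on the completed 0/1 matrix
model = make_pipeline(SimpleImputer(strategy="most_frequent"), BernoulliNB())
model.fit(X, y)

print(model.named_steps["simpleimputer"].statistics_)  # per-column fill values
print(model.predict(np.array([[1, np.nan, np.nan]])))  # NaNs are imputed before predicting
```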

However, imputing missing values can introduce bias into the model and does not always improve performance.
An alternative that follows directly from the naive independence assumption is to omit a missing feature from the product of per-feature probabilities when scoring that data point. This is equivalent to marginalising the feature out, and it requires no change to the trained model.
Another option is a modified Bernoulli model that treats "missing" as an additional value of each feature and learns a separate probability for it in each class, alongside the probabilities of presence and absence.
This lets the model use missingness itself as evidence, but it is only reliable if values go missing in the same way during training and prediction.
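Omitting missing features at prediction time is easy to do by hand, since each feature contributes an independent factor. A minimal NumPy sketch; the function name and the toy parameters are hypothetical, and the trained parameters would in practice come from a fitted model (for example `np.exp(BernoulliNB().feature_log_prob_)`):

```python
import numpy as np


def bernoulli_nb_log_posterior(x, log_prior, feature_prob):
    """Unnormalised log P(class | observed features) for Bernoulli Naive Bayes.

    x: 1-D array of 0/1 values, with np.nan marking missing features.
    log_prior: shape (n_classes,), log P(class).
    feature_prob: shape (n_classes, n_features), P(feature = 1 | class).
    """
    observed = ~np.isnan(x)          # mask of features that are actually present in x
    xo = x[observed]
    p = feature_prob[:, observed]    # drop the columns of missing features entirely
    log_lik = (xo * np.log(p) + (1 - xo) * np.log(1 - p)).sum(axis=1)
    return log_prior + log_lik


# Hypothetical two-class model with three features; feature 1 is missing
x = np.array([1.0, np.nan, 0.0])
log_prior = np.log([0.5, 0.5])
feature_prob = np.array([[0.9, 0.2, 0.3],
                         [0.2, 0.7, 0.6]])

scores = bernoulli_nb_log_posterior(x, log_prior, feature_prob)
print(scores, scores.argmax())
```

Because the missing feature's column is never read, its learned probabilities have no effect on the prediction.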

#### Q4. Can Gaussian Naive Bayes be used for multi-class classification?

In [None]:
Ans-

Yes, Gaussian Naive Bayes can be used for multi-class classification, and it supports any number of classes natively.

Gaussian Naive Bayes assumes that, within each class, every feature follows a Gaussian (normal) distribution.
For each class, it estimates the prior probability of the class and the mean and variance of each feature, and uses them to compute the posterior probability of each class given the features.
The class with the highest probability is then chosen as the predicted class. Nothing in this procedure is specific to two classes, so no decomposition into binary problems is needed.

Wrapper strategies can still be applied to Gaussian Naive Bayes, as to any classifier, but they are not required and add training cost.
With the "one-vs-all" (or "one-vs-rest") strategy, one binary classifier is trained for each class, treating all the other classes as a single "rest" class.
With the "one-vs-one" strategy, a binary classifier is trained for each pair of classes and a voting scheme determines the predicted class.

Overall, Gaussian Naive Bayes can be a useful algorithm for multi-class classification problems, particularly when the features are continuous, roughly normally distributed within each class, and not strongly correlated with each other.

In [None]:
Q5. Assignment:

Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message is spam or not based on several input features.

Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.

Results:
Report the following performance metrics for each classifier:
Accuracy
Precision
Recall
F1 score

Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?

Conclusion:
Summarise your findings and provide some suggestions for future work.

In [1]:
#Ans-

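import pandas as pd

# spambase.data has no header row: columns 0-56 are the features and column 57 is the 0/1 spam label
data=pd.read_csv('spambase.data',header=None)
data.rename(columns={57:'is_spam'}, inplace=True)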

In [2]:
data.head()

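|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | is_spam |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.64 | 0.64 | 0.0 | 0.32 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.778 | 0.0 | 0.0 | 3.756 | 61 | 278 | 1 |
| 1 | 0.21 | 0.28 | 0.5 | 0.0 | 0.14 | 0.28 | 0.21 | 0.07 | 0.0 | 0.94 | ... | 0.0 | 0.132 | 0.0 | 0.372 | 0.18 | 0.048 | 5.114 | 101 | 1028 | 1 |
| 2 | 0.06 | 0.0 | 0.71 | 0.0 | 1.23 | 0.19 | 0.19 | 0.12 | 0.64 | 0.25 | ... | 0.01 | 0.143 | 0.0 | 0.276 | 0.184 | 0.01 | 9.821 | 485 | 2259 | 1 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.137 | 0.0 | 0.137 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.135 | 0.0 | 0.135 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |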


In [9]:
# separate the features (columns 0-56) from the target label (is_spam)

x=data.iloc[:,:-1]
y=data.iloc[:,-1]

In [11]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score,recall_score,precision_score,f1_score
from sklearn.naive_bayes import BernoulliNB,MultinomialNB,GaussianNB

In [12]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.33, random_state=42)
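The assignment asks for 10-fold cross-validation with default hyperparameters, whereas the cell above evaluates each model on a single 67/33 hold-out split. A minimal sketch of the 10-fold version, using scikit-learn's `cross_validate` with several metrics at once; the helper name `evaluate_nb_cv` is hypothetical, and the stratified, shuffled folds (`StratifiedKFold(shuffle=True, random_state=42)`) are an assumption, since the assignment only specifies "10-fold":

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB


def evaluate_nb_cv(x, y, n_splits=10, random_state=42):
    """Mean cross-validated metrics for the three Naive Bayes variants.

    Hypothetical helper: each model keeps its default hyperparameters;
    stratified, shuffled folds are an assumption, not part of the assignment.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    scoring = ["accuracy", "precision", "recall", "f1"]
    models = {
        "BernoulliNB": BernoulliNB(),
        "MultinomialNB": MultinomialNB(),
        "GaussianNB": GaussianNB(),
    }
    rows = []
    for model in models.values():
        # cross_validate clones the model and refits it on each training fold
        scores = cross_validate(model, x, y, cv=cv, scoring=scoring)
        # Per-fold scores are stored under keys such as "test_accuracy"; average them
        rows.append([scores[f"test_{metric}"].mean() for metric in scoring])
    return pd.DataFrame(rows, index=list(models), columns=scoring)
```

Called as `evaluate_nb_cv(x, y)` with the `x` and `y` defined earlier, it would return one row per classifier with the mean accuracy, precision, recall and F1 score over the ten folds. Averaging over folds gives an estimate that depends less on one particular split than a single hold-out set does.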

In [13]:
# Bernoulli NB

bnb=BernoulliNB()
bnb.fit(x_train,y_train)
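Although the Spambase features are continuous frequencies, `BernoulliNB` can still be fitted to them because its `binarize` parameter (default `0.0`) maps every value greater than the threshold to 1 and everything else to 0 before training. The sketch below checks this equivalence on hypothetical synthetic data (non-negative, mostly-zero values loosely imitating word frequencies):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical synthetic data: zero-inflated non-negative features and random 0/1 labels
rng = np.random.default_rng(0)
X = rng.exponential(1.0, size=(100, 4)) * (rng.random((100, 4)) < 0.5)
y = rng.integers(0, 2, size=100)

# Default BernoulliNB binarizes internally with threshold 0.0 (value > 0 becomes 1)
auto = BernoulliNB().fit(X, y)

# Same model fitted on explicitly binarized data, with internal binarization disabled
X_bin = (X > 0).astype(float)
manual = BernoulliNB(binarize=None).fit(X_bin, y)

print(np.allclose(auto.feature_log_prob_, manual.feature_log_prob_))
print((auto.predict(X) == manual.predict(X_bin)).all())
```

So on Spambase the Bernoulli model only sees whether each word or character occurs at all, not how often.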

In [15]:
bnb_preds=bnb.predict(x_test)

In [16]:
# Compute the evaluation metrics
bnb_accuracy = accuracy_score(y_test, bnb_preds)
bnb_precision = precision_score(y_test, bnb_preds)
bnb_recall = recall_score(y_test, bnb_preds)
bnb_f1 = f1_score(y_test, bnb_preds)

# Print the evaluation metrics
print("Bernoulli Naive Bayes Accuracy: ", bnb_accuracy)
print("Bernoulli Naive Bayes Precision: ", bnb_precision)
print("Bernoulli Naive Bayes Recall: ", bnb_recall)
print("Bernoulli Naive Bayes F1 Score: ", bnb_f1)

Bernoulli Naive Bayes Accuracy:  0.8821593153390388
Bernoulli Naive Bayes Precision:  0.8927335640138409
Bernoulli Naive Bayes Recall:  0.8151658767772512
Bernoulli Naive Bayes F1 Score:  0.8521882741535921
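All four metrics are simple functions of the confusion matrix, which helps when interpreting results such as a high recall paired with a low precision. A small sketch with hypothetical labels, checked against the same scikit-learn metric functions used in this notebook:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels: 1 = spam, 0 = not spam
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary 0/1 labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)           # share of all predictions that are correct
precision = tp / (tp + fp)                           # of emails flagged as spam, share that are spam
recall = tp / (tp + fn)                              # of real spam emails, share that were flagged
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)
print(accuracy_score(y_true, y_pred), precision_score(y_true, y_pred),
      recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```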


In [17]:
#MultinomialNB

mnb=MultinomialNB()
mnb.fit(x_train,y_train)

In [18]:
mnb_preds=mnb.predict(x_test)

In [19]:
# Compute the evaluation metrics
mnb_accuracy = accuracy_score(y_test, mnb_preds)
mnb_precision = precision_score(y_test, mnb_preds)
mnb_recall = recall_score(y_test, mnb_preds)
mnb_f1 = f1_score(y_test, mnb_preds)

# Print the evaluation metrics
print("Multinomial Naive Bayes Accuracy: ", mnb_accuracy)
print("Multinomial Naive Bayes Precision: ", mnb_precision)
print("Multinomial Naive Bayes Recall: ", mnb_recall)
print("Multinomial Naive Bayes F1 Score: ", mnb_f1)

Multinomial Naive Bayes Accuracy:  0.7853851217906518
Multinomial Naive Bayes Precision:  0.7688266199649737
Multinomial Naive Bayes Recall:  0.693522906793049
Multinomial Naive Bayes F1 Score:  0.7292358803986712


In [20]:
#Gaussian NB

gnb=GaussianNB()
gnb.fit(x_train,y_train)

In [21]:
gnb_preds=gnb.predict(x_test)

In [22]:
# Compute the evaluation metrics
gnb_accuracy = accuracy_score(y_test, gnb_preds)
gnb_precision = precision_score(y_test, gnb_preds)
gnb_recall = recall_score(y_test, gnb_preds)
gnb_f1 = f1_score(y_test, gnb_preds)

# Print the evaluation metrics
print("Gaussian Naive Bayes Accuracy: ", gnb_accuracy)
print("Gaussian Naive Bayes Precision: ", gnb_precision)
print("Gaussian Naive Bayes Recall: ", gnb_recall)
print("Gaussian Naive Bayes F1 Score: ", gnb_f1)


Gaussian Naive Bayes Accuracy:  0.8229098090849243
Gaussian Naive Bayes Precision:  0.7171837708830548
Gaussian Naive Bayes Recall:  0.9494470774091627
Gaussian Naive Bayes F1 Score:  0.8171312032630863


## Comparison

In [27]:
print("Bernoulli NB Performance")
print("Bernoulli Naive Bayes Accuracy: ", bnb_accuracy)
print("Bernoulli Naive Bayes Precision: ", bnb_precision)
print("Bernoulli Naive Bayes Recall: ", bnb_recall)
print("Bernoulli Naive Bayes F1 Score: ", bnb_f1,'\n')

print("Multinomial NB Performance")
print("Multinomial Naive Bayes Accuracy: ", mnb_accuracy)
print("Multinomial Naive Bayes Precision: ", mnb_precision)
print("Multinomial Naive Bayes Recall: ", mnb_recall)
print("Multinomial Naive Bayes F1 Score: ", mnb_f1,'\n')

print("Gaussian NB Performance")
print("Gaussian Naive Bayes Accuracy: ", gnb_accuracy)
print("Gaussian Naive Bayes Precision: ", gnb_precision)
print("Gaussian Naive Bayes Recall: ", gnb_recall)
print("Gaussian Naive Bayes F1 Score: ", gnb_f1,'\n')

Bernoulli NB Performance
Bernoulli Naive Bayes Accuracy:  0.8821593153390388
Bernoulli Naive Bayes Precision:  0.8927335640138409
Bernoulli Naive Bayes Recall:  0.8151658767772512
Bernoulli Naive Bayes F1 Score:  0.8521882741535921 

Multinomial NB Performance
Multinomial Naive Bayes Accuracy:  0.7853851217906518
Multinomial Naive Bayes Precision:  0.7688266199649737
Multinomial Naive Bayes Recall:  0.693522906793049
Multinomial Naive Bayes F1 Score:  0.7292358803986712 

Gaussian NB Performance
Gaussian Naive Bayes Accuracy:  0.8229098090849243
Gaussian Naive Bayes Precision:  0.7171837708830548
Gaussian Naive Bayes Recall:  0.9494470774091627
Gaussian Naive Bayes F1 Score:  0.8171312032630863 
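The comparison is easier to read as a single table. A small sketch that hard-codes the hold-out values printed above, rounded to three decimals; in the notebook itself the variables such as bnb_accuracy and gnb_recall could be used directly instead:

```python
import pandas as pd

# Hold-out metrics copied from the outputs above (rounded to 3 decimals)
results = pd.DataFrame(
    {
        "accuracy":  [0.882, 0.785, 0.823],
        "precision": [0.893, 0.769, 0.717],
        "recall":    [0.815, 0.694, 0.949],
        "f1":        [0.852, 0.729, 0.817],
    },
    index=["BernoulliNB", "MultinomialNB", "GaussianNB"],
)

print(results)
# Name of the best-scoring model for each metric (column)
print(results.idxmax())
```

The per-metric winners show that no single variant dominates: Bernoulli leads on three metrics, while Gaussian has the highest recall.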



### Discussion

In [None]:
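Based on the results obtained on the hold-out test set (33% of the data, random_state=42), Bernoulli Naive Bayes performed the best overall, with an accuracy of 0.882, precision of 0.893, recall of 0.815 and an F1 score of 0.852.
It was followed by Gaussian Naive Bayes, with an accuracy of 0.823, precision of 0.717, recall of 0.949 and an F1 score of 0.817. Gaussian Naive Bayes had the highest recall of the three, catching almost all spam, but its low precision means it also flagged many legitimate emails as spam.
Multinomial Naive Bayes had the lowest performance, with an accuracy of 0.785, precision of 0.769, recall of 0.694 and an F1 score of 0.729.
These figures come from a single train/test split rather than the 10-fold cross-validation requested in the assignment, so small differences between the models should be interpreted with caution.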

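The Spambase features are not binary: most are word or character frequencies (percentages), and the last three are statistics on runs of capital letters. Bernoulli Naive Bayes nevertheless did well, most likely because scikit-learn's BernoulliNB binarizes its input by default (binarize=0.0), so it effectively models whether each word or character occurs at all, and the presence or absence of spam-related words appears to be a strong signal.
Gaussian Naive Bayes models the continuous values directly, but these features are mostly zero with long right tails rather than normally distributed, which may explain its high false-positive rate. Multinomial Naive Bayes expects count-like features on a common scale; here it receives percentages mixed with much larger run-length values, which may let a few features dominate the likelihood.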

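One limitation of Naive Bayes that is likely relevant here is its assumption that the features are conditionally independent given the class, which rarely holds exactly in real-world data (for example, the frequencies of related spam words tend to rise together).
In addition, each variant imposes a distributional assumption (binary, count or Gaussian) that the Spambase features only partly satisfy, and Naive Bayes may not perform well when there are rare events or insufficient training data.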

Overall, the choice of the Naive Bayes variant to use would depend on the nature of the data being analyzed, 
and it is important to evaluate the performance of each variant to determine which one is most suitable for the task at hand.

In [None]:
#Conclusion
In conclusion, this study evaluated the performance of three variants of Naive Bayes (Bernoulli, Multinomial, and Gaussian) on the Spambase dataset using a single hold-out test set.
The results showed that Bernoulli Naive Bayes performed the best on accuracy, precision and F1 score, followed by Gaussian Naive Bayes, which achieved the highest recall, while Multinomial Naive Bayes had the lowest performance.
The choice of the Naive Bayes variant to use should depend on the nature of the data being analyzed.

For future work, more advanced machine learning algorithms could be evaluated and compared to Naive Bayes to determine which one performs the best on the Spambase dataset. 
Additionally, feature selection techniques could be applied to reduce the dimensionality of the data and improve the performance of the models.
Finally, the study could be extended to evaluate the performance of Naive Bayes on other datasets with different types of features and different distributions.
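Feature selection can be evaluated without leaking information from the test folds by placing the selector inside a pipeline. A minimal sketch, assuming chi-squared selection (`SelectKBest(chi2, ...)`) in front of `MultinomialNB`; the helper name and the default `k=20` are hypothetical, illustrative choices rather than tuned values. The chi-squared test requires non-negative features, which holds for Spambase:

```python
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def selected_nb_cv_f1(x, y, k=20, cv=10):
    """Mean cross-validated F1 of MultinomialNB trained on the k best chi-squared features.

    Hypothetical helper; k=20 is an arbitrary illustrative value.
    """
    # Keeping the selector inside the pipeline means it is re-fit on each training
    # fold, so the held-out fold never influences which features are chosen
    pipe = make_pipeline(SelectKBest(chi2, k=k), MultinomialNB())
    return cross_val_score(pipe, x, y, cv=cv, scoring="f1").mean()
```

With the notebook's data, selected_nb_cv_f1(x, y) for a few values of k could be compared against the F1 of an unfiltered MultinomialNB to see whether reducing the number of features helps.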