In [None]:
Q1. A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
probability that an employee is a smoker given that he/she uses the health insurance plan?
Ans:
To solve this problem, define the events:

A = employee is a smoker
B = employee uses the health insurance plan

The problem gives us two facts:

P(B) = 70% = 0.7 (probability that an employee uses the health insurance plan)
P(A|B) = 40% = 0.4 (probability that an employee is a smoker, given that he/she uses the plan)

The second fact comes from the phrase "40% of the employees who use the plan are smokers". Because it only counts employees who use the plan, it is already the conditional probability we are asked for, so no further calculation is required.

We can check this with the definition of conditional probability, from which Bayes' theorem is derived:

P(A|B) = P(A and B) / P(B)

The fraction of all employees who both use the plan and smoke is P(A and B) = 0.4 * 0.7 = 0.28. Dividing by the fraction of employees who use the plan gives back the same value:

P(A|B) = 0.28 / 0.7 = 0.4

Bayes' theorem, P(A|B) = P(B|A) * P(A) / P(B), would only be needed if we were given the reverse conditional probability P(B|A) (the probability of using the plan given that an employee smokes) together with the overall smoking rate P(A). Neither is given here, and P(A) cannot be found without knowing how many of the employees who do not use the plan smoke.

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4, or 40%.
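A quick numerical check can make the conditional probability concrete. The sketch below assumes a hypothetical company of 1,000 employees (the headcount is an illustrative assumption; only the percentages come from the question) and computes P(A|B) both as P(A and B) / P(B) and directly as a fraction of plan users.

```python
# Hypothetical company size; only the 70% and 40% figures come from the question.
n_employees = 1000
n_plan_users = n_employees * 70 // 100        # 70% use the plan
n_smoking_users = n_plan_users * 40 // 100    # 40% of plan users smoke

p_b = n_plan_users / n_employees              # P(B)
p_a_and_b = n_smoking_users / n_employees     # P(A and B)
p_a_given_b = p_a_and_b / p_b                 # P(A|B) = P(A and B) / P(B)

print("Plan users:", n_plan_users, "| smoking plan users:", n_smoking_users)
print("P(A|B) via definition:", round(p_a_given_b, 4))
print("P(A|B) as a fraction of plan users:", n_smoking_users / n_plan_users)
```

Both routes give the same answer, which is why the headcount does not matter: any company size scales the numerator and denominator equally.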

In [None]:
Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?
Ans:
Bernoulli Naive Bayes and Multinomial Naive Bayes are two popular variants of the Naive Bayes algorithm, a probabilistic classification algorithm based on Bayes' theorem.

The main difference between the two lies in their input feature representation and the assumptions they make about the data.

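Bernoulli Naive Bayes: It is used when the input features are binary (i.e., each feature can take on only one of two possible values, usually 0 or 1).
Bernoulli Naive Bayes models each feature as a Bernoulli random variable that is conditionally independent of the others given the class variable.
Because it models both outcomes, the absence of a feature (a 0) also contributes to the class score, whereas Multinomial Naive Bayes simply ignores features with a count of 0.
It is often used in text classification, where the presence or absence of a word is used as a binary feature. In scikit-learn, BernoulliNB binarizes its inputs by default (binarize=0.0), so any value greater than 0 is treated as 1.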

Multinomial Naive Bayes: It is used when the input features are counts or frequencies (e.g., the number of times a word appears in a document). 
Multinomial Naive Bayes assumes that the input features are conditionally independent of each other given the class variable and 
that the feature counts follow a multinomial distribution.
It is also commonly used in text classification, where the frequency of a word is used as a feature.
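The difference can be seen on a toy corpus. The sketch below (the four documents and their labels are made up for illustration) uses scikit-learn's CountVectorizer to build word counts for MultinomialNB and presence/absence indicators for BernoulliNB, then classifies a message that repeats the word "free" three times.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy training data: 1 = spam, 0 = not spam (illustrative labels only)
docs = ["free money free offer", "meeting agenda attached",
        "free offer today", "project meeting notes"]
labels = [1, 0, 1, 0]

count_vec = CountVectorizer()
X_counts = count_vec.fit_transform(docs)   # word counts, e.g. "free" = 2 in doc 1
X_binary = (X_counts > 0).astype(int)      # 1 if the word occurs at all, else 0

multi = MultinomialNB().fit(X_counts, labels)  # uses how often each word occurs
bern = BernoulliNB().fit(X_binary, labels)     # uses only whether each word occurs

new_doc = count_vec.transform(["free free free meeting"])
print("Multinomial prediction:", multi.predict(new_doc))
print("Bernoulli prediction:", bern.predict((new_doc > 0).astype(int)))
```

On this toy data the multinomial model labels the message as spam (1), because each of the three occurrences of "free" adds evidence for the spam class. The Bernoulli model labels it as not spam (0): it only records that "free" and "meeting" are present, and every absent word in the vocabulary also enters its score.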

In [None]:
Q3. How does Bernoulli Naive Bayes handle missing values?
Ans:
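Bernoulli Naive Bayes is a classification algorithm that assumes that each binary feature is conditionally independent of the others given the class variable.
The model has no built-in way to represent a missing value, and scikit-learn's BernoulliNB rejects input that contains NaN, so missing values have to be handled before the data reaches the classifier or by a custom implementation.

One common approach is to make missingness explicit. Each missing entry is filled in (for example, with the most frequent value of that feature), and an extra binary indicator feature is added that is 1 when the original value was missing and 0 otherwise.
The algorithm then estimates class-conditional probabilities for this indicator like for any other feature, so the fact that a value is missing can itself be used as evidence.
Encoding missing values as -1 does not achieve this in scikit-learn: with the default binarize=0.0, BernoulliNB maps every value that is not greater than 0, including -1, to 0, so a missing value becomes indistinguishable from "absent".

Another approach is to ignore the missing values and only consider the available features.
Because of the conditional independence assumption, the term for a missing feature can simply be left out of the product of per-feature likelihoods for that instance, which is equivalent to marginalizing the feature out.
This can be appropriate if the missing values are rare and do not significantly affect the overall classification accuracy.
However, if the missing values are frequent and potentially informative (for example, if values are missing more often for one class), ignoring them may result in biased or incomplete results.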

In practice, the choice of how to handle missing values in Bernoulli Naive Bayes depends on the specific application and the characteristics of the data.
It is important to carefully evaluate the impact of missing values on the classification performance and select an appropriate approach accordingly.
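A minimal sketch of the "make missingness explicit" approach, assuming scikit-learn is available: SimpleImputer fills each gap with the column's most frequent value and, with add_indicator=True, appends one binary "was missing" column for every feature that had gaps during fitting. The small feature matrix and labels are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy binary features with missing entries (np.nan); values are illustrative only
X = np.array([
    [1, 0, np.nan],
    [1, 1, 0],
    [0, np.nan, 1],
    [0, 0, 1],
    [1, 1, np.nan],
    [0, 0, 0],
])
y = np.array([1, 1, 0, 0, 1, 0])

# Impute, then add missing-value indicators so missingness becomes a Bernoulli feature
model = make_pipeline(
    SimpleImputer(strategy="most_frequent", add_indicator=True),
    BernoulliNB(),
)
model.fit(X, y)

# Columns 1 and 2 had gaps, so the imputer outputs 3 features + 2 indicators
print("Transformed shape:", model[0].transform(X).shape)
pred = model.predict(np.array([[1, np.nan, np.nan]]))
print("Prediction:", pred)
```

Putting the imputer inside the pipeline means the fill values and indicator columns are learned from training data only, which also keeps cross-validation honest.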

In [None]:
Q4. Can Gaussian Naive Bayes be used for multi-class classification?
Ans:
Yes, Gaussian Naive Bayes can be used for multi-class classification, where the goal is to classify instances into one of three or more classes.

In Gaussian Naive Bayes, each feature is assumed to follow a Gaussian (normal) distribution, and the algorithm calculates the mean and variance of each feature for each class in the training data. 
During classification, the algorithm then calculates the probability of the instance belonging to each class based on the likelihood of the features given the class and the prior probability of the class.

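Gaussian Naive Bayes handles more than two classes natively, so no extension is needed. For a new instance, the algorithm multiplies the prior probability of each class by the likelihood of the features given that class, and predicts the class with the highest resulting posterior probability. This works the same way whether there are two classes or many; scikit-learn's GaussianNB, for example, accepts a target with any number of classes.
A binary classifier can also be applied to multi-class problems with a "one-vs-all" (one-vs-rest) or "one-vs-one" strategy.
In the "one-vs-all" strategy, a separate binary classifier is trained for each class to distinguish that class from all the others, and the class with the highest score among all the binary classifiers is selected.
In the "one-vs-one" strategy, a binary classifier is trained for each pair of classes,
and the class with the most votes from all the binary classifiers is selected as the final prediction.
These strategies are essential for inherently binary models, but for Gaussian Naive Bayes they are optional rather than required.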
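A short sketch of multi-class Gaussian Naive Bayes, assuming scikit-learn and its bundled Iris dataset (3 classes, 4 continuous features). No one-vs-rest or one-vs-one wrapper is used; the per-class means live in the fitted model's theta_ attribute, and the prediction is the class with the highest posterior probability.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris: 150 samples, 4 continuous features, 3 classes
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# A single GaussianNB model fits all three classes at once
clf = GaussianNB().fit(X_train, y_train)

# One mean per (class, feature) pair, estimated from the training data
print("Classes:", clf.classes_)
print("Per-class feature means shape:", clf.theta_.shape)

# Posterior probability of each class for the first five test samples
proba = clf.predict_proba(X_test[:5])
print(proba.round(3))

# The predicted label is the class with the highest posterior probability
print("Predicted:", clf.predict(X_test[:5]))
print("Argmax of posteriors:", clf.classes_[proba.argmax(axis=1)])
print("Test accuracy:", clf.score(X_test, y_test))
```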

In [None]:
Q5. Assignment:
    
Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset describes email messages with numeric input features, and the goal is to predict
whether a message is spam or not from those features.

Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.

Results:
Report the following performance metrics for each classifier:
Accuracy
Precision
Recall
F1 score

Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?

Conclusion:
Summarise your findings and provide some suggestions for future work.

In [39]:
import pandas as pd
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.model_selection import cross_validate

In [37]:
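url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data'
# The file has no header row: columns 0-56 are features, column 57 is the label (1 = spam, 0 = not spam)
df = pd.read_csv(url, header=None)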

In [38]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4601 entries, 0 to 4600
Data columns (total 58 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       4601 non-null   float64
 1   1       4601 non-null   float64
 2   2       4601 non-null   float64
 3   3       4601 non-null   float64
 4   4       4601 non-null   float64
 5   5       4601 non-null   float64
 6   6       4601 non-null   float64
 7   7       4601 non-null   float64
 8   8       4601 non-null   float64
 9   9       4601 non-null   float64
 10  10      4601 non-null   float64
 11  11      4601 non-null   float64
 12  12      4601 non-null   float64
 13  13      4601 non-null   float64
 14  14      4601 non-null   float64
 15  15      4601 non-null   float64
 16  16      4601 non-null   float64
 17  17      4601 non-null   float64
 18  18      4601 non-null   float64
 19  19      4601 non-null   float64
 20  20      4601 non-null   float64
 21  21      4601 non-null   float64
 ...

In [48]:
# Features are all columns except the last; the last column is the spam label
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# 10-fold cross-validation below uses all rows, so no separate train/test split is needed

# Define the classifiers
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

# Evaluate the classifiers using 10-fold cross-validation
scoring = ["accuracy", "precision", "recall", "f1"]
bernoulli_nb_scores = cross_validate(bernoulli_nb, X, y, cv=10, scoring=scoring)
multinomial_nb_scores = cross_validate(multinomial_nb, X, y, cv=10, scoring=scoring)
gaussian_nb_scores = cross_validate(gaussian_nb, X, y, cv=10, scoring=scoring)

# Print the results
print("Bernoulli Naive Bayes:")
print("Accuracy:", bernoulli_nb_scores["test_accuracy"].mean())
print("Precision:", bernoulli_nb_scores["test_precision"].mean())
print("Recall:", bernoulli_nb_scores["test_recall"].mean())
print("F1 score:", bernoulli_nb_scores["test_f1"].mean())

print("\nMultinomial Naive Bayes:")
print("Accuracy:", multinomial_nb_scores["test_accuracy"].mean())
print("Precision:", multinomial_nb_scores["test_precision"].mean())
print("Recall:", multinomial_nb_scores["test_recall"].mean())
print("F1 score:", multinomial_nb_scores["test_f1"].mean())

print("\nGaussian Naive Bayes:")
print("Accuracy:", gaussian_nb_scores["test_accuracy"].mean())
print("Precision:", gaussian_nb_scores["test_precision"].mean())
print("Recall:", gaussian_nb_scores["test_recall"].mean())
print("F1 score:", gaussian_nb_scores["test_f1"].mean())

Bernoulli Naive Bayes:
Accuracy: 0.8839380364047911
Precision: 0.8869617393737383
Recall: 0.8152389047416673
F1 score: 0.8481249015095276

Multinomial Naive Bayes:
Accuracy: 0.7863496180326323
Precision: 0.7393175533565436
Recall: 0.7214983911116508
F1 score: 0.7282909724016348

Gaussian Naive Bayes:
Accuracy: 0.8217730830896915
Precision: 0.7103733928118492
Recall: 0.9569516119239877
F1 score: 0.8130660909542995


In [None]:
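According to the results, Bernoulli Naive Bayes performed best overall, with the highest mean accuracy (0.884), precision (0.887), and F1 score (0.848).
Gaussian Naive Bayes achieved the highest recall (0.957) but also the lowest precision (0.710): it catches almost all spam, but labels many legitimate messages as spam.
Multinomial Naive Bayes had the lowest accuracy (0.786) and F1 score (0.728).
A plausible explanation is that the first 54 Spambase features are word and character frequencies expressed as percentages, which are 0 for many emails. BernoulliNB's default binarization (any value above 0 becomes 1) reduces each of them to whether a word or character is present at all, while GaussianNB's assumption of normally distributed features and MultinomialNB's assumption of word counts both fit these skewed, non-count values less well.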
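The Gaussian results above combine a very high recall (0.957) with a low precision (0.710). A toy example with made-up labels shows how that happens: a classifier that flags too many messages as spam misses no spam, so recall is 1.0, but every false alarm lowers precision. The sketch uses scikit-learn's metric functions on these hypothetical labels only.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical labels: 1 = spam, 0 = not spam
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# An "eager" classifier: catches all 4 spam messages but also flags 3 legitimate ones
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

acc = accuracy_score(y_true, y_pred)    # (TP + TN) / total = (4 + 3) / 10
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 4 / (4 + 3)
rec = recall_score(y_true, y_pred)      # TP / (TP + FN) = 4 / (4 + 0)
f1 = f1_score(y_true, y_pred)           # 2 * prec * rec / (prec + rec)

print("Accuracy:", acc)
print("Precision:", prec)
print("Recall:", rec)
print("F1 score:", f1)
```

Here precision is 4/7 (about 0.571) and F1 is 8/11 (about 0.727), so a perfect recall alone does not guarantee a good F1 score.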