#### Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?


Bayes' theorem is not needed here, because the problem states the required conditional probability directly. Let S represent the event that an employee is a smoker and H represent the event that an employee uses the health insurance plan. The problem gives:

P(H) = 0.70

P(S|H) = 0.40

The statement "40% of the employees who use the plan are smokers" is exactly P(S|H). As a check, apply the definition of conditional probability:

P(S|H) = P(S ∩ H) / P(H)

The fraction of all employees who both smoke and use the plan is:

P(S ∩ H) = P(S|H) * P(H) = 0.40 * 0.70 = 0.28

Substituting back:

P(S|H) = 0.28 / 0.70 = 0.40

Note that 0.28 is the joint probability P(S ∩ H), not P(S). Bayes' theorem, P(S|H) = P(H|S) * P(S) / P(H), cannot be evaluated from the given data, since neither P(S) nor P(H|S) is provided, and the problem says nothing about smokers among employees who do not use the plan.

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.40 or 40%.
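
The arithmetic can be checked with a few lines of Python. This is a minimal sketch that uses only the two figures given in the problem; the variable names are illustrative, and exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction

p_h = Fraction(70, 100)          # P(H): employee uses the health insurance plan
p_s_given_h = Fraction(40, 100)  # P(S|H): smoker, among plan users

# Joint probability of being both a plan user and a smoker.
p_s_and_h = p_s_given_h * p_h

# The definition of conditional probability recovers P(S|H) from the joint.
recovered = p_s_and_h / p_h

print(float(p_s_and_h), float(recovered))
```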

#### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

Bernoulli Naive Bayes and Multinomial Naive Bayes are two types of Naive Bayes classifiers that are commonly used for text classification tasks. The main difference between the two lies in the way they represent the data.

Bernoulli Naive Bayes assumes that each feature (or word) is binary, meaning that it is either present or absent in the document. For example, in a spam classification task, the features could be the presence or absence of certain words in the email. If a particular word is present in the email, its corresponding feature value would be 1, and if it is absent, the value would be 0. Bernoulli Naive Bayes estimates the probability of each feature being present given the class, and uses these probabilities to classify new instances. Unlike Multinomial Naive Bayes, it also explicitly counts the absence of a word as evidence for or against a class.

Multinomial Naive Bayes, on the other hand, assumes that each feature represents a count of the number of times it occurs in the document. For example, in a sentiment analysis task, the features could be the frequency of certain words in a given review. If a particular word occurs twice in the review, its corresponding feature value would be 2. Multinomial Naive Bayes calculates the likelihood of each feature given the class, taking into account the frequency of each feature, and uses these probabilities to classify new instances.

In summary, the main difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in the way they represent the data - binary for Bernoulli and count-based for Multinomial. The choice between the two depends on the specific problem at hand and the characteristics of the data. If the data is binary in nature and the focus is on presence/absence of features, Bernoulli Naive Bayes may be more appropriate. If the data represents frequency counts, Multinomial Naive Bayes may be a better choice.
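
To make the two representations concrete, the sketch below encodes the same tokenized message both ways. The vocabulary, the message, and the helper names `count_vector` and `binary_vector` are made up for illustration and do not come from any library.

```python
from collections import Counter

def count_vector(tokens, vocabulary):
    """Multinomial-style features: how many times each word occurs."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

def binary_vector(tokens, vocabulary):
    """Bernoulli-style features: 1 if the word occurs at all, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]

vocabulary = ["free", "meeting", "offer", "win"]
tokens = "win a free offer free entry".split()

print(count_vector(tokens, vocabulary))   # [2, 0, 1, 1]
print(binary_vector(tokens, vocabulary))  # [1, 0, 1, 1]
```

The word "free" appears twice, which the count vector preserves and the binary vector collapses to 1.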

#### Q3. How does Bernoulli Naive Bayes handle missing values?


Bernoulli Naive Bayes has no built-in mechanism for missing values: the model assumes that every feature is observed as either present (1) or absent (0). scikit-learn's BernoulliNB, for example, raises an error when the input contains NaN, so missing values have to be handled before the data reaches the model.

In practice, a common shortcut is to treat a missing feature as absent (0). This is a form of imputation rather than a neutral choice: Bernoulli Naive Bayes uses absence as evidence, so a feature that was simply not recorded is scored as if it did not occur. The shortcut is reasonable when missing values are rare and unrelated to the class. Because Naive Bayes treats features as conditionally independent given the class, another option is to leave the missing feature's term out of the likelihood for that instance, which is justified when values are missing at random. However, if missing values occur frequently or are systematically related to the class variable, the model's accuracy may be compromised.

In those cases, more advanced techniques can be used. For example, imputation methods can estimate missing values from the observed data (such as the most frequent value of each feature, or a prediction from the other features), or a model whose implementation supports missing values natively, such as some tree-based methods, can be used instead.
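
As an illustration of the imputation option, the sketch below fills each missing binary feature with the most frequent observed value in its column. The helper name `impute_most_frequent` and the use of `None` as the missing marker are assumptions made for this example, and the function assumes every column has at least one observed value.

```python
from collections import Counter

def impute_most_frequent(rows):
    """Replace None in each column with that column's most frequent observed value."""
    n_features = len(rows[0])
    fill = []
    for j in range(n_features):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumes at least one observed value per column.
        fill.append(Counter(observed).most_common(1)[0][0])
    return [
        [fill[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]

rows = [
    [1, 0, None],
    [1, None, 0],
    [0, 0, 1],
    [None, 0, 0],
]
imputed = impute_most_frequent(rows)
print(imputed)
```

The imputed rows contain only 0 and 1, so they can be passed to a Bernoulli Naive Bayes model.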

#### Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification, and it handles it directly. Gaussian Naive Bayes is a probabilistic algorithm for classification problems where the input features are continuous-valued. For each class, it assumes that every input feature follows a Gaussian distribution given the class label, and it estimates a mean and variance per feature and class together with a prior for each class. It then calculates the posterior probability of each class given the input features using Bayes' theorem and chooses the class with the highest probability as the predicted class. Nothing in this procedure is specific to two classes, so the same model works for any number of classes.

A "one-vs-all" or "one-vs-rest" approach is also possible but is not required. In this approach, a separate binary classification model is trained for each class label, each binary classifier predicts whether an input belongs to that class or not, and the final prediction is the class with the highest probability among all the binary classifiers. It is mainly useful for classifiers that are inherently binary.
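
The following from-scratch sketch shows the prediction rule on a toy three-class problem. It is not scikit-learn's implementation: the function names, the toy data, and the `var_floor` term (a small constant added to each variance to avoid division by zero) are assumptions made for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate the prior and per-feature mean/variance for every class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log posterior (up to a shared constant)."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))

# Toy data: three well-separated classes in two continuous features.
X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
     [5.0, 5.2], [5.1, 4.9], [4.8, 5.0],
     [9.0, 1.0], [9.2, 1.1], [8.9, 0.9]]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

model = fit_gaussian_nb(X, y)
print(predict(model, [1.0, 1.0]), predict(model, [5.0, 5.0]), predict(model, [9.0, 1.0]))
```

The argmax runs over all classes at once, which is why no one-vs-rest wrapper is needed.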

#### Q5. Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message is spam or not based on several input features.

Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.

In [16]:
import pandas as pd
import numpy as np

In [17]:
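# Note: spambase.data has no header row. Without header=None, pandas uses the
# first record as column names, so the label column is named '1' and the
# frame has 4600 rows instead of 4601.
df = pd.read_csv('./spambase.data')
df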

Unnamed: 0,0,0.64,0.64.1,0.1,0.32,0.2,0.3,0.4,0.5,0.6,...,0.41,0.42,0.43,0.778,0.44,0.45,3.756,61,278,1
0,0.21,0.28,0.50,0.0,0.14,0.28,0.21,0.07,0.00,0.94,...,0.000,0.132,0.0,0.372,0.180,0.048,5.114,101,1028,1
1,0.06,0.00,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.010,0.143,0.0,0.276,0.184,0.010,9.821,485,2259,1
2,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.000,0.137,0.0,0.137,0.000,0.000,3.537,40,191,1
3,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.000,0.135,0.0,0.135,0.000,0.000,3.537,40,191,1
4,0.00,0.00,0.00,0.0,1.85,0.00,0.00,1.85,0.00,0.00,...,0.000,0.223,0.0,0.000,0.000,0.000,3.000,15,54,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4595,0.31,0.00,0.62,0.0,0.00,0.31,0.00,0.00,0.00,0.00,...,0.000,0.232,0.0,0.000,0.000,0.000,1.142,3,88,0
4596,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.000,0.000,0.0,0.353,0.000,0.000,1.555,4,14,0
4597,0.30,0.00,0.30,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.102,0.718,0.0,0.000,0.000,0.000,1.404,6,118,0
4598,0.96,0.00,0.00,0.0,0.32,0.00,0.00,0.00,0.00,0.00,...,0.000,0.057,0.0,0.000,0.000,0.000,1.147,5,78,0


In [18]:
X = df.drop('1',axis = 1)
y = df['1']
X.shape,y.shape

((4600, 57), (4600,))

In [19]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,random_state=44,test_size=0.3)
X_train.shape,X_test.shape,y_train.shape,y_test.shape

((3220, 57), (1380, 57), (3220,), (1380,))

In [27]:
from sklearn.naive_bayes import BernoulliNB,MultinomialNB
from sklearn.model_selection import KFold,cross_val_score
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report

## BernoulliNB

model = BernoulliNB()
model.fit(X_train,y_train)

y_pred = model.predict(X_test)

print(confusion_matrix(y_test,y_pred))
print(accuracy_score(y_test,y_pred))
print(classification_report(y_test,y_pred))

# Note: the exercise asks for 10 folds; this run uses 5.
validation  = KFold(n_splits=5)
cross_score = np.mean(cross_val_score(BernoulliNB(),X_train,y_train,cv = validation,scoring = 'accuracy'))
print(f'Mean of Accuracy of k = 5 Cross validation is {cross_score}')

[[778  48]
 [114 440]]
0.8826086956521739
              precision    recall  f1-score   support

           0       0.87      0.94      0.91       826
           1       0.90      0.79      0.84       554

    accuracy                           0.88      1380
   macro avg       0.89      0.87      0.88      1380
weighted avg       0.88      0.88      0.88      1380

Mean of Accuracy of k = 5 Cross validation is 0.8872670807453418


In [28]:
## MultinomialNB

model = MultinomialNB()
model.fit(X_train,y_train)

y_pred = model.predict(X_test)

print(confusion_matrix(y_test,y_pred))
print(accuracy_score(y_test,y_pred))
print(classification_report(y_test,y_pred))

validation  = KFold(n_splits=5)
cross_score = np.mean(cross_val_score(MultinomialNB(),X_train,y_train,cv = validation,scoring = 'accuracy'))
print(f'Mean of Accuracy of k = 5 Cross validation is {cross_score}')

[[684 142]
 [170 384]]
0.7739130434782608
              precision    recall  f1-score   support

           0       0.80      0.83      0.81       826
           1       0.73      0.69      0.71       554

    accuracy                           0.77      1380
   macro avg       0.77      0.76      0.76      1380
weighted avg       0.77      0.77      0.77      1380

Mean of Accuracy of k = 5 Cross validation is 0.7878881987577641


##### With BernoulliNB the accuracy of the model on the test data is 0.88, while with MultinomialNB it is 0.77

On this particular dataset the BernoulliNB model performs better, both on the test split and in 5-fold cross-validation (0.887 versus 0.788 mean accuracy), so for problems of this kind it is a reasonable first choice among the two.
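
The exercise asks for 10-fold cross-validation of all three classifiers, including Gaussian Naive Bayes, with default hyperparameters. A minimal sketch of that comparison follows. The helper name `compare_naive_bayes` is hypothetical, and the shuffled `StratifiedKFold` is a choice made here (the cells above use an unshuffled `KFold`), picked because the raw file lists spam rows before non-spam rows. The demo data is synthetic and only stands in for the real feature matrix, so its scores say nothing about Spambase.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

def compare_naive_bayes(X, y, n_splits=10, seed=0):
    """Mean cross-validated accuracy for each Naive Bayes variant (default hyperparameters)."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    models = {
        "BernoulliNB": BernoulliNB(),
        "MultinomialNB": MultinomialNB(),
        "GaussianNB": GaussianNB(),
    }
    return {
        name: float(np.mean(cross_val_score(model, X, y, cv=cv, scoring="accuracy")))
        for name, model in models.items()
    }

# Synthetic, non-negative stand-in data (MultinomialNB rejects negative values).
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=200)
X_demo = rng.poisson(lam=1.0 + 2.0 * y_demo[:, None], size=(200, 5)).astype(float)

scores = compare_naive_bayes(X_demo, y_demo)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

In the notebook, the same function could be called as `compare_naive_bayes(X, y)` with the `X` and `y` defined earlier.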