1) A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
probability that an employee is a smoker given that he/she uses the health insurance plan?

We can solve this problem using Bayes' theorem, which states that:

P(A|B) = P(B|A) * P(A) / P(B)

where P(A|B) is the conditional probability of A given B, P(B|A) is the conditional probability of B given A, P(A) is the prior probability of A, and P(B) is the prior probability of B.

In this case, we want to find the probability that an employee is a smoker given that he/she uses the health insurance plan, which can be written as:

P(smoker | uses insurance) = P(uses insurance | smoker) * P(smoker) / P(uses insurance)

We are given that 70% of the employees use the health insurance plan, which means that P(uses insurance) = 0.7. We are also given that 40% of the employees who use the plan are smokers. That statement is already conditioned on plan use, so it means P(smoker | uses insurance) = 0.4, not P(uses insurance | smoker) = 0.4. The quantity the question asks for is therefore stated directly in the problem, and the overall percentage of smokers, which is not given, is not needed to answer it.

Bayes' theorem can still be used as a consistency check. For that we need P(smoker), which follows from the law of total probability:

P(smoker) = P(smoker | uses insurance) * P(uses insurance) + P(smoker | does not use insurance) * P(does not use insurance)

We are not given the value of P(smoker | does not use insurance). Purely for illustration, let's assume that P(smoker | does not use insurance) = 20%. We also know that P(does not use insurance) = 0.3 (since 1 - 0.7 = 0.3). Then:

P(smoker) = 0.4 * 0.7 + 0.2 * 0.3 = 0.34

P(uses insurance | smoker) = P(smoker | uses insurance) * P(uses insurance) / P(smoker) = 0.28 / 0.34 ≈ 0.824

Now we can plug these values into Bayes' theorem:

P(smoker | uses insurance) = (0.28 / 0.34) * 0.34 / 0.7 = 0.28 / 0.7 = 0.4

The result does not depend on the assumed 20%: a different value changes P(smoker) and P(uses insurance | smoker), but the two changes cancel.

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4, or 40%.
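As a numeric check of the Bayes' theorem identity on this problem, the short sketch below reads the stated 40% as P(smoker | uses insurance), which is how the problem statement phrases it. The 20% smoking rate among employees who do not use the plan is an illustrative assumption, not a given, and the variable names are made up for this example.

```python
# Given in the problem statement
p_uses = 0.7                 # P(uses insurance)
p_smoker_given_uses = 0.4    # P(smoker | uses insurance)

# Assumption for illustration only (not given in the problem)
p_smoker_given_not_uses = 0.2

# Joint probability and law of total probability
p_smoker_and_uses = p_smoker_given_uses * p_uses
p_smoker = p_smoker_and_uses + p_smoker_given_not_uses * (1 - p_uses)

# Reverse conditional, then Bayes' theorem back to the quantity asked for
p_uses_given_smoker = p_smoker_and_uses / p_smoker
posterior = p_uses_given_smoker * p_smoker / p_uses

print(round(p_smoker, 4), round(p_uses_given_smoker, 4), round(posterior, 4))
```

Changing `p_smoker_given_not_uses` alters the first two printed values but not the last, since P(smoker) cancels in the final step.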

2) What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

Bernoulli Naive Bayes and Multinomial Naive Bayes are both variants of the Naive Bayes algorithm, which is a probabilistic classification algorithm based on Bayes' theorem. However, they differ in the way they handle input features.

In Bernoulli Naive Bayes, the input features are binary (i.e., they take on values of either 0 or 1), and the algorithm models the probability that each feature is present given each class. Only whether a feature occurs matters, not how many times it occurs in the document, and the absence of a feature is explicitly counted as evidence as well. For example, in a spam detection task, a feature might be the presence or absence of the word "viagra" in an email.

In contrast, in Multinomial Naive Bayes, the input features are discrete counts (i.e., they take on integer values), and the algorithm models the frequency of each feature given each class. This means that it takes into account how many times a particular feature occurs in the document, and assumes that the frequency of occurrence is important in predicting the class label. For example, in a text classification task, a feature might be the number of times a particular word occurs in a document.

In summary, the key difference between Bernoulli Naive Bayes and Multinomial Naive Bayes is the type of input features they handle: binary features for Bernoulli Naive Bayes and count-based features for Multinomial Naive Bayes. The choice between the two variants depends on the nature of the input features and the specific task at hand.
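The difference can be seen on a toy example. The sketch below uses scikit-learn's `BernoulliNB` and `MultinomialNB` on a small, made-up word-count matrix (the data and variable names are hypothetical). With its default `binarize=0.0`, `BernoulliNB` turns every positive count into 1, so a word that occurs once and a word that occurs nine times yield the same class probabilities, whereas `MultinomialNB` responds to the count.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical word counts: 4 documents over a 3-word vocabulary
X = np.array([[3, 0, 1],
              [2, 0, 0],
              [0, 4, 1],
              [0, 1, 2]])
y = np.array([1, 1, 0, 0])

bnb = BernoulliNB().fit(X, y)     # binarizes counts (present / absent)
mnb = MultinomialNB().fit(X, y)   # uses the counts themselves

once = np.array([[1, 1, 0]])      # first word appears once
many = np.array([[9, 1, 0]])      # first word appears nine times

bnb_once, bnb_many = bnb.predict_proba(once), bnb.predict_proba(many)
mnb_once, mnb_many = mnb.predict_proba(once), mnb.predict_proba(many)

print("Bernoulli same probabilities:  ", np.allclose(bnb_once, bnb_many))
print("Multinomial same probabilities:", np.allclose(mnb_once, mnb_many))
```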

3) How does Bernoulli Naive Bayes handle missing values?

Bernoulli Naive Bayes assumes that each feature is binary (i.e., it takes on values of either 0 or 1) and has no built-in mechanism for missing values, so they have to be dealt with before or around the model.

One common approach is to impute missing values with the most common value in the training data for that feature. For example, if a particular feature is the presence or absence of a certain word in an email, missing values can be imputed with the value that occurs most frequently in the training data. Imputing with a sentinel such as -1 does not work here: the model only distinguishes 0 from 1, and scikit-learn's BernoulliNB with its default binarize=0.0 setting would map -1 to 0.

Another approach is to make missingness itself visible to the model by adding an extra binary indicator feature ("was this value missing?") next to each imputed feature. A third option, which follows naturally from the conditional independence assumption, is to leave the missing feature out of the likelihood product for that example and classify using only the observed features. scikit-learn does not do this automatically, so it requires a custom implementation.

The choice ultimately depends on the specific context of the problem and the amount of missing data. If the amount of missing data is small, imputing with the most common value may be a reasonable approach. If the amount of missing data is large, or if missingness is likely to be informative about the class, an indicator feature or skipping the missing features may be a better option.
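Most-frequent imputation ahead of Bernoulli Naive Bayes can be sketched as a scikit-learn pipeline. The data below is made up, and combining `SimpleImputer(strategy="most_frequent")` with `BernoulliNB` via `make_pipeline` is one possible arrangement rather than a prescribed one. The imputer is needed because scikit-learn's Naive Bayes estimators do not accept NaN inputs.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary features with some missing entries (np.nan)
X = np.array([[1, 0, np.nan],
              [1, np.nan, 0],
              [0, 1, 1],
              [np.nan, 1, 1],
              [0, 1, 0],
              [1, 0, 0]], dtype=float)
y = np.array([1, 1, 0, 0, 0, 1])

# Fill each missing entry with that column's most frequent training value,
# then fit Bernoulli Naive Bayes on the completed matrix
model = make_pipeline(SimpleImputer(strategy="most_frequent"), BernoulliNB())
model.fit(X, y)

# Inspect what the fitted imputer did to the training matrix
imputed = model.named_steps["simpleimputer"].transform(X)
print(imputed)

# New rows with missing values go through the same imputation
pred = model.predict(np.array([[1, np.nan, 0]]))
print(pred)
```

Keeping the imputer inside the pipeline means the fill values are learned from the training folds only when the pipeline is cross-validated.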

4) Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification.

Naive Bayes is inherently a multi-class method, so no one-vs-all (OvA) decomposition into separate binary classifiers is needed. For every class, the algorithm estimates a prior probability from the class frequencies in the training data, and a mean and variance for each input feature using only the training examples of that class.

During prediction, the likelihood of each feature value is evaluated with the normal density fitted for that class, the per-feature likelihoods are multiplied together (the naive independence assumption) and weighted by the class prior, and the class with the highest resulting posterior probability is chosen as the predicted class label. The procedure is the same for two classes or for many.

While Gaussian Naive Bayes can be used for multi-class classification, it is often less accurate than more flexible algorithms, such as support vector machines or random forests, especially in cases where the input features are highly correlated (which violates the naive independence assumption) or when there are non-linear relationships between the features and the target variable.
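A minimal sketch of multi-class use with scikit-learn's `GaussianNB` follows. The three-class Iris dataset bundled with scikit-learn is used only as a convenient stand-in; it is not part of this assignment. A single fitted model returns one posterior probability per class, and the prediction is the class with the largest one.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has three classes (0, 1, 2); used here purely as example data
X, y = load_iris(return_X_y=True)

clf = GaussianNB().fit(X, y)

proba = clf.predict_proba(X[:5])   # one column of posterior probability per class
pred = clf.predict(X[:5])          # class with the highest posterior

print(clf.classes_)
print(proba.shape)
print(pred)
```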

5) Assignment:
Data preparation:

Download the "Spambase Data Set" from the UCI Machine Learning Repository
(https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains numeric features extracted from
email messages, where the goal is to predict whether a message is spam or not based on these input features.

Implementation:

Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.

Results:

Report the following performance metrics for each classifier:

Accuracy

Precision

Recall

F1 score

Discussion:

Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?

Conclusion:

Summarise your findings and provide some suggestions for future work.

In [66]:
import pandas as pd
# spambase.data has no header row; columns 0-56 are features, column 57 is the spam label
df=pd.read_csv('spambase.data',delimiter=',', header=None)
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,48,49,50,51,52,53,54,55,56,57
0,0.00,0.64,0.64,0.0,0.32,0.00,0.00,0.00,0.00,0.00,...,0.000,0.000,0.0,0.778,0.000,0.000,3.756,61,278,1
1,0.21,0.28,0.50,0.0,0.14,0.28,0.21,0.07,0.00,0.94,...,0.000,0.132,0.0,0.372,0.180,0.048,5.114,101,1028,1
2,0.06,0.00,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.010,0.143,0.0,0.276,0.184,0.010,9.821,485,2259,1
3,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.000,0.137,0.0,0.137,0.000,0.000,3.537,40,191,1
4,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.000,0.135,0.0,0.135,0.000,0.000,3.537,40,191,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4596,0.31,0.00,0.62,0.0,0.00,0.31,0.00,0.00,0.00,0.00,...,0.000,0.232,0.0,0.000,0.000,0.000,1.142,3,88,0
4597,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.000,0.000,0.0,0.353,0.000,0.000,1.555,4,14,0
4598,0.30,0.00,0.30,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.102,0.718,0.0,0.000,0.000,0.000,1.404,6,118,0
4599,0.96,0.00,0.00,0.0,0.32,0.00,0.00,0.00,0.00,0.00,...,0.000,0.057,0.0,0.000,0.000,0.000,1.147,5,78,0


In [45]:
X=df.iloc[:,:-1]
y=df.iloc[:,-1]

In [46]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)

In [47]:
X_train.shape,X_test.shape

((3680, 57), (921, 57))

In [50]:
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB,MultinomialNB,BernoulliNB

In [53]:
# The assignment asks for 10-fold CV; cv=5 is the setting that produced the scores reported below
bnb=BernoulliNB()
bnb_scores=cross_val_score(bnb, X_train,y_train,cv=5)

In [26]:
mnb = MultinomialNB()
mnb_scores = cross_val_score(mnb, X_train, y_train, cv=5)

In [54]:
gnb=GaussianNB()
gnb_scores=cross_val_score(gnb,X_train,y_train,cv=5)

In [58]:
# cross_val_score fits clones of the estimator, so each model is fit here before predicting
bnb.fit(X_train,y_train)

In [59]:
mnb.fit(X_train,y_train)

In [60]:
gnb.fit(X_train,y_train)

In [56]:
from sklearn.metrics import recall_score,precision_score,f1_score

In [61]:
y_pred=bnb.predict(X_test)

In [62]:
y_pred1=mnb.predict(X_test)

In [64]:
y_pred2=gnb.predict(X_test)

In [65]:
# CV accuracy is the mean over the 5 folds of the training split;
# precision, recall and F1 are computed on the held-out test split.
results = [("BernoulliNB", bnb_scores, y_pred),
           ("MultinomialNB", mnb_scores, y_pred1),
           ("GaussianNB", gnb_scores, y_pred2)]
for name, scores, pred in results:
    print(name)
    print("  CV accuracy (mean):", scores.mean())
    print("  Precision:", precision_score(y_test, pred))
    print("  Recall:", recall_score(y_test, pred))
    print("  F1 score:", f1_score(y_test, pred))


BernoulliNB
  CV accuracy (mean): 0.8858695652173914
  Precision: 0.9069767441860465
  Recall: 0.8
  F1 score: 0.8501362397820164
MultinomialNB
  CV accuracy (mean): 0.7959239130434783
  Precision: 0.7643835616438356
  Recall: 0.7153846153846154
  F1 score: 0.7390728476821192
GaussianNB
  CV accuracy (mean): 0.8190217391304347
  Precision: 0.7192982456140351
  Recall: 0.9461538461538461
  F1 score: 0.8172757475083057
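The assignment text asks for 10-fold cross-validation and for all four metrics per classifier. One way to get both in a single pass is scikit-learn's `cross_validate` with a list of scorers. The sketch below is not the code that produced the numbers above: `evaluate_nb` is a hypothetical helper name, and the small synthetic matrix only stands in for the Spambase features so that the block runs on its own.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

def evaluate_nb(X, y, cv=10):
    """Mean cross-validated accuracy, precision, recall and F1 per classifier."""
    models = {
        "BernoulliNB": BernoulliNB(),
        "MultinomialNB": MultinomialNB(),  # requires non-negative features
        "GaussianNB": GaussianNB(),
    }
    metrics = ["accuracy", "precision", "recall", "f1"]
    summary = {}
    for name, model in models.items():
        cv_result = cross_validate(model, X, y, cv=cv, scoring=metrics)
        summary[name] = {m: cv_result[f"test_{m}"].mean() for m in metrics}
    return summary

# Synthetic, non-negative stand-in data (an assumption for this demo only)
rng = np.random.default_rng(0)
X_demo = rng.poisson(lam=1.0, size=(200, 5)).astype(float)
y_demo = (X_demo[:, 0] + rng.normal(size=200) > 1).astype(int)

demo_summary = evaluate_nb(X_demo, y_demo, cv=10)
for name, scores in demo_summary.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

On the notebook's data the call would be `evaluate_nb(X, y)` with the `X` and `y` defined earlier. The resulting numbers would differ from those printed above, which come from 5-fold cross-validation plus a single held-out test split.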
