**Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?**

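- Event A: the employee uses the company's health insurance plan.
- Event B: the employee is a smoker.

- Given: P(A) = 0.7 and P(B | A) = 0.4. The 40% figure is stated among employees who use the plan, so it is already a conditional probability.
- Required: P(B | A)

- P(A and B) = P(B | A) × P(A) = 0.4 × 0.7 = 0.28
- P(B | A) = P(A and B) / P(A) = 0.28 / 0.7 = 0.4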
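
As a quick sanity check, the conditional probability can be worked out from head counts. The sketch below assumes a hypothetical workforce of 1,000 employees (the size is an illustration only; any size gives the same ratio).

```python
# Hypothetical workforce size, chosen only to make the counts easy to read.
total_employees = 1000

# 70% of employees use the plan.
plan_users = total_employees * 70 // 100

# 40% of the plan users are smokers.
smokers_on_plan = plan_users * 40 // 100

# P(A and B): employee uses the plan and smokes.
p_a_and_b = smokers_on_plan / total_employees

# P(B | A): restrict attention to plan users only.
p_b_given_a = smokers_on_plan / plan_users

print("P(A and B) =", round(p_a_and_b, 2))
print("P(B | A)   =", round(p_b_given_a, 2))
```

Restricting the denominator to plan users is what makes the result conditional on A.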

**Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?**

Both Bernoulli Naive Bayes and Multinomial Naive Bayes are variants of the Naive Bayes algorithm, which is a probabilistic classification method commonly used in machine learning and natural language processing tasks. They are used for text classification, spam detection, sentiment analysis, and more. Despite their similarities, they are designed for different types of data and have some key differences.

1. **Bernoulli Naive Bayes:**
   - Bernoulli Naive Bayes is used when the features (input variables) are binary (i.e., they take on values of 0 or 1).
   - It's particularly useful for text classification tasks where the presence or absence of words in a document is used as the feature.
   - Each feature is treated as an independent binary variable, and the assumption is that the presence or absence of one feature is independent of the presence or absence of other features given the class label.
   - It's well-suited for tasks like sentiment analysis where the only information considered is whether a word occurs in a document or not.

2. **Multinomial Naive Bayes:**
   - Multinomial Naive Bayes is used when the features represent discrete counts, typically representing the frequency of occurrences of certain events.
   - It's commonly used for text classification tasks where the features are the counts of words in a document (bag-of-words representation).
   - Unlike Bernoulli Naive Bayes, Multinomial Naive Bayes takes into account the frequency of feature occurrences rather than just their presence or absence.
   - It's suitable for tasks where the frequency of words is important, like document classification based on word frequencies.

In both cases, the "Naive" assumption refers to the assumption of feature independence given the class label. This assumption simplifies the computations and makes the algorithm computationally efficient, but it might not hold true for all types of data.

In summary, the main difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in the nature of the features they handle: Bernoulli is for binary features (presence or absence), and Multinomial is for discrete count features (frequency of occurrences). The choice between the two depends on the specific nature of the data and the task at hand.
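
The difference can be seen on a toy example. The sketch below uses a made-up word-count matrix (three hypothetical vocabulary words, four documents) and compares how the two scikit-learn classifiers react when one word is repeated more often. It relies on `BernoulliNB` binarizing its input at the default threshold `binarize=0.0`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Made-up word counts: rows are documents, columns are three vocabulary words.
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 2, 3],
])
y = np.array([1, 1, 0, 0])

bernoulli = BernoulliNB().fit(X_counts, y)      # sees only presence/absence
multinomial = MultinomialNB().fit(X_counts, y)  # sees the counts themselves

# Two documents containing the same words, but with different counts.
doc_once = np.array([[1, 1, 0]])
doc_repeated = np.array([[5, 1, 0]])

bernoulli_same = np.allclose(
    bernoulli.predict_proba(doc_once), bernoulli.predict_proba(doc_repeated)
)
multinomial_same = np.allclose(
    multinomial.predict_proba(doc_once), multinomial.predict_proba(doc_repeated)
)

print("Bernoulli gives identical probabilities:  ", bernoulli_same)
print("Multinomial gives identical probabilities:", multinomial_same)
```

Bernoulli Naive Bayes treats both documents identically because they contain the same set of words, while Multinomial Naive Bayes changes its class probabilities as the count of the first word grows.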

**Q3. How does Bernoulli Naive Bayes handle missing values?**

Bernoulli Naive Bayes is a variant of the Naive Bayes algorithm designed for binary data, where each feature takes one of two values (usually 0 or 1), and it assumes that features are conditionally independent given the class label. The standard formulation has no built-in mechanism for missing values, so they have to be handled before or around the model. There are a few strategies you can consider:

1. **Ignoring Missing Values**: One common approach is to simply ignore instances with missing values during both training and classification. In Bernoulli Naive Bayes, this means excluding the instances with missing binary feature values from the calculations when estimating probabilities. While straightforward, this approach can lead to a loss of valuable information, especially if the missing values are not random.

2. **Missing Values as a Separate Category**: Another strategy is to treat missing values as a separate category or state for each feature. This involves modifying the calculation of probabilities to include the missing state when estimating the likelihoods. This approach can work well when missing values are not truly missing at random and carry some meaningful information.

3. **Imputation**: Imputation involves replacing missing values with estimated or imputed values. In the context of Bernoulli Naive Bayes, this could mean estimating the missing binary feature values based on the distribution of the observed values for that feature in the given class. For instance, you might impute missing values with the mode (most common value) of the observed values for that feature within the same class.

4. **Using External Models**: Instead of handling missing values directly within Bernoulli Naive Bayes, you could use an external model to predict missing values and then use the completed dataset for classification. This approach can be beneficial when the relationships between features are more complex and might be better captured by a more advanced model.

It's important to note that the choice of how to handle missing values depends on the nature of your data, the underlying reasons for missingness, and the performance goals of your classifier. You might need to experiment with different approaches to determine which one works best for your specific dataset and problem.
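
As an illustration of the imputation strategy, the sketch below fills missing binary values with each column's most frequent value before fitting the classifier. The data are made up, and the imputation is a simplification: it uses the overall mode of each feature rather than the per-class mode described above. scikit-learn's Naive Bayes estimators generally expect complete input, which is why the imputer is placed in front of the model.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Made-up binary features with some missing entries (np.nan).
X = np.array([
    [1, 0, 1],
    [1, np.nan, 1],
    [0, 1, 0],
    [np.nan, 1, 0],
    [1, 0, np.nan],
    [0, 1, 0],
], dtype=float)
y = np.array([1, 1, 0, 0, 1, 0])

# Step 1: replace each missing value with the column's most frequent value.
# Step 2: fit Bernoulli Naive Bayes on the completed data.
model = make_pipeline(SimpleImputer(strategy="most_frequent"), BernoulliNB())
model.fit(X, y)

# New samples with missing values go through the same imputation step.
X_new = np.array([[1, np.nan, 1]])
prediction = model.predict(X_new)
print("Predicted class:", prediction[0])
```

Keeping the imputer inside the pipeline means the fill values are learned from the training data only, which matters when the model is evaluated with cross-validation.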

**Q4. Can Gaussian Naive Bayes be used for multi-class classification?**

Yes, Gaussian Naive Bayes can indeed be used for multi-class classification. Gaussian Naive Bayes is a variant of the Naive Bayes algorithm that assumes that the features within each class are normally distributed. It's commonly used for continuous data, where features are real-valued.

When it comes to multi-class classification, Gaussian Naive Bayes can be extended to handle multiple classes by calculating the class priors and class-conditional probabilities for each feature given the class. Here's how it works:

1. **Class Priors**: Calculate the prior probability of each class in the training dataset. This involves computing the proportion of instances that belong to each class.

2. **Class-Conditional Probabilities**: For each feature in each class, estimate the mean and variance of the feature's values. These estimates are used to model the class-conditional probability of the feature given the class using a Gaussian (normal) distribution.

3. **Classification Decision**: To classify a new instance, calculate the class posterior probabilities using Bayes' theorem, taking into account the class priors and the Gaussian distribution parameters (mean and variance) for each feature in each class. The class with the highest posterior probability is chosen as the predicted class for the instance.

It's worth noting that the "Naive" assumption in Gaussian Naive Bayes refers to the assumption of feature independence given the class, which might not hold true for all datasets. Despite this simplification, Gaussian Naive Bayes can perform surprisingly well on various datasets, especially when the features are approximately normally distributed and the independence assumption is not severely violated.

For multi-class classification, the same principles apply, but you'll extend the calculations and modeling to accommodate multiple classes rather than just two. Each class will have its own set of mean and variance parameters for each feature.

However, keep in mind that Gaussian Naive Bayes might struggle when features have strong dependencies or when the data deviates significantly from the normal distribution assumption. In such cases, more advanced classifiers like Linear Discriminant Analysis (LDA) or non-parametric methods like k-Nearest Neighbors might be more suitable for multi-class classification tasks.
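
The steps above can be sketched with scikit-learn's `GaussianNB`. The example uses the three-class Iris dataset bundled with scikit-learn purely as a stand-in; it is not part of this assignment's data.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Three-class dataset shipped with scikit-learn (used here only for illustration).
X, y = load_iris(return_X_y=True)

model = GaussianNB().fit(X, y)

# One prior per class.
print("Classes:     ", model.classes_)
print("Class priors:", model.class_prior_)

# One mean per (class, feature) pair.
print("Shape of per-class feature means:", model.theta_.shape)

# Posterior probabilities for one sample: one value per class, summing to 1.
posterior = model.predict_proba(X[:1])
print("Posterior for first sample:", posterior.round(3))
print("Predicted class:", model.predict(X[:1])[0])
```

No special configuration is needed for more than two classes: the classifier simply stores one set of Gaussian parameters per class and picks the class with the highest posterior.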

In [18]:
# Q5. Assignment:
# Data preparation:
# Download the "Spambase Data Set" from the UCI Machine Learning Repository 
# (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the 
# goal is to predict whether a message is spam or not based on several input features.



# Target column 'spam_or_not': 1 = spam, 0 = non-spam
import pandas as pd
columns_name=['word_freq_make',
'word_freq_address',
'word_freq_all',
'word_freq_3d',
'word_freq_our',
'word_freq_over',
'word_freq_remove',
'word_freq_internet',
'word_freq_order',
'word_freq_mail',
'word_freq_receive',
'word_freq_will',
'word_freq_people',
'word_freq_report',
'word_freq_addresses',
'word_freq_free',
'word_freq_business',
'word_freq_email',
'word_freq_you',
'word_freq_credit',
'word_freq_your',
'word_freq_font',
'word_freq_000',
'word_freq_money',
'word_freq_hp',
'word_freq_hpl',
'word_freq_george',
'word_freq_650',
'word_freq_lab',
'word_freq_labs',
'word_freq_telnet',
'word_freq_857',
'word_freq_data',
'word_freq_415',
'word_freq_85',
'word_freq_technology',
'word_freq_1999',
'word_freq_parts',
'word_freq_pm',
'word_freq_direct',
'word_freq_cs',
'word_freq_meeting',
'word_freq_original',
'word_freq_project',
'word_freq_re',
'word_freq_edu',
'word_freq_table',
'word_freq_conference',
'char_freq_;',
'char_freq_(',
'char_freq_[',
'char_freq_!',
'char_freq_$',
'char_freq_#',
'capital_run_length_average',
'capital_run_length_longest',
'capital_run_length_total',
'spam_or_not']
df=pd.read_csv("spambase/spambase.data",names=columns_name)

In [19]:
df

Unnamed: 0,word_freq_make,word_freq_address,word_freq_all,word_freq_3d,word_freq_our,word_freq_over,word_freq_remove,word_freq_internet,word_freq_order,word_freq_mail,...,char_freq_;,char_freq_(,char_freq_[,char_freq_!,char_freq_$,char_freq_#,capital_run_length_average,capital_run_length_longest,capital_run_length_total,spam_or_not
0,0.00,0.64,0.64,0.0,0.32,0.00,0.00,0.00,0.00,0.00,...,0.000,0.000,0.0,0.778,0.000,0.000,3.756,61,278,1
1,0.21,0.28,0.50,0.0,0.14,0.28,0.21,0.07,0.00,0.94,...,0.000,0.132,0.0,0.372,0.180,0.048,5.114,101,1028,1
2,0.06,0.00,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.010,0.143,0.0,0.276,0.184,0.010,9.821,485,2259,1
3,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.000,0.137,0.0,0.137,0.000,0.000,3.537,40,191,1
4,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.000,0.135,0.0,0.135,0.000,0.000,3.537,40,191,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4596,0.31,0.00,0.62,0.0,0.00,0.31,0.00,0.00,0.00,0.00,...,0.000,0.232,0.0,0.000,0.000,0.000,1.142,3,88,0
4597,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.000,0.000,0.0,0.353,0.000,0.000,1.555,4,14,0
4598,0.30,0.00,0.30,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.102,0.718,0.0,0.000,0.000,0.000,1.404,6,118,0
4599,0.96,0.00,0.00,0.0,0.32,0.00,0.00,0.00,0.00,0.00,...,0.000,0.057,0.0,0.000,0.000,0.000,1.147,5,78,0


In [20]:
df.isnull().sum()

word_freq_make                0
word_freq_address             0
word_freq_all                 0
word_freq_3d                  0
word_freq_our                 0
word_freq_over                0
word_freq_remove              0
word_freq_internet            0
word_freq_order               0
word_freq_mail                0
word_freq_receive             0
word_freq_will                0
word_freq_people              0
word_freq_report              0
word_freq_addresses           0
word_freq_free                0
word_freq_business            0
word_freq_email               0
word_freq_you                 0
word_freq_credit              0
word_freq_your                0
word_freq_font                0
word_freq_000                 0
word_freq_money               0
word_freq_hp                  0
word_freq_hpl                 0
word_freq_george              0
word_freq_650                 0
word_freq_lab                 0
word_freq_labs                0
word_freq_telnet              0
word_fre

In [33]:
# Implementation:
# Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using 
# the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each 
# classifier on the dataset. You should use the default hyperparameters for each classifier.

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.model_selection import cross_val_score
import numpy as np

X=df.iloc[:,:-1]
y=df.iloc[:,-1]
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=42)
scaler=StandardScaler()
X_train_scaled=scaler.fit_transform(X_train)
X_test_scaled=scaler.transform(X_test)

In [42]:
#  Create instances of the classifiers
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

# Perform 10-fold cross-validation and calculate accuracy scores for each classifier
# You can use a different scoring metric if needed (e.g., precision, recall, F1-score)
num_folds = 10
bernoulli_scores = cross_val_score(bernoulli_nb, X, y, cv=num_folds)

# Multinomial Naive Bayes
multinomial_scores = cross_val_score(multinomial_nb, X, y, cv=num_folds)

# Gaussian Naive Bayes
gaussian_scores = cross_val_score(gaussian_nb, X, y, cv=num_folds)

# Calculate and print mean accuracy scores for each classifier
print("Bernoulli Naive Bayes Mean Accuracy:", np.mean(bernoulli_scores))
print("Multinomial Naive Bayes Mean Accuracy:", np.mean(multinomial_scores))
print("Gaussian Naive Bayes Mean Accuracy:", np.mean(gaussian_scores))


bernoulli=BernoulliNB()
multinomial=MultinomialNB()
gaussian=GaussianNB()

# fit train data
bernoulli.fit(X_train_scaled,y_train)
multinomial.fit(X_train,y_train)
gaussian.fit(X_train_scaled,y_train)

# predict on the test data
y_pred_bernoulli=bernoulli.predict(X_test_scaled)
y_pred_multinomial=multinomial.predict(X_test)
y_pred_gaussian=gaussian.predict(X_test_scaled)

Bernoulli Naive Bayes Mean Accuracy: 0.8839380364047911
Multinomial Naive Bayes Mean Accuracy: 0.7863496180326323
Gaussian Naive Bayes Mean Accuracy: 0.8217730830896915
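
The cell above scores only accuracy. If other metrics are wanted from the same 10-fold procedure, `cross_validate` accepts a list of scorers. The sketch below is self-contained and runs on synthetic non-negative count data (a hypothetical stand-in, not Spambase), so its numbers say nothing about the results reported here.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Synthetic stand-in data: three non-negative count features whose rates
# depend on the class label. The rates are arbitrary illustration values.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=400)
rates = np.where(y_demo[:, None] == 1, [3.0, 1.0, 0.5], [1.0, 1.0, 2.0])
X_demo = rng.poisson(rates)

scoring = ["accuracy", "precision", "recall", "f1"]
models = [
    ("Bernoulli", BernoulliNB()),
    ("Multinomial", MultinomialNB()),
    ("Gaussian", GaussianNB()),
]

results = {}
for name, model in models:
    # cross_validate returns one array of fold scores per metric,
    # stored under the key "test_<metric>".
    cv_result = cross_validate(model, X_demo, y_demo, cv=10, scoring=scoring)
    results[name] = {m: float(cv_result[f"test_{m}"].mean()) for m in scoring}
    print(name, {m: round(v, 3) for m, v in results[name].items()})
```

Averaging each metric over the folds gives estimates that do not depend on a single train/test split.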


In [49]:
# Results:
# Report the following performance metrics for each classifier:
from sklearn.metrics import confusion_matrix

# For labels {0, 1}, scikit-learn returns the confusion matrix as
# [[TN, FP], [FN, TP]] (rows = true labels, columns = predictions).
# Spam (label 1) is treated as the positive class.
tn_bernoulli, fp_bernoulli, fn_bernoulli, tp_bernoulli = confusion_matrix(y_test, y_pred_bernoulli).ravel()
tn_multinomial, fp_multinomial, fn_multinomial, tp_multinomial = confusion_matrix(y_test, y_pred_multinomial).ravel()
tn_gaussian, fp_gaussian, fn_gaussian, tp_gaussian = confusion_matrix(y_test, y_pred_gaussian).ravel()

# Accuracy
# accuracy = (TP + TN) / (TP + TN + FP + FN)
accuracy_bernoulli = (tp_bernoulli + tn_bernoulli) / (tp_bernoulli + tn_bernoulli + fp_bernoulli + fn_bernoulli)
accuracy_multinomial = (tp_multinomial + tn_multinomial) / (tp_multinomial + tn_multinomial + fp_multinomial + fn_multinomial)
accuracy_gaussian = (tp_gaussian + tn_gaussian) / (tp_gaussian + tn_gaussian + fp_gaussian + fn_gaussian)
print(f'bernoulli accuracy : {accuracy_bernoulli:.4f}')
print(f'multinomial accuracy : {accuracy_multinomial:.4f}')
print(f'gaussian accuracy : {accuracy_gaussian:.4f}')
print()

# Precision
# precision = TP / (TP + FP)
precision_bernoulli = tp_bernoulli / (tp_bernoulli + fp_bernoulli)
precision_multinomial = tp_multinomial / (tp_multinomial + fp_multinomial)
precision_gaussian = tp_gaussian / (tp_gaussian + fp_gaussian)
print(f'bernoulli precision : {precision_bernoulli:.4f}')
print(f'multinomial precision : {precision_multinomial:.4f}')
print(f'gaussian precision : {precision_gaussian:.4f}')
print()

# Recall
# recall = TP / (TP + FN)
recall_bernoulli = tp_bernoulli / (tp_bernoulli + fn_bernoulli)
recall_multinomial = tp_multinomial / (tp_multinomial + fn_multinomial)
recall_gaussian = tp_gaussian / (tp_gaussian + fn_gaussian)
print(f'recall bernoulli : {recall_bernoulli:.4f}')
print(f'recall multinomial : {recall_multinomial:.4f}')
print(f'recall gaussian : {recall_gaussian:.4f}')
print()

# F1 score
# F1_Score = 2 * (recall * precision) / (recall + precision)
f1_bernoulli = 2 * (recall_bernoulli * precision_bernoulli) / (recall_bernoulli + precision_bernoulli)
f1_multinomial = 2 * (recall_multinomial * precision_multinomial) / (recall_multinomial + precision_multinomial)
f1_gaussian = 2 * (recall_gaussian * precision_gaussian) / (recall_gaussian + precision_gaussian)
print(f'f1 score of bernoulli : {f1_bernoulli:.4f}')
print(f'f1 score of multinomial : {f1_multinomial:.4f}')
print(f'f1 score of gaussian : {f1_gaussian:.4f}')
print()



bernoulli accuracy : 0.9022
multinomial accuracy : 0.7820
gaussian accuracy : 0.8182

bernoulli precision : 0.9154
multinomial precision : 0.7624
gaussian precision : 0.7139

recall bernoulli : 0.8440
recall multinomial : 0.6950
recall gaussian : 0.9428

f1 score of bernoulli : 0.8783
f1 score of multinomial : 0.7271
f1 score of gaussian : 0.8125
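
The layout of scikit-learn's confusion matrix is easy to misread: for labels 0 and 1 it is `[[TN, FP], [FN, TP]]`, with rows as true labels and columns as predictions. The sketch below uses made-up labels (not this notebook's predictions) to unpack the four cells and cross-check the hand-computed metrics against the functions in `sklearn.metrics`.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels for illustration: 1 = spam (positive class), 0 = non-spam.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_hat = [1, 0, 1, 0, 0, 1, 0, 1]

# ravel() flattens [[TN, FP], [FN, TP]] row by row.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# The library functions should agree with the hand-computed values.
print("accuracy :", accuracy, accuracy_score(y_true, y_hat))
print("precision:", precision, precision_score(y_true, y_hat))
print("recall   :", recall, recall_score(y_true, y_hat))
print("f1       :", f1, f1_score(y_true, y_hat))
```

Using the library functions directly avoids indexing mistakes, while the manual version makes the definitions explicit.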



**Discussion:**
**Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?**

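- Bernoulli Naive Bayes performed the best overall. It achieved the highest mean 10-fold cross-validation accuracy (about 0.88, versus about 0.82 for Gaussian and about 0.79 for Multinomial), and on the held-out test split it also had the highest precision and F1-score. It did not win on every metric, though: Gaussian Naive Bayes had the highest recall. A plausible explanation for Bernoulli's lead is that whether a word or character appears at all is a more robust spam signal than its exact frequency.

- Multinomial Naive Bayes had the lowest cross-validation accuracy of the three. It is designed for discrete counts (e.g., word counts), whereas the Spambase features are continuous frequencies plus capital-run-length statistics on a much larger scale, so its modeling assumption fits this data poorly.

- Gaussian Naive Bayes ranked second on cross-validation accuracy and showed the widest gap between precision and recall. It assumes each feature is normally distributed within a class, but the frequency features are non-negative and mostly zero, which is far from a normal distribution.

- A limitation shared by all three variants is the assumption that features are independent given the class. Word frequencies in email are likely correlated, so the predicted probabilities may be overconfident even when the predicted class is correct.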



**Conclusion:**
**Summarise your findings and provide some suggestions for future work.**

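In summary, the choice of the best Naive Bayes variant depends on the nature of your data. In this case, Bernoulli Naive Bayes performed the best, likely because reducing each frequency feature to present or absent suits this dataset better than modeling the raw values as counts or as normally distributed variables. It is still essential to consider the assumptions and limitations of Naive Bayes when applying it to a specific problem. For future work, it would be worth tuning the smoothing and binarization settings, transforming the skewed features before fitting Gaussian Naive Bayes, and comparing against classifiers that do not assume feature independence, which may be more suitable for complex relationships and high-dimensional data.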

Note: Create your assignment in Jupyter notebook and upload it to GitHub & share that github repository
link through your dashboard. Make sure the repository is public.

Note: This dataset contains a binary classification problem with multiple features. The dataset is
relatively small, but it can be used to demonstrate the performance of the different variants of Naive
Bayes on a real-world problem.