**Q1. A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
probability that an employee is a smoker given that he/she uses the health insurance plan?**

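Let A be the event that an employee uses the health insurance plan and B the event that the employee is a smoker. The survey gives P(A) = 0.70 and P(B∣A) = 0.40.

The question asks for P(B∣A), which the survey states directly, so Bayes' theorem does not need to be inverted. The definition of conditional probability confirms it:

P(B∣A) = P(A∩B) / P(A) = (0.40⋅0.70) / 0.70 = 0.40

The probability that an employee is a smoker, given that he/she uses the plan, is 0.40 (40%).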
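
As a quick check of the arithmetic in Q1, the sketch below recomputes the conditional probability from its definition using exact fractions. The variable names are illustrative and are not part of the original assignment.

```python
from fractions import Fraction

# Values given in the question
p_plan = Fraction(70, 100)               # P(A): employee uses the plan
p_smoker_given_plan = Fraction(40, 100)  # P(B|A): smoker among plan users

# Product rule: P(A and B) = P(B|A) * P(A)
p_plan_and_smoker = p_smoker_given_plan * p_plan

# Definition of conditional probability: P(B|A) = P(A and B) / P(A)
recovered = p_plan_and_smoker / p_plan

print(p_plan_and_smoker)  # 7/25, i.e. 28% of all employees
print(recovered)          # 2/5, i.e. 0.40
```

The 70% figure cancels out, which is why the answer equals the 40% stated in the question.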

**Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?**

- **Bernoulli Naive Bayes:**
  - Handles binary features (presence or absence).
  - Well-suited for document classification based on word presence.
  - Focuses on whether a feature is present or not.

- **Multinomial Naive Bayes:**
  - Handles count or frequency features.
  - Well-suited for document classification based on word frequencies.
  - Considers the frequency of features in the document.
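
The difference can be seen on a toy word-count matrix. This is a minimal sketch using made-up data (`X_counts`, `y_toy` and `new_doc` are illustrative names). It relies on `BernoulliNB` binarizing its input at its default threshold of 0.0, while `MultinomialNB` uses the counts as they are.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy data: rows are documents, columns are counts of three vocabulary words.
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 2, 0],
])
y_toy = np.array([1, 1, 0, 0])

# Multinomial NB models how often each word occurs.
mnb = MultinomialNB().fit(X_counts, y_toy)

# Bernoulli NB only sees presence/absence: by default it binarizes at 0.0.
bnb = BernoulliNB().fit(X_counts, y_toy)

# Binarizing by hand and disabling the built-in step gives the same model.
X_binary = (X_counts > 0).astype(int)
bnb_explicit = BernoulliNB(binarize=None).fit(X_binary, y_toy)

new_doc = np.array([[5, 0, 0]])
print(mnb.predict(new_doc), bnb.predict(new_doc))
print(np.allclose(bnb.feature_log_prob_, bnb_explicit.feature_log_prob_))
```

For `BernoulliNB`, a document containing a word five times is indistinguishable from one containing it once. `MultinomialNB` weights the word by its count.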

**Q3. How does Bernoulli Naive Bayes handle missing values?**

1. **Missing Values Treated as Absence:**
   - Bernoulli Naive Bayes has no dedicated mechanism for missing values. In text classification, a word that does not occur in a document is not "missing"; it is simply encoded as absent (0).

2. **Feature Presence in the Model:**
   - Bernoulli Naive Bayes models whether a feature is present or absent in a document. The absence of a feature contributes information just as the presence of a feature does.

3. **Probability Calculation:**
   - For each class, the model estimates p, the probability that a feature is present. A feature with value 0 contributes (1 − p) to the class likelihood, so a missing value encoded as 0 is scored exactly like a genuinely absent feature.

4. **Sparse Data Representation:**
   - Typically, text data used with Bernoulli Naive Bayes is represented in a sparse format (e.g., as a binary matrix). If a feature is missing, it is simply not included in the sparse representation, and its absence is assumed.

5. **Imputation (for true missing values):**
   - Genuinely missing entries (e.g., NaN) are a different case. scikit-learn's `BernoulliNB` does not accept NaN inputs, so such values must be imputed (for example, filled with 0) or the affected rows dropped before fitting. For text data this is rarely needed, because absent words are already encoded as 0.
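
The points above can be sketched with scikit-learn. This is a minimal illustration on made-up data, assuming the installed scikit-learn version rejects NaN in `BernoulliNB` (the behaviour I am aware of). Filling with 0 is one possible imputation choice, not the only one.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB

# Toy binary features with two genuinely missing entries (NaN).
X_missing = np.array([
    [1.0, 0.0, np.nan],
    [1.0, 1.0, 0.0],
    [0.0, np.nan, 1.0],
    [0.0, 0.0, 1.0],
])
y_labels = np.array([1, 1, 0, 0])

# Fitting directly on NaN is expected to fail with a ValueError.
try:
    BernoulliNB().fit(X_missing, y_labels)
    nan_rejected = False
except ValueError:
    nan_rejected = True

# Treat "missing" as "absent": replace NaN with 0 before fitting.
imputer = SimpleImputer(strategy="constant", fill_value=0)
X_filled = imputer.fit_transform(X_missing)
model = BernoulliNB().fit(X_filled, y_labels)

print("NaN rejected:", nan_rejected)
print("Predictions after imputation:", model.predict(X_filled))
```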

**Q4. Can Gaussian Naive Bayes be used for multi-class classification?**

Yes. Gaussian Naive Bayes is a variant of Naive Bayes that assumes each continuous feature follows a Gaussian (normal) distribution within each class. It handles multi-class problems natively: it estimates a mean and variance for every feature in every class, combines the resulting likelihoods with the class priors, and predicts the class with the highest posterior probability. No one-vs-rest adaptation is required.
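
A short sketch of multi-class use, assuming the three-class Iris dataset bundled with scikit-learn as a stand-in (it is not part of this assignment):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has three classes and four continuous features.
X_iris, y_iris = load_iris(return_X_y=True)

gnb = GaussianNB().fit(X_iris, y_iris)

print(gnb.classes_)      # the class labels seen during fitting
print(gnb.theta_.shape)  # one estimated mean per (class, feature) pair

# predict_proba returns one posterior probability per class.
proba = gnb.predict_proba(X_iris[:2])
print(proba.shape)
```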

**Q5. Assignment:
Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message is spam or not based on several input features.**

**Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.**

In [1]:
import numpy as np
import seaborn as sns
import pandas as pd
from sklearn.model_selection import train_test_split

In [5]:
!pip install ucimlrepo


In [15]:
from ucimlrepo import fetch_ucirepo 
  
# fetch dataset 
spambase = fetch_ucirepo(id=94) 
  
# data (as pandas dataframes) 
X = spambase.data.features 
y = spambase.data.targets 

In [16]:
X.head()

|  | word_freq_make | word_freq_address | word_freq_all | word_freq_3d | word_freq_our | word_freq_over | word_freq_remove | word_freq_internet | word_freq_order | word_freq_mail | ... | word_freq_conference | char_freq_; | char_freq_( | char_freq_[ | char_freq_! | char_freq_$ | char_freq_# | capital_run_length_average | capital_run_length_longest | capital_run_length_total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.0 | 0.64 | 0.64 | 0.0 | 0.32 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.778 | 0.0 | 0.0 | 3.756 | 61 | 278 |
| 1 | 0.21 | 0.28 | 0.5 | 0.0 | 0.14 | 0.28 | 0.21 | 0.07 | 0.0 | 0.94 | ... | 0.0 | 0.0 | 0.132 | 0.0 | 0.372 | 0.18 | 0.048 | 5.114 | 101 | 1028 |
| 2 | 0.06 | 0.0 | 0.71 | 0.0 | 1.23 | 0.19 | 0.19 | 0.12 | 0.64 | 0.25 | ... | 0.0 | 0.01 | 0.143 | 0.0 | 0.276 | 0.184 | 0.01 | 9.821 | 485 | 2259 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.0 | 0.137 | 0.0 | 0.137 | 0.0 | 0.0 | 3.537 | 40 | 191 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.0 | 0.135 | 0.0 | 0.135 | 0.0 | 0.0 | 3.537 | 40 | 191 |


In [18]:
X.isnull().sum()

word_freq_make                0
word_freq_address             0
word_freq_all                 0
word_freq_3d                  0
word_freq_our                 0
word_freq_over                0
word_freq_remove              0
word_freq_internet            0
word_freq_order               0
word_freq_mail                0
word_freq_receive             0
word_freq_will                0
word_freq_people              0
word_freq_report              0
word_freq_addresses           0
word_freq_free                0
word_freq_business            0
word_freq_email               0
word_freq_you                 0
word_freq_credit              0
word_freq_your                0
word_freq_font                0
word_freq_000                 0
word_freq_money               0
word_freq_hp                  0
word_freq_hpl                 0
word_freq_george              0
word_freq_650                 0
word_freq_lab                 0
word_freq_labs                0
word_freq_telnet              0
word_fre

In [50]:
from sklearn.model_selection import KFold

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)

In [51]:
X.describe()

|  | word_freq_make | word_freq_address | word_freq_all | word_freq_3d | word_freq_our | word_freq_over | word_freq_remove | word_freq_internet | word_freq_order | word_freq_mail | ... | word_freq_conference | char_freq_; | char_freq_( | char_freq_[ | char_freq_! | char_freq_$ | char_freq_# | capital_run_length_average | capital_run_length_longest | capital_run_length_total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 4601.0 | 4601.0 | 4601.0 | 4601.0 | 4601.0 | 4601.0 | 4601.0 | 4601.0 | 4601.0 | 4601.0 | ... | 4601.0 | 4601.0 | 4601.0 | 4601.0 | 4601.0 | 4601.0 | 4601.0 | 4601.0 | 4601.0 | 4601.0 |
| mean | 0.104553 | 0.213015 | 0.280656 | 0.065425 | 0.312223 | 0.095901 | 0.114208 | 0.105295 | 0.090067 | 0.239413 | ... | 0.031869 | 0.038575 | 0.13903 | 0.016976 | 0.269071 | 0.075811 | 0.044238 | 5.191515 | 52.172789 | 283.289285 |
| std | 0.305358 | 1.290575 | 0.504143 | 1.395151 | 0.672513 | 0.273824 | 0.391441 | 0.401071 | 0.278616 | 0.644755 | ... | 0.285735 | 0.243471 | 0.270355 | 0.109394 | 0.815672 | 0.245882 | 0.429342 | 31.729449 | 194.89131 | 606.347851 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 25% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.588 | 6.0 | 35.0 |
| 50% | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.065 | 0.0 | 0.0 | 0.0 | 0.0 | 2.276 | 15.0 | 95.0 |
| 75% | 0.0 | 0.0 | 0.42 | 0.0 | 0.38 | 0.0 | 0.0 | 0.0 | 0.0 | 0.16 | ... | 0.0 | 0.0 | 0.188 | 0.0 | 0.315 | 0.052 | 0.0 | 3.706 | 43.0 | 266.0 |
| max | 4.54 | 14.28 | 5.1 | 42.81 | 10.0 | 5.88 | 7.27 | 11.11 | 5.26 | 18.18 | ... | 10.0 | 4.385 | 9.752 | 4.081 | 32.478 | 6.003 | 19.829 | 1102.5 | 9989.0 | 15841.0 |


In [52]:
from sklearn.naive_bayes import GaussianNB

In [53]:
classifier = GaussianNB()

In [54]:
# y_train is a single-column DataFrame; flatten it to 1-D to avoid the column-vector warning
classifier.fit(X_train, y_train.values.ravel())


In [55]:
preds = classifier.predict(X_test)

In [56]:
from sklearn.metrics import confusion_matrix, classification_report 

In [57]:
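# Note: scikit-learn expects (y_true, y_pred). The predictions are passed first here,
# so the rows below are predicted classes and the precision and recall columns are swapped.
print(confusion_matrix(preds,y_test))
print(classification_report(preds,y_test))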

[[418   7]
 [147 349]]
              precision    recall  f1-score   support

           0       0.74      0.98      0.84       425
           1       0.98      0.70      0.82       496

    accuracy                           0.83       921
   macro avg       0.86      0.84      0.83       921
weighted avg       0.87      0.83      0.83       921
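
The argument order matters when reading these numbers. Here is a minimal sketch on made-up labels (`y_true_toy` and `y_pred_toy` are illustrative), showing that swapping the two arguments of `confusion_matrix` transposes the result:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels: three true negatives-or-false-positives, two true positives.
y_true_toy = np.array([0, 0, 0, 1, 1])
y_pred_toy = np.array([0, 1, 1, 1, 1])

# Documented order: rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_true_toy, y_pred_toy)

# Swapped order: rows become predicted labels, i.e. the transpose.
cm_swapped = confusion_matrix(y_pred_toy, y_true_toy)

print(cm)
print(cm_swapped)
print(np.array_equal(cm_swapped, cm.T))
```

The same swap in `classification_report` exchanges the precision and recall columns, because each is computed along the other axis of this matrix.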



In [62]:
kf = KFold(n_splits=10)
y_train_flat = y_train.values.ravel()  # 1-D target avoids the column-vector warning
fold_scores = []
for train_index, test_index in kf.split(X_train):
    X_train_fold, X_val_fold = X_train.iloc[train_index], X_train.iloc[test_index]
    y_train_fold, y_val_fold = y_train_flat[train_index], y_train_flat[test_index]
    new_classifier = GaussianNB()  # fresh model for each fold
    new_classifier.fit(X_train_fold, y_train_fold)
    fold_scores.append(new_classifier.score(X_val_fold, y_val_fold))  # accuracy on the held-out fold
print("Mean accuracy over 10 folds:", np.mean(fold_scores))
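
The assignment also asks for Bernoulli and Multinomial Naive Bayes. Below is one possible sketch that evaluates all three classifiers with default hyperparameters and 10-fold cross-validation. The helper `compare_naive_bayes` and the synthetic Poisson data are assumptions added for illustration so the block runs without a download. Stratified, shuffled folds are also my choice, not something the assignment specifies.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB


def compare_naive_bayes(X, y, n_splits=10, seed=0):
    """Return the mean cross-validated accuracy of each default NB classifier."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    models = {
        "BernoulliNB": BernoulliNB(),
        "MultinomialNB": MultinomialNB(),  # requires non-negative features
        "GaussianNB": GaussianNB(),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


# Synthetic stand-in data (hypothetical): non-negative counts whose rate depends on the class.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=400)
rates = np.where(y_demo == 1, 3.0, 1.0)[:, None] * np.ones((1, 5))
X_demo = rng.poisson(rates).astype(float)

scores = compare_naive_bayes(X_demo, y_demo)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

On the Spambase data, the equivalent call would be `compare_naive_bayes(X, y.values.ravel())`. All Spambase features are non-negative, so `MultinomialNB` can be applied to them directly.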
