Q1. A company conducted a survey of its employees and found that 70% of the employees use the
company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the
probability that an employee is a smoker given that he/she uses the health insurance plan?

Let's define the events:

- A: Employee uses health insurance plan.
- B: Employee is a smoker.

We are given:

- P(A) = 0.70 (70% use the plan)
- P(B|A) = 0.40 (40% of plan users are smokers)

We want to find P(B|A), the probability that an employee is a smoker given that they use the health insurance plan.

The problem statement already gives this conditional probability, so no calculation is needed: P(B|A) = 0.40. The value P(A) = 0.70 does not enter the answer.

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 40%.
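
As a quick check on the reasoning above, the following minimal sketch applies the multiplication rule to obtain the joint probability and then recovers the conditional probability from its definition. The variable names are illustrative, and exact fractions are used only to avoid floating-point noise.

```python
from fractions import Fraction

# Values given in the problem statement, kept as exact fractions.
p_a = Fraction(70, 100)          # P(A): employee uses the plan
p_b_given_a = Fraction(40, 100)  # P(B|A): smoker, given plan user

# Multiplication rule: P(A and B) = P(B|A) * P(A).
p_a_and_b = p_b_given_a * p_a

# Definition of conditional probability: P(B|A) = P(A and B) / P(A).
recovered = p_a_and_b / p_a

print(float(p_a_and_b))  # 0.28
print(float(recovered))  # 0.4
```

So 28% of all employees are both plan users and smokers, and dividing by P(A) returns the 40% stated in the question.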

Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

### Bernoulli Naive Bayes vs. Multinomial Naive Bayes

**Bernoulli Naive Bayes:**
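- Treats each feature as a binary variable (present or absent) and models it with a Bernoulli distribution, so the absence of a feature also contributes evidence for or against a class.
- Suits text classification tasks where the presence or absence of a word matters more than its frequency, for example document classification based on keywords.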

**Multinomial Naive Bayes:**
- Treats each feature as a count of occurrences and models the counts with a multinomial distribution.
- Suits text classification tasks where word frequency matters, for example classifying news articles into categories based on how often certain words appear.
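
To make the contrast concrete, here is a small sketch on a made-up word-count matrix (the data and variable names are illustrative, not taken from the assignment). It checks whether each model changes when the counts are replaced by 0/1 presence indicators.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy word-count matrix (rows: documents, columns: hypothetical vocabulary terms).
X_counts = np.array([
    [3.0, 0.0, 1.0],
    [2.0, 0.0, 0.0],
    [0.0, 4.0, 1.0],
    [0.0, 1.0, 2.0],
])
y_toy = np.array([1, 1, 0, 0])
X_binary = (X_counts > 0).astype(float)  # presence/absence version of the counts

# BernoulliNB binarizes its input (binarize=0.0 by default), so counts and
# 0/1 indicators should lead to the same fitted model.
bnb_counts = BernoulliNB().fit(X_counts, y_toy)
bnb_binary = BernoulliNB().fit(X_binary, y_toy)
same_bernoulli = np.allclose(bnb_counts.feature_log_prob_, bnb_binary.feature_log_prob_)

# MultinomialNB uses the counts themselves, so binarizing changes the model.
mnb_counts = MultinomialNB().fit(X_counts, y_toy)
mnb_binary = MultinomialNB().fit(X_binary, y_toy)
same_multinomial = np.allclose(mnb_counts.feature_log_prob_, mnb_binary.feature_log_prob_)

print(same_bernoulli, same_multinomial)
```

On this toy data the Bernoulli model is unaffected by binarization while the multinomial model is not, which is the practical meaning of "presence versus frequency".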

Q3. How does Bernoulli Naive Bayes handle missing values?

Bernoulli Naive Bayes has no built-in mechanism for missing values; scikit-learn's `BernoulliNB`, for example, rejects input that contains NaN. Missing data therefore has to be handled before the algorithm is applied. Common strategies are:

- Mean or median imputation: replace missing values with the mean or median of the respective feature. This is simple, but it can distort the feature's distribution, especially for skewed features, and the mean of a binary feature is generally not itself a valid 0/1 value.
- Mode imputation: replace missing values with the most frequent value of the feature. This suits categorical and binary features, which is the natural setting for Bernoulli Naive Bayes.
- Listwise deletion: remove entire instances (rows) that contain missing values. This can lead to significant data loss, especially if many instances have missing data.
- Pairwise deletion: when estimating the probabilities for a given feature, skip only the observations in which that feature is missing. This keeps more data than listwise deletion, but each estimate is then based on a different subset of rows.
- Skipping the feature at prediction time: because Naive Bayes multiplies independent per-feature likelihoods, a feature whose value is missing can simply be left out of the product. Encoding missing values as 0 ("absent") is a different choice; it is easy to implement but conflates "unknown" with "not present" and can bias the estimates.
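
The mode-imputation strategy from the list above can be sketched with scikit-learn's `SimpleImputer` placed in front of `BernoulliNB`. The toy matrix and the variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy binary feature matrix with missing entries (np.nan); values are illustrative.
X_missing = np.array([
    [1.0, 0.0, np.nan],
    [1.0, np.nan, 0.0],
    [0.0, 1.0, 1.0],
    [np.nan, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
])
y_toy = np.array([1, 1, 0, 0, 1, 0])

# Fill each missing entry with its column's most frequent value, then fit BernoulliNB.
model = make_pipeline(SimpleImputer(strategy="most_frequent"), BernoulliNB())
model.fit(X_missing, y_toy)

predictions = model.predict(X_missing)
print(predictions)
```

Keeping the imputer inside the pipeline means the fill values are learned from the training data only, which matters when the pipeline is later used inside cross-validation.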

Q4. Can Gaussian Naive Bayes be used for multi-class classification?

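Yes, Gaussian Naive Bayes can be used for multi-class classification without any modification. It estimates a prior and a per-feature mean and variance for every class, computes the posterior probability of each class for a new sample, and predicts the class with the highest posterior, so no one-vs-rest scheme is needed.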
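
A minimal sketch of the multi-class case, assuming the three-class Iris dataset bundled with scikit-learn as a stand-in (it is not part of this assignment):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris has three classes and four continuous features.
X_iris, y_iris = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_iris, y_iris, test_size=0.3, random_state=0, stratify=y_iris
)

clf = GaussianNB().fit(X_tr, y_tr)

# One posterior probability per class for every test sample.
proba = clf.predict_proba(X_te)
print(clf.classes_)
print(proba.shape)
```

`predict_proba` returns one column per class, and `predict` picks the column with the largest posterior.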

Q5. Assignment:      
**Data preparation:**    
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message
is spam or not based on several input features.     

**Implementation:**
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.     

**Results:**   
Report the following performance metrics for each classifier:

- Accuracy
- Precision
- Recall
- F1 score

**Discussion:**
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?   

**Conclusion:**
Summarise your findings and provide some suggestions for future work.


In [2]:
import pandas as pd

In [3]:
dataset = pd.read_csv('spambase.data', header=None)

In [4]:
dataset.describe()

statistic,0,1,2,3,4,5,6,7,8,9,...,48,49,50,51,52,53,54,55,56,57
count,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,...,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0
mean,0.104553,0.213015,0.280656,0.065425,0.312223,0.095901,0.114208,0.105295,0.090067,0.239413,...,0.038575,0.13903,0.016976,0.269071,0.075811,0.044238,5.191515,52.172789,283.289285,0.394045
std,0.305358,1.290575,0.504143,1.395151,0.672513,0.273824,0.391441,0.401071,0.278616,0.644755,...,0.243471,0.270355,0.109394,0.815672,0.245882,0.429342,31.729449,194.89131,606.347851,0.488698
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.588,6.0,35.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.065,0.0,0.0,0.0,0.0,2.276,15.0,95.0,0.0
75%,0.0,0.0,0.42,0.0,0.38,0.0,0.0,0.0,0.0,0.16,...,0.0,0.188,0.0,0.315,0.052,0.0,3.706,43.0,266.0,1.0
max,4.54,14.28,5.1,42.81,10.0,5.88,7.27,11.11,5.26,18.18,...,4.385,9.752,4.081,32.478,6.003,19.829,1102.5,9989.0,15841.0,1.0


In [5]:
dataset.head()

row,0,1,2,3,4,5,6,7,8,9,...,48,49,50,51,52,53,54,55,56,57
0,0.0,0.64,0.64,0.0,0.32,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.778,0.0,0.0,3.756,61,278,1
1,0.21,0.28,0.5,0.0,0.14,0.28,0.21,0.07,0.0,0.94,...,0.0,0.132,0.0,0.372,0.18,0.048,5.114,101,1028,1
2,0.06,0.0,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.01,0.143,0.0,0.276,0.184,0.01,9.821,485,2259,1
3,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.137,0.0,0.137,0.0,0.0,3.537,40,191,1
4,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.135,0.0,0.135,0.0,0.0,3.537,40,191,1


In [6]:
# Columns 0-56 are the input features; the last column (57) is the spam label (0 or 1).
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]

In [7]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

In [8]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [9]:
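from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.model_selection import GridSearchCV

In [10]:
# Single-candidate grid: one fixed value per hyperparameter, so no tuning takes place.
param = {'priors': [None], 'var_smoothing': [1e-09]}

In [11]:
# GridSearchCV is used here only to run 10-fold cross-validation on the training split.
grid = GridSearchCV(GaussianNB(), param_grid=param, cv=10, verbose=3, scoring='accuracy')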

In [12]:
grid.fit(X_train,y_train)

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[CV 1/10] END .priors=None, var_smoothing=1e-09;, score=0.857 total time=   0.0s
[CV 2/10] END .priors=None, var_smoothing=1e-09;, score=0.848 total time=   0.0s
[CV 3/10] END .priors=None, var_smoothing=1e-09;, score=0.845 total time=   0.0s
[CV 4/10] END .priors=None, var_smoothing=1e-09;, score=0.839 total time=   0.0s
[CV 5/10] END .priors=None, var_smoothing=1e-09;, score=0.795 total time=   0.0s
[CV 6/10] END .priors=None, var_smoothing=1e-09;, score=0.823 total time=   0.0s
[CV 7/10] END .priors=None, var_smoothing=1e-09;, score=0.832 total time=   0.0s
[CV 8/10] END .priors=None, var_smoothing=1e-09;, score=0.807 total time=   0.0s
[CV 9/10] END .priors=None, var_smoothing=1e-09;, score=0.804 total time=   0.0s
[CV 10/10] END priors=None, var_smoothing=1e-09;, score=0.845 total time=   0.0s


In [13]:
y_pred = grid.predict(X_test)

In [14]:
print('accuracy:',accuracy_score(y_test, y_pred))
print('precision:',precision_score(y_test, y_pred))
print('recall:',recall_score(y_test, y_pred))
print('f1:',f1_score(y_test, y_pred))

accuracy: 0.8124547429398986
precision: 0.7
recall: 0.9391771019677997
f1: 0.8021390374331551


In [15]:
# Single-candidate grid: one fixed value per hyperparameter, so no tuning takes place.
param = {'alpha': [1.0], 'force_alpha': [True],
         'fit_prior': [True], 'class_prior': [None]}

In [16]:
grid = GridSearchCV(MultinomialNB(), param_grid=param, cv=10, verbose=3, scoring='accuracy')

In [17]:
grid.fit(X_train,y_train)

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[CV 1/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=True;, score=0.823 total time=   0.0s
[CV 2/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=True;, score=0.776 total time=   0.0s
[CV 3/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=True;, score=0.801 total time=   0.0s
[CV 4/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=True;, score=0.848 total time=   0.0s
[CV 5/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=True;, score=0.780 total time=   0.0s
[CV 6/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=True;, score=0.826 total time=   0.0s
[CV 7/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=True;, score=0.783 total time=   0.0s
[CV 8/10] END alpha=1.0, class_prior=None, fit_prior=True, force_alpha=True;, score=0.801 total time=   0.0s
[CV 9/10] END alpha=1.0, class_prior=None, fit_prior=True, force_al

In [18]:
y_pred = grid.predict(X_test)

In [19]:
print('accuracy:',accuracy_score(y_test, y_pred))
print('precision:',precision_score(y_test, y_pred))
print('recall:',recall_score(y_test, y_pred))
print('f1:',f1_score(y_test, y_pred))

accuracy: 0.8095582910934106
precision: 0.7730627306273062
recall: 0.7495527728085868
f1: 0.7611262488646685


In [20]:
# Single-candidate grid; binarize=0.0 maps every feature value above zero to 1.
param = {'alpha': [1.0], 'force_alpha': [True],
         'binarize': [0.0],
         'fit_prior': [True], 'class_prior': [None]}

In [21]:
grid = GridSearchCV(BernoulliNB(), param_grid=param, cv=10, verbose=3, scoring='accuracy')

In [22]:
grid.fit(X_train,y_train)

Fitting 10 folds for each of 1 candidates, totalling 10 fits
[CV 1/10] END alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True, force_alpha=True;, score=0.910 total time=   0.0s
[CV 2/10] END alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True, force_alpha=True;, score=0.863 total time=   0.0s
[CV 3/10] END alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True, force_alpha=True;, score=0.870 total time=   0.0s
[CV 4/10] END alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True, force_alpha=True;, score=0.876 total time=   0.0s
[CV 5/10] END alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True, force_alpha=True;, score=0.894 total time=   0.0s
[CV 6/10] END alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True, force_alpha=True;, score=0.882 total time=   0.0s
[CV 7/10] END alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True, force_alpha=True;, score=0.888 total time=   0.0s
[CV 8/10] END alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True, force

In [23]:
y_pred = grid.predict(X_test)

In [24]:
print('accuracy:',accuracy_score(y_test, y_pred))
print('precision:',precision_score(y_test, y_pred))
print('recall:',recall_score(y_test, y_pred))
print('f1:',f1_score(y_test, y_pred))

accuracy: 0.8754525706010138
precision: 0.8924949290060852
recall: 0.7871198568872988
f1: 0.8365019011406845
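
The assignment asks for all four metrics under 10-fold cross-validation, whereas the cells above cross-validate accuracy on the training split and compute the other metrics on a single hold-out set. One possible way to obtain every metric per fold is scikit-learn's `cross_validate`, sketched below. The helper name `evaluate_nb_variants` and the synthetic stand-in data are assumptions for illustration; the printed numbers are not Spambase results.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB


def evaluate_nb_variants(X, y, n_splits=10, seed=0):
    """Return mean k-fold CV accuracy, precision, recall and F1 per classifier."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scoring = ["accuracy", "precision", "recall", "f1"]
    models = {
        "BernoulliNB": BernoulliNB(),
        "MultinomialNB": MultinomialNB(),
        "GaussianNB": GaussianNB(),
    }
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        # cross_validate stores each metric under the key "test_<metric>".
        results[name] = {m: float(np.mean(scores[f"test_{m}"])) for m in scoring}
    return results


# Synthetic, non-negative stand-in data (hypothetical; NOT the Spambase data).
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=200)
X_demo = rng.poisson(lam=1.0 + 2.0 * y_demo[:, None], size=(200, 5)).astype(float)

demo_results = evaluate_nb_variants(X_demo, y_demo)
for name, metrics in demo_results.items():
    print(name, metrics)
```

With the notebook's own variables, the call would be `evaluate_nb_variants(X, y)`, which uses the full dataset rather than only the training split.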


Conclusion:

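This notebook implements and evaluates three Naive Bayes variants for spam classification on the Spambase dataset. On the 30% hold-out test set, Bernoulli Naive Bayes performed best overall, with the highest accuracy (0.875), precision (0.892), and F1 score (0.837). Gaussian Naive Bayes reached the highest recall (0.939) but the lowest precision (0.700), so it catches most spam at the cost of many false positives; its accuracy was 0.812. Multinomial Naive Bayes had similar accuracy (0.810) but the lowest F1 score (0.761).

A plausible explanation is that most Spambase features are zero for most messages and heavily right-skewed, as the `describe()` output shows. Such features violate the Gaussian assumption, while reducing them to presence or absence suits the Bernoulli model. All three variants also share the conditional-independence assumption, which word-frequency features are unlikely to satisfy.

These metrics come from a single train/test split, with 10-fold cross-validation applied only to accuracy on the training portion. Future work could report all four metrics under 10-fold cross-validation, tune the smoothing parameters, and compare against classifiers that do not assume feature independence.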