**Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?**

To find the probability that an employee is a smoker given that he/she uses the health insurance plan, we can use the formula for conditional probability:

$ P(\text{Smoker | Uses insurance}) = \frac{P(\text{Smoker and Uses insurance})}{P(\text{Uses insurance})} $

From the information given:
- $ P(\text{Uses insurance}) = 0.70 $ (70% of employees use the insurance plan)
- $ P(\text{Smoker and Uses insurance}) = 0.40 \times 0.70 = 0.28 $ (40% of the employees who use the plan are smokers, so the joint probability follows from the product rule)

Thus, we have:

$ P(\text{Smoker | Uses insurance}) = \frac{0.40 \times 0.70}{0.70} = 0.40 $

So, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.40, or 40%. This matches the 40% stated in the question, which is already a conditional probability.
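As a quick numeric check of the calculation above, here is a minimal sketch in Python. The variable names are illustrative and not part of the original question.

```python
# Values taken from the problem statement.
p_uses = 0.70               # P(Uses insurance)
p_smoker_given_uses = 0.40  # P(Smoker | Uses insurance), stated directly in the question

# Joint probability via the product rule.
p_smoker_and_uses = p_smoker_given_uses * p_uses

# Conditional probability recovered from its definition.
p_check = p_smoker_and_uses / p_uses

print(round(p_smoker_and_uses, 2))  # 0.28
print(round(p_check, 2))            # 0.4
```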

**Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?**

Bernoulli Naive Bayes:
- Assumes that features are binary variables (i.e., they take on values of 0 or 1).
- Typically used for text classification tasks where the presence or absence of words in a document is considered, represented as binary "word present" or "word absent" features.
- It models the presence or absence of each feature independently, given the class label.

Multinomial Naive Bayes:
- Assumes that features represent counts or frequencies of events (e.g., word counts in a document).
- Commonly used in text classification tasks where features are represented as word counts or term frequencies.
- It models the likelihood of observing each feature's value (word count or frequency) given the class label.
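
To make the contrast concrete, the following sketch scores one toy document under both likelihood models for a single class. The vocabulary, probabilities, and counts are hypothetical values chosen for illustration, and the functions are a from-scratch simplification rather than scikit-learn's implementation.

```python
import math

# Hypothetical per-class parameters for a 3-word vocabulary (illustrative only).
vocab = ["free", "money", "meeting"]
p_present = [0.8, 0.6, 0.1]  # Bernoulli: P(word appears at least once | class)
p_word = [0.5, 0.4, 0.1]     # Multinomial: P(a token is this word | class), sums to 1

counts = [3, 0, 1]           # word counts in one toy document


def bernoulli_log_likelihood(counts, p_present):
    """Uses only presence/absence; an absent word contributes log(1 - p)."""
    total = 0.0
    for c, p in zip(counts, p_present):
        total += math.log(p) if c > 0 else math.log(1.0 - p)
    return total


def multinomial_log_likelihood(counts, p_word):
    """Uses the counts; a word with count 0 contributes nothing.
    The multinomial coefficient is omitted because it is identical for every class."""
    return sum(c * math.log(p) for c, p in zip(counts, p_word))


print(bernoulli_log_likelihood(counts, p_present))
print(multinomial_log_likelihood(counts, p_word))
```

Changing the count of "free" from 3 to 1 leaves the Bernoulli score unchanged but alters the Multinomial score, which is the practical difference between the two variants.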

**Q3. How does Bernoulli Naive Bayes handle missing values?**

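Bernoulli Naive Bayes has no built-in mechanism for missing values, so they must be handled before training. Two common approaches are:

- Treating missingness as its own signal: A binary feature cannot hold a third value, and a placeholder such as -1 would not stay distinct (scikit-learn's `BernoulliNB` binarizes inputs at a threshold, 0.0 by default, which maps -1 to 0). Instead, add a separate 0/1 indicator feature that records whether the original value was missing. This approach allows the algorithm to explicitly consider the absence of information as a feature.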

- Imputation with the most common value (mode): Another common approach is to impute missing values with the most common value (mode) for each feature. For binary features, this would mean imputing missing values with the mode of the observed data (either 0 or 1). This approach assumes that missing values are similar to the most common value for the feature and may work well when the data is missing completely at random.
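
The two approaches above can be sketched in plain Python. In this sketch, missing entries are represented by `None`, the "separate category" idea is expressed as an extra 0/1 indicator column so that every input stays binary, and the helper names are hypothetical.

```python
from collections import Counter


def impute_mode(column):
    """Replace None entries with the most common observed value."""
    observed = [v for v in column if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


def add_missing_indicator(column):
    """Return a 0/1 column flagging which entries were missing."""
    return [1 if v is None else 0 for v in column]


feature = [1, 0, None, 1, 1, None, 0]
print(impute_mode(feature))            # [1, 0, 1, 1, 1, 1, 0]
print(add_missing_indicator(feature))  # [0, 0, 1, 0, 0, 1, 0]
```

The indicator column can be passed to the classifier alongside the imputed column, so both strategies can be combined.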

**Q4. Can Gaussian Naive Bayes be used for multi-class classification?**

Yes, Gaussian Naive Bayes can be used for multi-class classification tasks. 

Gaussian Naive Bayes assumes that, within each class, every feature follows a Gaussian (normal) distribution. Each feature is modeled independently, with a separate mean and standard deviation estimated per class from the training data.

For multi-class classification, Gaussian Naive Bayes extends naturally by applying Bayes' theorem to every class. It combines each class's prior with the Gaussian likelihoods of the observed features to obtain a posterior probability per class, and selects the class with the highest posterior as the prediction.

In summary, nothing in the model is specific to two classes: the same per-class estimates and the same highest-posterior rule apply to any number of classes.
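
The procedure described above can be sketched from scratch for three classes. This is a simplified illustration, not scikit-learn's `GaussianNB`: the toy data are made up, and `epsilon` is a hypothetical constant added to each variance for numerical stability.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def fit_gaussian_nb(X, y, epsilon=1e-9):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "means": [mean(col) for col in columns],
            "vars": [pvariance(col) + epsilon for col in columns],
        }
    return model


def predict(model, row):
    """Return the class with the highest log-posterior (up to a shared constant)."""
    best_label, best_score = None, -math.inf
    for label, params in model.items():
        score = math.log(params["prior"])
        for x, mu, var in zip(row, params["means"], params["vars"]):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data with three well-separated classes (illustrative only).
X = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
     [5.0, 5.2], [5.1, 4.9], [4.9, 5.0],
     [9.0, 1.0], [9.2, 1.1], [8.9, 0.8]]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

model = fit_gaussian_nb(X, y)
print(predict(model, [5.0, 5.0]))  # b
print(predict(model, [9.0, 1.0]))  # c
```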

**Q5. Assignment:**

**Data preparation:         
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message is spam or not based on several input features.**

In [1]:
pip install ucimlrepo

Note: you may need to restart the kernel to use updated packages.


In [2]:
from ucimlrepo import fetch_ucirepo 
  
# fetch dataset 
spambase = fetch_ucirepo(id=94) 
  
# data (as pandas dataframes) 
X = spambase.data.features 
y = spambase.data.targets 
  
# metadata 
print(spambase.metadata) 
  
# variable information 
print(spambase.variables) 


{'uci_id': 94, 'name': 'Spambase', 'repository_url': 'https://archive.ics.uci.edu/dataset/94/spambase', 'data_url': 'https://archive.ics.uci.edu/static/public/94/data.csv', 'abstract': 'Classifying Email as Spam or Non-Spam', 'area': 'Computer Science', 'tasks': ['Classification'], 'characteristics': ['Multivariate'], 'num_instances': 4601, 'num_features': 57, 'feature_types': ['Integer', 'Real'], 'demographics': [], 'target_col': ['Class'], 'index_col': None, 'has_missing_values': 'no', 'missing_values_symbol': None, 'year_of_dataset_creation': 1999, 'last_updated': 'Mon Aug 28 2023', 'dataset_doi': '10.24432/C53G6X', 'creators': ['Mark Hopkins', 'Erik Reeber', 'George Forman', 'Jaap Suermondt'], 'intro_paper': None, 'additional_info': {'summary': 'The "spam" concept is diverse: advertisements for products/web sites, make money fast schemes, chain letters, pornography...\n\nThe classification task for this dataset is to determine whether a given email is spam or not.\n\t\nOur collecti

In [3]:
X

Unnamed: 0,word_freq_make,word_freq_address,word_freq_all,word_freq_3d,word_freq_our,word_freq_over,word_freq_remove,word_freq_internet,word_freq_order,word_freq_mail,...,word_freq_conference,char_freq_;,char_freq_(,char_freq_[,char_freq_!,char_freq_$,char_freq_#,capital_run_length_average,capital_run_length_longest,capital_run_length_total
0,0.00,0.64,0.64,0.0,0.32,0.00,0.00,0.00,0.00,0.00,...,0.0,0.000,0.000,0.0,0.778,0.000,0.000,3.756,61,278
1,0.21,0.28,0.50,0.0,0.14,0.28,0.21,0.07,0.00,0.94,...,0.0,0.000,0.132,0.0,0.372,0.180,0.048,5.114,101,1028
2,0.06,0.00,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.0,0.010,0.143,0.0,0.276,0.184,0.010,9.821,485,2259
3,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.0,0.000,0.137,0.0,0.137,0.000,0.000,3.537,40,191
4,0.00,0.00,0.00,0.0,0.63,0.00,0.31,0.63,0.31,0.63,...,0.0,0.000,0.135,0.0,0.135,0.000,0.000,3.537,40,191
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4596,0.31,0.00,0.62,0.0,0.00,0.31,0.00,0.00,0.00,0.00,...,0.0,0.000,0.232,0.0,0.000,0.000,0.000,1.142,3,88
4597,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.0,0.000,0.000,0.0,0.353,0.000,0.000,1.555,4,14
4598,0.30,0.00,0.30,0.0,0.00,0.00,0.00,0.00,0.00,0.00,...,0.0,0.102,0.718,0.0,0.000,0.000,0.000,1.404,6,118
4599,0.96,0.00,0.00,0.0,0.32,0.00,0.00,0.00,0.00,0.00,...,0.0,0.000,0.057,0.0,0.000,0.000,0.000,1.147,5,78


In [4]:
y

Unnamed: 0,Class
0,1
1,1
2,1
3,1
4,1
...,...
4596,0
4597,0
4598,0
4599,0


In [5]:
X.isnull().sum()

word_freq_make                0
word_freq_address             0
word_freq_all                 0
word_freq_3d                  0
word_freq_our                 0
word_freq_over                0
word_freq_remove              0
word_freq_internet            0
word_freq_order               0
word_freq_mail                0
word_freq_receive             0
word_freq_will                0
word_freq_people              0
word_freq_report              0
word_freq_addresses           0
word_freq_free                0
word_freq_business            0
word_freq_email               0
word_freq_you                 0
word_freq_credit              0
word_freq_your                0
word_freq_font                0
word_freq_000                 0
word_freq_money               0
word_freq_hp                  0
word_freq_hpl                 0
word_freq_george              0
word_freq_650                 0
word_freq_lab                 0
word_freq_labs                0
word_freq_telnet              0
word_fre

In [6]:
X.describe()

Unnamed: 0,word_freq_make,word_freq_address,word_freq_all,word_freq_3d,word_freq_our,word_freq_over,word_freq_remove,word_freq_internet,word_freq_order,word_freq_mail,...,word_freq_conference,char_freq_;,char_freq_(,char_freq_[,char_freq_!,char_freq_$,char_freq_#,capital_run_length_average,capital_run_length_longest,capital_run_length_total
count,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,...,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0
mean,0.104553,0.213015,0.280656,0.065425,0.312223,0.095901,0.114208,0.105295,0.090067,0.239413,...,0.031869,0.038575,0.13903,0.016976,0.269071,0.075811,0.044238,5.191515,52.172789,283.289285
std,0.305358,1.290575,0.504143,1.395151,0.672513,0.273824,0.391441,0.401071,0.278616,0.644755,...,0.285735,0.243471,0.270355,0.109394,0.815672,0.245882,0.429342,31.729449,194.89131,606.347851
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.588,6.0,35.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.065,0.0,0.0,0.0,0.0,2.276,15.0,95.0
75%,0.0,0.0,0.42,0.0,0.38,0.0,0.0,0.0,0.0,0.16,...,0.0,0.0,0.188,0.0,0.315,0.052,0.0,3.706,43.0,266.0
max,4.54,14.28,5.1,42.81,10.0,5.88,7.27,11.11,5.26,18.18,...,10.0,4.385,9.752,4.081,32.478,6.003,19.829,1102.5,9989.0,15841.0


**Implementation:    
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.**
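
For reference, the 10-fold procedure named in the task can be sketched without any library. The helper `k_fold_indices` is hypothetical and only shows how the 4601 rows would be partitioned; it is not what scikit-learn runs internally. The indices are shuffled first because the rows appear to be ordered by class (the `y` output below starts with 1s and ends with 0s).

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(4601, k=10))
print(len(folds))                         # 10
print([len(test) for _, test in folds])   # one fold of 461, nine of 460
```

Each row is used for testing exactly once and for training in the other nine folds. When `cv` is given as an integer for a classifier, scikit-learn uses stratified folds, which additionally keep the class proportions similar in every fold.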

In [7]:
from sklearn.model_selection import train_test_split

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.25,random_state=0)

**Bernoulli Naive Bayes**

In [9]:
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import GridSearchCV

In [10]:
param_grid = {'alpha':[0.1,0.5,1.0]}

In [11]:
grid_bernaulli = GridSearchCV(BernoulliNB(),param_grid=param_grid,cv=10)

In [12]:
grid_bernaulli.fit(X_train,y_train)

  y = column_or_1d(y, warn=True)

In [13]:
grid_bernaulli.best_params_

{'alpha': 0.1}

In [14]:
y_pred_bernaulli = grid_bernaulli.predict(X_test)

In [15]:
y_pred_bernaulli

array([1, 0, 0, ..., 0, 0, 0])

**Multinomial Naive Bayes**

In [16]:
from sklearn.naive_bayes import MultinomialNB

In [17]:
grid_multinomial = GridSearchCV(MultinomialNB(),param_grid=param_grid,cv=10)

In [19]:
grid_multinomial.fit(X_train,y_train)

  y = column_or_1d(y, warn=True)

In [20]:
grid_multinomial.best_params_

{'alpha': 1.0}

In [21]:
y_pred_multinomial = grid_multinomial.predict(X_test)

In [22]:
y_pred_multinomial

array([1, 0, 0, ..., 0, 0, 0])

**Gaussian Naive Bayes**

In [23]:
from sklearn.naive_bayes import GaussianNB

In [29]:
param_grid_gaussian = {
    'var_smoothing': [1e-8, 1e-9]  # candidate smoothing values; 1e-9 is the scikit-learn default
}
grid_gaussian = GridSearchCV(GaussianNB(),param_grid=param_grid_gaussian,cv=10)

In [30]:
grid_gaussian.fit(X_train,y_train)

  y = column_or_1d(y, warn=True)

In [31]:
grid_gaussian.best_params_

{'var_smoothing': 1e-09}

In [33]:
y_pred_gaussian = grid_gaussian.predict(X_test)

In [34]:
y_pred_gaussian

array([1, 1, 0, ..., 0, 0, 1])

**Results:        
Report the following performance metrics for each classifier:
Accuracy     
Precision       
Recall       
F1 score**         


In [35]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report

In [37]:
print('accuracy of Bernoulli:', accuracy_score(y_test,y_pred_bernaulli))
print('precision of Bernoulli:', precision_score(y_test,y_pred_bernaulli))
print('recall of Bernoulli:', recall_score(y_test,y_pred_bernaulli))
print('F1 score of Bernoulli:', f1_score(y_test,y_pred_bernaulli))
print('classification report of Bernoulli: \n', classification_report(y_test,y_pred_bernaulli))

accuracy of Bernoulli: 0.8757602085143353
precision of Bernoulli: 0.8819277108433735
recall of Bernoulli: 0.7956521739130434
F1 score of Bernoulli: 0.8365714285714285
classification report of Bernoulli: 
               precision    recall  f1-score   support

           0       0.87      0.93      0.90       691
           1       0.88      0.80      0.84       460

    accuracy                           0.88      1151
   macro avg       0.88      0.86      0.87      1151
weighted avg       0.88      0.88      0.87      1151



In [38]:
print('accuracy of Multinomial:', accuracy_score(y_test,y_pred_multinomial))
print('precision of Multinomial:', precision_score(y_test,y_pred_multinomial))
print('recall of Multinomial:', recall_score(y_test,y_pred_multinomial))
print('F1 score of Multinomial:', f1_score(y_test,y_pred_multinomial))
print('classification report of Multinomial: \n', classification_report(y_test,y_pred_multinomial))

accuracy of Multinomial: 0.7993049522154648
precision of Multinomial: 0.7584650112866818
recall of Multinomial: 0.7304347826086957
F1 score of Multinomial: 0.7441860465116279
classification report of Multinomial: 
               precision    recall  f1-score   support

           0       0.82      0.85      0.83       691
           1       0.76      0.73      0.74       460

    accuracy                           0.80      1151
   macro avg       0.79      0.79      0.79      1151
weighted avg       0.80      0.80      0.80      1151



In [39]:
print('accuracy of Gaussian:', accuracy_score(y_test,y_pred_gaussian))
print('precision of Gaussian:', precision_score(y_test,y_pred_gaussian))
print('recall of Gaussian:', recall_score(y_test,y_pred_gaussian))
print('F1 score of Gaussian:', f1_score(y_test,y_pred_gaussian))
print('classification report of Gaussian: \n', classification_report(y_test,y_pred_gaussian))

accuracy of Gaussian: 0.8027801911381407
precision of Gaussian: 0.6876006441223832
recall of Gaussian: 0.9282608695652174
F1 score of Gaussian: 0.7900092506938019
classification report of Gaussian: 
               precision    recall  f1-score   support

           0       0.94      0.72      0.81       691
           1       0.69      0.93      0.79       460

    accuracy                           0.80      1151
   macro avg       0.81      0.82      0.80      1151
weighted avg       0.84      0.80      0.80      1151



**Discussion:        
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?**

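On the held-out test set (25% of the data), Bernoulli Naive Bayes outperforms both Multinomial and Gaussian Naive Bayes on overall accuracy (0.876 versus 0.799 and 0.803) and on F1 score for the spam class (0.837 versus 0.744 and 0.790). The right metric still depends on the task. In spam detection, precision matters because every false positive is a legitimate email wrongly flagged as spam.

Bernoulli Naive Bayes also achieves the highest precision of the three classifiers (0.882, against 0.758 for Multinomial and 0.688 for Gaussian), which makes it well suited to spam classification when precision is the priority. Gaussian Naive Bayes has the highest recall (0.928) but pays for it with the lowest precision. A plausible reason for this ranking is the shape of the features: most are zero for the majority of emails and heavily right-skewed (the medians in `X.describe()` are mostly 0), so a presence/absence model fits better than a Gaussian one, and the values are frequencies rather than the raw counts that Multinomial Naive Bayes is designed for.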
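
To show how the four reported metrics relate to one another, here is a minimal sketch that recomputes the Bernoulli scores from confusion-matrix counts. The notebook does not print these counts; the values below are inferred from the reported precision, recall, and class supports (460 spam, 691 non-spam), so treat them as a reconstruction.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 for the positive (spam) class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of the emails flagged as spam, how many were spam
    recall = tp / (tp + fn)      # of the actual spam emails, how many were caught
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Counts inferred from the Bernoulli results (assumption, not notebook output).
acc, prec, rec, f1 = binary_metrics(tp=366, fp=49, fn=94, tn=642)
print(round(acc, 4), round(prec, 4), round(rec, 4), round(f1, 4))
# 0.8758 0.8819 0.7957 0.8366
```

Under this reconstruction, the 49 false positives are the legitimate emails that would be flagged as spam, which is the error type that precision penalizes.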

Limitations of Naive Bayes:

- Naive Bayes assumes independence among features, which may not hold true in some datasets. This simplification could lead to suboptimal performance.
- It can assign zero probability to feature-label combinations that never appeared in the training data, which is why additive smoothing (the `alpha` parameter tuned above) is needed.
- It's sensitive to redundant, strongly correlated features, whose evidence is effectively counted more than once, and it may not handle noisy data well.
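
The problem with unseen feature-label combinations, and the role of the smoothing parameter `alpha`, can be illustrated with a short sketch. The function name and the counts are hypothetical; the formula is standard additive (Laplace/Lidstone) smoothing.

```python
def smoothed_probability(count, total, n_categories, alpha=1.0):
    """Additive smoothing of a categorical probability estimate."""
    return (count + alpha) / (total + alpha * n_categories)


# Hypothetical case: a word never seen among 50 spam tokens, vocabulary of 4 words.
unsmoothed = smoothed_probability(0, 50, 4, alpha=0.0)
smoothed = smoothed_probability(0, 50, 4, alpha=1.0)

print(unsmoothed)          # 0.0
print(round(smoothed, 4))  # 0.0185
```

Without smoothing, a single unseen word drives the whole class likelihood to zero, no matter how strongly the other features support that class. With `alpha > 0`, the estimate stays small but non-zero.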

**Conclusion:        
Summarise your findings and provide some suggestions for future work.**

In conclusion, given the results and the priority placed on precision in spam classification, Bernoulli Naive Bayes is the best-performing variant on this dataset. The trade-off between precision and the other metrics should still be weighed against the requirements of the application. Future work could explore more sophisticated machine learning models for text classification and address the limitations of Naive Bayes through feature engineering and ensemble methods.