### Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

This problem requires the application of conditional probability based on the information provided.

Given:
- Probability that an employee uses the company's health insurance plan: \( P(\text{Uses insurance}) = 0.70 \)
- Probability that an employee who uses the plan is a smoker: \( P(\text{Smoker | Uses insurance}) = 0.40 \)

We need to find the probability that an employee is a smoker given that they use the health insurance plan, which can be represented as \( P(\text{Smoker | Uses insurance}) \).

The survey already reports this quantity. "40% of the employees who use the plan are smokers" describes only the subgroup of plan users, and that is the definition of \( P(\text{Smoker | Uses insurance}) \). Bayes' theorem is not needed.

The definition of conditional probability confirms this. The share of all employees who both use the plan and smoke is

\[ P(\text{Smoker} \cap \text{Uses insurance}) = P(\text{Smoker | Uses insurance}) \times P(\text{Uses insurance}) = 0.40 \times 0.70 = 0.28 \]

Dividing by the probability of the conditioning event gives back the conditional probability:

\[ P(\text{Smoker | Uses insurance}) = \frac{P(\text{Smoker} \cap \text{Uses insurance})}{P(\text{Uses insurance})} = \frac{0.28}{0.70} = 0.40 \]

The overall smoking rate \( P(\text{Smoker}) \) cannot be determined from the given information, because the survey does not report how many employees who do not use the plan smoke. This question does not need it.

Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is \( 0.40 \) or \( 40\% \).
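The arithmetic can be checked with a hypothetical workforce of 1,000 employees. The survey gives only percentages, so the headcount is an assumption used for illustration. The sketch counts plan users and smoking plan users, then recovers the joint and conditional probabilities from those counts.

```python
# Hypothetical workforce of 1,000 employees (the survey only gives percentages)
n_employees = 1_000
p_uses = 0.70                 # P(Uses insurance)
p_smoker_given_uses = 0.40    # P(Smoker | Uses insurance), stated directly by the survey

n_uses = round(n_employees * p_uses)                      # employees on the plan
n_smoker_and_uses = round(n_uses * p_smoker_given_uses)   # plan users who smoke

# Joint probability as a share of ALL employees, and the conditional as a share of plan users
p_joint = n_smoker_and_uses / n_employees
p_conditional = n_smoker_and_uses / n_uses

print(n_uses, n_smoker_and_uses, p_joint, p_conditional)  # → 700 280 0.28 0.4
```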

### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

Bernoulli Naive Bayes and Multinomial Naive Bayes are two variants of the Naive Bayes algorithm that are commonly used for text classification, document categorization, and other tasks involving discrete features. The main difference between them lies in how they model and handle the input features.

### Bernoulli Naive Bayes:

1. **Feature Representation:**
   - Assumes that input features are binary (presence or absence).
   - Typically used for binary or boolean features, such as word presence in text.

2. **Data Type:**
   - Well-suited for binary data, like document classification where the presence or absence of words is considered.

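3. **Probability Calculation:**
   - Models each feature as a Bernoulli variable, \( P(x_i \mid y) = p_{iy}^{x_i} (1 - p_{iy})^{1 - x_i} \), where \( p_{iy} \) is the probability that feature \( i \) is present in a document of class \( y \).
   - Ignores how many times a feature occurs within a document. Only presence or absence matters.
   - Absent features still contribute a factor \( 1 - p_{iy} \), so the absence of a word counts as evidence for or against a class.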
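Here is a minimal sketch of that scoring rule, assuming scikit-learn's `BernoulliNB` and a hypothetical toy presence/absence matrix. Each present feature adds \( \log p_{iy} \) and each absent feature adds \( \log(1 - p_{iy}) \), where \( p_{iy} \) is the smoothed probability that feature \( i \) is present in class \( y \). The hand-computed posterior is then compared with `predict_proba`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical presence/absence matrix: rows are documents, columns are words
X = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 1, 0]])
y = np.array([0, 0, 1, 1])
bnb = BernoulliNB().fit(X, y)

x = np.array([1, 0, 0])                  # document containing only word 0
log_p = bnb.feature_log_prob_            # log p_iy, shape (n_classes, n_features)
log_not_p = np.log1p(-np.exp(log_p))     # log(1 - p_iy)

# Present words contribute log p_iy, absent words contribute log(1 - p_iy)
joint = bnb.class_log_prior_ + (x * log_p + (1 - x) * log_not_p).sum(axis=1)

# Normalize the joint log-likelihoods into posterior probabilities
posterior = np.exp(joint - joint.max())
posterior /= posterior.sum()
print(posterior, bnb.predict_proba(x.reshape(1, -1))[0])
```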

### Multinomial Naive Bayes:

1. **Feature Representation:**
   - Assumes that input features are discrete and represent counts or frequencies (e.g., word counts in text).
   - Suitable for text classification where the frequency of words matters.

2. **Data Type:**
   - Commonly used for text classification tasks where features are counts (e.g., word counts in a document).

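3. **Probability Calculation:**
   - Models the feature counts \( x_i \) of a document as a multinomial draw, \( P(\mathbf{x} \mid y) \propto \prod_i p_{iy}^{x_i} \), where \( p_{iy} \) is the probability that a randomly chosen word in class \( y \) is feature \( i \).
   - Takes the number of occurrences into account: a word that appears three times contributes \( 3 \log p_{iy} \) to the log-likelihood.
   - Features with a count of zero contribute \( p_{iy}^0 = 1 \), so absent words are ignored.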

### Use Cases:

- **Bernoulli Naive Bayes:**
  - Often used in spam filtering where features are binary (presence or absence of certain words).
  - Document classification tasks where the focus is on whether specific words are present in a document.

- **Multinomial Naive Bayes:**
  - Commonly employed in natural language processing tasks like text categorization.
  - Document classification where the frequency of words is important in determining the document's category.

In summary, the choice between Bernoulli Naive Bayes and Multinomial Naive Bayes depends on the nature of the data and how the features are represented. If the features are binary, Bernoulli Naive Bayes is more appropriate, while Multinomial Naive Bayes is suitable for scenarios where features are discrete and represent counts or frequencies.
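Here is a minimal sketch of the difference in practice, assuming scikit-learn and a hypothetical toy count matrix. scikit-learn's `BernoulliNB` binarizes its input (`binarize=0.0` by default, so any count above 0 becomes 1). Repeating a word therefore cannot change its prediction. `MultinomialNB` uses the raw counts.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical word-count matrix: rows are documents, columns are three vocabulary terms
X = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 2, 1],
    [0, 3, 0],
])
y = np.array([0, 0, 1, 1])

bnb = BernoulliNB().fit(X, y)    # binarize=0.0: any count > 0 is treated as 1
mnb = MultinomialNB().fit(X, y)  # uses the counts as given

doc_once = np.array([[1, 1, 0]])   # term 1 appears once
doc_often = np.array([[1, 5, 0]])  # term 1 appears five times

# Bernoulli: both documents binarize to [1, 1, 0], so the posteriors are identical
print(bnb.predict_proba(doc_once), bnb.predict_proba(doc_often))
# Multinomial: the repeated term shifts the posterior toward class 1
print(mnb.predict_proba(doc_once), mnb.predict_proba(doc_often))
```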

### Q3. How does Bernoulli Naive Bayes handle missing values?

Bernoulli Naive Bayes has no built-in mechanism for missing values. Every feature is expected to be either 0 (absent) or 1 (present), and scikit-learn's `BernoulliNB`, for example, raises a `ValueError` if the input contains NaN. Missing values therefore have to be handled during preprocessing, and the way they are encoded determines how the model treats them.

Here are a few common approaches to handle missing values in Bernoulli Naive Bayes:

### Imputation or Encoding Missing Values:

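1. **Introduce a Specific Category:**
   - Encode missing values as a distinct category, treating "missing" as a third state alongside presence and absence.
   - A Bernoulli feature can take only two values, so this third state has to be expanded into binary columns (for example, one-hot encoding into "present", "absent", and "missing" indicators). Alternatively, the data can be modeled with a categorical variant such as scikit-learn's `CategoricalNB`.
   - During training, the model then learns a separate probability for the "missing" state in each class.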

2. **Impute with the Most Frequent Value:**
   - Replace missing values with the most frequent value (either '0' or '1') based on the frequency of that feature in the dataset.
   - This method assumes that the missing values are more likely to align with the prevalent state of the feature.

3. **Use Missing Indicator Approach:**
   - Create an additional binary feature indicating whether the original feature was missing or not.
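   - The original column still needs a 0/1 value, so this is usually combined with imputation. The indicator then lets the classifier learn whether missingness itself is informative (for example, if a value is missing more often in one class).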
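Approaches 2 and 3 can be combined in a few lines. This is a minimal sketch, assuming scikit-learn and a small hypothetical binary matrix with missing entries encoded as `NaN`. `SimpleImputer(strategy="most_frequent", add_indicator=True)` fills each gap with the column's most frequent value (the smallest value on ties) and appends one missing-indicator column per feature that had gaps. The pipeline then passes the result to `BernoulliNB`.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary features with missing entries encoded as NaN
X = np.array([
    [1, 0, np.nan],
    [1, np.nan, 1],
    [0, 1, 0],
    [np.nan, 1, 0],
    [1, 0, 1],
    [0, 1, np.nan],
])
y = np.array([1, 1, 0, 0, 1, 0])

# Most-frequent imputation plus one "was missing" column per feature with gaps
model = make_pipeline(
    SimpleImputer(strategy="most_frequent", add_indicator=True),
    BernoulliNB(),
)
model.fit(X, y)

print(model[0].statistics_)              # fill value learned for each column
print(model[0].transform(X).shape)       # 3 imputed columns + 3 indicator columns
print(model.predict([[1, np.nan, np.nan]]))
```

Because the imputer is fitted inside the pipeline, cross-validating the pipeline learns the fill values from the training folds only. This avoids leaking information from the validation folds.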

### Impact on Model Performance:

Handling missing values appropriately is crucial as it can affect the model's performance. The method chosen to handle missing values might influence how the model learns from the data and the resulting predictions.

The Bernoulli Naive Bayes model itself does not choose among these options. Whichever encoding is applied during preprocessing determines what the model sees. The most suitable approach depends on the dataset and on whether the fact that a value is missing carries information about the class.

It's important to preprocess the data carefully, considering the implications of missing values on the model's performance and the accuracy of the predictions. Experimentation with different approaches and evaluating the impact on model performance through cross-validation or other validation methods is recommended.

### Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes. Gaussian Naive Bayes supports multi-class classification natively. It is often demonstrated on two-class problems, but nothing in the model is specific to two classes. It estimates a prior and a set of per-feature Gaussian distributions for every class, then predicts the class with the highest posterior probability.

### Strategy for Multi-class Classification using Gaussian Naive Bayes:

1. **Native Multi-class Prediction:**
   - For each class \( k \), the model estimates a prior \( P(y = k) \). For every feature \( x_i \), it estimates a class-specific mean \( \mu_{ik} \) and variance \( \sigma_{ik}^2 \).
   - At prediction time it computes \( \log P(y = k) + \sum_i \log \mathcal{N}(x_i; \mu_{ik}, \sigma_{ik}^2) \) for every class and assigns the class with the highest score.
   - This works for any number of classes, so the problem does not need to be split into binary sub-problems.

2. **One-vs-Rest (OvR) Strategy (optional):**
   - Train a separate binary Gaussian Naive Bayes classifier for each class against all other classes (hence, "one-vs-rest").
   - During prediction, the class whose classifier outputs the highest probability is assigned to the input sample.

3. **One-vs-One (OvO) Strategy (optional):**
   - Train a binary Gaussian Naive Bayes classifier for every pair of classes (hence, "one-vs-one").
   - During prediction, each classifier votes for a class, and the class with the most votes is assigned to the input sample.
   - Like OvR, this is a generic wrapper for inherently binary models and is rarely needed for Naive Bayes.

### Advantages and Considerations:

- **Simplicity:** Gaussian Naive Bayes is computationally efficient. Training only requires per-class priors, means, and variances, and prediction scores each class independently, so adding classes is cheap.
- **Assumption of Normality:** Gaussian Naive Bayes assumes features are normally distributed within each class. Therefore, it might not perform optimally if this assumption is violated.

### Implementation in Libraries:

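Popular machine learning libraries support this directly. scikit-learn's `GaussianNB` accepts any number of class labels in `fit` and uses the native per-class approach. OvR or OvO behavior is obtained only by explicitly wrapping the estimator in `OneVsRestClassifier` or `OneVsOneClassifier` from `sklearn.multiclass`.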

While Gaussian Naive Bayes is not the most sophisticated classifier, it can serve as a good baseline model for multi-class classification tasks, especially when the assumption of normally distributed features holds reasonably well within each class. However, in cases where this assumption doesn't hold, other classifiers might be more suitable.

### Q5. Assignment:
- Data preparation:
    - Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains 57 numeric features extracted from 4,601 email messages. The goal is to predict from these features whether a message is spam or not.
- Implementation:
    - Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.
- Results:
    - Report the following performance metrics for each classifier:
        - Accuracy
        - Precision
        - Recall
        - F1 score
- Discussion:
    - Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?
- Conclusion:
    - Summarise your findings and provide some suggestions for future work.

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
import warnings
warnings.filterwarnings(action="ignore")
```

```python
from ucimlrepo import fetch_ucirepo

# fetch dataset
spambase = fetch_ucirepo(id=94)

# data (as pandas dataframes)
X = spambase.data.features
y = spambase.data.targets

# metadata
# print(spambase.metadata)

# variable information
# print(spambase.variables)
```

```python
# targets is a one-column DataFrame; keep the column as a Series (1 = spam, 0 = non-spam)
y = y.iloc[:, 0]
```

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
```

```python
models = {"Gaussian NB": GaussianNB(), "Bernoulli NB": BernoulliNB(), "Multinomial NB": MultinomialNB()}
```

```python
# One hyperparameter grid per model, in the same order as `models`
params = [
    {"var_smoothing": [1e-9]},
    {"alpha": [1.0, 0.1, 0.01, 0.001]},
    {"alpha": [1.0, 0.1, 0.01, 0.001]},
]
```

```python
accuracy = {}
precision = {}
recall = {}
f1 = {}

# models and params are listed in the same order, so zip pairs each model with its grid
for (name, model), grid in zip(models.items(), params):
    # 10-fold cross-validation on the training split selects the hyperparameters
    clf = GridSearchCV(estimator=model, param_grid=grid, cv=10)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)

    # Rows are true labels, columns are predictions; spam (1) is the positive class
    tn, fp, fn, tp = (int(v) for v in confusion_matrix(y_test, y_pred).ravel())

    acc = (tp + tn) / (tp + fp + tn + fn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)

    accuracy[name] = round(acc, 4)
    precision[name] = round(prec, 4)
    recall[name] = round(rec, 4)
    # F1 is the harmonic mean: divide by the SUM of precision and recall
    f1[name] = round(2 * prec * rec / (prec + rec), 4)

print(accuracy)
print(precision)
print(recall)
print(f1)
```

```
{'Gaussian NB': 0.8208, 'Bernoulli NB': 0.8795, 'Multinomial NB': 0.7872}
{'Gaussian NB': 0.7193, 'Bernoulli NB': 0.9043, 'Multinomial NB': 0.765}
{'Gaussian NB': 0.9462, 'Bernoulli NB': 0.8, 'Multinomial NB': 0.7179}
{'Gaussian NB': 0.8173, 'Bernoulli NB': 0.849, 'Multinomial NB': 0.7407}
```
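Computing metrics from a confusion matrix by hand is easy to get wrong. With labels `[0, 1]`, `confusion_matrix(...).ravel()` returns `tn, fp, fn, tp`, because rows are true labels and columns are predictions. F1 divides by the sum of precision and recall; dividing by their product instead always yields 2. The sketch below uses hypothetical toy labels to check the hand-written formulas against scikit-learn's `precision_score`, `recall_score`, and `f1_score`.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical toy labels: 1 = spam (positive class), 0 = non-spam
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 0, 1]

# For binary labels [0, 1], ravel() yields the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)                          # 3 / 5
recall = tp / (tp + fn)                             # 3 / 4
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(tn, fp, fn, tp)
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
print(f1, f1_score(y_true, y_pred))
```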


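### Bernoulli Naive Bayes gives the highest test accuracy for this dataset (about 88%), followed by Gaussian Naive Bayes (about 82%) and Multinomial Naive Bayes (about 79%).

One plausible explanation is that 54 of the 57 Spambase features are word and character frequencies that are zero for many emails. Whether a term occurs at all carries much of the signal, and `BernoulliNB` captures exactly that by binarizing every feature at 0 by default. These skewed, zero-heavy distributions fit the normality assumption of Gaussian Naive Bayes poorly.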
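The assignment specifies 10-fold cross-validation with default hyperparameters, while the notebook tunes `alpha` with `GridSearchCV` and reports scores on a single 80/20 split. This is a minimal sketch of the prescribed protocol, assuming scikit-learn's `cross_validate` with stratified, shuffled folds. The `evaluate_naive_bayes` helper and the `random_state=42` seed are hypothetical choices, not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB


def evaluate_naive_bayes(X, y, n_splits=10, random_state=42):
    """Hypothetical helper: k-fold CV of the three Naive Bayes variants with default hyperparameters."""
    models = {
        "Gaussian NB": GaussianNB(),
        "Bernoulli NB": BernoulliNB(),
        "Multinomial NB": MultinomialNB(),
    }
    # Stratified folds keep the spam / non-spam ratio similar in every fold
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    # Binary metrics use label 1 (spam) as the positive class
    scoring = ["accuracy", "precision", "recall", "f1"]

    rows = []
    for model in models.values():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        # Average each metric over the folds
        rows.append([scores[f"test_{metric}"].mean() for metric in scoring])
    return pd.DataFrame(rows, index=list(models), columns=scoring)
```

With `X` and `y` loaded as in the notebook, `evaluate_naive_bayes(X, y)` returns one row per classifier with each metric averaged over the ten folds. Averaging over folds gives an estimate that depends less on one particular train/test split than a single held-out test set does.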