## Naive Bayes Assignment - 2
**By Shahequa Modabbera**

#### Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

Ans) To solve this problem, we can use conditional probability. Let's define the following events:

A: Employee uses the company's health insurance plan.

S: Employee is a smoker.

We are given:

P(A) = 0.70 (probability that an employee uses the health insurance plan)

P(S|A) = 0.40 (probability that an employee is a smoker given that they use the health insurance plan)

We want to find:

P(S|A) (probability that an employee is a smoker given that they use the health insurance plan)

This quantity is given directly in the problem: "40% of the employees who use the plan are smokers" is exactly the statement P(S|A) = 0.40. No further calculation is needed.

Bayes' theorem,

P(S|A) = P(A|S) * P(S) / P(A)

would only be required if we were given P(A|S) and P(S) instead. Neither value is provided here, and no assumption about them is needed.

What the given values do let us compute is the joint probability that an employee both uses the plan and is a smoker:

P(S and A) = P(S|A) * P(A) = 0.40 * 0.70 = 0.28

Therefore, the probability that an employee is a smoker given that they use the health insurance plan is 0.40 or 40%.
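As a quick numerical check, the short sketch below takes the two values stated in the question, computes the joint probability P(S and A), and divides it by P(A) again to recover the conditional probability. The variable names are illustrative only.

```python
# Values stated in the question
p_plan = 0.70               # P(A): employee uses the health insurance plan
p_smoker_given_plan = 0.40  # P(S|A): smoker among the plan users

# Joint probability of using the plan and being a smoker
p_smoker_and_plan = p_smoker_given_plan * p_plan

# Dividing the joint probability by P(A) recovers the conditional probability
p_recovered = p_smoker_and_plan / p_plan

print(f"P(S and A) = {p_smoker_and_plan:.2f}")
print(f"P(S|A)     = {p_recovered:.2f}")
```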

#### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

Ans) The main difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in the type of data they are suited for.

1. Bernoulli Naive Bayes:
   - Bernoulli Naive Bayes is used when the features are binary or follow a Bernoulli distribution, meaning they can take only two values (0 or 1).
   - It assumes that each feature is independent of each other given the class.
   - It is commonly used for text classification tasks, where the presence or absence of words in a document is used as features.

2. Multinomial Naive Bayes:
   - Multinomial Naive Bayes is used when the features are non-negative counts, such as the number of times each word occurs in a document.
   - It assumes that the feature counts within each class follow a multinomial distribution. The probability of a feature given a class is estimated as that feature's (smoothed) total count in the class divided by the total count of all features in the class.
   - It is often used in text classification tasks, where the features represent word frequencies or document term frequencies.

In summary, Bernoulli Naive Bayes is suitable for binary features (whether something occurs), while Multinomial Naive Bayes is appropriate for count features (how often it occurs). The choice between the two depends on the nature of the data and the problem at hand.
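The difference can be seen on a small example. The sketch below uses a made-up word-count matrix (the numbers are hypothetical) and fits both scikit-learn classifiers on it. It relies on the fact that `BernoulliNB` binarizes its input with the default `binarize=0.0`, so fitting it on raw counts gives the same model as fitting it on explicit 0/1 indicators.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical word-count matrix: rows are documents, columns are words
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [4, 1, 0],
    [0, 2, 3],
    [0, 1, 4],
    [1, 3, 2],
])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# MultinomialNB models how often each word occurs
mnb = MultinomialNB().fit(X_counts, y_toy)

# BernoulliNB binarizes its input (default threshold binarize=0.0),
# so it only models whether each word occurs at all
bnb_counts = BernoulliNB().fit(X_counts, y_toy)
X_binary = (X_counts > 0).astype(int)
bnb_binary = BernoulliNB().fit(X_binary, y_toy)

# Fitting on counts or on explicit 0/1 indicators gives the same Bernoulli model
same_model = np.allclose(bnb_counts.feature_log_prob_, bnb_binary.feature_log_prob_)
print("Bernoulli models identical:", same_model)
print("Multinomial P(class | doc):", mnb.predict_proba(X_counts[:1]).round(3))
print("Bernoulli   P(class | doc):", bnb_counts.predict_proba(X_counts[:1]).round(3))
```

The two classifiers generally assign different probabilities to the same document, because only the multinomial model uses the size of each count.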

#### Q3. How does Bernoulli Naive Bayes handle missing values?

Ans) Bernoulli Naive Bayes does not handle missing values explicitly, so they have to be dealt with during preprocessing, before the model is trained. Missing values are typically either treated as a separate category or flagged with an indicator variable.

Here are two common approaches to handle missing values in Bernoulli Naive Bayes:

1. Separate category: Treat "missing" as a third possible state of each feature, alongside 0 and 1. Because Bernoulli Naive Bayes only accepts binary inputs, the three-state feature is then one-hot encoded into binary columns (for example "is 0", "is 1", and "is missing"), and the model learns the probability of the missing state separately from the other two.

2. Indicator variable: Create an additional binary feature that indicates whether a value is missing or not. This approach adds a new feature that takes a value of 1 if the original feature is missing and 0 otherwise. The model then treats this indicator variable as a regular binary feature during training and classification.

The choice between these approaches depends on the specific dataset and the nature of the missing values. It is important to consider the potential impact of missing values on the classification task and select the most appropriate handling method.

Examples of how missing values can be handled in Bernoulli Naive Bayes:

1. Separate category:
   Let's say we have a dataset of emails with two features: "contains word 'free'" and "contains word 'sale'". The values for these features are binary: 1 if the word is present in the email and 0 if it is not. If there are missing values for these features, we can introduce a separate category, let's say -1, to represent them. So, a missing value in the "contains word 'free'" feature would be encoded as -1, indicating that the presence or absence of the word is unknown for that email. The three-valued feature must then be one-hot encoded before training; passing -1 directly does not work as intended, because scikit-learn's BernoulliNB, for example, binarizes its input at a threshold of 0 by default and would map -1 to 0.

2. Indicator variable:
   Continuing with the email example, instead of introducing a separate category, we can create an additional binary feature called "word missing" to indicate whether a value is missing or not. This feature would take a value of 1 if either "contains word 'free'" or "contains word 'sale'" is missing, and 0 otherwise. So, for an email with a missing value in either of these features, the "word missing" feature would be set to 1, indicating that the presence or absence of either word is unknown.
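A minimal sketch of the indicator-variable approach with pandas and scikit-learn follows. The data and column names are hypothetical, and the sketch uses one indicator per original feature rather than a single combined "word missing" column, which is a common variation. The missing entries themselves are filled with 0, which is an assumption made here so that every column stays binary.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import BernoulliNB

# Hypothetical email data; NaN marks an unknown value
emails = pd.DataFrame({
    "contains_free": [1, 0, np.nan, 1, 0, np.nan],
    "contains_sale": [1, 0, 0, np.nan, 0, 1],
})
is_spam = np.array([1, 0, 0, 1, 0, 1])

# One indicator column per original feature: 1 where the value was missing
indicators = emails.isna().astype(int).add_suffix("_missing")

# Fill the original gaps with 0 so that every column is strictly binary
features = pd.concat([emails.fillna(0).astype(int), indicators], axis=1)

model = BernoulliNB().fit(features, is_spam)
print(features)
print(model.predict(features))
```

With this encoding the classifier can learn whether missingness itself is informative about the class, instead of silently treating an unknown value as "word absent".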

#### Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Ans) Yes, Gaussian Naive Bayes can be used for multi-class classification. It is one of the Naive Bayes variants commonly used for handling continuous or numeric features.

In Gaussian Naive Bayes, it is assumed that the features follow a Gaussian (normal) distribution within each class. This assumption allows the algorithm to estimate the mean and standard deviation of each feature for each class. During prediction, the algorithm calculates the probability of an instance belonging to each class based on the Gaussian distribution parameters.

No extension such as "one-vs-all" or "one-vs-one" is needed for multi-class classification, because Naive Bayes is inherently a multi-class model. For every class c, the algorithm estimates a prior P(c) and a Gaussian distribution for each feature. During prediction, it computes the posterior for each class, which is proportional to P(c) multiplied by the product of the feature likelihoods P(x_i | c), and assigns the instance to the class with the highest posterior.

So, Gaussian Naive Bayes handles any number of classes directly; adding a class only adds one more set of prior, mean, and variance parameters to estimate.
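As an illustration, the sketch below fits scikit-learn's `GaussianNB` on the three-class Iris dataset that ships with the library. The dataset choice and variable names are for demonstration only. A single estimator is fitted on all three classes at once, and `predict_proba` returns one posterior probability per class.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has three classes and four continuous features
X_iris, y_iris = load_iris(return_X_y=True)

clf = GaussianNB().fit(X_iris, y_iris)

# One set of per-feature means is estimated for each class
print("classes:", clf.classes_)
print("means shape:", clf.theta_.shape)

# predict_proba returns one posterior probability per class
proba = clf.predict_proba(X_iris[:3])
print(proba.round(3))
print("predicted:", clf.predict(X_iris[:3]))
```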

#### Q5. Assignment:
#### Data preparation:
#### Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message is spam or not based on several input features.

#### Implementation:
#### Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.

#### Results:
#### Report the following performance metrics for each classifier:
- Accuracy
- Precision
- Recall
- F1 score

#### Discussion:
#### Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?

#### Conclusion:
#### Summarise your findings and provide some suggestions for future work.

#### Note: This dataset contains a binary classification problem with multiple features. The dataset is relatively small, but it can be used to demonstrate the performance of the different variants of Naive Bayes on a real-world problem.

import numpy as np
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Download the Spambase dataset from the UCI Machine Learning Repository
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
data = pd.read_csv(url, header=None)

# Split the data into features (X) and labels (y)
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Instantiate the classifiers
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

# Define the performance metrics to evaluate
metrics = ['accuracy', 'precision', 'recall', 'f1']

# Perform cross-validation and evaluate performance metrics
classifiers = {'Bernoulli Naive Bayes': bernoulli_nb, 
               'Multinomial Naive Bayes': multinomial_nb, 
               'Gaussian Naive Bayes': gaussian_nb}

for classifier_name, classifier in classifiers.items():
    scores = cross_validate(classifier, X, y, cv=10, scoring=metrics)
    mean_scores = {metric: np.mean(scores[f'test_{metric}']) for metric in metrics}
    print(f"\nClassifier: {classifier_name}")
    print(f"Accuracy: {mean_scores['accuracy']:.4f}")
    print(f"Precision: {mean_scores['precision']:.4f}")
    print(f"Recall: {mean_scores['recall']:.4f}")
    print(f"F1 Score: {mean_scores['f1']:.4f}")


Classifier: Bernoulli Naive Bayes
Accuracy: 0.8839
Precision: 0.8870
Recall: 0.8152
F1 Score: 0.8481

Classifier: Multinomial Naive Bayes
Accuracy: 0.7863
Precision: 0.7393
Recall: 0.7215
F1 Score: 0.7283

Classifier: Gaussian Naive Bayes
Accuracy: 0.8218
Precision: 0.7104
Recall: 0.9570
F1 Score: 0.8131


Ans) In this implementation, we first download the Spambase dataset from the UCI Machine Learning Repository using the provided URL. The dataset is then loaded into a pandas DataFrame, where the last column represents the target variable (spam or not spam) and the preceding columns are the input features.

We then instantiate three Naive Bayes classifiers: BernoulliNB for binary features, MultinomialNB for discrete features, and GaussianNB for continuous features.

Using 10-fold cross-validation, we evaluate the performance of each classifier by calculating the mean scores for the accuracy, precision, recall, and F1 score metrics. The `cross_validate` function performs the cross-validation, and we pass the list of metric names to its `scoring` parameter to specify the metrics to compute. It returns a dictionary with one array of per-fold scores for each metric, stored under the key `test_<metric>`.

Finally, we print the performance metrics for each classifier.
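To make the shape of the cross-validation output concrete, the sketch below runs `cross_validate` on a small synthetic dataset and inspects the returned dictionary. The synthetic data from `make_classification` and the `demo_` names are assumptions used only for illustration; they are not part of the Spambase experiment.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification data, used only to show the output structure
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

demo_metrics = ["accuracy", "precision", "recall", "f1"]
demo_scores = cross_validate(GaussianNB(), X_demo, y_demo, cv=10, scoring=demo_metrics)

# Keys: fit_time, score_time, and one test_<metric> entry per requested metric
print(sorted(demo_scores.keys()))

# Each test_<metric> entry holds one score per fold
print(len(demo_scores["test_accuracy"]))
```

Averaging each `test_<metric>` array, as the code above does with `np.mean`, gives the single summary value reported per classifier.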

Discussion:
Based on the results obtained, we can analyze the performance of each Naive Bayes variant:

- Bernoulli Naive Bayes: It achieved the best accuracy (0.8839), precision (0.8870), and F1 score (0.8481). With default settings, scikit-learn's BernoulliNB binarizes each feature at a threshold of 0, so every frequency feature becomes a presence/absence indicator. The results suggest that whether a word appears at all is a stronger spam signal here than how often it appears.

- Multinomial Naive Bayes: It is suited to features that represent counts. It had the lowest accuracy (0.7863) and F1 score (0.7283). A likely reason is that the Spambase features are not raw counts: most are word and character frequencies expressed as percentages, and the capital-letter run-length features are on a much larger scale, so they can dominate the model.

- Gaussian Naive Bayes: It assumes that each feature follows a Gaussian distribution within each class. It had the highest recall (0.9570) but the lowest precision (0.7104), meaning it catches almost all spam at the cost of flagging many legitimate emails. The features are continuous but tend to be heavily skewed with many zeros, so the Gaussian assumption is likely a poor fit.

From the results, Bernoulli Naive Bayes performed best overall on this dataset, while Gaussian Naive Bayes would be preferable only if missing a spam message is much more costly than misclassifying a legitimate one. More generally, the performance of each variant depends on how well its distributional assumption matches the features.

Limitations of Naive Bayes include its strong independence assumption, which may not hold in all