#### Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?


To find the probability that an employee is a smoker given that he/she uses the health insurance plan, we can use conditional probability.

Let's denote:

A: Event that an employee uses the health insurance plan.
    
B: Event that an employee is a smoker.
    
We're given:

P(A) = 0.70 (probability that an employee uses the health insurance plan)

P(B|A) = 0.40 (probability that an employee is a smoker given that they use the health insurance plan)

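We want to find P(B∣A), the probability that an employee is a smoker given that they use the health insurance plan. The phrase "40% of the employees who use the plan are smokers" already restricts attention to plan users, so this conditional probability is given directly in the problem.

Using the conditional probability formula:

P(B∣A) = P(A∩B)/P(A)

we can also recover the joint probability that an employee both uses the plan and smokes:

P(A∩B) = P(B∣A) × P(A) = 0.40 × 0.70 = 0.28

This 0.28 is the probability of the joint event "uses the plan and is a smoker" among all employees, not the conditional probability asked for. Plugging it back into the formula confirms the answer: P(B∣A) = 0.28/0.70 = 0.40.

So, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.40, or 40%.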
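To make the difference between the joint and the conditional probability concrete, the sketch below assumes a hypothetical company of 1,000 employees (a number not given in the problem, chosen only so every count is whole) and works with exact fractions.

```python
from fractions import Fraction

n_employees = 1000                                    # hypothetical company size
n_plan = int(n_employees * Fraction(70, 100))         # employees who use the plan (event A)
n_plan_and_smoker = int(n_plan * Fraction(40, 100))   # plan users who smoke (A and B)

p_joint = Fraction(n_plan_and_smoker, n_employees)    # P(A ∩ B): share of ALL employees
p_cond = Fraction(n_plan_and_smoker, n_plan)          # P(B | A): share of plan users only

print(n_plan, n_plan_and_smoker)    # 700 280
print(float(p_joint), float(p_cond))  # 0.28 0.4
```

The only difference between the two results is the denominator: all employees for the joint probability, plan users for the conditional one.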








#### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?


#### Bernoulli Naive Bayes
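#### Assumption about Data:
It assumes that every feature is binary, i.e. takes only two values (e.g., 0 or 1). A common example is text classification where each word in the vocabulary is either present or absent in a document.
#### Use Case:
It's particularly useful for binary (presence/absence) bag-of-words features, and it can work well on short documents where whether a word appears at all matters more than how often. scikit-learn's `BernoulliNB` binarizes non-binary inputs at a threshold (`binarize=0.0` by default).
#### Mathematical Model:
Within each class, each feature is modeled as an independent Bernoulli variable: a present feature contributes P(x_i = 1 | y) to the likelihood and an absent feature contributes 1 − P(x_i = 1 | y), so the absence of a feature is explicitly used as evidence.

#### Multinomial Naive Bayes
#### Assumption about Data:
It assumes that features represent the frequencies with which certain events have been generated by a multinomial distribution. This is the case where each feature counts something, such as the number of times a word appears in a document.
#### Use Case:
It's used for discrete data and is particularly suited for count-based feature vectors (e.g., word counts in text classification).
#### Mathematical Model:
The likelihood of an observed count vector is given by a multinomial distribution over the features, so each feature contributes P(feature | class) raised to the power of its count. A feature with a count of zero contributes nothing, so unlike Bernoulli Naive Bayes, absence is not explicitly penalized.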
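A minimal sketch of the practical difference, using scikit-learn's `BernoulliNB` and `MultinomialNB` on a made-up two-word count matrix (the data are purely illustrative). With the default `binarize=0.0`, `BernoulliNB` reduces any positive count to 1, so two documents that differ only in how often a word occurs look identical to it, while `MultinomialNB` scores them differently.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Made-up word-count matrix: 4 documents x 2 words, two classes
X = np.array([[3, 0], [2, 1], [0, 2], [1, 3]])
y = np.array([0, 0, 1, 1])

bnb = BernoulliNB().fit(X, y)    # binarize=0.0 (default): every count > 0 becomes 1
mnb = MultinomialNB().fit(X, y)  # uses the raw counts

# Two new documents that differ only in how often word 0 occurs
docs = np.array([[1, 1], [5, 1]])
print(bnb.predict_proba(docs))  # identical rows: both documents binarize to [1, 1]
print(mnb.predict_proba(docs))  # different rows: extra occurrences of word 0 shift toward class 0
```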
           
           


#### Q3. How does Bernoulli Naive Bayes handle missing values?

In the Bernoulli Naive Bayes model, each feature of the input vector represents the presence or absence of something (e.g., a word in text classification). The feature vector for a document (or any instance) is a binary vector where:

1 indicates the presence of the feature (e.g., a word is present in the document),

0 indicates the absence of the feature (e.g., a word is not present in the document).
#### Handling "Missing" Features

In Bernoulli Naive Bayes, a feature that does not occur is simply recorded as absent (0). Because the model works with a binary outcome for every feature, a word that does not appear in a document is naturally treated as absent. This is part of the model's design, not an additional mechanism for handling missing data.

#### Example
If your vocabulary includes the words ["love", "action", "comedy", "drama"] and you're classifying movie reviews, a review that mentions only "comedy" and "drama" would be represented as [0, 0, 1, 1]. The words "love" and "action" are considered absent for this review. There's no need to "handle" the fact that "love" and "action" weren't mentioned; their absence is informative on its own and is directly used in calculating the probabilities for classification.
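A minimal sketch of this example, assuming scikit-learn: `CountVectorizer(binary=True)` with the four-word vocabulary produces the [0, 0, 1, 1] vector, and recomputing the posterior by hand from the fitted `BernoulliNB` shows that the absent words "love" and "action" enter the likelihood through log(1 − p). The training reviews and labels are invented purely for illustration.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

vocabulary = ["love", "action", "comedy", "drama"]
# binary=True records presence (1) / absence (0) instead of counts
vectorizer = CountVectorizer(vocabulary=vocabulary, binary=True)

# Invented training reviews and labels (1 = liked, 0 = disliked)
train_docs = ["love this comedy", "love the drama", "boring action", "action action and more action"]
y_train = np.array([1, 1, 0, 0])
X_train = vectorizer.fit_transform(train_docs).toarray()

review = vectorizer.transform(["a comedy with some drama"]).toarray()
print(review)  # [[0 0 1 1]]

clf = BernoulliNB().fit(X_train, y_train)

# Bernoulli log-likelihood: present words add log p, absent words add log(1 - p)
log_p = clf.feature_log_prob_          # log P(word present | class)
log_1mp = np.log1p(-np.exp(log_p))     # log P(word absent | class)
joint = clf.class_log_prior_ + review @ log_p.T + (1 - review) @ log_1mp.T
manual = joint - logsumexp(joint, axis=1, keepdims=True)  # normalize to log posteriors
print(np.allclose(manual, clf.predict_log_proba(review)))  # True
```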

#### Traditional Missing Values
In more conventional scenarios where you might have missing values (e.g., unknown attributes in a dataset), the strategy for dealing with these would differ based on the model and situation:

Some models might ignore features with missing values during training or prediction.
Others might require you to impute missing values (fill them in with estimated or average values) before training or classification.

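For bag-of-words data, however, the traditional notion of a missing value (an unknown measurement or an empty field in a dataset) does not arise: a word that is not observed in a document is absent, not unknown, and the model's binary encoding uses that absence as part of its normal operation. If a binary feature can be genuinely unknown, encoding it as 0 would silently treat "unknown" as "absent"; such values should be imputed first, and scikit-learn's `BernoulliNB` raises an error if its input contains NaN.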
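If a binary feature can be genuinely unknown, one option (chosen here for illustration, not the only one) is to impute it inside a pipeline. The sketch below uses scikit-learn's `SimpleImputer` with `strategy="most_frequent"` on a small invented feature matrix where `np.nan` marks an unknown value; fitting `BernoulliNB` directly on this matrix would fail with a `ValueError`.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Invented binary features where np.nan marks a genuinely unknown value
X = np.array([[1, 0, np.nan],
              [1, 1, 0],
              [0, np.nan, 1],
              [0, 0, 1]])
y = np.array([1, 1, 0, 0])

# Replace each unknown entry with its column's most frequent observed value, then fit
model = make_pipeline(SimpleImputer(strategy="most_frequent"), BernoulliNB())
model.fit(X, y)

# Unknown values at prediction time are filled with the values learned during fit
print(model.predict([[1, np.nan, 0]]))
```

Because the imputer sits inside the pipeline, the fill values are learned from the training data only, which keeps the procedure valid under cross-validation.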

#### Q4. Can Gaussian Naive Bayes be used for multi-class classification?


Yes, Gaussian Naive Bayes can be used for multi-class classification; it is not limited to binary tasks. For each class, it estimates the mean and variance of every feature, assuming each continuous feature is normally (Gaussian) distributed within that class, and it predicts the class with the highest posterior probability. Nothing in this procedure depends on there being only two classes.

#### Q5. Assignment:
#### Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains 57 precomputed features (word and character frequencies and capital-letter run lengths) for 4,601 email messages, and the goal is to predict whether a message is spam or not based on these input features.

#### Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.

#### Results:
Report the following performance metrics for each classifier: accuracy, precision, recall, and F1 score.

#### Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?

#### Conclusion:
Summarise your findings and provide some suggestions for future work.

Note: This dataset contains a binary classification problem with multiple features. The dataset is relatively small, but it can be used to demonstrate the performance of the different variants of Naive Bayes on a real-world problem.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

# Load dataset: 57 feature columns plus the label (1 = spam, 0 = not spam)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
names = ["word_freq_make", "word_freq_address", "word_freq_all", "word_freq_3d", "word_freq_our",
         "word_freq_over", "word_freq_remove", "word_freq_internet", "word_freq_order", "word_freq_mail",
         "word_freq_receive", "word_freq_will", "word_freq_people", "word_freq_report", "word_freq_addresses",
         "word_freq_free", "word_freq_business", "word_freq_email", "word_freq_you", "word_freq_credit",
         "word_freq_your", "word_freq_font", "word_freq_000", "word_freq_money", "word_freq_hp", "word_freq_hpl",
         "word_freq_george", "word_freq_650", "word_freq_lab", "word_freq_labs", "word_freq_telnet",
         "word_freq_857", "word_freq_data", "word_freq_415", "word_freq_85", "word_freq_technology",
         "word_freq_1999", "word_freq_parts", "word_freq_pm", "word_freq_direct", "word_freq_cs",
         "word_freq_meeting", "word_freq_original", "word_freq_project", "word_freq_re", "word_freq_edu",
         "word_freq_table", "word_freq_conference", "char_freq_;", "char_freq_(", "char_freq_[", "char_freq_!",
         "char_freq_$", "char_freq_hash", "capital_run_length_average", "capital_run_length_longest",
         "capital_run_length_total", "class"]
data = pd.read_csv(url, names=names, header=None)

# Prepare data
X = data.drop('class', axis=1)
y = data['class']

# Initialize classifiers with default hyperparameters
models = {'Bernoulli Naive Bayes': BernoulliNB(),
          'Multinomial Naive Bayes': MultinomialNB(),
          'Gaussian Naive Bayes': GaussianNB()}

# Report label -> scikit-learn scorer name
scoring = {'Accuracy': 'accuracy',
           'Precision': 'precision',
           'Recall': 'recall',
           'F1 Score': 'f1'}

# Evaluate each classifier: a single 10-fold run (stratified, since these are classifiers)
# computes all four metrics instead of refitting every model once per metric
results = {}
for name, model in models.items():
    cv_results = cross_validate(model, X, y, cv=10, scoring=scoring)
    results[name] = {metric: np.mean(cv_results[f'test_{metric}'])
                     for metric in scoring}

# Print results
print("Performance Metrics:")
for name, metrics in results.items():
    print(f"\n{name}:")
    for metric, score in metrics.items():
        print(f"{metric}: {score:.4f}")
```


```
Performance Metrics:

Bernoulli Naive Bayes:
Accuracy: 0.8839
Precision: 0.8870
Recall: 0.8152
F1 Score: 0.8481

Multinomial Naive Bayes:
Accuracy: 0.7863
Precision: 0.7393
Recall: 0.7215
F1 Score: 0.7283

Gaussian Naive Bayes:
Accuracy: 0.8218
Precision: 0.7104
Recall: 0.9570
F1 Score: 0.8131
```


#### Bernoulli Naive Bayes performed the best

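Bernoulli Naive Bayes performed the best among the three variants, achieving the highest accuracy (0.8839), precision (0.8870), and F1 score (0.8481). A likely reason is that `BernoulliNB` binarizes each feature at 0 by default, so it models only whether a word or character appears; plausibly, whether a spam-related token such as "free" or "remove" appears at all is more informative than its exact frequency, and binarization also removes the very different scales of the `capital_run_length_*` features.

Multinomial Naive Bayes performed the worst: its accuracy was about 10 percentage points lower than Bernoulli's (0.7863 vs. 0.8839), and it had the lowest recall and F1 score of the three. Its model expects counts, but Spambase stores word and character frequencies as percentages alongside the much larger `capital_run_length_*` values, which likely dominate its likelihood.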

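Gaussian Naive Bayes showed the highest recall (0.9570) but the lowest precision (0.7104): it caught almost all spam, but on average about 29% of the messages it flagged as spam were actually legitimate. This is consistent with a limitation of its normality assumption, since most features are zero for most emails and the non-zero values are heavily skewed.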
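To inspect this precision/recall trade-off directly, one option is to pool out-of-fold predictions with `cross_val_predict` and look at the confusion matrix. The sketch below runs on synthetic `make_classification` data so it is self-contained; the helper name `oof_confusion` is ours, and on Spambase you would pass the `X` and `y` loaded above instead. Precision computed from pooled predictions can differ slightly from the mean of per-fold precision printed above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

def oof_confusion(model, X, y, cv=10):
    """Hypothetical helper: pool out-of-fold predictions, return (matrix, precision, recall)."""
    y_pred = cross_val_predict(model, X, y, cv=cv)
    cm = confusion_matrix(y, y_pred)  # rows = true class, columns = predicted class
    tn, fp, fn, tp = cm.ravel()
    precision = tp / (tp + fp)        # share of predicted positives that are truly positive
    recall = tp / (tp + fn)           # share of true positives that were caught
    return cm, precision, recall

# Synthetic stand-in data; replace with the Spambase X, y to study the real errors
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
cm, precision, recall = oof_confusion(GaussianNB(), X_demo, y_demo)
print(cm)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

On Spambase, the false-positive cell (true class 0, predicted 1) is where Gaussian Naive Bayes's legitimate-but-flagged messages would show up.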

Suggestions for future work include feature engineering, trying different classification algorithms, parameter tuning, using ensemble methods, data augmentation, handling class imbalance, and exploring alternative cross-validation strategies.
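One way to start on the feature-engineering suggestion is to put a transform in front of each classifier inside a pipeline, so the transform is re-fit within every fold. The two transforms below (`log1p` for Gaussian Naive Bayes, min-max scaling for Multinomial Naive Bayes) are our assumptions about what might help with skewed, differently scaled features; they have not been verified to improve results on Spambase. The demo uses synthetic non-negative data so it runs offline, and `compare` is a hypothetical helper name.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

# Candidate preprocessing (untested on Spambase)
candidates = {
    "Gaussian NB + log1p": make_pipeline(FunctionTransformer(np.log1p), GaussianNB()),
    # clip=True keeps test-fold values inside [0, 1], so MultinomialNB never sees negatives
    "Multinomial NB + min-max": make_pipeline(MinMaxScaler(clip=True), MultinomialNB()),
}

def compare(models, X, y, cv=10):
    """Return the mean cross-validated accuracy, precision, recall and F1 per model."""
    scoring = ["accuracy", "precision", "recall", "f1"]
    out = {}
    for name, model in models.items():
        res = cross_validate(model, X, y, cv=cv, scoring=scoring)
        out[name] = {m: res[f"test_{m}"].mean() for m in scoring}
    return out

# Synthetic non-negative, skewed data standing in for the Spambase X, y
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=400)
X_demo = rng.exponential(scale=1.0 + y_demo[:, None], size=(400, 5))

for name, metrics in compare(candidates, X_demo, y_demo).items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```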