#### Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

To find the probability that an employee is a smoker given that they use the health insurance plan, we can use conditional probability. We are looking for P(Smoker | Uses Insurance Plan).

We can use the formula for conditional probability:

P(Smoker | Uses Insurance Plan) = P(Smoker and Uses Insurance Plan) / P(Uses Insurance Plan)

We are given two pieces of information:

1. P(Uses Insurance Plan) = 70% = 0.70 (the probability that an employee uses the insurance plan).
2. P(Smoker | Uses Insurance Plan) = 40% = 0.40 (the probability that a person is a smoker given that they use the insurance plan).

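The quantity we are asked for is therefore given directly by item 2, and no further calculation is needed:

P(Smoker | Uses Insurance Plan) = 0.40

The formula is still useful for recovering the joint probability, that is, the fraction of all employees who both smoke and use the plan:

P(Smoker and Uses Insurance Plan) = P(Smoker | Uses Insurance Plan) * P(Uses Insurance Plan) = 0.40 * 0.70 = 0.28

So, the probability that an employee is a smoker given that they use the health insurance plan is 0.40, or 40%. The value 0.28 (28%) is the probability that an employee both smokes and uses the plan, which is a different quantity.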
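
As a quick numerical check of how the conditional and joint probabilities relate, the arithmetic can be sketched in a few lines of Python. The variable names are illustrative only; the two input values come from the problem statement.

```python
# Values given in the problem statement
p_uses_plan = 0.70          # P(Uses Insurance Plan)
p_smoker_given_plan = 0.40  # P(Smoker | Uses Insurance Plan)

# Joint probability: P(Smoker and Uses Plan) = P(Smoker | Uses Plan) * P(Uses Plan)
p_smoker_and_plan = p_smoker_given_plan * p_uses_plan

# Dividing the joint probability by P(Uses Plan) recovers the conditional probability
recovered_conditional = p_smoker_and_plan / p_uses_plan

print(f"P(Smoker and Uses Plan) = {p_smoker_and_plan:.2f}")
print(f"P(Smoker | Uses Plan)   = {recovered_conditional:.2f}")
```

The joint probability is always at most the conditional probability here, because it is the conditional probability scaled by P(Uses Insurance Plan), which cannot exceed 1.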

#### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?


Bernoulli Naive Bayes and Multinomial Naive Bayes are two different variants of the Naive Bayes classifier, each suited to different types of data and applications. Here are the key differences between them:

1. **Data Type:**
   - **Bernoulli Naive Bayes:** It is used for binary data, where each feature represents the presence (1) or absence (0) of a specific attribute or event. It is well-suited for problems involving binary text data or presence/absence features. Categorical features with more than two values must first be converted to binary indicators.
   - **Multinomial Naive Bayes:** It is used for discrete data, particularly when dealing with count or frequency data. It is commonly applied to text classification problems where features represent word counts, term frequencies, or other discrete numerical values.

2. **Feature Representation:**
   - **Bernoulli Naive Bayes:** Assumes binary features, and it models the probability of each feature being either 0 or 1. It is useful when you want to capture the presence or absence of specific features.
   - **Multinomial Naive Bayes:** Assumes discrete numerical features, typically non-negative integers. It models the probability of observing a specific count or frequency of each feature. It is useful for modeling the distribution of discrete values, such as word frequencies.

3. **Use Cases:**
   - **Bernoulli Naive Bayes:** It is commonly used for text classification tasks, such as spam detection, sentiment analysis, or document categorization, where the focus is on the presence or absence of specific words or features.
   - **Multinomial Naive Bayes:** It is also used for text classification but is more appropriate when you want to consider the frequency of words or terms in documents. It is suitable for tasks like assigning longer documents to topics, where how often a word occurs carries information beyond whether it occurs.

4. **Mathematical Formulation:**
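   - **Bernoulli Naive Bayes:** It models each binary feature \(x_i\) given a class \(c\) with a Bernoulli distribution, so the likelihood is \(\prod_i p_{ic}^{x_i} (1 - p_{ic})^{1 - x_i}\). The absence of a feature contributes a factor \((1 - p_{ic})\), which means absent features explicitly count as evidence.
   - **Multinomial Naive Bayes:** It models the vector of feature counts given a class label with a multinomial distribution, so the likelihood is proportional to \(\prod_i p_{ic}^{x_i}\). A feature with count 0 contributes a factor of 1, which means absent features have no direct effect on the likelihood.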

In summary, the choice between Bernoulli Naive Bayes and Multinomial Naive Bayes depends on the nature of your data and the specific problem you are trying to solve. If your data consists of binary or presence/absence features, Bernoulli Naive Bayes may be more appropriate. If your data involves discrete counts or frequencies, particularly in text data, Multinomial Naive Bayes is a better choice.
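
The difference in feature representation can be illustrated with a small sketch using scikit-learn. The four toy documents and their labels are invented for illustration and are not part of the assignment data; the same texts are turned into word counts for `MultinomialNB` and into presence/absence indicators for `BernoulliNB`.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy corpus (1 = spam, 0 = not spam)
docs = [
    "free money free offer",
    "meeting agenda for monday",
    "free free free prize",
    "project meeting notes",
]
labels = np.array([1, 0, 1, 0])

# Word counts: the natural input for Multinomial Naive Bayes
counts = CountVectorizer().fit_transform(docs).toarray()

# Presence/absence: the natural input for Bernoulli Naive Bayes
presence = (counts > 0).astype(int)

mnb = MultinomialNB().fit(counts, labels)
bnb = BernoulliNB().fit(presence, labels)

print("Largest word count in any document:", counts.max())
print("Largest value after binarizing:", presence.max())
print("MultinomialNB predictions:", mnb.predict(counts))
print("BernoulliNB predictions:  ", bnb.predict(presence))
```

In the count matrix the word "free" appears with values 2 and 3, while in the binarized matrix it is simply 1. That is the information Bernoulli Naive Bayes discards and Multinomial Naive Bayes keeps.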

#### Q3. How does Bernoulli Naive Bayes handle missing values?

Bernoulli Naive Bayes has no built-in mechanism for missing values. The model assumes that every feature is binary, either 0 (absence) or 1 (presence), so missing entries have to be dealt with before or around the model rather than by the model itself. In scikit-learn, `BernoulliNB` rejects inputs that contain NaN, and because it binarizes features with a default threshold of 0.0, a sentinel value such as -1 would silently be treated as 0 (absence) rather than as a separate category.

Here are common ways to deal with missing values when using Bernoulli Naive Bayes:

1. **Imputation:**
   - Replace each missing entry with a plausible value, for example the most frequent value of that feature in the training data.

2. **Missing-Value Indicator Features:**
   - Add one extra binary feature per affected column that records whether the value was missing (1) or observed (0). Every feature stays binary, and the classifier can still learn from the pattern of missingness.

3. **Omitting the Missing Feature's Term:**
   - Because Naive Bayes multiplies independent per-feature likelihoods, a missing feature can in principle be left out of the product for that instance, which amounts to marginalizing over it. This requires a custom implementation; scikit-learn's `BernoulliNB` does not do it.

4. **Modeling Three States Explicitly:**
   - If "missing" should be a third category alongside 0 and 1, the feature is no longer Bernoulli. A categorical Naive Bayes model (for example scikit-learn's `CategoricalNB`, with missing encoded as a non-negative integer such as 2) is the appropriate tool.

5. **Handling Missing Values in Test Data:**
   - Whatever strategy is chosen, it must be fitted on the training data and then applied in exactly the same way to new data.

It's important to note that treating missingness as information (options 2 and 4) adds features or categories to the model and requires careful preprocessing. The encoding must be consistent between the training and testing data to ensure accurate predictions.

In summary, Bernoulli Naive Bayes does not handle missing values on its own. They are addressed during preprocessing, typically by imputation, by adding missing-value indicator features, or by switching to a categorical model when missingness should be its own category.
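
One possible way to prepare binary data with missing entries for scikit-learn's `BernoulliNB`, which expects complete input, is to impute the gaps and append missing-value indicator columns. The sketch below assumes missing entries are encoded as NaN; the small array and its labels are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary features with missing entries encoded as NaN
X_missing = np.array([
    [1, 0, 1],
    [1, np.nan, 0],
    [0, 1, np.nan],
    [0, 1, 0],
    [1, 0, 1],
    [0, np.nan, 0],
])
y_missing = np.array([1, 1, 0, 0, 1, 0])

# Fill gaps with the most frequent value of each column and append one
# indicator column for every column that had missing values during fit
imputer = SimpleImputer(strategy="most_frequent", add_indicator=True)
X_filled = imputer.fit_transform(X_missing)

model = BernoulliNB().fit(X_filled, y_missing)

print("Shape before:", X_missing.shape, "after:", X_filled.shape)
print("Predictions:", model.predict(X_filled))
```

At prediction time, new data would go through `imputer.transform(...)` before being passed to the model, so that the same fill values and indicator columns are used as during training.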

#### Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes, Gaussian Naive Bayes can be used for multi-class classification, and it needs no special adaptation to do so. Like every Naive Bayes variant, it computes a posterior probability for each class and predicts the class with the highest one, so it handles \(k\) classes as naturally as it handles two. A one-vs-all (one-vs-rest) wrapper is not required.

Here's how Gaussian Naive Bayes works with \(k\) classes:

1. **Per-Class Parameters:**
   - For each class \(c\), the model estimates a prior \(P(c)\), typically the fraction of training samples that belong to that class.
   - For each class \(c\) and each feature \(x_i\), it estimates a mean \(\mu_{ic}\) and a variance \(\sigma_{ic}^2\).

2. **Training:**
   - Training is a single pass over the data that groups samples by class and computes these statistics. No relabeling into binary problems is needed.

3. **Prediction:**
   - For a new instance, the model evaluates \(P(c) \prod_i \mathcal{N}(x_i; \mu_{ic}, \sigma_{ic}^2)\) for each of the \(k\) classes. In practice this is computed as a sum of logarithms to avoid numerical underflow.

4. **Decision Rule:**
   - The predicted class is the one with the highest posterior probability. After normalization, the posteriors sum to 1 across the \(k\) classes.

Gaussian Naive Bayes is well-suited for this approach when features are continuous and can be reasonably modeled using Gaussian (normal) distributions within each class.

In summary, Gaussian Naive Bayes is inherently a multi-class classifier. Binary classification is simply the special case \(k = 2\), and no one-vs-all strategy is required to handle more classes.
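
As a brief illustration, scikit-learn's `GaussianNB` can be fitted directly on a dataset with more than two classes. The sketch below uses the three-class Iris dataset bundled with scikit-learn, which is chosen here only as a convenient example and is not part of the assignment.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has 3 classes and 4 continuous features
X_iris, y_iris = load_iris(return_X_y=True)

# A single model is fitted on all three classes at once
gnb = GaussianNB().fit(X_iris, y_iris)

# One row of feature means per class
print("Classes:", gnb.classes_)
print("Per-class feature means shape:", gnb.theta_.shape)

# Posterior probabilities for the first sample, one per class
posteriors = gnb.predict_proba(X_iris[:1])
print("Posteriors:", posteriors.round(3))
print("Predicted class:", gnb.predict(X_iris[:1])[0])
```

The model stores one row of feature means per class, and `predict_proba` returns one posterior per class for each sample.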

#### Q5. Assignment

In [17]:
import urllib.request

import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB

# Download the UCI Spambase dataset to a local file
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
destination_file = "spambase.data"
urllib.request.urlretrieve(url, destination_file)


# Column names for the 57 features and the target (the data file has no header row)
column_names = [
    "word_freq_make", "word_freq_address", "word_freq_all", "word_freq_3d", "word_freq_our",
    "word_freq_over", "word_freq_remove", "word_freq_internet", "word_freq_order", "word_freq_mail",
    "word_freq_receive", "word_freq_will", "word_freq_people", "word_freq_report", "word_freq_addresses",
    "word_freq_free", "word_freq_business", "word_freq_email", "word_freq_you", "word_freq_credit",
    "word_freq_your", "word_freq_font", "word_freq_000", "word_freq_money", "word_freq_hp",
    "word_freq_hpl", "word_freq_george", "word_freq_650", "word_freq_lab", "word_freq_labs",
    "word_freq_telnet", "word_freq_857", "word_freq_data", "word_freq_415", "word_freq_85",
    "word_freq_technology", "word_freq_1999", "word_freq_parts", "word_freq_pm", "word_freq_direct",
    "word_freq_cs", "word_freq_meeting", "word_freq_original", "word_freq_project", "word_freq_re",
    "word_freq_edu", "word_freq_table", "word_freq_conference", "char_freq_;", "char_freq_(",
    "char_freq_[", "char_freq_!", "char_freq_$", "char_freq_#", "capital_run_length_average",
    "capital_run_length_longest", "capital_run_length_total", "is_spam"
]

# Read the downloaded file into a DataFrame and preview the first rows
df = pd.read_csv("spambase.data", names=column_names, header=None)
df.head()

| | word_freq_make | word_freq_address | word_freq_all | word_freq_3d | word_freq_our | word_freq_over | word_freq_remove | word_freq_internet | word_freq_order | word_freq_mail | ... | char_freq_; | char_freq_( | char_freq_[ | char_freq_! | char_freq_$ | char_freq_# | capital_run_length_average | capital_run_length_longest | capital_run_length_total | is_spam |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.64 | 0.64 | 0.0 | 0.32 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.778 | 0.0 | 0.0 | 3.756 | 61 | 278 | 1 |
| 1 | 0.21 | 0.28 | 0.5 | 0.0 | 0.14 | 0.28 | 0.21 | 0.07 | 0.0 | 0.94 | ... | 0.0 | 0.132 | 0.0 | 0.372 | 0.18 | 0.048 | 5.114 | 101 | 1028 | 1 |
| 2 | 0.06 | 0.0 | 0.71 | 0.0 | 1.23 | 0.19 | 0.19 | 0.12 | 0.64 | 0.25 | ... | 0.01 | 0.143 | 0.0 | 0.276 | 0.184 | 0.01 | 9.821 | 485 | 2259 | 1 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.137 | 0.0 | 0.137 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.135 | 0.0 | 0.135 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |


In [18]:
# Split the dataset into features (X) and the target (y)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

In [19]:
# Create instances of each Naive Bayes classifier
# (cross_val_score and the classifiers were imported in the first cell)
bernoulli_nb = BernoulliNB()
multinomial_nb = MultinomialNB()
gaussian_nb = GaussianNB()

# Perform 10-fold cross-validation and evaluate the classifiers
scores_bernoulli = cross_val_score(bernoulli_nb, X, y, cv=10, scoring='accuracy')
scores_multinomial = cross_val_score(multinomial_nb, X, y, cv=10, scoring='accuracy')
scores_gaussian = cross_val_score(gaussian_nb, X, y, cv=10, scoring='accuracy')

# Print the average accuracy scores for each classifier
print("Average Accuracy (Bernoulli Naive Bayes):", scores_bernoulli.mean())
print("Average Accuracy (Multinomial Naive Bayes):", scores_multinomial.mean())
print("Average Accuracy (Gaussian Naive Bayes):", scores_gaussian.mean())


Average Accuracy (Bernoulli Naive Bayes): 0.8839380364047911
Average Accuracy (Multinomial Naive Bayes): 0.7863496180326323
Average Accuracy (Gaussian Naive Bayes): 0.8217730830896915


The results obtained from evaluating different variants of Naive Bayes classifiers on the dataset are as follows:

- Average Accuracy (Bernoulli Naive Bayes): 0.8839
- Average Accuracy (Multinomial Naive Bayes): 0.7863
- Average Accuracy (Gaussian Naive Bayes): 0.8218

**Discussion of Results:**

1. **Bernoulli Naive Bayes:** The Bernoulli Naive Bayes classifier achieved the highest average accuracy among the three variants, approximately 88.39%. The Spambase features are not binary; they are continuous word and character frequencies (percentages) plus capital-run-length statistics. However, scikit-learn's `BernoulliNB` binarizes its input with a default threshold of 0.0, so each feature is reduced to whether the word or character appears in the email at all. That presence/absence signal appears to be a strong indicator of spam on this dataset.

2. **Multinomial Naive Bayes:** The Multinomial Naive Bayes classifier had the lowest average accuracy, approximately 78.63%. Multinomial Naive Bayes is designed for discrete counts. Here the features are relative frequencies rather than raw counts, and the three `capital_run_length_*` features take much larger values than the others (up to the thousands in the rows shown above), so they likely dominate the likelihood.

3. **Gaussian Naive Bayes:** The Gaussian Naive Bayes classifier achieved an average accuracy of approximately 82.18%. Gaussian Naive Bayes is suitable for continuous features that are roughly normally distributed within each class. The Spambase features are non-negative, and many are zero for most emails with a long right tail, which is far from normal. This likely explains why it performed moderately well, but not as well as Bernoulli Naive Bayes.
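
To make concrete what `BernoulliNB` does with non-binary input, the sketch below compares its default internal thresholding (`binarize=0.0`) with binarizing the data by hand. It uses randomly generated non-negative, mostly-zero features as a stand-in for Spambase-style data, so the numbers themselves carry no meaning; only the equivalence of the two approaches is being shown.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)

# Synthetic stand-in data: non-negative values, roughly half of them exactly zero
X_synth = rng.exponential(scale=1.0, size=(200, 5)) * rng.integers(0, 2, size=(200, 5))
y_synth = rng.integers(0, 2, size=200)

# Default behaviour: values greater than 0.0 become 1, everything else becomes 0
auto_model = BernoulliNB(binarize=0.0).fit(X_synth, y_synth)

# The same thresholding done by hand, with internal binarization switched off
X_binary = (X_synth > 0.0).astype(float)
manual_model = BernoulliNB(binarize=None).fit(X_binary, y_synth)

same_predictions = np.array_equal(
    auto_model.predict(X_synth), manual_model.predict(X_binary)
)
print("Identical predictions:", same_predictions)
```

Because the two models see the same 0/1 values, they learn the same parameters. The magnitude of each feature is discarded either way.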

**Limitations of Naive Bayes:**

- **Independence Assumption:** Naive Bayes classifiers assume that features are independent of each other given the class label. This assumption may not hold in real-world datasets, and violations of independence can affect model performance.

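- **Distributional Assumptions:** Gaussian Naive Bayes assumes each feature is normally distributed within each class. Because it fits a separate mean and variance per feature, linear rescaling of a feature has little effect on it; heavily skewed or zero-inflated features are the bigger problem. Multinomial Naive Bayes, in contrast, is affected by feature magnitude, since large-valued features contribute more to the likelihood.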

- **Limited Expressiveness:** Naive Bayes classifiers have limited expressiveness compared to more complex models like decision trees or neural networks. They may not capture intricate relationships in the data.

**Suggestions for Future Work:**

1. **Feature Engineering:** Explore feature engineering techniques to improve the performance of Naive Bayes classifiers. Feature selection, dimensionality reduction, or transforming features might enhance the models' accuracy.

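2. **Hyperparameter Tuning:** Experiment with hyperparameter tuning for the Naive Bayes classifiers. In scikit-learn, the main candidates are the smoothing parameter `alpha` for `BernoulliNB` and `MultinomialNB`, the `binarize` threshold for `BernoulliNB`, and `var_smoothing` for `GaussianNB`.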

3. **Ensemble Methods:** Consider using ensemble methods like Random Forest or Gradient Boosting, which can combine the predictions of multiple classifiers to potentially achieve higher accuracy.

4. **Evaluate Other Algorithms:** Compare the performance of Naive Bayes with other classification algorithms (e.g., decision trees, support vector machines, or deep learning models) to determine if a different algorithm is better suited for this dataset.

5. **Data Preprocessing:** Carefully preprocess the dataset, addressing issues like missing values, outliers, or class imbalances, which can impact model performance.

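6. **Cross-Validation Strategies:** The evaluation above already uses stratified 10-fold splits, which is what scikit-learn applies to classifiers when `cv` is an integer. Experiment with shuffled or repeated stratified k-fold cross-validation, and report additional metrics such as precision, recall, and F1, to ensure robust evaluation.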
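
As a starting point for the tuning and cross-validation suggestions, the following sketch combines a small grid search with shuffled stratified folds. It runs on synthetic data from `make_classification` so that it is self-contained; the grid values are arbitrary assumptions, not recommendations, and the same pattern could be applied to the Spambase `X` and `y`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in data so the example runs without downloading anything
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

# Shuffled stratified folds keep the class balance in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Illustrative grid: smoothing strength and binarization threshold
param_grid = {"alpha": [0.01, 0.1, 1.0], "binarize": [0.0, 0.5]}

search = GridSearchCV(BernoulliNB(), param_grid, cv=cv, scoring="accuracy")
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best mean CV accuracy:", round(search.best_score_, 3))
```

The best parameters found on synthetic data say nothing about Spambase; the point is only the mechanics of searching over `alpha` and `binarize` with a fixed, reproducible splitting strategy.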

In summary, while Naive Bayes classifiers are simple and interpretable, they may not always be the best choice for all datasets. Exploring alternative algorithms and preprocessing techniques can lead to improved model performance.