# Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

In this case:
- Event A: The employee is a smoker.
- Event B: The employee uses the health insurance plan.

The probability that an employee is a smoker given that they use the health insurance plan can be calculated using the formula:

\[P(A|B) = \frac{P(A \cap B)}{P(B)}\]

Given the information provided:
- \(P(B)\) is the probability that an employee uses the health insurance plan, which is 70% or 0.7.
- The statement "40% of the employees who use the plan are smokers" is already conditioned on plan use, so it gives \(P(A|B)\) directly: 40% or 0.4. It is not the overall smoking rate \(P(A)\).
- \(P(A \cap B)\) is the probability that an employee is a smoker and uses the health insurance plan, which is \(P(A|B) \times P(B) = 0.4 \times 0.7 = 0.28\).

Substituting into the formula confirms the result:

\[P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0.28}{0.7} = 0.4\]

So the probability that an employee is a smoker, given that they use the health insurance plan, is 0.4 (40%).
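As a quick arithmetic check, the sketch below reads the 40% figure in the question as \(P(A|B)\) and uses exact fractions to avoid floating-point rounding. The variable names are illustrative only.

```python
from fractions import Fraction

# Values taken from the question statement.
p_b = Fraction(70, 100)          # P(B): employee uses the health insurance plan
p_a_given_b = Fraction(40, 100)  # P(A|B): smoker, among plan users

# Joint probability from the product rule: P(A and B) = P(A|B) * P(B).
p_a_and_b = p_a_given_b * p_b

# Applying the conditional probability formula recovers P(A|B).
recovered = p_a_and_b / p_b

print(float(p_a_and_b))  # 0.28
print(float(recovered))  # 0.4
```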

# Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

1. **Nature of Input Data**:
   - **Bernoulli Naive Bayes**: It is suited to binary-valued features, meaning features that take only two values (0 or 1). It is commonly used for text classification when only the presence or absence of a term matters, with each vocabulary term acting as one binary feature.
   - **Multinomial Naive Bayes**: It is designed for count-valued features, such as the number of times each word occurs in a document.

2. **Feature Representation**:
   - **Bernoulli Naive Bayes**: It works with binary feature vectors where 0 represents the absence of a feature, and 1 represents its presence.
   - **Multinomial Naive Bayes**: It works with feature vectors representing the frequency of term occurrences (e.g., word counts).

3. **Application**:
   - **Bernoulli Naive Bayes**: It's often used in problems where presence or absence of a term is more important than its frequency, such as document classification tasks like spam filtering.
   - **Multinomial Naive Bayes**: It's well-suited for problems where the frequency of terms is crucial, like text categorization, where you want to consider the distribution of words.

4. **Mathematical Formulation**:
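   - Both algorithms apply Bayes' theorem with the same conditional-independence assumption, but they differ in the likelihood. Bernoulli Naive Bayes uses \(P(x|c) = \prod_i p_{ci}^{x_i}(1 - p_{ci})^{1 - x_i}\) with \(x_i \in \{0, 1\}\), while Multinomial Naive Bayes uses \(P(x|c) \propto \prod_i p_{ci}^{x_i}\), where \(x_i\) is the count of term \(i\).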

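5. **Handling Absent and Unseen Features**:
   - **Bernoulli Naive Bayes**: It explicitly models the absence of a feature: a term that does not occur in a document still contributes to the likelihood, through the probability of that term being absent in the class.
   - **Multinomial Naive Bayes**: A term with a zero count contributes nothing to the likelihood. Terms that never co-occur with a class in the training data are handled with additive (Laplace) smoothing, which scikit-learn applies to both variants through the `alpha` parameter.
   - Neither variant handles genuinely missing values (e.g., NaN); those must be imputed or removed beforehand.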

6. **Performance**:
   - The performance of these algorithms depends on the specific dataset and the nature of the problem. There's no universally superior algorithm; it's recommended to try both and evaluate their performance on your specific task.
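The presence-versus-frequency distinction can be illustrated with a small sketch. The term-count matrix below is a made-up toy example, not data from this notebook. It relies on scikit-learn's `BernoulliNB` binarizing its input at the default threshold `binarize=0.0`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy term-count matrix: rows are documents, columns are terms.
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 1, 2],
])
y_docs = np.array([0, 0, 1, 1])

bnb = BernoulliNB().fit(X_counts, y_docs)
mnb = MultinomialNB().fit(X_counts, y_docs)

# BernoulliNB only sees presence/absence, so counts and their 0/1 version agree.
X_binary = (X_counts > 0).astype(int)
same_as_binary = np.allclose(bnb.predict_proba(X_counts), bnb.predict_proba(X_binary))

# The same term occurring once versus five times.
doc_once = np.array([[1, 0, 0]])
doc_many = np.array([[5, 0, 0]])

# Frequency does not move BernoulliNB, but it does move MultinomialNB.
bnb_unchanged = np.allclose(bnb.predict_proba(doc_once), bnb.predict_proba(doc_many))
mnb_changed = not np.allclose(mnb.predict_proba(doc_once), mnb.predict_proba(doc_many))

print(same_as_binary, bnb_unchanged, mnb_changed)
```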


# Q3. How does Bernoulli Naive Bayes handle missing values?

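Bernoulli Naive Bayes has no built-in mechanism for missing values; every feature is expected to be an observed 0 or 1. Missing values therefore have to be dealt with before training, for example:

1. **Imputation**:
   - You can replace missing values with a fixed value, 0 (absent) or 1 (present). However, this approach may introduce bias, especially if the values are not missing at random.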

2. **Create a Special Category for Missing Values**:
   - You can treat missing values as a separate category and include it as a feature in your model. This assumes that missing values carry some information.

3. **Use a Different Algorithm or Approach**:
   - Depending on the nature of your data and the problem you're trying to solve, you might consider using a different algorithm or technique that handles missing values more effectively, such as decision trees with missing value support or ensemble methods like Random Forests.

4. **Data Preprocessing Techniques**:
   - For binary features, mode (most-frequent) imputation keeps the values in {0, 1}, whereas mean imputation does not. More sophisticated methods such as k-Nearest Neighbors (KNN) imputation are also an option if they suit your dataset, provided the imputed values are mapped back to 0 or 1.
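A minimal sketch of the imputation and missing-indicator ideas, assuming scikit-learn's `SimpleImputer` and a hypothetical toy matrix with `NaN` entries. It first shows that `BernoulliNB` rejects `NaN` input, then wraps the classifier in a pipeline that imputes the most frequent value and appends a missing-value indicator column for each affected feature.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary features with some missing entries (NaN).
X_toy = np.array([
    [1, 0, np.nan],
    [1, 1, 0],
    [0, np.nan, 1],
    [0, 0, 1],
    [1, 1, np.nan],
    [0, 0, 1],
])
y_toy = np.array([1, 1, 0, 0, 1, 0])

# Fitting directly on data that contains NaN raises a ValueError.
try:
    BernoulliNB().fit(X_toy, y_toy)
    raw_fit_failed = False
except ValueError:
    raw_fit_failed = True

# Impute with the most frequent value; add_indicator=True appends one 0/1
# column per feature that had missing values, so "was missing" becomes a feature.
model = make_pipeline(
    SimpleImputer(strategy="most_frequent", add_indicator=True),
    BernoulliNB(),
)
model.fit(X_toy, y_toy)
toy_pred = model.predict(X_toy)
print(raw_fit_failed, toy_pred.shape)
```

Keeping the imputer inside the pipeline means that, under cross-validation, the imputation values are learned from the training folds only.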


# Q4. Can Gaussian Naive Bayes be used for multi-class classification?

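Yes. Gaussian Naive Bayes extends naturally to any number of classes, because it fits a separate model per class and compares the resulting posteriors:

1. **Feature Distribution**:
   - For each class, Gaussian Naive Bayes models each feature as an independent Gaussian (normal) distribution, with its own mean and variance estimated from that class's training samples.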

2. **Class Priors**:
   - It also calculates the prior probabilities of each class based on the frequency of each class in the training data.

3. **Likelihood Estimation**:
   - Given a new data point, it calculates the likelihood of observing the features under each class's Gaussian distribution.

4. **Posterior Probability**:
   - Using Bayes' theorem, it combines the likelihood with the class priors to calculate the posterior probabilities for each class.

5. **Classification**:
   - The class with the highest posterior probability is predicted as the output.
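The steps above can be sketched on a three-class problem. This example uses the Iris dataset bundled with scikit-learn, which is an illustrative choice and not part of the original question.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has three classes and four continuous features.
X_iris, y_iris = load_iris(return_X_y=True)
gnb = GaussianNB().fit(X_iris, y_iris)

print(gnb.classes_)      # the class labels seen during fit
print(gnb.class_prior_)  # class priors estimated from class frequencies
print(gnb.theta_.shape)  # per-class, per-feature means

# Posterior probability of every class for the first five samples.
posteriors = gnb.predict_proba(X_iris[:5])

# The prediction is the class with the highest posterior.
predicted = gnb.classes_[np.argmax(posteriors, axis=1)]
print(predicted)
```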

While Gaussian Naive Bayes can be used for multi-class classification, it's important to note that it makes the assumption of normality, which may not always hold in real-world datasets. If your data doesn't follow a normal distribution, you might consider using other algorithms like Multinomial Naive Bayes for discrete data, or algorithms like Decision Trees, Random Forests, or Support Vector Machines (SVMs) which are more flexible in handling different types of data distributions.

In practice, it's always a good idea to try out different algorithms and evaluate their performance on your specific dataset to determine which one works best.

# Q5. Assignment:

>
Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase).
This dataset contains email messages, where the goal is to predict whether a message
is spam or not based on several input features.

>
Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.

>
Results:
Report the following performance metrics for each classifier:
Accuracy
Precision
Recall
F1 score

>
Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?

>
Conclusion:
Summarise your findings and provide some suggestions for future work.

>
Note: This dataset contains a binary classification problem with multiple features. The dataset is
relatively small, but it can be used to demonstrate the performance of the different variants of Naive
Bayes on a real-world problem.

In [1]:
pip install ucimlrepo

Collecting ucimlrepo
  Downloading ucimlrepo-0.0.3-py3-none-any.whl (7.0 kB)
Installing collected packages: ucimlrepo
Successfully installed ucimlrepo-0.0.3
Note: you may need to restart the kernel to use updated packages.


In [3]:
from ucimlrepo import fetch_ucirepo 
  
# fetch dataset 
spambase = fetch_ucirepo(id=94) 
  
# data (as pandas dataframes) 
X = spambase.data.features 
y = spambase.data.targets 

In [60]:
import pandas as pd

In [5]:
from sklearn.model_selection import train_test_split

In [6]:
X_train, X_test, y_train, y_test = train_test_split(X , y , test_size=0.30 , random_state=42)

In [16]:
from sklearn.naive_bayes import BernoulliNB , MultinomialNB , GaussianNB
from sklearn.model_selection import GridSearchCV , RandomizedSearchCV

In [26]:
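import warnings
warnings.filterwarnings('ignore')

# Single-point grid: the default BernoulliNB hyperparameters, scored with
# 10-fold cross-validation on the training split. "force_alpha" is left out
# because the value 'warn' is not accepted by every scikit-learn release.
param_grid_BG = {"alpha": [1.0],
    "fit_prior": [True],
    "class_prior": [None]}

B_G = GridSearchCV(BernoulliNB(), param_grid=param_grid_BG, cv=10)
# y_train is a one-column DataFrame; flatten it to the 1-D array scikit-learn expects.
B_G.fit(X_train, y_train.values.ravel())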

In [39]:
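# Default MultinomialNB hyperparameters; "force_alpha" is left out because
# the value 'warn' is not accepted by every scikit-learn release.
param_grid_MG = {"alpha": [1.0],
    "fit_prior": [True],
    "class_prior": [None]}
M_G = GridSearchCV(MultinomialNB(), param_grid=param_grid_MG, cv=10)
M_G.fit(X_train, y_train.values.ravel())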

In [49]:
M_G_2= M_G.best_estimator_

M_G_2

In [40]:
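# Single-candidate search: GaussianNB defaults, scored with 10-fold cross-validation.
param_grid_GG = {"priors":[None], "var_smoothing":[1e-09]}

# n_iter=1 matches the single parameter combination available.
G_G = RandomizedSearchCV(GaussianNB() , param_distributions=param_grid_GG , n_iter=1 , cv=10)

G_G.fit(X_train , y_train.values.ravel())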

In [66]:
from sklearn.metrics import accuracy_score , precision_score , recall_score , f1_score

BG_Y_PRED = B_G.predict(X_test)
y_true = y_test.values.ravel()

# scikit-learn metrics take (y_true, y_pred) in that order. Results are stored
# under new names so the imported metric functions are not overwritten.
bg_accuracy = accuracy_score(y_true , BG_Y_PRED)
bg_precision = precision_score(y_true , BG_Y_PRED)
bg_recall = recall_score(y_true , BG_Y_PRED)
bg_f1 = f1_score(y_true , BG_Y_PRED)


print("accuracy_score :" , bg_accuracy)
print("precision_score :" , bg_precision)
print("recall_score :", bg_recall)
print("f1_score :", bg_f1)

accuracy_score : 0.8790731354091238
precision_score : 0.8882575757575758
recall_score : 0.8128249566724437
f1_score : 0.848868778280543
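The cells above score only Bernoulli Naive Bayes on a single hold-out split, whereas the assignment asks for 10-fold cross-validation of all three classifiers with default hyperparameters. One possible way to do that is sketched below. The helper name `evaluate_naive_bayes`, the shuffled stratified folds, and the synthetic demo data are assumptions made so the snippet runs on its own. No Spambase results are implied.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB


def evaluate_naive_bayes(X, y, n_splits=10, seed=42):
    """Return mean cross-validated accuracy, precision, recall and F1 per classifier."""
    models = {
        "BernoulliNB": BernoulliNB(),
        "MultinomialNB": MultinomialNB(),
        "GaussianNB": GaussianNB(),
    }
    scoring = ["accuracy", "precision", "recall", "f1"]
    # Stratified folds keep the class ratio similar in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        results[name] = {m: float(np.mean(scores[f"test_{m}"])) for m in scoring}
    return results


# Synthetic, non-negative count data standing in for a real dataset.
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 100)
rates = np.where(y_demo[:, None] == 1, [4.0, 1.0, 0.5], [1.0, 1.0, 2.0])
X_demo = rng.poisson(rates)

demo_results = evaluate_naive_bayes(X_demo, y_demo)
for name, metrics in demo_results.items():
    print(name, metrics)
```

On the notebook's data, the call would be `evaluate_naive_bayes(X, y.values.ravel())`, assuming `y` is the one-column DataFrame returned by `fetch_ucirepo`.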

