# Answer1

To find the probability that an employee is a smoker given that he/she uses the health insurance plan, we can use conditional probability. 

Let's denote:
- A: the event that an employee uses the health insurance plan.
- B: the event that an employee is a smoker.

The probability of an employee using the health insurance plan is \( P(A) = 70\% = 0.7 \). The 40% figure in the problem refers only to employees who use the plan, so it is the conditional probability \( P(B|A) = 0.4 \), not the overall smoking rate \( P(B) \).

The conditional probability of an employee being a smoker given that they use the health insurance plan is denoted as P(B|A), and it is calculated using the formula:

\[ P(B|A) = \frac{P(A \cap B)}{P(A)} \]

In this case, \( P(A \cap B) \) is the probability that an employee both uses the health insurance plan and is a smoker.

Given that 40% of the employees who use the plan are smokers, \( P(A \cap B) = P(A) \cdot P(B|A) = 0.7 \times 0.4 = 0.28 \).

Now, substituting the values into the formula:

In [1]:
```python
# Given probabilities
P_A = 0.7          # P(A): probability that an employee uses the health insurance plan
P_B_given_A = 0.4  # P(B|A): probability of being a smoker among employees who use the plan

# Joint probability P(A ∩ B) = P(A) * P(B|A)
P_A_and_B = P_A * P_B_given_A

# Conditional probability P(B|A) = P(A ∩ B) / P(A)
P_B_given_A_result = P_A_and_B / P_A

print("P(uses plan and smoker): {:.2f}".format(P_A_and_B))
print("The probability that an employee is a smoker given that he/she uses the health insurance plan: {:.2f}".format(P_B_given_A_result))
```

P(uses plan and smoker): 0.28
The probability that an employee is a smoker given that he/she uses the health insurance plan: 0.40

So \( P(B|A) = 0.28 / 0.7 = 0.4 \): 40% of the employees who use the health insurance plan are smokers.


# Answer2
Bernoulli Naive Bayes and Multinomial Naive Bayes are two variants of the Naive Bayes algorithm, and they are commonly used in text classification and other machine learning tasks. The main difference between them lies in the type of data they are designed to handle.

1. **Bernoulli Naive Bayes:**
   - **Data Type:** Bernoulli Naive Bayes is suitable for binary data, where features are either present or absent. It is commonly used for document classification tasks where the presence or absence of words in a document is considered.
   - **Representation:** The input data is often represented as binary vectors, where each feature is represented as 0 or 1 (absence or presence).
   - **Example:** In text classification, a document might be represented as a binary vector where each element corresponds to the presence (1) or absence (0) of a specific word in the document.

2. **Multinomial Naive Bayes:**
   - **Data Type:** Multinomial Naive Bayes is designed for count-based data, where features represent the frequency of terms or events. It is commonly used in text classification with features like word counts in documents.
   - **Representation:** The input data is typically represented as integer counts. For example, in text classification, a document might be represented as a vector of word frequencies.
   - **Example:** If the task involves classifying documents based on word frequencies, Multinomial Naive Bayes is a suitable choice.

In summary, the choice between Bernoulli and Multinomial Naive Bayes depends on the nature of the features in your dataset. If your features are binary (presence or absence), Bernoulli Naive Bayes is more appropriate. If your features are counts or frequencies, Multinomial Naive Bayes is a better choice. Additionally, there is also a Gaussian Naive Bayes variant, which is suitable for continuous data following a Gaussian distribution.
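The difference in representation can be sketched with scikit-learn on a tiny, made-up word-count matrix (the data and the variable names are illustrative assumptions, not from a real corpus). Multinomial NB is fit on the raw counts, while Bernoulli NB is fit on the same documents reduced to presence/absence:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Toy word-count matrix: rows = documents, columns = 4 vocabulary words (illustrative data only)
X_counts = np.array([
    [2, 1, 0, 0],
    [3, 2, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 1, 2],
])
y_docs = np.array([0, 0, 1, 1])

# Multinomial NB uses how often each word occurs
multinomial = MultinomialNB().fit(X_counts, y_docs)

# Bernoulli NB only sees whether each word occurs (1) or not (0)
X_binary = (X_counts > 0).astype(int)
bernoulli = BernoulliNB().fit(X_binary, y_docs)

new_counts = np.array([[3, 0, 0, 0], [0, 0, 1, 4]])
print("Multinomial:", multinomial.predict(new_counts))                  # → [0 1]
print("Bernoulli:  ", bernoulli.predict((new_counts > 0).astype(int)))  # → [0 1]
```

One practical consequence of the two representations: Bernoulli NB multiplies in a \( (1 - p) \) factor for every absent word, so a missing word also counts as evidence, whereas Multinomial NB effectively ignores words with a zero count.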

# Answer3
In the context of Bernoulli Naive Bayes, missing values are typically treated as the absence of a feature. Since Bernoulli Naive Bayes is designed for binary data where features are either present (1) or absent (0), a missing value is assumed to be equivalent to the absence of the feature.

When dealing with missing values in a Bernoulli Naive Bayes model, you generally have two common approaches:

1. **Treat Missing Values as Absent (or Drop Them):**
   - The simplest option is to treat a missing feature as absent, i.e. set it to 0, before calculating probabilities. Alternatively, you can ignore (drop) instances with missing values during training, at the cost of discarding part of the data.

2. **Impute Missing Values:**
   - Another approach is to impute missing values before training the model. You might choose to impute missing values with the most common value (mode) for that feature or use some other imputation method that makes sense for your data. Once the missing values are imputed, you can train the Bernoulli Naive Bayes model as usual.
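Both approaches can be sketched with NumPy and scikit-learn. As far as we know, scikit-learn's `BernoulliNB` rejects NaN inputs during validation, so missing values have to be handled before fitting. The small binary matrix below is made-up illustrative data, and `SimpleImputer(strategy="most_frequent")` is used as one possible mode-imputation method:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB

# Toy binary features with missing entries marked as np.nan (illustrative data only)
X_missing = np.array([
    [1, 0, 0],
    [1, np.nan, 0],
    [np.nan, 0, 0],
    [0, 1, 1],
    [0, 1, np.nan],
    [0, 1, 1],
])
y_missing = np.array([0, 0, 0, 1, 1, 1])

# Approach 1a: treat a missing value as "feature absent" (replace NaN with 0)
X_absent = np.nan_to_num(X_missing, nan=0.0)
model_absent = BernoulliNB().fit(X_absent, y_missing)

# Approach 1b: drop the instances that contain any missing value
complete_rows = ~np.isnan(X_missing).any(axis=1)
model_dropped = BernoulliNB().fit(X_missing[complete_rows], y_missing[complete_rows])

# Approach 2: impute each feature with its most frequent value (mode), then train as usual
X_mode = SimpleImputer(strategy="most_frequent").fit_transform(X_missing)
model_mode = BernoulliNB().fit(X_mode, y_missing)

X_new = np.array([[1, 0, 0], [0, 1, 1]])
for name, model in [("absent", model_absent), ("dropped", model_dropped), ("mode", model_mode)]:
    print(name, model.predict(X_new))
```

When the model is evaluated with cross-validation, wrapping the imputer and the classifier in `sklearn.pipeline.make_pipeline(...)` keeps the mode computed on the training folds only.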


# Answer4
Yes, Gaussian Naive Bayes can be used for multi-class classification. Gaussian Naive Bayes is a variant of the Naive Bayes algorithm that is suitable for continuous features that follow a Gaussian (normal) distribution within each class. It handles multi-class problems natively: the same per-class computation is carried out for every class, whether there are two classes or many, so no one-vs-rest extension is needed.

In the context of multi-class classification, the Gaussian Naive Bayes model makes certain assumptions about the distribution of the features within each class. Specifically, it assumes that the features within each class are normally distributed and estimates the mean and variance for each feature in each class.

Here's a general overview of how Gaussian Naive Bayes works in the multi-class setting:

1. **Parameter Estimation:**
   - For each class, the model estimates the mean and variance of each feature based on the training data for that class.

2. **Class Prior Probability:**
   - The prior probability of each class is calculated based on the proportion of instances belonging to that class in the training data.

3. **Classifying New Instances:**
   - When classifying a new instance, the model calculates the likelihood of the observed features given each class using the Gaussian probability density function.

4. **Decision Rule:**
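   - The model then applies the Naive Bayes decision rule: it assigns the instance to the class with the highest posterior probability, which is proportional to the class prior multiplied by the product of the per-feature likelihoods.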
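The four steps above can be sketched from scratch with NumPy on the three-class Iris dataset and compared against scikit-learn's `GaussianNB`. This is an illustrative sketch, not the library's implementation; the small variance floor is an assumption meant to mirror `GaussianNB`'s documented `var_smoothing=1e-9` default:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X_iris, y_iris = load_iris(return_X_y=True)  # 3 classes, 4 continuous features
classes = np.unique(y_iris)

# 1. Parameter estimation: per-class mean and variance of every feature
#    (variance floor assumed to mirror GaussianNB's var_smoothing default)
epsilon = 1e-9 * X_iris.var(axis=0).max()
means = np.array([X_iris[y_iris == c].mean(axis=0) for c in classes])
variances = np.array([X_iris[y_iris == c].var(axis=0) + epsilon for c in classes])

# 2. Class priors: share of training instances in each class
priors = np.array([(y_iris == c).mean() for c in classes])

# 3. Log-likelihood of each instance under each class's Gaussians
#    (features treated as independent given the class, so log-densities add up)
def log_likelihood(X_new):
    diff = X_new[:, None, :] - means[None, :, :]  # shape (n_samples, n_classes, n_features)
    log_pdf = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None])
    return log_pdf.sum(axis=2)                     # shape (n_samples, n_classes)

# 4. Decision rule: class with the highest log prior + log-likelihood (maximum posterior)
y_manual = classes[np.argmax(np.log(priors) + log_likelihood(X_iris), axis=1)]

y_sklearn = GaussianNB().fit(X_iris, y_iris).predict(X_iris)
agreement = (y_manual == y_sklearn).mean()
print(f"Classes predicted: {np.unique(y_manual)}, agreement with GaussianNB: {agreement:.2%}")
```

Working in log space avoids underflow when many small densities are multiplied; the argmax is unchanged because the logarithm is monotonic.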

# Answer5

In [1]:
```python
%pip install ucimlrepo
```

Note: you may need to restart the kernel to use updated packages.


In [2]:
```python
from ucimlrepo import fetch_ucirepo
```

In [3]:
```python
# Fetch the Spambase dataset from the UCI ML Repository (id=94)
spambase = fetch_ucirepo(id=94)
```

Split the data into independent variables (the features `X`) and the dependent variable (the target `y`):

In [5]:
```python
# Independent variables (features)
X = spambase.data.features
```

In [6]:
```python
X
```

| | `word_freq_make` | `word_freq_address` | `word_freq_all` | `word_freq_3d` | `word_freq_our` | `word_freq_over` | `word_freq_remove` | `word_freq_internet` | `word_freq_order` | `word_freq_mail` | ... | `word_freq_conference` | `char_freq_;` | `char_freq_(` | `char_freq_[` | `char_freq_!` | `char_freq_$` | `char_freq_#` | `capital_run_length_average` | `capital_run_length_longest` | `capital_run_length_total` |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00 | 0.64 | 0.64 | 0.0 | 0.32 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.0 | 0.000 | 0.000 | 0.0 | 0.778 | 0.000 | 0.000 | 3.756 | 61 | 278 |
| 1 | 0.21 | 0.28 | 0.50 | 0.0 | 0.14 | 0.28 | 0.21 | 0.07 | 0.00 | 0.94 | ... | 0.0 | 0.000 | 0.132 | 0.0 | 0.372 | 0.180 | 0.048 | 5.114 | 101 | 1028 |
| 2 | 0.06 | 0.00 | 0.71 | 0.0 | 1.23 | 0.19 | 0.19 | 0.12 | 0.64 | 0.25 | ... | 0.0 | 0.010 | 0.143 | 0.0 | 0.276 | 0.184 | 0.010 | 9.821 | 485 | 2259 |
| 3 | 0.00 | 0.00 | 0.00 | 0.0 | 0.63 | 0.00 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.000 | 0.137 | 0.0 | 0.137 | 0.000 | 0.000 | 3.537 | 40 | 191 |
| 4 | 0.00 | 0.00 | 0.00 | 0.0 | 0.63 | 0.00 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.000 | 0.135 | 0.0 | 0.135 | 0.000 | 0.000 | 3.537 | 40 | 191 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4596 | 0.31 | 0.00 | 0.62 | 0.0 | 0.00 | 0.31 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.0 | 0.000 | 0.232 | 0.0 | 0.000 | 0.000 | 0.000 | 1.142 | 3 | 88 |
| 4597 | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.0 | 0.000 | 0.000 | 0.0 | 0.353 | 0.000 | 0.000 | 1.555 | 4 | 14 |
| 4598 | 0.30 | 0.00 | 0.30 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.0 | 0.102 | 0.718 | 0.0 | 0.000 | 0.000 | 0.000 | 1.404 | 6 | 118 |
| 4599 | 0.96 | 0.00 | 0.00 | 0.0 | 0.32 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | ... | 0.0 | 0.000 | 0.057 | 0.0 | 0.000 | 0.000 | 0.000 | 1.147 | 5 | 78 |


In [7]:
```python
# Dependent variable (target): 1 = spam, 0 = not spam
y = spambase.data.targets
```

In [10]:
```python
y
```

| | Class |
|---|---|
| 0 | 1 |
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 1 |
| ... | ... |
| 4596 | 0 |
| 4597 | 0 |
| 4598 | 0 |
| 4599 | 0 |


## Bernoulli Naive Bayes

In [14]:
```python
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB, GaussianNB
import warnings

# Suppress warnings (e.g. about passing y as a single-column DataFrame)
warnings.filterwarnings('ignore')
```

In [15]:
```python
# 10-fold cross-validated predictions for Bernoulli Naive Bayes
bernoulli_nb = BernoulliNB()
y_pred_bernoulli = cross_val_predict(bernoulli_nb, X, y, cv=10)
print("Bernoulli Naive Bayes:")
print("Accuracy: {:.2f}%".format(accuracy_score(y, y_pred_bernoulli) * 100))
print("Precision: {:.2f}".format(precision_score(y, y_pred_bernoulli, average='weighted')))
print("Recall: {:.2f}".format(recall_score(y, y_pred_bernoulli, average='weighted')))
print("F1 Score: {:.2f}".format(f1_score(y, y_pred_bernoulli, average='weighted')))
print("\n")
```

Bernoulli Naive Bayes:
Accuracy: 88.39%
Precision: 0.88
Recall: 0.88
F1 Score: 0.88




## Multinomial Naive Bayes

In [16]:
```python
# 10-fold cross-validated predictions for Multinomial Naive Bayes
multinomial_nb = MultinomialNB()
y_pred_multinomial = cross_val_predict(multinomial_nb, X, y, cv=10)
print("Multinomial Naive Bayes:")
print("Accuracy: {:.2f}%".format(accuracy_score(y, y_pred_multinomial) * 100))
print("Precision: {:.2f}".format(precision_score(y, y_pred_multinomial, average='weighted')))
print("Recall: {:.2f}".format(recall_score(y, y_pred_multinomial, average='weighted')))
print("F1 Score: {:.2f}".format(f1_score(y, y_pred_multinomial, average='weighted')))
print("\n")
```

Multinomial Naive Bayes:
Accuracy: 78.64%
Precision: 0.79
Recall: 0.79
F1 Score: 0.79




## Gaussian Naive Bayes

In [17]:
```python
# 10-fold cross-validated predictions for Gaussian Naive Bayes
gaussian_nb = GaussianNB()
y_pred_gaussian = cross_val_predict(gaussian_nb, X, y, cv=10)
print("Gaussian Naive Bayes:")
print("Accuracy: {:.2f}%".format(accuracy_score(y, y_pred_gaussian) * 100))
print("Precision: {:.2f}".format(precision_score(y, y_pred_gaussian, average='weighted')))
print("Recall: {:.2f}".format(recall_score(y, y_pred_gaussian, average='weighted')))
print("F1 Score: {:.2f}".format(f1_score(y, y_pred_gaussian, average='weighted')))
```

Gaussian Naive Bayes:
Accuracy: 82.18%
Precision: 0.86
Recall: 0.82
F1 Score: 0.82


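## Overall Assessment

In this experiment, Bernoulli Naive Bayes performed best (88.39% accuracy, 0.88 weighted F1), ahead of Gaussian (82.18%, 0.82) and Multinomial (78.64%, 0.79). This is not because the Spambase features are binary: they are continuous percentages (`word_freq_*`, `char_freq_*`) and capital-run-length statistics. `BernoulliNB` binarizes its input (by default, any value above 0 becomes 1), so it effectively models whether each word or character appears at all, and that presence/absence signal appears to separate spam well on this dataset.
The choice of the best variant depends on the characteristics of the dataset: Bernoulli suits binary (presence/absence) features, Multinomial suits count features, and Gaussian suits continuous, roughly normally distributed features.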
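`BernoulliNB` thresholds its input at `binarize=0.0` by default, so fitting it on continuous frequencies should be equivalent to fitting it on an explicit presence/absence matrix. A minimal check, assuming a synthetic non-negative matrix with many zeros as a stand-in for Spambase-like features (this is not the real dataset):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
# Synthetic stand-in: non-negative "frequencies", about 60% of them exactly zero
X_freq = rng.exponential(scale=1.0, size=(200, 5)) * (rng.random((200, 5)) < 0.4)
y_freq = rng.integers(0, 2, size=200)

# Default BernoulliNB: values > 0 are mapped to 1 internally (binarize=0.0)
model_default = BernoulliNB().fit(X_freq, y_freq)

# Manual presence/absence encoding with the built-in threshold switched off
X_presence = (X_freq > 0).astype(float)
model_manual = BernoulliNB(binarize=None).fit(X_presence, y_freq)

same_params = np.allclose(model_default.feature_log_prob_, model_manual.feature_log_prob_)
same_preds = np.array_equal(model_default.predict(X_freq), model_manual.predict(X_presence))
print(same_params, same_preds)  # → True True
```

The `binarize` threshold is itself a tunable hyperparameter; raising it would make the model ask "does this word appear often enough" rather than "does it appear at all".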

## Limitations of Naive Bayes

1. **Assumption of Independence:**
   - Naive Bayes assumes that features are conditionally independent given the class. This assumption may not hold in real-world scenarios, leading to suboptimal performance.

2. **Sensitivity to Feature Distribution:**
   - Gaussian Naive Bayes assumes that each feature follows a Gaussian (normal) distribution within each class. If the features do not follow this distribution, it might not perform well.

3. **Zero Probability Issue:**
   - If a feature value is never observed with a class in the training data, its estimated likelihood is zero. Because the per-feature likelihoods are multiplied, the posterior for that class then becomes zero regardless of all other evidence. Additive (Laplace) smoothing avoids this; in scikit-learn it is the `alpha` parameter of `BernoulliNB` and `MultinomialNB` (default `1.0`).

4. **Limited Expressiveness:**
   - Naive Bayes models have a simple structure and may not capture complex relationships in the data compared to more sophisticated models.
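The zero-probability issue and its fix can be illustrated with a few lines of NumPy. This is a sketch with hypothetical word counts; `word_probabilities` is a helper introduced here for illustration and implements the additive-smoothing estimate \( (N_{ci} + \alpha) / (N_c + \alpha n) \) used for multinomial Naive Bayes:

```python
import numpy as np

def word_probabilities(counts, alpha):
    """Hypothetical helper: P(word | class) = (count + alpha) / (total + alpha * vocab_size)."""
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * counts.size)

# Hypothetical training counts of 3 words in spam messages; the third word never appeared in spam
spam_counts = [3, 5, 0]

unsmoothed = word_probabilities(spam_counts, alpha=0.0)  # plain relative frequencies
laplace = word_probabilities(spam_counts, alpha=1.0)     # Laplace (add-one) smoothing

# Likelihood of a new message containing each word once (multinomial-style product)
message = np.array([1, 1, 1])
print("alpha=0:", np.prod(unsmoothed ** message))  # 0.0: the unseen word wipes out all other evidence
print("alpha=1:", np.prod(laplace ** message))     # small but non-zero
```

With `alpha=1.0` every word keeps a non-zero probability, and the smoothed estimates still sum to 1 over the vocabulary.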