Q1. The quantity asked for is the probability that an employee is a smoker given that he/she uses the health insurance plan, \( P(S|H) \). Bayes' theorem relates it to the reverse conditional probability \( P(H|S) \). Let:
- \( P(S) \) be the probability that an employee is a smoker.
- \( P(H) \) be the probability that an employee uses the health insurance plan.
- \( P(S|H) \) be the probability that an employee is a smoker given that he/she uses the health insurance plan.
- \( P(H|S) \) be the probability that an employee uses the health insurance plan given that he/she is a smoker.

We are given:
- \( P(H) = 0.70 \) (probability that an employee uses the health insurance plan)
- \( P(S|H) = 0.40 \) (probability that an employee is a smoker given that he/she uses the health insurance plan)

Using Bayes' theorem:
\[ P(S|H) = \frac{P(H|S) \times P(S)}{P(H)} \]

Rearranging, we get:
\[ P(H|S) = \frac{P(S|H) \times P(H)}{P(S)} \]

Substituting the given values:
\[ P(H|S) = \frac{0.40 \times 0.70}{P(S)} \]

Since \( P(S) \) is not given directly, we can calculate it using the law of total probability:
\[ P(S) = P(S|H) \times P(H) + P(S|\neg H) \times P(\neg H) \]

Taking \( P(S|\neg H) = 0.10 \) (probability that an employee is a smoker given that he/she does not use the health insurance plan; this value is not among the given quantities listed above and is assumed here), and \( P(\neg H) = 1 - P(H) = 1 - 0.70 = 0.30 \), we can calculate \( P(S) \):
\[ P(S) = 0.40 \times 0.70 + 0.10 \times 0.30 \]
\[ P(S) = 0.28 + 0.03 = 0.31 \]

Now, substitute \( P(S) = 0.31 \) into the equation for \( P(H|S) \):
\[ P(H|S) = \frac{0.40 \times 0.70}{0.31} \]
\[ P(H|S) = \frac{0.28}{0.31} \]
\[ P(H|S) \approx 0.903 \]

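Therefore, the probability that an employee uses the health insurance plan given that he/she is a smoker is approximately 0.903 or 90.3%. The probability actually asked for, that an employee is a smoker given that he/she uses the plan, is the given value \( P(S|H) = 0.40 \), or 40%.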
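
The arithmetic above can be checked numerically. The following is a minimal sketch that reuses only the values appearing in the derivation; the variable names are illustrative, and \( P(S|\neg H) = 0.10 \) is the value used in the total-probability step rather than one of the listed given quantities.

```python
# Values taken from the derivation above.
p_h = 0.70              # P(H): employee uses the health insurance plan
p_s_given_h = 0.40      # P(S|H): smoker, given that the employee uses the plan
p_s_given_not_h = 0.10  # P(S|not H): value used in the total-probability step

# Law of total probability: P(S) = P(S|H) P(H) + P(S|not H) P(not H)
p_s = p_s_given_h * p_h + p_s_given_not_h * (1 - p_h)

# Bayes' theorem: P(H|S) = P(S|H) P(H) / P(S)
p_h_given_s = p_s_given_h * p_h / p_s

print(f"P(S)   = {p_s:.2f}")
print(f"P(H|S) = {p_h_given_s:.3f}")
```

Note that the result of the division is \( P(H|S) \), the probability of using the plan given smoking, while \( P(S|H) \) is an input to the calculation.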

Q2. Bernoulli Naive Bayes and Multinomial Naive Bayes are two variants of the Naive Bayes algorithm that differ in the way they handle feature representations and assumptions about the distribution of data.

- Bernoulli Naive Bayes: This variant assumes that features are binary-valued (e.g., presence or absence of a word). For each class it estimates the probability that each feature is present, and an absent feature also contributes to the likelihood through the factor \( 1 - P(x_i = 1 | y) \).
- Multinomial Naive Bayes: This variant is suited to features that are discrete frequency counts (e.g., word counts in text classification). For each class it estimates the probability of each feature occurring, and a feature's count determines how many times that probability enters the likelihood.
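
To make the difference concrete, here is a minimal sketch that scores one document under a single class with both models. The parameter values, the three-word vocabulary, and the function names are hypothetical and chosen only for illustration; a real model would estimate the parameters from training data with smoothing.

```python
import math

# Hypothetical parameters for one class over a 3-word vocabulary (illustrative only).
p_present = [0.6, 0.3, 0.1]  # Bernoulli: P(word i appears in a document | class)
p_token = [0.6, 0.3, 0.1]    # Multinomial: P(a token is word i | class), sums to 1
counts = [2, 0, 1]           # word counts observed in one document


def bernoulli_log_likelihood(counts, p):
    """Use only presence/absence; an absent word contributes log(1 - p_i)."""
    total = 0.0
    for c, p_i in zip(counts, p):
        total += math.log(p_i) if c > 0 else math.log(1.0 - p_i)
    return total


def multinomial_log_likelihood(counts, p):
    """Use the counts; the multinomial coefficient is omitted because it is
    the same for every class and does not affect the comparison."""
    return sum(c * math.log(p_i) for c, p_i in zip(counts, p))


bern = bernoulli_log_likelihood(counts, p_present)
multi = multinomial_log_likelihood(counts, p_token)
print(f"Bernoulli likelihood:   {math.exp(bern):.3f}")
print(f"Multinomial likelihood: {math.exp(multi):.3f}")
```

The Bernoulli score ignores that the first word appears twice but penalizes the missing second word, whereas the multinomial score counts the first word twice and ignores words with a count of zero.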

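Q3. The Bernoulli Naive Bayes model itself defines no special treatment for missing values. A common convention is to encode a missing feature as 0, i.e. as the absence of the feature, in which case it contributes the factor \( 1 - P(x_i = 1 | y) \) to the likelihood like any other absent feature. Alternatives are to leave the missing feature out of the likelihood calculation for every class, or to impute it before training. Note that scikit-learn's `BernoulliNB` does not accept NaN inputs, so one of these steps has to be applied beforehand.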
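
Treating a missing feature as absent and leaving it out of the likelihood are not the same computation. The sketch below contrasts the two under assumed, hypothetical parameters for a single class; `None` is used here as an illustrative marker for a missing value, and the function names are not part of any library.

```python
import math

# Hypothetical P(feature i = 1 | class) for three binary features (illustrative only).
p_present = [0.8, 0.5, 0.2]
x = [1, None, 0]  # None marks a missing value


def log_lik_missing_as_absent(x, p):
    """Encode a missing value as 0, so it contributes log(1 - p_i)."""
    total = 0.0
    for x_i, p_i in zip(x, p):
        value = 0 if x_i is None else x_i
        total += math.log(p_i) if value == 1 else math.log(1.0 - p_i)
    return total


def log_lik_skip_missing(x, p):
    """Leave a missing feature out of the product entirely."""
    total = 0.0
    for x_i, p_i in zip(x, p):
        if x_i is None:
            continue
        total += math.log(p_i) if x_i == 1 else math.log(1.0 - p_i)
    return total


as_absent = math.exp(log_lik_missing_as_absent(x, p_present))
skipped = math.exp(log_lik_skip_missing(x, p_present))
print(f"missing treated as absent: {as_absent:.2f}")
print(f"missing skipped:           {skipped:.2f}")
```

Which convention is appropriate depends on whether a missing entry plausibly means "feature not present" in the data at hand.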

Q4. Yes, Gaussian Naive Bayes can be used for multi-class classification. It handles continuous-valued features by assuming that, within each class, each feature follows a Gaussian (normal) distribution. It handles any number of classes by calculating the posterior probability for each class and selecting the class with the highest probability.
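
The decision rule can be sketched for three classes and a single feature as follows. The class labels, priors, means, and variances are hypothetical values chosen for illustration; with several features, the per-feature log-densities would be summed for each class.

```python
import math

# Hypothetical per-class parameters: (prior, mean, variance) of one feature.
classes = {
    "A": (0.5, 0.0, 1.0),
    "B": (0.3, 3.0, 1.0),
    "C": (0.2, 6.0, 1.0),
}


def gaussian_log_pdf(x, mean, var):
    """Log-density of a normal distribution with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict(x):
    """Return the class with the largest log prior + log likelihood,
    which is also the class with the largest posterior probability."""
    scores = {
        label: math.log(prior) + gaussian_log_pdf(x, mean, var)
        for label, (prior, mean, var) in classes.items()
    }
    return max(scores, key=scores.get)


for value in (0.2, 2.9, 5.5):
    print(value, "->", predict(value))
```

Nothing in the rule depends on the number of classes: adding a class only adds one more entry to the score dictionary.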


In [1]:
pip install ucimlrepo

Collecting ucimlrepo
  Downloading ucimlrepo-0.0.3-py3-none-any.whl (7.0 kB)
Installing collected packages: ucimlrepo
Successfully installed ucimlrepo-0.0.3
Note: you may need to restart the kernel to use updated packages.


In [3]:
from ucimlrepo import fetch_ucirepo 
  
# fetch dataset 
spambase = fetch_ucirepo(id=94) 
  
# data (as pandas dataframes) 
X = spambase.data.features 
y = spambase.data.targets 


In [9]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=0)

In [10]:
from sklearn.naive_bayes import GaussianNB, BernoulliNB, MultinomialNB

In [11]:
# Bernoulli Naive Bayes
bnb = BernoulliNB()
# y_train is a one-column DataFrame; ravel() passes the labels as the 1-D array fit() expects
bnb.fit(x_train, y_train.values.ravel())


In [12]:
y_pred1 = bnb.predict(x_test)

In [13]:
# Multinomial Naive Bayes
mnb = MultinomialNB()
# ravel() passes the labels as the 1-D array fit() expects
mnb.fit(x_train, y_train.values.ravel())


In [14]:
y_pred2 = mnb.predict(x_test)

In [15]:
# Gaussian Naive Bayes
gnb = GaussianNB()
# ravel() passes the labels as the 1-D array fit() expects
gnb.fit(x_train, y_train.values.ravel())


In [16]:
y_pred3 = gnb.predict(x_test)

In [17]:
from sklearn.metrics import accuracy_score,classification_report,precision_score,r2_score

In [19]:
print('For Bernoulli Naive Bayes: ')
print(accuracy_score(y_test,y_pred1))
print(precision_score(y_test,y_pred1))
# r2_score is a regression metric; for 0/1 class labels it is hard to interpret
print(r2_score(y_test,y_pred1))


For Bernoulli Naive Bayes: 
0.8773072747014115
0.8792134831460674
0.4949236607879489
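
The first two numbers printed above are accuracy and precision. As a minimal sketch of what they measure, the functions below recompute both by hand on a small made-up example; the toy labels and the function names are illustrative assumptions and are not taken from the Spambase results. Label 1 is treated as the positive class, which is also the default in `precision_score`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Among samples predicted positive, the fraction that are truly positive."""
    true_labels_of_predicted_pos = [
        t for t, p in zip(y_true, y_pred) if p == positive
    ]
    if not true_labels_of_predicted_pos:
        return 0.0  # no positive predictions were made
    hits = sum(1 for t in true_labels_of_predicted_pos if t == positive)
    return hits / len(true_labels_of_predicted_pos)


# Made-up labels for illustration only.
y_true_toy = [1, 0, 1, 1, 0, 0]
y_pred_toy = [1, 0, 0, 1, 1, 0]

print(accuracy(y_true_toy, y_pred_toy))
print(precision(y_true_toy, y_pred_toy))
```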


In [None]:
print('For Multinomial Naive Bayes: ')
print(accuracy_score(y_test,y_pred2))
print(precision_score(y_test,y_pred2))
print(r2_score(y_test,y_pred2))
