#### Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

In [2]:
# Calculation of probability using provided information
p_use_plan = 0.70  # Probability that an employee uses the health insurance plan
p_smoker_given_use_plan = 0.40  # Probability that an employee is a smoker given that they use the health insurance plan

# Definition of conditional probability: P(Smoker | Use Plan) = P(Smoker and Use Plan) / P(Use Plan)
p_smoker_and_use_plan = p_use_plan * p_smoker_given_use_plan  # P(Smoker and Use Plan)
p_smoker_given_plan = p_smoker_and_use_plan / p_use_plan  # P(Smoker | Use Plan)

# Printing the result
print("Probability that an employee is a smoker given that they use the health insurance plan:", p_smoker_given_plan)


Probability that an employee is a smoker given that they use the health insurance plan: 0.39999999999999997
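
The question states the requested quantity directly ("40% of the employees who use the plan are smokers"), so the calculation only applies the definition of conditional probability. Written out, with "plan" standing for "uses the health insurance plan":

$$
P(\text{smoker} \mid \text{plan}) = \frac{P(\text{smoker} \cap \text{plan})}{P(\text{plan})} = \frac{0.70 \times 0.40}{0.70} = 0.40
$$

The printed value 0.39999999999999997 is the floating-point representation of this result, that is, 0.40 or 40%.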


#### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

Ans--> The main difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in the type of data they are suitable for and the underlying assumptions they make. Here are the key differences:

1. Data Type:
   - Bernoulli Naive Bayes: It is suitable for binary or Boolean features, where each feature represents the presence or absence of a particular attribute. It works with data where features are binary indicators (0 or 1) of the presence or absence of certain attributes.
   - Multinomial Naive Bayes: It is suitable for discrete count data, typically represented by word frequencies or occurrence counts. It works with data where features represent counts or frequencies of events or categories.

2. Feature Representation:
   - Bernoulli Naive Bayes: It assumes that each feature follows a Bernoulli distribution, meaning that features are binary indicators. It considers the presence or absence of a feature in a document.
   - Multinomial Naive Bayes: It assumes that features follow a multinomial distribution, where each feature represents the count or frequency of a specific event or category. It considers the occurrence count or frequency of each feature.

3. Assumptions:
   - Bernoulli Naive Bayes: It assumes that features are conditionally independent given the class. Because every feature is modelled as present or absent, the absence of a feature also contributes to the class probability, so a missing word explicitly counts as evidence.
   - Multinomial Naive Bayes: It makes the same conditional independence assumption, but only the features that actually occur contribute to the class probability, weighted by their counts. A feature with a count of zero is simply ignored.

4. Application:
   - Bernoulli Naive Bayes: It is commonly used in text classification tasks where the presence or absence of certain words or features is considered, such as sentiment analysis or document classification with binary bag-of-words representation.
   - Multinomial Naive Bayes: It is widely used in text classification problems where word frequencies or occurrence counts are important, such as document categorization or spam filtering based on the count of words.

In summary, Bernoulli Naive Bayes is suitable for binary feature data, while Multinomial Naive Bayes is suitable for discrete count data. Bernoulli Naive Bayes considers the presence or absence of features, while Multinomial Naive Bayes considers the occurrence counts or frequencies. The choice between the two depends on the nature of the data and the specific problem at hand.
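
The contrast can be sketched with scikit-learn on a tiny, made-up corpus. The vocabulary, counts, and labels below are hypothetical and chosen only for illustration; the point is that `BernoulliNB` binarizes its input (anything above `binarize=0.0` becomes 1), while `MultinomialNB` uses the counts themselves.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy corpus: rows are documents, columns are word counts
# for the made-up vocabulary ["free", "offer", "meeting"].
X_counts = np.array([
    [3, 2, 0],   # spam
    [4, 0, 0],   # spam
    [0, 0, 2],   # ham
    [0, 1, 3],   # ham
])
y = np.array([1, 1, 0, 0])  # 1 = spam, 0 = ham

# MultinomialNB consumes the raw counts.
mnb = MultinomialNB().fit(X_counts, y)

# BernoulliNB thresholds its input at binarize=0.0 (the default),
# so only the presence or absence of each word is kept.
bnb = BernoulliNB(binarize=0.0).fit(X_counts, y)

# Two documents that differ only in how often "free" occurs.
doc_once = np.array([[1, 0, 0]])
doc_many = np.array([[5, 0, 0]])

bnb_once = bnb.predict_proba(doc_once)
bnb_many = bnb.predict_proba(doc_many)
mnb_once = mnb.predict_proba(doc_once)
mnb_many = mnb.predict_proba(doc_many)

print("BernoulliNB   P(spam):", bnb_once[0, 1], bnb_many[0, 1])
print("MultinomialNB P(spam):", mnb_once[0, 1], mnb_many[0, 1])
```

In this sketch the Bernoulli model assigns both documents the same probabilities, because they contain the same set of words, whereas the multinomial model becomes more confident as the count of "free" grows.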

#### Q3. How does Bernoulli Naive Bayes handle missing values?

Ans--> Bernoulli Naive Bayes does not handle missing values explicitly. It assumes that each feature follows a Bernoulli distribution, where features are binary indicators representing the presence or absence of a particular attribute, so the model itself has no notion of a "missing" value. Implementations such as scikit-learn's `BernoulliNB` reject input that contains NaN, which means missing values have to be dealt with before training and prediction.

When dealing with missing values in Bernoulli Naive Bayes, there are a few common approaches:

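1. Encode missingness explicitly: Because a Bernoulli feature can only take two values, a placeholder such as "NaN" or "unknown" cannot be stored in the feature itself. Instead, fill the missing entry with a default (for example 0) and add a separate binary indicator feature that records whether the original value was missing. This allows missingness to act as its own signal during the training and prediction phases.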

2. Ignore missing values: Alternatively, you can choose to ignore instances with missing values during the training and prediction stages. This means removing instances with missing values from the dataset before applying the Bernoulli Naive Bayes algorithm.

3. Imputation: Another option is to impute the missing values with estimated values. For binary features, mode imputation (replacing missing values with the most frequent value of the feature) keeps the data binary, whereas mean imputation would produce fractional values that must be thresholded again. Model-based imputation (predicting the missing values from other features) is also possible.

The choice of handling missing values in Bernoulli Naive Bayes depends on the specific problem, the amount and nature of missingness in the data, and the available information. It is important to consider the potential impact of missing values on the accuracy and reliability of the classification results.
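
As a minimal sketch of the imputation approach, assuming scikit-learn is used, the pipeline below fills missing binary values with the most frequent value and appends indicator columns for missingness before fitting `BernoulliNB`. The feature matrix and labels are hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Hypothetical binary feature matrix with missing entries (np.nan).
X = np.array([
    [1.0, 0.0, np.nan],
    [1.0, np.nan, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 1.0, np.nan],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
])
y = np.array([1, 1, 0, 0, 1, 0])

# Mode imputation keeps every feature binary. add_indicator=True appends one
# extra 0/1 column for each feature that had missing values during fit.
model = make_pipeline(
    SimpleImputer(strategy="most_frequent", add_indicator=True),
    BernoulliNB(),
)
model.fit(X, y)

# A new sample with a missing value in the second feature.
X_new = np.array([[1.0, np.nan, 0.0]])
X_new_imputed = model[0].transform(X_new)  # 3 original + 2 indicator columns
pred = model.predict(X_new)

print("Imputed features:", X_new_imputed)
print("Predicted class:", pred[0])
```

Whether the indicator columns help depends on whether missingness itself carries information about the class; with `add_indicator=False` the pipeline reduces to plain mode imputation.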

#### Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Ans--> Yes, Gaussian Naive Bayes can be used for multi-class classification. It is a variant of the Naive Bayes algorithm that assumes each feature is continuous and follows a Gaussian distribution within each class. It is commonly used for classification tasks where the features have continuous values.

No adaptation is needed when there are more than two classes. During training, the algorithm estimates a prior for each class and, for every feature, a per-class mean and variance. At prediction time it combines the prior with the Gaussian likelihoods of the input features to obtain a posterior probability for each class and selects the class with the highest posterior.

While Gaussian Naive Bayes can be effective for some types of data and classification problems, it makes strong assumptions about the independence of features, which may not hold in all cases. Additionally, it assumes that the features follow a Gaussian distribution, which may not be the case for all datasets. Therefore, it's important to assess whether these assumptions are valid for your specific dataset before applying Gaussian Naive Bayes for multi-class classification.
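
A short sketch of multi-class use, assuming scikit-learn's `GaussianNB` and its bundled three-class Iris dataset (the dataset choice and split parameters are illustrative, not part of the assignment):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris has three classes and four continuous features.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

gnb = GaussianNB().fit(X_train, y_train)

# One posterior probability per class for every test sample.
proba = gnb.predict_proba(X_test)
# predict() returns the class with the highest posterior.
pred = gnb.predict(X_test)

print("Classes:", gnb.classes_)
print("Per-class feature means, shape:", gnb.theta_.shape)
print("Test accuracy:", gnb.score(X_test, y_test))
```

The fitted model stores one mean vector per class in `theta_`, which reflects the per-class Gaussian estimates described above.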

#### Q5. Assignment:

##### Data preparation:

Download the "Spambase Data Set" from the UCI Machine Learning Repository
(https://archive.ics.uci.edu/ml/datasets/Spambase). The dataset contains numeric features extracted from email
messages, and the goal is to predict whether a message is spam or not based on these input features.

In [8]:
import pandas as pd

In [11]:
spam=pd.read_csv('spambase.csv')

In [12]:
spam.head()

Unnamed: 0,0,0.64,0.64.1,0.1,0.32,0.2,0.3,0.4,0.5,0.6,...,0.41,0.42,0.43,0.778,0.44,0.45,3.756,61,278,1
0,0.21,0.28,0.5,0.0,0.14,0.28,0.21,0.07,0.0,0.94,...,0.0,0.132,0.0,0.372,0.18,0.048,5.114,101,1028,1
1,0.06,0.0,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.01,0.143,0.0,0.276,0.184,0.01,9.821,485,2259,1
2,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.137,0.0,0.137,0.0,0.0,3.537,40,191,1
3,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.135,0.0,0.135,0.0,0.0,3.537,40,191,1
4,0.0,0.0,0.0,0.0,1.85,0.0,0.0,1.85,0.0,0.0,...,0.0,0.223,0.0,0.0,0.0,0.0,3.0,15,54,1
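
The column labels in the output above (`0`, `0.64`, `0.64.1`, ...) look like data values rather than names, which suggests that the file has no header row and that `read_csv` used the first record as the header. A minimal sketch of the usual remedy, assuming the file really is headerless; the inline two-record CSV and the `label` column name are hypothetical stand-ins for the real file:

```python
import io
import pandas as pd

# Hypothetical stand-in for a headerless CSV (two records, last column = label).
raw = "0.0,0.64,1\n0.21,0.28,0\n"

# Default behaviour: the first record is consumed as the header row.
with_default = pd.read_csv(io.StringIO(raw))

# header=None keeps every record as data and numbers the columns 0..n-1.
no_header = pd.read_csv(io.StringIO(raw), header=None)
# Give the last column a descriptive (hypothetical) name.
no_header = no_header.rename(columns={no_header.columns[-1]: "label"})

print("Rows with default header handling:", len(with_default))
print("Rows with header=None:", len(no_header))
print("Columns:", list(no_header.columns))
```

If the real file were read this way, later cells that select the label column by the name `'1'` would need to use the new column name instead.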


In [13]:
spam.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4600 entries, 0 to 4599
Data columns (total 58 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       4600 non-null   float64
 1   0.64    4600 non-null   float64
 2   0.64.1  4600 non-null   float64
 3   0.1     4600 non-null   float64
 4   0.32    4600 non-null   float64
 5   0.2     4600 non-null   float64
 6   0.3     4600 non-null   float64
 7   0.4     4600 non-null   float64
 8   0.5     4600 non-null   float64
 9   0.6     4600 non-null   float64
 10  0.7     4600 non-null   float64
 11  0.64.2  4600 non-null   float64
 12  0.8     4600 non-null   float64
 13  0.9     4600 non-null   float64
 14  0.10    4600 non-null   float64
 15  0.32.1  4600 non-null   float64
 16  0.11    4600 non-null   float64
 17  1.29    4600 non-null   float64
 18  1.93    4600 non-null   float64
 19  0.12    4600 non-null   float64
 20  0.96    4600 non-null   float64
 21  0.13    4600 non-null   float64
 22  

In [14]:
spam.describe()

Unnamed: 0,0,0.64,0.64.1,0.1,0.32,0.2,0.3,0.4,0.5,0.6,...,0.41,0.42,0.43,0.778,0.44,0.45,3.756,61,278,1
count,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,...,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0
mean,0.104576,0.212922,0.280578,0.065439,0.312222,0.095922,0.114233,0.105317,0.090087,0.239465,...,0.038583,0.139061,0.01698,0.26896,0.075827,0.044248,5.191827,52.17087,283.290435,0.393913
std,0.305387,1.2907,0.50417,1.395303,0.672586,0.27385,0.39148,0.401112,0.278643,0.644816,...,0.243497,0.270377,0.109406,0.815726,0.245906,0.429388,31.732891,194.912453,606.413764,0.488669
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.588,6.0,35.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.065,0.0,0.0,0.0,0.0,2.2755,15.0,95.0,0.0
75%,0.0,0.0,0.42,0.0,0.3825,0.0,0.0,0.0,0.0,0.16,...,0.0,0.188,0.0,0.31425,0.052,0.0,3.70525,43.0,265.25,1.0
max,4.54,14.28,5.1,42.81,10.0,5.88,7.27,11.11,5.26,18.18,...,4.385,9.752,4.081,32.478,6.003,19.829,1102.5,9989.0,15841.0,1.0


In [15]:
spam.isnull().sum()

0         0
0.64      0
0.64.1    0
0.1       0
0.32      0
0.2       0
0.3       0
0.4       0
0.5       0
0.6       0
0.7       0
0.64.2    0
0.8       0
0.9       0
0.10      0
0.32.1    0
0.11      0
1.29      0
1.93      0
0.12      0
0.96      0
0.13      0
0.14      0
0.15      0
0.16      0
0.17      0
0.18      0
0.19      0
0.20      0
0.21      0
0.22      0
0.23      0
0.24      0
0.25      0
0.26      0
0.27      0
0.28      0
0.29      0
0.30      0
0.31      0
0.33      0
0.34      0
0.35      0
0.36      0
0.37      0
0.38      0
0.39      0
0.40      0
0.41      0
0.42      0
0.43      0
0.778     0
0.44      0
0.45      0
3.756     0
61        0
278       0
1         0
dtype: int64

In [16]:
spam.isna().sum()

0         0
0.64      0
0.64.1    0
0.1       0
0.32      0
0.2       0
0.3       0
0.4       0
0.5       0
0.6       0
0.7       0
0.64.2    0
0.8       0
0.9       0
0.10      0
0.32.1    0
0.11      0
1.29      0
1.93      0
0.12      0
0.96      0
0.13      0
0.14      0
0.15      0
0.16      0
0.17      0
0.18      0
0.19      0
0.20      0
0.21      0
0.22      0
0.23      0
0.24      0
0.25      0
0.26      0
0.27      0
0.28      0
0.29      0
0.30      0
0.31      0
0.33      0
0.34      0
0.35      0
0.36      0
0.37      0
0.38      0
0.39      0
0.40      0
0.41      0
0.42      0
0.43      0
0.778     0
0.44      0
0.45      0
3.756     0
61        0
278       0
1         0
dtype: int64

In [20]:
x=spam.drop('1',axis=1)
y=spam['1']

##### Implementation:

Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.

In [22]:
from sklearn.model_selection import train_test_split,cross_val_score
from sklearn.naive_bayes import GaussianNB,BernoulliNB, MultinomialNB

In [25]:
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=42)

In [26]:
bernaulliNb=BernoulliNB()

In [27]:
scores=cross_val_score(bernaulliNb,x_train,y_train,cv=10)

In [29]:
accuracy=scores.mean()

In [30]:
accuracy

0.8891304347826086

In [36]:
multinomialNb=MultinomialNB()

In [37]:
scores=cross_val_score(multinomialNb,x_train,y_train,cv=10)

In [38]:
accuracy=scores.mean()

In [39]:
accuracy

0.7953804347826087

In [40]:
gaussianNb=GaussianNB()

In [41]:
scores=cross_val_score(gaussianNb,x_train,y_train,cv=10)

In [42]:
accuracy=scores.mean()

In [43]:
accuracy

0.8220108695652174
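
The three cross-validation runs above follow the same pattern and report accuracy only. As a sketch of how accuracy, precision, recall, and F1 could be collected for all three classifiers in one loop, the example below uses scikit-learn's `cross_validate` with several scorers. The Poisson-generated count data is a synthetic stand-in for the Spambase features (an assumption made so the example is self-contained), so its scores say nothing about the real dataset.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Synthetic stand-in data: non-negative counts whose rates depend on the class.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=300)
rates = np.where(y_demo[:, None] == 1, [2.0, 0.2, 1.0], [0.2, 2.0, 1.0])
X_demo = rng.poisson(rates)

classifiers = {
    "BernoulliNB": BernoulliNB(),
    "MultinomialNB": MultinomialNB(),
    "GaussianNB": GaussianNB(),
}
metrics = ["accuracy", "precision", "recall", "f1"]

results = {}
for name, clf in classifiers.items():
    # 10-fold cross-validation with default hyperparameters.
    cv = cross_validate(clf, X_demo, y_demo, cv=10, scoring=metrics)
    # Average each metric over the 10 folds.
    results[name] = {m: cv[f"test_{m}"].mean() for m in metrics}
    print(name, {m: round(v, 3) for m, v in results[name].items()})
```

Replacing `X_demo` and `y_demo` with the notebook's feature matrix and labels would give the per-classifier metrics on the actual data.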

In [44]:
bernaulliNb.fit(x_train,y_train)

In [32]:
from sklearn.metrics import accuracy_score,precision_score,recall_score,f1_score

In [45]:
# Predict once on the test set and reuse the predictions for every metric
y_pred = bernaulliNb.predict(x_test)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

##### Results:

In [46]:
print("Accuracy:",accuracy_score(y_test,bernaulliNb.predict(x_test)))
print("Precision:",precision)
print("Recall:",recall)
print("F1-Score:",f1)

Accuracy: 0.8728260869565218
Precision: 0.8933717579250721
Recall: 0.7948717948717948
F1-Score: 0.841248303934871


##### Discussion:

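Bernoulli Naive Bayes performed best in terms of mean 10-fold cross-validation accuracy (about 0.889), followed by Gaussian Naive Bayes (about 0.822) and Multinomial Naive Bayes (about 0.795). On the held-out test set, Bernoulli Naive Bayes reached an accuracy of about 0.873, a precision of about 0.893, a recall of about 0.795, and an F1-score of about 0.841.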

##### Conclusion:

The main limitations of Naive Bayes to keep in mind for these results are the assumption that features are independent of each other given the class and, for Gaussian Naive Bayes, the assumption that each feature follows a Gaussian distribution.