# ```Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?```

> # ``` Using Conditional Probability: ```
## To solve this problem, we use the conditional probability formula, which states that:

- ### P(A|B) = P(A ∩ B) / P(B)

- ### where A and B are events, and P(A|B) is the probability of event A given that event B has occurred.

### In this case, we want to find the probability that an employee is a smoker given that he/she uses the health insurance plan, which is:

- ### P(smoker | uses insurance) = P(smoker ∩ uses insurance) / P(uses insurance)

## We know that 70% of employees use the insurance plan, so:

- ### P(uses insurance) = 0.7
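## The question also states that 40% of the employees who use the plan are smokers. That figure is already conditional on using the plan, so it is P(smoker | uses insurance) = 0.4, not the joint probability. The joint probability follows from the multiplication rule:

- ### P(smoker ∩ uses insurance) = P(smoker | uses insurance) * P(uses insurance) = 0.4 * 0.7 = 0.28

### Putting it all together, we get:

- ### P(smoker | uses insurance) = 0.28 / 0.7 = 0.4

### Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4 or 40%. The formula simply recovers the figure stated in the question.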
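The arithmetic can be checked with exact fractions. This minimal sketch takes the question's wording at face value, P(uses insurance) = 0.7 and P(smoker | uses insurance) = 0.4; the variable names are illustrative.

```python
from fractions import Fraction

# Values taken from the question
p_uses = Fraction(7, 10)               # P(uses insurance)
p_smoker_given_uses = Fraction(4, 10)  # 40% of plan users are smokers

# Multiplication rule: P(smoker ∩ uses) = P(smoker | uses) * P(uses)
p_smoker_and_uses = p_smoker_given_uses * p_uses

# Conditional probability formula: P(smoker | uses) = P(smoker ∩ uses) / P(uses)
p_conditional = p_smoker_and_uses / p_uses

print(p_smoker_and_uses, float(p_smoker_and_uses))  # 7/25 0.28
print(p_conditional, float(p_conditional))          # 2/5 0.4
```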

> # ``` Using Bayes' Theorem: ```
## Alternatively, we can use Bayes' theorem, which states that:

- ### P(A|B) = P(B|A) * P(A) / P(B)

- ### where A and B are events, and P(A|B) is the probability of event A given that event B has occurred.

### In this case, we want to find the probability that an employee is a smoker given that he/she uses the health insurance plan, which is:

- ### P(smoker | uses insurance) = P(uses insurance | smoker) * P(smoker) / P(uses insurance)

## We know that 70% of employees use the insurance plan, so:

- ### P(uses insurance) = 0.7
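## We also know that 40% of the employees who use the plan are smokers. This is P(smoker | uses insurance), the quantity we are asked for, not P(uses insurance | smoker). The question does not give P(uses insurance | smoker) or P(smoker) separately, but the numerator of Bayes' theorem only needs their product, which equals the joint probability:

- ### P(uses insurance | smoker) * P(smoker) = P(smoker ∩ uses insurance) = 0.4 * 0.7 = 0.28

## Note that P(smoker) is the overall share of smokers among all employees. It is not 1.0, and it cannot be computed from the question, because the smoking rate among employees who do not use the plan is unknown.

### Putting it all together, we get:

- ### P(smoker | uses insurance) = 0.28 / 0.7 = 0.4

### Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4 or 40%.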
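Because the question does not give the smoking rate among employees who do not use the plan, P(smoker) and P(uses insurance | smoker) are unknown on their own. The sketch below assumes a few hypothetical values for that rate (`p_smoker_given_nonuser` is not from the question), builds P(smoker) with the law of total probability, and then applies Bayes' theorem.

```python
from fractions import Fraction

P_USES = Fraction(7, 10)               # P(uses insurance), from the question
P_SMOKER_GIVEN_USES = Fraction(4, 10)  # P(smoker | uses insurance), from the question


def bayes_posterior(p_smoker_given_nonuser):
    """P(smoker | uses insurance) via Bayes' theorem for an assumed non-user smoking rate."""
    p_joint = P_SMOKER_GIVEN_USES * P_USES                   # P(smoker ∩ uses)
    # Law of total probability: P(smoker) over users and non-users
    p_smoker = p_joint + p_smoker_given_nonuser * (1 - P_USES)
    p_uses_given_smoker = p_joint / p_smoker                 # P(uses | smoker)
    # Bayes' theorem: P(smoker | uses) = P(uses | smoker) * P(smoker) / P(uses)
    return p_uses_given_smoker * p_smoker / P_USES


for rate in (Fraction(1, 10), Fraction(1, 4), Fraction(1, 2)):  # hypothetical rates
    print(rate, bayes_posterior(rate))  # posterior is 2/5 in every case
```

The unknown quantities cancel: P(uses | smoker) * P(smoker) always equals the joint probability 0.28, which is why the answer does not depend on the assumed rate.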

# ```Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?```
### Bernoulli Naive Bayes and Multinomial Naive Bayes are both variants of Naive Bayes classification, but they differ in the type of data they are best suited for.

## ```Bernoulli Naive Bayes``` is used when the input variables are binary, i.e., they take on values of 0 or 1. It is often used for text classification problems where the presence or absence of certain words in a document is used to determine its classification.

## ```Multinomial Naive Bayes``` is used when the input variables represent counts or frequencies of events. It is often used for text classification problems where the number of occurrences of certain words in a document is used to determine its classification.

### In summary, Bernoulli Naive Bayes is used for binary data while Multinomial Naive Bayes is used for count or frequency data.
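To see the difference in practice, this sketch fits both scikit-learn classifiers on a tiny, made-up word-count matrix (three hypothetical words, four documents). `MultinomialNB` uses the counts, while `BernoulliNB` with `binarize=0.0` first turns every count above 0 into 1, so it cannot tell a document that mentions a word once from one that mentions it five times.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Made-up word counts: rows are documents, columns are three hypothetical words
X_counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 2, 3],
    [0, 1, 4],
])
y = np.array([0, 0, 1, 1])

multi = MultinomialNB().fit(X_counts, y)           # models how often each word occurs
bern = BernoulliNB(binarize=0.0).fit(X_counts, y)  # models only whether each word occurs

once, five_times = np.array([[1, 0, 1]]), np.array([[5, 0, 1]])

# BernoulliNB sees the same binary vector [1, 0, 1] for both documents ...
print(np.allclose(bern.predict_proba(once), bern.predict_proba(five_times)))    # True
# ... while MultinomialNB's posterior changes with the count
print(np.allclose(multi.predict_proba(once), multi.predict_proba(five_times)))  # False
```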

# ```Q3. How does Bernoulli Naive Bayes handle missing values?```
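## ```Bernoulli Naive Bayes``` assumes that the input variables are binary, taking on values of 0 or 1. If a data point is missing a value for a particular input variable, a common practical choice is to treat it as 0 for that variable, interpreting the absence of a value as a "no" or "false" answer to the question of whether the feature is present. This is a preprocessing decision rather than something the algorithm does automatically: scikit-learn's `BernoulliNB`, for example, rejects NaN inputs, so missing values must be filled in before fitting. Because Naive Bayes multiplies per-feature likelihoods, another option is to leave the missing feature out of the product for that data point, which amounts to marginalizing it out.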

## However, if a significant number of data points have missing values for a particular input variable, the performance of the Bernoulli Naive Bayes classifier may be affected. In such cases, it may be more appropriate to impute the missing values using a suitable imputation technique or consider using a different classification algorithm that can handle missing data more effectively.
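Both options can be expressed as scikit-learn pipelines. A minimal sketch on a made-up binary dataset, assuming `SimpleImputer` as the imputation step (the data and the choice of strategies are illustrative): the first pipeline applies the "missing means 0" convention, the second fills each column with its most frequent observed value.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Tiny made-up binary dataset with missing entries (np.nan)
X = np.array([
    [1, 0, np.nan],
    [1, np.nan, 0],
    [0, 1, 1],
    [np.nan, 1, 1],
])
y = np.array([0, 0, 1, 1])

# Option 1: treat a missing value as "feature absent" (0)
absent_as_zero = make_pipeline(SimpleImputer(strategy="constant", fill_value=0), BernoulliNB())
# Option 2: fill each column with its most frequent observed value
mode_imputed = make_pipeline(SimpleImputer(strategy="most_frequent"), BernoulliNB())

query = np.array([[1, 0, np.nan]])
for name, model in [("constant 0", absent_as_zero), ("most frequent", mode_imputed)]:
    model.fit(X, y)
    print(name, model.predict(query))
```

Because the imputer is part of the pipeline, the fill values are learned from the training data only and reapplied unchanged at prediction time.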

# ```Q4. Can Gaussian Naive Bayes be used for multi-class classification?```
## Yes, ```Gaussian Naive Bayes``` can be used for multi-class classification. It is a probabilistic algorithm that assumes each input variable is normally distributed within each class, and it calculates the conditional probability of each class given the input variables.

## For ```multi-class classification```, the algorithm estimates a prior probability and a per-feature mean and variance for every class, computes the posterior probability of each class for a new input, and selects the class with the highest posterior as the predicted class. No decomposition into binary problems is needed: Naive Bayes models all classes at once, so it is natively multi-class. It is not a "one-vs-all" method, which would train one separate binary classifier per class.

## It is possible to wrap ```Gaussian Naive Bayes``` in "one-vs-rest" or "one-vs-one" meta-estimators, which train several binary classifiers and combine their outputs, but because the model already handles any number of classes directly, this is rarely necessary.

### Overall, Gaussian Naive Bayes is a versatile algorithm that can be used for both binary and multi-class classification problems, although its assumptions about the input variables may not always hold in practice.
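A short sketch with scikit-learn's bundled three-class iris dataset (chosen here only as a convenient example) shows a single `GaussianNB` model producing one posterior per class, with no one-vs-rest wrapping.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)  # 150 flowers, 4 continuous features, 3 classes
model = GaussianNB().fit(X, y)     # a single model, no one-vs-rest wrapping

print(model.classes_)      # [0 1 2]
print(model.theta_.shape)  # (3, 4): one mean per class and feature

proba = model.predict_proba(X[:5])
print(proba.shape)         # (5, 3): one posterior per class
# The predicted class is the one with the highest posterior probability
print(np.array_equal(model.predict(X[:5]), model.classes_[proba.argmax(axis=1)]))  # True
```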

> # ```Q5.```
## ```Implementation:```

## Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.


## ```Results:```

## Report the following performance metrics for each classifier:
- ### Accuracy
- ### Precision
- ### Recall
- ### F1 score
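The implementation below computes accuracy with `cross_val_score` and the other metrics on a single 231-email test split. A minimal sketch of collecting all four metrics under 10-fold cross-validation with `cross_validate`, assuming the same scalers as the notebook but placed inside a `Pipeline` so each fold's scaler is fitted on that fold's training part only. `evaluate_naive_bayes` is a hypothetical helper, `MinMaxScaler(clip=True)` (scikit-learn 0.24 or later) is an added safeguard that keeps held-out values non-negative for `MultinomialNB`, and the synthetic `make_classification` data only stands in for `spambase.csv`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler


def evaluate_naive_bayes(X, y, n_splits=10, seed=1):
    """Mean k-fold CV accuracy, precision, recall and F1 for the three NB variants."""
    models = {
        # Scalers sit inside the pipeline, so each fold is scaled with its own training part
        "Bernoulli": make_pipeline(StandardScaler(), BernoulliNB()),
        "Gaussian": make_pipeline(StandardScaler(), GaussianNB()),
        # MultinomialNB needs non-negative inputs; clip=True keeps held-out values in [0, 1]
        "Multinomial": make_pipeline(MinMaxScaler(clip=True), MultinomialNB()),
    }
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scoring = ["accuracy", "precision", "recall", "f1"]  # precision/recall/F1 for class 1
    results = {}
    for name, model in models.items():
        cv = cross_validate(model, X, y, cv=kfold, scoring=scoring)
        results[name] = {metric: cv[f"test_{metric}"].mean() for metric in scoring}
    return results


# Stand-in data; on the notebook's data this would be evaluate_naive_bayes(X, y)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
for name, metrics in evaluate_naive_bayes(X_demo, y_demo).items():
    print(name, {m: round(float(v), 3) for m, v in metrics.items()})
```

Passing the unscaled `X` and `y` from `In [57]` lets every metric be reported as a 10-fold mean, as the task asks.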

## ```Discussion:```

### Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is the case? Are there any limitations of Naive Bayes that you observed?

## ```Conclusion:```

## Summarise your findings and provide some suggestions for future work.


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

In [2]:
spams = pd.read_csv('spambase.csv')

In [3]:
spams.head()

,word_freq_make,word_freq_address,word_freq_all,word_freq_3d,word_freq_our,word_freq_over,word_freq_remove,word_freq_internet,word_freq_order,word_freq_mail,...,char_freq_;,char_freq_(,char_freq_[,char_freq_!,char_freq_$,char_freq_#,capital_run_length_average,capital_run_length_longest,capital_run_length_total,spam
0,0.0,0.64,0.64,0.0,0.32,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.778,0.0,0.0,3.756,61,278,1
1,0.21,0.28,0.5,0.0,0.14,0.28,0.21,0.07,0.0,0.94,...,0.0,0.132,0.0,0.372,0.18,0.048,5.114,101,1028,1
2,0.06,0.0,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.01,0.143,0.0,0.276,0.184,0.01,9.821,485,2259,1
3,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.137,0.0,0.137,0.0,0.0,3.537,40,191,1
4,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.135,0.0,0.135,0.0,0.0,3.537,40,191,1


In [115]:
spams.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4601 entries, 0 to 4600
Data columns (total 58 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   word_freq_make              4601 non-null   float64
 1   word_freq_address           4601 non-null   float64
 2   word_freq_all               4601 non-null   float64
 3   word_freq_3d                4601 non-null   float64
 4   word_freq_our               4601 non-null   float64
 5   word_freq_over              4601 non-null   float64
 6   word_freq_remove            4601 non-null   float64
 7   word_freq_internet          4601 non-null   float64
 8   word_freq_order             4601 non-null   float64
 9   word_freq_mail              4601 non-null   float64
 10  word_freq_receive           4601 non-null   float64
 11  word_freq_will              4601 non-null   float64
 12  word_freq_people            4601 non-null   float64
 13  word_freq_report            4601 

In [4]:
spams.describe()

,word_freq_make,word_freq_address,word_freq_all,word_freq_3d,word_freq_our,word_freq_over,word_freq_remove,word_freq_internet,word_freq_order,word_freq_mail,...,char_freq_;,char_freq_(,char_freq_[,char_freq_!,char_freq_$,char_freq_#,capital_run_length_average,capital_run_length_longest,capital_run_length_total,spam
count,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,...,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0,4601.0
mean,0.104553,0.213015,0.280656,0.065425,0.312223,0.095901,0.114208,0.105295,0.090067,0.239413,...,0.038575,0.13903,0.016976,0.269071,0.075811,0.044238,5.191515,52.172789,283.289285,0.394045
std,0.305358,1.290575,0.504143,1.395151,0.672513,0.273824,0.391441,0.401071,0.278616,0.644755,...,0.243471,0.270355,0.109394,0.815672,0.245882,0.429342,31.729449,194.89131,606.347851,0.488698
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.588,6.0,35.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.065,0.0,0.0,0.0,0.0,2.276,15.0,95.0,0.0
75%,0.0,0.0,0.42,0.0,0.38,0.0,0.0,0.0,0.0,0.16,...,0.0,0.188,0.0,0.315,0.052,0.0,3.706,43.0,266.0,1.0
max,4.54,14.28,5.1,42.81,10.0,5.88,7.27,11.11,5.26,18.18,...,4.385,9.752,4.081,32.478,6.003,19.829,1102.5,9989.0,15841.0,1.0


In [5]:
spams.isnull().any().any()

False

In [6]:
spams.shape

(4601, 58)

In [57]:
X = spams.drop('spam', axis=1)
y = spams.spam

> # Train Test Split

In [58]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.05,random_state=2)

> # Standardization

In [59]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit the scaler on the training data only, then apply the same transform to the test data
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

> # Bernoulli Naive Bayes

In [60]:
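from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import KFold,cross_val_score
# BernoulliNB binarizes each feature at binarize=0.0 by default; on standardized data
# this means 1 = "above the training mean" and 0 = "at or below it"
model = BernoulliNB()
kfold = KFold(n_splits=10, shuffle=True, random_state=1)
# Mean accuracy over 10 folds of the training set
scores = cross_val_score(model,X_train,y_train, cv=kfold)
print(f"Accuracy: {scores.mean()*100:.2f} %")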

Accuracy: 90.25 %


> ## Confusion Matrix & Classification Report

In [63]:
model.fit(X_train,y_train)
y_pred = model.predict(X_test)

from sklearn.metrics import confusion_matrix,classification_report
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))

[[116  11]
 [ 11  93]]
              precision    recall  f1-score   support

           0       0.91      0.91      0.91       127
           1       0.89      0.89      0.89       104

    accuracy                           0.90       231
   macro avg       0.90      0.90      0.90       231
weighted avg       0.90      0.90      0.90       231
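The Bernoulli model above is trained on standardized features, and `BernoulliNB` binarizes its input at `binarize=0.0` by default. This small sketch on made-up skewed data (not spambase) checks what that combination encodes: each feature becomes "above the training mean" rather than "word present".

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Made-up skewed, non-negative "frequency" features: mostly zeros, like spambase
X = rng.exponential(scale=1.0, size=(200, 5)) * (rng.random((200, 5)) < 0.3)
y = (X[:, 0] + X[:, 1] > 0.5).astype(int)

scaler = StandardScaler().fit(X)
# What the notebook does: standardize, then let BernoulliNB binarize at 0.0
standardized = BernoulliNB(binarize=0.0).fit(scaler.transform(X), y)
# The same indicator built by hand: 1 if the raw value is above the column mean
above_mean = (X > scaler.mean_).astype(float)
by_hand = BernoulliNB(binarize=None).fit(above_mean, y)

print(np.allclose(standardized.feature_log_prob_, by_hand.feature_log_prob_))  # True
# Presence/absence (raw value > 0), the classic Bernoulli text setting, is a different encoding
print(np.array_equal(above_mean, (X > 0).astype(float)))  # False
```

Fitting `BernoulliNB()` on the unscaled data instead would give the presence/absence encoding; which one works better on spambase is an empirical question.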



> # Gaussian Naive Bayes

In [65]:
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import KFold,cross_val_score
model = GaussianNB()
kfold = KFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(model,X_train,y_train, cv=kfold)
print(f"Accuracy: {scores.mean()*100:.2f} %")

Accuracy: 81.14 %


> ## Confusion Matrix & Classification Report

In [66]:
model.fit(X_train,y_train)
y_pred = model.predict(X_test)

from sklearn.metrics import confusion_matrix,classification_report
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))

[[ 93  34]
 [  2 102]]
              precision    recall  f1-score   support

           0       0.98      0.73      0.84       127
           1       0.75      0.98      0.85       104

    accuracy                           0.84       231
   macro avg       0.86      0.86      0.84       231
weighted avg       0.88      0.84      0.84       231



> # Multinomial Naive Bayes

In [67]:
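# Re-create the unscaled split (same random_state, so the same rows): the standardized
# X_train above contains negative values, which MultinomialNB does not accept
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.05,random_state=2)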

> ## MinMax Scaling for Multinomial Naive Bayes 

In [69]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# Rescale each feature to [0, 1] using the training minimum and maximum
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
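`MultinomialNB` only accepts non-negative features, which is why the standardized data cannot be reused here. A short sketch on made-up Poisson counts (not spambase) shows which inputs it accepts; `fits_multinomial` is a hypothetical helper.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
X = rng.poisson(lam=1.0, size=(100, 4)).astype(float)  # made-up non-negative count features
y = rng.integers(0, 2, size=100)                        # made-up binary labels


def fits_multinomial(X_transformed):
    """Return True if MultinomialNB accepts the data, False if it raises ValueError."""
    try:
        MultinomialNB().fit(X_transformed, y)
        return True
    except ValueError:  # raised for negative feature values
        return False


print(fits_multinomial(X))                                  # True: raw counts are non-negative
print(fits_multinomial(StandardScaler().fit_transform(X)))  # False: centring creates negatives
print(fits_multinomial(MinMaxScaler().fit_transform(X)))    # True: values rescaled into [0, 1]
```

Since the raw spambase frequencies are already non-negative, `MultinomialNB` could also be fitted on them directly; min-max scaling is one choice here, not a requirement.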

In [70]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import KFold,cross_val_score
model = MultinomialNB()
kfold = KFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(model,X_train,y_train, cv=kfold)
print(f"Accuracy: {scores.mean()*100:.2f} %")

Accuracy: 88.47 %


> ## Confusion Matrix & Classification Report

In [71]:
model.fit(X_train,y_train)
y_pred = model.predict(X_test)

from sklearn.metrics import confusion_matrix,classification_report
print(confusion_matrix(y_test,y_pred))
print(classification_report(y_test,y_pred))

[[122   5]
 [ 19  85]]
              precision    recall  f1-score   support

           0       0.87      0.96      0.91       127
           1       0.94      0.82      0.88       104

    accuracy                           0.90       231
   macro avg       0.90      0.89      0.89       231
weighted avg       0.90      0.90      0.90       231



> # Discussion

## We trained Bernoulli, Gaussian and Multinomial Naive Bayes models on the same train/test split (same test size and random state) and evaluated each with 10-fold cross-validation on the training set. The mean cross-validated accuracies are:
* ### Bernoulli Accuracy: 90.25 %
* ### Gaussian Accuracy: 81.14 %
* ### Multinomial Accuracy: 88.47 %

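## Therefore, Bernoulli Naive Bayes was the most accurate of the three. Its advantage is not explained by the target being binary: "Bernoulli" refers to binary input features, and all three variants handle a binary target such as `spam`. A likelier explanation is that `BernoulliNB`, with its default `binarize=0.0` threshold applied to standardized data, reduces each feature to an "above / below the training mean" indicator, which is insensitive to the heavy right skew visible in `describe()` (several columns have a 75th percentile of 0 but a maximum above 10). `GaussianNB`, by contrast, assumes each feature is normally distributed within each class, which these skewed frequencies clearly violate. The held-out test set (231 emails) also shows trade-offs that accuracy hides: GaussianNB catches almost all spam (recall 0.98) but flags many legitimate emails (spam precision 0.75), while MultinomialNB is the reverse (spam precision 0.94, recall 0.82).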


## Yes, there are some limitations of Naive Bayes:

* ### 1. ```Assumption of independence```: Naive Bayes assumes that all features are conditionally independent of each other given the class, which may not be true in all cases. If the features are strongly correlated, Naive Bayes effectively counts the same evidence several times and may not perform well.

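* ### 2. ```Limited expressiveness```: Bernoulli and Multinomial Naive Bayes are linear classifiers in their (binary or count) features, and Gaussian Naive Bayes produces at most quadratic decision boundaries. None of them can capture the more complex decision boundaries that some datasets require.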

* ### 3. ```Sensitivity to irrelevant features```: Naive Bayes has no built-in way to down-weight features: every feature contributes a likelihood term to the product, even if it is not relevant for classification. Many irrelevant or noisy features can therefore reduce performance, so feature selection may help.

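* ### 4. ```Poorly calibrated probabilities```: Naive Bayes does output class probabilities, but because the independence assumption over-counts correlated evidence, they tend to be pushed towards 0 or 1. The predicted class can still be right, so this mainly matters when the probabilities themselves are used, for example as risk scores.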

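* ### 5. ```Limited data```: Naive Bayes estimates every class-conditional probability from counts in the training data, so with limited data some feature values never occur together with some class. Without smoothing, their estimated probability is zero, which makes the whole product zero for any new example containing them (the "zero-frequency" problem). Additive (Laplace) smoothing, exposed as the `alpha` parameter of `BernoulliNB` and `MultinomialNB` in scikit-learn, mitigates this.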
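One concrete way limited data hurts Naive Bayes is the zero-frequency problem. This sketch, on made-up word counts, computes the multinomial word probabilities with and without additive smoothing; `log_word_probs` is a hypothetical helper, and `alpha` mirrors the name of the scikit-learn parameter.

```python
import numpy as np

# Made-up word counts per class (rows: ham, spam; columns: three words)
counts = np.array([[10.0, 5.0, 0.0],
                   [2.0, 8.0, 6.0]])
doc = np.array([1, 1, 1])  # a new email containing each word once


def log_word_probs(counts, alpha):
    """Multinomial NB estimate log((count + alpha) / (class total + alpha * n_words))."""
    smoothed = counts + alpha
    with np.errstate(divide="ignore"):  # log(0) -> -inf when alpha == 0
        return np.log(smoothed / smoothed.sum(axis=1, keepdims=True))


for alpha in (0.0, 1.0):
    log_lik = (log_word_probs(counts, alpha) * doc).sum(axis=1)  # class log-likelihoods, priors omitted
    print(alpha, log_lik)
```

Without smoothing, the single word never seen in ham gives the ham class a log-likelihood of `-inf`, no matter how ham-like the rest of the email is; with `alpha=1.0` both classes keep finite scores.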

> # Summary