## Ques 1:

### Ans: We want the probability that an employee is a smoker given that they use the health insurance plan, i.e. P(Smoker | Uses insurance plan).
### We know that 70% of employees use the company's health insurance plan, so:
### P(Uses insurance plan) = 0.7
### We also know that 40% of the employees who use the plan are smokers. This statement is already restricted to plan users, so it is exactly the conditional probability we are looking for:
### P(Smoker | Uses insurance plan) = 0.4
### We can confirm this with the definition of conditional probability. The fraction of all employees who both smoke and use the plan is:
### P(Smoker and Uses insurance plan) = 0.4 * 0.7 = 0.28
### P(Smoker | Uses insurance plan) = P(Smoker and Uses insurance plan) / P(Uses insurance plan) = 0.28 / 0.7 = 0.4
### Bayes' theorem, P(Smoker | Uses insurance plan) = P(Uses insurance plan | Smoker) * P(Smoker) / P(Uses insurance plan), would only be needed if the survey had reported P(Uses insurance plan | Smoker) instead. No assumption about the overall smoking rate P(Smoker) is required here.
### Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4, or 40%.
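
As a quick numeric check, the two survey percentages (70% of employees use the plan, 40% of plan users smoke) can be applied to a headcount. This is a minimal sketch: the figure of 1000 employees is a hypothetical number chosen only to make the arithmetic concrete.

```python
# Hypothetical headcount; only the two percentages come from the survey.
total_employees = 1000
plan_users = round(0.70 * total_employees)      # 70% use the plan
smoking_plan_users = round(0.40 * plan_users)   # 40% of plan users smoke

# Definition of conditional probability: P(A | B) = P(A and B) / P(B)
p_uses = plan_users / total_employees
p_smoker_and_uses = smoking_plan_users / total_employees
p_smoker_given_uses = p_smoker_and_uses / p_uses

print(round(p_smoker_given_uses, 4))
```

The result does not depend on the chosen headcount, because the total cancels in the ratio.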

## Ques 2:

### Ans: Both Bernoulli Naive Bayes and Multinomial Naive Bayes are variants of the Naive Bayes algorithm used in machine learning for classification problems. The main difference between the two lies in the type of input data they are best suited for.
### Bernoulli Naive Bayes is generally used when the features (input variables) are binary, meaning they can take on only two values such as 0 or 1. It is named after the Bernoulli distribution, which models the probability of a binary event occurring. In Bernoulli Naive Bayes, each feature is assumed to be a binary variable that indicates whether or not a particular word or feature occurs in the text. This approach is commonly used in text classification tasks where the presence or absence of a particular word in a document is used as a feature for classification.
### On the other hand, Multinomial Naive Bayes is used when the features are counts, such as word frequencies. It is named after the multinomial distribution, which models how many times each of several possible outcomes occurs over a fixed number of trials. In Multinomial Naive Bayes, the number of times each word occurs in a document is used as a feature, so a word that appears five times carries more weight than a word that appears once. This approach is commonly used in tasks such as spam filtering, where the frequency of certain words or phrases in an email is used as a feature to determine whether the email is spam or not.
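
The difference in input representation can be sketched with a toy example. The helper names `count_features` and `binary_features`, the vocabulary, and the email text are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def count_features(document, vocabulary):
    """Word counts per vocabulary term (the kind of input Multinomial NB models)."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]

def binary_features(document, vocabulary):
    """1 if the term occurs at least once, else 0 (the kind of input Bernoulli NB models)."""
    return [1 if c > 0 else 0 for c in count_features(document, vocabulary)]

vocabulary = ["free", "money", "meeting"]    # hypothetical vocabulary
email = "free money free money free offer"   # hypothetical email

print(count_features(email, vocabulary))     # [3, 2, 0]
print(binary_features(email, vocabulary))    # [1, 1, 0]
```

The count vector keeps the information that "free" appears three times, while the binary vector only records that it appears at all.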

## Ques 3:

### Ans: Bernoulli Naive Bayes is a classification algorithm that assumes that the input features are binary (i.e., have only two possible values: 0 or 1), and it is commonly used in text classification tasks where the presence or absence of a word is used as a feature. A word that does not appear in a document is not a missing value: it is an observed 0, and the model uses it explicitly by multiplying in the factor 1 - P(word | class) for every absent word.
### A truly missing value is a feature whose value is unknown for some data point (for example, NaN). The algorithm has no built-in rule for this case, and scikit-learn's BernoulliNB raises an error when the input contains NaN. There are two common ways to deal with it. The first is to impute the value before training, for example by filling it with 0 (treating it as not present) or with the most frequent value of that feature. The second is to leave the unknown feature out of the likelihood product for that data point, which is straightforward because Naive Bayes treats the features as conditionally independent given the class.
### When training the model, P(feature = 1 | class) is estimated as the fraction of training documents in that class where the feature is 1, usually with Laplace smoothing. If missing entries are imputed as 0, they count as absences and lower this estimate. If they are skipped, the estimate is based only on the documents where the feature was actually observed, and the estimates for the other features are unaffected.
### In summary, Bernoulli Naive Bayes treats an absent word as a genuine 0, while unknown values have to be handled separately, either by imputing them (often as 0) or by ignoring the affected feature and estimating the probabilities from the available data.
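
The two ways of dealing with an unknown feature value, treating it as 0 or leaving it out of the likelihood, can be compared with a small from-scratch sketch. This is an illustration only, not how any particular library behaves; the function name `bernoulli_log_likelihood`, its `missing` parameter, and the probabilities are hypothetical, and `None` is used here to mark an unknown value.

```python
import math

def bernoulli_log_likelihood(features, p_present, missing="skip"):
    """Log-likelihood of binary features under one class.

    features  : list of 1, 0, or None (None marks an unknown value)
    p_present : P(feature = 1 | class) for each feature
    missing   : "skip" drops unknown features from the sum,
                "zero" treats them as absent (0)
    """
    total = 0.0
    for x, p in zip(features, p_present):
        if x is None:
            if missing == "skip":
                continue
            x = 0
        # Present features contribute log(p), absent ones log(1 - p).
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total

p_present = [0.8, 0.1, 0.5]   # hypothetical per-class probabilities
document = [1, None, 0]       # the second feature is unknown

skip = bernoulli_log_likelihood(document, p_present, missing="skip")
zero = bernoulli_log_likelihood(document, p_present, missing="zero")
print(skip, zero)
```

Treating the unknown value as 0 adds the term log(1 - 0.1) to the score, so the two strategies can rank classes differently when the feature is informative.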

## Ques 4:

### Ans: Yes, Gaussian Naive Bayes can be used for multi-class classification problems. Gaussian Naive Bayes is a variant of the Naive Bayes algorithm that is used when the features (input variables) are continuous and are assumed to follow a Gaussian distribution within each class. In multi-class classification problems, there are more than two classes or categories that each data point can be assigned to.
### No extension such as a "one-vs-all" strategy is needed, because Naive Bayes handles any number of classes natively. For each class, the model estimates a prior probability and the mean and variance of every feature from the training points of that class. To classify a new data point, it multiplies the class prior by the Gaussian density of each feature value, does this for every class, and assigns the class with the largest result. The procedure is the same whether there are two classes or twenty.
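
A Gaussian Naive Bayes classifier scores every class and picks the highest score, which works unchanged for more than two classes. The following is a minimal from-scratch sketch under that description; the function names `fit_gaussian_nb` and `predict`, the small variance floor, and the one-feature data set are assumptions made for illustration.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the prior and per-feature mean and variance for every class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict(model, row):
    """Return the class with the largest log posterior (up to a constant)."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Hypothetical one-feature data with three classes.
X = [[1.0], [1.2], [0.8], [5.0], [5.3], [4.7], [9.0], [9.4], [8.6]]
y = ["low", "low", "low", "mid", "mid", "mid", "high", "high", "high"]

model = fit_gaussian_nb(X, y)
print(predict(model, [1.1]), predict(model, [5.1]), predict(model, [9.2]))
# low mid high
```

The only place the number of classes appears is the `max` over the keys of `model`, so adding a fourth class needs no change to the code.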

## Ques 5:

### Ans:

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data'
data = pd.read_csv(url, header=None)

In [3]:
data.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,48,49,50,51,52,53,54,55,56,57
0,0.0,0.64,0.64,0.0,0.32,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.778,0.0,0.0,3.756,61,278,1
1,0.21,0.28,0.5,0.0,0.14,0.28,0.21,0.07,0.0,0.94,...,0.0,0.132,0.0,0.372,0.18,0.048,5.114,101,1028,1
2,0.06,0.0,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.01,0.143,0.0,0.276,0.184,0.01,9.821,485,2259,1
3,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.137,0.0,0.137,0.0,0.0,3.537,40,191,1
4,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.135,0.0,0.135,0.0,0.0,3.537,40,191,1


In [6]:
# Columns 0-56 are the features; the last column is the label (1 = spam, 0 = not spam)
X = data.iloc[:,:-1]
y = data.iloc[:,-1]

In [9]:
from sklearn.model_selection import train_test_split

In [10]:
# Hold-out split (the 10-fold cross-validation below is run on the full X and y)
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=0)

### Bernoulli Naive Bayes

In [13]:
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import cross_val_score

In [12]:
bernoulli_nb = BernoulliNB()

In [20]:
print('Accuracy: ' ,cross_val_score(bernoulli_nb, X, y, cv=10, scoring='accuracy').mean())
print('Precision: ',cross_val_score(bernoulli_nb, X, y, cv=10, scoring='precision').mean())
print('Recall: ' ,cross_val_score(bernoulli_nb, X, y, cv=10, scoring='recall').mean())
print('f1_score: ', cross_val_score(bernoulli_nb, X, y, cv=10, scoring='f1').mean())

Accuracy:  0.8839380364047911
Precision:  0.8869617393737383
Recall:  0.8152389047416673
f1_score:  0.8481249015095276
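
Each `cross_val_score` call above refits the model on all ten folds, so four metrics cost four full cross-validation runs. scikit-learn's `cross_validate` accepts a list of scorers and evaluates them in a single pass. The sketch below uses synthetic data from `make_classification` as a stand-in for the Spambase features, so the names `X_demo` and `y_demo` and the resulting scores are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB

# Synthetic stand-in for the Spambase features and labels (an assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

metrics = ["accuracy", "precision", "recall", "f1"]
scores = cross_validate(BernoulliNB(), X_demo, y_demo, cv=10, scoring=metrics)

# Results are stored under the keys "test_<metric>", one value per fold.
for metric in metrics:
    print(metric, scores[f"test_{metric}"].mean())
```

The same call works for `MultinomialNB` and `GaussianNB`, provided the features satisfy each model's input requirements (for example, `MultinomialNB` needs non-negative values).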


### Multinomial Naive Bayes

In [21]:
from sklearn.naive_bayes import MultinomialNB

In [22]:
multinomial_nb = MultinomialNB()

In [23]:
print('Accuracy: ' ,cross_val_score(multinomial_nb, X, y, cv=10, scoring='accuracy').mean())
print('Precision: ',cross_val_score(multinomial_nb, X, y, cv=10, scoring='precision').mean())
print('Recall: ' ,cross_val_score(multinomial_nb, X, y, cv=10, scoring='recall').mean())
print('f1_score: ', cross_val_score(multinomial_nb, X, y, cv=10, scoring='f1').mean())

Accuracy:  0.7863496180326323
Precision:  0.7393175533565436
Recall:  0.7214983911116508
f1_score:  0.7282909724016348


### Gaussian Naive Bayes

In [24]:
from sklearn.naive_bayes import GaussianNB

In [25]:
gaussian_nb = GaussianNB()

In [26]:
print('Accuracy: ' ,cross_val_score(gaussian_nb, X, y, cv=10, scoring='accuracy').mean())
print('Precision: ',cross_val_score(gaussian_nb, X, y, cv=10, scoring='precision').mean())
print('Recall: ' ,cross_val_score(gaussian_nb, X, y, cv=10, scoring='recall').mean())
print('f1_score: ', cross_val_score(gaussian_nb, X, y, cv=10, scoring='f1').mean())

Accuracy:  0.8217730830896915
Precision:  0.7103733928118492
Recall:  0.9569516119239877
f1_score:  0.8130660909542995


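Based on the results, Bernoulli Naive Bayes performed the best overall, with the highest accuracy (0.884), precision (0.887) and F1 score (0.848). Gaussian Naive Bayes came second on accuracy (0.822) and F1 score (0.813); it had by far the highest recall (0.957) but the lowest precision (0.710), which means it catches almost all spam at the cost of flagging many legitimate emails. Multinomial Naive Bayes had the lowest accuracy (0.786), recall (0.721) and F1 score (0.728). One likely reason why Bernoulli Naive Bayes performed the best is that BernoulliNB binarizes each feature at 0 by default, so it effectively uses whether a word or character appears in an email at all, which seems to be a strong signal for spam. Gaussian Naive Bayes assumes that each feature is normally distributed within a class, which is probably a poor fit for the Spambase features, since they are non-negative and many of them are zero for most emails. Multinomial Naive Bayes expects count-like features, while most Spambase features are percentages and the last three are run-length statistics on a much larger scale, which may explain its weaker results.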
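
The comparison can be made explicit by ranking the three models on each metric. This is a small sketch that uses the cross-validation scores printed above, rounded to four decimals; the dictionary names `results` and `best_by_metric` are introduced here for illustration.

```python
# Mean 10-fold cross-validation scores from the cells above (rounded).
results = {
    "BernoulliNB":   {"accuracy": 0.8839, "precision": 0.8870, "recall": 0.8152, "f1": 0.8481},
    "MultinomialNB": {"accuracy": 0.7863, "precision": 0.7393, "recall": 0.7215, "f1": 0.7283},
    "GaussianNB":    {"accuracy": 0.8218, "precision": 0.7104, "recall": 0.9570, "f1": 0.8131},
}

# For every metric, find the model with the highest score.
best_by_metric = {
    metric: max(results, key=lambda name: results[name][metric])
    for metric in ["accuracy", "precision", "recall", "f1"]
}

for metric, model_name in best_by_metric.items():
    print(f"{metric}: {model_name}")
```

Bernoulli Naive Bayes comes out on top for accuracy, precision and F1 score, and Gaussian Naive Bayes for recall.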