# Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

## We are given: P(using health insurance plan) = 0.7, P(smoker | using health insurance plan) = 0.4
## We want to find: P(smoker | using health insurance plan)
## This is exactly the conditional probability stated in the question ("40% of the employees who use the plan are smokers"), so it can be read off directly: P(smoker | using health insurance plan) = 0.4
## Bayes' theorem, P(smoker | using health insurance plan) = P(using health insurance plan | smoker) * P(smoker) / P(using health insurance plan), is not needed here. It also cannot be evaluated from the given data alone, because neither P(smoker) nor P(smoker | not using health insurance plan) is provided.
## As a consistency check, the definition of conditional probability gives the joint probability: P(using health insurance plan and smoker) = P(smoker | using health insurance plan) * P(using health insurance plan) = 0.4 * 0.7 = 0.28
## Dividing the joint probability by P(using health insurance plan) recovers the same value: P(smoker | using health insurance plan) = 0.28 / 0.7 = 0.4
## Therefore, the probability that an employee is a smoker given that he/she uses the health insurance plan is 0.4, or 40%.
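
## As a quick numerical check of the quantities given in the question, the sketch below computes the joint probability from the definition of conditional probability and then divides it back out. The variable names are illustrative only; exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Values stated in the question
p_plan = Fraction(7, 10)               # P(uses plan)
p_smoker_given_plan = Fraction(4, 10)  # P(smoker | uses plan)

# Definition of conditional probability: P(A and B) = P(A | B) * P(B)
p_plan_and_smoker = p_smoker_given_plan * p_plan

# Dividing the joint probability by P(uses plan) recovers the conditional
recovered = p_plan_and_smoker / p_plan

print(float(p_plan_and_smoker))  # 0.28
print(float(recovered))          # 0.4
```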

# Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

## Bernoulli Naive Bayes and Multinomial Naive Bayes are two types of Naive Bayes classifiers that are commonly used for text classification tasks. The main difference between the two lies in the way they represent the data.
## Bernoulli Naive Bayes assumes that each feature (or word) is binary, meaning that it is either present or absent in the document. For example, in a spam classification task, the features could be the presence or absence of certain words in the email. If a particular word is present in the email, its corresponding feature value would be 1, and if it is absent, the value would be 0. Bernoulli Naive Bayes estimates the probability that each feature is present given the class and uses these probabilities to classify new instances. Absent features also contribute to the likelihood, through the probability that the feature is absent given the class.
## Multinomial Naive Bayes, on the other hand, assumes that each feature represents a count of the number of times it occurs in the document. For example, in a sentiment analysis task, the features could be the frequency of certain words in a given review. If a particular word occurs twice in the review, its corresponding feature value would be 2. Multinomial Naive Bayes calculates the likelihood of each feature given the class, taking into account the frequency of each feature, and uses these probabilities to classify new instances.
## In summary, the main difference between Bernoulli Naive Bayes and Multinomial Naive Bayes lies in the way they represent the data - binary for Bernoulli and count-based for Multinomial. The choice between the two depends on the specific problem at hand and the characteristics of the data. If the data is binary in nature and the focus is on presence/absence of features, Bernoulli Naive Bayes may be more appropriate. If the data represents frequency counts, Multinomial Naive Bayes may be a better choice.
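
## The difference can be sketched with two small likelihood functions written from scratch. This is a minimal illustration, not scikit-learn's implementation: the function names and the parameter values (`p_present`, `theta`) are hypothetical, and the multinomial version omits the multinomial coefficient because it does not depend on the class.

```python
import math

def bernoulli_log_likelihood(counts, p_present):
    """log P(doc | class) when only presence/absence of each word matters."""
    total = 0.0
    for count, p in zip(counts, p_present):
        # A present word contributes log(p); an absent word contributes log(1 - p)
        total += math.log(p) if count > 0 else math.log(1.0 - p)
    return total

def multinomial_log_likelihood(counts, theta):
    """log P(doc | class) up to a class-independent constant; counts act as weights."""
    return sum(count * math.log(t) for count, t in zip(counts, theta))

# Hypothetical per-class parameters for a three-word vocabulary
p_present = [0.8, 0.3, 0.1]  # P(word appears at least once | class)
theta = [0.6, 0.3, 0.1]      # P(word | class), sums to 1

doc = [2, 0, 1]          # word counts in one document
doc_doubled = [4, 0, 2]  # same words, each occurring twice as often

# Bernoulli ignores how often a word occurs, so both documents score the same
print(bernoulli_log_likelihood(doc, p_present),
      bernoulli_log_likelihood(doc_doubled, p_present))

# Multinomial weights each word by its count, so the score scales with the counts
print(multinomial_log_likelihood(doc, theta),
      multinomial_log_likelihood(doc_doubled, theta))
```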

# Q3. How does Bernoulli Naive Bayes handle missing values?

## Bernoulli Naive Bayes assumes that each feature is binary, meaning that it is either present or absent in the document. The model itself has no built-in mechanism for missing values; in scikit-learn, BernoulliNB does not accept NaN inputs, so missing values must be handled before fitting. A common shortcut is to treat a missing feature (i.e., one whose value is unknown) as if it were absent. This amounts to imputing the value 0, and it is only harmless when missingness is unrelated to the true value of the feature and to the class.
## In practice, if missing values are rare, treating them as absent usually has little impact on classification accuracy. However, if missing values occur frequently or are systematically related to the class variable, encoding them as 0 biases the estimated feature probabilities and the model's accuracy may be compromised.
## In those cases, more careful techniques can be used. Because Naive Bayes multiplies independent per-feature likelihoods, a missing feature can simply be left out of the product for that instance. Alternatively, imputation methods can estimate missing values from other features in the dataset, or a model whose implementation supports missing values directly (for example, some tree-based implementations) can be used instead.
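
## One option specific to Naive Bayes is to skip a missing feature in the likelihood product rather than encode it as absent. The sketch below is a hand-written illustration under assumed parameters: the class names, priors, and per-feature probabilities are hypothetical, and missing values are represented by `None`. It shows that the two treatments can lead to different predictions.

```python
import math

def bernoulli_nb_log_posterior(x, log_prior, p_present):
    """Unnormalized log posterior for one class; features equal to None are skipped."""
    total = log_prior
    for value, p in zip(x, p_present):
        if value is None:
            continue  # unknown value: leave this feature out of the product
        total += math.log(p if value == 1 else 1.0 - p)
    return total

# Hypothetical model: class -> (log prior, P(feature present | class))
params = {
    "spam": (math.log(0.5), [0.6, 0.9, 0.9]),
    "ham": (math.log(0.5), [0.5, 0.1, 0.1]),
}

def predict(x):
    scores = {label: bernoulli_nb_log_posterior(x, log_prior, p)
              for label, (log_prior, p) in params.items()}
    return max(scores, key=scores.get)

x_skipped = [1, None, None]  # last two features unknown and left out
x_as_absent = [1, 0, 0]      # same instance with unknowns encoded as absent

print(predict(x_skipped))    # spam
print(predict(x_as_absent))  # ham
```

## Encoding the unknown features as 0 counts their absence as evidence for "ham", which is why the prediction flips in this example.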

# Q4. Can Gaussian Naive Bayes be used for multi-class classification?

## Yes. Gaussian Naive Bayes handles multi-class classification natively. It is a probabilistic classifier for continuous-valued input features that assumes each feature, conditioned on the class label, follows a Gaussian distribution and that the features are conditionally independent given the class. For every class it estimates a prior and a per-feature mean and variance, computes the posterior probability of each class given the input features using Bayes' theorem, and chooses the class with the highest posterior as the predicted class. Because a posterior is computed for every class at once, this works for any number of classes. A "one-vs-all" ("one-vs-rest") scheme, in which a separate binary classifier is trained per class and the class with the highest score wins, is possible but not required.
## Scikit-learn provides an implementation of Gaussian Naive Bayes in its GaussianNB class, which supports multi-class classification. You can use the fit method to train the model on the training data and the predict method to make predictions on new data. Here is an example code snippet in Python:

In [1]:
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the iris dataset and split it into training and test sets
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Train a Gaussian Naive Bayes classifier on the training set
clf = GaussianNB()
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Calculate the accuracy of the model on the test set
score = clf.score(X_test, y_test)
print("Accuracy on test set: {:.2f}".format(score))


Accuracy on test set: 1.00


## In this example, we load the iris dataset and split it into training and test sets using train_test_split. We then train a Gaussian Naive Bayes classifier on the training set using GaussianNB and make predictions on the test set using the predict method. Finally, we calculate the accuracy of the model on the test set using the score method. Because the iris dataset has three classes, this is already a multi-class problem, and no "one-vs-all" wrapper is needed.
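
## To see the multi-class behaviour directly, the fitted model can be asked for its class labels and per-class posterior probabilities. This is a small sketch that fits on the full iris data purely for illustration (no train/test split), using the `classes_` attribute and the `predict_proba` method of GaussianNB.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
clf = GaussianNB().fit(iris.data, iris.target)

# One label per class seen during fitting
print(clf.classes_)

# Posterior probabilities for the first five samples: one column per class
proba = clf.predict_proba(iris.data[:5])
print(proba.shape)

# predict returns the class whose posterior is largest in each row
print(clf.predict(iris.data[:5]))
```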

# Q5. Assignment (Implementation): Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the dataset. You should use the default hyperparameters for each classifier.

In [3]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

In [4]:
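# Note: spambase.data has no header row, so read_csv (called without header=None)
# uses the first record as column names. That is why the columns below carry
# numeric labels and the first record is not counted as a data row.
data = pd.read_csv('spambase.data')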
data.head(2)

Unnamed: 0,0,0.64,0.64.1,0.1,0.32,0.2,0.3,0.4,0.5,0.6,...,0.41,0.42,0.43,0.778,0.44,0.45,3.756,61,278,1
0,0.21,0.28,0.5,0.0,0.14,0.28,0.21,0.07,0.0,0.94,...,0.0,0.132,0.0,0.372,0.18,0.048,5.114,101,1028,1
1,0.06,0.0,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.01,0.143,0.0,0.276,0.184,0.01,9.821,485,2259,1


In [5]:
data.isna().sum()

0         0
0.64      0
0.64.1    0
0.1       0
0.32      0
0.2       0
0.3       0
0.4       0
0.5       0
0.6       0
0.7       0
0.64.2    0
0.8       0
0.9       0
0.10      0
0.32.1    0
0.11      0
1.29      0
1.93      0
0.12      0
0.96      0
0.13      0
0.14      0
0.15      0
0.16      0
0.17      0
0.18      0
0.19      0
0.20      0
0.21      0
0.22      0
0.23      0
0.24      0
0.25      0
0.26      0
0.27      0
0.28      0
0.29      0
0.30      0
0.31      0
0.33      0
0.34      0
0.35      0
0.36      0
0.37      0
0.38      0
0.39      0
0.40      0
0.41      0
0.42      0
0.43      0
0.778     0
0.44      0
0.45      0
3.756     0
61        0
278       0
1         0
dtype: int64

In [6]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4600 entries, 0 to 4599
Data columns (total 58 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   0       4600 non-null   float64
 1   0.64    4600 non-null   float64
 2   0.64.1  4600 non-null   float64
 3   0.1     4600 non-null   float64
 4   0.32    4600 non-null   float64
 5   0.2     4600 non-null   float64
 6   0.3     4600 non-null   float64
 7   0.4     4600 non-null   float64
 8   0.5     4600 non-null   float64
 9   0.6     4600 non-null   float64
 10  0.7     4600 non-null   float64
 11  0.64.2  4600 non-null   float64
 12  0.8     4600 non-null   float64
 13  0.9     4600 non-null   float64
 14  0.10    4600 non-null   float64
 15  0.32.1  4600 non-null   float64
 16  0.11    4600 non-null   float64
 17  1.29    4600 non-null   float64
 18  1.93    4600 non-null   float64
 19  0.12    4600 non-null   float64
 20  0.96    4600 non-null   float64
 21  0.13    4600 non-null   float64
 22  

In [7]:
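# Features are all columns except the last; the last column holds the spam label
# (it is named '1' because the first record was read as the header)
X = data.iloc[:, :-1]
y = data['1']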

In [20]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y , test_size= 0.3, random_state=42)
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

In [21]:
# automate the model training process
models= {
    'Naive Bayes(M)' : MultinomialNB(),
    'Naive Bayes(G)': GaussianNB(),
    'Naive Bayes(B)' : BernoulliNB()
}

In [22]:
def evaluate_model(X_train, y_train, X_test, y_test, models):
    """Fit each model on the training split and return its test accuracy by name."""
    report = {}
    for name, model in models.items():
        # Train the model
        model.fit(X_train, y_train)

        # Predict on the held-out test data
        y_test_pred = model.predict(X_test)

        # Record the test accuracy under the model's name
        report[name] = accuracy_score(y_test, y_test_pred)

    return report

In [23]:
evaluate_model(X_train,y_train, X_test,y_test,models)

{'Naive Bayes(M)': 0.7717391304347826,
 'Naive Bayes(G)': 0.8181159420289855,
 'Naive Bayes(B)': 0.8717391304347826}
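
## The assignment asks for 10-fold cross-validation, whereas the cells above use a single 70/30 hold-out split. A minimal sketch of the cross-validated evaluation with `cross_val_score` is shown below. To keep it self-contained it runs on a synthetic stand-in dataset (`X_demo`, `y_demo`, both assumptions made up for this example, with non-negative features because MultinomialNB requires them); the scores it prints say nothing about the spam data.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Synthetic stand-in data (assumption): 200 samples, 10 count-like features
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=200)
X_demo = rng.poisson(lam=1.0 + 2.0 * y_demo[:, None], size=(200, 10)).astype(float)

# Default hyperparameters, as the assignment requires
demo_models = {
    'Naive Bayes(M)': MultinomialNB(),
    'Naive Bayes(G)': GaussianNB(),
    'Naive Bayes(B)': BernoulliNB(),
}

cv_report = {}
for name, model in demo_models.items():
    # cv=10 gives 10 folds (stratified by class for classifiers)
    scores = cross_val_score(model, X_demo, y_demo, cv=10, scoring='accuracy')
    cv_report[name] = (scores.mean(), scores.std())
    print("{}: mean accuracy {:.3f} (std {:.3f})".format(name, scores.mean(), scores.std()))
```

## With the notebook's data, passing X and y in place of X_demo and y_demo evaluates each classifier on all ten folds instead of one fixed test split.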

In [25]:
from sklearn.svm import SVC
svc=SVC(kernel='linear')
svc.fit(X_train,y_train)
y_pred=svc.predict(X_test)
print(classification_report(y_test,y_pred))
print(confusion_matrix(y_test,y_pred))
print(accuracy_score(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.92      0.94      0.93       803
           1       0.91      0.88      0.90       577

    accuracy                           0.91      1380
   macro avg       0.91      0.91      0.91      1380
weighted avg       0.91      0.91      0.91      1380

[[753  50]
 [ 68 509]]
0.9144927536231884


In [27]:
con_mat = confusion_matrix(y_test,y_pred)
# scikit-learn lays out a binary confusion matrix as [[TN, FP], [FN, TP]]
# (rows are true labels, columns are predicted labels; class 1 = spam is the positive class)
TN, FP, FN, TP = con_mat.ravel()

In [30]:
accuracy = (TP+TN)/(TP+TN+FP+FN)
print(accuracy)
precision = TP/(TP+FP)
print(precision)
recall = TP/(TP + FN)
print(recall)
F1_score= 2*precision*recall/(precision+recall)
print(F1_score)