# Naive Bayes Assignment 2

Q 1 ANS:-

To solve this problem, we can use Bayes' theorem. Let's define the events as follows:

A: Employee uses the health insurance plan.
S: Employee is a smoker.

We are given the following probabilities:

P(A) = 0.7 (Probability that an employee uses the health insurance plan)
P(S|A) = 0.4 (Probability that an employee is a smoker given that they use the health insurance plan)

We want to find P(S|A), the probability that an employee is a smoker given that they use the health insurance plan.

This conditional probability is stated directly in the problem: 40% of the employees who use the plan are smokers, so

P(S|A) = 0.4

Bayes' theorem,

P(S|A) = (P(A|S) * P(S)) / P(A)

would be needed only if the problem had supplied P(A|S) and P(S) instead. Neither is given, and neither is required here.

P(A) = 0.7 is not needed for the answer either, but together with P(S|A) it gives the joint probability that an employee both uses the plan and is a smoker:

P(S and A) = P(S|A) * P(A) = 0.4 * 0.7 = 0.28
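
A quick numeric check using only the two given values is sketched below. The variable names are illustrative; the requested P(S|A) is read off directly, and the product rule gives the joint probability.

```python
# Given values from the problem statement
p_a = 0.7          # P(A): employee uses the health insurance plan
p_s_given_a = 0.4  # P(S|A): smoker, given that they use the plan

# The requested quantity is the given conditional probability itself
answer = p_s_given_a

# Joint probability via the product rule: P(S and A) = P(S|A) * P(A)
p_s_and_a = p_s_given_a * p_a

print(f"P(S|A)     = {answer:.2f}")
print(f"P(S and A) = {p_s_and_a:.2f}")
```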

Q 2 ANS:-

Bernoulli Naive Bayes and Multinomial Naive Bayes are two variants of the Naive Bayes algorithm commonly used for text classification tasks. Here are the key differences between the two:

1. Feature Representation:
   - Bernoulli Naive Bayes: It assumes that features are binary or Boolean in nature, representing the presence or absence of a feature. It works with binary feature indicators.
   - Multinomial Naive Bayes: It assumes that features are discrete or count-based, typically representing the frequency or occurrence counts of features. It works with integer feature counts.

2. Feature Probability Estimation:
   - Bernoulli Naive Bayes: It estimates the probabilities of feature occurrences in each class using binary indicators (0 or 1). It calculates the probability of a feature being present or absent in each class.
   - Multinomial Naive Bayes: It estimates, for each class, the probability of each feature as its share of the total feature counts in that class (with smoothing). The likelihood of a document is the product of these probabilities, each raised to the power of the feature's count in the document.

3. Handling of Absent Features:
   - Bernoulli Naive Bayes: It treats the absence of a feature as evidence. Each absent feature contributes a factor of (1 - p) to the likelihood, where p is the estimated probability that the feature is present in the class.
   - Multinomial Naive Bayes: It ignores absent features. A feature with a count of 0 contributes a factor of p^0 = 1 to the likelihood, so only features that actually occur affect the prediction.

4. Application Suitability:
   - Bernoulli Naive Bayes: It is suitable for text classification tasks where the presence or absence of specific words or features is important, such as spam detection based on the presence or absence of certain keywords in an email.
   - Multinomial Naive Bayes: It is suitable for text classification tasks where the frequency or count of features is essential, such as document categorization based on how often each word occurs in the document.

Overall, the choice between Bernoulli Naive Bayes and Multinomial Naive Bayes depends on the nature of the data and the representation of features in the specific classification problem. If features are binary (presence/absence) and the absence of features carries valuable information, Bernoulli Naive Bayes is preferred. If features are discrete/count-based (frequencies), and the absence of features is less informative, Multinomial Naive Bayes is typically used.
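
The difference in how the two variants score a document can be sketched in plain Python. The probabilities and counts below are made up for illustration, and the function names are hypothetical. The Bernoulli score depends only on which words are present, while the multinomial score depends on how often they occur.

```python
import math

# Hypothetical per-class parameters for a 3-word vocabulary.
# Bernoulli: probability that each word appears at least once in a document.
p_present = [0.8, 0.1, 0.5]
# Multinomial: probability of each word per token; these sum to 1.
p_token = [0.6, 0.1, 0.3]

counts = [3, 0, 1]  # word counts in one document


def bernoulli_log_likelihood(counts, p_present):
    """Sum log p for present words and log(1 - p) for absent ones."""
    total = 0.0
    for c, p in zip(counts, p_present):
        total += math.log(p) if c > 0 else math.log(1.0 - p)
    return total


def multinomial_log_likelihood(counts, p_token):
    """Sum count * log p; words with count 0 contribute nothing.

    The multinomial coefficient is omitted because it is the same for
    every class and does not affect the predicted label.
    """
    return sum(c * math.log(p) for c, p in zip(counts, p_token))


print(bernoulli_log_likelihood(counts, p_present))
print(multinomial_log_likelihood(counts, p_token))
```

Changing the counts to [1, 0, 7] leaves the Bernoulli score unchanged, because the same words are present, but changes the multinomial score.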

Q 3 ANS:-

Bernoulli Naive Bayes does not handle missing values explicitly. It assumes that features are binary in nature, representing the presence or absence of a feature. In the context of Bernoulli Naive Bayes, missing values are typically treated as the absence of a feature.

When using Bernoulli Naive Bayes, each feature is represented as a binary indicator variable, taking a value of 1 if the feature is present and 0 if it is absent. If a particular instance has a missing value for a feature, it is typically considered as the absence of that feature, and the corresponding binary indicator is set to 0.

By assuming that missing values indicate the absence of features, Bernoulli Naive Bayes implicitly handles missing values by treating them as informative about the absence of the feature. It assigns a non-zero probability for the absence of a feature in each class. This allows the algorithm to account for the absence of features when estimating the probabilities and making predictions.

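This treatment is not always appropriate, because it makes "missing" indistinguishable from "absent" and can bias the estimated probabilities. The other Naive Bayes variants (e.g., Gaussian or Multinomial) do not handle missing values natively either, and scikit-learn's Naive Bayes estimators raise an error on NaN input. Alternatives are to impute the missing values before fitting, to add a separate binary indicator feature that marks missingness, or to leave the missing feature out of the likelihood product for that instance, which the conditional independence assumption makes straightforward.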
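
The two options, treating a missing value as absence or leaving it out of the product, can be compared with a small from-scratch sketch. The probabilities, the None marker for missing values, and the function name are assumptions made for illustration; this is not scikit-learn's behaviour.

```python
import math

# Hypothetical P(feature = 1 | class) for one class, three binary features.
p_present = [0.9, 0.2, 0.6]


def log_likelihood(x, p_present, missing="skip"):
    """Bernoulli log-likelihood of x, where None marks a missing value.

    missing="skip": leave the feature out of the product.
    missing="zero": treat the missing value as feature absent (0).
    """
    total = 0.0
    for value, p in zip(x, p_present):
        if value is None:
            if missing == "skip":
                continue
            value = 0
        total += math.log(p) if value == 1 else math.log(1.0 - p)
    return total


x = [1, None, 0]
print(log_likelihood(x, p_present, missing="skip"))  # uses features 0 and 2 only
print(log_likelihood(x, p_present, missing="zero"))  # also adds log(1 - 0.2)
```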

Q 4 ANS:-

Yes, Gaussian Naive Bayes can be used for multi-class classification problems, and it needs no special extension to do so. Naive Bayes is a generative model that handles any number of classes directly, so a "one-vs-rest" (also known as "one-vs-all") decomposition into binary classifiers is not required.

During training, the model estimates a prior probability for each class, along with the mean and variance of each feature within that class.

When classifying a new instance, the model computes for each class the prior multiplied by the product of the per-feature Gaussian densities (in practice, a sum of logarithms). The class with the highest posterior probability is then assigned as the predicted class for the instance.

The procedure is the same for two classes or for many; only the number of per-class parameter sets changes. scikit-learn's GaussianNB follows this approach and accepts multi-class labels without any wrapper.

It's important to note that Gaussian Naive Bayes may not be the most sophisticated or accurate model for complex multi-class problems, especially if the independence assumption is violated or the feature distributions significantly deviate from a Gaussian distribution. In such cases, more advanced models like logistic regression, support vector machines (SVMs), or decision trees may be considered.
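
A minimal from-scratch sketch of Gaussian Naive Bayes on three classes is given below. It fits one set of Gaussian parameters per class and predicts the class with the highest log posterior. The function names, the eps constant and the data are made up for illustration; this is not scikit-learn's implementation.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate per-class prior, feature means, and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # eps keeps every variance strictly positive (an assumption of this sketch)
        variances = [
            sum((v - m) ** 2 for v in col) / n + eps
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))


# Made-up 1-D data with three classes centred near 0, 5 and 10
X = [[0.1], [-0.2], [0.3], [5.1], [4.8], [5.2], [9.9], [10.2], [10.1]]
y = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
model = fit_gaussian_nb(X, y)
print([predict(model, [v]) for v in (0.0, 5.0, 10.0)])
```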

Q 5 ANS:-


In [11]:
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.naive_bayes import BernoulliNB,MultinomialNB,GaussianNB

In [2]:
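# spambase.csv has no header row, so pandas takes the first record as the
# column names (which is why the label column ends up being called '1').
df = pd.read_csv('spambase.csv')
df.head()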

Unnamed: 0,0,0.64,0.64.1,0.1,0.32,0.2,0.3,0.4,0.5,0.6,...,0.41,0.42,0.43,0.778,0.44,0.45,3.756,61,278,1
0,0.21,0.28,0.5,0.0,0.14,0.28,0.21,0.07,0.0,0.94,...,0.0,0.132,0.0,0.372,0.18,0.048,5.114,101,1028,1
1,0.06,0.0,0.71,0.0,1.23,0.19,0.19,0.12,0.64,0.25,...,0.01,0.143,0.0,0.276,0.184,0.01,9.821,485,2259,1
2,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.137,0.0,0.137,0.0,0.0,3.537,40,191,1
3,0.0,0.0,0.0,0.0,0.63,0.0,0.31,0.63,0.31,0.63,...,0.0,0.135,0.0,0.135,0.0,0.0,3.537,40,191,1
4,0.0,0.0,0.0,0.0,1.85,0.0,0.0,1.85,0.0,0.0,...,0.0,0.223,0.0,0.0,0.0,0.0,3.0,15,54,1


In [3]:
df.tail()

Unnamed: 0,0,0.64,0.64.1,0.1,0.32,0.2,0.3,0.4,0.5,0.6,...,0.41,0.42,0.43,0.778,0.44,0.45,3.756,61,278,1
4595,0.31,0.0,0.62,0.0,0.0,0.31,0.0,0.0,0.0,0.0,...,0.0,0.232,0.0,0.0,0.0,0.0,1.142,3,88,0
4596,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.353,0.0,0.0,1.555,4,14,0
4597,0.3,0.0,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.102,0.718,0.0,0.0,0.0,0.0,1.404,6,118,0
4598,0.96,0.0,0.0,0.0,0.32,0.0,0.0,0.0,0.0,0.0,...,0.0,0.057,0.0,0.0,0.0,0.0,1.147,5,78,0
4599,0.0,0.0,0.65,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.125,0.0,0.0,1.25,5,40,0


In [4]:
df.describe()

Unnamed: 0,0,0.64,0.64.1,0.1,0.32,0.2,0.3,0.4,0.5,0.6,...,0.41,0.42,0.43,0.778,0.44,0.45,3.756,61,278,1
count,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,...,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0,4600.0
mean,0.104576,0.212922,0.280578,0.065439,0.312222,0.095922,0.114233,0.105317,0.090087,0.239465,...,0.038583,0.139061,0.01698,0.26896,0.075827,0.044248,5.191827,52.17087,283.290435,0.393913
std,0.305387,1.2907,0.50417,1.395303,0.672586,0.27385,0.39148,0.401112,0.278643,0.644816,...,0.243497,0.270377,0.109406,0.815726,0.245906,0.429388,31.732891,194.912453,606.413764,0.488669
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.588,6.0,35.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.065,0.0,0.0,0.0,0.0,2.2755,15.0,95.0,0.0
75%,0.0,0.0,0.42,0.0,0.3825,0.0,0.0,0.0,0.0,0.16,...,0.0,0.188,0.0,0.31425,0.052,0.0,3.70525,43.0,265.25,1.0
max,4.54,14.28,5.1,42.81,10.0,5.88,7.27,11.11,5.26,18.18,...,4.385,9.752,4.081,32.478,6.003,19.829,1102.5,9989.0,15841.0,1.0


In [5]:
# '1' is the label column (1 = spam, 0 = not spam); all other columns are features
x = df.drop('1', axis=1)
y = df['1']

In [6]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.3,random_state=0)

In [13]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

In [8]:
BNB = BernoulliNB()
BNB.fit(x_train_scaled,y_train)

In [19]:
from sklearn.naive_bayes import MultinomialNB
# MultinomialNB requires non-negative features, so it is fit on the unscaled data
MNB = MultinomialNB()
MNB.fit(x_train, y_train)

In [22]:
GNB = GaussianNB()
GNB.fit(x_train_scaled,y_train)

In [9]:
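# Note: BNB was fit on x_train_scaled, but x_test here is unscaled;
# x_test_scaled is the input that matches the training data.
y_pred = BNB.predict(x_test)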



In [23]:
y_pred1 = MNB.predict(x_test)

In [25]:
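# Note: GNB was fit on x_train_scaled, but x_test here is unscaled, which
# likely explains the low accuracy in the report for y_pred2;
# x_test_scaled is the input that matches the training data.
y_pred2 = GNB.predict(x_test)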



In [10]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# scikit-learn's metric functions take the true labels first: (y_true, y_pred)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
print(accuracy_score(y_test, y_pred))

[[614 208]
 [ 13 545]]
              precision    recall  f1-score   support

           0       0.98      0.75      0.85       822
           1       0.72      0.98      0.83       558

    accuracy                           0.84      1380
   macro avg       0.85      0.86      0.84      1380
weighted avg       0.88      0.84      0.84      1380

0.8398550724637681
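
As an aid to reading these reports, the sketch below derives precision, recall, F1 and accuracy by hand from a 2x2 confusion matrix in scikit-learn's confusion_matrix(y_true, y_pred) layout, where rows are true labels and columns are predicted labels. The counts and the function name are made up for illustration.

```python
def binary_metrics(cm):
    """Metrics for class 1 from a matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp)   # of everything predicted 1, how much is truly 1
    recall = tp / (tp + fn)      # of everything truly 1, how much was found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return precision, recall, f1, accuracy


# Made-up counts: rows are true labels, columns are predicted labels
cm = [[50, 10],
      [5, 35]]
precision, recall, f1, accuracy = binary_metrics(cm)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f} accuracy={accuracy:.2f}")
```

Transposing the matrix swaps precision and recall, which is why the order of y_true and y_pred matters when calling the metric functions.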


In [24]:
# True labels go first: (y_true, y_pred)
print(confusion_matrix(y_test, y_pred1))
print(classification_report(y_test, y_pred1))
print(accuracy_score(y_test, y_pred1))

[[684 138]
 [131 427]]
              precision    recall  f1-score   support

           0       0.84      0.83      0.84       822
           1       0.76      0.77      0.76       558

    accuracy                           0.81      1380
   macro avg       0.80      0.80      0.80      1380
weighted avg       0.81      0.81      0.81      1380

0.8050724637681159


In [26]:
# True labels go first: (y_true, y_pred)
print(confusion_matrix(y_test, y_pred2))
print(classification_report(y_test, y_pred2))
print(accuracy_score(y_test, y_pred2))

[[821   1]
 [546  12]]
              precision    recall  f1-score   support

           0       0.60      1.00      0.75       822
           1       0.92      0.02      0.04       558

    accuracy                           0.60      1380
   macro avg       0.76      0.51      0.40      1380
weighted avg       0.73      0.60      0.46      1380

0.6036231884057971


# Hyperparameter tuning

In [27]:
import warnings 
warnings.filterwarnings('ignore')
from sklearn.model_selection import GridSearchCV

### Bernoulli Naive Bayes classifier 

In [72]:
parameter = {
    'alpha': [0.1, 0.5, 1.0],
    'binarize': [0.0, 0.2, 0.5],
    'fit_prior': [True, False],
    'class_prior': [None, [0.3, 0.7], [0.4, 0.6]],
}

In [74]:
clf = GridSearchCV(BNB,param_grid=parameter,refit=True,cv=10,scoring='accuracy',verbose=3)
clf.fit(x_train,y_train)

Fitting 10 folds for each of 54 candidates, totalling 540 fits
[CV 1/10] END alpha=0.1, binarize=0.0, class_prior=None, fit_prior=True;, score=0.910 total time=   0.0s
[CV 2/10] END alpha=0.1, binarize=0.0, class_prior=None, fit_prior=True;, score=0.854 total time=   0.0s
[CV 3/10] END alpha=0.1, binarize=0.0, class_prior=None, fit_prior=True;, score=0.882 total time=   0.0s
[CV 4/10] END alpha=0.1, binarize=0.0, class_prior=None, fit_prior=True;, score=0.898 total time=   0.0s
[CV 5/10] END alpha=0.1, binarize=0.0, class_prior=None, fit_prior=True;, score=0.932 total time=   0.0s
[CV 6/10] END alpha=0.1, binarize=0.0, class_prior=None, fit_prior=True;, score=0.885 total time=   0.0s
[CV 7/10] END alpha=0.1, binarize=0.0, class_prior=None, fit_prior=True;, score=0.879 total time=   0.0s
[CV 8/10] END alpha=0.1, binarize=0.0, class_prior=None, fit_prior=True;, score=0.835 total time=   0.0s
[CV 9/10] END alpha=0.1, binarize=0.0, class_prior=None, fit_prior=True;, score=0.894 total time=

In [75]:
clf.best_params_

{'alpha': 1.0, 'binarize': 0.2, 'class_prior': [0.3, 0.7], 'fit_prior': True}

In [77]:
BNB = BernoulliNB(alpha=1.0,binarize=0.2,class_prior=[0.3,0.7],fit_prior=True)
BNB.fit(x_train,y_train)

In [78]:
y_pred4 = BNB.predict(x_test)

In [79]:
# True labels go first: (y_true, y_pred)
print(confusion_matrix(y_test, y_pred4))
print(classification_report(y_test, y_pred4))
print(accuracy_score(y_test, y_pred4))

[[759  63]
 [ 56 502]]
              precision    recall  f1-score   support

           0       0.93      0.92      0.93       822
           1       0.89      0.90      0.89       558

    accuracy                           0.91      1380
   macro avg       0.91      0.91      0.91      1380
weighted avg       0.91      0.91      0.91      1380

0.913768115942029


### Multinomial Naive Bayes

In [63]:
parameter = {
    'alpha': [0.1, 0.5, 1.0],
    'fit_prior': [True, False],
    'class_prior': [None, [0.3, 0.7], [0.4, 0.6]],
}

In [65]:
clf = GridSearchCV(MNB,param_grid=parameter,refit=True,cv=10,scoring='accuracy',verbose=3)
clf.fit(x_train,y_train)

Fitting 10 folds for each of 18 candidates, totalling 180 fits
[CV 1/10] END alpha=0.1, class_prior=None, fit_prior=True;, score=0.789 total time=   0.0s
[CV 2/10] END alpha=0.1, class_prior=None, fit_prior=True;, score=0.764 total time=   0.0s
[CV 3/10] END alpha=0.1, class_prior=None, fit_prior=True;, score=0.817 total time=   0.0s
[CV 4/10] END alpha=0.1, class_prior=None, fit_prior=True;, score=0.804 total time=   0.0s
[CV 5/10] END alpha=0.1, class_prior=None, fit_prior=True;, score=0.773 total time=   0.0s
[CV 6/10] END alpha=0.1, class_prior=None, fit_prior=True;, score=0.835 total time=   0.0s
[CV 7/10] END alpha=0.1, class_prior=None, fit_prior=True;, score=0.767 total time=   0.0s
[CV 8/10] END alpha=0.1, class_prior=None, fit_prior=True;, score=0.764 total time=   0.0s
[CV 9/10] END alpha=0.1, class_prior=None, fit_prior=True;, score=0.804 total time=   0.0s
[CV 10/10] END alpha=0.1, class_prior=None, fit_prior=True;, score=0.783 total time=   0.0s
[CV 1/10] END alpha=0.1, c

In [66]:
clf.best_params_

{'alpha': 0.1, 'class_prior': None, 'fit_prior': True}

In [68]:
MNB = MultinomialNB(alpha= 0.1, class_prior=None, fit_prior=True)
MNB.fit(x_train,y_train)

In [69]:
y_pred5 = MNB.predict(x_test)

In [70]:
# True labels go first: (y_true, y_pred)
print(confusion_matrix(y_test, y_pred5))
print(classification_report(y_test, y_pred5))
print(accuracy_score(y_test, y_pred5))

[[684 138]
 [131 427]]
              precision    recall  f1-score   support

           0       0.84      0.83      0.84       822
           1       0.76      0.77      0.76       558

    accuracy                           0.81      1380
   macro avg       0.80      0.80      0.80      1380
weighted avg       0.81      0.81      0.81      1380

0.8050724637681159


### Gaussian Naive Bayes

The Gaussian Naive Bayes classifier in scikit-learn has only two parameters: priors (fixed class prior probabilities) and var_smoothing (a fraction of the largest feature variance that is added to every variance for numerical stability). Neither is tuned here. The model is refit with its default settings, this time on the unscaled features so that the training and test inputs match.
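
The idea behind var_smoothing can be sketched in plain Python: a constant equal to a fraction of the largest feature variance is added to every variance, so a feature with zero variance no longer causes a division by zero in the Gaussian density. The function name and data are made up for illustration; this is not scikit-learn's implementation.

```python
def smoothed_variances(columns, var_smoothing=1e-9):
    """Per-feature variances plus var_smoothing * (largest variance)."""
    variances = []
    for col in columns:
        mean = sum(col) / len(col)
        variances.append(sum((v - mean) ** 2 for v in col) / len(col))
    epsilon = var_smoothing * max(variances)
    return [var + epsilon for var in variances]


# Made-up data: the second feature is constant, so its raw variance is 0
columns = [[1.0, 2.0, 3.0, 4.0], [5.0, 5.0, 5.0, 5.0]]

print(smoothed_variances(columns, var_smoothing=0.0))   # second entry stays 0.0
print(smoothed_variances(columns, var_smoothing=1e-9))  # second entry becomes > 0
```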

In [83]:
GNB = GaussianNB()
GNB.fit(x_train,y_train)

In [84]:
y_pred3 = GNB.predict(x_test)

In [85]:
# True labels go first: (y_true, y_pred)
print(confusion_matrix(y_test, y_pred3))
print(classification_report(y_test, y_pred3))
print(accuracy_score(y_test, y_pred3))

[[600 222]
 [ 15 543]]
              precision    recall  f1-score   support

           0       0.98      0.73      0.84       822
           1       0.71      0.97      0.82       558

    accuracy                           0.83      1380
   macro avg       0.84      0.85      0.83      1380
weighted avg       0.87      0.83      0.83      1380

0.8282608695652174
