### Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?

### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?

### Q3. How does Bernoulli Naive Bayes handle missing values?

### Q4. Can Gaussian Naive Bayes be used for multi-class classification?



### Q5. Assignment:
#### Data preparation:
Download the "Spambase Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Spambase). This dataset contains email messages, where the goal is to predict whether a message
is spam or not based on several input features.
#### Implementation:
Implement Bernoulli Naive Bayes, Multinomial Naive Bayes, and Gaussian Naive Bayes classifiers using the
scikit-learn library in Python. Use 10-fold cross-validation to evaluate the performance of each classifier on the
dataset. You should use the default hyperparameters for each classifier.
#### Results:
Report the following performance metrics for each classifier:
- Accuracy
- Precision
- Recall
- F1 score
#### Discussion:
Discuss the results you obtained. Which variant of Naive Bayes performed the best? Why do you think that is
the case? Are there any limitations of Naive Bayes that you observed?
#### Conclusion:
Summarise your findings and provide some suggestions for future work.

## Answers

### Q1. A company conducted a survey of its employees and found that 70% of the employees use the company's health insurance plan, while 40% of the employees who use the plan are smokers. What is the probability that an employee is a smoker given that he/she uses the health insurance plan?



1. The probability that an employee uses the health insurance plan:
P(Uses Insurance)=0.70 (70% of employees use the plan).

2. The probability that an employee who uses the health insurance plan is a smoker: 
P(Smoker | Uses Insurance)=0.40.

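The second fact is already the conditional probability the question asks for, so no further calculation is needed. As a consistency check, the multiplication rule and the definition of conditional probability give the same value:

P(Smoker and Uses Insurance) = P(Smoker | Uses Insurance)⋅P(Uses Insurance) = 0.40⋅0.70 = 0.28

P(Smoker | Uses Insurance) = P(Smoker and Uses Insurance)/P(Uses Insurance) = 0.28/0.70 = 0.40

Bayes' theorem, P(Smoker | Uses Insurance) = P(Uses Insurance | Smoker)⋅P(Smoker)/P(Uses Insurance), cannot be evaluated here because the survey reports neither P(Smoker) nor P(Uses Insurance | Smoker).

P(Smoker | Uses Insurance) = 0.40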
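
The arithmetic can be checked in a few lines of Python. This is a minimal sketch; the variable names are chosen here for illustration and the two input figures are the ones stated in the question.

```python
# Figures taken from the survey in the question
p_uses = 0.70                # P(Uses Insurance)
p_smoker_given_uses = 0.40   # P(Smoker | Uses Insurance), stated directly

# Joint probability via the multiplication rule
p_smoker_and_uses = p_smoker_given_uses * p_uses

# The definition of conditional probability recovers the stated value
p_check = p_smoker_and_uses / p_uses

print(round(p_smoker_and_uses, 2))  # 0.28
print(round(p_check, 2))            # 0.4
```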



### Q2. What is the difference between Bernoulli Naive Bayes and Multinomial Naive Bayes?



#### Multinomial Naive Bayes (MNB):

##### Data Type: 
MNB is well suited to discrete count data, such as the number of times each word occurs in a document.

##### Assumption: 
It assumes that features represent counts or frequencies of discrete values.

##### Use Cases: 
MNB is frequently used for text classification tasks like document categorization, spam email detection, and sentiment analysis, where each document is represented by word counts or term frequencies.

#### Bernoulli Naive Bayes (BNB):

##### Data Type: 
BNB is suitable for binary data, where features are binary (0 or 1) to represent the absence or presence of a particular attribute.

##### Assumption: 
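It assumes that each feature is a binary variable and, unlike MNB, it explicitly accounts for the absence of a feature: a value of 0 contributes to the class likelihood just as a value of 1 does.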

##### Use Cases: 
BNB is commonly used for text classification tasks involving binary features, such as text document classification based on the presence or absence of specific keywords.
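
The difference shows up directly in the class-conditional likelihood. The sketch below is illustrative only: the three-word vocabulary and all probabilities are hypothetical values, not estimates from any dataset. It scores one document under both models to show that the multinomial model reacts to how often a word occurs, while the Bernoulli model only reacts to whether it occurs and also scores the absent word.

```python
import math

# Hypothetical parameters for one class over a 3-word vocabulary (illustrative values)
vocab = ["free", "meeting", "money"]
theta = [0.6, 0.1, 0.3]      # multinomial: P(word | class), sums to 1
p_present = [0.8, 0.2, 0.5]  # Bernoulli: P(word present | class)

def multinomial_log_likelihood(counts, theta):
    # Each occurrence of a word adds log(theta); absent words add nothing.
    # The multinomial coefficient is omitted because it is the same for every class.
    return sum(c * math.log(t) for c, t in zip(counts, theta))

def bernoulli_log_likelihood(counts, p_present):
    # Counts are reduced to presence/absence; an absent word adds log(1 - p).
    return sum(
        math.log(p) if c > 0 else math.log(1.0 - p)
        for c, p in zip(counts, p_present)
    )

doc_a = [3, 0, 1]  # "free" three times, "money" once
doc_b = [1, 0, 1]  # same words present, but "free" only once

for doc in (doc_a, doc_b):
    print(doc,
          round(multinomial_log_likelihood(doc, theta), 4),
          round(bernoulli_log_likelihood(doc, p_present), 4))
```

The two documents receive different multinomial scores but identical Bernoulli scores, because they contain the same set of words.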

### Q3. How does Bernoulli Naive Bayes handle missing values?



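Bernoulli Naive Bayes is a variant of the Naive Bayes algorithm specifically designed for binary data, where features are binary variables indicating the presence or absence of certain attributes or events. It assumes that the features are conditionally independent given the class label, and it models the probability of each feature being either 0 (absent) or 1 (present) for each class. The model itself has no mechanism for missing values, and scikit-learn's `BernoulliNB` rejects input that contains NaN, so missing entries have to be dealt with before training. The usual options are described below.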

#### Impute Missing Values:

- One approach is to impute (fill in) each missing value with either 0 or 1.
- Imputing 0 treats an unknown feature as absent, which is reasonable when absence is by far the more common state; imputing the feature's most frequent value is a common alternative.
- The choice should be guided by domain knowledge, because after imputation the model treats a guessed value as if it had been observed.

#### Encode Missing Values:

- Another approach is to encode missingness explicitly, for example by adding a binary indicator feature that is 1 when the original value was missing and 0 otherwise.
- The indicator is then an ordinary binary feature, so the model estimates how likely a value is to be missing in each class.
- This approach lets the classifier use missingness as evidence instead of hiding it behind an imputed value. A sentinel such as -1 does not work directly, because Bernoulli Naive Bayes expects exactly two feature states.
 
#### Ignore Missing Values:

- Depending on the extent of missing data and the nature of your problem, you might drop the instances (rows) that contain missing values, or drop the affected features (columns) altogether.
- If missing values are rare and unrelated to the class, dropping them is unlikely to change the classification results much.
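
Imputation and explicit encoding can be combined in one scikit-learn pipeline. The following is a minimal sketch on made-up toy data, assuming `SimpleImputer` with `strategy="most_frequent"` and `add_indicator=True` is an acceptable preprocessing choice for the problem at hand; it is one possible setup, not the only one.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy binary data with missing entries (np.nan); values are illustrative only
X_toy = np.array([
    [1, 0, np.nan],
    [1, np.nan, 0],
    [0, 1, 1],
    [0, 1, np.nan],
    [1, 0, 0],
    [0, 1, 1],
])
y_toy = np.array([1, 1, 0, 0, 1, 0])

# Fill each gap with the column's most frequent value and append one
# "was missing" indicator column per feature that had gaps during fit
model = make_pipeline(
    SimpleImputer(strategy="most_frequent", add_indicator=True),
    BernoulliNB(),
)
model.fit(X_toy, y_toy)

# A new sample with a missing third feature can now be classified
new_sample = np.array([[1, 0, np.nan]])
prediction = model.predict(new_sample)
print(prediction)
```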

### Q4. Can Gaussian Naive Bayes be used for multi-class classification?

Yes. Gaussian Naive Bayes is the variant of the Naive Bayes algorithm for continuous features, and it handles any number of classes without modification.

For each class, the algorithm estimates a prior probability and models every feature as a Gaussian (normal) distribution with a class-specific mean and variance. To classify a new instance, it combines the class prior with the Gaussian likelihoods of the feature values for each class, applies Bayes' theorem to obtain the posterior probability of each class, and predicts the class with the highest posterior.
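
A short sketch of the multi-class case, using the three-class Iris dataset bundled with scikit-learn as a stand-in (it is not part of the assignment and is used here only because it needs no download):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Iris has 3 classes and 4 continuous features
X_iris, y_iris = load_iris(return_X_y=True)

gnb_multi = GaussianNB().fit(X_iris, y_iris)

print(gnb_multi.classes_)       # one entry per class
print(gnb_multi.theta_.shape)   # per-class feature means: (n_classes, n_features)

# Posterior probabilities for one sample: one value per class, summing to 1
posterior = gnb_multi.predict_proba(X_iris[:1])
print(posterior.round(3))
```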

### Q5. Assignment:

In [1]:
import pandas as pd

# URL of the raw Spambase data file in the UCI repository
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"

# Define the column names for the dataset (you can find these names in the dataset description)
column_names = [
    "word_freq_make", "word_freq_address", "word_freq_all", "word_freq_3d",
    "word_freq_our", "word_freq_over", "word_freq_remove", "word_freq_internet",
    "word_freq_order", "word_freq_mail", "word_freq_receive", "word_freq_will",
    "word_freq_people", "word_freq_report", "word_freq_addresses", "word_freq_free",
    "word_freq_business", "word_freq_email", "word_freq_you", "word_freq_credit",
    "word_freq_your", "word_freq_font", "word_freq_000", "word_freq_money",
    "word_freq_hp", "word_freq_hpl", "word_freq_george", "word_freq_650",
    "word_freq_lab", "word_freq_labs", "word_freq_telnet", "word_freq_857",
    "word_freq_data", "word_freq_415", "word_freq_85", "word_freq_technology",
    "word_freq_1999", "word_freq_parts", "word_freq_pm", "word_freq_direct",
    "word_freq_cs", "word_freq_meeting", "word_freq_original", "word_freq_project",
    "word_freq_re", "word_freq_edu", "word_freq_table", "word_freq_conference",
    "char_freq_;", "char_freq_(", "char_freq_[", "char_freq_!",
    "char_freq_$", "char_freq_#", "capital_run_length_average",
    "capital_run_length_longest", "capital_run_length_total", "is_spam"
]

# Load the dataset into a DataFrame
df = pd.read_csv(url, header=None, names=column_names)

# Display the first few rows of the DataFrame to verify the data
df.head()


| | word_freq_make | word_freq_address | word_freq_all | word_freq_3d | word_freq_our | word_freq_over | word_freq_remove | word_freq_internet | word_freq_order | word_freq_mail | ... | char_freq_; | char_freq_( | char_freq_[ | char_freq_! | char_freq_$ | char_freq_# | capital_run_length_average | capital_run_length_longest | capital_run_length_total | is_spam |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0.64 | 0.64 | 0.0 | 0.32 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.778 | 0.0 | 0.0 | 3.756 | 61 | 278 | 1 |
| 1 | 0.21 | 0.28 | 0.5 | 0.0 | 0.14 | 0.28 | 0.21 | 0.07 | 0.0 | 0.94 | ... | 0.0 | 0.132 | 0.0 | 0.372 | 0.18 | 0.048 | 5.114 | 101 | 1028 | 1 |
| 2 | 0.06 | 0.0 | 0.71 | 0.0 | 1.23 | 0.19 | 0.19 | 0.12 | 0.64 | 0.25 | ... | 0.01 | 0.143 | 0.0 | 0.276 | 0.184 | 0.01 | 9.821 | 485 | 2259 | 1 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.137 | 0.0 | 0.137 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.63 | 0.0 | 0.31 | 0.63 | 0.31 | 0.63 | ... | 0.0 | 0.135 | 0.0 | 0.135 | 0.0 | 0.0 | 3.537 | 40 | 191 | 1 |


In [2]:
df.tail()

| | word_freq_make | word_freq_address | word_freq_all | word_freq_3d | word_freq_our | word_freq_over | word_freq_remove | word_freq_internet | word_freq_order | word_freq_mail | ... | char_freq_; | char_freq_( | char_freq_[ | char_freq_! | char_freq_$ | char_freq_# | capital_run_length_average | capital_run_length_longest | capital_run_length_total | is_spam |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4596 | 0.31 | 0.0 | 0.62 | 0.0 | 0.0 | 0.31 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.232 | 0.0 | 0.0 | 0.0 | 0.0 | 1.142 | 3 | 88 | 0 |
| 4597 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.353 | 0.0 | 0.0 | 1.555 | 4 | 14 | 0 |
| 4598 | 0.3 | 0.0 | 0.3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.102 | 0.718 | 0.0 | 0.0 | 0.0 | 0.0 | 1.404 | 6 | 118 | 0 |
| 4599 | 0.96 | 0.0 | 0.0 | 0.0 | 0.32 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.057 | 0.0 | 0.0 | 0.0 | 0.0 | 1.147 | 5 | 78 | 0 |
| 4600 | 0.0 | 0.0 | 0.65 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.125 | 0.0 | 0.0 | 1.25 | 5 | 40 | 0 |


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4601 entries, 0 to 4600
Data columns (total 58 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   word_freq_make              4601 non-null   float64
 1   word_freq_address           4601 non-null   float64
 2   word_freq_all               4601 non-null   float64
 3   word_freq_3d                4601 non-null   float64
 4   word_freq_our               4601 non-null   float64
 5   word_freq_over              4601 non-null   float64
 6   word_freq_remove            4601 non-null   float64
 7   word_freq_internet          4601 non-null   float64
 8   word_freq_order             4601 non-null   float64
 9   word_freq_mail              4601 non-null   float64
 10  word_freq_receive           4601 non-null   float64
 11  word_freq_will              4601 non-null   float64
 12  word_freq_people            4601 non-null   float64
 13  word_freq_report            4601 

In [4]:
df.isnull().sum()

word_freq_make                0
word_freq_address             0
word_freq_all                 0
word_freq_3d                  0
word_freq_our                 0
word_freq_over                0
word_freq_remove              0
word_freq_internet            0
word_freq_order               0
word_freq_mail                0
word_freq_receive             0
word_freq_will                0
word_freq_people              0
word_freq_report              0
word_freq_addresses           0
word_freq_free                0
word_freq_business            0
word_freq_email               0
word_freq_you                 0
word_freq_credit              0
word_freq_your                0
word_freq_font                0
word_freq_000                 0
word_freq_money               0
word_freq_hp                  0
word_freq_hpl                 0
word_freq_george              0
word_freq_650                 0
word_freq_lab                 0
word_freq_labs                0
word_freq_telnet              0
word_fre

In [5]:
# Features: every column except the last; target: the is_spam label in the last column
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

In [6]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.25,random_state=0)
X_train.shape, X_test.shape

((3450, 57), (1151, 57))

In [7]:
from sklearn.naive_bayes import GaussianNB,MultinomialNB,BernoulliNB
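
The assignment asks for 10-fold cross-validation with default hyperparameters. One way to sketch that protocol is shown here. The helper name `evaluate_naive_bayes`, the stratified shuffling, and the synthetic stand-in data are assumptions made so the example runs offline; with the notebook's data, the `X` and `y` defined earlier would be passed instead.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

def evaluate_naive_bayes(X, y, n_splits=10, seed=0):
    """Mean cross-validated scores for the three classifiers with default hyperparameters."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scoring = ["accuracy", "precision", "recall", "f1"]
    models = {
        "BernoulliNB": BernoulliNB(),
        "MultinomialNB": MultinomialNB(),
        "GaussianNB": GaussianNB(),
    }
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        # cross_validate stores each metric under the key "test_<metric>"
        results[name] = {m: float(np.mean(scores[f"test_{m}"])) for m in scoring}
    return results

# Synthetic non-negative stand-in data (MultinomialNB requires non-negative features)
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 2, size=200)
X_demo = rng.poisson(lam=1.0 + 2.0 * y_demo[:, None], size=(200, 5)).astype(float)

demo_results = evaluate_naive_bayes(X_demo, y_demo)
for name, metrics in demo_results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```

Precision, recall, and F1 here use scikit-learn's default positive label of 1, which corresponds to spam in the Spambase encoding.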

In [8]:
from sklearn.model_selection import GridSearchCV

### Gaussian Naive Bayes

In [14]:
import numpy as np
param_grid_nb = {
    'var_smoothing': np.logspace(0,-9, num=100)
}

In [15]:
grid=GridSearchCV(GaussianNB(),param_grid=param_grid_nb,cv=10,scoring="accuracy",verbose=3)

In [16]:
grid.fit(X_train,y_train)

Fitting 10 folds for each of 100 candidates, totalling 1000 fits
[CV 1/10] END ................var_smoothing=1.0;, score=0.623 total time=   0.0s
[CV 2/10] END ................var_smoothing=1.0;, score=0.641 total time=   0.0s
[CV 3/10] END ................var_smoothing=1.0;, score=0.646 total time=   0.0s
[CV 4/10] END ................var_smoothing=1.0;, score=0.638 total time=   0.0s
[CV 5/10] END ................var_smoothing=1.0;, score=0.643 total time=   0.0s
[CV 6/10] END ................var_smoothing=1.0;, score=0.626 total time=   0.0s
[CV 7/10] END ................var_smoothing=1.0;, score=0.638 total time=   0.0s
[CV 8/10] END ................var_smoothing=1.0;, score=0.629 total time=   0.0s
[CV 9/10] END ................var_smoothing=1.0;, score=0.626 total time=   0.0s
[CV 10/10] END ...............var_smoothing=1.0;, score=0.649 total time=   0.0s
[CV 1/10] END .var_smoothing=0.8111308307896871;, score=0.626 total time=   0.0s
[CV 2/10] END .var_smoothing=0.8111308307896

In [17]:
grid.best_params_

{'var_smoothing': 1.519911082952933e-06}

In [20]:
gnb=GaussianNB(var_smoothing= 1.519911082952933e-06)
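
Copying the best value by hand works, but `GridSearchCV` already refits the best configuration on the full training data when `refit=True` (its default), so the tuned model can be read from `best_estimator_`. A minimal sketch on the bundled Iris data, used here only as an offline stand-in with a smaller grid:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

# Bundled toy dataset so the sketch runs without a download (not Spambase)
X_small, y_small = load_iris(return_X_y=True)

search = GridSearchCV(
    GaussianNB(),
    param_grid={"var_smoothing": np.logspace(0, -9, num=10)},
    cv=5,
    scoring="accuracy",
)
search.fit(X_small, y_small)

# The best configuration has already been retrained on all of X_small
best_model = search.best_estimator_
print(search.best_params_)
print(best_model.predict(X_small[:3]))
```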

In [21]:
gnb.fit(X_train,y_train)

In [22]:
y_pred=gnb.predict(X_test)

In [23]:
from sklearn.metrics import confusion_matrix

In [24]:
conf_mat = confusion_matrix(y_test,y_pred)
conf_mat

array([[606,  85],
       [ 86, 374]])

In [25]:
# confusion_matrix returns [[TN, FP], [FN, TP]] for labels (0, 1):
# rows are true classes, columns are predicted classes, and spam (1) is the positive class
true_negative, false_positive, false_negative, true_positive = conf_mat.ravel()

In [26]:
Accuracy = (true_positive + true_negative) / (true_positive + false_positive + false_negative + true_negative)
Accuracy

0.8514335360556038

In [27]:
# Share of messages predicted as spam that really are spam
Precision = true_positive / (true_positive + false_positive)
round(Precision, 4)

0.8148

In [28]:
# Share of actual spam messages that were detected
Recall = true_positive / (true_positive + false_negative)
round(Recall, 4)

0.813

In [29]:
F1_Score = 2 * (Recall * Precision) / (Recall + Precision)
round(F1_Score, 4)

0.8139

### Bernoulli Naive Bayes

In [30]:
from sklearn.naive_bayes import BernoulliNB

In [31]:
param_grid_bnb={"alpha":[1,2,3,4,5]
}

In [32]:
grid_bnb=GridSearchCV(BernoulliNB(),param_grid=param_grid_bnb,cv=10,scoring="accuracy",verbose=3)

In [33]:
grid_bnb.fit(X_train,y_train)

Fitting 10 folds for each of 5 candidates, totalling 50 fits
[CV 1/10] END ..........................alpha=1;, score=0.893 total time=   0.0s
[CV 2/10] END ..........................alpha=1;, score=0.893 total time=   0.0s
[CV 3/10] END ..........................alpha=1;, score=0.878 total time=   0.0s
[CV 4/10] END ..........................alpha=1;, score=0.878 total time=   0.0s
[CV 5/10] END ..........................alpha=1;, score=0.881 total time=   0.0s
[CV 6/10] END ..........................alpha=1;, score=0.893 total time=   0.0s
[CV 7/10] END ..........................alpha=1;, score=0.890 total time=   0.0s
[CV 8/10] END ..........................alpha=1;, score=0.890 total time=   0.0s
[CV 9/10] END ..........................alpha=1;, score=0.887 total time=   0.0s
[CV 10/10] END .........................alpha=1;, score=0.910 total time=   0.0s
[CV 1/10] END ..........................alpha=2;, score=0.893 total time=   0.0s
[CV 2/10] END ..........................alpha=2;

In [34]:
grid_bnb.best_params_

{'alpha': 1}

In [35]:
bnb=BernoulliNB(alpha=1)

In [36]:
bnb.fit(X_train,y_train)

In [37]:
y_pred_bnb=bnb.predict(X_test)

In [38]:
conf_mat = confusion_matrix(y_test,y_pred_bnb)
conf_mat

array([[642,  49],
       [ 94, 366]])

In [39]:
# confusion_matrix returns [[TN, FP], [FN, TP]] for labels (0, 1):
# rows are true classes, columns are predicted classes, and spam (1) is the positive class
true_negative, false_positive, false_negative, true_positive = conf_mat.ravel()

In [40]:
Accuracy = (true_positive + true_negative) / (true_positive + false_positive + false_negative + true_negative)
Accuracy

0.8757602085143353

In [41]:
# Share of messages predicted as spam that really are spam
Precision = true_positive / (true_positive + false_positive)
round(Precision, 4)

0.8819

In [42]:
# Share of actual spam messages that were detected
Recall = true_positive / (true_positive + false_negative)
round(Recall, 4)

0.7957

In [43]:
F1_Score = 2 * (Recall * Precision) / (Recall + Precision)
round(F1_Score, 4)

0.8366

### Multinomial Naive Bayes

In [44]:
from sklearn.naive_bayes import MultinomialNB

In [46]:
param_grid_mnb={ "alpha":[1,2,3,4,5]

}

In [47]:
grid_mnb=GridSearchCV(MultinomialNB(),param_grid=param_grid_mnb,cv=10,scoring="accuracy",verbose=3)

In [48]:
grid_mnb.fit(X_train,y_train)

Fitting 10 folds for each of 5 candidates, totalling 50 fits
[CV 1/10] END ..........................alpha=1;, score=0.835 total time=   0.0s
[CV 2/10] END ..........................alpha=1;, score=0.812 total time=   0.0s
[CV 3/10] END ..........................alpha=1;, score=0.765 total time=   0.0s
[CV 4/10] END ..........................alpha=1;, score=0.835 total time=   0.0s
[CV 5/10] END ..........................alpha=1;, score=0.803 total time=   0.0s
[CV 6/10] END ..........................alpha=1;, score=0.800 total time=   0.0s
[CV 7/10] END ..........................alpha=1;, score=0.803 total time=   0.0s
[CV 8/10] END ..........................alpha=1;, score=0.803 total time=   0.0s
[CV 9/10] END ..........................alpha=1;, score=0.803 total time=   0.0s
[CV 10/10] END .........................alpha=1;, score=0.751 total time=   0.0s
[CV 1/10] END ..........................alpha=2;, score=0.832 total time=   0.0s
[CV 2/10] END ..........................alpha=2;

In [49]:
grid_mnb.best_params_

{'alpha': 1}

In [50]:
mnb=MultinomialNB(alpha=1)

In [51]:
mnb.fit(X_train,y_train)

In [53]:
y_pred_mnb=mnb.predict(X_test)

In [54]:
conf_mat = confusion_matrix(y_test,y_pred_mnb)
conf_mat

array([[584, 107],
       [124, 336]])

In [55]:
# confusion_matrix returns [[TN, FP], [FN, TP]] for labels (0, 1):
# rows are true classes, columns are predicted classes, and spam (1) is the positive class
true_negative, false_positive, false_negative, true_positive = conf_mat.ravel()

In [56]:
Accuracy = (true_positive + true_negative) / (true_positive + false_positive + false_negative + true_negative)
Accuracy

0.7993049522154648

In [57]:
# Share of messages predicted as spam that really are spam
Precision = true_positive / (true_positive + false_positive)
round(Precision, 4)

0.7585

In [58]:
# Share of actual spam messages that were detected
Recall = true_positive / (true_positive + false_negative)
round(Recall, 4)

0.7304

In [59]:
F1_Score = 2 * (Recall * Precision) / (Recall + Precision)
round(F1_Score, 4)

0.7442
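
To compare the three models side by side, the metrics can be recomputed from the three confusion matrices reported in this notebook. The sketch below assumes scikit-learn's layout `[[TN, FP], [FN, TP]]` with spam (label 1) as the positive class; the helper name `metrics_from_confusion` is chosen here for illustration.

```python
# Confusion matrices copied from the cells above: [[TN, FP], [FN, TP]]
matrices = {
    "GaussianNB": [[606, 85], [86, 374]],
    "BernoulliNB": [[642, 49], [94, 366]],
    "MultinomialNB": [[584, 107], [124, 336]],
}

def metrics_from_confusion(cm):
    """Accuracy, precision, recall and F1 with label 1 (spam) as the positive class."""
    (tn, fp), (fn, tp) = cm
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

summary = {name: metrics_from_confusion(cm) for name, cm in matrices.items()}
for name, m in summary.items():
    print(name, {k: round(v, 4) for k, v in m.items()})

# Model with the highest test accuracy
best_by_accuracy = max(summary, key=lambda name: summary[name]["accuracy"])
print(best_by_accuracy)
```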

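### Conclusion: Bernoulli Naive Bayes Gives the Highest Accuracy

On the held-out test set, Bernoulli Naive Bayes reached an accuracy of about 0.876, compared with about 0.851 for Gaussian Naive Bayes and about 0.799 for Multinomial Naive Bayes. A plausible explanation is that `BernoulliNB` binarizes each feature by default (any value above 0 becomes 1), so it relies only on whether a word or character occurs. The Spambase frequencies are heavily skewed and are not raw counts, which fits neither the Gaussian assumption nor the count model of Multinomial Naive Bayes well. These figures come from a single 75/25 train/test split with a grid search over the smoothing parameter, not from the 10-fold cross-validation with default hyperparameters that the assignment specifies.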