
import pandas as pd
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import OneHotEncoder

# Load the data into a pandas DataFrame
data = pd.read_csv("/content/sample_data/data.csv")

# Separate features and target variable
X = data.drop(columns=['Category'])  # All columns except the target variable
y = data['Category']  # Target variable

# One-hot encode the Yes/No features; drop='first' keeps a single 0/1 column per feature,
# because the dropped column would only be its exact complement (redundant evidence)
encoder = OneHotEncoder(drop='first')
X_encoded = encoder.fit_transform(X)

# Create the Bernoulli Naive Bayes model
model = BernoulliNB()

# Train the model on the encoded data
model.fit(X_encoded, y)

# Prediction example
new_data = pd.DataFrame({
    'Lottery': ['Yes'],
    'Discount': ['No'],
    'Sale': ['No'],
    'Offer': ['No'],
    'Free': ['No']
})

# One-hot encode the new data point
new_data_encoded = encoder.transform(new_data)

# Make predictions using the trained model
predicted_class = model.predict(new_data_encoded)

print("Predicted class for the new data point:", predicted_class)

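# NOTE: these labels are hard-coded for illustration; they are NOT the model's predictions.
# Because the two arrays are identical, every metric below is 1.0 by construction.
# To score the trained model on its training data instead, use:
#   actual_classes = y.to_numpy()
#   predicted_classes = model.predict(X_encoded)
actual_classes = np.array(['Spam', 'Spam', 'Spam', 'Ham', 'Ham', 'Ham', 'Spam', 'Spam', 'Ham', 'Spam'])
predicted_classes = np.array(['Spam', 'Spam', 'Spam', 'Ham', 'Ham', 'Ham', 'Spam', 'Spam', 'Ham', 'Spam'])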

# Compute True Positives (TP), True Negatives (TN), False Positives (FP), False Negatives (FN)
TP = np.sum((actual_classes == 'Spam') & (predicted_classes == 'Spam'))
TN = np.sum((actual_classes == 'Ham') & (predicted_classes == 'Ham'))
FP = np.sum((actual_classes == 'Ham') & (predicted_classes == 'Spam'))
FN = np.sum((actual_classes == 'Spam') & (predicted_classes == 'Ham'))

# Compute performance metrics
sensitivity = TP / (TP + FN) if TP + FN > 0 else 0
specificity = TN / (TN + FP) if TN + FP > 0 else 0
precision = TP / (TP + FP) if TP + FP > 0 else 0
accuracy = (TP + TN) / len(actual_classes)

print("Sensitivity:", sensitivity)
print("Specificity:", specificity)
print("Precision:", precision)
print("Accuracy:", accuracy)





Predicted class for the new data point: ['Spam']
Sensitivity: 1.0
Specificity: 1.0
Precision: 1.0
Accuracy: 1.0


**Performance metrics (sensitivity, specificity, precision, accuracy)**

**Sensitivity (True Positive Rate):** Sensitivity measures the proportion of actual positive cases that were correctly identified by the model.
Sensitivity = True Positives / (True Positives + False Negatives)

**Specificity (True Negative Rate):** Specificity measures the proportion of actual negative cases that were correctly identified by the model.
Specificity = True Negatives / (True Negatives + False Positives)

**Precision:** Precision measures the proportion of predicted positive cases that were actually positive.
Precision = True Positives / (True Positives + False Positives)

**Accuracy:** Accuracy measures the proportion of correctly classified cases out of all cases.
Accuracy = (True Positives + True Negatives) / Total
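The four formulas can be cross-checked against scikit-learn's metric functions. The sketch below uses hypothetical labels (not the notebook's arrays) chosen so that TP, TN, FP and FN are all non-zero, which makes each formula visible; it also shows that specificity is simply recall computed with `Ham` as the positive label.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hypothetical labels chosen so every cell of the confusion matrix is non-zero
actual = np.array(['Spam', 'Spam', 'Ham', 'Ham', 'Spam'])
predicted = np.array(['Spam', 'Ham', 'Ham', 'Spam', 'Spam'])

# With labels=['Ham', 'Spam'] the matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(actual, predicted, labels=['Ham', 'Spam']).ravel()

# The formulas from the text
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
precision = tp / (tp + fp)
accuracy = (tp + tn) / len(actual)

# Library equivalents; specificity is the recall of the negative class
lib = {
    'sensitivity': recall_score(actual, predicted, pos_label='Spam'),
    'specificity': recall_score(actual, predicted, pos_label='Ham'),
    'precision': precision_score(actual, predicted, pos_label='Spam'),
    'accuracy': accuracy_score(actual, predicted),
}

print(tp, tn, fp, fn)  # 2 1 1 1
print(sensitivity, specificity, precision, accuracy)
print(lib)
```

Here sensitivity and precision are both 2/3, specificity is 1/2 and accuracy is 3/5, and the library values agree with the hand-written formulas.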

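The output shows the class predicted for the new data point, followed by four metrics. These metrics are computed from the hard-coded `actual_classes` and `predicted_classes` arrays, not from the model's predictions on `data.csv`; because the two arrays are identical, every metric comes out as 1.0 regardless of how well the model actually performs.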

**Predicted class for the new data point: ['Spam']:**

* The model predicts that the new data point belongs to the "Spam" class.
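To make this prediction less of a black box, here is a minimal from-scratch sketch of how Bernoulli Naive Bayes scores a binary feature vector, checked against `BernoulliNB`. The six training rows are hypothetical (they are not `data.csv`), and the sketch assumes 0/1 features, Laplace smoothing with `alpha=1.0`, and class priors estimated from the data, which are the `BernoulliNB` defaults.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary data (1 = word present). Columns: Lottery, Discount, Sale, Offer, Free
X = np.array([
    [1, 0, 0, 0, 1],  # Spam
    [1, 1, 0, 0, 0],  # Spam
    [1, 0, 0, 1, 0],  # Spam
    [0, 1, 1, 0, 0],  # Ham
    [0, 0, 1, 1, 0],  # Ham
    [0, 1, 0, 0, 1],  # Ham
])
y = np.array(['Spam', 'Spam', 'Spam', 'Ham', 'Ham', 'Ham'])
x_new = np.array([1, 0, 0, 0, 0])  # Lottery=Yes, everything else No


def bernoulli_nb_posterior(X, y, x, alpha=1.0):
    """Posterior P(class | x) for Bernoulli Naive Bayes with Laplace smoothing."""
    classes = np.unique(y)  # sorted, same order as BernoulliNB.classes_
    log_joint = []
    for c in classes:
        Xc = X[y == c]
        log_prior = np.log(len(Xc) / len(X))
        # Smoothed P(feature_j = 1 | c) = (count_j + alpha) / (n_c + 2 * alpha)
        p = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)
        # Absent features contribute log(1 - p); this is what distinguishes Bernoulli NB
        log_lik = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
        log_joint.append(log_prior + log_lik)
    log_joint = np.array(log_joint)
    # Normalize in log space for numerical stability
    post = np.exp(log_joint - log_joint.max())
    return classes, post / post.sum()


classes, manual = bernoulli_nb_posterior(X, y, x_new)
model = BernoulliNB(alpha=1.0).fit(X, y)

print(classes[np.argmax(manual)])           # Spam
print(model.predict(x_new.reshape(1, -1)))  # ['Spam']
```

With this toy data the posterior for Spam works out to 12/13 (about 0.92): the single Lottery word outweighs the absence of the other words, because no Ham row contains Lottery.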

**Sensitivity: 1.0:**

* The sensitivity (true positive rate) is 1.0 (6/6): all six actual positive cases (Spam) in `actual_classes` were predicted as Spam, so there are no false negatives.

**Specificity: 1.0:**

* The specificity (true negative rate) is 1.0 (4/4): all four actual negative cases (Ham) were predicted as Ham, so no Ham message was wrongly flagged as Spam.

**Precision: 1.0:**

* The precision is 1.0 (6/6): every case predicted as "Spam" was actually Spam, so there are no false positives.

**Accuracy: 1.0:**

* The accuracy is 1.0 (10/10): all ten cases in `actual_classes` were classified correctly, covering both "Spam" and "Ham" messages.
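Metrics that reflect the trained model need its predictions on data it was not fitted on. The sketch below holds out 30% of the rows and computes the same four metrics on that split. It is an assumption-laden stand-in: the dataset is synthetic (each of five hypothetical keyword features appears with probability 0.7 in Spam and 0.2 in Ham), and the features are generated directly as 0/1 integers instead of going through the Yes/No strings and `OneHotEncoder`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Hypothetical synthetic stand-in for data.csv (5 binary keyword features)
rng = np.random.default_rng(0)
n = 200
y = rng.choice(['Ham', 'Spam'], size=n)
p_present = np.where(y == 'Spam', 0.7, 0.2)[:, None]  # per-row presence probability
X = (rng.random((n, 5)) < p_present).astype(int)

# Hold out 30% of rows; stratify keeps the Spam/Ham ratio similar in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = BernoulliNB().fit(X_train, y_train)
y_pred = model.predict(X_test)  # predictions on rows the model never saw

# With labels=['Ham', 'Spam'] the matrix is [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=['Ham', 'Spam']).ravel()
sensitivity = tp / (tp + fn) if tp + fn > 0 else 0.0
specificity = tn / (tn + fp) if tn + fp > 0 else 0.0
precision = tp / (tp + fp) if tp + fp > 0 else 0.0
accuracy = (tp + tn) / len(y_test)

print(f"Sensitivity: {sensitivity:.3f}")
print(f"Specificity: {specificity:.3f}")
print(f"Precision:   {precision:.3f}")
print(f"Accuracy:    {accuracy:.3f}")
```

On real data, replacing the synthetic `X` and `y` with the encoded features and the `Category` column would give test-set metrics; for small datasets, cross-validation is a common alternative to a single split.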