Naive Bayes Algorithm

Naive Bayes is a family of probabilistic machine learning algorithms based on Bayes' theorem. It's a popular algorithm for classification tasks, especially when dealing with high-dimensional data.

How Naive Bayes Works:

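1. Bayes' Theorem: Naive Bayes is based on Bayes' theorem, which expresses the probability of a class given the observed features in terms of the class prior and the likelihood of the features under that class.
2. Independence Assumption: Naive Bayes assumes that features are conditionally independent of each other given the class, so the joint likelihood factors into a product of per-feature likelihoods.
3. Calculate Probabilities: For each class, the algorithm multiplies the class prior by the per-feature likelihoods to obtain a score proportional to the probability of that class given the input features.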
4. Predict Class: The algorithm predicts the class with the highest probability.
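
The four steps above can be sketched from scratch for the Gaussian case. This is a minimal illustration, not scikit-learn's implementation: the function names, the toy data, and the small constant added to each variance are assumptions made for the example. Scores are computed in log space so that multiplying many small likelihoods does not underflow.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Assumed smoothing constant: keeps the variance strictly positive
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model

def log_posteriors(model, x):
    """Unnormalized log P(class | x): log prior plus summed per-feature log-likelihoods."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[label] = score
    return scores

def predict(model, x):
    """Return the class with the highest score."""
    scores = log_posteriors(model, x)
    return max(scores, key=scores.get)

# Hypothetical toy data: two features, two well-separated classes
X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y = ["a", "a", "a", "b", "b", "b"]
model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]))
print(predict(model, [3.1, 4.0]))
```

The sum inside `log_posteriors` is where the independence assumption enters: each feature contributes its own term, with no interaction between features.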

Types of Naive Bayes:

1. Gaussian Naive Bayes: Used for continuous features, assuming each feature follows a Gaussian distribution within each class.
2. Multinomial Naive Bayes: Used for discrete count features, such as word counts in text classification.
3. Bernoulli Naive Bayes: Used for binary features.
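
The three variants differ only in how they model the likelihood of the features within a class. The sketch below writes out each log-likelihood in plain Python; the function names and example values are hypothetical, and the per-class parameters (mean, variance, feature probabilities) are assumed to have been estimated already.

```python
import math

def gaussian_log_likelihood(x, mean, var):
    """Log density of one continuous feature under a class-specific Gaussian."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def multinomial_log_likelihood(counts, feature_probs):
    """Log-likelihood of a count vector, omitting the multinomial coefficient.

    The coefficient is the same for every class, so it does not change
    which class scores highest.
    """
    return sum(c * math.log(p) for c, p in zip(counts, feature_probs))

def bernoulli_log_likelihood(bits, feature_probs):
    """Log-likelihood of a binary vector; absent features (0) also contribute."""
    return sum(math.log(p) if b else math.log(1 - p)
               for b, p in zip(bits, feature_probs))

print(gaussian_log_likelihood(0.0, mean=0.0, var=1.0))
print(multinomial_log_likelihood([2, 0, 1], [0.5, 0.25, 0.25]))
print(bernoulli_log_likelihood([1, 0, 1], [0.5, 0.25, 0.25]))
```

Note the difference between the last two: the multinomial model ignores a feature whose count is zero, while the Bernoulli model explicitly scores its absence.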

Advantages:

1. Easy to Implement: Naive Bayes is a simple algorithm to implement.
2. Fast Training: Naive Bayes is relatively fast to train, especially compared to more complex algorithms.
3. Handling High-Dimensional Data: Naive Bayes can handle high-dimensional data with a large number of features.

Disadvantages:

1. Independence Assumption: Real-world features are rarely independent given the class, and accuracy can suffer when the assumption is badly violated.
2. Sensitive to Feature Correlation: Correlated features are effectively counted more than once, which can make the predicted probabilities overconfident and hurt performance.
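
The effect of correlated features can be seen in a small calculation. The sketch below assumes the extreme case of a perfectly correlated feature, that is, the same feature included several times; the function name and the probabilities are made up for illustration.

```python
def posterior_positive(prior_pos, lik_pos, lik_neg, copies):
    """P(positive | evidence) when one observed feature is counted `copies` times."""
    pos = prior_pos * lik_pos ** copies
    neg = (1 - prior_pos) * lik_neg ** copies
    return pos / (pos + neg)

# Assumed values: equal priors, and the feature is twice as likely under "positive"
for copies in (1, 2, 3):
    print(copies, round(posterior_positive(0.5, 0.8, 0.4, copies), 3))
```

The duplicated copies add no new information, so the posterior should stay at its single-copy value. Because Naive Bayes treats each copy as independent evidence, the posterior instead moves closer to 1 with every copy.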

Applications:

1. Text Classification: Naive Bayes is often used for text classification tasks, such as spam detection and sentiment analysis.
2. Image Classification: Naive Bayes can be used for image classification tasks, such as object recognition.
3. Recommendation Systems: Naive Bayes can be used in recommendation systems to predict user preferences.

When to Use:

1. Simple Classification Tasks: Naive Bayes is a strong baseline for simple classification tasks, including those with only a few features.
2. High-Dimensional Data: Naive Bayes is suitable for high-dimensional data with a large number of features.
3. Fast and Efficient: Naive Bayes is a good choice when speed and efficiency are important.

Naive Bayes is a popular and effective algorithm for classification tasks, especially when dealing with high-dimensional data. However, it's essential to consider the independence assumption and feature correlation when applying Naive Bayes to real-world problems.

In [7]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from sklearn.naive_bayes import GaussianNB, BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix


In [8]:
data = sns.load_dataset('iris')
X = data.drop('species', axis=1)
y = data['species']

In [9]:
test_size = 0.2
random_state = 42
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=random_state)

In [10]:
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f'Accuracy: {accuracy*100:.2f}%')
print('Confusion Matrix:')
print(cm)

Accuracy: 100.00%
Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]


In [None]:
model = BernoulliNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f'Accuracy: {accuracy*100:.2f}%')
print('Confusion Matrix:')
print(cm)
# BernoulliNB gives poor results here because it expects binary (0/1, yes/no) features.
# GaussianNB works well with continuous data such as height, weight, or age.

Accuracy: 30.00%
Confusion Matrix:
[[ 0 10  0]
 [ 0  9  0]
 [ 0 11  0]]


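BernoulliNB gives poor results here because it expects binary features such as yes/no or 0/1. With its default binarize=0.0 threshold, every iris measurement is positive and is therefore mapped to 1, so the features no longer distinguish the classes and the model predicts the same class for every sample, as the confusion matrix shows.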
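
To see why the features collapse, the sketch below mimics thresholding in plain Python: values above the threshold become 1 and all others become 0. The `binarize` helper here is a hypothetical stand-in written for this example, the rows are illustrative iris-like measurements, and the alternative threshold of 2.5 is arbitrary.

```python
def binarize(rows, threshold=0.0):
    """Map each value to 1 if it is greater than the threshold, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in rows]

# Illustrative rows: sepal length, sepal width, petal length, petal width (cm)
rows = [
    [5.1, 3.5, 1.4, 0.2],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
]

# With threshold 0.0 every positive measurement becomes 1
print(binarize(rows))

# A different (arbitrary) threshold keeps at least some variation between rows
print(binarize(rows, threshold=2.5))
```

With the threshold at 0.0 all rows become identical, so no classifier could separate them. A single global threshold is still a crude encoding for measurements on different scales, which is why GaussianNB is the more natural fit for this dataset.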