## **Naive Bayes Algorithm:**
**A Brief Overview**

Naive Bayes is a probabilistic machine learning algorithm used for classification tasks. It’s widely used in applications such as spam filtering, document classification, and sentiment prediction. The “naive” in the name refers to the assumption that the features are conditionally independent of one another given the class. In other words, once the class is known, the value of one feature tells us nothing further about the value of any other feature.

**Key Concepts:**

**Conditional Probability:** Before we delve into Naive Bayes, let’s understand conditional probability: the probability of an event given that another event is known to have occurred.

Consider these examples:

**Coin Toss and Fair Dice:** When you flip a fair coin, heads and tails are equally likely (50% each). Similarly, when rolling a fair six-sided die, the probability of getting a 1 is 1/6 (approximately 0.167).


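**Playing Cards Example:** If you pick a card from a standard 52-card deck, what’s the probability of getting a queen given that the card is a spade? Conditional probability answers such questions: of the 13 spades, exactly one is a queen, so P(queen | spade) = 1/13 (approximately 0.077).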
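
The card question can be checked by brute-force enumeration. The snippet below is a small illustrative sketch, not part of the original notebook: it builds a standard 52-card deck and applies the definition P(queen | spade) = P(queen and spade) / P(spade). The variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))

# Count the outcomes needed for the conditional probability.
spades = [card for card in deck if card[1] == "spades"]
queen_and_spade = [card for card in spades if card[0] == "Q"]

p_spade = Fraction(len(spades), len(deck))
p_queen_and_spade = Fraction(len(queen_and_spade), len(deck))

# Definition of conditional probability: P(A | B) = P(A and B) / P(B).
p_queen_given_spade = p_queen_and_spade / p_spade
print(p_queen_given_spade)  # 1/13
```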

**Bayes Rule:** The Naive Bayes algorithm is based on Bayes’ Theorem. It allows us to update our beliefs about an event based on new evidence. The formula for Bayes’ Theorem is:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

- $P(A \mid B)$: Posterior probability of event A given evidence B.
- $P(B \mid A)$: Likelihood of evidence B given event A.
- $P(A)$: Prior probability of event A, before seeing the evidence.
- $P(B)$: Marginal probability of evidence B, across all possible events.
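
A short numeric example may help make the formula concrete. The sketch below uses made-up numbers for a toy spam filter (a 20% prior for spam, and the word "free" appearing in 60% of spam and 5% of non-spam messages); the probabilities and the `posterior` helper are assumptions for illustration only. The evidence term P(B) is expanded with the law of total probability.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A | B) using Bayes' theorem for a two-outcome event A."""
    # P(B) = P(B | A) P(A) + P(B | not A) P(not A)
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1 - prior_a))
    return likelihood_b_given_a * prior_a / evidence


# Hypothetical numbers: P(spam) = 0.2, P("free" | spam) = 0.6, P("free" | not spam) = 0.05
p_spam_given_free = posterior(0.2, 0.6, 0.05)
print(f"P(spam | 'free') = {p_spam_given_free:.2f}")  # P(spam | 'free') = 0.75
```

Seeing the word raises the belief in "spam" from the 20% prior to a 75% posterior, which is the "updating beliefs" step that Naive Bayes repeats for every feature.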

**Gaussian Naive Bayes:**
- GNB is the variant commonly used for continuous features.
- It assumes that, within each class, every feature follows a normal (Gaussian) distribution.
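
To show what the Gaussian assumption means in practice, here is a minimal from-scratch sketch using only the standard library. It estimates a prior plus a per-feature mean and variance for each class, then picks the class that maximizes log prior plus the sum of per-feature Gaussian log-likelihoods. The function names, the toy data, and the small `1e-9` variance floor are assumptions made for this illustration; this is not how scikit-learn's `GaussianNB` is implemented internally.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate (prior, means, variances) for each class label."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Assumed variance floor keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_gnb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        scores[label] = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(row, means, variances))
    return max(scores, key=scores.get)


# Toy data, made up for illustration: two features, two classes.
X_toy = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y_toy = ["a", "a", "a", "b", "b", "b"]
toy_model = fit_gaussian_nb(X_toy, y_toy)
print(predict_gnb(toy_model, [1.1, 2.0]), predict_gnb(toy_model, [3.1, 4.0]))  # a b
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow when there are many features.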

**Multinomial Naive Bayes:**
- MNB is commonly used for text classification tasks where we deal with discrete data like word counts in documents.
- It assumes that the features represent discrete frequencies or counts of events.
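
The count-based idea can be sketched from scratch as well. The example below is an illustrative assumption, not the notebook's code: it trains on four made-up documents, uses add-one (Laplace) smoothing via a hypothetical `alpha` parameter so unseen words do not produce zero probabilities, and ignores words that never appeared in training.

```python
import math
from collections import Counter, defaultdict


def fit_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed per-class word log-likelihoods."""
    vocab = {word for doc in docs for word in doc.split()}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())

    model = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        log_prior = math.log(class_counts[label] / len(docs))
        # Smoothed estimate: (count + alpha) / (total + alpha * |vocab|)
        log_likelihood = {
            word: math.log((word_counts[label][word] + alpha)
                           / (total_words + alpha * len(vocab)))
            for word in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model


def predict_mnb(model, doc):
    """Return the class with the highest log posterior for a document."""
    scores = {}
    for label, (log_prior, log_likelihood) in model.items():
        # Words outside the training vocabulary are skipped.
        scores[label] = log_prior + sum(
            log_likelihood[w] for w in doc.split() if w in log_likelihood)
    return max(scores, key=scores.get)


# Toy corpus, made up for illustration.
docs = ["win free prize now", "free cash offer",
        "meeting agenda attached", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
text_model = fit_multinomial_nb(docs, labels)
print(predict_mnb(text_model, "free prize offer"))             # spam
print(predict_mnb(text_model, "agenda for tomorrow meeting"))  # ham
```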

# **The goal is to predict the Iris species from the four sepal and petal measurements.**

In [1]:
# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

In [2]:
df = sns.load_dataset('iris')
df.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [3]:
# Separate the feature columns (X) from the target column (y)
X = df.drop("species", axis=1)
y = df["species"]

In [4]:
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Naive Bayes classifier
nb_classifier = GaussianNB()

# Train the model
nb_classifier.fit(X_train, y_train)

# Make predictions
y_pred = nb_classifier.predict(X_test)

In [5]:
# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred) * 100
print(f"Accuracy: {accuracy:.2f} %")

Accuracy: 97.78 %
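
Accuracy alone does not show which species get confused with each other. The sketch below computes the other metrics imported at the top of the notebook (precision, recall, F1, and the confusion matrix). To stay self-contained it refits the model, and it assumes scikit-learn's bundled `load_iris` copy of the dataset as a stand-in for `sns.load_dataset('iris')`; the macro averaging is also a choice made here, not something the notebook specifies.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: scikit-learn's bundled Iris data stands in for the seaborn copy.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Macro averaging gives each of the three species equal weight.
precision = precision_score(y_test, y_pred, average="macro")
recall = recall_score(y_test, y_pred, average="macro")
f1 = f1_score(y_test, y_pred, average="macro")
print(f"Precision: {precision:.3f}  Recall: {recall:.3f}  F1: {f1:.3f}")

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

Off-diagonal entries in the confusion matrix are the misclassified test samples, so they show directly which pairs of species the model mixes up.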


In [6]:
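# Wrap the sample in a DataFrame with the training column names,
# so scikit-learn does not warn about missing feature names
sample_data = pd.DataFrame([[5.0, 3.6, 1.4, 0.2]], columns=X.columns)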
predicted_class = nb_classifier.predict(sample_data)
print(f'Predicted class for sample data: {predicted_class}')

Predicted class for sample data: ['setosa']




### **Conclusion**
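In this run, the Gaussian Naive Bayes classifier reached 97.78% accuracy on the 30% held-out test split of the Iris dataset. Accuracy of around 96-98% is typical for this model on Iris, although the exact figure depends on how the data is split.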