# Naive Bayes algorithm

The Naive Bayes algorithm is a popular probabilistic classification algorithm based on Bayes' theorem with the "naive" assumption of feature independence. It's widely used for text classification, spam filtering, sentiment analysis, and other classification tasks in machine learning.

## How Naive Bayes works:



Bayes' Theorem: Naive Bayes is based on Bayes' theorem, which gives the probability of an event (here, a class label) conditioned on observed evidence (the features) by combining the prior probability of the event with the likelihood of the evidence under it.
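In symbols, the theorem can be written as follows. The notation is chosen here for illustration: $y$ stands for a class label and $x$ for an observed feature vector.

$$
P(y \mid x) = \frac{P(x \mid y)\, P(y)}{P(x)}
$$

Here $P(y)$ is the prior, $P(x \mid y)$ is the likelihood, $P(x)$ is the evidence, and $P(y \mid x)$ is the posterior. Because $P(x)$ is the same for every class, comparing classes only requires the numerator.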

Naive Assumption: Naive Bayes assumes that all features are conditionally independent of each other given the class label. Under this assumption the joint likelihood of the features factorizes into a product of per-feature terms, which keeps the algorithm computationally efficient.

Classification: Given a new data point, Naive Bayes uses Bayes' theorem to compute the posterior probability of each class label given the point's features, and predicts the class label with the highest posterior probability.
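The three steps above can be sketched from scratch for continuous features. This is a minimal illustration, not the scikit-learn implementation: the function names `fit_gaussian_nb` and `predict_gaussian_nb`, the toy data, and the choice of a Gaussian per-feature likelihood are all assumptions made for the example.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and feature variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small constant keeps variances strictly positive (an illustrative choice)
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances

def predict_gaussian_nb(model, X):
    """Return the class with the highest (unnormalized) log-posterior."""
    classes, priors, means, variances = model
    # Per-feature Gaussian log-likelihoods, shape (n_samples, n_classes, n_features)
    diff = X[:, None, :] - means[None, :, :]
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None])
    # Naive assumption: features are independent given the class,
    # so the joint log-likelihood is the sum over features
    log_post = np.log(priors)[None, :] + log_lik.sum(axis=2)
    return classes[np.argmax(log_post, axis=1)]

# Made-up toy data: two well-separated classes in two dimensions
X_toy = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                  [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

model = fit_gaussian_nb(X_toy, y_toy)
preds = predict_gaussian_nb(model, np.array([[1.1, 2.1], [5.1, 5.9]]))
print(preds)  # [0 1]
```

Working in log space avoids numerical underflow when many small probabilities are multiplied together.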

## Pros of Naive Bayes:

Simple and Fast: Naive Bayes is easy to implement and computationally efficient, making it suitable for large datasets.

Handles High-Dimensional Data: It performs well even with a large number of features, making it suitable for text classification and other high-dimensional datasets.

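Requires Less Training Data: Because it only estimates per-feature, per-class parameters, Naive Bayes can work well with less training data than more flexible models.

Good with Categorical Data: Different variants cover different feature types, for example Gaussian Naive Bayes for continuous features and the multinomial, Bernoulli, and categorical variants for counts, binary indicators, and discrete categories, making the family versatile.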
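As a rough illustration of the categorical case, the sketch below uses scikit-learn's `CategoricalNB` together with `OrdinalEncoder`. The weather-style data and the variable names are made up for this example.

```python
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder

# Made-up toy data: two categorical features and a binary label
weather = pd.DataFrame({
    "outlook": ["sunny", "sunny", "sunny", "rainy", "rainy", "rainy"],
    "windy":   ["no", "no", "yes", "yes", "yes", "no"],
})
decision = ["play", "play", "play", "stay", "stay", "stay"]

# OrdinalEncoder maps each category to an integer code, which CategoricalNB expects
cat_model = make_pipeline(OrdinalEncoder(dtype=int), CategoricalNB())
cat_model.fit(weather, decision)

new_day = pd.DataFrame({"outlook": ["sunny"], "windy": ["no"]})
cat_prediction = cat_model.predict(new_day)[0]
print(cat_prediction)  # play
```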


## Cons of Naive Bayes:


Naive Assumption: The assumption of feature independence might not hold true in many real-world datasets, leading to suboptimal performance.
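One way to see the effect of a violated independence assumption is to duplicate a feature. The copy adds no information, yet the model counts its evidence twice and becomes more confident. The sketch below uses synthetic data generated for this illustration only.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Synthetic data: one informative feature, two equally sized classes
x0 = rng.normal(loc=0.0, scale=1.0, size=(200, 1))
x1 = rng.normal(loc=2.0, scale=1.0, size=(200, 1))
X_single = np.vstack([x0, x1])
y_syn = np.array([0] * 200 + [1] * 200)

# Duplicating the column adds no information but violates independence
X_dup = np.hstack([X_single, X_single])

point = np.array([[1.5]])
p_single = GaussianNB().fit(X_single, y_syn).predict_proba(point)[0, 1]
p_dup = GaussianNB().fit(X_dup, y_syn).predict_proba(np.hstack([point, point]))[0, 1]

print(f"P(class 1) with one copy:   {p_single:.3f}")
print(f"P(class 1) with two copies: {p_dup:.3f}")
```

The predicted class is unchanged here, but the probability moves further from 0.5, which is why Naive Bayes probabilities are often treated as poorly calibrated when features are correlated.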

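Sensitivity to Feature Distribution: Each variant assumes a particular form for the per-feature distribution (Gaussian Naive Bayes, for example, assumes each feature is normally distributed within a class), so skewed data or outliers can distort the estimated probabilities.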

Inability to Capture Complex Relationships: It cannot capture complex relationships between features, which may limit its performance on some datasets.

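Zero Probability Issue: If a feature value in the test data never occurred with a given class in the training data, its estimated likelihood for that class is zero. Since the per-feature likelihoods are multiplied, the posterior for that class becomes zero regardless of the other features, leading to inaccurate predictions. Additive (Laplace) smoothing is the usual remedy.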
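A small numeric sketch of the zero probability issue and of additive (Laplace) smoothing as a common remedy follows. The word counts, the vocabulary, and the helper name `word_likelihood` are made up for illustration.

```python
from collections import Counter

# Made-up word counts observed in "spam" training messages
spam_counts = Counter({"free": 3, "prize": 2, "money": 1})
vocabulary = ["free", "prize", "money", "meeting"]  # "meeting" never appears in spam

def word_likelihood(word, counts, vocab, alpha):
    """P(word | class) with additive smoothing; alpha=0 means no smoothing."""
    total = sum(counts.values())
    return (counts[word] + alpha) / (total + alpha * len(vocab))

unsmoothed = word_likelihood("meeting", spam_counts, vocabulary, alpha=0)
smoothed = word_likelihood("meeting", spam_counts, vocabulary, alpha=1)

print(unsmoothed)  # 0.0
print(smoothed)    # 0.1
```

In scikit-learn, the `alpha` parameter of `MultinomialNB`, `BernoulliNB`, and `CategoricalNB` plays this role.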

## When to use Naive Bayes:

Text Classification: Naive Bayes is widely used for text classification tasks, such as sentiment analysis, spam detection, and document categorization.
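A minimal sketch of spam detection with word counts and `MultinomialNB` could look like this. The messages and labels are made up, and a real application would need far more data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training messages and labels
messages = [
    "win a free prize now",
    "free money claim your prize",
    "meeting agenda for monday",
    "please review the project report",
]
labels = ["spam", "spam", "ham", "ham"]

# CountVectorizer turns each message into word counts; MultinomialNB models those counts
text_model = make_pipeline(CountVectorizer(), MultinomialNB())
text_model.fit(messages, labels)

text_predictions = text_model.predict(["claim your free prize", "monday project meeting"])
print(text_predictions)  # ['spam' 'ham']
```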

Simple and Fast Solutions: When you need a simple and fast solution for classification tasks, Naive Bayes can be a good choice.

Large Datasets: It works well with large datasets with many features, such as those encountered in natural language processing tasks.

Binary and Multiclass Classification: Naive Bayes can handle both binary and multiclass classification problems effectively.

# Code

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report

# Load the dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
column_names = ['sepal_length_cm', 'sepal_width_cm', 'petal_length_cm', 'petal_width_cm', 'species']

# The CSV has no header row, so the column names are supplied explicitly
df = pd.read_csv(url, names=column_names)

# Split the dataset into features and target variable
X = df.drop('species', axis=1)
y = df['species']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize Naive Bayes classifier
nb_classifier = GaussianNB()

# Train the classifier
nb_classifier.fit(X_train, y_train)

# Make predictions on the test data
y_pred = nb_classifier.predict(X_test)

# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_report(y_test, y_pred))


Accuracy: 1.0
Classification Report:
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30
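A perfect score on a single 30-sample test split can be optimistic, so a cross-validated estimate is a useful sanity check. The sketch below assumes scikit-learn's bundled copy of the Iris data (`load_iris`) as a stand-in for the CSV loaded above, and 5 folds is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Assumption: the bundled Iris dataset stands in for the CSV used above
X_iris, y_iris = load_iris(return_X_y=True)

# For classifiers, an integer cv uses stratified folds
cv_scores = cross_val_score(GaussianNB(), X_iris, y_iris, cv=5)

print("Fold accuracies:", cv_scores)
print("Mean accuracy:", cv_scores.mean())
```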



# Theory

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

## Experiment

![image.png](attachment:image.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)