# 4.2.1.3 Naive Bayes Classifier

## Introduction

- [Naive Bayes Classification Algorithm](https://medium.com/aws-tip/naive-bayes-classification-algorithm-b421438c50c6)
  
The Naive Bayes classifier is a probabilistic machine learning model based on Bayes' theorem. It assumes that, given the class label, each feature is independent of every other feature. Despite this "naive" assumption, Naive Bayes classifiers often perform well in practice, especially in text classification and spam filtering. Key points include:

- **Bayes' Theorem**: Naive Bayes classifiers compute the posterior probability of each class from the class prior and the conditional probabilities of the features given that class, then predict the class with the highest posterior.
- **Independence Assumption**: Features are assumed to be conditionally independent given the class label, which simplifies the computation.
- **Types**: Common types include Gaussian Naive Bayes (for continuous features assuming a Gaussian distribution), Multinomial Naive Bayes (for discrete features, commonly used in text classification), and Bernoulli Naive Bayes (for binary features).
- **Scalability**: Naive Bayes classifiers are computationally efficient and can scale well with large datasets and high-dimensional feature spaces.
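
The first two points combine into one rule: the posterior is proportional to the prior times the product of the per-feature likelihoods, P(y | x) ∝ P(y) · Π P(x_i | y). The block below is a minimal from-scratch sketch of the Gaussian variant, written only to make that rule concrete. The function names, the synthetic toy data, and the small `1e-9` variance floor are illustrative assumptions, not part of any library API.

```python
import numpy as np


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-class, per-feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small floor (an illustrative choice) to avoid division by zero.
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances


def predict_proba_gaussian_nb(X, priors, means, variances):
    """Return P(class | x) for each row of X, computed in log space."""
    # diff has shape (n_samples, n_classes, n_features)
    diff = X[:, None, :] - means[None, :, :]
    # Independence assumption: the log-likelihood is a SUM over features
    # of one-dimensional Gaussian log-densities.
    log_likelihood = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances)[None, :, :]
        + diff ** 2 / variances[None, :, :],
        axis=2,
    )
    # Bayes' theorem: log prior + log likelihood, then normalize.
    log_joint = np.log(priors)[None, :] + log_likelihood
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    joint = np.exp(log_joint)
    return joint / joint.sum(axis=1, keepdims=True)


# Synthetic toy data: two well-separated 2-D Gaussian clusters.
rng = np.random.default_rng(0)
X_toy = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 2)),
    rng.normal(4.0, 1.0, size=(50, 2)),
])
y_toy = np.array([0] * 50 + [1] * 50)

classes, priors, means, variances = fit_gaussian_nb(X_toy, y_toy)
queries = np.array([[0.0, 0.0], [4.0, 4.0]])
proba = predict_proba_gaussian_nb(queries, priors, means, variances)
toy_pred = classes[proba.argmax(axis=1)]
print(toy_pred)
```

Working in log space turns the product of likelihoods into a sum, which avoids numerical underflow when there are many features.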

## Applications

Naive Bayes classifiers are particularly suitable for:
- **Text Classification**: Classifying documents or emails into categories based on word frequencies.
- **Spam Detection**: Identifying spam emails based on word occurrences and patterns.
- **Sentiment Analysis**: Analyzing sentiment in text data, such as social media posts or customer reviews.
- **Medical Diagnosis**: Classifying medical records into disease categories based on symptoms and patient data.
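
As a rough illustration of the text classification and spam detection use cases, the sketch below feeds word counts into a Multinomial Naive Bayes model. The six-message corpus and its labels are made up for this example, and the pipeline is one plausible setup rather than a recommended production configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus (invented for illustration only).
texts = [
    "win free money now",
    "free prize claim now",
    "claim your free money prize",
    "meeting schedule for monday",
    "project report attached for review",
    "lunch meeting on monday",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# CountVectorizer turns each message into word counts;
# MultinomialNB models those counts per class.
spam_clf = make_pipeline(CountVectorizer(), MultinomialNB())
spam_clf.fit(texts, labels)

new_messages = ["free money prize", "monday project meeting"]
predictions = spam_clf.predict(new_messages)
print(list(zip(new_messages, predictions)))
```

With so few messages the model only memorizes which words appear in which class; a real spam filter would need a far larger labeled corpus.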



```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report
```

```python
# Load the dataset
iris = load_iris()
X = iris.data
y = iris.target
```

```python
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

```python
# Initialize the Naive Bayes classifier (Gaussian)
clf = GaussianNB()

# Train the classifier on the training data
clf.fit(X_train, y_train)

# Make predictions on the test data
y_pred = clf.predict(X_test)
```

```python
# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

# Generate and print the classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

```text
Accuracy: 1.0

Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
```
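
A perfect score on a single 30-sample test split can be optimistic, so it may be worth checking how stable the result is. One common option is k-fold cross-validation. The sketch below uses `cross_val_score` with `cv=5`; the fold count is an arbitrary choice that is not part of the original walkthrough.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Reload the data so this block runs on its own.
X_all, y_all = load_iris(return_X_y=True)

# Fit and score GaussianNB on 5 different train/test partitions.
cv_scores = cross_val_score(GaussianNB(), X_all, y_all, cv=5)

print("Per-fold accuracy:", cv_scores)
print(f"Mean accuracy: {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")
```

The mean and spread across folds usually give a more reliable picture than one split.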



## Conclusion

Naive Bayes classifiers are simple yet powerful probabilistic models that assume feature independence given the class label. Key points to summarize:

- **Efficiency**: Naive Bayes classifiers are computationally efficient and need relatively little training data to estimate their parameters.
- **Performance**: Despite the "naive" assumption of feature independence, Naive Bayes classifiers often perform well in practice, especially in text and categorical data domains.
- **Types**: Different types of Naive Bayes classifiers (e.g., Gaussian, Multinomial, Bernoulli) are suited to different types of data distributions.
- **Applications**: Widely used in various applications including text classification, spam filtering, sentiment analysis, and medical diagnosis.

Naive Bayes classifiers are a versatile and effective choice for classification tasks, particularly for categorical or text data where the feature independence assumption holds reasonably well.
