# Part 2.14: Supervised Learning - Naive Bayes Classifier

Naive Bayes classifiers are a family of simple probabilistic classifiers based on applying **Bayes' Theorem** with a 'naive' assumption: that the features are conditionally independent of each other, given the class.

### Bayes' Theorem
For two events `A` and `B` with `P(B) > 0`, it states: `P(A|B) = [P(B|A) * P(A)] / P(B)`

In our context, this translates to:
`P(class|features) = [P(features|class) * P(class)] / P(features)`

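The 'naive' assumption simplifies `P(features|class)`: because the features are assumed to be conditionally independent given the class, the joint likelihood factorizes into a product of per-feature terms, `P(x1|class) * P(x2|class) * ... * P(xn|class)`. The denominator `P(features)` is the same for every class, so it can be ignored when we only need the most probable class.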
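To make the formula concrete, here is a minimal sketch that applies Bayes' Theorem by hand. The class names, feature names, and all probabilities are made-up values chosen for illustration; they do not come from any dataset.

```python
# Hypothetical toy numbers: two classes and two binary features
priors = {"spam": 0.4, "ham": 0.6}

# P(feature is present | class), treated as conditionally independent given the class
likelihoods = {
    "spam": {"contains_offer": 0.8, "contains_link": 0.7},
    "ham": {"contains_offer": 0.1, "contains_link": 0.3},
}


def posterior(observed_features):
    """Return P(class | features) for features that were observed as present."""
    unnormalized = {}
    for cls, prior in priors.items():
        # Numerator of Bayes' Theorem: P(class) * product of P(feature | class)
        score = prior
        for feature in observed_features:
            score *= likelihoods[cls][feature]
        unnormalized[cls] = score
    # P(features) is the sum of the numerators over all classes
    evidence = sum(unnormalized.values())
    return {cls: score / evidence for cls, score in unnormalized.items()}


result = posterior(["contains_offer", "contains_link"])
print(result)
```

With these numbers the numerators are `0.4 * 0.8 * 0.7 = 0.224` for spam and `0.6 * 0.1 * 0.3 = 0.018` for ham, so the posterior for spam is `0.224 / 0.242`, roughly 0.93.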

In [1]:
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

### Types of Naive Bayes Classifiers
- **GaussianNB**: Assumes that, within each class, each feature follows a Gaussian (normal) distribution. Used for continuous data.
- **MultinomialNB**: Models discrete counts. Commonly used in text classification (e.g., counting word occurrences).
- **BernoulliNB**: Models binary/boolean features (e.g., whether a word appears in a document at all, regardless of how often).
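The difference between the count-based and binary variants can be sketched on a tiny, made-up word-count matrix. The vocabulary, counts, and labels below are hypothetical and only serve to show how the two estimators are called.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical word-count matrix: rows are documents, columns are counts of
# the words ["free", "win", "meeting", "report"]
X_counts = np.array([
    [3, 2, 0, 0],
    [2, 1, 0, 1],
    [0, 0, 2, 3],
    [0, 1, 3, 2],
])
y_labels = np.array([1, 1, 0, 0])  # 1 = spam, 0 = not spam (made-up labels)

# MultinomialNB models how often each word occurs
mnb = MultinomialNB().fit(X_counts, y_labels)

# BernoulliNB models only whether each word occurs;
# binarize=0.0 maps every count above 0 to 1 before fitting
bnb = BernoulliNB(binarize=0.0).fit(X_counts, y_labels)

new_doc = np.array([[2, 1, 0, 0]])
print("MultinomialNB prediction:", mnb.predict(new_doc))
print("BernoulliNB prediction:  ", bnb.predict(new_doc))
```

Both models use additive smoothing by default (`alpha=1.0`), so a word that never appears in a class during training does not force that class's probability to zero.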

### Training a Gaussian Naive Bayes Model

In [2]:
# We use GaussianNB because the Iris dataset features are continuous
gnb = GaussianNB()
gnb.fit(X_train, y_train)

print(f"Accuracy on test set: {gnb.score(X_test, y_test):.4f}")

Accuracy on test set: 0.9778


### When to Use Naive Bayes
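Despite its simplicity and the 'naive' assumption (which is often violated in reality), Naive Bayes often performs well in practice. Classification only requires that the correct class receives the highest posterior probability, so the predicted label can be right even when the estimated probabilities themselves are inaccurate.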

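- **Pros**: Very fast to train and predict, needs relatively little training data to estimate its parameters, works well with high-dimensional data (e.g., text), and provides probabilistic predictions (although the probabilities tend to be poorly calibrated and are best treated as rough scores).
- **Cons**: The independence assumption is a strong one. Correlated features are effectively counted more than once, so Naive Bayes may not be the best choice for problems where feature interactions are important.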
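One way to see the cost of the independence assumption is to feed the model redundant features. The sketch below is an illustrative experiment, not a benchmark: it appends an exact copy of every Iris feature, which adds no information but creates perfectly correlated columns, and compares how confident the model is on average.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Baseline: the four original features
base = GaussianNB().fit(X_train, y_train)
base_conf = base.predict_proba(X_test).max(axis=1).mean()

# Append a copy of every feature: no new information, perfectly correlated columns
X_train_dup = np.hstack([X_train, X_train])
X_test_dup = np.hstack([X_test, X_test])
dup = GaussianNB().fit(X_train_dup, y_train)
dup_conf = dup.predict_proba(X_test_dup).max(axis=1).mean()

print(f"Mean top-class probability, original features:   {base_conf:.4f}")
print(f"Mean top-class probability, duplicated features: {dup_conf:.4f}")
```

Because each duplicated feature is treated as independent evidence, its likelihood term is multiplied in twice, which pushes the predicted probabilities toward 0 and 1 even though nothing new was learned.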