# Naive Bayes

Naive Bayes is a family of probabilistic algorithms based on **Bayes' Theorem**, used for classification tasks in machine learning and statistics. It assumes independence among the features given the class label, which is why it is termed "naive." Despite this strong assumption of independence, Naive Bayes classifiers often perform surprisingly well in practice, especially for text classification tasks like spam detection and sentiment analysis.


### Bayes' Theorem

Bayes' theorem describes the probability of an event given knowledge of conditions related to that event.

If $X = \{x_1, x_2, \ldots, x_n\}$ is a set of features and $C$ is a class to which the target might belong, then the probability that the target belongs to $C$ given the features $X$ is:
$$
P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}
$$

Where:
- $P(C|X)$: Posterior probability of class $C$ given feature set $X$.
- $P(X|C)$: Likelihood of feature set $X$ given class $C$.
- $P(C)$: Prior probability of class $C$.
- $P(X)$: Marginal probability of feature set $X$.
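
As a small worked example of the formula above, the sketch below plugs numbers into Bayes' theorem for a two-class problem. The class names (`spam`, `ham`), the feature ("the message contains the word *free*"), and all probabilities are hypothetical values chosen only for illustration.

```python
# Hypothetical numbers, chosen for illustration only.
prior = {"spam": 0.2, "ham": 0.8}        # P(C)
likelihood = {"spam": 0.6, "ham": 0.05}  # P(X|C), X = "message contains 'free'"

# Marginal P(X): sum over all classes of P(X|C) * P(C)
evidence = sum(likelihood[c] * prior[c] for c in prior)

# Posterior P(C|X) from Bayes' theorem
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}

print(posterior)
```

With these assumed values the posterior for `spam` comes out at roughly 0.75, even though the prior was only 0.2, because the feature is far more likely under `spam` than under `ham`.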

### Naive Bayes Classifier
In the Naive Bayes classifier, the assumption of feature independence simplifies the computation of the likelihood:

$$
P(X|C) = P(x_1, x_2, \ldots, x_n | C) = P(x_1 | C) \cdot P(x_2 | C) \cdots P(x_n | C)
$$

Using this assumption, we can rewrite Bayes' Theorem for classification as:

$$
P(C|X) \propto P(C) \cdot P(X|C) = P(C) \cdot P(x_1 | C) \cdot P(x_2 | C) \cdots P(x_n | C)
$$

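To classify a new observation, **we evaluate $P(C) \cdot P(X|C)$ for each class and assign the class with the highest value**. The denominator $P(X)$ is the same for every class, so dropping it does not change which class wins.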
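
The decision rule can be sketched in a few lines of plain Python. The priors and conditional probabilities below are hypothetical, and the function name `classify` is an illustrative choice. The sketch sums logarithms instead of multiplying probabilities; this is a common numerical precaution rather than part of the formula above, and it leaves the winning class unchanged because the logarithm is monotonic.

```python
import math

# Hypothetical priors P(C) and per-feature conditionals P(x_i | C).
priors = {"spam": 0.4, "ham": 0.6}
cond_probs = {
    "spam": {"free": 0.30, "meeting": 0.05},
    "ham": {"free": 0.02, "meeting": 0.25},
}

def classify(features):
    """Return the class with the highest unnormalized log-posterior, plus all scores."""
    scores = {}
    for c in priors:
        # log P(C) + sum_i log P(x_i | C); logs avoid underflow in long products
        scores[c] = math.log(priors[c]) + sum(
            math.log(cond_probs[c][x]) for x in features
        )
    return max(scores, key=scores.get), scores

label, scores = classify(["free"])
print(label)
```

Calling `classify(["meeting"])` with the same assumed tables picks the other class, since that word is much more likely under `ham`.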

### Types of Naive Bayes Classifiers

- **Gaussian Naive Bayes**: Assumes that the **features follow a Gaussian (normal) distribution**. This is commonly used when features are continuous.
  
- **Multinomial Naive Bayes**: Used primarily for text classification where **features are represented as the frequencies of words or terms**. This classifier is effective for multinomially distributed data.

- **Bernoulli Naive Bayes**: Similar to Multinomial Naive Bayes but **assumes binary/boolean features**. It is suitable for datasets where features indicate the presence or absence of a characteristic.
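
To make the Gaussian variant more concrete, here is a minimal sketch of how a per-class likelihood might be computed for a single continuous feature: estimate a mean and variance per class, then evaluate the normal density at the new value. The data, the class names `A` and `B`, and the helper `gaussian_pdf` are hypothetical; this is not how any particular library implements it.

```python
import math
from statistics import mean, pvariance

# Hypothetical one-feature training data, grouped by class.
samples = {"A": [1.0, 1.2, 0.8, 1.1], "B": [3.0, 3.4, 2.9, 3.1]}

def gaussian_pdf(x, mu, var):
    """Density of a normal distribution N(mu, var) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# "Training": one (mean, variance) pair per class for this feature
params = {c: (mean(v), pvariance(v)) for c, v in samples.items()}

# Likelihood P(x | C) of a new value under each class
x_new = 1.3
likelihoods = {c: gaussian_pdf(x_new, mu, var) for c, (mu, var) in params.items()}

print(max(likelihoods, key=likelihoods.get))
```

With several features, the same density would be evaluated once per feature and the results multiplied together (along with the class prior), following the independence assumption.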

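```python
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn import datasets
```

```python
# Load the Iris dataset (4 continuous features, 3 classes)
dataset = datasets.load_iris()
X, y = dataset.data, dataset.target

# Hold out 30% of the samples for testing. No random_state is set,
# so the split (and the accuracy below) changes from run to run.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
```

```python
# Fit one Gaussian per feature and class
classifier = GaussianNB()
classifier.fit(X_train, y_train)
```

```python
y_predict = classifier.predict(X_test)

accuracy_score(y_test, y_pred=y_predict)
```

```
0.9777777777777777
```

```python
# Predict the species of a single new sample
# (sepal length, sepal width, petal length, petal width, in cm)
res = classifier.predict([[5, 2, 1, 4]])
print(dataset.target_names[res])
```

```
['virginica']
```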


| **Advantages of Naive Bayes** | **Limitations of Naive Bayes** |
|-------------------------------|-------------------------------|
| **Simplicity**: The model is easy to implement and understand. | **Independence Assumption**: The assumption that features are independent is often unrealistic, which can lead to poor performance in some cases. |
| **Efficiency**: Naive Bayes classifiers are computationally efficient and can handle large datasets effectively. | **Zero Probability Problem**: If a feature value never occurs together with a class in the training data, the estimated likelihood of that value for that class is zero, which forces the whole product, and hence the posterior for that class, to zero. This can be addressed with techniques like Laplace smoothing. |
| **Works Well with High-Dimensional Data**: Especially useful for text classification tasks with high-dimensional data. | |
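| **Handles Missing Data**: Because each feature contributes its own independent factor, a missing feature value can in principle be left out of the product at prediction time, although not every implementation supports this. | |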
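
The Laplace smoothing mentioned in the table can be illustrated with a short sketch. The word counts, the vocabulary, and the helper name `smoothed_prob` are hypothetical; the idea is simply to add a small constant `alpha` to every count so that unseen feature values no longer receive a probability of exactly zero.

```python
from collections import Counter

# Hypothetical word counts observed in "spam" training messages.
spam_counts = Counter({"free": 3, "win": 2, "now": 1})
vocabulary = ["free", "win", "now", "meeting"]  # "meeting" never appears in spam

def smoothed_prob(word, counts, vocab, alpha=1.0):
    """Estimate P(word | class) with additive smoothing; alpha=1 is Laplace smoothing."""
    total = sum(counts.values())
    return (counts[word] + alpha) / (total + alpha * len(vocab))

print(smoothed_prob("meeting", spam_counts, vocabulary, alpha=0.0))  # unsmoothed estimate
print(smoothed_prob("meeting", spam_counts, vocabulary))             # Laplace-smoothed estimate
```

Without smoothing, the unseen word gets probability zero and would wipe out the whole product for that class; with `alpha=1` it gets a small positive probability, and the estimates over the vocabulary still sum to one.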
