# Variants of Naive Bayes Algorithm


## Summary

* The Naive Bayes algorithm has three main variants: **Bernoulli**, **Multinomial**, and **Gaussian**.
* Which variant to use depends mainly on the type of data and the distribution of the independent features in the dataset.
* **Bernoulli Naive Bayes** is ideal when the independent features follow a Bernoulli distribution, meaning the features are binary or have only two possible outcomes (e.g., yes/no, pass/fail, true/false).
* **Multinomial Naive Bayes** is most often used for text classification problems, such as spam detection, where text data must first be converted into numerical vectors (typically word counts or frequencies) using Natural Language Processing (NLP) techniques.
* **Gaussian Naive Bayes** is used when the independent features contain continuous numerical values that follow (or can be transformed to follow) a Gaussian or normal distribution (a bell curve).



## Types of Naive Bayes Classifiers

### 1. Bernoulli Naive Bayes

**Bernoulli Naive Bayes** is used to solve classification problems when the independent features in a dataset follow a **Bernoulli distribution**. 

A Bernoulli distribution describes a scenario where there are only two possible outcomes. A classic real-world example is tossing a coin, which can only result in heads or tails.  In the context of a dataset, this means the features are binary. Examples of Bernoulli features include:

* Pass or Fail
* Yes or No
* Male or Female
* 0 or 1

If most of the features in a dataset have only two categories, Bernoulli Naive Bayes is usually a good fit, whether the target is binary or multi-class.
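
To make the Bernoulli assumption concrete, here is a minimal from-scratch sketch rather than a library implementation. The function names `fit_bernoulli_nb` and `predict_bernoulli_nb`, the Laplace smoothing parameter `alpha`, and the toy data are all assumptions made for illustration; none of them come from the text above.

```python
import math

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate class priors and smoothed P(feature = 1 | class) for binary features."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        prior = n / len(y)
        # Laplace-smoothed probability that each binary feature equals 1 in this class.
        probs = [(sum(col) + alpha) / (n + 2 * alpha) for col in zip(*rows)]
        model[label] = (prior, probs)
    return model

def predict_bernoulli_nb(model, x):
    """Return the class with the highest log-posterior for one binary feature vector."""
    scores = {}
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for value, p in zip(x, probs):
            # Bernoulli likelihood: p when the feature is 1, (1 - p) when it is 0.
            score += math.log(p if value == 1 else 1 - p)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy data: [passed_exam, attended_class] -> "admit" / "reject"
X = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
y = ["admit", "admit", "admit", "reject", "reject", "reject"]

model = fit_bernoulli_nb(X, y)
prediction = predict_bernoulli_nb(model, [1, 1])
print(prediction)
```

Note that a feature being absent (value 0) contributes `1 - p` to the score, so absence counts as evidence in this sketch.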

### 2. Multinomial Naive Bayes

**Multinomial Naive Bayes** is primarily used when the input data is in the form of **text**. A standard example of this is a **spam classification** model, where the algorithm evaluates the body of an email to predict if it is "Spam" or "Ham" (not spam). 

Because machine learning models cannot natively understand text sentences, the raw text data must first be converted into numerical values, or **vectors**. This conversion process utilizes **Natural Language Processing (NLP)** techniques.  Common NLP techniques for vectorizing text include:

* **Bag of Words (BoW)**
* **TF-IDF** (Term Frequency-Inverse Document Frequency)
* **Word2Vec**

These techniques turn each document into numbers, for example by counting how often each unique word appears in it. Multinomial Naive Bayes models this kind of non-negative count or frequency data, so it pairs naturally with Bag of Words and TF-IDF vectors. Dense embeddings such as Word2Vec can contain negative values and do not fit the multinomial model directly.
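
The pairing of word counts with a multinomial likelihood can be sketched in a few lines of plain Python. This is an illustrative sketch, not a production spam filter. The helper names, the whitespace tokenizer, the Laplace smoothing parameter `alpha`, and the four example messages are assumptions introduced here.

```python
import math
from collections import Counter

def fit_multinomial_nb(docs, labels, alpha=1.0):
    """Learn class priors and smoothed per-class word probabilities from word counts."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    model = {}
    for label in sorted(set(labels)):
        class_docs = [d for d, t in zip(docs, labels) if t == label]
        # Bag of Words: total count of each word across this class's documents.
        counts = Counter(word for d in class_docs for word in d.lower().split())
        total = sum(counts.values())
        word_probs = {
            w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab
        }
        model[label] = (len(class_docs) / len(docs), word_probs)
    return model

def predict_multinomial_nb(model, doc):
    """Score a document by adding one log word probability per word occurrence."""
    scores = {}
    for label, (prior, word_probs) in model.items():
        score = math.log(prior)
        for word in doc.lower().split():
            if word in word_probs:  # words outside the training vocabulary are skipped
                score += math.log(word_probs[word])
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training messages, invented for illustration.
docs = ["win free money now", "free prize win", "meeting at noon", "lunch at noon today"]
labels = ["spam", "spam", "ham", "ham"]

model = fit_multinomial_nb(docs, labels)
prediction = predict_multinomial_nb(model, "free money")
print(prediction)
```

Unlike the Bernoulli variant, a word that appears several times contributes to the score once per occurrence, which is why count-based vectors suit this model.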

### 3. Gaussian Naive Bayes

**Gaussian Naive Bayes** is implemented when the independent features contain **continuous values** that follow a **Gaussian distribution**. 

A Gaussian distribution, commonly known as a **normal distribution** or **bell curve**, describes continuous numerical data that clusters symmetrically around a mean. A classic example is the Iris dataset, where features like sepal length and petal width are continuous measurements. Other common examples of continuous features include:

* Age
* Height
* Weight
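
As a rough sketch of how continuous features like these are handled, the code below fits one mean and variance per feature and class, then scores a new sample with the Gaussian density. The function names, the small `var_floor` added to each variance for numerical safety, and the toy measurements are assumptions made for illustration only.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate class priors plus a (mean, variance) pair per feature and class."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        # Each (mean, variance) pair defines one bell curve for one feature in this class.
        stats = [(mean(col), pvariance(col) + var_floor) for col in zip(*rows)]
        model[label] = (len(rows) / len(y), stats)
    return model

def gaussian_log_pdf(x, mu, var):
    """Log of the normal density with mean mu and variance var, evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def predict_gaussian_nb(model, x):
    """Return the class with the highest log-posterior for one numeric feature vector."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mu, var) in zip(x, stats):
            score += gaussian_log_pdf(value, mu, var)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical measurements: [height_cm, weight_kg] -> "child" / "adult"
X = [[120.0, 25.0], [130.0, 30.0], [125.0, 28.0],
     [170.0, 65.0], [180.0, 80.0], [175.0, 72.0]]
y = ["child", "child", "child", "adult", "adult", "adult"]

model = fit_gaussian_nb(X, y)
prediction = predict_gaussian_nb(model, [172.0, 70.0])
print(prediction)
```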



Ideally each feature forms a bell curve, but Gaussian Naive Bayes can still be effective if the data is slightly left-skewed or right-skewed. If a feature follows a different distribution (such as an exponential distribution), it can often be brought closer to a normal distribution with a mathematical transformation, such as a logarithm or power transform, before applying the algorithm.
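
The effect of such a transformation can be checked numerically. The sketch below uses a synthetic right-skewed feature, measures its skewness, and measures it again after a log transform. The data is log-normal rather than exponential, an assumption chosen because the logarithm of a log-normal variable is exactly normal; other distributions may need a different transform. The `skewness` helper and the seed are likewise illustrative.

```python
import math
import random

def skewness(values):
    """Sample skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    avg = sum(values) / n
    m2 = sum((v - avg) ** 2 for v in values) / n
    m3 = sum((v - avg) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical right-skewed feature (log-normal, so all values are positive).
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]

# Log transform: pulls in the long right tail.
transformed = [math.log(v) for v in raw]

skew_before = skewness(raw)
skew_after = skewness(transformed)
print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
```

A skewness near zero after the transform suggests the feature is closer to symmetric, which is the shape the Gaussian variant assumes.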

If a dataset contains a mix of continuous features and multi-category (non-binary) features, Gaussian Naive Bayes is often a reasonable default, provided the categorical features are first encoded as numbers.