# Naive Bayes Classifiers

Naive Bayes classifiers are a family of classifiers that are quite similar to linear classification models such as LinearSVC and LogisticRegression. However, they tend to be even faster to train. The price paid for this efficiency is that naive Bayes models often provide generalization performance that is slightly worse than that of linear classifiers like LogisticRegression and LinearSVC.

Naive Bayes models are so efficient because they learn parameters by looking at each feature individually, collecting simple per-class statistics from each feature.

There are three kinds of naive Bayes classifiers implemented in scikit-learn:
* GaussianNB (for continuous data)
* BernoulliNB (for binary data)
* MultinomialNB (for count data, that is, each feature represents an integer count of something, like how often a word appears in a sentence)

BernoulliNB and MultinomialNB are mostly used in text data classification, and we will revisit them in Chapter 7 (Text Data).

#### BernoulliNB
The BernoulliNB classifier counts, for each class, how often each feature is nonzero. This is most easily understood with an example:
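```
import numpy as np

# Four data points with four binary features each
X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
# Class labels: rows 0 and 2 belong to class 0, rows 1 and 3 to class 1
y = np.array([0, 1, 0, 1])
```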
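For the toy arrays above, the per-class counts can be computed directly with NumPy. This is a minimal sketch of the counting step only, not of scikit-learn's internals; the name `counts` is illustrative.

```python
import numpy as np

X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])

# For each class, count how often each feature is nonzero.
counts = {}
for label in np.unique(y):
    # Select the rows of this class, mark nonzero entries, and sum per column.
    counts[int(label)] = (X[y == label] != 0).sum(axis=0)

print(counts)
```

For class 0 (the first and third rows), the second feature is nonzero once and the fourth feature twice, giving `[0, 1, 0, 2]`. For class 1 (the second and fourth rows), the counts are `[2, 0, 2, 1]`.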
The other two naive Bayes models, MultinomialNB and GaussianNB, are slightly different in what kinds of statistics they compute:

#### MultinomialNB
MultinomialNB takes into account the average value of each feature for each class.

#### GaussianNB
GaussianNB stores the average value as well as the standard deviation of each feature for each class.
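
The per-class statistics described for MultinomialNB and GaussianNB can be sketched with NumPy. The small continuous dataset and the names `X_cont`, `y_cont`, `class_means`, and `class_stds` are made up for illustration; the sketch shows the kind of statistics involved, not how scikit-learn computes or stores them.

```python
import numpy as np

# Hypothetical continuous data: four samples, two features, two classes.
X_cont = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 0.0],
                   [7.0, 2.0]])
y_cont = np.array([0, 0, 1, 1])

class_means = {}
class_stds = {}
for label in np.unique(y_cont):
    rows = X_cont[y_cont == label]
    # Average value of each feature within this class.
    class_means[int(label)] = rows.mean(axis=0)
    # Standard deviation of each feature within this class (population form, ddof=0).
    class_stds[int(label)] = rows.std(axis=0)

print(class_means)
print(class_stds)
```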


#### Predictions
**To make a prediction, a data point is compared to the statistics for each of the classes, and the best matching class is predicted.**
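
As a minimal sketch of this step, assuming scikit-learn is installed, the toy arrays from the BernoulliNB example can be passed to `fit`, after which `predict` returns the best-matching class for each query point. The query points in `X_new` are made up for illustration.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])

# Fit the model; this collects the per-class feature statistics.
clf = BernoulliNB()  # default alpha=1.0
clf.fit(X, y)

# Hypothetical query points to classify.
X_new = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0]])

pred = clf.predict(X_new)         # best-matching class per query point
proba = clf.predict_proba(X_new)  # per-class probabilities, one row per query point

print(pred)
print(proba)
```

Here the first query point matches the class 0 statistics more closely and the second matches class 1, so the predictions are 0 and 1 respectively.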

### Parameters

The **`MultinomialNB`** and **`BernoulliNB`** models have a **single parameter, alpha**, which controls model complexity. The algorithm adds alpha virtual data points to the data, each with positive values for all the features. This results in a “smoothing” of the statistics. A large alpha means more smoothing, resulting in less complex models. The algorithm's performance is relatively robust to the setting of alpha, meaning that setting alpha is not critical for good performance. However, tuning it usually improves accuracy somewhat.
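
The smoothing effect can be observed on the toy BernoulliNB data, assuming scikit-learn is installed. This sketch fits the model with a few arbitrarily chosen alpha values and reads the smoothed per-class feature probabilities from the fitted `feature_log_prob_` attribute; the dictionary name `feature_probs` is illustrative.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])

feature_probs = {}
for alpha in [0.1, 1.0, 10.0]:
    clf = BernoulliNB(alpha=alpha).fit(X, y)
    # feature_log_prob_ holds log P(feature is nonzero | class); exponentiate to get probabilities.
    feature_probs[alpha] = np.exp(clf.feature_log_prob_)
    print(alpha)
    print(feature_probs[alpha].round(2))
```

As alpha grows, the estimated probabilities are pulled toward 0.5, so the classes look more alike and the model becomes less complex.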

The **`GaussianNB`** model seems to be rarely used by practitioners, while the other two variants of naive Bayes are widely used for sparse count data such as text. `MultinomialNB` usually performs better than `BernoulliNB`, in particular on datasets with a relatively large number of nonzero features (i.e. large documents).


### Strengths and weaknesses
The naive Bayes models share many of the strengths and weaknesses of the linear models:
* They are very fast to train and to predict.
* The training procedure is easy to understand.
* The models work very well with high-dimensional sparse data, and are relatively robust to parameter settings.
* Naive Bayes models are great baseline models, and are often used on very large datasets, where training even a linear model might take too long.
