# Naive Bayes

Naive Bayes models are fast, simple classifiers that often work well on large, high-dimensional datasets. They have few tunable parameters, which makes them a good quick-and-dirty baseline to try first on a classification problem.

### How does it work

Naive Bayes builds on Bayes' theorem, which describes the relationship between conditional probabilities of statistical quantities. In classification, we want the probability of a label given a set of input features. This can be written as $ P\left( L\ |\ features \right) $ -> *Probability of L (label) given features*. It is the probability that $L$ is the class label for an input with those $features$.

Bayes' theorem expresses this in terms of quantities that can be computed more directly:

$$ P\left(L\ |\ features\right) = 
\frac{P\left(features\ |\ L\right)P\left(L\right)}{P\left(features\right)}
$$



To decide between two class labels ($L_1$ and $L_2$), we take the ratio of the probabilities of each label given the features. The shared denominator $P(features)$ cancels, leaving:

$$ \frac{P(L_1 | features)}{P(L_2 | features)} = \frac{P(features | L_1)}{P(features | L_2)} \frac{P(L_1)}{P(L_2)}$$
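
As a small numeric illustration of the ratio above, the sketch below uses made-up priors and likelihoods (the numbers and variable names are hypothetical and do not come from any dataset). It shows that the ratio can be computed without $P(features)$, and that with only two labels the individual probabilities can be recovered by normalizing.

```python
# Hypothetical values, chosen only for illustration.
prior = {"L1": 0.3, "L2": 0.7}        # P(L_i)
likelihood = {"L1": 0.8, "L2": 0.1}   # P(features | L_i)

# Unnormalized posteriors: P(features | L_i) * P(L_i)
joint = {label: likelihood[label] * prior[label] for label in prior}

# Ratio P(L1 | features) / P(L2 | features); P(features) is never needed.
ratio = joint["L1"] / joint["L2"]

# With only these two labels, P(features) is the sum of the joint terms.
evidence = sum(joint.values())
posterior = {label: joint[label] / evidence for label in joint}

# Pick the label with the larger posterior probability.
predicted = max(posterior, key=posterior.get)

print(f"ratio = {ratio:.3f}")
print(f"posterior = {posterior}")
print(f"predicted label = {predicted}")
```

A ratio greater than 1 favors $L_1$, and a ratio less than 1 favors $L_2$.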

To put these probability computations to work we need a model that computes $P(features | L_i)$, the probability of the features given each label. Such a model is called a generative model because it describes the hypothetical random process that generates the data.

Specifying such a generative model for each label is a very difficult task, and this is where the *naive* in naive Bayes comes from. If we make very naive assumptions about the generative process for each label, we can build a (very loose) approximation of the generative model for each class. With that in hand, the Bayesian classification can continue!
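
The naive assumption is commonly stated as conditional independence of the features given the label. Writing $features = (x_1, \dots, x_n)$, a notation introduced here only for this sketch, the likelihood then factorizes into one simple term per feature:

$$ P(features | L) = \prod_{i=1}^{n} P(x_i | L) $$

so that, up to the constant $P(features)$,

$$ P(L | features) \propto P(L) \prod_{i=1}^{n} P(x_i | L) $$

Each factor $P(x_i | L)$ only needs a one-dimensional model, which is what keeps training and prediction cheap.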

One of the simpler classifiers is Gaussian naive Bayes. It assumes that the data for each label is drawn from a simple Gaussian distribution (the normal bell curve distribution). Fitting it amounts to finding the mean and standard deviation of each feature for each label, which is shown with the rings surrounding both groupings.

<img src="./assets/gaussian-NB.png" alt="Gaussian naive Bayes grouping" style="width: 75%;"/>
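
To make the fitting step concrete, here is a minimal from-scratch sketch of Gaussian naive Bayes in plain Python. It is not how scikit-learn implements it. The function names (`fit_gaussian_nb`, `log_posterior`, `predict`), the toy data, and the small variance floor are all assumptions made for this example. It stores variances rather than standard deviations and works in log space to avoid numerical underflow.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each label."""
    grouped = defaultdict(list)
    for row, label in zip(X, y):
        grouped[label].append(row)

    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small floor (an assumption of this sketch) avoids division by zero.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = {"prior": n / len(X), "means": means, "variances": variances}
    return model


def log_posterior(model, x):
    """Unnormalized log P(L | x) for every label."""
    scores = {}
    for label, params in model.items():
        score = math.log(params["prior"])
        for v, m, var in zip(x, params["means"], params["variances"]):
            # Log of the one-dimensional Gaussian density for this feature.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        scores[label] = score
    return scores


def predict(model, x):
    """Return the label with the highest unnormalized log posterior."""
    scores = log_posterior(model, x)
    return max(scores, key=scores.get)


# Toy two-feature data, made up for illustration.
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]))
print(predict(model, [3.1, 4.0]))
```

Summing log densities per feature is the log-space form of multiplying the per-feature probabilities, which is where the naive independence assumption enters.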

### In Action

In [20]:
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
print(iris.data[:5])
print(iris.target[:5])

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]
[0 0 0 0 0]


In [21]:
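# Fit a Gaussian naive Bayes model on the full iris dataset
clf = GaussianNB()
clf.fit(iris.data, iris.target)

# Predict on the same data used for training, so this counts training errors
y_pred = clf.predict(iris.data)
print("Number of mislabeled points out of a total",
      iris.data.shape[0],
      " points :",
      (iris.target != y_pred).sum())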

Number of mislabeled points out of a total 150  points : 6
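
The count above is measured on the same data the model was trained on, so it is likely to be optimistic. A minimal sketch of a held-out evaluation follows. The 70/30 split, the stratification, and `random_state=0` are arbitrary choices made for this example. `predict_proba` returns the model's estimate of $P(L\ |\ features)$ for each class.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()

# Hold out 30% of the rows so the score reflects unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

clf = GaussianNB()
clf.fit(X_train, y_train)

# Fraction of held-out rows that were labeled correctly.
accuracy = clf.score(X_test, y_test)

# One row per sample, one column per class; each row sums to 1.
proba = clf.predict_proba(X_test[:3])

print("Held-out accuracy:", accuracy)
print("Class probabilities for the first three test rows:")
print(proba)
```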


### When to Use

Naive Bayes classification algorithms make strong assumptions about the data. They work very well when the data happens to fit those assumptions; otherwise they generally will not match the performance of a more complicated model.

Useful for
- High dimensional data
- Fast training, prediction, and development (few tunable parameters)
- Baseline performance marks