## Introduction

Naive Bayes, also called the naive Bayes classifier, is a classifier based on Bayes’ theorem with the naive assumption that the features are independent of each other given the class.

In machine learning, naive Bayes classifiers are simple probabilistic classifiers that apply Bayes’ theorem with strong (naive) independence assumptions between the features. In simple terms, a naive Bayes classifier assumes that, given the class, the presence of a particular feature is unrelated to the presence of any other feature. For example, a ball may be considered a soccer ball if it is hard, round, and about nine inches in diameter. Even if these features depend on each other or on the existence of other features, naive Bayes assumes that each of these properties contributes independently to the probability that the ball is a soccer ball. This is why it is called naive.

Naive Bayes models are easy to build and are well suited to very large datasets. Although they are simple, naive Bayes models can perform competitively with, and sometimes even outperform, far more sophisticated classification models. Because they also require relatively short training times, they are a good option for classification problems.

<p>
        <img src = "assets/1.png">
</p>

Here,

- P(c|x): the posterior probability of class c (the target) given predictor x (the attributes). This is the probability of c being true, provided x is true.
- P(c): the prior probability of the class. This is the observed frequency of the class among all observations.
- P(x|c): the likelihood, i.e., the probability of the predictor given the class. This is the probability of x being true, provided c is true.
- P(x): the prior probability of the predictor (also called the evidence). This is the observed frequency of the predictor among all observations.
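Written with the symbols defined above, Bayes’ theorem relates the four terms as follows. Under the naive assumption, the likelihood of a feature vector x = (x_1, ..., x_n) factorizes over the individual features, so the posterior is proportional to a simple product (P(x) is dropped because it is the same for every class):

$$
P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}
$$

$$
P(c \mid x_1, \dots, x_n) \propto P(c) \prod_{i=1}^{n} P(x_i \mid c)
$$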

<p>
        <img src = "assets/2.png">
</p>

<p>
        <img src = "assets/3.png">
</p>

<p>
        <img src = "assets/4.png">
</p>

<p>
        <img src = "assets/5.png">
</p>

<p>
        <img src = "assets/6.png">
</p>

### Example:

Consider a well-shuffled deck of playing cards. A card is picked from that deck at random. The objective is to find the probability of a King card, given that the card picked is red in color.

Here,

     P(King | Red Card) = ?

We’ll use,

     P(King | Red Card) = P(Red Card | King) x P(King) / P(Red Card)

So,

     P(Red Card | King) = Probability of getting a red card given that the chosen card is a King = 2 Red Kings / 4 Total Kings = 1/2

     P(King) = Probability that the chosen card is a King = 4 Kings / 52 Total Cards = 1/13

     P(Red Card) = Probability that the chosen card is red = 26 Red Cards / 52 Total Cards = 1/2

Hence, the posterior probability of randomly choosing a King given a red card is:

     P(King | Red Card) = (1/2) x (1/13) / (1/2) = 1/13 ≈ 0.077
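The same answer can be checked by brute force. The sketch below enumerates all 52 equally likely cards, computes each term of Bayes’ theorem by counting with exact fractions, and compares the result with counting Kings among the red cards directly. The `prob` helper and the card encoding are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck as (rank, suit) pairs; suit colors are fixed by the game
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suit_color = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}
deck = list(product(ranks, suit_color))


def prob(event, given=lambda card: True):
    """Hypothetical helper: P(event | given) by counting equally likely cards."""
    pool = [card for card in deck if given(card)]
    return Fraction(sum(event(card) for card in pool), len(pool))


def is_king(card):
    return card[0] == "K"


def is_red(card):
    return suit_color[card[1]] == "red"


# Bayes' theorem: P(King | Red) = P(Red | King) * P(King) / P(Red)
via_bayes = prob(is_red, is_king) * prob(is_king) / prob(is_red)
# Direct count: Kings among the 26 red cards
direct = prob(is_king, is_red)
print(via_bayes, direct)  # 1/13 1/13
```

Using `Fraction` keeps every intermediate value exact, so the two routes can be compared with `==` rather than a floating-point tolerance.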

<p>
        <img src = "assets/7.png">
</p>

<p>
        <img src = "assets/8.png">
</p>

<p>
        <img src = "assets/9.png">
</p>

### Why is Naive Bayes so Efficient?

There are two reasons that make naive Bayes a very efficient algorithm for classification problems.

- Performance: Naive Bayes often performs well even when the dataset contains correlated variables, despite its basic assumption that features are independent. One reason is that two attributes may depend on each other, but the dependence may be distributed evenly across the classes. In that case the conditional independence assumption of naive Bayes is violated, yet naive Bayes can still be the optimal classifier. Moreover, what ultimately affects the classification is the combination of dependencies among all attributes. Looking at just two attributes, there may be a strong dependence between them that affects the classification; when the dependencies among all attributes act together, however, they may cancel each other out and no longer affect the classification. It is therefore the distribution of dependencies among all attributes over the classes, not merely the dependencies themselves, that determines how well naive Bayes classifies.

- Speed: Naive Bayes trains quickly because its parameters are estimated in closed form from simple per-class statistics (counts, or means and variances) in a single pass over the data. It also needs relatively few examples to approach its asymptotic accuracy: naive Bayes parameter estimates converge toward their asymptotic values after on the order of log(n) training examples, where n is the number of features (dimensions), whereas logistic regression parameter estimates converge more slowly, requiring on the order of n examples. Consistent with this, on several datasets logistic regression outperforms naive Bayes when training examples are abundant, but naive Bayes outperforms logistic regression when training data is scarce.

## Code sample 1: MultinomialNB, BernoulliNB, and GaussianNB

**The difference between them is the distribution they assume for each feature given the class.**

**Multi-variate Bernoulli Naive Bayes**: The Bernoulli model is useful if your feature vectors are binary (i.e., 0s and 1s). One application is text classification with a bag-of-words model in which 1 means "word occurs in the document" and 0 means "word does not occur in the document".
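A minimal from-scratch sketch of the Bernoulli variant follows; it is an illustration under stated assumptions, not a reference implementation. The class name `BernoulliNB` is hypothetical here (it is not scikit-learn's class), counts are binarized so that any count above zero becomes 1, and `alpha=1.0` is an assumed additive-smoothing default. It reuses the toy word-count corpus from the multinomial example in this section, with a hypothetical test document "Chinese Chinese Chinese Tokyo Japan".

```python
import numpy as np


class BernoulliNB:
    """Hypothetical minimal Bernoulli naive Bayes for 0/1 (present/absent) features."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing, assumed default

    def fit(self, X, y):
        X = (np.asarray(X) > 0).astype(float)  # binarize: 1 = word present
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        groups = [X[y == c] for c in self.classes_]
        self.class_log_prior_ = np.log([len(g) / len(X) for g in groups])
        # P(x_i = 1 | c): smoothed fraction of class-c documents containing word i
        self.theta_ = np.array(
            [(g.sum(axis=0) + self.alpha) / (len(g) + 2 * self.alpha) for g in groups]
        )
        return self

    def joint_log_likelihood(self, X):
        X = (np.asarray(X) > 0).astype(float)
        # present words contribute log(theta), absent words log(1 - theta)
        return (X @ np.log(self.theta_).T
                + (1 - X) @ np.log(1 - self.theta_).T
                + self.class_log_prior_)

    def predict(self, X):
        return self.classes_[np.argmax(self.joint_log_likelihood(X), axis=1)]


# Columns: Chinese, Beijing, Shanghai, Macao, Tokyo, Japan
X = np.array([[2, 1, 0, 0, 0, 0],
              [2, 0, 1, 0, 0, 0],
              [1, 0, 0, 1, 0, 0],
              [1, 0, 0, 0, 1, 1]])
y = np.array([0, 0, 0, 1])
test = np.array([[3, 0, 0, 0, 1, 1]])  # hypothetical test document

print(BernoulliNB().fit(X, y).predict(test))  # [1]
```

Because the Bernoulli model only records that "Chinese" occurs, not that it occurs three times, the presence of "Tokyo" and "Japan" dominates and this sketch predicts class 1; a count-based multinomial model can reach a different decision on the same document.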

**Multinomial Naive Bayes**: The multinomial naive Bayes model is typically used for discrete counts. For example, in a text classification problem we can take the idea of Bernoulli trials one step further: instead of recording whether a word occurs in the document, we count how often it occurs. You can think of each feature as "the number of times outcome x_i is observed over the n trials".
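One standard way to write the smoothed multinomial estimates and the resulting decision rule is shown below, assuming x_i is the count of word i in the document, N_ci is the total count of word i across training documents of class c, N_c is the sum of N_ci over the vocabulary, |V| is the vocabulary size, and α ≥ 0 is an additive smoothing parameter (α = 1 is Laplace smoothing). The evidence P(x) and the multinomial coefficient are dropped because they do not depend on the class:

$$
\hat{\theta}_{ci} = \frac{N_{ci} + \alpha}{N_c + \alpha\,|V|},
\qquad
\hat{c} = \arg\max_{c}\Big[\log P(c) + \sum_{i=1}^{|V|} x_i \log \hat{\theta}_{ci}\Big]
$$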

**Gaussian Naive Bayes**: Here, we assume that the features follow a normal distribution within each class. Instead of discrete counts, we have continuous features (e.g., the popular Iris dataset, whose features are sepal length, sepal width, petal length, and petal width).
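A minimal sketch of the Gaussian variant, under stated assumptions: the class name `GaussianNB` is hypothetical (it is not scikit-learn's class), each (class, feature) pair gets a maximum-likelihood mean and variance, `epsilon` is an assumed small constant added to every variance to avoid division by zero, and the two-cluster training data is invented purely for illustration.

```python
import numpy as np


class GaussianNB:
    """Hypothetical minimal Gaussian naive Bayes: one normal per (class, feature)."""

    def __init__(self, epsilon=1e-9):
        self.epsilon = epsilon  # assumed variance floor to avoid division by zero

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        groups = [X[y == c] for c in self.classes_]
        self.class_log_prior_ = np.log([len(g) / len(X) for g in groups])
        # maximum-likelihood mean and variance of every feature within each class
        self.mean_ = np.array([g.mean(axis=0) for g in groups])
        self.var_ = np.array([g.var(axis=0) for g in groups]) + self.epsilon
        return self

    def joint_log_likelihood(self, X):
        X = np.asarray(X, dtype=float)[:, np.newaxis, :]  # (samples, 1, features)
        # log N(x_i; mean_ci, var_ci), broadcast to (samples, classes, features)
        log_pdf = -0.5 * (np.log(2 * np.pi * self.var_)
                          + (X - self.mean_) ** 2 / self.var_)
        # naive assumption: add per-feature log densities, then the log prior
        return log_pdf.sum(axis=2) + self.class_log_prior_

    def predict(self, X):
        return self.classes_[np.argmax(self.joint_log_likelihood(X), axis=1)]


# Hypothetical toy data: two well-separated clusters in two continuous features
X_train = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
           [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y_train = [0, 0, 0, 1, 1, 1]
X_new = [[1.1, 2.1], [5.1, 5.9]]

model = GaussianNB().fit(X_train, y_train)
print(model.predict(X_new))  # [0 1]
```

Working in log space and summing per-feature terms is what the independence assumption buys: the joint density is a product of one-dimensional normals, so no covariance matrix is ever estimated.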

```python
import numpy as np

# Training set: 4 documents over a 6-word vocabulary
# (Chinese, Beijing, Shanghai, Macao, Tokyo, Japan); each entry is a word count.
# The first three documents belong to class 0 and the last one to class 1.
X = np.array([
    [2,1,0,0,0,0],
    [2,0,1,0,0,0],
    [1,0,0,1,0,0],
    [1,0,0,0,1,1]
])
y = np.array([0,0,0,1])
```

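```python
# MultinomialNB
import numpy as np
np.set_printoptions(precision=6)

class MultinomialNB(object):
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace/Lidstone) smoothing

    def fit(self, X, y):
        # group the training rows by class
        self.classes_ = np.unique(y)
        separated = [X[y == c] for c in self.classes_]
        # log prior: log of the fraction of training documents in each class
        self.class_log_prior_ = np.log([len(g) / X.shape[0] for g in separated])
        # smoothed word counts per class, normalized to log P(word | class)
        count = np.array([g.sum(axis=0) for g in separated]) + self.alpha
        self.feature_log_prob_ = np.log(count / count.sum(axis=1)[:, np.newaxis])
        return self

    def joint_log_likelihood(self, X):
        # log P(c) + sum_i x_i * log P(word_i | c), up to a class-independent constant
        return X @ self.feature_log_prob_.T + self.class_log_prior_

    def predict(self, X):
        return self.classes_[np.argmax(self.joint_log_likelihood(X), axis=1)]
```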

