# Naive Bayes Classification
A naive Bayes classifier uses Bayes' rule to compute the conditional probability of the class label $y$ given an input example $\mathbf{x}=[x_0, \ldots, x_{d-1}]^T\in\mathbb{R}^d$ as follows:
$$\mathrm{P}(y|\mathbf{x})=\frac{\mathrm{P}(\mathbf{x}|y)\mathrm{P}(y)}{\mathrm{P}(\mathbf{x})}$$
Since $\mathrm{P}(\mathbf{x})$ is constant for all $y$, we can write
$$\mathrm{P}(y|\mathbf{x})\propto\mathrm{P}(\mathbf{x}|y)\mathrm{P}(y)$$
Naive Bayes further makes the "naive" assumption that every pair of features $x_i$ and $x_j$ $(i\ne j)$ of $\mathbf{x}$ is conditionally independent given the class label $y$, so the class-conditional likelihood factorizes over the features. Under this conditional independence assumption, the posterior reduces to
$$\mathrm{P}(y|\mathbf{x})\propto\mathrm{P}(\mathbf{x}|y)\mathrm{P}(y) = \mathrm{P}(y)\prod_{i=0}^{d-1}\mathrm{P}(x_i|y)$$
The predicted class label is the one that maximizes this posterior:
$$\hat{y}=\underset{y}{\text{argmax}}\quad \mathrm{P}(y|\mathbf{x})= \underset{y}{\text{argmax}}\quad \mathrm{P}(y)\prod_{i=0}^{d-1}\mathrm{P}(x_i|y)$$
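
A product of many small probabilities can underflow in floating point, so the argmax is commonly evaluated over sums of log-probabilities instead. The following is a minimal sketch of that idea, assuming the priors and the per-feature likelihoods have already been computed; the function name `predict_label` and the toy numbers are hypothetical and only serve as an illustration.

```python
import math

def predict_label(priors, feature_likelihoods):
    """Return the class index maximizing log P(y) + sum_i log P(x_i | y).

    priors[c] is P(y=c); feature_likelihoods[c][i] is P(x_i | y=c)
    evaluated at the observed value of feature i.
    """
    scores = []
    for prior, likelihoods in zip(priors, feature_likelihoods):
        score = math.log(prior) + sum(math.log(p) for p in likelihoods)
        scores.append(score)
    # The logarithm is monotonic, so the maximizer is the same as for the product.
    return max(range(len(scores)), key=scores.__getitem__)

# Toy numbers (made up for illustration): two classes, three features.
priors = [0.6, 0.4]
feature_likelihoods = [
    [0.2, 0.5, 0.1],  # P(x_i | y=0)
    [0.3, 0.4, 0.4],  # P(x_i | y=1)
]
predicted = predict_label(priors, feature_likelihoods)
print(predicted)
```

Here class 1 wins because $0.4 \times 0.3 \times 0.4 \times 0.4 = 0.0192$ exceeds $0.6 \times 0.2 \times 0.5 \times 0.1 = 0.006$.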

# Gaussian Naive Bayes
In Gaussian Naive Bayes, the conditional probability of each feature $x_i$ given the class label $y$ is modeled using a univariate Gaussian distribution with mean $\theta_{iy}$ and standard deviation $\sigma_{iy}$:
$$\mathrm{P}(x_i|y) = \frac{1}{\sqrt{2\pi\sigma_{iy}^2}}\exp\left(-\frac{(x_i-\theta_{iy})^2}{2\sigma_{iy}^2}\right)$$
The parameters $\theta_{iy}$ and $\sigma_{iy}$, for $i=0, \ldots, d-1$ and $y=0, \ldots, C-1$, where $C$ is the number of classes, are estimated by maximum likelihood.
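
For a Gaussian, the maximum likelihood estimates are the per-class sample mean and the per-class variance with divisor $N_y$ (not $N_y-1$). The sketch below shows one way to compute them with NumPy and to score a new point; the helper names `fit_gaussian_nb` and `gaussian_log_likelihood` and the toy data are assumptions made for illustration, not part of scikit-learn.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Maximum likelihood estimates of class priors, feature means and variances."""
    classes = np.unique(y)
    prior = np.array([np.mean(y == c) for c in classes])
    theta = np.array([X[y == c].mean(axis=0) for c in classes])
    # np.var uses ddof=0 by default, i.e. it divides by the class count (the MLE).
    var = np.array([X[y == c].var(axis=0) for c in classes])
    return classes, prior, theta, var

def gaussian_log_likelihood(x, theta, var):
    """Sum over features of log P(x_i | y); returns one value per class."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - theta) ** 2 / var, axis=1)

# Hypothetical toy data: two classes, two features.
X = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 0.0], [12.0, 2.0]])
y = np.array([0, 0, 1, 1])

classes, prior, theta, var = fit_gaussian_nb(X, y)
log_joint = np.log(prior) + gaussian_log_likelihood(np.array([2.0, 3.0]), theta, var)
predicted = classes[np.argmax(log_joint)]
print(theta, var, predicted)
```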

Scikit-learn provides the class <em>sklearn.naive_bayes.GaussianNB</em> for Gaussian Naive Bayes. See the [GaussianNB documentation](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB) for details.

## Example


### Importing the necessary modules

In [1]:
from sklearn.naive_bayes import GaussianNB
import numpy as np

### Preparing the dataset

In [2]:
X_train = np.array([[-1, -1],
                    [-2, -1],
                    [-3, -2],
                    [1, 1],
                    [2, 1],
                    [3, 2]])
y_train = np.array([1, 1, 1, 2, 2, 2])

### Training the Classifier

In [3]:
clf = GaussianNB()
clf.fit(X_train, y_train)

### Prediction

In [4]:
print(clf.predict([[-0.8, -1]]))

[1]


In [5]:
print(clf.predict_proba([[-0.8, -1]]))

[[9.99999949e-01 5.05653254e-08]]


In [6]:
print(clf.predict_log_proba([[-0.8, -1]]))

[[-5.05653266e-08 -1.67999998e+01]]


### Inspecting the parameters

In [7]:
clf.class_prior_

array([0.5, 0.5])

In [8]:
clf.theta_

array([[-2.        , -1.33333333],
       [ 2.        ,  1.33333333]])

In [9]:
clf.var_

array([[0.66666667, 0.22222223],
       [0.66666667, 0.22222223]])
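
The printed parameters are enough to reproduce the prediction for `[-0.8, -1]` by hand. The sketch below plugs them into the Gaussian likelihood and normalizes in log space. It assumes the exact fractions 4/3, 2/3 and 2/9 for the rounded values shown above and ignores the small `var_smoothing` term that `GaussianNB` adds to the variances, so the result matches the outputs of `predict_proba` and `predict_log_proba` only approximately.

```python
import numpy as np

# Values copied from the outputs above (written as exact fractions).
class_prior = np.array([0.5, 0.5])
theta = np.array([[-2.0, -4.0 / 3.0], [2.0, 4.0 / 3.0]])
var = np.array([[2.0 / 3.0, 2.0 / 9.0], [2.0 / 3.0, 2.0 / 9.0]])
x = np.array([-0.8, -1.0])

# log P(y) + sum_i log N(x_i; theta_iy, var_iy), one value per class
log_joint = np.log(class_prior) - 0.5 * np.sum(
    np.log(2.0 * np.pi * var) + (x - theta) ** 2 / var, axis=1
)
# Normalize in log space: subtract log sum_y exp(log_joint)
log_posterior = log_joint - np.logaddexp.reduce(log_joint)
posterior = np.exp(log_posterior)
print(posterior, log_posterior)
```

The second class ends up with a log-posterior close to -16.8, in line with the `predict_log_proba` output.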

# Bernoulli Naive Bayes
Consider the following classification task:

|                Inputs                   |  Class  |
| :-------------------------------------  | :------:|
|'This is the first document.'            |1        |
|'This document is the second document.'  |1        |
|'And this one is the third one.'         |2        |
|'Is this the first document?'            |2        |

Binary Bag-of-Words Features:

|'and'|'document'|'first'|'is'|'one'|'second'|'the'|'third'|'this'|
|:---:|:--------:|:-----:|:--:|:---:|:------:|:---:|:-----:|:----:|
|  0  |    1     |   1   | 1  |  0  |   0    |  1  |   0   |   1  |
|  0  |    1     |   0   | 1  |  0  |   1    |  1  |   0   |   1  |
|  1  |    0     |   0   | 1  |  1  |   0    |  1  |   1   |   1  |
|  0  |    1     |   1   | 1  |  0  |   0    |  1  |   0   |   1  |

Bernoulli Naive Bayes is used when the features are binary variables. In Bernoulli Naive Bayes, the conditional probability of each feature $x_i$ given the class label $y$ is modelled as:
$$\mathrm{P}(x_i|y)=\theta_{iy}^{x_i}(1-\theta_{iy})^{(1-{x_i})}$$
where the parameter $\theta_{iy}$ is estimated as a smoothed version of the maximum likelihood estimate,
$$\theta_{iy}=\frac{N_{yi}+\alpha}{N_y+2\alpha}$$
for $i=0, \ldots, d-1$, where $N_{yi}$ is the number of samples with class label $y$ in which feature $x_i=1$, $N_y$ is the total number of samples with class label $y$, and $\alpha\ge 0$ is the smoothing parameter.
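
As an illustration, the estimate can be computed directly from the binary bag-of-words table above. This is a minimal NumPy sketch, assuming $\alpha=1$ for the smoothing parameter; the helper name `bernoulli_log_likelihood` is hypothetical.

```python
import numpy as np

# Binary bag-of-words matrix from the table above (one row per document).
X = np.array([
    [0, 1, 1, 1, 0, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 0, 1, 1, 1],
    [0, 1, 1, 1, 0, 0, 1, 0, 1],
])
y = np.array([1, 1, 2, 2])
alpha = 1.0  # assumed smoothing value, chosen for illustration

classes = np.unique(y)
# N_yi: per class, the number of samples in which feature i equals 1
N_yi = np.array([X[y == c].sum(axis=0) for c in classes])
# N_y: number of samples per class
N_y = np.array([(y == c).sum() for c in classes])
theta = (N_yi + alpha) / (N_y[:, None] + 2 * alpha)

def bernoulli_log_likelihood(x, theta):
    """Sum over features of log P(x_i | y); returns one value per class."""
    return np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta), axis=1)

log_lik = bernoulli_log_likelihood(X[0], theta)
print(theta)
print(log_lik)
```

With $\alpha=1$ and two documents per class, a word that appears in neither document of a class gets $\theta_{iy}=1/4$ rather than 0, which keeps the logarithms finite.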

Scikit-learn provides the class <em>sklearn.naive_bayes.BernoulliNB</em> for Bernoulli Naive Bayes. See the [BernoulliNB documentation](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.BernoulliNB.html#sklearn.naive_bayes.BernoulliNB) for details.



## Example

### Importing the necessary modules

In [10]:
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.feature_extraction.text import CountVectorizer

### Preparing the dataset

In [11]:
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this one is the third one.',
    'Is this the first document?',
]
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(corpus)
y = np.array([1, 1, 2, 2])
print(vectorizer.get_feature_names_out())
print(X.toarray())

['and' 'document' 'first' 'is' 'one' 'second' 'the' 'third' 'this']
[[0 1 1 1 0 0 1 0 1]
 [0 1 0 1 0 1 1 0 1]
 [1 0 0 1 1 0 1 1 1]
 [0 1 1 1 0 0 1 0 1]]


### Training the model

In [12]:
clf = BernoulliNB(alpha=0)
clf.fit(X, y)



### Inspecting the parameters

In [13]:
clf.class_log_prior_

array([-0.69314718, -0.69314718])

In [14]:
clf.feature_log_prob_

array([[-2.37189981e+01, -5.00000041e-11, -6.93147181e-01,
        -5.00000041e-11, -2.37189981e+01, -6.93147181e-01,
        -5.00000041e-11, -2.37189981e+01, -5.00000041e-11],
       [-6.93147181e-01, -6.93147181e-01, -6.93147181e-01,
        -5.00000041e-11, -6.93147181e-01, -2.37189981e+01,
        -5.00000041e-11, -6.93147181e-01, -5.00000041e-11]])
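
The entries near -23.7 are not $\log 0$. They are consistent with the smoothing formula evaluated at a very small positive $\alpha$ instead of exactly 0. The sketch below reproduces the three distinct values in the output under the assumption that the effective smoothing is $10^{-10}$; that value is inferred from the printed numbers, and the behaviour for `alpha=0` may differ between scikit-learn versions.

```python
import math

alpha = 1e-10  # assumed effective smoothing, inferred from the output above
N_y = 2        # two documents per class

# Word absent from both documents of the class (N_yi = 0)
absent = math.log((0 + alpha) / (N_y + 2 * alpha))
# Word present in one of the two documents (N_yi = 1)
in_one = math.log((1 + alpha) / (N_y + 2 * alpha))
# Word present in both documents (N_yi = 2)
in_both = math.log((2 + alpha) / (N_y + 2 * alpha))

print(absent, in_one, in_both)
```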

# Multinomial Naive Bayes
Consider the following text classification task:

|                Inputs                   |  Class  |
| :-------------------------------------  | :------:|
|'This is the first document.'            |1        |
|'This document is the second document.'  |1        |
|'And this one is the third one.'         |2        |
|'Is this the first document?'            |2        |

Bag-of-Words Features:

|'and'|'document'|'first'|'is'|'one'|'second'|'the'|'third'|'this'|
|:---:|:--------:|:-----:|:--:|:---:|:------:|:---:|:-----:|:----:|
|  0  |    1     |   1   | 1  |  0  |   0    |  1  |   0   |   1  |
|  0  |    2     |   0   | 1  |  0  |   1    |  1  |   0   |   1  |
|  1  |    0     |   0   | 1  |  2  |   0    |  1  |   1   |   1  |
|  0  |    1     |   1   | 1  |  0  |   0    |  1  |   0   |   1  |

Multinomial Naive Bayes is used when the features represent frequencies of events, such as word counts. In Multinomial Naive Bayes, the conditional probability of each sample $\mathbf{x}=[x_0, \ldots, x_{d-1}]^T\in\mathbb{R}^d$ given the class label $y$ is modelled as a multinomial distribution with parameters $(\theta_{0y}, \theta_{1y}, \ldots, \theta_{(d-1)y})$. Mathematically,
$$\mathrm{P}(\mathbf{x}|y)=\frac{\left(\sum_{i=0}^{d-1}x_i\right)!}{\prod_{i=0}^{d-1}x_i!}\prod_{i=0}^{d-1}\theta_{iy}^{x_i}\propto \prod_{i=0}^{d-1}\theta_{iy}^{x_i}$$
where the multinomial coefficient can be dropped because it does not depend on $y$, and the parameters $\theta_{iy}$ are estimated as a smoothed version of the maximum likelihood estimate,
$$\theta_{iy}=\frac{N_{yi}+\alpha}{N_y+\alpha d}$$
where $N_{yi}$ is the total number of times feature $i$ appears in the samples of class $y$, $N_y=\sum_{i=0}^{d-1} N_{yi}$, and $\alpha\ge 0$ is the smoothing parameter.
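
The same estimate can be worked out by hand from the count table above. The following NumPy sketch assumes $\alpha=1$ for illustration and scores every document against both classes.

```python
import numpy as np

# Bag-of-words counts from the table above (one row per document).
X = np.array([
    [0, 1, 1, 1, 0, 0, 1, 0, 1],
    [0, 2, 0, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 2, 0, 1, 1, 1],
    [0, 1, 1, 1, 0, 0, 1, 0, 1],
])
y = np.array([1, 1, 2, 2])
alpha = 1.0  # assumed smoothing value, chosen for illustration

classes = np.unique(y)
d = X.shape[1]
# N_yi: total count of feature i over all samples of class y
N_yi = np.array([X[y == c].sum(axis=0) for c in classes])
# N_y: total count of all features in class y
N_y = N_yi.sum(axis=1, keepdims=True)
theta = (N_yi + alpha) / (N_y + alpha * d)

# log P(x | y) up to the y-independent multinomial coefficient:
# sum_i x_i * log(theta_iy), for every sample and class
log_lik = X @ np.log(theta).T  # shape (n_samples, n_classes)
print(theta)
print(log_lik)
```

For class 1, the word 'document' occurs 3 times among 11 words in total, so $\theta_{iy}=(3+1)/(11+9)=0.2$, and each row of `theta` sums to 1.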

Scikit-learn provides the class <em>sklearn.naive_bayes.MultinomialNB</em> for Multinomial Naive Bayes. See the [MultinomialNB documentation](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html) for details.

## Example

### Importing the necessary modules

In [16]:
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

### Preparing the data

In [17]:
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this one is the third one.',
    'Is this the first document?',
]
vectorizer = CountVectorizer(binary=False)
X = vectorizer.fit_transform(corpus)
y = np.array([1, 1, 2, 2])
print(vectorizer.get_feature_names_out())
print(X.toarray())

['and' 'document' 'first' 'is' 'one' 'second' 'the' 'third' 'this']
[[0 1 1 1 0 0 1 0 1]
 [0 2 0 1 0 1 1 0 1]
 [1 0 0 1 2 0 1 1 1]
 [0 1 1 1 0 0 1 0 1]]


### Training the classifier

In [18]:
clf = MultinomialNB(alpha=0)
clf.fit(X, y)



### Inspecting the parameters

In [19]:
clf.class_log_prior_

array([-0.69314718, -0.69314718])

In [20]:
clf.feature_log_prob_

array([[-25.4237462 ,  -1.29928298,  -2.39789527,  -1.70474809,
        -25.4237462 ,  -2.39789527,  -1.70474809, -25.4237462 ,
         -1.70474809],
       [ -2.48490665,  -2.48490665,  -2.48490665,  -1.79175947,
         -1.79175947, -25.51075758,  -1.79175947,  -2.48490665,
         -1.79175947]])

# Categorical Naive Bayes
The dataset below is taken from [this article](https://medium.com/analytics-vidhya/use-naive-bayes-algorithm-for-categorical-and-numerical-data-classification-935d90ab273f).

Consider the following dataset:

| City   | Gender | Income           | Illness |
| :---   | :----  | :-----           | :-----  |
| Dallas | Male   | < 5000           | No      |
| NYC    | Male   | [5000 - 10000]   | No      |
| NYC    | Female | > 10000          | No      |
| NYC    | Female | > 10000          | No      |
| Dallas | Male   | [5000 - 10000]   | No      |
| Dallas | Female | < 5000           | Yes     |
| Dallas | Male   | < 5000           | Yes     |
| NYC    | Male   | [5000 - 10000]   | Yes     |

Categorical Naive Bayes assumes a categorical distribution for the $i$-th feature $x_i$ conditioned on the class label $y$. The probability of category $t$ in feature $x_i$ given class $c$ is estimated as:
$$\mathrm{P}(x_i=t|y=c) = \frac{N_{tic}+\alpha}{N_c+\alpha T_i}$$
where $N_{tic}$ is the number of times category $t$ appears in feature $x_i$ among the samples with class label $y=c$, $N_c$ is the number of samples with class label $y=c$, $T_i$ is the number of available categories of feature $i$, and $\alpha$ is the smoothing parameter.
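
Applied to the dataset above, the estimate is a matter of counting. The sketch below uses only the standard library and assumes $\alpha=1$; the helper name `categorical_estimate` is hypothetical.

```python
from collections import Counter

# Rows of the table above: (City, Gender, Income, Illness)
rows = [
    ("Dallas", "Male", "< 5000", "No"),
    ("NYC", "Male", "[5000 - 10000]", "No"),
    ("NYC", "Female", "> 10000", "No"),
    ("NYC", "Female", "> 10000", "No"),
    ("Dallas", "Male", "[5000 - 10000]", "No"),
    ("Dallas", "Female", "< 5000", "Yes"),
    ("Dallas", "Male", "< 5000", "Yes"),
    ("NYC", "Male", "[5000 - 10000]", "Yes"),
]
alpha = 1.0  # assumed smoothing value, chosen for illustration

def categorical_estimate(rows, feature_index, category, label, alpha):
    """Smoothed estimate of P(x_i = category | y = label)."""
    in_class = [r for r in rows if r[-1] == label]            # samples with y = c
    counts = Counter(r[feature_index] for r in in_class)      # N_tic for every t
    n_categories = len({r[feature_index] for r in rows})      # T_i
    return (counts[category] + alpha) / (len(in_class) + alpha * n_categories)

# P(City = Dallas | Illness = No) = (2 + 1) / (5 + 2)
p_dallas_no = categorical_estimate(rows, 0, "Dallas", "No", alpha)
# P(Income = "> 10000" | Illness = Yes) = (0 + 1) / (3 + 3)
p_high_yes = categorical_estimate(rows, 2, "> 10000", "Yes", alpha)
print(p_dallas_no, p_high_yes)
```

The second case shows the effect of smoothing: no ill person in the table has an income above 10000, yet the estimate is 1/6 rather than 0.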

Scikit-learn provides the class <em>sklearn.naive_bayes.CategoricalNB</em> for Categorical Naive Bayes. See the [CategoricalNB documentation](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.CategoricalNB.html#sklearn.naive_bayes.CategoricalNB) for details.

## Example

### Importing the necessary modules

In [21]:
import numpy as np
from sklearn.naive_bayes import CategoricalNB

### Preparing the data

In [22]:
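# Synthetic integer-coded data (not the table above):
# 6 samples, 100 categorical features with values in 0..4
rng = np.random.RandomState(1)
X = rng.randint(5, size=(6, 100))
# One distinct class label per sample
y = np.array([1, 2, 3, 4, 5, 6])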

### Training the classifier

In [23]:
clf = CategoricalNB(alpha=0)
clf.fit(X, y)



### Prediction

In [24]:
print(clf.predict(X[2:3]))

[3]


### Inspecting the parameters

In [25]:
clf.class_log_prior_

array([-1.79175947, -1.79175947, -1.79175947, -1.79175947, -1.79175947,
       -1.79175947])

In [26]:
clf.feature_log_prob_

[array([[-2.30258509e+01, -2.30258509e+01, -2.30258509e+01,
         -3.00000025e-10],
        [-2.30258509e+01, -3.00000025e-10, -2.30258509e+01,
         -2.30258509e+01],
        [-2.30258509e+01, -2.30258509e+01, -3.00000025e-10,
         -2.30258509e+01],
        [-2.30258509e+01, -2.30258509e+01, -3.00000025e-10,
         -2.30258509e+01],
        [-3.00000025e-10, -2.30258509e+01, -2.30258509e+01,
         -2.30258509e+01],
        [-2.30258509e+01, -2.30258509e+01, -2.30258509e+01,
         -3.00000025e-10]]),
 array([[-2.30258509e+01, -2.30258509e+01, -2.30258509e+01,
         -2.30258509e+01, -4.00000033e-10],
        [-4.00000033e-10, -2.30258509e+01, -2.30258509e+01,
         -2.30258509e+01, -2.30258509e+01],
        [-2.30258509e+01, -2.30258509e+01, -2.30258509e+01,
         -2.30258509e+01, -4.00000033e-10],
        [-2.30258509e+01, -2.30258509e+01, -4.00000033e-10,
         -2.30258509e+01, -2.30258509e+01],
        [-2.30258509e+01, -2.30258509e+01, -2.30258509e+01,
