# Basics

Objective: Given an instance $\mathbf{x} = (x_1,\ldots,x_n) \in\mathbb{R}^n$, predict its class label or estimate the posterior probability $p(C_k|\mathbf{x})$ of each class $C_k$, $k=1,\ldots,K$.



\begin{align*}
\text{posterior probability} & = \frac{\text{prior probability}\times\text{likelihood}}{\text{evidence}}\quad\text{(Bayes' theorem)}\\
p(C_{k}|\mathbf{x}) & =\frac{p(C_{k})p(\mathbf{x}|C_{k})}{p(\mathbf{x})}\\
 & \propto p(C_{k})p(\mathbf{x}|C_{k})\\
 & = p(C_k) p(x_1|C_k) \cdots p(x_n|C_k)\quad\text{(naive assumption: features are independent given the class)}
\end{align*}
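
As a small numeric illustration of the chain above, the following sketch computes the normalized posterior for two classes and one instance with two features. The class names, priors, and per-feature likelihood values are made-up numbers chosen only for illustration.

```python
from math import prod

# Hypothetical priors p(C_k) and per-feature likelihoods p(x_i | C_k)
# for one observed instance x = (x_1, x_2); the numbers are illustrative only.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": [0.8, 0.5], "ham": [0.1, 0.5]}

# Unnormalized posterior: p(C_k) * p(x_1|C_k) * ... * p(x_n|C_k)
scores = {k: priors[k] * prod(likelihoods[k]) for k in priors}

# The evidence p(x) is the sum of the unnormalized scores over all classes.
evidence = sum(scores.values())
posterior = {k: s / evidence for k, s in scores.items()}
print(posterior)
```

Dividing by the evidence only rescales the scores, which is why the proportional form is enough when only the most probable class is needed.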


## MAP (Maximum A Posteriori) 

The predicted class label of $\mathbf{x}$ is $\hat{k} = \text{argmax}_k \, p(C_k) p(x_1|C_k) \cdots p(x_n|C_k)$.
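
With many features the product of likelihoods can underflow to zero in floating point, so in practice the argmax is usually taken over the sum of logarithms, which gives the same class. The sketch below shows this; the helper `map_class` and all numbers are hypothetical and used only for illustration.

```python
from math import log

def map_class(log_priors, log_likelihoods):
    """Return the class k maximizing log p(C_k) + sum_i log p(x_i | C_k)."""
    scores = {k: log_priors[k] + sum(log_likelihoods[k]) for k in log_priors}
    return max(scores, key=scores.get)

# Illustrative setup: 1000 features, each with a tiny likelihood.
log_priors = {"a": log(0.5), "b": log(0.5)}
log_likelihoods = {"a": [log(1e-5)] * 1000, "b": [log(2e-5)] * 1000}

# The raw product underflows, so the two classes could not be compared directly.
raw_product_a = (1e-5) ** 1000

label = map_class(log_priors, log_likelihoods)
print(raw_product_a, label)
```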


## Gaussian NB

* Continuous data
* $p(x_i|C_k) = N(x_i;\, \mu_{ki}, \sigma_{ki}^2)$, with a separate mean and variance for each class $k$ and feature $i$

```python
from sklearn.naive_bayes import GaussianNB
# Defaults shown; priors=None means class priors are estimated from the data
clf = GaussianNB(priors=None, var_smoothing=1e-9)
```
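
A minimal usage sketch on toy data follows. The data points are invented for illustration; `fit`, `predict`, `predict_proba`, and the fitted attribute `theta_` (per-class feature means) are part of the scikit-learn estimator.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data (illustrative): two well-separated classes in 2-D.
X = np.array([[-2.0, -1.0], [-1.0, -1.5], [-1.5, -2.0],
              [1.0, 1.5], [2.0, 1.0], [1.5, 2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = GaussianNB()
clf.fit(X, y)

pred = clf.predict([[-1.0, -1.0], [1.0, 1.0]])   # MAP class labels
proba = clf.predict_proba([[-1.0, -1.0]])        # posterior p(C_k | x)
print(pred, proba)
print(clf.theta_)  # estimated mean of each feature within each class
```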

## Multinomial NB

* Discrete data
* Ex) $x_i$ is the frequency of the $i$-th word in a vocabulary
* $p(\mathbf{x}|C_k) = \displaystyle{\frac{(x_1+\cdots+x_n)!}{x_1!\cdots x_n!}p_{k1}^{x_1}\cdots p_{kn}^{x_n}}$, where $p_{ki}$ is the probability of the $i$-th word in class $C_k$

```python
from sklearn.naive_bayes import MultinomialNB
# Defaults shown; alpha is the additive (Laplace/Lidstone) smoothing parameter
clf = MultinomialNB(alpha=1.0, fit_prior=True, class_prior=None)
```
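
The sketch below fits the classifier to a tiny word-count matrix. The vocabulary, counts, and labels are hypothetical; `feature_log_prob_` holds the smoothed estimates of $\log p_{ki}$.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word counts over a 3-word vocabulary, e.g. ["goal", "vote", "match"].
X = np.array([[3, 0, 2],
              [2, 0, 3],
              [0, 4, 0],
              [0, 3, 1]])
y = np.array(["sports", "sports", "politics", "politics"])

clf = MultinomialNB(alpha=1.0)  # alpha=1.0 is Laplace smoothing
clf.fit(X, y)

pred = clf.predict([[1, 0, 2], [0, 2, 0]])
print(pred)
print(np.exp(clf.feature_log_prob_))  # smoothed per-class word probabilities
```

Because of the smoothing, a word that never appears in a class still gets a small nonzero probability, so a single unseen word does not force the class score to zero.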

## Bernoulli NB

* Discrete data
* Ex) $x_i \in \{0, 1\}$, indicating the presence or absence of the $i$-th word in a vocabulary
* $p(\mathbf{x}|C_k) = \displaystyle{\prod_{i=1}^n p_{ki}^{x_i} (1-p_{ki})^{1-x_i}}$, where $p_{ki}$ is the probability that the $i$-th word occurs in class $C_k$


```python
from sklearn.naive_bayes import BernoulliNB
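# Defaults shown; features greater than `binarize` are mapped to 1, the rest to 0
clf = BernoulliNB(alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None)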
```
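
A minimal sketch on binary presence/absence features follows; the data are invented for illustration. Setting `binarize=None` tells the estimator that the input is already binary.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical presence/absence indicators for a 3-word vocabulary.
X = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 1, 1]])
y = np.array([1, 1, 0, 0])

clf = BernoulliNB(alpha=1.0, binarize=None)  # input is already 0/1
clf.fit(X, y)

pred = clf.predict([[1, 0, 0], [0, 1, 0]])
print(pred)
```

Unlike the multinomial model, the Bernoulli likelihood includes the factor $(1-p_{ki})$ for every absent word, so the absence of a word also counts as evidence.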

## Complement NB

The Complement Naive Bayes classifier was designed to correct the “severe assumptions” made by the standard Multinomial Naive Bayes classifier. It is particularly suited for imbalanced data sets. (scikit-learn.org)

```python
from sklearn.naive_bayes import ComplementNB
# Defaults shown; norm=True would apply a second normalization to the weights
clf = ComplementNB(alpha=1.0, fit_prior=True, class_prior=None, norm=False)
```
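
The following sketch fits the classifier to a small, deliberately imbalanced count matrix (four samples of class 0, one of class 1). The data are hypothetical and only meant to show the API on an imbalanced set; they do not demonstrate any performance advantage.

```python
import numpy as np
from sklearn.naive_bayes import ComplementNB

# Hypothetical, imbalanced word counts: class 0 has four samples, class 1 has one.
X = np.array([[3, 0, 1],
              [2, 0, 2],
              [4, 0, 1],
              [3, 1, 0],
              [0, 3, 1]])
y = np.array([0, 0, 0, 0, 1])

clf = ComplementNB(alpha=1.0)
clf.fit(X, y)

# Each class's weights are estimated from the samples of all *other* classes.
pred = clf.predict([[2, 0, 1], [0, 2, 0]])
print(pred)
```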


__partial_fit()__ : Naive Bayes classifiers provide the method `partial_fit()`, which is meant to be called several times in succession on different chunks of a dataset in order to implement out-of-core or online learning. This is especially useful when the whole dataset is too big to fit in memory at once. The first call must be given the full list of class labels through the `classes` argument. Each call carries some overhead, so it is better to use chunks that are as large as the memory budget allows. (Ref: scikit-learn.org)
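
A minimal sketch of chunked training with `partial_fit()` follows. The generator `chunks` is a hypothetical stand-in for reading pieces of a large dataset from disk; here it simply draws synthetic Poisson counts.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)
classes = np.array([0, 1])  # all labels must be known before the first call

def chunks(n_chunks=5, chunk_size=200):
    """Hypothetical stand-in for streaming chunks of a large dataset."""
    for _ in range(n_chunks):
        y = rng.integers(0, 2, size=chunk_size)
        # Class 0 favors feature 0, class 1 favors feature 1 (synthetic rates).
        rates = np.where(y[:, None] == 0, [5.0, 1.0, 2.0], [1.0, 5.0, 2.0])
        X = rng.poisson(rates)
        yield X, y

clf = MultinomialNB()
for X_chunk, y_chunk in chunks():
    # Counts accumulate across calls; only one chunk is in memory at a time.
    clf.partial_fit(X_chunk, y_chunk, classes=classes)

pred = clf.predict([[8, 0, 2], [0, 8, 2]])
print(clf.class_count_, pred)
```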