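> Naive Bayes methods are a set of supervised learning algorithms based on Bayes' theorem.

> Bayes’ theorem:

>$$P(y|x_{1}, \dots , x_{n}) = \frac{P(y)P(x_{1}, \dots, x_{n}|y)}{P(x_{1}, \dots,x_{n})}$$

> They make the "naive" assumption of conditional independence between every pair of features, given the class $y$: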

>$$P(x_{i}|y, x_{1}, \dots , x_{i-1}, x_{i+1}, \dots, x_{n}) = P(x_{i}|y)$$
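> Substituting the independence assumption into Bayes' theorem, and dropping the denominator because it does not depend on $y$, gives the usual classification rule (a standard derivation, sketched here for reference):
>
>$$P(y|x_{1}, \dots , x_{n}) \propto P(y)\prod_{i=1}^{n}P(x_{i}|y)$$
>
>$$\hat{y} = \arg\max_{y} P(y)\prod_{i=1}^{n}P(x_{i}|y) = \arg\max_{y}\left(\log P(y) + \sum_{i=1}^{n}\log P(x_{i}|y)\right)$$
>
> Taking logarithms turns the product into a sum, which avoids numerical underflow when many small probabilities are multiplied; the maximizing class is unchanged because $\log$ is monotonic.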


## Gaussian Naive Bayes ##
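The likelihood of the features is assumed to be Gaussian:

$$P(x_{i}|y) = \frac{1}{\sqrt{2\pi\sigma_{y}^{2}}} \exp\left(-\frac{(x_{i}-\mu_{y})^{2}}{2\sigma_{y}^{2}}\right)$$

> The parameters $\sigma_{y}$ and $\mu_{y}$ are estimated using maximum likelihood, i.e. as the mean and (biased) variance of each feature over the training samples of class $y$.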
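The maximum-likelihood fit and the log-space decision rule can be sketched in a few lines of NumPy. This is an illustrative from-scratch version, not scikit-learn's implementation: the function names `fit_gaussian_nb` / `predict_gaussian_nb` and the small variance floor `eps` are hypothetical (recent versions of `GaussianNB` add their own `var_smoothing` term instead). Means and variances are estimated per class and per feature.

```python
import numpy as np


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-class, per-feature means and variances (MLE)."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # np.var divides by the number of samples (ddof=0), which is the MLE
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    return classes, priors, means, variances


def predict_gaussian_nb(X, classes, priors, means, variances, eps=1e-9):
    """Return argmax_y of log P(y) + sum_i log N(x_i; mu_yi, sigma_yi^2)."""
    var = variances + eps  # hypothetical floor so a zero variance cannot divide by zero
    # Broadcast to shape (n_samples, n_classes, n_features), then sum over features
    log_lik = -0.5 * (np.log(2 * np.pi * var)[None, :, :]
                      + (X[:, None, :] - means[None, :, :]) ** 2 / var[None, :, :]).sum(axis=2)
    return classes[np.argmax(np.log(priors)[None, :] + log_lik, axis=1)]


# Two well-separated toy clusters (made-up data)
X_train = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                    [5.0, 6.0], [5.2, 5.9], [4.8, 6.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

params = fit_gaussian_nb(X_train, y_train)
pred = predict_gaussian_nb(np.array([[1.1, 2.0], [5.1, 6.0]]), *params)
print(pred)  # [0 1]
```

Summing log-densities rather than multiplying densities is the log-space form of the decision rule; it gives the same argmax.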

```python
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
gnb = GaussianNB()
# Fit on the full data set and predict the same points (i.e. training error)
y_pred = gnb.fit(iris.data, iris.target).predict(iris.data)
print("Number of mislabeled points out of a total %d points : %d"
      % (iris.data.shape[0], (iris.target != y_pred).sum()))
```

Output:

```
Number of mislabeled points out of a total 150 points : 6
```


## Multinomial Naive Bayes ##
Implements naive Bayes for multinomially distributed data, such as word counts in text classification.

$n$ = number of features 

$y$ = class

$$\theta_{y} = (\theta_{y1}, \dots , \theta_{yn})$$

$$P(x_{i}|y) = \theta_{yi}$$

where $\theta_{yi}$ is the probability of feature $i$ appearing in a sample of class $y$.


**Relative frequency counting** (a smoothed version of maximum likelihood):
$$\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_{y}+\alpha n}$$

where:

$N_{yi} = \sum_{x\in T}x_{i}$

$N_{y} = \sum_{i=1}^{n} N_{yi}$

$T$ is the training set

$i$ is the feature index

$N_{yi}$ is the number of times feature $i$ appears in samples of class $y$ in the training set $T$

$N_{y}$ is the total count of all features for class $y$

$\alpha \ge 0$ is a smoothing prior that accounts for features not present in the learning samples and prevents zero probabilities

$\alpha = 1$ is called Laplace smoothing

$\alpha \lt 1$ is called Lidstone smoothing
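The smoothed estimate and the resulting prediction can be sketched with NumPy as follows. The helper names and the toy count matrix are made up for illustration; prediction uses the log-space rule with $\log\hat{\theta}_{yi}$ weighted by the count $x_{i}$. Feature 3 never occurs in class 0, yet with $\alpha = 1$ it receives $\hat{\theta} = 1/9$ instead of 0 (with $\alpha = 0$ its log-probability would be $-\infty$).

```python
import numpy as np


def fit_multinomial_nb(X, y, alpha=1.0):
    """Smoothed estimates theta_yi = (N_yi + alpha) / (N_y + alpha * n)."""
    classes = np.unique(y)
    n = X.shape[1]  # number of features
    # N_yi: count of feature i summed over the training samples of class y
    N_yi = np.array([X[y == c].sum(axis=0) for c in classes], dtype=float)
    # N_y: total count of all features for class y
    N_y = N_yi.sum(axis=1, keepdims=True)
    theta = (N_yi + alpha) / (N_y + alpha * n)
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, priors, theta


def predict_multinomial_nb(X, classes, priors, theta):
    # log P(y) + sum_i x_i log theta_yi; the multinomial coefficient is the
    # same for every class, so it does not change the argmax and is dropped
    scores = np.log(priors)[None, :] + X @ np.log(theta).T
    return classes[np.argmax(scores, axis=1)]


# Made-up word counts: rows are documents, columns are vocabulary terms
X_train = np.array([[2, 1, 0],
                    [3, 0, 0],
                    [0, 1, 3],
                    [0, 0, 4]])
y_train = np.array([0, 0, 1, 1])

classes, priors, theta = fit_multinomial_nb(X_train, y_train, alpha=1.0)
# Class 0: N_yi = [5, 1, 0], N_y = 6, so theta = (N_yi + 1) / (6 + 3)
print(theta[0])
print(predict_multinomial_nb(np.array([[1, 0, 0], [0, 0, 2]]), classes, priors, theta))  # [0 1]
```

Passing e.g. `alpha=0.5` gives Lidstone smoothing with the same code.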

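## Bernoulli Naive Bayes ##
For data distributed according to multivariate Bernoulli distributions: each feature is assumed to be a binary-valued variable (e.g. a word is present or absent).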

$$P(x_{i}|y) = P(i|y)x_{i} + (1-P(i|y))(1-x_{i})$$
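The per-feature rule above can be evaluated directly; this sketch sums its logarithm over features. The function name and the probabilities in `p_given_y` are hypothetical values for a single class, chosen only to make the arithmetic easy to follow.

```python
import numpy as np


def bernoulli_log_likelihood(x, p):
    """log P(x | y) for a binary vector x, given per-feature probabilities p_i = P(i|y)."""
    x = np.asarray(x, dtype=float)
    p = np.asarray(p, dtype=float)
    # Each factor is p_i when x_i = 1 and (1 - p_i) when x_i = 0
    return np.sum(np.log(p * x + (1 - p) * (1 - x)))


p_given_y = np.array([0.9, 0.2, 0.5])  # hypothetical values of P(i|y) for one class
x = np.array([1, 0, 1])
likelihood = np.exp(bernoulli_log_likelihood(x, p_given_y))
print(likelihood)  # 0.9 * (1 - 0.2) * 0.5 = 0.36, up to floating-point rounding
```

The absent feature ($x_{2} = 0$) still contributes a factor $1 - P(2|y) = 0.8$: the Bernoulli model explicitly penalizes the non-occurrence of a feature, whereas the multinomial model simply ignores features that do not occur.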

```python
from sklearn.naive_bayes import GaussianNB
```
> Attributes:
> * class_prior_
> * class_count_
> * theta_
> * sigma_ (renamed var_ in recent scikit-learn versions)

| Parameters | type | value | default |
|------------|------|-------|---------|
| priors | array-like, size=[n_classes,] | optional | None |

| Methods | Parameters | return |
|---------|------------|--------|
| \_\_init\_\_ | priors=None | |
| fit | X, y, sample_weight=None | self |
| get_params | deep=True | params |
| partial_fit | X, y, classes=None, sample_weight=None | self |
| predict | X | C |
| predict_log_proba | X | C |
| predict_proba | X | C |
| score | X, y, sample_weight=None | score |
| set_params | \*\*params | self |
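A short usage sketch of the methods above on synthetic data (two made-up Gaussian blobs), assuming a scikit-learn version with this API. It compares a single `fit` with incremental `partial_fit` calls, and it deliberately avoids the variance attribute because its name differs between versions.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic data (made up for this sketch): two Gaussian blobs, shuffled
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(4.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
order = rng.permutation(len(y))
X, y = X[order], y[order]

# Fit everything at once
clf = GaussianNB().fit(X, y)

# Or learn incrementally; partial_fit needs the full list of classes on its first call
inc = GaussianNB()
for start in range(0, len(y), 25):
    inc.partial_fit(X[start:start + 25], y[start:start + 25], classes=np.array([0, 1]))

print(clf.class_prior_)                     # fraction of training samples in each class
print(clf.theta_)                           # per-class mean of each feature
print(clf.predict_proba(X[:3]))             # each row sums to 1
print(clf.score(X, y))                      # mean accuracy on the given data
print(np.allclose(clf.theta_, inc.theta_))  # incremental means match the batch fit
```

`partial_fit` is useful when the data does not fit in memory; the per-class statistics are updated batch by batch.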

```python
from sklearn.naive_bayes import MultinomialNB
```
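> Attributes:
> * class_log_prior_
> * intercept_ (deprecated and later removed in newer scikit-learn versions)
> * feature_log_prob_
> * coef_ (deprecated and later removed in newer scikit-learn versions)
> * class_count_
> * feature_count_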

| Parameters | type | value | default |
|------------|------|-------|---------|
| alpha | float | optional | 1.0 |
| fit_prior | boolean | optional | True |
| class_prior | array-like, size=[n_classes,] | optional | None |


| Methods | Parameters | return |
|---------|------------|--------|
| \_\_init\_\_ | alpha=1.0, fit_prior=True, class_prior=None | |
| fit | X, y, sample_weight=None | self |
| get_params | deep=True | params |
| partial_fit | X, y, classes=None, sample_weight=None | self |
| predict | X | C |
| predict_log_proba | X | C |
| predict_proba | X | C |
| score | X, y, sample_weight=None | score |
| set_params | \*\*params | self |
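The fitted attributes can be checked against the smoothed estimate $\hat{\theta}_{yi} = (N_{yi} + \alpha)/(N_{y} + \alpha n)$ from the Multinomial section. This sketch assumes `feature_count_` holds $N_{yi}$ and `feature_log_prob_` holds $\log\hat{\theta}_{yi}$; the word counts are made up.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Made-up word counts: rows are documents, columns are vocabulary terms
X = np.array([[2, 1, 0],
              [3, 0, 0],
              [0, 1, 3],
              [0, 0, 4]])
y = np.array([0, 0, 1, 1])

alpha = 1.0
clf = MultinomialNB(alpha=alpha).fit(X, y)

# Rebuild theta_yi = (N_yi + alpha) / (N_y + alpha * n) from the fitted counts
N_yi = clf.feature_count_
N_y = N_yi.sum(axis=1, keepdims=True)
theta = (N_yi + alpha) / (N_y + alpha * X.shape[1])

print(np.allclose(np.exp(clf.feature_log_prob_), theta))  # True
print(np.exp(clf.class_log_prior_))  # empirical class frequencies, since fit_prior=True
print(clf.predict(np.array([[1, 0, 0], [0, 0, 2]])))     # [0 1]
```

With `fit_prior=False` a uniform class prior is used instead of the empirical frequencies.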


```python
from sklearn.naive_bayes import BernoulliNB
```
> Attributes:
> * class_log_prior_
> * feature_log_prob_
> * class_count_
> * feature_count_

| Parameters | type | value | default |
|------------|------|-------|---------|
| alpha | float | optional | 1.0 |
| binarize | float | optional | 0.0 |
| fit_prior | boolean | optional | True |
| class_prior | array-like, size=[n_classes,] | optional | None |


| Methods | Parameters | return |
|---------|------------|--------|
| \_\_init\_\_ | alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None | |
| fit | X, y, sample_weight=None | self |
| get_params | deep=True | params |
| partial_fit | X, y, classes=None, sample_weight=None | self |
| predict | X | C |
| predict_log_proba | X | C |
| predict_proba | X | C |
| score | X, y, sample_weight=None | score |
| set_params | \*\*params | self |
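A sketch of the `binarize` parameter on made-up counts: with `binarize=0.0`, every value greater than 0 is mapped to 1 before fitting and before predicting. It assumes `BernoulliNB` smooths each estimate as $(\text{count} + \alpha)/(\text{class count} + 2\alpha)$, where the count is the number of class-$y$ samples containing feature $i$.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Made-up counts; binarize=0.0 turns them into presence/absence indicators
X = np.array([[2, 1, 0],
              [3, 0, 0],
              [0, 1, 3],
              [0, 0, 4]])
y = np.array([0, 0, 1, 1])

clf = BernoulliNB(alpha=1.0, binarize=0.0).fit(X, y)

# Estimated P(i|y) for each class (rows) and feature (columns)
print(np.exp(clf.feature_log_prob_))
# New samples are binarized with the same threshold before scoring
print(clf.predict(np.array([[5, 0, 0], [0, 0, 1]])))  # [0 1]
```

Here class 0 contains feature 1 in both samples, so $P(1|y{=}0) = (2 + 1)/(2 + 2) = 0.75$.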