## naive bayes

Naive Bayes is a classifier that is often used for binary classification, for example to predict whether a message is spam based on the words it contains.
For each word $W$ we can use Bayes' theorem to compute the probability of the message being spam $S$.

\begin{equation}
P(S|W) = \frac{P(W|S) \cdot P(S)}{P(W)}
\end{equation}
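As a small worked example of the formula above, the following sketch computes $P(S|W)$ for a single word. The numbers are hypothetical and only illustrate the arithmetic; the denominator $P(W)$ is expanded with the law of total probability over the two classes (spam and not spam).

```python
def posterior_spam_given_word(p_word_given_spam, p_word_given_not_spam, p_spam):
    """Bayes' theorem for a single word W: returns P(S|W)."""
    p_not_spam = 1.0 - p_spam
    # P(W) = P(W|S) * P(S) + P(W|not S) * P(not S)
    p_word = p_word_given_spam * p_spam + p_word_given_not_spam * p_not_spam
    return p_word_given_spam * p_spam / p_word


# Hypothetical values: the word appears in 50% of spam messages,
# in 5% of non-spam messages, and 20% of all messages are spam.
p = posterior_spam_given_word(
    p_word_given_spam=0.5, p_word_given_not_spam=0.05, p_spam=0.2
)
print(round(p, 3))  # 0.1 / 0.14, roughly 0.714
```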

With the (naive) assumption that the words $w_1,...,w_n$ inside a message occur independently of each other given the class:

\begin{equation}
P(W=w_1,...,w_n|S) = P(W=w_1|S) \cdot ... \cdot P(W=w_n|S)
\end{equation}

We can then compute the probability of a message being spam given the $n$ words inside the message.


\begin{equation}
\begin{aligned}
P(S|W) &= \frac{P(W=w_1,...,w_n|S) \cdot P(S)}{P(W=w_1,...,w_n)} \\
&= \frac{P(W=w_1|S) \cdot ... \cdot P(W=w_n|S) \cdot P(S)}{P(W=w_1) \cdot ... \cdot P(W=w_n)}
\end{aligned}
\end{equation}

-----

### note:

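Usually the product of the probabilities $P(W=w_1|S) \cdot ... \cdot P(W=w_n|S)$ is computed as a sum of logarithms, $e^{\log{P(W=w_1|S)} + ... + \log{P(W=w_n|S)}}$, to avoid floating-point underflow (a product of many small numbers rounds to zero). In practice the log scores are usually compared directly, without exponentiating.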
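The underflow problem can be illustrated with a short sketch. The per-word likelihoods are hypothetical; the point is only that the direct product collapses to zero while the sum of logarithms stays representable.

```python
import math

# Hypothetical per-word likelihoods P(W=w_i|S) for a message with 100 words.
likelihoods = [1e-5] * 100

# Direct product: the true value 1e-500 is far below the smallest
# positive float, so the result underflows to 0.0.
naive_product = 1.0
for p in likelihoods:
    naive_product *= p

# Sum of logarithms: a moderately sized negative number.
log_sum = sum(math.log(p) for p in likelihoods)

print(naive_product)
print(log_sum)
```

Here `naive_product` ends up as `0.0`, while `log_sum` is about `-1151.3` and can still be compared against the log score of the other class.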

To avoid a probability of zero when a word $w_i$ has never occurred in a spam message before, the probabilities are slightly modified (smoothed), for example:

\begin{equation}
P(w_i|S) = \frac{n\_w_i + n\_of\_spams\_with\_w_i}{2 \cdot n\_w_i + n\_of\_spams}
\end{equation}
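A minimal sketch of this kind of smoothing, assuming the extra term in the formula above acts as a constant pseudocount. The name `k` and its default value are assumptions made for illustration and do not come from the formula itself.

```python
def smoothed_word_probability(n_spams_with_word, n_spams, k=1.0):
    """Estimate P(w_i|S) with a pseudocount k (k is an assumed constant)."""
    return (k + n_spams_with_word) / (2 * k + n_spams)


# Hypothetical counts: the word never appeared in any of 50 spam messages.
unsmoothed = 0 / 50
smoothed = smoothed_word_probability(n_spams_with_word=0, n_spams=50)

print(unsmoothed)  # exactly zero, which would zero out the whole product
print(smoothed)    # 1 / 52, small but non-zero
```

With the pseudocount, an unseen word lowers the spam score without forcing it to zero, and a word seen in every spam message still gets a probability below one.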

The denominator is also often discarded, because it is the same for both classes. The score then has to be computed for both the positive (spam) and the negative (no spam) class, and the two scores are compared.
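Put together, a classifier without the denominator can be sketched as below. The word probabilities and the prior are hypothetical values that are assumed to be smoothed already, and the helper names `log_score` and `classify` are made up for this illustration.

```python
import math


def log_score(words, prior, word_probs):
    """Unnormalized log posterior: log P(class) + sum of log P(w_i|class)."""
    return math.log(prior) + sum(math.log(word_probs[w]) for w in words)


# Hypothetical, already smoothed word probabilities for each class.
p_word_spam = {"free": 0.30, "meeting": 0.02, "winner": 0.20}
p_word_not_spam = {"free": 0.05, "meeting": 0.25, "winner": 0.01}
p_spam = 0.4  # hypothetical prior P(S)


def classify(words):
    """Return the class with the larger unnormalized log score."""
    spam = log_score(words, p_spam, p_word_spam)
    not_spam = log_score(words, 1.0 - p_spam, p_word_not_spam)
    return "spam" if spam > not_spam else "no spam"


print(classify(["free", "winner"]))
print(classify(["meeting"]))
```

The first message is labeled `spam` and the second `no spam`. The scores are not probabilities on their own, but their order is the same as the order of the normalized posteriors.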

---

## continuous distributions (gaussian)

If features $w_i$ are continuously distributed, like the time when the message was sent or its size, we need to modify our approach. We split the datapoints for each feature into two groups: datapoints for positive and for negative examples (spam and non-spam messages). Then we assume a normal (Gaussian) distribution of the datapoints in each group and compute the mean and standard deviation for both groups.

\begin{equation}
\begin{aligned}
\mu &= \sum_{i=1}^{N} \frac{x_i}{N} \\
\sigma^2 &= \sum_{i=1}^{N} \frac{(x_i-\mu)^2}{N}
\end{aligned}
\end{equation}
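The per-group estimates can be sketched in plain Python as follows. The message sizes and labels are hypothetical, and `fit_gaussian` is a helper name introduced only for this example.

```python
def fit_gaussian(values):
    """Return (mean, variance) of a group, dividing by N as in the formulas above."""
    n = len(values)
    mu = sum(values) / n
    sigma2 = sum((x - mu) ** 2 for x in values) / n
    return mu, sigma2


# Hypothetical message sizes in kB with their labels.
sizes = [12.0, 15.0, 11.0, 40.0, 42.0, 38.0]
is_spam = [False, False, False, True, True, True]

# Split the datapoints of this feature into the two groups.
spam_sizes = [x for x, s in zip(sizes, is_spam) if s]
not_spam_sizes = [x for x, s in zip(sizes, is_spam) if not s]

mu_spam, var_spam = fit_gaussian(spam_sizes)
mu_not_spam, var_not_spam = fit_gaussian(not_spam_sizes)

print(mu_spam, var_spam)
print(mu_not_spam, var_not_spam)
```

The standard deviation $\sigma$ is the square root of the variance returned here.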

Now for a new datapoint we can compute the likelihood of it belonging to the positive or negative group (spam or non-spam messages) with the density of the normal distribution, using that group's $\mu$ and $\sigma$:

\begin{equation}
P(x) = \frac{1}{\sigma\sqrt{2 \pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}
\end{equation}
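A minimal sketch of the resulting decision for a single continuous feature. The group parameters and the prior are hypothetical, and `classify_size` is a name chosen for illustration. Note that the values returned by `gaussian_pdf` are densities, so they can exceed 1.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical parameters estimated from message sizes (in kB) of each group.
mu_spam, sigma_spam = 40.0, 3.0
mu_not_spam, sigma_not_spam = 12.0, 2.0
p_spam = 0.5  # hypothetical prior P(S)


def classify_size(x):
    """Compare density times prior for both groups and return the larger one."""
    spam = gaussian_pdf(x, mu_spam, sigma_spam) * p_spam
    not_spam = gaussian_pdf(x, mu_not_spam, sigma_not_spam) * (1.0 - p_spam)
    return "spam" if spam > not_spam else "no spam"


print(classify_size(35.0))
print(classify_size(14.0))
```

With these parameters a 35 kB message is labeled `spam` and a 14 kB message `no spam`. For several features, the per-feature densities would be multiplied (or their logarithms summed) under the same independence assumption as before.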


```python
# Gaussian naive Bayes on the iris dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load features and labels, then hold out half of the data for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Fit per-class Gaussians on the training split and predict the test split
gnb = GaussianNB()
y_pred = gnb.fit(X_train, y_train).predict(X_test)
print("Number of mislabeled points out of a total %d points : %d"
      % (X_test.shape[0], (y_test != y_pred).sum()))
```

```
Number of mislabeled points out of a total 75 points : 4
```
