# Naive Bayes

## Bayes Theorem

Bayes' theorem is one of the earliest tools for probabilistic inference. It gives the probability of an event (e.g. a message being spam) given that other events have been observed (e.g. certain words appearing in the message), in terms of the prior probability of the event and the likelihood of the observations.

\begin{equation*}
P(A|B) = \frac{P(A)P(B|A)}{P(B)}
\end{equation*}
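
The formula follows from the definition of conditional probability. As a short derivation (standard probability, not specific to the example below), the joint probability of $A$ and $B$ can be written in two ways:

\begin{equation*}
P(A \cap B) = P(A|B)P(B) = P(B|A)P(A)
\end{equation*}

Dividing both sides by $P(B)$ (assumed non-zero) gives the formula above. When $P(B)$ is not known directly, it can be expanded with the law of total probability:

\begin{equation*}
P(B) = P(A)P(B|A) + P(\neg A)P(B|\neg A)
\end{equation*}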

Example: Brenda and Alex are employees at the office. We know that:
 * Alex works mostly in the office, 3 days a week
 * Brenda mainly travels, so she comes to the office once a week

NOTE: a week is made of 5 working days

We observe someone running through the office wearing a red sweater, but we cannot recognise who it is, so we make a guess. From the data above, we compute the probability that each person is in the office on a given day ($A$ for Alex, $B$ for Brenda):

\begin{equation*}
P(A) = \frac{3}{5}=0.60 \;\;\; P(B) = \frac{1}{5}=0.20
\end{equation*}

These probabilities must now be normalized: we know that someone was there, and we assume it was either Alex or Brenda, so the two probabilities must sum to 1. Therefore:

\begin{equation*}
P(A) = \frac{P(A)}{P(A)+P(B)} \;\;\; P(B) = \frac{P(B)}{P(A)+P(B)}\\\\
P(A) = \frac{\frac{3}{5}}{\frac{3}{5}+\frac{1}{5}}=0.75 \;\;\; P(B) = \frac{\frac{1}{5}}{\frac{3}{5}+\frac{1}{5}}=0.25
\end{equation*}

Now, we introduce more knowledge:
 * Alex wears a red sweater 2 days a week
 * Brenda wears a red sweater 3 days a week


<img src=".\images\4.01_bayes-example.png" style="width: 600px;"/>

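With a 5-day week, this gives $P(R|A) = 2/5 = 0.40$ for Alex and $P(R|\neg A) = 3/5 = 0.60$ for Brenda, where $R$ is the event of wearing a red sweater and $\neg A$ means the person was Brenda. Therefore, the probability that the person we saw in the red sweater was Alex is: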

\begin{equation*}
P(A|R) = \frac{P(A)P(R|A)}{P(R)} \\
P(A|R) = \frac{P(A)P(R|A)}{P(A)P(R|A)+P( \neg A)P(R|\neg A)}\\
P(A|R) = \frac{0.75 \cdot 0.40}{0.75 \cdot 0.40 + 0.25 \cdot 0.60} \approx 66.7\%
\end{equation*}
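
The two steps of this example (normalizing the priors, then applying Bayes' theorem) can be checked numerically. The following is a minimal sketch using exact fractions; the variable names are illustrative and not part of the original example.

```python
from fractions import Fraction

# Share of the 5 working days each person spends in the office
office_alex = Fraction(3, 5)
office_brenda = Fraction(1, 5)

# Normalize so the two priors sum to 1 (assumption: it was one of the two)
total = office_alex + office_brenda
prior_alex = office_alex / total
prior_brenda = office_brenda / total

# Likelihood of a red sweater for each person
red_given_alex = Fraction(2, 5)
red_given_brenda = Fraction(3, 5)

# Bayes' theorem, with P(R) expanded by the law of total probability
evidence = prior_alex * red_given_alex + prior_brenda * red_given_brenda
posterior_alex = prior_alex * red_given_alex / evidence
posterior_brenda = prior_brenda * red_given_brenda / evidence

print(posterior_alex, float(posterior_alex))
print(posterior_brenda, float(posterior_brenda))
```

The posterior for Alex comes out as exactly 2/3, and the two posteriors sum to 1, as they should after normalization.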

## Naive Bayes

A spam email classifier can be built as a Naive Bayes classifier. We check the words of an email against a sample of labelled sentences.

<img src=".\images\4.02_spam-example_01.png" style="width: 400px;"/>

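In this sample, 1 of the 3 spam sentences contains the word `easy` and 2 of the 3 contain the word `money`, so $P(\text{easy}|\text{spam}) = 1/3$ and $P(\text{money}|\text{spam}) = 2/3$. These are the likelihoods of each word given that a message is spam; Bayes' theorem turns them into the probability that a message is spam given the words it contains.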
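
As a rough sketch of how per-word counts turn into a spam probability, the code below computes word likelihoods per class and combines them with Bayes' theorem under the "naive" assumption that words are independent given the class. The sentences are a hypothetical sample (not necessarily those in the figure), and `likelihood` and `spam_posterior` are illustrative helper names.

```python
from fractions import Fraction

# Hypothetical labelled sample (an assumption, not taken from the figure)
spam = ["win money now", "make cash easy", "cheap money reply"]
ham = ["how are you", "there you are", "can i borrow money",
       "say hi to grandma", "was the exam easy"]


def likelihood(word, sentences):
    """Fraction of the sentences in one class that contain the word."""
    hits = sum(word in sentence.split() for sentence in sentences)
    return Fraction(hits, len(sentences))


def spam_posterior(words):
    """P(spam | words), treating words as independent given the class."""
    p_spam = Fraction(len(spam), len(spam) + len(ham))
    score_spam, score_ham = p_spam, 1 - p_spam
    for word in words:
        score_spam *= likelihood(word, spam)
        score_ham *= likelihood(word, ham)
    # Normalize so the spam and ham scores sum to 1
    return score_spam / (score_spam + score_ham)


print(likelihood("easy", spam), likelihood("money", spam))
print(spam_posterior(["easy"]))
print(spam_posterior(["easy", "money"]))
```

This sketch uses no smoothing, so a word that appears in neither class makes both scores zero and the final division fails. Practical implementations add a small count to every word to avoid this.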

#### Example

Suppose you have a bag with three standard 6-sided dice with face values [1,2,3,4,5,6] and two non-standard 6-sided dice with face values [2,3,3,4,4,5]. Someone draws a die from the bag, rolls it, and announces it was a 3. What is the probability that the die that was rolled was a standard die?

\begin{equation*}
P(std) = 3/5 \\
P(\neg std) = 2/5 \\
\\
P(std|'3') = \frac{P(std) \cdot P('3'|std)}{P(std) \cdot P('3'|std) + P(\neg std) \cdot P('3'|\neg std)} = \frac{3/5 \cdot 1/6}{3/5 \cdot 1/6+2/5 \cdot 1/3} = \frac{1/10}{1/10+2/15} = \frac{3}{7} \approx 43\%
\end{equation*}
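
The same calculation can be sketched in code for any rolled value, assuming each die is equally likely to be drawn and each face equally likely to land. The function name `posterior_standard` is illustrative.

```python
from fractions import Fraction

standard = [1, 2, 3, 4, 5, 6]
non_standard = [2, 3, 3, 4, 4, 5]

# Priors: three standard and two non-standard dice in the bag
p_std = Fraction(3, 5)
p_non = Fraction(2, 5)


def posterior_standard(rolled):
    """P(standard die | rolled value), by Bayes' theorem."""
    # Likelihood of the rolled value for each kind of die
    like_std = Fraction(standard.count(rolled), len(standard))
    like_non = Fraction(non_standard.count(rolled), len(non_standard))
    evidence = p_std * like_std + p_non * like_non
    return p_std * like_std / evidence


print(posterior_standard(3))
print(posterior_standard(1))
```

A roll of 3 gives 3/7, matching the equation above. A roll of 1 gives 1, since only a standard die has that face. A value that appears on neither die (e.g. 7) has zero evidence, so the division fails.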

```python
# Import our libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


# Read in our dataset (tab-separated, no header row).
# Forward slashes avoid the invalid '\s' escape and work on every platform.
df = pd.read_csv('data/smsspamcollection/SMSSpamCollection',
                 sep='\t',
                 header=None,
                 names=['label', 'sms_message'])

# Encode the response value as 0 (ham) / 1 (spam)
df['label'] = df.label.map({'ham': 0, 'spam': 1})

# Split our dataset into training and testing data
X_train, X_test, y_train, y_test = train_test_split(df['sms_message'],
                                                    df['label'],
                                                    random_state=1)

# Instantiate the CountVectorizer
count_vector = CountVectorizer()

# Learn the vocabulary from the training data and return the count matrix
training_data = count_vector.fit_transform(X_train)

# Transform the testing data with the same vocabulary.
# Note we do not fit the CountVectorizer on the testing data.
testing_data = count_vector.transform(X_test)

# Instantiate our model
naive_bayes = MultinomialNB()

# Fit our model to the training data
naive_bayes.fit(training_data, y_train)

# Predict on the test data
predictions = naive_bayes.predict(testing_data)

# Score our model
print('Accuracy score: ', format(accuracy_score(y_test, predictions)))
print('Precision score: ', format(precision_score(y_test, predictions)))
print('Recall score: ', format(recall_score(y_test, predictions)))
print('F1 score: ', format(f1_score(y_test, predictions)))
```

```
Accuracy score:  0.9885139985642498
Precision score:  0.9720670391061452
Recall score:  0.9405405405405406
F1 score:  0.9560439560439562
```
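
To make the four scores easier to interpret, the sketch below computes them directly from confusion-matrix counts. The counts are hypothetical values chosen to be consistent with the scores printed above; they are not taken from the original run.

```python
# Hypothetical confusion-matrix counts (an assumption, not from the original run)
tp = 174   # spam correctly flagged as spam
fp = 5     # ham wrongly flagged as spam
fn = 11    # spam missed and labelled ham
tn = 1203  # ham correctly labelled ham

accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all messages labelled correctly
precision = tp / (tp + fp)                  # of messages flagged as spam, share that were spam
recall = tp / (tp + fn)                     # of actual spam messages, share that were flagged
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

print('Accuracy score: ', accuracy)
print('Precision score: ', precision)
print('Recall score: ', recall)
print('F1 score: ', f1)
```

With counts like these, accuracy is dominated by the large number of ham messages, which is why precision and recall on the spam class are the more informative figures for an imbalanced dataset.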
