# SPAM or HAM

## Probabilities
### Introduction

- Statistics and probability theory constitute a branch of mathematics for dealing with uncertainty. Probability theory provides a basis for the science of statistical inference from data.
- Sample: a set of $n$ observations drawn from a parent population that is assumed to be described by a probability distribution
- Descriptive statistics: description of the sample
- Inferential statistics: making a decision or drawing an inference about the population from a sample

### Probabilities

A set of probability values for an experiment with sample space $S = \\{ O_1, O_2, \cdots, O_n \\}$ consists of probabilities $p_1, p_2, \cdots, p_n$ that satisfy:

$$ 0 \leq p_i \leq 1, \hspace{0.5cm} i= 1,2, \cdots, n $$

and

$$ p_1 + p_2 + \cdots +p_n = 1 $$

The probability of outcome $O_i$ occurring is said to be $p_i$ and it is written:

$$ P(O_i) = p_i $$

When the $n$ outcomes are equally likely, each probability has the value $\frac{1}{n}$.
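As a minimal sketch of these two conditions, the snippet below builds the probability values for a fair six-sided die (an assumed example, not part of the original data) and checks that each value lies in $[0, 1]$ and that they sum to 1. The names `sample_space` and `probabilities` are illustrative.

```python
from fractions import Fraction

# Sample space of a fair six-sided die (assumed example).
sample_space = [1, 2, 3, 4, 5, 6]
n = len(sample_space)

# Equally likely outcomes: every p_i equals 1/n.
probabilities = {outcome: Fraction(1, n) for outcome in sample_space}

# Check the two defining conditions of a set of probability values.
all_in_range = all(0 <= p <= 1 for p in probabilities.values())
total = sum(probabilities.values())

print(all_in_range)  # True
print(total)         # 1
```

Exact fractions are used here so that the sum is exactly 1 rather than a rounded floating-point value.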

### Events
- Events: subset of the sample space
- The probability of an event $A$, $P(A)$, is obtained by summing the probabilities of the outcomes contained within the event $A$
- An event is said to occur if one of the outcomes contained within the event occurs
- Complement of an event: $A'$ is the event consisting of everything in the sample space $S$ that is not contained within $A$: $$P(A) + P(A') = 1$$
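The definitions above can be sketched with Python sets, again assuming a fair six-sided die. The helper `prob` is a hypothetical name: it sums the probabilities of the outcomes inside an event, and the last line checks the complement rule.

```python
from fractions import Fraction

# Fair six-sided die (assumed example): each outcome has probability 1/6.
sample_space = {1, 2, 3, 4, 5, 6}
p = {outcome: Fraction(1, 6) for outcome in sample_space}

def prob(event):
    """Sum the probabilities of the outcomes contained in the event."""
    return sum((p[outcome] for outcome in event), Fraction(0))

A = {2, 4, 6}                    # event "the roll is even"
A_complement = sample_space - A  # everything in S that is not in A

print(prob(A))                       # 1/2
print(prob(A) + prob(A_complement))  # 1
```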

### Combinations of Events

1. Intersections
- $A \cap B$ consists of the outcomes contained within both events $A$ and $B$
- Probability of the intersection, $P(A \cap B) $, is the probability that both events occur simultaneously
- Properties:
    - $P(A \cap B) +P(A \cap B') = P(A)$
    - Mutually exclusive events: $A$ and $B$ are mutually exclusive if $A \cap B = \emptyset$, in which case $P(A \cap B) = 0$
    - $A \cap (B \cap C) = (A \cap B) \cap C $
2. Union
- Union of Events: $ A \cup B $ consists of the outcomes that are contained within at least one of the events $A$ and $B$
- The probability of this event, $P (A \cup B)$ is the probability that at least one of these events $A$ and $B$ occurs
- Properties:
    - If the events are mutually exclusive, then $P(A \cup B) = P(A) + P(B)$
    - $P( A \cup B) = P(A \cap B') + P(A' \cap B) + P(A \cap B)$
    - $P( A \cup B) = P(A) + P(B) - P(A \cap B)$
    - $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P( B \cap C) - P( A \cap C) + P(A \cap B \cap C)$
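The intersection and union properties can be checked numerically. The sketch below assumes a fair six-sided die with equally likely outcomes; the events `A`, `B`, `C` and the helper `prob` are illustrative choices.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # fair six-sided die (assumed example)

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

A = {2, 4, 6}     # even roll
B = {4, 5, 6}     # roll greater than 3
C = {1, 2, 3, 4}  # roll of at most 4

# P(A ∩ B) + P(A ∩ B') = P(A); A - B is the set A ∩ B'.
split_of_A = prob(A & B) + prob(A - B)

# P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
union_two = prob(A) + prob(B) - prob(A & B)

# Inclusion-exclusion for three events.
union_three = (prob(A) + prob(B) + prob(C)
               - prob(A & B) - prob(B & C) - prob(A & C)
               + prob(A & B & C))

print(split_of_A == prob(A))           # True
print(union_two == prob(A | B))        # True
print(union_three == prob(A | B | C))  # True
```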

### Conditional Probability
- Conditional probability: the probability of an event $A$ conditional on an event $B$ is:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \hspace{0.5cm}  \text{for } P(B) >0$$
- Properties:
    - $P (A \mid B) = \frac{P(A \cap B)}{P(B)} \Longrightarrow P(A \cap B) = P(B)P (A \mid B)$
    - $P (A \mid B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} \Longrightarrow P(A \cap B \cap C) = P(B \cap C)P (A \mid B \cap C)$
    - In general, for a sequence of events $A_1, A_2, \cdots, A_n$:
    $$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2 \mid A_1)P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1})$$
- Two events $A$ and $B$ are independent if any of the following conditions holds (they are equivalent when $P(A) > 0$ and $P(B) > 0$):
    - $P(A \mid B) = P(A)$
    - $P(B \mid A) = P(B)$
    - $P(A \cap B) = P(A) \times P(B)$
    - Interpretation: events are independent if the knowledge about one event does not affect the probability of the other event
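A small sketch of conditional probability and the independence test, assuming a fair six-sided die. The helpers `prob`, `cond_prob` and `independent` are hypothetical names introduced for illustration.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # fair six-sided die (assumed example)

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def cond_prob(a, b):
    """P(a | b) = P(a ∩ b) / P(b), defined only when P(b) > 0."""
    if prob(b) == 0:
        raise ValueError("P(b) must be positive")
    return prob(a & b) / prob(b)

def independent(a, b):
    """Product form of independence: P(a ∩ b) = P(a) P(b)."""
    return prob(a & b) == prob(a) * prob(b)

A = {2, 4, 6}     # even roll
B = {1, 2, 3, 4}  # roll of at most 4
C = {4, 5, 6}     # roll greater than 3

print(cond_prob(A, B))    # 1/2
print(independent(A, B))  # True
print(cond_prob(A, C))    # 2/3
print(independent(A, C))  # False
```

Knowing that the roll is at most 4 leaves the chance of an even roll at $\frac{1}{2}$, whereas knowing that it is greater than 3 raises it to $\frac{2}{3}$.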

### Posterior Probabilities
- Law of total probability: Given $\{ A_1, A_2, \cdots, A_n \}$ a partition of sample space $S$, the probability of an event $B$, $P(B)$ can be expressed as:
$$P(B) = \sum_{i=1}^n P(A_i)P(B \mid A_i)$$
- Bayes' Theorem: Given $\{ A_1, A_2, \cdots, A_n \}$ a partition of a sample space, the posterior probabilities of the event $A_i$ conditional on an event $B$ can be obtained from the prior probabilities $P(A_i)$ and the conditional probabilities $P(B \mid A_i)$ using the formula:
$$ P(A_i \mid B) = \frac{P(A_i)P(B \mid A_i)}{\sum_{j=1}^n P(A_j)P(B \mid A_j)}$$
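The two formulas can be sketched for a two-set partition. The numbers below are made up for illustration (they are not taken from the data in this notebook): a prior of $\frac{2}{5}$ for spam, and a word that appears in half of spam messages and one tenth of ham messages.

```python
from fractions import Fraction

# Hypothetical priors P(A_i) over the partition {spam, ham}.
priors = {"spam": Fraction(2, 5), "ham": Fraction(3, 5)}
# Hypothetical conditionals P(B | A_i), where B = "the word appears".
likelihoods = {"spam": Fraction(1, 2), "ham": Fraction(1, 10)}

# Law of total probability: P(B) = sum_i P(A_i) P(B | A_i)
evidence = sum(priors[a] * likelihoods[a] for a in priors)

# Bayes' theorem: P(A_i | B) = P(A_i) P(B | A_i) / P(B)
posteriors = {a: priors[a] * likelihoods[a] / evidence for a in priors}

print(evidence)            # 13/50
print(posteriors["spam"])  # 10/13
print(posteriors["ham"])   # 3/13
```

The posteriors sum to 1 because the denominator is exactly the sum of the numerators over the partition.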

## Load "toy" example

In [1]:
train_spam = ['send us your password', 'review our website', 'send your password', 'send us your account']
train_ham = ['Your activity report','benefits physical activity', 'the importance vows']
test_emails = {'spam':['renew your password', 'renew your vows'], 'ham':['benefits of our account', 'the importance of physical activity']}

## Print the Vocab

In [2]:
# make a vocabulary of unique words that occur in known spam emails
vocab_words_spam = []

for sentence in train_spam:
    sentence_as_list = sentence.split()
    for word in sentence_as_list:
        vocab_words_spam.append(word)     
vocab_words_spam = list(set(vocab_words_spam))

print(vocab_words_spam)

['our', 'website', 'us', 'send', 'your', 'review', 'password', 'account']


In [3]:
vocab_words_ham = []

for sentence in train_ham:
    sentence_as_list = sentence.split()
    for word in sentence_as_list:
        vocab_words_ham.append(word)
vocab_words_ham = list(set(vocab_words_ham))

print(vocab_words_ham)

['activity', 'physical', 'importance', 'the', 'vows', 'benefits', 'Your', 'report']


## Naive Bayes (Discrete)

The idea of this project is to write a simple Naive Bayes model to predict whether an SMS message is spam or not.
Let us derive the necessary probabilities.
Naive Bayes is a model that relies on Bayes' theorem:

$$
P(Y|X) = \frac{P(X|Y)\times P(Y)}{P(X)}
$$

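For this dataset, with $W_1, \dots, W_n$ the words of a message and $y$ the class, we can write the equation as shown below. The second line applies the chain rule, the third uses the naive assumption that the words are conditionally independent given $y$, and the fourth expands the denominator with the law of total probability:

$$\begin{aligned}
P(y|W_1, \dots, W_n) &= \frac{P(W_1, \dots, W_n|y)\times P(y)}{P(W_1, \dots, W_n)} \\
P(y|W_1, \dots, W_n) &= \frac{P(W_1 | W_2, \dots, W_n,y)\times \dots \times P(W_n|y)\times P(y)}{P(W_1, \dots, W_n)} \\
P(y|W_1, \dots, W_n) &= \frac{P(y) \times \prod_{i=1}^{n}P(W_i|y)}{P(W_1, \dots, W_n)} \\
P(y|W_1, \dots, W_n) &= \frac{P(y) \times \prod_{i=1}^{n}P(W_i|y)}{P(y) \times \prod_{i=1}^{n}P(W_i|y) + P(\neg y) \times \prod_{i=1}^{n}P(W_i|\neg y)}
\end{aligned}$$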
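The last line of the derivation can be sketched as a small function. The name `naive_bayes_posterior` and its parameters are illustrative, not part of the original notebook; the example values are the smoothed likelihoods and priors that the cells further down compute for the message 'send us your password'.

```python
from math import prod

def naive_bayes_posterior(prior_y, likelihoods_y, likelihoods_not_y):
    """Posterior P(y | W_1..W_n) for a two-class problem.

    likelihoods_y[i] is P(W_i | y) and likelihoods_not_y[i] is P(W_i | not y).
    """
    score_y = prior_y * prod(likelihoods_y)
    score_not_y = (1.0 - prior_y) * prod(likelihoods_not_y)
    return score_y / (score_y + score_not_y)

# 'send us your password': prior P(spam) = 4/7; words unseen in ham get 1/5.
posterior = naive_bayes_posterior(
    prior_y=4 / 7,
    likelihoods_y=[2 / 3, 1 / 2, 2 / 3, 1 / 2],      # send, us, your, password | spam
    likelihoods_not_y=[1 / 5, 1 / 5, 2 / 5, 1 / 5],  # send, us, your, password | ham
)
print(round(posterior, 4))  # 0.9789
```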

## Compute the likelihood

In [4]:
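def compute_likelihood(vocab, train, label='spam'):
    """Return a smoothed estimate of P(word appears | class) for each vocabulary word."""
    likelihood = {}
    for w in vocab:
        count = 0
        for sentence in train:
            # NOTE: `in` on a string is a substring test, so 'our' also matches 'your';
            # use `w in sentence.split()` for whole-word matching.
            if w in sentence:
                count += 1
        print(f"Number of {label} emails with the word '{w}': {count}")
        # Laplace (add-one) smoothing over the two cases present/absent
        prob = (count + 1)/(len(train) + 2)
        print(f"{label}ity of the word '{w}': {prob} ")
        # keys are lower-cased, so 'Your' is stored as 'your'
        likelihood[w.lower()] = prob
    return likelihood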

In [5]:
likelihood_spam = compute_likelihood(vocab_words_spam, train_spam)
likelihood_spam

Number of spam emails with the word 'our': 4
spamity of the word 'our': 0.8333333333333334 
Number of spam emails with the word 'website': 1
spamity of the word 'website': 0.3333333333333333 
Number of spam emails with the word 'us': 2
spamity of the word 'us': 0.5 
Number of spam emails with the word 'send': 3
spamity of the word 'send': 0.6666666666666666 
Number of spam emails with the word 'your': 3
spamity of the word 'your': 0.6666666666666666 
Number of spam emails with the word 'review': 1
spamity of the word 'review': 0.3333333333333333 
Number of spam emails with the word 'password': 2
spamity of the word 'password': 0.5 
Number of spam emails with the word 'account': 1
spamity of the word 'account': 0.3333333333333333 


{'our': 0.8333333333333334,
 'website': 0.3333333333333333,
 'us': 0.5,
 'send': 0.6666666666666666,
 'your': 0.6666666666666666,
 'review': 0.3333333333333333,
 'password': 0.5,
 'account': 0.3333333333333333}

In [6]:
likelihood_ham = compute_likelihood(vocab_words_ham, train_ham, 'ham')
likelihood_ham

Number of ham emails with the word 'activity': 2
hamity of the word 'activity': 0.6 
Number of ham emails with the word 'physical': 1
hamity of the word 'physical': 0.4 
Number of ham emails with the word 'importance': 1
hamity of the word 'importance': 0.4 
Number of ham emails with the word 'the': 1
hamity of the word 'the': 0.4 
Number of ham emails with the word 'vows': 1
hamity of the word 'vows': 0.4 
Number of ham emails with the word 'benefits': 1
hamity of the word 'benefits': 0.4 
Number of ham emails with the word 'Your': 1
hamity of the word 'Your': 0.4 
Number of ham emails with the word 'report': 1
hamity of the word 'report': 0.4 


{'activity': 0.6,
 'physical': 0.4,
 'importance': 0.4,
 'the': 0.4,
 'vows': 0.4,
 'benefits': 0.4,
 'your': 0.4,
 'report': 0.4}
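Note that the count of 4 for 'our' in the spam output comes from the substring test `w in sentence`: 'your' contains 'our', so three of the four matches are not the word 'our' at all. A possible whole-word, case-insensitive variant is sketched below. The function name `compute_likelihood_whole_words` is hypothetical, and using it would change the likelihoods and therefore the classification outputs shown in the rest of this notebook.

```python
def compute_likelihood_whole_words(vocab, train):
    """Smoothed P(word present | class) using whole-word, case-insensitive matching."""
    # Tokenise each training sentence once into a set of lower-cased words.
    tokenized = [set(sentence.lower().split()) for sentence in train]
    likelihood = {}
    for w in vocab:
        w = w.lower()
        count = sum(1 for tokens in tokenized if w in tokens)
        # Same add-one smoothing as in compute_likelihood.
        likelihood[w] = (count + 1) / (len(train) + 2)
    return likelihood

train_spam = ['send us your password', 'review our website',
              'send your password', 'send us your account']
likelihood = compute_likelihood_whole_words({'our', 'your', 'send'}, train_spam)

print(likelihood['our'])   # 0.3333333333333333
print(likelihood['your'])  # 0.6666666666666666
```

With whole-word matching, 'our' is counted once (in 'review our website') instead of four times.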

## Compute the prior probability

In [7]:
prior_spam = len(train_spam) / (len(train_spam)+(len(train_ham)))
print(f'Prior prob SPAM: {prior_spam}')
prior_ham = len(train_ham) / (len(train_spam)+(len(train_ham)))
print(f'Prior prob HAM: {prior_ham}')

Prior prob SPAM: 0.5714285714285714
Prior prob HAM: 0.42857142857142855


## Combine Prior and Likelihood probabilities using Bayes' theorem

In [8]:
from functools import reduce

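def Bayes(txt):
    """Return P(spam | txt) from the priors and per-word likelihoods computed above."""
    probs_spam = []
    probs_ham = []

    # Lookups are case-sensitive while the likelihood keys are lower-cased,
    # so a capitalised word such as 'Your' is treated as unseen.
    txt_as_list = txt.split()
    for w in txt_as_list:
        if w in likelihood_spam:
            pr_WS = likelihood_spam[w]
        else:
            # unseen word: smoothed probability with a count of zero
            pr_WS = 1.0/(len(train_spam)+2)
        
        if w in likelihood_ham:
            pr_WH = likelihood_ham[w]
        else:
            pr_WH = 1.0/(len(train_ham)+2)
        
        probs_spam.append(pr_WS)
        probs_ham.append(pr_WH)
    
    # numerator terms of Bayes' theorem: prior times the product of word likelihoods
    p_if_spam = prior_spam * reduce(lambda num1, num2: num1 * num2, probs_spam, 1.0)
    p_if_ham = prior_ham * reduce(lambda num1, num2: num1 * num2, probs_ham, 1.0)
    # normalise over the two classes to obtain the posterior P(spam | txt)
    return p_if_spam / (p_if_spam + p_if_ham)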
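For long messages, the product of many probabilities below 1 can underflow to 0.0 in floating point, which would make the final division fail. One common workaround, sketched here as an alternative and not as the method used in this notebook, is to sum logarithms instead. The function `bayes_log_space` and its parameters are hypothetical names; the example values are the smoothed estimates from the cells above for 'renew your password'.

```python
import math

def bayes_log_space(words, prior_spam, likelihood_spam, likelihood_ham,
                    unseen_spam, unseen_ham):
    """P(spam | words) computed from sums of log-probabilities."""
    log_spam = math.log(prior_spam)
    log_ham = math.log(1.0 - prior_spam)
    for w in words:
        log_spam += math.log(likelihood_spam.get(w, unseen_spam))
        log_ham += math.log(likelihood_ham.get(w, unseen_ham))
    # The posterior is the logistic function of the log-odds,
    # evaluated in a form that avoids overflow in math.exp.
    diff = log_spam - log_ham
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)

p = bayes_log_space(
    'renew your password'.split(),
    prior_spam=4 / 7,
    likelihood_spam={'your': 2 / 3, 'password': 1 / 2},
    likelihood_ham={'your': 2 / 5},
    unseen_spam=1 / 6,
    unseen_ham=1 / 5,
)
print(round(p, 4))  # 0.8224
```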

## Classify the "toy" example

In [9]:
for sentence in train_spam:
    prob_spam = Bayes(sentence)
    print(f'{sentence} -> {prob_spam}')

send us your password -> 0.9788566953797965
review our website -> 0.9391435011269722
send your password -> 0.9487666034155597
send us your account -> 0.9686168151879117


In [10]:
for sentence in train_ham:
    prob_spam = Bayes(sentence)
    print(f'{sentence} -> {prob_spam}')

Your activity report -> 0.11394712853236098
benefits physical activity -> 0.06041565973900434
the importance vows -> 0.08796622097114706


In [11]:
for sentence in test_emails['spam']:
    prob_spam = Bayes(sentence)
    print(f'{sentence} -> {prob_spam}')

renew your password -> 0.8223684210526315
renew your vows -> 0.43554006968641107


In [12]:
for sentence in test_emails['ham']:
    prob_spam = Bayes(sentence)
    print(f'{sentence} -> {prob_spam}')

benefits of our account -> 0.7627532340737124
the importance of physical activity -> 0.021838943903615127
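To turn these probabilities into labels, one simple option is a decision threshold. The sketch below assumes a threshold of 0.5, which the original notebook does not specify, and reuses the four spam probabilities printed above for the test emails. `label_from_probability` is a hypothetical helper.

```python
def label_from_probability(prob_spam, threshold=0.5):
    """Label a message 'spam' when its spam probability reaches the threshold."""
    return 'spam' if prob_spam >= threshold else 'ham'

# (message, true label, spam probability printed above)
results = [
    ('renew your password', 'spam', 0.8223684210526315),
    ('renew your vows', 'spam', 0.43554006968641107),
    ('benefits of our account', 'ham', 0.7627532340737124),
    ('the importance of physical activity', 'ham', 0.021838943903615127),
]

correct = sum(label_from_probability(p) == truth for _, truth, p in results)
print(f'{correct}/{len(results)} correct')  # 2/4 correct
```

Under this assumed threshold, 'renew your vows' and 'benefits of our account' are misclassified, which is not surprising for a model trained on seven short messages.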
