# Naive Bayes Classifier (NBC)

## 1. Probability Review
### 1.1 Conditional Probability
$$p(c|x) = \frac{p(c,x)}{p(x)}$$
### 1.2 Bayes' Theorem
$$p(c|x) = \frac{p(x|c)p(c)}{p(x)}$$
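
As a quick sanity check, the two formulas above can be verified numerically on a small joint distribution. The sketch below is illustrative only: the table `joint` and the helper names are made up for this example and are not part of the notebook.

```python
# Toy joint distribution p(c, x) over a class c in {0, 1} and a binary
# feature x in {0, 1}. The numbers are invented for illustration.
joint = {
    (0, 0): 0.30, (0, 1): 0.20,
    (1, 0): 0.10, (1, 1): 0.40,
}

def p_c(c):
    """Marginal p(c): sum the joint over x."""
    return sum(p for (ci, _), p in joint.items() if ci == c)

def p_x(x):
    """Marginal p(x): sum the joint over c."""
    return sum(p for (_, xi), p in joint.items() if xi == x)

def p_c_given_x(c, x):
    """Conditional probability p(c|x) = p(c,x) / p(x)."""
    return joint[(c, x)] / p_x(x)

def p_x_given_c(x, c):
    """Conditional probability p(x|c) = p(c,x) / p(c)."""
    return joint[(c, x)] / p_c(c)

# Left side: the definition of conditional probability.
lhs = p_c_given_x(1, 1)
# Right side: Bayes' theorem.
rhs = p_x_given_c(1, 1) * p_c(1) / p_x(1)
print(lhs, rhs)  # the two values agree up to floating-point rounding
```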
### 1.3 Principle of Classification
Let $c_1$ and $c_2$ denote class 1 and class 2, and let $x$ and $y$ be two independent features. If
$$\begin{align*}
p(c_1|x,y)&>p(c_2|x,y)\\
\frac{p(x,y|c_1)p(c_1)}{p(x,y)}&>\frac{p(x,y|c_2)p(c_2)}{p(x,y)}\text{,}
\end{align*}$$
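then the sample is more likely to belong to $c_1$. Because the denominator $p(x,y)$ is the same on both sides, it is enough to compare the numerators $p(x,y|c_i)p(c_i)$.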
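
The decision rule can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's implementation: the function `classify` and the likelihood and prior values are hypothetical, chosen so that the prior outweighs a smaller likelihood.

```python
def classify(likelihoods, priors):
    """Return (index of best class, scores), where score_i = p(x,y|c_i) * p(c_i).

    The shared denominator p(x,y) is omitted because it does not change
    which class has the largest posterior.
    """
    scores = [lik * prior for lik, prior in zip(likelihoods, priors)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Made-up values: index 0 stands for c_1, index 1 for c_2.
likelihoods = [0.02, 0.05]  # p(x,y|c_1), p(x,y|c_2)
priors = [0.8, 0.2]         # p(c_1), p(c_2)

best, scores = classify(likelihoods, priors)
print(best, scores)
```

Here $c_2$ has the larger likelihood, but $c_1$ still wins because its prior is four times larger.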

## 2. Text Classification
### 2.1 Text Processing

In [1]:
def create_dataset():
    # Six toy forum posts, already tokenized into words.
    postings = [['my','dog','has','flea','problems','help','please'],
                ['maybe','not','take','him','to','dog','park','stupid'],
                ['my','dalmation','is','so','cute','I','love','him'],
                ['stop','posting','stupid','worthless','garbage'],
                ['mr','licks','ate','my','steaks','how','to','stop','him'],
                ['quit','buying','worthless','dog','food','stupid']]
    labels = [0, 1, 0, 1, 0, 1]  # 1 = post contains insulting words, 0 = it does not
    return postings, labels

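def create_vocab_list(dataset):
    # Union of all words in all records. A set has no guaranteed order,
    # so the order of the returned list may differ between runs.
    vocab_set = set()
    for record in dataset:
        vocab_set |= set(record)
    return list(vocab_set)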

def record_to_vector(record, vocab_list):
    # Set-of-words model: vector[i] is 1 if vocab_list[i] occurs in the record, else 0.
    vector = [0]*len(vocab_list)
    for word in record:
        if word in vocab_list:
            vector[vocab_list.index(word)] = 1
        else:
            print('The word %s is not in the vocabulary list.' % word)
    return vector

In [3]:
postings, labels = create_dataset()
v_list = create_vocab_list(postings)
print(v_list)
vector_0 = record_to_vector(postings[0], v_list)
print(vector_0)

['maybe', 'dalmation', 'mr', 'cute', 'has', 'my', 'please', 'love', 'problems', 'help', 'steaks', 'food', 'so', 'stop', 'how', 'licks', 'take', 'is', 'posting', 'buying', 'quit', 'dog', 'stupid', 'him', 'worthless', 'flea', 'ate', 'park', 'not', 'garbage', 'I', 'to']
[0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
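
Because the vocabulary is built from a `set`, the word order printed above can change from run to run, and the vectors change with it. The sketch below shows one possible way to make the encoding reproducible, assuming an alphabetically sorted vocabulary and a word-to-position dictionary. The names `create_sorted_vocab`, `record_to_vector_indexed`, and `toy_postings` are hypothetical and not part of the notebook.

```python
def create_sorted_vocab(dataset):
    """Collect the unique words of all records in alphabetical order."""
    return sorted({word for record in dataset for word in record})

def record_to_vector_indexed(record, vocab_index):
    """Set-of-words encoding using a dict lookup instead of list.index."""
    vector = [0] * len(vocab_index)
    for word in record:
        position = vocab_index.get(word)
        if position is not None:  # silently skip out-of-vocabulary words
            vector[position] = 1
    return vector

# Two shortened posts, used only to keep the example small.
toy_postings = [['my', 'dog', 'has', 'flea', 'problems'],
                ['stop', 'posting', 'stupid', 'garbage']]

vocab = create_sorted_vocab(toy_postings)
vocab_index = {word: i for i, word in enumerate(vocab)}
train_matrix = [record_to_vector_indexed(p, vocab_index) for p in toy_postings]

print(vocab)
print(train_matrix)
```

Encoding every record this way yields one row per post, which is the matrix form a training step would consume.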


### 2.2 Training
Let $w$ denote the feature vector of a record. We use
$$p(c_i|w)=\frac{p(w|c_i)p(c_i)}{p(w)}$$
to calculate the probability that the record belongs to class $c_i$.  
If the features $w_0,w_1,\ldots,w_N$ are conditionally independent of each other given the class (the "naive" assumption), then we have:
$$\begin{align*}
p(w|c_i) &= p(w_0,w_1,w_2,\cdots,w_N|c_i)\\
&= p(w_0|c_i)p(w_1|c_i)p(w_2|c_i)\cdots p(w_N|c_i)\\
&= \prod_{j=0}^{N}p(w_j|c_i)
\end{align*}$$
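
The factorization can be illustrated with a small training sketch. It assumes a Bernoulli model for the binary vectors, in which an absent word contributes a factor $1-p(w_j=1|c_i)$, and it estimates all probabilities by simple counting without smoothing. This is one possible reading of the formulas above, not necessarily the method the notebook goes on to use. The function names and the toy matrix are hypothetical.

```python
from math import prod  # available in Python 3.8+

def train_bernoulli_nb(train_matrix, labels):
    """Estimate p(c_i) and p(w_j = 1 | c_i) by counting (no smoothing)."""
    n_features = len(train_matrix[0])
    priors, cond = {}, {}
    for c in sorted(set(labels)):
        rows = [row for row, label in zip(train_matrix, labels) if label == c]
        priors[c] = len(rows) / len(labels)
        # Fraction of class-c records in which feature j is present.
        cond[c] = [sum(row[j] for row in rows) / len(rows)
                   for j in range(n_features)]
    return priors, cond

def likelihood(w, cond_c):
    """p(w | c_i) as a product of per-feature terms (naive independence)."""
    return prod(p if w_j == 1 else 1 - p for w_j, p in zip(w, cond_c))

# Toy binary feature vectors (rows) with their class labels.
train_matrix = [[1, 1, 0],
                [1, 0, 0],
                [0, 1, 1],
                [0, 0, 1]]
labels = [0, 0, 1, 1]

priors, cond = train_bernoulli_nb(train_matrix, labels)

w = [1, 1, 0]
# Compare the numerators p(w|c_i) * p(c_i); p(w) is the same for every class.
scores = {c: likelihood(w, cond[c]) * priors[c] for c in priors}
predicted = max(scores, key=scores.get)
print(scores, predicted)
```

Without smoothing, a single feature value never seen in a class drives that class's whole product to zero, as happens for class 1 here.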