## Naive Bayes

In simple terms, given training data, Naive Bayes learns the joint probability distribution of the input and output under the assumption that the features are conditionally independent given the class, and then uses Bayes' theorem to pick, for each new instance, the class with the maximum posterior probability. It does not estimate the joint distribution directly; instead it learns the class prior probability $P(c)$ and the class-conditional probability $P(x|c)$ of the features given the class, whose product is the joint distribution.

$$
P(c|x) = \frac{P(x|c)P(c)}{P(x)} \rightarrow \text{Posterior Probability} = \frac{(\text{Likelihood})(\text{Class Prior Probability})}{\text{Predictor Prior Probability}}
$$

The "Naive" in Naive Bayes refers to the assumption of conditional independence: once the class is fixed, the features used for classification are conditionally independent of one another. This assumption makes learning tractable, because the conditional distribution of each feature can be estimated separately instead of estimating one distribution over every combination of feature values. The specific steps of the Naive Bayes algorithm are as follows:

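First, estimate the class prior probability by maximum likelihood estimation (MLE), where $N$ is the number of training samples, $\tilde{y}_{i}$ is the label of the $i$-th sample, and $I(\cdot)$ is the indicator function: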
$$
p(y=c_{k})=\frac{1}{N} \sum ^{N}_{i=1} I(\tilde{y}_{i}=c_{k}), \enspace k=1,2, \cdots, K
$$

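Then estimate the class-conditional probability of each feature, where $a_{j,l}$ is the $l$-th of the $s_{j}$ possible values of the $j$-th feature:
$$
\begin{gathered}
p(x_{j}=a_{j,l} \mid y=c_{k}) = \frac{\sum ^{N}_{i=1}I(x_{i,j}=a_{j,l}, \tilde{y}_{i}=c_{k})}{\sum ^{N}_{i=1}I(\tilde{y}_{i}=c_{k})} \\
j=1,2, \cdots ,n, \enspace l=1,2, \cdots ,s_{j}, \enspace k=1,2, \cdots, K
\end{gathered}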
$$
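
The two counting estimates above can be sketched in a few lines of plain Python. This is only a minimal illustration on a hypothetical toy sample: the lists `xs` and `ys` and the names `class_count`, `joint_count`, `class_prior` and `cond_prob` are made up for this example.

```python
from collections import Counter

# Hypothetical toy sample: one categorical feature and a binary label.
xs = ["a", "a", "b", "b", "b"]
ys = [0, 1, 1, 1, 0]

n = len(ys)
class_count = Counter(ys)            # sum_i I(y_i = c_k)
joint_count = Counter(zip(xs, ys))   # sum_i I(x_i = a_l, y_i = c_k)

# Class prior p(y = c_k): relative frequency of each class.
class_prior = {c: cnt / n for c, cnt in class_count.items()}

# Class-conditional p(x = a_l | y = c_k): joint count over class count.
cond_prob = {(a, c): cnt / class_count[c] for (a, c), cnt in joint_count.items()}

print(class_prior)
print(cond_prob)
```

For each class, the conditional probabilities of all observed feature values sum to 1, which is a quick sanity check on the counts.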

Finally, given a new instance, compute the posterior score of each class and assign the instance to the class that maximizes it:
$$
\hat{y}=\arg \max _{c_{k}} p\left(y=c_{k}\right) \prod_{j=1}^{n} p\left(x_{j} \mid y=c_{k}\right)
$$
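
The denominator $P(x)$ from Bayes' theorem does not appear in this decision rule. A short derivation, using only the quantities defined above, suggests why: $p(x)$ is the same for every class, so dropping it does not change which class attains the maximum, and the conditional independence assumption turns $p(x \mid y=c_{k})$ into a product over features.

$$
\begin{aligned}
\hat{y} &= \arg \max _{c_{k}} p\left(y=c_{k} \mid x\right) \\
&= \arg \max _{c_{k}} \frac{p\left(x \mid y=c_{k}\right) p\left(y=c_{k}\right)}{p(x)} \\
&= \arg \max _{c_{k}} p\left(y=c_{k}\right) \prod_{j=1}^{n} p\left(x_{j} \mid y=c_{k}\right)
\end{aligned}
$$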

In [19]:
import numpy as np
import pandas as pd

In [20]:
# generate data sample
x1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
x2 = ['S', 'M', 'M', 'S', 'S', 'S', 'M', 'M', 'L', 'L', 'L', 'M', 'M', 'L', 'L']
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]
df = pd.DataFrame({'x1': x1, 'x2': x2, 'y': y})
df.head()

|   | x1 | x2 | y  |
|---|----|----|----|
| 0 | 1  | S  | -1 |
| 1 | 1  | M  | -1 |
| 2 | 1  | M  | 1  |
| 3 | 1  | S  | 1  |
| 4 | 1  | S  | -1 |


In [21]:
# extract the feature and label
X = df[['x1', 'x2']]
y = df[['y']]

In [22]:
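# training of naive Bayes: estimate the class prior and the
# class-conditional probabilities by counting (MLE)
def fit(X, y):
    labels = y[y.columns[0]]              # y is a one-column DataFrame
    classes = labels.unique()
    class_count = labels.value_counts()   # number of samples per class
    class_prior = class_count / len(y)    # p(y = c_k)

    # cond_prob[(feature, value, class)] = p(x_j = value | y = class)
    cond_prob = dict()
    for col in X.columns:
        for c in classes:
            value_count = X.loc[(labels == c).to_numpy(), col].value_counts()
            for value, count in value_count.items():
                cond_prob[(col, value, c)] = count / class_count[c]

    return classes, class_prior, cond_prob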

In [23]:
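# define the prediction function
# (uses classes, class_prior and prior returned by fit as globals;
# prior holds the class-conditional probabilities)
def predict(X_test):
    res = []
    for c in classes:
        p_y = class_prior[c]              # p(y = c)
        p_x_y = 1
        for col, value in X_test.items():
            # p(x_j = value | y = c); raises KeyError if this value
            # never occurred together with class c in the training data
            p_x_y *= prior[(col, value, c)]
        res.append(p_y * p_x_y)

    # class with the largest unnormalized posterior
    return classes[np.argmax(res)]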

In [24]:
X_test = {'x1': 2, 'x2': 'S'}
classes, class_prior, prior = fit(X, y)
print('prediction for test data:', predict(X_test))

prediction for test data: -1
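
The prediction can be checked by hand. The sketch below recomputes the two unnormalized posterior scores for the test instance with exact fractions, using only the standard library and the same fifteen samples; the helper `score` and the names `scores` and `posterior` are introduced here for illustration only.

```python
from fractions import Fraction

x1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
x2 = ['S', 'M', 'M', 'S', 'S', 'S', 'M', 'M', 'L', 'L', 'L', 'M', 'M', 'L', 'L']
y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]

def score(c, v1, v2):
    """Unnormalized posterior p(y=c) * p(x1=v1 | y=c) * p(x2=v2 | y=c)."""
    n_c = sum(1 for t in y if t == c)
    n_1 = sum(1 for a, t in zip(x1, y) if a == v1 and t == c)
    n_2 = sum(1 for b, t in zip(x2, y) if b == v2 and t == c)
    return Fraction(n_c, len(y)) * Fraction(n_1, n_c) * Fraction(n_2, n_c)

# Test instance x1 = 2, x2 = 'S'
scores = {c: score(c, 2, 'S') for c in (-1, 1)}

# Dividing by the sum of the scores recovers the normalized posterior.
total = sum(scores.values())
posterior = {c: s / total for c, s in scores.items()}

print(scores)     # -1 -> 1/15, 1 -> 1/45
print(posterior)  # -1 -> 3/4,  1 -> 1/4
```

The score for class -1 is (6/15)(2/6)(3/6) = 1/15 and the score for class 1 is (9/15)(3/9)(1/9) = 1/45, so class -1 wins, in agreement with the output above.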
