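# Chapter 4 Naive Bayes
## 4.2 Parameter Estimation for Naive Bayes
### Example 4.1
Learn a naive Bayes classifier from the training data in the table below and *determine the class label $y$ of $x=(2, S)^T$*.
In the table, $X^{(1)}$ and $X^{(2)}$ are features whose value sets are $A_1=\{1,2,3\}$ and $A_2=\{S,M,L\}$ respectively, and $Y$ is the class label, $Y \in C =\{1, -1\}$.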
![](https://qiniu.lianghao.work/markdown/20220416190900.png)

```python
# Prior: count the samples of each class (P(Y=c) = count / N is formed later)
def Prior(C, Train_Y):
    priors = dict()
    for c in C:
        priors[c] = sum(1 for y in Train_Y if y == c)
    return priors

# Joint counts: #{i : x_i = x, y_i = c} for every feature value x and class c
def Joins(Train_X, Train_Y, A, C):
    condition = dict()
    for x in A:
        for c in C:
            condition[(x, c)] = sum(
                1 for xi, yi in zip(Train_X, Train_Y) if xi == x and yi == c
            )
    return condition

def NaiveBayes(data, Train_X, Train_Y, A, C):
    # Probabilities needed by naive Bayes (maximum likelihood estimates)
    # 1. Prior counts for each class
    prior = Prior(C, Train_Y)
    # 2. Joint counts (feature value, class), one table per feature
    conditions = []
    for i in range(len(Train_X)):
        conditions.append(Joins(Train_X[i], Train_Y, A[i], C))
    # 3. Unnormalized posterior P(Y=c) * prod_j P(X^(j)=x_j | Y=c)
    target = C[0]
    best = -1
    for c in C:
        score = prior[c] / len(Train_Y)
        for i in range(len(data)):
            score *= conditions[i][(data[i], c)] / prior[c]
        # Keep the class with the largest posterior
        if score > best:
            best = score
            target = c
    return target
```

```python
# Value sets of the features and the class label
A1 = [1, 2, 3]
A2 = ['S', 'M', 'L']
C = [1, -1]
# Training data
X1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
X2 = ['S', 'M', 'M', 'S', 'S', 'S', 'M', 'M', 'L', 'L', 'L', 'M', 'M', 'L', 'L']
Y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]
data = [2, 'S']
```

```python
y = NaiveBayes(data, [X1, X2], Y, [A1, A2], C)
print("Input x: ({0}, {1}) is classified by naive Bayes as: y = {2}".format(data[0], data[1], y))
```

```
Input x: (2, S) is classified by naive Bayes as: y = -1
```
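As a hand check of this result, the sketch below recomputes the two unnormalized posteriors as exact fractions with the standard-library `fractions.Fraction`. The helper `mle_score` is a hypothetical name introduced here, not part of the original notebook; it uses the same maximum likelihood counts as `NaiveBayes` above.

```python
from fractions import Fraction

# Training data of Example 4.1 (copied from the cell above)
X1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
X2 = ['S', 'M', 'M', 'S', 'S', 'S', 'M', 'M', 'L', 'L', 'L', 'M', 'M', 'L', 'L']
Y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]


def mle_score(x, c):
    """Hypothetical helper: P(Y=c) * prod_j P(X^(j)=x_j | Y=c) with MLE estimates."""
    n_c = Y.count(c)
    score = Fraction(n_c, len(Y))  # prior n_c / N
    for feature, value in zip((X1, X2), x):
        # joint count #{i : x_i^(j) = value, y_i = c}
        n_joint = sum(1 for f, y in zip(feature, Y) if f == value and y == c)
        score *= Fraction(n_joint, n_c)  # conditional n_joint / n_c
    return score


scores = {c: mle_score((2, 'S'), c) for c in (1, -1)}
print(scores)                       # {1: Fraction(1, 45), -1: Fraction(1, 15)}
print(max(scores, key=scores.get))  # -1
```

Normalizing the two scores gives $P(Y=-1\mid x)=\frac{1/15}{1/15+1/45}=\frac{3}{4}$, so $y=-1$ wins by a clear margin.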


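### Example 4.2
Learn a naive Bayes classifier from the training data in the table below and *determine the class label $y$ of $x=(2, S)^T$*, estimating the probabilities with Laplace smoothing, i.e. $\lambda=1$.
In the table, $X^{(1)}$ and $X^{(2)}$ are features whose value sets are $A_1=\{1,2,3\}$ and $A_2=\{S,M,L\}$ respectively, and $Y$ is the class label, $Y \in C =\{1, -1\}$.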
![](https://qiniu.lianghao.work/markdown/20220416190900.png)

```python
# Smoothed prior (Laplace smoothing when lam = 1):
# P(Y=c) = (count(c) + lam) / (N + K * lam)
def Prior(C, Train_Y, lam):
    priors = dict()
    for c in C:
        count = sum(1 for y in Train_Y if y == c)
        priors[c] = (count + lam) / (len(Train_Y) + len(C) * lam)
    return priors

# Smoothed conditional probability:
# P(X=x | Y=c) = (count(x, c) + lam) / (count(c) + S * lam), with S = len(A)
def Joins(Train_X, Train_Y, A, C, lam):
    condition = dict()
    for c in C:
        count_c = sum(1 for y in Train_Y if y == c)
        for x in A:
            count_xc = sum(1 for xi, yi in zip(Train_X, Train_Y) if xi == x and yi == c)
            condition[(x, c)] = (count_xc + lam) / (count_c + len(A) * lam)
    return condition

# Note: the argument order (A before Train_Y) differs from Example 4.1
def NaiveBayes(data, Train_X, A, Train_Y, C, lam):
    # 1. Smoothed priors
    prior = Prior(C, Train_Y, lam)
    # 2. Smoothed conditional probabilities, one table per feature
    conditions = []
    for i in range(len(Train_X)):
        conditions.append(Joins(Train_X[i], Train_Y, A[i], C, lam))
    # 3. Unnormalized posterior for each class
    target = C[0]
    best = -1
    for c in C:
        score = prior[c]
        for i in range(len(data)):
            score *= conditions[i][(data[i], c)]
        print(score)  # show the unnormalized posterior of each class
        # Keep the class with the largest posterior
        if score > best:
            best = score
            target = c
    return target
```

```python
# Value sets of the features and the class label
A1 = [1, 2, 3]
A2 = ['S', 'M', 'L']
C = [1, -1]
# Training data
X1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
X2 = ['S', 'M', 'M', 'S', 'S', 'S', 'M', 'M', 'L', 'L', 'L', 'M', 'M', 'L', 'L']
Y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]
data = [2, 'S']
```

```python
y = NaiveBayes(data, [X1, X2], [A1, A2], Y, C, 1)
print("Input x: ({0}, {1}) is classified by naive Bayes as: y = {2}".format(data[0], data[1], y))
```

```
0.0326797385620915
0.06100217864923746
Input x: (2, S) is classified by naive Bayes as: y = -1
```
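The two printed numbers can be checked with exact arithmetic. A minimal sketch, assuming the same data and $\lambda=1$; `smoothed_score` is a hypothetical helper name, not from the original notebook.

```python
from fractions import Fraction

# Training data and value sets of Example 4.2
X1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
X2 = ['S', 'M', 'M', 'S', 'S', 'S', 'M', 'M', 'L', 'L', 'L', 'M', 'M', 'L', 'L']
Y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]
A = ([1, 2, 3], ['S', 'M', 'L'])  # A_1, A_2
C = (1, -1)


def smoothed_score(x, c, lam=1):
    """Hypothetical helper: unnormalized posterior with additive smoothing lam."""
    n_c = Y.count(c)
    score = Fraction(n_c + lam, len(Y) + len(C) * lam)  # (n_c + lam) / (N + K lam)
    for feature, values, value in zip((X1, X2), A, x):
        n_joint = sum(1 for f, y in zip(feature, Y) if f == value and y == c)
        score *= Fraction(n_joint + lam, n_c + len(values) * lam)  # S_j = len(values)
    return score


scores = {c: smoothed_score((2, 'S'), c) for c in C}
print(scores)  # {1: Fraction(5, 153), -1: Fraction(28, 459)}
```

The fractions $5/153 \approx 0.03268$ and $28/459 \approx 0.06100$ agree with the floats printed by `NaiveBayes`.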


## Exercises
[Reference](https://blog.csdn.net/qq_41626059/article/details/115598863?spm=1001.2014.3001.5502)
### 4.1 Derive the following probability estimates of naive Bayes by maximum likelihood estimation
$$
\begin{aligned}
P(Y=c_k) &=\frac{\sum_{i=1}^N I(y_i=c_k)}{N}\\
P(X^{(j)}=a_{jl}|Y=c_k)&=\frac{\sum_{i=1}^NI(x_i^{(j)}=a_{jl},y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)}
\end{aligned}
$$
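#### (1) Maximum likelihood estimate of the prior
Let the training set be $T=\{(x_1, y_1),(x_2,y_2),\cdots, (x_N, y_N)\}$, let $n_k$ be the number of samples in $T$ with class $c_k$, and let $P(y_i=c_k)=\theta_k$.
Treating each sample as either belonging to $c_k$ (probability $\theta_k$) or not (probability $1-\theta_k$), the likelihood function is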
$$
\begin{aligned}
P(y_1, y_2, \cdots, y_N|\theta_k) &= \prod\limits_{i=1}^N P(y_i|\theta_k)\\
&=\theta_k^{n_k}(1-\theta_k)^{(N-n_k)}
\end{aligned}
$$

Taking the logarithm and differentiating with respect to $\theta_k$ gives
$$
\begin{aligned}
L(\theta_k) &= n_k\log \theta_k+(N-n_k)\log(1-\theta_k)\\
L'(\theta_k) &= \frac{n_k}{\theta_k}-\frac{N-n_k}{1-\theta_k}
\end{aligned}
$$
Setting $L'(\theta_k)=0$ gives $\theta_k = \frac{n_k}{N}$, where the likelihood attains its maximum.
Since $n_k = \sum_{i=1}^N I(y_i=c_k)$, the **prior probability** is
$$
\begin{aligned}
P(Y=c_k)=\frac{\sum_{i=1}^N I(y_i=c_k)}{N}
\end{aligned}
$$
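A quick numerical check of this maximizer, assuming the class counts of Example 4.1 ($N=15$, $n_k=9$ for $c_k=1$): evaluate the log-likelihood $L(\theta_k)$ on a fine grid and keep the best point. The grid search is only an illustration, not part of the derivation.

```python
import math


def log_likelihood(theta, n_k, N):
    # L(theta_k) = n_k log theta_k + (N - n_k) log(1 - theta_k)
    return n_k * math.log(theta) + (N - n_k) * math.log(1 - theta)


N, n_k = 15, 9  # Example 4.1: 15 samples, 9 of class 1
grid = [i / 10000 for i in range(1, 10000)]  # theta in (0, 1), step 1e-4
best = max(grid, key=lambda t: log_likelihood(t, n_k, N))
print(best, n_k / N)  # 0.6 0.6
```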
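#### (2) Maximum likelihood estimate of the conditional probability
$$
\begin{aligned}
P(x^{(j)}=a_{jl}|Y=c_k)=\frac{P(x^{(j)}=a_{jl}, Y=c_k)}{P(Y=c_k)}
\end{aligned}
$$
Here $P(Y=c_k)$ is the prior derived above, so it remains only to estimate the joint probability $P(x^{(j)}=a_{jl}, Y=c_k)$ from the training data.
Again let the training set be $T=\{(x_1, y_1),(x_2,y_2),\cdots, (x_N, y_N)\}$, let $P(x^{(j)}=a_{jl}, Y=c_k)=\theta$, and let $n$ be the number of samples satisfying $x_i^{(j)}=a_{jl}$ and $y_i=c_k$.
Treating each sample as either satisfying both conditions (probability $\theta$) or not (probability $1-\theta$), the likelihood function is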
$$
\begin{aligned}
P((x^{(j)}_1, y_1), (x^{(j)}_2, y_2), \cdots, (x^{(j)}_N, y_N)|\theta) &= \prod\limits_{i=1}^N P((x^{(j)}_i, y_i)|\theta)\\
&=\theta^n(1-\theta)^{(N-n)}
\end{aligned}
$$

Taking the logarithm of the likelihood and differentiating:

$$
\begin{aligned}
L(\theta) &= n\log \theta + (N-n)\log(1-\theta)\\
L'(\theta)&=\frac{n}{\theta} - \frac{N-n}{1-\theta}
\end{aligned}
$$
Setting $L'(\theta)=0$ shows that the likelihood is maximized at $\theta = \frac{n}{N}$. Since $n = \sum_{i=1}^N I(x_i^{(j)}=a_{jl}, y_i=c_k)$, dividing the joint estimate by the prior $P(Y=c_k)$ gives
$$
\begin{aligned}
P(x^{(j)}=a_{jl}, Y=c_k) &= \frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl}, y_i=c_k)}{N}\\
P(x^{(j)}=a_{jl} | Y=c_k) &= \frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl}, y_i=c_k)}{\sum_{i=1}^N I(y_i=c_k)}
\end{aligned}
$$
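To see the formula at work, the sketch below (assuming the data of Example 4.1 and looking only at feature $X^{(1)}$) computes the joint and prior MLEs separately and divides them. The $1/N$ factors cancel, so the result is just the count ratio, and for each class the conditionals sum to 1. `joint_mle` and `prior_mle` are hypothetical helper names.

```python
from fractions import Fraction

# Feature X^(1) and labels of Example 4.1
X1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
Y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]
N = len(Y)


def joint_mle(value, c):
    """Hypothetical helper: P(X^(1)=value, Y=c) = n / N."""
    n = sum(1 for x, y in zip(X1, Y) if x == value and y == c)
    return Fraction(n, N)


def prior_mle(c):
    """Hypothetical helper: P(Y=c) = n_k / N."""
    return Fraction(Y.count(c), N)


# Conditional probability = joint / prior
cond = {(v, c): joint_mle(v, c) / prior_mle(c) for v in (1, 2, 3) for c in (1, -1)}
print(cond[(2, 1)], cond[(2, -1)])  # 1/3 1/3
print([sum(cond[(v, c)] for v in (1, 2, 3)) for c in (1, -1)])  # [Fraction(1, 1), Fraction(1, 1)]
```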
### 4.2 Derive the following probability estimates of naive Bayes by Bayesian estimation
$$
\begin{aligned}
P_\lambda(Y=c_k)&=\frac{\sum_{i=1}^N I(y_i=c_k)+\lambda}{N + K\lambda}\\
P_\lambda(X^{(j)}=a_{jl}|Y=c_k)&=\frac{\sum_{i=1}^NI(x_i^{(j)}=a_{jl},y_i = c_k)+\lambda}{\sum_{i=1}^NI(y_i=c_k)+S_j\lambda}
\end{aligned}
$$
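#### (1) Bayesian estimate of the prior
Let the training set be $T=\{(x_1, y_1),(x_2,y_2),\cdots, (x_N, y_N)\}$, let $P(y_i=c_k)=\theta_k$, and let $n_k=\sum_{i=1}^N I(y_i = c_k)$ be the number of samples with $Y=c_k$.
Bayesian estimation incorporates prior information about the parameter being estimated. Suppose $Y$ takes $K$ values, i.e. $Y\in \{c_1, c_2, \cdots, c_K\}$, and use a prior that treats all $K$ values symmetrically, as if each had already occurred $\lambda$ times: $P(\theta_k)=\theta_k^\lambda(1-\theta_k)^{(K-1)\lambda}$ (up to a normalizing constant, which does not change the maximizer). Multiplying the likelihood by this prior gives
$$
\begin{aligned}
P(y_1, y_2, \cdots, y_N | \theta_k)\cdot P(\theta_k) &= \prod\limits_{i=1}^N P(y_i|\theta_k)\cdot P(\theta_k)\\
&=\theta_k^{n_k}(1-\theta_k)^{(N - n_k)} \cdot \theta_k^\lambda(1-\theta_k)^{(K-1)\lambda}\\
&=\theta_k^{n_k + \lambda}(1-\theta_k)^{N - n_k+(K-1)\lambda}
\end{aligned}
$$
Taking the logarithm, differentiating with respect to $\theta_k$, and setting the derivative to zero gives $\theta_k=\frac{n_k+\lambda}{N+K\lambda}$, that is,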
$$
\begin{aligned}
P(Y=c_k)= \frac{\sum_{i=1}^N I(y_i=c_k)+\lambda}{N +K\lambda}
\end{aligned}
$$
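A numerical check, assuming the setting of Example 4.2 ($N=15$, $n_k=9$ for $c_k=1$, $K=2$, $\lambda=1$): maximize the logarithm of the unnormalized posterior on a grid and compare with the closed form.

```python
import math


def log_posterior(theta, n_k, N, K, lam):
    # log of theta^(n_k+lam) * (1-theta)^(N-n_k+(K-1)lam); normalizing constant dropped
    return (n_k + lam) * math.log(theta) + (N - n_k + (K - 1) * lam) * math.log(1 - theta)


N, n_k, K, lam = 15, 9, 2, 1  # Example 4.2, class c_k = 1, Laplace smoothing
grid = [i / 100000 for i in range(1, 100000)]  # theta in (0, 1), step 1e-5
best = max(grid, key=lambda t: log_posterior(t, n_k, N, K, lam))
closed_form = (n_k + lam) / (N + K * lam)
print(round(best, 4), round(closed_form, 4))  # 0.5882 0.5882
```

Setting `lam = 0` recovers the MLE $n_k/N=0.6$, which shows how $\lambda$ pulls the estimate toward the uniform value $1/K$.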
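#### (2) Bayesian estimate of the conditional probability
$$
\begin{aligned}
P(x^{(j)}=a_{jl}|Y=c_k)=\frac{P(x^{(j)}=a_{jl}, Y=c_k)}{P(Y=c_k)}
\end{aligned}
$$
Here $P(Y=c_k)$ is the prior derived above, so it remains only to estimate the joint probability $P(x^{(j)}=a_{jl}, Y=c_k)$ from the training data.
Again let the training set be $T=\{(x_1, y_1),(x_2,y_2),\cdots, (x_N, y_N)\}$, let $P(x^{(j)}=a_{jl}, Y=c_k)=\theta$, and let $n$ be the number of samples satisfying $x_i^{(j)}=a_{jl}$ and $y_i=c_k$. The pair $(x^{(j)}, y)$ can take $KS_j$ values, where $S_j$ is the number of values of $x^{(j)}$ and $K$ the number of values of $y$; crediting each pair with $\lambda$ pseudo-occurrences gives the prior $P(\theta)=\theta^{\lambda}(1-\theta)^{(KS_j-1)\lambda}$ (up to a normalizing constant). The unnormalized posterior is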
$$
\begin{aligned}
P((x^{(j)}_1, y_1), (x^{(j)}_2, y_2), \cdots, (x^{(j)}_N, y_N)|\theta) \cdot P(\theta) &= \prod\limits_{i=1}^N P((x^{(j)}_i, y_i)|\theta) \cdot P(\theta)\\
&=\theta^n(1-\theta)^{(N-n)} \cdot \theta^{\lambda}(1-\theta)^{(KS_j-1)\lambda}\\
&=\theta^{n + \lambda}(1-\theta)^{N -n+(KS_j-1)\lambda}
\end{aligned}
$$
Taking the logarithm and differentiating:
$$
\begin{aligned}
L(\theta) &= (n+\lambda)\log \theta + (\lambda KS_j-\lambda+N - n)\log(1-\theta)\\
L'(\theta) &=\frac{n+\lambda}{\theta}-\frac{\lambda KS_j-\lambda+N - n}{1-\theta}
\end{aligned}
$$
Setting the derivative to zero gives $\theta = \frac{n+\lambda}{N+\lambda KS_j}$. Since $n = \sum_{i=1}^N I(x_i^{(j)}=a_{jl}, y_i=c_k)$,
$$
\begin{aligned}
P(x^{(j)}=a_{jl}, Y=c_k) = \frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl}, y_i=c_k) + \lambda}{N + \lambda KS_j}
\end{aligned}
$$
Dividing this joint estimate by the Bayesian prior estimate $P(Y=c_k)= \frac{\sum_{i=1}^N I(y_i=c_k)+\lambda}{N +K\lambda}$ gives the conditional probability. Because the two estimates may use different smoothing parameters, write $\lambda_0$ for the joint estimate and $\lambda_1$ for the prior:
$$
\begin{aligned}
P(x^{(j)}=a_{jl}|Y=c_k) &=\frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl}, y_i=c_k) + \lambda_0}{N + \lambda_0 KS_j} \cdot \frac{N +K\lambda_1}{\sum_{i=1}^N I(y_i=c_k)+\lambda_1}\\
&\qquad (\text{taking } \lambda_1 = S_j \lambda_0)\\
&= \frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl}, y_i=c_k) + \lambda_0}{N + \lambda_0 KS_j} \cdot \frac{N +\lambda_0 KS_j}{\sum_{i=1}^N I(y_i=c_k)+\lambda_0 S_j}\\
& = \frac{\sum_{i=1}^N I(x_i^{(j)}=a_{jl}, y_i=c_k) + \lambda_0}{\sum_{i=1}^N I(y_i=c_k)+\lambda_0 S_j}
\end{aligned}
$$
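The cancellation can be verified with exact arithmetic. The sketch below assumes the data of Example 4.2 for feature $X^{(1)}$ ($K=2$, $S_1=3$); `counts`, `ratio` and `laplace_conditional` are hypothetical helpers. With $\lambda_1=S_j\lambda_0$ the ratio of the two Bayesian estimates equals the smoothed conditional used in Example 4.2; with another $\lambda_1$ it does not.

```python
from fractions import Fraction

# Feature X^(1) and labels of Example 4.2
X1 = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
Y = [-1, -1, 1, 1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, -1]
N, K, S_j = len(Y), 2, 3  # K classes; S_j = |A_1| values of X^(1)


def counts(value, c):
    """Hypothetical helper: (n, n_k) = joint count of (value, c), class count of c."""
    n = sum(1 for x, y in zip(X1, Y) if x == value and y == c)
    return n, Y.count(c)


def ratio(value, c, lam0, lam1):
    """Hypothetical helper: Bayesian joint estimate / Bayesian prior estimate."""
    n, n_k = counts(value, c)
    joint = Fraction(n + lam0, N + lam0 * K * S_j)
    prior = Fraction(n_k + lam1, N + K * lam1)
    return joint / prior


def laplace_conditional(value, c, lam):
    """Hypothetical helper: (n + lam) / (n_k + S_j * lam)."""
    n, n_k = counts(value, c)
    return Fraction(n + lam, n_k + S_j * lam)


lam0 = 1
matches = all(
    ratio(v, c, lam0, S_j * lam0) == laplace_conditional(v, c, lam0)
    for v in (1, 2, 3) for c in (1, -1)
)
print(matches)            # True
print(ratio(2, 1, 1, 3))  # 1/3    (lambda_1 = S_j * lambda_0)
print(ratio(2, 1, 1, 1))  # 34/105 (lambda_1 = lambda_0: no cancellation)
```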