## Bayes Discriminant Analysis

> https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html?highlight=gaussiannb#sklearn.naive_bayes.GaussianNB
> 
> https://scikit-learn.org/stable/modules/naive_bayes.html#gaussian-naive-bayes

---

> Explanation: https://zhuanlan.zhihu.com/p/82402288
>
> https://zhuanlan.zhihu.com/p/296630090


---

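- priors : array-like of shape (n_classes,): the class **prior probabilities** can be supplied explicitly (by default they are the class frequencies in the training sample)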

    Prior probabilities of the classes. If specified the priors are not adjusted according to the data.

In [17]:
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.24,1.27], 
              [1.36,1.74], 
              [1.38,1.64], 
              [1.38,1.82], 
              [1.38,1.90], 
              [1.40,1.70],
              [1.48,1.82], 
              [1.54,1.82], 
              [1.56,2.08],
              [1.14,1.78], 
              [1.18,1.96], 
              [1.20,1.86],
              [1.26,2.00], 
              [1.28,2.00], 
              [1.30,1.96]])
# y = np.hstack([np.ones(9), 2*np.ones(6)])
y = np.hstack([np.ones(4), 3*np.ones(5), 2*np.ones(6)]) # multi-class labels work too
X_test = np.array([[1.24,1.80], 
                   [1.28,1.84], 
                   [1.40,2.04]])

clf = GaussianNB().fit(X, y)

print(clf.predict(X_test))
print(clf.predict_proba(X_test))
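print(clf.score(X, y)) # accuracy on the training data
print('-'*30)

print(clf.classes_) # class labels
print(clf.class_count_) # number of training samples per class
print('-'*30)

print(clf.class_prior_) # prior probabilities; by default class_count_ normalized to sum to 1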

[2. 2. 3.]
[[0.12203439 0.87136146 0.00660415]
 [0.18356162 0.78999271 0.02644566]
 [0.16404587 0.03210113 0.803853  ]]
0.8666666666666667
------------------------------
[1. 2. 3.]
[4. 6. 5.]
------------------------------
[0.26666667 0.4        0.33333333]
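
The posteriors printed above can be reproduced by hand, which makes the Bayes rule behind `GaussianNB` explicit: each class gets a prior (its sample frequency) and an independent Gaussian per feature, and the posterior is the normalized product of the two. The following is a minimal sketch on the same data, not scikit-learn's actual code; the variance floor `eps` is an assumption meant to mirror the default `var_smoothing=1e-9`, and the variable names are illustrative.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.24, 1.27], [1.36, 1.74], [1.38, 1.64], [1.38, 1.82], [1.38, 1.90],
              [1.40, 1.70], [1.48, 1.82], [1.54, 1.82], [1.56, 2.08], [1.14, 1.78],
              [1.18, 1.96], [1.20, 1.86], [1.26, 2.00], [1.28, 2.00], [1.30, 1.96]])
y = np.hstack([np.ones(4), 3 * np.ones(5), 2 * np.ones(6)])
X_test = np.array([[1.24, 1.80], [1.28, 1.84], [1.40, 2.04]])

classes = np.unique(y)
# Small variance floor (assumption: mirrors GaussianNB's default var_smoothing=1e-9).
eps = 1e-9 * X.var(axis=0).max()

log_joint = []
for c in classes:
    Xc = X[y == c]
    prior = len(Xc) / len(X)        # class frequency used as the prior
    mean = Xc.mean(axis=0)          # per-feature mean of this class
    var = Xc.var(axis=0) + eps      # per-feature variance of this class
    # Log-likelihood under independent Gaussians, summed over the features.
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X_test - mean) ** 2 / var, axis=1)
    log_joint.append(np.log(prior) + log_lik)
log_joint = np.array(log_joint).T   # shape (n_test, n_classes)

# Normalize in a numerically stable way to obtain the posteriors.
log_joint -= log_joint.max(axis=1, keepdims=True)
manual_proba = np.exp(log_joint)
manual_proba /= manual_proba.sum(axis=1, keepdims=True)

sk_clf = GaussianNB().fit(X, y)
sk_proba = sk_clf.predict_proba(X_test)
print(np.allclose(manual_proba, sk_proba))
```

Working in log space before normalizing avoids underflow when the likelihoods are very small.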


In [18]:
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.24,1.27], 
              [1.36,1.74], 
              [1.38,1.64], 
              [1.38,1.82], 
              [1.38,1.90], 
              [1.40,1.70],
              [1.48,1.82], 
              [1.54,1.82], 
              [1.56,2.08],
              [1.14,1.78], 
              [1.18,1.96], 
              [1.20,1.86],
              [1.26,2.00], 
              [1.28,2.00], 
              [1.30,1.96]])
# y = np.hstack([np.ones(9), 2*np.ones(6)])
y = np.hstack([np.ones(4), 3*np.ones(5), 2*np.ones(6)]) # multi-class labels work too
X_test = np.array([[1.24,1.80], 
                   [1.28,1.84], 
                   [1.40,2.04]])

clf = GaussianNB(priors=np.ones(np.unique(y).shape) * 1/3).fit(X, y) # uniform priors: 1/3 for each of the 3 classes

print(clf.predict(X_test))
print(clf.predict_proba(X_test))
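print(clf.score(X, y)) # accuracy on the training data
print('-'*30)

print(clf.classes_) # class labels
print(clf.class_count_) # number of training samples per class
print('-'*30)

print(clf.class_prior_) # prior probabilities; here the uniform priors passed in, not the normalized class_count_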

[2. 2. 3.]
[[0.17231011 0.82022995 0.00745995]
 [0.25097984 0.7200933  0.02892687]
 [0.19799652 0.02582982 0.77617366]]
0.8666666666666667
------------------------------
[1. 2. 3.]
[4. 6. 5.]
------------------------------
[0.33333333 0.33333333 0.33333333]
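
Comparing the two runs shows that changing `priors` shifts the posteriors but leaves the fitted per-class Gaussians untouched. As a hedged check of that reading, the sketch below takes the default model's posteriors, divides out its frequency priors, multiplies in the uniform priors, and renormalizes; the result should agree with the model fitted with `priors` set explicitly. The names `default_clf`, `uniform_clf`, and `reweighted` are illustrative.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.24, 1.27], [1.36, 1.74], [1.38, 1.64], [1.38, 1.82], [1.38, 1.90],
              [1.40, 1.70], [1.48, 1.82], [1.54, 1.82], [1.56, 2.08], [1.14, 1.78],
              [1.18, 1.96], [1.20, 1.86], [1.26, 2.00], [1.28, 2.00], [1.30, 1.96]])
y = np.hstack([np.ones(4), 3 * np.ones(5), 2 * np.ones(6)])
X_test = np.array([[1.24, 1.80], [1.28, 1.84], [1.40, 2.04]])

new_priors = np.full(3, 1 / 3)                       # uniform priors over 3 classes
default_clf = GaussianNB().fit(X, y)                 # priors = class frequencies
uniform_clf = GaussianNB(priors=new_priors).fit(X, y)

# posterior is proportional to prior * likelihood, so swapping the prior is a
# per-class rescaling of the posterior followed by renormalization.
reweighted = default_clf.predict_proba(X_test) / default_clf.class_prior_ * new_priors
reweighted /= reweighted.sum(axis=1, keepdims=True)

print(np.allclose(reweighted, uniform_clf.predict_proba(X_test)))
```

In this example the predicted labels are the same under both priors; priors matter most for points near a decision boundary or when the classes are strongly imbalanced.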


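- train_test_split() : splits the samples into a training set and a test set, so that the classifier can be evaluated on data it was not fitted on. In the example below, `test_size=0.7` holds out 70% of the 150 iris samples (105 points) for testing.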

In [19]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.7, random_state=0)
gnb = GaussianNB().fit(X_train, y_train)
y_pred = gnb.predict(X_test)
print("Number of mislabeled points out of a total %d points : %d"
      % (X_test.shape[0], (y_test != y_pred).sum()))

Number of mislabeled points out of a total 105 points : 7
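
The raw count of mislabeled points can be complemented with an accuracy figure and a per-class breakdown. The sketch below repeats the same split and uses `accuracy_score` and `confusion_matrix` from `sklearn.metrics`; it is one possible way to summarize the result, and the exact numbers are left to the run instead of being stated here.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.7, random_state=0)
gnb = GaussianNB().fit(X_train, y_train)
y_pred = gnb.predict(X_test)

acc = accuracy_score(y_test, y_pred)    # fraction of test points classified correctly
cm = confusion_matrix(y_test, y_pred)   # rows: true class, columns: predicted class

print(f"accuracy: {acc:.3f}")
print(cm)
```

Off-diagonal entries of the confusion matrix show which pairs of classes are being confused, which a single error count does not reveal.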
