## Fisher Discriminant Method

> https://scikit-learn.org/stable/auto_examples/classification/plot_lda_qda.html

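The starting point is to find a line onto which the data points are projected. Call the direction of this line $w$, treat $w$ as the unknown, and solve for the optimal $w$.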

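Given a candidate $w$, project each sample onto the line, i.e. compute $w^T x$, then compute the mean and variance (or, more generally, the covariance matrix) of the projections for each class. The objective function is built so that the between-class separation is large and the within-class scatter is small, and $w$ is obtained by optimizing it.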
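In standard textbook notation, the two-class Fisher criterion can be written as below. The symbols $\mu_k$, $m_k$, $s_k^2$, $S_B$ and $S_W$ are introduced here for illustration and do not appear in the original notes; $\mu_k$ is the mean of class $C_k$ in the original space, and $m_k$, $s_k^2$ are the mean and scatter of that class after projection.

$$
m_k = w^T \mu_k, \qquad s_k^2 = \sum_{x \in C_k} \left(w^T x - m_k\right)^2
$$

$$
J(w) = \frac{(m_1 - m_2)^2}{s_1^2 + s_2^2} = \frac{w^T S_B w}{w^T S_W w},
\qquad
S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T,
\quad
S_W = \sum_{k=1}^{2} \sum_{x \in C_k} (x - \mu_k)(x - \mu_k)^T
$$

Maximizing $J(w)$ gives, assuming $S_W$ is invertible, $w \propto S_W^{-1}(\mu_1 - \mu_2)$.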

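Note that the $w$ obtained here is **the direction of the subspace onto which the data are finally projected (a one-dimensional line in this case)**. It is not the decision boundary that is usually drawn for the perceptron or logistic regression. This confused me for a while and made me think I had computed $w$ incorrectly!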
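A minimal NumPy sketch of this point, using a small made-up two-class dataset (the data, the variable names, and the midpoint threshold are assumptions for illustration, not part of the original notes). It computes the Fisher direction $w \propto S_W^{-1}(\mu_1 - \mu_2)$, projects the samples onto it, and checks that the boundary $\{x : w^T x = \text{threshold}\}$ runs perpendicular to $w$.

```python
import numpy as np

# Toy two-class data, made up for illustration.
X1 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [4.0, 5.0]])
X2 = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 1.0], [4.0, 3.0]])

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter: sum of the per-class scatter matrices.
S_w = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)

# Fisher direction: w is proportional to S_w^{-1} (mu1 - mu2).
w = np.linalg.solve(S_w, mu1 - mu2)
w /= np.linalg.norm(w)

# Projecting onto w turns every sample into a single number.
z1, z2 = X1 @ w, X2 @ w

# A simple threshold choice: the midpoint of the two projected class means.
threshold = 0.5 * (mu1 + mu2) @ w

# A vector lying along the decision boundary is orthogonal to w.
boundary_dir = np.array([-w[1], w[0]])

print("w =", w)
print("w . boundary_dir =", w @ boundary_dir)
print("class 1 projections:", z1)
print("class 2 projections:", z2)
```

On this toy data every class 1 projection lies above the threshold and every class 2 projection lies below it, so the classes are separated along $w$ while the boundary itself is perpendicular to $w$.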

![](https://pic2.zhimg.com/v2-90f95fe7eb9e43dfb7587816b185bb99_b.jpg)

## Linear Discriminant Analysis sklearn.discriminant_analysis.LinearDiscriminantAnalysis

> https://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html?highlight=lda
>
> https://zhuanlan.zhihu.com/p/33742983

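- Fisher discriminant analysis reduces dimensionality by projection. After projecting, it computes the within-group scatter (comparable to the random error term in analysis of variance) and the between-group scatter (comparable to the variation between factor levels in analysis of variance). It then solves an optimization problem for the projection that minimizes the within-group scatter while maximizing the between-group scatter, and the resulting line or hyperplane is used to separate the classes.

- It works with the default settings; no parameters have to be set for basic use.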
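The dimensionality-reduction side of the method can be tried directly with the estimator's `transform` step. The sketch below uses the Iris dataset bundled with scikit-learn purely as a convenient stand-in (it is not the data used in these notes) and sets `n_components` explicitly only to make the output dimension visible.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# At most min(n_classes - 1, n_features) = 2 discriminant directions exist here.
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)  # project the samples onto the discriminant directions

print(X.shape, "->", Z.shape)
print(lda.explained_variance_ratio_)  # share of variance per discriminant component
```

The projected data `Z` can then be plotted or passed to another model.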

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

np.set_printoptions(precision=8, suppress=True)

X = np.array([[1.24, 1.27],
              [1.36, 1.74],
              [1.38, 1.64],
              [1.38, 1.82],
              [1.38, 1.90],
              [1.40, 1.70],
              [1.48, 1.82],
              [1.54, 1.82],
              [1.56, 2.08],
              [1.14, 1.78],
              [1.18, 1.96],
              [1.20, 1.86],
              [1.26, 2.00],
              [1.28, 2.00],
              [1.30, 1.96]])
# y = np.hstack([np.ones(9), 2*np.ones(6)])  # two-class labelling
y = np.hstack([np.ones(4), 3*np.ones(5), 2*np.ones(6)])  # multiclass works too
X_test = np.array([[1.24, 1.80],
                   [1.28, 1.84],
                   [1.40, 2.04]])

clf = LDA(tol=1e-8).fit(X, y)

print(clf.classes_)  # class labels
print('-'*30)

print(clf.coef_)  # coefficients of the linear discriminant function of each class
print(clf.intercept_)
print('-'*30)

# Discriminant scores: one row per test sample, one column per class.
# The class with the largest score in a row is the prediction.
# The intercept must be added per class (per column), hence X_test @ coef_.T.
print(np.round(X_test @ clf.coef_.T + clf.intercept_, 4))

print(clf.predict(X_test))
print(clf.predict_proba(X_test))
print('-'*30)

print(clf.score(X, y))  # accuracy on the training data

# Share of variance explained by each discriminant component (not by each input feature):
# print(clf.explained_variance_ratio_)
```

```text
[1. 2. 3.]
------------------------------
[[ 25.68042447 -16.39379925]
 [-56.47069511  21.83776131]
 [ 47.22049455 -13.09027417]]
[ -7.51223888  30.57098506 -43.32504515]
------------------------------
[[-5.1774 -0.1447 -8.3341]
 [-4.8059 -1.53   -6.9689]
 [-5.003  -3.939  -3.9205]]
[2. 2. 3.]
[[0.0064775  0.9932468  0.0002757 ]
 [0.03625682 0.9595745  0.00416867]
 [0.14598421 0.42307037 0.43094542]]
------------------------------
0.9333333333333333
```
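As a sanity check on how `coef_` and `intercept_` are used, the sketch below rebuilds the per-class scores by hand and compares them with `decision_function`. It uses the bundled Iris data as a stand-in. The softmax comparison with `predict_proba` reflects how recent scikit-learn versions behave for this estimator and is best treated as an assumption for older releases.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
clf = LinearDiscriminantAnalysis().fit(X, y)

# One row per sample, one column per class; intercept_ is added per column.
scores = X @ clf.coef_.T + clf.intercept_
print(np.allclose(scores, clf.decision_function(X)))

# The predicted class is the one with the largest score in each row.
pred = clf.classes_[scores.argmax(axis=1)]
print(np.array_equal(pred, clf.predict(X)))

# Softmax over the class axis (shifted by the row maximum for numerical stability).
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
proba = exp_scores / exp_scores.sum(axis=1, keepdims=True)
print(np.allclose(proba, clf.predict_proba(X)))
```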


## Quadratic Discriminant Analysis sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis

> https://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis.html?highlight=lda

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA

np.set_printoptions(precision=8, suppress=True)

X = np.array([[1.24, 1.27],
              [1.36, 1.74],
              [1.38, 1.64],
              [1.38, 1.82],
              [1.38, 1.90],
              [1.40, 1.70],
              [1.48, 1.82],
              [1.54, 1.82],
              [1.56, 2.08],
              [1.14, 1.78],
              [1.18, 1.96],
              [1.20, 1.86],
              [1.26, 2.00],
              [1.28, 2.00],
              [1.30, 1.96]])
# y = np.hstack([np.ones(9), 2*np.ones(6)])  # two-class labelling
y = np.hstack([np.ones(4), 3*np.ones(5), 2*np.ones(6)])  # multiclass works too
X_test = np.array([[1.24, 1.80],
                   [1.28, 1.84],
                   [1.40, 2.04]])

clf = QDA(tol=1e-8).fit(X, y)

print(clf.classes_)  # class labels
print('-'*30)

print(clf.predict(X_test))
print(clf.predict_proba(X_test))
print(clf.score(X, y))  # accuracy on the training data
```

```text
[1. 2. 3.]
------------------------------
[2. 2. 3.]
[[0.00000001 0.94211519 0.0578848 ]
 [0.00001481 0.72893759 0.2710476 ]
 [0.19844083 0.19807925 0.60347992]]
1.0
```
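The structural difference between the two estimators is that LDA assumes one covariance matrix shared by all classes, while QDA fits a separate covariance matrix for each class, which is what makes its boundaries quadratic. A small sketch of that difference, assuming the `store_covariance=True` option on both estimators and using the bundled Iris data as a stand-in:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

X, y = load_iris(return_X_y=True)  # 4 features, 3 classes

lda = LinearDiscriminantAnalysis(store_covariance=True).fit(X, y)
qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X, y)

# LDA: a single covariance matrix shared by all classes.
print(lda.covariance_.shape)

# QDA: one covariance matrix per class.
print([c.shape for c in qda.covariance_])
```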


## Visualization (separating hyperplanes)

> https://scikit-learn.org/stable/auto_examples/classification/plot_lda_qda.html

## Model Evaluation (Cross-Validation) sklearn.model_selection.cross_val_score

> Introduction https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation
> 
> API https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html?highlight=cross_val_score#sklearn.model_selection.cross_val_score

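- cv: int, cross-validation generator or an iterable, default=None. This controls how the samples are split: with cv=k the data are divided into k folds, and each fold in turn is used for evaluation while the remaining k-1 folds are used for training. For a classifier, an integer cv uses stratified folds.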
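To make the splitting concrete, the sketch below writes out by hand what an integer `cv` does for a classifier: stratified k-fold without shuffling, refitting a fresh model on each training split. It uses the bundled Iris data as a stand-in, and the names `auto_scores` and `manual_scores` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Shortcut: an integer cv on a classifier uses stratified k-fold.
auto_scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)

# The same procedure written out: train on k-1 folds, score on the held-out fold.
manual_scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    fold_clf = LinearDiscriminantAnalysis().fit(X[train_idx], y[train_idx])
    manual_scores.append(fold_clf.score(X[test_idx], y[test_idx]))
manual_scores = np.array(manual_scores)

print(auto_scores)
print(manual_scores)
```

Both arrays hold one accuracy per fold, and their mean is the usual summary figure.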

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.model_selection import cross_val_score

np.set_printoptions(precision=8, suppress=True)

X = np.array([[1.24, 1.27],
              [1.36, 1.74],
              [1.38, 1.64],
              [1.38, 1.82],
              [1.38, 1.90],
              [1.40, 1.70],
              [1.48, 1.82],
              [1.54, 1.82],
              [1.56, 2.08],
              [1.14, 1.78],
              [1.18, 1.96],
              [1.20, 1.86],
              [1.26, 2.00],
              [1.28, 2.00],
              [1.30, 1.96]])
y = np.hstack([np.ones(9), 2*np.ones(6)])  # two classes: 9 and 6 samples

clf = LDA().fit(X, y)

print(clf.score(X, y))  # accuracy on the training data
print('-'*30)

# cross_val_score clones and refits the estimator on every fold,
# so the fit above only affects the training-set score.
scores = cross_val_score(clf, X, y, cv=5)
print(scores)         # accuracy on each held-out fold
print(scores.mean())  # mean cross-validated accuracy
```

```text
1.0
------------------------------
[1. 1. 1. 1. 1.]
1.0
```
