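## Notes on evaluation models
The models fall into the following categories:
- Principal component analysis (PCA)
- Factor analysis
- Data envelopment analysis (DEA)
- Entropy method
- Analytic hierarchy process (AHP)
- Matter-element method
- Fuzzy evaluation method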

### Entropy weight method: objective weighting

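Core idea: derive each indicator's weight from the entropy of its data -> indicators whose values differ more across samples are treated as more important.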

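Basic steps:
1. Select indicators: drop the minor ones
   1. Methods: minimum mean-square-deviation method / max-min deviation method
   2. Result: m main indicators $\{x_1, x_2, \cdots, x_m\}$ and n samples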
2. Range (min-max) transformation
   1. Benefit-type (larger-is-better) indicators: $$a_{i j}^*=\frac{a_{i j}-\min _{1 \leq i \leq n} a_{i j}}{\max _{1 \leq i \leq n} a_{i j}-\min _{1 \leq i \leq n} a_{i j}} \quad (1 \leq i \leq n, 1 \leq j \leq m)$$
   2. Cost-type (smaller-is-better) indicators: $$a_{i j}^*=\frac{\max _{1 \leq i \leq n} a_{i j}-a_{i j}}{\max _{1 \leq i \leq n} a_{i j}-\min _{1 \leq i \leq n} a_{i j}} \quad (1 \leq i \leq n, 1 \leq j \leq m)$$
   3. Result: every indicator value of every sample lies in the interval [0, 1]
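3. Use entropy to compute the weight of each indicator. Let $x_{ij}$ be the normalized value $a_{ij}^*$ of sample i on indicator j, and let $p_{ij}$ be the share of sample i within indicator j
   1. Share (probability) p: $p_{i j}=\frac{x_{i j}}{\sum_{i=1}^n x_{i j}}, \quad i=1, \cdots, n, j=1, \cdots, m$
   2. Information entropy: $e_j=-k \sum_{i=1}^n p_{i j} \ln \left(p_{i j}\right), \quad j=1, \cdots, m$, with $k=1 / \ln (n)>0$ and the convention $0 \ln 0 = 0$. The smaller the entropy, the more the indicator's values differ across samples
   3. Compute the degree of divergence and normalize it into weights: $$d_j=1-e_j, \quad j=1, \cdots, m$$
   $$w_j=\frac{d_j}{\sum_{j=1}^m d_j}, \quad j=1, \cdots, m$$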
4. Compute the score of each sample: $s_i=\sum_{j=1}^m w_j x_{i j}, \quad i=1, \cdots, n$
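The range transformation in step 2 can be sketched as follows. This is a minimal illustration, not the notebook's own code: the helper name `min_max_normalize`, the `benefit` flag vector, and the sample matrix are all hypothetical, and mapping a constant column to 0 is an assumption made here to avoid dividing by zero.

```python
import numpy as np

def min_max_normalize(a, benefit):
    """Range-transform each column of `a` to [0, 1].

    a       : (n_samples, m_indicators) array of raw values
    benefit : length-m booleans, True for larger-is-better columns
    """
    a = np.asarray(a, dtype=float)
    benefit = np.asarray(benefit, dtype=bool)
    col_min = a.min(axis=0)
    col_max = a.max(axis=0)
    span = col_max - col_min
    # Assumption: a constant column carries no information, so it is mapped
    # to 0 instead of dividing by zero.
    safe_span = np.where(span == 0, 1.0, span)
    up = (a - col_min) / safe_span      # benefit-type formula
    down = (col_max - a) / safe_span    # cost-type formula
    # `benefit` broadcasts across rows, picking one formula per column.
    return np.where(benefit, up, down)

# Hypothetical data: column 0 is benefit-type, column 1 is cost-type.
raw = np.array([[10.0, 3.0],
                [20.0, 1.0],
                [30.0, 2.0]])
normalized = min_max_normalize(raw, [True, False])
print(normalized)
```

After the transformation, the best sample on each indicator gets 1 and the worst gets 0, regardless of whether the indicator is benefit-type or cost-type.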

Notes:
1. Why divide by the constant $\ln(n)$?
   Recall that $H(x)$ attains its maximum value $\ln(n)$ when $p(x_1)=p(x_2)=\ldots=p(x_n)=1/n$, so dividing by $\ln(n)$ keeps the information entropy within the interval $[0,1]$.
2. A larger $e_j$ means the information entropy of indicator $j$ is larger. Does indicator $j$ then carry more information or less?
   Less. If the samples all have the same value on an indicator, its data show no variation, which means little information. The entropy is therefore inverted to obtain the information utility value $d_j = 1 - e_j$.
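Both notes can be checked numerically. The sketch below uses a hypothetical helper `normalized_entropy` and two made-up share vectors: equal shares should give an entropy close to 1 (utility close to 0), and a fully concentrated share vector should give an entropy close to 0 (utility close to 1).

```python
import numpy as np

def normalized_entropy(p):
    """Return -(1 / ln n) * sum(p * ln p), treating 0 * ln 0 as 0."""
    p = np.asarray(p, dtype=float)
    n = p.size
    nonzero = p[p > 0]              # drop zero shares: they contribute nothing
    return float(-np.sum(nonzero * np.log(nonzero)) / np.log(n))

uniform = np.full(4, 0.25)                        # every sample has the same share
concentrated = np.array([1.0, 0.0, 0.0, 0.0])     # one sample holds everything

e_uniform = normalized_entropy(uniform)
e_concentrated = normalized_entropy(concentrated)

# Information utility d = 1 - e: small for uniform shares, large for concentrated ones.
print("uniform:      e =", e_uniform, " d =", 1 - e_uniform)
print("concentrated: e =", e_concentrated, " d =", 1 - e_concentrated)
```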

In [13]:
import numpy as np
import matplotlib.pyplot as plt

In [14]:
### Load data
A = [100, 90, 100, 84, 90, 100, 100, 100, 100]
B = [100, 100, 78.6, 100, 90, 100, 100, 100, 100]
C = [75, 100, 85.7, 100, 90, 100, 100, 100, 100]
D = [100, 100, 78.6, 100, 90, 100, 94.4, 100, 100]
E = [100, 90, 100, 100, 100, 90, 100, 100, 80]
F = [100, 100, 100, 100, 90, 100, 100, 85.7, 100]
G = [100, 100, 78.6, 100, 90, 100, 55.6, 100, 100]
H = [87.5, 100, 85.7, 100, 100, 100, 100, 100, 100]
I = [100, 100, 92.9, 100, 80, 100, 100, 100, 100]
J = [100, 90, 100, 100, 100, 100, 100, 100, 100]
K = [100, 100, 92.9, 100, 90, 100, 100, 100, 100]
data = [A, B, C, D, E, F, G, H, I, J, K]
data = np.array(data)
data

array([[100. ,  90. , 100. ,  84. ,  90. , 100. , 100. , 100. , 100. ],
       [100. , 100. ,  78.6, 100. ,  90. , 100. , 100. , 100. , 100. ],
       [ 75. , 100. ,  85.7, 100. ,  90. , 100. , 100. , 100. , 100. ],
       [100. , 100. ,  78.6, 100. ,  90. , 100. ,  94.4, 100. , 100. ],
       [100. ,  90. , 100. , 100. , 100. ,  90. , 100. , 100. ,  80. ],
       [100. , 100. , 100. , 100. ,  90. , 100. , 100. ,  85.7, 100. ],
       [100. , 100. ,  78.6, 100. ,  90. , 100. ,  55.6, 100. , 100. ],
       [ 87.5, 100. ,  85.7, 100. , 100. , 100. , 100. , 100. , 100. ],
       [100. , 100. ,  92.9, 100. ,  80. , 100. , 100. , 100. , 100. ],
       [100. ,  90. , 100. , 100. , 100. , 100. , 100. , 100. , 100. ],
       [100. , 100. ,  92.9, 100. ,  90. , 100. , 100. , 100. , 100. ]])

In [15]:
### Calculate weights from 0 to 1
### integrate the method into a function
def calc_weights(data_):
    """Return entropy weights (summing to 1) for an (n_samples, n_indicators) array."""
    data = data_.astype(float)
    n_samples, n_indicators = data.shape
    ### min-max normalize each column (all indicators treated as larger-is-better)
    for j in range(n_indicators):
        col_min, col_max = data[:, j].min(), data[:, j].max()
        data[:, j] = (data[:, j] - col_min) / (col_max - col_min)
    norm_data = data
    ### share of each sample within each indicator, and p * ln(p)
    probabilities = np.zeros((n_samples, n_indicators))
    prob_log_prob = np.zeros((n_samples, n_indicators))
    for i in range(n_samples):
        for j in range(n_indicators):
            probabilities[i, j] = norm_data[i, j] / np.sum(norm_data[:, j])
            ### treat 0 * ln(0) as 0 so that zero shares do not produce NaN
            prob_log_prob[i, j] = probabilities[i, j] * np.log(probabilities[i, j]) if probabilities[i, j] != 0 else 0
    ### normalized entropy of each indicator
    entropy = np.zeros(n_indicators)
    for j in range(n_indicators):
        entropy[j] = -np.sum(prob_log_prob[:, j]) / np.log(n_samples)
    ### divergence (information utility), normalized into weights
    divergence = 1 - entropy
    weights = divergence / np.sum(divergence)
    return weights
weights = calc_weights(data)
print("The weights are: {}".format(weights))
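### Calculate the score of each model
### Note: the weights are applied to the raw values here,
### not to the min-max normalized values used in the score formula
print(data)
score = np.sum(data * weights, axis=1)
print("The score of each model is: {}".format(score))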

The weights are: [0.07578559 0.2191587  0.27137381 0.06559212 0.1051977  0.06559212
 0.06611572 0.06559212 0.06559212]
[[100.   90.  100.   84.   90.  100.  100.  100.  100. ]
 [100.  100.   78.6 100.   90.  100.  100.  100.  100. ]
 [ 75.  100.   85.7 100.   90.  100.  100.  100.  100. ]
 [100.  100.   78.6 100.   90.  100.   94.4 100.  100. ]
 [100.   90.  100.  100.  100.   90.  100.  100.   80. ]
 [100.  100.  100.  100.   90.  100.  100.   85.7 100. ]
 [100.  100.   78.6 100.   90.  100.   55.6 100.  100. ]
 [ 87.5 100.   85.7 100.  100.  100.  100.  100.  100. ]
 [100.  100.   92.9 100.   80.  100.  100.  100.  100. ]
 [100.   90.  100.  100.  100.  100.  100.  100.  100. ]
 [100.  100.   92.9 100.   90.  100.  100.  100.  100. ]]
The score of each model is: [95.7069621  93.14062354 93.17273781 92.77037549 95.84064938 98.01005572
 90.20508545 95.17203466 95.96929203 97.80841298 97.021269  ]
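The loop-based `calc_weights` above can also be written with array operations. The following is a sketch under stated assumptions rather than a replacement for the cell above: `entropy_weights` is a hypothetical name, the 4x3 matrix is made-up data, every indicator is assumed to be larger-is-better with no constant column, and the scores are computed on the normalized values as in $s_i=\sum_{j} w_j x_{ij}$ (the cell above applies the weights to raw values instead).

```python
import numpy as np

def entropy_weights(x):
    """Entropy weights for a min-max normalized (n_samples, n_indicators) array.

    Assumes every column has at least one nonzero entry (no constant indicators).
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    p = x / x.sum(axis=0)                 # share of each sample per indicator
    safe_p = np.where(p > 0, p, 1.0)      # ln(1) = 0, so zero shares contribute 0
    e = -(p * np.log(safe_p)).sum(axis=0) / np.log(n)   # normalized entropy
    d = 1.0 - e                           # information utility
    return d / d.sum()

# Hypothetical raw data: 4 samples, 3 larger-is-better indicators.
raw = np.array([[100.0,  90.0,  70.0],
                [ 80.0, 100.0,  90.0],
                [ 90.0,  95.0, 100.0],
                [100.0, 100.0,  80.0]])
x = (raw - raw.min(axis=0)) / (raw.max(axis=0) - raw.min(axis=0))

w = entropy_weights(x)
scores = x @ w                      # s_i = sum_j w_j * x_ij on normalized values
ranking = np.argsort(-scores)       # sample indices, best score first
print("weights:", w)
print("scores:", scores)
print("ranking:", ranking)
```

Scoring on normalized values keeps every score in [0, 1], which makes the ranking independent of the original units of each indicator.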


### Analytic hierarchy process (AHP): subjective weighting