# Clustering Evaluation

## Overview of Evaluation Metrics

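| No. | Name | Range | Description |
| --- | :---: | :---: | :---: |
| 1.  | ARI (Adjusted Rand Index) | $ ARI \in [-1, 1] $ | Adjusted Rand index. Measures the agreement between two partitions: the higher the agreement, the larger the value (1 for identical partitions, close to 0 for random labelings). The measure is symmetric. |
| 2.  | AMI (Adjusted Mutual Information Score) | $ AMI \le 1 $ | Adjusted mutual information. Measures the agreement between the distributions of two partitions: the higher the agreement, the larger the value (1 for identical partitions, close to 0 for random labelings, where it can be slightly negative). The measure is symmetric. |
| 3.  | Homogeneity [1] | $ H(labels\_true, labels\_pred) \in [0, 1] $ | Homogeneity score: each cluster contains only members of a single class. The measure is asymmetric; swapping the two arguments yields completeness. |
| 4.  | Completeness [1] | $ [0, 1] $ | Completeness score: all members of a given class are assigned to the same cluster. The measure is asymmetric. |
| 5.  | V-measure [1] | $ [0, 1] $ | Harmonic mean of homogeneity and completeness. |
| 6.  | Silhouette Coefficient | $ [-1, 1] $ | Silhouette coefficient. The higher the value, the closer samples are to their own cluster and the farther they are from other clusters. |
| 7.  | Calinski-Harabasz Index | $ [0, +\infty) $ | The CH index measures within-cluster compactness by the sum of squared distances between each point and its cluster center, and separation by the sum of squared distances between each cluster center and the center of the data set. The index is the ratio of separation to compactness, so a larger CH means tighter clusters that are farther apart, i.e. a better clustering. |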
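
The label-based metrics in rows 1 to 5 can be tried out directly with `sklearn.metrics`. The snippet below is a minimal sketch, not part of the original notebook: the two label lists are made-up toy data, and it uses scikit-learn's own functions rather than this project's `cluster.evaluate` module.

```python
import numpy as np
from sklearn import metrics

# Toy labelings (illustrative only): 6 samples, 2 true classes, 3 predicted clusters
labels_true = [0, 0, 0, 1, 1, 1]
labels_pred = [0, 0, 1, 1, 2, 2]

# Chance-adjusted agreement between the two partitions
ari = metrics.adjusted_rand_score(labels_true, labels_pred)
ami = metrics.adjusted_mutual_info_score(labels_true, labels_pred)

# Homogeneity, completeness and their harmonic mean (V-measure) in one call
h, c, v = metrics.homogeneity_completeness_v_measure(labels_true, labels_pred)

# ARI and AMI are symmetric: swapping the arguments does not change the score
ari_swapped = metrics.adjusted_rand_score(labels_pred, labels_true)
ami_swapped = metrics.adjusted_mutual_info_score(labels_pred, labels_true)

# Homogeneity is asymmetric: with swapped arguments it equals completeness
h_swapped = metrics.homogeneity_score(labels_pred, labels_true)

# Renaming the cluster ids does not matter: a pure permutation scores 1.0
ari_perm = metrics.adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])

print(f"ARI={ari:.3f}  AMI={ami:.3f}  H={h:.3f}  C={c:.3f}  V={v:.3f}")
```

All five scores need ground-truth labels, so they suit benchmarking against a known partition or comparing two clusterings with each other. Without labels, only the silhouette coefficient and the CH index from the table apply.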

## References

+ [[1] V-Measure: A conditional entropy-based external cluster evaluation measure](http://www.aclweb.org/anthology/D07-1043)
+ [[2] Jianshu: Evaluating Clustering Algorithms](https://www.jianshu.com/p/b9528df2f57a)
+ [[3] Wikipedia: Rand index](https://en.wikipedia.org/wiki/Rand_index)
+ [[4] Six Major Cluster Quality Evaluation Methods](https://blog.csdn.net/sinat_26917383/article/details/70577710)
+ [[5] Mutual Information](https://blog.csdn.net/pipisorry/article/details/51695283)

In [1]:
import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np
import pandas as pd
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
import seaborn as sns

from sklearn.manifold import TSNE
# RandomizedPCA was removed from scikit-learn; PCA(svd_solver='randomized') replaces it
from sklearn.decomposition import KernelPCA, PCA, TruncatedSVD
from sklearn.metrics import pairwise

from cluster.models import KMeans, MiniBatchKMeans, SpectralClustering, AffinityPropagation
from cluster.dataset import load_time_series
from cluster.visual import plot_cluster_sequence, plot_cluster_dim_reduction
from cluster import evaluate

In [2]:
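# Load the raw series with the project's own helper
data = load_time_series(1)
# Sum the 'pwr' readings that share the same timestamp
data = data.groupby('datetime')['pwr'].sum()
# One row per date, 48 readings per row (assumes every date has exactly 48 timestamps)
data = pd.DataFrame(data.values.reshape(-1, 48), index=np.unique(pd.to_datetime(data.index).date.astype(str)))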
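
The matrix built in the cell above has one row per date and 48 columns, and it comes without ground-truth labels, so only the internal metrics (silhouette coefficient and CH index) can be computed on it. The following is a minimal sketch of that kind of evaluation under stated assumptions: `profiles` is synthetic stand-in data rather than the output of `load_time_series`, and the clustering uses scikit-learn's `KMeans`, not the `cluster.models.KMeans` wrapper imported earlier.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, calinski_harabasz_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the (days x 48) matrix: two daily load shapes plus noise
t = np.linspace(0, 2 * np.pi, 48)
profiles = np.vstack([
    np.sin(t) + rng.normal(scale=0.1, size=(30, 48)),
    np.cos(t) + rng.normal(scale=0.1, size=(30, 48)),
])

# Score several candidate cluster counts with both internal metrics
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(profiles)
    scores[k] = (
        silhouette_score(profiles, labels),         # in [-1, 1], higher is better
        calinski_harabasz_score(profiles, labels),  # non-negative, higher is better
    )

for k, (sil, ch) in scores.items():
    print(f"k={k}: silhouette={sil:.3f}, CH={ch:.1f}")

# Candidate cluster count with the highest silhouette coefficient
best_k = max(scores, key=lambda k: scores[k][0])
```

Picking `best_k` by silhouette alone is one possible convention; the CH index can rank the candidates differently, so it is worth looking at both columns before settling on a cluster count.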