# Evaluation

- dataset: Ering (full dataset)
- clustering: incremental k-means
- $k \in [2,10]$
- max_iter: 20
- frameworks: MOSCAT, EvolClustering, Baseline

The authors of EvolClustering suggest setting $w=0.1$ in their solution, so we used this value. For MOSCAT, we used TOPSIS to select the weight $w$ for each $k$, because it is a simple, non-parametric method.
We defined the baseline as clustering without temporal context (reported with $w=0$ in the tables).
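
For reference, the TOPSIS score used for this selection can be written as below. This is a sketch that mirrors the `calc_topsis` function in this notebook; the symbols $d^{+}$ and $d^{-}$ are notation introduced here, and the choice of $(1,1)$ as ideal point and $(0,0)$ as anti-ideal point assumes that both quality measures lie in $[0,1]$.

$$
d^{+} = \sqrt{(\mathit{sq}-1)^2 + (\mathit{tq}-1)^2}, \qquad
d^{-} = \sqrt{\mathit{sq}^2 + \mathit{tq}^2}, \qquad
\mathit{oq} = \frac{d^{-}}{d^{-} + d^{+}}
$$

Here $\mathit{sq}$ and $\mathit{tq}$ correspond to the columns `av_sq` and `av_tq`. A solution at the ideal point scores $\mathit{oq}=1$, and for each $k$ the weight with the largest $\mathit{oq}$ is selected.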

## Full Set of Solutions

In [1]:
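import math
import os

import pandas as pd

# Load the aggregated evaluation results (one row per method, k, and w)
evaluation_df = pd.read_csv(os.path.join(os.getcwd(), 'stats', 'evaluation_result.csv'))
# Show all rows instead of a truncated preview
pd.set_option("display.max_rows", None)
evaluation_df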

| Unnamed: 0 | method | k | w | av_sq | std_sq | av_tq | std_tq | av_total_score | av_purity | std_purity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | baseline | 2 | 0.0 | 0.801837 | 0.015935 | 0.970834 | 0.122834 | 0.886336 | 0.324769 | 0.012513 |
| 1 | baseline | 3 | 0.0 | 0.830962 | 0.015885 | 0.962887 | 0.123003 | 0.896924 | 0.447692 | 0.039401 |
| 2 | baseline | 4 | 0.0 | 0.846257 | 0.015803 | 0.963235 | 0.122646 | 0.904746 | 0.535077 | 0.062531 |
| 3 | baseline | 5 | 0.0 | 0.854566 | 0.017576 | 0.957133 | 0.121541 | 0.90585 | 0.524256 | 0.068737 |
| 4 | baseline | 6 | 0.0 | 0.864917 | 0.015733 | 0.955223 | 0.121381 | 0.91007 | 0.569128 | 0.070778 |
| 5 | baseline | 7 | 0.0 | 0.870752 | 0.014868 | 0.952162 | 0.120982 | 0.911457 | 0.585795 | 0.086227 |
| 6 | baseline | 8 | 0.0 | 0.876163 | 0.0148 | 0.949886 | 0.12057 | 0.913025 | 0.598186 | 0.075584 |
| 7 | baseline | 9 | 0.0 | 0.877081 | 0.01557 | 0.949427 | 0.120545 | 0.913254 | 0.602197 | 0.077878 |
| 8 | baseline | 10 | 0.0 | 0.88124 | 0.014582 | 0.947492 | 0.120209 | 0.914366 | 0.610524 | 0.073817 |
| 9 | moscat | 2 | 0.0 | 0.801837 | 0.015935 | 0.970834 | 0.122834 | 0.886336 | 0.324769 | 0.012513 |


## Solution Selection by TOPSIS

In [2]:
def calc_topsis(sq, tq):
    """Return the TOPSIS closeness score of (sq, tq); higher is better."""
    # Euclidean distance to the ideal point (1, 1)
    dist_to_max = math.sqrt((sq - 1) ** 2 + (tq - 1) ** 2)
    # Euclidean distance to the anti-ideal point (0, 0)
    dist_to_min = math.sqrt(sq ** 2 + tq ** 2)
    return dist_to_min / (dist_to_min + dist_to_max)

# Score every MOSCAT solution with TOPSIS
moscat_results = evaluation_df.query('method == "moscat"').copy()
moscat_results['oq'] = moscat_results.apply(
    lambda row: calc_topsis(row['av_sq'], row['av_tq']), axis=1)

# For each k, keep the MOSCAT solution (weight w) with the highest score
topsis_max_idx = moscat_results.groupby('k')['oq'].idxmax()
topsis_moscat_result = moscat_results.loc[topsis_max_idx]

# EvolClustering at the suggested w = 0.1, and the baseline
evol_result = evaluation_df.query('method == "evol" & w == 0.1').copy()
baseline = evaluation_df.query('method == "baseline"').copy()

# Compare the three methods side by side for each k
total_result = pd.concat([topsis_moscat_result, evol_result, baseline], ignore_index=True)
total_result.pivot(index='k', columns='method', values=['w', 'av_purity', 'std_purity'])


| k | w (baseline) | w (evol) | w (moscat) | av_purity (baseline) | av_purity (evol) | av_purity (moscat) | std_purity (baseline) | std_purity (evol) | std_purity (moscat) |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 0.0 | 0.1 | 0.0 | 0.324769 | 0.323692 | 0.324769 | 0.012513 | 0.01264 | 0.012513 |
| 3 | 0.0 | 0.1 | 0.0 | 0.447692 | 0.444872 | 0.447692 | 0.039401 | 0.045135 | 0.039401 |
| 4 | 0.0 | 0.1 | 0.8 | 0.535077 | 0.529692 | 0.536103 | 0.062531 | 0.069326 | 0.06098 |
| 5 | 0.0 | 0.1 | 0.8 | 0.524256 | 0.536615 | 0.526308 | 0.068737 | 0.06851 | 0.066168 |
| 6 | 0.0 | 0.1 | 0.8 | 0.569128 | 0.573026 | 0.566718 | 0.070778 | 0.076854 | 0.073561 |
| 7 | 0.0 | 0.1 | 0.8 | 0.585795 | 0.586615 | 0.589487 | 0.086227 | 0.087525 | 0.085105 |
| 8 | 0.0 | 0.1 | 0.7 | 0.598186 | 0.582032 | 0.599571 | 0.075584 | 0.076893 | 0.079256 |
| 9 | 0.0 | 0.1 | 0.7 | 0.602197 | 0.591725 | 0.60389 | 0.077878 | 0.079119 | 0.081597 |
| 10 | 0.0 | 0.1 | 0.8 | 0.610524 | 0.602151 | 0.61164 | 0.073817 | 0.075691 | 0.077506 |
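
The per-$k$ weight selection performed in the cell above can be illustrated with a small, dependency-free sketch. The rows in `toy_rows` are hypothetical values made up for this illustration and are not taken from `evaluation_result.csv`; `topsis_score` is a stand-in that is assumed to behave like `calc_topsis`.

```python
import math

def topsis_score(sq, tq):
    """Closeness of (sq, tq) to the ideal point (1, 1); higher is better."""
    dist_to_ideal = math.hypot(sq - 1.0, tq - 1.0)
    dist_to_anti_ideal = math.hypot(sq, tq)
    return dist_to_anti_ideal / (dist_to_anti_ideal + dist_to_ideal)

# Hypothetical (k, w, av_sq, av_tq) rows, for illustration only
toy_rows = [
    (2, 0.0, 0.80, 0.90),
    (2, 0.5, 0.78, 0.97),
    (3, 0.0, 0.85, 0.88),
    (3, 0.5, 0.80, 0.90),
]

# For each k, keep the weight whose solution has the highest TOPSIS score
best_by_k = {}
for k, w, sq, tq in toy_rows:
    score = topsis_score(sq, tq)
    if k not in best_by_k or score > best_by_k[k][1]:
        best_by_k[k] = (w, score)

selected_w = {k: w for k, (w, _) in best_by_k.items()}
print(selected_w)
```

In this toy input, a larger $w$ is selected only when the gain in temporal quality outweighs the loss in snapshot quality as measured by the distance to the ideal point, so different values of $k$ can end up with different weights, as in the `w (moscat)` column above.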
