## Results

This document provides the results of the manuscript "Preference disaggregation on TOPSIS for sorting applied to an economic freedom assessment". 


The data is organized in lists:

- Decision_Matrix: Heritage evaluations over the 12 criteria for 180 countries in 4 different years
- Original_Classification: Heritage classifications for the different years
- Total_References: reference alternatives used for each simulation
- Total_P: profiles inferred and used in each simulation
- Total_w: weights inferred and used in each simulation
  
The indexes are:
    
- n_methods=2 # Number of methods   
    - [0] for PDTOPSIS-Sort-B
    - [1] for PDTOPSIS-Sort-C
    
- n_refcomb=5 # Number of sizes of reference combinations
    - [0] -> 1 reference per class
    - [1] -> 2 references per class
    - [2] -> 3 references per class
    - [3] -> 4 references per class
    - [4] -> 5 references per class

- n_years=4 # Number of years
    - [0] -> 2017
    - [1] -> 2018
    - [2] -> 2019
    - [3] -> 2020

- n_sim=100 #Number of simulations
    - The index varies between [0] and [99]


The lists have been created using the index as below:

- Decision_Matrix = [[] for c in range (n_years)]
- Original_Classification = [[] for c in range (n_years)]
- Total_References=[[[[] for c in range (n_years)] for b in range (n_refcomb)] for a in range (n_sim)]
- Total_P = [[[[[] for d in range (n_methods)] for c in range (n_years)] for b in range (n_refcomb)] for a in range (n_sim)]
- Total_w = [[[[[] for d in range (n_methods)] for c in range (n_years)] for b in range (n_refcomb)] for a in range (n_sim)]

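To access a specific table (Total_Classification is not stored in the data file; it is computed in the Classification section):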

- Decision_Matrix[year_index]    
- Original_Classification[year_index]
- Total_Classification[simulation_index][refcomb_index][year_index][method_index]
- Total_References[simulation_index][refcomb_index][year_index]
- Total_P[simulation_index][refcomb_index][year_index][method_index]
- Total_w[simulation_index][refcomb_index][year_index][method_index]
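
As a minimal sketch of the index order described above, the snippet below builds an empty container with the same nesting as Total_P and Total_w and reads one cell. The helper `make_container` and the `years` list are hypothetical names introduced only for this illustration; they are not part of the data file.

```python
n_sim, n_refcomb, n_years, n_methods = 100, 5, 4, 2

def make_container(n_sim, n_refcomb, n_years, n_methods):
    """Hypothetical helper: empty nested lists indexed [sim][refcomb][year][method]."""
    return [[[[[] for d in range(n_methods)]
              for c in range(n_years)]
             for b in range(n_refcomb)]
            for a in range(n_sim)]

container = make_container(n_sim, n_refcomb, n_years, n_methods)

# The outermost level is the simulation, the innermost level is the method.
shape = (len(container), len(container[0]),
         len(container[0][0]), len(container[0][0][0]))
print(shape)

# Example lookup: first simulation, 5 references per class, year 2018, Sort-B.
years = [2017, 2018, 2019, 2020]
sim_index, refcomb_index, year_index, method_index = 0, 4, 1, 0
cell = container[sim_index][refcomb_index][year_index][method_index]
print(years[year_index], refcomb_index + 1, cell)
```

Because all indexes are zero-based, the number of references per class is `refcomb_index + 1` and the year is `2017 + year_index`.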

### Extracting data

The data is provided as a spydata file and can be read with the cell below. 

In [1]:
import pickle
import tarfile
filename = 'data_manuscript.spydata'
tar = tarfile.open(filename, "r")
tar.extractall()
extracted_files = tar.getnames()
for f in extracted_files:
    if f.endswith('.pickle'):
         with open(f, 'rb') as fdesc:
             data = pickle.loads(fdesc.read())
locals().update(data)
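
A possible alternative, sketched under the assumption that the archive contains one or more `.pickle` members holding dictionaries (as the cell above implies), is to return the variables as a dictionary instead of injecting them into the namespace. The function name `load_spydata` and its `extract_dir` parameter are hypothetical. As with any pickle, only load files from a source you trust.

```python
import pickle
import tarfile
import tempfile
from pathlib import Path

def load_spydata(filename, extract_dir=None):
    """Hypothetical helper: return the variables of a .spydata archive as a dict."""
    # Extract into a scratch directory so the working directory stays clean.
    target = Path(extract_dir or tempfile.mkdtemp())
    data = {}
    with tarfile.open(filename, "r") as tar:
        tar.extractall(path=target)
        names = tar.getnames()
    for name in names:
        if name.endswith(".pickle"):
            with open(target / name, "rb") as fdesc:
                data.update(pickle.load(fdesc))
    return data

# Usage (requires the data file to be present):
# data = load_spydata('data_manuscript.spydata')
# Total_P = data['Total_P']
```

Keeping the result in a dictionary makes it explicit which names come from the file, and it also works inside a function, where `locals().update(...)` has no effect on local variables.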

### Accessing a specific result

Let's access the profiles obtained by PDTOPSIS-Sort-C in a specific simulation.

- [81] -> simulation number 82 (indexes start at 0)
- [2] -> 3 references per class
- [3] -> year 2020
- [1] -> PDTOPSIS-Sort-C

In [2]:
Total_P[81][2][3][1]

array([[88.73327898, 81.13325172, 88.16661029, 88.83328673, 82.88661199,
        95.52217698, 87.76660885, 83.59992724, 87.75828196, 89.86660999,
        84.99993226, 79.99991432],
       [80.66659486, 66.46656498, 60.56655256, 83.83328673, 77.88661199,
        90.52217698, 79.43325893, 72.16659648, 82.75828196, 84.86660999,
        78.33325661, 59.9998847 ],
       [63.5332868 , 46.96662145, 43.06661995, 78.83328673, 72.88661199,
        85.52217698, 74.23329299, 67.16659648, 77.75828196, 79.79996288,
        58.33328379, 42.499953  ],
       [55.13330188, 41.96662145, 38.06661995, 73.83328673, 67.88661199,
        72.18331892, 60.03330265, 56.29996873, 72.75828196, 68.73331823,
        39.99996617, 37.499953  ],
       [30.3       , 28.63333333, 18.76666667, 68.83328673, 62.88661199,
        67.18331892, 43.76666667, 30.43333333, 60.83333333, 63.73331823,
        33.33333333, 16.66666667]])

The 15 references used in this case were:

In [3]:
Total_References[81][2][3]

| Name | Property Rights | Judicial Effectiveness | Government Integrity | Tax Burden | Government Spending | Fiscal Health | Business Freedom | Labor Freedom | Monetary Freedom | Trade Freedom | Investment Freedom | Financial Freedom | Category |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ireland | 86.6 | 64.4 | 82.8 | 76.4 | 78.8 | 91.4 | 82.7 | 75.9 | 85.3 | 86.4 | 90.0 | 70.0 | 1 |
| Australia | 82.8 | 86.1 | 89.3 | 63.0 | 61.6 | 91.8 | 87.8 | 84.0 | 86.2 | 88.2 | 80.0 | 90.0 | 1 |
| Singapore | 96.8 | 92.9 | 92.4 | 90.3 | 91.1 | 80.0 | 92.8 | 90.9 | 85.6 | 94.8 | 85.0 | 80.0 | 1 |
| Malaysia | 86.5 | 74.6 | 49.4 | 85.7 | 84.6 | 80.2 | 87.8 | 74.5 | 81.6 | 82.0 | 60.0 | 50.0 | 2 |
| Estonia | 83.2 | 73.7 | 85.9 | 81.1 | 53.3 | 99.9 | 73.5 | 57.3 | 78.6 | 86.4 | 90.0 | 70.0 | 2 |
| Latvia | 72.3 | 51.1 | 46.4 | 76.9 | 58.3 | 96.5 | 77.0 | 72.4 | 80.2 | 86.4 | 85.0 | 60.0 | 2 |
| Belarus | 63.2 | 48.4 | 37.4 | 88.8 | 54.1 | 95.4 | 76.4 | 74.8 | 69.8 | 82.0 | 30.0 | 20.0 | 3 |
| Kosovo | 66.3 | 54.1 | 39.2 | 92.6 | 76.5 | 94.0 | 75.0 | 61.0 | 78.4 | 76.2 | 65.0 | 30.0 | 3 |
| Colombia | 61.1 | 32.8 | 46.1 | 70.4 | 77.0 | 85.5 | 71.3 | 78.0 | 77.5 | 81.2 | 80.0 | 70.0 | 3 |
| Solomon Islands | 54.6 | 53.7 | 33.5 | 65.6 | 34.2 | 76.6 | 67.7 | 72.0 | 84.3 | 48.0 | 15.0 | 30.0 | 4 |


## Classification

The function below applies TOPSIS-Sort-B or TOPSIS-Sort-C given a set of alternatives and a set of profiles.

In [5]:
import numpy as np
import pandas as pd

def TOPSIS_Sort_np (A, P, D, w, direction, norm_type, B_or_C):
    '''
    This function calculates TOPSIS-Sort-B or TOPSIS-Sort-C results.
    
    inputs:
        # A (m,n) -> Decision matrix of m alternatives and evaluations over n criteria
        # P (p,n) -> Profiles matrix
        # D (2,n) -> Domain alternatives
        # w -> vector of weights
        # direction -> vector of criteria direction (1 indicates a benefit criterion)
        # norm_type -> 'Max' or 'MaxMin'
        # B_or_C -> 'B' for TOPSIS-Sort-B and 'C' for TOPSIS-Sort-C
    
    output:
        # result -> list with the class (1 is the first class) assigned to each alternative
    '''
    
    M= np.concatenate((A, P, D), axis=0)
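    if norm_type == "Max":
        M_normalized = M / np.max(M, axis=0)
    elif norm_type == "MaxMin":
        M_normalized = (M - np.min(M, axis=0)) / (np.max(M, axis=0) - np.min(M, axis=0))
    else:
        # Fail early instead of hitting a NameError on M_normalized below
        raise ValueError("norm_type must be 'Max' or 'MaxMin'")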
    V = M_normalized * w
    Ideal = np.array([np.max(V[:,j]) if direction[j] == 1 else np.min(V[:,j]) for j in range(V.shape[1])])
    Anti_Ideal = np.array([np.min(V[:,j]) if direction[j] == 1 else np.max(V[:,j]) for j in range(V.shape[1])])
    pos_dist = np.linalg.norm(V - Ideal, axis=1)  # Euclidean distance of each row to the ideal
    neg_dist = np.linalg.norm(V - Anti_Ideal, axis=1)  # Euclidean distance of each row to the anti-ideal
    cl = neg_dist/(pos_dist + neg_dist)
    cl_A = cl[:A.shape[0]]
    cl_P = cl[A.shape[0]:A.shape[0]+P.shape[0]]
    if B_or_C == 'B':
        result =[]
        for i in range (len (cl_A)):
            serie = pd.Series([cl_A[i]>=cl_P[k] for k in range (len (cl_P))])
            result.append (serie.argmax() + 1 if serie.any() else len(cl_P) + 1)
    else:
        result =[]
        for i in range (len (cl_A)):
            serie = pd.Series([cl_A[i]-cl_P[k] for k in range (len (cl_P))])
            result.append (serie.abs().argmin() + 1)
    
    return result
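
To make the two assignment rules concrete, here is a small worked sketch that repeats the same arithmetic (Max normalization, benefit criteria only) on toy data. All numbers are illustrative and are not taken from the Heritage dataset, and the same three profiles are reused for both rules only to show how the rules differ.

```python
import numpy as np

# Toy data: 3 alternatives, 3 profiles, 2 benefit criteria (illustrative values)
A = np.array([[90.0, 80.0],
              [50.0, 55.0],
              [15.0, 20.0]])
P = np.array([[85.0, 85.0],   # profile 1 (best)
              [50.0, 50.0],   # profile 2
              [20.0, 20.0]])  # profile 3
D = np.array([[100.0, 100.0], [0.0, 0.0]])  # domain dummy alternatives
w = np.array([0.5, 0.5])

# Normalize, weight, and compute the closeness coefficient of every row
M = np.concatenate((A, P, D), axis=0)
V = (M / M.max(axis=0)) * w
ideal, anti_ideal = V.max(axis=0), V.min(axis=0)
pos_dist = np.linalg.norm(V - ideal, axis=1)
neg_dist = np.linalg.norm(V - anti_ideal, axis=1)
cl = neg_dist / (pos_dist + neg_dist)
cl_A, cl_P = cl[:len(A)], cl[len(A):len(A) + len(P)]

# Sort-C rule: class of the profile with the nearest closeness coefficient
classes_C = np.abs(cl_A[:, None] - cl_P[None, :]).argmin(axis=1) + 1

# Sort-B rule: first profile whose coefficient does not exceed the
# alternative's; if there is none, the alternative goes to class p + 1
ge = cl_A[:, None] >= cl_P[None, :]
classes_B = np.where(ge.any(axis=1), ge.argmax(axis=1) + 1, len(P) + 1)

print(np.round(cl_A, 4), np.round(cl_P, 4))
print(classes_C.tolist(), classes_B.tolist())
```

With p profiles, the Sort-C rule can only return classes 1 to p, while the Sort-B rule treats the profiles as boundaries and can return classes 1 to p + 1.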


Let's obtain the classifications for our instances.

In [9]:
import numpy as np
import pandas as pd

# Set some information about the MCDA model
n_sim = 100 # we have results for 100 simulations
n_refcomb = 5 # 5 numbers of references per class were used
n_years= 4 # 4 years were considered
n_methods = 2 # two methods (PDTOPSIS-Sort-B and PDTOPSIS-Sort-C)
m = 180 # 180 alternatives
n = 12 # 12 criteria
direction = [1 for i in range (n)] # all criteria are benefit criteria
D=np.array([[100 for i in range (12)],\
            [0 for i in range (12)]]) # Domain dummy alternatives
    
A = [np.array(Decision_Matrix[i]) for i in range (n_years)]

Total_Classification = [[[[[] for d in range (n_methods)] for c in range (n_years)] for b in range (n_refcomb)] for a in range (n_sim)]

for sim in range (n_sim):  
    for comb in range (n_refcomb):
        for an in range (n_years):
            for met in range (n_methods):
                if met == 0:
                    Total_Classification[sim][comb][an][met]= TOPSIS_Sort_np (A[an], Total_P[sim][comb][an][met], D, Total_w[sim][comb][an][met], direction, 'Max', 'B')
                else:
                    Total_Classification[sim][comb][an][met]= TOPSIS_Sort_np (A[an], Total_P[sim][comb][an][met], D, Total_w[sim][comb][an][met], direction, 'Max', 'C')                    
                #print(f"sim = {str(sim+1)}, n_refs = {str(comb+1)}, year = {str(2017+an)}, method = {str(met)}")


In [10]:
# Accessing a specific classification

Total_Classification[81][2][3][1]

[4,
 3,
 5,
 4,
 4,
 3,
 1,
 2,
 3,
 4,
 4,
 4,
 3,
 3,
 4,
 4,
 3,
 5,
 3,
 3,
 4,
 3,
 3,
 4,
 4,
 5,
 3,
 4,
 4,
 2,
 5,
 5,
 2,
 4,
 3,
 4,
 3,
 4,
 3,
 5,
 3,
 2,
 5,
 2,
 4,
 4,
 4,
 4,
 4,
 4,
 5,
 5,
 2,
 4,
 4,
 3,
 2,
 3,
 4,
 2,
 2,
 4,
 4,
 3,
 4,
 4,
 4,
 4,
 4,
 1,
 3,
 2,
 4,
 3,
 5,
 1,
 2,
 3,
 3,
 2,
 3,
 3,
 4,
 5,
 3,
 3,
 3,
 4,
 2,
 5,
 4,
 5,
 2,
 2,
 3,
 4,
 5,
 2,
 4,
 4,
 3,
 4,
 2,
 3,
 4,
 4,
 4,
 4,
 4,
 5,
 4,
 4,
 2,
 1,
 4,
 4,
 4,
 5,
 3,
 2,
 4,
 4,
 3,
 4,
 4,
 3,
 3,
 3,
 3,
 2,
 5,
 3,
 4,
 3,
 3,
 3,
 3,
 4,
 4,
 4,
 3,
 3,
 5,
 1,
 3,
 3,
 4,
 4,
 2,
 3,
 4,
 5,
 5,
 2,
 2,
 2,
 5,
 4,
 3,
 3,
 4,
 5,
 4,
 4,
 4,
 4,
 3,
 5,
 4,
 4,
 2,
 2,
 2,
 3,
 4,
 4,
 5,
 4,
 4,
 5]

The code below computes the similarities between the classifications obtained by each method and the original Heritage Foundation classification.

- Similarity_B presents the similarities for PDTOPSIS-Sort-B
- Similarity_C presents the similarities for PDTOPSIS-Sort-C

The similarities without considering the alternatives used as references can be accessed with

- Similarity_B_NoRefs 
- Similarity_C_NoRefs

In [11]:
Total_Percentage = [[[[[] for d in range (n_methods)] for c in range (n_years)] for b in range (n_refcomb)] for a in range (n_sim)]
Total_Percentage_NoRefs=[[[[[] for d in range (n_methods)] for c in range (n_years)] for b in range (n_refcomb)] for a in range (n_sim)]
Total_Percentage_Refs = [[[[[] for d in range (n_methods)] for c in range (n_years)] for b in range (n_refcomb)] for a in range (n_sim)]

for sim in range(n_sim):
    for comb in range(n_refcomb):
        for an in range (n_years):
            for met in range (n_methods):
                Total_Classification[sim][comb][an][met] = pd.Series(Total_Classification[sim][comb][an][met], index = Original_Classification[an].index, name="Class")
                Total_Percentage[sim][comb][an][met]=np.sum(Total_Classification[sim][comb][an][met]==Original_Classification[an])/len(Total_Classification[sim][comb][an][met])
                Total_Percentage_NoRefs[sim][comb][an][met]=np.sum(Total_Classification[sim][comb][an][met].drop(index=Total_References[sim][comb][an].index)==Original_Classification[an].drop(index=Total_References[sim][comb][an].index))/len(Total_Classification[sim][comb][an][met].drop(index=Total_References[sim][comb][an].index))
                Total_Percentage_Refs[sim][comb][an][met]=np.sum(Total_Classification[sim][comb][an][met].filter(items=Total_References[sim][comb][an].index, axis= 'index')==Original_Classification[an].filter(items=Total_References[sim][comb][an].index, axis= 'index'))/len(Total_Classification[sim][comb][an][met].filter(items=Total_References[sim][comb][an].index, axis= 'index'))

New_Sim_B = np.zeros((4*n_years,n_refcomb)) 
New_Sim_C = np.zeros((4*n_years,n_refcomb))
New_Sim_B_NoRefs = np.zeros((4*n_years,n_refcomb)) 
New_Sim_C_NoRefs = np.zeros((4*n_years,n_refcomb))

for i in range (n_years):
    for j in range (n_refcomb):
        Bvector=np.zeros(n_sim)
        Cvector=np.zeros(n_sim)
        Bvector_NoRefs=np.zeros(n_sim)
        Cvector_NoRefs=np.zeros(n_sim)
               
        for k in range (n_sim):
            Bvector[k]=Total_Percentage[k][j][i][0]
            Cvector[k]=Total_Percentage[k][j][i][1]
            Bvector_NoRefs[k]=Total_Percentage_NoRefs[k][j][i][0]
            Cvector_NoRefs[k]=Total_Percentage_NoRefs[k][j][i][1]

        New_Sim_B[i*4,j]=np.mean(Bvector)
        New_Sim_B[i*4+1,j]=np.std(Bvector)
        New_Sim_B[i*4+2,j]=np.max(Bvector)
        New_Sim_B[i*4+3,j]=np.min(Bvector)
        New_Sim_C[i*4,j]=np.mean(Cvector)
        New_Sim_C[i*4+1,j]=np.std(Cvector)
        New_Sim_C[i*4+2,j]=np.max(Cvector)
        New_Sim_C[i*4+3,j]=np.min(Cvector)
            
        New_Sim_B_NoRefs[i*4,j]=np.mean(Bvector_NoRefs)
        New_Sim_B_NoRefs[i*4+1,j]=np.std(Bvector_NoRefs)
        New_Sim_B_NoRefs[i*4+2,j]=np.max(Bvector_NoRefs)
        New_Sim_B_NoRefs[i*4+3,j]=np.min(Bvector_NoRefs)
        New_Sim_C_NoRefs[i*4,j]=np.mean(Cvector_NoRefs)
        New_Sim_C_NoRefs[i*4+1,j]=np.std(Cvector_NoRefs)
        New_Sim_C_NoRefs[i*4+2,j]=np.max(Cvector_NoRefs)
        New_Sim_C_NoRefs[i*4+3,j]=np.min(Cvector_NoRefs)       
        
# Organize the dataframes to get the results          
Similarity_B=pd.DataFrame(New_Sim_B,columns=['1','2','3','4','5'], index=['mean17','std17','max17','min17','mean18','std18','max18','min18','mean19','std19','max19','min19','mean20','std20','max20','min20'])
Similarity_C=pd.DataFrame(New_Sim_C,columns=['1','2','3','4','5'], index=['mean17','std17','max17','min17','mean18','std18','max18','min18','mean19','std19','max19','min19','mean20','std20','max20','min20'])

Similarity_B_NoRefs=pd.DataFrame(New_Sim_B_NoRefs,columns=['1','2','3','4','5'], index=['mean17','std17','max17','min17','mean18','std18','max18','min18','mean19','std19','max19','min19','mean20','std20','max20','min20'])
Similarity_C_NoRefs=pd.DataFrame(New_Sim_C_NoRefs,columns=['1','2','3','4','5'], index=['mean17','std17','max17','min17','mean18','std18','max18','min18','mean19','std19','max19','min19','mean20','std20','max20','min20'])
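
The similarity used above is the fraction of alternatives whose assigned class equals the original class, computed either over all alternatives or after dropping the reference alternatives. A minimal sketch on toy labels (the names and values below are illustrative assumptions, not results from the manuscript):

```python
import pandas as pd

# Toy classifications for six alternatives (illustrative values)
original = pd.Series([1, 2, 2, 3, 3, 3], index=list("abcdef"), name="Original")
predicted = pd.Series([1, 2, 3, 3, 3, 2], index=original.index, name="Class")
reference_index = pd.Index(["a", "d"])  # alternatives used as references

# Similarity over all alternatives
similarity = (predicted == original).mean()

# Similarity without the alternatives used as references
mask = ~original.index.isin(reference_index)
similarity_no_refs = (predicted[mask] == original[mask]).mean()

print(similarity, similarity_no_refs)
```

Taking the mean of the boolean comparison is equivalent to `np.sum(...) / len(...)` in the cell above.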


In [15]:
# The columns indicate the number of references per class.
Similarity_C_NoRefs

| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| mean17 | 0.817257 | 0.849294 | 0.839394 | 0.858188 | 0.85071 |
| std17 | 0.092207 | 0.072684 | 0.066818 | 0.072099 | 0.063698 |
| max17 | 0.948571 | 0.952941 | 0.963636 | 0.9625 | 0.967742 |
| min17 | 0.474286 | 0.652941 | 0.636364 | 0.51875 | 0.658065 |
| mean18 | 0.833714 | 0.844471 | 0.826182 | 0.830375 | 0.838516 |
| std18 | 0.090291 | 0.080643 | 0.079322 | 0.065586 | 0.059624 |
| max18 | 0.988571 | 0.970588 | 0.975758 | 0.95625 | 0.948387 |
| min18 | 0.497143 | 0.523529 | 0.624242 | 0.64375 | 0.709677 |
| mean19 | 0.808686 | 0.829765 | 0.833152 | 0.822688 | 0.826516 |
| std19 | 0.098071 | 0.086159 | 0.081133 | 0.073056 | 0.06554 |


To compute the mean confusion matrices, we used the code below.


In [16]:
Data_ConfusionMatrix = [[[[pd.concat([Original_Classification[c].rename("Original").astype(int), pd.Series(Total_Classification[a][b][c][d], name="Predicted")],axis=1) for d in range (n_methods)] for c in range (n_years)] for b in range (n_refcomb)] for a in range (n_sim)]
Confusion_Matrices = [[[[pd.crosstab(Data_ConfusionMatrix[a][b][c][d]["Original"], Data_ConfusionMatrix[a][b][c][d]["Predicted"], rownames=['Original'], colnames=['Predicted']) for d in range (n_methods)] for c in range (n_years)] for b in range (n_refcomb)] for a in range (n_sim)]
Mean_Confusion_Matrix = [[[[] for d in range(n_methods)] for c in range (n_years)] for b in range(n_refcomb)]

Data_ConfusionMatrix_NoRefs = [[[[pd.concat([Original_Classification[c].drop(index=Total_References[a][b][c].index).rename("Original").astype(int), pd.Series(Total_Classification[a][b][c][d].drop(index=Total_References[a][b][c].index), name="Predicted")],axis=1) for d in range (n_methods)] for c in range (n_years)] for b in range (n_refcomb)] for a in range (n_sim)]
Confusion_Matrices_NoRefs = [[[[pd.crosstab(Data_ConfusionMatrix_NoRefs[a][b][c][d]["Original"], Data_ConfusionMatrix_NoRefs[a][b][c][d]["Predicted"], rownames=['Original'], colnames=['Predicted']) for d in range (n_methods)] for c in range (n_years)] for b in range (n_refcomb)] for a in range (n_sim)]
Mean_Confusion_Matrix_NoRefs = [[[[] for d in range(n_methods)] for c in range (n_years)] for b in range(n_refcomb)]

for d in range (n_methods):
    for c in range(n_years):
        for b in range (n_refcomb):
            concatenated_cm = Confusion_Matrices[0][b][c][d]
            concatenated_cm_NoRefs = Confusion_Matrices_NoRefs[0][b][c][d]
            for a in range (1,n_sim):
                concatenated_cm = pd.concat((concatenated_cm, Confusion_Matrices[a][b][c][d]))
                concatenated_cm_NoRefs = pd.concat((concatenated_cm_NoRefs, Confusion_Matrices_NoRefs[a][b][c][d]))
           
            Mean_Confusion_Matrix[b][c][d] = concatenated_cm.groupby(concatenated_cm.index).mean()
            Mean_Confusion_Matrix_NoRefs[b][c][d] = concatenated_cm_NoRefs.groupby(concatenated_cm_NoRefs.index).mean()
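
One detail of this approach is worth illustrating: `pd.concat` followed by `groupby(...).mean()` skips missing cells, so a row or column that is absent from some simulations is averaged over fewer than n_sim matrices. The sketch below, on toy labels, contrasts this with an alternative (not the code used above) that first reindexes every matrix to the full set of classes. The variable names and the three-class setup are assumptions made for the example.

```python
import pandas as pd

classes = [1, 2, 3]

# Two toy simulations sharing the same original classes (illustrative labels)
original = pd.Series([1, 2, 2, 3], name="Original")
predicted_runs = [
    pd.Series([1, 2, 2, 3], name="Predicted"),
    pd.Series([2, 2, 3, 3], name="Predicted"),  # class 1 is never predicted here
]

# Approach used above: concatenate the raw crosstabs and average per row label.
raw = pd.concat([pd.crosstab(original, pred) for pred in predicted_runs])
concat_mean = raw.groupby(raw.index).mean()

# Alternative: give every matrix the same rows and columns, then average.
matrices = [
    pd.crosstab(original, pred).reindex(index=classes, columns=classes, fill_value=0)
    for pred in predicted_runs
]
mean_cm = sum(matrices) / len(matrices)

print(concat_mean)
print(mean_cm)
```

In this toy case the cell (Original 1, Predicted 1) is 1.0 with the first approach, because the second run has no column 1 and is skipped, and 0.5 with the reindexed version, which counts the missing cell as zero. Which behavior is appropriate depends on the intended definition of the mean matrix.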

In [17]:
# Mean_Confusion_Matrix_NoRefs[comb][year][method]

Mean_Confusion_Matrix_NoRefs[3][3][1]

| Original \ Predicted | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 1.51087 | 0.61 | 0.0 | 0.0 | 0.0 |
| 2 | 0.391304 | 20.6 | 6.04 | 0.0 | 0.0 |
| 3 | 0.0 | 2.08 | 48.72 | 7.19 | 0.01 |
| 4 | 0.0 | 0.0 | 4.72 | 52.1 | 1.18 |
| 5 | 0.0 | 0.0 | 0.0 | 5.4 | 9.6 |
