 ## 1. Setup
 Loads the libraries and functions used to compute the fuzzy partitions, and defines
 the parameters for running the MVFCMddV algorithm.

 A seed is also fixed to make later analyses easier. It was obtained by
 running:
 ```python
 import random
 random.SystemRandom().randint(0, 2**32-1)
 ```
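As a minimal sketch of why fixing the seed helps: reseeding NumPy's global generator with the same value reproduces the same draws. The seed value is the `RANDOM_SEED` used in this notebook; the variable names `first`, `second` and `same_draws` are illustrative only.

```python
import numpy as np

RANDOM_SEED = 495924220  # same value as RANDOM_SEED in this notebook

# first sequence of draws after seeding
np.random.seed(RANDOM_SEED)
first = np.random.rand(3)

# reseeding with the same value restarts the same sequence
np.random.seed(RANDOM_SEED)
second = np.random.rand(3)

same_draws = bool(np.array_equal(first, second))
print(same_draws)
```

Note that `np.random.seed` accepts integers between 0 and 2**32 - 1, which matches the range passed to `randint` above.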

In [3]:
from mvfuzzy import MVFuzzy
import pandas as pd
import numpy as np
import copy
from pytictoc import TicToc
from sklearn import preprocessing
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.metrics.cluster import adjusted_rand_score

RANDOM_SEED = 495924220
PARAM_K = 10
PARAM_m = 1.6
PARAM_T = 150
PARAM_e = 10**-10


 ## 2. Prepare the input data

In [4]:
# read the data (one file per view)
mfeat_fac = pd.read_csv(
    "mfeat/mfeat-fac", sep="\\s+", header=None, dtype=float)
mfeat_fou = pd.read_csv(
    "mfeat/mfeat-fou", sep="\\s+", header=None, dtype=float)
mfeat_kar = pd.read_csv(
    "mfeat/mfeat-kar", sep="\\s+", header=None, dtype=float)

# compute the a priori partition: 10 classes of 200 consecutive objects, labeled 1..10
apriori_partition = 1 + (np.arange(2000) // 200)

# normalize each feature to [0, 1]
scaler = preprocessing.MinMaxScaler()
norm_fac = scaler.fit_transform(mfeat_fac)
norm_fou = scaler.fit_transform(mfeat_fou)
norm_kar = scaler.fit_transform(mfeat_kar)

# compute the dissimilarity matrices (one per view)
D = np.zeros((2000, 2000, 3))
D[:, :, 0] = euclidean_distances(norm_fac)
D[:, :, 1] = euclidean_distances(norm_fou)
D[:, :, 2] = euclidean_distances(norm_kar)
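
The preparation steps above can be checked on a toy example. This is only a sketch: the two small arrays `view_a` and `view_b` are hypothetical stand-ins for the mfeat views, and `D_toy` and `labels` are illustrative names.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.metrics.pairwise import euclidean_distances

# two hypothetical "views" describing the same 4 objects
view_a = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0], [4.0, 50.0]])
view_b = np.array([[5.0], [5.5], [9.0], [7.0]])

views = [view_a, view_b]
n = view_a.shape[0]

# one n x n dissimilarity matrix per view, stacked along the last axis
D_toy = np.zeros((n, n, len(views)))
for v, data in enumerate(views):
    normed = preprocessing.MinMaxScaler().fit_transform(data)  # each feature in [0, 1]
    D_toy[:, :, v] = euclidean_distances(normed)

# a priori labels in consecutive blocks of 2 objects: [1, 1, 2, 2]
labels = 1 + (np.arange(n) // 2)
```

After normalization, the largest possible distance in a view with `p` features is `sqrt(p)`, so views with different numbers of features still produce dissimilarities on different scales.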


 ## 3. Running the algorithm
 1. Fix the initial seed to provide repeatability
 2. Run the algorithm 100 times
 3. Keep the result of the run with the lowest J (objective function)

 > The result may still vary if, during execution, NumPy is called from other code (parallel execution), as far as I understand from the FAQ. That never happens in our scenario, so the runs are reproducible here.
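
The best-of-N pattern described above, and the caveat about shared random state, can be sketched as follows. This is not the notebook's method: `toy_run` is a hypothetical stand-in for one clustering run, and the sketch uses a private `np.random.default_rng` generator instead of the global `np.random.seed` state, which is one way to keep results independent of other code that draws random numbers.

```python
import numpy as np

def toy_run(rng):
    """Hypothetical stand-in for one clustering run: returns an objective value J."""
    return float(rng.random())

def best_of(n_runs, seed):
    """Run toy_run n_runs times and keep the lowest J and its 1-based run number."""
    rng = np.random.default_rng(seed)  # private stream, separate from the global state
    best_j, best_run = float("inf"), 0
    for i in range(n_runs):
        j = toy_run(rng)
        if j < best_j:
            best_j, best_run = j, i + 1
    return best_j, best_run

first = best_of(100, 495924220)
np.random.rand(10)  # unrelated use of the global generator in between
second = best_of(100, 495924220)
print(first == second)
```

Whether `MVFuzzy` could accept such a generator is not shown in this notebook, so the sketch only illustrates the idea.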

In [5]:
t = TicToc()
best_result = MVFuzzy()
mvf = MVFuzzy()
best_iteration = 0
np.random.seed(RANDOM_SEED)
J_previous = float("Inf")
t.tic()
for i in range(0, 100):
    print("Current iteration:", i, end="\r", flush=True)
    mvf.run(D, PARAM_K, PARAM_m, PARAM_T, PARAM_e)
    if mvf.lastAdequacy < J_previous:
        J_previous = mvf.lastAdequacy
        # deep copy, so later runs of mvf cannot overwrite the stored arrays
        best_result = copy.deepcopy(mvf)
        best_iteration = i + 1
t.toc("Fuzzy algorithm 100x: ")


Current iteration: 0Current iteration: 1Current iteration: 2Current iteration: 3Current iteration: 4Current iteration: 5Current iteration: 6Current iteration: 7Current iteration: 8Current iteration: 9Current iteration: 10Current iteration: 11Current iteration: 12Current iteration: 13Current iteration: 14Current iteration: 15Current iteration: 16Current iteration: 17Current iteration: 18Current iteration: 19Current iteration: 20Current iteration: 21Current iteration: 22Current iteration: 23Current iteration: 24Current iteration: 25Current iteration: 26Current iteration: 27Current iteration: 28Current iteration: 29Current iteration: 30Current iteration: 31Current iteration: 32Current iteration: 33Current iteration: 34Current iteration: 35Current iteration: 36Current iteration: 37Current iteration: 38Current iteration: 39Current iteration: 40Current iteration: 41Current iteration: 42Current iteration: 43Current iteration: 44Current iteration: 4

In [11]:
np.save("fuzzy_bestMedoids", best_result.bestMedoidVectors)
np.save("fuzzy_bestMembership", best_result.bestMembershipVectors)
np.save("fuzzy_bestWeights", best_result.bestWeightVectors)
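
A short sketch of how the saved arrays can be read back. `np.save` appends the `.npy` extension when the file name does not already have it, so the files above end up as `fuzzy_bestMedoids.npy` and so on. The array `medoids` and the file name `toy_medoids` are hypothetical.

```python
import os
import tempfile
import numpy as np

medoids = np.array([[3, 1, 2], [0, 1, 4]])  # hypothetical medoid indices

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_medoids")
    np.save(path, medoids)              # written as toy_medoids.npy
    saved_name = os.listdir(tmp)[0]
    restored = np.load(path + ".npy")   # the extension is needed when loading

print(saved_name)
```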


 ## 4. Partitioning results with MVFCMddV

In [12]:
crisp_mvf_partition = best_result.toCrispPartition()
rand_score = adjusted_rand_score(apriori_partition, crisp_mvf_partition)
final_medoids_vector = best_result.bestMedoidVectors


In [18]:
print("Adjusted Rand Score:", rand_score)
print("Best iteration (from 100):", best_iteration)


Adjusted Rand Score: 0.3079554488321771
Best iteration (from 100): 79
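
To help read this score: the adjusted Rand index is 1.0 when two partitions group the objects identically, regardless of how the clusters are numbered, and it is close to 0 (or negative) for unrelated groupings. The sketch below uses a hypothetical membership matrix `U` and assumes that a crisp partition assigns each object to its highest-membership cluster; the notebook does not show how `toCrispPartition` is implemented, so that step is an assumption.

```python
import numpy as np
from sklearn.metrics.cluster import adjusted_rand_score

# hypothetical fuzzy memberships: 4 objects x 2 clusters, rows sum to 1
U = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]])

# assumed crisp rule: highest membership wins, labels are 1-based
crisp = 1 + np.argmax(U, axis=1)

truth = np.array([2, 2, 1, 1])  # same grouping as crisp, labels swapped
ari_same = adjusted_rand_score(truth, crisp)
ari_diff = adjusted_rand_score(truth, np.array([1, 2, 1, 2]))
print(ari_same, ari_diff)
```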


In [19]:
print(final_medoids_vector)


[[1689  392   25]
 [1689  392   25]
 [1689  392   25]
 [ 749  392 1890]
 [ 695  392  654]
 [ 695  392  654]
 [1722  392   25]
 [ 436  525 1494]
 [1689  392   25]
 [1722  392   25]]


In [20]:
# group object indices by cluster (cluster labels are 1-based)
partition_byCluster = [[] for _ in range(PARAM_K)]
n_elems = crisp_mvf_partition.shape[0]
for i in range(n_elems):
    k_cluster = crisp_mvf_partition[i]
    partition_byCluster[k_cluster - 1].append(i)

for k in range(0, PARAM_K):
    cur_list = partition_byCluster[k]
    print("Cluster {} ({} elements):\n{}".format(k+1, len(cur_list), cur_list))
    print("-----------")


Cluster 1 (0 elements):
[]
-----------
Cluster 2 (0 elements):
[]
-----------
Cluster 3 (89 elements):
[251, 264, 290, 331, 521, 776, 810, 827, 836, 855, 863, 869, 878, 879, 944, 982, 988, 1023, 1026, 1052, 1078, 1084, 1200, 1206, 1207, 1211, 1217, 1218, 1235, 1239, 1240, 1243, 1246, 1249, 1256, 1257, 1275, 1276, 1279, 1284, 1293, 1295, 1296, 1297, 1305, 1306, 1309, 1310, 1313, 1314, 1315, 1317, 1322, 1323, 1327, 1329, 1334, 1338, 1339, 1344, 1348, 1349, 1351, 1357, 1359, 1362, 1364, 1367, 1369, 1370, 1372, 1373, 1384, 1391, 1395, 1397, 1398, 1606, 1624, 1680, 1702, 1705, 1727, 1736, 1761, 1768, 1793, 1838, 1885]
-----------
Cluster 4 (398 elements):
[201, 202, 203, 204, 205, 211, 214, 215, 217, 219, 224, 227, 229, 230, 233, 234, 235, 236, 237, 244, 247, 252, 253, 256, 257, 258, 260, 261, 263, 267, 268, 274, 275, 278, 279, 281, 282, 283, 284, 285, 287, 289, 293, 299, 300, 304, 308, 309, 311, 315, 316, 318, 319, 324, 325, 327, 328, 329, 333, 334, 335, 336, 337, 338, 340, 342, 346, 348, 
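
The listing above shows that some clusters end up empty. A compact way to get cluster sizes and spot empty clusters is sketched below, assuming 1-based integer labels as in this notebook; `K` and `labels` are hypothetical values, not the notebook's results.

```python
import numpy as np

K = 4                                     # hypothetical number of clusters
labels = np.array([3, 4, 4, 3, 4, 3, 4])  # hypothetical 1-based crisp labels

# count objects per cluster; slot 0 is unused because labels start at 1
sizes = np.bincount(labels, minlength=K + 1)[1:]

# indices of the objects assigned to each cluster
members = {k: np.flatnonzero(labels == k).tolist() for k in range(1, K + 1)}

# clusters that received no objects
empty = [k for k in range(1, K + 1) if sizes[k - 1] == 0]
print(sizes.tolist(), empty)
```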

In [24]:
pd.DataFrame(crisp_mvf_partition).to_csv(
    "fuzzy_crisp_partition.csv", index=False, header=False)
