# Reproducing MIST for Short Text Clustering

## Introduction
The paper [MIST: Mutual Information Maximization for Short Text Clustering](https://aclanthology.org/2024.acl-long.610/) introduces a method to fine-tune a pre-trained BERT model by jointly optimizing a clustering loss and two representation learning losses. The authors have shared their source code on [GitHub](https://github.com/c4n/clustering_mist/).

The task is to visit the GitHub repository and compile a Colab notebook to replicate the experiments using the **SearchSnippets** dataset. If the dataset is too large for your Colab account, you may use the **AgNews** dataset or a subset of the original dataset.
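
If a subset is needed, sampling the same fraction from every class keeps the label distribution close to the original, which matters when comparing clustering metrics. The sketch below is a minimal, standard-library illustration. The function name `stratified_subset` and the `(text, label)` tuple layout are assumptions made for this example; they are not part of the MIST repository, whose data files may be organized differently.

```python
import random
from collections import defaultdict


def stratified_subset(rows, fraction, seed=0):
    """Return roughly `fraction` of `rows`, sampled separately per label.

    `rows` is a list of (text, label) tuples (a hypothetical layout).
    """
    rng = random.Random(seed)  # local RNG, so the global seed is untouched
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    subset = []
    for label in sorted(by_label):
        group = by_label[label]
        # Keep at least one example so that no class disappears entirely.
        k = max(1, round(len(group) * fraction))
        subset.extend(rng.sample(group, k))
    rng.shuffle(subset)
    return subset


# Toy data: 8 classes with 50 snippets each (made up for the example).
rows = [(f"snippet {i}", i % 8) for i in range(400)]
small = stratified_subset(rows, fraction=0.2)
print(len(small), "rows kept out of", len(rows))
```

The same idea should carry over to a DataFrame or CSV file once the actual text and label columns of the chosen dataset are known.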

**Note:** You are **not required to write your own code from scratch** for this assignment. Instead, focus on understanding and organizing the provided code. The main challenge is to make the provided code functional. This includes:
- Fixing any errors or dependencies required to run the code.
- Organizing the unstructured code into a clear and runnable sequence in the Colab notebook.
- Adapting the code to work with the chosen dataset (e.g., SearchSnippets, AgNews, or a subset).

Once you successfully run the code, present the results in the Colab notebook, including:
- ACC and NMI on the target dataset, for comparison with the values the authors report in Table 1.
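
For reference, cluster ids are arbitrary, so clustering accuracy (ACC) is normally computed after matching each predicted cluster to a ground-truth label, and NMI is invariant to such relabeling. The sketch below illustrates both metrics using only the standard library. It is not the repository's evaluation code: the matching is done by brute force over permutations (workable for 8 clusters, whereas `scipy.optimize.linear_sum_assignment` is the usual choice for larger counts), and NMI is normalized by the arithmetic mean of the two entropies, which is one common convention. The function names are chosen for this example.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to labels."""
    counts = Counter(zip(y_pred, y_true))
    labels = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Dummy labels (None) let surplus clusters stay unmatched.
    labels += [None] * max(0, len(clusters) - len(labels))
    best = max(
        sum(counts[(c, lab)] for c, lab in zip(clusters, perm))
        for perm in permutations(labels, len(clusters))
    )
    return best / len(y_true)


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(y_true, y_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(y_true)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    joint = Counter(zip(y_true, y_pred))
    mi = sum(
        (c / n) * math.log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in joint.items()
    )
    denom = (entropy(y_true) + entropy(y_pred)) / 2
    return mi / denom if denom > 0 else 1.0


# A perfect clustering whose ids are permuted relative to the labels:
# no raw id matches, yet the matched accuracy is 1.0.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [2, 2, 2, 0, 0, 0, 1, 1, 1]
acc = clustering_accuracy(y_true, y_pred)
nmi = normalized_mutual_info(y_true, y_pred)
print(f"ACC={acc:.4f} NMI={nmi:.4f}")
```

A plain element-wise accuracy on the same toy data would be 0, which is why an unmatched accuracy is not a meaningful clustering score.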

In [2]:
import argparse

from sentence_transformers import SentenceTransformer
from sentence_transformers_local import models, SentenceTransformerSequential
from models.Transformers import SCCLBert
from learners.cluster import ClusterLearner

from dataloader.dataloader import augment_loader, augment_loader_split
from training import training

from utils.kmeans import get_kmeans_centers
from utils.randomness import set_global_random_seed
import torch

MODEL_CLASS = {
    "distil": 'distilbert-base-nli-stsb-mean-tokens',
    "robertabase": 'roberta-base-nli-stsb-mean-tokens',
    "robertalarge": 'roberta-large-nli-stsb-mean-tokens',
    "msmarco": 'distilroberta-base-msmarco-v2',
    "xlm": "xlm-r-distilroberta-base-paraphrase-v1",
    "bertlarge": 'bert-large-nli-stsb-mean-tokens',
    "bertbase": 'bert-base-nli-stsb-mean-tokens',
    "paraphrase": "paraphrase-mpnet-base-v2",
    "paraphrase-distil": "paraphrase-distilroberta-base-v2",
    "paraphrase-Tiny" : "paraphrase-TinyBERT-L6-v2"
}

parser = argparse.ArgumentParser()
# parser.add_argument('--gpuid', nargs="+", type=int, default=[0], help="The list of gpuid, ex:--gpuid 3 1. Negative value means cpu-only")
parser.add_argument('--seed', type=int, default=0, help="")
parser.add_argument('--print_freq', type=int, default=100, help="")  # evaluate and print every N steps
parser.add_argument('--result_path', type=str, default='./results/')

parser.add_argument('--bert', type=str, default='paraphrase', help="")
#parser.add_argument('--bert', type=str, default='distil', help="")

parser.add_argument('--bert_model', type=str, default='bert-base-uncased', help="")
parser.add_argument('--note', type=str, default='_search_snippets_distil_lre-4_JSD', help="")

# Dataset
# stackoverflow/stackoverflow_true_text
parser.add_argument('--dataset', type=str, default='search_snippets', help="")
#parser.add_argument('--dataset', type=str, default='stackoverflow', help="")
# parser.add_argument('--data_path', type=str, default='./datasets/stackoverflow/')
parser.add_argument('--max_length', type=int, default=32)
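# The default is a [train, val] list; type=float only converts a value passed on the command line (-1 disables the split below).
parser.add_argument('--train_val_ratio', type=float, default=[0.9, 0.1])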

# Data for train and test
# ###### AgNews
# parser.add_argument('--data_path', type=str, default='./datasets/')
# parser.add_argument('--dataname', type=str, default='agnewsdataraw-8000', help="")
# parser.add_argument('--dataname_val', type=str, default='agnewsdataraw-8000', help="")
# parser.add_argument('--num_classes', type=int, default=4, help="")
# ####### SearchSnippets
parser.add_argument('--data_path', type=str, default='./datasets/augmented/contextual_20_2col_bert/')
## parser.add_argument('--dataname', type=str, default='train_search_snippets.csv', help="")
## parser.add_argument('--dataname_val', type=str, default='test_search_snippets.csv', help="")
parser.add_argument('--dataname', type=str, default='search_snippets', help="")
parser.add_argument('--dataname_val', type=str, default='search_snippets', help="")
parser.add_argument('--num_classes', type=int, default=8, help="")
# # ###### StackOverFlow
# parser.add_argument('--data_path', type=str, default='./datasets/stackoverflow/')
# parser.add_argument('--dataname', type=str, default='stackoverflow', help="")
# parser.add_argument('--dataname_val', type=str, default='stackoverflow_', help="")
# parser.add_argument('--num_classes', type=int, default=20, help="")
# ###### Biomedical
# # parser.add_argument('--data_path', type=str, default='./datasets/biomedical/')
# parser.add_argument('--dataname', type=str, default='biomedical', help="")
# parser.add_argument('--dataname_val', type=str, default='biomedical', help="")
# parser.add_argument('--num_classes', type=int, default=20, help="")
# ######## Tweet
# parser.add_argument('--data_path', type=str, default='./datasets/')
# parser.add_argument('--dataname', type=str, default='tweet_remap_label', help="")
# parser.add_argument('--dataname_val', type=str, default='tweet_remap_label', help="")
# parser.add_argument('--num_classes', type=int, default=89, help="")
# ######## GoogleNewsTS
# parser.add_argument('--data_path', type=str, default='./datasets/')
# parser.add_argument('--dataname', type=str, default='TS', help="")
# parser.add_argument('--dataname_val', type=str, default='TS', help="")
# parser.add_argument('--num_classes', type=int, default=152, help="")
# ######## GoogleNewsT
# parser.add_argument('--data_path', type=str, default='./datasets/')
# parser.add_argument('--dataname', type=str, default='T', help="")
# parser.add_argument('--dataname_val', type=str, default='T', help="")
# parser.add_argument('--num_classes', type=int, default=152, help="")
# ######## GoogleNewsS
# parser.add_argument('--data_path', type=str, default='./datasets/')
# parser.add_argument('--dataname', type=str, default='S', help="")
# parser.add_argument('--dataname_val', type=str, default='S', help="")
# parser.add_argument('--num_classes', type=int, default=152, help="")

# Learning parameters
parser.add_argument('--lr', type=float, default=1e-6, help="") #learning rate
parser.add_argument('--lr_scale', type=int, default=100, help="")
parser.add_argument('--max_iter', type=int, default=3000)
parser.add_argument('--batch_size', type=int, default=256) #batch size

# CNN Setting
#parser.add_argument('--out_channels', type=int, default=768)
#parser.add_argument('--use_cnn', type=str, default='cnn_1')
#parser.add_argument('--use_cnn', type=str, default='cnn_3')
#parser.add_argument('--use_cnn', type=str, default='cnn_5')
#parser.add_argument('--use_cnn', type=str, default='cnn_7')
#parser.add_argument('--use_cnn', type=str, default='cnn_cat')
#parser.add_argument('--use_cnn', type=str, default='cnn_avg')

# Contrastive learning
parser.add_argument('--use_head', type=bool, default=False)
parser.add_argument('--use_normalize', type=bool, default=False)

parser.add_argument('--weighted_local', type=bool, default=False, help="")
#parser.add_argument('--normalize_method', type=str, default='inverse_prob', help="")
parser.add_argument('--normalize_method', type=str, default='none', help="")

parser.add_argument('--contrastive_local_scale', type=float, default=0.002)  # unused
parser.add_argument('--contrastive_global_scale', type=float, default=0.008)  # unused
parser.add_argument('--temperature', type=float, default=0.5, help="temperature required by contrastive loss")
parser.add_argument('--base_temperature', type=float, default=0.1, help="temperature required by contrastive loss")

# Clustering
parser.add_argument('--clustering_scale', type=float, default=0.02) #scale of clustering loss
parser.add_argument('--use_perturbation', action='store_true', help="")
parser.add_argument('--alpha', type=float, default=1)

args = parser.parse_args(args=[])
# args.use_gpu = args.gpuid[0] >= 0
args.resPath = None
args.tensorboard = None
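
One caveat about the boolean options defined above (`--use_head`, `--use_normalize`, `--weighted_local`): `argparse` applies `type=bool` to the raw string, so any non-empty value, including `'False'`, parses as `True`. This does not affect the cell as written, because `parse_args(args=[])` only uses the defaults. The sketch below shows the behaviour and one possible alternative, `argparse.BooleanOptionalAction` (Python 3.9+). The `--use_head_fixed` option is hypothetical and exists only for this illustration.

```python
import argparse

demo = argparse.ArgumentParser()
# Same pattern as in the cell above: type=bool calls bool() on the string.
demo.add_argument('--use_head', type=bool, default=False)
# Hypothetical alternative: generates --use_head_fixed and --no-use_head_fixed.
demo.add_argument('--use_head_fixed', action=argparse.BooleanOptionalAction, default=False)

parsed = demo.parse_args(['--use_head', 'False', '--no-use_head_fixed'])
print(parsed.use_head)        # True, because bool('False') is True
print(parsed.use_head_fixed)  # False
```

If the script is ever run from a command line rather than a notebook, switching the boolean options to this action (or to `action='store_true'`, as `--use_perturbation` already does) would avoid the surprise.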



In [3]:
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"   # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]='4'  # GPU index 4 on the machine used for this run; on Colab (single GPU) use '0'

# setting device on GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print(torch.cuda.device_count())

#Additional Info when using cuda
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
    print('Memory Usage:')
    print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
    print('Cached:   ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')

Using device: cuda
1
NVIDIA GeForce RTX 3090
Memory Usage:
Allocated: 0.0 GB
Cached:    0.0 GB


In [4]:
import timeit
from datetime import datetime,timezone
start = timeit.default_timer()
now_utc = datetime.now(timezone.utc)
print('Time UTC:', now_utc)

# resPath, tensorboard = setup_path(args)
# args.resPath, args.tensorboard = resPath, tensorboard
set_global_random_seed(args.seed)

# Dataset loader
train_loader = augment_loader(args)

# torch.cuda.set_device(args.gpuid[0])
# torch.cuda.set_device(device)

# Initialize cluster centers
# by performing k-means on embeddings from Sentence-BERT with mean pooling (default)
sbert = SentenceTransformer(MODEL_CLASS[args.bert])
cluster_centers = get_kmeans_centers(sbert, train_loader, args.num_classes) 



# Model
# 1. Transformer model 
# use Huggingface/transformers model (like BERT, RoBERTa, XLNet, XLM-R) for mapping tokens to embeddings
# word_embedding_model = models.Transformer(MODEL_CLASS[args.bert])

word_embedding_model = models.Transformer('sentence-transformers/paraphrase-mpnet-base-v2')
# word_embedding_model = models.Transformer('sentence-transformers/stanford-sentiment-treebank-roberta.2021-03-11')

# model = SentenceTransformer('distilbert-base-nli-mean-tokens')
dimension = word_embedding_model.get_word_embedding_dimension()
# word_embedding_model = torch.nn.DataParallel(word_embedding_model)


# 2. CNN model
# cnn = models.CNN(in_word_embedding_dimension = word_embedding_model.get_word_embedding_dimension(), 
#                  use_cnn = args.use_cnn, out_channels = word_embedding_model.get_word_embedding_dimension())

# 3. Pooling 
# pooling_model = models.Pooling(cnn.get_word_embedding_dimension(),
#                                pooling_mode_mean_tokens=True,
#                                pooling_mode_cls_token=False,
#                                pooling_mode_max_tokens=False)
pooling_model = models.Pooling(dimension,
                               pooling_mode_mean_tokens=True,
                               pooling_mode_cls_token=False,
                               pooling_mode_max_tokens=False, 
                               pooling_mode_weighted_tokens=False)

# 4. Feature extractor 
#feature_extractor = SentenceTransformerSequential(modules=[word_embedding_model, cnn, pooling_model])
feature_extractor = SentenceTransformerSequential(modules=[word_embedding_model, pooling_model], device = 'cuda')

# 5. main model
model = SCCLBert(feature_extractor, cluster_centers=cluster_centers, alpha = args.alpha, use_head = args.use_head)  


# Optimizer 
optimizer = torch.optim.Adam([
    {'params':word_embedding_model.parameters(), 'lr': args.lr*6},
#    {'params':cnn.parameters(), 'lr': args.lr*50},
    {'params':pooling_model.parameters()},
#    {'params':model.head.parameters(), 'lr': args.lr*args.lr_scale},
    {'params':model.cluster_centers, 'lr': args.lr*60}], lr=args.lr)
# # optimizer = torch.optim.Adam(lr=1e-4,params=model.parameters())
# optimizer = torch.optim.AdamW([
#     {'params':word_embedding_model.parameters(), 'lr': args.lr},
# #    {'params':cnn.parameters(), 'lr': args.lr*50},
#     {'params':pooling_model.parameters()},
# #    {'params':model.head.parameters(), 'lr': args.lr*args.lr_scale},
#     {'params':model.cluster_centers, 'lr': args.lr*20}], lr=args.lr)
# # optimizer = torch.optim.Adam(lr=1e-4,params=model.parameters())
print(optimizer)


# Set up the trainer    
learner = ClusterLearner(model, feature_extractor, optimizer, args.temperature, args.base_temperature,
                         args.contrastive_local_scale, args.contrastive_global_scale, args.clustering_scale, use_head = args.use_head, use_normalize = args.use_normalize)
# learner = torch.nn.DataParallel(learner)
learner = learner.cuda()

# split train - validation
if args.train_val_ratio != -1:
    train_loader, val_loader = augment_loader_split(args)
    training(train_loader, learner, args, val_loader = val_loader)
# normal
else:
    training(train_loader, learner, args) 

Time UTC: 2024-12-15 10:19:49.285107+00:00




all_embeddings:(12340, 768), true_labels:12340, pred_labels:12340
true_labels tensor([2, 0, 0,  ..., 3, 3, 1], dtype=torch.int32)
pred_labels tensor([7, 5, 5,  ..., 4, 3, 7], dtype=torch.int32)
Iterations:28, Clustering ACC:0.735, centers:(8, 768)




initial_cluster_centers =  torch.Size([8, 768])
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 6e-06
    maximize: False
    weight_decay: 0

Parameter Group 1
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 1e-06
    maximize: False
    weight_decay: 0

Parameter Group 2
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 5.9999999999999995e-05
    maximize: False
    weight_decay: 0
)
train_sample 0.9 11106
val_sample 0.1 1234

=3000/44=Iterations/Batches




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 0
[Representation] Clustering scores: {'NMI': 0.5883744500110832, 'ARI': 0.5556472276166045, 'AMI': 0.5879240687397914}
[Representation] ACC: 0.7351
[Representation] ACC sklearn: 0.1198
[Model] Clustering scores: {'NMI': 0.588927963988443, 'ARI': 0.55613840166552, 'AMI': 0.5884778852208414}
[Model] ACC: 0.7380
[Model] ACC sklearn: 0.1201
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 0
[Representation] Clustering scores: {'NMI': 0.54283282569678, 'ARI': 0.4802585920484761, 'AMI': 0.5382049629008621}
[Representation] ACC: 0.7026
[Representation] ACC sklearn: 0.1677
[Model] Clustering scores: {'NMI': 0.5684975407599983, 'ARI': 0.5236469928977487, 'AMI': 0.5641359437547796}
[Model] ACC: 0.7123
[Model] ACC sklearn: 0.1199
Time UTC: 2024-12-15 10:20:42.274099+00:00
Current running time 56.5 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 100
[Representation] Clustering scores: {'NMI': 0.5565845265972801, 'ARI': 0.5138398940254673, 'AMI': 0.5560997662874516}
[Representation] ACC: 0.7014
[Representation] ACC sklearn: 0.0481
[Model] Clustering scores: {'NMI': 0.5583664883240512, 'ARI': 0.5167418553356183, 'AMI': 0.557878021985363}
[Model] ACC: 0.7341
[Model] ACC sklearn: 0.1268
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 100
[Representation] Clustering scores: {'NMI': 0.40631523732748137, 'ARI': 0.30033088046030587, 'AMI': 0.40026428682725407}
[Representation] ACC: 0.5438
[Representation] ACC sklearn: 0.1070
[Model] Clustering scores: {'NMI': 0.5304626338756427, 'ARI': 0.47274532293958776, 'AMI': 0.5256641045938877}
[Model] ACC: 0.7002
[Model] ACC sklearn: 0.1264
Time UTC: 2024-12-15 10:23:51.753795+00:00
Current running time 245.98 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 200
[Representation] Clustering scores: {'NMI': 0.5973079809186246, 'ARI': 0.5533413509553512, 'AMI': 0.5968657805993931}
[Representation] ACC: 0.7417
[Representation] ACC sklearn: 0.0296
[Model] Clustering scores: {'NMI': 0.5738011793103598, 'ARI': 0.5303305358734085, 'AMI': 0.5733296854862201}
[Model] ACC: 0.7423
[Model] ACC sklearn: 0.1252
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 200
[Representation] Clustering scores: {'NMI': 0.4863211198345654, 'ARI': 0.3580735787631082, 'AMI': 0.48108188371649224}
[Representation] ACC: 0.5827
[Representation] ACC sklearn: 0.1313
[Model] Clustering scores: {'NMI': 0.5421206410558095, 'ARI': 0.482625583499306, 'AMI': 0.53744636323428}
[Model] ACC: 0.7058
[Model] ACC sklearn: 0.1248
Time UTC: 2024-12-15 10:27:00.309971+00:00
Current running time 434.53 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 300
[Representation] Clustering scores: {'NMI': 0.568177078003534, 'ARI': 0.5377914350540703, 'AMI': 0.5677052635036133}
[Representation] ACC: 0.7384
[Representation] ACC sklearn: 0.0235
[Model] Clustering scores: {'NMI': 0.5955097168154359, 'ARI': 0.5528551181663958, 'AMI': 0.5950626462803255}
[Model] ACC: 0.7536
[Model] ACC sklearn: 0.1243
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 300
[Representation] Clustering scores: {'NMI': 0.5314540102384446, 'ARI': 0.43784623352598706, 'AMI': 0.5267105837363004}
[Representation] ACC: 0.6807
[Representation] ACC sklearn: 0.2212
[Model] Clustering scores: {'NMI': 0.5621648304291471, 'ARI': 0.5017206063368715, 'AMI': 0.5576932158436584}
[Model] ACC: 0.7188
[Model] ACC sklearn: 0.1207
Time UTC: 2024-12-15 10:30:19.101277+00:00
Current running time 633.33 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 400
[Representation] Clustering scores: {'NMI': 0.5007980553109965, 'ARI': 0.4354925936306103, 'AMI': 0.5002408929229755}
[Representation] ACC: 0.6307
[Representation] ACC sklearn: 0.1509
[Model] Clustering scores: {'NMI': 0.6107682832841811, 'ARI': 0.5687236481285064, 'AMI': 0.6103382241531408}
[Model] ACC: 0.7613
[Model] ACC sklearn: 0.1213
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 400
[Representation] Clustering scores: {'NMI': 0.46563914142911944, 'ARI': 0.35232747653607055, 'AMI': 0.46009719576350205}
[Representation] ACC: 0.5940
[Representation] ACC sklearn: 0.1661
[Model] Clustering scores: {'NMI': 0.5739786291204095, 'ARI': 0.5160162379975406, 'AMI': 0.569627692526341}
[Model] ACC: 0.7269
[Model] ACC sklearn: 0.1199
Time UTC: 2024-12-15 10:33:38.618755+00:00
Current running time 832.84 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 500
[Representation] Clustering scores: {'NMI': 0.5340002818055997, 'ARI': 0.45844912793782294, 'AMI': 0.5334885685859473}
[Representation] ACC: 0.6846
[Representation] ACC sklearn: 0.0228
[Model] Clustering scores: {'NMI': 0.6223780381269352, 'ARI': 0.5779414713355685, 'AMI': 0.6219610601099916}
[Model] ACC: 0.7650
[Model] ACC sklearn: 0.1160
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 500
[Representation] Clustering scores: {'NMI': 0.4381031287795747, 'ARI': 0.31931616530603096, 'AMI': 0.4322411634639327}
[Representation] ACC: 0.5519
[Representation] ACC sklearn: 0.1256
[Model] Clustering scores: {'NMI': 0.5903614226849468, 'ARI': 0.5289342719785575, 'AMI': 0.5861758685906898}
[Model] ACC: 0.7318
[Model] ACC sklearn: 0.1110
Time UTC: 2024-12-15 10:36:51.184861+00:00
Current running time 1025.41 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 600
[Representation] Clustering scores: {'NMI': 0.5139149531504051, 'ARI': 0.4014380696952306, 'AMI': 0.5133784894402259}
[Representation] ACC: 0.6452
[Representation] ACC sklearn: 0.0357
[Model] Clustering scores: {'NMI': 0.6287057859599336, 'ARI': 0.584967629125307, 'AMI': 0.6282961548459907}
[Model] ACC: 0.7669
[Model] ACC sklearn: 0.1134
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 600
[Representation] Clustering scores: {'NMI': 0.49705516246572756, 'ARI': 0.3865233746615766, 'AMI': 0.49192861850122305}
[Representation] ACC: 0.6280
[Representation] ACC sklearn: 0.1321
[Model] Clustering scores: {'NMI': 0.6044055961525053, 'ARI': 0.5496234268646591, 'AMI': 0.6003780591405247}
[Model] ACC: 0.7407
[Model] ACC sklearn: 0.1110
Time UTC: 2024-12-15 10:40:02.646855+00:00
Current running time 1216.87 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 700
[Representation] Clustering scores: {'NMI': 0.6063658573825399, 'ARI': 0.54767932793583, 'AMI': 0.6059357407269187}
[Representation] ACC: 0.7400
[Representation] ACC sklearn: 0.1415
[Model] Clustering scores: {'NMI': 0.641550037316387, 'ARI': 0.5998503721738273, 'AMI': 0.6411550861658225}
[Model] ACC: 0.7730
[Model] ACC sklearn: 0.1119
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 700
[Representation] Clustering scores: {'NMI': 0.54496144217891, 'ARI': 0.42943905655199194, 'AMI': 0.5401802623032488}
[Representation] ACC: 0.6483
[Representation] ACC sklearn: 0.1013
[Model] Clustering scores: {'NMI': 0.6243007875903884, 'ARI': 0.5708357946069434, 'AMI': 0.6204886481205639}
[Model] ACC: 0.7488
[Model] ACC sklearn: 0.1062
Time UTC: 2024-12-15 10:43:15.209663+00:00
Current running time 1409.43 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 800
[Representation] Clustering scores: {'NMI': 0.6446125176765142, 'ARI': 0.6126230835222168, 'AMI': 0.6442248570482558}
[Representation] ACC: 0.7840
[Representation] ACC sklearn: 0.1544
[Model] Clustering scores: {'NMI': 0.6590440336469946, 'ARI': 0.610936526538618, 'AMI': 0.6586682723420337}
[Model] ACC: 0.7771
[Model] ACC sklearn: 0.1053
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 800
[Representation] Clustering scores: {'NMI': 0.675612493301304, 'ARI': 0.6471260271582773, 'AMI': 0.6722795798315586}
[Representation] ACC: 0.8039
[Representation] ACC sklearn: 0.0081
[Model] Clustering scores: {'NMI': 0.6549418471751497, 'ARI': 0.5975156179656804, 'AMI': 0.6514354911470586}
[Model] ACC: 0.7618
[Model] ACC sklearn: 0.1021
Time UTC: 2024-12-15 10:46:31.734194+00:00
Current running time 1605.96 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 900
[Representation] Clustering scores: {'NMI': 0.6814220263284605, 'ARI': 0.6616446649508609, 'AMI': 0.6810688439274196}
[Representation] ACC: 0.8209
[Representation] ACC sklearn: 0.0534
[Model] Clustering scores: {'NMI': 0.6833393288957831, 'ARI': 0.6325029157572309, 'AMI': 0.6829913474163076}
[Model] ACC: 0.7738
[Model] ACC sklearn: 0.0925
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 900
[Representation] Clustering scores: {'NMI': 0.6678594757799639, 'ARI': 0.6120227002325779, 'AMI': 0.66449212625415}
[Representation] ACC: 0.7577
[Representation] ACC sklearn: 0.1070
[Model] Clustering scores: {'NMI': 0.671064992256385, 'ARI': 0.6157852742276134, 'AMI': 0.6677284577418604}
[Model] ACC: 0.7585
[Model] ACC sklearn: 0.0867
Time UTC: 2024-12-15 10:49:45.946835+00:00
Current running time 1800.17 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 1000
[Representation] Clustering scores: {'NMI': 0.6892636247811338, 'ARI': 0.637564149816848, 'AMI': 0.6889226929510684}
[Representation] ACC: 0.7732
[Representation] ACC sklearn: 0.0125
[Model] Clustering scores: {'NMI': 0.6892907806672607, 'ARI': 0.637606651831182, 'AMI': 0.6889498752355067}
[Model] ACC: 0.7728
[Model] ACC sklearn: 0.0947
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 1000
[Representation] Clustering scores: {'NMI': 0.6690105113923668, 'ARI': 0.6123933974903855, 'AMI': 0.6656558546125627}
[Representation] ACC: 0.7569
[Representation] ACC sklearn: 0.0162
[Model] Clustering scores: {'NMI': 0.6690025086691911, 'ARI': 0.6124074715210736, 'AMI': 0.6656476492994177}
[Model] ACC: 0.7577
[Model] ACC sklearn: 0.0891
Time UTC: 2024-12-15 10:52:57.027543+00:00
Current running time 1991.25 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 1100
[Representation] Clustering scores: {'NMI': 0.6908756514483364, 'ARI': 0.6413097812910427, 'AMI': 0.6905369965922138}
[Representation] ACC: 0.7713
[Representation] ACC sklearn: 0.0371
[Model] Clustering scores: {'NMI': 0.6910384862021396, 'ARI': 0.6414499693117798, 'AMI': 0.6907000066538922}
[Model] ACC: 0.7714
[Model] ACC sklearn: 0.0957
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 1100
[Representation] Clustering scores: {'NMI': 0.6347128577946898, 'ARI': 0.6188443879738239, 'AMI': 0.6309879604904622}
[Representation] ACC: 0.7925
[Representation] ACC sklearn: 0.1856
[Model] Clustering scores: {'NMI': 0.659887209981505, 'ARI': 0.6049368794309445, 'AMI': 0.6564481323703141}
[Model] ACC: 0.7528
[Model] ACC sklearn: 0.0924
Time UTC: 2024-12-15 10:56:15.610118+00:00
Current running time 2189.83 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 1200
[Representation] Clustering scores: {'NMI': 0.6905849303848424, 'ARI': 0.642991684533856, 'AMI': 0.6902461419354206}
[Representation] ACC: 0.7700
[Representation] ACC sklearn: 0.0205
[Model] Clustering scores: {'NMI': 0.690791541676223, 'ARI': 0.6431364778915799, 'AMI': 0.6904529755763655}
[Model] ACC: 0.7701
[Model] ACC sklearn: 0.0957
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 1200
[Representation] Clustering scores: {'NMI': 0.6580215455969669, 'ARI': 0.6463549119161982, 'AMI': 0.6544979518627313}
[Representation] ACC: 0.8120
[Representation] ACC sklearn: 0.3274
[Model] Clustering scores: {'NMI': 0.6610908494031652, 'ARI': 0.608881955451585, 'AMI': 0.6576665341527342}
[Model] ACC: 0.7569
[Model] ACC sklearn: 0.0900
Time UTC: 2024-12-15 10:59:30.571493+00:00
Current running time 2384.8 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 1300
[Representation] Clustering scores: {'NMI': 0.6893322888867832, 'ARI': 0.6442962028803365, 'AMI': 0.6889923171113789}
[Representation] ACC: 0.7681
[Representation] ACC sklearn: 0.0136
[Model] Clustering scores: {'NMI': 0.6892618534041661, 'ARI': 0.6441825379204384, 'AMI': 0.6889218028709321}
[Model] ACC: 0.7679
[Model] ACC sklearn: 0.0954
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 1300
[Representation] Clustering scores: {'NMI': 0.6711648099266595, 'ARI': 0.6230392933981262, 'AMI': 0.6678403993900834}
[Representation] ACC: 0.7682
[Representation] ACC sklearn: 0.0300
[Model] Clustering scores: {'NMI': 0.6711013804163546, 'ARI': 0.6232764050090052, 'AMI': 0.6677763475426043}
[Model] ACC: 0.7682
[Model] ACC sklearn: 0.0883
Time UTC: 2024-12-15 11:03:37.527107+00:00
Current running time 2631.75 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 1400
[Representation] Clustering scores: {'NMI': 0.6907208228503027, 'ARI': 0.6436814701491772, 'AMI': 0.6903823094986093}
[Representation] ACC: 0.7704
[Representation] ACC sklearn: 0.3006
[Model] Clustering scores: {'NMI': 0.6904989964416614, 'ARI': 0.643501150185973, 'AMI': 0.6901602390726075}
[Model] ACC: 0.7703
[Model] ACC sklearn: 0.0974
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 1400
[Representation] Clustering scores: {'NMI': 0.6745291260209473, 'ARI': 0.6214241706328593, 'AMI': 0.6712409673494162}
[Representation] ACC: 0.7634
[Representation] ACC sklearn: 0.1167
[Model] Clustering scores: {'NMI': 0.6745291260209473, 'ARI': 0.6214241706328593, 'AMI': 0.6712409673494162}
[Model] ACC: 0.7634
[Model] ACC sklearn: 0.0940
Time UTC: 2024-12-15 11:10:29.445475+00:00
Current running time 3043.67 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 1500
[Representation] Clustering scores: {'NMI': 0.689888841552657, 'ARI': 0.6430349027053287, 'AMI': 0.6895494842477965}
[Representation] ACC: 0.7687
[Representation] ACC sklearn: 0.3040
[Model] Clustering scores: {'NMI': 0.6899059039540396, 'ARI': 0.643015599297558, 'AMI': 0.6895665626293422}
[Model] ACC: 0.7687
[Model] ACC sklearn: 0.0966
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 1500
[Representation] Clustering scores: {'NMI': 0.6717720724533357, 'ARI': 0.6194698040590961, 'AMI': 0.6684584066874719}
[Representation] ACC: 0.7634
[Representation] ACC sklearn: 0.0138
[Model] Clustering scores: {'NMI': 0.6717720724533357, 'ARI': 0.6194698040590961, 'AMI': 0.6684584066874719}
[Model] ACC: 0.7634
[Model] ACC sklearn: 0.0924
Time UTC: 2024-12-15 11:18:56.678880+00:00
Current running time 3550.9 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 1600
[Representation] Clustering scores: {'NMI': 0.6893404567339448, 'ARI': 0.6434991501139983, 'AMI': 0.6890005662771266}
[Representation] ACC: 0.7675
[Representation] ACC sklearn: 0.1506
[Model] Clustering scores: {'NMI': 0.6893404567339448, 'ARI': 0.6434991501139983, 'AMI': 0.6890005662771266}
[Model] ACC: 0.7675
[Model] ACC sklearn: 0.0965
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 1600
[Representation] Clustering scores: {'NMI': 0.6740722870964301, 'ARI': 0.6230586676416504, 'AMI': 0.6707800776338112}
[Representation] ACC: 0.7666
[Representation] ACC sklearn: 0.2820
[Model] Clustering scores: {'NMI': 0.6740722870964301, 'ARI': 0.6230586676416504, 'AMI': 0.6707800776338112}
[Model] ACC: 0.7666
[Model] ACC sklearn: 0.0916
Time UTC: 2024-12-15 11:27:24.726349+00:00
Current running time 4058.95 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 1700
[Representation] Clustering scores: {'NMI': 0.6889019561320217, 'ARI': 0.6431750366403335, 'AMI': 0.6885615999017681}
[Representation] ACC: 0.7675
[Representation] ACC sklearn: 0.1077
[Model] Clustering scores: {'NMI': 0.6888984993709129, 'ARI': 0.6431678153431408, 'AMI': 0.688558138774933}
[Model] ACC: 0.7674
[Model] ACC sklearn: 0.0966
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 1700
[Representation] Clustering scores: {'NMI': 0.6826246707605889, 'ARI': 0.632129570727939, 'AMI': 0.6794189523303734}
[Representation] ACC: 0.7690
[Representation] ACC sklearn: 0.1045
[Model] Clustering scores: {'NMI': 0.6826246707605889, 'ARI': 0.632129570727939, 'AMI': 0.6794189523303734}
[Model] ACC: 0.7690
[Model] ACC sklearn: 0.0932
Time UTC: 2024-12-15 11:34:16.515762+00:00
Current running time 4470.74 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 1800
[Representation] Clustering scores: {'NMI': 0.6896258410862887, 'ARI': 0.6433090113163235, 'AMI': 0.6892863413572846}
[Representation] ACC: 0.7695
[Representation] ACC sklearn: 0.0430
[Model] Clustering scores: {'NMI': 0.6894227474112783, 'ARI': 0.6431455004271833, 'AMI': 0.6890830259508464}
[Model] ACC: 0.7694
[Model] ACC sklearn: 0.0986
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 1800
[Representation] Clustering scores: {'NMI': 0.6745437203761322, 'ARI': 0.6236974964983636, 'AMI': 0.6712606476815105}
[Representation] ACC: 0.7618
[Representation] ACC sklearn: 0.0867
[Model] Clustering scores: {'NMI': 0.675801970937532, 'ARI': 0.6248433709120168, 'AMI': 0.6725314230777354}
[Model] ACC: 0.7626
[Model] ACC sklearn: 0.0972
Time UTC: 2024-12-15 11:40:35.524820+00:00
Current running time 4849.75 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 1900
[Representation] Clustering scores: {'NMI': 0.6898633565794288, 'ARI': 0.6420530637391745, 'AMI': 0.6895241568168027}
[Representation] ACC: 0.7669
[Representation] ACC sklearn: 0.0128
[Model] Clustering scores: {'NMI': 0.6898633565794288, 'ARI': 0.6420530637391745, 'AMI': 0.6895241568168027}
[Model] ACC: 0.7669
[Model] ACC sklearn: 0.0966
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 1900
[Representation] Clustering scores: {'NMI': 0.6635177423460011, 'ARI': 0.6098855440758535, 'AMI': 0.6601219420591755}
[Representation] ACC: 0.7593
[Representation] ACC sklearn: 0.1078
[Model] Clustering scores: {'NMI': 0.6635177423460011, 'ARI': 0.6098855440758535, 'AMI': 0.6601219420591755}
[Model] ACC: 0.7593
[Model] ACC sklearn: 0.0908
Time UTC: 2024-12-15 11:46:46.477968+00:00
Current running time 5220.7 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 2000
[Representation] Clustering scores: {'NMI': 0.6897857025342135, 'ARI': 0.6423487779067756, 'AMI': 0.6894463997354771}
[Representation] ACC: 0.7679
[Representation] ACC sklearn: 0.0235
[Model] Clustering scores: {'NMI': 0.6897857025342135, 'ARI': 0.6423487779067756, 'AMI': 0.6894463997354771}
[Model] ACC: 0.7679
[Model] ACC sklearn: 0.0977
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 2000
[Representation] Clustering scores: {'NMI': 0.6719916090815296, 'ARI': 0.619326795016393, 'AMI': 0.6686825232499461}
[Representation] ACC: 0.7593
[Representation] ACC sklearn: 0.0300
[Model] Clustering scores: {'NMI': 0.6719916090815296, 'ARI': 0.619326795016393, 'AMI': 0.6686825232499461}
[Model] ACC: 0.7593
[Model] ACC sklearn: 0.0997
Time UTC: 2024-12-15 11:52:10.447256+00:00
Current running time 5544.67 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 2100
[Representation] Clustering scores: {'NMI': 0.6864879388326321, 'ARI': 0.6385074579320896, 'AMI': 0.6861451856792306}
[Representation] ACC: 0.7636
[Representation] ACC sklearn: 0.3010
[Model] Clustering scores: {'NMI': 0.6864879388326321, 'ARI': 0.6385074579320896, 'AMI': 0.6861451856792306}
[Model] ACC: 0.7636
[Model] ACC sklearn: 0.0960
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 2100
[Representation] Clustering scores: {'NMI': 0.6707411005950065, 'ARI': 0.6161719714696686, 'AMI': 0.6674182428956703}
[Representation] ACC: 0.7618
[Representation] ACC sklearn: 0.0194
[Model] Clustering scores: {'NMI': 0.6707411005950065, 'ARI': 0.6161719714696686, 'AMI': 0.6674182428956703}
[Model] ACC: 0.7618
[Model] ACC sklearn: 0.0908
Time UTC: 2024-12-15 11:56:44.263063+00:00
Current running time 5818.49 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 2200
[Representation] Clustering scores: {'NMI': 0.6840360484826155, 'ARI': 0.6369621999395324, 'AMI': 0.6836905673535391}
[Representation] ACC: 0.7634
[Representation] ACC sklearn: 0.3577
[Model] Clustering scores: {'NMI': 0.6840360484826155, 'ARI': 0.6369621999395324, 'AMI': 0.6836905673535391}
[Model] ACC: 0.7634
[Model] ACC sklearn: 0.0965
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 2200
[Representation] Clustering scores: {'NMI': 0.6186683905985598, 'ARI': 0.5650905992176157, 'AMI': 0.6148302105972393}
[Representation] ACC: 0.7253
[Representation] ACC sklearn: 0.0251
[Model] Clustering scores: {'NMI': 0.6636869003354838, 'ARI': 0.6078507377545888, 'AMI': 0.6602923709018221}
[Model] ACC: 0.7569
[Model] ACC sklearn: 0.0932
Time UTC: 2024-12-15 12:01:19.684334+00:00
Current running time 6093.91 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 2300
[Representation] Clustering scores: {'NMI': 0.6859861159808053, 'ARI': 0.6390384499705305, 'AMI': 0.6856426893634687}
[Representation] ACC: 0.7657
[Representation] ACC sklearn: 0.0882
[Model] Clustering scores: {'NMI': 0.6859999407867327, 'ARI': 0.6390447727328239, 'AMI': 0.6856565302438444}
[Model] ACC: 0.7656
[Model] ACC sklearn: 0.0971
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 2300
[Representation] Clustering scores: {'NMI': 0.6753462470392626, 'ARI': 0.6180399089819927, 'AMI': 0.6720705314382573}
[Representation] ACC: 0.7601
[Representation] ACC sklearn: 0.3509
[Model] Clustering scores: {'NMI': 0.6765144417859068, 'ARI': 0.6191509677083683, 'AMI': 0.6732503171449977}
[Model] ACC: 0.7609
[Model] ACC sklearn: 0.0981
Time UTC: 2024-12-15 12:05:46.276382+00:00
Current running time 6360.5 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 2400
[Representation] Clustering scores: {'NMI': 0.6856253686367197, 'ARI': 0.6394529371835337, 'AMI': 0.6852815356733943}
[Representation] ACC: 0.7672
[Representation] ACC sklearn: 0.0194
[Model] Clustering scores: {'NMI': 0.6858110049617407, 'ARI': 0.6396406289476413, 'AMI': 0.6854673700477245}
[Model] ACC: 0.7673
[Model] ACC sklearn: 0.0985
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 2400
[Representation] Clustering scores: {'NMI': 0.6741989046076874, 'ARI': 0.6208527807522582, 'AMI': 0.6709103751208388}
[Representation] ACC: 0.7618
[Representation] ACC sklearn: 0.1167
[Model] Clustering scores: {'NMI': 0.6741989046076874, 'ARI': 0.6208527807522582, 'AMI': 0.6709103751208388}
[Model] ACC: 0.7618
[Model] ACC sklearn: 0.0948
Time UTC: 2024-12-15 12:09:30.971154+00:00
Current running time 6585.2 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 2500
[Representation] Clustering scores: {'NMI': 0.683174078522663, 'ARI': 0.6362182345545522, 'AMI': 0.6828277268887564}
[Representation] ACC: 0.7634
[Representation] ACC sklearn: 0.0461
[Model] Clustering scores: {'NMI': 0.683174078522663, 'ARI': 0.6362182345545522, 'AMI': 0.6828277268887564}
[Model] ACC: 0.7634
[Model] ACC sklearn: 0.0979
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 2500
[Representation] Clustering scores: {'NMI': 0.6688713315129967, 'ARI': 0.6176644111680677, 'AMI': 0.6655326086077709}
[Representation] ACC: 0.7601
[Representation] ACC sklearn: 0.1653
[Model] Clustering scores: {'NMI': 0.6680027686500074, 'ARI': 0.617431405798018, 'AMI': 0.6646549477666138}
[Model] ACC: 0.7601
[Model] ACC sklearn: 0.0964
Time UTC: 2024-12-15 12:13:55.337092+00:00
Current running time 6849.56 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 2600
[Representation] Clustering scores: {'NMI': 0.6835167635254237, 'ARI': 0.6368163057994354, 'AMI': 0.6831707859152593}
[Representation] ACC: 0.7648
[Representation] ACC sklearn: 0.2812
[Model] Clustering scores: {'NMI': 0.6835167635254237, 'ARI': 0.6368163057994354, 'AMI': 0.6831707859152593}
[Model] ACC: 0.7648
[Model] ACC sklearn: 0.0989
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 2600
[Representation] Clustering scores: {'NMI': 0.6186027839720297, 'ARI': 0.562131494634582, 'AMI': 0.6147637013405954}
[Representation] ACC: 0.7229
[Representation] ACC sklearn: 0.0592
[Model] Clustering scores: {'NMI': 0.6685813280364835, 'ARI': 0.618801585250268, 'AMI': 0.6652375886621589}
[Model] ACC: 0.7618
[Model] ACC sklearn: 0.0932
Time UTC: 2024-12-15 12:22:52.961152+00:00
Current running time 7387.19 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 2700
[Representation] Clustering scores: {'NMI': 0.6812497015237421, 'ARI': 0.6335240300283025, 'AMI': 0.6809013210667234}
[Representation] ACC: 0.7618
[Representation] ACC sklearn: 0.1123
[Model] Clustering scores: {'NMI': 0.6812497015237421, 'ARI': 0.6335240300283025, 'AMI': 0.6809013210667234}
[Model] ACC: 0.7618
[Model] ACC sklearn: 0.0978
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 2700
[Representation] Clustering scores: {'NMI': 0.6622199870686268, 'ARI': 0.60889435957141, 'AMI': 0.6588124018701552}
[Representation] ACC: 0.7593
[Representation] ACC sklearn: 0.1053
[Model] Clustering scores: {'NMI': 0.6620637944239035, 'ARI': 0.6084767712433882, 'AMI': 0.6586550820265472}
[Model] ACC: 0.7593
[Model] ACC sklearn: 0.0916
Time UTC: 2024-12-15 12:27:25.645256+00:00
Current running time 7659.87 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 2800
[Representation] Clustering scores: {'NMI': 0.6843716913918424, 'ARI': 0.6378560429350111, 'AMI': 0.684026631786134}
[Representation] ACC: 0.7651
[Representation] ACC sklearn: 0.0531
[Model] Clustering scores: {'NMI': 0.6843716913918424, 'ARI': 0.6378560429350111, 'AMI': 0.684026631786134}
[Model] ACC: 0.7651
[Model] ACC sklearn: 0.0989
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 2800
[Representation] Clustering scores: {'NMI': 0.6729451865221023, 'ARI': 0.6196987965153802, 'AMI': 0.6696463893929244}
[Representation] ACC: 0.7626
[Representation] ACC sklearn: 0.3744
[Model] Clustering scores: {'NMI': 0.6729451865221023, 'ARI': 0.6196987965153802, 'AMI': 0.6696463893929244}
[Model] ACC: 0.7626
[Model] ACC sklearn: 0.0948
Time UTC: 2024-12-15 12:31:07.211122+00:00
Current running time 7881.44 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 2900
[Representation] Clustering scores: {'NMI': 0.6850231383167724, 'ARI': 0.6376923759092498, 'AMI': 0.6846788311813125}
[Representation] ACC: 0.7672
[Representation] ACC sklearn: 0.1155
[Model] Clustering scores: {'NMI': 0.6848202552892154, 'ARI': 0.637528958999353, 'AMI': 0.6844757267764942}
[Model] ACC: 0.7672
[Model] ACC sklearn: 0.1015
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 2900
[Representation] Clustering scores: {'NMI': 0.662678001070684, 'ARI': 0.6071011102450165, 'AMI': 0.6592784627408529}
[Representation] ACC: 0.7520
[Representation] ACC sklearn: 0.0122
[Model] Clustering scores: {'NMI': 0.662678001070684, 'ARI': 0.6071011102450165, 'AMI': 0.6592784627408529}
[Model] ACC: 0.7520
[Model] ACC sklearn: 0.0997
Time UTC: 2024-12-15 12:34:48.742482+00:00
Current running time 8102.97 seconds




------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 3000
[Representation] Clustering scores: {'NMI': 0.6850827872396028, 'ARI': 0.6379499905825005, 'AMI': 0.6847385464325233}
[Representation] ACC: 0.7659
[Representation] ACC sklearn: 0.4520
[Model] Clustering scores: {'NMI': 0.6850827872396028, 'ARI': 0.6379499905825005, 'AMI': 0.6847385464325233}
[Model] ACC: 0.7659
[Model] ACC sklearn: 0.0996
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 3000
[Representation] Clustering scores: {'NMI': 0.618653925655043, 'ARI': 0.5612183679113015, 'AMI': 0.6148103675654887}
[Representation] ACC: 0.7318
[Representation] ACC sklearn: 0.3428
[Model] Clustering scores: {'NMI': 0.6658783871488453, 'ARI': 0.6111781831840079, 'AMI': 0.6625087207392094}
[Model] ACC: 0.7593
[Model] ACC sklearn: 0.0948
Time UTC: 2024-12-15 12:38:20.047799+00:00
Current running time 8314.27 seconds




### The results:

```
------------- Evaluate Training Set -------------
------------- 44 batches -------------
all_pred 8
step: 3000
[Representation] Clustering scores: {'NMI': 0.6850827872396028, 'ARI': 0.6379499905825005, 'AMI': 0.6847385464325233}
[Representation] ACC: 0.7659
[Representation] ACC sklearn: 0.4520
[Model] Clustering scores: {'NMI': 0.6850827872396028, 'ARI': 0.6379499905825005, 'AMI': 0.6847385464325233}
[Model] ACC: 0.7659
[Model] ACC sklearn: 0.0996
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 3000
[Representation] Clustering scores: {'NMI': 0.618653925655043, 'ARI': 0.5612183679113015, 'AMI': 0.6148103675654887}
[Representation] ACC: 0.7318
[Representation] ACC sklearn: 0.3428
[Model] Clustering scores: {'NMI': 0.6658783871488453, 'ARI': 0.6111781831840079, 'AMI': 0.6625087207392094}
[Model] ACC: 0.7593
[Model] ACC sklearn: 0.0948
Time UTC: 2024-12-15 12:38:20.047799+00:00
Current running time 8314.27 seconds
```
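The headline numbers can be pulled out of a log block like the one above programmatically rather than by eye. The following is a minimal sketch that assumes the log lines keep the exact `[Model] Clustering scores: {...}` and `[Model] ACC: ...` layout shown here; `parse_metrics` and `LOG` are hypothetical names introduced for illustration and are not part of the MIST repository.

```python
import ast
import re

# Validation-set excerpt copied from the step-3000 log above.
LOG = """\
------------- Evaluate Validation Set -------------
------------- 5 batches -------------
all_pred 8
step: 3000
[Representation] Clustering scores: {'NMI': 0.618653925655043, 'ARI': 0.5612183679113015, 'AMI': 0.6148103675654887}
[Representation] ACC: 0.7318
[Representation] ACC sklearn: 0.3428
[Model] Clustering scores: {'NMI': 0.6658783871488453, 'ARI': 0.6111781831840079, 'AMI': 0.6625087207392094}
[Model] ACC: 0.7593
[Model] ACC sklearn: 0.0948
"""


def parse_metrics(log_text, head="Model"):
    """Return the NMI/ARI/AMI/ACC values logged for one head (hypothetical helper)."""
    # The scores are printed as a Python dict literal, so literal_eval can read them.
    scores = re.search(rf"\[{head}\] Clustering scores: (\{{.*?\}})", log_text)
    # "ACC: " (with the colon directly after ACC) skips the "ACC sklearn:" line.
    acc = re.search(rf"\[{head}\] ACC: ([0-9.]+)", log_text)
    metrics = ast.literal_eval(scores.group(1))
    metrics["ACC"] = float(acc.group(1))
    return metrics


model = parse_metrics(LOG, head="Model")
print(f"ACC = {model['ACC'] * 100:.2f}, NMI = {model['NMI'] * 100:.2f}")
```

Passing `head="Representation"` reads the representation-level scores from the same block instead.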

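The paper reports 76.72 ACC and 67.69 NMI on the SearchSnippets dataset. Our reproduction reaches 75.93 ACC and 66.59 NMI (the `[Model]` scores on the validation set at step 3000), a gap of 0.79 ACC points and 1.10 NMI points. The reproduced results are therefore close to the reported ones.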
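The log prints two accuracy values, `ACC` and `ACC sklearn`, and they differ widely. One plausible reading, which is an assumption and has not been checked against the repository code, is that `ACC` first matches cluster ids to class labels (the usual convention for clustering accuracy) while `ACC sklearn` compares the raw ids directly. The sketch below illustrates that difference on toy labels; `clustering_accuracy` is a hypothetical helper, not a function from the MIST code base.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import accuracy_score, normalized_mutual_info_score


def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one mapping of cluster ids to labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    # counts[p, t] = number of samples in cluster p whose true label is t.
    counts = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        counts[p, t] += 1
    # Hungarian algorithm: pick the cluster-to-label assignment with most matches.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / y_true.size


# Toy example: the partition is perfect, but the cluster ids are permuted.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 2, 2, 0, 0]

print("matched ACC:", clustering_accuracy(y_true, y_pred))
print("raw accuracy_score:", accuracy_score(y_true, y_pred))
print("NMI:", normalized_mutual_info_score(y_true, y_pred))
```

Cluster ids are arbitrary, so the unmatched score can land anywhere between zero and the matched score. NMI does not depend on the labelling, which is why it stays stable across evaluation steps.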