This notebook shows how to use the TextClustering model on the Banking77 dataset.

In [78]:
import pandas as pd
import numpy as np

from omegaconf import OmegaConf

import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

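# Import only the helpers used in this notebook instead of a wildcard import
from utils.data_utils import load_banking_data, sample_banking_clusters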
from text_clustering import TextClustering
from utils.evaluation_utils import evaluate_classic_clustering

%load_ext autoreload
%autoreload



# Load data

To load and split the Banking or Demo datasets, you can use the functions from ```utils/data_utils.py```. Place the .csv file with the texts (and, optionally, cluster labels) in the ```BANKING_PATH``` directory. This directory should contain an ```embeds/``` subdirectory with the text embeddings saved as a torch.Tensor of shape ```[n_texts, input_dim]```.

To add a custom dataset, you need a dataframe containing the texts (column 'text') and cluster labels (column 'cluster'), plus a tensor of embeddings. Make sure that ```len(dataframe) == embeds.shape[0]```.
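The requirements above can be checked before the data is handed to the model. The following is a minimal sketch: `check_custom_dataset` is a hypothetical helper, not a function from `utils/data_utils.py`, and it takes plain values so that it runs without pandas or torch.

```python
def check_custom_dataset(columns, n_rows, embeds_shape, labeled=True):
    """Validate the dataset layout described above (hypothetical helper).

    columns      -- column names of the dataframe, e.g. list(df.columns)
    n_rows       -- number of texts, e.g. len(df)
    embeds_shape -- shape of the embedding tensor, e.g. tuple(embeds.shape)
    labeled      -- whether a 'cluster' column is expected
    """
    required = {"text", "cluster"} if labeled else {"text"}
    missing = required - set(columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if len(embeds_shape) != 2:
        raise ValueError("embeddings must have shape [n_texts, input_dim]")
    if embeds_shape[0] != n_rows:
        raise ValueError(f"{n_rows} texts but {embeds_shape[0]} embedding rows")
    return embeds_shape[1]  # input_dim


# Three texts with 1024-dimensional embeddings pass the check.
input_dim = check_custom_dataset(["text", "cluster"], 3, (3, 1024))
```

With real objects the call would look like `check_custom_dataset(df.columns, len(df), tuple(embeds.shape))`.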

In [79]:
BANKING_PATH = "data/banking77/"

data_all, clusters_all, embeds_t5_all = load_banking_data(BANKING_PATH)

NUM_CLUSTERS = 10

base_embeds, _, base_data, base_clusters, _ = sample_banking_clusters(
    dataframe=data_all,
    raw_embeds=embeds_t5_all,
    cluster_num_list=np.linspace(0, NUM_CLUSTERS - 1, NUM_CLUSTERS),
    noise_cluster_num_list=None,
    noise_frac=0
)

INP_DIM = base_embeds.shape[1]

# Classic algorithms

To initialize a classic clustering model, choose the dimensionality reduction type (None, 'pca', 'umap', 'tsne' or 'pacmap'), the number of components (```feat_dim```), the clustering type ('kmeans', 'gmm', 'hdbscan', 'mean_shift' or 'spectral') and its hyperparameters (```min_samples``` and ```min_cluster_size``` for HDBSCAN; ```bandwidth``` for MeanShift).
Once a TextClustering model is initialized with these parameters, it can be fitted and evaluated as follows.

In [80]:
model = TextClustering(
    n_clusters=NUM_CLUSTERS,
    inp_dim=INP_DIM,
    train_dataset=base_embeds,
    data_frame=base_data,
    feat_dim=10,
    kind="classic clustering",
    dim_reduction_type="umap",
    clustering_type="gmm",
    random_state=42
)

model.fit(base_embeds)
_, clusters = model.transform_and_cluster(base_embeds)

metrics = model.evaluate(use_true_clusters=True)

print(f"Dim red time: {model.times['dim_red']}")
print(f"Clust time: {model.times['clust']}")

silhouette_score: 0.06390000134706497
adjusted_rand_score: 0.6536
adjusted_mutual_info_score: 0.804
average_topic_coherence: 0.7264
Dim red time: 10.43466591835022
Clust time: 0.13676953315734863
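The names `adjusted_rand_score` and `adjusted_mutual_info_score` match the scikit-learn metrics of the same name; whether `model.evaluate` calls scikit-learn directly is an assumption. As an aid for reading these numbers, here is a minimal pure-Python sketch of the adjusted Rand index. A value of 1.0 means the predicted partition equals the true one up to relabeling, and values near 0 mean chance-level agreement.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index computed from pair counts."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pairs of points that fall into the same cell of the contingency table.
    pairs_both = sum(
        comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values()
    )
    pairs_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = pairs_true * pairs_pred / comb(n, 2)
    max_index = (pairs_true + pairs_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (pairs_both - expected) / (max_index - expected)


true = [0, 0, 1, 1, 2, 2]
relabeled = [2, 2, 0, 0, 1, 1]  # same partition, different cluster ids
ari_same = adjusted_rand_index(true, relabeled)
ari_mixed = adjusted_rand_index(true, [0, 1, 0, 1, 0, 1])
```

Because the index is invariant to relabeling, cluster ids returned by the model do not need to match the ids in the 'cluster' column.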


To run a series of experiments with grid search and save the results to a .csv file, you can use the following function defined in ```utils/evaluation_utils.py```.

In [29]:
results_banking = evaluate_classic_clustering(
    dataset="banking",
    data_path=BANKING_PATH,
    num_clasters=3,
    dim_reduction_type_list=[None, "pca", "pacmap"],
    n_components_list=[None, 10, 50],
    clustering_type_list=["kmeans", "gmm", "hdbscan", "spectral"], # "mean_shift"
    random_state=42,
    min_samples_list=[5, 10],
    min_cluster_size_list=[10, 30, 60],
    bandwidth_list=[1.00, 1.50, 2.00],
    verbose=True,
    save_df=False
)

dim_red: None, n_comp: None, clustering: kmeans
silhouette_score: 0.10859999805688858
adjusted_rand_score: 0.3873
adjusted_mutual_info_score: 0.5238
average_topic_coherence: 0.7914
--------------------------------------------------------------------------------
dim_red: None, n_comp: None, clustering: gmm
silhouette_score: 0.13809999823570251
adjusted_rand_score: 0.9721
adjusted_mutual_info_score: 0.9557
average_topic_coherence: 0.7914
--------------------------------------------------------------------------------
dim_red: None, n_comp: None, clustering: hdbscan
Grid search for HDBSCAN parameters
min_samples: 5, min_cluster_size: 10
silhouette_score: -0.04439999908208847
adjusted_rand_score: 0.1016
adjusted_mutual_info_score: 0.3382
average_topic_coherence: 0.7753
min_samples: 5, min_cluster_size: 30
silhouette_score: -0.0032999999821186066
adjusted_rand_score: 0.216
adjusted_mutual_info_score: 0.4331
average_topic_coherence: 0.8559
min_samples: 5, min_cluster_size: 60
silhouette_score: -0.0032999999821186066
adjusted_rand_score: 0.216
adjusted_mutual_info_score: 0.4331
average_topic_coherence: 0.8559
min_samples: 10, min_cluster_size: 10
silhouette_score: -0.04450000077486038
adjusted_rand_score: 0.1166
adjusted_mutual_info_score: 0.361
average_topic_coherence: 0.8618
min_samples: 10, min_cluster_size: 30
silhouette_score: 0.012400000356137753
adjusted_rand_score: 0.1302
adjusted_mutual_info_score: 0.2781
average_topic_coherence: 0.8088
min_samples: 10, min_cluster_size: 60
silhouette_score: 0.012400000356137753
adjusted_rand_score: 0.1302
adjusted_mutual_info_score: 0.2781
average_topic_coherence: 0.8088
silhouette_score: 0.012400000356137753
adjusted_rand_score: 0.1302
adjusted_mutual_info_score: 0.2781
average_topic_coherence: 0.8088
--------------------------------------------------------------------------------
dim_red: None, n_comp: None, clustering: spectral
silhouette_score: 0.13079999387264252
adjusted_rand_score: 0.8111
adjusted_mutual_info_score: 0.7803
average_topic_coherence: 0.598
--------------------------------------------------------------------------------
dim_red: pca, n_comp: 10, clustering: kmeans
silhouette_score: 0.10869999974966049
adjusted_rand_score: 0.3759
adjusted_mutual_info_score: 0.5089
average_topic_coherence: 0.7957
--------------------------------------------------------------------------------
dim_red: pca, n_comp: 10, clustering: gmm
silhouette_score: 0.13840000331401825
adjusted_rand_score: 1.0
adjusted_mutual_info_score: 1.0
average_topic_coherence: 0.8051
--------------------------------------------------------------------------------
dim_red: pca, n_comp: 10, clustering: hdbscan
Grid search for HDBSCAN parameters
min_samples: 5, min_cluster_size: 10
silhouette_score: 0.0917000025510788
adjusted_rand_score: 0.6476
adjusted_mutual_info_score: 0.7034
average_topic_coherence: 0.7408
min_samples: 5, min_cluster_size: 30
silhouette_score: 0.09260000288486481
adjusted_rand_score: 0.6503
adjusted_mutual_info_score: 0.7048
average_topic_coherence: 0.7402
min_samples: 5, min_cluster_size: 60
silhouette_score: 0.0917000025510788
adjusted_rand_score: 0.6476
adjusted_mutual_info_score: 0.7034
average_topic_coherence: 0.7408
min_samples: 10, min_cluster_size: 10
silhouette_score: 0.040800001472234726
adjusted_rand_score: 0.4077
adjusted_mutual_info_score: 0.556
average_topic_coherence: 0.8595
min_samples: 10, min_cluster_size: 30
silhouette_score: 0.06639999896287918
adjusted_rand_score: 0.4977
adjusted_mutual_info_score: 0.6047
average_topic_coherence: 0.8209
min_samples: 10, min_cluster_size: 60
silhouette_score: 0.06710000336170197
adjusted_rand_score: 0.4973
adjusted_mutual_info_score: 0.604
average_topic_coherence: 0.8305
silhouette_score: 0.06710000336170197
adjusted_rand_score: 0.4973
adjusted_mutual_info_score: 0.604
average_topic_coherence: 0.8305
--------------------------------------------------------------------------------
dim_red: pca, n_comp: 10, clustering: spectral
silhouette_score: 0.13510000705718994
adjusted_rand_score: 0.8901
adjusted_mutual_info_score: 0.847
average_topic_coherence: 0.7365
--------------------------------------------------------------------------------
dim_red: pca, n_comp: 50, clustering: kmeans
silhouette_score: 0.1071000024676323
adjusted_rand_score: 0.3909
adjusted_mutual_info_score: 0.5266
average_topic_coherence: 0.7913
--------------------------------------------------------------------------------
dim_red: pca, n_comp: 50, clustering: gmm
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9941
adjusted_mutual_info_score: 0.9895
average_topic_coherence: 0.8051
--------------------------------------------------------------------------------
dim_red: pca, n_comp: 50, clustering: hdbscan
Grid search for HDBSCAN parameters
min_samples: 5, min_cluster_size: 10
silhouette_score: 0.05490000173449516
adjusted_rand_score: 0.4501
adjusted_mutual_info_score: 0.5814
average_topic_coherence: 0.8277
min_samples: 5, min_cluster_size: 30
silhouette_score: 0.05570000037550926
adjusted_rand_score: 0.4539
adjusted_mutual_info_score: 0.5901
average_topic_coherence: 0.8274
min_samples: 5, min_cluster_size: 60
silhouette_score: 0.054099999368190765
adjusted_rand_score: 0.4467
adjusted_mutual_info_score: 0.5791
average_topic_coherence: 0.8287
min_samples: 10, min_cluster_size: 10
silhouette_score: 0.016899999231100082
adjusted_rand_score: 0.2771
adjusted_mutual_info_score: 0.477
average_topic_coherence: 0.8025
min_samples: 10, min_cluster_size: 30
silhouette_score: 0.01549999974668026
adjusted_rand_score: 0.2718
adjusted_mutual_info_score: 0.4737
average_topic_coherence: 0.8035
min_samples: 10, min_cluster_size: 60
silhouette_score: 0.015399999916553497
adjusted_rand_score: 0.2731
adjusted_mutual_info_score: 0.4815
average_topic_coherence: 0.8044
silhouette_score: 0.015399999916553497
adjusted_rand_score: 0.2731
adjusted_mutual_info_score: 0.4815
average_topic_coherence: 0.8044
--------------------------------------------------------------------------------
dim_red: pca, n_comp: 50, clustering: spectral
silhouette_score: 0.13040000200271606
adjusted_rand_score: 0.8118
adjusted_mutual_info_score: 0.7731
average_topic_coherence: 0.5788
--------------------------------------------------------------------------------
dim_red: pacmap, n_comp: 10, clustering: kmeans
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
--------------------------------------------------------------------------------
dim_red: pacmap, n_comp: 10, clustering: gmm
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
--------------------------------------------------------------------------------
dim_red: pacmap, n_comp: 10, clustering: hdbscan
Grid search for HDBSCAN parameters
min_samples: 5, min_cluster_size: 10
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
min_samples: 5, min_cluster_size: 30
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
min_samples: 5, min_cluster_size: 60
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
min_samples: 10, min_cluster_size: 10
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
min_samples: 10, min_cluster_size: 30
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
min_samples: 10, min_cluster_size: 60
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
--------------------------------------------------------------------------------
dim_red: pacmap, n_comp: 10, clustering: spectral
silhouette_score: 0.0738999992609024
adjusted_rand_score: 0.4956
adjusted_mutual_info_score: 0.6696
average_topic_coherence: 0.7573
--------------------------------------------------------------------------------
dim_red: pacmap, n_comp: 50, clustering: kmeans
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
--------------------------------------------------------------------------------
dim_red: pacmap, n_comp: 50, clustering: gmm
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
--------------------------------------------------------------------------------
dim_red: pacmap, n_comp: 50, clustering: hdbscan
Grid search for HDBSCAN parameters
min_samples: 5, min_cluster_size: 10
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
min_samples: 5, min_cluster_size: 30
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
min_samples: 5, min_cluster_size: 60
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
min_samples: 10, min_cluster_size: 10
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
min_samples: 10, min_cluster_size: 30
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
min_samples: 10, min_cluster_size: 60
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
--------------------------------------------------------------------------------
dim_red: pacmap, n_comp: 50, clustering: spectral
silhouette_score: 0.13819999992847443
adjusted_rand_score: 0.9882
adjusted_mutual_info_score: 0.979
average_topic_coherence: 0.8057
--------------------------------------------------------------------------------





In [30]:
results_banking

| | Dim reduction | Clustering | Best params | Adjusted Rand | Adjusted Mutual Info | Silhouette Score | Avg topic coherence | Dim red time, sec | Clustering time, sec |
|---|---|---|---|---|---|---|---|---|---|
| 0 | - | kmeans | - | 0.3873 | 0.5238 | 0.1086 | 0.7914 | 0.0 | 0.0229 |
| 1 | - | gmm | - | 0.9721 | 0.9557 | 0.1381 | 0.7914 | 0.0 | 0.9545 |
| 2 | - | hdbscan | min_s = 5 min_cl_s = 30 | 0.216 | 0.4331 | -0.0033 | 0.8559 | 0.0 | 0.5799 |
| 3 | - | spectral | - | 0.8111 | 0.7803 | 0.1308 | 0.598 | 0.0 | 0.092 |
| 4 | pca, feat_dim = 10 | kmeans | - | 0.3759 | 0.5089 | 0.1087 | 0.7957 | 0.0846 | 0.024 |
| 5 | pca, feat_dim = 10 | gmm | - | 1.0 | 1.0 | 0.1384 | 0.8051 | 0.1441 | 0.035 |
| 6 | pca, feat_dim = 10 | hdbscan | min_s = 5 min_cl_s = 30 | 0.6503 | 0.7048 | 0.0926 | 0.7402 | 0.132 | 0.0333 |
| 7 | pca, feat_dim = 10 | spectral | - | 0.8901 | 0.847 | 0.1351 | 0.7365 | 0.1306 | 0.1137 |
| 8 | pca, feat_dim = 50 | kmeans | - | 0.3909 | 0.5266 | 0.1071 | 0.7913 | 0.4223 | 0.0242 |
| 9 | pca, feat_dim = 50 | gmm | - | 0.9941 | 0.9895 | 0.1382 | 0.8051 | 0.077 | 0.0399 |
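The grid passed to `evaluate_classic_clustering` has 3 × 3 × 4 = 36 combinations, yet the log reports results for only 20 of them: runs without dimensionality reduction appear only with `n_comp: None`, and runs with PCA or PaCMAP appear only with a numeric `n_comp`. The sketch below reproduces that enumeration with `itertools.product`. The filtering rule is inferred from the log rather than taken from `evaluation_utils.py`, and `iter_valid_configs` is a hypothetical name.

```python
from itertools import product

dim_reduction_type_list = [None, "pca", "pacmap"]
n_components_list = [None, 10, 50]
clustering_type_list = ["kmeans", "gmm", "hdbscan", "spectral"]
min_samples_list = [5, 10]
min_cluster_size_list = [10, 30, 60]


def iter_valid_configs():
    """Yield (dim_red, n_comp, clustering) triples in the order seen in the log."""
    for dim_red, n_comp, clustering in product(
        dim_reduction_type_list, n_components_list, clustering_type_list
    ):
        # Assumed rule: n_comp is set if and only if a reduction is used.
        if (dim_red is None) != (n_comp is None):
            continue
        yield dim_red, n_comp, clustering


configs = list(iter_valid_configs())
# Inner grid that the log shows for every HDBSCAN run.
hdbscan_grid = list(product(min_samples_list, min_cluster_size_list))

print(len(configs), "of", 3 * 3 * 4, "combinations are evaluated")
print(len(hdbscan_grid), "HDBSCAN parameter pairs per run")
```

Each HDBSCAN run therefore fits six models, which is consistent with those steps taking the longest in the log.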


# Deep clustering

To train a DeepClustering (DEC) model, use the same interface with ```kind="deep clustering"```.

In [53]:
model = TextClustering(
    n_clusters=NUM_CLUSTERS,
    inp_dim=INP_DIM,
    train_dataset=base_embeds,
    data_frame=base_data,
    feat_dim=10,
    kind="deep clustering",
    random_state=42
)

model.fit(base_embeds)
_, clusters = model.transform_and_cluster(base_embeds)

metrics = model.evaluate(use_true_clusters=True)

print(f"Dim red time: {model.times['dim_red']}")
print(f"Clust time: {model.times['clust']}")

Phase 1: train embeddings
Phase 2: train clusters

silhouette_score: 0.04659999907016754
adjusted_rand_score: 0.3495
adjusted_mutual_info_score: 0.6227
average_topic_coherence: 0.7557
Dim red time: 76.73716926574707
Clust time: 44.125908613204956
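For orientation, the cluster-training phase of the original DEC formulation alternates between two quantities: soft assignments of embedded points to centroids and a sharpened target distribution that the assignments are pulled towards. The sketch below shows both in pure Python. It assumes the standard DEC formulas with a Student's t kernel; the implementation behind `kind="deep clustering"` may differ, and the function names are illustrative.

```python
def soft_assignments(points, centroids, alpha=1.0):
    """Student's t similarity of each point to each centroid, row-normalized."""
    q = []
    for z in points:
        row = []
        for mu in centroids:
            sq_dist = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q


def target_distribution(q):
    """Square each entry, divide by the soft cluster size, renormalize rows."""
    cluster_sizes = [sum(col) for col in zip(*q)]
    p = []
    for row in q:
        weights = [v * v / f for v, f in zip(row, cluster_sizes)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


# Toy 2-D embeddings: two points near the first centroid, one near the second.
points = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1]]
centroids = [[0.0, 0.0], [3.0, 3.0]]
q = soft_assignments(points, centroids)
p = target_distribution(q)
```

In DEC the training loss is the KL divergence between `p` and `q`, so confident assignments are reinforced while the encoder and centroids are updated.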


# Contrastive Hierarchical Clustering

The CoHiClust model can be trained with the same TextClustering wrapper. This model requires an OmegaConf configuration file; configurations are available for the Banking and Demo datasets, or you can create your own.

In [57]:
dataset_name = "banking" + str(NUM_CLUSTERS)
cohiclust_cfg = OmegaConf.load(f'cfg/{dataset_name}.yaml')
cohiclust_cfg

{'model': {'name': 'custom', 'inp_dim': 1024, 'out_dim': 128, 'drop_prob': 0.25, 'linear_dims_list': [2048, 2048, 1024]}, 'dataset': {'dataset_name': 'banking10', 'number_classes': 5, 'n_neighb': 5}, 'tree': {'tree_level': 4}, 'simclr': {'temperature': 0.5, 'k': 10, 'feature_dim_projection_head': 128}, 'training': {'epochs': 50, 'batch_size': 64, 'pretraining_epochs': 20, 'start_pruning_epochs': 30, 'leaves_to_prune': 6}}
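One way to start a configuration for a custom dataset is to copy the structure printed above and override a few fields. The sketch below does this with a plain dictionary. Which fields are dataset-specific is an assumption (here `inp_dim`, `dataset_name` and `number_classes`), `make_custom_cfg` is a hypothetical helper, and the example values `"my_dataset"`, 768 and 8 are placeholders.

```python
import copy

# Values copied from the banking10 configuration printed above.
BANKING10_CFG = {
    "model": {
        "name": "custom",
        "inp_dim": 1024,
        "out_dim": 128,
        "drop_prob": 0.25,
        "linear_dims_list": [2048, 2048, 1024],
    },
    "dataset": {"dataset_name": "banking10", "number_classes": 5, "n_neighb": 5},
    "tree": {"tree_level": 4},
    "simclr": {"temperature": 0.5, "k": 10, "feature_dim_projection_head": 128},
    "training": {
        "epochs": 50,
        "batch_size": 64,
        "pretraining_epochs": 20,
        "start_pruning_epochs": 30,
        "leaves_to_prune": 6,
    },
}


def make_custom_cfg(dataset_name, inp_dim, number_classes):
    """Copy the template and override the fields assumed to be dataset-specific."""
    cfg = copy.deepcopy(BANKING10_CFG)  # leave the template untouched
    cfg["model"]["inp_dim"] = inp_dim
    cfg["dataset"]["dataset_name"] = dataset_name
    cfg["dataset"]["number_classes"] = number_classes
    return cfg


custom_cfg = make_custom_cfg("my_dataset", inp_dim=768, number_classes=8)
```

`OmegaConf.create(custom_cfg)` turns the dictionary into a config object, and `OmegaConf.save` can write it to a .yaml file next to the existing ones in `cfg/`. Other fields, such as `tree_level` or `leaves_to_prune`, may also need adjusting for a new dataset; that is not covered here.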

In [77]:
model = TextClustering(
    n_clusters=NUM_CLUSTERS,
    inp_dim=INP_DIM,
    train_dataset=base_embeds,
    data_frame=base_data,
    cohiclust_cfg=cohiclust_cfg,
    feat_dim=10,
    kind="cohiclust",
    random_state=42
)

model.fit(base_embeds)
_, clusters = model.transform_and_cluster(base_embeds)

metrics = model.evaluate(use_true_clusters=True)

print(f"Dim red time: {model.times['dim_red']}")
print(f"Clust time: {model.times['clust']}")

Computing approximate KNN matrix
Computing approximate KNN matrix
Computing approximate KNN matrix



silhouette_score: 0.055399999022483826
adjusted_rand_score: 0.7863
adjusted_mutual_info_score: 0.8578
average_topic_coherence: 0.7775
Dim red time: 0
Clust time: 70.25820064544678
