This repo is forked from TopMost!

Preparing libraries

1. Python 3.10

2. Install the following libraries:

    numpy 1.26.4
    torch_kmeans 0.2.0
    pytorch 2.2.0
    sentence_transformers 2.2.2
    scipy 1.10
    bertopic 0.16.0
    gensim 4.2.0

3. Install Java.
4. Download this Java jar to ./evaluations/pametto.jar
5. Download and extract this processed Wikipedia corpus to ./data/wikipedia/ as an external reference corpus:
```
    |- wikipedia
        |- wikipedia_bd
            |- ...
            |- wikipedia_bd.histogram
```
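
Before running an evaluation, it can help to confirm that the downloaded files are where the instructions place them. The following is a minimal sketch, not part of the repository: the `missing_files` helper is hypothetical, and the histogram path assumes the layout shown in the tree above, with `wikipedia_bd.histogram` inside `data/wikipedia/wikipedia_bd/`.

```python
from pathlib import Path
import shutil

# Paths taken from the setup instructions; the exact location of the
# histogram file is an assumption read off the directory tree.
REQUIRED = [
    Path("evaluations/pametto.jar"),
    Path("data/wikipedia/wikipedia_bd/wikipedia_bd.histogram"),
]


def missing_files(root="."):
    """Return the required files that do not exist under root."""
    root = Path(root)
    return [p.as_posix() for p in REQUIRED if not (root / p).is_file()]


if __name__ == "__main__":
    # Palmetto is a Java jar, so a java executable must be on PATH.
    if shutil.which("java") is None:
        print("java not found on PATH")
    for path in missing_files():
        print(f"missing: {path}")
```

Run it from the repository root; it prints nothing when Java and both files are in place.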

Note: steps 1, 2, and 3 can be done by running bash setupenv.sh, provided conda is installed.
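
Whether the environment was set up by hand or with the script, the pinned library versions can be checked from Python. This is a rough sketch under stated assumptions: the distribution names are guesses at the pip names (for example `torch` for the entry listed as pytorch), the helper names are hypothetical, and a pin such as `1.10` is treated as matching any `1.10.x` release.

```python
from importlib import metadata

# Pinned versions from the library list. The keys are assumed pip
# distribution names, not names confirmed by the repository.
PINS = {
    "numpy": "1.26.4",
    "torch-kmeans": "0.2.0",
    "torch": "2.2.0",
    "sentence-transformers": "2.2.2",
    "scipy": "1.10",
    "bertopic": "0.16.0",
    "gensim": "4.2.0",
}


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed, pinned):
    """Compare only as many dotted components as the pin specifies."""
    base = installed.split("+")[0]  # drop local build tags such as "+cu121"
    want = pinned.split(".")
    return base.split(".")[: len(want)] == want


def check_pins(pins, lookup=installed_version):
    """Return one message per package that is missing or off-pin."""
    problems = []
    for name, pinned in pins.items():
        installed = lookup(name)
        if installed is None:
            problems.append(f"{name}: not installed (want {pinned})")
        elif not matches_pin(installed, pinned):
            problems.append(f"{name}: found {installed}, want {pinned}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINS) or ["all pinned versions match"]:
        print(line)
```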

Usage

To run and evaluate our model, run the following command:

python main.py    --dataset [20NG|YahooAnswers|IMDB|AGNews] \
                    --model OTClusterTM \
                    --num_topics 50 \
                    --num_groups [20|10|2,3,4|2,3] \
                    --dropout 0 \
                    --seed 0 \
                    --beta_temp 0.2 \
                    --epochs 500 --device cuda --lr 0.002 --lr_scheduler StepLR \
                    --batch_size 200 --lr_step_size 125 --use_pretrainWE  \
                    --weight_ECR 250 --alpha_ECR 20 \
                    --weight_DCR 40 --alpha_DCR 20 \
                    --weight_TCR 200 --alpha_TCR 20 \
                    --wandb_prj [Name of project to save on wandb]
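
For repeated runs, the same command can be assembled in Python instead of being retyped. The sketch below only builds the argument list from the flags shown above; `build_command` is a hypothetical helper, the pairing of `20NG` with `--num_groups 20` assumes the bracketed options line up by position, and the extra seeds and the `my-project` name are placeholders.

```python
import shlex


def build_command(dataset, num_groups, wandb_prj, weight_ecr=250, seed=0):
    """Return the argument list for one main.py run with the flags listed above."""
    return [
        "python", "main.py",
        "--dataset", dataset,
        "--model", "OTClusterTM",
        "--num_topics", "50",
        "--num_groups", str(num_groups),
        "--dropout", "0",
        "--seed", str(seed),
        "--beta_temp", "0.2",
        "--epochs", "500", "--device", "cuda", "--lr", "0.002",
        "--lr_scheduler", "StepLR",
        "--batch_size", "200", "--lr_step_size", "125", "--use_pretrainWE",
        "--weight_ECR", str(weight_ecr), "--alpha_ECR", "20",
        "--weight_DCR", "40", "--alpha_DCR", "20",
        "--weight_TCR", "200", "--alpha_TCR", "20",
        "--wandb_prj", wandb_prj,
    ]


if __name__ == "__main__":
    # Seeds other than 0 are illustrative, not taken from the README.
    for seed in (0, 1, 2):
        cmd = build_command("20NG", 20, "my-project", seed=seed)
        print(shlex.join(cmd))
        # To launch training, pass cmd to subprocess.run(cmd, check=True).
```

Building a list rather than a single string avoids shell quoting problems with values such as `2,3,4`.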

Parameters

20NG: ~ 250
YahooAnswers: ~ 40|60
IMDB: ~100-150
AGNews: ~100-150
The alpha_*CR values should all be kept at 20.

Acknowledgement

Parts of this implementation are based on TopMost. We also use Palmetto to evaluate topic coherence.
