HoangTran223/TopiCOT_Mine_TRAM

This repo is forked from TopMost!

Preparing libraries

  1. Python 3.10
  2. Install the following libraries:
       numpy 1.26.4
       torch_kmeans 0.2.0
       pytorch 2.2.0
       sentence_transformers 2.2.2
       scipy 1.10
       bertopic 0.16.0
       gensim 4.2.0

  3. Install Java.
  4. Download this Java jar to ./evaluations/pametto.jar.
  5. Download and extract this processed Wikipedia corpus to ./data/wikipedia/ as an external reference corpus:
        |- wikipedia
            |- wikipedia_bd
                |- ...
                |- wikipedia_bd.histogram

Note: if conda is installed, steps 1, 2, and 3 can be done by running bash setupenv.sh.
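
A rough manual equivalent of that setup is sketched below. The environment name and the exact PyPI package names (e.g. torch for pytorch, sentence-transformers for sentence_transformers, torch-kmeans for torch_kmeans) are assumptions; setupenv.sh remains the authoritative setup.

# Sketch of a manual setup; env name and PyPI package names are assumed.
conda create -n topicot python=3.10 -y
conda activate topicot
pip install numpy==1.26.4 torch==2.2.0 torch-kmeans==0.2.0 \
    sentence-transformers==2.2.2 scipy==1.10 bertopic==0.16.0 gensim==4.2.0
# Java is needed for the Palmetto jar (step 3); one option is conda-forge's openjdk.
conda install -c conda-forge openjdk -y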

Usage

To run and evaluate our model, run the following command:

python main.py --dataset [20NG|YahooAnswers|IMDB|AGNews] \
    --model OTClusterTM \
    --num_topics 50 \
    --num_groups [20|10|2,3,4|2,3] \
    --dropout 0 \
    --seed 0 \
    --beta_temp 0.2 \
    --epochs 500 --device cuda --lr 0.002 --lr_scheduler StepLR \
    --batch_size 200 --lr_step_size 125 --use_pretrainWE \
    --weight_ECR 250 --alpha_ECR 20 \
    --weight_DCR 40 --alpha_DCR 20 \
    --weight_TCR 200 --alpha_TCR 20 \
    --wandb_prj [Name of project to save on wandb]
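
For example, a concrete run on 20NG might look like the following. This is a sketch that assumes the bracketed choices above pair up positionally (i.e. 20 groups for the 20NG dataset) and uses a placeholder wandb project name (my_topicot_project).

python main.py --dataset 20NG --model OTClusterTM --num_topics 50 --num_groups 20 \
    --dropout 0 --seed 0 --beta_temp 0.2 \
    --epochs 500 --device cuda --lr 0.002 --lr_scheduler StepLR \
    --batch_size 200 --lr_step_size 125 --use_pretrainWE \
    --weight_ECR 250 --alpha_ECR 20 --weight_DCR 40 --alpha_DCR 20 \
    --weight_TCR 200 --alpha_TCR 20 \
    --wandb_prj my_topicot_project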

Parameters

  • 20NG: ~250
  • YahooAnswers: ~40 or ~60
  • IMDB: ~100-150
  • AGNews: ~100-150
  • The alpha_*CR values should be kept equal to 20.

Acknowledgement

Part of this implementation is based on TopMost. We also use Palmetto to evaluate topic coherence.
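
For reference, Palmetto's standard command-line interface scores a topics file (one topic's top words per line) against the Wikipedia index. The invocation below is a sketch that assumes the bundled jar exposes that interface and uses a hypothetical topics.txt file.

java -jar ./evaluations/pametto.jar ./data/wikipedia/wikipedia_bd C_V topics.txt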
