Scale-Invariant Infinite Hierarchical Topic Model (ihLDA)

Paper

Shusei Eshima and Daichi Mochihashi. 2023. Scale-Invariant Infinite Hierarchical Topic Model. In Findings of the Association for Computational Linguistics: ACL 2023. Link.

Requirements

Python 3.9.6

Cython==0.29.28
gensim==4.2.0
matplotlib==3.5.1
nltk==3.7
numpy==1.23.4
pandas==1.4.2
scikit-learn==1.0.2
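
Version mismatches are a common source of build and runtime errors, so it can help to compare the installed packages against the pins above before building. The following is a minimal sketch, not part of the repository: the helper name `check_pins` is hypothetical, and it assumes the distribution names match the names in the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the Requirements list above.
PINS = {
    "Cython": "0.29.28",
    "gensim": "4.2.0",
    "matplotlib": "3.5.1",
    "nltk": "3.7",
    "numpy": "1.23.4",
    "pandas": "1.4.2",
    "scikit-learn": "1.0.2",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every pinned package is installed at the listed version. Other versions may also work; the list above only records the versions named in this README.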

Usage

Data Preparation

$ python preprocessing.py

The input/sample_raw folder contains ten sample documents for testing. This is far too little data to produce meaningful results.

Fitting the Model

$ python setup.py build_ext --inplace
$ python main.py --output_path ./output/

# or if the default settings are fine, just run
$ python run.py
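
The preprocessing, build, and fitting commands can also be chained from a single Python script so that a failure in one step stops the rest. This is a rough sketch under stated assumptions: the function names `build_commands` and `run_pipeline` are hypothetical and not part of the repository, the script is assumed to run from the repository root, and only the commands and the `--output_path` flag shown in this README are used.

```python
import subprocess
import sys


def build_commands(output_path="./output/"):
    """Return the README commands as argument lists, in execution order."""
    py = sys.executable  # reuse the interpreter that runs this script
    return [
        [py, "preprocessing.py"],
        [py, "setup.py", "build_ext", "--inplace"],
        [py, "main.py", "--output_path", output_path],
    ]


def run_pipeline(output_path="./output/"):
    """Run each step in order and stop at the first one that fails."""
    for cmd in build_commands(output_path):
        print("running:", " ".join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` from the repository root runs the three steps with the default output folder. Using `sys.executable` keeps all steps on the same interpreter, which matters because the Cython extensions are built for one specific Python installation.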

The output folder contains the following files:

fig_tssb/: the structure of the root tree.
model/: the output is saved every 1000 iterations.
filenames.csv: the list of file names and doc_ids.
info.csv: the number of topics.
parameters.csv: the hyperparameters for each iteration.
perplexity.csv: the perplexity.
TopWords_prob.csv: the topic-word distribution of top words.
model_temp.pkl: the temporary model object. It allows an interrupted run to be resumed, but the random seed is reset on resume.
txtdata.pkl: the data object.
settings.txt: the settings of the model.

Evaluation

evaluate.ipynb calls the evaluation function in evaluate_helper.py.
