This repository is the implementation of the paper "An Information-Theoretic Approach to Data Selection in Generative Topic Modeling".
Install Python 3.11.9 on your local machine, then install the requirements.
The following directories should be created for our experiments:
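To confirm that the interpreter in use matches the version named above, a quick check along these lines may help. This is a minimal sketch: the helper name `is_expected_python` is hypothetical and not part of this repository.

```python
import sys

# Version stated in this README.
EXPECTED_VERSION = (3, 11, 9)


def is_expected_python(version_info=sys.version_info, expected=EXPECTED_VERSION):
    """Return True if the (major, minor, micro) version equals `expected`."""
    return tuple(version_info[:3]) == tuple(expected)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    wanted = ".".join(str(part) for part in EXPECTED_VERSION)
    if is_expected_python():
        print(f"Python {current} matches the expected version.")
    else:
        print(f"Python {current} found, but {wanted} is expected.")
```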
./model_results/
-> stores the output of the experiment
./dataset/
-> stores the data used for the experiment
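The two directories can also be created with a short script instead of by hand. This is a minimal sketch, assuming it is run from the repository root; the function name `create_experiment_dirs` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Directory names come from this README; the base path "." is an assumption
# (the script is expected to run from the repository root).
EXPERIMENT_DIRS = ("model_results", "dataset")


def create_experiment_dirs(base="."):
    """Create the experiment directories under `base` and return their paths."""
    created = []
    for name in EXPERIMENT_DIRS:
        path = Path(base) / name
        # exist_ok=True makes the call safe to repeat on an existing directory.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for path in create_experiment_dirs():
        print(f"ready: {path}")
```

Running the script a second time leaves existing directories and their contents untouched.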
Open src/document_selection_algorithm.ipynb.
Open src/visualization.ipynb.
Michael Evan Santoso - is0534is@ed.ritsumei.ac.jp