This repository contains the source code used for the paper:
Areej Alokaili, Nikolaos Aletras and Mark Stevenson (2019). Re-Ranking Words to Improve Interpretability of Automatically Generated Topics. In Proceedings of the 13th International Conference on Computational Semantics - Long Papers, pp. 43-54, Gothenburg, Sweden.
The code executes the following steps:
- Load and preprocess data.
- Train an LDA model on the processed data.
- Display the learned topics with their words in the default order.
- Re-rank the topic words using one of the ranking methods.
- Display the re-ranked topics.
The repository contains the following files:
- data.txt: contains sample input data extracted from 20 Newsgroups. Note: this is not one of the datasets used in the paper, as those datasets are not available for public distribution.
- reranking_main.py: loads and preprocesses the data, learns topics using Gensim, and finally re-ranks the topic words to improve their interpretability.
- methods.py: contains all methods used by reranking_main.py.
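To illustrate what re-ranking topic words means, here is a minimal sketch that does not use the repository's code. It assumes each topic is a dictionary mapping words to probabilities. The toy values, the function names `rank_by_probability` and `rerank_normalized`, and the normalisation scheme itself are assumptions made for illustration. The scheme is not necessarily one of the methods implemented in methods.py.

```python
# Toy topic-word probabilities (hypothetical values, not produced by this repository).
topics = {
    0: {"space": 0.30, "nasa": 0.25, "the": 0.35, "orbit": 0.10},
    1: {"game": 0.38, "team": 0.20, "the": 0.35, "orbit": 0.05, "space": 0.02},
}


def rank_by_probability(word_probs, top_n=3):
    """Default order: words sorted by their probability within the topic."""
    return sorted(word_probs, key=word_probs.get, reverse=True)[:top_n]


def rerank_normalized(topics, topic_id, top_n=3):
    """Illustrative re-ranking (an assumption, not the repository's API):
    divide each word's probability by its summed probability over all topics,
    so that words common to many topics are pushed down the list."""
    totals = {}
    for word_probs in topics.values():
        for word, p in word_probs.items():
            totals[word] = totals.get(word, 0.0) + p
    scores = {w: p / totals[w] for w, p in topics[topic_id].items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(rank_by_probability(topics[0]))  # ['the', 'space', 'nasa']
print(rerank_normalized(topics, 0))    # ['nasa', 'space', 'orbit']
```

In this toy example the generic word "the" tops the default ranking because it is probable in both topics. After normalisation it drops out of the top three in favour of words specific to the topic.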
Running the code
- Clone or download this project.
- Set up the parameters in reranking_main.py.
- Execute reranking_main.py.
- The code will display:
  - The learned latent topics with their words ranked using the original method "Rank_orig".
  - The topics with their words re-ranked using one of the methods described in the paper.
  - The average coherence of the topics.
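The README does not state which coherence measure is reported. As a rough sketch, the example below assumes normalised pointwise mutual information (NPMI) computed from document-level co-occurrence, averaged over word pairs and then over topics. The function names `npmi_coherence` and `average_coherence` and the toy documents are hypothetical and are not taken from methods.py.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Mean NPMI over all word pairs of one topic (assumed measure)."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # the pair never co-occurs: minimum NPMI
        elif p12 == 1:
            scores.append(0.0)   # NPMI is undefined here; treated as 0 in this sketch
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


def average_coherence(topics, documents):
    """Average the per-topic coherence over all topics."""
    return sum(npmi_coherence(t, documents) for t in topics) / len(topics)


# Toy tokenised documents (hypothetical).
documents = [
    ["space", "nasa", "orbit"],
    ["space", "nasa"],
    ["game", "team"],
    ["game", "team", "score"],
]
print(npmi_coherence(["space", "nasa"], documents))
print(average_coherence([["space", "nasa"], ["space", "game"]], documents))
```

Under this measure, a topic whose top words tend to appear in the same documents scores close to 1, and a topic whose words never co-occur scores -1.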