Common-sense Topic Modeling

This repository contains the code for reproducing the results reported in the paper "Discovering Interpretable Topics by Leveraging Common Sense Knowledge" (under review).

Dependencies

Before running any code in this repo, please install the following dependencies (an example install command follows the list):

  • numpy
  • pandas
  • gensim
  • faiss
  • nltk
  • sklearn
  • tqdm
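
A minimal install sketch, assuming pip; note that on PyPI, faiss is published as faiss-cpu (or faiss-gpu) and sklearn as scikit-learn:

  pip install numpy pandas gensim faiss-cpu nltk scikit-learn tqdm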

train.py

This is the master script for training all models.

usage: train.py [-h] [-m {lda,kmeans,gmm,nmf}] [-n NUMBER_OF_TOPICS] [-t THRESH] [-c CLASSES] -d DATASET_PATH [-i ITERATIONS] [-p PREPROCESSING] [-r REDO] [-s SAVE_PATH] [-e CACHE_PATH]
                [-x] [-g GLOVE_PATH]

Topic Model Training

optional arguments:
  -h, --help            show this help message and exit
  -m {lda,kmeans,gmm,nmf}, --model {lda,kmeans,gmm,nmf}
                        Which model to use
  -n NUMBER_OF_TOPICS, --number_of_topics NUMBER_OF_TOPICS
                        Number of topics ('auto' = number of labels)
  -t THRESH, --thresh THRESH
                        Cut-off threshold for numberbatch similarity
  -c CLASSES, --classes CLASSES
                        Specify which labels to keep (comma-separated, or 'all')
  -d DATASET_PATH, --dataset_path DATASET_PATH
                        Path to the dataset CSV
  -i ITERATIONS, --iterations ITERATIONS
                        Maximum number of iterations
  -p PREPROCESSING, --preprocessing PREPROCESSING
                        Specify which preprocessing pipeline to use
  -r REDO, --redo REDO  Number of runs (with different seeds) to do
  -s SAVE_PATH, --save_path SAVE_PATH
                        Path to save the results
  -e CACHE_PATH, --cache_path CACHE_PATH
                        Where to cache preprocessed data
  -x, --external_vocab  Whether or not to add external vocabulary
  -g GLOVE_PATH, --glove_path GLOVE_PATH
                        Path to pickled GloVe embeddings
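
An illustrative invocation (the dataset path, save path, and parameter values below are hypothetical, chosen only to show the flag syntax):

  python train.py -m lda -n auto -c all -d data/dataset.csv -r 5 -s results/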

Human Evaluation

Under human_eval, you can find the results of the end-user surveys conducted to evaluate the performance of different topic models on topic-interpretability tasks. generate_test.ipynb generates the survey questions randomly from the trained topics.
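
For reference, here is a minimal sketch of how such questions could be sampled, assuming a word-intrusion format; the function name, example topic lists, and question format are illustrative assumptions, not necessarily what generate_test.ipynb implements:

  import random

  # Illustrative only: 'topics' stands in for the top-word lists of trained topics.
  topics = [
      ["music", "guitar", "band", "album", "song"],
      ["game", "team", "player", "season", "score"],
  ]

  def make_intrusion_question(topic_id, topics, n_words=4, seed=None):
      """Sample n_words from one topic plus one 'intruder' word from another."""
      rng = random.Random(seed)
      own = rng.sample(topics[topic_id], n_words)
      other = rng.choice([t for i, t in enumerate(topics) if i != topic_id])
      intruder = rng.choice(other)
      options = own + [intruder]
      rng.shuffle(options)
      return options, intruder

  options, intruder = make_intrusion_question(0, topics, seed=42)
  print(options, "-> intruder:", intruder)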
