SVI_GLDA

Gaussian LDA with Stochastic Variational Inference

This Python code implements the Gaussian LDA with Stochastic Variational Inference.

Probabilistic topic models based on latent Dirichlet Allocation is widely used to extract latent topics from document collections. In recent years, a number of extended topic models have been proposed, especially Gaussian LDA(G-LDA) has attracted a lot of attention．G-LDA integrates topic modeling with word embeddings by replacing discrete topic distribution over word types with multivariate Gaussian distribution on the word embedding space. This can reflect semantic information into topics. In this paper, we use a G-LDA for our base topic model and apply Stochastic Variational Inference (SVI), an efficient inference algorithm, to estimate topics. This encourages the model to analyze massive document collections, including those arriving in a stream.

Installation

Requirements

python 3.6.6
scipy
numpy

Files

main.py : A main script of online VI package.
Estimate.py : A package of functions for fitting G-LDA using stochastic optimization.
data.py : A package of functions for data preparation.
printtopics.py : A Python script that displays the topics fit using the functions in Estimate.py.
calc_pmi.py : A Python script that calculates the PMI (Pointwise Mutual Information) for each topic.

Quick start

Online G-LDA model

Example : python main.py -d 20ng -f ../data/corpus_20ng.txt -v ../data/vocab_20ng.txt --vec ../data/20ng_vectors_50d.txt -K 50 -t 1024 -k 1.0 -b 16

Print topics

Example : python printtopics.py -d 20ng -v ../data/vocab_20ng.txt --vec ../word_vecs/word_vecs_20ng.json -K 40 -t 1024 -k 1.0 -b 16

Calculate PMI

Example : python calc_pmi.py ../topic/topic_20ng_K40.out

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
data		data
src		src
topic		topic
word_vecs		word_vecs
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

data

data

src

src

topic

topic

word_vecs

word_vecs

README.md

README.md

Repository files navigation

SVI_GLDA

Installation

Requirements

Files

Quick start

Online G-LDA model

Print topics

Calculate PMI

About

Releases

Packages

Languages

KanaOzaki/SVI_GLDA

Folders and files

Latest commit

History

Repository files navigation

SVI_GLDA

Installation

Requirements

Files

Quick start

Online G-LDA model

Print topics

Calculate PMI

About

Resources

Stars

Watchers

Forks

Languages