(Old! Deprecated!) topic modeling in Python.
Python
texts
.gitignore
README.md
document-topic.txt
lda.py
main.py
stopwords.txt
stopwords_shortlist.txt
topic-word.txt
utils.py

README.md

This project implements Gibbs sampling inference for LDA (Latent Dirichlet Allocation).
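As a rough illustration of what collapsed Gibbs sampling for LDA involves, here is a minimal sketch. It is not the code in lda.py: the function name `gibbs_lda`, its parameters, and the symmetric priors `alpha` and `beta` are assumptions made for this example. Documents are assumed to be already converted to lists of integer word ids.

```python
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=50, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, symmetric priors)."""
    rng = random.Random(seed)
    n_dk = [[0] * n_topics for _ in docs]               # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    n_k = [0] * n_topics                                # tokens assigned to each topic
    z = []                                              # topic assignment of each token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1

                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]

                # Record the newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1

    return z, n_dk, n_kw


# Toy corpus: three documents over a six-word vocabulary.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 1, 3, 4]]
z, n_dk, n_kw = gibbs_lda(docs, n_topics=2, vocab_size=6)
```

The returned count tables can be normalized (after adding the priors) to obtain document-topic and topic-word distributions. Passing a fixed `seed` makes a run repeatable.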

To-do:

  • Check convergence
  • Speed up the Gibbs sampling process
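One common way to approach the convergence check is to track the log joint probability of words and topic assignments after each sweep and stop once it levels off. The sketch below is an assumption about how that could be done, not something this project implements. The function names are hypothetical, and the inputs are assumed to be document-topic and topic-word count tables with symmetric priors `alpha` and `beta`.

```python
from math import lgamma


def log_multi_beta(counts, prior):
    """Log of the multivariate Beta function B(counts + prior), symmetric prior."""
    total = sum(counts) + len(counts) * prior
    return sum(lgamma(c + prior) for c in counts) - lgamma(total)


def log_joint(n_dk, n_kw, alpha, beta):
    """log p(w, z | alpha, beta) computed from the two count tables."""
    n_topics = len(n_kw)
    vocab_size = len(n_kw[0])
    log_b_beta = log_multi_beta([0] * vocab_size, beta)   # log B(beta)
    log_b_alpha = log_multi_beta([0] * n_topics, alpha)   # log B(alpha)
    # Word term: one factor B(n_k + beta) / B(beta) per topic.
    ll = sum(log_multi_beta(row, beta) - log_b_beta for row in n_kw)
    # Topic term: one factor B(n_d + alpha) / B(alpha) per document.
    ll += sum(log_multi_beta(row, alpha) - log_b_alpha for row in n_dk)
    return ll
```

Calling `log_joint` once per sweep and plotting the values gives a rough picture of whether the chain has settled. A flat curve suggests, but does not prove, convergence.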

Reference:

@article{heinrich2005parameter,
  title={Parameter estimation for text analysis},
  author={Heinrich, G.},
  journal={Web: http://www.arbylon.net/publications/text-est.pdf},
  year={2005}
}

Note:

  • Gibbs sampling is very slow, and convergence is hard to check.
  • The results are not very good, possibly because the corpus is small.
  • Results can differ substantially from one run to the next.

Topic modeling tools: