This repository has been archived by the owner on Jan 26, 2021. It is now read-only.

Support asymmetric Dirichlet prior optimization #5

Open
feiga opened this issue Nov 13, 2015 · 4 comments

Comments

@feiga
Contributor

feiga commented Nov 13, 2015

The currently released lightlda doesn't support asymmetric Dirichlet prior optimization. However, our internal experience shows that this feature helps produce a better model (also see this).

If anyone is interested in contributing this feature, please reply or contact us through email. We can collaborate on this.

@hiyijian
Contributor

hiyijian commented Jan 4, 2016

Hi, guys. Thank you for your amazing work on large-scale LDA.
I think model quality is as important as scalability, so I am very interested in improving it. It is exciting to know that an asymmetric Dirichlet prior could help. Would you please share some of your experience with this? I will try my best to contribute.

@hiyijian
Contributor

Hi, guys,
I have made an attempt at adding this new feature in PR#22.
The PR supports asymmetric alpha through the following steps:

  1. Add two extra tables to Multiverso. One is a topic frequency table, a matrix that counts each topic's frequency. The other is a doc length table, a row that counts how many documents have length k.
  2. Initialize the two extra tables from the randomly initialized documents
  3. Learn the alpha distribution from the two extra tables every 5 iterations
  4. Build an alias table for the learned alpha distribution
  5. Sample topics with the learned alpha distribution and the alias table. Meanwhile, update the counts in the topic frequency table if necessary
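
For readers following the thread, here is a rough sketch of how steps 1 to 4 could fit together. It is not the code in PR#22. It assumes that "learning alpha" means the histogram form of Minka's fixed-point update for a Dirichlet-multinomial, and that the alias table is built with Vose's alias method. All function names (`build_histograms`, `learn_asymmetric_alpha`, `build_alias_table`, `sample_alias`) are hypothetical, and plain Python lists stand in for the Multiverso tables.

```python
import random


def build_histograms(doc_topic_counts):
    """Steps 1-2 (assumed layout): doc_topic_counts[d][k] is the number of
    tokens in document d assigned to topic k.
    Returns topic_hist[k][n] = number of documents with n tokens of topic k,
    and length_hist[l] = number of documents with l tokens."""
    num_topics = len(doc_topic_counts[0])
    max_len = max(sum(doc) for doc in doc_topic_counts)
    topic_hist = [[0] * (max_len + 1) for _ in range(num_topics)]
    length_hist = [0] * (max_len + 1)
    for doc in doc_topic_counts:
        length_hist[sum(doc)] += 1
        for k, n in enumerate(doc):
            topic_hist[k][n] += 1
    return topic_hist, length_hist


def learn_asymmetric_alpha(topic_hist, length_hist, alpha, num_iterations=5):
    """Step 3 (assumed method): Minka-style fixed-point update on histograms.
    Assumes a non-empty corpus so the denominator is positive."""
    alpha = list(alpha)
    for _ in range(num_iterations):
        alpha_sum = sum(alpha)
        # Denominator: sum over lengths l of C(l) * sum_{j<l} 1 / (j + alpha_sum)
        denom, running = 0.0, 0.0
        for length in range(1, len(length_hist)):
            running += 1.0 / (length - 1 + alpha_sum)
            denom += length_hist[length] * running
        for k in range(len(alpha)):
            # Numerator: sum over counts n of C_k(n) * sum_{j<n} 1 / (j + alpha_k)
            numer, running = 0.0, 0.0
            for n in range(1, len(topic_hist[k])):
                running += 1.0 / (n - 1 + alpha[k])
                numer += topic_hist[k][n] * running
            # Small floor (an added guard) so an unused topic never reaches zero.
            alpha[k] = max(alpha[k] * numer / denom, 1e-12)
    return alpha


def build_alias_table(weights):
    """Step 4 (assumed method): Vose's alias method over unnormalized weights."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:  # leftovers keep all of their own mass
        prob[i], alias[i] = 1.0, i
    return prob, alias


def sample_alias(prob, alias, rng):
    """Draw one topic index in O(1) from the alias table."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

A caller would rebuild the alias table after each alpha update, for example `prob, alias = build_alias_table(alpha)`, and then draw from the alpha part of the proposal with `sample_alias(prob, alias, random.Random())`.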

To use this new feature, please just run with an extra option "-num_alpha_iterations".

Please note that there are two TODOs. One is evaluation in asymmetric prior mode; the other is inference with an asymmetric prior.

@feiga
Contributor Author

feiga commented Jan 19, 2016

Thanks, Jianyi! I will review the code.

@hiyijian
Contributor

@feiga, I am sorry, I made a mistake when updating the topic frequency table. I have fixed it and committed the fix to PR#22.


2 participants