Read this #4

Closed · 7 tasks done
amritbhanu opened this issue Apr 5, 2016 · 3 comments
amritbhanu added the Work label on Apr 5, 2016.

amritbhanu commented Apr 11, 2016

Do people tune LDA parameters, or even check stability?


timm commented Apr 12, 2016

> But it is at a very early stage; not much work has been seen.

ok. now you've read the papers...

  • how do we measure "coherence"?
  • list 10 ways to improve it, sorted by ease of implementation: who has tried what before, and what might work best?

amritbhanu commented

We will have to code LDA from scratch so that we have flexibility; this will make any implementation easier.
Possible approaches, sorted by ease of implementation:

  • Cohesion (intra-topic) and separation (inter-topic).
  • Direct approach: asking people about the topics using Amazon Mechanical Turk (Machine Reading Tea Leaves #10).
  • Perplexity: how well the topic-word and word-document distributions predict new, held-out test samples.
  • Computing the harmonic mean of the posterior distribution (an estimate of the marginal likelihood).
  • Pointwise mutual information (PMI) between the topic words (see the first sketch after this list).
  • Representing each term as a vector in a semantic space (e.g. word2vec), with topic coherence calculated as mean pairwise vector similarity (second sketch below).
  • Symmetric Kullback–Leibler divergence between topic distributions (third sketch below).
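A minimal sketch of the PMI-based measure, assuming word probabilities are estimated from document-level co-occurrence counts; the function name `pmi_coherence`, the smoothing constant `eps`, and the toy corpus are illustrative, not from any existing codebase:

```python
import math
from itertools import combinations

def pmi_coherence(topic_words, documents, eps=1e-12):
    """Mean pairwise PMI of a topic's top words, with word
    probabilities estimated from document co-occurrence."""
    doc_sets = [set(d) for d in documents]
    n_docs = len(doc_sets)

    def p(*words):
        # Fraction of documents containing all of the given words.
        return sum(all(w in s for w in words) for s in doc_sets) / n_docs

    scores = [math.log((p(w1, w2) + eps) / (p(w1) * p(w2) + eps))
              for w1, w2 in combinations(topic_words, 2)]
    return sum(scores) / len(scores)

# Toy usage: a higher score means the topic's top words co-occur
# more often than independence would predict.
docs = [["topic", "model", "lda"], ["topic", "model"], ["word", "vector"]]
print(pmi_coherence(["topic", "model"], docs))
```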
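A sketch of the word2vec-style measure: coherence as mean pairwise cosine similarity of the topic's top words. It assumes the embeddings are already available as a plain dict from word to vector; how those vectors are trained is out of scope here, and the 3-d toy vectors are made up:

```python
import numpy as np
from itertools import combinations

def embedding_coherence(topic_words, vectors):
    """Mean pairwise cosine similarity of a topic's top words,
    given a mapping from word to embedding vector (e.g. word2vec)."""
    sims = []
    for w1, w2 in combinations(topic_words, 2):
        v1, v2 = np.asarray(vectors[w1]), np.asarray(vectors[w2])
        sims.append(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    return float(np.mean(sims))

# Toy usage with made-up 3-d vectors.
vecs = {"topic": [1.0, 0.2, 0.0], "model": [0.9, 0.3, 0.1], "tea": [0.0, 0.1, 1.0]}
print(embedding_coherence(["topic", "model", "tea"], vecs))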
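Finally, a sketch of symmetric Kullback–Leibler divergence between two topic-word distributions. This also speaks to the stability question above: matching topics across two LDA runs should show small divergence. The smoothing constant `eps` is an assumption to avoid log(0):

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric Kullback-Leibler divergence between two
    topic-word distributions (0 when they are identical)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

# Toy usage: two distributions over a 4-word vocabulary.
print(symmetric_kl([0.7, 0.1, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1]))
```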
