[SPARK-9888][MLlib]User guide for new LDA features #8254
Test build #41059 has finished for PR 8254 at commit
@jkbradley do you mind reviewing?
*Note*: LDA is a new feature with some missing functionality. In particular, it does not yet
support prediction on new documents, and it does not have a Python API. These will be added in the future.
* `LDAOptimizer`: Optimizer to use for learning the LDA model, either
Actually, this is just called `optimizer` in the public API.
OK
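For context on the naming point above, here is a minimal sketch of selecting the optimizer by name through `LDA.setOptimizer`. The toy corpus, the object name `LdaOptimizerSketch`, and the local master setting are illustrative assumptions, not part of this PR.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{DistributedLDAModel, LDA, LocalLDAModel}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Hypothetical driver object; the name is for illustration only.
object LdaOptimizerSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LdaOptimizerSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up corpus: (document ID, term-count vector) over a 4-word vocabulary.
      val corpus: RDD[(Long, Vector)] = sc.parallelize(Seq(
        (0L, Vectors.dense(2.0, 1.0, 0.0, 0.0)),
        (1L, Vectors.dense(0.0, 0.0, 3.0, 1.0)),
        (2L, Vectors.dense(1.0, 0.0, 0.0, 2.0))
      )).cache()

      // The optimizer is chosen by name: "em" or "online".
      val emModel = new LDA().setK(2).setOptimizer("em").run(corpus)
      val onlineModel = new LDA().setK(2).setOptimizer("online").run(corpus)

      // Each optimizer yields a different concrete model type.
      println(s"em -> DistributedLDAModel: ${emModel.isInstanceOf[DistributedLDAModel]}")
      println(s"online -> LocalLDAModel: ${onlineModel.isInstanceOf[LocalLDAModel]}")
    } finally {
      sc.stop()
    }
  }
}
```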
* Topics: Inferred topics, each of which is a probability distribution over terms (words).
* Topic distributions for documents: For each non-empty document in the training set, LDA gives a probability distribution over topics. Topic distributions are not created for empty documents. (EM only)
* Topics correspond to cluster centers, and documents correspond to
FYI, for the future, try not to change formatting in Markdown unnecessarily since it makes reviewing harder. There aren't style guidelines for Markdown.
OK
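The outputs described in the diff above (topics as distributions over terms, and per-document topic distributions for EM) can be inspected roughly as follows. This is a sketch under assumptions: the corpus, the object name `LdaOutputsSketch`, and the empty document with ID 2 are made up for illustration, and the behavior for empty documents is taken from the guide text rather than verified here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{DistributedLDAModel, LDA}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Hypothetical driver object; the name is for illustration only.
object LdaOutputsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LdaOutputsSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up corpus over a 4-word vocabulary; document 2 has no terms.
      val corpus: RDD[(Long, Vector)] = sc.parallelize(Seq(
        (0L, Vectors.dense(2.0, 1.0, 0.0, 0.0)),
        (1L, Vectors.dense(0.0, 0.0, 3.0, 1.0)),
        (2L, Vectors.sparse(4, Array.empty[Int], Array.empty[Double]))
      )).cache()

      val model = new LDA().setK(2).setOptimizer("em").run(corpus)

      // Topics: for each topic, the top term indices and their weights.
      model.describeTopics(3).zipWithIndex.foreach { case ((terms, weights), topic) =>
        val pairs = terms.zip(weights).map { case (t, w) => s"term $t: $w" }
        println(s"topic $topic -> ${pairs.mkString(", ")}")
      }

      // Topic distributions per document are only exposed by the EM model type.
      model match {
        case em: DistributedLDAModel =>
          // Per the guide text, the empty document is expected to be absent here.
          em.topicDistributions.collect().sortBy(_._1).foreach { case (docId, dist) =>
            println(s"doc $docId -> $dist")
          }
        case _ =>
          println("Model does not expose per-document topic distributions.")
      }
    } finally {
      sc.stop()
    }
  }
}
```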
Test build #41541 has finished for PR 8254 at commit
See [discussion](#8254 (comment)) CC jkbradley Author: Feynman Liang <fliang@databricks.com> Closes #8422 from feynmanliang/SPARK-10230. (cherry picked from commit 881208a) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
LGTM pending tests
Test build #41569 has finished for PR 8254 at commit
Merging with master and branch-1.5
* Adds two new sections to LDA's user guide; one for each optimizer/model
* Documents new features added to LDA (e.g. topXXXperXXX, asymmetric priors, hyperparameter optimization)
* Cleans up a TODO and sets a default parameter in LDA code

jkbradley hhbyyh

Author: Feynman Liang <fliang@databricks.com>

Closes #8254 from feynmanliang/SPARK-9888.

(cherry picked from commit 125205c)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
@jkbradley @hhbyyh