[SPARK-9888][MLlib]User guide for new LDA features #8254

Closed
feynmanliang wants to merge 3 commits


feynmanliang
Contributor

  • Adds two new sections to LDA's user guide, one for each optimizer/model
  • Documents new features added to LDA (e.g. topXXXperXXX, asymmetric priors, hyperparameter optimization)
  • Cleans up a TODO and sets a default parameter in LDA code

@jkbradley @hhbyyh

@SparkQA

SparkQA commented Aug 17, 2015

Test build #41059 has finished for PR 8254 at commit b8b9f9a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@feynmanliang
Contributor Author

@jkbradley do you mind reviewing?


*Note*: LDA is a new feature with some missing functionality. In particular, it does not yet
support prediction on new documents, and it does not have a Python API. These will be added in the future.
* `LDAOptimizer`: Optimizer to use for learning the LDA model, either
Member

Actually, it's just called "optimizer" in the public API.

Contributor Author

OK


* Topics: Inferred topics, each of which is a probability distribution over terms (words).
* Topic distributions for documents (EM only): For each nonempty document in the training set, LDA gives a probability distribution over topics. No topic distribution is created for empty documents.
* Topics correspond to cluster centers, and documents correspond to
Member

FYI, for the future, try not to change formatting in Markdown unnecessarily since it makes reviewing harder. There aren't style guidelines for Markdown.

Contributor Author

OK
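
For readers following the user-guide text under review, here is a minimal, library-free Python sketch of what a per-document topic distribution looks like and how a "top topics per document" style query could be read off it. The function name `top_topics_per_document`, the document IDs, and the numbers are all hypothetical. This is not the MLlib API, only an illustration of the idea that each nonempty document gets a probability distribution over topics while empty documents get none.

```python
from typing import Dict, List, Tuple


def top_topics_per_document(
    doc_topic_dist: Dict[int, List[float]], k: int
) -> Dict[int, List[Tuple[int, float]]]:
    """Return the k highest-weight (topic index, weight) pairs per document.

    Hypothetical helper for illustration only; it is not part of MLlib.
    """
    result = {}
    for doc_id, weights in doc_topic_dist.items():
        # Rank topics by descending weight; ties break on the lower topic index.
        ranked = sorted(enumerate(weights), key=lambda pair: (-pair[1], pair[0]))
        result[doc_id] = ranked[:k]
    return result


# Made-up distributions over 3 topics. Document 2 is empty in this toy corpus,
# so it has no entry at all (no distribution is created for empty documents).
doc_topic_dist = {
    0: [0.7, 0.2, 0.1],
    1: [0.1, 0.3, 0.6],
    3: [0.25, 0.5, 0.25],
}

top2 = top_topics_per_document(doc_topic_dist, k=2)
```

Each value in `top2` is a list of `(topic index, weight)` pairs, and the empty document simply does not appear as a key.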

@SparkQA

SparkQA commented Aug 25, 2015

Test build #41541 has finished for PR 8254 at commit 7401012.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Aug 25, 2015
See [discussion](#8254 (comment))

CC jkbradley

Author: Feynman Liang <fliang@databricks.com>

Closes #8422 from feynmanliang/SPARK-10230.

(cherry picked from commit 881208a)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
asfgit pushed a commit that referenced this pull request Aug 25, 2015
See [discussion](#8254 (comment))

CC jkbradley

Author: Feynman Liang <fliang@databricks.com>

Closes #8422 from feynmanliang/SPARK-10230.
@jkbradley
Member

LGTM pending tests

@SparkQA

SparkQA commented Aug 25, 2015

Test build #41569 has finished for PR 8254 at commit c8a1013.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Member

Merging with master and branch-1.5

asfgit pushed a commit that referenced this pull request Aug 26, 2015
 * Adds two new sections to LDA's user guide, one for each optimizer/model
 * Documents new features added to LDA (e.g. topXXXperXXX, asymmetric priors, hyperparameter optimization)
 * Cleans up a TODO and sets a default parameter in LDA code

jkbradley hhbyyh

Author: Feynman Liang <fliang@databricks.com>

Closes #8254 from feynmanliang/SPARK-9888.

(cherry picked from commit 125205c)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
@asfgit asfgit closed this in 125205c Aug 26, 2015
@feynmanliang feynmanliang deleted the SPARK-9888 branch August 26, 2015 02:36
kiszk pushed a commit to kiszk/spark-gpu that referenced this pull request Dec 26, 2015
See [discussion](apache/spark#8254 (comment))

CC jkbradley

Author: Feynman Liang <fliang@databricks.com>

Closes #8422 from feynmanliang/SPARK-10230.