[SPARK-9680][MLlib][Doc] StopWordsRemovers user guide and Java compatibility test #8436
feynmanliang commented Aug 25, 2015
- Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine
- Cleans up scaladocs for public methods
- Adds test for Java compatibility
- Follow up Python user guide code example is tracked by SPARK-10249
remover.transform(dataset).show();
{% endhighlight %}
</div>
TODO: add Python example
Actually no Python example is possible until Python API is added (SPARK-9679, #8118); this TODO will be tracked by SPARK-10249
Jenkins retest this please
I'll take a look
frequently and don't carry as much meaning.

`StopWordsRemover` takes as input a sequence of strings (e.g. the output
of a [Tokenizer](ml-features.html#tokenizer) and drops all the stop
closing parenthesis needed after Tokenizer link for "e.g." clause
OK
That's it!
Test build #41570 has finished for PR 8436 at commit
Test build #41564 has finished for PR 8436 at commit
Test build #41598 has finished for PR 8436 at commit
[`StopWordsRemover`](api/java/org/apache/spark/ml/feature/StopWordsRemover.html)
takes an input column name, an output column name, a list of stop words,
and a boolean indicating if the matches should be case sensitive (false
by default.
close paren after "default"
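To make the four parameters quoted above concrete, here is a minimal plain-Java sketch of the filtering behavior the guide text describes: tokens that match a stop word are dropped, and a boolean controls whether the match is case sensitive. This is an illustration only, not Spark's `StopWordsRemover` implementation; the class name `StopWordsSketch`, the method `removeStopWords`, and the use of `Locale.ROOT` lower-casing for case-insensitive matching are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; this is NOT Spark's StopWordsRemover.
// Class and method names are hypothetical.
final class StopWordsSketch {

    // Returns the tokens that are not in stopWords, preserving input order.
    // When caseSensitive is false, both sides are lower-cased before comparing
    // (an assumption about how a case-insensitive match could be done).
    static List<String> removeStopWords(List<String> tokens,
                                        List<String> stopWords,
                                        boolean caseSensitive) {
        Set<String> lookup = new HashSet<>();
        for (String w : stopWords) {
            lookup.add(caseSensitive ? w : w.toLowerCase(Locale.ROOT));
        }
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            String key = caseSensitive ? t : t.toLowerCase(Locale.ROOT);
            if (!lookup.contains(key)) {
                kept.add(t); // keep the token in its original form
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("I", "saw", "The", "red", "balloon");
        List<String> stopWords = Arrays.asList("i", "the");
        // Case-insensitive: "I" and "The" match the stop words and are dropped.
        System.out.println(removeStopWords(tokens, stopWords, false)); // [saw, red, balloon]
        // Case-sensitive: no exact match, so every token is kept.
        System.out.println(removeStopWords(tokens, stopWords, true));  // [I, saw, The, red, balloon]
    }
}
```

In the actual API the same inputs would be supplied as an input column, an output column, and the stop-word and case-sensitivity parameters named in the guide text; the sketch only mirrors the per-row filtering step.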
Test build #41702 has finished for PR 8436 at commit
words from the input sequences. The list of stopwords is specified by
the `stopWords` parameter. We provide a list of stop words created by
the [Glasgow Information Retrieval
Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in
Shall we put the link on "a list of stop words"?
OK
Test build #41708 has finished for PR 8436 at commit
LGTM. Merged into master and branch-1.5. Thanks!
…atibility test

* Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine
* Cleans up scaladocs for public methods
* Adds test for Java compatibility
* Follow up Python user guide code example is tracked by SPARK-10249

Author: Feynman Liang <fliang@databricks.com>

Closes #8436 from feynmanliang/SPARK-10230.

(cherry picked from commit 5bfe9e1)
Signed-off-by: Xiangrui Meng <meng@databricks.com>