[SPARK-9680][MLlib][Doc] StopWordsRemovers user guide and Java compatibility test #8436

feynmanliang · 2015-08-25T21:52:02Z

- Adds user guide for `ml.feature.StopWordsRemover`; ran the code examples on my machine
- Cleans up scaladocs for public methods
- Adds test for Java compatibility
- Follow-up Python user guide code example is tracked by SPARK-10249

feynmanliang · 2015-08-25T21:52:27Z

docs/ml-features.md

+
+remover.transform(dataset).show();
+{% endhighlight %}
+</div>


TODO: add Python example

Actually, no Python example is possible until the Python API is added (SPARK-9679, #8118); this TODO will be tracked by SPARK-10249.

feynmanliang · 2015-08-25T22:13:24Z

Jenkins retest this please

jkbradley · 2015-08-25T23:21:46Z

I'll take a look

jkbradley · 2015-08-25T23:33:14Z

docs/ml-features.md

+frequently and don't carry as much meaning.
+
+`StopWordsRemover` takes as input a sequence of strings (e.g. the output
+of a [Tokenizer](ml-features.html#tokenizer) and drops all the stop


closing parenthesis needed after Tokenizer link for "e.g." clause

jkbradley · 2015-08-25T23:33:31Z

That's it!

SparkQA · 2015-08-25T23:45:20Z

Test build #41570 has finished for PR 8436 at commit 28a3deb.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- abstract class SetOperation(left: LogicalPlan, right: LogicalPlan) extends BinaryNode
- case class Union(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)
- case class Intersect(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)
- case class Except(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)

SparkQA · 2015-08-26T00:37:15Z

Test build #41564 has finished for PR 8436 at commit 28a3deb.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- public class JavaStopWordsRemoverSuite

SparkQA · 2015-08-26T06:06:29Z

Test build #41598 has finished for PR 8436 at commit 5169ce0.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- public class JavaStopWordsRemoverSuite

jkbradley · 2015-08-26T17:32:49Z

docs/ml-features.md

+[`StopWordsRemover`](api/java/org/apache/spark/ml/feature/StopWordsRemover.html)
+takes an input column name, an output column name, a list of stop words,
+and a boolean indicating if the matches should be case sensitive (false
+by default.


close paren after "default"
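
For readers following the thread, the behavior described in the doc text above (a list of stop words plus a case-sensitivity flag that is false by default) can be illustrated with a minimal plain-Java sketch. This is not Spark's implementation and does not use the Spark API: the class `StopWordsSketch` and the method `removeStopWords` are hypothetical names invented for this illustration, and the lowercasing strategy for the case-insensitive path is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of Spark.
public class StopWordsSketch {

    // Returns the tokens that are not stop words.
    // When caseSensitive is false, tokens and stop words are compared in lowercase.
    public static List<String> removeStopWords(
            List<String> tokens, List<String> stopWords, boolean caseSensitive) {
        if (caseSensitive) {
            Set<String> stops = new HashSet<>(stopWords);
            return tokens.stream()
                    .filter(t -> !stops.contains(t))
                    .collect(Collectors.toList());
        }
        // Assumption: case-insensitive matching is done by lowercasing both sides.
        Set<String> lowered = stopWords.stream()
                .map(w -> w.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
        return tokens.stream()
                .filter(t -> !lowered.contains(t.toLowerCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("I", "saw", "The", "red", "balloon");
        List<String> stopWords = Arrays.asList("i", "the");

        // Case-insensitive: "I" and "The" match "i" and "the" and are dropped.
        System.out.println(removeStopWords(tokens, stopWords, false)); // [saw, red, balloon]

        // Case-sensitive: "I" and "The" do not match the lowercase stop words, so nothing is dropped.
        System.out.println(removeStopWords(tokens, stopWords, true)); // [I, saw, The, red, balloon]
    }
}
```

The two calls in `main` show why the flag matters: with lowercase-only stop words, a case-sensitive match leaves capitalized tokens such as "The" in the output.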

SparkQA · 2015-08-27T17:36:12Z

Test build #41702 has finished for PR 8436 at commit 074583e.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- public class JavaStopWordsRemoverSuite

mengxr · 2015-08-27T21:08:29Z

docs/ml-features.md

+words from the input sequences. The list of stopwords is specified by
+the `stopWords` parameter.  We provide a list of stop words created by
+the [Glasgow Information Retrieval
+Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in


Shall we put the link on "a list of stop words"?

SparkQA · 2015-08-27T22:22:05Z

Test build #41708 has finished for PR 8436 at commit 24eba04.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- An [n-gram](https://en.wikipedia.org/wiki/N-gram) is a sequence of $n$ tokens (typically words) for some integer $n$. The `NGram` class can be used to transform input features into $n$-grams.
- public class JavaStopWordsRemoverSuite

mengxr · 2015-08-27T23:10:54Z

LGTM. Merged into master and branch-1.5. Thanks!

…atibility test

* Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine
* Cleans up scaladocs for public methods
* Adds test for Java compatibility
* Follow up Python user guide code example is tracked by SPARK-10249

Author: Feynman Liang <fliang@databricks.com>

Closes #8436 from feynmanliang/SPARK-10230.

(cherry picked from commit 5bfe9e1)
Signed-off-by: Xiangrui Meng <meng@databricks.com>

Feynman Liang added 2 commits August 25, 2015 14:02

StopWordsRemover Java Compatibility Test

8034e29

Adds javadocs

28a3deb

feynmanliang reviewed Aug 25, 2015

feynmanliang closed this Aug 25, 2015

feynmanliang deleted the SPARK-10230 branch August 25, 2015 22:56

feynmanliang restored the SPARK-10230 branch August 25, 2015 22:59

feynmanliang reopened this Aug 25, 2015

jkbradley reviewed Aug 25, 2015

Feynman Liang added 2 commits August 25, 2015 22:15

Code review

a8670e0

Adds spaces

5169ce0

jkbradley reviewed Aug 26, 2015

Close paren

074583e

mengxr reviewed Aug 27, 2015

Code review fixes

24eba04

asfgit closed this in 5bfe9e1 Aug 27, 2015

feynmanliang deleted the SPARK-10230 branch January 13, 2016 19:28
