[SPARK-9680][MLlib][Doc] StopWordsRemovers user guide and Java compatibility test #8436

Closed
wants to merge 6 commits into from

feynmanliang
Contributor

  • Adds a user guide for `ml.feature.StopWordsRemover`; the code examples were run locally
  • Cleans up scaladocs for public methods
  • Adds a test for Java compatibility
  • The follow-up Python user guide code example is tracked by SPARK-10249


remover.transform(dataset).show();
{% endhighlight %}
</div>
Contributor Author

TODO: add Python example

Contributor Author

Actually, no Python example is possible until the Python API is added (SPARK-9679, #8118); this TODO will be tracked by SPARK-10249.

@feynmanliang
Contributor Author

Jenkins retest this please

@feynmanliang feynmanliang deleted the SPARK-10230 branch August 25, 2015 22:56
@feynmanliang feynmanliang restored the SPARK-10230 branch August 25, 2015 22:59
@feynmanliang feynmanliang reopened this Aug 25, 2015
@jkbradley
Member

I'll take a look

frequently and don't carry as much meaning.

`StopWordsRemover` takes as input a sequence of strings (e.g. the output
of a [Tokenizer](ml-features.html#tokenizer) and drops all the stop
Member

closing parenthesis needed after Tokenizer link for "e.g." clause

Contributor Author

OK

@jkbradley
Member

That's it!

SparkQA commented Aug 25, 2015

Test build #41570 has finished for PR 8436 at commit 28a3deb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class SetOperation(left: LogicalPlan, right: LogicalPlan) extends BinaryNode
    • case class Union(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)
    • case class Intersect(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)
    • case class Except(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)

SparkQA commented Aug 26, 2015

Test build #41564 has finished for PR 8436 at commit 28a3deb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaStopWordsRemoverSuite

SparkQA commented Aug 26, 2015

Test build #41598 has finished for PR 8436 at commit 5169ce0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaStopWordsRemoverSuite

[`StopWordsRemover`](api/java/org/apache/spark/ml/feature/StopWordsRemover.html)
takes an input column name, an output column name, a list of stop words,
and a boolean indicating if the matches should be case sensitive (false
by default.
Member

close paren after "default"
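
For readers following the parameter description above, here is a minimal plain-Java sketch of the filtering behaviour it describes: a stop word list plus a case sensitivity flag applied to a sequence of tokens. It does not use Spark and is not the `StopWordsRemover` implementation. The class name `StopWordFilterSketch`, the method `removeStopWords`, and the sample tokens are hypothetical, and lowercasing both sides for case-insensitive matching is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Spark: mimics the documented parameters
// (stop word list, case sensitivity flag) on a plain list of tokens.
final class StopWordFilterSketch {

    static List<String> removeStopWords(List<String> tokens,
                                        List<String> stopWords,
                                        boolean caseSensitive) {
        // Normalise the stop words once so each token check is a set lookup.
        // Assumption: case-insensitive matching is done by lowercasing both sides.
        Set<String> lookup = new HashSet<>();
        for (String word : stopWords) {
            lookup.add(caseSensitive ? word : word.toLowerCase(Locale.ROOT));
        }

        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String key = caseSensitive ? token : token.toLowerCase(Locale.ROOT);
            if (!lookup.contains(key)) {
                kept.add(token); // surviving tokens keep their original casing
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("I", "saw", "The", "red", "balloon");
        List<String> stopWords = Arrays.asList("i", "the", "a");

        // Case-insensitive (the documented default): "I" and "The" are dropped.
        System.out.println(removeStopWords(tokens, stopWords, false)); // [saw, red, balloon]

        // Case-sensitive: "I" and "The" do not match "i" and "the", so nothing is dropped.
        System.out.println(removeStopWords(tokens, stopWords, true)); // [I, saw, The, red, balloon]
    }
}
```

The sketch returns a new list instead of mutating its input, which loosely mirrors how a transformer writes its result to a separate output column.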

SparkQA commented Aug 27, 2015

Test build #41702 has finished for PR 8436 at commit 074583e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaStopWordsRemoverSuite

words from the input sequences. The list of stopwords is specified by
the `stopWords` parameter. We provide a list of stop words created by
the [Glasgow Information Retrieval
Group](http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words) in
Contributor

Shall we put the link on the text "a list of stop words" instead?

Contributor Author

OK

SparkQA commented Aug 27, 2015

Test build #41708 has finished for PR 8436 at commit 24eba04.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • An [n-gram](https://en.wikipedia.org/wiki/N-gram) is a sequence of $n$ tokens (typically words) for some integer $n$. The `NGram` class can be used to transform input features into $n$-grams.
    • public class JavaStopWordsRemoverSuite

@mengxr
Contributor

mengxr commented Aug 27, 2015

LGTM. Merged into master and branch-1.5. Thanks!

@asfgit asfgit closed this in 5bfe9e1 Aug 27, 2015
asfgit pushed a commit that referenced this pull request Aug 27, 2015
…atibility test

* Adds user guide for ml.feature.StopWordsRemovers, ran code examples on my machine
* Cleans up scaladocs for public methods
* Adds test for Java compatibility
* Follow up Python user guide code example is tracked by SPARK-10249

Author: Feynman Liang <fliang@databricks.com>

Closes #8436 from feynmanliang/SPARK-10230.

(cherry picked from commit 5bfe9e1)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
@feynmanliang feynmanliang deleted the SPARK-10230 branch January 13, 2016 19:28