[SPARK-9679][ML][PYSPARK] Add Python API for Stop Words Remover #8118

Closed


holdenk
Contributor

@holdenk holdenk commented Aug 12, 2015

Add a Python API for the Stop Words Remover.
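
For context, here is a minimal pure-Python sketch of the behaviour a stop words remover provides: dropping tokens that appear in a stop word list, optionally comparing case-sensitively. This is not the Spark implementation and does not use the PySpark API; the helper name `remove_stop_words` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List


def remove_stop_words(tokens: Iterable[str],
                      stop_words: Iterable[str],
                      case_sensitive: bool = False) -> List[str]:
    """Return the tokens that are not stop words (hypothetical helper)."""
    if case_sensitive:
        # Exact string match against the stop word list.
        stop_set = set(stop_words)
        return [t for t in tokens if t not in stop_set]
    # Case-insensitive: lowercase both sides before comparing.
    stop_set = {w.lower() for w in stop_words}
    return [t for t in tokens if t.lower() not in stop_set]


tokens = ["The", "quick", "brown", "fox", "is", "a", "fox"]
stop_words = ["the", "is", "a"]

insensitive = remove_stop_words(tokens, stop_words)
sensitive = remove_stop_words(tokens, stop_words, case_sensitive=True)
```

With the case-sensitive flag set, "The" survives because only the lowercase form "the" is in the list; the default mode removes it.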

@holdenk
Contributor Author

holdenk commented Aug 12, 2015

jenkins, retest this please.

@SparkQA

SparkQA commented Aug 12, 2015

Test build #40679 has finished for PR 8118 at commit ec6baab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • * Set thresholds in multiclass (or binary) classification to adjust the probability of
    • class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol):

@holdenk
Contributor Author

holdenk commented Aug 13, 2015

jenkins, retest this please

@holdenk
Contributor Author

holdenk commented Aug 13, 2015

jenkins, retest this please.

@SparkQA

SparkQA commented Aug 14, 2015

Test build #40830 has finished for PR 8118 at commit b98bb1d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 18, 2015

Test build #41169 has finished for PR 8118 at commit d8e3672.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@keyword_only
def __init__(self, inputCol=None, outputCol=None, stopWords=[]):
    """
    Initialize this instance of the StopWordsRemover.
Contributor

Is there a reason why this __init__ doc string breaks the pattern of just repeating the method with default args seen elsewhere in feature.py?
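
The convention the reviewer refers to, where the `__init__` docstring simply repeats the signature with its default arguments, can be sketched as follows. This is a plain-Python stand-in, not the code from the PR: the class name `StopWordsRemoverSketch` is hypothetical, it omits `@keyword_only` and the Java wrapper, and it swaps the `stopWords=[]` default for `None` to avoid sharing one list across instances, which is a choice of this sketch rather than something the PR does.

```python
class StopWordsRemoverSketch(object):
    """Hypothetical stand-in used only to show the docstring convention."""

    def __init__(self, inputCol=None, outputCol=None, stopWords=None):
        """
        __init__(self, inputCol=None, outputCol=None, stopWords=None)
        """
        self.inputCol = inputCol
        self.outputCol = outputCol
        # Copy the caller's list so instances never share mutable state.
        self.stopWords = list(stopWords) if stopWords is not None else []


remover = StopWordsRemoverSketch(inputCol="raw", outputCol="filtered",
                                 stopWords=["a", "the"])
```

Repeating the signature in the docstring keeps the defaults visible in generated documentation even when a decorator hides the real signature.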

@SparkQA

SparkQA commented Aug 26, 2015

Test build #41654 has finished for PR 8118 at commit 84bc507.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol):

"sensitive comparison over the stop words")
stopWordsObj = _jvm().org.apache.spark.ml.feature.StopWords
defaultStopWords = stopWordsObj.ENGLISH_STOP_WORDS()
print "Constructing java param pair for value "+str(defaultStopWords)
Contributor

Are these prints intentional?

Contributor Author

oh no, I was checking the type when debugging something

@feynmanliang
Contributor

some small comments, LGTM after they're fixed

@SparkQA

SparkQA commented Aug 27, 2015

Test build #41672 has finished for PR 8118 at commit acfc9fe.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol):

@holdenk
Contributor Author

holdenk commented Aug 27, 2015

jenkins, retest this please.

@SparkQA

SparkQA commented Aug 27, 2015

Test build #41679 has finished for PR 8118 at commit 53f97b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol):

@@ -29,14 +29,14 @@ import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructTyp
/**
* stop words list
*/
-private object StopWords {
+protected[spark] object StopWords {
Contributor

private[spark] should work the same here and is the form used more often in the codebase.


/**
* Use the same default stopwords list as scikit-learn.
* The original list can be found from "Glasgow Information Retrieval Group"
* [[http://ir.dcs.gla.ac.uk/resources/linguistic_utils/stop_words]]
*/
-val EnglishStopWords = Array( "a", "about", "above", "across", "after", "afterwards", "again",
+val ENGLISH_STOP_WORDS = Array( "a", "about", "above", "across", "after", "afterwards", "again",
Contributor

minor: Since the object is already StopWords, would English be sufficient? We didn't use ENGLISH_STOP_WORDS because it is a mutable array.

@SparkQA

SparkQA commented Aug 28, 2015

Test build #41761 has finished for PR 8118 at commit 7767df0.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol):

@SparkQA

SparkQA commented Aug 28, 2015

Test build #41764 has finished for PR 8118 at commit 345bde2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol):

@mengxr
Contributor

mengxr commented Aug 31, 2015

LGTM except a minor issue on the test code style.

@SparkQA

SparkQA commented Sep 1, 2015

Test build #41848 has finished for PR 8118 at commit 62b821a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DCT(JavaTransformer, HasInputCol, HasOutputCol):
    • class SQLTransformer(JavaTransformer):
    • class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol):

@asfgit asfgit closed this in e6e483c Sep 1, 2015
@mengxr
Contributor

mengxr commented Sep 1, 2015

Merged into master. Thanks!
