Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-12711][ML] ML StopWordsRemover does not protect itself from column name duplication #10741

Closed

Conversation

grzegorz-chilkiewicz
Copy link
Contributor

Fixes problem and verifies fix by test suite.
Also - adds optional parameter: nullable (Boolean) to: SchemaUtils.appendColumn
and deduplicates SchemaUtils.appendColumn functions.

@jkbradley
Copy link
Member

Could you please add tag "[ML]" to the PR title?

@@ -89,4 +89,22 @@ class StopWordsRemoverSuite
.setCaseSensitive(true)
testDefaultReadWrite(t)
}

test("StopWordsRemover output column already exists") {
val outpuCol = "expected"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "outputCol"

@jkbradley
Copy link
Member

Also, please add a (short) PR description (in your first PR comment) since that will become part of the commit message.

@SparkQA
Copy link

SparkQA commented Jan 13, 2016

Test build #2377 has finished for PR 10741 at commit ca9a852.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…lumn name duplication

Fixes problem and verifies fix by test suite.
Also - adds optional parameter nullable (Boolean) to: SchemaUtils.appendColumn
and deduplicates SchemaUtils.appendColumn functions.
@grzegorz-chilkiewicz grzegorz-chilkiewicz changed the title [SPARK-12711] ML StopWordsRemover does not protect itself from column name duplication [SPARK-12711][ML] ML StopWordsRemover does not protect itself from column name duplication Jan 14, 2016
@grzegorz-chilkiewicz
Copy link
Contributor Author

Is everything ok with this PR?
Did I miss something?
Can we merge it with apache/spark ?

@jkbradley
Copy link
Member

LGTM
I'll run tests before merging

@SparkQA
Copy link

SparkQA commented Jan 26, 2016

Test build #2461 has finished for PR 10741 at commit 37af391.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val remover = new StopWordsRemover()
.setInputCol("raw")
.setOutputCol(outputCol)
val dataSet = sqlContext.createDataFrame(Seq(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just copy one of the datasets from an above test. That should fix the error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed out that second column in dataSet was totally empty... - and that was the problem...
I do not want to make that example too complicated, because this test does not even check correctness of execution result
I'm sorry for problems

@marmbrus
Copy link
Contributor

marmbrus commented Feb 2, 2016

ok to test

@jkbradley
Copy link
Member

LGTM pending tests

@SparkQA
Copy link

SparkQA commented Feb 2, 2016

Test build #50573 has finished for PR 10741 at commit 49fd362.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jkbradley
Copy link
Member

Merging with master and branch-1.6
Thanks for the PR!

asfgit pushed a commit that referenced this pull request Feb 2, 2016
…lumn name duplication

Fixes problem and verifies fix by test suite.
Also - adds optional parameter: nullable (Boolean) to: SchemaUtils.appendColumn
and deduplicates SchemaUtils.appendColumn functions.

Author: Grzegorz Chilkiewicz <grzegorz.chilkiewicz@codilime.com>

Closes #10741 from grzegorz-chilkiewicz/master.

(cherry picked from commit b1835d7)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
@asfgit asfgit closed this in b1835d7 Feb 2, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants