[SPARK-12711][ML] ML StopWordsRemover does not protect itself from column name duplication #10741
Could you please add the tag "[ML]" to the PR title?
@@ -89,4 +89,22 @@ class StopWordsRemoverSuite
      .setCaseSensitive(true)
    testDefaultReadWrite(t)
  }

  test("StopWordsRemover output column already exists") {
    val outpuCol = "expected"
typo: "outputCol"
Also, please add a (short) PR description (in your first PR comment) since that will become part of the commit message.
Test build #2377 has finished for PR 10741 at commit
…lumn name duplication Fixes problem and verifies fix by test suite. Also - adds optional parameter nullable (Boolean) to: SchemaUtils.appendColumn and deduplicates SchemaUtils.appendColumn functions.
Is everything ok with this PR?
LGTM
Test build #2461 has finished for PR 10741 at commit
    val remover = new StopWordsRemover()
      .setInputCol("raw")
      .setOutputCol(outputCol)
    val dataSet = sqlContext.createDataFrame(Seq(
Just copy one of the datasets from an above test. That should fix the error.
I missed that the second column in dataSet was completely empty, and that was the problem.
I do not want to make the example too complicated, because this test does not even check the correctness of the execution result.
Sorry for the trouble.
ok to test
LGTM pending tests
Test build #50573 has finished for PR 10741 at commit
Merging with master and branch-1.6
…lumn name duplication Fixes problem and verifies fix by test suite. Also - adds optional parameter: nullable (Boolean) to: SchemaUtils.appendColumn and deduplicates SchemaUtils.appendColumn functions. Author: Grzegorz Chilkiewicz <grzegorz.chilkiewicz@codilime.com> Closes #10741 from grzegorz-chilkiewicz/master. (cherry picked from commit b1835d7) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Fixes the problem and verifies the fix with a test suite.
Also adds an optional parameter, nullable (Boolean), to SchemaUtils.appendColumn
and deduplicates the SchemaUtils.appendColumn functions.
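
For readers skimming the thread, here is a rough sketch of the idea in the description: one overload takes an optional `nullable` flag and delegates to a single place that rejects a duplicate column name. This is not the code from the PR. `Field`, `Schema` and `SchemaSketch` are hypothetical stand-ins for Spark's schema types so that the snippet runs without Spark, and the exact error message is an assumption.

```scala
// Hypothetical stand-ins for Spark's schema types, kept minimal so the
// sketch runs without Spark on the classpath.
final case class Field(name: String, dataType: String, nullable: Boolean)

final case class Schema(fields: Vector[Field]) {
  def fieldNames: Vector[String] = fields.map(_.name)
}

object SchemaSketch {
  // Convenience overload: builds the field (nullable defaults to false)
  // and delegates, so the duplicate-name check lives in one place only.
  def appendColumn(
      schema: Schema,
      colName: String,
      dataType: String,
      nullable: Boolean = false): Schema =
    appendColumn(schema, Field(colName, dataType, nullable))

  // Single point of truth: refuse to append a column whose name is taken.
  def appendColumn(schema: Schema, col: Field): Schema = {
    require(!schema.fieldNames.contains(col.name),
      s"Column ${col.name} already exists.")
    Schema(schema.fields :+ col)
  }
}

object SchemaSketchDemo {
  def main(args: Array[String]): Unit = {
    val input = Schema(Vector(Field("raw", "array<string>", nullable = true)))

    // Appending a fresh output column succeeds.
    val ok = SchemaSketch.appendColumn(input, "filtered", "array<string>", nullable = true)
    println(ok.fieldNames.mkString(", "))

    // Appending the same name again fails the require check.
    val clash = scala.util.Try(SchemaSketch.appendColumn(ok, "filtered", "array<string>"))
    println(clash.isFailure)
  }
}
```

With the check in the shared overload, a transformer whose output column already exists in the input schema fails when the schema is built, which is the behaviour the new test case is meant to exercise.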