[SPARK-19397] [SQL] Make option names of LIBSVM and TEXT case insensitive #16737

Closed
wants to merge 4 commits into from

gatorsmile
Member

What changes were proposed in this pull request?

Prior to Spark 2.1, option names were case sensitive for all formats. Since Spark 2.1, option key names are case insensitive for every format except Text and LibSVM. This PR makes the option names of these two formats case insensitive as well.
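For illustration, here is a minimal sketch of what case-insensitive option lookup means, assuming keys are normalized to lower case once at construction time (roughly how Spark's CaseInsensitiveMap behaved at the time). The class name CaseInsensitiveOptions is hypothetical, not a Spark API.

```scala
import java.util.Locale

// Hypothetical stand-alone sketch, not Spark's CaseInsensitiveMap.
class CaseInsensitiveOptions(original: Map[String, String]) {
  // Keys are lower-cased once, when the options object is built.
  private val normalized: Map[String, String] =
    original.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  // Lookups lower-case the requested key too, so "numFeatures",
  // "NumFeatures" and "NUMFEATURES" all resolve to the same entry.
  def get(key: String): Option[String] =
    normalized.get(key.toLowerCase(Locale.ROOT))

  def getOrElse(key: String, default: => String): String =
    get(key).getOrElse(default)
}
```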

It also adds a check that the vector type option passed to LibSVM has a valid value.
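A hedged sketch of such a check, assuming the option is called vectorType and accepts sparse (the default) or dense, as in Spark's LibSVM source. VectorTypeCheck is a hypothetical helper, and the error message is illustrative only.

```scala
import java.util.Locale

// Hypothetical stand-alone helper; not Spark's LibSVMOptions.
object VectorTypeCheck {
  // Stored lower-cased so the lookup below is case insensitive.
  val VectorTypeKey = "vectortype"

  /** Returns true for "sparse" (the default), false for "dense", and rejects anything else. */
  def isSparse(options: Map[String, String]): Boolean = {
    val lowered = options.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }
    lowered.getOrElse(VectorTypeKey, "sparse") match {
      case "sparse" => true
      case "dense"  => false
      case other =>
        throw new IllegalArgumentException(
          s"Invalid value `$other` for parameter `vectorType`. Expected `sparse` or `dense`.")
    }
  }
}
```

Failing fast here means a typo such as "dnese" surfaces as an error at read time instead of silently falling back to sparse vectors.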

How was this patch tested?

Added test cases

@SparkQA

SparkQA commented Jan 30, 2017

Test build #72143 has finished for PR 16737 at commit 297f87a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

cc @cloud-fan @yhuai @dongjoon-hyun

@dongjoon-hyun
Member

Oh, so are these the last remaining case-sensitive options?

import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap

/**
* Options for the Text data source.
Member

Should this be "Options for the LibSVM data source."?

Member Author

uh. Thanks!

@dongjoon-hyun
Member

+1. LGTM except one typo! @gatorsmile

@gatorsmile
Member Author

gatorsmile commented Feb 1, 2017

I think LIBSVM and TEXT are the last two built-in sources that do not support case-insensitive options.

@SparkQA

SparkQA commented Feb 1, 2017

Test build #72238 has finished for PR 16737 at commit c2c145d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

Please hold off on this PR. I found a serious bug in the case-insensitive option support that needs to be fixed.

@cloud-fan
Contributor

what is the bug?

@gatorsmile
Member Author

gatorsmile commented Feb 3, 2017

:) When @sureshthalamati tried to fix the Oracle Docker test failure, he found that a code change in Spark 2.1 always lower-cases JDBC option keys. However, Oracle's connection properties are case sensitive, so this change caused a regression.

Later, we realized that data source resolution always lower-cases option keys, as discussed in #14773. So I think it is OK to do the same here for consistency, but we need to document a workaround for users who hit this issue with JDBC sources. @sureshthalamati is working on it.

So this PR is ready for review. Thanks!
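One way to avoid this kind of regression is to look keys up case-insensitively while still forwarding the user's original keys to the driver. The sketch below is a hypothetical design, not the fix or the workaround adopted in Spark; OriginalCaseOptions and the property name someDriver.camelCaseProperty are made up for illustration.

```scala
import java.util.{Locale, Properties}

// Hypothetical sketch: case-insensitive lookups for the data source's own options,
// but the user's ORIGINAL keys are what a case-sensitive JDBC driver receives.
final class OriginalCaseOptions(val original: Map[String, String]) {
  private val lowered: Map[String, String] =
    original.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  def get(key: String): Option[String] = lowered.get(key.toLowerCase(Locale.ROOT))

  // Builds driver connection properties from the original keys, so their case survives.
  def asConnectionProperties: Properties = {
    val props = new Properties()
    original.foreach { case (k, v) => props.setProperty(k, v) }
    props
  }
}
```

With a map that only keeps lower-cased keys, someDriver.camelCaseProperty would reach the driver as somedriver.camelcaseproperty and be ignored; here the lookup of "url" is still case insensitive, but the driver sees the key exactly as the user wrote it.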

}

private[libsvm] object LibSVMOptions {
val NUMFEATURES = "numFeatures"
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: NUM_FEATURES

@SparkQA

SparkQA commented Feb 7, 2017

Test build #72476 has finished for PR 16737 at commit 07beaad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* Number of features. If unspecified or nonpositive, the number of features will be determined
* automatically at the cost of one additional pass.
*/
val numFeatures = parameters.get(NUM_FEATURES).map(_.toInt)
Contributor

How about parameters.get(NUM_FEATURES).map(_.toInt).filter(_ > 0)?

Member Author

sure
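The suggested expression folds a missing value and a nonpositive value into the same None, so callers need only one branch to decide whether to pay for the extra pass. A minimal stand-alone sketch; NumFeaturesParsing and resolve are hypothetical names, and the by-name maxIndexSeen argument stands in for that extra pass over the data.

```scala
// Hypothetical stand-alone sketch of the suggested numFeatures parsing.
object NumFeaturesParsing {
  /** Missing or nonpositive values both become None ("infer it"); non-numeric input throws. */
  def numFeatures(parameters: Map[String, String]): Option[Int] =
    parameters.get("numFeatures").map(_.toInt).filter(_ > 0)

  /**
   * Uses the explicit count when there is one; otherwise evaluates `maxIndexSeen`,
   * a by-name stand-in for the additional pass that infers the count.
   */
  def resolve(parameters: Map[String, String], maxIndexSeen: => Int): Int =
    numFeatures(parameters).getOrElse(maxIndexSeen)
}
```

Note that _.toInt raises a NumberFormatException for a value like "abc", so only well-formed integers reach the filter.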

@@ -125,6 +124,25 @@ class TextSuite extends QueryTest with SharedSQLContext {
}
}

test("case insensitive option") {
val extraOptions = Map[String, String](
"mApReDuCe.output.fileoutputformat.compress" -> "true",
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we lower-case these extra options?

Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is it because hadoopConf is case insensitive?

Member Author

@gatorsmile gatorsmile Feb 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The keys are lower-cased when we build the DataSource.
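Assuming, per this reply, that keys are lower-cased first and only then copied into the Hadoop configuration (whose keys are case sensitive), the mixed-case test key ends up under its canonical name. A stand-alone sketch; a plain mutable.Map stands in for Hadoop's Configuration, and ExtraOptionsToConf is hypothetical.

```scala
import java.util.Locale
import scala.collection.mutable

// Hypothetical sketch: lower-case option keys, then copy them into a
// case-sensitive key/value store standing in for Hadoop's Configuration.
object ExtraOptionsToConf {
  def toConf(extraOptions: Map[String, String]): mutable.Map[String, String] = {
    val conf = mutable.Map.empty[String, String]
    extraOptions.foreach { case (k, v) => conf(k.toLowerCase(Locale.ROOT)) = v }
    conf
  }
}
```

This only lines up because mapreduce.output.fileoutputformat.compress is itself all lower case; a camelCase key would be mangled the same way as the Oracle JDBC properties described earlier in this thread.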

@SparkQA

SparkQA commented Feb 7, 2017

Test build #72528 has finished for PR 16737 at commit ec6eb6e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

LGTM, merging to master!

@asfgit asfgit closed this in e33aaa2 Feb 8, 2017
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
Author: gatorsmile <gatorsmile@gmail.com>

Closes apache#16737 from gatorsmile/libSVMTextOptions.
4 participants