[SPARK-19397] [SQL] Make option names of LIBSVM and TEXT case insensitive #16737
Test build #72143 has finished for PR 16737 at commit
Oh, then, are these the last of the case-sensitive options?
import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap

/**
 * Options for the Text data source.
`Options for the LibSVM data source.`?
uh. Thanks!
+1. LGTM except one typo! @gatorsmile
I think
Test build #72238 has finished for PR 16737 at commit
Please hold off on this PR. I found a serious bug in the case-insensitive option support that needs to be fixed.
What is the bug?
:) When @sureshthalamati tried to fix the Docker test failure for Oracle, he found that a code change in Spark 2.1 makes JDBC option keys always lowercase. However, Oracle's connection properties are case sensitive, so the change caused a regression. Later, we realized that data source resolution already converts the option keys to lowercase, as discussed in #14773. So I think it is OK to do this for consistency, but we need to document the workaround for users who hit this issue with JDBC sources. @sureshthalamati is working on that. This PR is therefore ready for review. Thanks!
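The regression described above can be illustrated with a minimal sketch. It assumes only that option keys are lowercased once when the options map is built; `LowerCasedOptionsSketch`, `lowerCaseKeys`, `lookup`, and the key `CaseSensitiveProp` are hypothetical names for illustration and are not Spark's `CaseInsensitiveMap` or a real Oracle property.

```scala
import java.util.Locale

// Hypothetical stand-in for a case-insensitive options map.
// This is NOT Spark's CaseInsensitiveMap, only an illustration of the idea.
object LowerCasedOptionsSketch {
  // Lowercase every key once, when the options are first collected.
  def lowerCaseKeys(options: Map[String, String]): Map[String, String] =
    options.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  // Look a key up regardless of how the caller spelled its case.
  def lookup(lowered: Map[String, String], key: String): Option[String] =
    lowered.get(key.toLowerCase(Locale.ROOT))
}

object LowerCasedOptionsDemo {
  // "CaseSensitiveProp" is a made-up key standing in for a property that a
  // downstream driver would match case-sensitively.
  val lowered: Map[String, String] = LowerCasedOptionsSketch.lowerCaseKeys(
    Map("numFeatures" -> "100", "CaseSensitiveProp" -> "x"))
}
```

Lookups through `lookup` succeed for any spelling, but anything that receives the lowered map directly sees only `casesensitiveprop`, which is why a case-sensitive consumer would no longer find its property.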
}

private[libsvm] object LibSVMOptions {
  val NUMFEATURES = "numFeatures"
nit: NUM_FEATURES
Test build #72476 has finished for PR 16737 at commit
   * Number of features. If unspecified or nonpositive, the number of features will be determined
   * automatically at the cost of one additional pass.
   */
  val numFeatures = parameters.get(NUM_FEATURES).map(_.toInt)
How about `parameters.get(NUM_FEATURES).map(_.toInt).filter(_ > 0)`?
sure
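As a rough illustration of the suggestion above, the sketch below shows how adding `.filter(_ > 0)` folds the "nonpositive" case into the "unspecified" case, so both yield `None`. It uses a plain `Map` rather than Spark's options class, and `NumFeaturesSketch` is a hypothetical name.

```scala
// Minimal sketch using a plain Map; the object name is hypothetical.
object NumFeaturesSketch {
  val NUM_FEATURES = "numFeatures"

  // Some(n) only for a positive n. A missing key or a nonpositive value
  // both give None, meaning "determine the feature count automatically".
  // A non-numeric value still throws NumberFormatException from toInt.
  def numFeatures(parameters: Map[String, String]): Option[Int] =
    parameters.get(NUM_FEATURES).map(_.toInt).filter(_ > 0)
}
```

With this shape the caller needs only one check (`isEmpty`) to decide whether the extra pass over the data is required.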
@@ -125,6 +124,25 @@ class TextSuite extends QueryTest with SharedSQLContext {
    }
  }

  test("case insensitive option") {
    val extraOptions = Map[String, String](
      "mApReDuCe.output.fileoutputformat.compress" -> "true",
Where do we lowercase these extra options?
Or is it because hadoopConf is case insensitive?
The option keys are converted to lowercase when we build the DataSource.
Test build #72528 has finished for PR 16737 at commit
LGTM, merging to master!
### What changes were proposed in this pull request?

Prior to Spark 2.1, option names were case sensitive for all formats. Since Spark 2.1, option key names have been case insensitive for every format except `Text` and `LibSVM`. This PR fixes those two formats. It also adds a check that the vector type given in the input options is legal for `LibSVM`.

### How was this patch tested?

Added test cases.

Author: gatorsmile <gatorsmile@gmail.com>

Closes apache#16737 from gatorsmile/libSVMTextOptions.
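The vector type check mentioned in the description could look roughly like the sketch below. The option key `vectorType`, the accepted values `sparse` and `dense`, the default of `sparse`, and the error message are all assumptions made for illustration; the description itself does not spell them out, and `VectorTypeSketch` is a hypothetical name.

```scala
// Hedged sketch of validating a vector type option.
// Key name, accepted values, and default are assumptions, not taken from the PR text.
object VectorTypeSketch {
  val VECTOR_TYPE = "vectorType"

  // Returns true for sparse vectors, false for dense ones,
  // and rejects any other value instead of silently ignoring it.
  def isSparse(parameters: Map[String, String]): Boolean =
    parameters.getOrElse(VECTOR_TYPE, "sparse") match {
      case "sparse" => true
      case "dense" => false
      case other =>
        throw new IllegalArgumentException(
          s"Invalid value '$other' for option '$VECTOR_TYPE'; expected 'sparse' or 'dense'.")
    }
}
```

Failing fast on an unknown value means a typo in the option surfaces as an error at read time rather than as an unexpected vector representation later.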