[SPARK-26310][SQL] Verify applicability of JSON options #23257
Conversation
@gatorsmile Please have a look at this PR.
Test build #99844 has finished for PR 23257 at commit
@cloud-fan What do you think of the PR?
Test build #99881 has finished for PR 23257 at commit
@gatorsmile Should I close the PR, or shall I continue with this approach?
Test build #100204 has finished for PR 23257 at commit
@gatorsmile Could you take a look at the PR, please?
@HyukjinKwon What do you think of this PR?
I don't object to the change. My gut says this should be more generalised, but on the other hand that looks like overkill. The configuration also looks like overkill; we could warn instead, but on the other hand I get why it's needed. Also, I'm still not sure about `..OptionsRead` and `..OptionsWrite`, but I don't object, as I said.
```scala
def checkOptions(where: String): Unit = {
  val wrongOptions = notApplicableOptions.filter(parameters.contains(_))
  if (!wrongOptions.isEmpty && SQLConf.get.verifyDataSourceOptions) {
    // scalastyle:off throwerror
```
I think we don't need this.
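For context, the validation being reviewed can be sketched as a self-contained helper. This is a hypothetical stand-in, not Spark's actual implementation: the real code reads the flag from `SQLConf` rather than taking it as a parameter, and `notApplicableOptions` and the message text are only modelled on the PR.

```scala
// Hypothetical sketch of the PR's option-verification idea: reject options
// that were passed but are not applicable in the current context ("read" or
// "write"), unless verification is disabled.
object OptionCheck {
  def checkOptions(
      parameters: Map[String, String],
      notApplicableOptions: Set[String],
      where: String,
      verify: Boolean): Unit = {
    // Options the caller passed that do not apply in this context
    val wrongOptions = notApplicableOptions.filter(parameters.contains)
    if (wrongOptions.nonEmpty && verify) {
      throw new IllegalArgumentException(
        s"The option(s) ${wrongOptions.mkString(", ")} are not applicable in $where")
    }
  }
}
```

With `verify = false` the call is a no-op, which mirrors the escape hatch the config provides.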
```scala
@@ -172,3 +189,34 @@ private[sql] object JSONOptionsInRead {
    Charset.forName("UTF-32")
  )
}

private[sql] class JSONOptionsInWrite(
```
We also don't need `private[sql]` here, since we're already in a private package.
```scala
val wrongOptions = notApplicableOptions.filter(parameters.contains(_))
if (!wrongOptions.isEmpty && SQLConf.get.verifyDataSourceOptions) {
  // scalastyle:off throwerror
  throw new IllegalArgumentException(
```
We could also consider warning when `verifyDataSourceOptions` is disabled.
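The throw-versus-warn behaviour suggested here can be sketched as follows. This is a hypothetical illustration, not Spark's code: a real implementation would call a logger (`logWarning`) rather than return the message, and the names are stand-ins.

```scala
// Hypothetical sketch: in strict mode, non-applicable options fail fast;
// otherwise a warning message is produced (here returned, in reality logged).
object OptionVerifier {
  def verifyOrWarn(wrongOptions: Set[String], strict: Boolean): Option[String] = {
    if (wrongOptions.isEmpty) {
      None
    } else if (strict) {
      throw new IllegalArgumentException(
        s"Options not applicable here: ${wrongOptions.mkString(", ")}")
    } else {
      // Verification disabled: warn instead of failing
      Some(s"Ignoring non-applicable options: ${wrongOptions.mkString(", ")}")
    }
  }
}
```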
@HyukjinKwon Thank you for looking at this PR. I proposed the changes because I have already seen a few cases where our users tried to apply datasource-specific options in the wrong way. Maybe this is because Spark's documentation is not clear enough, and we should point out more precisely when a particular option is applicable ... but I believe it makes sense to solve the problem in code.
I considered a more generalized approach, like creating a trait for
It was introduced only because of the behavior change.
Logs are not always available to users, or can be disabled. And who among users reads logs ;-) I prefer to throw an exception because it can save users' time in debugging/troubleshooting. I cannot imagine a situation where write options are passed as read options on purpose and silent behavior would be expected.
@gatorsmile Could you take a look at this PR, please?
I'm okay with it. Let me leave it to @gatorsmile and other committers.
```scala
@@ -1635,6 +1635,14 @@ object SQLConf {
    "java.time.* packages are used for the same purpose.")
  .booleanConf
  .createWithDefault(false)

val VERIFY_DATASOURCE_OPTIONS = buildConf("spark.sql.verifyDataSourceOptions")
```
Do we need a config for this? It seems more like a bug fix; even if it's a behavior change, we can be stricter about validation in Spark 3.
I would remove the config but @gatorsmile asked to add it.
OK, I can't say I feel too strongly about it, but when would this be set to `false`, @gatorsmile? This is a case where the user is already setting useless options, right? So they can a) somehow discover this flag, or b) just remove the options; b) seems much more likely to happen. Is the worry about legacy code or libraries that can't easily be modified? Those may already not work on Spark 3.
Test build #101099 has finished for PR 23257 at commit
I am closing this since I don't see enthusiasm about the changes.
What changes were proposed in this pull request?
In the PR, I propose additional verification of JSON options, in particular detection of `read` options used in `write` and vice versa. The verification can be disabled by the new SQL config `spark.sql.verifyDataSourceOptions` (`true` by default). These changes should prevent misuse of JSON options and improve the usage experience of the built-in datasource.

How was this patch tested?
Added a new test to `JacksonGeneratorSuite` to check how a read option is handled if it is passed in write. The test also checks that the new SQL config can disable the verification.
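The kind of check such a test performs can be illustrated with a self-contained miniature. This is hypothetical: it does not use `JacksonGeneratorSuite` or Spark's JSON option classes, and the option names below (`samplingRatio`, `primitivesAsString`) are just examples of read-side options used as stand-ins.

```scala
// Hypothetical miniature of the described test: a read-only option passed
// on the write path should fail while verification is enabled, and be
// silently dropped when verification is disabled.
object WriteOptionsDemo {
  // Stand-ins for options that only make sense when reading JSON
  val readOnlyOptions = Set("samplingRatio", "primitivesAsString")

  def makeWriteOptions(
      parameters: Map[String, String],
      verify: Boolean): Map[String, String] = {
    val wrong = readOnlyOptions.filter(parameters.contains)
    if (wrong.nonEmpty && verify) {
      throw new IllegalArgumentException(
        s"Not applicable in write: ${wrong.mkString(", ")}")
    }
    // With verification off, the non-applicable options are ignored
    parameters -- readOnlyOptions
  }
}
```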