[SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources do not respect SessionConfigSupport #22529
…igSupport

This PR proposes to respect `SessionConfigSupport` in SS datasources as well. Currently these are only respected in batch sources:

https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L198-L203
https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L244-L249

If a developer makes a datasource V2 that supports both structured streaming and batch jobs, batch jobs respect a specific configuration, let's say, URL to connect and fetch data (which end users might not be aware of); however, structured streaming ends up with not supporting this (and should explicitly be set into options).

Unit tests were added.

Closes apache#22462 from HyukjinKwon/SPARK-25460.

Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Test build #96480 has finished for PR 22529 at commit
Test build #96484 has finished for PR 22529 at commit
retest this please
Test build #96486 has finished for PR 22529 at commit
  var tempReader: MicroBatchReader = null
  val schema = try {
    tempReader = s.createMicroBatchReader(
      Optional.ofNullable(userSpecifiedSchema.orNull),
      Utils.createTempDir(namePrefix = s"temporaryReader").getCanonicalPath,
-     options)
+     dataSourceOptions)
So, this part is the difference, isn't it?
yup. the conflict looks mainly because of renaming.
+1, LGTM.
cc @cloud-fan and @gatorsmile
LGTM
Merged to
… SessionConfigSupport

## What changes were proposed in this pull request?

This PR proposes to backport SPARK-25460 to branch-2.4:

This PR proposes to respect `SessionConfigSupport` in SS datasources as well. Currently these are only respected in batch sources:

https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L198-L203
https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L244-L249

If a developer makes a datasource V2 that supports both structured streaming and batch jobs, batch jobs respect a specific configuration, let's say, URL to connect and fetch data (which end users might not be aware of); however, structured streaming ends up with not supporting this (and should explicitly be set into options).

## How was this patch tested?

Unit tests were added.

Closes #22529 from HyukjinKwon/SPARK-25460-backport.

Authored-by: hyukjinkwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Thank you, @HyukjinKwon and @cloud-fan.
Thanks, @dongjoon-hyun and @cloud-fan.
What changes were proposed in this pull request?
This PR proposes to backport SPARK-25460 to branch-2.4:
This PR proposes to respect `SessionConfigSupport` in SS datasources as well. Currently these are only respected in batch sources:
spark/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala, lines 198 to 203 in e06da95
spark/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala, lines 244 to 249 in e06da95
If a developer writes a DataSource V2 source that supports both Structured Streaming and batch jobs, the batch jobs respect source-specific session configuration, for example a URL to connect to and fetch data from (which end users might not be aware of). Structured Streaming, however, ignores that configuration, so it has to be set explicitly in the options.
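
To illustrate the behavior this PR extends to streaming sources, here is a minimal sketch in plain Scala with no Spark dependency. It assumes the convention documented for `SessionConfigSupport`, where session configs named `spark.datasource.<keyPrefix>.<option>` are turned into `<option>` entries, and it assumes that explicitly set options take precedence over session-level ones. `SessionConfigSketch`, `extractSessionConfigs`, `mergedOptions`, and the `mysource` prefix are hypothetical names for illustration, not the code in this patch.

```scala
import java.util.regex.Pattern

// Hypothetical, Spark-free sketch of session-config propagation.
object SessionConfigSketch {

  // Keep only the session configs under spark.datasource.<keyPrefix>. and
  // strip that prefix, so "spark.datasource.mysource.url" becomes "url".
  def extractSessionConfigs(
      keyPrefix: String,
      sessionConf: Map[String, String]): Map[String, String] = {
    require(keyPrefix != null, "The data source config key prefix can't be null.")
    // Quote the prefix so that regex metacharacters in it are matched literally.
    val keyPattern = ("^spark\\.datasource\\." + Pattern.quote(keyPrefix) + "\\.(.+)$").r
    sessionConf.collect { case (keyPattern(option), value) => option -> value }
  }

  // Assumption: options set explicitly by the user override session-level
  // defaults, because the right-hand side of ++ wins on duplicate keys.
  def mergedOptions(
      keyPrefix: String,
      sessionConf: Map[String, String],
      userOptions: Map[String, String]): Map[String, String] =
    extractSessionConfigs(keyPrefix, sessionConf) ++ userOptions
}
```

With a session config such as `spark.datasource.mysource.url` set once, both a batch read and a streaming read of the hypothetical `mysource` would see a `url` option without the end user passing it on every call.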
How was this patch tested?
Unit tests were added.