[SPARK-36680][SQL] Supports Dynamic Table Options for Spark SQL #41683
Conversation
As there are a lot of rules in the analyzer, please let me know if there's a better place/rule to fit this. Note: the parser does not seem to support TVFs in a write relation, but it at least works in a read relation. We should hopefully find a way to support write options for relations as well.

cc @cloud-fan
@@ -1118,6 +1122,38 @@ case class Range(
  }
}

@ExpressionDescription(
  usage = "_FUNC_(identifier: String, options: Map) - " +
    "Returns the data source relation with the given configuration.",
s/configuration/options
Hm, I just changed the UDF description in the last update; the build failure doesn't seem related.

Rebase to try to fix test
@cloud-fan When you have a moment, could you please take a look at this PR to see if the approach is OK with you? Thanks a lot!
Could you rebase this PR, @szehon-ho ?
Rebased, thanks for taking a look @dongjoon-hyun

Note: now TVF parameters can be a table, so we can do that. BTW, how can we differentiate
Done, the latest modification makes the first argument accept a table parameter. In this version, the Relation is already resolved, so all I can do is patch the resolved Relations if they have options (DSv2). (Versus before, when they were all unresolved and could be loaded with options.)

Sorry, not sure I understood this.
I understand that this new syntax is to add a SQL API which is equivalent to
Hm, in the original implementation (1b63105), it did not differentiate, and I made an UnresolvedRelation with options. That's the same as the DataStreamReader API. Actually, after changing the first argument to use TABLE(), because ResolveFunctions for TVFs expects the children to all be resolved, with_options now gets a ResolvedRelation as its argument. DSv2Relation has options, but StreamingDSv2Relation does not have options, so now I can only apply options to the former. I am contemplating reverting to the original implementation, though, and not taking TABLE() as an argument, as it would be nice to pass options into the relation-resolution process itself; that would cover both cases. Any thoughts on that?
Hi, @cloud-fan. Is your comment addressed? Any other concerns?
To @szehon-ho: could you rebase this PR to the master? The linter failure does not look like a part of this PR.
Let's spend more time on the API design first, as different people may have different opinions and we should collect as much feedback as possible. Taking a step back, I think what we need is a SQL API to specify per-scan options. A TVF can only be used in the FROM clause, so a new SQL syntax may be better here. Inspired by the pgsql syntax, we can add a WITH clause to Spark SQL:
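A sketch of what such a per-relation WITH clause might look like (the table name `jdbcTable` and the `fetchSize` option are illustrative, mirroring the jdbc example used elsewhere in this thread):

```sql
-- Hypothetical per-scan options attached to a relation in the FROM clause,
-- roughly equivalent to spark.read.format("jdbc").option("fetchSize", "0").load()
SELECT * FROM jdbcTable WITH OPTIONS(`fetchSize` = 0)
```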
Streaming is orthogonal to this, and this new WITH clause won't conflict with it. E.g. we can probably do UPDATE:
+1 on using a WITH clause.
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way; it's just a way of keeping the PR queue manageable.

Thank you @cloud-fan for the nice suggestion, and sorry for the long delay, as I was on leave and busy with other things. The WITH syntax does seem to greatly simplify the logic. I made a new PR, #46707, and we can discuss there.
### What changes were proposed in this pull request?

In Spark SQL, add a 'WITH OPTIONS' syntax to support dynamic relation options. This is a continuation of #41683 based on cloud-fan's nice suggestion. That was itself a continuation of #34072.

### Why are the changes needed?

This will allow Spark SQL to have equivalence to the DataFrameReader API. For example, it is possible today to specify options to DataSources as follows via the API:

```
spark.read.format("jdbc").option("fetchSize", 0).load()
```

This PR allows an equivalent Spark SQL syntax to specify options:

```
SELECT * FROM jdbcTable WITH OPTIONS(`fetchSize` = 0)
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test in DataSourceV2SQLSuite

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #46707 from szehon-ho/spark-36680.

Authored-by: Szehon Ho <szehon.apache@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
This PR adds a SQL TVF (table-valued function):

with_options('table', map('foo', 'bar'))
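A minimal usage sketch of the proposed TVF (the table name `t` and the option are illustrative; per the discussion above, this form was later superseded by a WITH OPTIONS clause):

```sql
-- Hypothetical: read table `t` with a per-scan option applied to its relation,
-- roughly equivalent to spark.read.option("foo", "bar").table("t")
SELECT * FROM with_options('t', map('foo', 'bar'))
```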
Details:
Why are the changes needed?
Currently there is no way to dynamically configure individual DataSource relations in a Spark SQL query.
This is a continuation of the effort in #34072, and based on the comment there #34072 (comment), to use a TVF instead of query hints.
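For context, a sketch of how per-scan options are attached today through the DataFrame API, which has no SQL equivalent (the `jdbc` source and `fetchSize` option are illustrative; `spark` is an existing SparkSession):

```scala
// DataFrame API: options are attached per-read, outside of SQL.
// This PR's TVF aims to expose the same capability in a SQL query.
val df = spark.read
  .format("jdbc")
  .option("fetchSize", "0")
  .load()
```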
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit test in DataSourceV2SQLSuite